GNU bug report logs - #40407
[PATCH] slow ENCODE_FILE and DECODE_FILE

Package: emacs;

Reported by: Mattias Engdegård <mattiase <at> acm.org>

Date: Fri, 3 Apr 2020 16:11:01 UTC

Severity: normal

Tags: patch

Done: Mattias Engdegård <mattiase <at> acm.org>

Bug is archived. No further changes may be made.


Message #92 received at 40407 <at> debbugs.gnu.org (full text, mbox):

From: Mattias Engdegård <mattiase <at> acm.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 40407 <at> debbugs.gnu.org, handa <at> gnu.org, hirofumi <at> mail.parknet.co.jp
Subject: Re: bug#40407: [PATCH] slow ENCODE_FILE and DECODE_FILE
Date: Mon, 6 Apr 2020 18:55:30 +0200

On 6 Apr 2020, at 18:33, Eli Zaretskii <eliz <at> gnu.org> wrote:

> I think it might be just some convenience thing: utf-7 and utf-8 have
> something in common that made it convenient to treat them the same in
> the internal routines.  Or maybe it's just an accident.

utf-7 and utf-8 have nothing in common at all (apart from a subset of ASCII being encoded in the same way, and the fact that both encode the Unicode repertoire).

> Why do you think the ASCII encoding contradicts the utf-16
> coding-type?

Because :coding-type is the first stage of decoding, or the last stage of encoding. It reflects the low-level structure of the encoded data: using utf-16 as :coding-type implies that utf-7 is encoded into 16-bit parcels, but it's not -- the result of utf-7-imap encoding is a sequence of ASCII bytes. (UTF-16 plays a part in an intermediate step for some values before they are base64-encoded, but that's not visible in the final byte stream.)
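
To make the point about the byte stream concrete, here is a minimal sketch in Python (not Emacs Lisp). It assumes Python's standard "utf-7" codec, which implements plain UTF-7 rather than the IMAP variant; the helper name utf7_shifted_run is hypothetical and only spells out the intermediate step by hand.

```python
import base64

def utf7_shifted_run(text: str) -> bytes:
    """Hypothetical helper: encode `text` as a single UTF-7 shifted run."""
    utf16 = text.encode("utf-16-be")             # intermediate 16-bit code units
    b64 = base64.b64encode(utf16).rstrip(b"=")   # UTF-7 omits base64 padding
    return b"+" + b64 + b"-"                     # shift in, payload, shift out

encoded = "é".encode("utf-7")          # Python's built-in plain UTF-7 codec
by_hand = utf7_shifted_run("é")        # same steps written out explicitly

print(encoded, by_hand)
# Every byte of the result is 7-bit ASCII; no 16-bit units are visible.
print(all(b < 0x80 for b in encoded))
```

The UTF-16 units exist only inside the helper, before base64 is applied; what leaves the encoder is a run of ASCII bytes.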

> I don't think 'charset' is the right type for this encoding (any
> reason why you've chosen it?), but I will let Handa-san comment.

We could use 'raw-text' as well but that implies that any byte value could be part of a utf-7[-imap] text, which is incorrect.
In fact, utf-7-imap only uses codes 0x20-0x7e (utf-7 is allowed to use a few C0 controls too, as mentioned).
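
The byte range can be checked with a small sketch, again in Python rather than Emacs Lisp. The function encode_utf7_imap below is a hypothetical, simplified encoder written from the modified UTF-7 rules in RFC 3501 ("&" as the shift character, "," in place of "/", no padding); it is not Emacs's implementation.

```python
import base64

def encode_utf7_imap(text: str) -> bytes:
    """Hypothetical, simplified modified UTF-7 (RFC 3501) encoder."""
    out = bytearray()
    pending = []  # characters waiting to be emitted as one base64 run

    def flush():
        if pending:
            utf16 = "".join(pending).encode("utf-16-be")
            b64 = base64.b64encode(utf16).rstrip(b"=").replace(b"/", b",")
            out.extend(b"&" + b64 + b"-")
            pending.clear()

    for ch in text:
        if ch == "&":
            flush()
            out.extend(b"&-")          # a literal ampersand is escaped
        elif 0x20 <= ord(ch) <= 0x7E:
            flush()
            out.append(ord(ch))        # printable ASCII represents itself
        else:
            pending.append(ch)         # everything else goes through base64
    flush()
    return bytes(out)

sample = encode_utf7_imap("Entwürfe & Notizen\n")
print(sample)
# Even the newline is base64-encoded, so no byte falls outside 0x20-0x7e.
print(all(0x20 <= b <= 0x7E for b in sample))
```

Under these assumptions, a coding type that admits arbitrary byte values is broader than what the encoder can ever produce.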

Arguably the heuristics of define-coding-system-internal are somewhat inscrutable. There seem to be leaks between layers -- ascii-compatible-p is an end-to-end property and cannot really be set the way it is by that function. But since it is, fixing it afterwards should be the correct way.





This bug report was last modified 5 years and 91 days ago.
