#40407 - [PATCH] slow ENCODE_FILE and DECODE_FILE

GNU bug report logs - #40407
[PATCH] slow ENCODE_FILE and DECODE_FILE

Package: emacs;

Reported by: Mattias Engdegård <mattiase <at> acm.org>

Date: Fri, 3 Apr 2020 16:11:01 UTC

Severity: normal

Tags: patch

Done: Mattias Engdegård <mattiase <at> acm.org>

Bug is archived. No further changes may be made.


From: Eli Zaretskii <eliz <at> gnu.org>
To: OGAWA Hirofumi <hirofumi <at> mail.parknet.co.jp>, Kenichi Handa <handa <at> gnu.org>
Cc: 40407 <at> debbugs.gnu.org, mattiase <at> acm.org
Subject: bug#40407: [PATCH] slow ENCODE_FILE and DECODE_FILE
Date: Mon, 06 Apr 2020 17:21:34 +0300

> From: OGAWA Hirofumi <hirofumi <at> mail.parknet.co.jp>
> Cc: Mattias Engdegård <mattiase <at> acm.org>,
>     40407 <at> debbugs.gnu.org
> Date: Mon, 06 Apr 2020 19:10:48 +0900
>
> Eli Zaretskii <eliz <at> gnu.org> writes:
>
> >> -  if (BUFFERP (dst_object))
> >> +  if (EQ (dst_object, Qt))
> >> +    {
> >> +      /* Fast path for ASCII-only input and an ASCII-compatible coding:
> >> +         act as identity.  */
> >> +      Lisp_Object attrs = CODING_ID_ATTRS (coding.id);
> >> +      if (! NILP (CODING_ATTR_ASCII_COMPAT (attrs))
> >> +          && (STRING_MULTIBYTE (string)
> >> +              ? (chars == bytes) : string_ascii_p (string)))
> >> +        return string;
>
> While using the latest master branch, I noticed this became the cause of
> decoding error.
>
> The simple reproducible test is,
>
>   (decode-coding-string "&abc" 'utf-7-imap)
>   => "&abc"
>
> like the above result, decoding utf-7-imap didn't work.
>
> Because (coding-system-get 'utf-7-imap :ascii-compatible-p) => t.

Thanks.

> I'm not sure, 'utf-7* should be fixed as non ascii-compatible, or
> string_ascii_p() should check more strictly.

The former, since UTF-7 is definitely *not* ASCII-compatible.

Does the patch below produce good results?

Kenichi, why was coding-type of UTF-7 systems set to 'utf-8'?
Wouldn't it be better to set it to 'utf-16'?  Or is there some
subtlety here that we should be aware of?  Do you have any comments on
the patch below?

Thanks.

diff --git a/src/coding.c b/src/coding.c
index 97a6eb9..71ff93c 100644
--- a/src/coding.c
+++ b/src/coding.c
@@ -11301,7 +11301,10 @@ DEFUN ("define-coding-system-internal", Fdefine_coding_system_internal,
           CHECK_CODING_SYSTEM (val);
         }
       ASET (attrs, coding_attr_utf_bom, bom);
-      if (NILP (bom))
+      if (NILP (bom)
+          /* UTF-7 has :coding-type set to 'utf-8' (why not
+             'utf-16'?), but it is definitely NOT ASCII-compatible.  */
+          && !EQ (name, Qutf_7) && !EQ (name, Qutf_7_imap))
         ASET (attrs, coding_attr_ascii_compat, Qt);
 
       category = (CONSP (bom) ? coding_category_utf_8_auto
@@ -11673,6 +11676,9 @@ syms_of_coding (void)
   DEFSYM (Qutf_8_unix, "utf-8-unix");
   DEFSYM (Qutf_8_emacs, "utf-8-emacs");
 
+  DEFSYM (Qutf_7, "utf-7");
+  DEFSYM (Qutf_7_imap, "utf-7-imap");
+
 #if defined (WINDOWSNT) || defined (CYGWIN)
   /* No, not utf-16-le: that one has a BOM.  */
   DEFSYM (Qutf_16le, "utf-16le");
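
To illustrate why the identity fast path cannot apply to UTF-7, here is a
minimal standalone sketch in C.  It is not Emacs code: the names
all_ascii and decode_is_identity and the shift_char parameter are
hypothetical, introduced only for this illustration.  It assumes that in
utf-7-imap the byte '&' opens an encoded run (with a literal '&' written
as "&-"), and that '+' plays the same role in plain utf-7, so a string
made only of ASCII bytes can still decode to something else.  The check
is a simplification and ignores other characters that UTF-7 variants
treat specially.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of Emacs): true if every byte of S
   is 7-bit ASCII.  */
bool
all_ascii (const char *s, size_t len)
{
  for (size_t i = 0; i < len; i++)
    if ((unsigned char) s[i] >= 0x80)
      return false;
  return true;
}

/* Hypothetical helper: true if decoding S could plausibly be skipped.
   SHIFT_CHAR is the byte that opens an encoded run in the coding
   ('&' for utf-7-imap, '+' for utf-7), or 0 for a coding assumed to
   have no such byte.  This is a simplified, conservative check.  */
bool
decode_is_identity (const char *s, size_t len, char shift_char)
{
  if (!all_ascii (s, len))
    return false;
  /* An ASCII-only string still needs real decoding if it contains
     the shift character.  */
  if (shift_char != 0 && memchr (s, shift_char, len) != NULL)
    return false;
  return true;
}
```

With these definitions, decode_is_identity ("&abc", 4, '&') is false,
which matches the report: "&abc" is pure ASCII, yet it must still go
through the utf-7-imap decoder.  The patch above takes the simpler route
of not marking the UTF-7 coding systems as ASCII-compatible at all,
rather than adding a per-string check like this one.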

This bug report was last modified 5 years and 91 days ago.
