GNU bug report logs - #40407
[PATCH] slow ENCODE_FILE and DECODE_FILE

Package: emacs

Reported by: Mattias Engdegård <mattiase <at> acm.org>

Date: Fri, 3 Apr 2020 16:11:01 UTC

Severity: normal

Tags: patch

Done: Mattias Engdegård <mattiase <at> acm.org>

Bug is archived. No further changes may be made.


Message #59 received at 40407 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Mattias Engdegård <mattiase <at> acm.org>
Cc: 40407 <at> debbugs.gnu.org
Subject: Re: bug#40407: [PATCH] slow ENCODE_FILE and DECODE_FILE
Date: Sun, 05 Apr 2020 16:28:13 +0300

> From: Mattias Engdegård <mattiase <at> acm.org>
> Date: Sun, 5 Apr 2020 12:14:59 +0200
> Cc: 40407 <at> debbugs.gnu.org
> 
> > I think in the use case where we return a copy, we should make sure
> > the return value is unibyte when encoding and multibyte when decoding.
> 
> I'm not necessarily opposed to the suggestion, but why not return a unibyte string in both cases, simplifying the code?

For compatibility with what happens now:

  (multibyte-string-p (decode-coding-string "abc" 'utf-8)) => t

> In addition, some operations (aref) are faster on unibyte. Either way, it's nothing that a caller could rely on, is there? (In particular when taking NOCOPY into account.)

That is true, of course, but many/most of our strings are multibyte
nowadays, even if they are ASCII.  Suddenly getting a unibyte string
instead would be surprising, I think, even if no one should depend on
it not happening.  (NOCOPY case is different: then it's the caller's
responsibility to deal with the issue.)  So I'd rather we produced a
multibyte string when "decoding" by copying.

> +/* Whether a (unibyte) string only contains chars in the 0..127 range.  */

One subtle point regarding this comment: I'd remove the "unibyte"
part, because (1) you apply this test to multibyte strings as well,
and (2) strings encoded in iso-2022 will look "pure-ASCII", but they
aren't.  The latter subtlety doesn't interfere with the caller,
because iso-2022 is not ASCII-compatible, but it's something I'd
mention in the comment, lest someone use this function for some
other purpose.
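
To illustrate the caveat, here is a minimal sketch of such a predicate
with the suggested wording in its comment.  It is not the code from the
patch: the name bytes_ascii_only_p and its signature are hypothetical,
and it works on a plain byte buffer rather than on a Lisp string.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not the function from the patch.
   Return true if the N bytes starting at S are all in the 0..127 range.
   This is a byte-level test only: text encoded in a 7-bit coding
   system that is not ASCII-compatible, such as iso-2022, also passes,
   even though it is not ASCII text.  */
bool
bytes_ascii_only_p (const unsigned char *s, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (s[i] & 0x80)   /* High bit set: byte is outside 0..127.  */
      return false;
  return true;
}
```

With this sketch, the 10-byte sequence "\x1b$B$3$s\x1b(B" (an
iso-2022-jp style escape sequence around two 7-bit byte pairs) yields
true, as does "abc", while the UTF-8 bytes "\xc3\xa5" yield false.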

The patch is OK otherwise.  Thanks.




This bug report was last modified 5 years and 91 days ago.
