GNU bug report logs - #56469
29.0.50; Unibyte dir in directory_files_internal


Package: emacs;

Reported by: Stefan Monnier <monnier <at> iro.umontreal.ca>

Date: Sat, 9 Jul 2022 17:46:01 UTC

Severity: normal

Found in version 29.0.50

Done: Eli Zaretskii <eliz <at> gnu.org>

Bug is archived. No further changes may be made.



Message #19 received at 56469 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 56469 <at> debbugs.gnu.org
Subject: Re: bug#56469: 29.0.50; Unibyte dir in directory_files_internal
Date: Sun, 10 Jul 2022 10:23:28 -0400
> Please bootstrap Emacs in a directory with such a name, and if that
> works, I'm okay with installing this change.

Pushed, thanks.

W.r.t. the comment, it's indeed unrelated to the patch (other than
the fact that it touches the same code).  The question is when we do:

	  finalname = (nchars == nbytes)
	              ? make_uninit_string (nbytes)
	              : make_uninit_multibyte_string (nchars, nbytes);

the actual bytes are "decoded" (i.e. in our internal UTF-8 encoding), so
(nchars == nbytes) checks whether it's "pure ASCII" or not, and if it's
pure ASCII we return a unibyte string.
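
To illustrate why (nchars == nbytes) amounts to a pure-ASCII test on
UTF-8-style data, here is a minimal standalone sketch. It is not Emacs
code: the helper names `utf8_char_count` and `utf8_is_pure_ascii` are
hypothetical, and the input is assumed to be well-formed UTF-8. Every
non-ASCII character occupies at least two bytes, so the two counts can
only match when every character is a single ASCII byte.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not an Emacs function.  Counts characters in a
   buffer assumed to hold well-formed UTF-8: every byte that is not a
   continuation byte (10xxxxxx) starts a new character.  */
static size_t
utf8_char_count (const unsigned char *s, size_t nbytes)
{
  size_t nchars = 0;
  for (size_t i = 0; i < nbytes; i++)
    if ((s[i] & 0xC0) != 0x80)
      nchars++;
  return nchars;
}

/* Hypothetical helper.  The counts agree only if no character needed
   more than one byte, i.e. the text is pure ASCII.  */
static bool
utf8_is_pure_ascii (const unsigned char *s, size_t nbytes)
{
  return utf8_char_count (s, nbytes) == nbytes;
}
```

In the snippet quoted above, both counts are already known at that
point, so the comparison itself costs nothing extra.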

Our file-name manipulation routines always consider unibyte-ASCII and
multibyte-ASCII as "equivalent", and indeed DECODE_FILE and ENCODE_FILE
take advantage of that so as to return their argument as-is when it's
all-ASCII so as to avoid allocating a string unnecessarily.

So in the above code snippet, when the string is all-ASCII, we actually
have a choice, and both a unibyte string and a multibyte string should
work.  Currently in that case we return a unibyte string, but I think in
such cases we're better off returning a multibyte string because the
subsequent "all-ASCII" test (that DE/ENCODE_FILE will perform when we
pass that filename to some further operation) will be more efficient
(it's a constant-time (nchars == nbytes) test whereas when the string is
unibyte it requires looking at each and every byte).
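
The cost difference can be sketched with a toy string type. This is
only an illustration under stated assumptions: `struct toy_string` and
`toy_all_ascii` are hypothetical names, not Emacs's Lisp_String layout
or any real Emacs function. The assumption is that a multibyte string
caches both its character and byte counts, while a unibyte string only
knows its byte length.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy representation, not Emacs's actual string object.  */
struct toy_string
{
  const unsigned char *data;
  size_t nbytes;     /* length in bytes */
  size_t nchars;     /* length in characters (meaningful if multibyte) */
  bool multibyte;    /* true if data is in the internal multibyte form */
};

/* Hypothetical all-ASCII test in the spirit of the one described
   above.  */
static bool
toy_all_ascii (const struct toy_string *s)
{
  /* Multibyte: both counts are cached, so this is constant time.  */
  if (s->multibyte)
    return s->nchars == s->nbytes;

  /* Unibyte: no character count to compare against, so every byte
     has to be inspected.  */
  for (size_t i = 0; i < s->nbytes; i++)
    if (s->data[i] >= 0x80)
      return false;
  return true;
}
```

With this layout, an all-ASCII multibyte file name is recognized in
one comparison, whereas the same name stored as unibyte is rescanned
each time the test runs.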

IOW, while it makes sense to return a "decoded unibyte" string from
DECODE_FILE in order to avoid an allocation, I don't think it makes
sense to return such a "decoded unibyte" string when we have to allocate
a new string anyway.


        Stefan





This bug report was last modified 2 years and 258 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.