GNU bug report logs - #56469
29.0.50; Unibyte dir in directory_files_internal


Package: emacs;

Reported by: Stefan Monnier <monnier <at> iro.umontreal.ca>

Date: Sat, 9 Jul 2022 17:46:01 UTC

Severity: normal

Found in version 29.0.50

Done: Eli Zaretskii <eliz <at> gnu.org>

Bug is archived. No further changes may be made.



Message #8 received at 56469 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: 56469 <at> debbugs.gnu.org
Subject: Re: bug#56469: 29.0.50; Unibyte dir in directory_files_internal
Date: Sat, 09 Jul 2022 21:17:22 +0300
> Date: Sat, 09 Jul 2022 13:44:52 -0400
> From:  Stefan Monnier via "Bug reports for GNU Emacs,
>  the Swiss army knife of text editors" <bug-gnu-emacs <at> gnu.org>
> 
> If you have a directory named "/tmp/\303a" with a file named "fée"
> inside, then (directory-files "/tmp/\303a" 'full) is likely to return
> a funny string which is multibyte but contains an invalid
> UTF-8 sequence (its bytes spell "/tmp/\303a/f\303\251e").
> That string seems to be printed as "/tmp/¡/fée" which corresponds
> to "/tmp/\303\241/f\303\251e".
> 
> Such a string with an invalid UTF-8 sequence is handled quite gracefully
> by Emacs, so I wasn't able to get an actual crash out of it, but it's
> still something we should avoid.
> 
> I suggest the patch below.  In a comment I suggest we don't try to use
> unibyte strings when a multibyte string would work as well.  This is
> because for those ASCII-only strings, it's cheaper to test bytes==chars
> to (re)discover that they are ASCII-only (when they're multibyte) than
> having to loop through the bytes (when they're unibyte).

Please bootstrap Emacs in a directory with such a name, and if that
works, I'm okay with installing this change.

Thanks.
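
For readers following the byte sequences quoted above, here is a minimal
sketch in Python (not Emacs code, and not part of the proposed patch) that
checks the two claims in the report: the returned bytes are not valid UTF-8,
and for text known to be valid UTF-8, comparing byte count with character
count is enough to tell whether it is ASCII-only. The helper names are
hypothetical and chosen only for this illustration.

```python
# Byte sequences quoted in the report, written with the same octal escapes.
reported = b"/tmp/\303a/f\303\251e"       # bytes of the string described in the report
displayed = b"/tmp/\303\241/f\303\251e"   # bytes the printed form is said to correspond to


def is_valid_utf8(data: bytes) -> bool:
    """Return True if DATA decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def is_ascii_via_counts(text: str) -> bool:
    """ASCII test by comparing byte count and character count.

    In UTF-8 every non-ASCII character takes at least two bytes, so the
    counts are equal only for ASCII-only text.  Python has to encode the
    string here to get the byte count; the report's point is that this
    comparison is cheap when both counts are already at hand.
    """
    return len(text.encode("utf-8")) == len(text)


def is_ascii_via_scan(data: bytes) -> bool:
    """ASCII test for raw bytes: every byte must be inspected."""
    return all(b < 0x80 for b in data)


if __name__ == "__main__":
    print(is_valid_utf8(reported))    # \303 followed by "a" is not a valid sequence
    print(is_valid_utf8(displayed))   # \303\241 is a complete two-byte sequence
    print(is_ascii_via_counts("/tmp/abc"), is_ascii_via_scan(b"/tmp/abc"))
```

The sketch only models the encoding arithmetic. It says nothing about how
directory_files_internal builds its result or what the patch changes.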




This bug report was last modified 2 years and 290 days ago.


