GNU bug report logs - #73194
ls command converts utf-8 character into escape sequences

Package: coreutils;

Reported by: Simon Wolfe <sekaihenodoa <at> mutsuba.info>

Date: Thu, 12 Sep 2024 10:18:01 UTC

Severity: normal

Tags: notabug

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

From: Pádraig Brady <P <at> draigBrady.com>
To: Simon Wolfe <sekaihenodoa <at> mutsuba.info>, 73194 <at> debbugs.gnu.org
Subject: bug#73194: ls command converts utf-8 character into escape sequences
Date: Thu, 12 Sep 2024 11:42:06 +0100

On 12/09/2024 11:16, Simon Wolfe wrote:
> I have one file name that uses Unicode character U+318DF, which is in the Tertiary Ideographic Plane, more precisely in CJK Unified Ideographs Extension H.
> 
> touch 𱣟
> ls
> 
> returns:
> 
> ''$'\360\261\243\237'
> 
> Extension H was introduced in Unicode 15.0 in 2022.
> 
> I also notice that this bug occurs with any character in Extension I (introduced in 2023).
> 
> Extension G seems to work okay.

ls 9.4 works as expected for me with glibc-2.39 in a UTF-8 locale,
i.e. that file name is displayed directly.
If I set a non-UTF-8 locale, ls displays the quoted form above
(which works in all locales, BTW).

  $ touch ''$'\360\261\243\237'
  $ ls ''$'\360\261\243\237'
  𱣟
  $ LC_ALL=C ls ''$'\360\261\243\237'
  ''$'\360\261\243\237'
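
For readers checking the quoted form by hand, here is a minimal sketch that decodes the four octal-escaped bytes back to a code point, to confirm they really are the UTF-8 encoding of U+318DF. The helper name utf8_4byte_codepoint is hypothetical (it is not part of coreutils), and it only handles 4-byte sequences.

```shell
# Decode a 4-byte UTF-8 sequence, given as octal escapes, to its code point.
# utf8_4byte_codepoint is a hypothetical helper, not a coreutils tool.
utf8_4byte_codepoint() {
    # Dump the bytes as space-separated hex, e.g. "f0 b1 a3 9f".
    set -- $(printf "$1" | od -An -tx1)
    [ "$#" -eq 4 ] || return 1
    # The lead byte carries 3 payload bits, each continuation byte 6.
    printf 'U+%X\n' $(( ((0x$1 & 0x07) << 18) | ((0x$2 & 0x3F) << 12) | ((0x$3 & 0x3F) << 6) | (0x$4 & 0x3F) ))
}

utf8_4byte_codepoint '\360\261\243\237'   # prints U+318DF
```

So the bytes on disk are the same either way; only how ls chooses to display them differs.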

So I suspect your system libs are not updated to recognize this character,
hence the fallback format is used.
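
One way to compare setups, as a rough sketch only: print the locale's codeset, the C library version, and the ls version on both machines. The function name report_ls_environment is made up for illustration; getconf GNU_LIBC_VERSION is glibc-specific, so on other C libraries it is assumed to fail and "unknown" is printed instead.

```shell
# Print the pieces of the environment that affect how ls displays names.
# report_ls_environment is a hypothetical helper name.
report_ls_environment() {
    # Codeset of the current locale; direct display needs a UTF-8 locale.
    printf 'charmap: %s\n' "$(locale charmap 2>/dev/null || echo unknown)"
    # glibc version, where getconf knows this glibc-specific variable.
    printf 'libc: %s\n' "$(getconf GNU_LIBC_VERSION 2>/dev/null || echo unknown)"
    # The first line of ls --version identifies the coreutils release.
    printf 'ls: %s\n' "$(ls --version 2>/dev/null | head -n 1)"
}

report_ls_environment
```

If the charmap is UTF-8 on both systems but only one displays the character directly, the C library's character tables are the likely difference.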

cheers,
Pádraig.