GNU bug report logs - #4157
[macOS/HFS] dired doesn't decode ls output when it uses different encoding for filename vs date
Reported by: Peter Dyballa <Peter_Dyballa <at> Freenet.DE>
Date: Sun, 16 Aug 2009 02:25:05 UTC
Severity: minor
Tags: notabug
Found in versions 27.0.50, 23.1.50
Done: Stefan Kangas <stefan <at> marxist.se>
Bug is archived. No further changes may be made.
On 23.08.2009 at 03:49, Stefan Monnier wrote:
>> In both locales the *file names* are correct and also detected as containing
>
> "correct" doesn't really tell me what you see, but I see what you
> mean.
"Correct" meant that I was seeing what I had typed before in Finder...
>
>> "composed characters," it's a problem with the file's month date.
>> In the
>
> So my guess was right: ls's output uses utf-8 for the filenames, but
> latin-1 for the date, which is why it's difficult for dired to do the
> right thing (it's not impossible, of course, but it's more work and
> dired is currently not set up for that).
>
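Stefan's point can be made concrete with a minimal Python sketch (standard library only). It is not how dired works. It assumes a hypothetical listing line modeled on the `gls -lN` dumps in this report, with the month name encoded in ISO-8859-15 and the file name stored as decomposed (NFD) UTF-8. Decoding the whole line with a single coding system damages one field or the other; decoding each field separately works, but only if the byte offset of the file name is already known. `decode_whole` and `decode_per_field` are hypothetical helper names.

```python
import unicodedata

# Hypothetical listing line modeled on the `gls -lN` dumps in this report,
# as produced under an ISO-8859-15 locale.
PREFIX = b"-rw-r--r-- 1 pete admin 281829 "
DATE = "20. Mär 1998"   # written by ls in the locale's encoding
NAME = "zoä€.au"        # stored by HFS+ as decomposed (NFD) UTF-8

name_bytes = unicodedata.normalize("NFD", NAME).encode("utf-8")
raw = PREFIX + DATE.encode("iso8859_15") + b"  " + name_bytes


def decode_whole(line: bytes, coding: str) -> str:
    """Decode the entire line with one coding system (lossy on errors)."""
    return line.decode(coding, errors="replace")


def decode_per_field(line: bytes, name_start: int,
                     locale_coding: str = "iso8859_15") -> str:
    """Decode the columns before `name_start` with the locale's coding and
    the file name as UTF-8, then compose the name (NFC) for display."""
    head = line[:name_start].decode(locale_coding)
    name = line[name_start:].decode("utf-8")
    return head + unicodedata.normalize("NFC", name)


# Here the name's offset is known only because we built the line ourselves.
start = len(raw) - len(name_bytes)

as_utf8 = decode_whole(raw, "utf-8")          # lone 0xE4 in "Mär" becomes U+FFFD
as_latin9 = decode_whole(raw, "iso8859_15")   # the UTF-8 name turns into mojibake
mixed = decode_per_field(raw, start)          # both fields come out readable
```

Locating `start` reliably in real output is presumably the extra work Stefan mentions: the name column would have to be found at the byte level before a coding system is chosen for each part.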
Here is a little test from a shell (actually a *shell* buffer in NS Emacs.app, running with UTF-8 locales):
pete 252 /\ gls -lN zo*
-rw-r--r-- 1 pete admin 281829 20. Mär 1998 zoä€.au
pete 253 /\ ls -lw zo*
-rw-r--r-- 1 pete admin 281829 20 Mär 1998 zoä€.au
pete 254 /\ gls -lN zo* | od -j 32 -t a
0000040 0 . sp M \303 \244 r sp 1 9 9 8 sp sp z o
0000060 a \314 88 \342 82 \254 . a u nl
0000072
pete 255 /\ env LC_CTYPE=de_DE.ISO8859-15 LANG=de_DE.ISO8859-15 gls -lN zo* | od -j 32 -t a
0000040 0 . sp M \344 r sp 1 9 9 8 sp sp z o a
0000060 \314 88 \342 82 \254 . a u nl
0000071
pete 256 /\ ls -lw zo* | od -j 32 -t a
0000040 2 9 sp 2 0 sp M \303 \244 r sp 1 9 9 8
0000060 sp z o a \314 88 \342 82 \254 . a u nl
0000075
pete 257 /\ env LC_CTYPE=de_DE.ISO8859-15 LANG=de_DE.ISO8859-15 ls -lw zo* | od -j 32 -t a
0000040 2 9 sp 2 0 sp M \344 r sp sp 1 9 9 8 sp
0000060 z o a \314 88 \342 82 \254 . a u nl
0000074
So both commands, gls and ls, deliver the month name precomposed and in the locale's encoding, while the file name is always *de*composed UTF-8:
\303 \244 = C3 A4 = LATIN SMALL LETTER A WITH DIAERESIS ä at U+00E4
\314 88 = CC 88 = COMBINING DIAERESIS ¨ at U+0308
\342 82 \254 = E2 82 AC = EURO SIGN € at U+20AC
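The table can be checked mechanically. Below is a minimal Python sketch (standard library only). It assumes, as the table does, that the bare `88` and `82` tokens in the od output are hexadecimal bytes, while the backslash escapes are octal.

```python
import unicodedata

# Rows of the table: od printed most bytes as octal escapes; the bare
# "88" and "82" tokens are read as hex, as the table itself does.
ROWS = [
    (bytes([0o303, 0o244]), "LATIN SMALL LETTER A WITH DIAERESIS", 0x00E4),
    (bytes([0o314, 0x88]), "COMBINING DIAERESIS", 0x0308),
    (bytes([0o342, 0x82, 0o254]), "EURO SIGN", 0x20AC),
]

for raw, name, codepoint in ROWS:
    char = raw.decode("utf-8")          # each row is one UTF-8 sequence
    assert len(char) == 1
    assert ord(char) == codepoint
    assert unicodedata.name(char) == name

# The file name as stored on HFS+: decomposed (NFD) UTF-8.
name_bytes = unicodedata.normalize("NFD", "zoä€.au").encode("utf-8")
assert name_bytes == b"zoa\xcc\x88\xe2\x82\xac.au"

# The month name in each locale: precomposed, locale-encoded.
assert "Mär".encode("utf-8") == b"M\xc3\xa4r"       # \303 \244 in the UTF-8 dumps
assert "Mär".encode("iso8859_15") == b"M\xe4r"      # \344 in the ISO-8859-15 dumps
```

The last two checks mirror the dumps: the file-name bytes are identical in every locale, while "Mär" changes from `\303 \244` to `\344`.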
--
Greetings
Pete
Bake pizza not war!
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.