GNU bug report logs -
#22913
filenames mangled by locale
To reply to this bug, email your comments to 22913 AT debbugs.gnu.org.
Report forwarded to bug-guile <at> gnu.org: bug#22913; Package guile. (Sat, 05 Mar 2016 00:44:02 GMT)
Acknowledgement sent to Zefram <zefram <at> fysh.org>: New bug report received and forwarded. Copy sent to bug-guile <at> gnu.org. (Sat, 05 Mar 2016 00:44:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org:
It seems that guile-2.0 applies locale encoding and decoding to pathnames
passed to system calls. This radically breaks file access wherever the
locale's character encoding is anything other than a simple 8-bit
encoding such as ISO-8859-1. For example, in the default C locale with
its nominal ASCII encoding,
$ guile-2.0 -c '(open-file (list->string (map integer->char '\''(76 195 169 111 110))) "w")'
$ echo L*n | od -tc
0000000 L ? ? o n \n
0000006
Those are literal question marks in the name of the file actually
created, apparently substituted for the high-half octets in the
requested filename. Existing files whose names contain high-half octets
can't be found (resulting in an ENOENT error message that shows the
actually-existing filename), and new ones can't be created under the
requested name (they are created under the mangled name instead).
There's no warning or exception advising that the requested name can't
be used, just this misbehaviour.
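The substitution can be modelled in a few lines. Note that (integer->char 195) and (integer->char 169) yield the characters U+00C3 and U+00A9, so the Scheme string holds code points, not the UTF-8 octets of "é"; an ASCII encoder must replace both. The following is a minimal C sketch of that step, assuming a "replace with '?'" strategy; it is not Guile's source, and the function name encode_ascii_substitute is hypothetical. Unlike what the report observes from Guile, it returns a count of substitutions so a caller could detect the loss.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the observed behaviour, not Guile's code:
 * encode a sequence of Unicode code points as an ASCII pathname,
 * replacing every code point above 127 with '?'.
 * Returns how many characters were substituted, so a caller could
 * detect the lossy conversion (the report sees no such signal). */
size_t encode_ascii_substitute(const uint32_t *codepoints, size_t n,
                               char *out, size_t out_size)
{
    size_t substituted = 0;

    assert(out_size > n); /* room for n bytes plus the terminating NUL */
    for (size_t i = 0; i < n; i++) {
        if (codepoints[i] < 128) {
            out[i] = (char)codepoints[i]; /* ASCII: copied unchanged */
        } else {
            out[i] = '?';                 /* unrepresentable: replaced */
            substituted++;
        }
    }
    out[n] = '\0';
    return substituted;
}
```

Fed the code points (76 195 169 111 110) from the example, this produces "L??on" with two substitutions, matching the od output. Under ISO-8859-1, by contrast, every code point from 0 to 255 maps to the octet of the same value, which is why such a locale would pass the name through intact.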
The equivalent problem arises in the other direction, when filenames received from the system are decoded:
$ echo foo > $'L\303\251on.txt'
$ guile-2.0 -c '(define d (opendir ".")) (let r () (let ((n (readdir d))) (if (eof-object? n) #t (begin (if (eq? (car (reverse (string->list n))) #\t) (begin (write (map char->integer (string->list n))) (newline))) (r)))))'
(76 63 63 111 110 46 116 120 116)
Again no warning or exception, just incorrect data returned.
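For contrast, the underlying system calls do not involve the locale at all. The following minimal C sketch assumes a POSIX system and a byte-preserving filesystem such as a typical Linux /tmp. It creates a file under the raw octets of L\303\251on.txt with open() and checks that readdir() reports exactly those octets back. The helper name bytes_round_trip and the scratch-directory template are hypothetical.

```c
#define _XOPEN_SOURCE 700 /* expose mkdtemp() in <stdlib.h> */
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: create a file named with the raw UTF-8 octets of
 * "Léon.txt" in a fresh scratch directory, then list the directory.
 * Returns 1 if readdir() reported the identical byte string, 0 if not,
 * -1 on error. No locale is consulted at any step. */
int bytes_round_trip(void)
{
    static const char name[] = "L\303\251on.txt";
    char dir[] = "/tmp/roundtripXXXXXX";
    char path[sizeof dir + sizeof name];
    int found = 0;

    if (mkdtemp(dir) == NULL)
        return -1;

    int len = snprintf(path, sizeof path, "%s/%s", dir, name);
    assert(len > 0 && (size_t)len < sizeof path); /* path fits exactly */

    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd < 0) {
        rmdir(dir);
        return -1;
    }
    close(fd);

    DIR *d = opendir(dir);
    if (d != NULL) {
        struct dirent *e;
        while ((e = readdir(d)) != NULL)
            if (strcmp(e->d_name, name) == 0) /* byte-for-byte match */
                found = 1;
        closedir(d);
    }

    unlink(path); /* clean up the scratch file and directory */
    rmdir(dir);
    return found;
}
```

Decoding those octets as ISO-8859-1 would have given the list (76 195 169 111 110 46 116 120 116), one character per octet, instead of the two 63s that Guile returned.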
Working around this would require the program to select a locale with
a more accommodating nominal character encoding. As I've previously
noted, there's no guarantee that such a locale exists. Thus the above
behaviour is fatal to any attempt to write a Guile Scheme program that
operates on arbitrarily-named files.
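The fragility of that workaround can be sketched as a locale probe. The following C sketch assumes the program has a list of candidate locale names to try; the helper first_available_ctype_locale and any names such as "en_US.ISO-8859-1" are illustrative assumptions, since which locales are installed varies by system. setlocale() returns NULL for a name that is not available, and POSIX guarantees only "C"/"POSIX", whose nominal ASCII encoding is exactly the case the report describes.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: return the first name in `candidates` that
 * setlocale() accepts for LC_CTYPE, or NULL if none is installed.
 * On success the process-wide LC_CTYPE is left set to that locale;
 * a failed setlocale() call leaves the current setting unchanged. */
const char *first_available_ctype_locale(const char *const *candidates,
                                         size_t n)
{
    assert(candidates != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (setlocale(LC_CTYPE, candidates[i]) != NULL)
            return candidates[i];
    }
    return NULL;
}
```

A probe over, say, {"en_US.ISO-8859-1", "C"} may well fall through to "C" on a given machine, which leaves the program in the mangling locale. Even a UTF-8 locale would not cover names that are not valid UTF-8; only a single-byte encoding that maps all 256 octets, such as ISO-8859-1, passes every name through.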
Guile even applies this mangling to the pathname of a script that it is
to load:
$ echo '(write "hi")(newline)' > $'L\303\251on.scm'
$ guile-2.0 -s L*n.scm
[big error message saying it couldn't find the file that exists]
Obviously, even if a program could turn off the locale mangling in
general, this instance of it occurs too early for the program to avoid.
The guile framework itself has acquired the kind of 8-bit-cleanliness
bug that it is imposing on the programs that it interprets.
-zefram
This bug report was last modified 9 years and 102 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997, 2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.