Package: guix;
Reported by: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>
Date: Tue, 16 May 2017 06:59:02 UTC
Severity: important
Done: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>
Bug is archived. No further changes may be made.
Message #41 received at 26948 <at> debbugs.gnu.org (full text, mbox):
From: ludo <at> gnu.org (Ludovic Courtès) To: Maxim Cournoyer <maxim.cournoyer <at> gmail.com> Cc: 26948 <at> debbugs.gnu.org Subject: Re: ‘write-file’ output should not be locale-dependent Date: Tue, 30 May 2017 13:57:24 +0200
[Message part 1 (text/plain, inline)]
Hi Maxim, Maxim Cournoyer <maxim.cournoyer <at> gmail.com> skribis: > ludo <at> gnu.org (Ludovic Courtès) writes: [...] >> But wait! “guix build nss-certs --check -K” fails, and the diff is: >> >> $ LANGUAGE= diff -ur /gnu/store/3ql0vilc0zv6ra42ghi04787vrg6bb71-nss-certs-3.30.2{,-check} >> Only in /gnu/store/3ql0vilc0zv6ra42ghi04787vrg6bb71-nss-certs-3.30.2-check/etc/ssl/certs: AC_Raíz_Certicámara_S.A.:2.15.7.126.82.147.123.224.21.227.87.240.105.140.203.236.12.pem >> Only in /gnu/store/3ql0vilc0zv6ra42ghi04787vrg6bb71-nss-certs-3.30.2/etc/ssl/certs: AC_Ra?z_Certic?mara_S.A.:2.15.7.126.82.147.123.224.21.227.87.240.105.140.203.236.12.pem >> diff -ur /gnu/store/3ql0vilc0zv6ra42ghi04787vrg6bb71-nss-certs-3.30.2/etc/ssl/certs/ae8153b9.0 /gnu/store/3ql0vilc0zv6ra42ghi04787vrg6bb71-nss-certs-3.30.2-check/etc/ssl/certs/ae8153b9.0 >> --- /gnu/store/3ql0vilc0zv6ra42ghi04787vrg6bb71-nss-certs-3.30.2/etc/ssl/certs/ae8153b9.0 1970-01-01 01:00:01.000000000 +0100 >> +++ /gnu/store/3ql0vilc0zv6ra42ghi04787vrg6bb71-nss-certs-3.30.2-check/etc/ssl/certs/ae8153b9.0 1970-01-01 01:00:01.000000000 +0100 >> @@ -3,10 +3,10 @@ >> # distrust= >> # openssl-trust=codeSigning emailProtection serverAuth >> -----BEGIN CERTIFICATE----- >> -MIIHyTCCBbGgAwIBAgIBATANBgkqhkiG9w0BAQUFADB9MQswCQYDVQQGEwJJTDEW >> +MIIHhzCCBW+gAwIBAgIBLTANBgkqhkiG9w0BAQsFADB9MQswCQYDVQQGEwJJTDEW > > Can this be explained by locale alone? That is troubling. Yes it’s troubling, it deserves more investigation. >> There are two ways to create nars. One is via the ‘export-paths’ RPC >> (implemented in the daemon in C++), which does not interpret file names >> and thus leaves them untouched. The other one is via ‘write-file’ from >> (guix serialization), which is written in Scheme and thus converts file >> names from locale encoding (specifically, ‘scandir’ does that.) >> >> ‘guix publish’ uses the latter, so ‘guix publish’ is sensitive to locale >> settings, which is pretty bad. >> >> Guile currently does not allow us to specify whether/how file names >> should be decoded, but possible solutions have been discussed for 2.2. >> >> In the meantime, solutions are: >> >> 1. To run ‘guix publish’ in a UTF-8 locale, which apparently was not >> the case. > > I'm surprised by that. Wouldn't a utf8 locale be the default? Users are free to choose their favorite locale. Also, on a foreign distro where locales are not properly set up, you end up in the C locale with US-ASCII encoding (as was the case here). >> 2. Add to (guix build syscalls) a separate locale-independent >> ‘scandir’ implementation and use that. > > If the general solution is to fix it in Guile, the workaround proposed > in 1. seems preferable. I implemented ‘scandir/utf-8’ and used that in ‘write-file’ (patches attached). Unfortunately that’s not enough since libguile procedures like ‘open-file’ still do locale-dependent conversion, so we’d need to duplicate those as well, which is not great. But on second thought, I think the problem is not in the ‘write-file’ call that ‘guix publish’ makes: if it were, ‘scandir’ would return bogus file names (with question marks), but then trying to open them would fail, so ‘write-file’ wouldn’t produce a bogus nar. So I think the culprit is ‘restore-file-set’ (used in ‘guix offload’ when retrieving store items from a build machine): this one reads file names as UTF-8, via ‘read-store-path’, but then when it tries to create those files using Guile’s primitives, their names can be be converted, possibly with those question marks popping up. Here ‘restore-file-set’ can’t notice that Guile changed the file names behind its back. The short-term fix is to ensure guix-daemon itself runs in a UTF-8 locale. :-/ I’ve restarted guix-daemon and ‘guix publish’ in a UTF-8 locale on hydra.gnu.org. Thanks, Ludo’.
[0001-syscalls-Add-scandir-utf-8.patch (text/x-patch, inline)]
From e7f464bac58e1f09de5ceb194c4a30f6d899b29a Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ludovic=20Court=C3=A8s?= <ludo <at> gnu.org> Date: Tue, 30 May 2017 12:03:54 +0200 Subject: [PATCH] syscalls: Add 'scandir/utf-8'. * guix/build/syscalls.scm (%struct-dirent-header): New C struct. (opendir/utf-8, closedir/utf-8, readdir/utf-8, scandir/utf-8): New procedures. * tests/syscalls.scm ("scandir/utf-8, ENOENT") ("scandir/utf-8, ASCII file names") ("scandir/utf8, UTF-8 file names"): New tests. --- guix/build/syscalls.scm | 73 +++++++++++++++++++++++++++++++++++++++++++++++++ tests/syscalls.scm | 39 ++++++++++++++++++++++++++ 2 files changed, 112 insertions(+) diff --git a/guix/build/syscalls.scm b/guix/build/syscalls.scm index 52439afd4..cfb43e93b 100644 --- a/guix/build/syscalls.scm +++ b/guix/build/syscalls.scm @@ -67,6 +67,7 @@ mkdtemp! fdatasync pivot-root + scandir/utf-8 fcntl-flock set-thread-name @@ -812,6 +813,78 @@ system to PUT-OLD." ;;; +;;; Opendir & co. +;;; + +(define-c-struct %struct-dirent-header + sizeof-dirent-header + list + read-dirent-header + write-dirent-header! + (inode int64) + (offset int64) + (length unsigned-short) + (type uint8) + (name uint8)) ;first byte of 'd_name' + +(define opendir/utf-8 + (let ((proc (syscall->procedure '* "opendir" '(*)))) + (lambda (name) + (let-values (((ptr err) + (proc (string->pointer name "UTF-8")))) + (if (null-pointer? ptr) + (throw 'system-error "opendir/utf-8" + "opendir/utf-8: ~A" + (list (strerror err)) + (list err)) + ptr))))) + +(define closedir/utf-8 + (let ((proc (syscall->procedure int "closedir" '(*)))) + (lambda (directory) + (let-values (((ret err) + (proc directory))) + (unless (zero? ret) + (throw 'system-error "closedir" + "closedir: ~A" (list (strerror err)) + (list err))))))) + +(define readdir/utf-8 + (let ((proc (syscall->procedure '* "readdir64" '(*)))) + (lambda (directory) + (let ((ptr (proc directory))) + (and (not (null-pointer? ptr)) + (pointer->string + (make-pointer (+ (pointer-address ptr) + (c-struct-field-offset %struct-dirent-header name))) + -1 + "UTF-8")))))) + +(define* (scandir/utf-8 name #:optional + (select? (const #t)) + (string<? string<?)) + "Like 'scandir', but (1) systematically decode file names as UTF-8, and (2) +raise to 'system-error' when NAME cannot be opened. + +This works around a defect in Guile 2.0/2.2 where 'scandir' decodes file names +according to the current locale, which is not always desirable." + (let ((directory (opendir/utf-8 name))) + (dynamic-wind + (const #t) + (lambda () + (let loop ((result '())) + (match (readdir/utf-8 directory) + (#f + (sort result string<?)) + (entry + (loop (if (select? entry) + (cons entry result) + result)))))) + (lambda () + (closedir/utf-8 directory))))) + + +;;; ;;; Advisory file locking. ;;; diff --git a/tests/syscalls.scm b/tests/syscalls.scm index e20f0600b..02062ec9d 100644 --- a/tests/syscalls.scm +++ b/tests/syscalls.scm @@ -24,6 +24,8 @@ #:use-module (srfi srfi-1) #:use-module (srfi srfi-26) #:use-module (srfi srfi-64) + #:use-module (system foreign) + #:use-module ((ice-9 ftw) #:select (scandir)) #:use-module (ice-9 match)) ;; Test the (guix build syscalls) module, although there's not much that can @@ -184,6 +186,43 @@ (status:exit-val status)))) (eq? #t result)))))))) +(test-equal "scandir/utf-8, ENOENT" + ENOENT + (catch 'system-error + (lambda () + (scandir/utf-8 "/does/not/exist")) + (lambda args + (system-error-errno args)))) + +(test-equal "scandir/utf-8, ASCII file names" + (scandir (dirname (search-path %load-path "guix/base32.scm"))) + (scandir/utf-8 (dirname (search-path %load-path "guix/base32.scm")))) + +(test-equal "scandir/utf8, UTF-8 file names" + '("." ".." "α" "λ") + (call-with-temporary-directory + (lambda (directory) + ;; Wrap 'creat' to make sure that we really pass a UTF-8-encoded file + ;; name to the system call. + (let ((creat (pointer->procedure int + (dynamic-func "creat" (dynamic-link)) + (list '* int)))) + (creat (string->pointer (string-append directory "/α") + "UTF-8") + #o644) + (creat (string->pointer (string-append directory "/λ") + "UTF-8") + #o644) + (let ((locale (setlocale LC_ALL))) + (dynamic-wind + (lambda () + ;; Make sure that even in a C locale we get the right result. + (setlocale LC_ALL "C")) + (lambda () + (scandir/utf-8 directory)) + (lambda () + (setlocale LC_ALL locale)))))))) + (false-if-exception (delete-file temp-file)) (test-equal "fcntl-flock wait" 42 ; the child's exit status -- 2.13.0
[Message part 3 (text/x-patch, inline)]
diff --git a/guix/serialization.scm b/guix/serialization.scm index e6ae2fc30..77a54f904 100644 --- a/guix/serialization.scm +++ b/guix/serialization.scm @@ -18,6 +18,8 @@ (define-module (guix serialization) #:use-module (guix combinators) + #:use-module ((guix build syscalls) + #:select (scandir/utf-8)) #:use-module (rnrs bytevectors) #:use-module (srfi srfi-1) #:use-module (srfi srfi-26) @@ -285,8 +287,11 @@ result of 'lstat'; exclude entries for which SELECT? does not return true." ;; 'scandir' defaults to 'string-locale<?' to sort files, but ;; this happens to be case-insensitive (at least in 'en_US' ;; locale on libc 2.18.) Conversely, we want files to be - ;; sorted in a case-sensitive fashion. - (scandir f (negate (cut member <> '("." ".."))) string<?))) + ;; sorted in a case-sensitive fashion. Also, always decode file + ;; names as UTF-8. + (scandir/utf-8 + f (negate (cut member <> '("." ".."))) + string<?))) (for-each (lambda (e) (let* ((f (string-append f "/" e)) (s (lstat f))) diff --git a/tests/nar.scm b/tests/nar.scm index 61646db96..d2eae65c4 100644 --- a/tests/nar.scm +++ b/tests/nar.scm @@ -25,6 +25,8 @@ #:select (open-sha256-port open-sha256-input-port)) #:use-module ((guix packages) #:select (base32)) + #:use-module ((guix utils) + #:select (call-with-temporary-directory)) #:use-module (rnrs bytevectors) #:use-module (rnrs io ports) #:use-module (srfi srfi-1) @@ -272,6 +274,36 @@ (lambda () (rmdir input))))) +(unless (equal? "UTF-8" (fluid-ref %default-port-encoding)) + (test-skip 1)) +(test-assert "write-file + restore-file, UTF-8 file names" + (let* ((output %test-dir) + (nar (string-append output ".nar")) + (locale (setlocale LC_ALL))) + (dynamic-wind + (lambda () #t) + (lambda () + (call-with-temporary-directory + (lambda (input) + (call-with-output-file (string-append input "/α") + (const #t)) + (call-with-output-file (string-append input "/λ") + (const #t)) + (dynamic-wind + (const #f) + (lambda () + (setlocale LC_ALL "C") + (call-with-output-file nar + (cut write-file input <>))) + (lambda () + (setlocale LC_ALL locale))) + (call-with-input-file nar + (cut restore-file <> output)) + (file-tree-equal? input output)))) + (lambda () + (false-if-exception (delete-file nar)) + (false-if-exception (rm-rf output)))))) + ;; 'restore-file-set' depends on 'open-sha256-input-port', which in turn ;; relies on a Guile 2.0.10+ feature. (test-skip (if (false-if-exception
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.