GNU bug report logs -
#73660
[PATCH] gexp: Improve support of Unicode characters.
Previous Next
Reported by: Tomas Volf <~@wolfsden.cz>
Date: Sun, 6 Oct 2024 15:44:01 UTC
Severity: normal
Tags: patch
Done: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>
Bug is archived. No further changes may be made.
Full log
Message #35 received at 73660 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Ludovic Courtès <ludo <at> gnu.org> writes:
> Hello,
>
> Tomas Volf <~@wolfsden.cz> skribis:
>
>> * guix/gexp.scm (computed-file): Set LANG to C.UTF-8 by default.
>> (compiled-modules): Try to `setlocale'.
>> (gexp->script), (gexp->file): New `locale' argument defaulting to C.UTF-8.
>> (text-file*): Set output port encoding to UTF-8.
>> * doc/guix.texi (G-Expressions)[computed-file]: Document the changes. Use
>> @var. Document #:guile.
>> [gexp->script]: Document #:locale. Fix default value for #:target.
>> [gexp->file]: Document #:locale, #:system and #:target.
>>
>> Change-Id: Ib323b51af88a588b780ff48ddd04db8be7c729fb
>
> [...]
>
>> (define* (computed-file name gexp
>> - #:key guile (local-build? #t) (options '()))
>> + #:key
>> + guile
>> + (local-build? #t)
>> + (options '(#:env-vars (("LANG" . "C.UTF-8")))))
>
> I’d suggest LC_CTYPE (or LC_ALL?) rather than LANG.
Oh, yeah, you are right, after reading the specification (and verifying
Guile takes the variable into account), LC_CTYPE seems like a better
fit.
>
> Also, what about making it the default for the #:env-vars of
> ‘gexp->derivation’? That way it wouldn’t need to be repeated in several
> places.
I *think* the original motivation was to keep the gexp->derivation as
impartial as possible, since I do not know what people are using it for.
But it is some time since I wrote this, so I am not fully sure. There
are few downsides. Testing any changes now takes a long time since I
need to do full bootstrap due to changing gexp->derivation and there are
(I presume new and harmless) warnings `warning: failed to install
locale: Invalid argument' during the bootstrap.
But the change itself is much more localized now and the Unicode support
more likely to "just work" for any new gexp forms added in the future.
So maybe you are right and it is a right way.
Anyway, I followed the suggestion and v2 moves the LC_CTYPE setting to
gexp->derivation. The test script from the commit message still (after
many hours of bootstrapping) works.
>
>> @@ -1700,6 +1703,9 @@ (define* (compiled-modules modules
>> (system base target)
>> (system base compile))
>>
>> + ;; Best effort. The locale is not installed in all contexts.
>> + (false-if-exception (setlocale LC_ALL "C.UTF-8"))
>
> Sounds good. I would make it a separate patch.
Somewhat done. I have made it a separate commit, but still included in
v2.
>
> s/in all contexts/when cross-compiling/
Interesting, I have modified the comment, however would you be willing
to expand on this a bit? Why is the C.UTF-8 locale not available when
cross-compiling? The Guile running this script runs on the build host,
using build host's glibc and build host's locale definitions no? So I
assumed the locale *should* be available. I feel like I am missing
something fundamental about how Guix works here.
>
>> @@ -1990,7 +1996,8 @@ (define* (gexp->script name exp
>> #:key (guile (default-guile))
>> (module-path %load-path)
>> (system (%current-system))
>> - (target 'current))
>> + (target 'current)
>> + (locale "C.UTF-8"))
>
> I would remove this argument and instead add an explicit, hard-coded:
>
> (set-port-encoding! port "UTF-8")
>
> in the body of ‘call-with-output-file’ here, just like you did below.
Done.
>
>> (define* (text-file* name #:rest text)
>> "Return as a monadic value a derivation that builds a text file containing
>> @@ -2108,6 +2119,7 @@ (define* (text-file* name #:rest text)
>> (define builder
>> (gexp (call-with-output-file (ungexp output "out")
>> (lambda (port)
>> + (set-port-encoding! port "UTF-8")
>> (display (string-append (ungexp-splicing text)) port)))))
>
> LGTM. This can be moved to a separate file.
By "separate file" you mean separate patch and/or commit?
>
> How does that sound?
>
> Apologies for not replying earlier!
No worries, thank you for finding the time to look at this. ^_^ The v2
is much smaller.
Tomas
--
There are only two hard things in Computer Science:
cache invalidation, naming things and off-by-one errors.
[signature.asc (application/pgp-signature, inline)]
This bug report was last modified 120 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.