GNU bug report logs -
#67841
[PATCH] Clarify error messages for misuse of m4_warn and --help for -W.
Reported by: Zack Weinberg <zack <at> owlfolio.org>
Date: Fri, 15 Dec 2023 20:45:02 UTC
Severity: normal
Tags: patch
Done: Karl Berry <karl <at> freefriends.org>
Bug is archived. No further changes may be made.
Zack Weinberg wrote:
> On Fri, Dec 15, 2023, at 7:08 PM, Jacob Bachmeyer wrote:
>
>> Zack Weinberg wrote:
>>
>>> [...]
>>> Also, there’s a perl 5.14ism in one place (s///a) which I need
>>> to figure out how to make 5.6-compatible before it can land.
>>>
> ...
>
>>> + $q_channel =~ s/([^\x20-\x7e])/"\\x".sprintf("%02x", ord($1))/aeg;
>>>
> ...
>
>> If I am reading perlre correctly, you should be able to simply drop the
>> /a modifier because it has no effect on the pattern you have written,
>> since you are using an explicit character class and are *not* using the
>> /i modifier.
>>
>
> Thanks, you've made me realize that /a wasn't even what I wanted in the
> first place. What I thought /a would do is force s/// to act byte by
> byte -- or, in the terms of perlunitut, force the target string to be
> treated as a binary string. That might be clearer with a concrete example:
>
> $ perl -e '$_ = "\xE2\x88\x85"; s/([^\x20-\x7e])/sprintf("\\x%02x", ord($1))/eg; print "$_\n";'
> \xe2\x88\x85
> $ perl -e '$_ = "\N{EMPTY SET}"; s/([^\x20-\x7e])/sprintf("\\x%02x", ord($1))/eg; print "$_\n";'
> \x2205
>
> What change do I need to make to the second one-liner to make it also
> print \xe2\x88\x85?
Add -MEncode to the one-liner and insert "$_ = encode_utf8($_);" before
the substitution. That converts the character string into its UTF-8
byte sequence, so the substitution sees one character per byte.
The Encode documentation states:
"All possible characters have a UTF-8 representation so this function
[encode_utf8] cannot fail."
In the actual patch, try "my $q_channel = encode_utf8($channel);" when
initially copying the channel name.
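A minimal sketch of that suggestion as a standalone script, assuming Perl 5.8 or later so that Encode is available. The helper name quote_channel is hypothetical and does not come from the patch. It only wraps the encode_utf8 call and the substitution discussed above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode_utf8);

# Hypothetical helper: convert the channel name to UTF-8 octets first,
# then escape every octet outside printable ASCII (0x20..0x7e) as \xNN.
sub quote_channel {
    my ($channel) = @_;
    my $q_channel = encode_utf8($channel);    # characters -> UTF-8 bytes
    $q_channel =~ s/([^\x20-\x7e])/sprintf("\\x%02x", ord($1))/eg;
    return $q_channel;
}

print quote_channel("\x{2205}"), "\n";    # U+2205 EMPTY SET; prints \xe2\x88\x85
print quote_channel("obsolete"), "\n";    # plain ASCII passes through unchanged
```

This assumes $channel holds decoded characters. If it already holds raw UTF-8 bytes, as in the first one-liner ("\xE2\x88\x85"), encode_utf8 treats each byte as a Latin-1 character and encodes it a second time, giving \xc3\xa2\xc2\x88\xc2\x85.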
> How do I express that in a way that is backward
> compatible all the way to 5.6.0?
Now the fun part... Perl 5.6 had serious deficiencies in Unicode
support; Encode was introduced with 5.8. You will need to make the
Encode import conditional and generate a stub for encode_utf8 if the
import fails. This should not be a problem, since non-ASCII channel
names are unlikely in the first place, and I think Perl 5.6 would treat
non-ASCII input as exactly the octet string you want anyway.
Something like: (untested)
BEGIN {
    my $have_Encode = 0;
    eval { require Encode; $have_Encode = 1; };
    if ($have_Encode) {
        Encode->import('encode_utf8');
    } else {
        # for Perl 5.6, which did not really have Unicode support anyway
        eval 'sub encode_utf8 { return pop }';
    }
}
Note that the stub is defined using eval STRING rather than eval BLOCK
because "sub" has compile-time effects in Perl and we only want it if
Encode could not be loaded.
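The compile-time point can be checked directly. This small script uses made-up sub names, stub_via_block and stub_via_string. It places a plain named sub and an eval STRING sub inside branches that never run. Only the plain one ends up defined, which is why a fallback stub has to go through eval STRING.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A named "sub" declaration is compiled and installed at compile time,
# so it exists even though this branch never runs.
if (0) {
    sub stub_via_block { return pop }
}

# A sub created by eval STRING is compiled only when the eval executes,
# so it never comes into existence because this branch never runs.
if (0) {
    eval 'sub stub_via_string { return pop }';
}

print "stub_via_block: ",  (defined(&stub_via_block)  ? "defined" : "undefined"), "\n";
print "stub_via_string: ", (defined(&stub_via_string) ? "defined" : "undefined"), "\n";
```

Running it should report stub_via_block as defined and stub_via_string as undefined.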
> And finally, how do I ensure that there is absolutely nothing I can
> put in the initial assignment to $_ that will cause the rest of the
> one-liner to crash? For example
> over in the Python universe it's very easy to get Unicode conversion
> to crash:
>
> $ python3 -c 'print("\uDC00".encode("utf-8"))'
> Traceback (most recent call last):
> File "<string>", line 1, in <module>
> UnicodeEncodeError: 'utf-8' codec can't encode character '\udc00' in position 0: surrogates not allowed
>
Not a problem in Perl:
$ perl -MEncode -e '$_ = "\x{dc00}"; $_ = encode_utf8($_); s/([^\x20-\x7e])/sprintf("\\x%02x", ord($1))/eg; print "$_\n";'
\xed\xb0\x80
:-)
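To check the claim beyond a single surrogate, the sketch below runs encode_utf8 inside eval over a few deliberately awkward inputs, chosen only for illustration, and prints the resulting octets in hex. None of these inputs makes encode_utf8 die, because it produces Perl's lax internal encoding and encodes each code point on its own, surrogates included.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode_utf8);

# Illustrative inputs: a lone low surrogate, a high/low surrogate pair
# stored as two separate code points, NUL, U+00FF, and U+2205 EMPTY SET.
my @inputs = ("\x{dc00}", "\x{d800}\x{dc00}", "\x00", "\xff", "\x{2205}");

for my $in (@inputs) {
    # eval traps any exception, so a failure would be reported, not fatal.
    my $octets = eval { encode_utf8($in) };
    my $label  = join ' ', map { sprintf('U+%04X', ord($_)) } split(//, $in);
    if (defined $octets) {
        printf "%-14s -> %s\n", $label, unpack('H*', $octets);
    } else {
        print "$label -> died: $@";
    }
}
```

Output such as ed b0 80 is not well-formed UTF-8 under RFC 3629, so a strict decoder elsewhere may reject it. For building an escaped diagnostic string, that does not matter.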
-- Jacob
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.