#67841 - [PATCH] Clarify error messages for misuse of m4_warn and --help for -W.

GNU bug report logs - #67841
[PATCH] Clarify error messages for misuse of m4_warn and --help for -W.

Reported by: Zack Weinberg <zack <at> owlfolio.org>

Date: Fri, 15 Dec 2023 20:45:02 UTC

Severity: normal

Tags: patch

Done: Karl Berry <karl <at> freefriends.org>

Bug is archived. No further changes may be made.


From: Jacob Bachmeyer <jcb62281 <at> gmail.com>
To: Zack Weinberg <zack <at> owlfolio.org>
Cc: Autoconf Patches <autoconf-patches <at> gnu.org>, 67841 <at> debbugs.gnu.org
Subject: [bug#67841] [PATCH] Clarify error messages for misuse of m4_warn and --help for -W.
Date: Mon, 18 Dec 2023 23:44:47 -0600

Zack Weinberg wrote:
> On Fri, Dec 15, 2023, at 7:08 PM, Jacob Bachmeyer wrote:
>
>> Zack Weinberg wrote:
>>
>>> [...]
>>> Also, there’s a perl 5.14ism in one place (s///a) which I need
>>> to figure out how to make 5.6-compatible before it can land.
>>>
> ...
>
>>> +    $q_channel =~ s/([^\x20-\x7e])/"\\x".sprintf("%02x", ord($1))/aeg;
>>>
> ...
>
>> If I am reading perlre correctly, you should be able to simply drop the
>> /a modifier because it has no effect on the pattern you have written,
>> since you are using an explicit character class and are *not* using the
>> /i modifier.
>>
>
> Thanks, you've made me realize that /a wasn't even what I wanted in the
> first place.  What I thought /a would do is force s/// to act byte by
> byte -- or, in the terms of perlunitut, force the target string to be
> treated as a binary string.  That might be clearer with a concrete example:
>
> $ perl -e '$_ = "\xE2\x88\x85"; s/([^\x20-\x7e])/sprintf("\\x%02x", ord($1))/eg; print "$_\n";'
> \xe2\x88\x85
> $ perl -e '$_ = "\N{EMPTY SET}"; s/([^\x20-\x7e])/sprintf("\\x%02x", ord($1))/eg; print "$_\n";'
> \x2205
>
> What change do I need to make to the second one-liner to make it also
> print \xe2\x88\x85?

Add -MEncode to the one-liner and insert "$_ = encode_utf8($_);" before
the substitution to declare that you want the string as UTF-8 bytes.
The Encode documentation states: "All possible characters have a UTF-8
representation so this function [encode_utf8] cannot fail."

In the actual patch, try "my $q_channel = encode_utf8($channel);" when
initially copying the channel name.

> How do I express that in a way that is backward
> compatible all the way to 5.6.0?

Now the fun part...  Perl 5.6 had serious deficiencies in Unicode
support; Encode was introduced with 5.8.  You will need to make the
Encode import conditional and generate a stub for encode_utf8 if the
import fails.  This should not be a problem, since non-ASCII is unlikely
here in the first place, and I think Perl 5.6 would treat non-ASCII as
exactly the octet string you want anyway.

Something like:  (untested)

    BEGIN {
      my $have_Encode = 0;
      eval { require Encode; $have_Encode = 1; };
      if ($have_Encode) {
        Encode->import('encode_utf8');
      } else {
        # for Perl 5.6, which did not really have Unicode support anyway
        eval 'sub encode_utf8 { return pop }';
      }
    }

Note that the stub is defined using eval STRING rather than eval BLOCK
because "sub" has compile-time effects in Perl and we only want it if
Encode could not be loaded.

> And finally, how do I ensure that there is absolutely nothing I can put
> in the initial assignment to $_ that will cause the rest of the
> one-liner to crash?  For example
> over in the Python universe it's very easy to get Unicode conversion
> to crash:
>
> $ python3 -c 'print("\uDC00".encode("utf-8"))'
> Traceback (most recent call last):
>   File "<string>", line 1, in <module>
> UnicodeEncodeError: 'utf-8' codec can't encode character '\udc00' in position 0: surrogates not allowed
>

Not a problem in Perl:

$ perl -MEncode -e '$_ = "\x{dc00}"; $_ = encode_utf8($_); s/([^\x20-\x7e])/sprintf("\\x%02x", ord($1))/eg; print "$_\n";'
\xed\xb0\x80

:-)


-- Jacob
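
For comparison with the python3 one-liner quoted in the message above, here is a minimal Python sketch of the same "encode to UTF-8 bytes, then hex-escape everything outside printable ASCII" step. It is not part of the patch under discussion. The function name quote_channel is hypothetical, chosen only to echo $q_channel, and the use of the "surrogatepass" error handler is an assumption about how one might reproduce, in Python, the lenient behaviour Perl shows for U+DC00.

```python
def quote_channel(name: str) -> str:
    """Hypothetical helper: show NAME with non-printable-ASCII bytes as \\xNN."""
    # The default "strict" handler raises UnicodeEncodeError on lone
    # surrogates; "surrogatepass" encodes them as ordinary 3-byte sequences.
    raw = name.encode("utf-8", errors="surrogatepass")
    # Keep bytes 0x20..0x7e as-is and escape the rest, mirroring the
    # character class [^\x20-\x7e] used in the Perl substitution.
    return "".join(chr(b) if 0x20 <= b <= 0x7E else f"\\x{b:02x}" for b in raw)


print(quote_channel("\u2205"))  # EMPTY SET, the character from the thread
print(quote_channel("\udc00"))  # lone surrogate, the case that crashed
```

Run as written, this should print \xe2\x88\x85 and \xed\xb0\x80, the same byte sequences as the Perl one-liners in the message.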

This bug report was last modified 1 year and 205 days ago.

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.
