GNU bug report logs -
#18987
the bourne shell printf-vs-\xHH portability trap
Reported by: Jim Meyering <jim <at> meyering.net>
Date: Fri, 7 Nov 2014 17:15:03 UTC
Severity: normal
Done: Jim Meyering <jim <at> meyering.net>
Bug is archived. No further changes may be made.
2014-11-08 20:19 GMT-08:00 Jim Meyering <jim <at> meyering.net>:
> On Sat, Nov 8, 2014 at 7:52 PM, Paul Eggert <eggert <at> cs.ucla.edu> wrote:
>> hex_printf_()
>> {
>> hex_printf_format=$(printf '%s\n' "$1" | sed '
>> s/^/_/
>> s/$/_/
>> s/\([^\\]\(\\\\\)*\\x\)\([0-9aAbBcCdDeEfF][^0-9aAbBcCdDeEfF]\)/\10\3/g
>> s/\([^\\]\(\\\\\)*\\x\)\([0-3]\)/\10\3/g
>> s/\([^\\]\(\\\\\)*\\x\)\([4-7]\)/\11\3/g
>> s/\([^\\]\(\\\\\)*\\x\)\([89aAbB]\)/\12\3/g
>> s/\([^\\]\(\\\\\)*\\x\)\([cCdDeEfF]\)/\13\3/g
>> s/\([^\\]\(\\\\\)*\\x[0-3]\)[048cC]\([0-7]\)/\1,0\3/g
>> s/\([^\\]\(\\\\\)*\\x[0-3]\)[048cC]\([89aAbBcCdDeEfF]\)/\1,1\3/g
>> s/\([^\\]\(\\\\\)*\\x[0-3]\)[159dD]\([0-7]\)/\1,2\3/g
>> s/\([^\\]\(\\\\\)*\\x[0-3]\)[159dD]\([89aAbBcCdDeEfF]\)/\1,3\3/g
>> s/\([^\\]\(\\\\\)*\\x[0-3]\)[26aAeE]\([0-7]\)/\1,4\3/g
>> s/\([^\\]\(\\\\\)*\\x[0-3]\)[26aAeE]\([89aAbBcCdDeEfF]\)/\1,5\3/g
>> s/\([^\\]\(\\\\\)*\\x[0-3]\)[37bBfF]\([0-7]\)/\1,6\3/g
>> s/\([^\\]\(\\\\\)*\\x[0-3]\)[37bBfF]\([89aAbBcCdDeEfF]\)/\1,7\3/g
>> s/\([^\\]\(\\\\\)*\\\)x\([0-3]\),\([0-7]\)[08]/\1\3\40/g
>> s/\([^\\]\(\\\\\)*\\\)x\([0-3]\),\([0-7]\)[19]/\1\3\41/g
>> s/\([^\\]\(\\\\\)*\\\)x\([0-3]\),\([0-7]\)[2aA]/\1\3\42/g
>> s/\([^\\]\(\\\\\)*\\\)x\([0-3]\),\([0-7]\)[3bB]/\1\3\43/g
>> s/\([^\\]\(\\\\\)*\\\)x\([0-3]\),\([0-7]\)[4cC]/\1\3\44/g
>> s/\([^\\]\(\\\\\)*\\\)x\([0-3]\),\([0-7]\)[5dD]/\1\3\45/g
>> s/\([^\\]\(\\\\\)*\\\)x\([0-3]\),\([0-7]\)[6eE]/\1\3\46/g
>> s/\([^\\]\(\\\\\)*\\\)x\([0-3]\),\([0-7]\)[7fF]/\1\3\47/g
>> s/^_//
>> s/_$//
>> ')
>> shift
>> printf "$hex_printf_format" "$@"
>> }
>
> How elegantly twisted ;-)
> I like it.
>
> Do you have time to write the complete patch?
> I'd like to make a pre-release snapshot tomorrow.
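For orientation, the quoted function rewrites each \xHH escape as the equivalent three-digit octal escape, which a POSIX printf format does accept. A minimal sketch of that mapping for a single byte, using printf's own %o conversion rather than sed; the helper name hex_to_octal_escape is hypothetical and not part of the proposed patch:

```shell
# Hypothetical helper (not from the thread): print the octal escape
# that corresponds to one two-digit hex byte value.
hex_to_octal_escape() {
  # "0x$1" is read as a C-style hex constant; %03o pads to 3 octal digits.
  printf '\\%03o\n' "0x$1"
}

hex_to_octal_escape 85   # prints \205
```

This matches the sed pipeline's intermediate steps for \x85: it first becomes \x285 (leading octal digit 2 inserted), then \x2,05, and finally \205.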
I tried it, and found that this new function makes the multibyte-white-space
test fail with GNU sed. Here's a simplified example showing where
it goes wrong. This shows that only the first \x285 is transformed
into \x2,05:
$ printf '%s\n' '_\x285\x285\n_' \
  | sed 's/\([^\\]\(\\\\\)*\\x[0-3]\)[048cC]\([0-7]\)/\1,0\3/g'
_\x2,05\x285\n_
The intent was that it transform both, of course.
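The trouble arises when a match consumes all three hex digits after
the \x and the next escape follows immediately. The leading [^\\] of
that next match needs a non-backslash character, but the only
candidate (the final 5) has already been consumed, and sed's g flag
does not rescan text it has already matched.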
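One way to confirm that diagnosis is to apply the identical substitution a second time: the 5 left over from the first pass is then free to serve as the leading non-backslash character. This is only a sketch to illustrate the overlap, not a proposed fix; the variable names are made up for the example:

```shell
# Same input and substitution as in the example above.
input='_\x285\x285\n_'
subst='s/\([^\\]\(\\\\\)*\\x[0-3]\)[048cC]\([0-7]\)/\1,0\3/g'

# One pass: only the first \x285 is rewritten.
once=$(printf '%s\n' "$input" | sed "$subst")

# Two passes: the second pass picks up the escape the first one skipped.
twice=$(printf '%s\n' "$input" | sed -e "$subst" -e "$subst")

printf '%s\n' "$once" "$twice"
# _\x2,05\x285\n_
# _\x2,05\x2,05\n_
```

With three or more adjacent escapes, two passes still suffice for this pattern, since each pass handles every other escape in the run.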
There is also a portability problem: Solaris 5.10's /bin/sed
seems unable to handle some of that code. Running that same
example through its /bin/sed transforms neither \x285 string.
This bug report was last modified 10 years and 198 days ago.