GNU bug report logs - #18987
the bourne shell printf-vs-\xHH portability trap

Package: grep;

Reported by: Jim Meyering <jim <at> meyering.net>

Date: Fri, 7 Nov 2014 17:15:03 UTC

Severity: normal

Done: Jim Meyering <jim <at> meyering.net>

Bug is archived. No further changes may be made.

From: Jim Meyering <jim <at> meyering.net>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 18987 <18987-done <at> debbugs.gnu.org>
Subject: bug#18987: the bourne shell printf-vs-\xHH portability trap
Date: Sun, 9 Nov 2014 10:19:57 -0800

2014-11-08 20:19 GMT-08:00 Jim Meyering <jim <at> meyering.net>:
> On Sat, Nov 8, 2014 at 7:52 PM, Paul Eggert <eggert <at> cs.ucla.edu> wrote:
>>   hex_printf_()
>>   {
>>     hex_printf_format=$(printf '%s\n' "$1" | sed '
>>       s/^/_/
>>       s/$/_/
>>       s/\([^\\]\(\\\\\)*\\x\)\([0-9aAbBcCdDeEfF][^0-9aAbBcCdDeEfF]\)/\10\3/g
>>       s/\([^\\]\(\\\\\)*\\x\)\([0-3]\)/\10\3/g
>>       s/\([^\\]\(\\\\\)*\\x\)\([4-7]\)/\11\3/g
>>       s/\([^\\]\(\\\\\)*\\x\)\([89aAbB]\)/\12\3/g
>>       s/\([^\\]\(\\\\\)*\\x\)\([cCdDeEfF]\)/\13\3/g
>>       s/\([^\\]\(\\\\\)*\\x[0-3]\)[048cC]\([0-7]\)/\1,0\3/g
>>       s/\([^\\]\(\\\\\)*\\x[0-3]\)[048cC]\([89aAbBcCdDeEfF]\)/\1,1\3/g
>>       s/\([^\\]\(\\\\\)*\\x[0-3]\)[159dD]\([0-7]\)/\1,2\3/g
>>       s/\([^\\]\(\\\\\)*\\x[0-3]\)[159dD]\([89abcdef]\)/\1,3\3/g
>>       s/\([^\\]\(\\\\\)*\\x[0-3]\)[26aAeE]\([0-7]\)/\1,4\3/g
>>       s/\([^\\]\(\\\\\)*\\x[0-3]\)[26aAeE]\([89aAbBcCdDeEfF]\)/\1,5\3/g
>>       s/\([^\\]\(\\\\\)*\\x[0-3]\)[37bBfF]\([0-7]\)/\1,6\3/g
>>       s/\([^\\]\(\\\\\)*\\x[0-3]\)[37bBfF]\([89aAbBcCdDeEfF]\)/\1,7\3/g
>>       s/\([^\\]\(\\\\\)*\\\)x\([0-3]\),\([0-7]\)[08]/\1\3\40/g
>>       s/\([^\\]\(\\\\\)*\\\)x\([0-3]\),\([0-7]\)[19]/\1\3\41/g
>>       s/\([^\\]\(\\\\\)*\\\)x\([0-3]\),\([0-7]\)[2aA]/\1\3\42/g
>>       s/\([^\\]\(\\\\\)*\\\)x\([0-3]\),\([0-7]\)[3bB]/\1\3\43/g
>>       s/\([^\\]\(\\\\\)*\\\)x\([0-3]\),\([0-7]\)[4cC]/\1\3\44/g
>>       s/\([^\\]\(\\\\\)*\\\)x\([0-3]\),\([0-7]\)[5dD]/\1\3\45/g
>>       s/\([^\\]\(\\\\\)*\\\)x\([0-3]\),\([0-7]\)[6eE]/\1\3\46/g
>>       s/\([^\\]\(\\\\\)*\\\)x\([0-3]\),\([0-7]\)[7fF]/\1\3\47/g
>>       s/^_//
>>       s/_$//
>>     ')
>>     shift
>>     printf "$hex_printf_format" "$@"
>>   }
>
> How elegantly twisted ;-)
> I like it.
>
> Do you have time to write the complete patch?
> I'd like to make a pre-release snapshot tomorrow.

I tried it, and found that this new function makes the multibyte-white-space
test fail with GNU sed. Here's a simplified example showing where
it goes wrong. This shows that only the first \x285 is transformed
into \x2,05:

  $ printf '%s\n' '_\x285\x285\n_' \
     | sed 's/\([^\\]\(\\\\\)*\\x[0-3]\)[048cC]\([0-7]\)/\1,0\3/g'
  _\x2,05\x285\n_

The intent was to transform both, of course.
The trouble arises when a match consumes all three hex
digits: sed's g flag replaces only non-overlapping matches, so
the trailing 5 of the first match is no longer available to serve
as the leading non-backslash ([^\\]) of the next one.
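
One possible way around the overlap, sketched here only as an illustration and not as the fix adopted for this bug, is to drop the g flag and instead loop with sed's `t` command, so that each pass rescans the line from the start. The helper name `fix_overlap_` is hypothetical, and only one of the substitutions from the function above is shown:

```shell
# Hypothetical helper (not from the original thread): apply a single
# substitution repeatedly until it no longer matches, instead of
# relying on the g flag, which skips text consumed by an earlier match.
fix_overlap_()
{
  printf '%s\n' "$1" | sed -e ':a' \
    -e 's/\([^\\]\(\\\\\)*\\x[0-3]\)[048cC]\([0-7]\)/\1,0\3/' \
    -e 'ta'
}

# Both \x285 sequences are rewritten this time:
fix_overlap_ '_\x285\x285\n_'
# prints: _\x2,05\x2,05\n_
```

The loop terminates because each replacement inserts a comma that can no longer match `[048cC]`. Whether this form also behaves on the Solaris /bin/sed mentioned below is untested.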

There is also a portability problem in that Solaris 5.10's /bin/sed
seems unable to handle some of that code. For example,
using that same example with its /bin/sed, neither \x285
string is transformed.
