GNU bug report logs - #21251
sed: POSIX and the z command
Last one for today ;)
The GNU sed documentation has:
`z'
This command empties the content of pattern space. It is usually
the same as `s/.*//', but is more efficient and works in the
presence of invalid multibyte sequences in the input stream.
POSIX mandates that such sequences are _not_ matched by `.', so
that there is no portable way to clear `sed''s buffers in the
middle of the script in most multibyte locales (including UTF-8
locales).
The part about the POSIX requirement is not true. The behaviour
of sed on non-text input is unspecified, so POSIX does not require
that . fail to match a byte that is not part of a valid character.
GNU sed's . (and grep's, for that matter, as well as bracket
expressions such as [^[:alnum:]]) could just as well match every
byte that does not otherwise form part of a valid character (which
would be much better behaviour, IMO) and still be POSIX compliant.
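To make the difference concrete, here is a toy model in Python, assuming UTF-8 input. It is not GNU sed's implementation; the helper names units and subst_dot_star are hypothetical. The input is split into units, each either one valid character or one stray byte. Under the strict reading, . matches only valid characters, so `s/.*//' stops at the first stray byte. Under the lenient reading proposed above, . also matches a stray byte, so the same substitution empties the buffer.

```python
def units(data: bytes) -> list[bytes]:
    """Split UTF-8 input into units: each unit is either one valid
    character or one stray byte that is not part of a valid character."""
    out = []
    i = 0
    while i < len(data):
        # UTF-8 characters are 1 to 4 bytes long; try the longest first.
        for n in (4, 3, 2, 1):
            chunk = data[i:i + n]
            try:
                if len(chunk) == n and len(chunk.decode("utf-8")) == 1:
                    out.append(chunk)
                    i += n
                    break
            except UnicodeDecodeError:
                continue
        else:
            # No valid character starts here: the byte is a unit on its own.
            out.append(data[i:i + 1])
            i += 1
    return out


def is_char(unit: bytes) -> bool:
    """True if the unit is a valid UTF-8 character, False for a stray byte."""
    try:
        unit.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def subst_dot_star(data: bytes, lenient: bool) -> bytes:
    """Model `s/.*//': delete the leftmost-longest match of .* .
    strict (lenient=False): . matches valid characters only.
    lenient (lenient=True): . also matches a stray byte."""
    us = units(data)
    # .* always matches at offset 0, so the match is the longest run
    # of units that . is allowed to match, starting at the beginning.
    k = 0
    while k < len(us) and (lenient or is_char(us[k])):
        k += 1
    return b"".join(us[k:])


data = b"abc\xffdef"
print(subst_dot_star(data, lenient=False))
print(subst_dot_star(data, lenient=True))
```

With the strict rule, the bytes from \xff onwards survive the substitution, which is the situation the manual's recommendation of `z' addresses. With the lenient rule, nothing is left.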
That POSIX requirement holds for regexec(), but not for the text
utilities.
See this discussion on the Austin Group mailing list:
http://thread.gmane.org/gmane.comp.standards.posix.austin.general/11059/focus=11098
--
Stephane
This bug report was last modified 6 years and 313 days ago.