GNU bug report logs - #21251
sed: POSIX and the z command


Package: sed

Reported by: Stephane Chazelas <stephane.chazelas <at> gmail.com>

Date: Thu, 13 Aug 2015 14:56:01 UTC

Severity: wishlist

Tags: moreinfo, notabug


From: Stephane Chazelas <stephane.chazelas <at> gmail.com>
To: 21251 <at> debbugs.gnu.org
Subject: bug#21251: sed: POSIX and the z command
Date: Thu, 13 Aug 2015 15:55:20 +0100
Last one for today ;)

The GNU sed documentation has:

`z'
     This command empties the content of pattern space.  It is usually
     the same as `s/.*//', but is more efficient and works in the
     presence of invalid multibyte sequences in the input stream.
     POSIX mandates that such sequences are _not_ matched by `.', so
     that there is no portable way to clear `sed''s buffers in the
     middle of the script in most multibyte locales (including UTF-8
     locales).
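
To make the quoted paragraph concrete, here is a minimal sketch of two ways
to empty the pattern space when the input contains a byte that is not valid
UTF-8. The helper names clear_with_z and clear_in_c_locale are hypothetical,
the first assumes GNU sed (z is not in POSIX), and the second relies on the
common behaviour that in the C locale every byte is treated as a single
character.

```shell
# Blank every input line with the z command.
# Assumption: sed is GNU sed; z is a GNU extension.
clear_with_z() {
    sed 'z'
}

# Blank every input line with s/.*// while forcing the C locale,
# so that '.' is matched against bytes rather than multibyte characters.
clear_in_c_locale() {
    LC_ALL=C sed 's/.*//'
}

# Usage, with a line containing the byte 0xFF (never valid in UTF-8):
#   printf 'a\377b\n' | clear_with_z      | od -c
#   printf 'a\377b\n' | clear_in_c_locale | od -c
# For comparison, in a UTF-8 locale (the locale name is an assumption
# and may not be installed on a given system):
#   printf 'a\377b\n' | LC_ALL=en_US.UTF-8 sed 's/.*//' | od -c
```

Forcing the C locale affects the whole sed invocation, so it is only an
option when the rest of the script does not depend on multibyte character
handling.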

The part about the POSIX requirement is not true. The behaviour
of sed on non-text input is unspecified, so POSIX doesn't require
that . not match a byte that is not part of a valid character.

GNU sed's (or grep's, for that matter) . (or [^[:alnum:]]...)
could just as well match every byte that doesn't otherwise form
part of a valid character (which would be much better
behaviour IMO) and still be POSIX compliant.

That POSIX requirement is true for regexec() but not for text
utilities.

See that discussion on the Austin Group mailing list:
http://thread.gmane.org/gmane.comp.standards.posix.austin.general/11059/focus=11098

-- 
Stephane




This bug report was last modified 6 years and 313 days ago.


