GNU bug report logs -
#51462
sed bug: ASCII NUL not handled in simple pattern
(Adding Eric Blake for POSIX opinion)
Hello,
On 2021-10-28 11:32 a.m., Davide Brini wrote:
> On Thu, 28 Oct 2021 15:25:42 +0000, Frances Wingerter <fw <at> immunant.com>
> wrote:
>>
>> Compare the output of these two sed invocations:
>> ```
>> $ echo -e 'a\nb\n\0\nc\n' | sed -e '/\0/,$d'
>> ```
>>
> $ echo -ne 'a\nb\n\0\nc\n' | sed -e '/\d000/,$d'
>
> (\o000, \x00 also work). All documented here:
> https://www.gnu.org/software/sed/manual/sed.html#Escapes
>
> Whether sed maintainers want to also allow the \0 syntax, up to them of
> course.
Thanks Davide for the reply.
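
For anyone following along, the escapes Davide lists can be tried side by side. This is only a minimal sketch, assuming GNU sed and a POSIX printf; `sample` is a hypothetical helper introduced here to generate the test input, not something from the original report.

```shell
# Hypothetical helper: four lines of input, the third being a single NUL byte.
sample() { printf 'a\nb\n\000\nc\n'; }

# Each documented escape names the NUL byte inside the address regex, so the
# range starts at the NUL line and runs to the end of input.
sample | sed -e '/\x00/,$d'    # hexadecimal escape
sample | sed -e '/\d000/,$d'   # decimal escape
sample | sed -e '/\o000/,$d'   # octal escape
```

Going by the reply above, each invocation should leave only the lines "a" and "b".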
In GNU sed, "\0" in the replacement part of the "s" command acts
identically to "&": it references the whole matched portion.
This has been the implemented (though apparently undocumented) behavior
since GNU sed version 3, released in December 1995, so it is not likely
to be changed.
For comparison, in the BSD seds "\0" in the replacement produces a
literal zero character, '0' (ASCII 48).
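
A quick way to see the replacement-side behavior (a minimal sketch, assuming GNU sed; the input string and the brackets are arbitrary and only there to make the substituted text visible):

```shell
# "&" in the replacement stands for the whole matched text.
echo 'hello world' | sed -e 's/o w/[&]/'     # prints: hell[o w]orld

# With GNU sed, "\0" in the replacement is treated the same way as "&".
echo 'hello world' | sed -e 's/o w/[\0]/'    # prints: hell[o w]orld
```

On a BSD sed the second command would be expected to print "hell[0]orld" instead, per the comparison above.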
Interestingly, POSIX defines a "BACKREF" as:

    [...] The character string consisting of a <backslash> character
    followed by a single-digit numeral, '1' to '9'.

(from:
https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_05
)
And so one could argue that this is a GNU extension that should be
disabled when used with "sed --posix".
I think we should keep "\0" undocumented to prevent proliferation of
this non-standard behavior.
regards,
- assaf
This bug report was last modified 3 years and 228 days ago.