GNU bug report logs - #40242
n as delimiter alias
Reported by: Oğuz <oguzismailuysal <at> gmail.com>
Date: Thu, 26 Mar 2020 15:31:02 UTC
Severity: normal
Tags: confirmed
Merged with 40239
Done: Jim Meyering <jim <at> meyering.net>
Bug is archived. No further changes may be made.
[Message part 1 (text/plain, inline)]
Your bug report
#40242: Bug in how \cregexpc is handled
which was filed against the sed package, has been closed.
The explanation is attached below, along with your original report.
If you require more details, please reply to 40239 <at> debbugs.gnu.org.
--
40242: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=40242
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
[Message part 3 (text/plain, inline)]
On Tue, Mar 31, 2020 at 6:36 AM Eric Blake <eblake <at> redhat.com> wrote:
> On 3/31/20 2:00 AM, Oğuz wrote:
> > Thanks for the reply. This might not be a bug though; I sent a similar mail
> > (https://www.mail-archive.com/austin-group-l <at> opengroup.org/msg05881.html)
> > to the Austin Group mailing list asking what the expected behavior is in
> > this case, and I was told
> > (https://www.mail-archive.com/austin-group-l <at> opengroup.org/msg05891.html)
> > that both behaviors, yielding n or an empty line, are correct and that the
> > standard should *probably* be amended to explicitly state that this is
> > unspecified. And apparently
> > (https://www.mail-archive.com/austin-group-l <at> opengroup.org/msg05893.html)
> > some other UNIXes adopted the same practice as GNU sed (or vice versa; I
> > don't know which one is older).
>
> The POSIX folks will probably declare that use of a \X sequence (for
> arbitrary X; 'n', 't', '1', and probably others all fit this category)
> inside a regex delimited by X is unspecified behavior. But that still
> doesn't stop us from fixing GNU sed to at least be consistent - we
> should either blindly declare that \X represents the special meaning of
> X when such a meaning is present regardless of X also being the regex
> delimiter (our current \n behavior - no way to represent the delimiter
> as a literal match), or that use of X as a delimiter renders the special
> meaning of \X useless for that regex (our \t behavior - no way to
> represent the special behavior as part of the match). My personal
> preference is making things consistent with our \t behavior.
>
> >> In the code, the "match_slash" function [1] is used to find
> >> the delimiters of the "s" command (typically "slashes").
> >> Special handling happens if a backslash is found [2],
> >> and in lines 557-8 there's this conditional:
> >>
> >>   else if (ch == 'n' && regex)
> >>     ch = '\n';
> >>
> >> This forces any "\n" to be a newline, regardless of whether the
> >> delimiter itself is "n".
> >>
>
> >> Interestingly, removing these two lines does not cause
> >> any test failures, so this might be easy to fix without causing
> >> any regressions.
> >>
> >>
> >> For now I'm leaving this item open until we decide how to deal with it.
>
> I'm thus in favor of removing that special-case of 'n'.
Thank you all. Sorry it's taken so long.
I expect to push the following tomorrow.
[sed-tweak.diff (application/octet-stream, attachment)]
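The two consistent policies Eric Blake describes, and the `ch == 'n'` special case quoted from match_slash, can be illustrated with a small C sketch. The function name `scan_regex`, its parameters, and the `newline_first` switch are hypothetical: this is a simplified model of the behavior described in the thread, not the actual sed source and not the attached patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model of the delimiter scanning described in
   the thread; this is NOT the GNU sed match_slash source.
   P points just past the opening delimiter DELIM.  The regex text is
   copied into OUT (of size OUTSZ).  Returns a pointer just past the
   closing delimiter, or NULL if the regex is unterminated or too long.

   NEWLINE_FIRST selects how "\n" is handled:
     nonzero: "\n" always becomes a newline, even when DELIM is 'n'
              (the special case quoted above);
     zero:    an escaped delimiter is always the literal delimiter
              character (the thread's "\t" behavior).  */
static const char *
scan_regex (const char *p, char delim, int newline_first,
            char *out, size_t outsz)
{
  size_t len = 0;

  assert (out != NULL && outsz > 0);
  for (; *p != '\0' && *p != '\n'; p++)
    {
      char ch = *p;

      if (ch == delim)
        {
          out[len] = '\0';
          return p + 1;               /* closing delimiter found */
        }
      if (ch == '\\')
        {
          ch = *++p;
          if (ch == '\0')
            break;                    /* trailing backslash: unterminated */
          if (newline_first && ch == 'n')
            ch = '\n';                /* special case: "\n" means newline */
          else if (ch != delim)
            {
              if (len + 1 >= outsz)
                return NULL;
              out[len++] = '\\';      /* keep other escapes for later stages */
            }
          /* An escaped delimiter falls through with its backslash dropped. */
        }
      if (len + 1 >= outsz)
        return NULL;
      out[len++] = ch;
    }
  return NULL;                        /* no closing delimiter */
}
```

For the address `\n\nnd`, the text after the opening `\n` is `\nnd`. With `newline_first` set, the sketch yields a one-character regex consisting of a newline and leaves `d` as the command, mirroring the output reported in the bug; with it cleared, the regex is the literal letter n, which is the `\t`-style behavior Eric prefers. Note that in the sketch an escaped `c` with delimiter `c` gives a literal `c` under either setting, because only `n` is special-cased.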
[Message part 5 (message/rfc822, inline)]
[Message part 6 (text/plain, inline)]
To whom it may concern,
From man sed, I read:
\cregexpc
Match lines matching the regular expression regexp. The c
may be any character.
On the one hand:
- sed '\cncd' <<< n correctly produces empty output, since it is equivalent
  to sed '/n/d' <<< n according to the description above;
- sed '\c\ccd' <<< c correctly produces empty output too, but in this case
  the c inside the regexp must be escaped, because an unescaped c would
  close the address.
On the other hand:
- sed '\n\nnd' <<< n outputs the single character n, revealing that the
  backslash has a double effect:
  1. it prevents the following n from closing the address opened by \n;
  2. it makes that n mean a newline instead of the literal letter n.
  This is confirmed by running echo -e 'a\na' | sed -n 'N;\n\nnp', which
  prints both lines: after N the pattern space is "a\na", and the address
  matches its embedded newline.
This means that using n as the delimiter, as in \nregexpn, prevents the use
of a literal n in the regexp.
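To see why the two readings of `\n\nn` give different results at match time, here is a small sketch using the POSIX `regcomp`/`regexec` API to compare a regex consisting of a newline character with one consisting of the letter n. The helper name `bre_matches` is hypothetical, and plain POSIX regex matching is only an approximation of how GNU sed compiles and applies addresses.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if the POSIX basic regular expression
   PATTERN matches somewhere in TEXT, -1 if PATTERN fails to compile, and
   0 otherwise.  REG_NEWLINE is not set, so a newline character in
   PATTERN or TEXT is treated as an ordinary character.  */
static int
bre_matches (const char *pattern, const char *text)
{
  regex_t re;
  int rc;

  assert (pattern != NULL && text != NULL);
  /* REG_NOSUB: only a yes/no answer is needed, no match offsets. */
  if (regcomp (&re, pattern, REG_NOSUB) != 0)
    return -1;
  rc = regexec (&re, text, 0, NULL, 0);
  regfree (&re);
  return rc == 0;
}
```

Under the newline reading, the regex does not match the input line n (so `d` does not fire and n is printed) but does match the two-line pattern space "a\na" built by `N`, which is consistent with both observations in the report. Under the literal reading, the regex n matches the line n, as `/n/d` does.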
The issue came to light in this Stack Overflow question:
<https://stackoverflow.com/questions/60853746/what-is-n-nnd-supposed-to-do>
Kind regards,
Enrico Maria De Angelis
This bug report was last modified 2 years and 294 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.