GNU bug report logs - #40242
n as delimiter alias


Package: sed

Reported by: Oğuz <oguzismailuysal <at> gmail.com>

Date: Thu, 26 Mar 2020 15:31:02 UTC

Severity: normal

Tags: confirmed

Merged with 40239

Done: Jim Meyering <jim <at> meyering.net>

Bug is archived. No further changes may be made.


From: help-debbugs <at> gnu.org (GNU bug Tracking System)
To: Jim Meyering <jim <at> meyering.net>
Cc: tracker <at> debbugs.gnu.org
Subject: bug#40239: closed (Bug in how \cregexpc is handled)
Date: Mon, 24 Oct 2022 06:26:02 +0000
[Message part 1 (text/plain, inline)]
Your message dated Sun, 23 Oct 2022 23:25:05 -0700
with message-id <CA+8g5KFyNdT6FJUNfNfRh2OySYs7nNLpOm4OUzbtWE+Rru2TWA <at> mail.gmail.com>
and subject line Re: bug#40242: n as delimiter alias
has caused the debbugs.gnu.org bug report #40242,
regarding Bug in how \cregexpc is handled
to be marked as done.

(If you believe you have received this mail in error, please contact
help-debbugs <at> gnu.org.)


-- 
40242: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=40242
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
From: Enrico Maria De Angelis <enricomaria.dean6elis <at> gmail.com>
To: bug-sed <at> gnu.org
Subject: Bug in how \cregexpc is handled
Date: Thu, 26 Mar 2020 14:18:28 +0000
[Message part 3 (text/plain, inline)]
To whom it may concern,

From man sed, I read:
       \cregexpc
              Match lines matching the regular expression regexp.
              The c may be any character.
On the one hand:

   - sed '\cncd' <<< n correctly shows empty output, since it's the same as sed
   '/n/d' <<< n based on the description above;
   - sed '\c\ccd' <<< c correctly shows an empty output too, but in this
   case the letter c had to be escaped so that it does not close the regexp.

On the other hand:

   - sed '\n\nnd' <<< n results in an output equal to the single character n,
   revealing that the backslash has a double effect:
      1. it prevents the following n from closing the opening \n;
      2. it makes the n be interpreted as a newline instead of the literal
      letter n; this is confirmed by executing
      echo -e 'a\na' | sed -n 'N;\n\nnp'.

This means that using n as the delimiter in \nregexpn prevents the use of a
literal n in the regexp.
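
The observable consequence can be illustrated with a short Python sketch. It assumes, as described above, that the address \n\nn ends up as a regex consisting of a single newline. Python's re module only stands in for sed's regex engine here, and all variable names are illustrative.

```python
import re

# Assumption taken from the report: with n as the delimiter, \n inside the
# address is read as a newline, so \n\nn yields a one-character regex.
reported_regex = "\n"
# What a literal reading of the man page text would suggest instead.
literal_regex = "n"

line = "n"              # input of: sed '\n\nnd' <<< n
pattern_space = "a\na"  # pattern space after N in: echo -e 'a\na' | sed -n 'N;\n\nnp'

# The line "n" contains no newline, so d does not fire and n is printed.
deleted_as_reported = re.search(reported_regex, line) is not None
# The joined pattern space does contain a newline, so p fires.
printed_as_reported = re.search(reported_regex, pattern_space) is not None
# Under the literal reading, the line "n" would have been deleted.
deleted_if_literal = re.search(literal_regex, line) is not None

print(deleted_as_reported, printed_as_reported, deleted_if_literal)
```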

The issue has come to light in this StackOverflow
<https://stackoverflow.com/questions/60853746/what-is-n-nnd-supposed-to-do>
question.

Kind regards,
Enrico Maria De Angelis
[Message part 5 (message/rfc822, inline)]
From: Jim Meyering <jim <at> meyering.net>
To: Eric Blake <eblake <at> redhat.com>
Cc: 40242-done <at> debbugs.gnu.org, Assaf Gordon <assafgordon <at> gmail.com>,
 Oğuz <oguzismailuysal <at> gmail.com>
Subject: Re: bug#40242: n as delimiter alias
Date: Sun, 23 Oct 2022 23:25:05 -0700
[Message part 6 (text/plain, inline)]
On Tue, Mar 31, 2020 at 6:36 AM Eric Blake <eblake <at> redhat.com> wrote:
> On 3/31/20 2:00 AM, Oğuz wrote:
> > Thanks for the reply. This might not be a bug though; I sent a similar mail
> > (https://www.mail-archive.com/austin-group-l <at> opengroup.org/msg05881.html)
> > to Austin Group mailing list asking what's the expected behavior in this
> > case, and I was told (
> > https://www.mail-archive.com/austin-group-l <at> opengroup.org/msg05891.html)
> > both behaviors (yielding n or an empty line) are correct and the standard should
> > *probably* be amended to explicitly state that this is unspecified. And
> > apparently (
> > https://www.mail-archive.com/austin-group-l <at> opengroup.org/msg05893.html)
> > some other UNIXes adopted the same practice as GNU sed (or vice versa, I
> > don't know which one is older).
>
> The POSIX folks will probably declare that use of a \X sequence (for
> arbitrary X; 'n', 't', '1', and probably others all fit this category)
> inside a regex delimited by X is unspecified behavior.  But that still
> doesn't stop us from fixing GNU sed to at least be consistent - we
> should either blindly declare that \X represents the special meaning of
> X when such a meaning is present regardless of X also being the regex
> delimiter (our current \n behavior - no way to represent the delimiter
> as a literal match), or that use of X as a delimiter renders the special
> meaning of \X useless for that regex (our \t behavior - no way to
> represent the special behavior as part of the match).  My personal
> preference is making things consistent to our \t behavior.
>
> >> In the code, the "match_slash" function [1] is used to find
> >> the delimiters of the "s" command (typically "slashes").
> >> Special handling happens if a slash is found [2],
> >> and in lines 557-8 there's this conditional:
> >>
> >>                else if (ch == 'n' && regex)
> >>                  ch = '\n';
> >>
> >> which forces any "\n" to be a new-line, regardless of whether the
> >> delimiter itself was an "n".
> >>
>
> >> Interestingly, removing these two lines does not cause
> >> any test failures, so this might be easy to fix without causing
> >> any regressions.
> >>
> >>
> >> For now I'm leaving this item open until we decide how to deal with it.
>
> I'm thus in favor of removing that special-case of 'n'.
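
The effect of that special case can be modelled with a small Python sketch. This is a simplified model of the delimiter scan discussed above, not the sed source: the function name scan_regex and the n_special flag are hypothetical, and multibyte input and the replacement half of the s command are left out.

```python
def scan_regex(text, delim, n_special=True):
    """Return the regex text up to the closing delimiter.

    text is everything after the opening delimiter. n_special=True models
    the quoted conditional; n_special=False models its removal.
    """
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == delim:
            return "".join(out)   # unescaped delimiter closes the regex
        if ch == "\\" and i + 1 < len(text):
            i += 1
            ch = text[i]
            if ch == "n" and n_special:
                ch = "\n"         # \n always becomes a newline, even if delim is n
            elif ch != delim:
                out.append("\\")  # ordinary escape: keep the backslash
            # otherwise \<delim> collapses to the bare delimiter character
        out.append(ch)
        i += 1
    raise ValueError("unterminated regex")


# Address \n\nnd: the text after the opening delimiter n is \nnd
print(repr(scan_regex("\\nnd", "n")))                   # special case kept
print(repr(scan_regex("\\nnd", "n", n_special=False)))  # special case removed
print(repr(scan_regex("\\ttd", "t")))                   # the \t behavior
```

With the special case in place, the delimiter n cannot be matched literally. Without it, \n behaves like \t does when t is the delimiter, which is the consistent behavior preferred above.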

Thank you all. Sorry it's taken so long.
I expect to push the following tomorrow.
[sed-tweak.diff (application/octet-stream, attachment)]

This bug report was last modified 2 years and 294 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.