GNU bug report logs - #30829
bug: empty regex exits with error when following 2-address like LINENO,/RE/

Package: sed

Reported by: "Don Crissti" <don_crissti <at> gmx.com>

Date: Thu, 15 Mar 2018 20:34:02 UTC

Severity: normal

Done: Assaf Gordon <assafgordon <at> gmail.com>

Bug is archived. No further changes may be made.

Report forwarded to bug-sed <at> gnu.org:
bug#30829; Package sed. (Thu, 15 Mar 2018 20:34:02 GMT)

Acknowledgement sent to "Don Crissti" <don_crissti <at> gmx.com>:
New bug report received and forwarded. Copy sent to bug-sed <at> gnu.org. (Thu, 15 Mar 2018 20:34:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: "Don Crissti" <don_crissti <at> gmx.com>
To: bug-sed <at> gnu.org
Subject: bug: empty regex exits with error when following 2-address like
 LINENO,/RE/
Date: Thu, 15 Mar 2018 21:20:37 +0100
[Message part 1 (text/html, inline)]

Reply sent to Assaf Gordon <assafgordon <at> gmail.com>:
You have taken responsibility. (Thu, 15 Mar 2018 22:35:02 GMT)

Notification sent to "Don Crissti" <don_crissti <at> gmx.com>:
bug acknowledged by developer. (Thu, 15 Mar 2018 22:35:02 GMT)

Message #10 received at 30829-done <at> debbugs.gnu.org:

From: Assaf Gordon <assafgordon <at> gmail.com>
To: Don Crissti <don_crissti <at> gmx.com>
Cc: 30829-done <at> debbugs.gnu.org
Subject: Re: bug#30829: bug: empty regex exits with error when following
 2-address like LINENO, /RE/
Date: Thu, 15 Mar 2018 16:34:07 -0600
Hello,

On Thu, Mar 15, 2018 at 09:20:37PM +0100, Don Crissti wrote:
>    "the empty regular expression ‘//’ repeats the last regular expression
>    match"
> 
>    however this does not work when the empty regex follows a 2-address of
>    the form LINE_NUMBER,/REGEX/
>    e.g.
> 
>    # printf %s\\n {1..5} | sed '2,/5/{//!d}'
> 
>    fails with
> 
>    "sed: -e expression #1, char 0: no previous regular expression"

Thanks for reporting this bug and providing an easy way to reproduce.

Before deciding whether it's a bug, it's worth comparing with other sed implementations.
(I'm using a slightly different sed program because putting multiple
commands on the same line is a GNU extension.)

FreeBSD/OpenBSD/NetBSD:

  $ printf "%s\n" 1 2 3 4 5 | sed -n -e '2,/5/p' -e '//p' 
  sed: first RE may not be empty

BusyBox and ToyBox (output seems incorrect):

  $ printf "%s\n" 1 2 3 4 5 | sed -n -e '2,/5/p' -e '//p'
  1
  2
  2
  3
  3
  4
  4
  5
  5

Heirloom (http://heirloom.sf.net/):

  $ seq 5 | sed-heirloom -n -e '2,/5/p' -e '//p'                            
  2
  3
  4
  5
  5

And surprisingly, GNU sed version 3.02:

  $ seq 5 | sed-gnu-3.02 -n -e '2,/5/p' -e '//p'
  2
  3
  4
  5
  5


GNU sed 4.0 and later:

  $ seq 5 | sed -n -e '2,/5/p' -e '//p'                             
  sed: -e expression #2, char 0: no previous regular expression

=====

Now to why it happens:

POSIX says (http://pubs.opengroup.org/onlinepubs/9699919799/utilities/sed.html):

  "If an RE is empty (that is, no pattern is specified) sed shall behave as if the 
  last RE used in the last command applied (either as an address or as part of a 
  substitute command) was specified."

And the interpretation (of both GNU sed >= 4.0 and the *BSD seds) is
that the "last RE used in the last command *applied*" means the last RE *executed*
- not the last regex that precedes the empty regex in the program.

And so in this command:

   sed -n -e '2,/5/p' -e '//p'

On the first line, the address 2 is checked (obviously it does not match line 1),
so the regex '/5/' is *not* executed.
Then sed tries '//p', but no RE has been executed yet, hence the error.
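
For comparison, here is a minimal sketch (assuming GNU sed 4.x and a POSIX shell; the variable name `explicit` is only illustrative). Spelling the regex out in the second command, instead of leaving it empty, removes the dependence on which RE was executed last. On this input it prints the same lines that Heirloom sed and GNU sed 3.02 print above.

```shell
# Same program as above, but with /5/ written out in the second command,
# so nothing depends on a previously executed regex.
explicit=$(printf '%s\n' 1 2 3 4 5 | sed -n -e '2,/5/p' -e '/5/p')
printf '%s\n' "$explicit"
# prints 2, 3, 4, 5, 5 (one per line)
```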

The reason for this behavior is that the empty (last) regex can change
at run time, depending on the input.

Consider the following (contrived) example:

 $ printf "%s\n" a ab ab ab \
      | sed '1s/a/X/
             tq
             1s/b/Y/
             :q
             s//*/'
 X
 *b
 *b
 *b

 $ printf "%s\n" b ab ab ab \
      | sed '1s/a/X/
             tq
             1s/b/Y/
             :q
             s//*/'
 Y
 a*
 a*
 a*




The flow is:
1. If line 1 contains 'a' - replace 'a' with 'X' and skip the next check
   ('tq' means "jump to label :q if the last substitution matched").
2. If line 1 contains 'b' - replace 'b' with 'Y'.
3. For every line, replace the last regex with '*'.

And so you see that the last regex changes dynamically during
runtime, based on whether the first line contained 'a' or 'b'.

In the first case, the three 'a's are replaced with '*'.
In the second case, the three 'b's are replaced with '*'.


I therefore think this is not a bug (and I'm marking it as 'done').
However discussion can continue by replying to this thread,
and if there are different opinions we can always re-open it.

regards,
 - assaf




Message #13 received at 30829-done <at> debbugs.gnu.org:

From: Assaf Gordon <assafgordon <at> gmail.com>
To: Don Crissti <don_crissti <at> gmx.com>
Cc: 30829-done <at> debbugs.gnu.org
Subject: Re: bug#30829: bug: empty regex exits with error when following
 2-address like LINENO, /RE/
Date: Thu, 15 Mar 2018 17:16:34 -0600
Follow-up:

On Thu, Mar 15, 2018 at 04:34:07PM -0600, Assaf Gordon wrote:
> On Thu, Mar 15, 2018 at 09:20:37PM +0100, Don Crissti wrote:
[...]
> >    # printf %s\\n {1..5} | sed '2,/5/{//!d}'
> > 
> >    fails with
> > 
> >    "sed: -e expression #1, char 0: no previous regular expression"
> [...]
> And the interpretation (of both GNU sed >= 4.0 and the *BSD seds) is
> that the "last RE used in the last command *applied*" means the last RE *executed*
> - not the last regex that precedes the empty regex in the program.

The previous examples were needlessly complicated.
Here's a simpler example:

  $ printf "%s\n" ccbb aabb | sed -e '/a/!s/b/X/' -e 's//*/'           
  ccX*
  *abb

Whether the 'last regex' is /a/ or /b/ depends on whether the line
contains 'a' or not.
 
> I therefore think this is not a bug (and I'm marking it as 'done').
> However discussion can continue by replying to this thread,
> and if there are different opinions we can always re-open it.

One could argue that the behavior you're expecting (which is what
sed-heirloom and sed-gnu-3.02 do) is that, when there is no "last regex",
the empty regex is silently treated as 'no match'.

That's easy to implement but I don't think that's a good change.
The current behaviour is better.

Moreover, I suspect sed-heirloom's behavior is just buggy:

  $ seq 1 | sed-heirloom -n -e '2p' -e '//p'
  [no output]

  $ seq 2 | sed-heirloom -n -e '2p' -e '//p'
  2
  2

  $ seq 2 | sed-heirloom -n -e '2,5p' -e '//p'
  2

  $ seq 2 | sed-heirloom -n -e '//p'
  First RE may not be null


regards,
 - assaf





Message #16 received at 30829 <at> debbugs.gnu.org:

From: "Don Crissti" <don_crissti <at> gmx.com>
To: 30829 <at> debbugs.gnu.org
Date: Fri, 16 Mar 2018 00:09:05 +0100
Thanks for the prompt reply!
While I understand your explanation, I think we are talking about slightly different things. Your example is different from mine. Let me rewrite my code so that it is portable:
 
printf %s\\n 1 2 3 4 5 | sed -e '2,/5/{//!d' -e'}'

Now, as you can see, the main difference between my sample above and your sample

printf "%s\n" 1 2 3 4 5 | sed -n -e '2,/5/p' -e '//p'

is the braces (the command grouping). In other words, you are using an empty regex unconditionally, while I'm only using it for lines that meet certain criteria. Based on your explanation (i.e. applied regex = executed regex; no executed regex on line 1 = no previous regex, hence the failure) I can understand why your code exits with an error.

However, my code uses the empty regex conditionally (only for a certain range of lines). It is logical that '//!d' should not be executed for lines outside that range. If I used a plain 'd' instead of '//!d', would sed unconditionally delete all lines from the file? No. It would delete only the lines in that range. Similarly, sed should not even attempt to evaluate the empty regex in '{//!d' for lines outside that range.

Unless I'm missing something I still see this as a bug.




Message #19 received at 30829 <at> debbugs.gnu.org:

From: Assaf Gordon <assafgordon <at> gmail.com>
To: Don Crissti <don_crissti <at> gmx.com>
Cc: 30829 <at> debbugs.gnu.org
Subject: Re: bug#30829: (no subject)
Date: Thu, 15 Mar 2018 18:16:48 -0600
Hello,

On Fri, Mar 16, 2018 at 12:09:05AM +0100, Don Crissti wrote:
[...] 
> However, my code uses the empty regex conditionally (only for a certain range of lines). It is logical that '//!d' should not be executed for lines outside that range. If I used a plain 'd' instead of '//!d', would sed unconditionally delete all lines from the file? No. It would delete only the lines in that range. Similarly, sed should not even attempt to evaluate the empty regex in '{//!d' for lines outside that range.
> 
> Unless I'm missing something I still see this as a bug.

There is a subtle issue here:
when using 2 addresses, and the second address is an RE,
the first line matching the first address (line 2 in your case)
will *never* be checked against the RE.

And so, even though the '//!d' is run conditionally,
the condition is true (line 2, before regex is checked),
and then '//!d' is executed but there is no 'last regex' yet.

This is documented in the manual:
https://www.gnu.org/software/sed/manual/sed.html#Range-Addresses
(in the second paragraph, starting with "if the second address is a regexp").
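
To make this concrete, here is a small sketch (assuming GNU sed 4.x and a POSIX shell; the variable names `status` and `kept` are only illustrative). The grouped program from the report exits with a non-zero status. The same program with the regex spelled out inside the braces deletes lines 2 through 4 and keeps the line that ends the range.

```shell
# The grouped form from the report: on line 2 the range opens via the line
# number, /5/ has not been executed yet, and '//' has nothing to reuse.
status=0
printf '%s\n' 1 2 3 4 5 | sed -e '2,/5/{//!d' -e '}' >/dev/null 2>&1 || status=$?
echo "empty regex: exit status $status"

# Writing /5/ explicitly inside the group removes the dependence on the
# last executed regex.
kept=$(printf '%s\n' 1 2 3 4 5 | sed -e '2,/5/{/5/!d' -e '}')
printf '%s\n' "$kept"
# prints 1 and 5 (one per line)
```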

Also, in the POSIX standard, the relevant text is at
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/sed.html
(in the "Addresses in sed" section):

  "An editing command with two addresses shall select the inclusive 
  range from the first pattern space that matches the first address 
  through the next pattern space that matches the second. "

The interpretation is that the second address is checked against
"the next pattern space", implying that the second address is first
checked not on the line that matches the first address, but
on the 'next pattern space' (that is, starting at the line following
the line that matched the first address).
[phew, that is a bit confusing....]
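
A short sketch of that rule (assuming a POSIX-conforming sed such as GNU sed; the variable name `range` is only illustrative): here line 2 opens the range and would also match the closing regex, but the regex is first tested on line 3, so the range never closes and runs to the end of the input.

```shell
# Line 2 matches the first address (a line number). /2/ is only tested
# from line 3 onward, and none of 3, 4, 5 matches it.
range=$(printf '%s\n' 1 2 3 4 5 | sed -n '2,/2/p')
printf '%s\n' "$range"
# prints 2, 3, 4, 5 (one per line)
```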



Does this clarify the issue?

-assaf




Message #22 received at 30829 <at> debbugs.gnu.org:

From: "Don Crissti" <don_crissti <at> gmx.com>
To: 30829 <at> debbugs.gnu.org
Subject: Re: bug#30829: (no subject)
Date: Fri, 16 Mar 2018 01:39:45 +0100
[Message part 1 (text/html, inline)]

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 13 Apr 2018 11:24:04 GMT)