GNU bug report logs - #30829
bug: empty regex exits with error when following 2-address like LINENO,/RE/
Reported by: "Don Crissti" <don_crissti <at> gmx.com>
Date: Thu, 15 Mar 2018 20:34:02 UTC
Severity: normal
Done: Assaf Gordon <assafgordon <at> gmail.com>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 30829 in the body.
You can then email your comments to 30829 AT debbugs.gnu.org in the normal way.
Report forwarded to bug-sed <at> gnu.org: bug#30829; Package sed. (Thu, 15 Mar 2018 20:34:02 GMT)
Acknowledgement sent to "Don Crissti" <don_crissti <at> gmx.com>: New bug report received and forwarded. Copy sent to bug-sed <at> gnu.org. (Thu, 15 Mar 2018 20:34:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/html, inline)]
Reply sent to Assaf Gordon <assafgordon <at> gmail.com>: You have taken responsibility. (Thu, 15 Mar 2018 22:35:02 GMT)
Notification sent to "Don Crissti" <don_crissti <at> gmx.com>: bug acknowledged by developer. (Thu, 15 Mar 2018 22:35:02 GMT)
Message #10 received at 30829-done <at> debbugs.gnu.org (full text, mbox):
Hello,
On Thu, Mar 15, 2018 at 09:20:37PM +0100, Don Crissti wrote:
> "the empty regular expression ‘//’ repeats the last regular expression
> match"
>
> however this does not work when the empty regex follows a 2-address of
> the form LINE_NUMBER,/REGEX/
> e.g.
>
> # printf %s\\n {1..5} | sed '2,/5/{//!d}'
>
> fails with
>
> "sed: -e expression #1, char 0: no previous regular expression"
Thanks for reporting this bug and providing an easy way to reproduce it.
Before deciding whether it's a bug, it's worth comparing with other sed implementations.
(I'm using a slightly different sed program because multiple
commands on the same line is a GNU extension.)
FreeBSD/OpenBSD/NetBSD:
$ printf "%s\n" 1 2 3 4 5 | sed -n -e '2,/5/p' -e '//p'
sed: first RE may not be empty
BusyBox and ToyBox (output seems incorrect):
$ printf "%s\n" 1 2 3 4 5 | sed -n -e '2,/5/p' -e '//p'
1
2
2
3
3
4
4
5
5
Heirloom (http://heirloom.sf.net/):
$ seq 5 | sed-heirloom -n -e '2,/5/p' -e '//p'
2
3
4
5
5
And surprisingly, GNU sed version 3.02:
$ seq 5 | sed-gnu-3.02 -n -e '2,/5/p' -e '//p'
2
3
4
5
5
GNU sed 4.0 and later:
$ seq 5 | sed -n -e '2,/5/p' -e '//p'
sed: -e expression #2, char 0: no previous regular expression
=====
Now, as to why it happens:
POSIX says (http://pubs.opengroup.org/onlinepubs/9699919799/utilities/sed.html):
"If an RE is empty (that is, no pattern is specified) sed shall behave as if the
last RE used in the last command applied (either as an address or as part of a
substitute command) was specified."
And the interpretation (of both GNU sed >= 4.0 and the *BSD seds) is
that the "last RE used in the last command *applied*" means the last RE *executed*,
not the last regex that precedes the empty regex in the program.
And so in this command:
sed -n -e '2,/5/p' -e '//p'
On the first line, the address 2 is checked (it obviously doesn't match line 1),
so the regex '/5/' is *not* executed.
Then sed tries '//p', but no RE has been executed yet, hence the error.
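To make the "last executed regex" rule concrete, here is a toy Python model of this one script only (not of sed in general, and not GNU sed's actual code). The names `run`, `NoPreviousRegex` and the `strict` switch are hypothetical. With `strict=True` it fails on line 1 like GNU sed 4.x; with `strict=False` it treats a missing "last regex" as a non-match, which happens to reproduce the Heirloom and GNU sed 3.02 output above:

```python
import re


class NoPreviousRegex(Exception):
    """Stands in for sed's "no previous regular expression" error."""


def run(lines, strict=True):
    """Toy model of: sed -n -e '2,/5/p' -e '//p' (hypothetical helper)."""
    last_re = None      # last regex *executed*, not the last one written
    in_range = False
    out = []
    for lineno, line in enumerate(lines, start=1):
        # Command 1: 2,/5/p
        if not in_range:
            if lineno == 2:          # range opens; /5/ is not tested here
                in_range = True
                out.append(line)
        else:
            last_re = "5"            # the end address /5/ actually runs
            if re.search(last_re, line):
                in_range = False     # range closes after this line
            out.append(line)
        # Command 2: //p reuses whatever regex ran most recently
        if last_re is None:
            if strict:
                raise NoPreviousRegex(f"line {lineno}: no previous regular expression")
            continue                 # lenient mode: no regex yet, so no match
        if re.search(last_re, line):
            out.append(line)
    return out


print(run(["1", "2", "3", "4", "5"], strict=False))  # ['2', '3', '4', '5', '5']
```

Calling `run(["1", "2", "3", "4", "5"])` with the default `strict=True` raises on line 1, because nothing has evaluated a regex yet when `//p` runs.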
The reason for this is that the empty (last) regex can change
at runtime, depending on the input.
Consider the following (contrived) example:
$ printf "%s\n" a ab ab ab \
| sed '1s/a/X/
tq
1s/b/Y/
:q
s//*/'
X
*b
*b
*b
$ printf "%s\n" b ab ab ab \
| sed '1s/a/X/
tq
1s/b/Y/
:q
s//*/'
Y
a*
a*
a*
The flow is:
1. If line 1 contains 'a' - replace 'a' with 'X' and skip the next check
('tq' means "jump to label :q if the last substitution matched").
2. If line 1 contains 'b' - replace 'b' with 'Y'.
3. For every line, replace the last regex with '*'.
And so you see that the last regex changes dynamically during
runtime, based on whether the first line contained 'a' or 'b'.
In the first case, the three 'a's are replaced with '*'.
In the second case, the three 'b's are replaced with '*'.
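The same flow can be traced with a toy Python model of this particular script (an illustration only, not GNU sed's implementation; `run` is a hypothetical name). The variable `last_re` plays the role of sed's "last regex" and is only updated when a command actually evaluates its regex:

```python
import re


def run(lines):
    """Toy model of the script: 1s/a/X/; tq; 1s/b/Y/; :q; s//*/"""
    last_re = None      # regex most recently executed by any command
    out = []
    for lineno, line in enumerate(lines, start=1):
        substituted = False                # sed's 't' flag, reset per input line
        if lineno == 1:                    # 1s/a/X/
            last_re = "a"
            line, n = re.subn(last_re, "X", line, count=1)
            substituted = n > 0
        if not substituted:                # tq: skip 1s/b/Y/ after a successful s
            if lineno == 1:                # 1s/b/Y/
                last_re = "b"
                line = re.sub(last_re, "Y", line, count=1)
        # :q then s//*/ : the empty regex means "last executed regex"
        line = re.sub(last_re, "*", line, count=1)
        out.append(line)
    return out


print(run(["a", "ab", "ab", "ab"]))  # ['X', '*b', '*b', '*b']
print(run(["b", "ab", "ab", "ab"]))  # ['Y', 'a*', 'a*', 'a*']
```

On lines 2 to 4 neither `1s` command runs, so `last_re` keeps whatever line 1 left behind.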
I therefore think this is not a bug (and I'm marking it as 'done').
However discussion can continue by replying to this thread,
and if there are different opinions we can always re-open it.
regards,
- assaf
Information forwarded to bug-sed <at> gnu.org: bug#30829; Package sed. (Thu, 15 Mar 2018 23:17:02 GMT)
Message #13 received at 30829-done <at> debbugs.gnu.org (full text, mbox):
Follow-up:
On Thu, Mar 15, 2018 at 04:34:07PM -0600, Assaf Gordon wrote:
> On Thu, Mar 15, 2018 at 09:20:37PM +0100, Don Crissti wrote:
[...]
> > # printf %s\\n {1..5} | sed '2,/5/{//!d}'
> >
> > fails with
> >
> > "sed: -e expression #1, char 0: no previous regular expression"
> [...]
> And the interpretation (of both GNU sed >4.0 and *BSD's sed) is
> that the "last RE used in the last command *applied*" means the last RE *executed*
> - not the last regex that precedes the empty regex in the program.
The previous examples were needlessly complicated.
Here's a simpler example:
$ printf "%s\n" ccbb aabb | sed -e '/a/!s/b/X/' -e 's//*/'
ccX*
*abb
Whether the 'last regex' is /a/ or /b/ depends on whether the line
contains 'a' or not.
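As a sketch, the two-command script can be modeled in a few lines of Python (a toy model of this script only, with a hypothetical `run` helper; not sed's code). The address /a/ always runs, and /b/ runs only when the negated address lets `s/b/X/` execute:

```python
import re


def run(lines):
    """Toy model of: sed -e '/a/!s/b/X/' -e 's//*/'"""
    out = []
    last_re = None  # sed keeps the "last regex" across input lines
    for line in lines:
        last_re = "a"                       # address /a/ is always evaluated
        if not re.search(last_re, line):    # '!' negates the address
            last_re = "b"                   # s/b/X/ runs, so /b/ becomes "last"
            line = re.sub(last_re, "X", line, count=1)
        line = re.sub(last_re, "*", line, count=1)  # s//*/
        out.append(line)
    return out


print(run(["ccbb", "aabb"]))  # ['ccX*', '*abb']
```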
> I therefore think this is not a bug (and I'm marking it as 'done').
> However discussion can continue by replying to this thread,
> and if there are different opinions we can always re-open it.
One could argue that the behavior you're expecting (which is what happens
in sed-heirloom and sed-gnu-3.02) is that if there is no "last regex",
sed should silently treat it as 'no match'.
That's easy to implement, but I don't think it's a good change.
The current behavior is better.
Moreover, I suspect sed-heirloom's behavior is just buggy:
$ seq 1 | sed-heirloom -n -e '2p' -e '//p'
[no output]
$ seq 2 | sed-heirloom -n -e '2p' -e '//p'
2
2
$ seq 2 | sed-heirloom -n -e '2,5p' -e '//p'
2
$ seq 2 | sed-heirloom -n -e '//p'
First RE may not be null
regards,
- assaf
Information forwarded to bug-sed <at> gnu.org: bug#30829; Package sed. (Thu, 15 Mar 2018 23:42:01 GMT)
Message #16 received at 30829 <at> debbugs.gnu.org (full text, mbox):
Thanks for the prompt reply!
While I understand your explanation, I think we are talking about slightly different things. Your example is different from mine. Let me rewrite my code so that it is portable:
printf %s\\n 1 2 3 4 5 | sed -e '2,/5/{//!d' -e'}'
Now, as you can see, the main difference between my sample above and your sample
printf "%s\n" 1 2 3 4 5 | sed -n -e '2,/5/p' -e '//p'
is the braces (the command grouping). In other words, you are using an empty regex unconditionally, while I'm only using it for lines that meet certain criteria. Based on your explanation (i.e. applied regex = executed regex; no executed regex on line 1 means no previous regex, hence the failure), I can understand why your code exits with an error.
However, my code uses the empty regex conditionally (only for a certain range of lines). Logically, '//!d' should not be executed for lines outside that range. If I used a plain 'd' instead of '//!d', would sed unconditionally delete all lines from the file? No. It would delete only the lines in that range. Similarly, sed should not even attempt to evaluate the empty regex in '{//!d' for lines outside that range.
Unless I'm missing something I still see this as a bug.
Information forwarded to bug-sed <at> gnu.org: bug#30829; Package sed. (Fri, 16 Mar 2018 00:17:02 GMT)
Message #19 received at 30829 <at> debbugs.gnu.org (full text, mbox):
Hello,
On Fri, Mar 16, 2018 at 12:09:05AM +0100, Don Crissti wrote:
[...]
> However, my code uses empty regex on condition (only for a certain range of lines). It is logical that '//!d' should not be executed for lines outside that range. If I used a plain 'd' instead of '//!d' would sed unconditionally delete all lines from the file ? No. It would delete only the lines in that range. Similarly, sed should not even attempt to evaluate the empty regex in '{//!d' for lines outside that range.
>
> Unless I'm missing something I still see this as a bug.
There is a subtle issue here:
when using 2 addresses, and the second address is an RE,
the first line matching the first address (line 2 in your case)
will *never* be checked against the RE.
And so, even though the '//!d' is run conditionally,
the condition is true (line 2 is already in the range before the regex is ever checked),
and then '//!d' is executed but there is no 'last regex' yet.
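The range behavior can be sketched as a toy Python model of `2,/5/{//!d}` (hypothetical names `run` and `NoPreviousRegex`; it models only this script, assuming the GNU sed 4.x semantics described above). Line 2 opens the range without testing /5/, so the `{//!d}` block runs while no regex has been executed yet:

```python
import re


class NoPreviousRegex(Exception):
    """Stands in for sed's "no previous regular expression" error."""


def run(lines):
    """Toy model of: sed '2,/5/{//!d}' (hypothetical helper)."""
    last_re = None      # last regex *executed*
    in_range = False
    out = []
    for lineno, line in enumerate(lines, start=1):
        selected = False
        if not in_range:
            if lineno == 2:            # range starts; /5/ is NOT tested on this line
                in_range = selected = True
        else:
            selected = True
            last_re = "5"              # end address is checked from the next line on
            if re.search(last_re, line):
                in_range = False       # this line is the last one in the range
        if selected:                   # { //!d }
            if last_re is None:
                raise NoPreviousRegex(f"line {lineno}: no previous regular expression")
            if not re.search(last_re, line):
                continue               # d: delete the pattern space, skip output
        out.append(line)
    return out
```

With input 1 to 5, line 1 passes through untouched, and line 2 raises, matching the error reported at the top of this thread.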
This is documented in the manual:
https://www.gnu.org/software/sed/manual/sed.html#Range-Addresses
(in the second paragraph, starting with "if the second address is a regexp").
Also, in the POSIX standard, the relevant text is:
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/sed.html
(under "addresses in sed" section).
"An editing command with two addresses shall select the inclusive
range from the first pattern space that matches the first address
through the next pattern space that matches the second. "
The interpretation is that the second address is checked against
"the next pattern space", implying that the second address is first
checked not on the line that matched the first address, but
on the 'next pattern space' (that is, starting at the line following
the line that matched the first address).
[phew, that is a bit confusing....]
Does this clarify the issue?
-assaf
Information forwarded to bug-sed <at> gnu.org: bug#30829; Package sed. (Fri, 16 Mar 2018 00:44:02 GMT)
Message #22 received at 30829 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/html, inline)]
bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 13 Apr 2018 11:24:04 GMT)
This bug report was last modified 7 years and 151 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.