GNU bug report logs - #31816
Saved Sub String Only Saves Last

Package: sed

Reported by: Mark.Ot2o <at> gmail.com

Date: Wed, 13 Jun 2018 17:54:02 UTC

Severity: normal

Tags: notabug

Done: Eric Blake <eblake <at> redhat.com>

Bug is archived. No further changes may be made.

Report forwarded to bug-sed <at> gnu.org:
bug#31816; Package sed. (Wed, 13 Jun 2018 17:54:02 GMT)

Acknowledgement sent to Mark.Ot2o <at> gmail.com:
New bug report received and forwarded. Copy sent to bug-sed <at> gnu.org. (Wed, 13 Jun 2018 17:54:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Mark Otto <mark.ot2o <at> gmail.com>
To: bug-sed <at> gnu.org
Subject: Saved Sub String Only Saves Last
Date: Wed, 13 Jun 2018 13:03:16 -0400
If I use a saved substring it should capture the maximum number of
characters that fit the pattern, in this case  [0-9][0-9]*.

echo "I'm 2254 years old"|sed "s/^..*\([0-9][0-9]*\) /She's \1 /"
She's 4 years old"


She should be 2254 years old.

It does search correctly because without the substring it replaces all the
digits:

echo "I'm 2287 years old"|sed "s/^..*[0-9][0-9]*/She's many/"
She's many years old"


Here is my version information:

sed --version # On Windows 10
sed (GNU sed) 4.4
Copyright (C) 2017 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Jay Fenlason, Tom Lord, Ken Pizzini,
and Paolo Bonzini.
GNU sed home page: <http://www.gnu.org/software/sed/>.
General help using GNU software: <http://www.gnu.org/gethelp/>.
E-mail bug reports to: <bug-sed <at> gnu.org>.

Added tag(s) notabug. Request was from Eric Blake <eblake <at> redhat.com> to control <at> debbugs.gnu.org. (Mon, 18 Jun 2018 20:05:02 GMT)

Reply sent to Eric Blake <eblake <at> redhat.com>:
You have taken responsibility. (Mon, 18 Jun 2018 20:05:03 GMT)

Notification sent to Mark.Ot2o <at> gmail.com:
bug acknowledged by developer. (Mon, 18 Jun 2018 20:05:03 GMT)

Message #12 received at 31816-done <at> debbugs.gnu.org:

From: Eric Blake <eblake <at> redhat.com>
To: Mark.Ot2o <at> gmail.com, 31816-done <at> debbugs.gnu.org,
 GNU bug control <control <at> debbugs.gnu.org>
Subject: Re: bug#31816: Saved Sub String Only Saves Last
Date: Mon, 18 Jun 2018 15:04:23 -0500
tag 31816 notabug
thanks

On 06/13/2018 12:03 PM, Mark Otto wrote:
> If I use a saved substring it should capture the maximum number of
> characters that fit the pattern, in this case  [0-9][0-9]*.

Sed already does that (an operator is as greedy as possible, given what 
has already been matched earlier in the line).  However, you are 
misunderstanding how greedy operators work.

> 
> echo "I'm 2254 years old"|sed "s/^..*\([0-9][0-9]*\) /She's \1 /"
> She's 4 years old"

That is correct output.  Remember, in sed, a pattern is evaluated from
left to right, and each greedy piece takes the longest substring it can.
A piece on the left settles for a shorter substring only if the pieces to
its right could not otherwise match.  Since .* is a greedy pattern, you
have matched:

"I" "'m 225" "4"
 ^.  .*       \([0-9][0-9]*\)
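
One way to check this breakdown is to wrap each piece of the pattern in its own group and print what each group captured. This is only a sketch for illustration: the variable names and the 1=<...> output format are made up here, and the trailing .* is added solely to consume the rest of the line.

```shell
# Input from the report, without the surrounding quotes.
line="I'm 2254 years old"

# Group 1 is the single ".", group 2 is the greedy ".*",
# group 3 is the digit run that the original pattern saved.
pieces=$(printf '%s\n' "$line" |
  sed "s/^\(.\)\(.*\)\([0-9][0-9]*\) .*/1=<\1> 2=<\2> 3=<\3>/")

printf '%s\n' "$pieces"   # 1=<I> 2=<'m 225> 3=<4>
```

Group 2 keeps every character it can, so group 3 is left with the single digit it needs in order to match at all.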

> 
> 
> She should be 2254 years old.

If you want the digit group to take priority over the greedy .* that
comes before it, you have to restrict what that first part can match,
such as:

echo "I'm 2254 years old" | sed "s/^..*[^0-9]\([0-9][0-9]*\)/She's \1/"

which matches as:

"I" "'m" " "     "2254"
 ^.  .*   [^0-9]  \([0-9][0-9]*\)

where my explicit match of a non-digit forced the .* to be less greedy.
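
A related idiom, shown here as an alternative sketch rather than as the suggestion above, is to replace the leading ..* with a negated class so that the prefix cannot consume digits at all. The variable names are illustrative.

```shell
line="I'm 2254 years old"

# The approach from the message: a mandatory non-digit right before the group.
with_guard=$(printf '%s\n' "$line" |
  sed "s/^..*[^0-9]\([0-9][0-9]*\)/She's \1/")

# Alternative: the prefix [^0-9]* can never swallow a digit,
# so the group receives the whole number.
with_class=$(printf '%s\n' "$line" |
  sed "s/^[^0-9]*\([0-9][0-9]*\) /She's \1 /")

printf '%s\n%s\n' "$with_guard" "$with_class"
# She's 2254 years old
# She's 2254 years old
```

The two forms differ when a line holds several numbers: the greedy .* version captures the last number on the line, while the negated-class version captures the first.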

Or, you can use other languages, like perl, which have the extension of 
non-greedy operators, as in:

echo "I'm 2254 years old" | perl -pe "s/^..*?([0-9]+) /She's \1 /"

perl is more like 'sed -E', but has the additional '.*?' non-greedy 
counterpart to '.*' that sed lacks.
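
For comparison, here is the original substitution rewritten in extended syntax. It is a sketch that assumes the installed sed accepts the -E option. The syntax is lighter (+ and unescaped parentheses), but the matching is just as greedy, so the result is the same single digit.

```shell
line="I'm 2254 years old"

# Extended syntax changes the notation, not the greediness of .+
ere=$(printf '%s\n' "$line" | sed -E "s/^.+([0-9]+) /She's \1 /")

printf '%s\n' "$ere"   # She's 4 years old
```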

> 
> It does search correctly because without the substring it replaces all the
> digits:
> 
> echo "I'm 2287 years old"|sed "s/^..*[0-9][0-9]*/She's many/"
> She's many years old"

That output is still correct, but wasn't doing what you claimed it was 
doing.  Again, it was matching:

"I" "'m 228" "7"
 ^.  .*       [0-9][0-9]*

then replacing that entire match.
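
To see the extent of the match and how it is split, the following sketch uses & (the whole matched text) and then two illustrative groups. The bracket markers and variable names are chosen here only for display.

```shell
line="I'm 2287 years old"

# & stands for everything the pattern matched.
whole=$(printf '%s\n' "$line" | sed "s/^..*[0-9][0-9]*/[&]/")

# The same pattern, split into the greedy prefix and the digit part.
split=$(printf '%s\n' "$line" | sed "s/^\(..*\)\([0-9][0-9]*\)/[\1][\2]/")

printf '%s\n%s\n' "$whole" "$split"
# [I'm 2287] years old
# [I'm 228][7] years old
```

All four digits disappear from the output because the whole match is replaced, even though the digit part of the pattern matched only the final 7.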

As such, I'm marking this as not a bug.  But feel free to comment 
further if you still need help.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




Information forwarded to bug-sed <at> gnu.org:
bug#31816; Package sed. (Wed, 20 Jun 2018 15:14:03 GMT)

Message #15 received at 31816 <at> debbugs.gnu.org:

From: Mark Otto <mark.ot2o <at> gmail.com>
To: 31816 <at> debbugs.gnu.org, Eric Blake <eblake <at> redhat.com>
Subject: Re: bug#31816: closed (Re: bug#31816: Saved Sub String Only Saves
 Last)
Date: Wed, 20 Jun 2018 07:41:29 -0400
Dear Eric,

Thank you for your thorough explanation of the greediness of sed.  If I was
thinking about sed's greediness, I should have thought that it would be
consistent at every point, including being greedy before my back
reference.  The nongreedy perl operators are intuitive, but their matching
process still needs to be thought through.

I found an explanation of the difference between greedy and non-greedy here:
<https://stackoverflow.com/questions/3075130/what-is-the-difference-between-and-regular-expressions>

Consider the input 101000000000100.  Using 1.*1, * is greedy.  It will match
all the way to the end, and then backtrack until it can match a 1, leaving
you with 1010000000001.  .*? is non-greedy: * will match nothing at first,
but then will try to match extra characters until it matches a 1, eventually
matching 101.  All quantifiers have a non-greedy mode: .*?, .+?, .{2,6}?,
and even .??.
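
The greedy half of that explanation can be reproduced directly in sed. Since sed has no .*? operator, the second command below is only an approximation for this particular input: it uses a negated class to reach the same shortest match. Variable names and bracket markers are illustrative.

```shell
input=101000000000100

# Greedy: .* runs to the end of the line and backs off to the last 1.
greedy=$(printf '%s\n' "$input" | sed 's/1.*1/[&]/')

# No .*? in sed; [^1]* stops at the first 1 it meets instead.
shortest=$(printf '%s\n' "$input" | sed 's/1[^1]*1/[&]/')

printf '%s\n%s\n' "$greedy" "$shortest"
# [1010000000001]00
# [101]000000000100
```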

Sed is a UNIX standard, so I could think harder about how it works rather
than jumping to "It's a bug!"

Best wishes,
Mark


bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 19 Jul 2018 11:24:04 GMT)
