GNU bug report logs - #31816
Saved Sub String Only Saves Last


Package: sed;

Reported by: Mark.Ot2o <at> gmail.com

Date: Wed, 13 Jun 2018 17:54:02 UTC

Severity: normal

Tags: notabug

Done: Eric Blake <eblake <at> redhat.com>

Bug is archived. No further changes may be made.



Message #12 received at 31816-done <at> debbugs.gnu.org (full text, mbox):

From: Eric Blake <eblake <at> redhat.com>
To: Mark.Ot2o <at> gmail.com, 31816-done <at> debbugs.gnu.org,
 GNU bug control <control <at> debbugs.gnu.org>
Subject: Re: bug#31816: Saved Sub String Only Saves Last
Date: Mon, 18 Jun 2018 15:04:23 -0500
tag 31816 notabug
thanks

On 06/13/2018 12:03 PM, Mark Otto wrote:
> If I use a saved substring it should capture the maximum number of
> characters that fit the pattern, in this case [0-9][0-9]*.

Sed already does that (an operator is as greedy as possible, given what 
has already been matched earlier in the line).  However, you are 
misunderstanding how greedy operators work.

> 
> echo "I'm 2254 years old"|sed "s/^..*\([0-9][0-9]*\) /She's \1 /"
> She's 4 years old

That is correct output.  Remember, in sed, the pieces of a pattern are 
matched from left to right, and each piece takes the longest substring 
it can; a piece on the left settles for a shorter substring only when 
the pieces to its right could not otherwise match at all.  Since .* is 
a greedy pattern, you have matched:

"I" "'m 225" "4"
 ^.  .*       \([0-9][0-9]*\)
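A minimal sketch that makes this split visible, assuming a POSIX shell and any POSIX sed: the same pattern with each piece wrapped in its own \(...\) group, a trailing .* added so the whole line is replaced, and the groups printed in brackets. The bracketed output format and the variable name `pieces` are illustrative only, not part of the original report.

```shell
# Reporter's pattern, with ^. and .* also captured so each piece is visible.
# The trailing " .*" consumes the rest of the line, so only the brackets remain.
pieces=$(echo "I'm 2254 years old" |
  sed 's/^\(.\)\(.*\)\([0-9][0-9]*\) .*/[\1] [\2] [\3]/')
echo "$pieces"   # [I] ['m 225] [4]
```

The greedy \(.*\) keeps "'m 225" because the group after it only needs one digit followed by a space.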

> 
> 
> She should be 2254 years old.

If you want the digit pattern to get the longer match instead of the 
greedy .* in front of it, you have to put some other pattern before 
the digits, such as:

echo "I'm 2254 years old" | sed "s/^..*[^0-9]\([0-9][0-9]*\)/She's \1/"

which matches as:

"I" "'m" " "     "2254"
 ^.  .*   [^0-9]  \([0-9][0-9]*\)

where my explicit match of a non-digit forced the .* to stop short of the digits.
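The same check for the fixed command, as a sketch assuming a POSIX shell and sed. The first command is the one from the reply; the second is a hypothetical bracketed variant that also captures the [^0-9], to show which character it consumed. The variable names are illustrative.

```shell
# The fixed command from above: [^0-9] must match the character just
# before the captured digits, so .* cannot reach into the number.
fixed=$(echo "I'm 2254 years old" | sed "s/^..*[^0-9]\([0-9][0-9]*\)/She's \1/")
echo "$fixed"    # She's 2254 years old

# Same pattern with every piece captured; the trailing .* eats the rest.
pieces=$(echo "I'm 2254 years old" |
  sed 's/^\(.\)\(.*\)\([^0-9]\)\([0-9][0-9]*\).*/[\1] [\2] [\3] [\4]/')
echo "$pieces"   # [I] ['m] [ ] [2254]
```

The only non-digit that is immediately followed by a digit is the space before "2254", so that is where .* has to stop.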

Or, you can use other languages, like perl, which offer non-greedy 
operators as an extension, as in:

echo "I'm 2254 years old" | perl -pe "s/^..*?([0-9]+) /She's \1 /"

perl is more like 'sed -E', but has the additional '.*?' non-greedy 
counterpart to '.*' that sed lacks.

> 
> It does search correctly because without the substring it replaces all the
> digits:
> 
> echo "I'm 2287 years old"|sed "s/^..*[0-9][0-9]*/She's many/"
> She's many years old

That output is still correct, but the command wasn't doing what you 
claimed it was doing.  Again, it was matching:

"I" "'m 228" "7"
 ^.  .*       [0-9][0-9]*

then replacing that entire match.
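A sketch for checking this, assuming a POSIX shell and sed: & in the replacement stands for the entire match, so bracketing it shows exactly what gets replaced, and a hypothetical grouped variant shows how that match was divided between .* and [0-9][0-9]*.

```shell
# & is the whole match: all of "I'm 2287" is replaced, not just the digits.
whole=$(echo "I'm 2287 years old" | sed 's/^..*[0-9][0-9]*/[&]/')
echo "$whole"   # [I'm 2287] years old

# Grouping the two halves shows that [0-9][0-9]* itself matched only the "7".
parts=$(echo "I'm 2287 years old" | sed 's/^\(..*\)\([0-9][0-9]*\)/[\1][\2]/')
echo "$parts"   # [I'm 228][7] years old
```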

As such, I'm marking this as not a bug.  But feel free to comment 
further if you still need help.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org







GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.