GNU bug report logs - #21865
Parenthesis subexpressions


Package: grep;

Reported by: Valerio Bozzolan <bozzolan.valerio <at> educ.di.unito.it>

Date: Sun, 8 Nov 2015 21:58:02 UTC

Severity: wishlist

To reply to this bug, email your comments to 21865 AT debbugs.gnu.org.



Report forwarded to bug-grep <at> gnu.org:
bug#21865; Package grep. (Sun, 08 Nov 2015 21:58:02 GMT)

Acknowledgement sent to Valerio Bozzolan <bozzolan.valerio <at> educ.di.unito.it>:
New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Sun, 08 Nov 2015 21:58:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Valerio Bozzolan <bozzolan.valerio <at> educ.di.unito.it>
To: bug-grep <at> gnu.org
Subject: Parenthesis subexpressions
Date: Sun, 08 Nov 2015 21:42:44 +0100
Hi,

(First time in a GNU mailing list!)

I've already asked this question in my local GNU/Linux user group and in #grep on Freenode, and I'm still confused.

GNU Grep doesn't have an option to select a subexpression, right?

A trivial example:
    echo abcde | grep -o -E 'b([a-z])d'
    => "bcd"

What if I want only the first subexpression ("b")? GNU Grep can't do that, can it? (Why not?)

Currently I use GNU Awk, or GNU Bash with ${BASH_REMATCH[$n_sub]}.

Thank you for the clarification!
--
Valerio Bozzolan
Email sent from Android (CyanogenMod) using K-9 Mail.
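The Bash approach mentioned in this message can be sketched as follows. `print_group` is a hypothetical helper name, not part of grep or Bash. The sketch assumes Bash (for `[[ =~ ]]` and `BASH_REMATCH`), reports only the first match on each line (unlike `grep -o`), and, as noted later in the thread, a shell loop is a slow way to process large input.

```shell
#!/usr/bin/env bash
# Sketch: print capture group N of the first ERE match on each input line.
# "print_group" is a hypothetical helper, assumed for illustration only.
print_group() {
  local n=$1 re=$2 line
  while IFS= read -r line; do
    # [[ str =~ re ]] puts the whole match in BASH_REMATCH[0] and the
    # parenthesized subexpressions in BASH_REMATCH[1], [2], ...
    # The regex must stay unquoted, otherwise it is matched literally.
    if [[ $line =~ $re ]]; then
      printf '%s\n' "${BASH_REMATCH[$n]}"
    fi
  done
}

echo abcde | print_group 1 'b([a-z])d'   # prints: c
```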

Information forwarded to bug-grep <at> gnu.org:
bug#21865; Package grep. (Sun, 08 Nov 2015 22:36:02 GMT)

Message #8 received at submit <at> debbugs.gnu.org:

From: Valerio Bozzolan <bozzolan.valerio <at> educ.di.unito.it>
To: bug-grep <at> gnu.org
Subject: Re: Parenthesis subexpressions
Date: Sun, 08 Nov 2015 21:49:03 +0100
Sorry, typo: the first subexpression matches "c", not "b".

    echo abcde | grep -o -E 'b([a-z])d'
    => "bcd"

Is there a way to get only "c"?

Thanks again!


-- 
Valerio Bozzolan
Email sent from Android (CyanogenMod) using K-9 Mail.

http://boz.reyboz.it

Information forwarded to bug-grep <at> gnu.org:
bug#21865; Package grep. (Mon, 09 Nov 2015 13:52:01 GMT)

Message #11 received at 21865 <at> debbugs.gnu.org:

From: Stephane Chazelas <stephane.chazelas <at> gmail.com>
To: Valerio Bozzolan <bozzolan.valerio <at> educ.di.unito.it>
Cc: 21865 <at> debbugs.gnu.org
Subject: Re: bug#21865: Parenthesis subexpressions
Date: Mon, 9 Nov 2015 13:50:46 +0000
2015-11-08 21:49:03 +0100, Valerio Bozzolan:
> Sorry... typo...
> 
>     echo abcde | grep -o -E 'b([a-z])d'
>     => "bcd"
> 
> Can't I choose to have only "c"?
[...]

That's correct, GNU grep doesn't have that capability (yet).
Recent versions of pcregrep do:

$ echo abc | pcregrep -o1 '.(.).'
b

Now, I'm not a GNU grep maintainer, but I suppose the question is
how far we want to take grep away from its original purpose
(printing the lines that match a pattern, which is what g/re/p
stands for).

GNU grep already does find's job with -r, and part of sed's job
with -o/--colour.

Having said that, I do agree it's the logical continuation after
-o.

Note that for now, you can already do:

$ echo abcde | grep -o -P 'b\K[a-z](?=d)'
c


-- 
Stephane
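For comparison, the same extraction can be sketched with sed, which Stephane also suggests later in the thread. This assumes a sed that accepts `-E` for extended regular expressions, as GNU sed does. Unlike the `grep -o -P` command above, it yields at most one result per line.

```shell
# Sketch: print the first subexpression of b([a-z])d with sed instead of grep -P.
# -n suppresses automatic printing; s///p replaces the whole line with the
# captured group (\1) and prints it only if the substitution succeeded.
echo abcde | sed -nE 's/.*b([a-z])d.*/\1/p'      # prints: c

# Limitation: one result per line. The greedy leading .* makes it the
# last match on the line.
echo abcdxbyd | sed -nE 's/.*b([a-z])d.*/\1/p'   # prints: y
```

On the same `abcdxbyd` input, `grep -o -P 'b\K[a-z](?=d)'` reports both matches, `c` and then `y`.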




Information forwarded to bug-grep <at> gnu.org:
bug#21865; Package grep. (Mon, 09 Nov 2015 16:05:01 GMT)

Message #14 received at 21865 <at> debbugs.gnu.org:

From: Valerio Bozzolan <bozzolan.valerio <at> educ.di.unito.it>
Cc: 21865 <at> debbugs.gnu.org
Subject: Re: bug#21865: Parenthesis subexpressions
Date: Mon, 09 Nov 2015 17:03:39 +0100
Thanks for agreeing with the evolution of the meaning of "-o".

Just to give you a laugh: I was reproducing egrep with $BASH_REMATCH:
https://gist.github.com/valerio-bozzolan/6787675e931dce1ba7e9

Definitely not beautiful... but really effective for me.

So something like "egrep -o $n regex" could also save the world from code like mine.


-- 
Valerio Bozzolan
Email sent from Android (CyanogenMod) using K-9 Mail.

Information forwarded to bug-grep <at> gnu.org:
bug#21865; Package grep. (Mon, 09 Nov 2015 17:28:02 GMT)

Message #17 received at 21865 <at> debbugs.gnu.org:

From: Stephane Chazelas <stephane.chazelas <at> gmail.com>
To: Valerio Bozzolan <bozzolan.valerio <at> educ.di.unito.it>
Subject: Re: bug#21865: Parenthesis subexpressions
Date: Mon, 9 Nov 2015 17:16:19 +0000
2015-11-09 17:00:36 +0100, Valerio Bozzolan:
> Thanks for agreeing with the evolution of the meaning of "-o".
> 
> Just to make you a laugh: I was reproducing egrep with $BASH_REMATCH:
> https://gist.github.com/valerio-bozzolan/6787675e931dce1ba7e9
> 
> Definitely not beautiful... but really effective for me.

You may want to read:

https://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice
https://unix.stackexchange.com/questions/209123/understand-ifs-read-r-line
https://unix.stackexchange.com/questions/65803/why-is-printf-better-than-echo

Here, if there weren't a pcregrep already, I'd rather do it in
perl or GNU sed than in bash.

Like:

perl -lne 'print for /a([a-z])c/g'

Also note that pcregrep prints the group for every match on the line:

$ echo abac | pcregrep -o1 'a(.)'
b
c

> So something like "egrep -o $n regex" also can save the world from code similar to mine.

GNU grep can't add it like that as that would break backward
compatibility.

grep -o 1 regex file

is currently meant to print the occurrences of "1" in the
"regex" and "file" files.

Even adding it as:

grep -o1 regex file

would probably not be a good idea as that would mean some
ad-hoc parsing of the options (in "grep -o1 regexp", "1" would
be an argument to "-o" while in "grep -oi regexp", "i" currently
is a separate "-i" option).

So reasonably, it should probably be a separate option like -O 1.
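The parsing problem can be illustrated with the shell's POSIX `getopts`, which splits bundled short options the way most Unix tools do. This is only an analogy: `show_opts` is a hypothetical helper, the letters merely mimic grep's `-o` and `-i`, and `getopts` cannot express optional arguments, so a required one (`o:`) stands in. A short option with an optional argument in GNU `getopt_long` (`o::`) likewise takes the rest of the word, so `-oi` would stop meaning `-o -i`.

```shell
#!/usr/bin/env bash
# Sketch: show how getopts splits bundled short options.
# "show_opts" is hypothetical; this is not grep's actual option parser.
show_opts() {
  local optstring=$1 OPTIND=1 OPTARG opt out=
  shift
  while getopts "$optstring" opt "$@"; do
    case $optstring in
      # A letter followed by ":" in the optstring takes an argument.
      *"$opt":*) out="$out -$opt($OPTARG)" ;;
      *)         out="$out -$opt" ;;
    esac
  done
  printf '%s\n' "${out# }"
}

show_opts oi  -oi   # prints: -o -i  (-o is a flag, so "i" is the -i option)
show_opts o:i -oi   # prints: -o(i)  (-o takes an argument, so "i" is swallowed)
show_opts o:i -o1   # prints: -o(1)
```

A separate letter such as `-O` that always takes an argument leaves every existing bundle like `-oi` unchanged.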

-- 
Stephane




Severity set to 'wishlist' from 'normal'. Request was from Paul Eggert <eggert <at> cs.ucla.edu> to control <at> debbugs.gnu.org. (Thu, 31 Dec 2015 08:56:02 GMT)

This bug report was last modified 9 years and 169 days ago.


