GNU bug report logs - #26576
-v when used with -C

Package: grep;

Reported by: 積丹尼 Dan Jacobson <jidanni <at> jidanni.org>

Date: Thu, 20 Apr 2017 14:40:01 UTC

Severity: normal

Tags: notabug

Done: Eric Blake <eblake <at> redhat.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 26576 in the body.
You can then email your comments to 26576 AT debbugs.gnu.org in the normal way.

Report forwarded to bug-grep <at> gnu.org:
bug#26576; Package grep. (Thu, 20 Apr 2017 14:40:01 GMT)

Acknowledgement sent to 積丹尼 Dan Jacobson <jidanni <at> jidanni.org>:
New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Thu, 20 Apr 2017 14:40:01 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: 積丹尼 Dan Jacobson <jidanni <at> jidanni.org>
To: bug-grep <at> gnu.org
Subject: -v when used with -C
Date: Thu, 20 Apr 2017 22:39:28 +0800
You know if this only gets five lines,
grep -C 2    ZZZ 00001.vcf|wc - 00001.vcf
      5       5     197 -
   1686    1731   83630 00001.vcf
then this
grep -C 2 -v ZZZ 00001.vcf|wc - 00001.vcf
   1686    1731   83630 -
   1686    1731   83630 00001.vcf
should get all EXCEPT five lines.




Added tag(s) notabug. Request was from Eric Blake <eblake <at> redhat.com> to control <at> debbugs.gnu.org. (Thu, 20 Apr 2017 14:58:02 GMT)

Reply sent to Eric Blake <eblake <at> redhat.com>:
You have taken responsibility. (Thu, 20 Apr 2017 14:58:02 GMT)

Notification sent to 積丹尼 Dan Jacobson <jidanni <at> jidanni.org>:
bug acknowledged by developer. (Thu, 20 Apr 2017 14:58:02 GMT)

Message #12 received at 26576-done <at> debbugs.gnu.org:

From: Eric Blake <eblake <at> redhat.com>
To: 積丹尼 Dan Jacobson <jidanni <at> jidanni.org>,
 26576-done <at> debbugs.gnu.org
Subject: Re: bug#26576: -v when used with -C
Date: Thu, 20 Apr 2017 09:56:56 -0500
tag 26576 notabug
thanks

On 04/20/2017 09:39 AM, 積丹尼 Dan Jacobson wrote:
> You know if this only gets five lines,
> grep -C 2    ZZZ 00001.vcf|wc - 00001.vcf
>       5       5     197 -
>    1686    1731   83630 00001.vcf
> then this
> grep -C 2 -v ZZZ 00001.vcf|wc - 00001.vcf
>    1686    1731   83630 -
>    1686    1731   83630 00001.vcf
> should get all EXCEPT five lines.

Not necessarily true.  Let's simplify your example to something that
doesn't require knowing the contents of 00001.vcf:

$ seq 10 | grep -C 2    5
3
4
5
6
7

That says show all lines that match the regex '5', as well as (up to) 2
context lines on either side.  So we get a total output of five lines,
even though only one of those five lines actually matched.

Now the converse:

$ seq 10 | grep -C 2 -v 5
1
2
3
4
5
6
7
8
9
10

That says to show all lines that do not match the regex '5', as well as
(up to) 2 context lines on either side.  So we get a total output of ten
lines, consisting of 4 matching lines, 1 context line, and 5 more
matching lines (grep was smart enough to merge the trailing context after
4 and the leading context before 6, which overlap at line 5, into a
single output line, rather than displaying two independent chunks).
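A quick way to see which output lines are selected and which are only context, assuming GNU grep and seq are installed: with -n, GNU grep separates the line number from the text with ':' on selected lines and '-' on context lines. Under -v, line 5 is the only line marked as context.

```shell
# With -n, GNU grep prints "NUM:" for selected lines and "NUM-" for
# context lines; "--" separates hunks that are not adjacent.
seq 10 | grep -n -C 2 -v 5
# one per line: 1:1 2:2 3:3 4:4 5-5 6:6 7:7 8:8 9:9 10:10

seq 10 | grep -n -C 2 -v '[3-8]'
# one per line: 1:1 2:2 3-3 4-4 -- 7-7 8-8 9:9 10:10
```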

For further proof that -C and -v are correctly working together, try
something that excludes enough context lines to actually get two hunks:

$ seq 10 | grep -C 2 -v '[3-8]'
1
2
3
4
--
7
8
9
10

Now you're matching 2 lines, then 2 lines tail context, then a hunk
separator, then 2 lines head context, then 2 more matching lines.

Therefore, I'm tagging this as not a bug.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


Information forwarded to bug-grep <at> gnu.org:
bug#26576; Package grep. (Thu, 20 Apr 2017 15:15:02 GMT)

Message #15 received at 26576-done <at> debbugs.gnu.org:

From: 積丹尼 Dan Jacobson <jidanni <at> jidanni.org>
To: Eric Blake <eblake <at> redhat.com>
Cc: 26576-done <at> debbugs.gnu.org
Subject: Re: bug#26576: -v when used with -C
Date: Thu, 20 Apr 2017 23:14:24 +0800
Mmmm, OK, but grep still needs an additional future option to print just
the missing set...




Information forwarded to bug-grep <at> gnu.org:
bug#26576; Package grep. (Thu, 20 Apr 2017 15:22:01 GMT)

Message #18 received at 26576-done <at> debbugs.gnu.org:

From: Eric Blake <eblake <at> redhat.com>
To: 積丹尼 Dan Jacobson <jidanni <at> jidanni.org>
Cc: 26576-done <at> debbugs.gnu.org
Subject: Re: bug#26576: -v when used with -C
Date: Thu, 20 Apr 2017 10:21:02 -0500
On 04/20/2017 10:14 AM, 積丹尼 Dan Jacobson wrote:
> Mmmm, OK, but grep still needs an additional future option to print just
> the missing set...

What output are you wanting?  If all you want is the non-matching lines,
don't ask for context (since the context will include matching lines).

If you want your request to be acted on, please demonstrate with some
sample input and the resulting output you want to accomplish, and then
we can help you figure out if that particular output can already be
generated using existing options.  But your vague request to "print just
the missing set" doesn't tell me what you really want.


Information forwarded to bug-grep <at> gnu.org:
bug#26576; Package grep. (Thu, 20 Apr 2017 15:38:01 GMT)

Message #21 received at 26576-done <at> debbugs.gnu.org:

From: 積丹尼 Dan Jacobson <jidanni <at> jidanni.org>
To: Eric Blake <eblake <at> redhat.com>
Cc: 26576-done <at> debbugs.gnu.org
Subject: Re: bug#26576: -v when used with -C
Date: Thu, 20 Apr 2017 23:37:03 +0800
I want to do
$ cat file|some_program
but I must exclude the UGLY line and its two neighbors.

OK, I have found the UGLY line, and its two neighbors
$ grep -C 2 UGLY file
bla
bla
UGLY
bla
bla

but I have no way to exclude them before piping to some_program.




Information forwarded to bug-grep <at> gnu.org:
bug#26576; Package grep. (Thu, 20 Apr 2017 16:27:01 GMT)

Message #24 received at 26576-done <at> debbugs.gnu.org:

From: Eric Blake <eblake <at> redhat.com>
To: 積丹尼 Dan Jacobson <jidanni <at> jidanni.org>
Cc: 26576-done <at> debbugs.gnu.org
Subject: Re: bug#26576: -v when used with -C
Date: Thu, 20 Apr 2017 11:26:47 -0500
On 04/20/2017 10:37 AM, 積丹尼 Dan Jacobson wrote:
> I want to do
> $ cat file|some_program
> but I must exclude the UGLY line and its two neighbors.
> 
> OK I have found the UGLY line, and its two neighbors
> $ grep -C 2 UGLY file
> bla
> bla
> UGLY
> bla
> bla
> 
> but I have no way to exclude them before piping to some_program.

So it sounds like you are asking for some sort of new --invert-output,
which toggles which lines to display.  Revisiting my example, it would
change:

$ seq 10 | grep -C 2    5
3
4
5
6
7

into:

$ seq 10 | grep -C 2    5 --invert-output
1
2
--
8
9
10

as well as:

$ seq 10 | grep -C 2 -v 5
1
2
3
4
5
6
7
8
9
10
$ seq 10 | grep -C 2 -v '[3-8]'
1
2
3
4
--
7
8
9
10

into:

$ seq 10 | grep -C 2 -v 5 --invert-output
$ seq 10 | grep -C 2 -v '[3-8]' --invert-output
5
6

It's a corner case, so I'm not sure it's worth burning an option and
complicating grep to do this, plus waiting for a future version of grep
with the proposed new option to percolate to your machines, when you
can already accomplish the same task using existing tools (admittedly with
more complexity).
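One such combination, sketched here as an illustration rather than anything from grep itself: let grep compute the hunks, then print every line whose number grep did not output. This assumes GNU grep (for the "NUM:" / "NUM-" prefixes that -n produces) and a POSIX awk; invert_output is a hypothetical helper name, not a grep option. Because grep decides what counts as context, matches near the start of the file and overlapping context behave exactly as they do in grep -C.

```shell
# Hypothetical helper emulating the proposed --invert-output:
# "invert_output C PATTERN FILE" prints the lines of FILE that
# "grep -C C PATTERN FILE" would NOT print.
invert_output() {
    ctx=$1 pat=$2 file=$3
    # With -n, GNU grep prefixes each output line with "NUM:" (selected)
    # or "NUM-" (context); "--" separators carry no number and are skipped.
    nums=$(grep -n -C "$ctx" -e "$pat" -- "$file" |
        sed -n 's/^\([0-9][0-9]*\)[:-].*/\1/p')
    # Print each line of FILE whose line number is not in that list.
    awk -v nums="$nums" '
        BEGIN { n = split(nums, list, "\n"); for (i = 1; i <= n; i++) skip[list[i]] }
        !(NR in skip)
    ' "$file"
}

printf '%s\n' a b c d e f g h i j > file
invert_output 1 '[dh]' file    # prints a, b, f, j (one per line)
```

Passing the line numbers through awk -v (instead of as a second input file) keeps the no-match case correct: an empty list simply means every line is printed.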

For example, you can use sed twice if the data is in a file that can be
re-read or easily regenerated (in this case, I'm skipping d, h, and any
line within -C1 of the ugly lines):

$ printf %s\\n a b c d e f g h i j > file
$ ugly=$(sed -n '/[dh]/ =' file)
$ sed "$(for line in $ugly; do echo "$((line-1)),$((line+1))d;";
   done)" file
a
b
f
j

Or it should be easy enough to write an awk script that stashes all
input lines into one array, then checks for regular expression matches,
and sets multiple entries in a corresponding poison array to 1 (based on
how many lines of context you want to poison), then in an END block only
print out lines if the corresponding poison[] entry is not 1.  I'll
leave that as an exercise for the reader, though.
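A minimal sketch of that awk approach, written as a shell function. drop_with_context is a hypothetical name; the pattern is an awk extended regular expression passed with -v (so backslashes in it are processed as escapes), and the whole input is held in memory.

```shell
# Hypothetical helper: drop_with_context C PATTERN [FILE]
# Removes every line matching PATTERN plus C lines of context on each
# side, by stashing all input and "poisoning" the affected line numbers.
drop_with_context() {
    awk -v C="$1" -v PAT="$2" '
        {
            line[NR] = $0                        # stash every input line
            if ($0 ~ PAT)                        # poison the match and its neighbors
                for (i = NR - C; i <= NR + C; i++)
                    poison[i] = 1
        }
        END {
            for (i = 1; i <= NR; i++)            # print only unpoisoned lines
                if (poison[i] != 1)
                    print line[i]
        }
    ' "${3:--}"
}

printf '%s\n' a b c d e f g h i j | drop_with_context 1 '[dh]'   # prints a, b, f, j
```

Since every match poisons its own window independently, overlapping context and matches on the first lines need no special handling.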


Information forwarded to bug-grep <at> gnu.org:
bug#26576; Package grep. (Thu, 20 Apr 2017 16:39:01 GMT)

Message #27 received at 26576-done <at> debbugs.gnu.org:

From: 積丹尼 Dan Jacobson <jidanni <at> jidanni.org>
To: Eric Blake <eblake <at> redhat.com>
Cc: 26576-done <at> debbugs.gnu.org
Subject: Re: bug#26576: -v when used with -C
Date: Fri, 21 Apr 2017 00:38:04 +0800
Yes, if somebody ever adds this option perhaps call it --compliment.




Information forwarded to bug-grep <at> gnu.org:
bug#26576; Package grep. (Thu, 20 Apr 2017 16:49:01 GMT)

Message #30 received at 26576-done <at> debbugs.gnu.org:

From: Eric Blake <eblake <at> redhat.com>
To: 積丹尼 Dan Jacobson <jidanni <at> jidanni.org>
Cc: 26576-done <at> debbugs.gnu.org
Subject: Re: bug#26576: -v when used with -C
Date: Thu, 20 Apr 2017 11:48:27 -0500
On 04/20/2017 11:38 AM, 積丹尼 Dan Jacobson wrote:
> Yes, if somebody ever adds this option perhaps call it --compliment.

Except that you mean --complement (you are not praising the lines, but
making an opposite selection of lines).


Information forwarded to bug-grep <at> gnu.org:
bug#26576; Package grep. (Thu, 20 Apr 2017 16:53:02 GMT)

Message #33 received at 26576-done <at> debbugs.gnu.org:

From: Assaf Gordon <assafgordon <at> gmail.com>
To: Eric Blake <eblake <at> redhat.com>
Cc: 26576-done <at> debbugs.gnu.org,
 積丹尼 Dan Jacobson <jidanni <at> jidanni.org>
Subject: Re: bug#26576: -v when used with -C
Date: Thu, 20 Apr 2017 16:51:19 +0000
Hello,

On Thu, Apr 20, 2017 at 11:26:47AM -0500, Eric Blake wrote:
>On 04/20/2017 10:37 AM, 積丹尼 Dan Jacobson wrote:
>> I want to do
>> $ cat file|some_program
>> but I must exclude the UGLY line and its two neighbors.
>>
>> OK I have found the UGLY line, and its two neighbors
>> $ grep -C 2 UGLY file
>> bla
>> bla
>> UGLY
>> bla
>> bla
>>
>> but I have no way to exclude them before piping to some_program.
>
>It's a corner case, so I'm not sure it's worth burning an option and
>complicating grep to do this, plus waiting for a future version of grep
>with the proposed new option to percolate to your machines, when you
>can already accomplish the same task using existing tools (admittedly with
>more complexity).
>


If I may suggest the following sed program:

 $ cat file
 a
 b
 c
 bla1
 bla2
 UGLY
 bla3
 bla4
 e
 f
 g

 $ sed -n ':x 1,2{N;bx} ; /UGLY/{ N;N;z;bx }; /./P;N;D' file
 a
 b
 c
 e
 f
 g


The combination of N/P/D commands uses sed's pattern space
as a FIFO buffer (N appends the next input line, P prints the first line,
D deletes the first line).
In between, if the pattern space contains the marker UGLY,
the entire buffer is deleted and execution branches back to the start
of the script.

Specifically:

1. ':x 1,2{N;bx}' => Load the buffer with the first two lines.

2. '/UGLY/ {N;N;z;bx}' => If the marker is found in the pattern
  space (which should contain 3 lines now),
  consume two more lines (N;N), clear the buffer (z) and
  jump to the beginning.
  'z' is a GNU extension. It can be replaced with 's/.*//'.

3. '/./P' => If the pattern space isn't empty, print its first line
  (up to the first newline).

4. 'N;D' => Read the next line from the input file and append
  it to the pattern space, then delete the first line from the
  pattern space (the same line that was printed with 'P').



The following program can be used to learn a bit more about how
the N/P/D commands work. It uses 'l' to print the content
of the pattern space, and you can see it behaves like a FIFO:

 $ sed -n ':x 1,2{N;bx} ; l;P;N;D' file
 a\nb\nc$
 a
 b\nc\nbla1$
 b
 c\nbla1\nbla2$
 c
 bla1\nbla2\nUGLY$
 bla1
 bla2\nUGLY\nbla3$
 bla2
 UGLY\nbla3\nbla4$
 UGLY
 bla3\nbla4\ne$
 bla3
 bla4\ne\nf$
 bla4
 e\nf\ng$
 e


More information about sed's buffers can be found here:
https://www.gnu.org/software/sed/manual/sed.html#advanced-sed

hope this helps,
regards,
- assaf








Information forwarded to bug-grep <at> gnu.org:
bug#26576; Package grep. (Thu, 20 Apr 2017 17:06:02 GMT)

Message #36 received at 26576-done <at> debbugs.gnu.org:

From: 積丹尼 Dan Jacobson <jidanni <at> jidanni.org>
To: Assaf Gordon <assafgordon <at> gmail.com>
Cc: 26576-done <at> debbugs.gnu.org, Eric Blake <eblake <at> redhat.com>
Subject: Re: bug#26576: -v when used with -C
Date: Fri, 21 Apr 2017 01:05:50 +0800
Yes, those are brilliant uses of sed. However, for now, the manual entry

‘-v’
‘--invert-match’
     Invert the sense of matching, to select non-matching lines.  (‘-v’
     is specified by POSIX.)

should perhaps mention that "-v is processed before -C, -A, and -B, not after."




Information forwarded to bug-grep <at> gnu.org:
bug#26576; Package grep. (Thu, 20 Apr 2017 19:35:02 GMT)

Message #39 received at 26576-done <at> debbugs.gnu.org:

From: Eric Blake <eblake <at> redhat.com>
To: Assaf Gordon <assafgordon <at> gmail.com>
Cc: 26576-done <at> debbugs.gnu.org,
 積丹尼 Dan Jacobson <jidanni <at> jidanni.org>
Subject: Re: bug#26576: -v when used with -C
Date: Thu, 20 Apr 2017 14:34:47 -0500
On 04/20/2017 11:51 AM, Assaf Gordon wrote:

> If I may suggest the following sed program:
> 
>  $ cat file
>  a
>  b
>  c
>  bla1
>  bla2
>  UGLY
>  bla3
>  bla4
>  e
>  f
>  g
> 
>  $ sed -n ':x 1,2{N;bx} ; /UGLY/{ N;N;z;bx }; /./P;N;D' file

Works as long as lines 1 and 2 do not contain UGLY. But misbehaves if
UGLY appears early:

$ printf '2\nUGLY\n3\n4\nc\nd\n' | sed -n ':x 1,2{N;bx}; /UGLY/{N;N;z;bx}; /./P;N;D'
d

Oops - missed c.

Also misbehaves if two occurrences of UGLY appear with overlapping context:

$ printf 'a\nb\n1\n2\nUGLY\n3\nUGLY\n4\n5\nc\nd\n' | sed -n ':x 1,2{N;bx}; /UGLY/{N;N;z;bx}; /./P;N;D'
a
b
4
5
c
d

Oops - didn't filter 4 and 5.

It may be fixable with even more magic, perhaps by using the hold buffer to
track the status of the last three lines, and suppressing output if any
of the last three inputs were UGLY.  But that is more complicated than I
want to spend time on for the sake of this email.
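The hold-buffer bookkeeping is fiddly in sed but straightforward in awk. The following sketch swaps in awk instead of sed (drop_with_context_stream is a hypothetical name) and keeps at most C not-yet-printed lines in memory: a match discards those pending lines (its leading context) and arms a counter that drops the next C lines (its trailing context), while a line that has fallen more than C lines behind is printed. Both failure cases above come out as intended.

```shell
# Hypothetical helper: drop_with_context_stream C PATTERN [FILE]
# Streams input, removing each line matching PATTERN (an awk ERE) together
# with C lines of context on each side, using memory proportional to C.
drop_with_context_stream() {
    awk -v C="$1" -v PAT="$2" '
        BEGIN { C += 0; first = 1; last = 0; skip = 0 }  # pending lines: buf[first..last]
        $0 ~ PAT {
            # Match: pending lines are leading context, so discard them,
            # then drop the next C lines as trailing context.
            for (i = first; i <= last; i++) delete buf[i]
            first = last + 1
            skip = C
            next
        }
        skip > 0 { skip--; next }                # still inside trailing context
        {
            buf[++last] = $0                     # hold the line in case a match follows
            if (last - first + 1 > C) {          # too far back to be leading context
                print buf[first]
                delete buf[first]
                first++
            }
        }
        END { for (i = first; i <= last; i++) print buf[i] }
    ' "${3:--}"
}

printf '2\nUGLY\n3\n4\nc\nd\n' | drop_with_context_stream 2 UGLY                 # prints c, d
printf 'a\nb\n1\n2\nUGLY\n3\nUGLY\n4\n5\nc\nd\n' | drop_with_context_stream 2 UGLY # prints a, b, c, d
```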


Information forwarded to bug-grep <at> gnu.org:
bug#26576; Package grep. (Fri, 21 Apr 2017 15:36:02 GMT)

Message #42 received at 26576-done <at> debbugs.gnu.org:

From: Assaf Gordon <assafgordon <at> gmail.com>
To: Eric Blake <eblake <at> redhat.com>
Cc: 26576-done <at> debbugs.gnu.org,
 積丹尼 Dan Jacobson <jidanni <at> jidanni.org>
Subject: Re: bug#26576: -v when used with -C
Date: Fri, 21 Apr 2017 15:34:57 +0000
On Thu, Apr 20, 2017 at 02:34:47PM -0500, Eric Blake wrote:
>On 04/20/2017 11:51 AM, Assaf Gordon wrote:
>
>> If I may suggest the following sed program:
>>
>>  $ sed -n ':x 1,2{N;bx} ; /UGLY/{ N;N;z;bx }; /./P;N;D' file
>
>Works as long as lines 1 and 2 do not contain UGLY. But misbehaves if
>UGLY appears early:
[...]
>Also misbehaves if two occurrences of UGLY appear with overlapping context:
>
[...]
>It may be fixable with even more magic, perhaps by using the hold buffer to
>track the status of the last three lines, and suppressing output if any
>of the last three inputs were UGLY.  But that is more complicated than I
>want to spend time on for the sake of this email.
>

Good catch, thanks for pointing this out.

Indeed, that was an ad-hoc script, suitable for some limited scenarios
but not robust as a general solution.

-assaf






bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sat, 20 May 2017 11:24:05 GMT)


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.