GNU bug report logs - #71252
why does grep match literal newlines when there are none, even with -z?

Package: grep

Reported by: Philippe Cerfon <philcerf <at> gmail.com>

Date: Wed, 29 May 2024 01:04:02 UTC

Severity: normal

Tags: notabug

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 71252 in the body.
You can then email your comments to 71252 AT debbugs.gnu.org in the normal way.

Report forwarded to bug-grep <at> gnu.org:
bug#71252; Package grep. (Wed, 29 May 2024 01:04:02 GMT)

Acknowledgement sent to Philippe Cerfon <philcerf <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Wed, 29 May 2024 01:04:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Philippe Cerfon <philcerf <at> gmail.com>
To: bug-grep <at> gnu.org
Subject: why does grep match literal newlines when there are none,
 even with -z?
Date: Tue, 28 May 2024 22:54:18 +0200
Hey.

I always thought that grep is line-based, in the sense that the string
being matched doesn't include the line terminator.
If so, why does, e.g.:
  $ printf 'foo' | grep $'\n'
  foo
match?

Even with -z.
While:
  $ printf 'foo\nbar' | grep -z $'\n'
  foo
  bar
would make sense to me, why does it also match:
  $ printf 'foobar' | grep -z $'\n'
  foobar
?


In PCRE mode:
  $ printf 'foobar' | grep -P -z '\n'
  $
No match, which is what I would expect.
  $ printf 'foo\nbar' | grep -P -z '\n'
  foo
  bar
Match, again, expected.
But:
  $ printf 'foobar' | grep -P -z $'\n'
  foobar
Why does that match?

Thanks,
Philippe




Information forwarded to bug-grep <at> gnu.org:
bug#71252; Package grep. (Wed, 29 May 2024 06:10:01 GMT)

Message #8 received at 71252 <at> debbugs.gnu.org:

From: Martin Schulte <gnu <at> schrader-schulte.de>
To: Philippe Cerfon <philcerf <at> gmail.com>
Cc: 71252 <at> debbugs.gnu.org
Subject: Re: bug#71252: why does grep match literal newlines when there are
 none, even with -z?
Date: Wed, 29 May 2024 08:08:57 +0200
Hi!

> I always thought, that grep is line based in a way that the current
> string doesn't hold the line terminator.
> If so, why does, e.g.:
>   $ printf 'foo' | grep $'\n'
>   foo
> match?

I was surprised at first, too, but I think the answer is in the first paragraph of the man page:

PATTERNS is one or more patterns separated by newline characters, and grep prints each line that matches a pattern.

Thus, grep $'a\nb' finds all lines that contain either an a or a b.

Best regards

Martin
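
The man page sentence quoted above can be checked with a small experiment. This is only a sketch, assuming a POSIX shell and a grep that splits PATTERNS on newlines as GNU grep does; the sample words and variable names are made up for illustration. The newline is built with a plain quoted line break instead of bash's $'\n' so that the snippet also runs under sh.

```shell
# A variable holding exactly one newline character.
nl='
'

# "a<newline>b" is not one pattern containing a newline: grep splits it
# into the two patterns "a" and "b" and selects lines matching either.
split_result=$(printf 'apple\nberry\ncherry\n' | grep "a${nl}b")

# The same pattern list written with one -e option per pattern.
explicit_result=$(printf 'apple\nberry\ncherry\n' | grep -e a -e b)

printf '%s\n' "$split_result"
```

Both variables should hold the lines apple and berry; cherry is dropped because it contains neither letter. By the same rule, a pattern consisting of a newline alone splits into two empty patterns, which is the case asked about in this report.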




Information forwarded to bug-grep <at> gnu.org:
bug#71252; Package grep. (Thu, 30 May 2024 02:34:01 GMT)

Message #11 received at 71252 <at> debbugs.gnu.org:

From: "David G. Pickett" <dgpickett <at> aol.com>
To: "71252 <at> debbugs.gnu.org" <71252 <at> debbugs.gnu.org>, 
 Philippe Cerfon <philcerf <at> gmail.com>
Subject: Re: bug#71252: why does grep match literal newlines when there are
 none, even with -z?
Date: Thu, 30 May 2024 02:33:37 +0000 (UTC)
I have used sed to load multiple lines into the buffer for analysis.  I am not sure grep wants to go multiline.
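
One possible form of the sed approach mentioned above, as a sketch only: the thread does not say which commands were actually used, and the input and the variable name here are hypothetical. It relies on the standard sed N command, which appends the next input line to the pattern space with an embedded newline.

```shell
# $!N: on every line except the last, append the next input line to the
# pattern space, so that it holds two lines joined by a newline.
# /foo\nbar/p: print the pattern space if "foo", a newline, and "bar"
# appear in sequence. -n suppresses the default output.
joined=$(printf 'foo\nbar\nbaz\n' | sed -n -e '$!N' -e '/foo\nbar/p')

printf '%s\n' "$joined"
```

Here the regular expression really does see a newline, because sed has placed two input lines in one buffer; grep without -z never presents more than one line to the matcher.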




 


Information forwarded to bug-grep <at> gnu.org:
bug#71252; Package grep. (Thu, 30 May 2024 04:16:01 GMT)

Message #14 received at 71252 <at> debbugs.gnu.org:

From: Philippe Cerfon <philcerf <at> gmail.com>
To: gnu <at> schrader-schulte.de
Cc: 71252 <at> debbugs.gnu.org
Subject: Re: bug#71252: why does grep match literal newlines when there are
 none, even with -z?
Date: Thu, 30 May 2024 06:14:09 +0200
On Wed, May 29, 2024 at 8:09 AM Martin Schulte <gnu <at> schrader-schulte.de> wrote:

> PATTERNS is one or more patterns separated by newline characters, and grep prints each line that matches a pattern.

Dammit, it's so obvious. Sorry for the noise.

Well actually, there is still one curious point left:
If these are now two patterns, then it's the empty pattern twice, right?

I recently stumbled over this in another area. As far as I understand,
POSIX does not define what should happen with empty BREs/EREs (or with
empty subpatterns); if I remember correctly, they are simply not part
of the grammar it gives.

Now GNU grep says "The empty regular expression matches the empty string.".
Yet it still matches "foo" in the example above, which clearly is not empty.

So what exactly does "matches the empty string" mean? Only if the line
is completely empty (in which case I still wouldn't understand why foo
matches)? Or does it assume empty strings before/after each character?

Thanks,
Philippe




Information forwarded to bug-grep <at> gnu.org:
bug#71252; Package grep. (Sun, 09 Jun 2024 14:39:01 GMT)

Message #17 received at 71252 <at> debbugs.gnu.org:

From: Andreas Schwab <schwab <at> linux-m68k.org>
To: Philippe Cerfon <philcerf <at> gmail.com>
Cc: 71252 <at> debbugs.gnu.org, gnu <at> schrader-schulte.de
Subject: Re: bug#71252: why does grep match literal newlines when there are
 none, even with -z?
Date: Sun, 09 Jun 2024 16:37:59 +0200
On May 30 2024, Philippe Cerfon wrote:

> Now GNU grep says "The empty regular expression matches the empty string.".
> Yet it still matches "foo" in the example above, which clearly is not empty.

It matches the empty string *anywhere* in the text.  The match is not
anchored.

-- 
Andreas Schwab, schwab <at> linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."
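
The difference between an unanchored and an anchored empty match can be seen by counting selected lines. A minimal sketch, assuming a POSIX shell and a grep that accepts the empty pattern as GNU grep does; the three-line input and the variable names are made up for illustration.

```shell
# Input: "foo", an empty line, "bar".
# The empty pattern matches the empty string, and an empty string can be
# found at some position in every line, so every line is selected.
all_lines=$(printf 'foo\n\nbar\n' | grep -c '')

# Anchoring at both ends only matches lines that are empty as a whole.
empty_only=$(printf 'foo\n\nbar\n' | grep -c '^$')

printf 'unanchored: %s, anchored: %s\n' "$all_lines" "$empty_only"
```

The unanchored count is 3 and the anchored count is 1, which is why "foo" is selected by an empty pattern even though the line itself is not empty.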




Added tag(s) notabug. Request was from Paul Eggert <eggert <at> cs.ucla.edu> to control <at> debbugs.gnu.org. (Tue, 11 Jun 2024 16:26:02 GMT)

bug closed, send any further explanations to 71252 <at> debbugs.gnu.org and Philippe Cerfon <philcerf <at> gmail.com> Request was from Paul Eggert <eggert <at> cs.ucla.edu> to control <at> debbugs.gnu.org. (Tue, 11 Jun 2024 16:26:02 GMT)

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Wed, 10 Jul 2024 11:24:17 GMT)

This bug report was last modified 1 year and 58 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.