GNU bug report logs - #15773
grep-2.15 bug report


Package: grep;

Reported by: Mirraz Mirraz <mirraz1 <at> rambler.ru>

Date: Thu, 31 Oct 2013 18:09:01 UTC

Severity: normal

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 15773 in the body.
You can then email your comments to 15773 AT debbugs.gnu.org in the normal way.



Report forwarded to bug-grep <at> gnu.org:
bug#15773; Package grep. (Thu, 31 Oct 2013 18:09:02 GMT)

Acknowledgement sent to Mirraz Mirraz <mirraz1 <at> rambler.ru>:
New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Thu, 31 Oct 2013 18:09:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: "Mirraz Mirraz" <mirraz1 <at> rambler.ru>
To: bug-grep <at> gnu.org
Subject: grep-2.15 bug report
Date: Thu, 31 Oct 2013 21:46:55 +0400
After updating from 2.14 to 2.15, grep has started failing to match patterns that contain '\s*' or '\s\+'.
For example:

(grep-2.14)
$ echo '[ ]' | grep '\s*'
[ ]
$

(grep-2.15)
$ echo '[ ]' | grep '\s*'
$
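For anyone wanting to check their own grep binary, here is a minimal reproducer sketch. It assumes a POSIX shell and GNU grep (set GREP to point at a different binary); check_backslash_s is a hypothetical helper name. The en_US.UTF-8 run is the one that matters, since later in this thread the failure turns out to need a multibyte locale; if that locale is not installed, grep simply keeps C-locale behavior.

```sh
#!/bin/sh
# Reproducer sketch for bug#15773: '\s*' and '\s\+' should both match the
# line "[ ]", which contains a space. GREP defaults to the grep on PATH.
GREP=${GREP:-grep}

# check_backslash_s LOCALE: print one ok/FAILED line per pattern and
# return non-zero if any pattern fails to match.
check_backslash_s () {
  loc=$1
  rc=0
  for pat in '\s*' '\s\+'; do
    if printf '[ ]\n' | LC_ALL=$loc "$GREP" -q -e "$pat"; then
      printf '%s: %s ok\n' "$loc" "$pat"
    else
      printf '%s: %s FAILED\n' "$loc" "$pat"
      rc=1
    fi
  done
  return $rc
}

check_backslash_s C
check_backslash_s en_US.UTF-8
```

With grep 2.14 or a fixed release all four lines should read "ok"; an affected 2.15 binary would be expected to report FAILED in the multibyte locale.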




Information forwarded to bug-grep <at> gnu.org:
bug#15773; Package grep. (Thu, 31 Oct 2013 21:57:01 GMT)

Message #8 received at 15773 <at> debbugs.gnu.org:

From: Jim Meyering <jim <at> meyering.net>
To: Mirraz Mirraz <mirraz1 <at> rambler.ru>
Cc: 15773 <at> debbugs.gnu.org
Subject: Re: bug#15773: grep-2.15 bug report
Date: Thu, 31 Oct 2013 14:55:45 -0700
On Thu, Oct 31, 2013 at 10:46 AM, Mirraz Mirraz <mirraz1 <at> rambler.ru> wrote:
>
> After updating from 2.14 to 2.15 grep has started to fail to match patterns
> that contain '\s*' or '\s\+'
> For example:
>
> (grep-2.14)
> $ echo '[ ]' | grep '\s*'
> [ ]
> $
>
> (grep-2.15)
> $ echo '[ ]' | grep '\s*'
> $

Thank you for the report.
That is clearly a regression.  That is now the most compelling (of 3)
reasons to make a new release.




Information forwarded to bug-grep <at> gnu.org:
bug#15773; Package grep. (Fri, 01 Nov 2013 03:38:02 GMT)

Message #11 received at 15773 <at> debbugs.gnu.org:

From: Jim Meyering <jim <at> meyering.net>
To: Mirraz Mirraz <mirraz1 <at> rambler.ru>
Cc: 15773 <at> debbugs.gnu.org
Subject: Re: bug#15773: grep-2.15 bug report
Date: Thu, 31 Oct 2013 20:36:42 -0700
On Thu, Oct 31, 2013 at 2:55 PM, Jim Meyering <jim <at> meyering.net> wrote:
> On Thu, Oct 31, 2013 at 10:46 AM, Mirraz Mirraz <mirraz1 <at> rambler.ru> wrote:
>>
>> After updating from 2.14 to 2.15 grep has started to fail to match patterns
>> that contain '\s*' or '\s\+'
>> For example:
>>
>> (grep-2.14)
>> $ echo '[ ]' | grep '\s*'
>> [ ]
>> $
>>
>> (grep-2.15)
>> $ echo '[ ]' | grep '\s*'
>> $
>
> Thank you for the report.
> That is clearly a regression.  That is now the most compelling (of 3)
> reasons to make a new release.

Here's a preliminary patch.
I'm about to write the test suite additions to accompany it:
[k.txt (text/plain, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#15773; Package grep. (Fri, 01 Nov 2013 04:39:02 GMT)

Message #14 received at 15773 <at> debbugs.gnu.org:

From: Jim Meyering <jim <at> meyering.net>
To: Mirraz Mirraz <mirraz1 <at> rambler.ru>
Cc: 15773 <at> debbugs.gnu.org
Subject: Re: bug#15773: grep-2.15 bug report
Date: Thu, 31 Oct 2013 21:38:24 -0700
On Thu, Oct 31, 2013 at 8:36 PM, Jim Meyering <jim <at> meyering.net> wrote:
> On Thu, Oct 31, 2013 at 2:55 PM, Jim Meyering <jim <at> meyering.net> wrote:
>> On Thu, Oct 31, 2013 at 10:46 AM, Mirraz Mirraz <mirraz1 <at> rambler.ru> wrote:
>>>
>>> After updating from 2.14 to 2.15 grep has started to fail to match patterns
>>> that contain '\s*' or '\s\+'
>>> For example:
>>>
>>> (grep-2.14)
>>> $ echo '[ ]' | grep '\s*'
>>> [ ]
>>> $
>>>
>>> (grep-2.15)
>>> $ echo '[ ]' | grep '\s*'
>>> $
>>
>> Thank you for the report.
>> That is clearly a regression.  That is now the most compelling (of 3)
>> reasons to make a new release.
>
> Here's a preliminary patch.
> I'm about to write the test suite additions to accompany it:

And here's a proper patch, including NEWS and test suite additions:
[k.txt (text/plain, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#15773; Package grep. (Fri, 01 Nov 2013 07:54:01 GMT)

Message #17 received at 15773 <at> debbugs.gnu.org:

From: Aharon Robbins <arnold <at> skeeve.com>
To: mirraz1 <at> rambler.ru, jim <at> meyering.net
Cc: 15773 <at> debbugs.gnu.org
Subject: Re: bug#15773: grep-2.15 bug report
Date: Fri, 01 Nov 2013 09:53:01 +0200
Hello All.

> >>> After updating from 2.14 to 2.15 grep has started to fail to match patterns
> >>> that contain '\s*' or '\s\+'
>
> And here's a proper patch, including NEWS and test suite additions:

FWIW, I can't reproduce this in gawk (gawk-4.1-stable branch).

The program below correctly produces no output, with and without the fix
in dfa.c:lex. (I have added the fix anyway.)

Any ideas why?

Thanks,

Arnold
----------------------------------
BEGIN {
	pat["^\\s*$"] = pat["^\\s+$"] = pat["^\\s?$"] = pat["^\\s{1}$"] = 1
	for (i in pat) {
		if (" " !~ i) {
			printf("pattern \"%s\" failed!\n", i) > "/dev/stderr"
			exit 1
		}
	}
	exit 0
}
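The same check can be written against grep directly. This is a sketch assuming GNU grep; check_single_space is a hypothetical helper that takes the locale to use as its only argument. Because grep reads basic regular expressions, the GNU extensions '\+' and '\?' need their backslashes and the interval is spelled '\{1\}'.

```sh
#!/bin/sh
# grep counterpart of the gawk program: every anchored pattern must match a
# line consisting of a single space. $1 is the locale to run grep in.
check_single_space () {
  rc=0
  for pat in '^\s*$' '^\s\+$' '^\s\?$' '^\s\{1\}$'; do
    if ! printf ' \n' | LC_ALL=$1 grep -q -e "$pat"; then
      printf 'pattern "%s" failed!\n' "$pat" >&2
      rc=1
    fi
  done
  return $rc
}

check_single_space C
check_single_space en_US.UTF-8
```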




Information forwarded to bug-grep <at> gnu.org:
bug#15773; Package grep. (Fri, 01 Nov 2013 15:16:02 GMT)

Message #20 received at 15773 <at> debbugs.gnu.org:

From: Jim Meyering <jim <at> meyering.net>
To: Aharon Robbins <arnold <at> skeeve.com>
Cc: 15773 <at> debbugs.gnu.org, Mirraz Mirraz <mirraz1 <at> rambler.ru>
Subject: Re: bug#15773: grep-2.15 bug report
Date: Fri, 1 Nov 2013 08:15:08 -0700
On Fri, Nov 1, 2013 at 12:53 AM, Aharon Robbins <arnold <at> skeeve.com> wrote:
> Hello All.
>
>> >>> After updating from 2.14 to 2.15 grep has started to fail to match patterns
>> >>> that contain '\s*' or '\s\+'
>>
>> And here's a proper patch, including NEWS and test suite additions:
>
> FWIW, I can't reproduce this in gawk (gawk-4.1-stable branch).
>
> The program below correctly produces no output, with and without the fix
> in dfa.c:lex. (I have added the fix anyway.)
>
> Any ideas why?
>
> Thanks,
>
> Arnold
> ----------------------------------
> BEGIN {
>         pat["^\\s*$"] = pat["^\\s+$"] = pat["^\\s?$"] = pat["^\\s{1}$"] = 1
>         for (i in pat) {
>                 if (" " !~ i) {
>                         printf("pattern \"%s\" failed!\n", i) > "/dev/stderr"
>                         exit 1
>                 }
>         }
>         exit 0
> }

Thanks for the report.
With that, I realized that my new grep test case was inadequate:
it did not force the use of a multibyte locale, and thus did not fail
even without the fix.
I'm amending the patch (not yet pushed) with this:
[k.txt (text/plain, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#15773; Package grep. (Fri, 01 Nov 2013 15:56:02 GMT)

Message #23 received at 15773 <at> debbugs.gnu.org:

From: Stefano Lattarini <stefano.lattarini <at> gmail.com>
To: Jim Meyering <jim <at> meyering.net>, Aharon Robbins <arnold <at> skeeve.com>
Cc: 15773 <at> debbugs.gnu.org, Mirraz Mirraz <mirraz1 <at> rambler.ru>
Subject: Re: bug#15773: grep-2.15 bug report
Date: Fri, 01 Nov 2013 15:55:15 +0000
Hi Jim.

On 11/01/2013 03:15 PM, Jim Meyering wrote:
> On Fri, Nov 1, 2013 at 12:53 AM, Aharon Robbins <arnold <at> skeeve.com> wrote:
>> Hello All.
>>
>>>>>> After updating from 2.14 to 2.15 grep has started to fail to match patterns
>>>>>> that contain '\s*' or '\s\+'
>>>
>>> And here's a proper patch, including NEWS and test suite additions:
>>
>> FWIW, I can't reproduce this in gawk (gawk-4.1-stable branch).
>>
>> The program below correctly produces no output, with and without the fix
>> in dfa.c:lex. (I have added the fix anyway.)
>>
>> Any ideas why?
>>
>> Thanks,
>>
>> Arnold
>> ----------------------------------
>> BEGIN {
>>          pat["^\\s*$"] = pat["^\\s+$"] = pat["^\\s?$"] = pat["^\\s{1}$"] = 1
>>          for (i in pat) {
>>                  if (" " !~ i) {
>>                          printf("pattern \"%s\" failed!\n", i) > "/dev/stderr"
>>                          exit 1
>>                  }
>>          }
>>          exit 0
>> }
>
> Thanks for the report.
> With that, I realized that my new grep test case was inadequate:
> it did not force the use of a multibyte locale, and thus did not fail
> even without the fix.
>
This probably calls for a two patch series: the first introducing the test as
an XFAIL, the second fixing the bug without touching the tests, and verifying
that the test succeeds.

> I'm amending the patch (not yet pushed) with this:
>
> diff --git a/tests/backslash-s-and-repetition-operators b/tests/backslash-s-and-repetition-operators
> index 562646d..b1267f8 100755
> --- a/tests/backslash-s-and-repetition-operators
> +++ b/tests/backslash-s-and-repetition-operators
> @@ -9,6 +9,11 @@
>
>  . "${srcdir=.}/init.sh"; path_prepend_ ../src
>
> +require_en_utf8_locale_
> +
> +LC_ALL=en_US.UTF-8
> +export LC_ALL
> +
>  printf ' \n' > in || framework_failure_
>
>  fail=0
>
Maybe you could even amend the test to run with all of the default locale, the
en_US.UTF-8 locale, and the C locale.  Possibly overly paranoid, but the
enhancement would be trivial, so why not get the extra coverage anyway?
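A sketch of what that extra coverage could look like, written outside grep's test framework. utf8_locale_available and run_in_each_locale are hypothetical names, and the availability check assumes a `locale` utility that supports `locale charmap`; the real test relies on init.sh helpers such as require_en_utf8_locale_ from the patch above.

```sh
#!/bin/sh
# Run the same checks in the inherited ("default") locale, in en_US.UTF-8
# and in C, skipping en_US.UTF-8 when it is not installed.

# True if locale $1 exists and uses UTF-8.
utf8_locale_available () {
  command -v locale >/dev/null 2>&1 &&
    [ "$(LC_ALL=$1 locale charmap 2>/dev/null)" = UTF-8 ]
}

run_in_each_locale () {
  fail=0
  for loc in default en_US.UTF-8 C; do
    if [ "$loc" = default ]; then
      lc=${LC_ALL-}                 # keep whatever the caller had
    elif [ "$loc" = C ] || utf8_locale_available "$loc"; then
      lc=$loc
    else
      printf '%s: skipped, locale not installed\n' "$loc"
      continue
    fi
    for pat in '^\s*$' '^\s\+$'; do
      if ! printf ' \n' | LC_ALL=$lc grep -q -e "$pat"; then
        printf '%s: %s failed\n' "$loc" "$pat"
        fail=1
      fi
    done
  done
  return $fail
}

run_in_each_locale
```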

Thanks,
  Stefano




Information forwarded to bug-grep <at> gnu.org:
bug#15773; Package grep. (Sat, 02 Nov 2013 15:22:01 GMT)

Message #26 received at 15773 <at> debbugs.gnu.org:

From: Jim Meyering <jim <at> meyering.net>
To: Stefano Lattarini <stefano.lattarini <at> gmail.com>
Cc: Aharon Robbins <arnold <at> skeeve.com>, 15773 <at> debbugs.gnu.org,
 Mirraz Mirraz <mirraz1 <at> rambler.ru>
Subject: Re: bug#15773: grep-2.15 bug report
Date: Sat, 2 Nov 2013 08:20:51 -0700
On Fri, Nov 1, 2013 at 8:55 AM, Stefano Lattarini
<stefano.lattarini <at> gmail.com> wrote:
> This probably calls for a two patch series: the first introducing the test
> as an XFAIL, the second fixing the bug without touching the tests, and
> verifying that the test succeeds.

That seems like overkill, and unnecessary churn in git.  Usually, once I
have a complete (including test case) and committed-but-not-pushed patch,
I either arrange to run the test against the previous binary by replacing
src/grep with the grep from my path, or (probably better) temporarily
back out the fix, e.g., with "git log -1 -p src/dfa.c|patch -R -p1",
and ensure that "make check" fails.

> Maybe you could even amend the test to run with all of the default locale,
> the en_US.UTF-8 locale, and the C locale.  Possibly overly paranoid, but the
> enhancement would be trivial, so why not get the extra coverage anyway?

That seems worthwhile.
The default locale is set via tests/Makefile.am to LC_ALL=C, so I have
done this:
[k.txt (text/plain, attachment)]
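The "back out the fix and make sure the test fails" workflow, sketched as shell functions. This assumes it is run from the top of a git checkout (so the `patch -p1` paths line up) and that the fix is the most recent commit touching the given file; revert_last_change_to and test_catches_bug are hypothetical names.

```sh
#!/bin/sh
# Reverse, in the working tree only, the latest commit's changes to FILE.
# patch skips the commit header lines that git log prints before the diff.
revert_last_change_to () {
  git log -1 -p -- "$1" | patch -R -p1 >/dev/null
}

# Usage: test_catches_bug FILE CHECK-COMMAND...
# Succeeds only if CHECK-COMMAND fails while FILE's fix is backed out.
test_catches_bug () {
  fix_file=$1
  shift
  revert_last_change_to "$fix_file" || return 2
  if "$@" >/dev/null 2>&1; then
    passed_without_fix=yes
  else
    passed_without_fix=no
  fi
  git checkout -- "$fix_file"    # put the committed fix back
  [ "$passed_without_fix" = no ]
}

# In a grep checkout this would be, e.g.:
#   test_catches_bug src/dfa.c make check
```

Restoring with `git checkout` leaves the committed fix untouched, so no extra XFAIL commit is needed to show that the new test really exercises the bug.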

Information forwarded to bug-grep <at> gnu.org:
bug#15773; Package grep. (Sat, 02 Nov 2013 16:57:02 GMT)

Message #29 received at 15773 <at> debbugs.gnu.org:

From: Stefano Lattarini <stefano.lattarini <at> gmail.com>
To: Jim Meyering <jim <at> meyering.net>
Cc: Aharon Robbins <arnold <at> skeeve.com>, 15773 <at> debbugs.gnu.org,
 Mirraz Mirraz <mirraz1 <at> rambler.ru>
Subject: Re: bug#15773: grep-2.15 bug report
Date: Sat, 02 Nov 2013 16:56:28 +0000
On 11/02/2013 03:20 PM, Jim Meyering wrote:
> On Fri, Nov 1, 2013 at 8:55 AM, Stefano Lattarini
> <stefano.lattarini <at> gmail.com> wrote:
>> This probably calls for a two patch series: the first introducing the test
>> as an XFAIL, the second fixing the bug without touching the tests, and
>> verifying that the test succeeds.
>
> That seems like overkill, and unnecessary churn in git.  Usually, once I
> have a complete (including test case) and committed-but-not-pushed patch,
> I either arrange to run the test against the previous binary by replacing
> src/grep with the grep from my path, or (probably better) temporarily
> back out the fix, e.g., with "git log -1 -p src/dfa.c|patch -R -p1",
> and ensure that "make check" fails.
>
This nit I pointed out was admittedly minor, and in large part a matter
of personal preferences, so I have no problem with you disagreeing and
ignoring it.

>> Maybe you could even amend the test to run with all of the default locale,
>> the en_US.UTF-8 locale, and the C locale.  Possibly overly paranoid, but the
>> enhancement would be trivial, so why not get the extra coverage anyway?
>
> That seems worthwhile.
> The default locale is set via tests/Makefile.am to LC_ALL=C, so I have
> done this:
>
> [SNIP]
>
> Subject: [PATCH] maint.mk: fix "release" target to build _version
>
> [SNIP]

I think you attached the wrong patch ;-)

Regards,
  Stefano




Information forwarded to bug-grep <at> gnu.org:
bug#15773; Package grep. (Sat, 02 Nov 2013 17:36:02 GMT)

Message #32 received at 15773 <at> debbugs.gnu.org:

From: Aharon Robbins <arnold <at> skeeve.com>
To: jim <at> meyering.net
Cc: 15773 <at> debbugs.gnu.org, mirraz1 <at> rambler.ru
Subject: Re: bug#15773: grep-2.15 bug report
Date: Sat, 02 Nov 2013 19:35:50 +0200
Hi.

> > The program below correctly produces no output, with and without the fix
> > in dfa.c:lex. (I have added the fix anyway.)

Also with LC_ALL=en_US.utf8, without the fix the program still passes.

So, any ideas?

Thanks,

Arnold




Information forwarded to bug-grep <at> gnu.org:
bug#15773; Package grep. (Sat, 02 Nov 2013 18:34:01 GMT)

Message #35 received at 15773 <at> debbugs.gnu.org:

From: Jim Meyering <jim <at> meyering.net>
To: Stefano Lattarini <stefano.lattarini <at> gmail.com>
Cc: Aharon Robbins <arnold <at> skeeve.com>, 15773 <at> debbugs.gnu.org,
 Mirraz Mirraz <mirraz1 <at> rambler.ru>
Subject: Re: bug#15773: grep-2.15 bug report
Date: Sat, 2 Nov 2013 11:32:39 -0700
On Sat, Nov 2, 2013 at 9:56 AM, Stefano Lattarini
<stefano.lattarini <at> gmail.com> wrote:
> On 11/02/2013 03:20 PM, Jim Meyering wrote:
>> On Fri, Nov 1, 2013 at 8:55 AM, Stefano Lattarini
>> <stefano.lattarini <at> gmail.com> wrote:
>>> This probably calls for a two patch series: the first introducing the test
>>> as an XFAIL, the second fixing the bug without touching the tests, and
>>> verifying that the test succeeds.
>>
>> That seems like overkill, and unnecessary churn in git.  Usually, once I
>> have a complete (including test case) and committed-but-not-pushed patch,
>> I either arrange to run the test against the previous binary by replacing
>> src/grep with the grep from my path, or (probably better) temporarily
>> back out the fix, e.g., with "git log -1 -p src/dfa.c|patch -R -p1",
>> and ensure that "make check" fails.
>>
> This nit I pointed out was admittedly minor, and in large part a matter
> of personal preferences, so I have no problem with you disagreeing and
> ignoring it.
>
>>> Maybe you could even amend the test to run with all of the default locale,
>>> the en_US.UTF-8 locale, and the C locale.  Possibly overly paranoid, but the
>>> enhancement would be trivial, so why not get the extra coverage anyway?
>>
>> That seems worthwhile.
>> The default locale is set via tests/Makefile.am to LC_ALL=C, so I have
>> done this:
...
> I think you attached the wrong patch ;-)

Sigh, you're right.
Here's the intended one (along with a NEWS update):
[k.txt (text/plain, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#15773; Package grep. (Sat, 02 Nov 2013 18:42:02 GMT)

Message #38 received at 15773 <at> debbugs.gnu.org:

From: Jim Meyering <jim <at> meyering.net>
To: Aharon Robbins <arnold <at> skeeve.com>
Cc: 15773 <at> debbugs.gnu.org, Mirraz Mirraz <mirraz1 <at> rambler.ru>
Subject: Re: bug#15773: grep-2.15 bug report
Date: Sat, 2 Nov 2013 11:40:40 -0700
On Sat, Nov 2, 2013 at 10:35 AM, Aharon Robbins <arnold <at> skeeve.com> wrote:
> Hi.
>
>> > The program below correctly produces no output, with and without the fix
>> > in dfa.c:lex. (I have added the fix anyway.)
>
> Also with LC_ALL=en_US.utf8, without the fix the program still passes.
>
> So, any ideas?

Hi Arnold,
I don't recall how gawk uses dfa.c, so can't really guess.
Does the DFA matcher really accept those?  If so, maybe
gawk somehow manages to reset that dfa.c-internal variable
via some other code path?




Information forwarded to bug-grep <at> gnu.org:
bug#15773; Package grep. (Sat, 02 Nov 2013 18:44:01 GMT)

Message #41 received at 15773 <at> debbugs.gnu.org:

From: Aharon Robbins <arnold <at> skeeve.com>
To: jim <at> meyering.net
Cc: 15773 <at> debbugs.gnu.org, mirraz1 <at> rambler.ru
Subject: Re: bug#15773: grep-2.15 bug report
Date: Sat, 02 Nov 2013 20:43:47 +0200
Hi.

> > Hi.
> >
> >> > The program below correctly produces no output, with and without the fix
> >> > in dfa.c:lex. (I have added the fix anyway.)
> >
> > Also with LC_ALL=en_US.utf8, without the fix the program still passes.
> >
> > So, any ideas?
>
> Hi Arnold,
> I don't recall how gawk uses dfa.c, so can't really guess.
> Does the DFA matcher really accept those?  If so, maybe
> gawk somehow manages to reset that dfa.c-internal variable
> via some other code path?

I will look in a debugger.

It's entirely possible that gawk is falling back to regex when
dfa fails.  In which case I should see an internal difference before
and after the fix.

Thanks

Arnold




Information forwarded to bug-grep <at> gnu.org:
bug#15773; Package grep. (Sat, 02 Nov 2013 19:15:02 GMT)

Message #44 received at 15773 <at> debbugs.gnu.org:

From: Aharon Robbins <arnold <at> skeeve.com>
To: jim <at> meyering.net
Cc: 15773 <at> debbugs.gnu.org, mirraz1 <at> rambler.ru
Subject: Re: bug#15773: grep-2.15 bug report
Date: Sat, 02 Nov 2013 21:14:51 +0200
> > > Hi.
> > >
> > >> > The program below correctly produces no output, with and without the fix
> > >> > in dfa.c:lex. (I have added the fix anyway.)
> > >
> > > Also with LC_ALL=en_US.utf8, without the fix the program still passes.
> > >
> > > So, any ideas?
> >
> > Hi Arnold,
> > I don't recall how gawk uses dfa.c, so can't really guess.
> > Does the DFA matcher really accept those?  If so, maybe
> > gawk somehow manages to reset that dfa.c-internal variable
> > via some other code path?
>
> I will look in a debugger.
>
> It's entirely possible that gawk is falling back to regex when
> dfa fails.  In which case I should see an internal difference before
> and after the fix.

	Hoist by me own petard.
		-- Popeye

Indeed, this is what was happening. dfa would fail and then gawk
would fall back to regex, which would succeed. With the patch
dfa succeeds and regex is bypassed.
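A toy model of that masking effect: when the fast matcher gives up, the caller quietly retries with a slower one, so an end-to-end test cannot tell whether the fast path worked. fast_match and match_with_fallback are hypothetical stand-ins, not gawk or dfa.c functions, and "giving up" is modelled with grep's exit-status convention (0 match, 1 no match, 2 could not decide).

```sh
#!/bin/sh
# Fast path: pretend it cannot handle any backslash escape such as '\s'
# and reports status 2; otherwise it does a fixed-string match.
fast_match () {
  case $1 in
    *\\*) return 2 ;;
  esac
  printf '%s\n' "$2" | grep -q -F -e "$1"
}

# match_with_fallback PATTERN STRING: try the fast path first and fall back
# to a full regex match only when it gives up. Records the path in $used.
match_with_fallback () {
  fast_match "$1" "$2"
  rc=$?
  if [ "$rc" -ne 2 ]; then
    used=fast
    return "$rc"
  fi
  used=fallback
  printf '%s\n' "$2" | grep -q -e "$1"
}
```

The final answer is the same either way; only looking at which path ran, as in the debugger session above, shows the difference.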

But the test is worth having anyway.

Much thanks,

Arnold




Reply sent to Paul Eggert <eggert <at> cs.ucla.edu>:
You have taken responsibility. (Sat, 08 Mar 2014 18:10:02 GMT)

Notification sent to Mirraz Mirraz <mirraz1 <at> rambler.ru>:
bug acknowledged by developer. (Sat, 08 Mar 2014 18:10:03 GMT)

Message #49 received at 15773-done <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: 15773-done <at> debbugs.gnu.org
Subject: Re:  grep-2.15 bug report
Date: Sat, 08 Mar 2014 10:09:27 -0800
Closing this bug as it's fixed in 'grep' now.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sun, 06 Apr 2014 11:24:05 GMT)



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.