GNU bug report logs - #23361
【Bug】bug report of GNU grep


Package: grep;

Reported by: 谢敬锋 <xiejingf <at> 139.com>

Date: Sun, 24 Apr 2016 17:27:01 UTC

Severity: normal

Tags: notabug

Done: Eric Blake <eblake <at> redhat.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 23361 in the body.
You can then email your comments to 23361 AT debbugs.gnu.org in the normal way.



Report forwarded to bug-grep <at> gnu.org:
bug#23361; Package grep. (Sun, 24 Apr 2016 17:27:01 GMT)

Acknowledgement sent to 谢敬锋 <xiejingf <at> 139.com>:
New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Sun, 24 Apr 2016 17:27:01 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: 谢敬锋 <xiejingf <at> 139.com>
To: bug-grep <bug-grep <at> gnu.org>
Subject: 【Bug】bug report of GNU grep
Date: Sun, 24 Apr 2016 22:04:45 +0800 (CST)
[Message part 1 (text/plain, inline)]
Hi all,

Suppose the file content is as below:

abc.h

hello world




The outputs of grep "*.h" file and grep -E "*.h" file are different. From my understanding they should be the same, since '*' is a regular expression metacharacter; both should output abc.h.

Please help clarify this issue!







Added tag(s) notabug. Request was from Eric Blake <eblake <at> redhat.com> to control <at> debbugs.gnu.org. (Mon, 25 Apr 2016 15:30:02 GMT)

Reply sent to Eric Blake <eblake <at> redhat.com>:
You have taken responsibility. (Mon, 25 Apr 2016 15:30:03 GMT)

Notification sent to 谢敬锋 <xiejingf <at> 139.com>:
bug acknowledged by developer. (Mon, 25 Apr 2016 15:30:03 GMT)

Message #12 received at 23361-done <at> debbugs.gnu.org:

From: Eric Blake <eblake <at> redhat.com>
To: 谢敬锋 <xiejingf <at> 139.com>, 23361-done <at> debbugs.gnu.org
Subject: Re: bug#23361: 【Bug】bug report of GNU grep
Date: Mon, 25 Apr 2016 09:29:01 -0600
[Message part 1 (text/plain, inline)]
tag 23361 notabug
thanks

On 04/24/2016 08:04 AM, 谢敬锋 wrote:
> 
> Hi all,
> 
> Suppose the file content is as below:
> 
> abc.h
> 
> hello world
> 
> 
> 
> 
> the output of grep "*.h" file and grep -E "*.h" file are different,

Correct, and this is not a bug.

POSIX defines two different flavors of regular expressions: basic (when
you use 'grep' without -E) and extended ('grep -E'):

http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html

> '*' is a regular expression meta-character.

But when it appears as the first character of a regular expression, it
has a different meaning in each flavor.  Read what POSIX says:

For a BRE, in 9.3.3:
"
*
    The <asterisk> shall be special except when used:

        In a bracket expression

        As the first character of an entire BRE (after an initial '^',
if any)
"

which means that as written,
grep "*.h" file

is looking for a LITERAL star character followed by the '.'
metacharacter for any character followed by a literal 'h'.  Your example
file did not contain that pattern.
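
A minimal sketch of the BRE case, assuming a POSIX-style grep. The
sample file holds the reporter's two lines plus two hypothetical lines
that contain a literal star; the variable names are illustrative only.

```shell
# Sample file: the two lines from the report, plus two hypothetical
# lines that contain a literal '*'.
sample=$(mktemp)
printf '%s\n' 'abc.h' 'hello world' '*.h' 'a*bh' > "$sample"

# BRE: the leading '*' is an ordinary character, '.' matches any single
# character, and 'h' is literal. Only lines containing a real star
# followed by one character and then 'h' are selected.
bre_out=$(grep '*.h' "$sample")
printf '%s\n' "$bre_out"

rm -f "$sample"
```

Only the two added lines are printed; "abc.h" is not, because it
contains no literal star.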

For an ERE, in 9.4.3:

"
*+?{
    The <asterisk>, <plus-sign>, <question-mark>, and <left-brace> shall
be special except when used in a bracket expression (see RE Bracket
Expression). Any of the following uses produce undefined results:

        If these characters appear first in an ERE, or immediately
following a <vertical-line>, <circumflex>, or <left-parenthesis>
"

which means you have undefined results according to POSIX, and therefore
we can make it mean whatever we want, including ignoring the invalid
"*", and searching for the regular expression ".h" instead.  Which
explains why:

grep -E "*.h" file

has a match, and adding --color shows that the matching portion is the
".h" portion of the "abc.h" line.
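
A rough way to see this, assuming a grep that supports the -o option
(a common extension that prints only the matched text). The first
command uses the well-defined ERE ".h"; the second uses the leading
star, whose result POSIX leaves undefined, so its output is only
printed and may differ or be an error on other implementations.

```shell
sample=$(mktemp)
printf '%s\n' 'abc.h' 'hello world' > "$sample"

# Well-defined ERE: any single character followed by 'h'.
# -o prints only the part of each line that matched.
ere_out=$(grep -E -o '.h' "$sample")
printf 'ERE .h matched: %s\n' "$ere_out"

# Undefined by POSIX: an ERE that starts with '*'. The reply above
# reports that GNU grep ignores the star; other greps may reject it,
# so the result (or error text) is captured without being relied on.
undef_out=$(grep -E -o '*.h' "$sample" 2>&1 || true)
printf 'ERE *.h gave: %s\n' "$undef_out"

rm -f "$sample"
```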

> 
> Please help clarify this issue!

Maybe you are confusing globs (where "*.h" matches "abc.h" because the
'.' is a literal character, and the "*" means "any string of zero or
more characters") with regular expressions (where "." means "any
character", and "*" means "zero or more repetitions of the previous
regex construct, unless there is no previous regex construct, in which
case it is well-defined for BRE but undefined for ERE").
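
As a sketch of the difference, the glob "*.h" can be spelled as the
regular expression '^.*\.h$', which is well-defined in a BRE. The
sample lines beyond the reporter's two are hypothetical, and the glob
side uses the shell's own case patterns rather than filename expansion.

```shell
sample=$(mktemp)
printf '%s\n' 'abc.h' 'hello world' '.h' 'abcxh' > "$sample"

# Regex spelling of the glob "*.h": any characters, then a literal
# dot (escaped), then 'h' at the end of the line.
regex_out=$(grep '^.*\.h$' "$sample")

# Glob matching on the same lines, using shell case patterns where
# '*' matches any string and '.' is an ordinary character.
glob_out=$(while IFS= read -r line; do
  case $line in
    *.h) printf '%s\n' "$line" ;;
  esac
done < "$sample")

printf 'regex: %s\n' $regex_out
printf 'glob:  %s\n' $glob_out

rm -f "$sample"
```

Both approaches select "abc.h" and ".h" and skip "abcxh", since the dot
is literal in the glob and escaped in the regular expression.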

At any rate, this is not a bug in grep, so I'm closing the bug report.
But feel free to add further comments or questions on this thread.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Tue, 24 May 2016 11:24:03 GMT)

This bug report was last modified 9 years and 30 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.