GNU bug report logs - #15410
[:alnum:] is not [:alpha:] AND [:digit:]... [:alnum:] is [:alpha:] OR [:digit:]
Reported by: Nick Aganan <thesysad <at> gmail.com>
Date: Wed, 18 Sep 2013 16:13:02 UTC
Severity: normal
Tags: notabug
Done: Eric Blake <eblake <at> redhat.com>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 15410 in the body.
You can then email your comments to 15410 AT debbugs.gnu.org in the normal way.
Report forwarded to bug-grep <at> gnu.org: bug#15410; Package grep. (Wed, 18 Sep 2013 16:13:02 GMT)
Acknowledgement sent to Nick Aganan <thesysad <at> gmail.com>: New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Wed, 18 Sep 2013 16:13:03 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
[:alnum:] is defined as
Alphanumeric characters: ‘[:alpha:]’ *and* ‘[:digit:]’; in the ‘C’ locale
and ASCII character encoding, this is the same as ‘[0-9A-Za-z]’.
AND = must satisfy *BOTH* alpha and digit
OR = must satisfy *EITHER* alpha or digit
It looks like ‘[:alpha:]’ *AND* ‘[:digit:]’ functions as ‘[:alpha:]’ *OR* ‘[:digit:]’. See the example.
Example:
# cat /tmp/c
adc
x1y1z123
456
# grep [[:alpha:]] /tmp/c
adc
x1y1z123
# grep [[:digit:]] /tmp/c
x1y1z123
456
# grep [[:alnum:]] /tmp/c
adc
x1y1z123
456
### if [:alnum:] functions as ‘[:alpha:]’ *AND* ‘[:digit:]’, it should show
x1y1z123 only
Added tag(s) notabug. Request was from Eric Blake <eblake <at> redhat.com> to control <at> debbugs.gnu.org. (Wed, 18 Sep 2013 18:32:02 GMT)
Reply sent to Eric Blake <eblake <at> redhat.com>: You have taken responsibility. (Wed, 18 Sep 2013 18:32:03 GMT)
Notification sent to Nick Aganan <thesysad <at> gmail.com>: bug acknowledged by developer. (Wed, 18 Sep 2013 18:32:03 GMT)
Message #12 received at 15410-done <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
tag 15410 notabug
thanks
On 09/18/2013 10:07 AM, Nick Aganan wrote:
> [:alnum:] is defined as
>
> Alphanumeric characters: ‘[:alpha:]’ *and* ‘[:digit:]’; in the ‘C’ locale
> and ASCII character encoding, this is the same as ‘[0-9A-Za-z]’.
This sense of "and" correctly means the combination, where characters
from either class satisfy the regex. Writing '[[:alnum:]]' is the same
as writing '[[:alpha:][:digit:]]'.
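A quick way to check this equivalence is to run both spellings over the report's sample input with the patterns properly quoted. This is a minimal POSIX sh sketch, assuming a POSIX-conforming grep and the C locale (`LC_ALL=C`); the input is piped in rather than read from /tmp/c, and the variable names are illustrative only.

```shell
#!/bin/sh
# C locale, where [:alnum:] is exactly [0-9A-Za-z].
export LC_ALL=C

# The sample file from the report, fed on stdin instead of /tmp/c.
input='adc
x1y1z123
456'

# Single quotes hand the bracket expressions to grep untouched.
alnum=$(printf '%s\n' "$input" | grep '[[:alnum:]]')
combined=$(printf '%s\n' "$input" | grep '[[:alpha:][:digit:]]')

printf '%s\n' "$alnum"    # all three lines: adc, x1y1z123, 456
if [ "$alnum" = "$combined" ]; then
    echo "[[:alnum:]] and [[:alpha:][:digit:]] selected the same lines"
fi
```

Both patterns select every line, because each line contains at least one letter or digit.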
> Example:
>
> # cat /tmp/c
>
> adc
>
> x1y1z123
>
> 456
[Your mailer is rather unconventional, and sticks lots of useless
whitespace into your content]
>
>
>
> # grep [[:alpha:]] /tmp/c
>
> adc
>
> x1y1z123
Whoops - you didn't quote your shell argument. I suspect you have some
single-character file names in your current directory (further bolstered
by the fact that you named your file /tmp/c, although it is not obvious
whether your current working directory is /tmp or elsewhere).
Therefore, you are falling victim to shell globbing.
Remember, if a file named 'a' exists in the current directory, then
unquoted [] expressions perform globs that might be replaced by that
file name:
$ touch a
$ echo '[[:alpha:]]' [[:alpha:]]
[[:alpha:]] a
You are NOT grepping for the char class "[[:alpha:]]", but for the
entirely different regex that matches the unfortunate file name
expansion of your glob. Use shell quotes properly, and you will then
see the desired answers. Or prepend 'echo' to your grep command to see
what arguments were actually being handed to grep.
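The globbing trap can be reproduced in a scratch directory. This sketch assumes a `mktemp` that supports `-d` (GNU coreutils and the BSDs do) and a POSIX sh; the file names `a` and `sample.txt` are hypothetical stand-ins for whatever happens to be in the reporter's working directory.

```shell
#!/bin/sh
export LC_ALL=C          # C locale: [:alpha:] is exactly [A-Za-z]

# Scratch directory, so the glob can only see files created here.
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1

touch a                  # a single-character name the unquoted glob can match
printf 'adc\nx1y1z123\n456\n' > sample.txt

# Prepending echo shows the arguments grep would really get: "grep a sample.txt".
echo grep [[:alpha:]] sample.txt

# Unquoted: the shell replaces [[:alpha:]] with "a", so grep searches for "a".
unquoted=$(grep [[:alpha:]] sample.txt)
# Quoted: grep receives the bracket expression itself.
quoted=$(grep '[[:alpha:]]' sample.txt)

printf 'unquoted:\n%s\n\nquoted:\n%s\n' "$unquoted" "$quoted"

cd / && rm -rf "$dir"
```

With the file `a` present, the unquoted command is really `grep a sample.txt`, so it finds only "adc"; the quoted one finds every line containing a letter.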
Given that the problem is in your lack of shell quoting, and not in
grep, I'm closing this as not a bug. However, feel free to respond if
you have more comments.
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
Information forwarded to bug-grep <at> gnu.org: bug#15410; Package grep. (Wed, 18 Sep 2013 18:40:02 GMT)
Message #15 received at 15410-done <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
On 09/18/2013 12:31 PM, Eric Blake wrote:
>> [:alnum:] is defined as
>>
>> Alphanumeric characters: ‘[:alpha:]’ *and* ‘[:digit:]’; in the ‘C’ locale
>> and ASCII character encoding, this is the same as ‘[0-9A-Za-z]’.
>
> This sense of "and" correctly means the combination, where characters
> from either class satisfy the regex. Writing '[[:alnum:]]' is the same
> as writing '[[:alpha:][:digit:]]'
>
>
> Given that the problem is in your lack of shell quoting, and not in
> grep, I'm closing this as not a bug. However, feel free to respond if
> you have more comments.
>
Re-reading what I just wrote, I think I'd better add more, because it
may not just be a problem with shell globbing, but also a
misunderstanding on your part:
>>
>> ### if [:alnum:] functions as ‘[:alpha:]’ *AND* ‘[:digit:]’, it should show
>> x1y1z123 only
In your sample, you specified a regex that matches exactly one byte. It
matches all three lines because "a" (in the "adc" line) fits the alnum
category, "x" (in the "x1y1z123" line) fits the alnum category, and "4"
(in the "456" line) fits the alnum category. Again, it is NOT a regex
that specifies a multi-byte match, where the match has to include at
least one alpha byte and one digit byte. It is a regex that specifies a
set of possible matching bytes; that set includes both alpha and digit
bytes, but each match consumes only one byte.
In just the same way, you can say that the regex "[ab]" matches both "a"
and "b", or you can state that you will have a match if either "a" or
"b" is encountered; it's all a matter of which conjunction feels most
natural in the context where you are describing the matching.
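To make the one-byte point concrete, the sketch below contrasts a single bracket expression with a requirement that a line contain both a letter and a digit. It assumes the C locale and uses `grep -o`, which is a GNU and BSD extension rather than POSIX; chaining two greps is just one illustrative way to express the AND the reporter expected, not something proposed in the thread.

```shell
#!/bin/sh
export LC_ALL=C    # C locale: [:alnum:] is [0-9A-Za-z]

input='adc
x1y1z123
456'

# One bracket expression matches one character, so any line with at
# least one alphanumeric character is selected: all three lines here.
any=$(printf '%s\n' "$input" | grep '[[:alnum:]]')

# grep -o (GNU/BSD extension) prints each match on its own line; even on
# the mixed line, every match is a single character.
pieces=$(printf 'x1y1z123\n' | grep -o '[[:alnum:]]')

# "At least one letter AND at least one digit on the same line" needs two
# separate tests, for example two greps in a pipeline.
both=$(printf '%s\n' "$input" | grep '[[:alpha:]]' | grep '[[:digit:]]')

printf 'any:\n%s\n\npieces:\n%s\n\nboth:\n%s\n' "$any" "$pieces" "$both"
```

Only the pipeline yields "x1y1z123" alone, which is the result the reporter expected from `[[:alnum:]]`.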
Information forwarded to bug-grep <at> gnu.org: bug#15410; Package grep. (Wed, 18 Sep 2013 20:15:02 GMT)
Message #18 received at 15410 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
On 09/18/2013 01:56 PM, Nix wrote:
Please keep the list in the loop:
https://rwmj.wordpress.com/2010/11/08/want-help-dont-email-me-directly/
> The documentation is wrong then...
> When you say "AND", that means you need to satisfy "ALL" entities before it
> exits successfully.
> Using OR fits your description. OR satisfies "ALL" or "EITHER" to
> have a successful exit.
>
> Bottom line, it is about the logic of using AND or OR.
It's about WHERE you are using the logic of AND or OR. Let's try again.
If I write:
[@[:alnum:]!]
then I have a bracket expression containing three elements: [:alnum:],
@, and !, which is the SAME as if I had written the bracket expression
[@[:alpha:][:digit:]!]
or even
[[:digit:]!@[:alpha:]]
Either way, I'm using [:alnum:] as shorthand instead of including both
[:alpha:] AND [:digit:]. The resulting bracket expression then matches
one byte that can come from a set of characters: @, !, alpha OR digit.
It is the outer bracket expression that is doing OR matching, while the
inner [:alnum:] element is an AND-style shorthand that combines other
possible bracket expression elements.
Similarly, [@!] is a bracket expression containing @ AND ! as expression
elements, where the overall expression will then match @ OR !. But as @
matches only one possible character, rather than being shorthand for a
bunch of characters, you don't get as confused by that wording.
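These bracket-expression examples can be checked the same way. This sketch, again assuming POSIX sh and the C locale, runs the three spellings above plus `[@!]` over a made-up list of one-token lines; each spelling should select exactly the lines that contain at least one character from its set.

```shell
#!/bin/sh
export LC_ALL=C

# Hypothetical input: one token per line.
input='@
!
x
7
#
--'

# Three spellings of the same bracket expression.
with_alnum=$(printf '%s\n' "$input" | grep '[@[:alnum:]!]')
spelled_out=$(printf '%s\n' "$input" | grep '[@[:alpha:][:digit:]!]')
reordered=$(printf '%s\n' "$input" | grep '[[:digit:]!@[:alpha:]]')

# A bracket expression listing only @ and !: matches @ OR !.
at_or_bang=$(printf '%s\n' "$input" | grep '[@!]')

printf '%s\n' "$with_alnum"    # @, !, x and 7; never # or --
printf '%s\n' "$at_or_bang"    # @ and !
```

The three spellings select identical lines, and element order inside the brackets makes no difference.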
If you want to propose a documentation patch to make it clearer for the
next reader, then by all means do so.
bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 17 Oct 2013 11:24:04 GMT)
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.