GNU bug report logs - #5970: regex won't do lazy matching
Reported by: a g <mewalig <at> gmail.com>
Date: Mon, 19 Apr 2010 02:02:01 UTC
Severity: normal
Done: Bob Proulx <bob <at> proulx.com>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 5970 in the body.
You can then email your comments to 5970 AT debbugs.gnu.org in the normal way.
Report forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org: bug#5970; Package coreutils. (Mon, 19 Apr 2010 02:02:01 GMT)
Acknowledgement sent to a g <mewalig <at> gmail.com>: New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Mon, 19 Apr 2010 02:02:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
This may be a usage problem, but it does not exist with other regex packages
(such as slre) and I can't find anything in the documentation to indicate
that the syntax should be different for coreutils. I am using coreutils 8.4
on ubuntu AMD64, version 9.10. I cannot get the coreutils regex matcher to
do lazy matching. Here is my code:
/**** regex_test.cpp ***/
#include <stdio.h>
#include <stdlib.h>
#include "xalloc.h"
#include "regex.h"

// compile:
//   gcc -I coreutils-8.4/lib/ -c regex_test.cpp
//   g++ -o regex_test regex_test.o coreutils-8.4/lib/xmalloc.o
//       coreutils-8.4/lib/xalloc-die.o coreutils-8.4/lib/exitfail.o
//       coreutils-8.4/lib/regex.o

// Print the message that regerror() produces for errcode.
void print_regerror (int errcode, regex_t *compiled)
{
    // With a NULL buffer, regerror() returns the size the message needs.
    size_t length = regerror (errcode, compiled, NULL, 0);
    char *buffer = (char *)xmalloc (length);
    if(!buffer) printf("error: regerror malloc failed!\n");
    else {
        (void) regerror (errcode, compiled, buffer, length);
        printf("error: %s\n", buffer);
        free(buffer);
    }
}

int main(int argc, char *argv[]){
    if(argc < 3) printf("usage: regex_test pattern string\n");
    else {
        regex_t rx;
        int err;
        if((err = regcomp(&rx, argv[1], REG_EXTENDED)))
            print_regerror(err, &rx);
        else {
            regmatch_t matches[4];
            if(!regexec(&rx, argv[2], 4, matches, 0)) {
                int i;
                printf("match! \n");
                // Print the whole match and each subexpression that took part.
                for(i = 0; i < 4; i++) {
                    if(matches[i].rm_so != -1)
                        // regoff_t may be wider than int, so cast for %i.
                        printf(" s:%i, e:%i", (int)matches[i].rm_so, (int)matches[i].rm_eo);
                }
                printf("\n");
            } else printf("match failed.\n");
            regfree(&rx);
        }
    }
    return 0;
}
/********/
Here is the problem. If you execute:
regex_test "a[^x]*?a" "a1a2a"
then you get:
match!
s:0, e:5
But you should get the same result as when you execute
regex_test "a[^x]*?a" "a1a", that is:
match!
s:0, e:3
Please advise. Thank you!
Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org: bug#5970; Package coreutils. (Mon, 19 Apr 2010 21:37:01 GMT)
Message #8 received at 5970 <at> debbugs.gnu.org (full text, mbox):
On 04/18/2010 07:33 PM, a g wrote:
> This may be a usage problem, but it does not exist with other regex packages
> (such as slre) and I can't find anything in the documentation to indicate
> that the syntax should be different for coreutils. I am using coreutils 8.4
> on ubuntu AMD64, version 9.10. I cannot get the coreutils regex matcher to
> do lazy matching.
Thanks for the report. However, coreutils does not maintain regex code.
Rather, it uses an upstream version from gnulib, which in turn borrows
from glibc. Perhaps the best course of action would be to try your test
app compiled against glibc, and if that still doesn't meet your needs,
then open a bug report against glibc. And if glibc works, then open a
bug report against gnulib noting that gnulib and glibc disagree.
--
Eric Blake eblake <at> redhat.com +1-801-349-2682
Libvirt virtualization library http://libvirt.org
Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org: bug#5970; Package coreutils. (Mon, 19 Apr 2010 22:02:02 GMT)
Message #11 received at 5970 <at> debbugs.gnu.org (full text, mbox):
a g writes:
>
> This may be a usage problem, but it does not exist with other regex packages
> (such as slre) and I can't find anything in the documentation to indicate
> that the syntax should be different for coreutils. I am using coreutils 8.4
> on ubuntu AMD64, version 9.10. I cannot get the coreutils regex matcher to
> do lazy matching. Here is my code:
By "lazy" do you mean non-greedy?
> Here is the problem. If you execute:
> regex_test "a[^x]*?a" "a1a2a"
The non-greedy quantifiers like *? are not part of standard regex; they are
extensions found in perl and in other packages inspired by perl.
--
Alan Curry
Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org: bug#5970; Package coreutils. (Mon, 19 Apr 2010 22:29:01 GMT)
Message #14 received at 5970 <at> debbugs.gnu.org (full text, mbox):
a g <mewalig <at> gmail.com> writes:
> regex_test "a[^x]*?a" "a1a2a"
<http://www.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_04_06>:
9.4.6 EREs Matching Multiple Characters
[...]
The behavior of multiple adjacent duplication symbols ('+', '*', '?',
and intervals) produces undefined results.
Andreas.
--
Andreas Schwab, schwab <at> linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org: bug#5970; Package coreutils. (Sat, 24 Apr 2010 20:35:02 GMT)
Message #17 received at 5970 <at> debbugs.gnu.org (full text, mbox):
tags 5970 + feedback
thanks
a g wrote:
> This may be a usage problem, but it does not exist with other regex packages
> (such as slre) and I can't find anything in the documentation to indicate
> that the syntax should be different for coreutils. I am using coreutils 8.4
> on ubuntu AMD64, version 9.10. I cannot get the coreutils regex matcher to
> do lazy matching. Here is my code:
I read this and was somewhat confused by it. Could you clarify and
educate me as to your use? Coreutils is not a "regex package" in the
same way as pcre or slre. It does use regular expressions such as in
'expr'. So of course I wondered if you were using a command like
'expr' or were trying to extend coreutils with an additional command.
Also, since the bug-coreutils mailing list is attached to a bug
tracking system, every message thread of discussion opens an issue
ticket in the bug tracker. I believe this issue has been resolved
satisfactorily by the subsequent responses. Do you agree?
Thanks,
Bob
Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org: bug#5970; Package coreutils. (Mon, 26 Apr 2010 01:57:01 GMT)
Message #20 received at 5970 <at> debbugs.gnu.org (full text, mbox):
thanks for everyone's help. I agree this can be closed, but not for the
reasons mentioned (though I appreciate them and they gave me the info I
needed to find the answer:
osdir.com/ml/lib.gnulib.bugs/2005-04/msg00027.html). Off to try emacs'
regex.c instead.
It would be nice if coreutils could offer the emacs regex.c (or some other
regex that supported non-greedy matching) at least as an additional library
if not the standard one... but, not my call...
On Sat, Apr 24, 2010 at 4:34 PM, Bob Proulx <bob <at> proulx.com> wrote:
> tags 5970 + feedback
> thanks
>
> a g wrote:
> > This may be a usage problem, but it does not exist with other regex packages
> > (such as slre) and I can't find anything in the documentation to indicate
> > that the syntax should be different for coreutils. I am using coreutils 8.4
> > on ubuntu AMD64, version 9.10. I cannot get the coreutils regex matcher to
> > do lazy matching. Here is my code:
>
> I read this and was somewhat confused by it. Could you clarify and
> educate me as to your use? Coreutils is not a "regex package" in the
> same way as pcre or slre. It does use regular expressions such as in
> 'expr'. So of course I wondered if you were using a command like
> 'expr' or were trying to extend coreutils with an additional command.
>
> Also, since the bug-coreutils mailing list is attached to a bug
> tracking system every message thread of discussion opens an issue
> ticket in the bug tracker. I believe this issue has been resolved
> satisfactorily by the subsequent responses. Do you agree?
>
> Thanks,
> Bob
>
Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org: bug#5970; Package coreutils. (Mon, 26 Apr 2010 09:37:01 GMT)
Message #23 received at 5970 <at> debbugs.gnu.org (full text, mbox):
a g wrote:
> thanks for everyone's help. I agree this can be closed, but not for the
> reasons mentioned (though I appreciate them and they gave me the info I
> needed to find the answer:
> osdir.com/ml/lib.gnulib.bugs/2005-04/msg00027.html). Off to try emacs'
> regex.c instead.
>
> It would be nice if coreutils could offer the emacs regex.c (or some other
> regex that supported non-greedy matching) at least as an additional library
> if not the standard one... but, not my call...
BTW, why do you care what regex code coreutils uses?
Because of expr? Its use of regexp is tightly specified by POSIX,
so we cannot change it without a very good reason.
There are a few other coreutils programs that use regex.c functions,
but they are not used as frequently.
Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org: bug#5970; Package coreutils. (Mon, 26 Apr 2010 10:19:02 GMT)
Message #26 received at 5970 <at> debbugs.gnu.org (full text, mbox):
Jim Meyering <jim <at> meyering.net> writes:
> Because of expr? Its use of regexp is tightly specified by POSIX,
> so we cannot change it without a very good reason.
In the OP's use of regexp POSIX defines nothing.
Andreas.
--
Andreas Schwab, schwab <at> linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
Reply sent to Bob Proulx <bob <at> proulx.com>: You have taken responsibility. (Tue, 27 Apr 2010 22:31:01 GMT)
Notification sent to a g <mewalig <at> gmail.com>: bug acknowledged by developer. (Tue, 27 Apr 2010 22:31:02 GMT)
Message #31 received at 5970-done <at> debbugs.gnu.org (full text, mbox):
a g wrote:
> thanks for everyone's help. I agree this can be closed, but not for the
> reasons mentioned (though I appreciate them and they gave me the info I
> needed to find the answer:
> osdir.com/ml/lib.gnulib.bugs/2005-04/msg00027.html). Off to try emacs'
> regex.c instead.
Okay. I will close the bug.
> It would be nice if coreutils could offer the emacs regex.c (or some other
> regex that supported non-greedy matching) at least as an additional library
> if not the standard one... but, not my call...
Though you did not answer my question as to your use of the regex
engine in coreutils, I assume by this that you are trying to use it as
some type of general-purpose library. That really isn't its intended
purpose. In coreutils it is there to support the regular expression
matching done in 'expr'. For a general-purpose library it would
be better if you were to use gnulib or pcre or one of the other
libraries that is intended to be used as a library.
Bob
Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org: bug#5970; Package coreutils. (Mon, 03 May 2010 06:28:02 GMT)
Message #34 received at 5970 <at> debbugs.gnu.org (full text, mbox):
Andreas Schwab wrote:
> a g <mewalig <at> gmail.com> writes:
>
>> regex_test "a[^x]*?a" "a1a2a"
>
> <http://www.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_04_06>:
>
> 9.4.6 EREs Matching Multiple Characters
> [...]
> The behavior of multiple adjacent duplication symbols ('+', '*', '?',
> and intervals) produces undefined results.
>
> Andreas.
---
Sorry, late to the conversation, but reading email.
There was a time in QA when "undefined results" and "bug" were synonymous.
That a spec would say that is lame.
Personally, I think if it isn't compatible with the Perl Regex, it's
a bug, but that's purely informed personal bias. :-)
-l
bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Mon, 31 May 2010 11:24:03 GMT)