GNU bug report logs - #20657
Traditional range expression not accepted in regex/dfa
Reported by: arnold <at> skeeve.com
Date: Tue, 26 May 2015 02:43:02 UTC
Severity: wishlist
Done: Paul Eggert <eggert <at> cs.ucla.edu>
Bug is archived. No further changes may be made.
[Message part 1 (text/plain, inline)]
Your message dated Thu, 21 Apr 2022 19:08:55 -0700
with message-id <89b7650f-bb7a-04d8-128c-e9d4977ed566 <at> cs.ucla.edu>
and subject line Re: Accepting [xyz---abc] - three minus signs to mean one
has caused the debbugs.gnu.org bug report #20657,
regarding Traditional range expression not accepted in regex/dfa
to be marked as done.
(If you believe you have received this mail in error, please contact
help-debbugs <at> gnu.org.)
--
20657: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=20657
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
Hi.
I received a bug report for gawk by private email that a regexp of
this form: '[^0-9---]' wasn't accepted. The bugaboo here is the "---"; it's
a range expression consisting of minus through minus, and apparently long
ago was how one got a minus into a bracket expression.
This can be seen in current grep also:
$ ./src/grep --version
./src/grep (GNU grep) 2.21
Copyright (C) 2014 Free Software Foundation, Inc.
...
$ ./src/grep '[^0-9---]' /dev/null
./src/grep: Invalid range end
The underlying regex and, I believe, dfa routines don't accept this.
Fixing either of them is beyond my skill range, so I thought I'd
pass this one upstream to you folks.
Thanks!
Arnold
[Message part 3 (message/rfc822, inline)]
[Message part 4 (text/plain, inline)]
On 4/21/22 00:57, Arnold Robbins wrote:
> As far as my testing indicates, dfa.c doesn't need a patch, it seems
> to accept "---" inside brackets for a single minus.
Yes, a brief perusal of the dfa.c source code suggests you're right.
Thanks for looking into this. I tend to agree with you that POSIX is not
likely to outlaw this extension.
> If there are no objections, can we get this into Gnulib?
Although the basic idea looks good, I see a few places where the patch
can be improved.
* The two calls to re_string_peek_byte might go past the end of the
pattern (a subscript violation). This is possible because the pattern is
not necessarily null-terminated.
* The two calls to re_string_fetch_byte can be simplified into a single
call to re_string_skip_bytes.
* No need to assign to token->opr.c, as it already has the correct value.
* Can fall through to the default case to save a bit of duplicate code.
* glibc still uses comments /* like this */ for style reasons, and we
should stick to that.
I wrote a patch with these improvements in mind and installed it into
Gnulib (see attached); hope it works for Gawk too.
[0001-regex-match-.-.-like-V7-grep.patch (text/x-patch, attachment)]
This bug report was last modified 3 years and 33 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.