GNU bug report logs - #20657
Traditional range expression not accepted in regex/dfa

Package: grep;

Reported by: arnold <at> skeeve.com

Date: Tue, 26 May 2015 02:43:02 UTC

Severity: wishlist

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

From: help-debbugs <at> gnu.org (GNU bug Tracking System)
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: tracker <at> debbugs.gnu.org
Subject: bug#20657: closed (Traditional range expression not accepted in
 regex/dfa)
Date: Fri, 22 Apr 2022 02:10:02 +0000
[Message part 1 (text/plain, inline)]
Your message dated Thu, 21 Apr 2022 19:08:55 -0700
with message-id <89b7650f-bb7a-04d8-128c-e9d4977ed566 <at> cs.ucla.edu>
and subject line Re: Accepting [xyz---abc] - three minus signs to mean one
has caused the debbugs.gnu.org bug report #20657,
regarding Traditional range expression not accepted in regex/dfa
to be marked as done.

(If you believe you have received this mail in error, please contact
help-debbugs <at> gnu.org.)


-- 
20657: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=20657
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
From: arnold <at> skeeve.com
To: bug-grep <at> gnu.org
Subject: Traditional range expression not accepted in regex/dfa
Date: Tue, 26 May 2015 05:42:19 +0300
Hi.

I received a bug report for gawk by private email that a regexp of
this form: '[^0-9---]' wasn't accepted.  The bugaboo here is the "---"; it's
a range expression consisting of minus through minus, and apparently long
ago was how one got a minus into a bracket expression.

This can be seen in current grep also:

	$ ./src/grep --version
	./src/grep (GNU grep) 2.21
	Copyright (C) 2014 Free Software Foundation, Inc.
	...

	$ ./src/grep '[^0-9---]' /dev/null
	./src/grep: Invalid range end

The underlying regex and, I believe, dfa routines don't accept this.
Fixing either of them is beyond my skill range, so I thought I'd
pass this one upstream to you folks.

Thanks!

Arnold


[Message part 3 (message/rfc822, inline)]
From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Arnold Robbins <arnold <at> skeeve.com>
Cc: bug-gnulib <at> gnu.org, 20657-done <at> debbugs.gnu.org, beebe <at> math.utah.edu
Subject: Re: Accepting [xyz---abc] - three minus signs to mean one
Date: Thu, 21 Apr 2022 19:08:55 -0700
[Message part 4 (text/plain, inline)]
On 4/21/22 00:57, Arnold Robbins wrote:

> As far as my testing indicates, dfa.c doesn't need a patch, it seems
> to accept "---" inside brackets for a single minus.

Yes, a brief perusal of the dfa.c source code suggests you're right. 
Thanks for looking into this. I tend to agree with you that POSIX is not 
likely to outlaw this extension.


> If there are no objections, can we get this into Gnulib?

Although the basic idea looks good, I see a few places where the patch 
can be improved.

* The two calls to re_string_peek_byte might go past the end of the 
pattern (a subscript violation). This is possible because the pattern is 
not necessarily null-terminated.

* The two calls to re_string_fetch_byte can be simplified into a single 
call to re_string_skip_bytes.

* No need to assign to token->opr.c, as it already has the correct value.

* Can fall through to the default case to save a bit of duplicate code.

* glibc still uses comments /* like this */ for style reasons, and we 
should stick to that.

I wrote a patch with these improvements in mind and installed it into 
Gnulib (see attached); hope it works for Gawk too.
[0001-regex-match-.-.-like-V7-grep.patch (text/x-patch, attachment)]

This bug report was last modified 3 years and 33 days ago.

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.