GNU bug report logs - #16481
dfa.c and Rational Range Interpretation

Package: grep

Reported by: Aharon Robbins <arnold <at> skeeve.com>

Date: Fri, 17 Jan 2014 13:41:01 UTC

Severity: normal

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Paolo Bonzini <bonzini <at> gnu.org>, arnold <at> skeeve.com, 16481 <at> debbugs.gnu.org
Subject: bug#16481: dfa.c and Rational Range Interpretation
Date: Mon, 10 Feb 2014 11:50:07 -0800
On 02/10/2014 01:18 AM, Paolo Bonzini wrote:
>
> If you use --with-included-regex, the patch is a no-op.

Are we talking about the patch in git commit 
1078b64302bbf5c0a46635772808ff7f75171dbc 
<http://git.savannah.gnu.org/cgit/grep.git/commit/?id=1078b64302bbf5c0a46635772808ff7f75171dbc>?

If so, then the above comment doesn't sound right.  Without the patch, 
the DFA matcher mishandles expressions in some cases, as described in 
Bug#16481.  For example, "grep -Xawk '[\[-\]]'" will cause dfa.c to try 
to compile the regular expression [[-]], which won't work regardless of 
whether --with-included-regex is being used.
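
One way to see why [[-]] cannot stand in for the intended range is to 
parse it with simplified POSIX bracket rules.  The sketch below is only 
an illustration: parse_simple_bracket is a hypothetical helper, not code 
from dfa.c or regex, and it assumes a bracket expression containing only 
literals and lo-hi ranges (no negation, character classes, or collating 
symbols).

```python
def parse_simple_bracket(pattern):
    """Parse one leading bracket expression with simplified POSIX rules.

    Returns (members, rest): the set of characters the bracket matches
    and whatever text follows the closing ']'.
    Hypothetical helper for illustration; not taken from dfa.c.
    """
    if not pattern.startswith("["):
        raise ValueError("pattern must start with '['")
    members = set()
    i = 1
    first = True
    while i < len(pattern):
        ch = pattern[i]
        # A ']' closes the bracket unless it is the very first member.
        if ch == "]" and not first:
            return members, pattern[i + 1:]
        first = False
        # "lo-hi" is a range only when '-' is not the last member.
        if (i + 2 < len(pattern) and pattern[i + 1] == "-"
                and pattern[i + 2] != "]"):
            lo, hi = ord(ch), ord(pattern[i + 2])
            if lo > hi:
                raise ValueError("invalid range end")
            members.update(chr(c) for c in range(lo, hi + 1))
            i += 3
        else:
            members.add(ch)
            i += 1
    raise ValueError("unterminated bracket expression")


# What the awk-syntax pattern [\[-\]] is meant to match: '[' through ']'
# by character code, which includes the backslash between them.
intended = {chr(c) for c in range(ord("["), ord("]") + 1)}

# What the unescaped text [[-]] means instead: a bracket holding '[' and
# '-', followed by a literal ']' outside the bracket.
members, rest = parse_simple_bracket("[[-]]")
print(sorted(intended), sorted(members), repr(rest))
```

Under these assumptions the unescaped form no longer matches a lone 
backslash, and it requires a trailing "]" that the original pattern 
never asked for.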

More generally, we already had the problem of subtle differences between 
dfa.c and full-regexp matching on platforms that do not observe RRI, 
because dfa.c already uses RRI in multibyte locales, regardless of 
whether the full matcher uses RRI.  The change causes non-"C" unibyte 
locales to behave consistently with multibyte locales, which in some 
sense is an improvement (though obviously not ideal; it'd be better if 
it was RRI everywhere).
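
As a rough illustration of the difference, the sketch below compares a 
range test by character code (the RRI reading) with a range test by 
collation order.  TOY_COLLATION is a made-up ordering used only for this 
example; it is not taken from any real locale, and neither function is 
part of dfa.c or regex.

```python
def in_range_rri(ch, lo, hi):
    """Range test by character code, as under RRI."""
    return ord(lo) <= ord(ch) <= ord(hi)


# Hypothetical dictionary-style collation order, for illustration only.
TOY_COLLATION = "aAbBcCdD"


def in_range_collation(ch, lo, hi, order=TOY_COLLATION):
    """Range test by position in a collation order."""
    return order.index(lo) <= order.index(ch) <= order.index(hi)


# The range a-d under each interpretation.
for ch in "abBdD":
    print(ch, in_range_rri(ch, "a", "d"), in_range_collation(ch, "a", "d"))
```

With this toy ordering, "B" falls inside a-d by collation but outside it 
by character code, which is the kind of subtle disagreement that can 
arise when two matchers interpret ranges differently.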

Non-"C" unibyte locales are dying out, so to some extent this is a minor 
issue.  In practice most users these days won't notice or care about 
this change.




This bug report was last modified 11 years and 132 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.