GNU bug report logs - #16481
dfa.c and Rational Range Interpretation
Reported by: Aharon Robbins <arnold <at> skeeve.com>
Date: Fri, 17 Jan 2014 13:41:01 UTC
Severity: normal
Done: Paul Eggert <eggert <at> cs.ucla.edu>
Bug is archived. No further changes may be made.
On 10/02/2014 20:50, Paul Eggert wrote:
>
> If so, then the above comment doesn't sound right. Without the patch,
> the DFA matcher mishandles expressions in some cases, as described in
> Bug#16481. For example, "grep -Xawk '[\[-\]]'" will cause dfa.c to try
> to compile the regular expression [[-]], which won't work regardless of
> whether --with-included-regex is being used.
Ok, so there is a real bug. But it is not immediately obvious what the
problem is, and the bug has (AFAICS) no test case and no mention in the
commit message. Without this, I am not sure that the fix should not be
the one in this commit.
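
To make the quoted example concrete: with backslash escapes, '[\[-\]]' asks for the range from "[" to "]", which in ASCII also covers the backslash. Once the backslashes are dropped, POSIX bracket-expression rules read [[-]] differently: the first "]" closes the bracket, and a "-" right before it is a literal, so the pattern becomes the set {"[", "-"} followed by a literal "]". The sketch below shows this with the POSIX regcomp/regexec interface, not with dfa.c itself. The helper name ere_matches is hypothetical and is not part of grep or dfa.c.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not part of grep or dfa.c): return 1 if TEXT
   matches the POSIX extended regular expression PATTERN, 0 if it does
   not, and -1 if PATTERN fails to compile.  Callers anchor PATTERN
   with ^ and $ when they want a whole-string match.  */
static int
ere_matches (const char *pattern, const char *text)
{
  regex_t re;
  int rc;

  assert (pattern != NULL && text != NULL);

  /* REG_NOSUB: only a yes/no answer is needed, not match offsets.  */
  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return -1;

  rc = regexec (&re, text, 0, NULL, 0);
  regfree (&re);
  return rc == 0;
}
```

Under these assumptions, ere_matches ("^[[-]]$", "[]") and ere_matches ("^[[-]]$", "-]") report a match, while a lone backslash, which the intended range would have covered, does not.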
> More generally, we already had the problem of subtle differences between
> dfa.c and full-regexp matching on platforms that do not observe RRI,
> because dfa.c already uses RRI in multibyte locales, regardless of
> whether the full matcher uses RRI.
It only does so if the fallback to regex is not requested (dfaexec
invoked with backref = NULL). This is never the case for grep. In
fact, as far as I know it is never the case, and I've been tempted many
times to completely remove the mostly dead code dealing with multibyte
ranges if backref = NULL.
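
For readers unfamiliar with the term, Rational Range Interpretation (RRI) means that a range such as [a-z] is decided by the numeric values of the characters, not by the locale's collation order. The sketch below contrasts the two ideas for single bytes. It is a simplified model only: the function names are hypothetical, and real matchers that follow collation order work with collating-element weights rather than calling strcoll on one-byte strings.

```c
#include <string.h>

/* Hypothetical helper: RRI-style membership test.  C is in [LO-HI]
   exactly when its numeric value lies between the two endpoints.  */
static int
in_range_rri (unsigned char lo, unsigned char hi, unsigned char c)
{
  return lo <= c && c <= hi;
}

/* Hypothetical helper: simplified collation-style membership test.
   C is in [LO-HI] when it sorts between the endpoints according to
   the current LC_COLLATE locale.  In the "C" locale this agrees with
   in_range_rri; in other locales the answer may differ.  */
static int
in_range_collation (unsigned char lo, unsigned char hi, unsigned char c)
{
  char l[2] = { (char) lo, '\0' };
  char h[2] = { (char) hi, '\0' };
  char s[2] = { (char) c, '\0' };

  return strcoll (l, s) <= 0 && strcoll (s, h) <= 0;
}
```

In the default "C" locale both functions reject 'B' for the range a-z. After a call to setlocale with another locale, in_range_collation may give a different answer depending on that locale's collation rules, which is the kind of inconsistency RRI avoids.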
> The change causes non-"C" unibyte
> locales to behave consistently with multibyte locales, which in some
> sense is an improvement (though obviously not ideal; it'd be better if
> it was RRI everywhere).
It would be if glibc were fixed. For me, consistency with other GNU
utilities---especially sed---trumps anything else, and this was the main
point in fixing multibyte matching in GNU grep 2.6 and newer.
> Non-"C" unibyte locales are dying out, so to some extent this is a minor
> issue. In practice most users these days won't notice or care about
> this change.
That's true.
Paolo
This bug report was last modified 11 years and 132 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.