GNU bug report logs - #18762
[PATCH] dfa: don't consider RE_DOT_NEWLINE and RE_DOT_NOT_NULL in matching with a bracket expression


Package: grep;

Reported by: Norihiro Tanaka <noritnk <at> kcn.ne.jp>

Date: Sat, 18 Oct 2014 12:41:03 UTC

Severity: normal

Tags: patch

Done: Jim Meyering <jim <at> meyering.net>

Bug is archived. No further changes may be made.


From: help-debbugs <at> gnu.org (GNU bug Tracking System)
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Subject: bug#18762: closed (Re: bug#18762: [PATCH] dfa: don't consider
 RE_DOT_NEWLINE and RE_DOT_NOT_NULL in matching with a bracket expression)
Date: Mon, 20 Oct 2014 01:26:02 +0000
[Message part 1 (text/plain, inline)]
Your bug report

#18762: [PATCH] dfa: don't consider RE_DOT_NEWLINE and RE_DOT_NOT_NULL in matching with a bracket expression

which was filed against the grep package, has been closed.

The explanation is attached below, along with your original report.
If you require more details, please reply to 18762 <at> debbugs.gnu.org.

-- 
18762: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=18762
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
From: Jim Meyering <jim <at> meyering.net>
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Cc: 18762-done <at> debbugs.gnu.org
Subject: Re: bug#18762: [PATCH] dfa: don't consider RE_DOT_NEWLINE and
 RE_DOT_NOT_NULL in matching with a bracket expression
Date: Sun, 19 Oct 2014 18:24:56 -0700
[Message part 3 (text/plain, inline)]
On Sat, Oct 18, 2014 at 7:07 PM, Norihiro Tanaka <noritnk <at> kcn.ne.jp> wrote:
> Jim Meyering <jim <at> meyering.net> wrote:
>> dfa.c's match_mb_charset function *is* used, e.g., in a
>> command like this one:
>>
>>   printf '\0' |src/grep -aE '^\s?$'
>
> Wow, that isn't good.  I think the behavior of `fails' should be the
> same as that of `trans', except that `fails' checks accepting conditions,
> including the following part.  match_mb_charset() should be avoided as far
> as possible, as it doesn't support collating symbols and equivalence classes.
>
>>               /* Falling back to the glibc matcher in this case gives
>>                  better performance (up to 25% better on [a-z], for
>>                  example) and enables support for collating symbols and
>>                  equivalence classes.  */
>>               if (d->states[s].has_mbcset && backref)
>>                 {
>>                   *backref = 1;
>>                   goto done;
>>                 }

Nice change.  I've adjusted the commit log and added the test
above, since no other code even exercised the
now-inaccessible function. I will push it tomorrow.
[0001-dfa-process-all-MBCSET-constructs-via-glibc-s-matche.patch (application/octet-stream, attachment)]
[Message part 5 (message/rfc822, inline)]
From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: bug-grep <at> gnu.org
Subject: [PATCH] dfa: don't consider RE_DOT_NEWLINE and RE_DOT_NOT_NULL in
 matching with a bracket expression
Date: Sat, 18 Oct 2014 21:39:37 +0900
[Message part 6 (text/plain, inline)]
RE_DOT_NEWLINE and RE_DOT_NOT_NULL apply only to '.' in regex.  OTOH, in
DFA they apply to MBCSET in addition to '.'.  This patch makes the behavior
of DFA match that of regex.

BTW, at the moment, neither grep nor gawk ever uses the match_mb_charset
function that this patch fixes.
[0001-dfa-don-t-consider-RE_DOT_NEWLINE-and-RE_DOT_NOT_NUL.patch (text/plain, attachment)]
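
The regex-side behavior described in the report above can be checked with a small sketch against glibc's GNU regex interface.  This is only an illustration, not code from the patch: matches() is a hypothetical helper, it assumes glibc's <regex.h> and the default C locale, and it exercises the regex matcher only, not dfa.c.

```c
#define _GNU_SOURCE             /* exposes re_set_syntax and friends in glibc */
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Hypothetical helper (not part of grep or dfa.c).  Return 1 if PATTERN
   matches somewhere in the LEN bytes at TEXT under the GNU regex SYNTAX
   bits, 0 if it does not, and -1 if PATTERN fails to compile.  */
static int
matches (reg_syntax_t syntax, const char *pattern,
         const char *text, size_t len)
{
  struct re_pattern_buffer buf;
  int found;

  assert (pattern != NULL && text != NULL);

  /* A zeroed buffer means: no preallocated storage, no fastmap,
     no translate table.  */
  memset (&buf, 0, sizeof buf);

  /* The syntax bits are global state consulted at compile time.  */
  re_set_syntax (syntax);
  if (re_compile_pattern (pattern, strlen (pattern), &buf) != NULL)
    return -1;

  /* re_search returns the match offset, or a negative value on failure.  */
  found = re_search (&buf, text, (int) len, 0, (int) len, NULL) >= 0;
  regfree (&buf);
  return found;
}
```

Under these assumptions, matches (RE_DOT_NOT_NULL, ".", "\0", 1) should report no match while matches (RE_DOT_NOT_NULL, "[^a]", "\0", 1) should report a match.  Likewise, "." should match "\n" only when RE_DOT_NEWLINE is set, whereas "[^a]" matches "\n" without it.  In other words, the two bits change '.' only and leave bracket expressions alone.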

This bug report was last modified 10 years and 277 days ago.
