GNU bug report logs -
#18762
[PATCH] dfa: don't consider RE_DOT_NEWLINE and RE_DOT_NOT_NULL in matching with a bracket expression
Previous Next
Reported by: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Date: Sat, 18 Oct 2014 12:41:03 UTC
Severity: normal
Tags: patch
Done: Jim Meyering <jim <at> meyering.net>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 18762 in the body.
You can then email your comments to 18762 AT debbugs.gnu.org in the normal way.
Toggle the display of automated, internal messages from the tracker.
Report forwarded
to
bug-grep <at> gnu.org
:
bug#18762
; Package
grep
.
(Sat, 18 Oct 2014 12:41:03 GMT)
Full text and
rfc822 format available.
Acknowledgement sent
to
Norihiro Tanaka <noritnk <at> kcn.ne.jp>
:
New bug report received and forwarded. Copy sent to
bug-grep <at> gnu.org
.
(Sat, 18 Oct 2014 12:41:04 GMT)
Full text and
rfc822 format available.
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
RE_DOT_NEW_LINE and NOT_NULL work for '.' only in regex. OTOH, they
work for MBCSET in addition to '.' in DFA. This patch adapts the behavior
of DFA to of regex.
BTW, at the moment, grep and gawk never use match_mb_charset function to
be fixed by it.
[0001-dfa-don-t-consider-RE_DOT_NEWLINE-and-RE_DOT_NOT_NUL.patch (text/plain, attachment)]
Information forwarded
to
bug-grep <at> gnu.org
:
bug#18762
; Package
grep
.
(Sat, 18 Oct 2014 17:07:02 GMT)
Full text and
rfc822 format available.
Message #8 received at 18762 <at> debbugs.gnu.org (full text, mbox):
On Sat, Oct 18, 2014 at 5:39 AM, Norihiro Tanaka <noritnk <at> kcn.ne.jp> wrote:
> RE_DOT_NEW_LINE and NOT_NULL work for '.' only in regex. OTOH, they
> work for MBCSET in addition to '.' in DFA. This patch adapts the behavior
> of DFA to of regex.
>
> BTW, at the moment, grep and gawk never use match_mb_charset function to
> be fixed by it.
Thank you for the patch. It is clearly correct.
However, it presents a puzzle: does your patch induce any semantic
change in grep?
I.e., is this a bug fix, or simply the removal of code that would have
no effect.
So far, I have been unable to construct a case for which it induces a
semantic change.
On the other hand, this does eliminate a few comparisons,
so there may be a small performance improvement.
Information forwarded
to
bug-grep <at> gnu.org
:
bug#18762
; Package
grep
.
(Sat, 18 Oct 2014 23:31:02 GMT)
Full text and
rfc822 format available.
Message #11 received at 18762 <at> debbugs.gnu.org (full text, mbox):
Thanks for the review.
This is a potential bug fix. However, match_mb_charset function isn't
used in grep and gawk, as DFA treats MBCSET as BACKREF by following code
if `backref' is provided. Therefore the fix never induces any semantic
change in grep and gawk.
if (d->states[s].has_mbcset && backref)
{
*backref = 1;
goto done;
}
Essentially, the function is able to be removed. However, if we regard
DFA as a library, we should keep it.
Information forwarded
to
bug-grep <at> gnu.org
:
bug#18762
; Package
grep
.
(Sun, 19 Oct 2014 00:17:01 GMT)
Full text and
rfc822 format available.
Message #14 received at 18762 <at> debbugs.gnu.org (full text, mbox):
On Sat, Oct 18, 2014 at 4:30 PM, Norihiro Tanaka <noritnk <at> kcn.ne.jp> wrote:
> Thanks for the review.
>
> This is a potential bug fix. However, match_mb_charset function isn't
> used in grep and gawk, as DFA treats MBCSET as BACKREF by following code
> if `backref' is provided.
dfa.c's match_mb_charset function *is* used, e.g., in a
command like this one:
printf '\0' |src/grep -aE '^\s?$'
However, as I mentioned, so far I have been unable to
construct a combination of syntax_bits settings and input/RE pairs
that induces a change in behavior.
> Therefore the fix never induces any semantic
> change in grep and gawk.
>
> if (d->states[s].has_mbcset && backref)
> {
> *backref = 1;
> goto done;
> }
>
> Essentially, the function is able to be removed. However, if we regard
> DFA as a library, we should keep it.
Information forwarded
to
bug-grep <at> gnu.org
:
bug#18762
; Package
grep
.
(Sun, 19 Oct 2014 02:09:02 GMT)
Full text and
rfc822 format available.
Message #17 received at 18762 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Jim Meyering <jim <at> meyering.net> wrote:
> dfa.c's match_mb_charset function *is* used, e.g., in a
> command like this one:
>
> printf '\0' |src/grep -aE '^\s?$'
Wow, just it isn't good. I think that behavior of `fails' should be
same as of `trans' except `fails' checks accepted conditions, including
following part. match_mb_charset() should be avoided as far as possible,
as it doesn't support collating symbols and equivalence classes.
> /* Falling back to the glibc matcher in this case gives
> better performance (up to 25% better on [a-z], for
> example) and enables support for collating symbols and
> equivalence classes. */
> if (d->states[s].has_mbcset && backref)
> {
> *backref = 1;
> goto done;
> }
[0001-dfa-fall-MBCSET-back-to-the-glibc-matcher-in-transit.patch (text/plain, attachment)]
Reply sent
to
Jim Meyering <jim <at> meyering.net>
:
You have taken responsibility.
(Mon, 20 Oct 2014 01:26:01 GMT)
Full text and
rfc822 format available.
Notification sent
to
Norihiro Tanaka <noritnk <at> kcn.ne.jp>
:
bug acknowledged by developer.
(Mon, 20 Oct 2014 01:26:02 GMT)
Full text and
rfc822 format available.
Message #22 received at 18762-done <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
On Sat, Oct 18, 2014 at 7:07 PM, Norihiro Tanaka <noritnk <at> kcn.ne.jp> wrote:
> Jim Meyering <jim <at> meyering.net> wrote:
>> dfa.c's match_mb_charset function *is* used, e.g., in a
>> command like this one:
>>
>> printf '\0' |src/grep -aE '^\s?$'
>
> Wow, just it isn't good. I think that behavior of `fails' should be
> same as of `trans' except `fails' checks accepted conditions, including
> following part. match_mb_charset() should be avoided as far as possible,
> as it doesn't support collating symbols and equivalence classes.
>
>> /* Falling back to the glibc matcher in this case gives
>> better performance (up to 25% better on [a-z], for
>> example) and enables support for collating symbols and
>> equivalence classes. */
>> if (d->states[s].has_mbcset && backref)
>> {
>> *backref = 1;
>> goto done;
>> }
Nice change. I've adjusted the commit log and added the test
above, since no other code even excercised the
now-inaccessible function. I will push it tomorrow.
[0001-dfa-process-all-MBCSET-constructs-via-glibc-s-matche.patch (application/octet-stream, attachment)]
Information forwarded
to
bug-grep <at> gnu.org
:
bug#18762
; Package
grep
.
(Mon, 20 Oct 2014 03:17:02 GMT)
Full text and
rfc822 format available.
Message #25 received at 18762-done <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
On Sun, Oct 19, 2014 at 6:24 PM, Jim Meyering <jim <at> meyering.net> wrote:
> On Sat, Oct 18, 2014 at 7:07 PM, Norihiro Tanaka <noritnk <at> kcn.ne.jp> wrote:
>> Jim Meyering <jim <at> meyering.net> wrote:
>>> dfa.c's match_mb_charset function *is* used, e.g., in a
>>> command like this one:
>>>
>>> printf '\0' |src/grep -aE '^\s?$'
>>
>> Wow, just it isn't good. I think that behavior of `fails' should be
>> same as of `trans' except `fails' checks accepted conditions, including
>> following part. match_mb_charset() should be avoided as far as possible,
>> as it doesn't support collating symbols and equivalence classes.
>>
>>> /* Falling back to the glibc matcher in this case gives
>>> better performance (up to 25% better on [a-z], for
>>> example) and enables support for collating symbols and
>>> equivalence classes. */
>>> if (d->states[s].has_mbcset && backref)
>>> {
>>> *backref = 1;
>>> goto done;
>>> }
>
> Nice change. I've adjusted the commit log and added the test
> above, since no other code even excercised the
> now-inaccessible function. I will push it tomorrow.
By the way, I've also adjusted your preceding patch (see attached),
and will push it tomorrow, too.
[0001-dfa-remove-two-erroneous-clauses-from-a-now-unused-f.patch (application/octet-stream, attachment)]
Information forwarded
to
bug-grep <at> gnu.org
:
bug#18762
; Package
grep
.
(Mon, 20 Oct 2014 14:16:01 GMT)
Full text and
rfc822 format available.
Message #28 received at 18762-done <at> debbugs.gnu.org (full text, mbox):
Jim Meyering <jim <at> meyering.net> wrote:
> > Nice change. I've adjusted the commit log and added the test
> > above, since no other code even excercised the
> > now-inaccessible function. I will push it tomorrow.
>
> By the way, I've also adjusted your preceding patch (see attached),
> and will push it tomorrow, too.
Thanks for the adjustments and addition of the test.
bug archived.
Request was from
Debbugs Internal Request <help-debbugs <at> gnu.org>
to
internal_control <at> debbugs.gnu.org
.
(Tue, 18 Nov 2014 12:24:05 GMT)
Full text and
rfc822 format available.
This bug report was last modified 10 years and 276 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.