GNU bug report logs - #17027
[PATCH] grep: prefer regex to DFA for ANYCHAR in non-UTF8 locales

Package: grep

Reported by: Norihiro Tanaka <noritnk <at> kcn.ne.jp>

Date: Mon, 17 Mar 2014 15:02:01 UTC

Severity: normal

Tags: patch

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.


From: help-debbugs <at> gnu.org (GNU bug Tracking System)
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Subject: bug#17027: closed (Re: bug#17027: [PATCH] grep: prefer regex to
 DFA for ANYCHAR in non-UTF8 locales)
Date: Tue, 08 Apr 2014 04:09:02 +0000
[Message part 1 (text/plain, inline)]
Your bug report

#17027: [PATCH] grep: prefer regex to DFA for ANYCHAR in non-UTF8 locales

which was filed against the grep package, has been closed.

The explanation is attached below, along with your original report.
If you require more details, please reply to 17027 <at> debbugs.gnu.org.

-- 
17027: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=17027
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Cc: 17027-done <at> debbugs.gnu.org
Subject: Re: bug#17027: [PATCH] grep: prefer regex to DFA for ANYCHAR in
 non-UTF8 locales
Date: Mon, 07 Apr 2014 21:08:44 -0700
Thanks for this patch too.  I pushed it into the savannah git master, 
with a slightly different commit message.

[Message part 3 (message/rfc822, inline)]
From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: submit <at> debbugs.gnu.org
Subject: [PATCH] grep: prefer regex to DFA for ANYCHAR in non-UTF8 locales
Date: Tue, 18 Mar 2014 00:01:05 +0900
[Message part 4 (text/plain, inline)]
Package: grep
Tags: patch

When a pattern includes ANYCHAR in a non-UTF8 locale, grep prefers the
DFA engine to the regex engine.  However, in my tests, even with
Patch#17025 applied, the DFA engine is slower than the regex engine for
ANYCHAR in non-UTF8 locales.

This patch prefers regex to DFA for ANYCHAR in non-UTF8 locales.

Create the test file:

$ yes abcd.abc | head -1000000 > m

Timing before applying the patch:

$ time -p env LC_ALL=ja_JP.eucJP src/grep abcd.abd m
real 1.99
user 1.75
sys 0.28

Timing after applying the patch:

$ time -p env LC_ALL=ja_JP.eucJP src/grep abcd.abd m
real 1.21
user 0.71
sys 0.46
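
A note for anyone reproducing the measurement: the pattern abcd.abd
differs from the line content abcd.abc in its last character, so no line
matches and grep has to examine every line of the file.  The following is
a minimal sketch of that check, not part of the patch.  It assumes the
system grep on PATH, a 1000-line input, and LC_ALL=C, all of which are
stand-ins for src/grep, the 1000000-line file, and ja_JP.eucJP used in
the timings above.

```shell
# Sketch only: smaller input and system grep instead of src/grep.
tmp=$(mktemp)
yes abcd.abc | head -n 1000 > "$tmp"

# Every line is "abcd.abc"; the pattern ends in "abd", so nothing matches.
# grep -c prints 0 and exits with status 1 when there is no match, hence
# the "|| true" to keep the script going.
matches=$(LC_ALL=C grep -c 'abcd.abd' "$tmp" || true)
lines=$(wc -l < "$tmp" | tr -d ' ')

echo "lines=$lines matches=$matches"
rm -f "$tmp"
```

To time a locally built binary in the locale from the report, substitute
src/grep and LC_ALL=ja_JP.eucJP, provided that locale is installed.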

Norihiro
[patch2.txt (text/plain, attachment)]

This bug report was last modified 11 years and 125 days ago.
