GNU bug report logs - #16912
[PATCH] no longer use CSET for non-UTF8 locale in DFA engine


Package: grep;

Reported by: Norihiro Tanaka <noritnk <at> kcn.ne.jp>

Date: Sat, 1 Mar 2014 09:49:02 UTC

Severity: normal

Tags: patch

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.


From: help-debbugs <at> gnu.org (GNU bug Tracking System)
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Subject: bug#16912: closed (Re: bug#16912: [PATCH] no longer use CSET for
 non-UTF8 locale in DFA engine)
Date: Mon, 03 Mar 2014 07:08:03 +0000
[Message part 1 (text/plain, inline)]
Your bug report

#16912: [PATCH] no longer use CSET for non-UTF8 locale in DFA engine

which was filed against the grep package, has been closed.

The explanation is attached below, along with your original report.
If you require more details, please reply to 16912 <at> debbugs.gnu.org.

-- 
16912: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=16912
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>, 16912-done <at> debbugs.gnu.org
Subject: Re: bug#16912: [PATCH] no longer use CSET for non-UTF8 locale in
 DFA engine
Date: Sun, 02 Mar 2014 23:07:41 -0800
[Message part 3 (text/plain, inline)]
Thanks, I tweaked the ChangeLog entries a bit and pushed that.  I also 
pushed the attached patch, which fixes some new bugs and some bugs that 
were reintroduced by the revival of trivial_case_ignore.  I wish we 
didn't need that function, as it is a bit of a kludge.


[0001-grep-fix-some-unlikely-bugs-in-trivial_case_ignore.patch (text/plain, attachment)]
[Message part 5 (message/rfc822, inline)]
From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: submit <at> debbugs.gnu.org
Subject: [PATCH] no longer use CSET for non-UTF8 locale in DFA engine
Date: Sat, 01 Mar 2014 18:48:22 +0900
[Message part 6 (text/plain, inline)]
Package: grep
Tags: patch

I overlooked an important point about the optimization performed by
trivial_case_ignore.  After that optimization, the kwset engine can
still be used.  However, if trivial_case_ignore is removed, kwset is
never used, because kwsmusts does nothing when MB_CUR_MAX > 1
&& match_icase.

The patch reverts the removal of trivial_case_ignore and fixes the 200x
slowdown in non-UTF8 locales with a different approach.  It always
prefers CSET over replacement with OR, and no longer uses CSET for
non-UTF8 locales in the DFA engine.

It also allows optimization by trivial_case_ignore, and gives a speed-up
of more than 20x for non-UTF8 locales (I tested it with euc-jp).

Norihiro
[patch.txt (application/octet-stream, attachment)]
[tests.txt (application/octet-stream, attachment)]
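
The kind of rewrite the report relies on can be sketched as follows. This is a minimal illustration, assuming that trivial_case_ignore turns a case-insensitive pattern into an equivalent case-sensitive one by replacing each letter with a two-character bracket expression. The helper name sketch_case_fold and its bail-out rules are hypothetical and ASCII-only; this is not grep's actual code.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not grep's trivial_case_ignore.
   Rewrite each ASCII letter c in PATTERN as the bracket expression
   "[cC]" (lowercase first), copying all other bytes unchanged.
   Return a malloc'd NUL-terminated string that the caller must free,
   or NULL if the pattern contains a backslash, a '[', or a non-ASCII
   byte (cases this sketch does not try to handle), or if allocation
   fails.  */
char *
sketch_case_fold (const char *pattern)
{
  size_t len = strlen (pattern);

  /* Each input byte expands to at most four output bytes.  */
  char *out = malloc (4 * len + 1);
  if (!out)
    return NULL;

  char *p = out;
  for (size_t i = 0; i < len; i++)
    {
      unsigned char c = pattern[i];

      /* Give up on anything that is not trivially rewritable.  */
      if (c == '\\' || c == '[' || c >= 0x80)
        {
          free (out);
          return NULL;
        }

      if (isalpha (c))
        {
          *p++ = '[';
          *p++ = (char) tolower (c);
          *p++ = (char) toupper (c);
          *p++ = ']';
        }
      else
        *p++ = (char) c;
    }
  *p = '\0';
  return out;
}
```

Under these assumptions, sketch_case_fold ("Foo.c") yields "[fF][oO][oO].[cC]", and a pattern containing a backslash or an existing bracket expression is rejected with NULL so the caller can fall back to ordinary case-insensitive matching.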

This bug report was last modified 11 years and 106 days ago.


