GNU bug report logs - #16912
[PATCH] no longer use CSET for non-UTF8 locale in DFA engine


Package: grep;

Reported by: Norihiro Tanaka <noritnk <at> kcn.ne.jp>

Date: Sat, 1 Mar 2014 09:49:02 UTC

Severity: normal

Tags: patch

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.



Message #5 received at submit <at> debbugs.gnu.org:

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: submit <at> debbugs.gnu.org
Subject: [PATCH] no longer use CSET for non-UTF8 locale in DFA engine
Date: Sat, 01 Mar 2014 18:48:22 +0900
[Message part 1 (text/plain, inline)]
Package: grep
Tags: patch

I overlooked an important point about the trivial_case_ignore
optimization.  After a pattern has been rewritten by trivial_case_ignore,
the kwset engine can still be used.  If trivial_case_ignore is removed,
however, kwset is never used any more, because kwsmusts does nothing
when MB_CUR_MAX > 1 && match_icase.

This patch reverts the removal of trivial_case_ignore and fixes the
200x slowdown in non-UTF-8 locales with a different approach: it always
prefers a CSET to replacing it with an OR, and the DFA engine no longer
uses CSET in non-UTF-8 locales.

It also restores the trivial_case_ignore optimization, which speeds
things up by more than 20x in non-UTF-8 locales (I tested with EUC-JP).

Norihiro
[patch.txt (application/octet-stream, attachment)]
[tests.txt (application/octet-stream, attachment)]

This bug report was last modified 11 years and 78 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.