GNU bug report logs - #16912
[PATCH] no longer use CSET for non-UTF8 locale in DFA engine


Package: grep;

Reported by: Norihiro Tanaka <noritnk <at> kcn.ne.jp>

Date: Sat, 1 Mar 2014 09:49:02 UTC

Severity: normal

Tags: patch

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.



Message #11 received at 16912 <at> debbugs.gnu.org:

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 16912 <at> debbugs.gnu.org
Subject: Re: bug#16912: [PATCH] no longer use CSET for non-UTF8 locale in DFA engine
Date: Sun, 02 Mar 2014 10:23:28 +0900

Hi Paul,

Thank you for checking the patch.

> First, why does the first patch add those four using_utf8 calls to
> parse_bracket_exp?  Isn't that optimization valid regardless of
> whether the multibyte encoding is UTF-8?

The optimization in addtok that turns an MBCSET into a CSET takes effect
only in UTF-8 locales. In non-UTF-8 locales, the token is still treated as
an MBCSET rather than a CSET even if work_mbc->cset is defined. So unless
the MBCSET is replaced with a CSET or with OR-ed alternatives, the DFA keeps
it until the end and returns backref (i.e., falls back to the regex
matcher). I want to avoid that.
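For illustration only, here is a minimal C sketch of the decision described above. It rests on stated assumptions: `struct bracket_model`, `lower_bracket`, `needs_backref` and the token names are hypothetical and are not grep's actual dfa.c code. The sketch also assumes that a bracket expression can become a plain CSET only when it is not inverted and has no ranges or character classes that need multibyte handling. As described above, the rewrite is gated on `using_utf8`. Otherwise the token stays an MBCSET and the DFA has to report backref.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of a parsed bracket expression.
   These fields are illustrative; they are NOT grep's real
   struct mb_char_classes. */
struct bracket_model
{
  bool invert;        /* bracket starts with '^', e.g. [^abc] */
  size_t nranges;     /* ranges needing multibyte handling (assumption) */
  size_t nch_classes; /* classes that may match multibyte chars (assumption) */
};

/* Hypothetical token kinds for this sketch only. */
enum token_kind { TOK_CSET, TOK_MBCSET };

/* Lower a bracket expression to a token.  The MBCSET -> CSET rewrite is
   applied only when using_utf8 is true, mirroring the behavior described
   in the message; in other multibyte locales the token stays an MBCSET. */
static enum token_kind
lower_bracket (const struct bracket_model *b, bool using_utf8)
{
  assert (b != NULL);
  bool simple = !b->invert && b->nranges == 0 && b->nch_classes == 0;
  return (using_utf8 && simple) ? TOK_CSET : TOK_MBCSET;
}

/* An MBCSET left in the final token stream means the DFA alone cannot
   decide the match, so it reports backref and the caller must fall back
   to the regex matcher. */
static bool
needs_backref (enum token_kind k)
{
  return k == TOK_MBCSET;
}
```

In this model, a plain bracket expression lowers to a CSET in a UTF-8 locale. The same expression stays an MBCSET in other multibyte locales, which is the backref case described above.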

However, I don't understand why the optimization is not carried out in
non-UTF-8 locales. Can you explain?

Norihiro








GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.