GNU bug report logs - #16232
[PATCH] grep: make --ignore-case (-i) faster (sometimes 10x) in multibyte locales


Package: grep;

Reported by: Jim Meyering <jim <at> meyering.net>

Date: Mon, 23 Dec 2013 22:40:02 UTC

Severity: normal

Tags: patch

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

From: Pádraig Brady <P <at> draigBrady.com>
To: 16232 <at> debbugs.gnu.org, Jim Meyering <jim <at> meyering.net>
Subject: bug#16232: [PATCH] grep: make --ignore-case (-i) faster (sometimes 10x) in multibyte locales
Date: Sat, 11 Jan 2014 01:49:30 +0000

Cool, so it does this transformation:

  sed 's/./[\L&\U&]/g'

Though multibyte case handling has all sorts of edge cases (pardon the pun),
so it may not always be valid to treat each character independently.
For example see some of the tests in:
http://git.sv.gnu.org/gitweb/?p=gnulib.git;a=blob;f=tests/unicase/test-ulc-casecmp.c;hb=HEAD

I wonder whether this faster path could be restricted to a safer, but very common, input subset:

(MB_CUR_MAX == 1 || (in_utf8 && *c < 0x80))

Also, are the following printfs in the test redundant?

> +data=$(      printf "I:$I $i:i")
> +search_str=$(printf "$i:i I:$I")

Nice improvement!
Pádraig.
