GNU bug report logs - #18454
Improve performance when -P (PCRE) is used in UTF-8 locales


Package: grep

Reported by: Vincent Lefevre <vincent <at> vinc17.net>

Date: Fri, 12 Sep 2014 01:26:02 UTC

Severity: normal

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.



Message #173 received at 18454 <at> debbugs.gnu.org:

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: Vincent Lefevre <vincent <at> vinc17.net>, 18454 <at> debbugs.gnu.org
Subject: Re: bug#18454: Improve performance when -P (PCRE) is used in UTF-8
 locales
Date: Sat, 20 Dec 2014 11:57:39 +0900

On Fri, 19 Dec 2014 18:31:05 -0800
Paul Eggert <eggert <at> cs.ucla.edu> wrote:

> If mbrlen does the right thing, grep and sed should do the right thing.

mbrlen() already does the right thing.  So perhaps grep and sed depend
on the behavior of regex.  Even if so, I think that this should also be
fixed in the C library.

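cat <<'EOF' |
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>
#include <locale.h>

int
main (void)
{
  setlocale (LC_ALL, "");
  mbstate_t mbs = { 0 };
  /* 0xED 0xA0 0xBF would encode the surrogate U+D83F, which is not
     valid UTF-8, so mbrlen is expected to return (size_t) -1.  */
  char s[] = { (char) 0xED, (char) 0xA0, (char) 0xBF };
  size_t len = mbrlen (s, 3, &mbs);
  /* Cast to long so that (size_t) -1 prints as -1; %d does not
     match size_t.  */
  printf ("mbrlen = %ld\n", (long) len);
  exit (EXIT_SUCCESS);
}
EOF
gcc -xc - && ./a.out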
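
When reading the result of such a test, it may help to name the special
return values of mbrlen() instead of printing the raw number.  The
following is only a sketch: describe_mbrlen_result and check_bytes are
hypothetical helper names, not functions from grep, sed, or the C
library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helpers for illustration only; they are not part of
   grep, sed, or the C library.  */

/* Name the outcome encoded in a return value of mbrlen.  */
const char *
describe_mbrlen_result (size_t r)
{
  if (r == (size_t) -1)
    return "invalid sequence";
  if (r == (size_t) -2)
    return "incomplete sequence";
  if (r == 0)
    return "null character";
  return "valid character";
}

/* Call mbrlen on the first N bytes of S, starting from the initial
   conversion state, and describe the result.  */
const char *
check_bytes (const char *s, size_t n)
{
  mbstate_t mbs;

  assert (s != NULL);
  memset (&mbs, 0, sizeof mbs);
  return describe_mbrlen_result (mbrlen (s, n, &mbs));
}
```

After setlocale (LC_ALL, "") in a UTF-8 locale, calling check_bytes on
the three bytes 0xED 0xA0 0xBF should report an invalid sequence if
mbrlen() rejects surrogates as described above; the result depends on
the locale and the C library in use.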




