GNU bug report logs - #17025
[PATCH] grep: matching line-by-line with regex


Package: grep;

Reported by: Norihiro Tanaka <noritnk <at> kcn.ne.jp>

Date: Mon, 17 Mar 2014 14:50:01 UTC

Severity: normal

Tags: patch

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.





Report forwarded to bug-grep <at> gnu.org:
bug#17025; Package grep. (Mon, 17 Mar 2014 14:50:02 GMT)

Acknowledgement sent to Norihiro Tanaka <noritnk <at> kcn.ne.jp>:
New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Mon, 17 Mar 2014 14:50:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: submit <at> debbugs.gnu.org
Subject: [PATCH] grep: matching line-by-line with regex
Date: Mon, 17 Mar 2014 23:49:20 +0900
Package: grep
Tags: patch

I ran the following test, which uses the regex engine in a non-UTF-8 locale.

$ yes abcd.abc | head -10000 > m
$ time -p env LC_ALL=ja_JP.eucJP src/grep abcd.abd m
real 7.28
user 6.36
sys 0.57

It's extremely slow.  When grep uses the regex engine, the text is
split into lines.  However, the whole buffer is passed to re_search and
re_match.  I think that is wrong.

Norihiro
[patch1.txt (text/plain, attachment)]
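A minimal sketch of the line-by-line matching the report asks for, assuming glibc's GNU regex interface (re_compile_pattern, re_search) and POSIX basic syntax. It is an illustration only, not the attached patch1.txt, and the helper names compile_basic and count_matching_lines are hypothetical. Each re_search call receives one line as its whole string, so the matcher cannot look at the rest of the buffer.

```c
#define _GNU_SOURCE
#include <regex.h>
#include <string.h>

/* Hypothetical helper: compile PATTERN as a POSIX basic regular
   expression into *PB using the GNU regex interface.  Returns NULL on
   success, otherwise an error message.  */
static const char *
compile_basic (struct re_pattern_buffer *pb, const char *pattern)
{
  /* re_compile_pattern expects a zeroed buffer (no preallocated
     storage, no fastmap, no translate table).  */
  memset (pb, 0, sizeof *pb);
  re_set_syntax (RE_SYNTAX_POSIX_BASIC);
  return re_compile_pattern (pattern, strlen (pattern), pb);
}

/* Hypothetical helper: count the lines of BUF (LEN bytes) that match PB.
   Each re_search call receives a single line, without its trailing
   newline, as the whole string, so the matcher never reads past the end
   of that line and a match cannot span two lines.  */
static long
count_matching_lines (struct re_pattern_buffer *pb,
                      const char *buf, size_t len)
{
  long count = 0;
  const char *p = buf;
  const char *end = buf + len;

  while (p < end)
    {
      const char *nl = memchr (p, '\n', end - p);
      const char *line_end = nl ? nl : end;
      int line_len = (int) (line_end - p);

      /* STRING is the line itself; LENGTH and RANGE are limited to it.
         re_search returns the match start, -1 for no match, or -2 on
         internal error.  */
      if (re_search (pb, p, line_len, 0, line_len, NULL) >= 0)
        count++;

      p = nl ? nl + 1 : end;
    }
  return count;
}
```

The design choice here is to narrow the string itself, not just the start and range arguments: if the whole buffer were still passed as the string, the matcher would still have the rest of the buffer in view. Release each compiled pattern with regfree when done.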

Information forwarded to bug-grep <at> gnu.org:
bug#17025; Package grep. (Tue, 01 Apr 2014 09:11:01 GMT)

Message #8 received at 17025 <at> debbugs.gnu.org:

From: Paolo Bonzini <bonzini <at> gnu.org>
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>, 17025 <at> debbugs.gnu.org
Subject: Re: bug#17025: [PATCH] grep: matching line-by-line with regex
Date: Tue, 01 Apr 2014 11:10:38 +0200
On 17/03/2014 15:49, Norihiro Tanaka wrote:
> Package: grep
> Tags: patch
>
> I ran the following test, which uses the regex engine in a non-UTF-8 locale.
>
> $ yes abcd.abc | head -10000 > m
> $ time -p env LC_ALL=ja_JP.eucJP src/grep abcd.abd m
> real 7.28
> user 6.36
> sys 0.57
>
> It's extremely slow.  When grep uses the regex engine, the text is
> split into lines.  However, the whole buffer is passed to re_search and
> re_match.  I think that is wrong.

Yes, very good catch.

It's likely that the old bytecode matcher didn't care, but the newer
matcher in glibc has to process even the "ignored" part of the buffer
(beyond the current line) to find the boundaries of multibyte characters.

Paolo





Reply sent to Paul Eggert <eggert <at> cs.ucla.edu>:
You have taken responsibility. (Sun, 06 Apr 2014 05:28:02 GMT)

Notification sent to Norihiro Tanaka <noritnk <at> kcn.ne.jp>:
bug acknowledged by developer. (Sun, 06 Apr 2014 05:28:03 GMT)

Message #13 received at 17025-done <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>, 17025-done <at> debbugs.gnu.org
Cc: Paolo Bonzini <bonzini <at> gnu.org>, 17156 <at> debbugs.gnu.org
Subject: Re: bug#17025: [PATCH] grep: matching line-by-line with regex
Date: Sat, 05 Apr 2014 22:26:56 -0700
Thanks for this bug report and patch.  Paolo wrote it up in 
<http://bugs.gnu.org/17156#14>, and I installed it into the savannah 
grep master and am marking Bug#17025 as done.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sun, 04 May 2014 11:24:04 GMT)




GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.