GNU bug report logs - #24858
URGENT: Question about grep


Package: grep;

Reported by: Greta <romano.greta <at> gmail.com>

Date: Wed, 2 Nov 2016 15:35:02 UTC

Severity: normal

Tags: notabug

Done: Eric Blake <eblake <at> redhat.com>

Bug is archived. No further changes may be made.


Message #21 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Paul Jackson <pj <at> usa.net>
To: bug-grep <at> gnu.org
Subject: Re: bug#24858: URGENT: Question about grep
Date: Wed, 02 Nov 2016 12:24:58 -0500

Greta asked:
>> So what I have to add in grep command to put the limit of 30 characters?

Eric replied:
>> You can't do it with grep. 

Bruce suggested:
>> cut -c 30 filename | grep ACGTAC

Using the following grep command seems to work for me. On a large dataset
I have (some web server logs), it is about 40% faster on my system, in
terms of user CPU time, than using cut and grep in a pipeline. The extra
CPU cost of the more complex grep expression is more than compensated for
by the reduced copying of the data stream:

grep -E '^.{0,30}GTGTCA'

===

A custom C program could make this dramatically faster, especially if:

it avoided using stdio or any other form of line buffering that copied
each line of data within the application,

it used raw read(2) calls,

it used strchr(3) calls to scan to the end of the current line (hence the start
of the next line), and

it used a mix of strchr and unaligned word compares, say of the 4 bytes
"ACGT", then the 2 bytes "AC", which can be done on CPUs supporting
unaligned word compares.
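
As a rough illustration of the approach listed above, here is a minimal
sketch, not a tuned implementation. The function names match_in_prefix
and count_matches are made up for this example. It reads with raw read(2)
into one buffer and never copies individual lines. Two substitutions are
worth naming: it uses memchr(3) rather than strchr(3), because a buffer
filled by read(2) is not NUL-terminated, and it uses memcmp(3) in place of
hand-written unaligned word compares. It also assumes single-byte
characters, counts matching lines instead of printing them, gives up on
any line longer than its 64 KiB buffer, and does not retry on EINTR.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Return 1 if pat (patlen bytes) starts at some byte offset 0..max_off
 * within line[0..len), else 0.  This mirrors '^.{0,max_off}PAT' when
 * every character is a single byte (an assumption of this sketch). */
int match_in_prefix(const char *line, size_t len,
                    const char *pat, size_t patlen, size_t max_off)
{
    assert(line != NULL && pat != NULL);
    if (patlen == 0)
        return 1;                       /* empty pattern matches any line */
    if (len < patlen)
        return 0;                       /* line too short to hold pat */

    size_t last = len - patlen;         /* last offset where pat still fits */
    if (last > max_off)
        last = max_off;

    const char *p = line;
    const char *end = line + last + 1;  /* one past the last candidate start */
    /* Jump to each occurrence of the first pattern byte, then compare. */
    while (p < end && (p = memchr(p, pat[0], (size_t)(end - p))) != NULL) {
        if (memcmp(p, pat, patlen) == 0)
            return 1;
        p++;
    }
    return 0;
}

/* Count the lines read from fd that match, using read(2) directly.
 * Returns the count, or -1 on a read error or an over-long line. */
long count_matches(int fd, const char *pat, size_t max_off)
{
    char buf[1 << 16];
    size_t patlen = strlen(pat);
    size_t have = 0;                    /* bytes of a partial line carried over */
    long hits = 0;

    for (;;) {
        ssize_t n = read(fd, buf + have, sizeof buf - have);
        if (n < 0)
            return -1;
        have += (size_t)n;

        char *line = buf;
        char *stop = buf + have;
        char *nl;
        /* Scan complete lines in place; no per-line copy is made. */
        while ((nl = memchr(line, '\n', (size_t)(stop - line))) != NULL) {
            hits += match_in_prefix(line, (size_t)(nl - line),
                                    pat, patlen, max_off);
            line = nl + 1;
        }

        have = (size_t)(stop - line);
        if (n == 0) {                   /* EOF: test a final unterminated line */
            if (have > 0)
                hits += match_in_prefix(line, have, pat, patlen, max_off);
            return hits;
        }
        if (have == sizeof buf)
            return -1;                  /* a single line filled the buffer */
        memmove(buf, line, have);       /* keep only the partial last line */
    }
}
```

A caller would use it along the lines of
count_matches(STDIN_FILENO, "ACGTAC", 30). Whether this actually beats
grep on a given dataset is something to measure rather than assume.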

Finding a programmer who can code that might be difficult, and
such optimization would only make sense if you're burning lots of
CPU time or project time on this particular scan.

-- 
                Paul Jackson
                pj <at> usa.net

This bug report was last modified 8 years and 255 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.
