GNU bug report logs -
#24858
URGENT: Question about grep
Reported by: Greta <romano.greta <at> gmail.com>
Date: Wed, 2 Nov 2016 15:35:02 UTC
Severity: normal
Tags: notabug
Done: Eric Blake <eblake <at> redhat.com>
Bug is archived. No further changes may be made.
Message #21 received at submit <at> debbugs.gnu.org:
Greta asked:
>> So what I have to add in grep command to put the limit of 30 characters?
Eric replied:
>> You can't do it with grep.
Bruce suggested:
>> cut -c 30 filename | grep ACGTAC
The following grep command seems to work for me. On my system, with a large
dataset I have (some web server logs), it is about 40% faster in user CPU time
than cut and grep in a pipeline, because the extra CPU cost of the more complex
grep expression is more than offset by the reduced copying of the data stream:
grep -E '^.{0,30}GTGTCA'
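As a rough illustration of how the two approaches differ, here is a minimal
sketch on made-up sample data (the file contents, the pad helper and the
variable names are assumptions, not from the thread). It assumes the cut
pipeline was meant as cut -c 1-30, since cut -c 30 on its own selects only the
30th character. Note that '^.{0,30}GTGTCA' lets the motif start anywhere in
the first 31 columns, so it can extend to column 36; '^.{0,24}GTGTCA' is the
variant that keeps the whole 6-character motif inside the first 30 columns,
which is what the cut pipeline does.

```shell
#!/bin/sh
# Hypothetical sample data: the motif GTGTCA placed at four different offsets.
sample=$(mktemp)

# pad N: print N copies of 'A' (N must be at least 1).
pad() { printf "%${1}s" '' | tr ' ' 'A'; }

{
  echo "GTGTCA$(pad 10)"            # motif in columns 1-6
  echo "$(pad 24)GTGTCA$(pad 10)"   # motif in columns 25-30
  echo "$(pad 30)GTGTCA$(pad 10)"   # motif in columns 31-36
  echo "$(pad 40)GTGTCA"            # motif in columns 41-46
} > "$sample"

# Single grep, motif may START within the first 31 columns.
grep_wide=$(grep -c -E '^.{0,30}GTGTCA' "$sample")

# Single grep, motif must lie ENTIRELY within the first 30 columns.
grep_tight=$(grep -c -E '^.{0,24}GTGTCA' "$sample")

# Pipeline: truncate each line to 30 columns, then search the remainder.
cut_pipe=$(cut -c 1-30 "$sample" | grep -c 'GTGTCA')

echo "$grep_wide $grep_tight $cut_pipe"   # prints: 3 2 2
rm -f "$sample"
```

To compare CPU cost on a real dataset, each variant can be run under time(1)
against the same input file; the 40% figure above is specific to one system
and one dataset.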
===
A custom C program could make this dramatically faster, especially if:
 * it avoided using stdio or any other form of line buffering that copied
   each line of data within the application,
 * it used raw read(2) calls,
 * it used strchr(3) calls to scan to the end of the current line (hence the
   start of the next line), and
 * it used a mix of strchr and unaligned word compares, say of the 4 bytes
   "ACGT", then the 2 bytes "AC", which can be done on CPUs supporting
   unaligned word compares.
Finding a programmer who can code that might be difficult, and
such optimization would only make sense if you're burning lots of
CPU time or project time on this particular scan.
--
Paul Jackson
pj <at> usa.net