GNU bug report logs - #20768
RFC: Multithreaded grep


Package: grep;

Reported by: Zev Weiss <zev <at> bewilderbeest.net>

Date: Mon, 8 Jun 2015 05:33:01 UTC

Severity: normal

Tags: patch


From: Zev Weiss <zev <at> bewilderbeest.net>
To: Aaron Crane <grep <at> aaroncrane.co.uk>
Cc: 20768 <at> debbugs.gnu.org
Subject: bug#20768: RFC: Multithreaded grep
Date: Tue, 9 Jun 2015 14:41:32 -0500
On Tue, Jun 09, 2015 at 12:04:11PM +0100, Aaron Crane wrote:
>Zev Weiss <zev <at> bewilderbeest.net> wrote:
>> Hmm -- I picked --parallel largely for consistency with the corresponding
>> flag for coreutils' sort, which strikes me as a closer relative to grep than
>> either make or parallel.
>
>That's a good point; I wasn't aware of sort's --parallel option.
>Though I also note that "sort --parallel=4" limits the number of
>threads to 4, rather than increasing the number of threads from 1 to
>4, so the comparison isn't exact.
>
>> sort doesn't
>> have a matching short option though, so I went with -M to suggest
>> "multithreaded" (since, as you point out, -P is already in use).  Though I
>> notice now that lower-case -p is still available; perhaps that might be
>> better than -M.
>
>I'm a little unhappy about the idea of proliferating the world's set
>of short options in this space, to be honest. If grep didn't already
>have -P, I'd be happy enough with -P and either --parallel or
>--max-procs, but I'm not terribly fond of the idea of introducing
>either -M or -p.
>
>-- 
>Aaron Crane ** http://aaroncrane.co.uk/

True, I suppose that's a reasonable concern (especially given how many 
there are now).  My thought was that at least for me (and it sounds like 
perhaps Paul as well) this would be fairly likely to be a commonly used 
option, so I'd like a nice concise way of enabling it.  With sort 
there's no real downside to just enabling multithreading by default, so 
a longopt-only flag is fine.  With grep, however (at least with my 
current implementation) there are tradeoffs with output ordering that 
may be undesirable (and which I don't see a good way around without 
introducing a bunch of potentially-complicated and performance-reducing 
per-file output buffering), so I kept it off by default.
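
As a rough illustration of the per-file output buffering mentioned above 
(not code from the patch): each file's matches go into a private 
in-memory buffer, and the buffers are written out in command-line order 
once they are complete.  The names struct job, search_job and 
flush_in_order are hypothetical, the "files" are in-memory line arrays, 
and no threads are started; the sketch only shows that completion order 
stops affecting output order, at the cost of holding each file's output 
in memory.

```c
#define _POSIX_C_SOURCE 200809L  /* for open_memstream */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-file job: matches are collected in a private buffer
   instead of being written straight to stdout.  */
struct job
{
  const char *name;          /* file name used as the output prefix */
  const char *const *lines;  /* NULL-terminated stand-in for file contents */
  char *buf;                 /* buffered output, owned by the job */
  size_t len;                /* number of bytes in buf */
};

/* Search one "file" for a fixed substring and append "name:line\n" to
   the job's buffer for every matching line.  Returns 0 on success.  */
static int
search_job (struct job *j, const char *pattern)
{
  assert (j != NULL && pattern != NULL);
  FILE *out = open_memstream (&j->buf, &j->len);
  if (!out)
    return -1;
  for (const char *const *l = j->lines; *l; l++)
    if (strstr (*l, pattern))
      fprintf (out, "%s:%s\n", j->name, *l);
  return fclose (out);  /* finalizes j->buf and j->len */
}

/* Write the buffers to DEST in command-line order, regardless of the
   order in which the jobs finished, and release them.  */
static void
flush_in_order (struct job *jobs, size_t n, FILE *dest)
{
  for (size_t i = 0; i < n; i++)
    {
      fwrite (jobs[i].buf, 1, jobs[i].len, dest);
      free (jobs[i].buf);
      jobs[i].buf = NULL;
      jobs[i].len = 0;
    }
}
```

In a threaded version each worker would call search_job on its own job, 
and only the flushing thread would touch the shared output stream.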

There's also the question of the argument parsing mentioned in my 
original email -- as it stands now, '-M' would be the only short option 
with an optional argument, which has the potential to be confusing.  
Thinking about it a bit more, I realize that what I really want out of 
the short flag is just a shorter way to say --parallel=NUMCPUS (and not 
have to remember how many CPUs the machine I'm on has), so perhaps 
another possibility on that front would be to leave the long option 
as-is but have the short flag (assuming there is one) not take an 
argument (though I suppose that could perhaps be seen as confusing in 
its own way too).
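
The last alternative could be sketched with getopt_long roughly as 
follows.  This is an illustration under stated assumptions, not the 
patch's actual parsing code: parse_threads and num_cpus are hypothetical 
helpers, --parallel is assumed to fall back to the CPU count when given 
without a value, and sysconf (_SC_NPROCESSORS_ONLN) is a widely 
available extension rather than strict POSIX.  The long option keeps its 
optional argument, while the short flag takes none and simply means "one 
thread per online CPU".

```c
#include <assert.h>
#include <getopt.h>
#include <limits.h>
#include <stdlib.h>
#include <unistd.h>

enum { PARALLEL_OPTION = CHAR_MAX + 1 };  /* long-only option code */

/* Hypothetical helper: number of online CPUs, never less than 1.  */
static long
num_cpus (void)
{
  long n = sysconf (_SC_NPROCESSORS_ONLN);
  return n > 0 ? n : 1;
}

/* Hypothetical helper: return the requested thread count, 1 if no
   parallelism option was given, or -1 on a usage error.  */
static long
parse_threads (int argc, char **argv)
{
  static const struct option longopts[] = {
    {"parallel", optional_argument, NULL, PARALLEL_OPTION},
    {NULL, 0, NULL, 0}
  };
  long threads = 1;
  int c;

  assert (argc >= 1 && argv != NULL);
  optind = 0;  /* restart scanning; honoured by glibc and musl */

  /* "M" has no trailing colon, so -M never consumes an argument.  */
  while ((c = getopt_long (argc, argv, "M", longopts, NULL)) != -1)
    switch (c)
      {
      case 'M':
        threads = num_cpus ();
        break;

      case PARALLEL_OPTION:
        if (optarg)  /* --parallel=N */
          {
            char *end;
            long n = strtol (optarg, &end, 10);
            if (end == optarg || *end != '\0' || n < 1)
              return -1;
            threads = n;
          }
        else  /* bare --parallel: assumed to mean the CPU count */
          threads = num_cpus ();
        break;

      default:
        return -1;
      }
  return threads;
}
```

With this arrangement "-M 4" is read as -M followed by the operand "4", 
which is the kind of surprise the paragraph above alludes to; an 
explicit count has to be spelled --parallel=4.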


Zev





This bug report was last modified 341 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.