GNU bug report logs - #32073
Improvements in Grep


Package: grep;

Reported by: Sergiu Hlihor <sh <at> discovergy.com>

Date: Fri, 6 Jul 2018 21:32:02 UTC

Severity: wishlist



Message #52 received at 32073 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Sergiu Hlihor <sh <at> discovergy.com>
Cc: 32073 <at> debbugs.gnu.org, Paul Eggert <eggert <at> cs.ucla.edu>,
 Dennis Clarke <dclarke <at> blastwave.org>
Subject: Re: Improvements in Grep (Bug#32073)
Date: Wed, 1 Jan 2020 16:51:00 -0800

On Wed, Jan 1, 2020 at 12:04 PM Sergiu Hlihor <sh <at> discovergy.com> wrote:
> Paul, I have to correct you. On a production server you usually have a mix of applications, often including databases. For databases, having a read ahead means one IO less since usually database access patterns are random reads. Here it is actually best to disable read-ahead completely. In fact, I would say it is probably best to disable read-ahead completely and let applications deal with it, either automatically, for example by reading the optimal IO block size from the device, or through configuration with defaults good enough for today's servers. If you configure the OS to do read-ahead across all HDDs, you potentially induce unnecessary IO load for every application that uses them, which is totally unacceptable with HDDs. That is why the best approach is application-specific, ideally configured to use the optimal IO block size.
>
> So no, letting the OS do it is stupid.
>
> On Wed, 1 Jan 2020 at 20:42, Paul Eggert <eggert <at> cs.ucla.edu> wrote:
>>
>> On 1/1/20 1:15 AM, Sergiu Hlihor wrote:
>> > If you rely on OS, then
>> > you are at the mercy of whatever read ahead configuration you have.
>>
>> Right, and whatever changes you make to the OS and its read-ahead configuration
>> will work for all applications, not just for 'grep'. So, change the OS to do
>> that. There shouldn't be a need to change 'grep' in particular (or 'cp' in
>> particular, or 'awk' in particular, etc.).
>>
>> > The issue of large
>> > block sizes for IO operations is widespread across all tools from Linux,
>> like rsync or cp and it's only getting worse
>>
>> Quite right. And it would be painful to have to modify all those tools, and to
>> maintain those modifications. So modify the OS instead. Scheduling read-ahead is
>> really the OS's job anyway.

Hi Sergiu,

If you would like to help make grep use larger buffer sizes, please
run and report benchmarks measuring how much of a difference it would
make, at least for your hardware. Here are some of the tests I ran to
justify raising it from ~32k to ~96k:
https://lists.gnu.org/archive/html/grep-devel/2018-10/msg00002.html
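
For anyone who wants a starting point for such measurements, here is a minimal sketch in Python. It is not the method used in the tests linked above and it does not exercise grep itself; it only times raw sequential reads of one file at several request sizes. The helper names (time_read, benchmark), the 16 MiB scratch file, and the list of sizes are all assumptions chosen for illustration.

```python
import os
import tempfile
import time


def time_read(path, buf_size):
    """Read the whole file with read() requests of buf_size bytes.

    Returns (elapsed_seconds, total_bytes_read).
    """
    total = 0
    start = time.perf_counter()
    # buffering=0 yields a raw file object, so Python adds no buffer of
    # its own and each read() asks the OS for at most buf_size bytes.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(buf_size)
            if not chunk:
                break
            total += len(chunk)
    return time.perf_counter() - start, total


def benchmark(path, sizes):
    """Return {buf_size: (seconds, bytes_read)} for each size in sizes."""
    return {size: time_read(path, size) for size in sizes}


if __name__ == "__main__":
    # Hypothetical input: a 16 MiB scratch file. Replace it with a large
    # file on the device you actually want to measure.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(16 * 1024 * 1024))
        path = tmp.name
    try:
        # st_blksize is the preferred I/O size reported by the filesystem
        # (not available on every platform, hence getattr).
        print("st_blksize:", getattr(os.stat(path), "st_blksize", None))
        sizes = [32 * 1024, 96 * 1024, 1024 * 1024]
        for size, (secs, nbytes) in benchmark(path, sizes).items():
            print(f"{size // 1024:5d} KiB: {secs:.4f} s, "
                  f"{nbytes / secs / 1e6:.1f} MB/s")
    finally:
        os.unlink(path)
```

A file that was just written or recently read is served from the page cache, so numbers from the scratch file above say little about HDD or RAID behavior. For a meaningful report, use a file much larger than RAM or clear the cache between runs, and repeat each size several times.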






