On Wed, Jan 1, 2020 at 12:04 PM Sergiu Hlihor <sh@discovergy.com> wrote:
> Paul, I have to correct you. A production server usually runs a mix of applications, often including databases. For databases, having a read ahead means one IO less since database access patterns are usually random reads. Here it is actually best to disable read ahead completely. In fact, I have to say that it is probably best to disable read ahead completely and let applications deal with it, either automatically, for example by reading the optimal IO block size from the device, or in a configurable way with defaults good enough for today's servers. If you now configure the OS to do a read ahead that hits all HDDs, you potentially induce unnecessary IO load for all applications which use it, which is totally unacceptable with HDDs. That's why it is best to be application specific, ideally configured to use the optimal IO block size.
>
> So no, letting the OS do it is stupid.
>
> On Wed, 1 Jan 2020 at 20:42, Paul Eggert <eggert@cs.ucla.edu> wrote:
>>
>> On 1/1/20 1:15 AM, Sergiu Hlihor wrote:
>> > If you rely on OS, then
>> > you are at the mercy of whatever read ahead configuration you have.
>>
>> Right, and whatever changes you make to the OS and its read-ahead configuration
>> will work for all applications, not just for 'grep'. So, change the OS to do
>> that. There shouldn't be a need to change 'grep' in particular (or 'cp' in
>> particular, or 'awk' in particular, etc.).
>>
>> > The issue of large
>> > block sizes for IO operations is widespread across all tools from Linux,
>> > like rsync or cp and it's only getting worse
>>
>> Quite right. And it would be painful to have to modify all those tools, and to
>> maintain those modifications. So modify the OS instead. Scheduling read-ahead is
>> really the OS's job anyway.
Hi Sergiu,
If you would like to help make grep use larger buffer sizes, please
run and report benchmarks measuring how much of a difference it would
make, at least for your hardware. Here are some of the tests I ran to
justify raising it from ~32k to ~96k:
https://lists.gnu.org/archive/html/grep-devel/2018-10/msg00002.html
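
As a starting point, here is a minimal sketch of that kind of measurement. It assumes Python 3 on a POSIX system and only times plain read() calls at a few buffer sizes; it is not grep's code and not the method used in the tests linked above. The function names (time_read, compare, report) are hypothetical, the 32 KiB and 96 KiB sizes mirror the values mentioned above, and the 1 MiB size is an arbitrary assumption for comparison.

```python
import os
import time


def time_read(path, bufsize):
    """Read all of `path` using read() calls of `bufsize` bytes.

    Returns (total_bytes_read, elapsed_seconds).
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        total = 0
        start = time.perf_counter()
        while True:
            chunk = os.read(fd, bufsize)
            if not chunk:  # empty bytes object means end of file
                break
            total += len(chunk)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return total, elapsed


def compare(path, sizes=(32 * 1024, 96 * 1024, 1024 * 1024)):
    """Time one full read of `path` per buffer size.

    The default sizes are illustrative assumptions, not recommendations.
    Returns a dict mapping buffer size to (total_bytes, elapsed_seconds).
    """
    return {size: time_read(path, size) for size in sizes}


def report(path):
    """Print one line per buffer size with the measured throughput."""
    for size, (total, elapsed) in compare(path).items():
        rate = total / (1024 * 1024) / elapsed if elapsed > 0 else float("inf")
        print(f"{size // 1024:>5} KiB buffer: {total} bytes "
              f"in {elapsed:.3f} s ({rate:.1f} MiB/s)")
```

Calling report() on a file much larger than RAM, or after dropping the page cache between runs, gives numbers that reflect the device rather than cached data; otherwise every run after the first mostly measures memory copies. Results also depend on the kernel's read-ahead setting for the device, so please report that along with the hardware.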