GNU bug report logs - #32073
Improvements in Grep


Package: grep

Reported by: Sergiu Hlihor <sh <at> discovergy.com>

Date: Fri, 6 Jul 2018 21:32:02 UTC

Severity: wishlist



Message #43 received at 32073 <at> debbugs.gnu.org (full text, mbox):

From: arnold <at> skeeve.com
To: sh <at> discovergy.com, arnold <at> skeeve.com
Cc: 32073 <at> debbugs.gnu.org, eggert <at> cs.ucla.edu
Subject: Re: bug#32073: Improvements in Grep (Bug#32073)
Date: Wed, 01 Jan 2020 13:24:26 -0700

Hi.

Sergiu Hlihor <sh <at> discovergy.com> wrote:

> Arnold, there is no need to write user code, it is already done in
> benchmarks. One of the standard benchmarks when testing HDDs and SSDs is
> read throughput vs block size and at different queue depths.

I think you're misunderstanding me, or I am misunderstanding you.

As the gawk maintainer, I can choose the buffer size to use every time
I issue a read(2) system call for any given input file.  Gawk currently
uses the smaller of (a) the file's size or (b) the st_blksize member of
the struct stat structure.
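
As an illustration of the rule just described, here is a minimal C sketch.
It is not gawk's actual source: the function names pick_bufsize and
bufsize_for_fd are hypothetical, and the 512-byte fallback and the handling
of empty or non-regular files are assumptions made only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical helper: the smaller of the file size and the preferred
   I/O block size.  A non-positive blksize falls back to 512 bytes, and a
   non-positive file_size is ignored (both are assumptions of this sketch). */
size_t pick_bufsize(off_t file_size, long blksize)
{
    size_t buf = blksize > 0 ? (size_t) blksize : 512;

    if (file_size > 0 && file_size < (off_t) buf)
        buf = (size_t) file_size;
    return buf;
}

/* Hypothetical wrapper: ask the kernel about an open descriptor. */
size_t bufsize_for_fd(int fd)
{
    struct stat st;

    assert(fd >= 0);
    if (fstat(fd, &st) != 0)
        return pick_bufsize(0, 0);      /* fstat failed: use the fallback */

    /* st_size is only meaningful for regular files. */
    off_t size = S_ISREG(st.st_mode) ? st.st_size : 0;
    return pick_bufsize(size, (long) st.st_blksize);
}
```

Note that st_blksize is whatever the file system reports as its preferred
I/O size; nothing here measures the underlying storage hardware.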

If I understand you correctly, this is "not enough"; gawk (grep,
cp, etc.) should all use an optimal buffer size that depends upon the
underlying storage hardware where the file is located.

So far, so good, except for: How do I determine what that number is?
I cannot run a benchmark before opening each and every file. I don't
know of a system call that will give me that number. (If there is,
please point me to it.)

Do you just want a command line option or environment variable
that you, as the application user, can set?

If the latter, it happens that gawk will let you set AWKBUFSIZE and
it will use whatever number you supply for doing reads. (This is
even documented.)
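
For readers wondering what such a user-supplied override might look like
inside a program, here is a rough C sketch. It is not gawk's implementation:
parse_bufsize and bufsize_from_env are hypothetical names, and the policy of
silently falling back on invalid input is an assumption of this example.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: parse a positive decimal buffer size.  Returns
   `fallback` if the text is missing, malformed, zero, or out of range. */
size_t parse_bufsize(const char *val, size_t fallback)
{
    char *end;
    unsigned long n;

    /* Require a leading digit so that "-1" or " 5" is rejected. */
    if (val == NULL || !isdigit((unsigned char) val[0]))
        return fallback;

    errno = 0;
    n = strtoul(val, &end, 10);
    if (errno != 0 || *end != '\0' || n == 0)
        return fallback;
    return (size_t) n;
}

/* Hypothetical wrapper: look the size up in the named environment variable. */
size_t bufsize_from_env(const char *name, size_t fallback)
{
    assert(name != NULL);
    return parse_bufsize(getenv(name), fallback);
}
```

A caller could then write something like
bufsize_from_env("AWKBUFSIZE", default_size), where default_size is whatever
the program would otherwise have used.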

HTH,

Arnold






