GNU bug report logs - #73721
grep perf docs barely mention mem usage

Package: grep;

Reported by: <mark.yagnatinsky <at> barclays.com>

Date: Wed, 9 Oct 2024 17:03:02 UTC

Severity: normal

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

From: help-debbugs <at> gnu.org (GNU bug Tracking System)
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: tracker <at> debbugs.gnu.org
Subject: bug#73721: closed (grep perf docs barely mention mem usage)
Date: Wed, 09 Oct 2024 19:04:02 +0000
[Message part 1 (text/plain, inline)]
Your message dated Wed, 9 Oct 2024 12:03:16 -0700
with message-id <f08d0761-ef91-41b6-a7a8-519203bda62b <at> cs.ucla.edu>
and subject line Re: bug#73721: grep perf docs barely mention mem usage
has caused the debbugs.gnu.org bug report #73721,
regarding grep perf docs barely mention mem usage
to be marked as done.

(If you believe you have received this mail in error, please contact
help-debbugs <at> gnu.org.)


-- 
73721: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=73721
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
From: <mark.yagnatinsky <at> barclays.com>
To: <bug-grep <at> gnu.org>
Subject: grep perf docs barely mention mem usage
Date: Wed, 9 Oct 2024 17:01:22 +0000
[Message part 3 (text/plain, inline)]
This page: https://www.gnu.org/software/grep/manual/html_node/Performance.html
discusses how to get good performance out of grep.
That is nice, but the perf advice focuses almost entirely on speed.
I'm more interested in how to avoid excessive memory usage.
After a bit of research, it seems that once upon a time, grep used mmap where possible, but it no longer does this.
Thus, peak memory usage will be proportional to the length of the longest line in the file.
Thus, if I use the "-z multiline hack" to search across lines, grep will read the whole file into memory.
Thus, if I try this on a huge file, I will likely have a bad time.
(e.g., a 5 gig file would fail in a 32-bit grep, and would increase memory pressure on the system on a 64-bit grep.)

Is the above about right?

[Message part 5 (message/rfc822, inline)]
From: Paul Eggert <eggert <at> cs.ucla.edu>
To: mark.yagnatinsky <at> barclays.com
Cc: 73721-done <at> debbugs.gnu.org
Subject: Re: bug#73721: grep perf docs barely mention mem usage
Date: Wed, 9 Oct 2024 12:03:16 -0700
On 2024-10-09 10:01, mark.yagnatinsky--- via Bug reports for GNU grep wrote:
> After a bit of research, it seems that once upon a time, grep used mmap where possible, but it no longer does this.
> Thus, peak memory usage will be proportional to the length of the longest line in the file.
> Thus, if I use the "-z multiline hack" to search across lines, grep will read the whole file into memory.
> Thus, if I try this on a huge file, I will likely have a bad time.
> (e.g., a 5 gig file would fail in a 32-bit grep, and would increase memory pressure on the system on a 64-bit grep.)
> 
> Is the above about right?

Sounds right.

mmap likely wouldn't help much. As I recall, it typically made 'grep' 
slower.
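
To make the point about memory concrete: if peak usage tracks the longest line, then the size of the longest record in a file gives a rough lower bound on the buffer grep needs. The Python sketch below is not part of grep and is only an illustration. The function name longest_record and its parameters are hypothetical. It scans a file in fixed-size chunks, so the check itself uses little memory, and reports the longest record for a given one-byte delimiter. Passing a NUL delimiter mimics what -z does: a text file with no NUL bytes becomes a single record as long as the whole file.

```python
import io


def longest_record(stream, delimiter=b"\n", chunk_size=1 << 16):
    """Return the length in bytes of the longest record in a binary stream.

    Records are separated by a single-byte delimiter. The stream is read
    in fixed-size chunks, so memory use stays bounded by chunk_size.
    """
    if len(delimiter) != 1:
        raise ValueError("delimiter must be exactly one byte")
    longest = 0   # longest completed record seen so far
    current = 0   # length of the record currently being scanned
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        start = 0
        while True:
            pos = chunk.find(delimiter, start)
            if pos == -1:
                # No more delimiters in this chunk; the record continues.
                current += len(chunk) - start
                break
            current += pos - start
            longest = max(longest, current)
            current = 0
            start = pos + 1
    # Account for a final record that has no trailing delimiter.
    return max(longest, current)


if __name__ == "__main__":
    data = b"ab\ncdef\ng"
    # Newline-delimited: the longest line is "cdef".
    print(longest_record(io.BytesIO(data)))                   # 4
    # NUL-delimited, as with -z: no NUL bytes, so one 9-byte record.
    print(longest_record(io.BytesIO(data), delimiter=b"\0"))  # 9
```

For a real file, open it in binary mode and pass the file object, for example longest_record(open(path, "rb"), delimiter=b"\0"), where path is a placeholder. A result close to the file size suggests that a -z search would need to hold roughly the whole file in memory.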


This bug report was last modified 278 days ago.

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.