GNU bug report logs -
#73721
grep perf docs barely mention mem usage
Reported by: <mark.yagnatinsky <at> barclays.com>
Date: Wed, 9 Oct 2024 17:03:02 UTC
Severity: normal
Done: Paul Eggert <eggert <at> cs.ucla.edu>
Bug is archived. No further changes may be made.
[Message part 1 (text/plain, inline)]
Your message dated Wed, 9 Oct 2024 12:03:16 -0700
with message-id <f08d0761-ef91-41b6-a7a8-519203bda62b <at> cs.ucla.edu>
and subject line Re: bug#73721: grep perf docs barely mention mem usage
has caused the debbugs.gnu.org bug report #73721,
regarding grep perf docs barely mention mem usage
to be marked as done.
(If you believe you have received this mail in error, please contact
help-debbugs <at> gnu.org.)
--
73721: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=73721
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
[Message part 3 (text/plain, inline)]
This page: https://www.gnu.org/software/grep/manual/html_node/Performance.html
discusses how to get good performance out of grep.
That is nice, but the perf advice focuses almost entirely on speed.
I'm more interested in how to avoid excessive memory usage.
After a bit of research, it seems that once upon a time, grep used mmap where possible, but it no longer does this.
Thus, peak memory usage will be proportional to the length of the longest line in the file.
Thus, if I use the "-z multiline hack" to search across lines, grep will read the whole file into memory.
Thus, if I try this on a huge file, I will likely have a bad time.
(e.g., a 5 gig file would fail in a 32-bit grep, and would increase memory pressure on the system on a 64-bit grep.)
Is the above about right?
[Message part 5 (message/rfc822, inline)]
On 2024-10-09 10:01, mark.yagnatinsky--- via Bug reports for GNU grep wrote:
> After a bit of research, it seems that once upon a time, grep used mmap where possible, but it no longer does this.
> Thus, peak memory usage will be proportional to the length of the longest line in the file.
> Thus, if I use the "-z multiline hack" to search across lines, grep will read the whole file into memory.
> Thus, if I try this on a huge file, I will likely have a bad time.
> (e.g., a 5 gig file would fail in a 32-bit grep, and would increase memory pressure on the system on a 64-bit grep.)
>
> Is the above about right?
Sounds right.
mmap likely wouldn't help much. As I recall, it typically made 'grep'
slower.
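
To make the reasoning in this thread concrete, here is a minimal Python sketch (not part of grep, and not how grep accounts for memory internally) that measures the longest record in a stream under a given terminator byte. The function name longest_record and its parameters are hypothetical. With newline as the terminator it reports the longest line; with NUL as the terminator, which is what -z selects, a text file containing no NUL bytes is one single record as long as the whole file. That length is only a rough lower bound on the buffer a line-oriented tool would need.

```python
import io

def longest_record(stream, delimiter=b"\n", chunk_size=1 << 16):
    """Return the length in bytes of the longest delimiter-terminated record.

    `delimiter` must be a single byte. The stream is read in fixed-size
    chunks, so this scan itself uses constant memory no matter how long
    the records are.
    """
    longest = 0
    current = 0  # bytes seen so far in the record being scanned
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        start = 0
        while True:
            pos = chunk.find(delimiter, start)
            if pos == -1:
                # No terminator in the rest of this chunk: record continues.
                current += len(chunk) - start
                break
            current += pos - start
            longest = max(longest, current)
            current = 0
            start = pos + 1
    # The final record may not end with a terminator.
    return max(longest, current)

# Illustrative data: three newline-terminated lines, no NUL bytes.
data = b"short\n" + b"x" * 1000 + b"\nend\n"
print(longest_record(io.BytesIO(data), b"\n"))  # 1000: the longest line
print(longest_record(io.BytesIO(data), b"\0"))  # 1011: the whole input
```

For a real file, open it in binary mode and pass the file object as the stream. If the NUL-terminated figure is close to the file size, a -z search should be expected to hold roughly the whole file in memory, which matches the concern raised above about multi-gigabyte inputs.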
This bug report was last modified 278 days ago.