GNU bug report logs -
#34133
Huge memory usage and output size when using "H" and "G"
Reported by: Hongxu Chen <leftcopy.chx <at> gmail.com>
Date: Sat, 19 Jan 2019 09:54:01 UTC
Severity: normal
Tags: notabug
Done: Assaf Gordon <assafgordon <at> gmail.com>
Bug is archived. No further changes may be made.
tags 34133 notabug
close 34133
stop
Hello,
On 2019-01-19 2:53 a.m., Hongxu Chen wrote:
> We found an issue that are relevant to use of "H" and "G" for appending
> hold space and pattern space.
It is an "issue" in the sense that your example does consume large
amounts of memory, but it is not a bug - this is how sed works.
> The input file is attached which is a file of 30 lines and 80 columns
> filled with 'a'. And my memory is 64G with equivalent swap.
>
> # these two may eat up the memory
> sed 's/a/d/; G; H;' input
> sed '/b/d; G; H;' input
Let's simplify:
The "s/a/d/" does not change anything related to memory
(it changes a single letter "a" to "d" in the input), so I'll omit it.
The '/b/d' command is a no-op, because your input does not contain
the letter "b".
We're left with:
sed 'G;H'
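A quick way to check this simplification is sketched below. The three-line sample input is made up for illustration (it is not the attached file), and the sketch assumes a sed that accepts the trailing ";" as GNU sed does.

```shell
#!/bin/sh
# Hypothetical stand-in for the attached input: a few lines of 'a', no 'b'.
sample_input() {
    printf '%s\n' aaa aaa aaa
}

plain=$(sample_input | sed 'G;H')
with_noop=$(sample_input | sed '/b/d; G; H;')
# Normalize wc output through arithmetic (some wc versions pad with spaces).
plain_lines=$(( $(sample_input | sed 'G;H' | wc -l) ))
subst_lines=$(( $(sample_input | sed 's/a/d/; G; H;' | wc -l) ))

# '/b/d' never matches, so the output is byte-for-byte the same as 'G;H'.
if [ "$plain" = "$with_noop" ]; then
    echo "'/b/d; G; H;' output matches 'G;H'"
fi
# 's/a/d/' changes one letter per line but not the number of lines.
echo "G;H lines: $plain_lines, s/a/d/;G;H lines: $subst_lines"
```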
The length of each line also doesn't matter, so I'll use shorter lines.
Now observe the following:
$ printf "%s\n" 0 | sed 'G;H' | wc -l
2
$ printf "%s\n" 0 1 | sed 'G;H' | wc -l
6
$ printf "%s\n" 0 1 2 | sed 'G;H' | wc -l
14
$ printf "%s\n" 0 1 2 3 | sed 'G;H' | wc -l
30
$ printf "%s\n" 0 1 2 3 4 | sed 'G;H' | wc -l
62
$ printf "%s\n" 0 1 2 3 4 5 | sed 'G;H' | wc -l
126
$ printf "%s\n" 0 1 2 3 4 5 6 | sed 'G;H' | wc -l
254
$ printf "%s\n" 0 1 2 3 4 5 6 7 | sed 'G;H' | wc -l
510
$ printf "%s\n" 0 1 2 3 4 5 6 7 8 | sed 'G;H' | wc -l
1022
$ printf "%s\n" 0 1 2 3 4 5 6 7 8 9 | sed 'G;H' | wc -l
2046
$ printf "%s\n" 0 1 2 3 4 5 6 7 8 9 10 | sed 'G;H' | wc -l
4094
$ printf "%s\n" 0 1 2 3 4 5 6 7 8 9 10 11 | sed 'G;H' | wc -l
8190
$ printf "%s\n" 0 1 2 3 4 5 6 7 8 9 10 11 12 | sed 'G;H' | wc -l
16382
Notice the trend?
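The number of output lines (and by proxy: size of buffer and memory usage)
grows exponentially with the number of input lines: in the runs above,
n input lines produce 2^(n+1) - 2 output lines.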
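The runs above can be reproduced in a loop and compared against 2^(n+1) - 2, which is the pattern the counts follow. This is only a sketch: the helper name gh_output_lines is made up here, and it assumes seq is available (it is part of GNU coreutils, but not required by POSIX).

```shell
#!/bin/sh
# Count the lines that 'G;H' emits for an n-line input (n >= 1).
gh_output_lines() {
    # Arithmetic expansion strips any padding that wc may add.
    echo $(( $(seq 0 "$(( $1 - 1 ))" | sed 'G;H' | wc -l) ))
}

for n in 1 2 3 4 5 6 7 8 9 10 11 12 13; do
    actual=$(gh_output_lines "$n")
    expected=$(( (1 << (n + 1)) - 2 ))
    printf '%2d input lines -> %5d output lines (2^(n+1)-2 = %d)\n' \
        "$n" "$actual" "$expected"
done
```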
With 20 lines, you'll need on the order of 2^20 = 1M lines in memory
(plus the size of each line, pointer overhead, etc.). Still doable.
With 30 lines, you'll need O(2^30) = 1G of lines.
If each of your lines is 80 characters, you'll need 80GB (before
counting overhead of pointers).
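The same back-of-the-envelope arithmetic can be done in the shell. This is a rough upper bound only: it charges every buffered line the full width, ignores per-line overhead, and assumes the shell does 64-bit arithmetic. The function name estimate_bytes is made up for this sketch.

```shell
#!/bin/sh
# Rough upper bound: 2^lines buffered lines, each `width` bytes wide.
# $1 = number of input lines, $2 = width of each line in bytes.
estimate_bytes() {
    echo $(( (1 << $1) * $2 ))
}

for n in 20 30; do
    bytes=$(estimate_bytes "$n" 80)
    # Shift right by 20 bits to convert bytes to MiB.
    printf '%d input lines: about %d MiB\n' "$n" "$(( bytes >> 20 ))"
done
```

For 30 lines this prints 81920 MiB, i.e. the 80GB figure above.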
> # this is fine
> sed '/a/d; G; H;' input
This is "fine" because the "/a/d" command deletes all lines of your
input, hence nothing is stored in the pattern/hold buffers.
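This is easy to confirm on a small input. The three lines below are a made-up stand-in for the attached file; since "d" ends the cycle for every matching line, "G" and "H" never run and nothing is printed.

```shell
#!/bin/sh
# Every line matches /a/, so 'd' deletes it before 'G; H;' can execute.
deleted_bytes=$(( $(printf '%s\n' aaa aaa aaa | sed '/a/d; G; H;' | wc -c) ))
echo "bytes written: $deleted_bytes"
```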
> I learned from http://www.grymoire.com/Unix/Sed.html that 'G' appends
> hold space to pattern space, and 'H' does the inverse.
> In the first two examples, the buffer of hold space will be appended to
> pattern space, and subsequently content of pattern space will be appended
> to hold space once more. With one more input line, the two buffers will be
> doubled; and as long as the input file is big enough, sed may finally eat
> up the memory and populate the output.
Yes, that's how it works.
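To watch the doubling happen, the pattern space can be dumped once per cycle with the "l" command, with "-n" suppressing the normal output. A small sketch; the function name trace_gh is made up here, and the expected lines in the comments are for GNU sed.

```shell
#!/bin/sh
# Print the pattern space in unambiguous form ('\n' for embedded newlines,
# '$' at the end) after 'G;H' has run in each cycle.
trace_gh() {
    sed -n 'G;H;l'
}

printf '%s\n' 0 1 2 | trace_gh
# Cycle 1: 0\n$
# Cycle 2: 1\n\n0\n$
# Cycle 3: 2\n\n0\n\n1\n\n0\n$
```

Each cycle's pattern space holds the new line plus everything accumulated in the hold space so far, so it is twice as many lines as in the previous cycle.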
> We think this is vulnerable since it may eat up the memory in a few
> seconds.
Any program that keeps the input in memory is vulnerable
to unbounded input size. That is not a bug.
As such, I'm closing this as "not a bug", but discussion can continue
by replying to this thread.
regards,
- assaf
This bug report was last modified 6 years and 202 days ago.