#34133 - Huge memory usage and output size when using "H" and "G"

GNU bug report logs - #34133
Huge memory usage and output size when using "H" and "G"

Package: sed

Reported by: Hongxu Chen <leftcopy.chx <at> gmail.com>

Date: Sat, 19 Jan 2019 09:54:01 UTC

Severity: normal

Tags: notabug

Done: Assaf Gordon <assafgordon <at> gmail.com>

Bug is archived. No further changes may be made.


From: Assaf Gordon <assafgordon <at> gmail.com>
To: Hongxu Chen <leftcopy.chx <at> gmail.com>, 34133 <at> debbugs.gnu.org
Subject: bug#34133: Huge memory usage and output size when using "H" and "G"
Date: Sat, 19 Jan 2019 14:27:30 -0700

tags 34133 notabug
close 34133
stop

Hello,

On 2019-01-19 2:53 a.m., Hongxu Chen wrote:
> We found an issue that are relevant to use of "H" and "G" for appending
> hold space and pattern space.

It is an "issue" in the sense that your example does consume large
amounts of memory, but it is not a bug - this is how sed works.

> The input file is attached which is a file of 30 lines and 80 columns
> filled with 'a'. And my memory is 64G with equivalent swap.
>
> # these two may eat up the memory
> sed 's/a/d/; G; H;' input
> sed '/b/d; G; H;' input

Let's simplify:
The "s/a/d/" does not change anything related to memory
(it changes a single letter "a" to "d" in the input),
so I'll omit it.
The '/b/d' command is a no-op, because your input does not contain
the letter "b".

We're left with:

  sed 'G;H'

The length of each line also doesn't matter, so I'll use shorter lines.

Now observe the following:

  $ printf "%s\n" 0 | sed 'G;H' | wc -l
  2
  $ printf "%s\n" 0 1 | sed 'G;H' | wc -l
  6
  $ printf "%s\n" 0 1 2 | sed 'G;H' | wc -l
  14
  $ printf "%s\n" 0 1 2 3 | sed 'G;H' | wc -l
  30
  $ printf "%s\n" 0 1 2 3 4 | sed 'G;H' | wc -l
  62
  $ printf "%s\n" 0 1 2 3 4 5 | sed 'G;H' | wc -l
  126
  $ printf "%s\n" 0 1 2 3 4 5 6 | sed 'G;H' | wc -l
  254
  $ printf "%s\n" 0 1 2 3 4 5 6 7 | sed 'G;H' | wc -l
  510
  $ printf "%s\n" 0 1 2 3 4 5 6 7 8 | sed 'G;H' | wc -l
  1022
  $ printf "%s\n" 0 1 2 3 4 5 6 7 8 9 | sed 'G;H' | wc -l
  2046
  $ printf "%s\n" 0 1 2 3 4 5 6 7 8 9 10 | sed 'G;H' | wc -l
  4094
  $ printf "%s\n" 0 1 2 3 4 5 6 7 8 9 10 11 | sed 'G;H' | wc -l
  8190
  $ printf "%s\n" 0 1 2 3 4 5 6 7 8 9 10 11 12 | sed 'G;H' | wc -l
  16382

Notice the trend?
The number of lines (and by proxy: size of buffer and memory usage)
is exponential.

With 20 lines, you'll need O(2^20) = 1M memory (plus size of each line,
and size of pointers overhead, etc.). Still doable.

With 30 lines, you'll need O(2^30) = 1G of lines.
If each of your lines is 80 characters, you'll need 80GB (before
counting overhead of pointers).

> # this is fine
> sed '/a/d; G; H;' input

This is "fine" because the "/a/d" command deletes all lines of your
input, hence nothing is stored in the pattern/hold buffers.

> I learned from http://www.grymoire.com/Unix/Sed.html that 'G' appends
> hold space to pattern space, and 'H' does the inverse.
> In the first two examples, the buffer of hold space will be appended to
> pattern space, and subsequently content of pattern space will be appended
> to hold space once more. With one more input line, the two buffers will be
> doubled; and as long as the input file is big enough, sed may finally eat
> up the memory and populate the output.

Yes, that's how it works.

> We think this is vulnerable since it may eat up the memory in a few
> seconds.

Any program that keeps the input in memory is vulnerable to unbounded
input size. That is not a bug.

As such, I'm closing this as "not a bug", but discussion can continue
by replying to this thread.

regards,
 - assaf
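
The doubling described in the message can be reproduced without running sed at all. The following is a minimal sketch in Python, not sed's actual implementation: it assumes the pattern and hold spaces can be modeled as plain strings, that "G" appends a newline plus the hold space to the pattern space, that "H" appends a newline plus the pattern space to the hold space, and that the pattern space is printed with a trailing newline at the end of each cycle. The helper name simulate_g_h is hypothetical.

```python
def simulate_g_h(lines):
    """Model sed 'G;H' on the given input lines.

    Returns (number of output lines, final hold-space length in characters).
    This is an illustrative model only, not how sed is implemented.
    """
    hold = ""          # the hold space starts out empty
    output_lines = 0
    for line in lines:
        pattern = line + "\n" + hold   # G: append newline + hold space
        hold = hold + "\n" + pattern   # H: append newline + pattern space
        # End of cycle: the pattern space is printed followed by a newline,
        # so it contributes (embedded newlines + 1) lines to the output.
        output_lines += pattern.count("\n") + 1
    return output_lines, len(hold)


if __name__ == "__main__":
    # Keep n small: the buffers roughly double with every input line.
    for n in range(1, 14):
        count, hold_len = simulate_g_h([str(i) for i in range(n)])
        print(n, count, hold_len)
```

Under these assumptions the output line count for n input lines is 2^(n+1) - 2, which matches the wc -l results quoted in the message (2, 6, 14, ... 16382). The hold space follows the recurrence new_hold = 2 * old_hold + len(line) + 2, which is why 30 lines of 80 characters exhaust memory.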

This bug report was last modified 6 years and 202 days ago.

