GNU bug report logs - #34133
Huge memory usage and output size when using "H" and "G"

Package: sed;

Reported by: Hongxu Chen <leftcopy.chx <at> gmail.com>

Date: Sat, 19 Jan 2019 09:54:01 UTC

Severity: normal

Tags: notabug

Done: Assaf Gordon <assafgordon <at> gmail.com>

Bug is archived. No further changes may be made.


From: Assaf Gordon <assafgordon <at> gmail.com>
To: Hongxu Chen <leftcopy.chx <at> gmail.com>, 34133 <at> debbugs.gnu.org
Subject: bug#34133: Huge memory usage and output size when using "H" and "G"
Date: Sat, 19 Jan 2019 14:27:30 -0700
tags 34133 notabug
close 34133
stop

Hello,

On 2019-01-19 2:53 a.m., Hongxu Chen wrote:
>      We found an issue that are relevant to use of "H" and "G" for appending
> hold space and pattern space.

It is an "issue" in the sense that your example does consume large
amounts of memory, but it is not a bug - this is how sed works.

>      The input file is attached which is a file of 30 lines and 80 columns
> filled with 'a'. And my memory is 64G with equivalent swap.
> 
>        # these two may eat up the memory
>      sed 's/a/d/; G; H;' input
>      sed '/b/d; G; H;' input


Let's simplify:
The "s/a/d/" command has no effect on memory usage
(it only changes the first "a" on each line to "d"), so I'll omit it.

The '/b/d' command is a no-op, because your input does not contain
the letter "b".

We're left with:
   sed 'G;H'
The length of each line also doesn't matter, so I'll use shorter lines.
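
A quick way to check that this simplification is safe, as a sketch only:
the long_input helper below is a made-up, scaled-down stand-in for the
attached file (5 lines of 80 'a' characters instead of 30). All four
pipelines should report the same line count.

```shell
# Hypothetical stand-in for the attached input: 5 lines of 80 'a' characters.
long_input() {
    i=0
    while [ "$i" -lt 5 ]; do
        printf '%080d\n' 0 | tr 0 a    # 80 zeros, rewritten to 80 a's
        i=$((i + 1))
    done
}

long_input | sed 's/a/d/; G; H' | wc -l       # original first command
long_input | sed '/b/d; G; H' | wc -l         # original second command
long_input | sed 'G;H' | wc -l                # simplified script
printf '%s\n' 0 1 2 3 4 | sed 'G;H' | wc -l   # same script, short lines
```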

Now observe the following:

$ printf "%s\n" 0 | sed 'G;H' | wc -l
2
$ printf "%s\n" 0 1 | sed 'G;H' | wc -l
6
$ printf "%s\n" 0 1 2 | sed 'G;H' | wc -l
14
$ printf "%s\n" 0 1 2 3 | sed 'G;H' | wc -l
30
$ printf "%s\n" 0 1 2 3 4 | sed 'G;H' | wc -l
62
$ printf "%s\n" 0 1 2 3 4 5 | sed 'G;H' | wc -l
126
$ printf "%s\n" 0 1 2 3 4 5 6 | sed 'G;H' | wc -l
254
$ printf "%s\n" 0 1 2 3 4 5 6 7 | sed 'G;H' | wc -l
510
$ printf "%s\n" 0 1 2 3 4 5 6 7 8 | sed 'G;H' | wc -l
1022
$ printf "%s\n" 0 1 2 3 4 5 6 7 8 9 | sed 'G;H' | wc -l
2046
$ printf "%s\n" 0 1 2 3 4 5 6 7 8 9 10 | sed 'G;H' | wc -l
4094
$ printf "%s\n" 0 1 2 3 4 5 6 7 8 9 10 11 | sed 'G;H' | wc -l
8190
$ printf "%s\n" 0 1 2 3 4 5 6 7 8 9 10 11 12 | sed 'G;H' | wc -l
16382

Notice the trend?
The number of output lines (and by proxy, the buffer size and memory
usage) grows exponentially: n input lines produce 2^(n+1) - 2 output lines.
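
The counts above fit the closed form 2^(n+1) - 2 for n input lines. Here
is a small sketch to check that, assuming a POSIX shell with "seq"
available; the gh_lines name is made up for illustration:

```shell
# gh_lines N: number of lines printed by sed 'G;H' for N short input lines.
# (hypothetical helper, not part of sed)
gh_lines() {
    seq "$1" | sed 'G;H' | wc -l | tr -d ' '
}

n=1
while [ "$n" -le 10 ]; do
    expected=$(( (1 << (n + 1)) - 2 ))    # 2^(n+1) - 2
    echo "n=$n actual=$(gh_lines "$n") expected=$expected"
    n=$((n + 1))
done
```

Keep n small when experimenting: each additional input line doubles the
work.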

With 20 lines, you'll need on the order of 2^20 = 1M lines in memory
(times the size of each line, plus pointer overhead, etc.). Still doable.

With 30 lines, you'll need O(2^30) = 1G of lines.
If each of your lines is 80 characters, you'll need 80GB (before
counting overhead of pointers).
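
The same back-of-the-envelope arithmetic as a sketch. It counts only the
final pattern space and treats every line as full width, so it is an
order-of-magnitude figure, not a measurement. The estimate name is made up
for illustration, and 64-bit shell arithmetic is assumed:

```shell
# estimate N WIDTH: rough size of the last pattern space for N input lines
# of WIDTH characters each, ignoring newlines and allocator overhead.
estimate() {
    lines=$(( 1 << $1 ))       # 2^N lines in the final pattern space
    bytes=$(( lines * $2 ))    # upper bound: every line counted as WIDTH chars
    echo "$lines lines, about $(( bytes >> 20 )) MiB"
}

estimate 20 80
estimate 30 80
```

For 30 lines of 80 characters this reports 81920 MiB, i.e. the 80GB
mentioned above.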


>       # this is fine
>      sed '/a/d; G; H;' input

This is "fine" because the "/a/d" command deletes all lines of your
input, hence nothing is stored in the pattern/hold buffers.
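
This can be seen on a scaled-down input. The sample helper below is a
hypothetical stand-in for the attached file:

```shell
# Scaled-down stand-in for the attached file: three lines of 'a'.
sample() {
    printf '%s\n' aaaa aaaa aaaa
}

# /a/d matches every line; 'd' deletes the pattern space and starts the
# next cycle immediately, so G and H are never reached and nothing prints.
sample | sed '/a/d; G; H' | wc -l

# For contrast: without the delete, the same three lines yield 14 lines.
sample | sed 'G; H' | wc -l
```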

>      I learned from http://www.grymoire.com/Unix/Sed.html that 'G' appends
> hold space to pattern space, and 'H' does the inverse.
>      In the first two examples, the buffer of hold space will be appended to
> pattern space, and subsequently content of pattern space will be appended
> to hold space once more. With one more input line, the two buffers will be
> doubled; and as long as the input file is big enough, sed may finally eat
> up the memory and populate the output.

Yes, that's how it works.
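
To watch the doubling happen, the "l" command can print the pattern space
after each cycle, with embedded newlines shown as \n and the end of the
buffer as $. A minimal sketch; the trace name is made up for illustration:

```shell
# trace LINE...: feed each argument as one input line and show the pattern
# space after G and H have run. -n suppresses the normal output so that
# only the 'l' listing is printed.
trace() {
    printf '%s\n' "$@" | sed -n 'G;H;l'
}

trace 0 1 2
```

Each listed line should be about twice as long as the previous one,
because every cycle appends the whole hold space to the new input line.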

>      We think this is vulnerable since it may eat up the memory in a few
> seconds.

Any program that keeps the input in memory is vulnerable
to unbounded input size. That is not a bug.

As such, I'm closing this as "not a bug", but discussion can continue
by replying to this thread.

regards,
 - assaf





This bug report was last modified 6 years and 202 days ago.
