Hi Assaf,

While I still think that this is a defect in sed, I agree that programmers
should ensure a script will not turn into a bomb. What I am wondering is
whether there could be some built-in check to avoid this (e.g., roughly
suppressing a "G" directly followed by an "H"). In addition, the error
messages could be more meaningful in certain scenarios. I believe different
optimizations are involved in the following scripts, but the stderr output
could be improved:

sed '/b/e; G; H;' input >/dev/null   # instant exit, fine
sed '/a/e; G; H;' input >/dev/null   # many lines of `sh: 1: Syntax error: ";" unexpected`
sed '/a/x; G; H;' input >/dev/null   # instant exit, fine
sed '/b/x; G; H;' input >/dev/null   # long wait
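For reference, G appends the hold space to the pattern space, and H then
appends the (now grown) pattern space back to the hold space, so each input
line roughly doubles the data sed holds; that is what makes the last script
blow up. A minimal sketch that shows the doubling at a safe scale (the `seq`
input and the tiny line counts are arbitrary choices so this stays harmless
to run):

# Each input line roughly doubles the combined pattern/hold space,
# so the output of 'G;H' grows as ~2^N in the number of lines N.
seq 5  | sed 'G;H' | wc -c    # ~93 bytes
seq 10 | sed 'G;H' | wc -c    # ~3070 bytes: 5 more lines, ~33x the output

Going from 5 lines to 10 multiplies the output by roughly 2^5, which is the
signature of exponential growth.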
Generally, the *semantics* of sed scripts really ought to be much clearer.
To be honest, I was quite confused about their meaning until I had thought
about them for a while.

Best Regards,
Hongxu

On Sun, Jan 20, 2019 at 11:37 AM Assaf Gordon wrote:

> Hello,
>
> On 2019-01-19 7:23 p.m., Hongxu Chen wrote:
> > We think the way sed works may suffer from attacks.
>
> Not more than any other program that uses memory or disk
> based on its input.
>
> > If the user downloads some sed scripts and runs them *without root
> > privilege*, the host machine may soon exceed the memory;
>
> First,
> root privileges have nothing to do with it.
> A non-privileged user can consume as much memory as they want,
> and a poorly configured machine will be brought to its knees.
>
> Well-configured machines will not allow user-space programs
> to cripple them (but I readily admit that such configuration
> is not trivial to achieve).
>
> Second,
> any user who downloads any program from an untrusted source is
> expected to know what they're doing.
> If they don't, then they can cause a lot of damage.
>
> This has nothing to do with sed.
>
> > in my case, the machine actually hangs and I have to restart it.
>
> Yes, many common installations are not configured to handle
> memory exhaustion.
>
> > The problem may be more severe when the machine is hosting some
> > service, or itself does sed-related work such as text processing
> > (maybe rare), even inside some sandbox. The issue may also be
> > triggered unconsciously and thus cause surprise and trouble.
>
> That is all true, but has nothing to do with sed.
>
> > > Any program that keeps the input in memory is vulnerable to unbounded
> > > input size
> >
> > I think the input size is not big; and the size can still be reduced as
> > long as more "G;H"s are appended to the script.
> > Maybe sed can do some flushing to avoid memory usage?
>
> I'll rephrase my words:
>
> Your sed program has O(2^N) space requirements.
> Any program that has exponential behavior (be it space or time)
> will quickly lead to pathological cases.
>
> So while your input file is not too big (N=30 lines),
> it leads to huge memory requirements.
>
> I recommend reading some background about complexity
> (most of these deal with time complexity, but it applies to space as
> well):
> http://bigocheatsheet.com/
> https://en.wikipedia.org/wiki/Time_complexity#Exponential_time
> https://en.wikipedia.org/wiki/Computational_complexity
>
> To illustrate my point further, here are similar examples in AWK and
> Perl that would choke on your input:
>
> awk '{ buf = buf $1 ; buf = buf buf } END { print buf }'
> perl -ne '$buf .= $_ ; $buf .= $buf ; print $buf' < input
>
> Just as you wrote above, a non-root user who runs these on your input
> will consume a huge amount of memory.
> This is why I said it has nothing to do with sed.
>
> To see the unfolding of exponential growth in action,
> try the following C program (which goes to show it is not just
> awk/perl/sed scripts):
>
> /* Compile with:
>      gcc -o 1 1.c -lm
>
>    Run with:
>      seq 30 | ./1
> */
> #define _GNU_SOURCE
> #define _POSIX_C_SOURCE 200809L
> #include <stdio.h>   /* getline(), printf() */
> #include <stdlib.h>  /* free() */
> #include <math.h>    /* pow() */
>
> int main()
> {
>   char *line = NULL;   /* getline() allocates the buffer when NULL */
>   size_t linenum = 0;
>   size_t buf_size = 0;
>   size_t n = 0;
>   ssize_t i;
>
>   while ( (i = getline(&line, &n, stdin)) != -1) {
>     ++linenum;
>     buf_size += n;
>     buf_size *= 2;   /* doubling per line: 2^linenum growth */
>     printf ("line %zu (%zu bytes), 2^%zu == %g, buf_size = %zu bytes\n",
>             linenum, n, linenum, pow(2, linenum), buf_size);
>   }
>   free(line);
>   return 0;
> }
>
> The above does not actually allocate memory; it just shows how much
> would be allocated. Run it with your input and see for yourself:
>
> line 1 (120 bytes), 2^1 == 2, buf_size = 240 bytes
> line 2 (120 bytes), 2^2 == 4, buf_size = 720 bytes
> line 3 (120 bytes), 2^3 == 8, buf_size = 1680 bytes
> [...]
> line 30 (120 bytes), 2^30 == 1.07374e+09, buf_size = 257698037520 bytes
>
> If you tried to do "malloc(buf_size)" it would consume all your memory
> (if you have less than 256GB).
>
> ----
>
> This applies not just to memory (RAM), but to disk space as well
> (which conceptually is just a different type of storage).
>
> Try this example:
>
> printf a > in
> for i in $(seq 20); do cat in in > out ; mv out in ; done
>
> The input file starts with 1 byte.
> After 20 rounds, the file size will be 1MB.
> If you try 30 rounds, the file will be 1GB.
> The "20" and "30" here correspond to your input size (number of lines).
> A small change in input leads to a large change in output (this is why
> "exponential" programs are considered bad).
>
> If you are still not convinced, try 40 rounds - that will attempt
> to create a 1TB file. If your disk is smaller, it will become full
> (which is just like memory exhaustion).
>
> ----
>
> I hope this resolves the issue.
>
> regards,
>  - assaf
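One practical way to approximate the "well configured machine" mentioned
above, per invocation, is to cap the address space before running an
untrusted script, so an exponential program dies with an allocation failure
instead of taking down the host. A minimal sketch, assuming bash (whose
ulimit -v builtin takes a limit in KiB); the 1 GiB cap is an arbitrary
example value:

# Run the risky script in a subshell so the limit does not affect
# the parent shell. ulimit -v caps virtual memory, in KiB.
( ulimit -v 1048576; sed 'G;H' input >/dev/null )
# Once the cap is hit, sed should fail with an out-of-memory error
# and exit, rather than dragging the whole machine into swap.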