From debbugs-submit-bounces@debbugs.gnu.org Sat Jan 19 04:53:41 2019 Received: (at submit) by debbugs.gnu.org; 19 Jan 2019 09:53:41 +0000 Received: from localhost ([127.0.0.1]:37301 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1gknJt-0004H2-0r for submit@debbugs.gnu.org; Sat, 19 Jan 2019 04:53:41 -0500 Received: from eggs.gnu.org ([209.51.188.92]:42437) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1gknJr-0004Gp-CF for submit@debbugs.gnu.org; Sat, 19 Jan 2019 04:53:39 -0500 Received: from lists.gnu.org ([209.51.188.17]:39163) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1gknJi-00042h-DJ for submit@debbugs.gnu.org; Sat, 19 Jan 2019 04:53:31 -0500 Received: from eggs.gnu.org ([209.51.188.92]:52620) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1gknJe-0007LU-Kd for bug-sed@gnu.org; Sat, 19 Jan 2019 04:53:30 -0500 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=0.8 required=5.0 tests=BAYES_50,FREEMAIL_FROM, HTML_MESSAGE autolearn=disabled version=3.3.2 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1gknJX-0003vC-4h for bug-sed@gnu.org; Sat, 19 Jan 2019 04:53:23 -0500 Received: from mail-io1-xd31.google.com ([2607:f8b0:4864:20::d31]:46624) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1gknJW-0003u9-Qh for bug-sed@gnu.org; Sat, 19 Jan 2019 04:53:19 -0500 Received: by mail-io1-xd31.google.com with SMTP id s8so547919iob.13 for ; Sat, 19 Jan 2019 01:53:18 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:from:date:message-id:subject:to; bh=IjJbCwr7cTJ7HCx+v0iYXL668qO3Omi7a5KcV8k8kn8=; b=mX8yq6AbcZ8x9yQW9m+KpVMiZl/vdMWovY1GVNTId822dQXSrlRPSq+acIAqFtQDlt Q5CaBGVOyT6C9+IwH//QKnYUFjH1T0+yWl6rbi8VrU5CoW26UwFg2OMKOrJn2HhIIhda 
MEh+UPe8ZMT5G8qPvWJ0OrICBrJC10wemwGlEv90EeR5FfnuM/nXDIyPaLowhUC5AYs2 HAahUSX9WIfSKzUAMwswKs2WLnuhK+6d64IhyVmA6e8qkLqE5RahcL2R1YQQcieTVtrs ePGIUm11mUOr/oPgxhkFsEvSJfkycjVl2EzMiPu2SlTtp8b5FTq5vxpI2hQKtQqUBBsY T7dw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:from:date:message-id:subject:to; bh=IjJbCwr7cTJ7HCx+v0iYXL668qO3Omi7a5KcV8k8kn8=; b=IdyHxp+HVIFED9lui0w6wyTI+WPRxoej52bH0bYkr4qTgGCx6zcurqq21fr74Gug0N c4ZzfbzpLlFHrZQIB6xiYUvCl55UJrVnUh3p7qeIJWCvxbkxQp72mVn9Chc/CS1HuGAG 8mcpk1CmZd1Zqqs8KBhJwF4vjY2g6jYvoIjf7ctmkNre8wDcofxSbCq1ZsgwgQiPbMdU 1KxdV+uQ9fwSJ4FyQnwTokc1CLQIdjSdfyfe+K9eGQb13KeO09CPcnPsB54rmimN3Ygd XuBifJ6XxdrBqJVfLAgfiYtGSZZykJSafyMtXFK30ygRtBqiJKeTgtevy0Yw+BqhULBh OULw== X-Gm-Message-State: AJcUukd1lX/IkkFpDWPTTRDA0HzmxsxkgmWLQOI85x42WCOw8mniasLk znt2uwqeFujPL+HQPHGQU6raiYck8+Q5zoUpHuXtYMD0 X-Google-Smtp-Source: ALg8bN7mLO7g6c4eIMrgwBrsfBVbCUMG/ic9otyzS5Bp9jcjjQxMUzPVKJxAZ6By0wAzEliihVw9DnQg387ZNeN0V8M= X-Received: by 2002:a5e:d808:: with SMTP id l8mr11976586iok.299.1547891596697; Sat, 19 Jan 2019 01:53:16 -0800 (PST) MIME-Version: 1.0 From: Hongxu Chen Date: Sat, 19 Jan 2019 17:53:05 +0800 Message-ID: Subject: Huge memory usage and output size when using "H" and "G" To: bug-sed@gnu.org Content-Type: multipart/mixed; boundary="000000000000c5d98e057fcc98bc" X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. 
X-Received-From: 2607:f8b0:4864:20::d31 X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6.x X-Spam-Score: 1.0 (+) X-Debbugs-Envelope-To: submit X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.0 (/)

--000000000000c5d98e057fcc98bc
Content-Type: multipart/alternative; boundary="000000000000c5d98a057fcc98ba"

--000000000000c5d98a057fcc98ba
Content-Type: text/plain; charset="UTF-8"

Hi,

    We found an issue that are relevant to use of "H" and "G" for appending
hold space and pattern space.

    The input file is attached which is a file of 30 lines and 80 columns
filled with 'a'. And my memory is 64G with equivalent swap.

      # these two may eat up the memory
      sed 's/a/d/; G; H;' input
      sed '/b/d; G; H;' input

      # this is fine
      sed '/a/d; G; H;' input

    I learned from http://www.grymoire.com/Unix/Sed.html that 'G' appends
hold space to pattern space, and 'H' does the inverse.
    In the first two examples, the buffer of hold space will be appended to
pattern space, and subsequently content of pattern space will be appended
to hold space once more. With one more input line, the two buffers will be
doubled; and as long as the input file is big enough, sed may finally eat
up the memory and populate the output.
    We think this is vulnerable since it may eat up the memory in a few
seconds.

Best Regards,
Hongxu

--000000000000c5d98a057fcc98ba
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
--000000000000c5d98a057fcc98ba-- --000000000000c5d98e057fcc98bc Content-Type: application/octet-stream; name=input Content-Disposition: attachment; filename=input Content-Transfer-Encoding: base64 Content-ID: X-Attachment-Id: f_jr39ujpk0 YWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFh YWFhYWFhYWFhYWFhYWFhYWFhYWFhYWEKYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFh YWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWEKYWFhYWFhYWFh YWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFh YWFhYWFhYWFhYWFhYWEKYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFh YWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWEKYWFhYWFhYWFhYWFhYWFhYWFh YWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFh YWFhYWEKYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFh YWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWEKYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFh YWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWEKYWFh YWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFh YWFhYWFhYWFhYWFhYWFhYWFhYWEKYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFh YWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWEKYWFhYWFhYWFhYWFh YWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFh YWFhYWFhYWFhYWEKYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFh YWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWEKYWFhYWFhYWFhYWFhYWFhYWFhYWFh YWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFh YWEKYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFh YWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWEKYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFh YWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWEKYWFhYWFh YWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFh YWFhYWFhYWFhYWFhYWFhYWEKYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFh 
YWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWEKYWFhYWFhYWFhYWFhYWFh YWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFh YWFhYWFhYWEKYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFh YWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWEKYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFh YWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWEK YWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFh YWFhYWFhYWFhYWFhYWFhYWFhYWFhYWEKYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFh YWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWEKYWFhYWFhYWFh YWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFh YWFhYWFhYWFhYWFhYWEKYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFh YWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWEKYWFhYWFhYWFhYWFhYWFhYWFh YWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFh YWFhYWEKYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFh YWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWEKYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFh YWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWEKYWFh YWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFh YWFhYWFhYWFhYWFhYWFhYWFhYWEKYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFh YWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWEKYWFhYWFhYWFhYWFh YWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFh YWFhYWFhYWFhYWEKYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFh YWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWEK --000000000000c5d98e057fcc98bc-- From debbugs-submit-bounces@debbugs.gnu.org Sat Jan 19 04:57:58 2019 Received: (at 34133) by debbugs.gnu.org; 19 Jan 2019 09:57:59 +0000 Received: from localhost ([127.0.0.1]:37306 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1gknO2-0004NU-O3 for submit@debbugs.gnu.org; Sat, 19 Jan 2019 04:57:58 -0500 Received: from 
mail-it1-f172.google.com ([209.85.166.172]:55195) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1gknO0-0004NG-Q6 for 34133@debbugs.gnu.org; Sat, 19 Jan 2019 04:57:57 -0500 Received: by mail-it1-f172.google.com with SMTP id i145so10283311ita.4 for <34133@debbugs.gnu.org>; Sat, 19 Jan 2019 01:57:56 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:from:date:message-id:subject:to; bh=RmL7mkYw23bqqnWYuwJiAoTUg0R8THG8GiipJ+Qroa4=; b=ZuuGyZRecqa40KJdI0R6gryM9AGug8/G4ymxoItvatW+E3zlhMcip3I5UUIiPTbecA rJtpcvHTUKanVmmlVDTGYWZAcyXjV40hPWEaNFCxnCWqMJoldjqoXcdx953v1gPsRcb1 m3ZtY6X3UyYVYgHUCbddW05h7tdY5xKzgNWJc+hULRu1gvMHHZDy5f4aoDskmnfkdROU YXdqItjVnhhlHwigDiMNOFgs5pugC/4gjHMOLz/4JfwwilOlBv30c9CfVAnJl7F3lCg/ sB+SFPh0QRX9aj4EMMz+MtuxVBZ9WVZdqK/Vl300i8Ys3UrMThgqWYL9MJMACOv6oZXj JvDQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:from:date:message-id:subject:to; bh=RmL7mkYw23bqqnWYuwJiAoTUg0R8THG8GiipJ+Qroa4=; b=ZE6dOBeOswBWS+obFIbj+Ka0H0BdROBkIKyfNaBIJREIBn+jOTHxDrDRQxrh9qngDF Gonq+RHP9wdC+61TpQjeGXvyLLykqBtoFLzlAtqpDyy3h8oKzaUR3ZH+0k2XvzFhgisY fNmspbhEmz7gk5M68rnUPz/pYZ68mzdED9BxNw6fPwpkKBH2CJYa6Zt7192Rg0uJUdj2 UFs8guAA8QQcel0Ec+S2UGxDEVVyRMDiy7GEFFm1nSbm2Clfb2FMkDuZbQC6kL80/DSZ d/gAJxQbDB31JW8khWBx0uLiqKA7dR391WJnMliHqGkeMibOBlalgD1i/EtsBUuFGNcZ Bb7w== X-Gm-Message-State: AJcUukeAjL3rtWSl9r12lEQPpRF0zo+TlBKgK2wsJjMgA2YDJaxV+OvC 5pZinEZOAvXfEFkTG9C2kROP/kkNzDC1f0Zs25K4ABk1 X-Google-Smtp-Source: ALg8bN4JNqCzLTG13u5h6j5yVde4KYzHVyoXFo2zEIRLNHLsufnmjIaUMxznv5P1jVWPb1u56yzkXuJeTLPwzoRah0g= X-Received: by 2002:a02:9d4b:: with SMTP id m11mr13104889jal.121.1547891870610; Sat, 19 Jan 2019 01:57:50 -0800 (PST) MIME-Version: 1.0 From: Hongxu Chen Date: Sat, 19 Jan 2019 17:57:39 +0800 Message-ID: Subject: Duplicate of 34133 To: 34133@debbugs.gnu.org Content-Type: multipart/alternative; boundary="000000000000193af5057fcca9d0" X-Spam-Score: 
0.0 (/) X-Debbugs-Envelope-To: 34133 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-)

--000000000000193af5057fcca9d0
Content-Type: text/plain; charset="UTF-8"

Hi GNU sed maintainers,

    Sorry I mistakenly sent an email with empty content.
    Please close this issue and track #34133 instead.
    Thank you!

Best Regards,
Hongxu

--000000000000193af5057fcca9d0
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
--000000000000193af5057fcca9d0-- From debbugs-submit-bounces@debbugs.gnu.org Sat Jan 19 16:27:42 2019 Received: (at 34133) by debbugs.gnu.org; 19 Jan 2019 21:27:42 +0000 Received: from localhost ([127.0.0.1]:38096 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1gky9V-00028b-PW for submit@debbugs.gnu.org; Sat, 19 Jan 2019 16:27:42 -0500 Received: from mail-pg1-f173.google.com ([209.85.215.173]:40689) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1gky9U-00028H-1N; Sat, 19 Jan 2019 16:27:40 -0500 Received: by mail-pg1-f173.google.com with SMTP id z10so7697797pgp.7; Sat, 19 Jan 2019 13:27:39 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=subject:to:references:from:message-id:date:user-agent:mime-version :in-reply-to:content-language:content-transfer-encoding; bh=YYYitr2rM+iekmYRU11pvONcXqjxUizuFczPwObyLf0=; b=WMZeaQabEb4XL2wDB9OtIFImVpDjjbh5pfaGIhpnrduuDIu/b4p77jdEvH4o2oKW9m 6hKnUWbxRDLuIurvdelMlpIkR2PlhEkJ7DvqSFpkOii1tBwmZuOW4XvD3i9ExbwTKk4C 61vWWCge3gSxThOrBeKd+qiNWmy4woVDPPiMy84kej8qfXUIynwFQwjHiksnNK02NudY R1QOkFGoVwmpqDiKBNkjtGiAdKEzOfDvqVPtzz1J5fCyPNnvNm0i5ug2HlQq1WhPBkHD 6tI6ebvXP593/IxbmEl56b6Ak0n/fv/0FNNdEY69cYI5BhSFuNlOYljooHyFpZRNLtu3 /FCg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-language :content-transfer-encoding; bh=YYYitr2rM+iekmYRU11pvONcXqjxUizuFczPwObyLf0=; b=SARHcq0IyLsQkS3siHckUTslUILvA2iL1zQVbqcwpTWL3sB17/P0omJ5IolzmfaZtO sGuwM85UKQzRQOMaY3jwcHY4QfzEPrI4pVKs9n7At3CPGxO5PMAQ1iozYbavijrpC0kb gXPIpsKtrwK2HkWcyoyGJ+H9+1BTkbS54biDBsgp/ai+nvQrB/TfBb+rbyQx5yT/+/CD XX2KGWYubvwET3UnZdXAhI8AGl6Jh5Jx+iLPW8xfY2HTc1/DtDpzOYFTrSy+tVVnIIyH 1lces3t8HE8VPtAGvo1r3PTwo19aoWsihi0DciUNlMxJXVI1QZBUN95r1Xqe4JhN1FA/ /3qA== X-Gm-Message-State: AJcUukekl3brfhQZrKVAnJChePIy1xkpgvAS0WHCq5cIflXWRDXMQz3e 
xVvnOVWVJViEu9V6XJhd6V/Epc9L X-Google-Smtp-Source: ALg8bN4i02x4NeKSJ9K3hbnnmJ+8NawqM2zQTI9aL+FhLsaSitPd/cSYHlIlYA0DX/SCR7d7hJR4Uw== X-Received: by 2002:a63:451a:: with SMTP id s26mr9994514pga.150.1547933253323; Sat, 19 Jan 2019 13:27:33 -0800 (PST) Received: from tomato.housegordon.com (moose.housegordon.com. [184.68.105.38]) by smtp.googlemail.com with ESMTPSA id t90sm12283121pfj.23.2019.01.19.13.27.31 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Sat, 19 Jan 2019 13:27:31 -0800 (PST) Subject: Re: bug#34133: Huge memory usage and output size when using "H" and "G" To: Hongxu Chen , 34133@debbugs.gnu.org References: From: Assaf Gordon Message-ID: <16b6994c-7224-8869-baf5-6df68a2ded79@gmail.com> Date: Sat, 19 Jan 2019 14:27:30 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.4.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit X-Spam-Score: -0.0 (/) X-Debbugs-Envelope-To: 34133 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-)

tags 34133 notabug
close 34133
stop

Hello,

On 2019-01-19 2:53 a.m., Hongxu Chen wrote:
>      We found an issue that are relevant to use of "H" and "G" for appending
> hold space and pattern space.

It is an "issue" in the sense that your example does consume large
amounts of memory, but it is not a bug - this is how sed works.

>      The input file is attached which is a file of 30 lines and 80 columns
> filled with 'a'. And my memory is 64G with equivalent swap.
>
>        # these two may eat up the memory
>      sed 's/a/d/; G; H;' input
>      sed '/b/d; G; H;' input

Let's simplify:
The "s/a/d/" does not change anything related to memory
(it changes a single letter "a" to "d" in the input), so I'll omit it.

The '/b/d' command is a no-op, because your input does not contain
the letter "b".

We're left with:
    sed 'G;H'
The length of each line also doesn't matter, so I'll use shorter lines.

Now observe the following:

$ printf "%s\n" 0 | sed 'G;H' | wc -l
2
$ printf "%s\n" 0 1 | sed 'G;H' | wc -l
6
$ printf "%s\n" 0 1 2 | sed 'G;H' | wc -l
14
$ printf "%s\n" 0 1 2 3 | sed 'G;H' | wc -l
30
$ printf "%s\n" 0 1 2 3 4 | sed 'G;H' | wc -l
62
$ printf "%s\n" 0 1 2 3 4 5 | sed 'G;H' | wc -l
126
$ printf "%s\n" 0 1 2 3 4 5 6 | sed 'G;H' | wc -l
254
$ printf "%s\n" 0 1 2 3 4 5 6 7 | sed 'G;H' | wc -l
510
$ printf "%s\n" 0 1 2 3 4 5 6 7 8 | sed 'G;H' | wc -l
1022
$ printf "%s\n" 0 1 2 3 4 5 6 7 8 9 | sed 'G;H' | wc -l
2046
$ printf "%s\n" 0 1 2 3 4 5 6 7 8 9 10 | sed 'G;H' | wc -l
4094
$ printf "%s\n" 0 1 2 3 4 5 6 7 8 9 10 11 | sed 'G;H' | wc -l
8190
$ printf "%s\n" 0 1 2 3 4 5 6 7 8 9 10 11 12 | sed 'G;H' | wc -l
16382

Notice the trend?
The number of lines (and by proxy: size of buffer and memory usage)
is exponential.

With 20 lines, you'll need O(2^20) = 1M memory (plus size of each line,
and size of pointers overhead, etc.). Still doable.

With 30 lines, you'll need O(2^30) = 1G of lines.
If each of your lines is 80 characters, you'll need 80GB (before
counting overhead of pointers).

>        # this is fine
>      sed '/a/d; G; H;' input

This is "fine" because the "/a/d" command deletes all lines of your
input, hence nothing is stored in the pattern/hold buffers.

>      I learned from http://www.grymoire.com/Unix/Sed.html that 'G' appends
> hold space to pattern space, and 'H' does the inverse.
>      In the first two examples, the buffer of hold space will be appended to
> pattern space, and subsequently content of pattern space will be appended
> to hold space once more. With one more input line, the two buffers will be
> doubled; and as long as the input file is big enough, sed may finally eat
> up the memory and populate the output.

Yes, that's how it works.

>      We think this is vulnerable since it may eat up the memory in a few
> seconds.

Any program that keeps the input in memory is vulnerable
to unbounded input size. That is not a bug.

As such, I'm closing this as "not a bug", but discussion can continue
by replying to this thread.

regards,
  - assaf

From debbugs-submit-bounces@debbugs.gnu.org Sat Jan 19 21:23:36 2019 Received: (at 34133) by debbugs.gnu.org; 20 Jan 2019 02:23:36 +0000 Received: from localhost ([127.0.0.1]:38196 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1gl2lr-00037L-Ul for submit@debbugs.gnu.org; Sat, 19 Jan 2019 21:23:36 -0500 Received: from mail-io1-f53.google.com ([209.85.166.53]:37122) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1gl2lq-000379-BC for 34133@debbugs.gnu.org; Sat, 19 Jan 2019 21:23:35 -0500 Received: by mail-io1-f53.google.com with SMTP id g8so13907357iok.4 for <34133@debbugs.gnu.org>; Sat, 19 Jan 2019 18:23:34 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=+3mge0rb/UZkIdiR0ftqS8v0FD3ar3jNIaa4+mBXXX8=; b=sWWzGSyQGkwrfEt6FDFsNdCdHhnPkRpyizUhy7DaNUsWBaVKCKT79QevXgd12l55mu fOhiqN87Ff8P6L5xVYQI7P8C/TfH+yP/ZQuYMlKlc2LchTtUTFlIEmE4Ckfa+m4Ba4Vg +AaNgm8vEFuU8xPlfaKa3G4Yl0hV8DexQsp22TIcChmx6s0etZdAQyuhfdl45GM9KbfK L161iule9BT6NBpsjUCqwAckHtpgcudpOSgIXkeoh5vPusB6pFr3fIR3kvU+K3tOIv3A dV15a1we9fhFNmJHkKJ+XgJ7LSpLz3JTyVMoAjsprE7JCzRDNkWlpHhJagHc/BZYB8nM ZSSQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; 
h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=+3mge0rb/UZkIdiR0ftqS8v0FD3ar3jNIaa4+mBXXX8=; b=MrJHa5ovo34SQSEMO6Q6kL25tascJBz93EG0DG0UhH0vrUXY3opmhKQTrMGRWKzoCW yNyob7+4GgIrvXGHZ4ibGc9bwgV4YkAfdzkaJnqkhYaMFUuZK31miSXlT+YiJZ9OVi6X YoBlVIbn/cln9ImDm0W64Rs1/MZ2laJvOOMvueWKvNluqP1D2VLV7nOxJsFEFIJ79tbo xn1IHqVXErrJBEqrgkAmFv0zs4UHypcWrk5ZGEMECj6n5Rmi2cCoMvT4aOHtryKyqpqy a4bIOwDkJNM3yafJ2MpCr7SvH4zfMHLmAKEHaT5q2tAcGzK3o76ypmwXOCJLLzUP6mF4 D85w== X-Gm-Message-State: AJcUukdTIHrrFYT81Y5yOTjwfwV5wox65DOux+JRr8m/Vdxdkn9m88Bb GWauZwQQddSsF/fc0yGP9d9AoZJrf59ttPnPYiY= X-Google-Smtp-Source: ALg8bN54hMdY9mX8i+JDBMI9SALRq48o1gCcy9q1AEhE0YYvdZ548abhGWYB14oALwv5F+qHmSpy1sJzknM7QT99jJc= X-Received: by 2002:a5e:d808:: with SMTP id l8mr13208068iok.299.1547951008470; Sat, 19 Jan 2019 18:23:28 -0800 (PST) MIME-Version: 1.0 References: <16b6994c-7224-8869-baf5-6df68a2ded79@gmail.com> In-Reply-To: <16b6994c-7224-8869-baf5-6df68a2ded79@gmail.com> From: Hongxu Chen Date: Sun, 20 Jan 2019 10:23:16 +0800 Message-ID: Subject: Re: bug#34133: Huge memory usage and output size when using "H" and "G" To: Assaf Gordon Content-Type: multipart/alternative; boundary="000000000000fd5b39057fda6d7b" X-Spam-Score: 0.0 (/) X-Debbugs-Envelope-To: 34133 Cc: 34133@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-) --000000000000fd5b39057fda6d7b Content-Type: text/plain; charset="UTF-8" Hi Assaf, Thanks for the explanation. We think the way sed works may suffer from attacks. If the user downloads some sed scripts and run *without root privilege*, the host machine may soon exceed the memory; in my case, the machine actually hangs and I have to restart it. 
The problem may be severer when the machine is hosting some service or does
the sed relevant service such as text processing (may be rare) itself even inside
some sandbox. The issue may also be triggered unconsciously thus cause surprise
and trouble.

> Any program that keeps the input in memory is vulnerable to unbounded input size

    I think input size is not big; and the size can still be reduced as long as more "G;H"s
are appended to the script.
    Maybe sed can do something flush to avoid memory usage?

Best Regards,
Hongxu


On Sun, Jan 20, 2019 at 5:27 AM Assaf Gordon <assafgordon@gmail.com> wrote:

> tags 34133 notabug
> close 34133
> stop
>
> Hello,
>
> On 2019-01-19 2:53 a.m., Hongxu Chen wrote:
> >      We found an issue that are relevant to use of "H" and "G" for appending
> > hold space and pattern space.
>
> It is an "issue" in the sense that your example does consume large
> amounts of memory, but it is not a bug - this is how sed works.
>
> >      The input file is attached which is a file of 30 lines and 80 columns
> > filled with 'a'. And my memory is 64G with equivalent swap.
> >
> >        # these two may eat up the memory
> >      sed 's/a/d/; G; H;' input
> >      sed '/b/d; G; H;' input
>
> Let's simplify:
> The "s/a/d/" does not change anything related to memory
> (it changes a single letter "a" to "d" in the input), so I'll omit it.
>
> The '/b/d' command is a no-op, because your input does not contain
> the letter "b".
>
> We're left with:
>     sed 'G;H'
> The length of each line also doesn't matter, so I'll use shorter lines.
>
> Now observe the following:
>
> $ printf "%s\n" 0 | sed 'G;H' | wc -l
> 2
> $ printf "%s\n" 0 1 | sed 'G;H' | wc -l
> 6
> $ printf "%s\n" 0 1 2 | sed 'G;H' | wc -l
> 14
> $ printf "%s\n" 0 1 2 3 | sed 'G;H' | wc -l
> 30
> $ printf "%s\n" 0 1 2 3 4 | sed 'G;H' | wc -l
> 62
> $ printf "%s\n" 0 1 2 3 4 5 | sed 'G;H' | wc -l
> 126
> $ printf "%s\n" 0 1 2 3 4 5 6 | sed 'G;H' | wc -l
> 254
> $ printf "%s\n" 0 1 2 3 4 5 6 7 | sed 'G;H' | wc -l
> 510
> $ printf "%s\n" 0 1 2 3 4 5 6 7 8 | sed 'G;H' | wc -l
> 1022
> $ printf "%s\n" 0 1 2 3 4 5 6 7 8 9 | sed 'G;H' | wc -l
> 2046
> $ printf "%s\n" 0 1 2 3 4 5 6 7 8 9 10 | sed 'G;H' | wc -l
> 4094
> $ printf "%s\n" 0 1 2 3 4 5 6 7 8 9 10 11 | sed 'G;H' | wc -l
> 8190
> $ printf "%s\n" 0 1 2 3 4 5 6 7 8 9 10 11 12 | sed 'G;H' | wc -l
> 16382
>
> Notice the trend?
> The number of lines (and by proxy: size of buffer and memory usage)
> is exponential.
>
> With 20 lines, you'll need O(2^20) = 1M memory (plus size of each line,
> and size of pointers overhead, etc.). Still doable.
>
> With 30 lines, you'll need O(2^30) = 1G of lines.
> If each of your lines is 80 characters, you'll need 80GB (before
> counting overhead of pointers).
>
> >        # this is fine
> >      sed '/a/d; G; H;' input
>
> This is "fine" because the "/a/d" command deletes all lines of your
> input, hence nothing is stored in the pattern/hold buffers.
>
> >      I learned from http://www.grymoire.com/Unix/Sed.html that 'G' appends
> > hold space to pattern space, and 'H' does the inverse.
> >      In the first two examples, the buffer of hold space will be appended to
> > pattern space, and subsequently content of pattern space will be appended
> > to hold space once more. With one more input line, the two buffers will be
> > doubled; and as long as the input file is big enough, sed may finally eat
> > up the memory and populate the output.
>
> Yes, that's how it works.
>
> >      We think this is vulnerable since it may eat up the memory in a few
> > seconds.
>
> Any program that keeps the input in memory is vulnerable
> to unbounded input size. That is not a bug.
>
> As such, I'm closing this as "not a bug", but discussion can continue
> by replying to this thread.
>
> regards,
>   - assaf
>

--000000000000fd5b39057fda6d7b
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
--000000000000fd5b39057fda6d7b-- From debbugs-submit-bounces@debbugs.gnu.org Sat Jan 19 22:37:47 2019 Received: (at 34133) by debbugs.gnu.org; 20 Jan 2019 03:37:47 +0000 Received: from localhost ([127.0.0.1]:38222 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1gl3vf-0004qP-9A for submit@debbugs.gnu.org; Sat, 19 Jan 2019 22:37:47 -0500 Received: from mail-pg1-f193.google.com ([209.85.215.193]:45151) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1gl3vd-0004qA-4v for 34133@debbugs.gnu.org; Sat, 19 Jan 2019 22:37:45 -0500 Received: by mail-pg1-f193.google.com with SMTP id y4so7898526pgc.12 for <34133@debbugs.gnu.org>; Sat, 19 Jan 2019 19:37:45 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=subject:to:cc:references:from:message-id:date:user-agent :mime-version:in-reply-to:content-language:content-transfer-encoding; bh=8G1YOisziD9i2DORk6WmtTlTQf7hLDLqTmv63tvcZ9A=; b=lHMhX41XNXDayZUNDiMvqTca+XdprmorjjNuGcZQ+WDdUVgFpg+lQEYRROoQmuNWOp UfY6Ouyl9Cjp8KEnIYqpZqnzT6cRwp+hlJA7A1lGKY1w7OTjk4MBj4J2NgOA9DLnufim TjzFooEp9WIGYPU8uyjXJrL+LBo3T29UXscV+BoJRKM/sBPe8MlYeRQhaNZBqz72idDe xdew9A6tEFohCiAjUOsi0F+L4TYbdCympJnljCrn2WfT70PfM5iQ5UnOuPdNXc/sbALf uMSFoMpX82quAnuvDdoqFrFqIGY4xk1qgZlNaVGJ1gmLdFN7u3O/FExB1IRgRMk/pN6k +SeA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:cc:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-language :content-transfer-encoding; bh=8G1YOisziD9i2DORk6WmtTlTQf7hLDLqTmv63tvcZ9A=; b=DNpfk1T3PHvlsUPyM0h3U9yMMXtuAtfBBjzqXPtWoVc/z0lPTF8zRtK0Phj2xV/7ze gAK1pPaq+EJsPEKf7nc0Mt65gfxrZ7us2u3861i+tFmfa0N9XA6xfchUIJOE1bv1zq/6 +lDju/MkqaTb+mmY2hY3QPkzbCvyfT+SvesbdoqQE6kgyqKl9YDe2wnKFomrJFh6m0Uu oHOldHsJk4LRbBfkGoQh14Ik5w4IrSzVGxVD1T8QY4jv5b7g0kMwz6C6zZDsA4SR/iMR 9Us3mZBkW/wB803F7jyX0GAYA6hgs/dGMnH1QI8GoM+7FO5apjEu7yrBLb+NSATCiTaq 4zOw== X-Gm-Message-State: 
AJcUukfo7wpkjrPrG2ahuRyuum151XekZr9eC/T873zcGHpk0yvAQRMK DzMHgguXVWaQdhXsNniUg3303ITB X-Google-Smtp-Source: ALg8bN5rsPH0jYlYlQlHXtMwassCuG2CLwgPyzY/TsiPOAj8LmPgjnoH23ZC1ik3gz+L49258ex4uA== X-Received: by 2002:a62:6ec8:: with SMTP id j191mr25297614pfc.198.1547955458228; Sat, 19 Jan 2019 19:37:38 -0800 (PST) Received: from tomato.housegordon.com (moose.housegordon.com. [184.68.105.38]) by smtp.googlemail.com with ESMTPSA id 202sm11359827pfy.87.2019.01.19.19.37.36 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Sat, 19 Jan 2019 19:37:36 -0800 (PST) Subject: Re: bug#34133: Huge memory usage and output size when using "H" and "G" To: Hongxu Chen References: <16b6994c-7224-8869-baf5-6df68a2ded79@gmail.com> From: Assaf Gordon Message-ID: Date: Sat, 19 Jan 2019 20:37:35 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.4.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 8bit X-Spam-Score: 0.0 (/) X-Debbugs-Envelope-To: 34133 Cc: 34133@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-) Hello, On 2019-01-19 7:23 p.m., Hongxu Chen wrote: >     We think the way sed works may suffer from attacks. Not more than any other program that uses memory or disk based on its input. > If the user downloads some > sed scripts and run *without root privilege*, the host machine may soon > exceed > the memory; First, root privileges have nothing to do with it. A non-privileged user can consume as much memory as they want, and a poorly configured machine will be brought to its knees. 
Well configured machines will not allow user-space programs
to cripple them (but I readily admit that such configuration
is not trivial to achieve).

Second,
Any user who downloads any program from an untrusted source is
expected to know what they're doing.
If they don't - then they can cause a lot of damage.

This has nothing to do with sed.

> in my case, the machine actually hangs and I have to restart
> it.

Yes, many common installations are not configured to handle
memory exhaustion.

> The
> problem may be severer when the machine is hosting some service or does
> the sed relevant service such as text processing (may be rare) itself
> even inside
> some sandbox. The issue may also be triggered unconsciously thus cause
> surprise
> and trouble.

That is all true, but has nothing to do with sed.

>
>  > Any program that keeps the input in memory is vulnerable to unbounded
> input size
>
>     I think input size is not big; and the size can still be reduced as
> long as more "G;H"s
> are appended to the script.
>  Maybe sed can do something flush to avoid memory usage?
>

I'll rephrase my words:

Your sed program has O(2^N) space requirements.
Any program that has exponential behavior (be it space or time)
will quickly lead to pathological cases.

So while your input file is not too big (N=30 lines),
it leads to huge memory requirements.

I recommend reading some background about complexity
(most of these deal with time complexity, but it applies to space as well):
   http://bigocheatsheet.com/
   https://en.wikipedia.org/wiki/Time_complexity#Exponential_time
   https://en.wikipedia.org/wiki/Computational_complexity

To illustrate my point further, here are similar examples in AWK and
PERL that would choke with your input:

   awk '{ buf = buf $1 ; buf = buf buf } END { print buf }'
   perl -ne '$buf .= $_ ; $buf .= $buf ; print $buf' < input

Just as you wrote above, a non-root user who runs these on your input
will consume a huge amount of memory.
This is why I said it has nothing to do with sed.

To see the unfolding of exponential growth in action,
try the following C program (which goes to show it is not just
awk/perl/sed scripts):

     /* Compile with:
           gcc -o 1 1.c -lm

        Run with:
           seq 30 | ./1
     */
     #define _GNU_SOURCE
     #define _POSIX_C_SOURCE 200809L
     #include <stdlib.h>
     #include <stdio.h>
     #include <math.h>

     int main()
     {
        char *line = NULL;   /* getline() allocates the buffer */
        size_t linenum = 0;
        size_t buf_size = 0;
        size_t n = 0;        /* buffer capacity reported by getline() */
        ssize_t i;

        while ( (i = getline(&line,&n, stdin)) != -1) {
           ++linenum;
           buf_size += n;
           buf_size *= 2;
           printf ("line %zu (%zu bytes), 2^%zu == %g, buf_size = %zu bytes\n",
                   linenum, n, linenum, pow(2, linenum) , buf_size);
        }
        free(line);
        return 0;
     }


The above does not actually allocate memory, it just shows how much
would be allocated. Run it with your input and see for yourself:

   line 1 (120 bytes), 2^1 == 2, buf_size = 240 bytes
   line 2 (120 bytes), 2^2 == 4, buf_size = 720 bytes
   line 3 (120 bytes), 2^3 == 8, buf_size = 1680 bytes
   [...]
   line 30 (120 bytes), 2^30 == 1.07374e+09, buf_size = 257698037520 bytes

If you tried to do "malloc(buf_size)" it would consume all your memory
(if you have less than 256GB).

----

This applies not just to memory (RAM), but to disk space as well
(which conceptually is just a different type of storage).

Try this example:

     printf a > in
     for i in $(seq 20); do cat in in > out ; mv out in ; done ;

The input file starts with 1 byte.
After 20 rounds, the file size will be 1MB.
If you try 30 rounds, the file will be 1GB.
The "20" and "30" here correspond to your input size (number of lines).
Small change in input leads to large changes in output (thus
"exponential" programs are considered bad).

If you are still not convinced, try 40 rounds - that will attempt
to create a 1TB file. If your disk is smaller - it will become full
(which is just like memory exhaustion).

----

I hope this resolves the issue.
regards, - assaf From debbugs-submit-bounces@debbugs.gnu.org Sun Jan 20 01:38:37 2019 Received: (at 34133) by debbugs.gnu.org; 20 Jan 2019 06:38:37 +0000 Received: from localhost ([127.0.0.1]:38264 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1gl6ke-00016j-Cx for submit@debbugs.gnu.org; Sun, 20 Jan 2019 01:38:36 -0500 Received: from mail-io1-f66.google.com ([209.85.166.66]:44672) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1gl6kc-00016T-Sf for 34133@debbugs.gnu.org; Sun, 20 Jan 2019 01:38:35 -0500 Received: by mail-io1-f66.google.com with SMTP id r200so14028954iod.11 for <34133@debbugs.gnu.org>; Sat, 19 Jan 2019 22:38:34 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=J7sS3tcdbBndzTazMWNXdCOk63GXvazMqedhNl2fu5w=; b=KNpv5N8zzMIRUqnnKfzJ08L1a6cfqEfDOJBoKDMbYGx4nDDF6ptfe+rR4lZ9kuTFF4 TS8A4K9KJRqOOwz9VnEy6aZR1WQilgk92ncAdGfwrvprBtx5n/htW8b/kckfss8J59PH HuYbAtl1WfmLCIcPYGJxznlYYxF/KQOHwd1Qq1scFRmJ6z49vUMDBlqNPkGxl21xkpR/ lm+1RJNtOSOR7k5B0+0IDBgBd44YU61Is0EQEQ/KldhuEomMBN5L/pUL6+4T3DCTIfdp qYqdg4QFG9sLArpwMUa+w2jtzLwG4JHqERdY8M1f4C8KqEaNchLPgSPIil6yaXK9kwT/ U7fA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=J7sS3tcdbBndzTazMWNXdCOk63GXvazMqedhNl2fu5w=; b=Ljx7wn5wirfPQkSQnCokBRO9E7tk766fi4yRexV5a2ERJ7EYYk0WhdXLKUaplrIhmo 9YD4THTIZByKV8A+xSb8t6Yo26l2imC88ym9OaJdYCq68rksBckbouFrqDXPghE7DJJt oQztQxqzdTAJVMKqk56Y6Kblv4JRYgiQJXP+2/+DSKVV3Oyf43Jid5WTIYgtFa7Y5AF8 24xMEhmw7qbWjuLZ+1B4X60BkUr5tns+9ZDcFVVN4uocLfQW99LRwRpdRMwo690Y2Y1j hM5vZrKD7VXGQwHDCrPLAu1ogh2LgZBQ3F7vV9p+ju9gGkII6YcrNoHFPrn+h5FW3UBC 2sCw== X-Gm-Message-State: AJcUukc/bUGLlFUxw674N/i0M+tyVUi05BXa/Rn224TASovDmx3oY8Oj kSAUr1rx+u5hMo6RfgRz6/OipxwjUFvSRGMLHPs= X-Google-Smtp-Source: 
ALg8bN7sttCihJ17wCcpPbOKFIqGxPd7BsOzoEWvexaG0zoZ5Q6qmSr0cmbuDfh2vYPJG0E9TOWo//oJDLEhpdO+oz0= X-Received: by 2002:a6b:b28a:: with SMTP id b132mr13859291iof.256.1547966308904; Sat, 19 Jan 2019 22:38:28 -0800 (PST) MIME-Version: 1.0 References: <16b6994c-7224-8869-baf5-6df68a2ded79@gmail.com> In-Reply-To: From: Hongxu Chen Date: Sun, 20 Jan 2019 14:38:18 +0800 Message-ID: Subject: Re: bug#34133: Huge memory usage and output size when using "H" and "G" To: Assaf Gordon Content-Type: multipart/alternative; boundary="000000000000f772e4057fddfdc6" X-Spam-Score: 0.0 (/) X-Debbugs-Envelope-To: 34133 Cc: 34133@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-) --000000000000f772e4057fddfdc6 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Hi Assaf, While I still think that this is sed's defect, I agree that programmers should ensure the script will not result in a bomb. What I'm thinking is whether there can be some builtin check to avoid this (e.g., roughly suppress use of =E2=80=9CG=E2=80= =9D directly followed by "H"). In addition, the error messages can be more meaningful in certain scenarios. I believe there involve different aspects of optimizations in the following scripts, but the stderr output can be improved. sed '/b/e; G; H;' input >/dev/null # instant exit, fine sed '/a/e; G; H;' input >/dev/null # many lines of `sh: 1: Syntax error: ";" unexpected` sed '/a/x; G; H;' input >/dev/null # instant exit, fine sed '/b/x; G; H;' input >/dev/null # long wait Generally, the *semantics* of sed scripts had better to be much clearer. To be honest, I'm really confused about their meaning until I think for quite a while. 
Best Regards,
Hongxu


On Sun, Jan 20, 2019 at 11:37 AM Assaf Gordon <assafgordon@gmail.com> wrote:

> Hello,
>
> On 2019-01-19 7:23 p.m., Hongxu Chen wrote:
> >     We think the way sed works may suffer from attacks.
>
> Not more than any other program that uses memory or disk
> based on its input.
>
> > If the user downloads some
> > sed scripts and run *without root privilege*, the host machine may soon
> > exceed
> > the memory;
>
> First,
> root privileges have nothing to do with it.
> A non-privileged user can consume as much memory as they want,
> and a poorly configured machine will be brought to its knees.
>
> Well configured machines will not allow user-space programs
> to cripple them (but I readily admit that such configuration
> is not trivial to achieve).
>
> Second,
> Any user who downloads any program from an untrusted source is
> expected to know what they're doing.
> If they don't - then they can cause a lot of damage.
>
> This has nothing to do with sed.
>
> > in my case, the machine actually hangs and I have to restart
> > it.
>
> Yes, many common installations are not configured to handle
> memory exhaustion.
>
> > The
> > problem may be severer when the machine is hosting some service or does
> > the sed relevant service such as text processing (may be rare) itself
> > even inside
> > some sandbox. The issue may also be triggered unconsciously thus cause
> > surprise
> > and trouble.
>
> That is all true, but has nothing to do with sed.
>
> >
> >  > Any program that keeps the input in memory is vulnerable to unbounded
> > input size
> >
> >     I think input size is not big; and the size can still be reduced as
> > long as more "G;H"s
> > are appended to the script.
> >  Maybe sed can do something flush to avoid memory usage?
> >
>
> I'll rephrase my words:
>
> Your sed program has O(2^N) space requirements.
> Any program that has exponential behavior (be it space or time)
> will quickly lead to pathological cases.
>
> So while your input file is not too big (N=30 lines),
> it leads to huge memory requirements.
>
> I recommend reading some background about complexity
> (most of these deal with time complexity, but it applies to space as
> well):
>    http://bigocheatsheet.com/
>    https://en.wikipedia.org/wiki/Time_complexity#Exponential_time
>    https://en.wikipedia.org/wiki/Computational_complexity
>
> To illustrate my point further, here are similar examples in AWK and
> PERL that would choke with your input:
>
>    awk '{ buf = buf $1 ; buf = buf buf } END { print buf }'
>    perl -ne '$buf .= $_ ; $buf .= $buf ; print $buf' < input
>
> Just as you wrote above, a non-root user who runs these on your input
> will consume a huge amount of memory. This is why I said it has nothing
> to do with sed.
>
> To see the unfolding of exponential growth in action,
> try the following C program (which goes to show it is not just
> awk/perl/sed scripts):
>
>      /* Compile with:
>            gcc -o 1 1.c -lm
>
>         Run with:
>            seq 30 | ./1
>      */
>      #define _GNU_SOURCE
>      #define _POSIX_C_SOURCE 200809L
>      #include <stdlib.h>
>      #include <stdio.h>
>      #include <math.h>
>
>      int main()
>      {
>         char *line = NULL;   /* getline() allocates the buffer */
>         size_t linenum = 0;
>         size_t buf_size = 0;
>         size_t n = 0;        /* buffer capacity reported by getline() */
>         ssize_t i;
>
>         while ( (i = getline(&line,&n, stdin)) != -1) {
>            ++linenum;
>            buf_size += n;
>            buf_size *= 2;
>            printf ("line %zu (%zu bytes), 2^%zu == %g, buf_size = %zu bytes\n",
>                    linenum, n, linenum, pow(2, linenum) , buf_size);
>         }
>         free(line);
>         return 0;
>      }
>
>
> The above does not actually allocate memory, it just shows how much
> would be allocated. Run it with your input and see for yourself:
>
>    line 1 (120 bytes), 2^1 == 2, buf_size = 240 bytes
>    line 2 (120 bytes), 2^2 == 4, buf_size = 720 bytes
>    line 3 (120 bytes), 2^3 == 8, buf_size = 1680 bytes
>    [...]
>    line 30 (120 bytes), 2^30 == 1.07374e+09, buf_size = 257698037520 bytes
>
> If you tried to do "malloc(buf_size)" it would consume all your memory
> (if you have less than 256GB).
>
> ----
>
> This applies not just to memory (RAM), but to disk space as well
> (which conceptually is just a different type of storage).
>
> Try this example:
>
>      printf a > in
>      for i in $(seq 20); do cat in in > out ; mv out in ; done ;
>
> The input file starts with 1 byte.
> After 20 rounds, the file size will be 1MB.
> If you try 30 rounds, the file will be 1GB.
> The "20" and "30" here correspond to your input size (number of lines).
> Small change in input leads to large changes in output (thus
> "exponential" programs are considered bad).
>
> If you are still not convinced, try 40 rounds - that will attempt
> to create a 1TB file. If your disk is smaller - it will become full
> (which is just like memory exhaustion).
>
> ----
>
> I hope this resolves the issue.
>
> regards,
>   - assaf
>

--000000000000f772e4057fddfdc6
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

Hi Assaf,

    While I still think that this is sed's defect, I agree that programmers should ensure
the script will not result in a bomb. What I'm thinking is whether there can be some
builtin check to avoid this (e.g., roughly suppress use of "G" directly followed by "H").

In addition, the error messages could be more meaningful in certain scenarios. I believe
different optimizations are involved in the following scripts, but the stderr
output can be improved.

sed '/b/e; G; H;' input >/dev/null  # instant exit, fine
sed '/a/e; G; H;' input >/dev/null  # many lines of `sh: 1: Syntax error: ";" unexpected`
sed '/a/x; G; H;' input >/dev/null  # instant exit, fine
sed '/b/x; G; H;' input >/dev/null  # long wait

Generally, the *semantics* of sed scripts should be much clearer. To be honest,
I was really confused about their meaning until I had thought about it for quite a while.


Best Regards,
Hongxu


On Sun, Jan 20, 2019 at 11:37 AM Assaf Gordon <assafgordon@gmail.com> wrote:
Hello,

On 2019-01-19 7:23 p.m., Hongxu Chen wrote:
>      We think the way sed works may suffer from attacks.

Not more than any other program that uses memory or disk
based on its input.

> If the user downloads some
> sed scripts and run *without root privilege*, the host machine may soon
> exceed
> the memory;

First,
root privileges have nothing to do with it.
A non-privileged user can consume as much memory as they want,
and a poorly configured machine will be brought to its knees.

Well configured machines will not allow user-space programs
to cripple them (but I readily admit that such configuration
is not trivial to achieve).
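
One low-effort way to approximate such a configuration for a single command is a per-process limit set from the shell. The sketch below assumes a shell whose `ulimit` builtin accepts `-v` (bash and dash do; POSIX itself only specifies `-f`). The helper name `run_capped` is made up for illustration and is not something sed or the shell provides.

```shell
# run_capped LIMIT_KIB COMMAND [ARGS...]
# Runs COMMAND in a subshell whose virtual memory is capped at LIMIT_KIB
# kibibytes, so the limit does not leak into the calling shell.
# (Hypothetical helper; relies on the non-POSIX "ulimit -v".)
run_capped() {
    limit_kib=$1
    shift
    (
        # Exit 125 if the limit itself cannot be set.
        ulimit -v "$limit_kib" || exit 125
        "$@"
    )
}
```

For example, `run_capped 524288 sed 'G;H' input >/dev/null` should make a runaway script fail with an allocation error instead of exhausting the machine. The 512 MiB figure is arbitrary, and the exact error text and exit status depend on the sed implementation.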

Second,
Any user who downloads any program from an untrusted source is
expected to know what they're doing.
If they don't - then they can cause a lot of damage.

This has nothing to do with sed.

> in my case, the machine actually hangs and I have to restart
> it.

Yes, many common installations are not configured to handle
memory exhaustion.

> The
> problem may be severer when the machine is hosting some service or does
> the sed relevant service such as text processing (may be rare) itself
> even inside
> some sandbox. The issue may also be triggered unconsciously thus cause
> surprise
> and trouble.

That is all true, but has nothing to do with sed.

>
>  > Any program that keeps the input in memory is vulnerable to unbounded
> input size
>
>      I think input size is not big; and the size can still be reduced as
> long as more "G;H"s
> are appended to the script.
>   Maybe sed can do something flush to avoid memory usage?
>

I'll rephrase my words:

Your sed program has O(2^N) space requirements.
Any program that has exponential behavior (be it space or time)
will quickly lead to pathological cases.

So while your input file is not too big (N=30 lines),
it leads to huge memory requirements.
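
To make the O(2^N) claim concrete, here is a small model of how the hold space grows. It assumes the script is essentially `G;H` applied to every line (an assumption based on the commands named in this thread, since the original script is not repeated here) and that every input line has the same length. `hold_bytes` is a made-up helper name, and the model only does arithmetic; it never runs sed.

```shell
# hold_bytes N LINE_LEN
# Models "sed 'G;H'" on N input lines of LINE_LEN bytes each.
#   G: pattern = line + "\n" + hold
#   H: hold    = hold + "\n" + pattern
# so each cycle the hold space goes from h to 2*h + LINE_LEN + 2 bytes.
hold_bytes() {
    n=$1
    line_len=$2
    h=0
    k=0
    while [ "$k" -lt "$n" ]; do
        h=$(( 2 * h + line_len + 2 ))
        k=$(( k + 1 ))
    done
    echo "$h"
}

hold_bytes 10 1    # prints 3069
hold_bytes 30 1    # prints 3221225469 (about 3 GiB)
```

Under these assumptions, 30 one-byte lines already need roughly 3 GiB of hold space, and each additional input line doubles that.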

I recommend reading some background about complexity
(most of these deal with time complexity, but it applies to space as well):
   http://bigocheatsheet.com/
   https://en.wikipedia.org/wiki/Time_complexity#Exponential_time
   https://en.wikipedia.org/wiki/Computational_complexity

To illustrate my point further, here are similar examples in AWK and
PERL that would choke with your input:

   awk '{ buf = buf $1 ; buf = buf buf } END { print buf }'
   perl -ne '$buf .= $_ ; $buf .= $buf ; print $buf' < input

Just as you wrote above, a non-root user who runs these on your input
will consume a huge amount of memory. This is why I said it has nothing
to do with sed.

To see the unfolding of exponential growth in action,
try the following C program (which goes to show it is not just
awk/perl/sed scripts):

     /* Compile with:
           gcc -o 1 1.c -lm

        Run with:
           seq 30 | ./1
     */
     #define _GNU_SOURCE
     #define _POSIX_C_SOURCE 200809L
     #include <stdlib.h>
     #include <stdio.h>
     #include <math.h>

     int main()
     {
        char *line = NULL;   /* getline() allocates the buffer */
        size_t linenum = 0;
        size_t buf_size = 0;
        size_t n = 0;        /* buffer capacity reported by getline() */
        ssize_t i;

        while ( (i = getline(&line,&n, stdin)) != -1) {
           ++linenum;
           buf_size += n;
           buf_size *= 2;
           printf ("line %zu (%zu bytes), 2^%zu == %g, buf_size = %zu bytes\n",
                   linenum, n, linenum, pow(2, linenum) , buf_size);
        }
        free(line);
        return 0;
     }


The above does not actually allocate memory, it just shows how much
would be allocated. Run it with your input and see for yourself:

   line 1 (120 bytes), 2^1 == 2, buf_size = 240 bytes
   line 2 (120 bytes), 2^2 == 4, buf_size = 720 bytes
   line 3 (120 bytes), 2^3 == 8, buf_size = 1680 bytes
   [...]
   line 30 (120 bytes), 2^30 == 1.07374e+09, buf_size = 257698037520 bytes

If you tried to do "malloc(buf_size)" it would consume all your memory
(if you have less than 256GB).
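
The figure for line 30 can be checked by hand. Writing b_k for buf_size after line k and n for the per-line size reported by the program (120 here), the loop computes the recurrence below. The closed form follows from adding 2n to both sides; it is a derivation added for illustration, not output from the program.

```latex
\begin{align*}
b_0 &= 0, \qquad b_k = 2\,(b_{k-1} + n) \\
b_k + 2n &= 2\,(b_{k-1} + 2n) \;\Longrightarrow\; b_k = 2n\,(2^k - 1) \\
b_{30} &= 240 \cdot (2^{30} - 1) = 257\,698\,037\,520
\end{align*}
```

The same formula gives 240, 720 and 1680 for k = 1, 2, 3, matching the first three lines of output.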

----

This applies not just to memory (RAM), but to disk space as well
(which conceptually is just a different type of storage).

Try this example:

     printf a > in
     for i in $(seq 20); do cat in in > out ; mv out in ; done ;

The input file starts with 1 byte.
After 20 rounds, the file size will be 1MB.
If you try 30 rounds, the file will be 1GB.
The "20" and "30" here correspond to your input size (number of lines).
Small change in input leads to large changes in output (thus
"exponential" programs are considered bad).

If you are still not convinced, try 40 rounds - that will attempt
to create a 1TB file. If your disk is smaller - it will become full
(which is just like memory exhaustion).
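
For anyone who wants to watch the doubling without risking a full disk, the same loop can be run for a small number of rounds in a scratch directory. This is only a sketch: `doubled_size` is a hypothetical helper, and it assumes `mktemp -d` is available.

```shell
# doubled_size ROUNDS
# Starts from a 1-byte file, doubles it ROUNDS times in a temporary
# directory, prints the final size in bytes, then cleans up.
doubled_size() {
    rounds=$1
    dir=$(mktemp -d) || return 1
    printf a > "$dir/in"
    i=0
    while [ "$i" -lt "$rounds" ]; do
        cat "$dir/in" "$dir/in" > "$dir/out" && mv "$dir/out" "$dir/in"
        i=$(( i + 1 ))
    done
    size=$(wc -c < "$dir/in")
    rm -rf "$dir"
    # Arithmetic expansion strips any padding some wc versions add.
    echo $(( size ))
}

doubled_size 10    # prints 1024
doubled_size 20    # prints 1048576
```

Keeping the round count at 20 or below limits the file to 1 MiB; every extra round doubles it.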

----

I hope this resolves the issue.

regards,
  - assaf







--000000000000f772e4057fddfdc6-- From unknown Fri Sep 05 20:55:01 2025 Received: (at fakecontrol) by fakecontrolmessage; To: internal_control@debbugs.gnu.org From: Debbugs Internal Request Subject: Internal Control Message-Id: bug archived. Date: Sun, 17 Feb 2019 12:24:05 +0000 User-Agent: Fakemail v42.6.9 # This is a fake control message. # # The action: # bug archived. thanks # This fakemail brought to you by your local debbugs # administrator