From debbugs-submit-bounces@debbugs.gnu.org Wed Feb 05 03:43:47 2020
From: Paul Fox
Date: Wed, 5 Feb 2020 08:11:09 +0000
Subject: Long line issue in sed 4.8
To: bug-sed@gnu.org
Content-Type: multipart/alternative; boundary="0000000000009b23c0059dcfb391"

--0000000000009b23c0059dcfb391
Content-Type: text/plain; charset="UTF-8"

Seems there are bugs in sed handling long lines. You can reproduce by:

1. generate file > 2GB - must be a single line.
2. sed -e s/xxx/yyy/ <file

It doesn't matter whether the search expression matches or not. Using a file filled with "The quick brown fox..." (no newlines), sed aborts saying something like:

reg_exp: INT_MAX overflow

Prior versions of sed mishandled very long lines in ck_getline (int vs size_t). Whilst that was fixed, subsequent code still trips over lengths being stored in ints.

If anyone needs more info - let me know. I did run sed under gdb to diagnose, at my work area. But likely I can reproduce from this machine.

(Not critical issue)
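
To illustrate the int vs size_t point in the report above, here is a minimal C sketch of why a line length held in a size_t cannot always be stored in an int. It is not taken from sed's source; the helper names `length_fits_int` and `length_to_int` are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper (not from sed): returns 1 when a buffer length
   can be stored in an int without loss, 0 otherwise.  */
static int
length_fits_int (size_t len)
{
  return len <= (size_t) INT_MAX;
}

/* Hypothetical helper: narrow a length to int.  The caller is expected
   to have checked the range first; the assert documents that.  */
static int
length_to_int (size_t len)
{
  assert (length_fits_int (len));
  return (int) len;
}
```

On platforms where int is 32 bits, INT_MAX is 2147483647, so `length_fits_int` returns 0 for a 2GB line of 2147483648 bytes. Code that skips such a check and assigns the length to an int anyway ends up with a value that no longer describes the buffer.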

--0000000000009b23c0059dcfb391--

From debbugs-submit-bounces@debbugs.gnu.org Wed Feb 05 14:15:19 2020
Subject: Re: bug#39432: Long line issue in sed 4.8
To: Paul Fox, 39432@debbugs.gnu.org
From: Assaf Gordon
Date: Wed, 5 Feb 2020 12:15:08 -0700
Content-Type: text/plain; charset=utf-8; format=flowed

tag 39432 notabug
close 39432
stop

Hello,

On 2020-02-05 1:11 a.m., Paul Fox wrote:
> Seems there are bugs in sed handling long lines. You can reproduce by:
>
> 1. generate file > 2GB - must be a single line.
> 2. sed -e s/xxx/yyy/ <file
[...]
> reg_exp: INT_MAX overflow

This is indeed the intended behavior, as the regular-expression module
can't handle strings larger than 2GB.

This was reported in https://bugs.gnu.org/30520
and the error was added in
https://git.savannah.gnu.org/cgit/sed.git/commit/?id=5433dc245b222f6c98ab1436e170fd5e3e6e3907

If in the future gnulib's regex module is improved
to handle large buffers, we can revisit this issue and
remove the message.

As such I'm closing this as "not a bug", but discussion
can continue by replying to this thread.

regards,
  - assaf

From debbugs-submit-bounces@debbugs.gnu.org Wed Feb 05 17:19:45 2020
From: Paul Fox
Date: Wed, 5 Feb 2020 22:18:49 +0000
Subject: Re: bug#39432: Long line issue in sed 4.8
To: Assaf Gordon
Cc: 39432@debbugs.gnu.org
Content-Type: multipart/alternative; boundary="000000000000160e7c059ddb8bdf"

--000000000000160e7c059ddb8bdf
Content-Type: text/plain; charset="UTF-8"

hello Assaf

thank you for taking the time to respond to this. I am wondering if it "really is a bug". I appreciate the regexp package may have limitations. I haven't examined in detail how it "compiles" the regexp to byte code. Older regexp patterns will limit themselves to typical "int" sizes, so that very complex regexps cannot be of arbitrary complexity.

However, the issue here is that the item to search is a "long line" (>2GB). Whilst maybe regexp itself will have issues keeping track of any grouping patterns and backtracking, it "should ideally just work".

I am more than familiar with the pains of 16/32/64-bitness - so please don't assume I am being naive. And happy to take your word for it as maintainer.

However, one thing that is fairly disappointing in sed is the unhelpful INT_MAX panic. At least the message should say something like:

   * .... is larger than INT_MAX (%lu) <= insert the value

I had to disassemble the code to pick out the 2^32-2 value that was being used. And even better still, "what" is exceeding the byte length. The code says or implies the regexp is too complex, but it's the search target which is too long (i.e. "line is too long, sorry, we can't handle that just now. Please try later!").

I was fairly amazed at the bug, having lived with gnu sed since probably the 0.01 days. And it's a shame to not be a little more intuitive to an end user.

(I didn't hit the bug - someone at my org said "this didn't work", so I was curious how/why/where).

many thanks and really appreciate your efforts on this.

On Wed, 5 Feb 2020 at 19:15, Assaf Gordon wrote:

> tag 39432 notabug
> close 39432
> stop
>
> Hello,
>
> On 2020-02-05 1:11 a.m., Paul Fox wrote:
> > Seems there are bugs in sed handling long lines. You can reproduce by:
> >
> > 1. generate file > 2GB - must be a single line.
> > 2. sed -e s/xxx/yyy/ <file
> [...]
> > reg_exp: INT_MAX overflow
>
> This is indeed the intended behavior, as the regular-expression module
> can't handle strings larger than 2GB.
>
> This was reported in https://bugs.gnu.org/30520
> and the error was added in
> https://git.savannah.gnu.org/cgit/sed.git/commit/?id=5433dc245b222f6c98ab1436e170fd5e3e6e3907
>
> If in the future gnulib's regex module is improved
> to handle large buffers, we can revisit this issue and
> remove the message.
>
> As such I'm closing this as "not a bug", but discussion
> can continue by replying to this thread.
>
> regards,
>   - assaf

--000000000000160e7c059ddb8bdf
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
--000000000000160e7c059ddb8bdf--

From unknown Tue Aug 19 14:24:05 2025
Received: (at fakecontrol) by fakecontrolmessage;
To: internal_control@debbugs.gnu.org
From: Debbugs Internal Request
Subject: Internal Control
Message-Id: bug archived.
Date: Thu, 05 Mar 2020 12:24:06 +0000
User-Agent: Fakemail v42.6.9

# This is a fake control message.
#
# The action:
# bug archived.
thanks
# This fakemail brought to you by your local debbugs
# administrator
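
The suggestion made in this thread, that the error should name what exceeds the limit and print the values involved, could look roughly like the C sketch below. This is an illustration only, not sed's actual code: the function name `format_line_too_long` and the message wording are hypothetical, and `%zu` is used for the size_t length where the thread proposed `%lu`.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical sketch (not sed's code): if LINE_LEN exceeds INT_MAX,
   write a diagnostic naming the input line and both values into BUF
   and return 1; otherwise leave BUF empty and return 0.  */
static int
format_line_too_long (char *buf, size_t bufsize, size_t line_len)
{
  assert (buf != NULL && bufsize > 0);
  buf[0] = '\0';

  if (line_len <= (size_t) INT_MAX)
    return 0;

  /* Say that the input line, not the regexp, is what is too large.  */
  snprintf (buf, bufsize,
            "input line is too long for the regex matcher: "
            "%zu bytes exceeds INT_MAX (%d)",
            line_len, INT_MAX);
  return 1;
}
```

A caller would pass the length of the line it is about to match and print the buffer when the function returns 1. The point of the sketch is only that the diagnostic can carry the offending length and the limit without changing the limit itself.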