GNU bug report logs - #39432
Long line issue in sed 4.8


Package: sed;

Reported by: Paul Fox <paul.d.fox <at> gmail.com>

Date: Wed, 5 Feb 2020 08:44:01 UTC

Severity: normal

Tags: notabug

Done: Assaf Gordon <assafgordon <at> gmail.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 39432 in the body.
You can then email your comments to 39432 AT debbugs.gnu.org in the normal way.

Report forwarded to bug-sed <at> gnu.org:
bug#39432; Package sed. (Wed, 05 Feb 2020 08:44:01 GMT)

Acknowledgement sent to Paul Fox <paul.d.fox <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-sed <at> gnu.org. (Wed, 05 Feb 2020 08:44:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Paul Fox <paul.d.fox <at> gmail.com>
To: bug-sed <at> gnu.org
Subject: Long line issue in sed 4.8
Date: Wed, 5 Feb 2020 08:11:09 +0000
Seems there are bugs in sed handling long lines. You can reproduce by:

1. generate file > 2GB - must be a single line.
2. sed -e s/xxx/yyy/ <file

It doesn't matter whether the search expression matches or not. Using a
file filled with "The quick brown fox..." (no newlines), sed aborts saying
something like:

reg_exp: INT_MAX overflow
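
A minimal sketch of step 1 of the reproduction, written in C. The helper name write_single_line is hypothetical and not part of sed, and the filler string is an assumption based on the text quoted above. It writes a requested number of bytes of repeated filler with no newline, so the output is a single long line.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical generator (not from sed): writes exactly total_bytes bytes
   of repeated filler text to out, never emitting a newline.
   Returns the number of bytes actually written. */
static size_t write_single_line(FILE *out, size_t total_bytes)
{
    static const char filler[] = "The quick brown fox "; /* assumed filler */
    const size_t filler_len = sizeof filler - 1;          /* drop the NUL */
    size_t written = 0;

    assert(out != NULL);
    while (written < total_bytes) {
        size_t chunk = total_bytes - written;
        if (chunk > filler_len)
            chunk = filler_len;      /* last piece may be a partial filler */
        if (fwrite(filler, 1, chunk, out) != chunk)
            break;                   /* stop on a write error */
        written += chunk;
    }
    return written;
}
```

Calling it on a file opened for writing with a total slightly above 2 GiB should produce the kind of input described in the report, assuming size_t is wide enough on the platform to hold that count.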

Prior versions of sed mishandled very long lines in ck_getline (int vs
size_t). Whilst that was fixed, subsequent code trips over because lengths
are still being put into ints.
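
To illustrate the int vs size_t point, here is a hedged sketch of the kind of guard a caller needs before handing a size_t length to an interface that takes an int. Both helper names are hypothetical, and this is not sed's or gnulib's actual code.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical check: returns 1 if len can be represented as an int,
   0 if it would not fit. */
static int length_fits_in_int(size_t len)
{
    return len <= (size_t)INT_MAX;
}

/* Hypothetical conversion: narrows only after the range check has passed,
   so the value is never changed by the cast. */
static int to_int_length(size_t len)
{
    assert(length_fits_in_int(len));
    return (int)len;
}
```

A line of more than INT_MAX bytes fails the check, which is the situation in which a program has to either report an error or avoid the int-based interface altogether.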

If anyone needs more info - let me know. I did run sed under gdb to
diagnose, at my work area. But likely I can reproduce from this machine.

(Not critical issue)

Information forwarded to bug-sed <at> gnu.org:
bug#39432; Package sed. (Wed, 05 Feb 2020 19:16:01 GMT)

Message #8 received at 39432 <at> debbugs.gnu.org:

From: Assaf Gordon <assafgordon <at> gmail.com>
To: Paul Fox <paul.d.fox <at> gmail.com>, 39432 <at> debbugs.gnu.org
Subject: Re: bug#39432: Long line issue in sed 4.8
Date: Wed, 5 Feb 2020 12:15:08 -0700
tag 39432 notabug
close 39432
stop

Hello,

On 2020-02-05 1:11 a.m., Paul Fox wrote:
> Seems there are bugs in sed handling long lines. You can reproduce by:
> 
> 1. generate file > 2GB - must be a single line.
> 2. sed -e s/xxx/yyy/ <file
[...]
> reg_exp: INT_MAX overflow

This is indeed the intended behavior, as the regular-expression module
can't handle strings larger than 2GB.

This was reported in https://bugs.gnu.org/30520
and the error was added in 
https://git.savannah.gnu.org/cgit/sed.git/commit/?id=5433dc245b222f6c98ab1436e170fd5e3e6e3907

If in the future gnulib's regex module is improved
to handle large buffers, we can revisit this issue and
remove the message.

As such I'm closing this as "not a bug", but discussion
can continue by replying to this thread.

regards,
 - assaf

Added tag(s) notabug. Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org. (Wed, 05 Feb 2020 19:16:02 GMT)

bug closed, send any further explanations to 39432 <at> debbugs.gnu.org and Paul Fox <paul.d.fox <at> gmail.com> Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org. (Wed, 05 Feb 2020 19:16:02 GMT)

Information forwarded to bug-sed <at> gnu.org:
bug#39432; Package sed. (Wed, 05 Feb 2020 22:20:02 GMT)

Message #15 received at 39432 <at> debbugs.gnu.org:

From: Paul Fox <paul.d.fox <at> gmail.com>
To: Assaf Gordon <assafgordon <at> gmail.com>
Cc: 39432 <at> debbugs.gnu.org
Subject: Re: bug#39432: Long line issue in sed 4.8
Date: Wed, 5 Feb 2020 22:18:49 +0000
hello Assaf

thank you for taking the time to respond to this. I am wondering if it
"really is a bug". I appreciate the regexp package may have limitations. I
haven't examined in detail how it "compiles" the regexp to byte code. Older
regexp implementations tend to limit themselves to typical "int" sizes, so
regexps cannot be of arbitrary complexity.

However, the issue here is the item to search is a "long line" (>2GB).
Whilst maybe regexp itself will have issues keeping track of any grouping
patterns and backtrack, it "should ideally just work".

I am more than familiar with the pains of 16/32/64-bitness, so please
don't assume I am being naive. And I am happy to take your word for it as
maintainer.

However, one thing that is fairly disappointing in sed, is the unhelpful
INT_MAX panic. At least the message should say something like:

   * .... is larger than INT_MAX (%lu) <= insert the value

I had to disassemble the code to pick out the 2^32-2 value that was being
used. Better still, the message should say "what" is exceeding the byte
length. The code says or implies the regexp is too complex, but it's the
search target which is too long (i.e. "line is too long, sorry, we can't
handle that just now. Please try later!").
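
The suggestion above could look roughly like the following C sketch. The helper name format_line_too_long and the exact wording of the message are assumptions for illustration, not sed's actual diagnostic. It uses %zu for the size_t length and %d for INT_MAX rather than the %lu shown in the suggestion.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not from sed): formats a diagnostic that names what
   is too long (the input line), how long it is, and the limit it exceeds.
   Returns the value returned by snprintf. */
static int format_line_too_long(char *buf, size_t bufsize, size_t line_len)
{
    assert(buf != NULL);
    return snprintf(buf, bufsize,
                    "input line is too long: %zu bytes exceeds INT_MAX (%d)",
                    line_len, INT_MAX);
}
```

Printing both the offending length and the limit means a user would not need a debugger or disassembler to learn which value was exceeded.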

I was fairly amazed at the bug, having lived with GNU sed since probably
the 0.01 days. And it's a shame for it not to be a little more intuitive to
an end user.

(I didn't hit the bug - someone at my org said "this didn't work", so I was
curious how/why/where).

many thanks and really appreciate your efforts on this.



bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 05 Mar 2020 12:24:06 GMT)




GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.