GNU bug report logs - #32228
sed -i does not buffer, causing excessive writes

Previous Next

Package: sed;

Reported by: Vidar Holen <vidar <at> vidarholen.net>

Date: Fri, 20 Jul 2018 20:25:02 UTC

Severity: normal

Tags: fixed

Done: Assaf Gordon <assafgordon <at> gmail.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 32228 in the body.
You can then email your comments to 32228 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to bug-sed <at> gnu.org:
bug#32228; Package sed. (Fri, 20 Jul 2018 20:25:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Vidar Holen <vidar <at> vidarholen.net>:
New bug report received and forwarded. Copy sent to bug-sed <at> gnu.org. (Fri, 20 Jul 2018 20:25:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Vidar Holen <vidar <at> vidarholen.net>
To: bug-sed <at> gnu.org
Subject: sed -i does not buffer, causing excessive writes
Date: Fri, 20 Jul 2018 13:23:35 -0700
Hi,

I'm using noticing that `sed -i` does not do any kind of output
buffering. While behavior is correct, this causes an excessive number of
small writes:

    $ strace -e write ./sed -i 's/foo/bar/g' file.txt
    write(4, "                    GNU GENERAL "..., 47) = 47
    write(4, "                       Version 3"..., 47) = 47
    write(4, "\n", 1)                       = 1

Compare this to redirection:

    $ strace -e write ./sed 's/foo/bar/g' file.txt > tmpfile
    write(1, "                    GNU GENERAL "..., 4096) = 4096
    write(1, "om or adapt all or part of the w"..., 4096) = 4096
    write(1, ".\n\n  You may make, run and propa"..., 4096) = 4096

On my system, this makes `sed -i` take 7x longer than redirection+mv:
3.7s vs 0.5s for 100MB, for 675 vs 10 writes on the same input.

I'm seeing this on Debian with the latest sed commit (c52a676) and libc
2.27-2. The root cause appears to be `ck_mkstemp` using `fdopen`, which
unlike `fopen` does not buffer by default.

Enabling buffering should be simple and effective, resulting in a nice
speedup.

Regards,
Vidar Holen




Information forwarded to bug-sed <at> gnu.org:
bug#32228; Package sed. (Sat, 21 Jul 2018 01:49:02 GMT) Full text and rfc822 format available.

Message #8 received at 32228 <at> debbugs.gnu.org (full text, mbox):

From: Assaf Gordon <assafgordon <at> gmail.com>
To: Vidar Holen <vidar <at> vidarholen.net>, 32228 <at> debbugs.gnu.org
Subject: Re: bug#32228: sed -i does not buffer, causing excessive writes
Date: Fri, 20 Jul 2018 19:48:28 -0600
[Message part 1 (text/plain, inline)]
Hello,

On 20/07/18 02:23 PM, Vidar Holen wrote:
> I'm using noticing that `sed -i` does not do any kind of output
> buffering. While behavior is correct, this causes an excessive number of
> small writes:

> I'm seeing this on Debian with the latest sed commit (c52a676) and libc
> 2.27-2. The root cause appears to be `ck_mkstemp` using `fdopen`, which
> unlike `fopen` does not buffer by default.

Thank you for reporting this issue, and providing clear way to reproduce
it. I can confirm seeing the same behavior on Debian 9.4 with glibc 2.24 
(and latest sed).

I think the technical culprit is slightly different:
It is not due to 'ck_mkstemp/fdopen', but because sed explicitly flushes
the output after every line, except if the output is STDOUT.

This happens in execute.c:flush_output():
https://opengrok.housegordon.com/source/xref/sed/sed/execute.c#flush_output

Changing this function enables default buffering with "sed -i"
(using whatever the buffering default is for glibc).

This change does have a side-effect:
It also changes the buffering of other write commands such as 'w', 'W'
and 's///w'.
It might have other side effects I'm haven't spotted.
Based on the ChangeLog-2014 file, I see this function was added in 2003.

I wonder if there are good reasons to explicitly flush all output -
ideas anyone?

If you can give this a quick test and let me know if you also notice
improvements on your system - would be appreciated.


Comments and suggestions welcomed,
 - assaf




[0001-sed-do-not-flush-output-stream-unless-in-unbuffered-.patch (text/x-patch, attachment)]

Information forwarded to bug-sed <at> gnu.org:
bug#32228; Package sed. (Sat, 21 Jul 2018 02:28:02 GMT) Full text and rfc822 format available.

Message #11 received at 32228 <at> debbugs.gnu.org (full text, mbox):

From: Vidar Holen <vidar <at> vidarholen.net>
To: Assaf Gordon <assafgordon <at> gmail.com>
Cc: 32228 <at> debbugs.gnu.org
Subject: Re: bug#32228: sed -i does not buffer, causing excessive writes
Date: Fri, 20 Jul 2018 19:26:42 -0700
[Message part 1 (text/plain, inline)]
Thanks for looking into it. I can confirm that this patch significantly
improves the performance of `sed -i` for me.

On Fri, Jul 20, 2018 at 6:48 PM, Assaf Gordon <assafgordon <at> gmail.com> wrote:

> Hello,
>
> On 20/07/18 02:23 PM, Vidar Holen wrote:
>
>> I'm using noticing that `sed -i` does not do any kind of output
>> buffering. While behavior is correct, this causes an excessive number of
>> small writes:
>>
>
> I'm seeing this on Debian with the latest sed commit (c52a676) and libc
>> 2.27-2. The root cause appears to be `ck_mkstemp` using `fdopen`, which
>> unlike `fopen` does not buffer by default.
>>
>
> Thank you for reporting this issue, and providing clear way to reproduce
> it. I can confirm seeing the same behavior on Debian 9.4 with glibc 2.24
> (and latest sed).
>
> I think the technical culprit is slightly different:
> It is not due to 'ck_mkstemp/fdopen', but because sed explicitly flushes
> the output after every line, except if the output is STDOUT.
>
> This happens in execute.c:flush_output():
> https://opengrok.housegordon.com/source/xref/sed/sed/execute
> .c#flush_output
>
> Changing this function enables default buffering with "sed -i"
> (using whatever the buffering default is for glibc).
>
> This change does have a side-effect:
> It also changes the buffering of other write commands such as 'w', 'W'
> and 's///w'.
> It might have other side effects I'm haven't spotted.
> Based on the ChangeLog-2014 file, I see this function was added in 2003.
>
> I wonder if there are good reasons to explicitly flush all output -
> ideas anyone?
>
> If you can give this a quick test and let me know if you also notice
> improvements on your system - would be appreciated.
>
>
> Comments and suggestions welcomed,
>  - assaf
>
>
>
>
>
[Message part 2 (text/html, inline)]

Information forwarded to bug-sed <at> gnu.org:
bug#32228; Package sed. (Thu, 02 Aug 2018 14:42:02 GMT) Full text and rfc822 format available.

Message #14 received at 32228 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Assaf Gordon <assafgordon <at> gmail.com>
Cc: 32228 <at> debbugs.gnu.org, Vidar Holen <vidar <at> vidarholen.net>
Subject: Re: bug#32228: sed -i does not buffer, causing excessive writes
Date: Thu, 2 Aug 2018 16:40:38 +0200
On Sat, Jul 21, 2018 at 3:48 AM, Assaf Gordon <assafgordon <at> gmail.com> wrote:
> Hello,
>
> On 20/07/18 02:23 PM, Vidar Holen wrote:
>>
>> I'm using noticing that `sed -i` does not do any kind of output
>> buffering. While behavior is correct, this causes an excessive number of
>> small writes:
>
>
>> I'm seeing this on Debian with the latest sed commit (c52a676) and libc
>> 2.27-2. The root cause appears to be `ck_mkstemp` using `fdopen`, which
>> unlike `fopen` does not buffer by default.
>
>
> Thank you for reporting this issue, and providing clear way to reproduce
> it. I can confirm seeing the same behavior on Debian 9.4 with glibc 2.24
> (and latest sed).
>
> I think the technical culprit is slightly different:
> It is not due to 'ck_mkstemp/fdopen', but because sed explicitly flushes
> the output after every line, except if the output is STDOUT.
>
> This happens in execute.c:flush_output():
> https://opengrok.housegordon.com/source/xref/sed/sed/execute.c#flush_output
>
> Changing this function enables default buffering with "sed -i"
> (using whatever the buffering default is for glibc).
>
> This change does have a side-effect:
> It also changes the buffering of other write commands such as 'w', 'W'
> and 's///w'.
> It might have other side effects I'm haven't spotted.
> Based on the ChangeLog-2014 file, I see this function was added in 2003.
>
> I wonder if there are good reasons to explicitly flush all output -
> ideas anyone?

Thank you, Assaf. This change looks fine.
I too tried to determine why that code was written that way to no
avail. I have no idea what the motivation might have been.




Information forwarded to bug-sed <at> gnu.org:
bug#32228; Package sed. (Sat, 04 Aug 2018 01:19:02 GMT) Full text and rfc822 format available.

Message #17 received at 32228 <at> debbugs.gnu.org (full text, mbox):

From: Assaf Gordon <assafgordon <at> gmail.com>
To: Jim Meyering <jim <at> meyering.net>
Cc: 32228 <at> debbugs.gnu.org, Vidar Holen <vidar <at> vidarholen.net>
Subject: Re: bug#32228: sed -i does not buffer, causing excessive writes
Date: Fri, 3 Aug 2018 19:18:33 -0600
tags 32228 fixed
close 32228
stop


On 02/08/18 08:40 AM, Jim Meyering wrote:
> On Sat, Jul 21, 2018 at 3:48 AM, Assaf Gordon <assafgordon <at> gmail.com> wrote:
>>
>> On 20/07/18 02:23 PM, Vidar Holen wrote:
>>>
>>> I'm using noticing that `sed -i` does not do any kind of output
>>> buffering.
>>
>> I wonder if there are good reasons to explicitly flush all output -
>> ideas anyone?
> 
> Thank you, Assaf. This change looks fine.
> I too tried to determine why that code was written that way to no
> avail. I have no idea what the motivation might have been.

Thanks for the review.

Pushed here:
https://git.savannah.gnu.org/cgit/sed.git/commit/?id=0144eeeb

regards,
 - assaf




Added tag(s) fixed. Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org. (Sat, 04 Aug 2018 01:19:02 GMT) Full text and rfc822 format available.

bug closed, send any further explanations to 32228 <at> debbugs.gnu.org and Vidar Holen <vidar <at> vidarholen.net> Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org. (Sat, 04 Aug 2018 01:19:03 GMT) Full text and rfc822 format available.

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sat, 01 Sep 2018 11:24:04 GMT) Full text and rfc822 format available.

This bug report was last modified 6 years and 295 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.