GNU bug report logs -
#32228
sed -i does not buffer, causing excessive writes
Previous Next
Reported by: Vidar Holen <vidar <at> vidarholen.net>
Date: Fri, 20 Jul 2018 20:25:02 UTC
Severity: normal
Tags: fixed
Done: Assaf Gordon <assafgordon <at> gmail.com>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 32228 in the body.
You can then email your comments to 32228 AT debbugs.gnu.org in the normal way.
Toggle the display of automated, internal messages from the tracker.
Report forwarded
to
bug-sed <at> gnu.org
:
bug#32228
; Package
sed
.
(Fri, 20 Jul 2018 20:25:02 GMT)
Full text and
rfc822 format available.
Acknowledgement sent
to
Vidar Holen <vidar <at> vidarholen.net>
:
New bug report received and forwarded. Copy sent to
bug-sed <at> gnu.org
.
(Fri, 20 Jul 2018 20:25:02 GMT)
Full text and
rfc822 format available.
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
Hi,
I'm using noticing that `sed -i` does not do any kind of output
buffering. While behavior is correct, this causes an excessive number of
small writes:
$ strace -e write ./sed -i 's/foo/bar/g' file.txt
write(4, " GNU GENERAL "..., 47) = 47
write(4, " Version 3"..., 47) = 47
write(4, "\n", 1) = 1
Compare this to redirection:
$ strace -e write ./sed 's/foo/bar/g' file.txt > tmpfile
write(1, " GNU GENERAL "..., 4096) = 4096
write(1, "om or adapt all or part of the w"..., 4096) = 4096
write(1, ".\n\n You may make, run and propa"..., 4096) = 4096
On my system, this makes `sed -i` take 7x longer than redirection+mv:
3.7s vs 0.5s for 100MB, for 675 vs 10 writes on the same input.
I'm seeing this on Debian with the latest sed commit (c52a676) and libc
2.27-2. The root cause appears to be `ck_mkstemp` using `fdopen`, which
unlike `fopen` does not buffer by default.
Enabling buffering should be simple and effective, resulting in a nice
speedup.
Regards,
Vidar Holen
Information forwarded
to
bug-sed <at> gnu.org
:
bug#32228
; Package
sed
.
(Sat, 21 Jul 2018 01:49:02 GMT)
Full text and
rfc822 format available.
Message #8 received at 32228 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Hello,
On 20/07/18 02:23 PM, Vidar Holen wrote:
> I'm using noticing that `sed -i` does not do any kind of output
> buffering. While behavior is correct, this causes an excessive number of
> small writes:
> I'm seeing this on Debian with the latest sed commit (c52a676) and libc
> 2.27-2. The root cause appears to be `ck_mkstemp` using `fdopen`, which
> unlike `fopen` does not buffer by default.
Thank you for reporting this issue, and providing clear way to reproduce
it. I can confirm seeing the same behavior on Debian 9.4 with glibc 2.24
(and latest sed).
I think the technical culprit is slightly different:
It is not due to 'ck_mkstemp/fdopen', but because sed explicitly flushes
the output after every line, except if the output is STDOUT.
This happens in execute.c:flush_output():
https://opengrok.housegordon.com/source/xref/sed/sed/execute.c#flush_output
Changing this function enables default buffering with "sed -i"
(using whatever the buffering default is for glibc).
This change does have a side-effect:
It also changes the buffering of other write commands such as 'w', 'W'
and 's///w'.
It might have other side effects I'm haven't spotted.
Based on the ChangeLog-2014 file, I see this function was added in 2003.
I wonder if there are good reasons to explicitly flush all output -
ideas anyone?
If you can give this a quick test and let me know if you also notice
improvements on your system - would be appreciated.
Comments and suggestions welcomed,
- assaf
[0001-sed-do-not-flush-output-stream-unless-in-unbuffered-.patch (text/x-patch, attachment)]
Information forwarded
to
bug-sed <at> gnu.org
:
bug#32228
; Package
sed
.
(Sat, 21 Jul 2018 02:28:02 GMT)
Full text and
rfc822 format available.
Message #11 received at 32228 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Thanks for looking into it. I can confirm that this patch significantly
improves the performance of `sed -i` for me.
On Fri, Jul 20, 2018 at 6:48 PM, Assaf Gordon <assafgordon <at> gmail.com> wrote:
> Hello,
>
> On 20/07/18 02:23 PM, Vidar Holen wrote:
>
>> I'm using noticing that `sed -i` does not do any kind of output
>> buffering. While behavior is correct, this causes an excessive number of
>> small writes:
>>
>
> I'm seeing this on Debian with the latest sed commit (c52a676) and libc
>> 2.27-2. The root cause appears to be `ck_mkstemp` using `fdopen`, which
>> unlike `fopen` does not buffer by default.
>>
>
> Thank you for reporting this issue, and providing clear way to reproduce
> it. I can confirm seeing the same behavior on Debian 9.4 with glibc 2.24
> (and latest sed).
>
> I think the technical culprit is slightly different:
> It is not due to 'ck_mkstemp/fdopen', but because sed explicitly flushes
> the output after every line, except if the output is STDOUT.
>
> This happens in execute.c:flush_output():
> https://opengrok.housegordon.com/source/xref/sed/sed/execute
> .c#flush_output
>
> Changing this function enables default buffering with "sed -i"
> (using whatever the buffering default is for glibc).
>
> This change does have a side-effect:
> It also changes the buffering of other write commands such as 'w', 'W'
> and 's///w'.
> It might have other side effects I'm haven't spotted.
> Based on the ChangeLog-2014 file, I see this function was added in 2003.
>
> I wonder if there are good reasons to explicitly flush all output -
> ideas anyone?
>
> If you can give this a quick test and let me know if you also notice
> improvements on your system - would be appreciated.
>
>
> Comments and suggestions welcomed,
> - assaf
>
>
>
>
>
[Message part 2 (text/html, inline)]
Information forwarded
to
bug-sed <at> gnu.org
:
bug#32228
; Package
sed
.
(Thu, 02 Aug 2018 14:42:02 GMT)
Full text and
rfc822 format available.
Message #14 received at 32228 <at> debbugs.gnu.org (full text, mbox):
On Sat, Jul 21, 2018 at 3:48 AM, Assaf Gordon <assafgordon <at> gmail.com> wrote:
> Hello,
>
> On 20/07/18 02:23 PM, Vidar Holen wrote:
>>
>> I'm using noticing that `sed -i` does not do any kind of output
>> buffering. While behavior is correct, this causes an excessive number of
>> small writes:
>
>
>> I'm seeing this on Debian with the latest sed commit (c52a676) and libc
>> 2.27-2. The root cause appears to be `ck_mkstemp` using `fdopen`, which
>> unlike `fopen` does not buffer by default.
>
>
> Thank you for reporting this issue, and providing clear way to reproduce
> it. I can confirm seeing the same behavior on Debian 9.4 with glibc 2.24
> (and latest sed).
>
> I think the technical culprit is slightly different:
> It is not due to 'ck_mkstemp/fdopen', but because sed explicitly flushes
> the output after every line, except if the output is STDOUT.
>
> This happens in execute.c:flush_output():
> https://opengrok.housegordon.com/source/xref/sed/sed/execute.c#flush_output
>
> Changing this function enables default buffering with "sed -i"
> (using whatever the buffering default is for glibc).
>
> This change does have a side-effect:
> It also changes the buffering of other write commands such as 'w', 'W'
> and 's///w'.
> It might have other side effects I'm haven't spotted.
> Based on the ChangeLog-2014 file, I see this function was added in 2003.
>
> I wonder if there are good reasons to explicitly flush all output -
> ideas anyone?
Thank you, Assaf. This change looks fine.
I too tried to determine why that code was written that way to no
avail. I have no idea what the motivation might have been.
Information forwarded
to
bug-sed <at> gnu.org
:
bug#32228
; Package
sed
.
(Sat, 04 Aug 2018 01:19:02 GMT)
Full text and
rfc822 format available.
Message #17 received at 32228 <at> debbugs.gnu.org (full text, mbox):
tags 32228 fixed
close 32228
stop
On 02/08/18 08:40 AM, Jim Meyering wrote:
> On Sat, Jul 21, 2018 at 3:48 AM, Assaf Gordon <assafgordon <at> gmail.com> wrote:
>>
>> On 20/07/18 02:23 PM, Vidar Holen wrote:
>>>
>>> I'm using noticing that `sed -i` does not do any kind of output
>>> buffering.
>>
>> I wonder if there are good reasons to explicitly flush all output -
>> ideas anyone?
>
> Thank you, Assaf. This change looks fine.
> I too tried to determine why that code was written that way to no
> avail. I have no idea what the motivation might have been.
Thanks for the review.
Pushed here:
https://git.savannah.gnu.org/cgit/sed.git/commit/?id=0144eeeb
regards,
- assaf
Added tag(s) fixed.
Request was from
Assaf Gordon <assafgordon <at> gmail.com>
to
control <at> debbugs.gnu.org
.
(Sat, 04 Aug 2018 01:19:02 GMT)
Full text and
rfc822 format available.
bug closed, send any further explanations to
32228 <at> debbugs.gnu.org and Vidar Holen <vidar <at> vidarholen.net>
Request was from
Assaf Gordon <assafgordon <at> gmail.com>
to
control <at> debbugs.gnu.org
.
(Sat, 04 Aug 2018 01:19:03 GMT)
Full text and
rfc822 format available.
bug archived.
Request was from
Debbugs Internal Request <help-debbugs <at> gnu.org>
to
internal_control <at> debbugs.gnu.org
.
(Sat, 01 Sep 2018 11:24:04 GMT)
Full text and
rfc822 format available.
This bug report was last modified 6 years and 295 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.