GNU bug report logs - #13135
Loss of data while copying
Reported by: xojoc <at> gmx.com
Date: Mon, 10 Dec 2012 15:00:02 UTC
Severity: normal
Tags: notabug
Done: Eric Blake <eblake <at> redhat.com>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 13135 in the body.
You can then email your comments to 13135 AT debbugs.gnu.org in the normal way.
Report forwarded to bug-coreutils <at> gnu.org: bug#13135; Package coreutils. (Mon, 10 Dec 2012 15:00:03 GMT)
Full text and rfc822 format available.
Acknowledgement sent to xojoc <at> gmx.com: New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Mon, 10 Dec 2012 15:00:03 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
bash$ yes $(for i in $(seq 1 100000); do echo -n a; done) | dd of=big-lines ibs=100001 count=10000
9924+76 records in
1951183+1 records out
*999005921* bytes (999 MB) copied, 21.6135 s, 46.2 MB/s
bash$ yes $(for i in $(seq 1 100000); do echo -n a; done) | dd of=big-lines ibs=100001 count=10000
9887+113 records in
1950792+1 records out
*998805894* bytes (999 MB) copied, 21.3409 s, 46.8 MB/s
bash$ yes $(for i in $(seq 1 100000); do echo -n a; done) | dd of=big-lines ibs=100001 count=10000
9890+110 records in
1950792+1 records out
*998805919* bytes (999 MB) copied, 21.5801 s, 46.3 MB/s
bash$ yes $(for i in $(seq 1 100000); do echo -n a; done) | dd of=big-lines ibs=100001 count=10000
9884+116 records in
1950011+1 records out
*998405915* bytes (998 MB) copied, 25.2695 s, 39.5 MB/s
WTF?!
Best regards,
Cojocaru Alexandru
Added tag(s) notabug. Request was from Eric Blake <eblake <at> redhat.com> to control <at> debbugs.gnu.org. (Mon, 10 Dec 2012 15:11:02 GMT)
Reply sent to Eric Blake <eblake <at> redhat.com>: You have taken responsibility. (Mon, 10 Dec 2012 15:11:02 GMT)
Notification sent to xojoc <at> gmx.com: bug acknowledged by developer. (Mon, 10 Dec 2012 15:11:03 GMT)
Message #12 received at 13135-done <at> debbugs.gnu.org (full text, mbox):
tag 13135 notabug
thanks
On 12/10/2012 07:58 AM, Cojocaru Alexandru wrote:
> bash$ yes $(for i in $(seq 1 100000); do echo -n a; done) | dd of=big-lines ibs=100001 count=10000
> 9924+76 records in
Thanks for the report. Based on this output, short reads occurred. dd
transferred exactly 10000 reads as requested, but since some of those
were short, it transferred less than 10000*100001 bytes. This is
expected (and the behavior is described in that way by POSIX); the
solution you are looking for is to _also_ use the iconv=fullblock
option, to force dd to re-read until it has a full input block rather
than immediately transferring short input reads to output.
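A minimal sketch of the fix, scaled down so it runs in well under a second: each line here is 1000 'a' characters plus a newline (1001 bytes) instead of the report's 100001 bytes, and the output file name small-lines is only illustrative. Note that in GNU dd the option is spelled iflag=fullblock (a GNU extension); there is no iconv= operand. With it, dd keeps calling read() until each input block is full, so count=100 copies exactly 100 * 1001 bytes.

```shell
#!/bin/sh
# Build one 1000-character line of 'a' (printf instead of the non-portable echo -n).
line=$(for i in $(seq 1 1000); do printf a; done)

# Copy 100 input blocks of 1001 bytes each. iflag=fullblock makes dd
# re-read until each input block is complete, so short pipe reads are
# not counted as whole records.
yes "$line" | dd of=small-lines ibs=1001 count=100 iflag=fullblock 2>/dev/null

# Expect exactly 100 * 1001 = 100100 bytes (tr strips any padding wc adds).
size=$(wc -c < small-lines | tr -d ' ')
echo "$size"
rm -f small-lines
```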
As such, I'm closing this as not a bug. Do feel free to add further
comments to this thread, though, if you have more questions about why dd
does this.
Oh, and by the way, 'echo -n' is not portable. You want to use
printf(1) instead.
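For example, the 100000-character argument from the report can be built with printf in place of echo -n. This is a sketch; the variable name line is illustrative, and seq (kept from the original command) is a common utility but not part of POSIX.

```shell
#!/bin/sh
# printf with a plain format string and no arguments prints it verbatim,
# without a trailing newline, which is what 'echo -n a' was meant to do.
line=$(for i in $(seq 1 100000); do printf a; done)

# Length of the generated string.
printf '%s\n' "${#line}"   # 100000
```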
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
Information forwarded to bug-coreutils <at> gnu.org: bug#13135; Package coreutils. (Mon, 10 Dec 2012 15:23:01 GMT)
Message #15 received at 13135 <at> debbugs.gnu.org (full text, mbox):
On 12/10/2012 03:09 PM, Eric Blake wrote:
> tag 13135 notabug
> thanks
>
> On 12/10/2012 07:58 AM, Cojocaru Alexandru wrote:
>> bash$ yes $(for i in $(seq 1 100000); do echo -n a; done) | dd of=big-lines ibs=100001 count=10000
>> 9924+76 records in
>
> Thanks for the report. Based on this output, short reads occurred. dd
> transferred exactly 10000 reads as requested, but since some of those
> were short, it transferred less than 10000*100001 bytes. This is
> expected (and the behavior is described in that way by POSIX); the
> solution you are looking for is to _also_ use the iconv=fullblock
> option, to force dd to re-read until it has a full input block rather
> than immediately transferring short input reads to output.
>
> As such, I'm closing this as not a bug. Do feel free to add further
> comments to this thread, though, if you have more questions about why dd
> does this.
>
> Oh, and by the way, 'echo -n' is not portable. You want to use
> printf(1) instead.
>
Yes, because a count was specified,
dd will operate in its default awkward but POSIX specified mode
of counting each read() call, even if it returned less than specified.
This is especially noticeable with pipes:
yes blah | dd of=/dev/null ibs=100001 count=10000
To avoid that you can use iflag=fullblock (not iconv as Eric mentioned above):
yes blah | dd of=/dev/null ibs=100001 count=10000 iflag=fullblock
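A small way to see the difference directly (a sketch assuming GNU dd; the one-second sleep is only there to make the pipe deliver the data in two separate pieces): the writer emits 3 bytes, pauses, then emits 3 more. Without iflag=fullblock, count=1 is used up by the single short read, so only 3 bytes are copied. With iflag=fullblock, dd keeps reading until the 6-byte block is full.

```shell
#!/bin/sh
# Without fullblock: the first read() returns only "abc", and that one
# partial read satisfies count=1.
short=$( (printf abc; sleep 1; printf def) | dd bs=6 count=1 2>/dev/null )

# With fullblock: dd re-reads until the 6-byte block is complete.
full=$( (printf abc; sleep 1; printf def) | dd bs=6 count=1 iflag=fullblock 2>/dev/null )

printf 'without fullblock: %s\n' "$short"   # abc
printf 'with fullblock:    %s\n' "$full"    # abcdef
```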
cheers,
Pádraig.
Information forwarded to bug-coreutils <at> gnu.org: bug#13135; Package coreutils. (Mon, 10 Dec 2012 17:27:01 GMT)
Message #18 received at submit <at> debbugs.gnu.org (full text, mbox):
On 12/10/2012 07:21 AM, Pádraig Brady wrote:
>> On 12/10/2012 07:58 AM, Cojocaru Alexandru wrote:
>>> bash$ yes $(for i in $(seq 1 100000); do echo -n a; done) | dd of=big-lines ibs=100001 count=10000
>>> 9924+76 records in
The original poster should know better: the best bug reports include
not only what actually happened, but also the version number of the components,
what the original poster expected to happen, and an explicit identification
of the differences. For instance:
(I am running dd from coreutils-8.15.)
The 'yes' will generate an infinite stream of lines, each containing
one hundred thousand 'a' characters followed by a terminating newline.
I expect that "ibs=100001" causes dd to read each entire line (including
the newline) in one operation, and that "count=10000" causes dd to stop
after copying ten thousand lines. Thus the dd summary should say:
10000 records in
10000 records out
1000010000 bytes (1000 MB) copied
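The expected total, and how far the first run of the report fell short, follow from simple arithmetic. A sketch using POSIX shell arithmetic; the variable names are illustrative, and the figures are taken from the first run shown above (9924 full records, 76 partial ones, 999005921 bytes).

```shell
#!/bin/sh
ibs=100001         # bytes per input block: 100000 'a' characters plus newline
count=10000        # blocks requested
expected=$((ibs * count))
actual=999005921   # bytes copied in the first run of the report
full=9924          # full input records in that run ("9924+76 records in")

echo "expected:  $expected"                 # 1000010000
echo "shortfall: $((expected - actual))"    # 1004079
# Bytes delivered by the 76 short reads together.
echo "partial:   $((actual - full * ibs))"  # 6595997
```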
> Yes, because a count was specified,
> dd will operate in its default awkward but POSIX specified mode
> of counting each read() call, even if it returned less than specified.
> This is especially noticeable with pipes:
So this bug report is really about the execrable documentation for 'dd'.
Despite similar complaints appearing yearly [or so],
the text of "info dd" does not contain the string "pipe". SHAME ON COREUTILS.
Explaining the most common error, and how to avoid it, certainly does
belong in the documentation. The purpose of documentation is to *FACILITATE*
the correct use of the tool, and not merely to erect the minimal legal defense
of the code.
--
Information forwarded to bug-coreutils <at> gnu.org: bug#13135; Package coreutils. (Mon, 10 Dec 2012 18:04:02 GMT)
Message #21 received at 13135 <at> debbugs.gnu.org (full text, mbox):
On 12/10/2012 10:06 AM, John Reiser wrote:
>> Yes, because a count was specified,
>> dd will operate in its default awkward but POSIX specified mode
>> of counting each read() call, even if it returned less than specified.
>> This is especially noticeable with pipes:
>
> So this bug report is really about the execrable documentation for 'dd'.
> Despite similar complaints appearing yearly [or so],
> the text of "info dd" does not contain the string "pipe". SHAME ON COREUTILS.
> Explaining the most common error, and how to avoid it, certainly does
> belong in the documentation. The purpose of documentation is to *FACILITATE*
> the correct use of the tool, and not merely to erect the minimal legal defense
> of the code.
Rather than complaining, how about you submit a patch to improve the
documentation?
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
Information forwarded to bug-coreutils <at> gnu.org: bug#13135; Package coreutils. (Mon, 10 Dec 2012 20:44:01 GMT)
Message #24 received at submit <at> debbugs.gnu.org (full text, mbox):
On 12/10/2012 10:03 AM, Eric Blake wrote:
> On 12/10/2012 10:06 AM, John Reiser wrote:
>>> Yes, because a count was specified,
>>> dd will operate in its default awkward but POSIX specified mode
>>> of counting each read() call, even if it returned less than specified.
>>> This is especially noticeable with pipes:
>>
>> So this bug report is really about the execrable documentation for 'dd'.
>> Despite similar complaints appearing yearly [or so],
>> the text of "info dd" does not contain the string "pipe". SHAME ON COREUTILS.
>> Explaining the most common error, and how to avoid it, certainly does
>> belong in the documentation. The purpose of documentation is to *FACILITATE*
>> the correct use of the tool, and not merely to erect the minimal legal defense
>> of the code.
>
> Rather than complaining, how about you submit a patch to improve the
> documentation?
>
diff --git a/doc/coreutils.texi b/doc/coreutils.texi
index 21400ad..c2282eb 100644
--- a/doc/coreutils.texi
+++ b/doc/coreutils.texi
@@ -8055,6 +8055,7 @@ OS/360 JCL.
@item if=@var{file}
@opindex if
Read from @var{file} instead of standard input.
+(If the input is a pipe then see @samp{fullblock} below.)
@item of=@var{file}
@opindex of
@@ -8397,6 +8398,9 @@ may return early if a full block is not available.
When that happens, continue calling @code{read} to fill the remainder
of the block.
This flag can be used only with @code{iflag}.
+If the input is a pipe and argument @samp{count=} also is specified,
+then probably @samp{iflag=fullblock} should be used
+in order to prevent surprises caused by short reads.
@item count_bytes
@opindex count_bytes
--
Information forwarded to bug-coreutils <at> gnu.org: bug#13135; Package coreutils. (Mon, 10 Dec 2012 20:44:02 GMT)
Information forwarded to bug-coreutils <at> gnu.org: bug#13135; Package coreutils. (Sat, 15 Dec 2012 03:17:01 GMT)
Message #30 received at 13135 <at> debbugs.gnu.org (full text, mbox):
On 12/10/2012 08:43 PM, John Reiser wrote:
> On 12/10/2012 10:03 AM, Eric Blake wrote:
>> On 12/10/2012 10:06 AM, John Reiser wrote:
>>>> Yes, because a count was specified,
>>>> dd will operate in its default awkward but POSIX specified mode
>>>> of counting each read() call, even if it returned less than specified.
>>>> This is especially noticeable with pipes:
>>>
>>> So this bug report is really about the execrable documentation for 'dd'.
>>> Despite similar complaints appearing yearly [or so],
>>> the text of "info dd" does not contain the string "pipe". SHAME ON COREUTILS.
>>> Explaining the most common error, and how to avoid it, certainly does
>>> belong in the documentation. The purpose of documentation is to *FACILITATE*
>>> the correct use of the tool, and not merely to erect the minimal legal defense
>>> of the code.
We've tried really hard to make this issue obvious,
even going to the effort of automatically prompting the user
to use iflag=fullblock.
The full discussion of the awkward auto suggestion logic
can be seen in http://bugs.gnu.org/7362
In more "normal" cases users will get the warning:
$ yes blah | src/dd of=/dev/null bs=100001 count=10000
dd: warning: partial read (53248 bytes); suggest iflag=fullblock
We didn't prompt in this case because it's an edge case:
ibs is specified rather than bs, so writes are aggregated.
Because of that, and to support use cases like the following,
we don't warn here:
$ (echo part1; sleep 1; echo part2; sleep 1; echo discard) |
dd count=2 ibs=4096 obs=1 2>/dev/null
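What that pipeline produces can be checked directly. A sketch relying on the default behavior of counting read() calls: each echo arrives as a separate short read (the sleeps keep them apart), so count=2 takes "part1" and "part2" and never reads "discard". With obs=1, each byte is written as soon as it has been read.

```shell
#!/bin/sh
# Each echo reaches dd as a separate short read; count=2 stops after
# two read() calls, so "discard" is never consumed.
out=$( (echo part1; sleep 1; echo part2; sleep 1; echo discard) |
       dd count=2 ibs=4096 obs=1 2>/dev/null )
printf '%s\n' "$out"   # part1, then part2 on the next line
```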
>> Rather than complaining, how about you submit a patch to improve the
>> documentation?
>>
>
> diff --git a/doc/coreutils.texi b/doc/coreutils.texi
> index 21400ad..c2282eb 100644
> --- a/doc/coreutils.texi
> +++ b/doc/coreutils.texi
> @@ -8055,6 +8055,7 @@ OS/360 JCL.
> @item if=@var{file}
> @opindex if
> Read from @var{file} instead of standard input.
> +(If the input is a pipe then see @samp{fullblock} below.)
I think I'll move the warning to count=
as it's mostly an issue when that is specified.
>
> @item of=@var{file}
> @opindex of
> @@ -8397,6 +8398,9 @@ may return early if a full block is not available.
> When that happens, continue calling @code{read} to fill the remainder
> of the block.
> This flag can be used only with @code{iflag}.
> +If the input is a pipe and argument @samp{count=} also is specified,
> +then probably @samp{iflag=fullblock} should be used
> +in order to prevent surprises caused by short reads.
How about this instead?
diff --git a/doc/coreutils.texi b/doc/coreutils.texi
index 5f8fad7..b916a86 100644
--- a/doc/coreutils.texi
+++ b/doc/coreutils.texi
@@ -8117,6 +8117,11 @@ Copy @var{n} @samp{ibs}-byte blocks from the input file,
of everything until the end of the file.
if @samp{iflag=count_bytes} is specified, @var{n} is interpreted
as a byte count rather than a block count.
+Note if the input may return short reads as could be the case
+when reading from a pipe for example, @samp{iflag=fullblock}
+will ensure that @samp{count=} corresponds to complete input blocks
+rather than the traditional POSIX specified behavior of counting
+input read operations.
@item status=@var{which}
@opindex status
@@ -8397,6 +8402,10 @@ may return early if a full block is not available.
When that happens, continue calling @code{read} to fill the remainder
of the block.
This flag can be used only with @code{iflag}.
+This flag is useful with pipes for example
+as they may return short reads. I that case,
+this flag is needed to ensure that a @samp{count=} argument is
+interpreted as a block count rather than a count of read operations.
@item count_bytes
@opindex count_bytes
Information forwarded to bug-coreutils <at> gnu.org: bug#13135; Package coreutils. (Sat, 15 Dec 2012 04:18:02 GMT)
Message #33 received at 13135 <at> debbugs.gnu.org (full text, mbox):
Pádraig Brady wrote:
...
> I think I'll move the warning to count=
> as it's mostly an issue when that is specified.
Good idea.
>> @item of=@var{file}
>> @opindex of
>> @@ -8397,6 +8398,9 @@ may return early if a full block is not available.
>> When that happens, continue calling @code{read} to fill the remainder
>> of the block.
>> This flag can be used only with @code{iflag}.
>> +If the input is a pipe and argument @samp{count=} also is specified,
>> +then probably @samp{iflag=fullblock} should be used
>> +in order to prevent surprises caused by short reads.
>
> How about this instead?
Looks good. Thanks.
> diff --git a/doc/coreutils.texi b/doc/coreutils.texi
...
> @@ -8397,6 +8402,10 @@ may return early if a full block is not available.
> When that happens, continue calling @code{read} to fill the remainder
> of the block.
> This flag can be used only with @code{iflag}.
> +This flag is useful with pipes for example
> +as they may return short reads. I that case,
s/I/In/
...
bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sat, 12 Jan 2013 12:24:04 GMT)
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.