GNU bug report logs - #13135
Loss of data while copying

Date: Mon, 10 Dec 2012 15:00:02 UTC

Severity: normal

Tags: notabug

Done: Eric Blake <eblake <at> redhat.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 13135 in the body.
You can then email your comments to 13135 AT debbugs.gnu.org in the normal way.

Report forwarded to bug-coreutils <at> gnu.org:
bug#13135; Package coreutils. (Mon, 10 Dec 2012 15:00:03 GMT)

Acknowledgement sent to xojoc <at> gmx.com:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Mon, 10 Dec 2012 15:00:03 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Cojocaru Alexandru <xojoc <at> gmx.com>
To: bug-coreutils <at> gnu.org
Subject: Loss of data while copying
Date: Mon, 10 Dec 2012 15:58:48 +0100

bash$ yes $(for i in $(seq 1 100000); do echo -n a; done) | dd of=big-lines ibs=100001 count=10000
9924+76 records in
1951183+1 records out
*999005921* bytes (999 MB) copied, 21.6135 s, 46.2 MB/s

bash$ yes $(for i in $(seq 1 100000); do echo -n a; done) | dd of=big-lines ibs=100001 count=10000
9887+113 records in
1950792+1 records out
*998805894* bytes (999 MB) copied, 21.3409 s, 46.8 MB/s

bash$ yes $(for i in $(seq 1 100000); do echo -n a; done) | dd of=big-lines ibs=100001 count=10000
9890+110 records in
1950792+1 records out
*998805919* bytes (999 MB) copied, 21.5801 s, 46.3 MB/s

bash$ yes $(for i in $(seq 1 100000); do echo -n a; done) | dd of=big-lines ibs=100001 count=10000
9884+116 records in
1950011+1 records out
*998405915* bytes (998 MB) copied, 25.2695 s, 39.5 MB/s

WTF?!

Best regards,
Cojocaru Alexandru

Added tag(s) notabug. Request was from Eric Blake <eblake <at> redhat.com> to control <at> debbugs.gnu.org. (Mon, 10 Dec 2012 15:11:02 GMT)

Reply sent to Eric Blake <eblake <at> redhat.com>:
You have taken responsibility. (Mon, 10 Dec 2012 15:11:02 GMT)

Notification sent to xojoc <at> gmx.com:
bug acknowledged by developer. (Mon, 10 Dec 2012 15:11:03 GMT)

Message #12 received at 13135-done <at> debbugs.gnu.org:

From: Eric Blake <eblake <at> redhat.com>
To: xojoc <at> gmx.com
Cc: 13135-done <at> debbugs.gnu.org
Subject: Re: bug#13135: Loss of data while copying
Date: Mon, 10 Dec 2012 08:09:44 -0700

tag 13135 notabug
thanks

On 12/10/2012 07:58 AM, Cojocaru Alexandru wrote:
> bash$ yes $(for i in $(seq 1 100000); do echo -n a; done) | dd of=big-lines ibs=100001 count=10000
> 9924+76 records in

Thanks for the report.  Based on this output, short reads occurred.  dd
transferred exactly 10000 reads as requested, but since some of those
were short, it transferred less than 10000*100001 bytes.  This is
expected (and the behavior is described in that way by POSIX); the
solution you are looking for is to _also_ use the iconv=fullblock
option, to force dd to re-read until it has a full input block rather
than immediately transferring short input reads to output.
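
The arithmetic behind that explanation can be checked from the first run
in the report.  This is only an illustrative sketch: the variable names
are made up for the example, and the figures are copied from the dd
output quoted in the original message.

```shell
# Figures taken from the first run in the report.
ibs=100001          # requested input block size, in bytes
count=10000         # number of read() calls dd was asked to make
actual=999005921    # bytes dd reported as copied
partial=76          # short reads, from "9924+76 records in"

expected=$((ibs * count))          # bytes if every read had been full
shortfall=$((expected - actual))   # bytes missing because of short reads
echo "expected=$expected shortfall=$shortfall"
echo "average deficit per short read: $((shortfall / partial)) bytes"
```

The full and partial record counts add up to 10000 in every run of the
report, so dd did perform the requested number of reads; only the bytes
returned per read varied.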

As such, I'm closing this as not a bug.  Do feel free to add further
comments to this thread, though, if you have more questions about why dd
does this.

Oh, and by the way, 'echo -n' is not portable.  You want to use
printf(1) instead.
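
One portable way to build the 100000-character line without 'echo -n'
is sketched below.  It relies only on printf and tr; the variable name
'line' is an assumption made for the example.

```shell
# printf pads an empty string to 100000 spaces; tr turns each space into 'a'.
line=$(printf '%100000s' '' | tr ' ' 'a')

# Print the length to confirm the line has the intended size.
echo "${#line}"
```

The result can then be passed to yes as "$line", which also avoids
running a 100000-iteration shell loop.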

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

Information forwarded to bug-coreutils <at> gnu.org:
bug#13135; Package coreutils. (Mon, 10 Dec 2012 15:23:01 GMT)

Message #15 received at 13135 <at> debbugs.gnu.org:

From: Pádraig Brady <P <at> draigBrady.com>
To: 13135 <at> debbugs.gnu.org, xojoc <at> gmx.com
Subject: Re: bug#13135: Loss of data while copying
Date: Mon, 10 Dec 2012 15:21:29 +0000

On 12/10/2012 03:09 PM, Eric Blake wrote:
> tag 13135 notabug
> thanks
>
> On 12/10/2012 07:58 AM, Cojocaru Alexandru wrote:
>> bash$ yes $(for i in $(seq 1 100000); do echo -n a; done) | dd of=big-lines ibs=100001 count=10000
>> 9924+76 records in
>
> Thanks for the report.  Based on this output, short reads occurred.  dd
> transferred exactly 10000 reads as requested, but since some of those
> were short, it transferred less than 10000*100001 bytes.  This is
> expected (and the behavior is described in that way by POSIX); the
> solution you are looking for is to _also_ use the iconv=fullblock
> option, to force dd to re-read until it has a full input block rather
> than immediately transferring short input reads to output.
>
> As such, I'm closing this as not a bug.  Do feel free to add further
> comments to this thread, though, if you have more questions about why dd
> does this.
>
> Oh, and by the way, 'echo -n' is not portable.  You want to use
> printf(1) instead.
>

Yes, because a count was specified,
dd will operate in its default awkward but POSIX specified mode
of counting each read() call, even if it returned less than specified.
This is especially noticeable with pipes:

  yes blah | dd of=/dev/null ibs=100001 count=10000

To avoid that you can use iflag=fullblock (not iconv as Eric mentioned above):

  yes blah | dd of=/dev/null ibs=100001 count=10000 iflag=fullblock
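
A small, bounded way to confirm the effect is sketched here.  It assumes
GNU dd (iflag=fullblock is not part of POSIX dd) and uses a count of 100
rather than 10000 so that it finishes quickly; the variable name 'bytes'
is illustrative.

```shell
# Count the bytes dd passes through when every input block must be full.
# 2>/dev/null hides dd's statistics so that only wc's number is captured.
bytes=$(yes blah | dd ibs=100001 count=100 iflag=fullblock 2>/dev/null | wc -c)
echo "with fullblock: $bytes bytes (100 * 100001 = $((100 * 100001)))"
```

Without iflag=fullblock the same pipeline may report fewer bytes, and the
number can differ from run to run.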

cheers,
Pádraig.

Information forwarded to bug-coreutils <at> gnu.org:
bug#13135; Package coreutils. (Mon, 10 Dec 2012 17:27:01 GMT)

Message #18 received at submit <at> debbugs.gnu.org:

From: John Reiser <jreiser <at> bitwagon.com>
To: bug-coreutils <at> gnu.org
Subject: Re: bug#13135: Loss of data while copying
Date: Mon, 10 Dec 2012 09:06:09 -0800

On 12/10/2012 07:21 AM, Pádraig Brady wrote:

>> On 12/10/2012 07:58 AM, Cojocaru Alexandru wrote:
>>> bash$ yes $(for i in $(seq 1 100000); do echo -n a; done) | dd of=big-lines ibs=100001 count=10000
>>> 9924+76 records in

The original poster should know better: the best bug reports include
not only what actually happened, but also the version number of the components,
what the original poster expected to happen, and an explicit identification
of the differences.  For instance:

   (I am running dd from coreutils-8.15.)
   The 'yes' will generate an infinite stream of lines, each containing
   one hundred thousand 'a' characters followed by a terminating newline.
   I expect that "ibs=100001" causes dd to read each entire line (including
   the newline) in one operation, and that "count=10000" causes dd to stop
   after copying ten thousand lines.  Thus the dd summary should say:
      10000 records in
      10000 records out
      1000010000 bytes (1000 MB) copied

> Yes, because a count was specified,
> dd will operate in its default awkward but POSIX specified mode
> of counting each read() call, even if it returned less than specified.
> This is especially noticeable with pipes:

So this bug report is really about the execrable documentation for 'dd'.
Despite similar complaints appearing yearly [or so],
the text of "info dd" does not contain the string "pipe".  SHAME ON COREUTILS.
Explaining the most common error, and how to avoid it, certainly does
belong in the documentation.  The purpose of documentation is to *FACILITATE*
the correct use of the tool, and not merely to erect the minimal legal defense
of the code.

--

Information forwarded to bug-coreutils <at> gnu.org:
bug#13135; Package coreutils. (Mon, 10 Dec 2012 18:04:02 GMT)

Message #21 received at 13135 <at> debbugs.gnu.org:

From: Eric Blake <eblake <at> redhat.com>
To: John Reiser <jreiser <at> bitwagon.com>
Cc: 13135 <at> debbugs.gnu.org
Subject: Re: bug#13135: Loss of data while copying
Date: Mon, 10 Dec 2012 11:03:10 -0700

On 12/10/2012 10:06 AM, John Reiser wrote:
>> Yes, because a count was specified,
>> dd will operate in its default awkward but POSIX specified mode
>> of counting each read() call, even if it returned less than specified.
>> This is especially noticeable with pipes:
> 
> So this bug report is really about the execrable documentation for 'dd'.
> Despite similar complaints appearing yearly [or so],
> the text of "info dd" does not contain the string "pipe".  SHAME ON COREUTILS.
> Explaining the most common error, and how to avoid it, certainly does
> belong in the documentation.  The purpose of documentation is to *FACILITATE*
> the correct use of the tool, and not merely to erect the minimal legal defense
> of the code.

Rather than complaining, how about you submit a patch to improve the
documentation?

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

Information forwarded to bug-coreutils <at> gnu.org:
bug#13135; Package coreutils. (Mon, 10 Dec 2012 20:44:01 GMT)

Message #24 received at submit <at> debbugs.gnu.org:

From: John Reiser <jreiser <at> bitwagon.com>
To: bug-coreutils <at> gnu.org
Cc: 13135 <at> debbugs.gnu.org
Subject: Re: bug#13135: Loss of data while copying
Date: Mon, 10 Dec 2012 12:43:29 -0800

On 12/10/2012 10:03 AM, Eric Blake wrote:
> On 12/10/2012 10:06 AM, John Reiser wrote:
>>> Yes, because a count was specified,
>>> dd will operate in its default awkward but POSIX specified mode
>>> of counting each read() call, even if it returned less than specified.
>>> This is especially noticeable with pipes:
>>
>> So this bug report is really about the execrable documentation for 'dd'.
>> Despite similar complaints appearing yearly [or so],
>> the text of "info dd" does not contain the string "pipe".  SHAME ON COREUTILS.
>> Explaining the most common error, and how to avoid it, certainly does
>> belong in the documentation.  The purpose of documentation is to *FACILITATE*
>> the correct use of the tool, and not merely to erect the minimal legal defense
>> of the code.
> 
> Rather than complaining, how about you submit a patch to improve the
> documentation?
> 

diff --git a/doc/coreutils.texi b/doc/coreutils.texi
index 21400ad..c2282eb 100644
--- a/doc/coreutils.texi
+++ b/doc/coreutils.texi
@@ -8055,6 +8055,7 @@ OS/360 JCL.
 @item if=@var{file}
 @opindex if
 Read from @var{file} instead of standard input.
+(If the input is a pipe then see @samp{fullblock} below.)

 @item of=@var{file}
 @opindex of
@@ -8397,6 +8398,9 @@ may return early if a full block is not available.
 When that happens, continue calling @code{read} to fill the remainder
 of the block.
 This flag can be used only with @code{iflag}.
+If the input is a pipe and argument @samp{count=} also is specified,
+then probably @samp{iflag=fullblock} should be used
+in order to prevent surprises caused by short reads.

 @item count_bytes
 @opindex count_bytes


--

Information forwarded to bug-coreutils <at> gnu.org:
bug#13135; Package coreutils. (Mon, 10 Dec 2012 20:44:02 GMT)

Information forwarded to bug-coreutils <at> gnu.org:
bug#13135; Package coreutils. (Sat, 15 Dec 2012 03:17:01 GMT)

Message #30 received at 13135 <at> debbugs.gnu.org:

From: Pádraig Brady <P <at> draigBrady.com>
To: John Reiser <jreiser <at> bitwagon.com>
Cc: 13135 <at> debbugs.gnu.org
Subject: Re: bug#13135: Loss of data while copying
Date: Sat, 15 Dec 2012 03:15:29 +0000

On 12/10/2012 08:43 PM, John Reiser wrote:
> On 12/10/2012 10:03 AM, Eric Blake wrote:
>> On 12/10/2012 10:06 AM, John Reiser wrote:
>>>> Yes, because a count was specified,
>>>> dd will operate in its default awkward but POSIX specified mode
>>>> of counting each read() call, even if it returned less than specified.
>>>> This is especially noticeable with pipes:
>>>
>>> So this bug report is really about the execrable documentation for 'dd'.
>>> Despite similar complaints appearing yearly [or so],
>>> the text of "info dd" does not contain the string "pipe".  SHAME ON COREUTILS.
>>> Explaining the most common error, and how to avoid it, certainly does
>>> belong in the documentation.  The purpose of documentation is to *FACILITATE*
>>> the correct use of the tool, and not merely to erect the minimal legal defense
>>> of the code.

We've tried really hard to make this issue obvious,
even going to the effort of automatically prompting the user
to use iflag=fullblock.
The full discussion of the awkward auto suggestion logic
can be seen in http://bugs.gnu.org/7362

In more "normal" cases users will get the warning:

$ yes blah | src/dd of=/dev/null bs=100001 count=10000
dd: warning: partial read (53248 bytes); suggest iflag=fullblock

We didn't prompt in this case because it's
a bit of an edge case: ibs is specified
rather than bs. Since writes are aggregated
in that case, and to support use cases like the following,
we don't warn here:

$ (echo part1; sleep 1; echo part2; sleep 1; echo discard) |
  dd count=2 ibs=4096 obs=1 2>/dev/null
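
To make that trade-off concrete, here is a sketch that runs the same
pipeline with and without iflag=fullblock.  It assumes GNU dd; the
function name 'feed' and the variables 'short' and 'full' are invented
for the example, and the result depends on timing, which is why the
sleeps are kept.

```shell
# Hypothetical helper: emits three lines, one second apart.
feed() { echo part1; sleep 1; echo part2; sleep 1; echo discard; }

# Default mode: each short read counts toward count=2,
# so dd should stop once the second line has arrived.
short=$(feed | dd count=2 ibs=4096 obs=1 2>/dev/null)

# With fullblock, dd keeps reading until it has 4096 bytes or reaches
# end of input, so all three lines end up in the first block.
full=$(feed | dd count=2 ibs=4096 obs=1 iflag=fullblock 2>/dev/null)

printf 'default:\n%s\nfullblock:\n%s\n' "$short" "$full"
```

In the default mode the third line should be dropped, which is the
behaviour the use case above relies on; with iflag=fullblock it should
be copied along with the other two.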

>> Rather than complaining, how about you submit a patch to improve the
>> documentation?
>>
>
> diff --git a/doc/coreutils.texi b/doc/coreutils.texi
> index 21400ad..c2282eb 100644
> --- a/doc/coreutils.texi
> +++ b/doc/coreutils.texi
> @@ -8055,6 +8055,7 @@ OS/360 JCL.
>   @item if=@var{file}
>   @opindex if
>   Read from @var{file} instead of standard input.
> +(If the input is a pipe then see @samp{fullblock} below.)

I think I'll move the warning to count=
as it's mostly an issue when that is specified.

>
>   @item of=@var{file}
>   @opindex of
> @@ -8397,6 +8398,9 @@ may return early if a full block is not available.
>   When that happens, continue calling @code{read} to fill the remainder
>   of the block.
>   This flag can be used only with @code{iflag}.
> +If the input is a pipe and argument @samp{count=} also is specified,
> +then probably @samp{iflag=fullblock} should be used
> +in order to prevent surprises caused by short reads.

How about this instead?

diff --git a/doc/coreutils.texi b/doc/coreutils.texi
index 5f8fad7..b916a86 100644
--- a/doc/coreutils.texi
+++ b/doc/coreutils.texi
@@ -8117,6 +8117,11 @@ Copy @var{n} @samp{ibs}-byte blocks from the input file,
 of everything until the end of the file.
 if @samp{iflag=count_bytes} is specified, @var{n} is interpreted
 as a byte count rather than a block count.
+Note if the input may return short reads as could be the case
+when reading from a pipe for example, @samp{iflag=fullblock}
+will ensure that @samp{count=} corresponds to complete input blocks
+rather than the traditional POSIX specified behavior of counting
+input read operations.

 @item status=@var{which}
 @opindex status
@@ -8397,6 +8402,10 @@ may return early if a full block is not available.
 When that happens, continue calling @code{read} to fill the remainder
 of the block.
 This flag can be used only with @code{iflag}.
+This flag is useful with pipes for example
+as they may return short reads. I that case,
+this flag is needed to ensure that a @samp{count=} argument is
+interpreted as a block count rather than a count of read operations.

 @item count_bytes
 @opindex count_bytes

Information forwarded to bug-coreutils <at> gnu.org:
bug#13135; Package coreutils. (Sat, 15 Dec 2012 04:18:02 GMT)

Message #33 received at 13135 <at> debbugs.gnu.org:

From: Jim Meyering <jim <at> meyering.net>
To: Pádraig Brady <P <at> draigBrady.com>
Cc: John Reiser <jreiser <at> bitwagon.com>, 13135 <at> debbugs.gnu.org
Subject: Re: bug#13135: Loss of data while copying
Date: Sat, 15 Dec 2012 05:16:09 +0100

Pádraig Brady wrote:
...
> I think I'll move the warning to count=
> as it's mostly an issue when that is specified.

Good idea.

>>   @item of=@var{file}
>>   @opindex of
>> @@ -8397,6 +8398,9 @@ may return early if a full block is not available.
>>   When that happens, continue calling @code{read} to fill the remainder
>>   of the block.
>>   This flag can be used only with @code{iflag}.
>> +If the input is a pipe and argument @samp{count=} also is specified,
>> +then probably @samp{iflag=fullblock} should be used
>> +in order to prevent surprises caused by short reads.
>
> How about this instead?

Looks good.  Thanks.

> diff --git a/doc/coreutils.texi b/doc/coreutils.texi
...
> @@ -8397,6 +8402,10 @@ may return early if a full block is not available.
>  When that happens, continue calling @code{read} to fill the remainder
>  of the block.
>  This flag can be used only with @code{iflag}.
> +This flag is useful with pipes for example
> +as they may return short reads. I that case,

s/I/In/

...

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sat, 12 Jan 2013 12:24:04 GMT)

This bug report was last modified 12 years and 217 days ago.

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.