GNU bug report logs - #8490
dd reads random number of records from pipes - named or otherwise - coreutils 8.9

Reported by: Jesse Gordon <jesseg <at> nikola.com>

Date: Wed, 13 Apr 2011 02:46:02 UTC

Severity: normal

Done: Eric Blake <eblake <at> redhat.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 8490 in the body.
You can then email your comments to 8490 AT debbugs.gnu.org in the normal way.

Report forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#8490; Package coreutils. (Wed, 13 Apr 2011 02:46:02 GMT)

Acknowledgement sent to Jesse Gordon <jesseg <at> nikola.com>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Wed, 13 Apr 2011 02:46:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Jesse Gordon <jesseg <at> nikola.com>
To: bug-coreutils <at> gnu.org
Subject: dd reads random number of records from pipes - named or otherwise
	- coreutils 8.9
Date: Tue, 12 Apr 2011 14:02:12 -0700

I can't believe such an obvious bug would exist this long, but on 
the other hand the test is so simple I can't see where it's user 
error.

dd, when reading from stdin or from a named pipe, sometimes (but
not always) reads a random number of records, a bit less than
it should.

I tried it like /dev/zero|dd, like yes|dd, cat somefile|dd, and
even mkfifo pipe; yes > pipe & dd if=pipe -- and all sometimes
failed.
However, if=arealfile seems to always work perfectly.

To replicate:

yes|dd bs=1000 count=1000|wc -c
cat /dev/zero |dd bs=1000 count=1000 |wc -c

If it works perfectly the first time, just keep trying. For me, 
it randomly works and doesn't work. Some conditions are more 
likely to work, and others to fail.
For me, using the "yes" method above almost always reads the 
incorrect number of bytes, while  the /dev/zero method usually 
works correctly but occasionally reads the wrong number of bytes.

The problem exists on the following combinations:

Slackware 12.0.0 (2.6.21.5-smp) / coreutils 8.9
Slackware 12.0.0 (2.6.21.5-smp) / coreutils 6.9
Slackware 13.0.0 (2.6.29.6) / coreutils 7.9
Kubuntu lenny/sid (2.6.24-27-generic SMP) / coreutils 6.10

(I only mention the older versions of coreutils because it may be 
helpful. I'm only filing the bug report for 8.9!)

Thanks,

Jesse

For example:

root <at> stats:~# dd --version
dd (coreutils) 8.9
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
<http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Paul Rubin, David MacKenzie, and Stuart Kemp.
root <at> stats:~# yes|dd bs=1000 count=1000|wc -c
597+403 records in
597+403 records out
608232 bytes (608 kB) copied, 0.0263828 s, 23.1 MB/s
608232
root <at> stats:~# yes|dd bs=1000 count=1000|wc -c
885+115 records in
885+115 records out
887784 bytes (888 kB) copied, 0.054488 s, 16.3 MB/s
887784
root <at> stats:~# yes|dd bs=1000 count=1000|wc -c
696+304 records in
696+304 records out
705536 bytes (706 kB) copied, 0.0271789 s, 26.0 MB/s
705536
root <at> stats:~# cat /dev/zero |dd bs=1000 count=1000 |wc -c
1000000
1000+0 records in
1000+0 records out
1000000 bytes (1.0 MB) copied, 0.00879434 s, 114 MB/s
root <at> stats:~# cat /dev/zero |dd bs=1000 count=1000 |wc -c
972+28 records in
972+28 records out
993040 bytes (993 kB) copied, 0.00582009 s, 171 MB/s
993040
root <at> stats:~# cat /dev/zero |dd bs=1000 count=1000 |wc -c
983+17 records in
983+17 records out
996040 bytes (996 kB) copied, 0.0102457 s, 97.2 MB/s
996040
root <at> stats:~# cat /dev/zero |dd bs=1000 count=1000 |wc -c
1000000
1000+0 records in
1000+0 records out
1000000 bytes (1.0 MB) copied, 0.0181759 s, 55.0 MB/s
root <at> stats:~# cat /dev/zero |dd bs=1000 count=1000 |wc -c
1000+0 records in
1000+0 records out
1000000 bytes (1.0 MB) copied, 0.010386 s, 96.3 MB/s
1000000
root <at> stats:~#

Reply sent to Eric Blake <eblake <at> redhat.com>:
You have taken responsibility. (Wed, 13 Apr 2011 14:08:02 GMT)

Notification sent to Jesse Gordon <jesseg <at> nikola.com>:
bug acknowledged by developer. (Wed, 13 Apr 2011 14:08:02 GMT)

Message #10 received at 8490-done <at> debbugs.gnu.org:

From: Eric Blake <eblake <at> redhat.com>
To: Jesse Gordon <jesseg <at> nikola.com>
Cc: 8490-done <at> debbugs.gnu.org
Subject: Re: bug#8490: dd reads random number of records from pipes - named
	or	otherwise - coreutils 8.9
Date: Wed, 13 Apr 2011 08:07:39 -0600

On 04/12/2011 03:02 PM, Jesse Gordon wrote:
> I can't believe such an obvious bug would exist this long, but on the
> other hand the test is so simple I can't see where it's user error.

Thanks for the report.  And you are correct in surmising that it is user
error and not a bug in dd.

> dd, when reading from stdin or from a named pipe sometimes (but not
> always) reads a random number of records a bit less than what it should.

Rather, dd reads as many bytes as possible, but unless that is less than
PIPE_BUF, it is not guaranteed to be an atomic read.  In turn, if you
have asked dd to pad out partial reads into complete writes, then that
explains your problem.  Unfortunately, it is rather easy to do this
without realizing it; the POSIX wording on how dd behaves is rather
detailed.

> I tried it like /dev/zero|dd, like yes|dd, cat somefile|dd, and even
> mkfifo pipe; yes > pipe & dd if=pipe -- and all sometimes failed.
> However, if=arealfile seems to always work perfectly.
> 
> To replicate:
> 
> yes|dd bs=1000 count=1000|wc -c

There's your problem.  bs=1000 is the key that tells dd to always write
1000 byte output blocks, even if the input block hit a short read, and
stop after 1000 reads.

Instead, try:

yes|dd ibs=1000 obs=1000 count=1000|wc -c

which tells dd to explicitly read in input blocks of 1000 bytes, even if
it requires multiple reads, prior to doing output blocks of 1000 bytes,
and stop after 1000 writes.

> 
> The problem exists on the following combinations:

It exists everywhere that dd complies with POSIX, even with non-GNU dd,
because POSIX requires the difference in behavior between bs=nnn and
ibs=nnn obs=nnn.

-- 
Eric Blake   eblake <at> redhat.com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org

Message #11 received at 8490-done <at> debbugs.gnu.org:

From: Pádraig Brady <P <at> draigBrady.com>
To: Jesse Gordon <jesseg <at> nikola.com>
Cc: 8490-done <at> debbugs.gnu.org
Subject: Re: bug#8490: dd reads random number of records from pipes - named
	or	otherwise - coreutils 8.9
Date: Wed, 13 Apr 2011 15:09:35 +0100

On 12/04/11 22:02, Jesse Gordon wrote:
> I can't believe such an obvious bug would exist this long, but on the
> other hand the test is so simple I can't see where it's user error.
> 
> dd, when reading from stdin or from a named pipe sometimes (but not
> always) reads a random number of records a bit less than what it should.
> 
> I tried it like /dev/zero|dd, like yes|dd, cat somefile|dd, and even
> mkfifo pipe; yes > pipe & dd if=pipe -- and all sometimes failed.
> However, if=arealfile seems to always work perfectly.
> 
> To replicate:
> 
> yes|dd bs=1000 count=1000|wc -c

With the about to be released coreutils, we now warn about this:

$ yes|dd bs=1000 count=1000|wc -c
dd: warning: partial read (960 bytes); suggest iflag=fullblock

We can't do that by default for backwards compat reasons.

cheers,
Pádraig.

p.s. I just noticed that this is not in NEWS.
I'll fix that now...

Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#8490; Package coreutils. (Wed, 13 Apr 2011 14:40:02 GMT)

Message #14 received at 8490 <at> debbugs.gnu.org:

From: Bjartur Thorlacius <svartman95 <at> gmail.com>
To: Jesse Gordon <jesseg <at> nikola.com>
Cc: 8490 <at> debbugs.gnu.org
Subject: Re: bug#8490: dd reads random number of records from pipes - named or
	otherwise - coreutils 8.9
Date: Wed, 13 Apr 2011 14:39:41 +0000

Have you looked into fullblock? If you only specify bs and count (and
not ibs or obs), dd may fill the buffer only partially. It'll try to do
count copies, but each copy may contain less data than expected. This
sort of makes sense on HDDs or tapes with variable block sizes (where
a read would return a whole block, but the block would be smaller than
the user-specified bs). In this case dd will preserve the original block
size. I've never encountered such an oddity, or maybe I have
without noticing.

I think just about everyone who hasn't been involved in the development,
in one way or another, gets this wrong (I don't quite get it yet). I
think this should be changed, unless the user provides a hypothetical
partblock option.

Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#8490; Package coreutils. (Wed, 13 Apr 2011 15:03:01 GMT)

Message #17 received at submit <at> debbugs.gnu.org:

From: John Reiser <jreiser <at> bitwagon.com>
To: bug-coreutils <at> gnu.org
Subject: Re: bug#8490: dd reads random number of records from pipes - named
	or	otherwise - coreutils 8.9
Date: Wed, 13 Apr 2011 08:02:06 -0700

> dd, when reading from stdin or from a named pipe sometimes (but not always) reads a random number of records a bit less than what it should.

> yes|dd bs=1000 count=1000|wc -c
> cat /dev/zero |dd bs=1000 count=1000 |wc -c

dd does "read(fd,buf,bs)" 'count' times, and writes whatever it gets each time.
If the operating system does not deliver 'bs' bytes each time, then the total output
will be less than bs*count bytes.  Because neither /usr/bin/yes nor /dev/zero
generates records with a block size of 1000 bytes (or a multiple of 1000 bytes),
you are at the mercy of multiprocessing delays and pipe buffering.  read()
on a pipe waits only until the pipe is non-empty, not necessarily until the
requested size is available.
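
As a rough illustration of the loop described above, here is a minimal sketch. It is not dd's actual code: count_reads is a made-up helper, and the 4096-byte buffer limit is an arbitrary assumption. It issues at most 'count' read() calls and tallies full and partial records the way dd's "records in" line does.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not from coreutils): perform at most `count`
   successful read() calls of `bs` bytes each, count full and partial
   records, and return the total number of bytes received. */
static size_t
count_reads (int fd, size_t bs, int count, int *full, int *partial)
{
  char buf[4096];               /* assumed upper bound on bs for this sketch */
  size_t total = 0;
  int done = 0;

  assert (bs > 0 && bs <= sizeof buf);
  *full = 0;
  *partial = 0;

  while (done < count)
    {
      ssize_t n = read (fd, buf, bs);
      if (n < 0 && errno == EINTR)
        continue;               /* interrupted before any data: retry */
      if (n <= 0)
        break;                  /* EOF or a real error */
      if ((size_t) n == bs)
        ++*full;
      else
        ++*partial;             /* a short read still uses up one record */
      total += (size_t) n;
      done++;
    }
  return total;
}
```

With 2500 bytes sitting in a pipe, bs=1000 and count=3, this would report 2 full records, 1 partial record and 2500 bytes, which is less than bs*count.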

--

Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#8490; Package coreutils. (Wed, 13 Apr 2011 15:15:02 GMT)

Message #20 received at 8490 <at> debbugs.gnu.org:

From: Eric Blake <eblake <at> redhat.com>
To: Bjartur Thorlacius <svartman95 <at> gmail.com>
Cc: 8490 <at> debbugs.gnu.org, Jesse Gordon <jesseg <at> nikola.com>
Subject: Re: bug#8490: dd reads random number of records from pipes - named
	or	otherwise - coreutils 8.9
Date: Wed, 13 Apr 2011 09:13:59 -0600

On 04/13/2011 08:39 AM, Bjartur Thorlacius wrote:
> Have you looked into fullblock? If you only specify bs and count (and
> not ibs or obs), dd may fill the buffer only partially. It'll try to do
> count copies, but each copy may contain less data than expected. This
> sort of makes sense on HDDs or tapes with variable block sizes (where
> a read would return a whole block, but the block would be smaller than
> the user-specified bs). In this case dd will preserve the original block
> size. I've never encountered such an oddity, or maybe I have
> without noticing.
> 
> I think just about everyone who hasn't been involved in the development,
> in one way or another, gets this wrong (I don't quite get it yet). I
> think this should be changed, unless the user provides a hypothetical
> partblock option.

It can't be changed without changing POSIX.

-- 
Eric Blake   eblake <at> redhat.com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org

Message #21 received at 8490-done <at> debbugs.gnu.org:

From: Jesse Gordon <jesseg <at> nikola.com>
To: Eric Blake <eblake <at> redhat.com>
Cc: 8490-done <at> debbugs.gnu.org
Subject: Re: bug#8490: dd reads random number of records from pipes - named
	or	otherwise - coreutils 8.9
Date: Wed, 13 Apr 2011 11:28:02 -0700

On 4/13/2011 7:07 AM, Eric Blake wrote:
> On 04/12/2011 03:02 PM, Jesse Gordon wrote:
>> I can't believe such an obvious bug would exist this long, but on the
>> other hand the test is so simple I can't see where it's user error.
> Thanks for the report.  And you are correct in surmising that it is user
> error and not a bug in dd.
>
>> dd, when reading from stdin or from a named pipe sometimes (but not
>> always) reads a random number of records a bit less than what it should.
> Rather, dd reads as many bytes as possible, but unless that is less than
> PIPE_MAX, it is not guaranteed to be an atomic read.  In turn, if you
> have asked dd to pad out partial reads into complete writes, then that
> explains your problem.  Unfortunately, it is rather easy to do this
> without realizing it; the POSIX wording on how dd behaves is rather
> detailed.
>
How have I asked dd to pad out partial reads? I'm not specifying
pad or sync or anything.
And why is reading from a pipe a partial read when there is
neither EOF nor error?

Behold:  ifp=stdin; readbytes=fread(buffer, 1, ibs, ifp);

That reads in ibs bytes quite nicely from a pipe. It waits for 
all the data to fill into buffer, and only bails for the 
legitimate reasons -- like EOF, or some real error.

If POSIX really requires dd to abort a read for any reason other
than EOF or an error, then I'm dumbfounded. To me, it seems
obvious that the rule should be "When asked to read from a pipe,
don't quit till it's done or becomes impossible."

dd doesn't seem to abort early when reading hard drives, even if 
the block size isn't the same as the hard drive IO read size, or 
even if it has to wait a few ms for the drive to seek.

I guess it's a good thing that POSIX doesn't say "dd must perform 
a segfault every 10th run on a full moon" ha ha :=)

It sure looks like poorly implemented non-blocking IO to me.

However, iflag=fullblock seems to fix it.

>> I tried it like /dev/zero|dd, like yes|dd, cat somefile|dd, and even
>> mkfifo pipe; yes > pipe & dd if=pipe -- and all sometimes failed.
>> However, if=arealfile seems to always work perfectly.
>>
>> To replicate:
>>
>> yes|dd bs=1000 count=1000|wc -c
> There's your problem.  bs=1000 is the key that tells dd to always write
> 1000 byte output blocks, even if the input block hit a short read, and
> stop after 1000 reads.
>
> Instead, try:
>
> yes|dd ibs=1000 obs=1000 count=1000|wc -c
>
I tried this but it still stops early. I copied and pasted 
exactly your command:

root <at> stats:~# yes|dd ibs=1000 obs=1000 count=1000|wc -c
564+436 records in
574+1 records out
574464 bytes (574 kB) copied, 0.0180807 s, 31.8 MB/s
574464

> which tells dd to explicitly read in input blocks of 1000 bytes, even if
> it requires multiple reads, prior to doing output blocks of 1000 bytes,
> and stop after 1000 writes.
>
>> The problem exists on the following combinations:
> It exists everywhere that dd complies with POSIX, even with non-GNU dd.
>   Because POSIX requires the difference in behavior between bs=nnn vs.
> ibs=nnn obs=nnn.
>
I still cannot fathom why it would ever be acceptable to abort 
early when there's no error and no EOF and the pipe is still 
sending data. That just goes against all good programming common 
sense as far as my tiny brain cell can tell.

Is there actually EVER a real reason for dd to need to abort a 
read when there's no EOF and no error? Why would POSIX require this?

~~~~~

Okay, I poked around in the source code a bit. Now I'll tell you
I'm not a great programmer so don't be afraid to say "Jesse, here's
where you're wrong..."

However, I added some fprintf(stderr, ...) statements in various
places to help me understand where and why dd is bailing.
I put an fprintf() at each break inside of the while(1){} main
loop, and it's breaking at the first one:

  while (1)
    {
      if ((r_partial + r_full) >= max_records)
        {
                /* Debug output only.  The values are cast to uintmax_t
                   and printed with %ju; %d is wrong for that type. */
                uintmax_t temp = r_partial + r_full;
                fprintf(stderr,"break 001: ");
                fprintf(stderr,"r_partial=%ju ",(uintmax_t) r_partial);
                fprintf(stderr,"r_full=%ju ",(uintmax_t) r_full);
                fprintf(stderr,"max_records=%ju ",(uintmax_t) max_records);
                fprintf(stderr,"temp=%ju\n",temp);
                break;
        }

And when I run it, I get:

root <at> stats:/big/src/coreutils-8.9# yes|src/dd bs=1000 count=1000|wc -c
Setting max_records=1000 due to count=1000 being specified.
break 001: r_partial=394 r_full=606 max_records=1000 temp=1000
606+394 records in
606+394 records out
618304 bytes (618 kB) copied, 0.0757696 s, 8.2 MB/s
618304

Okay, so that makes sense enough: If it's read count records 
(whole or partially) then it's done reading because it's 
satisfied count=.

So the question is why is it reading partial records?
r_partial gets incremented whenever iread() returns less than
input_blocksize. So why should iread() return less than ibs from
a stream when simply trying again should finish the read?

static ssize_t
iread (int fd, char *buf, size_t size)
{
  while (true)
    {
      ssize_t nread;
      process_signals ();
      nread = read (fd, buf, size);
      if (! (nread < 0 && errno == EINTR))
        return nread;
    }
}

As you can see, it is supposed to keep trying until it gets some 
data. It does block if there is no data to read yet. The problem 
is when it happens to retry when some but not all of the data has 
arrived at the pipe.

I added in more fprintf()s and in my case, read() is always 
setting errno to 29=ESPIPE (something about an error seeking on a 
pipe.)

In trying to understand the inverted logic of the return code, 
I'm re ordering it here:

      if (nread < 0 && errno == EINTR)
        {
          /* read returned -1 (ERROR) and that error is
             "Interrupted by Signal": keep trying */
        }
      else
        {
          /* we read 0 or more bytes, OR there was an error
             OTHER than EINTR (Interrupted by Signal) */
          return nread;
        }

So why don't we want to return on EINTR? or am I all confused?
~~~~
Anyway, of course just below iread() is iread_fullblock() which
appears to do exactly what it should for reading from pipes.

Maybe iread_fullblock() should always be used for reading from pipes.
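
For comparison, a "keep reading until the block is full" loop might look roughly like the sketch below. This is not the coreutils iread_fullblock() source: read_full is a hypothetical name and its error handling is an assumption. It only shows the general technique of accumulating short reads until the requested size, EOF, or an error.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical sketch, not the coreutils implementation.  Keep calling
   read() until `size` bytes have been collected or EOF is reached.
   Returns the number of bytes collected (less than `size` only at EOF
   or after an error following partial data), or -1 if read() fails
   before any data was collected. */
static ssize_t
read_full (int fd, char *buf, size_t size)
{
  size_t got = 0;

  assert (buf != NULL);
  while (got < size)
    {
      ssize_t n = read (fd, buf + got, size - got);
      if (n < 0 && errno == EINTR)
        continue;                       /* interrupted: retry */
      if (n < 0)
        return got > 0 ? (ssize_t) got : -1;
      if (n == 0)
        break;                          /* EOF: return what we have */
      got += (size_t) n;                /* short read: ask for the rest */
    }
  return (ssize_t) got;
}
```

The design point is that a short read is neither an error nor EOF here; only a return value of 0 from read() ends the block early.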

Ohwell.

I really have a hard time believing that POSIX requires dd to
abort a pipe read just because the data wasn't ready quickly enough.

What use of dd would actually be broken if iread() didn't abort
for ESPIPE when reading a pipe?
It should be as simple as making it keep reading and appending
if errno==ESPIPE and read() returned something other than 0.

Can someone point me to where POSIX requires this current 
behavior of dd? or am I just hopelessly confused?
Is the requirement actually specific to reading from pipes? or is 
reading from pipes broken due to trying to satisfy some 
requirement relating to the reading of block devices or whatever?

But at least I have a workaround -- iflag=fullblock.

Thank you all very much,

Jesse

Message #22 received at 8490-done <at> debbugs.gnu.org:

From: Eric Blake <eblake <at> redhat.com>
To: Jesse Gordon <jesseg <at> nikola.com>
Cc: 8490-done <at> debbugs.gnu.org
Subject: Re: bug#8490: dd reads random number of records from pipes - named
	or	otherwise - coreutils 8.9
Date: Wed, 13 Apr 2011 13:12:35 -0600

On 04/13/2011 12:28 PM, Jesse Gordon wrote:
> 
> 
> On 4/13/2011 7:07 AM, Eric Blake wrote:
>> On 04/12/2011 03:02 PM, Jesse Gordon wrote:
>>> I can't believe such an obvious bug would exist this long, but on the
>>> other hand the test is so simple I can't see where it's user error.
>> Thanks for the report.  And you are correct in surmising that it is user
>> error and not a bug in dd.
>>
>>> dd, when reading from stdin or from a named pipe sometimes (but not
>>> always) reads a random number of records a bit less than what it should.
>> Rather, dd reads as many bytes as possible, but unless that is less than
>> PIPE_MAX, it is not guaranteed to be an atomic read.  In turn, if you
>> have asked dd to pad out partial reads into complete writes, then that
>> explains your problem.  Unfortunately, it is rather easy to do this
>> without realizing it; the POSIX wording on how dd behaves is rather
>> detailed.
>>
> How have I asked dd to pad out partial read? I'm not specifying pad or
> sync or anything.

Sorry, I assumed there was a conv=sync in the mix; without that, there
is no padding (a partial read becomes a complete write with no padding).

> And why is reading from a pipe a partial read when there is neither EOF
> nor error?

Because the writer (yes) is getting ahead of the reader (dd), and is not
writing in the same block size as the reader.  For the sake of argument,
let's suppose that yes uses stdio, which buffers to 4096 bytes before it
calls write().  Then the kernel swaps over to dd, which does four reads
of 1000 bytes each, then another read() which only has 96 bytes
available immediately without swapping back to yes.  So the kernel gives
dd a short read.

> That reads in ibs bytes quite nicely from a pipe. It waits for all the
> data to fill into buffer, and only bails for the legitimate reasons --
> like EOF, or some real error.

Short reads are not an error, but are a real phenomenon when reading
from pipes.

> 
> If POSIX really requires dd to abort a read for any reason other than
> EOF or an error, then I'm dumbfounded. To me, it seems obvious that the
> rule should be "When asked to read from a pipe, don't quit till it's
> done or becomes impossible."

POSIX expects the following (and note carefully that bs=nnn is MUCH
different than ibs=nnn obs=nnn):

http://pubs.opengroup.org/onlinepubs/9699919799/utilities/dd.html

If the bs= expr operand is specified and no conversions other than sync,
noerror, or notrunc are requested, the data returned from each input
block shall be written as a separate output block; if the read returns
less than a full block and the sync conversion is not specified, the
resulting output block shall be the same size as the input block. If the
bs= expr operand is not specified, or a conversion other than sync,
noerror, or notrunc is requested, the input shall be processed and
collected into full-sized output blocks until the end of the input is
reached.
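
To make the distinction in the POSIX text concrete, here is a small model of the second case, where input is collected into full-sized output blocks. It is only a sketch: reblock is a hypothetical function, not anything in dd. Given the sizes returned by successive reads, it reports how many full and partial output blocks of obs bytes result. Note that count= limits the number of input blocks, so reblocking changes where the output block boundaries fall, not how many bytes were read.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model (not dd's code): sum the sizes returned by
   successive reads and report how that total splits into full
   `obs`-byte output blocks plus at most one trailing partial block.
   Returns the total number of bytes. */
static size_t
reblock (const size_t *reads, size_t nreads, size_t obs,
         size_t *full_out, size_t *partial_out)
{
  size_t total = 0;

  assert (obs > 0);
  for (size_t i = 0; i < nreads; i++)
    total += reads[i];

  *full_out = total / obs;
  *partial_out = (total % obs != 0) ? 1 : 0;
  return total;
}
```

For example, reads of 1000, 96, 1000 and 904 bytes would be written as four separate blocks (two of them partial) under bs=1000, but as three full 1000-byte blocks under ibs=1000 obs=1000. Either way the total is 3000 bytes from four reads.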

> dd doesn't seem to abort early when reading hard drives, even if the
> block size isn't the same as the hard drive IO read size, or even if it
> has to wait a few ms for the drive to seek.

That's because hard drives, being physical devices, have all of their
data handy at once.  The kernel doesn't have to task swap over to
another process to get more bytes, but can proceed with the full read
request right up until EOF.

> However, iflag=fullblock seems to fix it.

That's one fix.  But it's GNU-specific.  If you want the POSIX-compliant
fix, then use ibs/obs instead of bs.

>>
> I still cannot fathom why it would ever be acceptable to abort early
> when there's no error and no EOF and the pipe is still sending data.

> Is there actually EVER a real reason for dd to need to abort a read when
> there's no EOF and no error? Why would POSIX require this?

It's called 40 years of history.  That was the original way dd was
written, back when the default medium was _not_ disks, but tapes, and
tapes had variable size blocks.  It made sense for the default back
then.  And changing it now _WILL_ break existing scripts that have come
to rely on the standardized behavior, even if the standardized behavior
makes no sense if dd were being developed from scratch today.

> So the question is why is it reading partial records?

Because that's the way pipes behave.

> I really have a hard time believing that POSIX requires dd to abort a
> pipe read just because the data wasn't ready quick enough.

It is NOT aborting a pipe read, it is doing exactly what you told it,
and writing as soon as read returns, even if read() had a short read
value, because you specified bs.

> Can someone point me to where POSIX requires this current behavior of
> dd?

I just did.

-- 
Eric Blake   eblake <at> redhat.com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org

Message #23 received at 8490-done <at> debbugs.gnu.org:

From: Jesse Gordon <jesseg <at> nikola.com>
To: Eric Blake <eblake <at> redhat.com>
Cc: 8490-done <at> debbugs.gnu.org
Subject: Re: bug#8490: dd reads random number of records from pipes - named
	or	otherwise - coreutils 8.9
Date: Wed, 13 Apr 2011 16:36:35 -0700


On 4/13/2011 12:12 PM, Eric Blake wrote:
> On 04/13/2011 12:28 PM, Jesse Gordon wrote:
>>
>> On 4/13/2011 7:07 AM, Eric Blake wrote:
>>> On 04/12/2011 03:02 PM, Jesse Gordon wrote:
>>>> I can't believe such an obvious bug would exist this long, but on the
>>>> other hand the test is so simple I can't see where it's user error.
>>> Thanks for the report.  And you are correct in surmising that it is user
>>> error and not a bug in dd.
>>>
>>>> dd, when reading from stdin or from a named pipe sometimes (but not
>>>> always) reads a random number of records a bit less than what it should.
>>> Rather, dd reads as many bytes as possible, but unless that is less than
>>> PIPE_MAX, it is not guaranteed to be an atomic read.  In turn, if you
>>> have asked dd to pad out partial reads into complete writes, then that
>>> explains your problem.  Unfortunately, it is rather easy to do this
>>> without realizing it; the POSIX wording on how dd behaves is rather
>>> detailed.
>>>
>> How have I asked dd to pad out partial read? I'm not specifying pad or
>> sync or anything.
> Sorry, I assumed there was a conv=sync in the mix; without that, there
> is no padding (a partial read becomes a complete write with no padding).
>
>> And why is reading from a pipe a partial read when there is neither EOF
>> or error?
> Because the writer (yes) is getting ahead of the reader (dd), and is not
> writing in the same block size as the reader.  For the sake of argument,
> let's suppose that yes uses stdio, which buffers to 4096 bytes before it
> calls write().  Then the kernel swaps over to dd, which does four reads
> of 1000 bytes each, then another read() which only has 96 bytes
> available immediately without swapping back to yes.  So the kernel gives
> dd a short read.
>
>> That reads in ibs bytes quite nicely from a pipe. It waits for all the
>> data to fill into buffer, and only bails for the legitimate reasons --
>> like EOF, or some real error.
> Short reads are not an error, but are a real phenomenon when reading
> from pipes.
>
I agree - short reads from pipes are real. But I don't see why 
they should ever need to cause dd to skip data from the pipe.
(Mind you, I'm talking only about reading from pipes here!)

>> If POSIX really requires dd to abort a read for any reason other than
>> EOF or an error, then I'm dumbfounded. To me, it seems obvious that the
>> rule should be "When asked to read from a pipe, don't quit till it's
>> done or becomes impossible."
> POSIX expects the following (and note carefully that bs=nnn is MUCH
> different than ibs=nnn obs=nnn):
>
> http://pubs.opengroup.org/onlinepubs/9699919799/utilities/dd.html
>
> If the bs= expr operand is specified and no conversions other than sync,
> noerror, or notrunc are requested, the data returned from each input
> block shall be written as a separate output block; if the read returns
> less than a full block and the sync conversion is not specified, the
> resulting output block shall be the same size as the input block. If the
> bs= expr operand is not specified, or a conversion other than sync,
> noerror, or notrunc is requested, the input shall be processed and
> collected into full-sized output blocks until the end of the input is
> reached.
>
>> dd doesn't seem to abort early when reading hard drives, even if the
>> block size isn't the same as the hard drive IO read size, or even if it
>> has to wait a few ms for the drive to seek.
> That's because hard drives, being physical devices, have all of their
> data handy at once.
Is that really true? Sure they are block devices - but on the
nanosecond scale, do they really have all of their data handy? In
fact I was doing some tests the other day and I found that dd can
read from /dev/zero about 20 times faster than it can from /dev/hda,
so I know that dd has to wait for /dev/hda to get its data.
Furthermore, my /dev/hda almost certainly has a block size other
than 1000 (which is the block size I set for dd for my test).

So for example:

dd bs=1000 count=1000 of=/dev/null if=/dev/hda

And let's say my disk driver transfers 2048 bytes at a time: The 
first two reads will be fine, but the third read will only get 48 
bytes. *boom*

But, it doesn't go boom. What gives? Doesn't the kernel have to 
task swap to the disk driver so it can queue up some more bytes, 
leaving read() with a short read?

And even if my ibs was the same as my drive's driver, I just
happen to know that dd can read waaaay faster than any of my
drives can send data.
Obviously, dd has to wait for the data to become available.

>   The kernel doesn't have to task swap over to
> another process to get more bytes, but can proceed with the full read
> request right up until EOF.
>
>> However, iflag=fullblock seems to fix it.
> That's one fix.  But it's GNU-specific.  If you want the POSIX-compliant
> fix, then use ibs/obs instead of bs.
>
Setting ibs and obs instead does _NOT_ fix it. Try it yourself! :-)

yes|dd ibs=1000 obs=1000 count=1000| wc -c
694+306 records in
703+1 records out
703488 bytes (703 kB) copied, 0.0220068 s, 32.0 MB/s
703488


>> I still cannot fathom why it would ever be acceptable to abort early
>> when there's no error and no EOF and the pipe is still sending data.
>> Is there actually EVER a real reason for dd to need to abort a read when
>> there's no EOF and no error? Why would POSIX require this?
> It's called 40 years of history.  That was the original way dd was
> written, back when the default medium was _not_ disks, but tapes, and
> tapes had variable size blocks.  It made sense for the default back
> then.

>   And changing it now _WILL_ break existing scripts that have come
> to rely on the standardized behavior,...

I can't think of a single scenario where any script would rely on 
dd dropping bytes from the input pipe.
Can anyone else?

I mean, seriously, it goes on reading some of the bytes and 
dropping others and then finishes up like everything's normal.
Remember, I'm talking about pipes here, nothing else.

> .... even if the standardized behavior
> makes no sense if dd were being developed from scratch today.
>

>> So the question is why is it reading partial records?
> Because that's the way pipes behave.
>
>> I really have a hard time believing that POSIX requires dd to abort a
>> pipe read just because the data wasn't ready quick enough.
> It is NOT aborting a pipe read, it is doing exactly what you told it,
> and writing as soon as read returns, even if read() had a short read
> value, because you specified bs.
>
>> Can someone point me to where POSIX requires this current behavior of
>> dd?
> I just did.
>
Thank you very much!

I now see the real problem: The POSIX document is not aware of 
pipes. It states that certain things should be certain ways and 
that if read() gets a short read, it should count it as a partial 
record and write it as such.

And the dd authors have just obediently followed the letter of 
POSIX, not realizing it's for a context that does not include 
pipes.

The problem is that while dd's behavior for short reads is 
perfect for random access files and devices, it's lousy for pipes.

With a file or a disk device, a short read only happens for a 
reason - like EOF or data unreadable or driver timeout or device 
disconnect perhaps.

And maybe even the tape drivers give a short read as a way of 
signaling the end of a variable-length block. That really makes 
sense. That way you could tell it to read x number of blocks off 
of the tape, and it would read x number of blocks, ignoring the 
fact that they were different sizes.

But pipes do not have variable length blocks, and a short read 
does not signal anything to worry about or deal with: It just 
means it was the end of the block and another will be coming 
right after it.

So I think it is clear that the absurdity of the current 
behavior with pipes comes from POSIX, which does not comment on 
pipes, being followed to the letter without the realization that 
it's being applied in a context for which it was not written.

The problem with this is that a short read on a pipe has an 
entirely different meaning than a short read on most other things 
-- and treating them the same is what causes this strangeness.

(And there may be a bug in dd, since setting ibs and obs instead 
of bs still causes partial reads on a pipe.)

Do I seem confused? Or would a normal thinking person arrive at 
the same place as I have? Am I missing something?

Thank you very much,

Jesse

Message #24 received at 8490-done <at> debbugs.gnu.org (full text, mbox):

From: Eric Blake <eblake <at> redhat.com>
To: Jesse Gordon <jesseg <at> nikola.com>
Cc: 8490-done <at> debbugs.gnu.org
Subject: Re: bug#8490: dd reads random number of records from pipes - named
	or	otherwise - coreutils 8.9
Date: Thu, 14 Apr 2011 08:34:31 -0600

On 04/13/2011 05:36 PM, Jesse Gordon wrote:
>> Short reads are not an error, but are a real phenomenon when reading
>> from pipes.
>>
> I agree - short reads from pipes are real. But I don't see why they
> should ever need to cause dd to skip data from the pipe.

dd is not skipping data.  It is stopping after exactly count=1000 reads,
just like you asked; so it is ending shy of the amount of data you
thought you were getting.
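
A rough way to see this without relying on timing luck is sketched below.
It is only an illustration: the slow_producer helper and its 4-byte chunks
are made up for the example, and LC_ALL=C is there just to keep dd's
statistics in English.

```shell
# Hypothetical producer: three 4-byte writes, one second apart, so in
# practice each read() on the pipe returns a single 4-byte chunk.
slow_producer() {
    printf aaaa; sleep 1
    printf bbbb; sleep 1
    printf cccc
}

# count=2 means "stop after two reads", not "stop after 2*6 bytes".
# cat drains the leftover chunk so the producer can finish normally.
stats=$(slow_producer | { LC_ALL=C dd bs=6 count=2 2>&1 >/dev/null; cat >/dev/null; })
printf '%s\n' "$stats"
```

The "records in" line should show two partial records and no full ones,
and the third chunk is still sitting in the pipe when dd exits.  Depending
on the coreutils version, dd may also print a warning about the partial
read.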

>>> dd doesn't seem to abort early when reading hard drives, even if the
>>> block size isn't the same as the hard drive IO read size, or even if it
>>> has to wait a few ms for the drive to seek.
>> That's because hard drives, being physical devices, have all of their
>> data handy at once.
> Is that really true?

From the application's perspective - yes.  With pipes, the kernel has to
schedule another process to run, and that other process can take an
indefinite amount of time producing data.  Since the kernel has no
control over when other processes will actually produce that data, a
short read is the only viable answer that doesn't deadlock the system.
But with files, even if the kernel has to swap out in order to continue
reading from the disk, it has complete control over the data and does
not have to wait on any external processes, so the kernel has no
arbitrary waits - it may have a finite (and even long) wait while
spinning to the next sector for the next portion of data, but the wait
is independent of all other system activity, and therefore the kernel
can afford to avoid short reads when reading from devices since there is
no way to deadlock the system while getting the rest of the data.

>>> However, iflag=fullblock seems to fix it.
>> That's one fix.  But it's GNU-specific.  If you want the POSIX-compliant
>> fix, then use ibs/obs instead of bs.
>>
> Setting ibs and obs instead does _NOT_ fix it. Try it yourself! :-)

I stand corrected.  On re-reading POSIX, I indeed concur that there is
no way using _just_ POSIX options to require that a particular amount of
input data be read, regardless of short reads; you _have_ to use the GNU
extension of iflag=fullblock.
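
For what it's worth, here is a small sketch of the difference the GNU flag
makes.  The slow_producer helper is hypothetical, purely a way to force
short reads repeatably, and the flag is spelled iflag=fullblock on the dd
command line.

```shell
# Hypothetical producer: 12 bytes total, written as three 4-byte chunks
# one second apart, so a plain read() of 6 bytes comes back short.
slow_producer() {
    printf aaaa; sleep 1
    printf bbbb; sleep 1
    printf cccc
}

# Without the flag: two short reads of 4 bytes each, then dd stops.
# cat drains what dd left behind so the producer is not killed by SIGPIPE.
plain=$(slow_producer | { dd bs=6 count=2 2>/dev/null; cat >/dev/null; } | wc -c)

# With the GNU extension: dd keeps reading until each 6-byte block is full.
full=$(slow_producer | { dd bs=6 count=2 iflag=fullblock 2>/dev/null; cat >/dev/null; } | wc -c)

echo "plain=$plain full=$full"
```

With GNU dd this should report 8 bytes for the plain run and 12 for the
fullblock run.  Applied to the original test case, the same fix would be
yes | dd bs=1000 count=1000 iflag=fullblock | wc -c.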

> I can't think of a single scenario where any script would rely on dd
> dropping bytes from the input pipe.
> Can anyone else?

It doesn't matter what you think; the problem is that you can't change
existing behavior.  You can add new commands that give new behavior, but
there are 40 years worth of scripts that rely on existing behavior, so
even if _you_ can't think of someone that wants short reads from a pipe,
someone else may already be wanting it, and even relying on it, because
it is standardized that way.

Maybe the best thing to do is work on having POSIX standardize the GNU
extension of iflag=fullblock.

> 
> I now see the real problem: The POSIX document is not aware of pipes. It
> states that certain things should be certain ways and that if read()
> gets a short read, it should count it as a partial record and write it
> as such.
> 
> And the dd authors have just obediently followed the letter of POSIX,
> not realizing it's for a context that does not include pipes.

POSIX is very much aware of pipes.  POSIX was written long after dd was
written, and standardized existing practice.  It's not POSIX that got it
wrong, but the original dd implementors.  But they got it wrong so long
ago that people have come to rely on that behavior, and the only way to
get new behavior is to mandate a new option.

-- 
Eric Blake   eblake <at> redhat.com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org

Message #25 received at 8490-done <at> debbugs.gnu.org (full text, mbox):

From: Jesse Gordon <jesseg <at> nikola.com>
To: Eric Blake <eblake <at> redhat.com>
Cc: 8490-done <at> debbugs.gnu.org
Subject: Re: bug#8490: dd reads random number of records from pipes - named
	or	otherwise - coreutils 8.9
Date: Thu, 14 Apr 2011 13:53:20 -0700

On 4/14/2011 7:34 AM, Eric Blake wrote:
> On 04/13/2011 05:36 PM, Jesse Gordon wrote:
>
>> I can't think of a single scenario where any script would rely on dd
>> dropping bytes from the input pipe.
>> Can anyone else?
> It doesn't matter what you think; the problem is that you can't change
> existing behavior.

Is that really true? What if it was a buffer overrun? What if it 
was a bug where it segfaulted if a certain sequence of bytes was 
copied?
Of course those can be fixed...

What if it was a limitation to 2G bytes read?

My point is that the existing be

>   You can add new commands that give new behavior, but
> there are 40 years worth of scripts that rely on existing behavior,

I'll admit that as a scientist and engineer, it bothers me when 
an intelligent person makes a broad sweeping claim without even 
stopping to think whether they can cite a single example - 
hypothetical or real. Making claims is easy and proves nothing - 
and may not be of much benefit.

Can you even imagine a hypothetical situation where someone would 
depend on short reads from a pipe? Maybe in a random number 
generator?

I can cite lots of examples where some poor sysadmin wants to 
just get his job done and needs dd to read all data from the 
input pipe.... :-)

>   so
> even if _you_ can't think of someone that wants short reads from a pipe,
> someone else may already be wanting it, and even relying on it, because
> it is standardized that way.
>
But what if short pipe reads cause reading from pipes to be useless?
> Maybe the best thing to do is work on having POSIX standardize the GNU
> extension of iflag=fullblock.
>
Basically, because the user's desired block size may not coincide 
with PIPE_MAX (or whatever), there is never a guarantee that dd 
will always read the whole input data from a pipe (using POSIX 
options).

But, look at POSIX: It says under INPUT FILES that "The input 
file can be any file type" -- if we take that to include blocks 
and pipes, then we can see that dd is not compliant because it 
cannot reliably read from a pipe.

Fact: Here's a prime example: Let's say I happen to want to read 
exactly 999983 blocks of 2099 bytes each from a pipe. With purely 
POSIX dd, it's impossible. Doesn't that violate POSIX?
Doesn't a program's ability to function as described in POSIX 
preempt technical details about short reads from read()?

And if POSIX really intends for pipe as input file to be not 
guaranteed usable, then it ought to mention something about that, 
but it doesn't.

Any thinking observer would, I believe, have to conclude that dd 
is supposed to work perfectly to read 999983 blocks of 2099 bytes 
each from a pipe.

So I asked myself how we might have this apparent contradiction, 
so I looked at POSIX for read() -- 
http://pubs.opengroup.org/onlinepubs/9699919799/functions/read.html

It turns out that (Unless specified otherwise) read() is supposed 
to block when reading from a pipe until it has "some data."

Unfortunately, POSIX for read() isn't specific about whether 
read() from a pipe must give a short read or wait for a full buffer.
In other words, it looks like POSIX allows read() to either give 
a short read on a pipe (like it does now) or to read the 
requested number of bytes from the pipe before returning. 
(Stopping only, of course, for EOF/broken/etc.)

Do you really think that the POSIX folks meant for dd to be 
unusable for reading from pipes?

Just think of the poor average sysadmin who reads over the man 
page for dd and says "Oh, this'll work" but doesn't realize that 
there's such a thing as PIPE_MAX hidden deep down, and that his 
dd command is going to fail miserably, _losing important data_. 
How can we say that it was user error? Does he really have to be 
a kernel programmer to be able to use dd correctly? Is that 
really what POSIX means?

Clearly, it's an oversight in the document one way or another. 
POSIX should either say "dd is not required to read correctly 
from pipes, and must warn or refuse to read from a pipe", or 
else be changed to specify that FULLBLOCK should be the default 
for pipes -- one way or the other.

From a practical standpoint, the most disgusting kind of design 
flaw in a copy program is for it to partially copy the data and 
not give specific warning to that effect.
It's like a backup program that doesn't warn you if it fails to 
complete, or a download program that doesn't warn you if it fails 
to finish.

By indicating that dd can read from pipes while not mentioning 
that it doesn't work under many common circumstances, POSIX is 
not sane.

I notice that the POSIX documents we're reading are copyright 
through 2008 -- is there a newer one out now?

Anyway, I guess my next step is to try to file a complaint with 
IEEE to get this contradiction fixed.

Do you have any suggestions for getting in touch with IEEE?

Thank you very much,

Jesse

Message #26 received at 8490-done <at> debbugs.gnu.org (full text, mbox):

From: Eric Blake <eblake <at> redhat.com>
To: Jesse Gordon <jesseg <at> nikola.com>
Cc: 8490-done <at> debbugs.gnu.org
Subject: Re: bug#8490: dd reads random number of records from pipes - named
	or	otherwise - coreutils 8.9
Date: Thu, 14 Apr 2011 15:45:51 -0600

On 04/14/2011 02:53 PM, Jesse Gordon wrote:
> But, look at POSIX: It says under INPUT FILES that "The input file can
> be any file type" -- if we take that to include blocks and pipes, then
> we can see that dd is not compliant because it cannot reliably read from
> a pipe.

dd _is_ reliably reading from a pipe.  The problem is that POSIX has no
option like iflag=fullblock to consolidate short reads into a single
buffer before doing output, so the reliable read ends shy of the amount
of data that you were hoping for.

The solution to this is not to rant on this list, but to propose an
enhancement to POSIX to standardize iflag=fullblock.

> Fact: Here's a prime example: Let's say I happen to want to read exactly
> 999983 blocks of 2099 bytes each from a pipe. With purely POSIX dd, it's
> impossible. Doesn't that violate POSIX?

No.  Just because POSIX doesn't have the command line options _in dd
alone_ to do what you want doesn't mean that dd is broken (just that the
POSIX standard is not as useful as it could be, by the fact that you
have to rely on extensions to the standard), nor does it mean that you
can't do what you want with other POSIX tools.

Remember, the problem is not that you can't read from pipes, but that
when you couple the count=nnn with short reads you don't get full data.
 So the solution, using only POSIX, is to get rid of count=nnn, and
isolate the input counting from the output blocking.  Unfortunately,
POSIX also states that 'head -c' is not portable, but that other
utilities in the standard provide the same functionality, without
stating what those other utilities are (I'm guessing that whoever wrote
that sentence was referring to dd).

If you don't care about efficiency, then you can avoid the problem by
avoiding short reads (read from a blocking pipe will always return at
least one byte, since returning 0 implies EOF):

yes | dd ibs=1 count=$((999983*2099)) | dd obs=1000 | wc -c

proves that you can read exactly 999983*2099 bytes, with only one
partial write in the second dd.
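
A scaled-down version of the same pipeline, for anyone who wants to check
the arithmetic quickly, might look like the sketch below.  The 83 blocks
are an arbitrary smaller number chosen here so the byte-at-a-time reads
finish fast; everything else follows the command above.

```shell
# Scaled-down sketch: 83 blocks of 2099 bytes instead of 999983.
blocks=83
want=$((blocks * 2099))

# First dd: one-byte reads can never be short, so count= is exact.
# Second dd: no count=, so short reads only affect output blocking.
got=$(yes | dd ibs=1 count="$want" 2>/dev/null | dd obs=1000 2>/dev/null | wc -c)

echo "wanted $want bytes, got $got"
```

The two numbers should match, at the cost of one read() call per byte in
the first dd.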

> Doesn't a program's ability to function as described in POSIX preempt
> technical details about short reads from read()?

Coreutils' dd _does_ function as described in POSIX.

> 
> And if POSIX really intends for pipe as input file to be not guaranteed
> usable, then it ought to mention something about that, but it doesn't.

Raise that as a bug against POSIX, then:
http://austingroupbugs.net/main_page.php

> Any thinking observer would, I believe, have to conclude that dd is
> supposed to work perfectly to read 999983 blocks of 2099 bytes each from
> a pipe.

There's nothing in POSIX that requires that, unfortunately.

> Do you really think that the POSIX folks meant for dd to be unusable for
> reading from pipes?

POSIX merely intended for the default behavior of dd to match its
historical behavior, which really was unusable for reading from pipes as
historically implemented.  That said, there is nothing wrong with
proposing that POSIX add a new option to make dd more useful when
reading from pipes.

> I notice that the POSIX documents we're reading are copyright through
> 2008 -- is there a newer one out now?

2008 is the newest version.  Technical Corrigendum 1 to that revision
will probably come out later this year.  Membership in the Austin Group
(the committee that maintains POSIX) is free to anyone who wants to join.

> Do you have any suggestions for getting in touch with IEEE?

Join the Austin Group - the IEEE generally blanket-approves whatever the
Austin Group says when it comes to POSIX.

-- 
Eric Blake   eblake <at> redhat.com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 13 May 2011 11:24:04 GMT) Full text and rfc822 format available.

This bug report was last modified 14 years and 98 days ago.
