Package: coreutils;
Reported by: Jesse Gordon <jesseg <at> nikola.com>
Date: Wed, 13 Apr 2011 02:46:02 UTC
Severity: normal
Done: Eric Blake <eblake <at> redhat.com>
Bug is archived. No further changes may be made.
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
From: Jesse Gordon <jesseg <at> nikola.com>
To: bug-coreutils <at> gnu.org
Subject: dd reads random number of records from pipes - named or otherwise - coreutils 8.9
Date: Tue, 12 Apr 2011 14:02:12 -0700
I can't believe such an obvious bug would exist this long, but on the other hand the test is so simple I can't see where it's user error.

dd, when reading from stdin or from a named pipe, sometimes (but not always) reads a random number of records, a bit less than it should.

I tried it like /dev/zero|dd, like yes|dd, cat somefile|dd, and even mkfifo pipe; yes > pipe & dd if=pipe -- and all sometimes failed. However, if=arealfile seems to always work perfectly.

To replicate:

yes|dd bs=1000 count=1000|wc -c
cat /dev/zero |dd bs=1000 count=1000 |wc -c

If it works perfectly the first time, just keep trying. For me, it randomly works and doesn't work. Some conditions are more likely to work, and others to fail. For me, using the "yes" method above almost always reads the incorrect number of bytes, while the /dev/zero method usually works correctly but occasionally reads the wrong number of bytes.

The problem exists on the following combinations:

Slackware 12.0.0 (2.6.21.5-smp) / coreutils 8.9
Slackware 12.0.0 (2.6.21.5-smp) / coreutils 6.9
Slackware 13.0.0 (2.6.29.6) / coreutils 7.9
Kubuntu lenny/sid (2.6.24-27-generic SMP) / coreutils 6.10

(I only mention the older versions of coreutils because it may be helpful. I'm only filing the bug report for 8.9!)

Thanks,
Jesse

For example:

root <at> stats:~# dd --version
dd (coreutils) 8.9
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Paul Rubin, David MacKenzie, and Stuart Kemp.
root <at> stats:~# yes|dd bs=1000 count=1000|wc -c
597+403 records in
597+403 records out
608232 bytes (608 kB) copied, 0.0263828 s, 23.1 MB/s
608232
root <at> stats:~# yes|dd bs=1000 count=1000|wc -c
885+115 records in
885+115 records out
887784 bytes (888 kB) copied, 0.054488 s, 16.3 MB/s
887784
root <at> stats:~# yes|dd bs=1000 count=1000|wc -c
696+304 records in
696+304 records out
705536 bytes (706 kB) copied, 0.0271789 s, 26.0 MB/s
705536
root <at> stats:~# cat /dev/zero |dd bs=1000 count=1000 |wc -c
1000000
1000+0 records in
1000+0 records out
1000000 bytes (1.0 MB) copied, 0.00879434 s, 114 MB/s
root <at> stats:~# cat /dev/zero |dd bs=1000 count=1000 |wc -c
972+28 records in
972+28 records out
993040 bytes (993 kB) copied, 0.00582009 s, 171 MB/s
993040
root <at> stats:~# cat /dev/zero |dd bs=1000 count=1000 |wc -c
983+17 records in
983+17 records out
996040 bytes (996 kB) copied, 0.0102457 s, 97.2 MB/s
996040
root <at> stats:~# cat /dev/zero |dd bs=1000 count=1000 |wc -c
1000000
1000+0 records in
1000+0 records out
1000000 bytes (1.0 MB) copied, 0.0181759 s, 55.0 MB/s
root <at> stats:~# cat /dev/zero |dd bs=1000 count=1000 |wc -c
1000+0 records in
1000+0 records out
1000000 bytes (1.0 MB) copied, 0.010386 s, 96.3 MB/s
1000000
root <at> stats:~#
Message #10 received at 8490-done <at> debbugs.gnu.org (full text, mbox):
From: Eric Blake <eblake <at> redhat.com>
To: Jesse Gordon <jesseg <at> nikola.com>
Cc: 8490-done <at> debbugs.gnu.org
Subject: Re: bug#8490: dd reads random number of records from pipes - named or otherwise - coreutils 8.9
Date: Wed, 13 Apr 2011 08:07:39 -0600
On 04/12/2011 03:02 PM, Jesse Gordon wrote:
> I can't believe such an obvious bug would exist this long, but on the
> other hand the test is so simple I can't see where it's user error.

Thanks for the report. And you are correct in surmising that it is user error and not a bug in dd.

> dd, when reading from stdin or from a named pipe sometimes (but not
> always) reads a random number of records a bit less than what it should.

Rather, dd reads as many bytes as possible, but unless that is less than PIPE_BUF, it is not guaranteed to be an atomic read. In turn, if you have asked dd to pad out partial reads into complete writes, then that explains your problem. Unfortunately, it is rather easy to do this without realizing it; the POSIX wording on how dd behaves is rather detailed.

> I tried it like /dev/zero|dd, like yes|dd, cat somefile|dd, and even
> mkfifo pipe; yes > pipe & dd if=pipe -- and all sometimes failed.
> However, if=arealfile seems to always work perfectly.
>
> To replicate:
>
> yes|dd bs=1000 count=1000|wc -c

There's your problem. bs=1000 is the key that tells dd to always write 1000 byte output blocks, even if the input block hit a short read, and stop after 1000 reads.

Instead, try:

yes|dd ibs=1000 obs=1000 count=1000|wc -c

which tells dd to explicitly read in input blocks of 1000 bytes, even if it requires multiple reads, prior to doing output blocks of 1000 bytes, and stop after 1000 writes.

> The problem exists on the following combinations:

It exists everywhere that dd complies with POSIX, even with non-GNU dd, because POSIX requires the difference in behavior between bs=nnn vs. ibs=nnn obs=nnn.

--
Eric Blake eblake <at> redhat.com +1-801-349-2682
Libvirt virtualization library http://libvirt.org
Message #11 received at 8490-done <at> debbugs.gnu.org (full text, mbox):
From: Pádraig Brady <P <at> draigBrady.com>
To: Jesse Gordon <jesseg <at> nikola.com>
Cc: 8490-done <at> debbugs.gnu.org
Subject: Re: bug#8490: dd reads random number of records from pipes - named or otherwise - coreutils 8.9
Date: Wed, 13 Apr 2011 15:09:35 +0100
On 12/04/11 22:02, Jesse Gordon wrote:
> I can't believe such an obvious bug would exist this long, but on the
> other hand the test is so simple I can't see where it's user error.
>
> dd, when reading from stdin or from a named pipe sometimes (but not
> always) reads a random number of records a bit less than what it should.
>
> I tried it like /dev/zero|dd, like yes|dd, cat somefile|dd, and even
> mkfifo pipe; yes > pipe & dd if=pipe -- and all sometimes failed.
> However, if=arealfile seems to always work perfectly.
>
> To replicate:
>
> yes|dd bs=1000 count=1000|wc -c

With the about-to-be-released coreutils, we now warn about this:

$ yes|dd bs=1000 count=1000|wc -c
dd: warning: partial read (960 bytes); suggest iflag=fullblock

We can't do that by default for backwards compat reasons.

cheers,
Pádraig.

p.s. I just noticed that this is not in NEWS. I'll fix that now...
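A minimal sketch of what the warning suggests, assuming GNU dd (iflag=fullblock is a GNU extension, not part of POSIX). The helper name full_copy_bytes is hypothetical and exists only for this illustration; it reruns the pipeline from the report with the flag added and counts the bytes that come out.

```shell
#!/bin/sh
# Assumption: GNU dd; iflag=fullblock is a GNU extension, not POSIX.
# full_copy_bytes is a hypothetical helper name used only in this sketch.
full_copy_bytes() {
    # $1 = block size in bytes, $2 = number of blocks
    yes | dd bs="$1" count="$2" iflag=fullblock 2>/dev/null | wc -c
}

# Same pipeline as in the report, plus iflag=fullblock.
full_copy_bytes 1000 1000
```

With the flag set, dd keeps calling read() until each 1000-byte input block is full, so this should print 1000000 on every run rather than a varying smaller number.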
Message #14 received at 8490 <at> debbugs.gnu.org (full text, mbox):
From: Bjartur Thorlacius <svartman95 <at> gmail.com>
To: Jesse Gordon <jesseg <at> nikola.com>
Cc: 8490 <at> debbugs.gnu.org
Subject: Re: bug#8490: dd reads random number of records from pipes - named or otherwise - coreutils 8.9
Date: Wed, 13 Apr 2011 14:39:41 +0000
Have you looked into fullblock? If you only specify bs and count (and not ibs or obs) dd may fill the buffer partially. It'll try to do count copies, but each copy may contain less data than expected. This sort of makes sense on HDDs or tapes with variable block sizes (where a read would return a whole block, but the block would be smaller than the user-specified bs). In this case dd will preserve the original block size. I've never encountered such an oddity. Or maybe I have, without noticing.

I think just about everyone who hasn't been involved in the development, in one way or another, gets this wrong (I don't quite get it yet). I think this should be changed, unless the user provides a hypothetical partblock option.
Message #17 received at submit <at> debbugs.gnu.org (full text, mbox):
From: John Reiser <jreiser <at> bitwagon.com>
To: bug-coreutils <at> gnu.org
Subject: Re: bug#8490: dd reads random number of records from pipes - named or otherwise - coreutils 8.9
Date: Wed, 13 Apr 2011 08:02:06 -0700
> dd, when reading from stdin or from a named pipe sometimes (but not always) reads a random number of records a bit less than what it should.
> yes|dd bs=1000 count=1000|wc -c
> cat /dev/zero |dd bs=1000 count=1000 |wc -c

dd does "read(fd,buf,bs)" for 'count' times, and writes whatever it gets each time. If the operating system does not deliver 'bs' bytes each time, then the total output will be less than bs*count bytes. Because neither /usr/bin/yes nor /dev/zero generates records with a blocksize of 1000 bytes (nor divisible by 1000 bytes), you are at the mercy of multiprocessing delays and pipe buffering. read() on a pipe waits only for non-empty, not necessarily for the size requested.
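The arithmetic behind this can be checked against the first run in the original report ("597+403 records in", 608232 bytes, bs=1000). The sketch below is only a worked calculation; partial_bytes is a hypothetical helper name, and the figures are copied from that transcript.

```shell
#!/bin/sh
# partial_bytes is a hypothetical helper for illustration only.
# $1 = total bytes copied, $2 = number of full records, $3 = block size
partial_bytes() {
    echo $(( $1 - $2 * $3 ))
}

# Bytes delivered by the 403 short reads in the first reported run.
partial_bytes 608232 597 1000

# Shortfall against the bs*count bytes the reporter expected.
echo $(( 1000 * 1000 - 608232 ))
```

The 597 full reads account for 597000 bytes, so the 403 short reads together supplied only 11232 bytes, which leaves the output 391768 bytes below bs*count.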
Message #20 received at 8490 <at> debbugs.gnu.org (full text, mbox):
From: Eric Blake <eblake <at> redhat.com>
To: Bjartur Thorlacius <svartman95 <at> gmail.com>
Cc: 8490 <at> debbugs.gnu.org, Jesse Gordon <jesseg <at> nikola.com>
Subject: Re: bug#8490: dd reads random number of records from pipes - named or otherwise - coreutils 8.9
Date: Wed, 13 Apr 2011 09:13:59 -0600
On 04/13/2011 08:39 AM, Bjartur Thorlacius wrote:
> Have you looked into fullblock? If you only specify bs and count (and
> not ibs or obs) dd may fill the buffer partially. It'll try to do
> count copies, but each copy may contain less data than expected. This
> sort of makes sense on HDDs or tapes with variable block sizes (where
> a read would return a whole block, but the block would be smaller than
> the user-specified bs). In this case dd will preserve the original block
> size. I've never encountered such an oddity. Or maybe I have,
> without noticing.
>
> I think just about everyone who hasn't been involved in the development,
> in one way or another, gets this wrong (I don't quite get it yet). I
> think this should be changed, unless the user provides a hypothetical
> partblock option.

It can't be changed without changing POSIX.

--
Eric Blake eblake <at> redhat.com +1-801-349-2682
Libvirt virtualization library http://libvirt.org
Message #21 received at 8490-done <at> debbugs.gnu.org (full text, mbox):
From: Jesse Gordon <jesseg <at> nikola.com>
To: Eric Blake <eblake <at> redhat.com>
Cc: 8490-done <at> debbugs.gnu.org
Subject: Re: bug#8490: dd reads random number of records from pipes - named or otherwise - coreutils 8.9
Date: Wed, 13 Apr 2011 11:28:02 -0700
On 4/13/2011 7:07 AM, Eric Blake wrote:
> On 04/12/2011 03:02 PM, Jesse Gordon wrote:
>> I can't believe such an obvious bug would exist this long, but on the
>> other hand the test is so simple I can't see where it's user error.
> Thanks for the report. And you are correct in surmising that it is user
> error and not a bug in dd.
>
>> dd, when reading from stdin or from a named pipe sometimes (but not
>> always) reads a random number of records a bit less than what it should.
> Rather, dd reads as many bytes as possible, but unless that is less than
> PIPE_BUF, it is not guaranteed to be an atomic read. In turn, if you
> have asked dd to pad out partial reads into complete writes, then that
> explains your problem. Unfortunately, it is rather easy to do this
> without realizing it; the POSIX wording on how dd behaves is rather
> detailed.
>

How have I asked dd to pad out partial read? I'm not specifying pad or sync or anything.

And why is reading from a pipe a partial read when there is neither EOF nor error?

Behold:

    ifp=stdin;
    readbytes=fread(buffer, 1, ibs, ifp);

That reads in ibs bytes quite nicely from a pipe. It waits for all the data to fill into buffer, and only bails for the legitimate reasons -- like EOF, or some real error.

If POSIX really requires dd to abort a read for any reason other than EOF or an error, then I'm dumbfounded. To me, it seems obvious that the rule should be "When asked to read from a pipe, don't quit till it's done or becomes impossible."

dd doesn't seem to abort early when reading hard drives, even if the block size isn't the same as the hard drive IO read size, or even if it has to wait a few ms for the drive to seek.

I guess it's a good thing that POSIX doesn't say "dd must perform a segfault every 10th run on a full moon" ha ha :=)

It sure looks like poorly implemented non-blocking IO to me.

However, iflag=fullblock seems to fix it.
>> I tried it like /dev/zero|dd, like yes|dd, cat somefile|dd, and even
>> mkfifo pipe; yes > pipe & dd if=pipe -- and all sometimes failed.
>> However, if=arealfile seems to always work perfectly.
>>
>> To replicate:
>>
>> yes|dd bs=1000 count=1000|wc -c
> There's your problem. bs=1000 is the key that tells dd to always write
> 1000 byte output blocks, even if the input block hit a short read, and
> stop after 1000 reads.
>
> Instead, try:
>
> yes|dd ibs=1000 obs=1000 count=1000|wc -c
>

I tried this but it still stops early. I copied and pasted exactly your command:

root <at> stats:~# yes|dd ibs=1000 obs=1000 count=1000|wc -c
564+436 records in
574+1 records out
574464 bytes (574 kB) copied, 0.0180807 s, 31.8 MB/s
574464

> which tells dd to explicitly read in input blocks of 1000 bytes, even if
> it requires multiple reads, prior to doing output blocks of 1000 bytes,
> and stop after 1000 writes.
>
>> The problem exists on the following combinations:
> It exists everywhere that dd complies with POSIX, even with non-GNU dd.
> Because POSIX requires the difference in behavior between bs=nnn vs.
> ibs=nnn obs=nnn.
>

I still cannot fathom why it would ever be acceptable to abort early when there's no error and no EOF and the pipe is still sending data. That just goes against all good programming common sense as far as my tiny brain cell can tell.

Is there actually EVER a real reason for dd to need to abort a read when there's no EOF and no error? Why would POSIX require this?

~~~~~

Okay, I poked around in the source code a bit. Now I'll tell you I'm not a great programmer, so don't be afraid to say "Jesse, here's where you're wrong..."

However, I added in some fprintf(stderr, statements in various places to help me understand where and why dd is bailing.
I put an fprintf() at each break inside of the while(1){} main loop, and it's breaking at the first one:

    while (1)
      {
        if ((r_partial + r_full) >= max_records)
          {
            /* The counters are uintmax_t, so they are printed with %ju. */
            static uintmax_t temp;
            temp = r_partial + r_full;
            fprintf (stderr, "break 001: ");
            fprintf (stderr, "r_partial=%ju ", r_partial);
            fprintf (stderr, "r_full=%ju ", r_full);
            fprintf (stderr, "max_records=%ju ", max_records);
            fprintf (stderr, "temp=%ju\n", temp);
            break;
          }

And when I run it, I get:

root <at> stats:/big/src/coreutils-8.9# yes|src/dd bs=1000 count=1000|wc -c
Setting max_records=1000 due to count=1000 being specified.
break 001: r_partial=394 r_full=606 max_records=1000 temp=1000
606+394 records in
606+394 records out
618304 bytes (618 kB) copied, 0.0757696 s, 8.2 MB/s
618304

Okay, so that makes sense enough: If it's read count records (whole or partially) then it's done reading because it's satisfied count=.

So the question is why is it reading partial records? r_partial gets incremented whenever iread() returns less than input_blocksize.

So why should iread() return less than ibs from a stream when simply trying again should finish the read?

    static ssize_t
    iread (int fd, char *buf, size_t size)
    {
      while (true)
        {
          ssize_t nread;
          process_signals ();
          nread = read (fd, buf, size);
          if (! (nread < 0 && errno == EINTR))
            return nread;
        }
    }

As you can see, it is supposed to keep trying until it gets some data. It does block if there is no data to read yet. The problem is when it happens to retry when some but not all of the data has arrived at the pipe.

I added in more fprintf()s and in my case, read() is always setting errno to 29=ESPIPE (something about an error seeking on a pipe.)
In trying to understand the inverted logic of the return code, I'm reordering it here:

    if (nread < 0 && errno == EINTR)
      {
        /* read() returned -1 (ERROR) and that error is
           "Interrupted by Signal": keep trying */
      }
    else
      {
        /* We read 0 or more bytes, OR there was an error OTHER
           than EINTR (Interrupted by Signal) */
        return nread;
      }

So why don't we want to return on EINTR? Or am I all confused?

~~~~

Anyway, of course just below iread() is iread_fullblock(), which it appears does exactly what it should for reading from pipes.

Maybe iread_fullblock() should always be used for reading from pipes.

Oh well. I really have a hard time believing that POSIX requires dd to abort a pipe read just because the data wasn't ready quick enough.

What use of dd would actually be broken if iread() didn't abort for ESPIPE when reading a pipe?

It should be as simple as making it keep reading and appending if errno==ESPIPE and read() returned other than 0.

Can someone point me to where POSIX requires this current behavior of dd? Or am I just hopelessly confused?

Is the requirement actually specific to reading from pipes? Or is reading from pipes broken due to trying to satisfy some requirement relating to the reading of block devices or whatever?

But at least I have a workaround -- iflag=fullblock.

Thank you all very much,
Jesse
Message #22 received at 8490-done <at> debbugs.gnu.org (full text, mbox):
From: Eric Blake <eblake <at> redhat.com>
To: Jesse Gordon <jesseg <at> nikola.com>
Cc: 8490-done <at> debbugs.gnu.org
Subject: Re: bug#8490: dd reads random number of records from pipes - named or otherwise - coreutils 8.9
Date: Wed, 13 Apr 2011 13:12:35 -0600
On 04/13/2011 12:28 PM, Jesse Gordon wrote:
> On 4/13/2011 7:07 AM, Eric Blake wrote:
>> On 04/12/2011 03:02 PM, Jesse Gordon wrote:
>>> I can't believe such an obvious bug would exist this long, but on the
>>> other hand the test is so simple I can't see where it's user error.
>> Thanks for the report. And you are correct in surmising that it is user
>> error and not a bug in dd.
>>
>>> dd, when reading from stdin or from a named pipe sometimes (but not
>>> always) reads a random number of records a bit less than what it should.
>> Rather, dd reads as many bytes as possible, but unless that is less than
>> PIPE_BUF, it is not guaranteed to be an atomic read. In turn, if you
>> have asked dd to pad out partial reads into complete writes, then that
>> explains your problem. Unfortunately, it is rather easy to do this
>> without realizing it; the POSIX wording on how dd behaves is rather
>> detailed.
>>
> How have I asked dd to pad out partial read? I'm not specifying pad or
> sync or anything.

Sorry, I assumed there was a conv=sync in the mix; without that, there is no padding (a partial read becomes a complete write with no padding).

> And why is reading from a pipe a partial read when there is neither EOF
> nor error?

Because the writer (yes) is getting ahead of the reader (dd), and is not writing in the same block size as the reader. For the sake of argument, let's suppose that yes uses stdio, which buffers to 4096 bytes before it calls write(). Then the kernel swaps over to dd, which does four reads of 1000 bytes each, then another read() which only has 96 bytes available immediately without swapping back to yes. So the kernel gives dd a short read.

> That reads in ibs bytes quite nicely from a pipe. It waits for all the
> data to fill into buffer, and only bails for the legitimate reasons --
> like EOF, or some real error.

Short reads are not an error, but are a real phenomenon when reading from pipes.
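The buffering scenario described above comes down to a division with remainder. The sketch below only reproduces that arithmetic; split_reads is a hypothetical helper name, and the 4096-byte stdio buffer is the figure assumed "for the sake of argument" in the message, not a measured property of yes.

```shell
#!/bin/sh
# split_reads is a hypothetical helper for illustration only.
# $1 = bytes the writer put into the pipe in one write()
# $2 = block size dd asks for on each read()
split_reads() {
    full=$(( $1 / $2 ))
    rest=$(( $1 % $2 ))
    echo "full=$full short=$rest"
}

# One 4096-byte write consumed by 1000-byte reads.
split_reads 4096 1000
```

This prints full=4 short=96: four complete 1000-byte reads, then a fifth read that returns only the 96 bytes still in the pipe.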
> If POSIX really requires dd to abort a read for any reason other than
> EOF or an error, then I'm dumbfounded. To me, it seems obvious that the
> rule should be "When asked to read from a pipe, don't quit till it's
> done or becomes impossible."

POSIX expects the following (and note carefully that bs=nnn is MUCH different than ibs=nnn obs=nnn):

http://pubs.opengroup.org/onlinepubs/9699919799/utilities/dd.html

    If the bs=expr operand is specified and no conversions other than
    sync, noerror, or notrunc are requested, the data returned from each
    input block shall be written as a separate output block; if the read
    returns less than a full block and the sync conversion is not
    specified, the resulting output block shall be the same size as the
    input block. If the bs=expr operand is not specified, or a conversion
    other than sync, noerror, or notrunc is requested, the input shall be
    processed and collected into full-sized output blocks until the end
    of the input is reached.

> dd doesn't seem to abort early when reading hard drives, even if the
> block size isn't the same as the hard drive IO read size, or even if it
> has to wait a few ms for the drive to seek.

That's because hard drives, being physical devices, have all of their data handy at once. The kernel doesn't have to task swap over to another process to get more bytes, but can proceed with the full read request right up until EOF.

> However, iflag=fullblock seems to fix it.

That's one fix. But it's GNU-specific. If you want the POSIX-compliant fix, then use ibs/obs instead of bs.

> I still cannot fathom why it would ever be acceptable to abort early
> when there's no error and no EOF and the pipe is still sending data.
> Is there actually EVER a real reason for dd to need to abort a read when
> there's no EOF and no error? Why would POSIX require this?

It's called 40 years of history.
That was the original way dd was written, back when the default medium was _not_ disks, but tapes, and tapes had variable size blocks. It made sense for the default back then. And changing it now _WILL_ break existing scripts that have come to rely on the standardized behavior, even if the standardized behavior makes no sense if dd were being developed from scratch today.

> So the question is why is it reading partial records?

Because that's the way pipes behave.

> I really have a hard time believing that POSIX requires dd to abort a
> pipe read just because the data wasn't ready quick enough.

It is NOT aborting a pipe read, it is doing exactly what you told it, and writing as soon as read returns, even if read() had a short read value, because you specified bs.

> Can someone point me to where POSIX requires this current behavior of
> dd?

I just did.

--
Eric Blake eblake <at> redhat.com +1-801-349-2682
Libvirt virtualization library http://libvirt.org
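The POSIX wording about collecting input "into full-sized output blocks" can be checked against Jesse's ibs=1000 obs=1000 run earlier in the thread (564+436 records in, 574464 bytes). This is a sketch of the bookkeeping only, not of dd's implementation; records_out is a hypothetical helper name.

```shell
#!/bin/sh
# records_out is a hypothetical helper for illustration only.
# $1 = total bytes read, $2 = obs (output block size)
records_out() {
    full=$(( $1 / $2 ))
    if [ $(( $1 % $2 )) -gt 0 ]; then
        partial=1
    else
        partial=0
    fi
    echo "${full}+${partial}"
}

# Bytes from the ibs=1000 obs=1000 transcript.
records_out 574464 1000
```

It prints 574+1, matching the "574+1 records out" line in that transcript: the output side is re-blocked to 1000 bytes, while count=1000 still stopped the input side after 1000 reads, 436 of them short.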
Message #23 received at 8490-done <at> debbugs.gnu.org (full text, mbox):
From: Jesse Gordon <jesseg <at> nikola.com>
To: Eric Blake <eblake <at> redhat.com>
Cc: 8490-done <at> debbugs.gnu.org
Subject: Re: bug#8490: dd reads random number of records from pipes - named or otherwise - coreutils 8.9
Date: Wed, 13 Apr 2011 16:36:35 -0700
On 4/13/2011 12:12 PM, Eric Blake wrote:
> Short reads are not an error, but are a real phenomenon when reading
> from pipes.
>

I agree - short reads from pipes are real. But I don't see why they should ever need to cause dd to skip data from the pipe. (Mind you, I'm talking only about reading from pipes here!)

>> If POSIX really requires dd to abort a read for any reason other than
>> EOF or an error, then I'm dumbfounded. To me, it seems obvious that the
>> rule should be "When asked to read from a pipe, don't quit till it's
>> done or becomes impossible."
> POSIX expects the following (and note carefully that bs=nnn is MUCH
> different than ibs=nnn obs=nnn):
>
> http://pubs.opengroup.org/onlinepubs/9699919799/utilities/dd.html
>
> If the bs=expr operand is specified and no conversions other than sync,
> noerror, or notrunc are requested, the data returned from each input
> block shall be written as a separate output block; if the read returns
> less than a full block and the sync conversion is not specified, the
> resulting output block shall be the same size as the input block. If the
> bs=expr operand is not specified, or a conversion other than sync,
> noerror, or notrunc is requested, the input shall be processed and
> collected into full-sized output blocks until the end of the input is
> reached.
>
>> dd doesn't seem to abort early when reading hard drives, even if the
>> block size isn't the same as the hard drive IO read size, or even if it
>> has to wait a few ms for the drive to seek.
> That's because hard drives, being physical devices, have all of their
> data handy at once.

Is that really true? Sure they are block devices - but on the nanosecond scale, do they really have all of their data handy?

In fact I was doing some tests the other day and I found that dd can read from /dev/zero about 20 times faster than it can /dev/hda --- so I know that dd has to wait for /dev/hda to get its data.
Furthermore, my /dev/hda almost certainly has a block size other than 1000 (which is the block size I set for dd for my test).

So for example:

dd bs=1000 count=1000 of=/dev/null if=/dev/hda

And let's say my disk driver transfers 2048 bytes at a time: The first two reads will be fine, but the third read will only get 48 bytes. *boom*

But, it doesn't go boom. What gives? Doesn't the kernel have to task swap to the disk driver so it can queue up some more bytes, leaving read() with a short read?

And even if my ibs was the same as my drive's driver, I just happen to know that dd can read waaaay faster than any of my drives can send data. Obviously, dd has to wait for the data to become available.

> The kernel doesn't have to task swap over to
> another process to get more bytes, but can proceed with the full read
> request right up until EOF.
>
>> However, iflag=fullblock seems to fix it.
> That's one fix. But it's GNU-specific. If you want the POSIX-compliant
> fix, then use ibs/obs instead of bs.
>

Setting instead ibs and obs does _NOT_ fix. Try it yourself! :-)

yes|dd ibs=1000 obs=1000 count=1000| wc -c
694+306 records in
703+1 records out
703488 bytes (703 kB) copied, 0.0220068 s, 32.0 MB/s
703488

>> I still cannot fathom why it would ever be acceptable to abort early
>> when there's no error and no EOF and the pipe is still sending data.
>> Is there actually EVER a real reason for dd to need to abort a read when
>> there's no EOF and no error? Why would POSIX require this?
> It's called 40 years of history. That was the original way dd was
> written, back when the default medium was _not_ disks, but tapes, and
> tapes had variable size blocks. It made sense for the default back
> then.
> And changing it now _WILL_ break existing scripts that have come
> to rely on the standardized behavior,...

I can't think of a single scenario where any script would rely on dd dropping bytes from the input pipe. Can anyone else?
I mean, seriously, it goes on reading some of the bytes and dropping others and then finishes up like everything's normal. Remember, I'm talking about pipes here, nothing else.

> .... even if the standardized behavior
> makes no sense if dd were being developed from scratch today.
>
>> So the question is why is it reading partial records?
> Because that's the way pipes behave.
>
>> I really have a hard time believing that POSIX requires dd to abort a
>> pipe read just because the data wasn't ready quick enough.
> It is NOT aborting a pipe read, it is doing exactly what you told it,
> and writing as soon as read returns, even if read() had a short read
> value, because you specified bs.
>
>> Can someone point me to where POSIX requires this current behavior of
>> dd?
> I just did.
>

Thank you very much! I now see the real problem: The POSIX document is not aware of pipes. It states that certain things should be certain ways and that if read() gets a short read, it should count it as a partial record and write it as such.

And the dd authors have just obediently followed the letter of POSIX, not realizing it's for a context that does not include pipes.

The problem is that while dd's behavior for short reads is perfect for random access files and devices, it's lousy for pipes.

With a file or a disk device, a short read only happens for a reason - like EOF or data unreadable or driver timeout or device disconnect perhaps.

And maybe even the tape drivers give a short read as a way of signaling the end of a variable-length block. That really makes sense. That way you could tell it to read x number of blocks off of the tape, and it would read x number of blocks, ignoring the fact that they were different sizes.

But pipes do not have variable length blocks, and a short read does not signal anything to worry about or deal with: It just means it was the end of the block and another will be coming right after it.
So I think it is clear that the POSIX lack of comment on pipes combined with the absurdity of the current behavior with pipes is due to POSIX being followed to the letter without the realization that it's being applied in a context for which it was not written.

The problem with this is that a short read on a pipe has an entirely different meaning than a short read on most other things -- and treating them the same is what causes this strangeness.

(And there may be a bug in dd, since setting ibs and obs instead of bs still causes partial reads on a pipe.)

Do I seem confused? Or would a normal thinking person arrive at the same place as I have? Am I missing something?

Thank you very much,
Jesse
Message #24 received at 8490-done <at> debbugs.gnu.org (full text, mbox):
From: Eric Blake <eblake <at> redhat.com>
To: Jesse Gordon <jesseg <at> nikola.com>
Cc: 8490-done <at> debbugs.gnu.org
Subject: Re: bug#8490: dd reads random number of records from pipes - named or otherwise - coreutils 8.9
Date: Thu, 14 Apr 2011 08:34:31 -0600
On 04/13/2011 05:36 PM, Jesse Gordon wrote:
>> Short reads are not an error, but are a real phenomenon when reading
>> from pipes.
>>
> I agree - short reads from pipes are real. But I don't see why they
> should ever need to cause dd to skip data from the pipe.

dd is not skipping data. It is stopping after exactly count=1000 reads, just like you asked; so it is ending shy of the amount of data you thought you were getting.

>>> dd doesn't seem to abort early when reading hard drives, even if the
>>> block size isn't the same as the hard drive IO read size, or even if it
>>> has to wait a few ms for the drive to seek.
>> That's because hard drives, being physical devices, have all of their
>> data handy at once.
> Is that really true?

From the application's perspective - yes. With pipes, the kernel has to schedule another process to run, and that other process can take an indefinite amount of time producing data. Since the kernel has no control over when other processes will actually produce that data, a short read is the only viable answer that doesn't deadlock the system.

But with files, even if the kernel has to swap out in order to continue reading from the disk, it has complete control over the data and does not have to wait on any external processes, so the kernel has no arbitrary waits - it may have a finite (and even long) wait while spinning to the next sector for the next portion of data, but the wait is independent of all other system activity, and therefore the kernel can afford to avoid short reads when reading from devices since there is no way to deadlock the system while getting the rest of the data.

>>> However, iflag=fullblock seems to fix it.
>> That's one fix. But it's GNU-specific. If you want the POSIX-compliant
>> fix, then use ibs/obs instead of bs.
>>
> Setting instead ibs and obs does _NOT_ fix. Try it yourself! :-)

I stand corrected.
On re-reading POSIX, I indeed concur that there is no way using _just_ POSIX options to require that a particular amount of input data be read, regardless of short reads; you _have_ to use the GNU extension of iflag=fullblock.

> I can't think of a single scenario where any script would rely on dd
> dropping bytes from the input pipe.
> Can anyone else?

It doesn't matter what you think; the problem is that you can't change existing behavior. You can add new commands that give new behavior, but there are 40 years' worth of scripts that rely on existing behavior, so even if _you_ can't think of someone that wants short reads from a pipe, someone else may already be wanting it, and even relying on it, because it is standardized that way.

Maybe the best thing to do is work on having POSIX standardize the GNU extension of iflag=fullblock.

>
> I now see the real problem: The POSIX document is not aware of pipes. It
> states that certain things should be certain ways and that if read()
> gets a short read, it should count it as a partial record and write it
> as such.
>
> And the dd authors have just obediently followed the letter of the POSIX
> not realizing it's for a context that does not include pipes.

POSIX is very much aware of pipes. POSIX was written long after dd was written, and standardized existing practice. It's not POSIX that got it wrong, but the original dd implementors. But they got it wrong so long ago that people have come to rely on that behavior, and the only way to get new behavior is to mandate a new option.

--
Eric Blake eblake <at> redhat.com +1-801-349-2682
Libvirt virtualization library http://libvirt.org
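The difference between counting reads and counting full blocks can be seen by running the command from the original report with and without the GNU extension. This is a sketch that assumes GNU coreutils dd, where the operand is spelled `iflag=fullblock`; the variable names `plain` and `full` are only illustrative.

```shell
# Form from the bug report: count=1000 means "1000 read() calls", so every
# short read lowers the total. The result can differ from run to run.
plain=$(yes | dd bs=1000 count=1000 2>/dev/null | wc -c | tr -d ' ')

# GNU extension: iflag=fullblock makes dd keep reading until each 1000-byte
# input block is full (or EOF), so count=1000 means 1000 full blocks.
full=$(yes | dd bs=1000 count=1000 iflag=fullblock 2>/dev/null | wc -c | tr -d ' ')

echo "without fullblock: $plain bytes"
echo "with fullblock:    $full bytes"
```

With fullblock the total should be 1000000 bytes on every run, while the plain form may report anything up to that figure. Because the operand is not in POSIX, a script that must run on other dd implementations cannot depend on it.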
Message #25 received at 8490-done <at> debbugs.gnu.org (full text, mbox):
From: Jesse Gordon <jesseg <at> nikola.com>
To: Eric Blake <eblake <at> redhat.com>
Cc: 8490-done <at> debbugs.gnu.org
Subject: Re: bug#8490: dd reads random number of records from pipes - named or otherwise - coreutils 8.9
Date: Thu, 14 Apr 2011 13:53:20 -0700
On 4/14/2011 7:34 AM, Eric Blake wrote:
> On 04/13/2011 05:36 PM, Jesse Gordon wrote:
>
>> I can't think of a single scenario where any script would rely on dd
>> dropping bytes from the input pipe.
>> Can anyone else?
> It doesn't matter what you think; the problem is that you can't change
> existing behavior.

Is that really true? What if it was a buffer overrun? What if it was a bug where it segfaulted if a certain sequence of bytes was copied? Of course those can be fixed... What if it was a limitation to 2G bytes read? My point is that the existing be

> You can add new commands that give new behavior, but
> there are 40 years worth of scripts that rely on existing behavior,

I'll admit that as a scientist and engineer, it bothers me when an intelligent person makes a broad sweeping claim without even stopping to think whether they can cite a single example - hypothetical or real. Making claims is easy and proves nothing - and may not be of much benefit.

Can you even imagine a hypothetical situation where someone would depend on short reads from a pipe? Maybe in a random number generator?

I can cite lots of examples where some poor sysadmin wants to just get his job done and needs dd to read all data from the input pipe.... :-)

> so
> even if _you_ can't think of someone that wants short reads from a pipe,
> someone else may already be wanting it, and even relying on it, because
> it is standardized that way.
>

But what if short pipe reads cause reading from pipes to be useless?

> Maybe the best thing to do is work on having POSIX standardize the GNU
> extension of iflag=fullblock.
>

Basically, because the user's desired block size may not coincide with PIPE_MAX (or whatever), there is never a guarantee that dd will always read the whole input data from a pipe (using POSIX options).
But, look at POSIX: It says under INPUT FILES that "The input file can be any file type" -- if we take that to include blocks and pipes, then we can see that dd is not compliant because it cannot reliably read from a pipe.

Fact: Here's a prime example: Let's say I happen to want to read exactly 999983 blocks of 2099 bytes each from a pipe. With purely POSIX dd, it's impossible. Doesn't that violate POSIX?

Doesn't a program's ability to function as described in POSIX preempt technical details about short reads from read()?

And if POSIX really intends for pipe as input file to be not guaranteed usable, then it ought to mention something about that, but it doesn't.

Any thinking observer would, I believe, have to conclude that dd is supposed to work perfectly to read 999983 blocks of 2099 bytes each from a pipe.

So I asked myself how we might have this apparent contradiction, and I looked at POSIX for read() -- http://pubs.opengroup.org/onlinepubs/9699919799/functions/read.html

It turns out that (unless specified otherwise) read() is supposed to block when reading from a pipe until it has "some data."

Unfortunately, POSIX for read() isn't specific about whether read() from a pipe must give a short read or wait for a full buffer. In other words, it looks like POSIX allows read() to either give a short read on a pipe (like it does now) or to read the requested number of bytes from the pipe before returning. (Stopping only, of course, for EOF/broken/etc.)

Do you really think that the POSIX folks meant for dd to be unusable for reading from pipes?

Just think of the poor average sysadmin who reads over the man page for dd and says "Oh, this'll work" but doesn't realize that there's such a thing as PIPE_MAX hidden deep down, and that his dd command is going to fail miserably, _losing important data_.

How can we say that it was user error? Does he really have to be a kernel programmer to be able to use dd correctly? Is that really what POSIX means?
Clearly, it's an oversight in the document one way or another. POSIX should either say "dd is not required to read correctly from pipes, and must warn or refuse to read from a pipe" or else be changed to specify that FULLBLOCK should be the default for pipes -- one way or the other.

From a practical standpoint, the most disgusting kind of design flaw in a copy program is for it to partially copy the data and not give specific warning to that effect. It's like a backup program that doesn't warn you if it fails to complete, or a download program that doesn't warn you if it fails to finish.

For POSIX to indicate that dd can read from pipes, while not mentioning that it doesn't work under many common circumstances, is not sane.

I notice that the POSIX documents we're reading are copyright through 2008 -- is there a newer one out now?

Anyway, I guess my next step is to try to file a complaint with IEEE to get this contradiction fixed.

Do you have any suggestions for getting in touch with IEEE?

Thank you very much,
Jesse
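On the question of a warning for partial copies: dd does report short reads, though only in its status lines on standard error, where the "records in" line has the form FULL+PARTIAL. The following sketch, which is not from the thread, parses that line so a script can detect the situation itself. The variable names `stats` and `partial` are illustrative, and LC_ALL=C is set on the assumption that the status text might otherwise be translated.

```shell
# Discard the copied data and capture only dd's stderr (its status lines).
stats=$(yes | LC_ALL=C dd bs=1000 count=1000 2>&1 >/dev/null)

# Pull the partial-record count out of the "FULL+PARTIAL records in" line.
partial=$(printf '%s\n' "$stats" |
    sed -n 's/^[0-9][0-9]*+\([0-9][0-9]*\) records in$/\1/p')

if [ "${partial:-0}" -gt 0 ]; then
    echo "warning: $partial short read(s), less than bs*count bytes copied" >&2
else
    echo "no short reads on this run"
fi
```

A nonzero second number means at least one read() returned less than bs bytes, so with count= in effect the total copied is smaller than bs times count.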
Message #26 received at 8490-done <at> debbugs.gnu.org (full text, mbox):
From: Eric Blake <eblake <at> redhat.com>
To: Jesse Gordon <jesseg <at> nikola.com>
Cc: 8490-done <at> debbugs.gnu.org
Subject: Re: bug#8490: dd reads random number of records from pipes - named or otherwise - coreutils 8.9
Date: Thu, 14 Apr 2011 15:45:51 -0600
On 04/14/2011 02:53 PM, Jesse Gordon wrote:
> But, look at POSIX: It says under INPUT FILES that "The input file can
> be any file type" -- if we take that to include blocks and pipes, then
> we can see that dd is not compliant because it cannot reliably read from
> a pipe.

dd _is_ reliably reading from a pipe. The problem is that POSIX has no option like iflag=fullblock to consolidate short reads into a single buffer before doing output, so the reliable read ends shy of the amount of data that you were hoping for. The solution to this is not to rant on this list, but to propose an enhancement to POSIX to standardize iflag=fullblock.

> Fact: Here's a prime example: Let's say I happen to want to read exactly
> 999983 blocks of 2099 bytes each from a pipe. With purely POSIX DD, it's
> impossible. Doesn't that violate POSIX?

No. Just because POSIX doesn't have the command line options _in dd alone_ to do what you want doesn't mean that dd is broken (just that the POSIX standard is not as useful as it could be, by the fact that you have to rely on extensions to the standard), nor does it mean that you can't do what you want with other POSIX tools.

Remember, the problem is not that you can't read from pipes, but that when you couple the count=nnn with short reads you don't get full data. So the solution, using only POSIX, is to get rid of count=nnn, and isolate the input counting from the output blocking. Unfortunately, POSIX also states that 'head -c' is not portable, but that other utilities in the standard provide the same functionality, without stating what those other utilities are (I'm guessing that whoever wrote that sentence was referring to dd).
If you don't care about efficiency, then you can avoid the problem by avoiding short reads (read from a blocking pipe will always return at least one byte, since returning 0 implies EOF):

    yes | dd ibs=1 count=$((999983*2099)) | dd obs=1000 | wc -c

proves that you can read exactly 999983*2099 bytes, with only one partial write in the second dd.

> Doesn't a program's ability to function as described in POSIX preempt
> technical details about short reads from read()?

Coreutils' dd _does_ function as described in POSIX.

>
> And if POSIX really intends for pipe as input file to be not guaranteed
> usable, then it ought to mention something about that, but it doesn't.

Raise that as a bug against POSIX, then: http://austingroupbugs.net/main_page.php

> Any thinking observer would, I believe, have to conclude that dd is
> supposed to work perfectly to read 999983 blocks of 2099 bytes each from
> a pipe.

There's nothing in POSIX that requires that, unfortunately.

> Do you really think that the POSIX folks meant for dd to be unusable for
> reading from pipes?

POSIX merely intended for the default behavior of dd to match its historical behavior, which really was unusable for reading from pipes as historically implemented. That said, there is nothing wrong with proposing that POSIX add a new option to make dd more useful when reading from pipes.

> I notice that the POSIX documents we're reading are copyright through
> 2008 -- is there a newer one out now?

2008 is the newest version. Technical Corrigendum 1 to that revision will probably come out later this year. Membership in the Austin Group (the committee that maintains POSIX) is free to anyone who wants to join.

> Do you have any suggestions for getting in touch with IEEE?

Join the Austin Group - the IEEE generally blanket-approves whatever the Austin Group says when it comes to POSIX.

--
Eric Blake eblake <at> redhat.com +1-801-349-2682
Libvirt virtualization library http://libvirt.org
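The two-stage pipeline from this message can be tried at a smaller scale. The sketch below keeps the 2099-byte record size from the thread but uses only 7 records, an arbitrary illustrative count chosen so that the byte-at-a-time read finishes quickly; the variable names are likewise illustrative.

```shell
blocks=7                    # illustrative; the thread's example uses 999983
size=2099
want=$((blocks * size))

# First dd: with ibs=1 every read() asks for one byte, which cannot come back
# short, so count= is an exact byte count. Second dd: has no count=, it only
# regroups the stream into 1000-byte output blocks.
got=$(yes | dd ibs=1 count="$want" 2>/dev/null |
      dd obs=1000 2>/dev/null | wc -c | tr -d ' ')

echo "wanted $want bytes, got $got bytes"
```

The cost of this approach is one read() call per input byte, which is the inefficiency the message warns about. It only becomes noticeable at sizes like the roughly 2 GB of the original example.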
Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 13 May 2011 11:24:04 GMT)