Package: coreutils;
Reported by: Jesse Gordon <jesseg <at> nikola.com>
Date: Wed, 13 Apr 2011 02:46:02 UTC
Severity: normal
Done: Eric Blake <eblake <at> redhat.com>
Bug is archived. No further changes may be made.
Message #23 received at 8490-done <at> debbugs.gnu.org:
From: Jesse Gordon <jesseg <at> nikola.com>
To: Eric Blake <eblake <at> redhat.com>
Cc: 8490-done <at> debbugs.gnu.org
Subject: Re: bug#8490: dd reads random number of records from pipes - named or otherwise - coreutils 8.9
Date: Wed, 13 Apr 2011 16:36:35 -0700

On 4/13/2011 12:12 PM, Eric Blake wrote:
> On 04/13/2011 12:28 PM, Jesse Gordon wrote:
>>
>> On 4/13/2011 7:07 AM, Eric Blake wrote:
>>> On 04/12/2011 03:02 PM, Jesse Gordon wrote:
>>>> I can't believe such an obvious bug would exist this long, but on the
>>>> other hand the test is so simple I can't see where it's user error.
>>> Thanks for the report. And you are correct in surmising that it is user
>>> error and not a bug in dd.
>>>
>>>> dd, when reading from stdin or from a named pipe sometimes (but not
>>>> always) reads a random number of records a bit less than what it should.
>>> Rather, dd reads as many bytes as possible, but unless that is less than
>>> PIPE_MAX, it is not guaranteed to be an atomic read. In turn, if you
>>> have asked dd to pad out partial reads into complete writes, then that
>>> explains your problem. Unfortunately, it is rather easy to do this
>>> without realizing it; the POSIX wording on how dd behaves is rather
>>> detailed.
>>>
>> How have I asked dd to pad out partial read? I'm not specifying pad or
>> sync or anything.
> Sorry, I assumed there was a conv=sync in the mix; without that, there
> is no padding (a partial read becomes a complete write with no padding).
>
>> And why is reading from a pipe a partial read when there is neither EOF
>> nor error?
> Because the writer (yes) is getting ahead of the reader (dd), and is not
> writing in the same block size as the reader. For the sake of argument,
> let's suppose that yes uses stdio, which buffers to 4096 bytes before it
> calls write(). Then the kernel swaps over to dd, which does four reads
> of 1000 bytes each, then another read() which only has 96 bytes
> available immediately without swapping back to yes. So the kernel gives
> dd a short read.
>
>> That reads in ibs bytes quite nicely from a pipe. It waits for all the
>> data to fill into buffer, and only bails for the legitimate reasons --
>> like EOF, or some real error.
> Short reads are not an error, but are a real phenomenon when reading
> from pipes.
>

I agree - short reads from pipes are real. But I don't see why they
should ever need to cause dd to skip data from the pipe. (Mind you, I'm
talking only about reading from pipes here!)

>> If POSIX really requires dd to abort a read for any reason other than
>> EOF or an error, then I'm dumbfounded. To me, it seems obvious that the
>> rule should be "When asked to read from a pipe, don't quit till it's
>> done or becomes impossible."
> POSIX expects the following (and note carefully that bs=nnn is MUCH
> different than ibs=nnn obs=nnn):
>
> http://pubs.opengroup.org/onlinepubs/9699919799/utilities/dd.html
>
> If the bs= expr operand is specified and no conversions other than sync,
> noerror, or notrunc are requested, the data returned from each input
> block shall be written as a separate output block; if the read returns
> less than a full block and the sync conversion is not specified, the
> resulting output block shall be the same size as the input block. If the
> bs= expr operand is not specified, or a conversion other than sync,
> noerror, or notrunc is requested, the input shall be processed and
> collected into full-sized output blocks until the end of the input is
> reached.
>
>> dd doesn't seem to abort early when reading hard drives, even if the
>> block size isn't the same as the hard drive IO read size, or even if it
>> has to wait a few ms for the drive to seek.
> That's because hard drives, being physical devices, have all of their
> data handy at once.

Is that really true? Sure they are block devices - but on the
nanosecond scale, do they really have all of their data handy?

In fact I was doing some tests the other day and I found that dd can
read from /dev/zero about 20 times faster than it can /dev/hda --- so I
know that dd has to wait for /dev/hda to get its data.

Furthermore, my /dev/hda almost certainly has a block size other than
1000 (which is the block size I set for dd for my test).

So for example:

dd bs=1000 count=1000 of=/dev/null if=/dev/hda

And let's say my disk driver transfers 2048 bytes at a time:
The first two reads will be fine, but the third read will only get 48
bytes. *boom*

But, it doesn't go boom. What gives? Doesn't the kernel have to task
swap to the disk driver so it can queue up some more bytes, leaving
read() with a short read?

And even if my ibs was the same as my drive's driver, I just happen to
know that dd can read waaaay faster than any of my drives can send
data. Obviously, dd has to wait for the data to become available.

> The kernel doesn't have to task swap over to
> another process to get more bytes, but can proceed with the full read
> request right up until EOF.
>
>> However, iflag=fullblock seems to fix it.
> That's one fix. But it's GNU-specific. If you want the POSIX-compliant
> fix, then use ibs/obs instead of bs.
>

Setting ibs and obs instead does _NOT_ fix it. Try it yourself! :-)

yes|dd ibs=1000 obs=1000 count=1000| wc -c
694+306 records in
703+1 records out
703488 bytes (703 kB) copied, 0.0220068 s, 32.0 MB/s
703488

>> I still cannot fathom why it would ever be acceptable to abort early
>> when there's no error and no EOF and the pipe is still sending data.
>> Is there actually EVER a real reason for dd to need to abort a read when
>> there's no EOF and no error? Why would POSIX require this?
> It's called 40 years of history. That was the original way dd was
> written, back when the default medium was _not_ disks, but tapes, and
> tapes had variable size blocks. It made sense for the default back
> then.
> And changing it now _WILL_ break existing scripts that have come
> to rely on the standardized behavior,...

I can't think of a single scenario where any script would rely on dd
dropping bytes from the input pipe. Can anyone else?

I mean, seriously, it goes on reading some of the bytes and dropping
others and then finishes up like everything's normal. Remember, I'm
talking about pipes here, nothing else.

> .... even if the standardized behavior
> makes no sense if dd were being developed from scratch today.
>
>> So the question is why is it reading partial records?
> Because that's the way pipes behave.
>
>> I really have a hard time believing that POSIX requires dd to abort a
>> pipe read just because the data wasn't ready quick enough.
> It is NOT aborting a pipe read, it is doing exactly what you told it,
> and writing as soon as read returns, even if read() had a short read
> value, because you specified bs.
>
>> Can someone point me to where POSIX requires this current behavior of
>> dd?
> I just did.
>

Thank you very much!

I now see the real problem: The POSIX document is not aware of pipes.
It states that certain things should be certain ways and that if read()
gets a short read, it should count it as a partial record and write it
as such. And the dd authors have just obediently followed the letter of
POSIX, not realizing it's for a context that does not include pipes.

The problem is that while dd's behavior for short reads is perfect for
random access files and devices, it's lousy for pipes.

With a file or a disk device, a short read only happens for a reason -
like EOF or data unreadable or driver timeout or device disconnect
perhaps.

And maybe even the tape drivers give a short read as a way of signaling
the end of a variable-length block. That really makes sense. That way
you could tell it to read x number of blocks off of the tape, and it
would read x number of blocks, ignoring the fact that they were
different sizes.

But pipes do not have variable length blocks, and a short read does not
signal anything to worry about or deal with: It just means it was the
end of the block and another will be coming right after it.

So I think it is clear that the POSIX lack of comment on pipes combined
with the absurdity of the current behavior with pipes is due to POSIX
being followed to the letter without the realization that it's being
applied in a context for which it was not written.

The problem with this is that a short read on a pipe has an entirely
different meaning than a short read on most other things -- and
treating them the same is what causes this strangeness.

(And there may be a bug in dd since setting ibs and obs instead of bs
still causes partial reads on a pipe.)

Do I seem confused? Or would a normal thinking person arrive at the
same place as I have? Am I missing something?

Thank you very much,

Jesse
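
For readers following the thread, here is a minimal sketch of two ways to get the full 1,000,000 bytes from the pipe. It assumes GNU dd, since iflag=fullblock is the GNU extension mentioned above, and a head that accepts -c, as GNU and BSD head do. The function names count_fullblock and count_head are illustrative and do not come from the thread. The background is that count= limits the number of input blocks read, and without fullblock each short read() uses up one of them. That matches the "694+306 records in" output above: 1000 reads in total, 306 of them short.

```shell
#!/bin/sh
# Sketch only: assumes GNU dd (iflag=fullblock) and a head that supports -c.

# With iflag=fullblock, dd keeps calling read() until each 1000-byte input
# block is full (or EOF is reached), so count=1000 means 1000 full blocks
# rather than 1000 read() calls.
count_fullblock() {
    yes | dd bs=1000 count=1000 iflag=fullblock 2>/dev/null | wc -c
}

# Alternative without dd: ask for a byte count instead of a block count.
# (Assumption: head -c is available; it is not part of older POSIX editions.)
count_head() {
    yes | head -c 1000000 | wc -c
}

echo "fullblock: $(count_fullblock)"
echo "head -c:   $(count_head)"
```

Both functions report a byte count that no longer depends on how the kernel schedules yes relative to the reader, which the plain bs= and ibs=/obs= invocations in this thread cannot promise.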