GNU bug report logs - #8490
dd reads random number of records from pipes - named or otherwise - coreutils 8.9
Reported by: Jesse Gordon <jesseg <at> nikola.com>
Date: Wed, 13 Apr 2011 02:46:02 UTC
Severity: normal
Done: Eric Blake <eblake <at> redhat.com>
Bug is archived. No further changes may be made.
Message #25 received at 8490-done <at> debbugs.gnu.org (full text, mbox):
On 4/14/2011 7:34 AM, Eric Blake wrote:
> On 04/13/2011 05:36 PM, Jesse Gordon wrote:
>
>> I can't think of a single scenario where any script would rely on dd
>> dropping bytes from the input pipe.
>> Can anyone else?
> It doesn't matter what you think; the problem is that you can't change
> existing behavior.
Is that really true? What if it was a buffer overrun? What if it
was a bug where it segfaulted if a certain sequence of bytes was
copied?
Of course those can be fixed...
What if it was a limitation to 2G bytes read?
My point is that the existing be
> You can add new commands that give new behavior, but
> there are 40 years worth of scripts that rely on existing behavior,
I'll admit that as a scientist and engineer, it bothers me when
an intelligent person makes a broad, sweeping claim without even
stopping to think whether they can cite a single example -
hypothetical or real. Making claims is easy and proves nothing -
and may not be of much benefit.
Can you even imagine a hypothetical situation where someone would
depend on short reads from a pipe? Maybe in a random number
generator?
I can cite lots of examples where some poor sysadmin wants to
just get his job done and needs dd to read all data from the
input pipe.... :-)
> so
> even if _you_ can't think of someone that wants short reads from a pipe,
> someone else may already be wanting it, and even relying on it, because
> it is standardized that way.
>
But what if short pipe reads cause reading from pipes to be useless?
> Maybe the best thing to do is work on having POSIX standardize the GNU
> extension of iflag=fullblock.
>
Basically, because the user's desired block size may not coincide
with PIPE_MAX (or whatever), there is never a guarantee that dd
will always read the whole input data from a pipe (using POSIX
options.)
But, look at POSIX: It says under INPUT FILES that "The input
file can be any file type" -- if we take that to include blocks
and pipes, then we can see that dd is not compliant because it
cannot reliably read from a pipe.
Fact: Here's a prime example: Let's say I happen to want to read
exactly 999983 blocks of 2099 bytes each from a pipe. With purely
POSIX dd, it's impossible. Doesn't that violate POSIX?
Doesn't a program's ability to function as described in POSIX
preempt technical details about short reads from read()?
And if POSIX really intends for pipe as input file to be not
guaranteed usable, then it ought to mention something about that,
but it doesn't.
Any thinking observer would, I believe, have to conclude that dd
is supposed to work perfectly to read 999983 blocks of 2099 bytes
each from a pipe.
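
To make the difference concrete, here is a minimal Python sketch of the
two copy strategies. It is not dd's actual source; the function names
(read_full_block, copy_blocks, demo) are made up for illustration, and
the small count of 7 blocks stands in for 999983. With full_block unset,
each block is whatever a single read() returns; with it set, the loop
keeps reading until the block is complete or EOF, which is roughly what
the GNU iflag=fullblock extension is described as doing.

```python
import os
import threading


def read_full_block(fd, block_size):
    """Call read() repeatedly until block_size bytes arrive or EOF is hit."""
    chunks = []
    remaining = block_size
    while remaining > 0:
        chunk = os.read(fd, remaining)
        if not chunk:  # EOF: every write end of the pipe is closed
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def copy_blocks(fd, block_size, count, full_block):
    """Copy up to `count` blocks; one read() per block unless full_block."""
    out = bytearray()
    for _ in range(count):
        if full_block:
            block = read_full_block(fd, block_size)
        else:
            block = os.read(fd, block_size)  # may be a short read
        if not block:
            break
        out += block
    return bytes(out)


def demo(block_size=2099, count=7, chunk=1000):
    """Feed a pipe in small chunks and check the full-block copy is complete."""
    r, w = os.pipe()
    payload = os.urandom(block_size * count)

    def writer():
        # The writer's chunk size deliberately does not match block_size.
        for i in range(0, len(payload), chunk):
            os.write(w, payload[i:i + chunk])
        os.close(w)

    t = threading.Thread(target=writer)
    t.start()
    data = copy_blocks(r, block_size, count, full_block=True)
    t.join()
    os.close(r)
    return data == payload


if __name__ == "__main__":
    print(demo())
```

With full_block=False the same call can return fewer than
block_size * count bytes, depending on how the writer's chunks happen
to line up with the reads.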
So I asked myself how we might have this apparent contradiction,
so I looked at POSIX for read() --
http://pubs.opengroup.org/onlinepubs/9699919799/functions/read.html
It turns out that (unless specified otherwise) read() is supposed
to block when reading from a pipe until it has "some data."
Unfortunately, POSIX for read() isn't specific about whether
read() from a pipe must give a short read or wait for a full buffer.
In other words, it looks like POSIX allows read() to either give
a short read on a pipe (like it does now) or to read the
requested number of bytes from the pipe before returning.
(Stopping only, of course, for EOF/broken/etc.)
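
The short-read behavior itself is easy to observe. The following is a
small Python sketch (short_read_demo is a hypothetical helper name, and
the sizes are arbitrary): it puts 1000 bytes into a pipe and then asks
read() for 2099. On the systems I am aware of, the call returns the
1000 bytes that are available rather than waiting for the rest.

```python
import os


def short_read_demo(requested=2099, available=1000):
    """Write `available` bytes to a pipe, then request `requested` bytes."""
    r, w = os.pipe()
    try:
        os.write(w, b"x" * available)
        # The write end is still open, so this is not EOF; read() simply
        # hands back what is already in the pipe.
        data = os.read(r, requested)
        return len(data)
    finally:
        os.close(r)
        os.close(w)


if __name__ == "__main__":
    print(short_read_demo())
```

A program that treats each such read() as one full block of 2099 bytes
will miscount its input.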
Do you really think that the POSIX folks meant for dd to be
unusable for reading from pipes?
Just think of the poor average sysadmin who reads over the man
page for dd and says "Oh, this'll work" but doesn't realize that
there's such a thing as PIPE_MAX hidden deep down, and that his
dd command is going to fail miserably, _losing important data_.
How can we say that it was user error? Does he really have to be
a kernel programmer to be able to use dd correctly? Is that
really what POSIX means?
Clearly, it's an oversight in the document one way or another.
POSIX should either say "dd is not required to read correctly
from pipes, and must warn or refuse to read from a pipe,"
or else be changed to specify that FULLBLOCK should
be the default for pipes -- one way or the other.
From a practical standpoint, the most disgusting kind of design
flaw in a copy program is for it to partially copy the data and
not give specific warning to that effect.
It's like a backup program that doesn't warn you if it fails to
complete, or a download program that doesn't warn you if it fails
to finish.
POSIX, by indicating that dd can read from pipes while not
mentioning that it doesn't work under many common circumstances,
is not sane.
I notice that the POSIX documents we're reading are copyright
through 2008 -- is there a newer one out now?
Anyway, I guess my next step is to try to file a complaint with
IEEE to get this contradiction fixed.
Do you have any suggestions for getting in touch with IEEE?
Thank you very much,
Jesse
This bug report was last modified 14 years and 99 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.