GNU bug report logs - #8490
dd reads random number of records from pipes - named or otherwise - coreutils 8.9


Package: coreutils;

Reported by: Jesse Gordon <jesseg <at> nikola.com>

Date: Wed, 13 Apr 2011 02:46:02 UTC

Severity: normal

Done: Eric Blake <eblake <at> redhat.com>

Bug is archived. No further changes may be made.



Message #25 received at 8490-done <at> debbugs.gnu.org:

From: Jesse Gordon <jesseg <at> nikola.com>
To: Eric Blake <eblake <at> redhat.com>
Cc: 8490-done <at> debbugs.gnu.org
Subject: Re: bug#8490: dd reads random number of records from pipes - named or otherwise - coreutils 8.9
Date: Thu, 14 Apr 2011 13:53:20 -0700

On 4/14/2011 7:34 AM, Eric Blake wrote:
> On 04/13/2011 05:36 PM, Jesse Gordon wrote:
>
>> I can't think of a single scenario where any script would rely on dd
>> dropping bytes from the input pipe.
>> Can anyone else?
> It doesn't matter what you think; the problem is that you can't change
> existing behavior.

Is that really true? What if it were a buffer overrun? What if it 
were a bug where it segfaulted when a certain sequence of bytes 
was copied?
Of course those can be fixed...

What if it were a 2 GB limit on the number of bytes read?

My point is that the existing be

>   You can add new commands that give new behavior, but
> there are 40 years worth of scripts that rely on existing behavior,

I'll admit that, as a scientist and engineer, it bothers me when 
an intelligent person makes a broad, sweeping claim without even 
stopping to think whether they can cite a single example, 
hypothetical or real. Making claims is easy, proves nothing, 
and may not be of much benefit.

Can you even imagine a hypothetical situation where someone would 
depend on short reads from a pipe? Maybe in a random number 
generator?

I can cite lots of examples where some poor sysadmin just wants 
to get his job done and needs dd to read all of the data from the 
input pipe... :-)

>   so
> even if _you_ can't think of someone that wants short reads from a pipe,
> someone else may already be wanting it, and even relying on it, because
> it is standardized that way.
>
But what if short reads make dd useless for reading from pipes?
> Maybe the best thing to do is work on having POSIX standardize the GNU
> extension of iflag=fullblock.
>
Basically, because the user's desired block size may not coincide 
with PIPE_MAX (or whatever the pipe's limit is), there is never a 
guarantee that dd will read all of the input data from a pipe 
(using only POSIX options).
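The underlying behavior can be reproduced without dd. The Python sketch below (Python's os.read() is a thin wrapper around the read() system call; the name demo_short_read is made up for illustration) puts only 3 bytes into a pipe whose write end is still open and then asks for 4096. Assuming a POSIX system, read() returns the 3 bytes that are available instead of waiting for a full block, which is the kind of short read that dd counts as a partial record.

```python
import os


def demo_short_read(requested=4096):
    """Return what a single read() on a pipe yields when the writer
    has produced fewer bytes than `requested` (hypothetical helper)."""
    r, w = os.pipe()
    try:
        # The writer has produced only 3 bytes so far and keeps its end
        # open, so the reader is not at EOF.
        os.write(w, b"abc")
        # read() blocks only until *some* data is available, then returns
        # whatever is there, up to `requested` bytes.
        return os.read(r, requested)
    finally:
        os.close(r)
        os.close(w)


if __name__ == "__main__":
    data = demo_short_read()
    print(f"asked for 4096 bytes, got {len(data)}")  # got 3
```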

But look at POSIX: under INPUT FILES it says that "The input 
file can be any file type". If we take that to include block 
devices and pipes, then we can see that dd is not compliant, 
because it cannot reliably read from a pipe.

Here's a prime example: let's say I want to read exactly 999983 
blocks of 2099 bytes each from a pipe. With purely POSIX dd, it's 
impossible. Doesn't that violate POSIX?
Shouldn't a program's ability to function as POSIX describes take 
precedence over technical details about short reads from read()?
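To see why the record count comes out "random", here is a small Python model of dd reading count= records of bs= bytes each, with one read() per record as POSIX dd does. It is a simulation, not real I/O, and it rests on an assumption: each read() sees at most one of the writer's chunks (the reader keeps up with the writer), so a chunk that is not a multiple of bs leaves a remainder that the next read() returns as a short, partial record. The name simulate_dd_count is hypothetical.

```python
from collections import deque


def simulate_dd_count(writes, bs, count):
    """Model POSIX dd: one read() per input record, at most `count` records.

    writes -- sizes of the chunks the writer pushes into the pipe.
    ASSUMPTION: each read() sees at most one writer chunk, i.e. the reader
    keeps up with the writer, so leftovers come back as short reads.
    Returns (bytes_copied, full_records, partial_records).
    """
    pending = deque(writes)
    copied = full = partial = 0
    while pending and full + partial < count:
        chunk = pending.popleft()
        n = min(chunk, bs)  # read() returns at most bs bytes
        if chunk > n:
            pending.appendleft(chunk - n)  # the rest stays in the pipe
        copied += n
        if n == bs:
            full += 1
        else:
            partial += 1  # a short read still uses up one record
    return copied, full, partial


if __name__ == "__main__":
    bs, count = 2099, 4
    copied, full, partial = simulate_dd_count([4096] * 4, bs, count)
    print(f"{full}+{partial} records in, {copied} of {bs * count} bytes")
```

With the writer pushing 4096-byte chunks, bs=2099 count=4 yields 2+2 records and 8192 of the requested 8396 bytes, so 204 bytes are never read. For the example above, 999983 × 2099 = 2,098,964,317 bytes are requested, and in this model every short read uses up one of the 999983 records while copying fewer than 2099 bytes.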

And if POSIX really intends that a pipe used as the input file 
is not guaranteed to be usable, then it ought to say so, but it 
doesn't.

Any thinking observer would, I believe, have to conclude that dd 
is supposed to be able to read exactly 999983 blocks of 2099 
bytes each from a pipe.

So I asked myself how this apparent contradiction could arise, 
and looked at the POSIX specification for read():
http://pubs.opengroup.org/onlinepubs/9699919799/functions/read.html

It turns out that, unless O_NONBLOCK is set, read() on a pipe is 
supposed to block until it has "some data".

Unfortunately, the POSIX text for read() isn't specific about 
whether read() from a pipe must return a short read or wait for a 
full buffer. In other words, POSIX seems to allow read() either to 
return a short read on a pipe (as it does now) or to read the 
requested number of bytes from the pipe before returning (stopping 
early only, of course, for EOF, a broken pipe, and so on).
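The "keep reading until the buffer is full" behavior described above is what GNU dd's iflag=fullblock asks for. A minimal Python sketch of the idea (not the coreutils implementation; the name read_full is hypothetical) simply calls read() in a loop until it has the requested number of bytes or hits EOF. Python 3.5 and later retry os.read() automatically when a signal interrupts it (EINTR), so the loop does not handle that case itself.

```python
import os


def read_full(fd, size):
    """Call read() repeatedly until `size` bytes are collected or EOF.

    Sketch of the idea behind GNU dd's iflag=fullblock, not its actual code.
    """
    buf = bytearray()
    while len(buf) < size:
        chunk = os.read(fd, size - len(buf))
        if not chunk:  # b"" means EOF: every writer has closed the pipe
            break
        buf += chunk
    return bytes(buf)


if __name__ == "__main__":
    r, w = os.pipe()
    os.write(w, b"abc")
    os.write(w, b"defgh")
    os.close(w)
    print(read_full(r, 5))  # b'abcde'
    print(read_full(r, 5))  # b'fgh': EOF cuts the last block short
    os.close(r)
```

Only EOF (or an error, which os.read() raises as OSError) can make read_full() return fewer than size bytes, so a short block can occur only at the end of the input.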

Do you really think that the POSIX folks meant for dd to be 
unusable for reading from pipes?

Just think of the poor average sysadmin who reads over the man 
page for dd and says "Oh, this'll work" but doesn't realize that 
there's such a thing as PIPE_MAX hidden deep down, and that his 
dd command is going to fail miserably, _losing important data_. 
How can we say that it was user error? Does he really have to be 
a kernel programmer to be able to use dd correctly? Is that 
really what POSIX means?

Clearly, it's an oversight in the document one way or another. 
POSIX should either say "dd is not required to read correctly 
from pipes, and must warn or refuse to read from a pipe", or else 
it needs to be changed to specify that FULLBLOCK is the default 
for pipes.
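The second option, making full-block reads the default for pipes, would require dd to recognize when its input is a pipe. One way to sketch that check in Python is fstat() plus S_ISFIFO, which on Linux and other common systems is true both for named pipes and for the anonymous pipes a shell creates with |. This is a hypothetical policy (the name should_default_to_fullblock is made up), not what dd does today, and it deliberately ignores other sources of short reads such as sockets and terminals.

```python
import os
import stat


def should_default_to_fullblock(fd):
    """HYPOTHETICAL policy: enable full-block reads when fd is a pipe/FIFO.

    This models the proposal in the message, not dd's actual behavior.
    """
    return stat.S_ISFIFO(os.fstat(fd).st_mode)


if __name__ == "__main__":
    r, w = os.pipe()
    print(should_default_to_fullblock(r))  # True for a pipe
    null_fd = os.open(os.devnull, os.O_RDONLY)
    print(should_default_to_fullblock(null_fd))  # False: character device
    for fd in (r, w, null_fd):
        os.close(fd)
```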

From a practical standpoint, the most disgusting kind of design 
flaw in a copy program is for it to copy only part of the data 
without giving a specific warning to that effect.
It's like a backup program that doesn't warn you if it fails to 
complete, or a download program that doesn't warn you if it fails 
to finish.

By indicating that dd can read from pipes while not mentioning 
that it doesn't work under many common circumstances, POSIX is 
not sane.

I notice that the POSIX documents we're reading are copyright 
through 2008; is there a newer one out now?

Anyway, I guess my next step is to try to file a complaint with 
IEEE to get this contradiction fixed.

Do you have any suggestions for getting in touch with IEEE?

Thank you very much,

Jesse








GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.