#8490 - dd reads random number of records from pipes - named or otherwise - coreutils 8.9

GNU bug report logs - #8490
dd reads random number of records from pipes - named or otherwise - coreutils 8.9

Reported by: Jesse Gordon <jesseg <at> nikola.com>

Date: Wed, 13 Apr 2011 02:46:02 UTC

Severity: normal

Done: Eric Blake <eblake <at> redhat.com>

Bug is archived. No further changes may be made.

Message #25 received at 8490-done <at> debbugs.gnu.org:

From: Jesse Gordon <jesseg <at> nikola.com>
To: Eric Blake <eblake <at> redhat.com>
Cc: 8490-done <at> debbugs.gnu.org
Subject: Re: bug#8490: dd reads random number of records from pipes - named or otherwise - coreutils 8.9
Date: Thu, 14 Apr 2011 13:53:20 -0700

On 4/14/2011 7:34 AM, Eric Blake wrote:
> On 04/13/2011 05:36 PM, Jesse Gordon wrote:
>
>> I can't think of a single scenario where any script would rely on dd
>> dropping bytes from the input pipe.
>> Can anyone else?
>
> It doesn't matter what you think; the problem is that you can't change
> existing behavior.

Is that really true? What if it was a buffer overrun? What if it was a bug where it segfaulted if a certain sequence of bytes was copied? Of course those can be fixed... What if it was a limitation to 2G bytes read? My point is that the existing be

> You can add new commands that give new behavior, but
> there are 40 years worth of scripts that rely on existing behavior,

I'll admit that, as a scientist and engineer, it bothers me when an intelligent person makes a broad, sweeping claim without even stopping to think whether they can cite a single example, hypothetical or real. Making claims is easy and proves nothing, and may not be of much benefit. Can you even imagine a hypothetical situation where someone would depend on short reads from a pipe? Maybe in a random number generator? I can cite lots of examples where some poor sysadmin just wants to get his job done and needs dd to read all the data from the input pipe.... :-)

> so
> even if _you_ can't think of someone that wants short reads from a pipe,
> someone else may already be wanting it, and even relying on it, because
> it is standardized that way.

But what if short pipe reads make reading from pipes useless?

> Maybe the best thing to do is work on having POSIX standardize the GNU
> extension of iflag=fullblock.

Basically, because the user's desired block size may not coincide with PIPE_MAX (or whatever), there is never a guarantee that dd will read all of the input data from a pipe (using POSIX options).
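The short-read behavior at the center of this thread can be reproduced with a few lines of Python. Its `os.read()` is a thin wrapper over POSIX read(). This is an illustrative sketch only. `read_full()` and `demo()` are hypothetical helpers and are not part of dd or coreutils. `read_full()` imitates the idea behind GNU dd's `iflag=fullblock`: keep reading until the block is full or EOF is reached. The sketch assumes a POSIX system where a read() from a pipe returns whatever bytes are currently available.

```python
import os
import threading
from typing import Tuple


def read_full(fd: int, size: int) -> bytes:
    """Hypothetical helper: call os.read() until `size` bytes arrive or EOF.

    This mirrors the idea behind GNU dd's iflag=fullblock: a short read
    from a pipe is not treated as a complete block.
    """
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = os.read(fd, remaining)
        if not chunk:  # EOF: every writer has closed its end of the pipe
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def demo() -> Tuple[int, int]:
    r, w = os.pipe()

    # Only 100 bytes are in the pipe, so a single read() asking for 4096
    # returns early with a short read of 100 bytes.
    os.write(w, b"x" * 100)
    short = len(os.read(r, 4096))

    # A writer thread trickles 4096 bytes in 512-byte pieces, then closes
    # its end; read_full() still assembles one complete 4096-byte block.
    def writer() -> None:
        for _ in range(8):
            os.write(w, b"y" * 512)
        os.close(w)

    t = threading.Thread(target=writer)
    t.start()
    full = len(read_full(r, 4096))
    t.join()
    os.close(r)
    return short, full


if __name__ == "__main__":
    print(demo())  # (100, 4096)
```

A single read() gives no promise of a full buffer on a pipe. The only portable way to get one is the caller's own loop, which is what `read_full()` provides.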
But look at POSIX: under INPUT FILES it says that "The input file can be any file type". If we take that to include block devices and pipes, then we can see that dd is not compliant, because it cannot reliably read from a pipe.

Here's a prime example: let's say I happen to want to read exactly 999983 blocks of 2099 bytes each from a pipe. With purely POSIX dd, it's impossible. Doesn't that violate POSIX? Doesn't a program's ability to function as described in POSIX preempt technical details about short reads from read()? And if POSIX really intends that a pipe used as the input file is not guaranteed to be usable, then it ought to say so, but it doesn't. Any thinking observer would, I believe, have to conclude that dd is supposed to work perfectly to read 999983 blocks of 2099 bytes each from a pipe.

So I asked myself how we might have this apparent contradiction, and I looked at POSIX for read(): http://pubs.opengroup.org/onlinepubs/9699919799/functions/read.html

It turns out that (unless specified otherwise) read() is supposed to block when reading from a pipe until it has "some data." Unfortunately, POSIX for read() isn't specific about whether read() from a pipe must give a short read or wait for a full buffer. In other words, it looks like POSIX allows read() either to give a short read on a pipe (as it does now) or to read the requested number of bytes from the pipe before returning (stopping only, of course, for EOF, a broken pipe, etc.).

Do you really think that the POSIX folks meant for dd to be unusable for reading from pipes? Just think of the poor average sysadmin who reads over the man page for dd and says "Oh, this'll work" but doesn't realize that there's such a thing as PIPE_MAX hidden deep down, and that his dd command is going to fail miserably, _losing important data_. How can we say that it was user error? Does he really have to be a kernel programmer to be able to use dd correctly? Is that really what POSIX means?
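A short read turns into missing data because of how dd counts records. The GNU coreutils manual says that without `iflag=fullblock`, the `count=` operand follows the traditional POSIX behavior of counting input read operations rather than complete input blocks. In the example above, `bs=2099 count=999983` ought to copy 999983 × 2099 = 2,098,964,317 bytes, and every short read lowers that total. The sketch below models this under stated assumptions. `make_chunked_source()` is a hypothetical in-memory stand-in for a pipe whose writer produces at most 1000 bytes at a time. `dd_copy()` is a simplified, hypothetical model of dd's record counting, not dd's actual implementation.

```python
from typing import Callable, Tuple


def make_chunked_source(data: bytes, max_chunk: int) -> Callable[[int], bytes]:
    """Hypothetical stand-in for a pipe: each call returns at most max_chunk
    bytes, the way read() on a pipe returns only what has been written so far."""
    pos = 0

    def read(n: int) -> bytes:
        nonlocal pos
        chunk = data[pos:pos + min(n, max_chunk)]
        pos += len(chunk)
        return chunk

    return read


def dd_copy(read: Callable[[int], bytes], bs: int, count: int,
            fullblock: bool = False) -> Tuple[int, int, int]:
    """Simplified model of copying `count` input records of `bs` bytes.

    Returns (full_records, partial_records, bytes_copied), loosely like the
    "records in" summary dd prints. Without fullblock, every read call counts
    as one record, whether or not it filled the block.
    """
    full = partial = copied = 0
    for _ in range(count):
        block = read(bs)
        if fullblock:
            # Keep reading until the block is full or the source is exhausted.
            while len(block) < bs:
                more = read(bs - len(block))
                if not more:
                    break
                block += more
        if not block:  # EOF before this record started
            break
        copied += len(block)
        if len(block) == bs:
            full += 1
        else:
            partial += 1
    return full, partial, copied


if __name__ == "__main__":
    data = bytes(1_000_000)
    print(dd_copy(make_chunked_source(data, 1000), bs=2099, count=10))
    # (0, 10, 10000): ten short reads, so less than half the requested data
    print(dd_copy(make_chunked_source(data, 1000), bs=2099, count=10,
                  fullblock=True))
    # (10, 0, 20990): ten complete 2099-byte records
```

Both runs report ten records, but only the fullblock run copies the 10 × 2099 bytes that were asked for.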
Clearly, it's an oversight in the document one way or another. POSIX should either say "dd is not required to read correctly from pipes, and must warn or refuse to read from a pipe", or it needs to be changed to specify that FULLBLOCK should be the default for pipes, one way or the other.

From a practical standpoint, the most disgusting kind of design flaw in a copy program is for it to copy the data only partially without giving a specific warning to that effect. It's like a backup program that doesn't warn you if it fails to complete, or a download program that doesn't warn you if it fails to finish. POSIX, by indicating that dd can read from pipes while not mentioning that it doesn't work under many common circumstances, is not sane.

I notice that the POSIX documents we're reading are copyright through 2008; is there a newer one out now? Anyway, I guess my next step is to try to file a complaint with IEEE to get this contradiction fixed. Do you have any suggestions for getting in touch with IEEE?

Thank you very much,
Jesse


