Package: coreutils
Reported by: Karl Berry <karl <at> freefriends.org>
Date: Sat, 26 Mar 2022 20:30:02 UTC
Severity: normal
Done: Paul Eggert <eggert <at> cs.ucla.edu>
Bug is archived. No further changes may be made.
Message #8 received at 54586 <at> debbugs.gnu.org (full text, mbox):
From: Bob Proulx <bob <at> proulx.com>
To: Karl Berry <karl <at> freefriends.org>
Cc: 54586 <at> debbugs.gnu.org
Subject: Re: bug#54586: dd conv options doc
Date: Mon, 4 Apr 2022 14:24:25 -0600
Karl Berry wrote:
> 'fdatasync'
>      Synchronize output data just before finishing.  This forces a
>      physical write of output data.
>
> 'fsync'
>      Synchronize output data and metadata just before finishing.
>      This forces a physical write of output data and metadata.
>
> Weirdly, these descriptions are inducing quite a bit of FUD in me.
>
> Why would I ever want the writes to be incomplete after running dd?
> Seems like that is dd's whole purpose.

Yes.  FUD.  The writes are not incomplete.  It is no different than
any other write.

    echo "Hello, World!" > file1

Is that write complete?  It's no different.  If one is incomplete
then so is the other.  Note that the documentation does not say
"incomplete" but says "physical write".  As in, chiseled into stone.

The dd utility exists with a plethora of low level options not
typically available in other utilities.  Other utilities such as cp
for example.  That is one of the distinguishing features making dd
useful in a very large number of cases when otherwise we would use
cp, rsync, or one of the others.  Very low level control of option
flags.  But just because options exist does not mean they should
always be used.  Most of the time they should not be used.

> Well, I suppose it is too late to make such a radical change as forcing
> a final sync.

Please, no.  Opposing this is the motivation for me writing this
response.  Things are wastefully slow already due to the number of
fsync() calls now coded into everywhere all over the place.  Other
programs.  Not referring to the coreutils here.  Let's not make the
problem worse by adding them where they are not desired.

And that is why it is an option to dd and not on by default.  In
those specific cases where it is useful then it can be specified as
an option.  dd is exposing the interface for when it is useful.

As a practical matter I think with GNU dd's extensions that I never
ever use conv=fsync or conv=fdatasync but instead would always in
those same cases use oflag=direct,sync.
Such as when writing a removable storage device like a USB drive,
that I subsequently will want to remove.  There is no benefit to
caching the data since it will be invalidated immediately.  Not using
buffer cache avoids flushing some other data that would be useful to
keep in file system buffer cache.  When the write is done then the
removable media can be removed.  This avoids needing to run sync
explicitly.  Which sync's *everything*.

> In which case I suggest adding another sentence along the lines of
> "If these options are not specified, the data will be physically
> written when the system schedules the syncs, ordinarily every few
> seconds" (correct?).

Yes.  However the behavior might vary slightly between the different
kernels such as Linux kernel, BSD kernel, or even HP-UX kernel.
Therefore the documentation of it is kernel specific.  Even if all of
the kernels operated similarly.

> "You can also manually sync the output filesystem yourself
> afterwards (xref sync)."  Otherwise it feels uncertain when or
> whether the data will be physically written, or how to look into it
> further.

Generally this is a task that the operating system should be
handling.  The programmer taking explicit control defeating the cache
is almost always going to be less efficient at it than the operating
system.

However as you later mention writing an image to a removable storage
device like a USB thumbdrive needs to have the data flushed through
before removing the device.  GNU dd is good for this as I will
describe below but otherwise yes a "sync" (either the standalone or
the oflag) would be needed to ensure that the data has been flushed
through.

> As for "metadata", what does dd have to do with metadata?  My wild guess
> is that this is referring to filesystem metadata, not anything about dd
> specifically.  Whatever the case, I suggest adding a word or two to the
> doc to give a clue.

It's not dd's fault.  The OS created it first!  It's a property given
meaning by the OS.
The OS defines the option flags.  The dd utility is simply a thin
layer giving access to the OS file option flags.

> Further, why would I want data to be synced and not metadata?  Seems like
> fdatasync and fsync should both do both; or at least document that
> normally they'd be used together.  Or, if there is a real-life case where
> a user would want one and not the other, how about documenting that?  My
> imagination is failing me, but presumably these seemingly-undesirable
> options were invented for a reason.

The fdatasync() man page provides the information.

    The aim of fdatasync() is to reduce disk activity for
    applications that do not require all metadata to be synchronized
    with the disk.

In short fdatasync() is less heavy than fsync().

> BTW, I came across these options on a random page discussing dumping a
> .iso to a USB drive; the example was
>   dd if=foo.iso of=/dev/sde conv=fdatasync
> .. seems now like fsync should also have been given, for certainty.

For completely portable use one can only write the data and then call
sync afterward and then remove the removable storage after the sync
completes.  I don't know of any better fully portable way.  It's
silent if there are no errors.  Depending upon the speed of the
destination it might be tens of minutes before it completes.

    dd if=someimage.img of=/dev/sdX obs=16M
    sync

Where /dev/sdX is the device path name of the destination.  Always be
very careful to ensure the correct destination name.  Do not
overwrite the wrong target destination.  Doing so could destroy your
system.

For writing images to USB with GNU dd and the Linux kernel I prefer
the following combination.  It's the most friendly with very good
user feedback.

    pv someimage.img | dd of=/dev/sdX obs=16M oflag=direct,sync

The use of pv will provide a nice progress notification.  Check it
out!
    4.31GiB 0:08:13 [8.94MiB/s] [==============================================>] 100%

The main points being to use an output buffer size large enough to be
efficient but small enough such that regular notification of progress
is reported to the user.  If it is too large then the progress
reporting will be too "chunky".  Ideally it will be a multiple of the
internal flash NAND write block size.  Which we can't know and can
only take a guess.

To keep this entirely within GNU dd there is the new status=progress
option.

    $ dd if=someimage.img of=/dev/sdX obs=16M oflag=direct status=progress
    426349056 bytes (426 MB, 407 MiB) copied, 3 s, 142 MB/s
    ...

Honestly though it isn't anywhere near as nice as the progress report
from pv and I always use pv+dd for this task.  Give it a try!  :-)

Bob
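
The conv=fsync and conv=fdatasync options discussed in this message can be tried safely on a scratch file instead of a real device.  What follows is only a minimal sketch, assuming GNU dd (for conv=fsync, conv=fdatasync, iflag=fullblock and status=none) and a writable temporary directory.  The file names someimage.img and target.img inside the temporary directory are hypothetical stand-ins, with target.img playing the role of /dev/sdX.

```shell
# Sketch only: exercises GNU dd's conv=fsync and conv=fdatasync on
# scratch files.  Nothing here touches a real block device.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

src="$workdir/someimage.img"   # hypothetical source image
dst="$workdir/target.img"      # hypothetical stand-in for /dev/sdX

# Create 4 MiB of test data.  iflag=fullblock guards against short reads.
dd if=/dev/urandom of="$src" bs=1M count=4 iflag=fullblock status=none

# conv=fsync: dd calls fsync() on the output before exiting, so data
# and metadata are handed to the storage device before dd returns.
dd if="$src" of="$dst" bs=1M conv=fsync status=none
if cmp -s "$src" "$dst"; then fsync_result=identical; else fsync_result=different; fi

# conv=fdatasync: the same idea using fdatasync(), which may skip
# metadata that is not needed to read the data back.
dd if="$src" of="$dst" bs=1M conv=fdatasync status=none
if cmp -s "$src" "$dst"; then fdatasync_result=identical; else fdatasync_result=different; fi

size=$(wc -c < "$dst")
echo "fsync copy: $fsync_result, fdatasync copy: $fdatasync_result, size: $size bytes"
```

Both runs produce the same file contents; the options only change what dd waits for before it exits.  To write a real device, of= would name the device path instead, and only after double-checking that the path is the intended target.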