GNU bug report logs - #15578
Parameter -d or --direct to open files with flag O_DIRECT?
Reported by: Kyle Sallee <kyle.sallee <at> gmail.com>
Date: Wed, 9 Oct 2013 23:21:01 UTC
Severity: normal
Tags: notabug
Done: Eric Blake <eblake <at> redhat.com>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 15578 in the body.
You can then email your comments to 15578 AT debbugs.gnu.org in the normal way.
Report forwarded to bug-coreutils <at> gnu.org: bug#15578; Package coreutils. (Wed, 09 Oct 2013 23:21:02 GMT)
Acknowledgement sent to Kyle Sallee <kyle.sallee <at> gmail.com>: New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Wed, 09 Oct 2013 23:21:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Please consider the attached gzipped patch file.
It provides additional parameters for cat.
The additional parameter allows cat to open files and read content
without storing the content in the kernel's block cache.
[Message part 2 (text/html, inline)]
[coreutils-8.21-O_DIRECT-cat.patch.gz (application/x-gzip, attachment)]
Information forwarded to bug-coreutils <at> gnu.org: bug#15578; Package coreutils. (Thu, 10 Oct 2013 03:09:03 GMT)
Message #8 received at 15578 <at> debbugs.gnu.org (full text, mbox):
On 10/10/2013 12:18 AM, Kyle Sallee wrote:
> Please consider the attached gzipped patch file.
> It provides additional parameters for cat.
> The additional parameter allows cat to open files and read content
> without storing the content in the kernel's block cache.
>
dd supports this already and is better suited to
lower-level settings like this.
Could you describe _why_ you would like this in cat in particular?
thanks,
Pádraig.
Added tag(s) notabug. Request was from Eric Blake <eblake <at> redhat.com> to control <at> debbugs.gnu.org. (Thu, 10 Oct 2013 03:12:02 GMT)
Reply sent to Eric Blake <eblake <at> redhat.com>: You have taken responsibility. (Thu, 10 Oct 2013 03:12:02 GMT)
Notification sent to Kyle Sallee <kyle.sallee <at> gmail.com>: bug acknowledged by developer. (Thu, 10 Oct 2013 03:12:03 GMT)
Message #15 received at 15578-done <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
tag 15578 notabug
thanks
On 10/09/2013 05:18 PM, Kyle Sallee wrote:
> Please consider the attached gzipped patch file.
Thanks for the patch.
Unless the patch is larger than 100k, it's best to post it inline rather
than forcing readers to jump through extra hoops to unpack it.
> It provides additional parameters for cat.
> The additional parameter allows cat to open files and read content
> without storing the content in the kernel's block cache.
However, adding an option to 'cat' is probably not the best approach. We
already have 'dd' that can request O_DIRECT on operating systems where
that is defined, so you are probably better off converting your scripts
to use dd than to try and retrofit cat and wait for the patches to
percolate into your distro. Look for 'dd iflag=direct'.
I'm closing this bug as we probably aren't going to change cat, but do
feel free to add additional comments on this thread as appropriate. Who
knows - maybe it's worth reopening and repurposing this bug report into
a documentation patch.
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
[signature.asc (application/pgp-signature, attachment)]
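As a point of reference for the O_DIRECT discussion, here is a minimal C sketch of what reading a file with O_DIRECT involves. It assumes Linux with glibc, since O_DIRECT is not part of POSIX, and assumes a 4096-byte buffer alignment, which in practice depends on the filesystem and device. The names copy_direct and write_all are hypothetical; they are not taken from coreutils or from the attached patch.

```c
#define _GNU_SOURCE /* O_DIRECT is a Linux extension exposed by glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Assumption: 4096-byte alignment satisfies the device; the real
   requirement is filesystem/device specific (often 512 or 4096). */
#define DIRECT_ALIGN 4096
#define DIRECT_BUFSZ (64 * DIRECT_ALIGN)
static_assert(DIRECT_BUFSZ % DIRECT_ALIGN == 0,
              "O_DIRECT buffer size must be a multiple of the alignment");

/* Write all LEN bytes of BUF to FD, retrying short writes and EINTR. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t) n;
    }
    return 0;
}

/* Hypothetical helper: copy PATH to OUT_FD, bypassing the page cache
   with O_DIRECT where the filesystem allows it.  Returns 0 or -1. */
int copy_direct(const char *path, int out_fd)
{
    int fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0 && errno == EINVAL)      /* O_DIRECT unsupported here */
        fd = open(path, O_RDONLY);      /* fall back to buffered I/O */
    if (fd < 0)
        return -1;

    void *buf;                          /* O_DIRECT needs an aligned buffer */
    if (posix_memalign(&buf, DIRECT_ALIGN, DIRECT_BUFSZ) != 0) {
        close(fd);
        return -1;
    }

    int rc = 0;
    for (;;) {
        ssize_t n = read(fd, buf, DIRECT_BUFSZ);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            rc = -1;
            break;
        }
        if (n == 0)
            break;                      /* end of file */
        if (write_all(out_fd, buf, (size_t) n) < 0) {
            rc = -1;
            break;
        }
        if (n < DIRECT_BUFSZ) {
            /* A short read normally means EOF, which leaves the file
               offset unaligned; drop O_DIRECT (Linux permits this via
               F_SETFL) so the final read cannot fail with EINVAL. */
            int fl = fcntl(fd, F_GETFL);
            if (fl >= 0 && (fl & O_DIRECT))
                (void) fcntl(fd, F_SETFL, fl & ~O_DIRECT);
        }
    }
    free(buf);
    close(fd);
    return rc;
}
```

The silent fallback to buffered reads when open() rejects O_DIRECT is a design choice of this sketch; a tool that promises uncached reads might prefer to report the error instead.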
Information forwarded to bug-coreutils <at> gnu.org: bug#15578; Package coreutils. (Thu, 10 Oct 2013 04:50:02 GMT)
Message #18 received at 15578-done <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Please forgive the inconvenience.
Attaching a gzipped patch file
ensures the integrity of patch file content.
I should have presented only the idea
rather than a patch for the implementation.
I concur dd is already suited to the task.
Please consider

    | sed 's/^/if=/' | xargs -r --max-lines=1 dd iflag=direct

in contrast with

    | xargs -r --max-lines=4096 cat -d --
Invoking dd 466059 times costs only a slight performance decrease
as compared with invoking cat 114 times.
However, this example probably represents rare usage for cat.
Thanks for granting the time and consideration.
[Message part 2 (text/html, inline)]
Information forwarded to bug-coreutils <at> gnu.org: bug#15578; Package coreutils. (Thu, 10 Oct 2013 08:10:02 GMT)
Message #21 received at 15578 <at> debbugs.gnu.org (full text, mbox):
On 10/10/2013 05:49 AM, Kyle Sallee wrote:
> Please forgive the inconvenience.
> Attaching a gzipped patch file
> ensures the integrity of patch file content.
> I should have presented only the idea
> rather than a patch for the implementation.
>
> I concur dd is already suited to the task.
>
> Please consider
>
>     | sed 's/^/if=/' | xargs -r --max-lines=1 dd iflag=direct
>
> in contrast with
>
>     | xargs -r --max-lines=4096 cat -d --
>
> Invoking dd 466059 times costs only a slight performance decrease
> as compared with invoking cat 114 times.
> However, this example probably represents rare usage for cat.
>
> Thanks for granting the time and consideration.
Fair point, but still not worth adding to cat(1)
since it's not special in this regard.
Something like this might be more appropriate:
https://github.com/Feh/nocache
Note that it doesn't avoid the page cache completely,
and so may be more performant/portable than O_DIRECT.
(dd has this functionality too; see 'nocache' at:
http://www.gnu.org/software/coreutils/manual/html_node/dd-invocation.html)
cheers,
Pádraig.
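The page-cache-advice approach mentioned above can be sketched in C as ordinary buffered reads followed by posix_fadvise() with POSIX_FADV_DONTNEED for each chunk already consumed. This is a minimal illustration under stated assumptions, not a description of how the nocache tool or dd's 'nocache' flag is implemented: the helper name copy_then_drop_cache is hypothetical, and the advice is only a hint that the kernel may ignore.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy PATH to OUT_FD through the normal page
   cache, advising the kernel after each chunk that the pages just
   read will not be needed again.  Returns 0 on success, -1 on error. */
int copy_then_drop_cache(const char *path, int out_fd)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    char buf[65536];
    off_t pos = 0;                  /* file offset of the current chunk */
    int rc = 0;
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            rc = -1;
            break;
        }
        if (n == 0)
            break;                  /* end of file */
        for (ssize_t off = 0; off < n; ) {
            ssize_t w = write(out_fd, buf + off, (size_t) (n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                close(fd);
                return -1;
            }
            off += w;
        }
        /* Only a hint: posix_fadvise returns an error number rather
           than setting errno, and a failure here is harmless. */
        (void) posix_fadvise(fd, pos, n, POSIX_FADV_DONTNEED);
        pos += n;
    }
    close(fd);
    return rc;
}
```

Unlike O_DIRECT, this needs no buffer alignment and works on filesystems that reject O_DIRECT, which is one way to read the portability remark in the message above.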
Information forwarded to bug-coreutils <at> gnu.org: bug#15578; Package coreutils. (Thu, 10 Oct 2013 23:22:01 GMT)
Message #24 received at 15578 <at> debbugs.gnu.org (full text, mbox):
On 10/10/2013 09:09 AM, Pádraig Brady wrote:
> Fair point, but still not worth adding to cat(1)
> since it's not special in this regard.
>
> Something like this might be more appropriate:
> https://github.com/Feh/nocache
>
> Note that doesn't avoid the page cache completely,
> and so may be more performant/portable than O_DIRECT.
> (dd has this functionality too as described at 'nocache' at:
> http://www.gnu.org/software/coreutils/manual/html_node/dd-invocation.html)
One possibility worth mentioning would be to add a files0-from=F option to dd,
like du, sort, and wc already have.
Now those have it because they need to operate on the complete input set,
for accumulation or sorting, and thus can't resort to separate runs
with xargs or whatever. dd might use it since it has a very different command
syntax from the standard tools. So that would allow a general method
to efficiently read many files.
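To make the files0-from=F idea concrete: the file it names contains file names terminated by NUL bytes, the format `find -print0` produces. The minimal C sketch below walks such a list held in memory; next_nul_name is a hypothetical helper rather than the coreutils implementation, and treating an empty or unterminated entry as an error is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the next name in the NUL-terminated list
   [*POS, END) and advance *POS past its terminating NUL.  Returns NULL
   at the end of the list.  On a zero-length or unterminated entry it
   returns NULL and sets *ERR to 1; otherwise *ERR is set to 0. */
const char *next_nul_name(const char **pos, const char *end, int *err)
{
    assert(pos != NULL && *pos <= end && err != NULL);
    *err = 0;
    if (*pos == end)
        return NULL;                    /* list exhausted */

    const char *name = *pos;
    const char *nul = memchr(name, '\0', (size_t) (end - name));
    if (nul == NULL || nul == name) {   /* unterminated or empty name */
        *err = 1;
        return NULL;
    }
    *pos = nul + 1;                     /* skip past the terminator */
    return name;
}
```

A caller would loop with `while ((name = next_nul_name(&pos, end, &err)) != NULL)`, open and copy each name, and check err once the loop ends.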
Another related thing to consider is that the above would allow a single
process to handle everything, but it might be better to split the
load into one process per CPU.
thanks,
Pádraig.
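Splitting the work into a process per CPU starts with partitioning the file list. The minimal C sketch below assigns contiguous slices whose sizes differ by at most one. Both function names are hypothetical, process creation (fork, or xargs -P from the shell) is left out, and sysconf(_SC_NPROCESSORS_ONLN) is assumed to be available, as it is on glibc and the BSDs, although POSIX does not require it.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Number of online CPUs, or 1 if it cannot be determined. */
long online_cpus(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? n : 1;
}

/* Hypothetical helper: split NFILES items into NWORKERS contiguous
   slices whose sizes differ by at most one.  Store the first index of
   slice K in *BEGIN and return the slice's length. */
size_t worker_slice(size_t nfiles, size_t nworkers, size_t k, size_t *begin)
{
    assert(nworkers > 0 && k < nworkers && begin != NULL);
    size_t base = nfiles / nworkers;
    size_t extra = nfiles % nworkers;   /* first EXTRA slices get one more */
    *begin = k * base + (k < extra ? k : extra);
    return base + (k < extra ? 1 : 0);
}
```

Worker k would then handle items begin through begin + len - 1, for k from 0 to online_cpus() - 1; with 10 files and 3 workers the slices have lengths 4, 3 and 3.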
Information forwarded to bug-coreutils <at> gnu.org: bug#15578; Package coreutils. (Sat, 12 Oct 2013 20:04:01 GMT)
Message #27 received at 15578 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
On Thu, Oct 10, 2013 at 4:21 PM, Pádraig Brady <P <at> draigbrady.com> wrote:
>
> One possibility worth mentioning, would be to add a files0-from=F option to dd,
> like du,sort,wc already have.
>
Wonderful compromise. I agree.
I appreciate how it continues an established and understood convention
for coreutils parameters rather than creating something new.
Oddly enough, I might never have used the --files0-from= parameter for du, sort, or wc.
However, that might sometimes save an invocation of cat within a pipe.
But I am not often working with null-terminated lists of files.
Merely creating that list might sometimes require an invocation of "tr"
to translate linefeeds to nulls.
However, I am aware that "find" can generate null-terminated output.
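The tr step mentioned above, `tr '\n' '\0'`, is a plain byte substitution. A minimal C equivalent on an in-memory buffer is shown below; newlines_to_nuls is a hypothetical name. A newline-separated list cannot carry file names that themselves contain newlines, which is why NUL-separated output such as `find -print0` is the safer source.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper mirroring `tr '\n' '\0'`: replace every newline
   in BUF[0..LEN) with a NUL byte, in place.  Returns the number of
   bytes replaced. */
size_t newlines_to_nuls(char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    size_t count = 0;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\n') {
            buf[i] = '\0';
            count++;
        }
    }
    return count;
}
```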
> Now those have it because they need to operate on the complete input set,
> for accumulation or sorting, and thus can't resort to separated runs
> with xargs or whatever.
That makes sense.
> dd might use it as it has a very different command
> syntax to the standard tools. So that would allow a general method
> to efficiently read many files.
>
Since dd might be the only coreutils utility that allows
specification of additional open flags,
dd would have been my second choice for modification after cat.
> Another related thing to consider is the above would allow a single
> process to handle everything, but it might be better to split the
> load into a process per CPU.
>
The type of processing that dd does to transliterate character sets
and do conversion can be processor intensive
when the amount of data being processed is immense.
Therefore, concurrent threaded processing of blocks,
while outputting the processed blocks in order,
could allow dd to speed up processing in proportion
to the number of processors allowed.
However, I anticipate problems could occur when processing around block edges.
Yet processing individual files concurrently avoids that potential problem.
Perhaps concurrent processing of files with dd
would most often be accomplished with xargs?
That is because the output might not be desired as a single data block.
Therefore, if dd allowed specification of multiple input files,
then it should probably also allow the specification
of a list of an equal number of output files?
The cost of invoking dd through xargs
to gain concurrent processing of individual files
is low compared with the boon gained
from using two or more processors concurrently.
Therefore, the added complexity might not be worth
the additional speed when nearly the same benefit
can be gained by invoking dd through xargs -P.
Aside from the rejected modification to cat,
the proposed modification to dd,
and writing a small C helper program,
I wonder what possibilities exist
that have not yet been contemplated.
Perhaps a BASH plugin can be created
that reads a list of file names on stdin
and a list of open flags as parameters
and then opens those files and writes their output to stdout?
Chet might accept a patch for such a plugin.
Because BASH does not install plugins by default
and lacks a standard, POSIX-suggested installation location for them,
the availability and use of the plugins go unnoticed
by almost everyone who lacks specialization in shell scripting.
Or, more succinctly stated, few people might hold
an opinion concerning BASH plugins.
And if the plugin is rejected,
then I can still fall back to the idea of writing
a standalone C program for the task.
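The standalone C program idea could look roughly like the sketch below: read newline-separated file names from a stream, open each with caller-supplied extra open flags, and copy the contents to an output descriptor. It assumes Linux with glibc (getline, O_DIRECT) and a 4096-byte alignment for O_DIRECT; cat_name_list and copy_fd are hypothetical names, and unlike a finished tool the sketch does not fall back when a filesystem rejects O_DIRECT at open time.

```c
#define _GNU_SOURCE /* getline, fmemopen, and O_DIRECT in glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define BUF_ALIGN 4096            /* assumed O_DIRECT alignment */
#define BUF_SIZE  (16 * BUF_ALIGN)

/* Copy IN_FD to OUT_FD using the aligned buffer BUF.  If IN_FD was
   opened with O_DIRECT, drop that flag after a short read so the
   (now unaligned) read at EOF cannot fail.  Returns 0 or -1. */
static int copy_fd(int in_fd, int out_fd, char *buf)
{
    for (;;) {
        ssize_t n = read(in_fd, buf, BUF_SIZE);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            return 0;
        for (ssize_t off = 0; off < n; ) {
            ssize_t w = write(out_fd, buf + off, (size_t) (n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
        if (n < BUF_SIZE) {
            int fl = fcntl(in_fd, F_GETFL);
            if (fl >= 0 && (fl & O_DIRECT))
                (void) fcntl(in_fd, F_SETFL, fl & ~O_DIRECT);
        }
    }
}

/* Hypothetical helper: read newline-separated file names from NAMES,
   open each with O_RDONLY | EXTRA_FLAGS (e.g. O_DIRECT), and append
   its contents to OUT_FD.  Blank lines are skipped.  Returns the
   number of files that could not be opened or copied, or -1 if the
   buffer cannot be allocated. */
int cat_name_list(FILE *names, int extra_flags, int out_fd)
{
    void *buf;
    if (posix_memalign(&buf, BUF_ALIGN, BUF_SIZE) != 0)
        return -1;

    char *line = NULL;
    size_t cap = 0;
    ssize_t len;
    int failures = 0;
    while ((len = getline(&line, &cap, names)) != -1) {
        if (len > 0 && line[len - 1] == '\n')
            line[--len] = '\0';         /* strip the newline */
        if (len == 0)
            continue;
        int fd = open(line, O_RDONLY | extra_flags);
        if (fd < 0) {
            perror(line);
            failures++;
            continue;
        }
        if (copy_fd(fd, out_fd, buf) < 0) {
            perror(line);
            failures++;
        }
        close(fd);
    }
    free(line);
    free(buf);
    return failures;
}
```

A main() built around it might call cat_name_list(stdin, O_DIRECT, STDOUT_FILENO), with the extra flags parsed from the command line.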
Although my original suggestion might have seemed otherwise,
I am not in favor of rapid mutation of coreutils
as opposed to carefully considered evolution.
I appreciate exploring different possibilities for implementing functionality
and carefully weighing the boons and the banes to arrive at a conclusion
that becomes best for the long-term goals of coreutils.
I am 100% satisfied with coreutils.
I wanted an elegant solution for a splinter of an idea.
I arrogantly and hastily expected that coreutils
should immediately accommodate.
Please forgive my lack of objectivity, insistence, and persistence.
Please pardon the delayed response.
I wanted to avoid a rapid rate of communication
becoming a burden or a cause for irritation.
And I hoped that a day's worth of other activities
would grant opportunity for a fresh perspective
upon resuming consideration.
> thanks,
> Pádraig.
>
Thanks for the continued attention.
I am satisfied with the outcome.
From the conversation I gained a better understanding
of how to accomplish the task.
The outcome is appreciated.
I previously expected that "cat --direct" would suffice.
Yet now I also want to avoid the necessity of invoking "cat" through "xargs".
I should have realized that before sending the first email.
coreutils should be expected to do what coreutils does,
but not expected to be the most expeditious implementation.
I wanted unreasonable performance.
Please forgive that I created a bother
and selfishly used Pádraig Brady's time and that of others
instead of thinking for myself.
That aside, I should not omit saying:
thanks for maintaining and sharing coreutils.
Shell scripting would be insufferable without coreutils. :)
[Message part 2 (text/html, inline)]
bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sun, 10 Nov 2013 12:24:03 GMT)
This bug report was last modified 11 years and 220 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.