GNU bug report logs - #73784
[PATCH] cp: new option --nocache-source
Message #11 received at 73784 <at> debbugs.gnu.org (full text, mbox):
From: Pádraig Brady <P <at> draigBrady.com>
Subject: Re: bug#73784: [PATCH] cp: new option --nocache-source
Date: Sun, 13 Oct 2024 15:59:27 +0100
> On 13/10/2024 05:56, Masatake YAMATO wrote:
>> When copying files, the system data cache is utilized for both the
>> source and destination files. In
>> scenarios such as creating backup files for old, unused files, it is
>> clear to users that these files will not be needed in the near
>> future. In such cases, retaining the data for these files in the cache
>> constitutes a waste of computer resources, especially when running
>> applications that require significant memory in the foreground.
>> With the new option, users will have the ability to request the
>> discarding of the system data cache, thereby avoiding the unwanted
>> swapping out of data from foreground processes.
>> I evaluated cache consumption using a script called
>> run.bash. Initially, run.bash creates many small files, each 8 KB in
>> size. It then copies these files using the cp command, both with and
>> without the specified option. Finally, it reports the difference in
>> the total size of the caches before and after the copying process.
>> run.bash:
>> #!/bin/bash
>> CP=$1
>> shift
>> [[ -e "$CP" ]] || {
>>     echo "no file found: $CP" 1>&2
>>     exit 1
>> }
>> N=8
>> S=drop-src
>> D=${HOME}/drop-dst
>> mkdir -p $S
>> mkdir -p $D
>> start=
>> end=
>> print_cached()
>> {
>>     grep ^Cached: /proc/meminfo
>> }
>> start()
>> {
>>     start=$(print_cached | awk '{print $2}')
>> }
>> end()
>> {
>>     end=$(print_cached | awk '{print $2}')
>> }
>> report()
>> {
>>     echo -n "delta[$N:$1/$2]: "
>>     expr "$end" - "$start"
>> }
>> cleanup()
>> {
>>     local i
>>     local j
>>     for ((i = 0; i < 10; i++)); do
>>         for ((j = 0; j < 10; j++)); do
>>             rm -f $S/F-${i}${j}*
>>             rm -f $D/F-${i}${j}*
>>         done
>>     done
>>     rm -f $S/F-*
>>     rm -f $D/F-*
>> }
>> prep()
>> {
>>     local i
>>     for ((i = 0; i < 1024 * $N; i++ )); do
>>         if ! dd if=/dev/zero of=$S/F-$i bs=4096 count=2 \
>>                 status=none; then
>>             echo "failed in dd of=$S/F-$i" 1>&2
>>             exit 1
>>         fi
>>     done
>>     sync
>> }
>> run_cp()
>> {
>>     start
>>     local i
>>     time for ((i = 0; i < 1024 * $N; i++ )); do
>>         if ! "${CP}" "$@" "$S/F-$i" "$D/F-$i"; then
>>             echo "failed in cp " "$@" "$S/F-$i" " $D/F-$i" 1>&2
>>             exit 1
>>         fi
>>     done
>>     end
>>     report "$1" $2
>> }
>> cleanup
>> sync
>> prep
>> run_cp "$@"
>> running:
>> ~/coreutils/nocache$ ./run.bash ../src/cp
>> real 0m16.051s
>> user 0m4.249s
>> sys 0m12.437s
>> delta[8:/]: 65548
>> ~/coreutils/nocache$ ./run.bash ../src/cp --nocache-source
>> real 0m17.109s
>> user 0m4.492s
>> sys 0m13.317s
>> delta[8:--nocache-source/]: 620
>> The --nocache-source option greatly reduces cache consumption: the
>> reported delta drops from 65548 kB to 620 kB.
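
For scale, the size of the data set that run.bash creates can be worked out from the script's own parameters (N=8, two 4096-byte blocks per file). The sketch below only does that arithmetic; the variable names are illustrative, and it makes no claim about how the measured delta splits between source and destination pages.

```shell
# Worked arithmetic for the run.bash data set (values taken from the script).
N=8
files=$((1024 * N))                 # number of files created by prep()
kib_per_file=$((4096 * 2 / 1024))   # dd bs=4096 count=2 -> 8 KiB per file
expected_kib=$((files * kib_per_file))

# /proc/meminfo reports Cached: in kB, so this is directly comparable
# with the "delta" values printed by report().
echo "files: $files, data set: ${expected_kib} kB"
# prints "files: 8192, data set: 65536 kB"
```

The baseline delta of 65548 kB is therefore about one copy of the data set.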
>
> Thanks for the patch.
> I have some reservations/notes though...
>
> There is nothing particularly special about cp that it might need
> this option. I.e. it would be nice to be able to wrap any program so
> that it streamed data through the cache, rather than caching it
> aggressively. I'm not sure how to do that, but I'd also be reluctant
> to start adding such options to individual commands.
> Perhaps Linux' open() may gain an O_STREAM flag in future that might
> be more generally applied with a wrapper or something.
I found an interesting article: https://www.phoronix.com/news/Uncached-Buffered-IO-Linux-6.14
It seems that the RWF_DONTCACHE flag of pwritev2 and preadv2 implements
what we need.
Once Linux 6.14 is released, I will rewrite my patch based on
RWF_DONTCACHE.
Masatake YAMATO
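
RWF_DONTCACHE is a per-call flag for preadv2/pwritev2, so it cannot be exercised from a shell script directly. As a rough sketch, a test script could at least check whether the running kernel is 6.14 or later before expecting the flag to be available. The helper name kernel_at_least is hypothetical, and a new enough kernel is only a hint: whether a given filesystem honours the flag is a separate question that this check does not answer.

```shell
# kernel_at_least MAJOR MINOR [RELEASE]
# Hypothetical helper: succeeds if RELEASE (default: `uname -r`) is at
# least MAJOR.MINOR. Only the first two numeric components are compared.
kernel_at_least() {
    want_major=$1
    want_minor=$2
    release=${3:-$(uname -r)}
    major=${release%%.*}          # text before the first dot
    rest=${release#*.}            # text after the first dot
    minor=${rest%%[!0-9]*}        # leading digits of the second component
    [ "$major" -gt "$want_major" ] ||
        { [ "$major" -eq "$want_major" ] && [ "$minor" -ge "$want_minor" ]; }
}

if kernel_at_least 6 14; then
    echo "kernel is 6.14 or later"
else
    echo "kernel is older than 6.14"
fi
```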
> For single (large) files, one already has this functionality in dd.
>
> On the write side, you'd also have to worry about syncing, to make the
> drop cache advisory effective, and this could impact performance.
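
As a minimal sketch of the two points above, assuming GNU dd: the first dd call streams a single file with iflag=nocache and oflag=nocache and flushes the output with conv=fdatasync; the second call, with count=0, asks for any remaining cached pages of the destination to be dropped once they are clean. The function name copy_streaming is hypothetical, the block size is arbitrary, and the cache advice is only a request that the kernel may not fully honour.

```shell
# copy_streaming SRC DST
# Hypothetical wrapper around GNU dd; not part of the proposed cp patch.
copy_streaming() {
    src=$1
    dst=$2
    # Copy while asking the kernel to discard cache for the processed
    # portions of both files; fdatasync writes the output data to storage
    # before dd exits (this is the sync cost mentioned above).
    dd if="$src" of="$dst" bs=1M iflag=nocache oflag=nocache \
        conv=fdatasync status=none || return 1
    # count=0 with oflag=nocache requests dropping the cache for the whole
    # destination file; notrunc keeps the data just written.
    dd if=/dev/null of="$dst" oflag=nocache conv=notrunc count=0 status=none
}

# Example use (paths are placeholders):
#   copy_streaming /path/to/big-file /path/to/backup
```

The extra flush is what makes the drop request effective on the write side, since pages that are still dirty cannot be discarded.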
>
> Might this drop caches for already cached files
> that cp just happens to be copying,
> thus potentially impacting performance for other programs?
>
> If reflinking we probably would not want to do this operation,
> since we're not reading the source.
>
> thanks,
> Pádraig
>
This bug report was last modified 124 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.