GNU bug report logs - #73784
[PATCH] cp: new option --nocache-source


Package: coreutils;

Reported by: Masatake YAMATO <yamato <at> redhat.com>

Date: Sun, 13 Oct 2024 04:57:02 UTC

Severity: wishlist

Tags: patch


From: Masatake YAMATO <yamato <at> redhat.com>
To: P <at> draigBrady.com
Cc: 73784 <at> debbugs.gnu.org
Subject: bug#73784: [PATCH] cp: new option --nocache-source
Date: Fri, 03 Jan 2025 04:05:35 +0900 (JST)
From: Pádraig Brady <P <at> draigBrady.com>
Subject: Re: bug#73784: [PATCH] cp: new option --nocache-source
Date: Sun, 13 Oct 2024 15:59:27 +0100

> On 13/10/2024 05:56, Masatake YAMATO wrote:
>> When copying files, the system data cache is consumed for both the
>> source and destination files. In
>> scenarios such as creating backup files for old, unused files, it is
>> clear to users that these files will not be needed in the near
>> future. In such cases, retaining the data for these files in the cache
>> constitutes a waste of computer resources, especially when running
>> applications that require significant memory in the foreground.
>> With the new option, users will have the ability to request the
>> discarding of the system data cache, thereby avoiding the unwanted
>> swapping out of data from foreground processes.
>> I evaluated cache consumption using a script called
>> run.bash. Initially, run.bash creates many small files, each 8 KB in
>> size. It then copies these files using the cp command, both with and
>> without the specified option. Finally, it reports the difference in
>> the total size of the caches before and after the copying process.
>> run.bash:
>>      #!/bin/bash
>>      CP=$1
>>      shift
>>      [[ -e "$CP" ]] || {
>> 	echo "no file found: $CP" 1>&2
>> 	exit 1
>>      }
>>      N=8
>>      S=drop-src
>>      D=${HOME}/drop-dst
>>      mkdir -p $S
>>      mkdir -p $D
>>      start=
>>      end=
>>      print_cached()
>>      {
>> 	grep ^Cached: /proc/meminfo
>>      }
>>      start()
>>      {
>> 	start=$(print_cached | awk '{print $2}')
>>      }
>>      end()
>>      {
>> 	end=$(print_cached | awk '{print $2}')
>>      }
>>      report()
>>      {
>> 	echo -n "delta[$N:$1/$2]: "
>> 	expr "$end" - "$start"
>>      }
>>      cleanup()
>>      {
>> 	local i
>> 	local j
>> 	for ((i = 0; i < 10; i++)); do
>> 	    for ((j = 0; j < 10; j++)); do
>> 		rm -f $S/F-${i}${j}*
>> 		rm -f $D/F-${i}${j}*
>> 	    done
>> 	done
>> 	rm -f $S/F-*
>> 	rm -f $D/F-*
>>      }
>>      prep()
>>      {
>> 	local i
>> 	for ((i = 0; i < 1024 * $N; i++ )); do
>> 	    if ! dd if=/dev/zero of=$S/F-$i bs=4096 count=2 \
>> 	            status=none; then
>> 		echo "failed in dd of=$S/F-$i" 1>&2
>> 		exit 1
>> 	    fi
>> 	done
>> 	sync
>>      }
>>      run_cp()
>>      {
>> 	start
>> 	local i
>> 	time for ((i = 0; i < 1024 * $N; i++ )); do
>> 	    if ! "${CP}" "$@" "$S/F-$i" "$D/F-$i"; then
>> 		echo "failed in cp " "$@" "$S/F-$i" " $D/F-$i" 1>&2
>> 		exit 1
>> 	    fi
>> 	done
>> 	end
>> 	report "$1" $2
>>      }
>>      cleanup
>>      sync
>>      prep
>>      run_cp "$@"
>> running:
>>      ~/coreutils/nocache$  ./run.bash ../src/cp
>>      real	0m16.051s
>>      user	0m4.249s
>>      sys	0m12.437s
>>      delta[8:/]: 65548
>>      ~/coreutils/nocache$  ./run.bash ../src/cp --nocache-source
>>      real	0m17.109s
>>      user	0m4.492s
>>      sys	0m13.317s
>>      delta[8:--nocache-source/]: 620
>> The --nocache-source option massively reduces the consumption of the
>> cache (a delta of 620 versus 65548).
> 
> Thanks for the patch.
> I have some reservations/notes though...
> 
> There is nothing particularly special about cp, that it might need
> this option.
> I.e. it would be nice to be able to wrap any program so that it
> streamed data through the cache, rather than aggressively caching it.
> I'm not sure how to do that, but I'd also be reluctant to start adding
> such options to individual commands.
> Perhaps Linux's open() may gain an O_STREAM flag in future that might
> be more generally applied with a wrapper or something.
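
As a quick sanity check on the figures quoted above, the following sketch
recomputes the amount of data that run.bash copies and compares it with
the two reported deltas. It assumes the deltas are in kB (the unit that
/proc/meminfo uses for Cached:); the variable names are only illustrative
and are not part of run.bash.

```shell
#!/bin/bash
# Figures taken from the run quoted above; nothing is measured here.
N=8                              # same multiplier as in run.bash
files=$((1024 * N))              # number of files created and copied
file_kb=$((4096 * 2 / 1024))     # dd bs=4096 count=2 gives 8 kB per file
expected_kb=$((files * file_kb)) # total size of the source data in kB

plain_kb=65548                   # reported delta without the option
nocache_kb=620                   # reported delta with --nocache-source

echo "source data:            ${expected_kb} kB"
echo "delta, plain cp:        ${plain_kb} kB"
echo "delta, nocache-source:  ${nocache_kb} kB"
echo "plain/nocache ratio:    about $((plain_kb / nocache_kb))x"
```

The delta without the option is close to the size of one copy of the data,
which suggests the measurement is at least plausible; it says nothing
about which pages (source or destination) account for the growth.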

I found an interesting article: https://www.phoronix.com/news/Uncached-Buffered-IO-Linux-6.14

It seems that the RWF_DONTCACHE flag of pwritev2 and preadv2 implements
what we need.

When Linux 6.14 is released, I will rewrite my patch based on
RWF_DONTCACHE.

Masatake YAMATO

> For single (large) files, one already has this functionality in dd.
> 
> On the write side, you'd also have to worry about syncing, to make the
> drop cache advisory effective, and this could impact performance.
> 
> Might this drop caches for already cached files
> that cp just happens to be copying,
> thus potentially impacting the performance of other programs?
> 
> If reflinking we probably would not want to do this operation,
> since we're not reading the source.
> 
> thanks,
> Pádraig
>
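
For reference, the dd functionality mentioned above for single (large)
files can be sketched as follows. This assumes GNU dd from coreutils,
whose iflag=/oflag= accept "nocache"; the scratch directory and file
names are hypothetical, and the 4 MiB size is only chosen to keep the
example quick.

```shell
#!/bin/bash
# Assumes GNU dd (coreutils); other dd implementations may reject "nocache".
set -e

workdir=$(mktemp -d)             # hypothetical scratch directory
trap 'rm -rf "$workdir"' EXIT    # remove it when the script exits
src=$workdir/big-src
dst=$workdir/big-dst

# Create a 4 MiB source file and flush it to disk.
dd if=/dev/zero of="$src" bs=1M count=4 status=none
sync

# Copy while asking the kernel to discard cached data for both files.
# oflag=sync makes each write synchronous, so the written pages are clean
# and the drop-cache advice can take effect on the output side.
dd if="$src" of="$dst" bs=1M iflag=nocache oflag=nocache,sync status=none

# Verify that the copy is byte-for-byte identical.
cmp "$src" "$dst" && echo "copy ok"
```

The nocache flag is advisory: the kernel may keep some pages cached, and
the synchronous writes trade throughput for a smaller cache footprint,
which is the performance concern raised above for the write side.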





This bug report was last modified 124 days ago.


