GNU bug report logs - #73784
[PATCH] cp: new option --nocache-source

Package: coreutils;

Reported by: Masatake YAMATO <yamato <at> redhat.com>

Date: Sun, 13 Oct 2024 04:57:02 UTC

Severity: wishlist

Tags: patch


Message #8 received at 73784 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Masatake YAMATO <yamato <at> redhat.com>, 73784 <at> debbugs.gnu.org
Subject: Re: bug#73784: [PATCH] cp: new option --nocache-source
Date: Sun, 13 Oct 2024 15:59:27 +0100
On 13/10/2024 05:56, Masatake YAMATO wrote:
> When copying files, the system data cache is used for both the
> source and destination files. In
> scenarios such as creating backup files for old, unused files, it is
> clear to users that these files will not be needed in the near
> future. In such cases, retaining the data for these files in the cache
> constitutes a waste of computer resources, especially when running
> applications that require significant memory in the foreground.
> 
> With the new option, users will have the ability to request the
> discarding of the system data cache, thereby avoiding the unwanted
> swapping out of data from foreground processes.
> 
> I evaluated cache consumption using a script called
> run.bash. Initially, run.bash creates many small files, each 8 KB in
> size. It then copies these files using the cp command, both with and
> without the specified option. Finally, it reports the difference in
> the total size of the caches before and after the copying process.
> 
> run.bash:
> 
>      #!/bin/bash
>      CP=$1
>      shift
> 
>      [[ -e "$CP" ]] || {
> 	echo "no file found: $CP" 1>&2
> 	exit 1
>      }
> 
>      N=8
>      S=drop-src
>      D=${HOME}/drop-dst
> 
>      mkdir -p $S
>      mkdir -p $D
> 
>      start=
>      end=
>      print_cached()
>      {
> 	grep ^Cached: /proc/meminfo
>      }
> 
>      start()
>      {
> 	start=$(print_cached | awk '{print $2}')
>      }
> 
>      end()
>      {
> 	end=$(print_cached | awk '{print $2}')
>      }
> 
>      report()
>      {
> 	echo -n "delta[$N:$1/$2]: "
> 	expr "$end" - "$start"
>      }
> 
>      cleanup()
>      {
> 	local i
> 	local j
> 	for ((i = 0; i < 10; i++)); do
> 	    for ((j = 0; j < 10; j++)); do
> 		rm -f $S/F-${i}${j}*
> 		rm -f $D/F-${i}${j}*
> 	    done
> 	done
> 	rm -f $S/F-*
> 	rm -f $D/F-*
>      }
> 
>      prep()
>      {
> 	local i
> 	for ((i = 0; i < 1024 * $N; i++ )); do
> 	    if ! dd if=/dev/zero of=$S/F-$i bs=4096 count=2 \
> 	            status=none; then
> 		echo "failed in dd of=$S/F-$i" 1>&2
> 		exit 1
> 	    fi
> 	done
> 	sync
>      }
> 
>      run_cp()
>      {
> 	start
> 
> 	local i
> 	time for ((i = 0; i < 1024 * $N; i++ )); do
> 	    if ! "${CP}" "$@" "$S/F-$i" "$D/F-$i"; then
> 		echo "failed in cp " "$@" "$S/F-$i" " $D/F-$i" 1>&2
> 		exit 1
> 	    fi
> 	done
> 
> 	end
> 	report "$1" $2
>      }
> 
>      cleanup
>      sync
> 
>      prep
>      run_cp "$@"
> 
> running:
> 
>      ~/coreutils/nocache$  ./run.bash ../src/cp
> 
>      real	0m16.051s
>      user	0m4.249s
>      sys	0m12.437s
>      delta[8:/]: 65548
>      ~/coreutils/nocache$  ./run.bash ../src/cp --nocache-source
> 
>      real	0m17.109s
>      user	0m4.492s
>      sys	0m13.317s
>      delta[8:--nocache-source/]: 620
> 
> The --nocache-source option reduces cache consumption substantially
> (a delta of 620 versus 65548 in the runs above).

Thanks for the patch.
I have some reservations/notes though...

There is nothing particularly special about cp that means it needs this option.
I.e., it would be nice to be able to wrap any program so that it streams
data through the cache rather than caching it aggressively.  I'm not sure how to do that,
but I'd also be reluctant to start adding such options to individual commands.
Perhaps Linux's open() may gain an O_STREAM flag in future that could be
applied more generally, with a wrapper or something.

For single (large) files, one already has this functionality in dd.
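
As a rough sketch of the dd approach, assuming GNU dd with its
documented nocache flags, something like the following streams one file
through the cache. The stream_copy helper name and the temporary paths
are made up for illustration only.

```shell
#!/bin/bash
# Sketch only: copy one file with GNU dd while asking the kernel to
# discard cached pages for both the input and the output.
# stream_copy is a hypothetical helper name, not part of coreutils.
stream_copy()
{
    # iflag=nocache / oflag=nocache request that cached data be dropped;
    # oflag=sync makes each write synchronous, so written pages are
    # clean and therefore eligible to be discarded.
    dd if="$1" of="$2" bs=1M iflag=nocache oflag=nocache,sync status=none
}

# Demonstration on throwaway files (paths are assumptions).
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
src="$tmpdir/src"
dst="$tmpdir/dst"
head -c 1048576 /dev/zero > "$src"
stream_copy "$src" "$dst"
```

The nocache request is advisory, so the kernel may still keep some pages
cached.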

On the write side, you'd also have to worry about syncing, to make the
drop cache advisory effective, and this could impact performance.
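
To illustrate the syncing concern with existing tools, here is a
minimal sketch, again assuming GNU dd, that flushes an already written
file and then requests that its cached pages be dropped. The
drop_written_cache name and the temporary path are hypothetical.

```shell
#!/bin/bash
# Sketch only: flush a written file to storage, then ask the kernel to
# discard its cached pages. Dirty pages cannot be dropped, hence the sync.
# drop_written_cache is a hypothetical helper name.
drop_written_cache()
{
    # count=0 copies no data; conv=notrunc keeps the existing contents;
    # conv=fdatasync flushes the file's data; oflag=nocache requests that
    # the cache for the whole file be discarded.
    dd if=/dev/null of="$1" oflag=nocache conv=notrunc,fdatasync \
       count=0 status=none
}

# Demonstration on a throwaway file (path is an assumption).
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
out="$tmpdir/out"
head -c 65536 /dev/zero > "$out"
drop_written_cache "$out"
```

The fdatasync step is where the performance cost mentioned above would
come from, since it waits for the data to reach storage.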

Might this drop caches for already cached files
that cp just happens to be copying,
thus potentially impacting performance for other programs?

When reflinking, we probably would not want to do this operation,
since we're not reading the source.

thanks,
Pádraig




This bug report was last modified 124 days ago.