GNU bug report logs - #73784
[PATCH] cp: new option --nocache-source


Package: coreutils;

Reported by: Masatake YAMATO <yamato <at> redhat.com>

Date: Sun, 13 Oct 2024 04:57:02 UTC

Severity: wishlist

Tags: patch

To reply to this bug, email your comments to 73784 AT debbugs.gnu.org.


Report forwarded to bug-coreutils <at> gnu.org:
bug#73784; Package coreutils. (Sun, 13 Oct 2024 04:57:02 GMT)

Acknowledgement sent to Masatake YAMATO <yamato <at> redhat.com>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Sun, 13 Oct 2024 04:57:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Masatake YAMATO <yamato <at> redhat.com>
To: bug-coreutils <at> gnu.org
Cc: yamato <at> redhat.com
Subject: [PATCH] cp: new option --nocache-source
Date: Sun, 13 Oct 2024 13:56:11 +0900
When copying files, the system data cache is consumed for both the
source and destination files. In
scenarios such as creating backup files for old, unused files, it is
clear to users that these files will not be needed in the near
future. In such cases, retaining the data for these files in the cache
constitutes a waste of computer resources, especially when running
applications that require significant memory in the foreground.

With the new option, users will have the ability to request the
discarding of the system data cache, thereby avoiding the unwanted
swapping out of data from foreground processes.
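
The request itself can be sketched in plain C. This is only an
illustration, assuming a Linux/POSIX system where posix_fadvise() is
available; the function name read_then_drop_cache is hypothetical and
is not part of the patch, which calls the coreutils helper fdadvise()
instead.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read PATH to the end, then advise the kernel
   that its cached pages are no longer needed.
   Returns -1 if open or read fails, otherwise the result of
   posix_fadvise (0 on success, an errno value on failure).  */
int read_then_drop_cache(const char *path)
{
  char buf[65536];
  ssize_t n;
  int rc;
  int fd = open(path, O_RDONLY);

  if (fd < 0)
    return -1;

  /* Stand-in for the copy loop: just consume the data.  */
  while ((n = read(fd, buf, sizeof buf)) > 0)
    continue;

  /* offset 0 and length 0 mean "the whole file".  */
  rc = (n < 0) ? -1 : posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);

  close(fd);
  return rc;
}
```

The advice is only a hint: the kernel may keep pages that are dirty,
mapped, or in use by another process.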

I evaluated cache consumption using a script called
run.bash. Initially, run.bash creates many small files, each 8 KB in
size. It then copies these files using the cp command, both with and
without the specified option. Finally, it reports the difference in
the total size of the caches before and after the copying process.

run.bash:

    #!/bin/bash
    CP=$1
    shift

    [[ -e "$CP" ]] || {
	echo "no file found: $CP" 1>&2
	exit 1
    }

    N=8
    S=drop-src
    D=${HOME}/drop-dst

    mkdir -p $S
    mkdir -p $D

    start=
    end=
    print_cached()
    {
	grep ^Cached: /proc/meminfo
    }

    start()
    {
	start=$(print_cached | awk '{print $2}')
    }

    end()
    {
	end=$(print_cached | awk '{print $2}')
    }

    report()
    {
	echo -n "delta[$N:$1/$2]: "
	expr "$end" - "$start"
    }

    cleanup()
    {
	local i
	local j
	for ((i = 0; i < 10; i++)); do
	    for ((j = 0; j < 10; j++)); do
		rm -f $S/F-${i}${j}*
		rm -f $D/F-${i}${j}*
	    done
	done
	rm -f $S/F-*
	rm -f $D/F-*
    }

    prep()
    {
	local i
	for ((i = 0; i < 1024 * $N; i++ )); do
	    if ! dd if=/dev/zero of=$S/F-$i bs=4096 count=2 \
	            status=none; then
		echo "failed in dd of=$S/F-$i" 1>&2
		exit 1
	    fi
	done
	sync
    }

    run_cp()
    {
	start

	local i
	time for ((i = 0; i < 1024 * $N; i++ )); do
	    if ! "${CP}" "$@" "$S/F-$i" "$D/F-$i"; then
		echo "failed in cp " "$@" "$S/F-$i" " $D/F-$i" 1>&2
		exit 1
	    fi
	done

	end
	report "$1" $2
    }

    cleanup
    sync

    prep
    run_cp "$@"

running:

    ~/coreutils/nocache$  ./run.bash ../src/cp

    real	0m16.051s
    user	0m4.249s
    sys	0m12.437s
    delta[8:/]: 65548
    ~/coreutils/nocache$  ./run.bash ../src/cp --nocache-source

    real	0m17.109s
    user	0m4.492s
    sys	0m13.317s
    delta[8:--nocache-source/]: 620

In this test, the --nocache-source option reduces the growth of the
cache from 65548 kB to 620 kB.

* src/copy.h (struct cp_options): New member 'src_nocache'.
* src/copy.c (copy_reg): Call 'fdadvise (..., FADVISE_DONTNEED)'
if the new member is true.
* src/cp.c (NOCACHE_SOURCE_OPTION):
New enumerator.
(long_opts): Add the option.
(usage): Document the option.
(main): Support the option.
* doc/coreutils.texi (mv invocation): Document the option.
* NEWS: Mention the option.
---
 NEWS               |  3 +++
 doc/coreutils.texi |  4 ++++
 src/copy.c         |  2 ++
 src/copy.h         |  3 +++
 src/cp.c           | 13 +++++++++++++
 5 files changed, 25 insertions(+)

diff --git a/NEWS b/NEWS
index 8fea2657b..483c79e91 100644
--- a/NEWS
+++ b/NEWS
@@ -46,6 +46,9 @@ GNU coreutils NEWS                                    -*- outline -*-
   %<i>$ format, where '<i>' is an integer referencing a particular argument,
   thus allowing repetition or reordering of printf arguments.
 
+  cp now accepts --nocache-source option, which causes cp to request to discard
+  the system data cache for the source files.
+
 ** Improvements
 
   'head -c NUM', 'head -n NUM', 'nl -l NUM', 'nproc --ignore NUM',
diff --git a/doc/coreutils.texi b/doc/coreutils.texi
index d5dd55092..e5bdc70ff 100644
--- a/doc/coreutils.texi
+++ b/doc/coreutils.texi
@@ -10305,6 +10305,10 @@ skip existing files but not fail.
 If a file cannot be renamed because the destination file system differs,
 fail with a diagnostic instead of copying and then removing the file.
 
+@item --nocache-source
+@opindex --nocache-source
+Request to discard the system data cache for the source files.
+
 @item --exchange
 @opindex --exchange
 Exchange source and destination instead of renaming source to destination.
diff --git a/src/copy.c b/src/copy.c
index b1ac52c79..1c2ed367d 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -1705,6 +1705,8 @@ close_src_and_dst_desc:
       return_val = false;
     }
 close_src_desc:
+  if (x->src_nocache)
+    fdadvise (source_desc, 0, 0, FADVISE_DONTNEED);
   if (close (source_desc) < 0)
     {
       error (0, errno, _("failed to close %s"), quoteaf (src_name));
diff --git a/src/copy.h b/src/copy.h
index ab89c75fd..1042ecf7b 100644
--- a/src/copy.h
+++ b/src/copy.h
@@ -302,6 +302,9 @@ struct cp_options
 
   /* FIXME */
   Hash_table *src_info;
+
+  /* Request to discard cache for the source files. */
+  bool src_nocache;
 };
 
 bool copy (char const *src_name, char const *dst_name,
diff --git a/src/cp.c b/src/cp.c
index 127b5603f..10d66fd80 100644
--- a/src/cp.c
+++ b/src/cp.c
@@ -63,6 +63,8 @@ enum
   COPY_CONTENTS_OPTION,
   DEBUG_OPTION,
   NO_PRESERVE_ATTRIBUTES_OPTION,
+  NOCACHE_DESTINATION_OPTION,
+  NOCACHE_SOURCE_OPTION,
   PARENTS_OPTION,
   PRESERVE_ATTRIBUTES_OPTION,
   REFLINK_OPTION,
@@ -127,6 +129,8 @@ static struct option const long_opts[] =
   {"no-dereference", no_argument, nullptr, 'P'},
   {"no-preserve", required_argument, nullptr, NO_PRESERVE_ATTRIBUTES_OPTION},
   {"no-target-directory", no_argument, nullptr, 'T'},
+  {"nocache-destination", no_argument, nullptr, NOCACHE_DESTINATION_OPTION},
+  {"nocache-source", no_argument, nullptr, NOCACHE_SOURCE_OPTION},
   {"one-file-system", no_argument, nullptr, 'x'},
   {"parents", no_argument, nullptr, PARENTS_OPTION},
   {"path", no_argument, nullptr, PARENTS_OPTION},   /* Deprecated.  */
@@ -197,6 +201,9 @@ Copy SOURCE to DEST, or multiple SOURCE(s) to DIRECTORY.\n\
       fputs (_("\
   -n, --no-clobber             (deprecated) silently skip existing files.\n\
                                  See also --update\n\
+"), stdout);
+      fputs (_("\
+      --nocache-source         request to discard the cache for the source files.\n\
 "), stdout);
       fputs (_("\
   -P, --no-dereference         never follow symbolic links in SOURCE\n\
@@ -875,6 +882,8 @@ cp_option_init (struct cp_options *x)
 
   x->dest_info = nullptr;
   x->src_info = nullptr;
+
+  x->src_nocache = false;
 }
 
 /* Given a string, ARG, containing a comma-separated list of arguments
@@ -1088,6 +1097,10 @@ main (int argc, char **argv)
           decode_preserve_arg (optarg, &x, false);
           break;
 
+        case NOCACHE_SOURCE_OPTION:
+          x.src_nocache = true;
+          break;
+
         case PRESERVE_ATTRIBUTES_OPTION:
           if (optarg == nullptr)
             {
-- 
2.46.2





Information forwarded to bug-coreutils <at> gnu.org:
bug#73784; Package coreutils. (Sun, 13 Oct 2024 15:01:01 GMT)

Message #8 received at 73784 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Masatake YAMATO <yamato <at> redhat.com>, 73784 <at> debbugs.gnu.org
Subject: Re: bug#73784: [PATCH] cp: new option --nocache-source
Date: Sun, 13 Oct 2024 15:59:27 +0100
Thanks for the patch.
I have some reservations/notes though...

There is nothing particularly special about cp that would make it need this option.
I.e., it would be nice to be able to wrap any program so that it streams
data through the cache rather than caching it aggressively.  I'm not sure how to do that,
but I'd be reluctant to start adding such options to individual commands.
Perhaps Linux's open() may gain an O_STREAM flag in future that could be
applied more generally with a wrapper or something.

For single (large) files, one already has this functionality in dd.

On the write side, you'd also have to worry about syncing, to make the
drop cache advisory effective, and this could impact performance.
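
The write-side concern can be sketched as follows. This is a minimal
illustration, assuming Linux behaviour where POSIX_FADV_DONTNEED does
not discard pages that are still dirty; the function name
sync_then_drop_cache is hypothetical and nothing like it appears in
the patch.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper for a destination descriptor: flush the file
   data to storage first, so that its pages are clean, and only then
   advise the kernel to drop them.
   Returns -1 if fdatasync fails, otherwise the result of
   posix_fadvise (0 on success, an errno value on failure).  */
int sync_then_drop_cache(int fd)
{
  /* This is the step that costs time: it waits for writeback.  */
  if (fdatasync(fd) != 0)
    return -1;

  /* offset 0 and length 0 mean "the whole file".  */
  return posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
}
```

Calling this once per destination file would add a synchronous flush
to every copy, which is the performance impact mentioned above.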

Might this drop caches for already cached files
that cp just happens to be copying,
thus potentially impacting performance for other programs?

If reflinking we probably would not want to do this operation,
since we're not reading the source.

thanks,
Pádraig




Information forwarded to bug-coreutils <at> gnu.org:
bug#73784; Package coreutils. (Thu, 02 Jan 2025 19:06:03 GMT)

Message #11 received at 73784 <at> debbugs.gnu.org (full text, mbox):

From: Masatake YAMATO <yamato <at> redhat.com>
To: P <at> draigBrady.com
Cc: 73784 <at> debbugs.gnu.org
Subject: Re: bug#73784: [PATCH] cp: new option --nocache-source
Date: Fri, 03 Jan 2025 04:05:35 +0900 (JST)
From: Pádraig Brady <P <at> draigBrady.com>
Subject: Re: bug#73784: [PATCH] cp: new option --nocache-source
Date: Sun, 13 Oct 2024 15:59:27 +0100

> Thanks for the patch.
> I have some reservations/notes though...
> 
> There is nothing particularly special about cp that would make it need this option.
> I.e., it would be nice to be able to wrap any program so that it streams
> data through the cache rather than caching it aggressively.  I'm not sure how to do that,
> but I'd be reluctant to start adding such options to individual commands.
> Perhaps Linux's open() may gain an O_STREAM flag in future that could be
> applied more generally with a wrapper or something.

I found an interesting article: https://www.phoronix.com/news/Uncached-Buffered-IO-Linux-6.14

It seems that the RWF_DONTCACHE flag of pwritev2() and preadv2()
implements what we need.

When Linux 6.14 is released, I will rewrite my patch based on
RWF_DONTCACHE.
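
A rough sketch of what a read with this flag might look like, not the
planned patch. It assumes glibc's preadv2() wrapper; the function name
read_uncached is hypothetical, and the fallback definition of
RWF_DONTCACHE is my assumption of the value in the Linux 6.14 uapi
headers, for systems whose headers predate it.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/uio.h>

#ifndef RWF_DONTCACHE
/* Assumed value from the Linux 6.14 uapi headers.  */
#define RWF_DONTCACHE 0x00000080
#endif

/* Hypothetical helper: read up to LEN bytes at OFFSET, asking the
   kernel not to keep the data in the page cache.  If the kernel or
   file system rejects the flag, retry as an ordinary cached read.
   Returns the number of bytes read, or -1 with errno set.  */
ssize_t read_uncached(int fd, void *buf, size_t len, off_t offset)
{
  struct iovec iov = { .iov_base = buf, .iov_len = len };
  ssize_t n = preadv2(fd, &iov, 1, offset, RWF_DONTCACHE);

  if (n < 0 && (errno == EOPNOTSUPP || errno == EINVAL || errno == ENOSYS))
    n = preadv2(fd, &iov, 1, offset, 0);

  return n;
}
```

Unlike the fadvise approach, the request travels with each read, so
pages that were already cached by other programs should not be
affected.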

Masatake YAMATO






Severity set to 'wishlist' from 'normal'. Request was from Paul Eggert <eggert <at> cs.ucla.edu> to control <at> debbugs.gnu.org. (Sun, 16 Feb 2025 06:59:03 GMT)

This bug report was last modified 124 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.