From debbugs-submit-bounces@debbugs.gnu.org Tue Jan 16 07:48:46 2024
From: Tomas Volf <~@wolfsden.cz>
To: bug-guile@gnu.org
Subject: [PATCH] Add copy-on-write support to scm_copy_file.
Date: Tue, 16 Jan 2024 13:48:17 +0100
Message-ID: <20240116124817.14680-1-~@wolfsden.cz>
Cc: Tomas Volf <~@wolfsden.cz>

On modern file-systems (BTRFS, ZFS) it is possible to copy a file using
a copy-on-write method. For large files it has the advantage of being
much faster and saving disk space (since identical extents are not
duplicated). This feature is stable and, for example, coreutils' `cp'
uses it automatically (see --reflink).

This commit adds support for this feature to our
copy-file (scm_copy_file) procedure. Like `cp', it defaults to
'auto, meaning that copy-on-write is attempted and, if it fails,
a regular copy is performed.

No tests are provided, because the behavior depends on the system, the
underlying file-system and its configuration. That makes it challenging
to write a test for it.
Manual testing was performed instead:

    $ btrfs filesystem du /tmp/cow*
         Total   Exclusive  Set shared  Filename
      36.00KiB    36.00KiB       0.00B  /tmp/cow

    $ cat cow-test.scm
    (copy-file "/tmp/cow" "/tmp/cow-unspecified")
    (copy-file "/tmp/cow" "/tmp/cow-always" #:copy-on-write 'always)
    (copy-file "/tmp/cow" "/tmp/cow-auto" #:copy-on-write 'auto)
    (copy-file "/tmp/cow" "/tmp/cow-never" #:copy-on-write 'never)
    (copy-file "/tmp/cow" "/dev/shm/cow-unspecified")
    (copy-file "/tmp/cow" "/dev/shm/cow-auto" #:copy-on-write 'auto)
    (copy-file "/tmp/cow" "/dev/shm/cow-never" #:copy-on-write 'never)
    $ ./meta/guile -s cow-test.scm

    $ btrfs filesystem du /tmp/cow*
         Total   Exclusive  Set shared  Filename
      36.00KiB       0.00B    36.00KiB  /tmp/cow
      36.00KiB       0.00B    36.00KiB  /tmp/cow-always
      36.00KiB       0.00B    36.00KiB  /tmp/cow-auto
      36.00KiB    36.00KiB       0.00B  /tmp/cow-never
      36.00KiB       0.00B    36.00KiB  /tmp/cow-unspecified

    $ sha1sum /tmp/cow* /dev/shm/cow*
    4c665f87b5dc2e7d26279c4b48968d085e1ace32  /tmp/cow
    4c665f87b5dc2e7d26279c4b48968d085e1ace32  /tmp/cow-always
    4c665f87b5dc2e7d26279c4b48968d085e1ace32  /tmp/cow-auto
    4c665f87b5dc2e7d26279c4b48968d085e1ace32  /tmp/cow-never
    4c665f87b5dc2e7d26279c4b48968d085e1ace32  /tmp/cow-unspecified
    4c665f87b5dc2e7d26279c4b48968d085e1ace32  /dev/shm/cow-auto
    4c665f87b5dc2e7d26279c4b48968d085e1ace32  /dev/shm/cow-never
    4c665f87b5dc2e7d26279c4b48968d085e1ace32  /dev/shm/cow-unspecified

This commit also adds two new failure modes for (copy-file).

Failure to copy-on-write when 'always was passed in:

    scheme@(guile-user)> (copy-file "/tmp/cow" "/dev/shm/cow" #:copy-on-write 'always)
    ice-9/boot-9.scm:1676:22: In procedure raise-exception:
    In procedure copy-file: copy-on-write failed: Invalid cross-device link

Passing in invalid value for the #:copy-on-write keyword argument:

    scheme@(guile-user)> (copy-file "/tmp/cow" "/dev/shm/cow" #:copy-on-write 'nevr)
    ice-9/boot-9.scm:1676:22: In procedure raise-exception:
    In procedure copy-file: invalid value for #:copy-on-write: nevr

* NEWS: Add note for copy-file supporting copy-on-write.
* configure.ac: Check for linux/fs.h.
* doc/ref/posix.texi (File System)[copy-file]: Document the new
signature.
* libguile/filesys.c (clone_file): New function cloning a file using
FICLONE, if supported.
(k_copy_on_write): New keyword.
(sym_always, sym_auto, sym_never): New symbols.
(scm_copy_file): New #:copy-on-write keyword argument. Attempt
copy-on-write copy by default.
* libguile/filesys.h: Update signature for scm_copy_file.
---
 NEWS               |  9 ++++++
 configure.ac       |  1 +
 doc/ref/posix.texi |  9 +++++-
 libguile/filesys.c | 74 +++++++++++++++++++++++++++++++++++++++-------
 libguile/filesys.h |  2 +-
 5 files changed, 82 insertions(+), 13 deletions(-)

diff --git a/NEWS b/NEWS
index b319404d7..9147098c9 100644
--- a/NEWS
+++ b/NEWS
@@ -21,6 +21,15 @@ definitely unused---this is notably the case for modules that are
 only used at macro-expansion time, such as (srfi srfi-26). In those
 cases, the compiler reports it as "possibly unused".
 
+** copy-file now supports copy-on-write
+
+The copy-file procedure now takes an additional keyword argument,
+#:copy-on-write, specifying whether copy-on-write should be done, if the
+underlying file-system supports it. Possible values are 'always, 'auto
+and 'never, with 'auto being the default.
+
+This speeds up copying large files a lot while saving the disk space.
+
 * Bug fixes
 
 ** (ice-9 suspendable-ports) incorrect UTF-8 decoding
diff --git a/configure.ac b/configure.ac
index d0a2dc79b..c46586e9b 100644
--- a/configure.ac
+++ b/configure.ac
@@ -418,6 +418,7 @@ AC_SUBST([SCM_I_GSC_HAVE_STRUCT_DIRENT64])
 # sys/sendfile.h - non-POSIX, found in glibc
 #
 AC_CHECK_HEADERS([complex.h fenv.h io.h memory.h process.h \
+linux/fs.h \
 sys/dir.h sys/ioctl.h sys/select.h \
 sys/time.h sys/timeb.h sys/times.h sys/stdtypes.h sys/types.h \
 sys/utime.h unistd.h utime.h pwd.h grp.h sys/utsname.h \
diff --git a/doc/ref/posix.texi b/doc/ref/posix.texi
index fec42d061..d26808d91 100644
--- a/doc/ref/posix.texi
+++ b/doc/ref/posix.texi
@@ -896,10 +896,17 @@ of @code{delete-file}. Why doesn't POSIX have a @code{rmdirat} function
 for this instead? No idea!
 @end deffn
 
-@deffn {Scheme Procedure} copy-file oldfile newfile
+@deffn {Scheme Procedure} copy-file @var{oldfile} @var{newfile} @
+  [#:copy-on-write='auto]
 @deffnx {C Function} scm_copy_file (oldfile, newfile)
 Copy the file specified by @var{oldfile} to @var{newfile}.
 The return value is unspecified.
+
+@code{#:copy-on-write} keyword argument determines whether copy-on-write
+copy should be attempted and the behavior in case of failure. Possible
+values are @code{'always} (attempt the copy-on-write, return error if it
+fails), @code{'auto} (attempt the copy-on-write, fallback to regular
+copy if it fails) and @code{'never} (perform the regular copy).
 @end deffn
 
 @deffn {Scheme Procedure} sendfile out in count [offset]
diff --git a/libguile/filesys.c b/libguile/filesys.c
index 1f0bba556..4fb8b9831 100644
--- a/libguile/filesys.c
+++ b/libguile/filesys.c
@@ -67,6 +67,11 @@
 # include
 #endif
 
+#if defined(HAVE_SYS_IOCTL_H) && defined(HAVE_LINUX_FS_H)
+# include <linux/fs.h>
+# include <sys/ioctl.h>
+#endif
+
 #include "async.h"
 #include "boolean.h"
 #include "dynwind.h"
@@ -75,6 +80,7 @@
 #include "fports.h"
 #include "gsubr.h"
 #include "iselect.h"
+#include "keywords.h"
 #include "list.h"
 #include "load.h" /* for scm_i_mirror_backslashes */
 #include "modules.h"
@@ -1255,20 +1261,49 @@ SCM_DEFINE (scm_readlink, "readlink", 1, 0, 0,
 }
 #undef FUNC_NAME
 
-SCM_DEFINE (scm_copy_file, "copy-file", 2, 0, 0,
-            (SCM oldfile, SCM newfile),
+static int
+clone_file (int oldfd, int newfd)
+{
+#ifdef FICLONE
+  return ioctl (newfd, FICLONE, oldfd);
+#else
+  (void)oldfd;
+  (void)newfd;
+  errno = EOPNOTSUPP;
+  return -1;
+#endif
+}
+
+SCM_KEYWORD (k_copy_on_write, "copy-on-write");
+SCM_SYMBOL (sym_always, "always");
+SCM_SYMBOL (sym_auto, "auto");
+SCM_SYMBOL (sym_never, "never");
+
+SCM_DEFINE (scm_copy_file, "copy-file", 2, 0, 1,
+            (SCM oldfile, SCM newfile, SCM rest),
             "Copy the file specified by @var{oldfile} to @var{newfile}.\n"
-            "The return value is unspecified.")
+            "The return value is unspecified.\n"
+            "\n"
+            "@code{#:copy-on-write} keyword argument determines whether "
+            "copy-on-write copy should be attempted and the "
+            "behavior in case of failure. Possible values are "
+            "@code{'always} (attempt the copy-on-write, return error if "
+            "it fails), @code{'auto} (attempt the copy-on-write, "
+            "fallback to regular copy if it fails) and @code{'never} "
+            "(perform the regular copy)."
+            )
 #define FUNC_NAME s_scm_copy_file
 {
   char *c_oldfile, *c_newfile;
   int oldfd, newfd;
   int n, rv;
+  SCM cow = sym_auto;
+  int clone_res;
   char buf[BUFSIZ];
   struct stat_or_stat64 oldstat;
 
   scm_dynwind_begin (0);
-  
+
   c_oldfile = scm_to_locale_string (oldfile);
   scm_dynwind_free (c_oldfile);
   c_newfile = scm_to_locale_string (newfile);
@@ -1292,13 +1327,30 @@ SCM_DEFINE (scm_copy_file, "copy-file", 2, 0, 0,
       SCM_SYSERROR;
     }
 
-  while ((n = read (oldfd, buf, sizeof buf)) > 0)
-    if (write (newfd, buf, n) != n)
-      {
-        close (oldfd);
-        close (newfd);
-        SCM_SYSERROR;
-      }
+  scm_c_bind_keyword_arguments ("copy-file", rest, 0,
+                                k_copy_on_write, &cow,
+                                SCM_UNDEFINED);
+
+  if (scm_is_eq (cow, sym_always) || scm_is_eq (cow, sym_auto))
+    clone_res = clone_file(oldfd, newfd);
+  else if (scm_is_eq (cow, sym_never))
+    clone_res = -1;
+  else
+    scm_misc_error ("copy-file",
+                    "invalid value for #:copy-on-write: ~S",
+                    scm_list_1 (cow));
+
+  if (scm_is_eq (cow, sym_always) && clone_res)
+    scm_syserror ("copy-file: copy-on-write failed");
+
+  if (clone_res)
+    while ((n = read (oldfd, buf, sizeof buf)) > 0)
+      if (write (newfd, buf, n) != n)
+        {
+          close (oldfd);
+          close (newfd);
+          SCM_SYSERROR;
+        }
   close (oldfd);
   if (close (newfd) == -1)
     SCM_SYSERROR;
diff --git a/libguile/filesys.h b/libguile/filesys.h
index 1ce50d30e..4f620dfef 100644
--- a/libguile/filesys.h
+++ b/libguile/filesys.h
@@ -73,7 +73,7 @@ SCM_API SCM scm_symlink (SCM oldpath, SCM newpath);
 SCM_API SCM scm_symlinkat (SCM dir, SCM oldpath, SCM newpath);
 SCM_API SCM scm_readlink (SCM path);
 SCM_API SCM scm_lstat (SCM str);
-SCM_API SCM scm_copy_file (SCM oldfile, SCM newfile);
+SCM_API SCM scm_copy_file (SCM oldfile, SCM newfile, SCM rest);
 SCM_API SCM scm_mkstemp (SCM tmpl);
 SCM_API SCM scm_mkdtemp (SCM tmpl);
 SCM_API SCM scm_dirname (SCM filename);
-- 
2.41.0

From debbugs-submit-bounces@debbugs.gnu.org Wed Jan 24 05:27:25 2024
From: Ludovic Courtès
To: Tomas Volf <~@wolfsden.cz>
Subject: Re: bug#68504: [PATCH] Add copy-on-write support to scm_copy_file.
In-Reply-To: <20240116124817.14680-1-~@wolfsden.cz>
References: <20240116124817.14680-1-~@wolfsden.cz>
Date: Wed, 24 Jan 2024 11:26:56 +0100
Message-ID: <87le8fuivj.fsf@gnu.org>
Cc: 68504@debbugs.gnu.org

Hi,

Tomas Volf <~@wolfsden.cz> writes:

> On modern file-systems (BTRFS, ZFS) it is possible to copy a file using
> copy-on-write method. For large files it has the advantage of being
> much faster and saving disk space (since identical extents are not
> duplicated). This feature is stable and for example coreutils' `cp'
> does use it automatically (see --reflink).
>
> This commit adds support for this feature into our
> copy-file (scm_copy_file) procedure. Same as `cp', it defaults to
> 'auto, meaning the copy-on-write is attempted, and in case of failure
> the regular copy is performed.
>
> No tests are provided, because the behavior depends on the system,
> underlying file-system and its configuration. That makes it challenging
> to write a test for it.
> Manual testing was performed instead:
>
>     $ btrfs filesystem du /tmp/cow*
>          Total   Exclusive  Set shared  Filename
>       36.00KiB    36.00KiB       0.00B  /tmp/cow
>
>     $ cat cow-test.scm
>     (copy-file "/tmp/cow" "/tmp/cow-unspecified")
>     (copy-file "/tmp/cow" "/tmp/cow-always" #:copy-on-write 'always)
>     (copy-file "/tmp/cow" "/tmp/cow-auto" #:copy-on-write 'auto)
>     (copy-file "/tmp/cow" "/tmp/cow-never" #:copy-on-write 'never)
>     (copy-file "/tmp/cow" "/dev/shm/cow-unspecified")
>     (copy-file "/tmp/cow" "/dev/shm/cow-auto" #:copy-on-write 'auto)
>     (copy-file "/tmp/cow" "/dev/shm/cow-never" #:copy-on-write 'never)
>     $ ./meta/guile -s cow-test.scm
>
>     $ btrfs filesystem du /tmp/cow*
>          Total   Exclusive  Set shared  Filename
>       36.00KiB       0.00B    36.00KiB  /tmp/cow
>       36.00KiB       0.00B    36.00KiB  /tmp/cow-always
>       36.00KiB       0.00B    36.00KiB  /tmp/cow-auto
>       36.00KiB    36.00KiB       0.00B  /tmp/cow-never
>       36.00KiB       0.00B    36.00KiB  /tmp/cow-unspecified
>
>     $ sha1sum /tmp/cow* /dev/shm/cow*
>     4c665f87b5dc2e7d26279c4b48968d085e1ace32  /tmp/cow
>     4c665f87b5dc2e7d26279c4b48968d085e1ace32  /tmp/cow-always
>     4c665f87b5dc2e7d26279c4b48968d085e1ace32  /tmp/cow-auto
>     4c665f87b5dc2e7d26279c4b48968d085e1ace32  /tmp/cow-never
>     4c665f87b5dc2e7d26279c4b48968d085e1ace32  /tmp/cow-unspecified
>     4c665f87b5dc2e7d26279c4b48968d085e1ace32  /dev/shm/cow-auto
>     4c665f87b5dc2e7d26279c4b48968d085e1ace32  /dev/shm/cow-never
>     4c665f87b5dc2e7d26279c4b48968d085e1ace32  /dev/shm/cow-unspecified
>
> This commit also adds two new failure modes for (copy-file).
>
> Failure to copy-on-write when 'always was passed in:
>
>     scheme@(guile-user)> (copy-file "/tmp/cow" "/dev/shm/cow" #:copy-on-write 'always)
>     ice-9/boot-9.scm:1676:22: In procedure raise-exception:
>     In procedure copy-file: copy-on-write failed: Invalid cross-device link
>
> Passing in invalid value for the #:copy-on-write keyword argument:
>
>     scheme@(guile-user)> (copy-file "/tmp/cow" "/dev/shm/cow" #:copy-on-write 'nevr)
>     ice-9/boot-9.scm:1676:22: In procedure raise-exception:
>     In procedure copy-file: invalid value for #:copy-on-write: nevr
>
> * NEWS: Add note for copy-file supporting copy-on-write.
> * configure.ac: Check for linux/fs.h.
> * doc/ref/posix.texi (File System)[copy-file]: Document the new
> signature.
> * libguile/filesys.c (clone_file): New function cloning a file using
> FICLONE, if supported.
> (k_copy_on_write): New keyword.
> (sym_always, sym_auto, sym_never): New symbols.
> (scm_copy_file): New #:copy-on-write keyword argument. Attempt
> copy-on-write copy by default.
> * libguile/filesys.h: Update signature for scm_copy_file.

The patch looks great (and very useful) to me, modulo one issue:

> -SCM_API SCM scm_copy_file (SCM oldfile, SCM newfile);
> +SCM_API SCM scm_copy_file (SCM oldfile, SCM newfile, SCM rest);

Since this is a public interface, we cannot change this function’s
signature during the 3.0 stable series.

Thus, I would suggest keeping the public ‘scm_copy_file’ unchanged and
internally having a three-argument variant. The Scheme-level
‘copy-file’ would map to that three-argument variant. (See
‘scm_pipe’ and ‘scm_accept’ for examples.)

Could you send an updated patch?

BTW, copyright assignment to the FSF is now optional but encouraged.
Please see .

Thanks,
Ludo’.
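
The suggestion above (keep the public two-argument entry point and move
the new behavior into an internal three-argument variant) can be
sketched outside of libguile in plain C. This is only an illustration
under stated assumptions: copy_file, copy_file2, enum cow_mode, clone_fd
and copy_bytes are hypothetical names, not Guile's API, and the
clone-then-fall-back logic is a simplified stand-in for what the patch
does with FICLONE.

```c
/* Sketch only: plain C stand-ins, not libguile code.  All names below
   (copy_file, copy_file2, enum cow_mode, ...) are hypothetical.  */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/ioctl.h>
#if defined(__linux__)
# include <linux/fs.h>  /* FICLONE, where the kernel headers provide it */
#endif

enum cow_mode { COW_NEVER, COW_AUTO, COW_ALWAYS };

/* Ask the file-system to share extents between the two descriptors.
   Returns 0 on success, -1 with errno set otherwise.  */
static int
clone_fd (int oldfd, int newfd)
{
#ifdef FICLONE
  return ioctl (newfd, FICLONE, oldfd);
#else
  (void) oldfd;
  (void) newfd;
  errno = EOPNOTSUPP;
  return -1;
#endif
}

/* Regular read/write copy.  Returns 0 on success, -1 on error.  */
static int
copy_bytes (int oldfd, int newfd)
{
  char buf[BUFSIZ];
  ssize_t n;

  while ((n = read (oldfd, buf, sizeof buf)) > 0)
    {
      ssize_t off = 0;
      while (off < n)
        {
          ssize_t w = write (newfd, buf + off, (size_t) (n - off));
          if (w < 0)
            return -1;
          off += w;
        }
    }
  return n < 0 ? -1 : 0;
}

/* Internal three-argument variant: the new behavior lives here.  */
int
copy_file2 (const char *oldfile, const char *newfile, enum cow_mode mode)
{
  int oldfd, newfd, rv = -1;

  oldfd = open (oldfile, O_RDONLY);
  if (oldfd < 0)
    return -1;
  newfd = open (newfile, O_WRONLY | O_CREAT | O_TRUNC, 0644);
  if (newfd < 0)
    {
      close (oldfd);
      return -1;
    }

  if (mode != COW_NEVER && clone_fd (oldfd, newfd) == 0)
    rv = 0;                             /* extents are now shared */
  else if (mode != COW_ALWAYS)
    rv = copy_bytes (oldfd, newfd);     /* 'never, or 'auto fallback */

  close (oldfd);
  if (close (newfd) < 0)
    rv = -1;
  return rv;
}

/* Public entry point: signature unchanged, delegates with the default.  */
int
copy_file (const char *oldfile, const char *newfile)
{
  return copy_file2 (oldfile, newfile, COW_AUTO);
}
```

Existing callers of the two-argument function keep compiling and
linking unchanged, while new callers can reach the extra mode through
the three-argument variant.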

From debbugs-submit-bounces@debbugs.gnu.org Wed Jan 24 14:11:32 2024
From: Tomas Volf <~@wolfsden.cz>
To: 68504@debbugs.gnu.org
Subject: [PATCH v2] Add copy-on-write support to scm_copy_file.
Date: Wed, 24 Jan 2024 20:10:32 +0100
Message-ID: <20240124191113.3256-1-~@wolfsden.cz>
Cc: Tomas Volf <~@wolfsden.cz>

On modern file-systems (BTRFS, ZFS) it is possible to copy a file using
a copy-on-write method. For large files it has the advantage of being
much faster and saving disk space (since identical extents are not
duplicated). This feature is stable and, for example, coreutils' `cp'
uses it automatically (see --reflink).

This commit adds support for this feature to our
copy-file (scm_copy_file) procedure. Like `cp', it defaults to
'auto, meaning that copy-on-write is attempted and, if it fails,
a regular copy is performed.

No tests are provided, because the behavior depends on the system, the
underlying file-system and its configuration. That makes it challenging
to write a test for it. Manual testing was performed instead:

    $ btrfs filesystem du /tmp/cow*
         Total   Exclusive  Set shared  Filename
      36.00KiB    36.00KiB       0.00B  /tmp/cow

    $ cat cow-test.scm
    (copy-file "/tmp/cow" "/tmp/cow-unspecified")
    (copy-file "/tmp/cow" "/tmp/cow-always" #:copy-on-write 'always)
    (copy-file "/tmp/cow" "/tmp/cow-auto" #:copy-on-write 'auto)
    (copy-file "/tmp/cow" "/tmp/cow-never" #:copy-on-write 'never)
    (copy-file "/tmp/cow" "/dev/shm/cow-unspecified")
    (copy-file "/tmp/cow" "/dev/shm/cow-auto" #:copy-on-write 'auto)
    (copy-file "/tmp/cow" "/dev/shm/cow-never" #:copy-on-write 'never)
    $ ./meta/guile -s cow-test.scm

    $ btrfs filesystem du /tmp/cow*
         Total   Exclusive  Set shared  Filename
      36.00KiB       0.00B    36.00KiB  /tmp/cow
      36.00KiB       0.00B    36.00KiB  /tmp/cow-always
      36.00KiB       0.00B    36.00KiB  /tmp/cow-auto
      36.00KiB    36.00KiB       0.00B  /tmp/cow-never
      36.00KiB       0.00B    36.00KiB  /tmp/cow-unspecified

    $ sha1sum /tmp/cow* /dev/shm/cow*
    4c665f87b5dc2e7d26279c4b48968d085e1ace32  /tmp/cow
    4c665f87b5dc2e7d26279c4b48968d085e1ace32  /tmp/cow-always
    4c665f87b5dc2e7d26279c4b48968d085e1ace32  /tmp/cow-auto
    4c665f87b5dc2e7d26279c4b48968d085e1ace32  /tmp/cow-never
    4c665f87b5dc2e7d26279c4b48968d085e1ace32  /tmp/cow-unspecified
    4c665f87b5dc2e7d26279c4b48968d085e1ace32  /dev/shm/cow-auto
    4c665f87b5dc2e7d26279c4b48968d085e1ace32  /dev/shm/cow-never
    4c665f87b5dc2e7d26279c4b48968d085e1ace32  /dev/shm/cow-unspecified

This commit also adds two new failure modes for (copy-file).

Failure to copy-on-write when 'always was passed in:

    scheme@(guile-user)> (copy-file "/tmp/cow" "/dev/shm/cow" #:copy-on-write 'always)
    ice-9/boot-9.scm:1676:22: In procedure raise-exception:
    In procedure copy-file: copy-on-write failed: Invalid cross-device link

Passing in invalid value for the #:copy-on-write keyword argument:

    scheme@(guile-user)> (copy-file "/tmp/cow" "/dev/shm/cow" #:copy-on-write 'nevr)
    ice-9/boot-9.scm:1676:22: In procedure raise-exception:
    In procedure copy-file: invalid value for #:copy-on-write: nevr

* NEWS: Add note for copy-file supporting copy-on-write.
* configure.ac: Check for linux/fs.h.
* doc/ref/posix.texi (File System)[copy-file]: Document the new
signature.
* libguile/filesys.c (clone_file): New function cloning a file using
FICLONE, if supported.
(k_copy_on_write): New keyword.
(sym_always, sym_auto, sym_never): New symbols.
(scm_copy_file2): Renamed from scm_copy_file. New #:copy-on-write
keyword argument. Attempt copy-on-write copy by default.
(scm_copy_file): Call scm_copy_file2.
* libguile/filesys.h: Add scm_copy_file2 as SCM_INTERNAL.
---
v2: Introduce scm_copy_file2 in order to preserve backwards compatibility.

 NEWS               |  9 +++++
 configure.ac       |  1 +
 doc/ref/posix.texi |  9 ++++-
 libguile/filesys.c | 82 +++++++++++++++++++++++++++++++++++++++-------
 libguile/filesys.h |  1 +
 5 files changed, 89 insertions(+), 13 deletions(-)

diff --git a/NEWS b/NEWS
index b319404d7..9147098c9 100644
--- a/NEWS
+++ b/NEWS
@@ -21,6 +21,15 @@ definitely unused---this is notably the case for modules that are
 only used at macro-expansion time, such as (srfi srfi-26). In those
 cases, the compiler reports it as "possibly unused".
 
+** copy-file now supports copy-on-write
+
+The copy-file procedure now takes an additional keyword argument,
+#:copy-on-write, specifying whether copy-on-write should be done, if the
+underlying file-system supports it. Possible values are 'always, 'auto
+and 'never, with 'auto being the default.
+
+This speeds up copying large files a lot while saving the disk space.
+
 * Bug fixes
 
 ** (ice-9 suspendable-ports) incorrect UTF-8 decoding
diff --git a/configure.ac b/configure.ac
index d0a2dc79b..c46586e9b 100644
--- a/configure.ac
+++ b/configure.ac
@@ -418,6 +418,7 @@ AC_SUBST([SCM_I_GSC_HAVE_STRUCT_DIRENT64])
 # sys/sendfile.h - non-POSIX, found in glibc
 #
 AC_CHECK_HEADERS([complex.h fenv.h io.h memory.h process.h \
+linux/fs.h \
 sys/dir.h sys/ioctl.h sys/select.h \
 sys/time.h sys/timeb.h sys/times.h sys/stdtypes.h sys/types.h \
 sys/utime.h unistd.h utime.h pwd.h grp.h sys/utsname.h \
diff --git a/doc/ref/posix.texi b/doc/ref/posix.texi
index fec42d061..d26808d91 100644
--- a/doc/ref/posix.texi
+++ b/doc/ref/posix.texi
@@ -896,10 +896,17 @@ of @code{delete-file}. Why doesn't POSIX have a @code{rmdirat} function
 for this instead? No idea!
 @end deffn
 
-@deffn {Scheme Procedure} copy-file oldfile newfile
+@deffn {Scheme Procedure} copy-file @var{oldfile} @var{newfile} @
+  [#:copy-on-write='auto]
 @deffnx {C Function} scm_copy_file (oldfile, newfile)
 Copy the file specified by @var{oldfile} to @var{newfile}.
 The return value is unspecified.
+
+@code{#:copy-on-write} keyword argument determines whether copy-on-write
+copy should be attempted and the behavior in case of failure. Possible
+values are @code{'always} (attempt the copy-on-write, return error if it
+fails), @code{'auto} (attempt the copy-on-write, fallback to regular
+copy if it fails) and @code{'never} (perform the regular copy).
 @end deffn
 
 @deffn {Scheme Procedure} sendfile out in count [offset]
diff --git a/libguile/filesys.c b/libguile/filesys.c
index 1f0bba556..5be42b825 100644
--- a/libguile/filesys.c
+++ b/libguile/filesys.c
@@ -67,6 +67,11 @@
 # include
 #endif
 
+#if defined(HAVE_SYS_IOCTL_H) && defined(HAVE_LINUX_FS_H)
+# include <linux/fs.h>
+# include <sys/ioctl.h>
+#endif
+
 #include "async.h"
 #include "boolean.h"
 #include "dynwind.h"
@@ -75,6 +80,7 @@
 #include "fports.h"
 #include "gsubr.h"
 #include "iselect.h"
+#include "keywords.h"
 #include "list.h"
 #include "load.h" /* for scm_i_mirror_backslashes */
 #include "modules.h"
@@ -1255,20 +1261,49 @@ SCM_DEFINE (scm_readlink, "readlink", 1, 0, 0,
 }
 #undef FUNC_NAME
 
-SCM_DEFINE (scm_copy_file, "copy-file", 2, 0, 0,
-            (SCM oldfile, SCM newfile),
+static int
+clone_file (int oldfd, int newfd)
+{
+#ifdef FICLONE
+  return ioctl (newfd, FICLONE, oldfd);
+#else
+  (void)oldfd;
+  (void)newfd;
+  errno = EOPNOTSUPP;
+  return -1;
+#endif
+}
+
+SCM_KEYWORD (k_copy_on_write, "copy-on-write");
+SCM_SYMBOL (sym_always, "always");
+SCM_SYMBOL (sym_auto, "auto");
+SCM_SYMBOL (sym_never, "never");
+
+SCM_DEFINE (scm_copy_file2, "copy-file", 2, 0, 1,
+            (SCM oldfile, SCM newfile, SCM rest),
             "Copy the file specified by @var{oldfile} to @var{newfile}.\n"
-            "The return value is unspecified.")
-#define FUNC_NAME s_scm_copy_file
+            "The return value is unspecified.\n"
+            "\n"
+            "@code{#:copy-on-write} keyword argument determines whether "
+            "copy-on-write copy should be attempted and the "
+            "behavior in case of failure. Possible values are "
+            "@code{'always} (attempt the copy-on-write, return error if "
+            "it fails), @code{'auto} (attempt the copy-on-write, "
+            "fallback to regular copy if it fails) and @code{'never} "
+            "(perform the regular copy)."
+            )
+#define FUNC_NAME s_scm_copy_file2
 {
   char *c_oldfile, *c_newfile;
   int oldfd, newfd;
   int n, rv;
+  SCM cow = sym_auto;
+  int clone_res;
   char buf[BUFSIZ];
   struct stat_or_stat64 oldstat;
 
   scm_dynwind_begin (0);
-  
+
   c_oldfile = scm_to_locale_string (oldfile);
   scm_dynwind_free (c_oldfile);
   c_newfile = scm_to_locale_string (newfile);
@@ -1292,13 +1327,30 @@ SCM_DEFINE (scm_copy_file, "copy-file", 2, 0, 0,
       SCM_SYSERROR;
     }
 
-  while ((n = read (oldfd, buf, sizeof buf)) > 0)
-    if (write (newfd, buf, n) != n)
-      {
-        close (oldfd);
-        close (newfd);
-        SCM_SYSERROR;
-      }
+  scm_c_bind_keyword_arguments ("copy-file", rest, 0,
+                                k_copy_on_write, &cow,
+                                SCM_UNDEFINED);
+
+  if (scm_is_eq (cow, sym_always) || scm_is_eq (cow, sym_auto))
+    clone_res = clone_file(oldfd, newfd);
+  else if (scm_is_eq (cow, sym_never))
+    clone_res = -1;
+  else
+    scm_misc_error ("copy-file",
+                    "invalid value for #:copy-on-write: ~S",
+                    scm_list_1 (cow));
+
+  if (scm_is_eq (cow, sym_always) && clone_res)
+    scm_syserror ("copy-file: copy-on-write failed");
+
+  if (clone_res)
+    while ((n = read (oldfd, buf, sizeof buf)) > 0)
+      if (write (newfd, buf, n) != n)
+        {
+          close (oldfd);
+          close (newfd);
+          SCM_SYSERROR;
+        }
   close (oldfd);
   if (close (newfd) == -1)
     SCM_SYSERROR;
@@ -1308,6 +1360,12 @@ SCM_DEFINE (scm_copy_file, "copy-file", 2, 0, 0,
 }
 #undef FUNC_NAME
 
+SCM
+scm_copy_file (SCM oldfile, SCM newfile)
+{
+  return scm_copy_file2 (oldfile, newfile, SCM_EOL);
+}
+
 SCM_DEFINE (scm_sendfile, "sendfile", 3, 1, 0,
             (SCM out, SCM in, SCM count, SCM offset),
             "Send @var{count} bytes from @var{in} to @var{out}, both of which "
diff --git a/libguile/filesys.h b/libguile/filesys.h
index 1ce50d30e..8e849fe7a 100644
--- a/libguile/filesys.h
+++ b/libguile/filesys.h
@@ -74,6 +74,7 @@ SCM_API SCM scm_symlinkat (SCM dir, SCM oldpath, SCM newpath);
 SCM_API SCM scm_readlink (SCM path);
 SCM_API SCM scm_lstat (SCM str);
 SCM_API SCM scm_copy_file (SCM oldfile, SCM newfile);
+SCM_INTERNAL SCM scm_copy_file2 (SCM oldfile, SCM newfile, SCM rest);
 SCM_API SCM scm_mkstemp (SCM tmpl);
 SCM_API SCM scm_mkdtemp (SCM tmpl);
 SCM_API SCM scm_dirname (SCM filename);
-- 
2.41.0

From debbugs-submit-bounces@debbugs.gnu.org Wed Jan 24 14:16:37 2024
wolfsden.cz (Postfix) with ESMTPSA id E76AA276EBD; Wed, 24 Jan 2024 19:16:25 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=wolfsden.cz; s=mail; t=1706123786; bh=BSqPbZOfSZi+fATL1kz+lCJdqomu4NLz7XRYM/n/KSY=; h=From:To:Cc:Subject:Date; b=oNX43MXTMa69Zhrr9Ci5KAPMrC43+YI5ypqDi1SomtJbx59jvnZXNpcaWa4QKWrVp rPxzX5DR4Jf9XmGMIi+V/I8524feoX3sObKUWU74gqD/seDf4bztcQZ8vIa61KxJVF NNEo6PuSRetak6XAP2WBMscrR8RekY/19fPnQmkFwK/ez77QaUVal89bxRBNe5Yxae xyFxpbdmaRXh4Zh2DDlCsC6gINfhGhiiLu9Xnrk8O3jihh1APSasym2wBkKJM2NBMD 0N7i2+tZEn8pQDNktqZ/DWbK71PyC4dcKOrZSszZKL9omejdKtb+6Kq8ccP/hSAVit uqDD8QttzrCpyeKwT0klivQ3i0B1EK+dX37/swqZpB7CWch0VIDiLPtY3F24O8D9+Y IAt4dEKYF4fuMq9xDAlBitTG1AqDYOhRYlyIMfHmEb6kHBmn9gWuC32PWlrMW8z86s tOu8mNkoYUCGqSDXrDC3/L56Rf/zYGUqaj0X9J4Puc98/oRHLowuXURTFwtkZfIhU2 QmiGlOh/DL4mkBC+5lNkqhTttofi650f7oILe7e46M0x43DjPUcu3whJkB4annReHM xKGSKJhgKDR2dp5H7vEP2Thm/KeajK8mFgCeuZU1LmrYVck1kf3OfrRtQL/o3wI/eI aerW6vmQgg3QH5LhSJYo6874= From: Tomas Volf <~@wolfsden.cz> To: 68504@debbugs.gnu.org Subject: [PATCH v3] Add copy-on-write support to scm_copy_file. Date: Wed, 24 Jan 2024 20:14:32 +0100 Message-ID: <20240124191607.3571-1-~@wolfsden.cz> X-Mailer: git-send-email 2.41.0 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Score: -0.0 (/) X-Debbugs-Envelope-To: 68504 Cc: Tomas Volf <~@wolfsden.cz> X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-) On modern file-systems (BTRFS, ZFS) it is possible to copy a file using copy-on-write method. For large files it has the advantage of being much faster and saving disk space (since identical extents are not duplicated). This feature is stable and for example coreutils' `cp' does use it automatically (see --reflink). This commit adds support for this feature into our copy-file procedure. 
Same as `cp', it defaults to 'auto, meaning the copy-on-write is
attempted, and in case of failure the regular copy is performed.

No tests are provided, because the behavior depends on the system,
underlying file-system and its configuration.  That makes it challenging
to write a test for it.  Manual testing was performed instead:

    $ btrfs filesystem du /tmp/cow*
         Total   Exclusive  Set shared  Filename
      36.00KiB    36.00KiB       0.00B  /tmp/cow

    $ cat cow-test.scm
    (copy-file "/tmp/cow" "/tmp/cow-unspecified")
    (copy-file "/tmp/cow" "/tmp/cow-always" #:copy-on-write 'always)
    (copy-file "/tmp/cow" "/tmp/cow-auto" #:copy-on-write 'auto)
    (copy-file "/tmp/cow" "/tmp/cow-never" #:copy-on-write 'never)
    (copy-file "/tmp/cow" "/dev/shm/cow-unspecified")
    (copy-file "/tmp/cow" "/dev/shm/cow-auto" #:copy-on-write 'auto)
    (copy-file "/tmp/cow" "/dev/shm/cow-never" #:copy-on-write 'never)
    $ ./meta/guile -s cow-test.scm

    $ btrfs filesystem du /tmp/cow*
         Total   Exclusive  Set shared  Filename
      36.00KiB       0.00B    36.00KiB  /tmp/cow
      36.00KiB       0.00B    36.00KiB  /tmp/cow-always
      36.00KiB       0.00B    36.00KiB  /tmp/cow-auto
      36.00KiB    36.00KiB       0.00B  /tmp/cow-never
      36.00KiB       0.00B    36.00KiB  /tmp/cow-unspecified

    $ sha1sum /tmp/cow* /dev/shm/cow*
    4c665f87b5dc2e7d26279c4b48968d085e1ace32  /tmp/cow
    4c665f87b5dc2e7d26279c4b48968d085e1ace32  /tmp/cow-always
    4c665f87b5dc2e7d26279c4b48968d085e1ace32  /tmp/cow-auto
    4c665f87b5dc2e7d26279c4b48968d085e1ace32  /tmp/cow-never
    4c665f87b5dc2e7d26279c4b48968d085e1ace32  /tmp/cow-unspecified
    4c665f87b5dc2e7d26279c4b48968d085e1ace32  /dev/shm/cow-auto
    4c665f87b5dc2e7d26279c4b48968d085e1ace32  /dev/shm/cow-never
    4c665f87b5dc2e7d26279c4b48968d085e1ace32  /dev/shm/cow-unspecified

This commit also adds two new failure modes for (copy-file).

Failure to copy-on-write when 'always was passed in:

    scheme@(guile-user)> (copy-file "/tmp/cow" "/dev/shm/cow" #:copy-on-write 'always)
    ice-9/boot-9.scm:1676:22: In procedure raise-exception:
    In procedure copy-file: copy-on-write failed: Invalid cross-device link

Passing in invalid value for the #:copy-on-write keyword argument:

    scheme@(guile-user)> (copy-file "/tmp/cow" "/dev/shm/cow" #:copy-on-write 'nevr)
    ice-9/boot-9.scm:1676:22: In procedure raise-exception:
    In procedure copy-file: invalid value for #:copy-on-write: nevr

* NEWS: Add note for copy-file supporting copy-on-write.
* configure.ac: Check for linux/fs.h.
* doc/ref/posix.texi (File System)[copy-file]: Document the new
signature.
* libguile/filesys.c (clone_file): New function cloning a file using
FICLONE, if supported.
(k_copy_on_write): New keyword.
(sym_always, sym_auto, sym_never): New symbols.
(scm_copy_file2): Renamed from scm_copy_file.  New #:copy-on-write
keyword argument.  Attempt copy-on-write copy by default.
(scm_copy_file): Call scm_copy_file2.
* libguile/filesys.h: Add scm_copy_file2 as SCM_INTERNAL.
---
v2: Introduce scm_copy_file2 in order to preserve backwards compatibility.

v3: Remove mention of scm_copy_file from the commit message.

 NEWS               |  9 +++++
 configure.ac       |  1 +
 doc/ref/posix.texi |  9 ++++-
 libguile/filesys.c | 82 +++++++++++++++++++++++++++++++++++++++-------
 libguile/filesys.h |  1 +
 5 files changed, 89 insertions(+), 13 deletions(-)

diff --git a/NEWS b/NEWS index b319404d7..9147098c9 100644 --- a/NEWS +++ b/NEWS @@ -21,6 +21,15 @@ definitely unused---this is notably the case for modules that are only used at macro-expansion time, such as (srfi srfi-26). In those cases, the compiler reports it as "possibly unused". +** copy-file now supports copy-on-write + +The copy-file procedure now takes an additional keyword argument, +#:copy-on-write, specifying whether copy-on-write should be done, if the +underlying file-system supports it.
Possible values are 'always, 'auto +and 'never, with 'auto being the default. + +This speeds up copying large files a lot while saving the disk space. + * Bug fixes ** (ice-9 suspendable-ports) incorrect UTF-8 decoding diff --git a/configure.ac b/configure.ac index d0a2dc79b..c46586e9b 100644 --- a/configure.ac +++ b/configure.ac @@ -418,6 +418,7 @@ AC_SUBST([SCM_I_GSC_HAVE_STRUCT_DIRENT64]) # sys/sendfile.h - non-POSIX, found in glibc # AC_CHECK_HEADERS([complex.h fenv.h io.h memory.h process.h \ +linux/fs.h \ sys/dir.h sys/ioctl.h sys/select.h \ sys/time.h sys/timeb.h sys/times.h sys/stdtypes.h sys/types.h \ sys/utime.h unistd.h utime.h pwd.h grp.h sys/utsname.h \ diff --git a/doc/ref/posix.texi b/doc/ref/posix.texi index fec42d061..d26808d91 100644 --- a/doc/ref/posix.texi +++ b/doc/ref/posix.texi @@ -896,10 +896,17 @@ of @code{delete-file}. Why doesn't POSIX have a @code{rmdirat} function for this instead? No idea! @end deffn -@deffn {Scheme Procedure} copy-file oldfile newfile +@deffn {Scheme Procedure} copy-file @var{oldfile} @var{newfile} @ + [#:copy-on-write='auto] @deffnx {C Function} scm_copy_file (oldfile, newfile) Copy the file specified by @var{oldfile} to @var{newfile}. The return value is unspecified. + +@code{#:copy-on-write} keyword argument determines whether copy-on-write +copy should be attempted and the behavior in case of failure. Possible +values are @code{'always} (attempt the copy-on-write, return error if it +fails), @code{'auto} (attempt the copy-on-write, fallback to regular +copy if it fails) and @code{'never} (perform the regular copy). 
@end deffn @deffn {Scheme Procedure} sendfile out in count [offset] diff --git a/libguile/filesys.c b/libguile/filesys.c index 1f0bba556..5be42b825 100644 --- a/libguile/filesys.c +++ b/libguile/filesys.c @@ -67,6 +67,11 @@ # include #endif +#if defined(HAVE_SYS_IOCTL_H) && defined(HAVE_LINUX_FS_H) +# include +# include +#endif + #include "async.h" #include "boolean.h" #include "dynwind.h" @@ -75,6 +80,7 @@ #include "fports.h" #include "gsubr.h" #include "iselect.h" +#include "keywords.h" #include "list.h" #include "load.h" /* for scm_i_mirror_backslashes */ #include "modules.h" @@ -1255,20 +1261,49 @@ SCM_DEFINE (scm_readlink, "readlink", 1, 0, 0, } #undef FUNC_NAME -SCM_DEFINE (scm_copy_file, "copy-file", 2, 0, 0, - (SCM oldfile, SCM newfile), +static int +clone_file (int oldfd, int newfd) +{ +#ifdef FICLONE + return ioctl (newfd, FICLONE, oldfd); +#else + (void)oldfd; + (void)newfd; + errno = EOPNOTSUPP; + return -1; +#endif +} + +SCM_KEYWORD (k_copy_on_write, "copy-on-write"); +SCM_SYMBOL (sym_always, "always"); +SCM_SYMBOL (sym_auto, "auto"); +SCM_SYMBOL (sym_never, "never"); + +SCM_DEFINE (scm_copy_file2, "copy-file", 2, 0, 1, + (SCM oldfile, SCM newfile, SCM rest), "Copy the file specified by @var{oldfile} to @var{newfile}.\n" - "The return value is unspecified.") -#define FUNC_NAME s_scm_copy_file + "The return value is unspecified.\n" + "\n" + "@code{#:copy-on-write} keyword argument determines whether " + "copy-on-write copy should be attempted and the " + "behavior in case of failure. Possible values are " + "@code{'always} (attempt the copy-on-write, return error if " + "it fails), @code{'auto} (attempt the copy-on-write, " + "fallback to regular copy if it fails) and @code{'never} " + "(perform the regular copy)." 
+ ) +#define FUNC_NAME s_scm_copy_file2 { char *c_oldfile, *c_newfile; int oldfd, newfd; int n, rv; + SCM cow = sym_auto; + int clone_res; char buf[BUFSIZ]; struct stat_or_stat64 oldstat; scm_dynwind_begin (0); - + c_oldfile = scm_to_locale_string (oldfile); scm_dynwind_free (c_oldfile); c_newfile = scm_to_locale_string (newfile); @@ -1292,13 +1327,30 @@ SCM_DEFINE (scm_copy_file, "copy-file", 2, 0, 0, SCM_SYSERROR; } - while ((n = read (oldfd, buf, sizeof buf)) > 0) - if (write (newfd, buf, n) != n) - { - close (oldfd); - close (newfd); - SCM_SYSERROR; - } + scm_c_bind_keyword_arguments ("copy-file", rest, 0, + k_copy_on_write, &cow, + SCM_UNDEFINED); + + if (scm_is_eq (cow, sym_always) || scm_is_eq (cow, sym_auto)) + clone_res = clone_file(oldfd, newfd); + else if (scm_is_eq (cow, sym_never)) + clone_res = -1; + else + scm_misc_error ("copy-file", + "invalid value for #:copy-on-write: ~S", + scm_list_1 (cow)); + + if (scm_is_eq (cow, sym_always) && clone_res) + scm_syserror ("copy-file: copy-on-write failed"); + + if (clone_res) + while ((n = read (oldfd, buf, sizeof buf)) > 0) + if (write (newfd, buf, n) != n) + { + close (oldfd); + close (newfd); + SCM_SYSERROR; + } close (oldfd); if (close (newfd) == -1) SCM_SYSERROR; @@ -1308,6 +1360,12 @@ SCM_DEFINE (scm_copy_file, "copy-file", 2, 0, 0, } #undef FUNC_NAME +SCM +scm_copy_file (SCM oldfile, SCM newfile) +{ + return scm_copy_file2 (oldfile, newfile, SCM_UNSPECIFIED); +} + SCM_DEFINE (scm_sendfile, "sendfile", 3, 1, 0, (SCM out, SCM in, SCM count, SCM offset), "Send @var{count} bytes from @var{in} to @var{out}, both of which " diff --git a/libguile/filesys.h b/libguile/filesys.h index 1ce50d30e..8e849fe7a 100644 --- a/libguile/filesys.h +++ b/libguile/filesys.h @@ -74,6 +74,7 @@ SCM_API SCM scm_symlinkat (SCM dir, SCM oldpath, SCM newpath); SCM_API SCM scm_readlink (SCM path); SCM_API SCM scm_lstat (SCM str); SCM_API SCM scm_copy_file (SCM oldfile, SCM newfile); +SCM_INTERNAL SCM scm_copy_file2 (SCM oldfile, SCM 
newfile, SCM rest); SCM_API SCM scm_mkstemp (SCM tmpl); SCM_API SCM scm_mkdtemp (SCM tmpl); SCM_API SCM scm_dirname (SCM filename); -- 2.41.0 From debbugs-submit-bounces@debbugs.gnu.org Wed Jan 24 14:20:02 2024 Received: (at 68504) by debbugs.gnu.org; 24 Jan 2024 19:20:02 +0000 Received: from localhost ([127.0.0.1]:46625 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1rSin4-00014k-6h for submit@debbugs.gnu.org; Wed, 24 Jan 2024 14:20:02 -0500 Received: from wolfsden.cz ([37.205.8.62]:47004) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <~@wolfsden.cz>) id 1rSin1-00014P-SJ for 68504@debbugs.gnu.org; Wed, 24 Jan 2024 14:20:01 -0500 Received: by wolfsden.cz (Postfix, from userid 104) id 55B6A2765E4; Wed, 24 Jan 2024 19:19:53 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=wolfsden.cz; s=mail; t=1706123993; bh=XLEMKCJSp9uUx3OPhNYkzR238ZRCfb60Ut2T465xibU=; h=Date:From:To:Cc:Subject:References:In-Reply-To; b=XNCCBLK86Y5SV0ctU40OwvZOoeevMWx7p19mC/DXcE9SiN0ulzBoYEAGy7r0xz1Q/ 0F8wRngIw5LRrK1rrs95VLBpu3pgGDNwaWHg/GPj82kvY9qPqaRRKhaoLttxfME2pt DazqKUHhGsYkpBypej6ntvH27l1cH420bLyFnWyW6eINkLQN0AzbULDAL4Wov4uOvO cVfo0HKPT6QD8W5UneXZxsbLEiNAK4aK/Uk2HGva0rIhtIhFKhjpB/fKvKUPVebpXZ bCiyYjMEV8anA5+sCpelQ23Co9WLhlFey6zxpRspOl3sImd0Z/2ee/7Ksht9jeMk7j jBnWl/wYMnDE6zDsN/Fnppui+tMHQ9Ixj0oD+UmsRLCMDUuhL11uu3DNwkOJGgrh1T yf+avw+PervM9eobaJnyiqdD7ptr9WJ1L4kbpMeMhKUMto4jgmBA+ePzTCge2prlop shHvkPmPJonmRRVjCrsrbfCCZxR0qghNRipj+06JK/i4PsTo2bJIjInfbVbz/ZPqdH V/qw5sWW80fHhZlNEIEIYK9rf3r47kbKJBwC0MQAeiWP7M4+q1GtxpdMFDfRRZTF2x bjVU24zkCYNHZ0WBv/VX+Pf5azBZEK0v/gS8mSaTnps4XvA9LDwjq8JhwvM9UWPFPP mkkezVtxO+wWGX03woLcIlhU= X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on wolfsden X-Spam-Level: X-Spam-Status: No, score=-1.2 required=5.0 tests=ALL_TRUSTED,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,T_SCC_BODY_TEXT_LINE, URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.6 Received: from localhost 
(unknown [146.70.134.137]) by wolfsden.cz (Postfix) with ESMTPSA id 783A1277E95; Wed, 24 Jan 2024 19:19:52 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=wolfsden.cz; s=mail; t=1706123992; bh=XLEMKCJSp9uUx3OPhNYkzR238ZRCfb60Ut2T465xibU=; h=Date:From:To:Cc:Subject:References:In-Reply-To; b=OzOFOmCrMmXVrFsqOP0aWxqUtGa75p6gipMMC6T6EdTqoC4qqwpE0RoNcj0IeOO80 62p3H89iC+D26/ZYsqgErpE5h5cvhptJ+zSRRVMmn0jKMKEcyaLPZxph6x2ZnG7lAL nskVXohqhh6K9LWfwctGOYsDT5QE9fXvMI8n51pMN1YwFc64MzihemBk4j4VxaLje9 9c8ti5DC44FIp/nZkG6UnZ71JlwuGgOHZil+3kew/+ITy2lgYuIXwwqDMY2MjWn4PJ 1K01/R0G6mOszKGIFdDsNRFeRmpJpNolR6Jlw4nhhKtjOeb8TVodMvZJXA4kzVJMDS +pd+uVa4YOHEZKi5pN59D25bmrxe0AGT3p0ycZfiJBkq6KsVIFiqEr0pz0ojLoLRQd 17uRyS+Qa4/EZ01pJ8Hrju+hptPkU7bnrZNdDSLrxwYvyUTs5u9X957lq0ne0EIeFZ 6ZImCj6Tv0cJvQLRK+3Vso754D51yNFOJ6oYWKBJjCJ54rLzWIzOGSqlG4LHjcuXdq p6NhQzpyJ92M+A5KjLvNOl/GdOHsQh7+xelNSCs3chqP4flqZBBlOt+X4hAYlP5aHZ N2AMXCFiTTlt3LtkUWAlm5int9xLZSAGBgfKtu0kTcrKRzQf07Ri382OkHEsUKvBBU jm6Y2bIC7PhNwpglyhW7AmXg= Date: Wed, 24 Jan 2024 20:19:51 +0100 From: Tomas Volf <~@wolfsden.cz> To: Ludovic =?iso-8859-1?Q?Court=E8s?= Subject: Re: bug#68504: [PATCH] Add copy-on-write support to scm_copy_file. 
Message-ID: References: <20240116124817.14680-1-~@wolfsden.cz> <87le8fuivj.fsf@gnu.org> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="xo/A4N8EqDgMYc40" Content-Disposition: inline In-Reply-To: <87le8fuivj.fsf@gnu.org> X-Spam-Score: -0.0 (/) X-Debbugs-Envelope-To: 68504 Cc: 68504@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-)

--xo/A4N8EqDgMYc40 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable

On 2024-01-24 11:26:56 +0100, Ludovic Courtès wrote:
>
> The patch looks great (and very useful) to me, modulo one issue:
>
> > -SCM_API SCM scm_copy_file (SCM oldfile, SCM newfile);
> > +SCM_API SCM scm_copy_file (SCM oldfile, SCM newfile, SCM rest);
>
> Since this is a public interface, we cannot change this function’s
> signature during the 3.0 stable series.
>
> Thus, I would suggest keeping the public ‘scm_copy_file’ unchanged and
> internally having a three-argument variant.  The Scheme-level
> ‘copy-file’ would map to that three-argument variant.  (See how
> ‘scm_pipe’ and ‘scm accept’ as examples.)

That is a very good point, which I did not realize at all.  Thanks to the
examples you provided, it was not that hard to do (well, assuming I did it
right).

> Could you send an updated patch?

Done.  However now that I read it after myself, I overlooked this occurrence of
scm_copy_file in the commit message:

    This commit adds support for this feature into our copy-file
    (scm_copy_file) procedure.  Same as `cp', it defaults to

So I just sent v3 right after v2, sorry for the noise, should have been more
careful.
>
> BTW, copyright assignment to the FSF is now optional but encouraged.
> Please see
> .

Since it is optional, I will currently opt into not doing the assignment, I do
not like the concept that much.  I will try to find time to actually form an
opinion based on facts.

Have a nice day,
Tomas

--
There are only two hard things in Computer Science:
cache invalidation, naming things and off-by-one errors.

--xo/A4N8EqDgMYc40 Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQIzBAEBCgAdFiEEt4NJs4wUfTYpiGikL7/ufbZ/wakFAmWxYtcACgkQL7/ufbZ/ walVqxAAi5THU9sSyNMZOZigU3Wb4PVZWPkRE2FjPG1AiXbwpJlaki6A7yRvXYT7 N62ChPRI0CHf1NPBAnuKLd9w4/0rpARWTKBloVD4WklM39hQKaQLqOVtZoLKud4R huf9L6rZQ20oH071UwnP2el23v/tMSWNowcnlsydamVy4/d6TAsG8wqQkp6bG1wm 51v1N+KTzp+W1Gvro/LE5kd4f4gq8D+CpecKKs+XFKEeOh2IkwRhrRxuKVoSJyl4 1g1rOFOZocx8Z96Mywt/Gmz09qlpxqORfG8cGvPlhkMDtMrOv3cF4AhLb6ndDHg7 i6Qk0Qg21e5vN2ADQpWnW/61VT98KemRhqRoftezkOeGNHyJz1Cso7gNXN9OWJeu nE0NxJ/QDcYsNohL1Wh7h7+MVNmviUXqnkNsIqzWximo08LXcQ4/wQOKBCX1rYkA Nlxjcqe42VNn9yBDd/J0f/sb78IY+vFgwlaRk0BBrpYQ7HNgXyYt3LenSX2u4KPU NkmcLXysaqdXqmVhS+ZiV8JpB6yFh/LQV6ZnapTAY0uOjTmAtpIhLp01a56eH000 HbKy9GpBI+6125gjT48u/OXAl6q2OTgyhj6E++63QWWVqMBiA+69c1IYqEaudiR5 rGl/bXyKlu9WD+AkKTqVLj6ZGCgRs1dtEohNKT4d/NsxEm+jAqQ= =XZc3 -----END PGP SIGNATURE----- --xo/A4N8EqDgMYc40--

From debbugs-submit-bounces@debbugs.gnu.org Tue Mar 12 09:07:23 2024 Received: (at 68504-done) by debbugs.gnu.org; 12 Mar 2024 13:07:23 +0000 Received: from localhost ([127.0.0.1]:42019 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1rk1ql-0005LG-26 for submit@debbugs.gnu.org; Tue, 12 Mar 2024 09:07:23 -0400 Received: from eggs.gnu.org ([209.51.188.92]:60186) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1rk1qg-0005Kr-1S for 68504-done@debbugs.gnu.org; Tue, 12 Mar 2024 09:07:21 -0400 Received: from fencepost.gnu.org ([2001:470:142:3::e]) by eggs.gnu.org with esmtps
(TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rk1q1-0004kF-8G; Tue, 12 Mar 2024 09:06:37 -0400 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org; s=fencepost-gnu-org; h=MIME-Version:Date:References:In-Reply-To:Subject:To: From; bh=5zszqepfYMDk91jZ37NC4r89H2kiZOxEpovRv31jfZQ=; b=aR5DoMA7I4VOYhb7RD3i Zui5KNclAWEM+azVtOoKGPrnFRux+InV5yrshpPeYF+F4kf362GJNDhFd4vUlpBQ4Bnnmgxb1t8c+ vd7jDGDuKRSi7sWnOfu7ANvNPFAOIT6gNLL4gbdmeMk7FQJ3/wIVej5d1Ogt6QknMe/JRqAdkyKWv /XRbyYF9lhhMgUOdmhZqZUutdBHWHqS3ol90En4k4ALFJkq96iE15RFkxtlQctulmap1oxI2Eg0RZ HqRej2PbmsA9OEGkUewjhq4YDDJ9qStCTLL1LYEphh0bO0Dqcx/b9SxUGb9oJ3D11asDtrj4PjyTL L04L3fn5mXe3ww==; From: =?utf-8?Q?Ludovic_Court=C3=A8s?= To: Tomas Volf <~@wolfsden.cz> Subject: Re: bug#68504: [PATCH v3] Add copy-on-write support to scm_copy_file. In-Reply-To: <20240124191607.3571-1-~@wolfsden.cz> (Tomas Volf's message of "Wed, 24 Jan 2024 20:14:32 +0100") References: <20240116124817.14680-1-~@wolfsden.cz> <20240124191607.3571-1-~@wolfsden.cz> Date: Tue, 12 Mar 2024 14:06:34 +0100 Message-ID: <878r2nmwf9.fsf@gnu.org> User-Agent: Gnus/5.13 (Gnus v5.13) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 68504-done Cc: 68504-done@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) Hi Tomas, Tomas Volf <~@wolfsden.cz> skribis: > On modern file-systems (BTRFS, ZFS) it is possible to copy a file using > copy-on-write method. For large files it has the advantage of being > much faster and saving disk space (since identical extents are not > duplicated). This feature is stable and for example coreutils' `cp' > does use it automatically (see --reflink). 
>
> This commit adds support for this feature into our copy-file procedure.
> Same as `cp', it defaults to 'auto, meaning the copy-on-write is
> attempted, and in case of failure the regular copy is performed.
>
> No tests are provided, because the behavior depends on the system,
> underlying file-system and its configuration.  That makes it challenging
> to write a test for it.  Manual testing was performed instead:
>
>     $ btrfs filesystem du /tmp/cow*
>          Total   Exclusive  Set shared  Filename
>       36.00KiB    36.00KiB       0.00B  /tmp/cow
>
>     $ cat cow-test.scm
>     (copy-file "/tmp/cow" "/tmp/cow-unspecified")
>     (copy-file "/tmp/cow" "/tmp/cow-always" #:copy-on-write 'always)
>     (copy-file "/tmp/cow" "/tmp/cow-auto" #:copy-on-write 'auto)
>     (copy-file "/tmp/cow" "/tmp/cow-never" #:copy-on-write 'never)
>     (copy-file "/tmp/cow" "/dev/shm/cow-unspecified")
>     (copy-file "/tmp/cow" "/dev/shm/cow-auto" #:copy-on-write 'auto)
>     (copy-file "/tmp/cow" "/dev/shm/cow-never" #:copy-on-write 'never)
>     $ ./meta/guile -s cow-test.scm
>
>     $ btrfs filesystem du /tmp/cow*
>          Total   Exclusive  Set shared  Filename
>       36.00KiB       0.00B    36.00KiB  /tmp/cow
>       36.00KiB       0.00B    36.00KiB  /tmp/cow-always
>       36.00KiB       0.00B    36.00KiB  /tmp/cow-auto
>       36.00KiB    36.00KiB       0.00B  /tmp/cow-never
>       36.00KiB       0.00B    36.00KiB  /tmp/cow-unspecified
>
>     $ sha1sum /tmp/cow* /dev/shm/cow*
>     4c665f87b5dc2e7d26279c4b48968d085e1ace32  /tmp/cow
>     4c665f87b5dc2e7d26279c4b48968d085e1ace32  /tmp/cow-always
>     4c665f87b5dc2e7d26279c4b48968d085e1ace32  /tmp/cow-auto
>     4c665f87b5dc2e7d26279c4b48968d085e1ace32  /tmp/cow-never
>     4c665f87b5dc2e7d26279c4b48968d085e1ace32  /tmp/cow-unspecified
>     4c665f87b5dc2e7d26279c4b48968d085e1ace32  /dev/shm/cow-auto
>     4c665f87b5dc2e7d26279c4b48968d085e1ace32  /dev/shm/cow-never
>     4c665f87b5dc2e7d26279c4b48968d085e1ace32  /dev/shm/cow-unspecified
>
> This commit also adds two new failure modes for (copy-file).
>
> Failure to copy-on-write when 'always was passed in:
>
>     scheme@(guile-user)> (copy-file "/tmp/cow" "/dev/shm/cow" #:copy-on-write 'always)
>     ice-9/boot-9.scm:1676:22: In procedure raise-exception:
>     In procedure copy-file: copy-on-write failed: Invalid cross-device link
>
> Passing in invalid value for the #:copy-on-write keyword argument:
>
>     scheme@(guile-user)> (copy-file "/tmp/cow" "/dev/shm/cow" #:copy-on-write 'nevr)
>     ice-9/boot-9.scm:1676:22: In procedure raise-exception:
>     In procedure copy-file: invalid value for #:copy-on-write: nevr
>
> * NEWS: Add note for copy-file supporting copy-on-write.
> * configure.ac: Check for linux/fs.h.
> * doc/ref/posix.texi (File System)[copy-file]: Document the new
> signature.
> * libguile/filesys.c (clone_file): New function cloning a file using
> FICLONE, if supported.
> (k_copy_on_write): New keyword.
> (sym_always, sym_auto, sym_never): New symbols.
> (scm_copy_file2): Renamed from scm_copy_file.  New #:copy-on-write
> keyword argument.  Attempt copy-on-write copy by default.
> (scm_copy_file): Call scm_copy_file2.
> * libguile/filesys.h: Add scm_copy_file2 as SCM_INTERNAL.
> ---
> v2: Introduce scm_copy_file2 in order to preserve backwards compatibility.
>
> v3: Remove mention of scm_copy_file from the commit message.

Finally pushed as e1690f3fd251d69b3687ec12c6f4b41034047f0f.  Note that I
added copyright lines for you, let me know if I got it wrong.

As a followup, we should add support for ‘copy_file_range’ when FICLONE
cannot be used; glibc supports it on all platforms but it returns ENOSYS
on GNU/Hurd currently.

WDYT?

Thank you!

Ludo’.
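
[Editorial illustration, not part of the original message.]  The three
#:copy-on-write modes described in the quoted commit message can be
sketched outside of libguile roughly as below.  This is a standalone
approximation under stated assumptions, not the code that was pushed:
copy_fd, enum cow_mode, try_clone and copy_loop are hypothetical names,
the __has_include probe stands in for the configure-time check for
linux/fs.h, and the loop retries short writes, which the patch itself
treats as an error.

```c
#include <assert.h>
#include <errno.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Stand-in for the HAVE_LINUX_FS_H configure check (assumption).  */
#if defined(__has_include)
# if __has_include(<linux/fs.h>)
#  include <linux/fs.h>         /* FICLONE, where the kernel offers it */
# endif
#endif

/* Hypothetical mirror of the 'always, 'auto and 'never symbols.  */
enum cow_mode { COW_ALWAYS, COW_AUTO, COW_NEVER };

/* Ask the file system to share extents.  Returns 0 on success,
   -1 with errno set otherwise.  */
static int
try_clone (int src_fd, int dst_fd)
{
#ifdef FICLONE
  return ioctl (dst_fd, FICLONE, src_fd);
#else
  (void) src_fd;
  (void) dst_fd;
  errno = EOPNOTSUPP;
  return -1;
#endif
}

/* Plain read/write copy.  Returns 0 on success, -1 with errno set.  */
static int
copy_loop (int src_fd, int dst_fd)
{
  char buf[8192];
  ssize_t n;

  while ((n = read (src_fd, buf, sizeof buf)) != 0)
    {
      if (n < 0)
        {
          if (errno == EINTR)
            continue;
          return -1;
        }
      for (ssize_t off = 0; off < n; )
        {
          ssize_t w = write (dst_fd, buf + off, (size_t) (n - off));
          if (w < 0)
            {
              if (errno == EINTR)
                continue;
              return -1;
            }
          off += w;
        }
    }
  return 0;
}

/* COW_ALWAYS: clone or fail.  COW_AUTO: clone, else regular copy.
   COW_NEVER: regular copy only.  */
int
copy_fd (int src_fd, int dst_fd, enum cow_mode mode)
{
  assert (src_fd >= 0 && dst_fd >= 0);

  if (mode != COW_NEVER)
    {
      if (try_clone (src_fd, dst_fd) == 0)
        return 0;
      if (mode == COW_ALWAYS)
        return -1;              /* errno comes from the failed clone */
    }
  return copy_loop (src_fd, dst_fd);
}
```

With COW_ALWAYS the caller sees the errno left by the failed clone, which
is how the "Invalid cross-device link" text in the transcript above would
arise when source and destination sit on different file systems.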
From debbugs-submit-bounces@debbugs.gnu.org Tue Mar 12 19:20:07 2024 Received: (at 68504) by debbugs.gnu.org; 12 Mar 2024 23:20:07 +0000 Received: from localhost ([127.0.0.1]:44158 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1rkBPi-0002SC-Lr for submit@debbugs.gnu.org; Tue, 12 Mar 2024 19:20:06 -0400 Received: from wolfsden.cz ([37.205.8.62]:41454) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from <~@wolfsden.cz>) id 1rkBPe-0002Rf-QH for 68504@debbugs.gnu.org; Tue, 12 Mar 2024 19:20:04 -0400 Received: by wolfsden.cz (Postfix, from userid 104) id 54D8928C755; Tue, 12 Mar 2024 23:19:26 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=wolfsden.cz; s=mail; t=1710285566; bh=HSZvMKep+Tt6e/X+BQCCan72D2NfvM+02SbL7/xjQAw=; h=Date:From:To:Cc:Subject:References:In-Reply-To; b=qKbpz2u6knmMGtRcCwZo9ND7GrszdMxji19zXchcEXS80rQO/axA4K7pgAnCqU9N6 kT87F9GPfD6JZ0yS7IkIjPoGZLwZlxj+TishZwnRKD+X5HqcTkICf95sI+74nU3GLI o6SoEESDNzSTTYn44/boZ7h0P9bbX8IfuNl6ML+L7J8nYso62lAx+kyIbsFpKFJTFR Dg6pPyzJJcMHIM2kbaMDP/oYelIztwYaF16YYb9XVNDySNZdbs/JJr8SzJyZQXmCzz ZOAkdcMwHOUP7Lc1Rk0uraSjmNca/woyYgIgf/4bUslJtaAGMysSDupq2cxMEakBNs vEvYex/T7KMQr2YfRl35ZZz2wP/vwd/dH7PSDYB896wFtb+nzwxjuyFdLqLIoRSeth HpM2RpscUWoWgqGxufdxhFWscf5SThUAs5klmtI8J4mWo6js7BG29a+mzlsmTvCQ0E 8kvFo4BKGEsjnezUHYtZeeyD0QM0mPf41qRVjJYT0LCIHmDbJjMrCSBrH3sONUoOhe mWugWJMWZJVA4zoVEEdRZKBIBlYRB9GDtQnqyPzCkwxhwKxAS8fEeXtx8IvPVPxdkL Kt/7d1MLuppHtnWrxn3CG0sP3sHO+F+jmAgiR/PPIEO76PYBsIEE5X1S89dbFqNr82 emqTkyWDwuBnMZtGm/JtpuWU= X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on wolfsden X-Spam-Level: X-Spam-Status: No, score=-1.2 required=5.0 tests=ALL_TRUSTED,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,T_SCC_BODY_TEXT_LINE, URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.6 Received: from localhost (unknown [193.32.127.156]) by wolfsden.cz (Postfix) with ESMTPSA id 68DF62893C6; Tue, 12 Mar 2024 23:19:25 +0000 (UTC) DKIM-Signature: v=1; 
a=rsa-sha256; c=relaxed/simple; d=wolfsden.cz; s=mail; t=1710285565; bh=HSZvMKep+Tt6e/X+BQCCan72D2NfvM+02SbL7/xjQAw=; h=Date:From:To:Cc:Subject:References:In-Reply-To; b=E9MRRko8EtlZnJ2/gIy5HfyDgB9mazH/oQkGuV0/jUOZyvqMM5IBP05NWCkrx+npM 3noFSo9HXzuKAQNsVv6GskvYwVuFeTv5JWXVC6DxKNmtUvZdVYUAAlySkA5yMO4LXL uD1m/xQ19mJyEQ0HoP0YJIUqFQHGuzW8j2zydcHX9iFph/2nmaxUHalZdbcJiSAISx 3op0svZ7Uiinl9YCUUCjfAv5qVQiT4hvG3KrVCdAKAFuo7LUVflCOZGjYeUHgg0XNJ kYFZdWE3e9zJmY6bTD7zivar4oQrhbgOvL48py3rgIwo40b8Cqs+WNilDtwhUPtvCx 5rhBvIIlfWguUTNR8Zxn3Gc4AW+xFmKOII/QpDugby49nJ9aLIHj4fUZZ97wz+smnk F38Jn0gBy1JyRqOnifMqvL9NtQch4jYamQgqmx5PX6FOYWk1Nc1ZRzOmu+l6sft7NN hR0LrvhA3MrzSblVnnRhl0WzpsOS9CP0ochWTWD3EMMjsR/isiyJSoatlxjFC8Wu/k 79q10Awu2qb4IVt2fHwz3C7sHv9OYwxzplKAr7lU0oVNgP4o8eX4x5XVJlAIgSTlYz 2zMFR4VO87yXbVQ3KWPUm/xnL7LDSIWj/7CW3kGyf07vB4LDwlPFyB2r8qlVhPfP8B jig4DNbBrgwCN+WX/4ge10rI= Date: Wed, 13 Mar 2024 00:19:24 +0100 From: Tomas Volf <~@wolfsden.cz> To: Ludovic =?iso-8859-1?Q?Court=E8s?= Subject: Re: bug#68504: [PATCH v3] Add copy-on-write support to scm_copy_file. Message-ID: References: <20240116124817.14680-1-~@wolfsden.cz> <20240124191607.3571-1-~@wolfsden.cz> <878r2nmwf9.fsf@gnu.org> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="vHDuOdkzqzrmfLwb" Content-Disposition: inline In-Reply-To: <878r2nmwf9.fsf@gnu.org> X-Spam-Score: -0.0 (/) X-Debbugs-Envelope-To: 68504 Cc: 68504@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-) --vHDuOdkzqzrmfLwb Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On 2024-03-12 14:06:34 +0100, Ludovic Court=C3=A8s wrote: > > Finally pushed as e1690f3fd251d69b3687ec12c6f4b41034047f0f. 
Note that I
> added copyright lines for you, let me know if I got it wrong.

Thank you for merging it, and thanks for the copyright, looks correct :)

> As a followup, we should add support for ‘copy_file_range’ when FICLONE
> cannot be used; glibc supports it on all platforms but it returns ENOSYS
> on GNU/Hurd currently.
>
> WDYT?

Sure, I am willing to do my part.  I managed to find this blog post[0], so after
some minor troubles I did manage to get a VM with GNU/Hurd running.  Next I will
read up on copy_file_range and try to put together a patch.

Just to make sure, your idea here is exactly what?  Always try to use
copy_file_range before the regular copy?  So the flow would be

For 'always case:

    CoW ---fail--> FAIL

For 'auto case:

    CoW ---fail--> copy_file_range ---fail--> current copy ---fail--> FAIL

For 'never case:

    copy_file_range ---fail--> current copy ---fail--> FAIL

Is that an accurate summary?  Or did you mean only as a fallback for the CoW,
so only for 'auto, but not for the 'never?

Tomas

0: https://guix.gnu.org/en/blog/2020/a-hello-world-virtual-machine-running-the-hurd/

--
There are only two hard things in Computer Science:
cache invalidation, naming things and off-by-one errors.
--vHDuOdkzqzrmfLwb Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQIzBAEBCgAdFiEEt4NJs4wUfTYpiGikL7/ufbZ/wakFAmXw4vwACgkQL7/ufbZ/ wamXaA//e89I0+8GmuZOBp9Ah0Sd983f/D+tQiuA3T8oDe99nWRl9RAuTfsz4uq2 NAY1ekT+x+07nEDmFoRMko0KWglObZGFO6h5Zu4buAG/81beh6QqNSsv3Jr9ZD1J weUOo/AiAQG8GelP4ZxAa185GDN9I9oBWkD4raGeeUErXaXzKw6M/R1c9ss1iswW txs/vbsdkP0FxSi7cBuL4cw3kdFRzZ7Ylcth2kTR3ers8p+76+MK+qqLOmYYQKry vR4Jsjjv4qJBF/aVJnLYrvjLA6FrF64xywvCLo2cmtGitNHlsK98ykwVRc/MDtEy gmefNWYXOssxqrtD/d74vz9Nkuo9Vh8/3fP8nxIh+Mx0tjrxzvTHJUVAaWq8ASnW jbbP8nC+kACuEk+JO4gU2yUb4OCw56illy8i75/0krXff+8zkfiL634Lg8DHrBxB B8/Uvy7vl8ND2yMmS97BsgHLyIgCjs6SKrlt1lqIkGRjS9ogKqHYhNLB70La5b3d XtRCYkRo0rsMluROuI9RWcjK9t/1VcC6X5a7gsDFGNMWP/7lffbGOJ/B8iSS+fe+ eFLr0ALmAYmI4SezgC0mG9fmIBJITxarprCcFtX5aqdQtPkaeoMM0KY/Nv4qWEqP fAdeh5QpVW3Cej9Q/p1vjzOb3EiUzsI9SQwGEoVD8ZTnzxxKPzE= =/Jsb -----END PGP SIGNATURE----- --vHDuOdkzqzrmfLwb-- From debbugs-submit-bounces@debbugs.gnu.org Thu Mar 21 10:47:37 2024 Received: (at 68504) by debbugs.gnu.org; 21 Mar 2024 14:47:37 +0000 Received: from localhost ([127.0.0.1]:37041 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1rnJhh-00015W-5s for submit@debbugs.gnu.org; Thu, 21 Mar 2024 10:47:37 -0400 Received: from eggs.gnu.org ([209.51.188.92]:37140) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1rnJhc-00015B-13 for 68504@debbugs.gnu.org; Thu, 21 Mar 2024 10:47:35 -0400 Received: from fencepost.gnu.org ([2001:470:142:3::e]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rnJKR-00023s-Ux; Thu, 21 Mar 2024 10:23:36 -0400 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org; s=fencepost-gnu-org; h=MIME-Version:Date:References:In-Reply-To:Subject:To: From; bh=DQy3sROtmG3NMr+dAW3aLwmyBkd91K4IVd0ILpRRTJA=; b=GGm+XCBBP4DpAZkk6p4b Y8RQ6BiZkT+ODkcZ/FTRXN0MM7kbSZptlYVNiovYpmeipptGBK3JDJQLyxvAHBIqkUXteeZNN/B4A 
JHoa5VlavT8jVfJb9ahbottjmlL3KhJIp/q568+yraGViYeS0YE54Y1sr1SIdeIQEb+bNh2hiG5a0 l6e+nO0EcrH4Ssq3DJ8FCy7hxRkVY6AyCQv7uZAG0PMW2rPSUHpFlq6LzYWCq7YmIV5DY1RmU6uRx USvlN+kDMmtq7JAVILHFkz2x9aZtrhvgPMF6XJJ46PkE3oud7GJsvKovg1QKb4v4yU6jaHLL795Uv A3mIE6KsrN0COA==; From: =?utf-8?Q?Ludovic_Court=C3=A8s?= To: Tomas Volf <~@wolfsden.cz> Subject: Re: bug#68504: [PATCH v3] Add copy-on-write support to scm_copy_file. In-Reply-To: (Tomas Volf's message of "Wed, 13 Mar 2024 00:19:24 +0100") References: <20240116124817.14680-1-~@wolfsden.cz> <20240124191607.3571-1-~@wolfsden.cz> <878r2nmwf9.fsf@gnu.org> X-URL: http://www.fdn.fr/~lcourtes/ X-Revolutionary-Date: Duodi 2 Germinal an 232 de la =?utf-8?Q?R=C3=A9volut?= =?utf-8?Q?ion=2C?= jour du Platane X-PGP-Key-ID: 0x090B11993D9AEBB5 X-PGP-Key: http://www.fdn.fr/~lcourtes/ludovic.asc X-PGP-Fingerprint: 3CE4 6455 8A84 FDC6 9DB4 0CFB 090B 1199 3D9A EBB5 X-OS: x86_64-pc-linux-gnu Date: Thu, 21 Mar 2024 15:23:33 +0100 Message-ID: <87a5mrvf2y.fsf@gnu.org> User-Agent: Gnus/5.13 (Gnus v5.13) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 68504 Cc: 68504@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---)

Hi,

Tomas Volf <~@wolfsden.cz> skribis:

> Sure, I am willing to do my part.  I managed to find this blog post[0], so after
> some minor troubles I did manage to get a VM with GNU/Hurd running.  Next I will
> read up on copy_file_range and try to put together a patch.

It’s really just (service hurd-vm-service-type) on Guix System:

  https://guix.gnu.org/manual/devel/en/html_node/Virtualization-Services.html#The-Hurd-in-a-Virtual-Machine

> Just to make sure, your idea here is exactly what?  Always try to use
> copy_file_range before the regular copy?  So the flow would be
>
> For 'always case:
>
>     CoW ---fail--> FAIL
>
> For 'auto case:
>
>     CoW ---fail--> copy_file_range ---fail--> current copy ---fail--> FAIL
>
> For 'never case:
>
>     copy_file_range ---fail--> current copy ---fail--> FAIL
>
> Is that an accurate summary?

Yes, that’s exactly what I had in mind.

Actually it might be better to use sendfile(2), which is slightly less
generic but otherwise equivalent AFAICS, and which happens to have a Hurd
implementation in glibc.

Thanks for your help!

Ludo’.

From unknown Thu Aug 14 22:20:42 2025
Received: (at fakecontrol) by fakecontrolmessage;
To: internal_control@debbugs.gnu.org
From: Debbugs Internal Request
Subject: Internal Control
Message-Id: bug archived.
Date: Fri, 19 Apr 2024 11:24:08 +0000
User-Agent: Fakemail v42.6.9

# This is a fake control message.
#
# The action:
# bug archived.
thanks
# This fakemail brought to you by your local debbugs
# administrator
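
For reference, the fallback order agreed on in this thread ('always, 'auto,
'never) could be sketched in C roughly as follows.  This is only an
illustration under stated assumptions, not the code in scm_copy_file: the
names copy_fd, enum cow_mode, try_clone, try_copy_file_range and plain_copy
are hypothetical, and the sketch assumes Linux with a glibc that provides
copy_file_range (2.27 or later).  It also assumes both descriptors are
positioned at offset 0 and that the destination may be truncated.

```c
#define _GNU_SOURCE             /* for copy_file_range */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>
#ifdef __linux__
# include <linux/fs.h>          /* FICLONE */
#endif

/* Hypothetical counterpart of the 'always / 'auto / 'never symbols.  */
enum cow_mode { COW_ALWAYS, COW_AUTO, COW_NEVER };

/* Try a copy-on-write clone.  Returns 0 on success, -1 on failure.  */
static int try_clone(int in, int out)
{
#ifdef FICLONE
  return ioctl(out, FICLONE, in) == 0 ? 0 : -1;
#else
  (void) in; (void) out;
  errno = ENOSYS;
  return -1;
#endif
}

/* Try an in-kernel copy.  Returns 0 on success, -1 on failure.  */
static int try_copy_file_range(int in, int out)
{
  struct stat st;
  if (fstat(in, &st) != 0)
    return -1;
  off_t left = st.st_size;
  while (left > 0) {
    /* Cap each request so the size_t conversion cannot truncate.  */
    size_t chunk = left > (off_t) (1 << 30) ? (size_t) (1 << 30) : (size_t) left;
    ssize_t n = copy_file_range(in, NULL, out, NULL, chunk, 0);
    if (n < 0) {
      if (errno == EINTR)
        continue;
      return -1;
    }
    if (n == 0)                 /* source ended early */
      break;
    left -= n;
  }
  return 0;
}

/* The "current copy": a plain read/write loop.  It rewinds both
   descriptors and truncates OUT first, because an earlier strategy may
   have copied part of the data before failing.  */
static int plain_copy(int in, int out)
{
  char buf[65536];
  if (lseek(in, 0, SEEK_SET) < 0 || lseek(out, 0, SEEK_SET) < 0
      || ftruncate(out, 0) != 0)
    return -1;
  for (;;) {
    ssize_t n = read(in, buf, sizeof buf);
    if (n == 0)
      return 0;
    if (n < 0) {
      if (errno == EINTR)
        continue;
      return -1;
    }
    for (ssize_t off = 0; off < n; ) {
      ssize_t w = write(out, buf + off, (size_t) (n - off));
      if (w < 0) {
        if (errno == EINTR)
          continue;
        return -1;
      }
      off += w;
    }
  }
}

/* 'always: CoW, else fail.
   'auto:   CoW, then copy_file_range, then plain copy.
   'never:  copy_file_range, then plain copy.  */
int copy_fd(int in, int out, enum cow_mode mode)
{
  assert(in >= 0 && out >= 0);
  if (mode != COW_NEVER) {
    if (try_clone(in, out) == 0)
      return 0;
    if (mode == COW_ALWAYS)
      return -1;
  }
  if (try_copy_file_range(in, out) == 0)
    return 0;
  return plain_copy(in, out);
}
```

If sendfile(2) were used instead, as suggested above, only
try_copy_file_range would need to change; the order of the fallbacks would
presumably stay the same.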