Package: coreutils;
Reported by: "jeff.liu" <jeff.liu <at> oracle.com>
Date: Fri, 7 May 2010 14:16:02 UTC
Severity: normal
Tags: patch
Done: Jim Meyering <jim <at> meyering.net>
Bug is archived. No further changes may be made.
Message #128 received at submit <at> debbugs.gnu.org:
From: "jeff.liu" <jeff.liu <at> oracle.com>
To: Jim Meyering <jim <at> meyering.net>
Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, Chris Mason <chris.mason <at> oracle.com>, bug-coreutils <at> gnu.org, Joel Becker <Joel.Becker <at> oracle.com>, Tao Ma <tao.ma <at> oracle.com>
Subject: Re: bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Date: Wed, 09 Jun 2010 22:46:20 +0800

Jim Meyering wrote:
> Jim Meyering wrote:
>> Subject: [PATCH 01/10] cp: Add FIEMAP support for efficient sparse file copy
>
> FYI, using those patches, I ran a test for the first time in a few days:
>
>   check -C tests TESTS=cp/sparse-fiemap VERBOSE=yes
>
> It failed like this on an ext4 partition using F13:
>
>   + timeout 10 cp --sparse=always sparse fiemap
>   + fail=1
>   ++ stat --printf %s sparse
>   ++ stat --printf %s fiemap
>   + test 1099511628800 = 0
>   + fail=1
>
> That is very odd.  No diagnostic from cp, yet it failed
> after creating a zero-length file.
>
> Here's the corresponding piece of the script:
>
>   # It takes many minutes to copy this sparse file using the old method.
>   # By contrast, it takes far less than 1 second using FIEMAP-copy.
>   timeout 10 cp --sparse=always sparse fiemap || fail=1
>
>   # Ensure that the sparse file copied through fiemap has the same size
>   # in bytes as the original.
>   test $(stat --printf %s sparse) = $(stat --printf %s fiemap) || fail=1
>
> However, so far I've been unable to reproduce the failure,
> running hundreds of iterations:
>
>   for i in $(seq 300); do printf .; make check -C tests \
>     TESTS=cp/sparse-fiemap VERBOSE=yes >& makerr-$i || break; done
>
> Have any of you heard of a problem whereby a cold cache can cause
> such a thing?  "echo 3 > /proc/sys/vm/drop_caches" didn't help.

Hi Jim,

Have you run `sync' before cleaning the buffers and caches? Even after running `sync' first, the dirty objects sometimes still cannot be freed. :(

I can reproduce this issue on ext4 and btrfs (physically mounted partitions), or simply by running the sparse-fiemap test script; ocfs2 always works fine in this case. I guess this issue might be caused by the 'cold cache' you mentioned above. In my test environment, cp via fiemap always works after the caches have been cleaned out; otherwise, it fails from time to time.
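Whether the kernel reports extents for recently written data before and after a flush can also be checked from C rather than by dropping caches by hand. The sketch below is a hypothetical helper, `count_extents` (not part of coreutils or of the patch under discussion), that asks FS_IOC_FIEMAP only for the number of extents by passing `fm_extent_count = 0`. Its `flush` argument sets `FIEMAP_FLAG_SYNC`, which asks the kernel to sync the file's data before building the mapping; whether that flag avoids the failure in this thread is an assumption, not something established here.

```c
#include <stdbool.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/fs.h>       /* FS_IOC_FIEMAP */
#include <linux/fiemap.h>   /* struct fiemap, FIEMAP_* constants */

/* Hypothetical helper: return the number of extents the kernel reports
   for FD, or -1 if the FS_IOC_FIEMAP ioctl fails (for example EBADF,
   or EOPNOTSUPP on a filesystem without FIEMAP support).  */
static long
count_extents (int fd, bool flush)
{
  struct fiemap fm;
  memset (&fm, 0, sizeof fm);

  fm.fm_start = 0;
  fm.fm_length = FIEMAP_MAX_OFFSET;          /* map the whole file */
  /* FIEMAP_FLAG_SYNC: sync file data before mapping it.  */
  fm.fm_flags = flush ? FIEMAP_FLAG_SYNC : 0;
  /* With fm_extent_count == 0 the kernel only fills in
     fm_mapped_extents, so no extent array is needed.  */
  fm.fm_extent_count = 0;

  if (ioctl (fd, FS_IOC_FIEMAP, &fm) < 0)
    return -1;
  return (long) fm.fm_mapped_extents;
}
```

For instance, calling `count_extents (fd, false)` and then `count_extents (fd, true)` on a freshly written sparse file on ext4 would show whether the reported mapping changes once dirty data is flushed; on a filesystem without FIEMAP support, both calls simply return -1.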
My kernel version:
Linux jeff-laptop 2.6.33-rc5-00238-gb04da8b-dirty #11 SMP Sat Dec 19 22:02:01 CST 2009 i686 GNU/Linux

jeff <at> jeff-laptop:/ext4$ dd if=/dev/zero of=sparse bs=1k count=1 seek=1G
1+0 records in
1+0 records out
1024 bytes (1.0 kB) copied, 0.000156654 s, 6.5 MB/s
jeff <at> jeff-laptop:/ext4$ ls -l sparse
-rw-r--r-- 1 jeff jeff 1099511628800 Jun  9 22:21 sparse
jeff <at> jeff-laptop:/ext4$ filefrag sparse
sparse: 0 extents found
jeff <at> jeff-laptop:/ext4$ filefrag -v sparse
Filesystem type is: ef53
File size of sparse is 1099511628800 (268435457 blocks, blocksize 4096)
 ext logical physical expected length flags
sparse: 1 extent found

To free the buffer cache:
=========================
jeff <at> jeff-laptop:/ext4$ free
             total       used       free     shared    buffers     cached
Mem:       1980300     719972    1260328          0       2836      94104
-/+ buffers/cache:     623032    1357268
Swap:            0          0          0
jeff <at> jeff-laptop:/ext4$ sync
In another root console, run 'echo 3 > /proc/sys/vm/drop_caches'
jeff <at> jeff-laptop:/ext4$ free
             total       used       free     shared    buffers     cached
Mem:       1980300     716780    1263520          0       1184      88592  <<<<<-----freed
-/+ buffers/cache:     627004    1353296
Swap:            0          0          0
jeff <at> jeff-laptop:/ext4$ filefrag -v sparse
Filesystem type is: ef53
File size of sparse is 1099511628800 (268435457 blocks, blocksize 4096)
 ext logical physical expected length flags
   0 268435456    32999               1 eof
sparse: 2 extents found
jeff <at> jeff-laptop:/ext4$ ./cp --sparse=always sparse f1
last_ext_logical 1099511627776 last_read_size 1024 src_total_size 1099511628800
jeff <at> jeff-laptop:/ext4$ filefrag -v f1
Filesystem type is: ef53
File size of f1 is 1099511628800 (268435457 blocks, blocksize 4096)
 ext logical physical expected length flags
   0 268435456   296960               1 eof
f1: 2 extents found
jeff <at> jeff-laptop:/ext4$ ./cp --sparse=always sparse f2
last_ext_logical 1099511627776 last_read_size 1024 src_total_size 1099511628800
jeff <at> jeff-laptop:/ext4$ filefrag -v f2
Filesystem type is: ef53
File size of f2 is 1099511628800 (268435457 blocks, blocksize 4096)
 ext logical physical expected length flags
f2: 1 extent found
jeff <at> jeff-laptop:/ext4$ sync
(then clean memory via /proc in another root console)
jeff <at> jeff-laptop:/ext4$ filefrag -v f2
Filesystem type is: ef53
File size of f2 is 1099511628800 (268435457 blocks, blocksize 4096)
 ext logical physical expected length flags
   0 268435456    33379               1 eof
f2: 2 extents found

Once I get through an urgent task at hand, I will double-check my original patch to make sure this issue is not caused by a bug in the code.

Thanks,
-Jeff

> I suspect that having so many extents is unusual, so maybe
> this is a rarely exercised corner case.
>
> ===============================
> As I wrote the above, I realized I probably had enough
> information to deduce where things were going wrong, even
> if so far I've been unable to reproduce it.
>
> And sure enough.  There is a way to provoke exactly
> that failure.  If the *second* (or later) FIEMAP ioctl fails:
>
>       do
>         {
>           fiemap->fm_length = FIEMAP_MAX_OFFSET;
>           fiemap->fm_extent_count = count;
>
>           /* When ioctl(2) fails, fall back to the normal copy only if it
>              is the first time we met.  */
>           if (ioctl (src_fd, FS_IOC_FIEMAP, fiemap) < 0)
>             {
>               /* If the first ioctl fails, tell the caller that it is
>                  ok to proceed with a normal copy.  */
>               if (i == 0)
>                 *normal_copy_required = true;
>               return false;
>             }
>
> In that case, fiemap_copy returns false (with no diagnostic)
> and cp fails silently.
>
> Obviously I will now add code to diagnose the failure,
> but do any of you know off hand how to reproduce this
> or what the failure might have been?
>
> Here's the patch I plan to merge:
>
> diff --git a/src/copy.c b/src/copy.c
> index eb67700..07d605e 100644
> --- a/src/copy.c
> +++ b/src/copy.c
> @@ -200,6 +200,12 @@ fiemap_copy (int src_fd, int dest_fd, size_t buf_size,
>                   ok to proceed with a normal copy.  */
>               if (i == 0)
>                 *normal_copy_required = true;
> +             else
> +               {
> +                 /* If the second or subsequent ioctl fails, diagnose it,
> +                    since it ends up causing the entire copy/cp to fail.  */
> +                 error (0, errno, _("%s: FIEMAP ioctl failed"), quote (src_name));
> +               }
>               return false;
>             }

-- 
With Windows 7, Microsoft is asserting legal control over your computer
and is using this power to abuse computer users.
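The control flow of the proposed patch can be sketched as a standalone loop. `walk_extents` below is a hypothetical stand-in modeled on the quoted `fiemap_copy` excerpt, not the coreutils function: if the first FS_IOC_FIEMAP call fails, it only sets `*normal_copy_required` so the caller can fall back to a regular read/write copy; if any later call fails, it prints a diagnostic instead of failing silently. To stay self-contained it uses `fprintf` and `strerror` in place of gnulib's `error` and `quote`, and it omits the actual data transfer.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/fs.h>       /* FS_IOC_FIEMAP */
#include <linux/fiemap.h>   /* struct fiemap, FIEMAP_* constants */

/* Hypothetical sketch: walk the extents of SRC_FD, asking for up to
   COUNT extents per ioctl.  Returns true if the whole mapping was read.  */
static bool
walk_extents (int src_fd, const char *src_name, unsigned int count,
              bool *normal_copy_required)
{
  *normal_copy_required = false;
  if (count == 0)
    return false;           /* need room for at least one extent per call */

  size_t size = sizeof (struct fiemap)
                + (size_t) count * sizeof (struct fiemap_extent);
  struct fiemap *fiemap = malloc (size);
  if (fiemap == NULL)
    return false;

  uint64_t start = 0;       /* logical offset to resume mapping from */
  bool last = false;
  unsigned int i = 0;       /* number of successful ioctl calls so far */

  do
    {
      memset (fiemap, 0, size);
      fiemap->fm_start = start;
      fiemap->fm_length = FIEMAP_MAX_OFFSET;
      fiemap->fm_extent_count = count;

      if (ioctl (src_fd, FS_IOC_FIEMAP, fiemap) < 0)
        {
          int saved_errno = errno;
          if (i == 0)
            *normal_copy_required = true;   /* caller may fall back */
          else                               /* later failure: diagnose it */
            fprintf (stderr, "%s: FIEMAP ioctl failed: %s\n",
                     src_name, strerror (saved_errno));
          free (fiemap);
          return false;
        }

      if (fiemap->fm_mapped_extents == 0)
        break;              /* nothing mapped at or after START */

      for (unsigned int j = 0; j < fiemap->fm_mapped_extents; j++)
        {
          const struct fiemap_extent *e = &fiemap->fm_extents[j];
          /* A real copy would transfer [fe_logical, fe_logical + fe_length).  */
          start = e->fe_logical + e->fe_length;
          if (e->fe_flags & FIEMAP_EXTENT_LAST)
            last = true;
        }
      i++;
    }
  while (!last);

  free (fiemap);
  return true;
}
```

Keeping the "first call failed" case silent preserves the fallback path for filesystems without FIEMAP support, while the new branch ensures that a mid-copy failure, which cannot be recovered by falling back, is at least reported.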