Package: coreutils;
Reported by: "jeff.liu" <jeff.liu <at> oracle.com>
Date: Fri, 7 May 2010 14:16:02 UTC
Severity: normal
Tags: patch
Done: Jim Meyering <jim <at> meyering.net>
Bug is archived. No further changes may be made.
View this message in rfc822 format
From: "jeff.liu" <jeff.liu <at> oracle.com> To: Jim Meyering <jim <at> meyering.net> Cc: Sunil Mushran <sunil.mushran <at> oracle.com>, Tao Ma <tao.ma <at> oracle.com>, bug-coreutils <at> gnu.org, Joel Becker <Joel.Becker <at> oracle.com>, Chris Mason <chris.mason <at> oracle.com> Subject: bug#6131: [PATCH]: fiemap support for efficient sparse file copy Date: Fri, 21 May 2010 20:59:56 +0800
Jim Meyering wrote: > jeff.liu wrote: >> Hi Jim, >> >> Thanks for your kind advise! >> >> I'd like to adopt the timeout(1) approach for the test work. >> >> My thought is: >> 1. Create and mount a file-backed ext4 partition rather than relying on the HARD CODE path. >> 2. Create a 2gb sparse file without extent allocated for it. >> 3. It take nearly 30 seconds to transfer this file in normal copy, yet less than 1 second through >> FIEMAP-copy, is it a worst-case scenario that makes the difference as large as possible? >> 4. run FIEMAP-copy, use timeout(1) to limit it will complete in 1 second, I hope I understood your >> opinion correctly ;). >> >> The revised patches are shown as following: >> >> >From f18e1801d1dfca9fa278572b8172a5f97da2adc1 Mon Sep 17 00:00:00 2001 >> From: Jie Liu <jeff.liu <at> oracle.com> >> Date: Thu, 13 May 2010 22:17:53 +0800 >> Subject: [PATCH 1/1] tests: add a new test for FIEMAP-copy >> >> * tests/cp/sparse-fiemap: Add a new test for FIEMAP-copy against a >> loopbacked ext4 partition. >> * tests/Makefile.am (sparse-fiemap): Reference the new test. >> >> Signed-off-by: Jie Liu <jeff.liu <at> oracle.com> >> --- >> tests/Makefile.am | 2 + >> tests/cp/sparse-fiemap | 61 ++++++++++++++++++++++++++++++++++++++++++++++++ >> 2 files changed, 63 insertions(+), 0 deletions(-) >> create mode 100644 tests/cp/sparse-fiemap >> >> diff --git a/tests/Makefile.am b/tests/Makefile.am >> index 46d388a..a76c6a7 100644 >> --- a/tests/Makefile.am >> +++ b/tests/Makefile.am >> @@ -25,6 +25,7 @@ root_tests = \ >> cp/special-bits \ >> cp/cp-mv-enotsup-xattr \ >> cp/capability \ >> + cp/sparse-fiemap \ >> dd/skip-seek-past-dev \ >> install/install-C-root \ >> ls/capability \ >> @@ -319,6 +320,7 @@ TESTS = \ >> cp/same-file \ >> cp/slink-2-slink \ >> cp/sparse \ >> + cp/sparse-fiemap \ >> cp/special-f \ >> cp/src-base-dot \ >> cp/symlink-slash \ > > I've applied your patches locally and have begun adjusting them. > First, I removed the addition of cp/sparse-fiemap to the TESTS list above. > Adding it to the root_tests is sufficient. Thank you to point it out. > Then, I've made the following changes to your test script: > - the original size of your test file of 2GiB was too small, > in that the old (pre-fiemap) cp copied it for me in less than > 1 second when the backing file was on a tmpfs file system. > I've made the new size be 2TiB. The fiemap copy is still so > quick that it completes in < .01 second.[*] > - no point in discarding stdout/stderr, since it all goes to the log > - raised timeout to 10 seconds to give more leeway on slow systems > - remove those "rm -f" uses. They're not needed, since the test is > run in its own temp dir, which is removed automatically when done. > - remove the $? = 124 test -- the preceding test for success is sufficient > > [*] I tried to count syscalls with strace but got a segfault. > Using valgrind I get errors, so debugged enough to get a clean > run, but possibly at the expense of correctness. We'll need more > tests to ensure that the non-sparse blocks in the copy all have > the same offset/length as in the original. Is it make sense if we write a utility in C through FIEMAP to show the extent info of a file? then wrap it in our current test scripts or a new test script to compare the non-sparse blocks offset and length? filefrag(8) can do such thing(http://e2fsprogs.sourceforge.net/), but maybe we can implement a compacted version focus on furture extent maping related testing only for coreutils. Details below. > > diff --git a/tests/cp/sparse-fiemap b/tests/cp/sparse-fiemap > old mode 100644 > new mode 100755 > index f9d3a94..814d537 > --- a/tests/cp/sparse-fiemap > +++ b/tests/cp/sparse-fiemap > @@ -28,8 +28,7 @@ cwd=`pwd` > cleanup_() { cd /; umount "$cwd/mnt"; } > > # Create an ext4 loopback file system > -dd if=/dev/zero of=blob bs=8192 count=1000 > /dev/null 2>&1 \ > - || skip=1 > +dd if=/dev/zero of=blob bs=8192 count=1000 || skip=1 > mkdir mnt > mkfs -t ext4 -F blob || > skip_test_ "failed to create ext4 file system" > @@ -42,20 +41,15 @@ test $skip = 1 && > > rm -f mnt/f > > -# Create a 2gb sparse file > -dd if=/dev/zero of=mnt/sparse bs=1k count=1 seek=2096128 > /dev/null 2>&1 || framework_failure > +# Create a 2TiB sparse file > +dd if=/dev/zero of=mnt/sparse bs=1k count=1 seek=2G || framework_failure > > -# It take more than 20 seconds to transfer the created sparse file > -# through normal copy, by contrast, it take even less than 1 second > -# through FIEMAP-copy. > -timeout 1 cp --sparse=always mnt/sparse mnt/sparse_fiemap || fail=1 > -test $? = 124 && fail=1 > +# It takes many minutes to copy this sparse file using the old method. > +# By contrast, it takes far less than 1 second using FIEMAP-copy. > +timeout 10 cp --sparse=always mnt/sparse mnt/sparse_fiemap || fail=1 > > # Ensure that the sparse file copied through fiemap has the same size > # in bytes as the original. > test `stat --printf %s $sparse` = `stat --printf %s $fiemap` || fail=1 > > -rm -f mnt/sparse > -rm -f mnt/sparse_fiemap > - > Exit $fail > > ---------------------------------------- > On F13, x86_64, ext4, I did this: > > dd if=/dev/null of=big bs=1 seek=2G > valgrind ./cp --sparse=always big big2 > ==4771== Conditional jump or move depends on uninitialised value(s) > ==4771== at 0x40465A: fiemap_copy_ok (copy.c:205) > ==4771== by 0x405B61: copy_reg (copy.c:822) > ==4771== by 0x408713: copy_internal (copy.c:2163) > ==4771== by 0x409237: copy (copy.c:2449) > ==4771== by 0x403AC9: do_copy (cp.c:754) > ==4771== by 0x4041E4: main (cp.c:1154) > ==4771== > ==4771== Syscall param lseek(offset) contains uninitialised byte(s) > ==4771== at 0x3269CE1540: __lseek_nocancel (syscall-template.S:82) > ==4771== by 0x4046D4: fiemap_copy_ok (copy.c:210) > ==4771== by 0x405B61: copy_reg (copy.c:822) > ==4771== by 0x408713: copy_internal (copy.c:2163) > ==4771== by 0x409237: copy (copy.c:2449) > ==4771== by 0x403AC9: do_copy (cp.c:754) > ==4771== by 0x4041E4: main (cp.c:1154) > ==4771== > ==4771== Syscall param lseek(offset) contains uninitialised byte(s) > ==4771== at 0x3269CE1540: __lseek_nocancel (syscall-template.S:82) > ==4771== by 0x40472D: fiemap_copy_ok (copy.c:216) > ==4771== by 0x405B61: copy_reg (copy.c:822) > ==4771== by 0x408713: copy_internal (copy.c:2163) > ==4771== by 0x409237: copy (copy.c:2449) > ==4771== by 0x403AC9: do_copy (cp.c:754) > ==4771== by 0x4041E4: main (cp.c:1154) > ==4771== > ==4771== Conditional jump or move depends on uninitialised value(s) > ==4771== at 0x404792: fiemap_copy_ok (copy.c:222) > ==4771== by 0x405B61: copy_reg (copy.c:822) > ==4771== by 0x408713: copy_internal (copy.c:2163) > ==4771== by 0x409237: copy (copy.c:2449) > ==4771== by 0x403AC9: do_copy (cp.c:754) > ==4771== by 0x4041E4: main (cp.c:1154) > ==4771== > ==4771== Conditional jump or move depends on uninitialised value(s) > ==4771== at 0x40492B: fiemap_copy_ok (copy.c:229) > ==4771== by 0x405B61: copy_reg (copy.c:822) > ==4771== by 0x408713: copy_internal (copy.c:2163) > ==4771== by 0x409237: copy (copy.c:2449) > ==4771== by 0x403AC9: do_copy (cp.c:754) > ==4771== by 0x4041E4: main (cp.c:1154) > ==4771== > ==4771== Conditional jump or move depends on uninitialised value(s) > ==4771== at 0x4047FA: fiemap_copy_ok (copy.c:235) > ==4771== by 0x405B61: copy_reg (copy.c:822) > ==4771== by 0x408713: copy_internal (copy.c:2163) > ==4771== by 0x409237: copy (copy.c:2449) > ==4771== by 0x403AC9: do_copy (cp.c:754) > ==4771== by 0x4041E4: main (cp.c:1154) > ==4771== > ==4771== Syscall param read(count) contains uninitialised byte(s) > ==4771== at 0x3269CD41B0: __read_nocancel (syscall-template.S:82) > ==4771== by 0x404821: fiemap_copy_ok (copy.c:238) > ==4771== by 0x405B61: copy_reg (copy.c:822) > ==4771== by 0x408713: copy_internal (copy.c:2163) > ==4771== by 0x409237: copy (copy.c:2449) > ==4771== by 0x403AC9: do_copy (cp.c:754) > ==4771== by 0x4041E4: main (cp.c:1154) > ==4771== > ==4771== Invalid read of size 8 > ==4771== at 0x404952: fiemap_copy_ok (copy.c:265) > ==4771== by 0x405B61: copy_reg (copy.c:822) > ==4771== by 0x408713: copy_internal (copy.c:2163) > ==4771== by 0x409237: copy (copy.c:2449) > ==4771== by 0x403AC9: do_copy (cp.c:754) > ==4771== by 0x4041E4: main (cp.c:1154) > ==4771== Address 0x3ffeffdd68 is not stack'd, malloc'd or (recently) free'd > ==4771== > ==4771== > ==4771== Process terminating with default action of signal 11 (SIGSEGV) > ==4771== Access not within mapped region at address 0x3FFEFFDD68 > ==4771== at 0x404952: fiemap_copy_ok (copy.c:265) > ==4771== by 0x405B61: copy_reg (copy.c:822) > ==4771== by 0x408713: copy_internal (copy.c:2163) > ==4771== by 0x409237: copy (copy.c:2449) > ==4771== by 0x403AC9: do_copy (cp.c:754) > ==4771== by 0x4041E4: main (cp.c:1154) > > =========================================================== > The segv just above is due to hitting this line with i==0: > > fiemap->fm_start = (fm_ext[i-1].fe_logical + fm_ext[i-1].fe_length); strange, code should break if there is no extent allocated for a file. /* If 0 extents are returned, then more ioctls are not needed. */ if (fiemap->fm_mapped_extents == 0) break; > > the obvious fix is probably to do this instead: > > fiemap->fm_start = (fm_ext[i].fe_logical + fm_ext[i].fe_length); I just found a bug for dealing with the 'fiemap->fm_start', maybe it is the root cause of the segment fault. above line still need to write as 'fm_ext[i-1].fe_logical +....' to calculate the offset for the next ioctl(2). > > All of the used-uninitialized errors can be papered over by > clearing the fiemap_buf array, like this: > > + memset (fiemap_buf, 0, sizeof fiemap_buf); I recalled why I initialized this buf before when you ask me the reason, I was intented to initialize the 'fiemap->fm_start', so below line 'fiemap->fm_start = 0ULL' should be removed from the loop. > do > { > fiemap->fm_start = 0ULL; > > However, if these are all due solely to F13's valgrind not yet knowing the > semantics of the FIEMAP ioctl, then that may be adequate. as what I mentioned above, this line should be removed or remove out of the loop if we do not initialize the fiemap buf. > > Bottom line: > - you may consider your test-script patch accepted, with the patch above > - I'd like to see a new version of the copy.c-changing patch, > including at least a fix for the fm_ext[-1] access bug. > > =========================================================== > Solely for reference, here's the copy.c patch I used to avoid > the valgrind-spotted problems: > > diff --git a/src/copy.c b/src/copy.c > index 960e5fb..e232eaa 100644 > --- a/src/copy.c > +++ b/src/copy.c > @@ -179,6 +179,7 @@ fiemap_copy_ok (int src_fd, int dest_fd, size_t buf_size, > uint64_t last_read_size = 0; > unsigned int i = 0; > > + memset (fiemap_buf, 0, sizeof fiemap_buf); > do > { > fiemap->fm_start = 0ULL; > @@ -187,7 +188,7 @@ fiemap_copy_ok (int src_fd, int dest_fd, size_t buf_size, > > /* When ioctl(2) fails, fall back to the normal copy only if it > is the first time we met. */ > - if (ioctl (src_fd, FS_IOC_FIEMAP, (unsigned long) fiemap) < 0) > + if (ioctl (src_fd, FS_IOC_FIEMAP, fiemap) < 0) > { > /* If `i > 0', then at least one ioctl(2) has been performed before. */ > if (i == 0) > @@ -261,7 +262,7 @@ fiemap_copy_ok (int src_fd, int dest_fd, size_t buf_size, > ext_len -= n_read; > } > > - fiemap->fm_start = (fm_ext[i-1].fe_logical + fm_ext[i-1].fe_length); > + fiemap->fm_start = (fm_ext[i].fe_logical + fm_ext[i].fe_length); > } > } while (! last); > > > -- With Windows 7, Microsoft is asserting legal control over your computer and is using this power to abuse computer users.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.