GNU bug report logs - #21942
Files with incorrect file sizes


Package: diffutils

Reported by: Stephan Müller <fruktopus <at> gmail.com>

Date: Tue, 17 Nov 2015 17:44:02 UTC

Severity: normal

Done: Jim Meyering <jim <at> meyering.net>

Bug is archived. No further changes may be made.


From: Jim Meyering <jim <at> meyering.net>
To: Stephan Müller <fruktopus <at> gmail.com>
Cc: 21942 <at> debbugs.gnu.org
Subject: bug#21942: [bug-diffutils] bug#21942: Files with incorrect file sizes
Date: Sun, 6 Dec 2015 11:27:20 -0800
[Message part 1 (text/plain, inline)]
On Fri, Nov 20, 2015 at 2:33 PM, Stephan Müller <fruktopus <at> gmail.com> wrote:
> Am Fri, 20 Nov 2015 23:12:40 +0100
> schrieb Jim Meyering <jim <at> meyering.net>:
>
>> On Fri, Nov 20, 2015 at 9:06 PM, Stephan Müller <fruktopus <at> gmail.com>
>> wrote:
>> > Am Fri, 20 Nov 2015 18:28:37 +0100
>> > schrieb Jim Meyering <jim <at> meyering.net>:
>> >
>> >> On Tue, Nov 17, 2015 at 12:44 PM, Stephan Müller
>> >> <fruktopus <at> gmail.com> wrote:
>> >> > Recently I had to debug a weird problem. Finally I figured it out.
>> >> >
>> >> > Virtual file systems like /sys or /proc usually don't care about
>> >> > file sizes: all files report a size of 0. This causes problems,
>> >> > because diff sometimes looks at file sizes.
>> >> >
>> >> > Say you do:
>> >> >
>> >> >> $ cp /proc/cmdline my_cmdline
>> >> >> $ diff /proc/cmdline my_cmdline ; echo $?
>> >> >> 0      // ok, files don't differ
>> >> >> $ diff --brief /proc/cmdline my_cmdline
>> >> >> Files /proc/cmdline and my_cmdline differ
>> >> >
>> >> > The --brief option triggers a binary comparison, which makes sense
>> >> > since we aren't interested in the actual differences. As a first
>> >> > step, the file sizes are compared (0 vs. ~150), and the files are
>> >> > reported as different.
>> >>
>> >> thanks for the report.
>> >> What version of diffutils are you using?
>> >> I think this has been fixed for some time.
>> >> I was unable to reproduce it with either 2.8.1 or the latest build
>> >> from git. I.e., I created an empty file and used diff-2.8.1 to
>> >> compare it with the nominally-zero-length /proc/cmdline file, and
>> >> diff did the right thing.
>> >> Also, I ran stat to show st_size of each file is indeed 0:
>> >>
>> >>   $ : > /tmp/k; /p/p/diffutils-2.8.1/bin/diff /proc/cmdline /tmp/k; \
>> >>     stat --format %s /proc/cmdline /tmp/k
>> >>   1d0
>> >>   < ro root=LABEL=...
>> >>   0
>> >>   0
>> >>
>> >> In fact, I went ahead and built all available versions and tested
>> >> them like this:
>> >>
>> >>   $ for i in /p/p/*/bin/diff; do p=diffutils-$i; echo $i; \
>> >>       $i /proc/cmdline /tmp/k > /dev/null && echo bad; done
>> >>   /p/p/diffutils-2.7/bin/diff
>> >>   /p/p/diffutils-2.8.1/bin/diff
>> >>   /p/p/diffutils-2.8/bin/diff
>> >>   /p/p/diffutils-2.9/bin/diff
>> >>   /p/p/diffutils-3.0/bin/diff
>> >>   /p/p/diffutils-3.1/bin/diff
>> >>   /p/p/diffutils-3.2/bin/diff
>> >>   /p/p/diffutils-3.3/bin/diff
>> >
>> > Hi,
>> >
>> > I am using v.3.3 of diffutils
>> >
>> > $ diff -v
>> > diff (GNU diffutils) 3.3
>> >
>> > but I think you misunderstood the problem. Sorry for being
>> > ambiguous. I am not diffing against an empty file. That works well.
>> > The point is that procfs doesn't care about sizes, but 'normal' file
>> > systems do. For example, on my system I have (after
>> > cp /proc/cmdline mycmdline):
>> >
>> > $ stat --format %s /proc/cmdline mycmdline
>> > 0
>> > 140
>> >
>> > The result of diffing /proc/cmdline against mycmdline depends on the
>> > --brief flag.
>> >
>> > STEPS TO REPRODUCE:
>> >
>> > cp /proc/cmdline mycmdline
>> > diff --brief /proc/cmdline mycmdline > /dev/null ; echo $?
>> > 1
>> > diff /proc/cmdline mycmdline ; echo $?
>> > 0
>> >
>> > EXPECTED RESULT:
>> >
>> > cp /proc/cmdline mycmdline
>> > diff --brief /proc/cmdline mycmdline > /dev/null ; echo $?
>> > 0
>> > diff /proc/cmdline mycmdline ; echo $?
>> > 0
>>
>> Oh, indeed. Thank you for clarifying. That feels like a bug.
>> Here's a knee-jerk patch that refrains from using the
>> st_size-comparing heuristic when either of the sizes is zero. This
>> may well be wrong. I have only barely tested the diff.c code path.
>
> Thanks, that makes the problem at least (even) less likely. But if we
> can't trust file sizes, we're doomed. What do you think about a flag
> controlling comparison by size, plus a notice when files differ by size?
>
> I can craft a patch for this.

Thank you, but I don't want to have to specify some new option to
avoid this misbehavior, so I will push the attached patch shortly.
If someone finds a system for which a falsely reported stat.st_size
is nonzero, we can revisit this.
[0001-diff-brief-no-longer-mistakenly-reports-diff.-with-0.patch (text/x-patch, attachment)]
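For illustration only, here is a minimal shell sketch of the heuristic as described in this thread: a size mismatch lets --brief report "differ" without reading the files, but after the change Jim describes, that shortcut is skipped when either st_size is 0, and the contents are compared instead. This is not the attached patch (whose diff.c changes are not reproduced here); the function name `brief_diff` is hypothetical, and the sketch assumes GNU coreutils `stat --format %s` (as used earlier in the thread) and `cmp -s`, mimicking diff's exit statuses (0 same, 1 different, 2 trouble).

```shell
# Hypothetical helper sketching the patched --brief logic; not diff's code.
# Usage: brief_diff FILE1 FILE2
# Exit status mimics diff: 0 = same, 1 = different, 2 = trouble.
brief_diff() {
  # st_size as reported by the file system (0 for most procfs/sysfs files).
  size1=$(stat --format %s -- "$1") || return 2
  size2=$(stat --format %s -- "$2") || return 2

  # Size shortcut: trust a size mismatch only when both sizes are nonzero,
  # since a reported size of 0 may be false (e.g. /proc/cmdline).
  if [ "$size1" -ne 0 ] && [ "$size2" -ne 0 ] && [ "$size1" -ne "$size2" ]; then
    echo "Files $1 and $2 differ"
    return 1
  fi

  # Otherwise read both files and compare them byte by byte.
  cmp -s -- "$1" "$2"
  status=$?
  if [ "$status" -eq 1 ]; then
    echo "Files $1 and $2 differ"
  fi
  return "$status"
}
```

With this logic, comparing /proc/cmdline against a fresh copy should exit 0, because a 0-vs-nonzero size pair no longer short-circuits to "differ". As noted above, a file system that falsely reports a nonzero size would still defeat the shortcut.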



