GNU bug report logs - #19319: Unexpected behavior in diff(1)
Report forwarded to bug-coreutils <at> gnu.org: bug#19319; Package coreutils. (Mon, 08 Dec 2014 22:29:01 GMT)
Acknowledgement sent to Todd Shandelman <todd.shandelman <at> gmail.com>: New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Mon, 08 Dec 2014 22:29:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org:
Hello, Padraig -
How are you?
Can you explain the behavior of diff(1) shown below?
With -U1 and -U2, the --ignore-matching-lines='Id' argument suppresses
display of the difference between the 'Id' lines, as expected.
But with -U3, the diff of the 'Id' lines reappears, in spite of the
--ignore-matching-lines='Id' argument.
Why is that?
Thanks,
Todd Shandelman
Houston, Texas
############################################################
$ cat a
#
# ORIG: 2014-12-04 09:28:56 CST Thu
#
# $Id: a,v 1.8 2014/12/04 15:29:99 todd Exp todd $
#
a
b
c
d
e
############################################################
$ cat b
#
# ORIG: 2014-12-04 09:28:56 CST Thu
#
# $Id: a,v 1.8 2014/12/04 15:29:27 todd Exp $
#
a
c
D
e
############################################################
$ diff -v
diff (GNU diffutils) 2.8.1
Copyright (C) 2002 Free Software Foundation, Inc.
This program comes with NO WARRANTY, to the extent permitted by law.
You may redistribute copies of this program
under the terms of the GNU General Public License.
For more information about these matters, see the file named COPYING.
Written by Paul Eggert, Mike Haertel, David Hayes,
Richard Stallman, and Len Tower.
############################################################
$ diff -U1 a b
--- a 2014-12-08 16:07:05.974621828 -0600
+++ b 2014-12-08 16:07:09.693672957 -0600
@@ -3,8 +3,7 @@
 #
- # $Id: a,v 1.8 2014/12/04 15:29:99 todd Exp todd $
+ # $Id: a,v 1.8 2014/12/04 15:29:27 todd Exp $
 #
 a
-b
 c
-d
+D
 e
############################################################
$ diff -U1 --ignore-matching-lines='Id' a b
--- a 2014-12-08 16:07:05.974621828 -0600
+++ b 2014-12-08 16:07:09.693672957 -0600
@@ -6,5 +6,4 @@
 a
-b
 c
-d
+D
 e
############################################################
$ diff -U2 --ignore-matching-lines='Id' a b
--- a 2014-12-08 16:07:05.974621828 -0600
+++ b 2014-12-08 16:07:09.693672957 -0600
@@ -5,6 +5,5 @@
 #
 a
-b
 c
-d
+D
 e
############################################################
$ diff -U3 --ignore-matching-lines='Id' a b
--- a 2014-12-08 16:07:05.974621828 -0600
+++ b 2014-12-08 16:07:09.693672957 -0600
@@ -1,10 +1,9 @@
 #
 # ORIG: 2014-12-04 09:28:56 CST Thu
 #
- # $Id: a,v 1.8 2014/12/04 15:29:99 todd Exp todd $
+ # $Id: a,v 1.8 2014/12/04 15:29:27 todd Exp $
 #
 a
-b
 c
-d
+D
 e
############################################################
On 9 December 2012 at 06:34, Pádraig Brady <P <at> draigbrady.com> wrote:
> On 12/07/2012 08:23 PM, Todd Shandelman wrote:
>
>> Hi, bug-coreutils <at> gnu.org -
>>
>> Not quite a bug, but why does the same option, essentially, have two very
>> different names in the 'expand' and 'unexpand' utilities?
>>
>> This is confusing and hampers convenient usage.
>>
>> I am referring to --initial in the one case and --first-only in the
>> other.
>>
>> See below.
>>
>> Or what am I missing?
>>
>
> Yes that is inconsistent.
> Interestingly the POSIX defined expand and unexpand are inconsistent
> in relation to this to start with.
>
> unexpand only processes leading blanks by default, but
> expand processes all blanks by default.
>
> So unexpand needs the -a option to process all blanks,
> and expand needs the -i, --initial option to process only leading blanks.
>
> Given the above you might think that the --first-only option to unexpand
> is redundant. However -a is implied by -t (as per POSIX), therefore
> to really limit to initial blanks, you need this option.
>
> So as for naming. I agree that --first-only is inconsistent.
> It's also a bit ambiguous. Does it mean only the first tab is written,
> or all leading blanks are processed? Now deprecating --first-only
> for the more consistent --initial has some cost. For example
> it would break part of my FSlint program:
> http://code.google.com/p/fslint/source/browse/trunk/fslint/supprt/rmlint/fix_ws.sh
> Also, I see busybox copied --first-only into its unexpand implementation.
> But I guess to be forward looking --first-only should be deprecated
> in favor of --initial?
>
> thanks,
> Pádraig.
>
Information forwarded to bug-coreutils <at> gnu.org: bug#19319; Package coreutils. (Mon, 08 Dec 2014 23:20:02 GMT)
Message #8 received at 19319 <at> debbugs.gnu.org:
On 12/08/2014 03:27 PM, Todd Shandelman wrote:
> With -U1 and -U2, the --ignore-matching-lines='Id' argument suppresses
> display of the difference between the 'Id' lines, as expected.
At that (small) level of context, the hunk containing the 'Id' line is
separate from the remaining hunks, so the hunk is omitted.
>
> But with -U3, the diff of the 'Id' lines reappears, in spite of the
> --ignore-matching-lines='Id' argument.
>
> Why is that?
As soon as you have enough context, the hunk for the 'Id' line is the
SAME hunk as the rest of the changes. --ignore-matching-lines omits a
hunk only if the entire hunk matches the regex; but as soon as you have
other changes, then the entire hunk is output verbatim.
--ignore-matching-lines does NOT ignore individual lines within a hunk,
but only a hunk at a time.
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
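To make the hunk-at-a-time behaviour concrete, here is a minimal Python sketch (an illustration added alongside the thread, not GNU diff's code). It uses difflib to find the changed regions in the two files from the report, marks a change as ignorable when every line it adds or removes matches the regex, groups changes into hunks, and prints a hunk in full unless all of its changes are ignorable. The function names (find_changes, group_hunks, printed_hunks, id_change_shown) are hypothetical, and the merge threshold in group_hunks is an assumption chosen because it reproduces the three outputs shown in this report; other diff versions may group hunks differently.

```python
import difflib
import re

# The two files from the report, one list element per line.
A = [
    "#",
    "# ORIG: 2014-12-04 09:28:56 CST Thu",
    "#",
    "# $Id: a,v 1.8 2014/12/04 15:29:99 todd Exp todd $",
    "#",
    "a", "b", "c", "d", "e",
]
B = [
    "#",
    "# ORIG: 2014-12-04 09:28:56 CST Thu",
    "#",
    "# $Id: a,v 1.8 2014/12/04 15:29:27 todd Exp $",
    "#",
    "a", "c", "D", "e",
]


def find_changes(a, b, ignore_re):
    """Return (start, end, ignorable) per changed region, in 0-based lines of `a`."""
    changes = []
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        touched = a[i1:i2] + b[j1:j2]
        # Ignorable only if EVERY deleted and inserted line matches the regex.
        ignorable = all(ignore_re.search(line) for line in touched)
        changes.append((i1, i2, ignorable))
    return changes


def group_hunks(changes, context):
    """Merge nearby changes into hunks.

    ASSUMPTION: two changes share a hunk when fewer than `context` unchanged
    lines separate them if either change is ignorable, and fewer than
    2 * context + 1 otherwise. This rule is a guess that fits the report.
    """
    hunks = []
    for change in changes:
        if hunks:
            prev = hunks[-1][-1]
            gap = change[0] - prev[1]  # unchanged lines between the two changes
            threshold = context if (prev[2] or change[2]) else 2 * context + 1
            if gap < threshold:
                hunks[-1].append(change)
                continue
        hunks.append([change])
    return hunks


def printed_hunks(hunks):
    """Keep a hunk (whole) unless every change inside it is ignorable."""
    return [h for h in hunks if not all(c[2] for c in h)]


def id_change_shown(context, pattern="Id"):
    """True if the ignorable 'Id' change ends up inside a printed hunk."""
    changes = find_changes(A, B, re.compile(pattern))
    shown = printed_hunks(group_hunks(changes, context))
    return any(c[2] for h in shown for c in h)


for n in (1, 2, 3):
    print(f"-U{n}: Id change shown = {id_change_shown(n)}")
```

Under these assumptions the script reports that the Id change is hidden for -U1 and -U2 and shown for -U3, matching the outputs in the report: with three lines of context the Id change lands in the same hunk as the b/d changes, and that hunk is printed verbatim.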
Information forwarded to bug-coreutils <at> gnu.org: bug#19319; Package coreutils. (Mon, 08 Dec 2014 23:51:02 GMT)
Message #11 received at 19319 <at> debbugs.gnu.org:
Thanks.
Well, I must say that is all rather counter-intuitive.
I never dreamed that the amount of context I select would actually affect
the diff itself and how it is computed.
But if you say so, I guess that is how it works.
Todd
On 8 December 2014 at 16:46, Eric Blake <eblake <at> redhat.com> wrote:
> On 12/08/2014 03:27 PM, Todd Shandelman wrote:
> > With -U1 and -U2, the --ignore-matching-lines='Id' argument suppresses
> > display of the difference between the 'Id' lines, as expected.
>
> At that (small) level of context, the hunk containing the 'Id' line is
> separate from the remaining hunks, so the hunk is omitted.
>
> >
> > But with -U3, the diff of the 'Id' lines reappears, in spite of the
> > --ignore-matching-lines='Id' argument.
> >
> > Why is that?
>
> As soon as you have enough context, the hunk for the 'Id' line is the
> SAME hunk as the rest of the changes. --ignore-matching-lines omits a
> hunk only if the entire hunk matches the regex; but as soon as you have
> other changes, then the entire hunk is output verbatim.
>
> --ignore-matching-lines does NOT ignore individual lines within a hunk,
> but only a hunk at a time.
>
> --
> Eric Blake eblake redhat com +1-919-301-3266
> Libvirt virtualization library http://libvirt.org
>
>
Information forwarded to bug-coreutils <at> gnu.org: bug#19319; Package coreutils. (Tue, 09 Dec 2014 00:06:01 GMT)
Message #14 received at 19319 <at> debbugs.gnu.org:
On 12/08/2014 04:50 PM, Todd Shandelman wrote:
[please don't top-post on technical lists]
> Thanks.
> Well, I must say that is all rather counter-intuitive.
> I never dreamed that the amount of context I select would actually affect
> the diff itself and how it is computed.
The diff is computed the same way. All that --ignore-matching-lines
changes is how the result is output, after the diff is already computed.
If you think the algorithm should be changed, that is a matter for the
diffutils mailing list; coreutils does not maintain diff(1), so
complaining here won't change it (other than the tangential fact that
many of the same developers hang out on both lists).
On the other hand, there ARE cases where different diff algorithms can
pick entirely different lines in a diff, but where both output forms are
valid patches. If you are familiar with git, compare the Myers (greedy)
vs. minimal vs. patience algorithms - they can produce DRASTICALLY
different hunk counts and numbers of changed lines for the same pair of
files. If diff(1) were to learn multiple algorithms, the way git
already has, then maybe it would be worth tweaking one or more of those
algorithms to ignore input lines that match a regex prior to coming up
with the final computed diff for output - but it's not necessarily going
to be a trivial task to do that and still keep the diff'ing algorithm fast.
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
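The idea floated above, ignoring matching input lines before the diff is computed, can be approximated outside diff(1) by filtering both inputs first. The following is a rough Python sketch of that idea using difflib (an illustration added alongside the thread, not a diffutils feature); filtered_unified_diff is a hypothetical helper name. Note the trade-off: the line numbers in the hunk headers refer to the filtered inputs, so the result is useful for reading but is not a patch that applies to the original files.

```python
import difflib
import re

# The two files from the report, one list element per line.
A = [
    "#",
    "# ORIG: 2014-12-04 09:28:56 CST Thu",
    "#",
    "# $Id: a,v 1.8 2014/12/04 15:29:99 todd Exp todd $",
    "#",
    "a", "b", "c", "d", "e",
]
B = [
    "#",
    "# ORIG: 2014-12-04 09:28:56 CST Thu",
    "#",
    "# $Id: a,v 1.8 2014/12/04 15:29:27 todd Exp $",
    "#",
    "a", "c", "D", "e",
]


def filtered_unified_diff(a, b, pattern, context=3):
    """Drop lines matching `pattern` from both inputs, then diff what is left."""
    regex = re.compile(pattern)
    keep_a = [line for line in a if not regex.search(line)]
    keep_b = [line for line in b if not regex.search(line)]
    return list(
        difflib.unified_diff(
            keep_a, keep_b, fromfile="a", tofile="b", n=context, lineterm=""
        )
    )


# Even with three lines of context, the Id lines cannot appear in the
# output, because they were removed before the comparison was made.
for line in filtered_unified_diff(A, B, "Id"):
    print(line)
```

Because the filtering happens before the comparison, the amount of context no longer influences whether the Id lines show up, which is the behaviour the reporter expected.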
Added tag(s) notabug. Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org. (Tue, 23 Oct 2018 01:45:02 GMT)
bug closed, send any further explanations to 19319 <at> debbugs.gnu.org and Todd Shandelman <todd.shandelman <at> gmail.com>. Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org. (Tue, 23 Oct 2018 01:45:02 GMT)
bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Tue, 20 Nov 2018 12:24:05 GMT)