GNU bug report logs - #72723
diff -d can be very slow

Package: diffutils

Reported by: Vincent Lefevre <vincent <at> vinc17.net>

Date: Mon, 19 Aug 2024 23:57:01 UTC

Severity: normal

To reply to this bug, email your comments to 72723 AT debbugs.gnu.org.


Report forwarded to bug-diffutils <at> gnu.org:
bug#72723; Package diffutils. (Mon, 19 Aug 2024 23:57:01 GMT)

Acknowledgement sent to Vincent Lefevre <vincent <at> vinc17.net>:
New bug report received and forwarded. Copy sent to bug-diffutils <at> gnu.org. (Mon, 19 Aug 2024 23:57:01 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Vincent Lefevre <vincent <at> vinc17.net>
To: bug-diffutils <at> gnu.org
Subject: diff -d can be very slow
Date: Tue, 20 Aug 2024 01:55:22 +0200
When opening a .diff file, GNU Emacs runs "diff -ad" on 2 files
it has built (I suppose that the reason is to get a word diff),
and this can be very slow, even though the original .diff file
is rather simple.

I've attached a slow-diff.tar.xz archive with:
  * diff1L52tn0 and diff2U4TVho (files built by GNU Emacs).
  * file.diff, the original .diff file.

When running "/usr/bin/emacs -Q file.diff.xz", I could see with ps
or top which process takes all the time:

  diff -ad /tmp/diff1L52tn0 /tmp/diff2U4TVho

Because this command is slow, I was able to obtain these files.

On my recent machine, "diff -ad diff1L52tn0 diff2U4TVho" takes
27 seconds.
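
To see how much of that time is due to -d, one could time the same
comparison with and without it. The following is only a sketch:
time_diff is a hypothetical helper name, the file names are the ones
from the attached archive, and "date +%s" (one-second resolution) is
assumed to be available, as it is on GNU systems.

```shell
#!/bin/sh
# Sketch only: time diff with a given option string on two files.
# time_diff is a hypothetical helper name, not part of diffutils.

time_diff() {
    # $1 = diff options (e.g. -a or -ad); $2, $3 = files to compare
    start=$(date +%s)                 # seconds since the epoch
    status=0
    # Discard the edit script itself; only the elapsed time matters here.
    diff "$1" "$2" "$3" > /dev/null || status=$?
    end=$(date +%s)
    # diff exit status: 0 = same, 1 = different, 2 = trouble
    echo "diff $1: $((end - start)) s (exit status $status)"
    [ "$status" -lt 2 ]               # fail only if diff reported trouble
}

# Run on the files from the report only if they are present.
if [ -f diff1L52tn0 ] && [ -f diff2U4TVho ]; then
    time_diff -a  diff1L52tn0 diff2U4TVho
    time_diff -ad diff1L52tn0 diff2U4TVho
fi
```

The two reported times can then be compared directly; no particular
figures are assumed here.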

-- 
Vincent Lefèvre <vincent <at> vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)
[slow-diff.tar.xz (application/octet-stream, attachment)]

Information forwarded to bug-diffutils <at> gnu.org:
bug#72723; Package diffutils. (Tue, 20 Aug 2024 00:03:01 GMT)

Message #8 received at 72723 <at> debbugs.gnu.org:

From: Vincent Lefevre <vincent <at> vinc17.net>
To: 72723 <at> debbugs.gnu.org
Subject: Re: diff -d can be very slow
Date: Tue, 20 Aug 2024 02:02:06 +0200
I forgot to say that this is on a Debian/unstable machine (x86_64),
with the Debian packages, so with diff (GNU diffutils) 3.10 and
GNU Emacs 29.4.





Information forwarded to bug-diffutils <at> gnu.org:
bug#72723; Package diffutils. (Tue, 20 Aug 2024 00:19:02 GMT)

Message #11 received at 72723 <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Vincent Lefevre <vincent <at> vinc17.net>, 72723 <at> debbugs.gnu.org
Subject: Re: [bug-diffutils] bug#72723: diff -d can be very slow
Date: Mon, 19 Aug 2024 17:17:27 -0700
On 2024-08-19 16:55, Vincent Lefevre wrote:
> When opening a .diff file, GNU Emacs runs "diff -ad" on 2 files
> it has built (I suppose that the reason is to get a word diff),
> and this can be very slow

That's inherent to the algorithm, no? I don't know of any faster 
algorithm, if you really want minimal output. If you know of one, please 
let us know.

A simple workaround would be for GNU Emacs to not use the -d (--minimal) 
option.
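
Whether that workaround is acceptable presumably depends on how much
larger the output gets without -d. A rough way to compare the two is
sketched below: changed_lines is a hypothetical helper name, and the
file names are those from the original report.

```shell
#!/bin/sh
# Sketch only: count the lines diff reports as removed or added.
# changed_lines is a hypothetical helper name, not part of diffutils.

changed_lines() {
    # $1 = diff options (e.g. -a or -ad); $2, $3 = files to compare
    # In diff's default output format, removed lines start with "<"
    # and added lines start with ">"; count those lines.
    # "|| true" keeps the status zero when grep finds no such line.
    diff "$1" "$2" "$3" | grep -c '^[<>]' || true
}

# Run on the files from the report only if they are present.
if [ -f diff1L52tn0 ] && [ -f diff2U4TVho ]; then
    echo "without -d: $(changed_lines -a  diff1L52tn0 diff2U4TVho) lines"
    echo "with -d:    $(changed_lines -ad diff1L52tn0 diff2U4TVho) lines"
fi
```

If the two counts are close, dropping -d costs little in output size
for these inputs. That says nothing about other inputs.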




Information forwarded to bug-diffutils <at> gnu.org:
bug#72723; Package diffutils. (Tue, 20 Aug 2024 01:02:02 GMT)

Message #14 received at 72723 <at> debbugs.gnu.org:

From: Vincent Lefevre <vincent <at> vinc17.net>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 72723 <at> debbugs.gnu.org, Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: [bug-diffutils] bug#72723: diff -d can be very slow
Date: Tue, 20 Aug 2024 03:00:55 +0200
[Cc to Stefan Monnier, who introduced -d in GNU Emacs in 2007]

On 2024-08-19 17:17:27 -0700, Paul Eggert wrote:
> On 2024-08-19 16:55, Vincent Lefevre wrote:
> > When opening a .diff file, GNU Emacs runs "diff -ad" on 2 files
> > it has built (I suppose that the reason is to get a word diff),
> > and this can be very slow
> 
> That's inherent to the algorithm, no? I don't know of any faster algorithm,
> if you really want minimal output. If you know of one, please let us know.
> 
> A simple workaround would be for GNU Emacs to not use the -d (--minimal)
> option.

Perhaps not possible. In the Emacs code (in lisp/vc/smerge-mode.el),
I could find:

  (let ((coding-system-for-read 'utf-8-emacs))
    (call-process diff-command nil t nil
                  (if (and smerge-refine-ignore-whitespace
                           (not smerge-refine-weight-hack))
                      ;; Pass -a so diff treats it as a text file even
                      ;; if it contains \0 and such.
                      ;; Pass -d so as to get the smallest change, but
                      ;; also and more importantly because otherwise it
                      ;; may happen that diff doesn't behave like
                      ;; smerge-refine-weight-hack expects it to.
                      ;; See https://lists.gnu.org/r/emacs-devel/2007-11/msg00401.html
                      "-awd" "-ad")
                  file1 file2))

I suppose that this is the code that is called, as I couldn't
find another occurrence of -ad. So there is a reference to

  https://lists.gnu.org/r/emacs-devel/2007-11/msg00401.html

(that was with diff (GNU diffutils) 2.8.1) and the latest message
(about the use of -d, in particular):

  https://lists.gnu.org/r/emacs-devel/2007-11/msg00522.html





This bug report was last modified 298 days ago.
