GNU bug report logs - #32993
Pathologically slow operation


Package: diffutils

Reported by: Stefan Monnier <monnier <at> IRO.UMontreal.CA>

Date: Mon, 8 Oct 2018 21:35:01 UTC

Severity: normal


From: Stefan Monnier <monnier <at> IRO.UMontreal.CA>
To: 32993 <at> debbugs.gnu.org
Subject: bug#32993: Pathologically slow operation
Date: Mon, 08 Oct 2018 17:34:18 -0400

I recently bumped into a `diff` operation on two files that I killed
after several minutes (on a 3.7GHz Core i3, which is the fastest
machine I have).

These files were generated as part of Emacs's "refine-hunk" processing
which tries to do word-level diffs (by basically turning every word
into N copies of this word, each one on its own line (where N is the
number of chars in the word, used to indicate to `diff` that long words
are "more costly" than short ones)).
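
To make the transformation above concrete, here is a minimal Python sketch
of it. This is not Emacs's actual code (which is written in Emacs Lisp);
the function name `expand_words` and the choice to split on whitespace are
assumptions made only for illustration.

```python
def expand_words(text):
    """Return text with each word repeated on len(word) separate lines.

    Hypothetical helper: approximates the word-weighting described above,
    so that a line-based diff treats long words as costlier to change.
    """
    lines = []
    for word in text.split():  # assumption: words are whitespace-separated
        lines.extend([word] * len(word))
    # One word copy per line; empty input yields empty output.
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    # "a" appears once, "diff" and "tool" four times each.
    print(expand_words("a diff tool"), end="")
```

Under this scheme the line count of the generated file equals the total
number of characters in all words, which is how a modest amount of text
can turn into input with around a million lines.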

So the files' sizes were:

    % wc tmp/diff-bug-* 
    1038026  851160 4963190 tmp/diff-bug-1
      65041   54877  314788 tmp/diff-bug-2
    1103067  906037 5277978 total
    %

With --speed-large-files, diff still took almost a minute to return an
answer (which is 973026 lines long).

Those files aren't exactly security sensitive, but they contain personal
info that I'd rather not make public (I can send them privately upon
request, tho).  Is there a chance this behavior is the result of a
performance bug, or is the algorithm really that costly?


        Stefan




This bug report was last modified 6 years and 253 days ago.
