GNU bug report logs - #73484
31.0.50; Abolishing etags-regen-file-extensions

Previous Next

Package: emacs;

Reported by: Sean Whitton <spwhitton <at> spwhitton.name>

Date: Wed, 25 Sep 2024 19:41:01 UTC

Severity: wishlist

Found in version 31.0.50

Full log


Message #140 received at 73484 <at> debbugs.gnu.org (full text, mbox):

From: Francesco Potortì <pot <at> gnu.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: dmitry <at> gutov.dev, 73484 <at> debbugs.gnu.org, spwhitton <at> spwhitton.name
Subject: Re: bug#73484: 31.0.50; Abolishing etags-regen-file-extensions
Date: Thu, 10 Oct 2024 16:25:28 +0200
>> I had a quick look at the whole code and in fact the only place I can find where ou have O^2 behaviour seems to be file name comparison, and it still looks so strange to me that this can in facrt cause significant delay.
>
>We are using etags on a huge tree: about 375K files.  I think that's
>the reason, because non-linear behaviors are like that: they are
>insignificant with small sets, but huge with larger ones...
>
>Profiles don't lie...

Ok, makes sense.  I must have missed the number of files in your previous explanations, sorry.  The only other place where I found O^2 behaviour is when managing #line directives, but you already tried to disable them without much change.  So let's concentrate on file name comparison which is done in process_file_name at

  for (fdp = fdhead; fdp != NULL; fdp = fdp->next)
    {
      assert (fdp->infname != NULL);
      if (streq (uncompressed_name, fdp->infname))
	goto cleanup;
    }

This is a simple O^2 comparison, which is repeated sum(1,N,N-1)=~N^2/2, which for ~375k files means ~70G comparisons.  If you can count the number of times streq is called and 70G is a substantial portion of that number, then we have the culprit.  To check, just remove the above test and see if the running time drops.

In that case, using a hash rather than a comparison would probably make sense.  Alternatively, rather than managing file names in a single loop, do a first loop on all file names to canonicalise them, but without searching for tags (essentially, remove the call to process_file from process_file_name), then uniquify the list of canonicalised file names, then run process_file on them.




This bug report was last modified 225 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.