GNU bug report logs - #73484
31.0.50; Abolishing etags-regen-file-extensions

Previous Next

Package: emacs;

Reported by: Sean Whitton <spwhitton <at> spwhitton.name>

Date: Wed, 25 Sep 2024 19:41:01 UTC

Severity: wishlist

Found in version 31.0.50

Full log


View this message in rfc822 format

From: Francesco Potortì <pot <at> gnu.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: dmitry <at> gutov.dev, 73484 <at> debbugs.gnu.org, spwhitton <at> spwhitton.name
Subject: bug#73484: 31.0.50; Abolishing etags-regen-file-extensions
Date: Thu, 10 Oct 2024 10:27:57 +0200
>> >This is basically a "uniqueness" operation using linear search, O(N^2).

> Thus, I
>believe the intent is to avoid duplicate tags if the same file was
>encountered twice in some way.

Yes.  Sorry, I spoke from memory and I was inaccurate.

>Note that canonicalize_filename in this case doesn't really do what
>its name seems to imply, e.g., relative file names will generally stay
>relative.

It canonicalises, that is, reduces to a standard common form.  It retains relative vs absolute difference.
 
>So specifying the same file once as relative and the other
>time as absolute will still process the file more than once.

From memory, I would tell so, yes.  Have not checked right now.

>We need
>to use an inode test or equivalent, and probably use realpath or
>equivalent, to make the duplicate test reliable.
>Or maybe having the
>same file processed under different names is okay, since TAGS is for
>helping Emacs find the file, and so using relative names and symlinks
>is okay?

Yes, I think so.  And from memory I think it should be left unchanged.

>> I do not think that it makes sense to build a hash table for file names given on the command line, because the number of comparisons made on those names is generally vastly inferior to the number of comparisons used to search for tags.
>
>That's not what I see in the code.  But it should be easy to count the
>number of loop iterations in the use case we are talking about
>(running etags on the geck-dev tree), so we don't need to argue about
>facts.

Yes.  If finding a bottleneck is the objective, you should maybe instrument the string comparison functions so that you can count how many times they are called from different places.

I had a quick look at the whole code and in fact the only place I can find where ou have O^2 behaviour seems to be file name comparison, and it still looks so strange to me that this can in facrt cause significant delay.

I may certainly have missed something, but if that's really the case, first thing is looking for code inefficiencies.  If this is really structural, one should first read all filenames, canonicalise and uniquify them, and only then create the tags.




This bug report was last modified 224 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.