GNU bug report logs - #73484
31.0.50; Abolishing etags-regen-file-extensions


Package: emacs;

Reported by: Sean Whitton <spwhitton <at> spwhitton.name>

Date: Wed, 25 Sep 2024 19:41:01 UTC

Severity: wishlist

Found in version 31.0.50



Message #83 received at 73484 <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dmitry <at> gutov.dev>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: pot <at> gnu.org, 73484 <at> debbugs.gnu.org, spwhitton <at> spwhitton.name
Subject: Re: bug#73484: 31.0.50; Abolishing etags-regen-file-extensions
Date: Sun, 6 Oct 2024 22:14:46 +0300

On 06/10/2024 09:22, Eli Zaretskii wrote:

>> Then, the total time increased a lot: from 30 s to 30-40 min.
> 
> I don't understand why.  How many files with no extensions are in that
> tree, and what was the etags command line in both cases?

Sorry, I have to add a correction: it's about 15 min either way. It seems
that the first time I either got the start time wrong, or the directory
was in a "cold" cache, or the etags used was some much older version.

So to reiterate: the current etags-regen scans in around 30 s, and the
simple switch scans the directory in 15 minutes. I retested the change
from the previous email, and it doesn't really help.

And the 'find-tag' scan did become slower - i.e. from 400 ms to 1200 ms. 
Not clear about the mechanics (the size of TAGS only went up from 65 to 
88 MB).

>> But parsing HTML files seems to remain the slowest part. There are a lot
>> of them in that project (many test cases), but maybe 3x the number of
>> code files, not 60x their number. And they're pretty small, on average.
>> If somebody wants to test that locally, here's the repository:
>> https://github.com/mozilla/gecko-dev
> 
> If HTML files is what explains the slowdown, then why this change
> triggered it?  HTML files are supposed to have extensions that tell
> etags they are HTML.

Okay, I've commented out the most obvious suspects (html, asm, makefile) 
- all their entries in 'lang_names' - but the scan still takes too long.

Maybe it's some other file type, which I haven't found yet.

But what I see when monitoring the running scan with 'tail -f TAGS' is
that the output sometimes stops for something like 20 seconds, in the
middle of outputting the tags of some code file (of a common type, like
.cpp or .py), and then resumes, with files of the same type around this one.
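To put numbers on the pauses seen with 'tail -f TAGS', one option is to sample the size of TAGS while etags runs and list the periods where it did not grow. The following is only a rough sketch: the function names, the 1-second interval and the 10-second threshold are all made up for illustration, and since etags may buffer its output, a pause in file growth does not necessarily pinpoint the exact file being parsed.

```python
import os
import time


def sample_size(path, duration, interval=1.0):
    """Poll the size of PATH for DURATION seconds; return (time, size) pairs."""
    samples = []
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        samples.append((time.monotonic(), os.path.getsize(path)))
        time.sleep(interval)
    return samples


def find_stalls(samples, min_gap=10.0):
    """Return (start, end, size) for each period of at least MIN_GAP seconds
    during which the sampled size did not change."""
    stalls = []
    stall_start = None
    for (t0, s0), (t1, s1) in zip(samples, samples[1:]):
        if s1 == s0:
            # Size unchanged between two samples: a stall begins or continues.
            if stall_start is None:
                stall_start = t0
        else:
            # Size changed: close the current stall if it was long enough.
            if stall_start is not None and t0 - stall_start >= min_gap:
                stalls.append((stall_start, t0, s0))
            stall_start = None
    # A stall that is still ongoing at the last sample.
    if stall_start is not None and samples[-1][0] - stall_start >= min_gap:
        stalls.append((stall_start, samples[-1][0], samples[-1][1]))
    return stalls


# Hypothetical usage while the scan is running:
#   stalls = find_stalls(sample_size("TAGS", duration=600))
```

The size recorded for each stall is a byte offset into TAGS, so reading the file around that offset afterwards should show which entries were being written when the output paused.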

> And if they don't have extensions, the code you
> removed would have caused etags to scan these files anyway, looking
> for Fortran or C tags.  So how come the change slowed down etags so
> much?  What am I missing?

I think it would also concern "unknown" extensions, right? Like .txt, 
.png and so on.

Anyway, the difference is either due to the different set of files (all 
project files, rather than files in the specified list of extensions), 
or due to all file names being printed. Not sure how to verify, yet.
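One possible way to start checking this is to compare the two TAGS files by file extension: how many files each one lists, and how many bytes of tag data they contribute. The sketch below relies on the TAGS layout where each section starts with a form-feed line followed by a "FILE,SIZE" header line, with SIZE being the size of that section's tag data in bytes. The function name and the "(none)" label are invented for illustration, and this only explains the size of TAGS, not where the scan time goes.

```python
import os
from collections import Counter


def tally_tags_by_extension(tags_text):
    """Return (files, tag_bytes): two Counters keyed by file extension,
    built from the section headers of an etags TAGS file."""
    files = Counter()
    tag_bytes = Counter()
    # Every section starts with a form-feed line, then "FILE,SIZE".
    for section in tags_text.split("\x0c\n")[1:]:
        header = section.split("\n", 1)[0]
        name, _, size = header.rpartition(",")
        if not size.isdigit():
            # Skip headers without a numeric size, e.g. "FILE,include".
            continue
        ext = os.path.splitext(name)[1].lower() or "(none)"
        files[ext] += 1
        tag_bytes[ext] += int(size)
    return files, tag_bytes


# Hypothetical usage (latin-1 so that every byte decodes):
#   with open("TAGS", encoding="latin-1") as f:
#       files, tag_bytes = tally_tags_by_extension(f.read())
#   print(files.most_common(10))
#   print(tag_bytes.most_common(10))
```

Extensions that show up with many files but zero tag bytes would point to file names being printed without any tags, while extensions with a large byte total would point to the extra files actually being parsed.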




This bug report was last modified 225 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.