GNU bug report logs - #73484
31.0.50; Abolishing etags-regen-file-extensions

Package: emacs;

Reported by: Sean Whitton <spwhitton <at> spwhitton.name>

Date: Wed, 25 Sep 2024 19:41:01 UTC

Severity: wishlist

Found in version 31.0.50


From: Eli Zaretskii <eliz <at> gnu.org>
To: Dmitry Gutov <dmitry <at> gutov.dev>
Cc: pot <at> gnu.org, 73484 <at> debbugs.gnu.org, spwhitton <at> spwhitton.name
Subject: bug#73484: 31.0.50; Abolishing etags-regen-file-extensions
Date: Mon, 07 Oct 2024 19:05:50 +0300
> Date: Mon, 7 Oct 2024 10:11:08 +0300
> Cc: pot <at> gnu.org, spwhitton <at> spwhitton.name, 73484 <at> debbugs.gnu.org
> From: Dmitry Gutov <dmitry <at> gutov.dev>
> 
> > Can you please show the etags command line in each of these two cases
> > that you are comparing?
> 
> Both commands end with a '-' (scanning the list of files passed from stdin).
> 
> >>> And if they don't have extensions, the code you
> >>> removed would have caused etags to scan these files anyway, looking
> >>> for Fortran or C tags.  So how come the change slowed down etags so
> >>> much?  What am I missing?
> >> I think it would also concern "unknown" extensions, right? Like .txt,
> >> .png and so on.
> > I have difficulty reasoning about this without knowing the command
> > lines you used.  E.g., I don't understand why in one case it would
> > scan files with unknown extensions that were not scanned in the other.
> 
> In one case the list is pre-filtered with etags-regen-file-extensions 
> (see 'etags-regen--all-files'), in the other - it is not, and all files 
> in project are passed.

So you are comparing the speed of scanning ~60K files with the speed
of scanning ~375K files?  I'm not generally surprised that the
latter takes much longer, only that the slowdown is not proportional
to the number of scanned files.  But see below.

Btw, did you exclude the .git/* files from the list submitted to
etags?

Here, scanning, with the unmodified etags from Emacs 30, of only those
files with extensions in etags-regen-file-extensions takes 16.7 sec
and produces an 80.5MB tags table, whereas scanning all the files with
the same etags takes almost 16 min and produces a 304MB tags table, of
which more than 200MB are from files whose language is not recognized.
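
For illustration only, the kind of pre-filtering being compared might
be sketched as below.  This is not the code in etags-regen.el: the
extension set is a hypothetical stand-in (the real value of
etags-regen-file-extensions is not shown in this message), and the
name `prefilter' is invented for the example.  It also drops anything
under .git/, as asked about above.

```python
from pathlib import PurePosixPath

# Hypothetical stand-in for etags-regen-file-extensions; the real list differs.
EXAMPLE_EXTENSIONS = frozenset({"c", "h", "el", "py"})


def prefilter(paths, extensions=EXAMPLE_EXTENSIONS):
    """Return the paths worth handing to etags.

    A path is kept when it is not under a .git directory and its
    extension (without the leading dot) is in `extensions`.
    """
    kept = []
    for path in paths:
        parsed = PurePosixPath(path)
        if ".git" in parsed.parts:
            continue  # skip repository internals
        if parsed.suffix[1:] in extensions:
            kept.append(path)
    return kept
```

Files without an extension, or with one outside the set (.txt, .png
and so on), are dropped by this sketch, which is what makes the
filtered list so much shorter than the full one.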

From my testing, it seems like the elapsed time depends non-linearly
on the length of the list of files submitted to etags.  For example,
if I break the list of files in two, I get 3 min 20 sec and 1 min 40
sec, together 5 min.  But if I submit a single list with all the files
in those two lists, I get 14 min 30 sec.  I guess some internal
processing etags does depends non-linearly on the number of files it
scans.  The various loops in etags that scan all of the known files
and/or the tags it previously found seem to confirm this hypothesis.
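
A rough way to repeat the split-versus-combined measurement is
sketched below.  The exact command line used for the timings above is
not given in this thread; `etags -o FILE -' (file names read from
stdin) is an assumption, and the function names are invented for the
example.

```python
import os
import subprocess
import tempfile
import time


def time_scan(files, base_cmd=("etags",)):
    """Run base_cmd with "-o OUT -" and the file names on stdin.

    Returns the elapsed wall-clock time in seconds.  The "-o OUT -"
    arguments are an assumption about how etags is invoked here.
    """
    data = "".join(name + "\n" for name in files)
    with tempfile.TemporaryDirectory() as tmp:
        out = os.path.join(tmp, "TAGS")
        start = time.perf_counter()
        subprocess.run([*base_cmd, "-o", out, "-"],
                       input=data, text=True, check=True)
        return time.perf_counter() - start


def compare_split(files, base_cmd=("etags",)):
    """Time each half of the list separately, then the whole list."""
    half = len(files) // 2
    t_first = time_scan(files[:half], base_cmd)
    t_second = time_scan(files[half:], base_cmd)
    t_all = time_scan(files, base_cmd)
    return t_first, t_second, t_all


def superlinearity(t_first, t_second, t_all):
    """Ratio of the combined run to the sum of the two partial runs."""
    return t_all / (t_first + t_second)
```

With the figures quoted above (200 sec and 100 sec for the halves,
870 sec for the single list), the ratio is 870 / 300 = 2.9; linear
behavior would give a ratio close to 1.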

So what is the conclusion from this?  Are you saying that the long
scan times in this large tree basically make this new no-fallbacks
option not very useful, since we still need to carefully include or
exclude certain files from the scan?  Or should I go ahead and install
these changes?




This bug report was last modified 224 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.