GNU bug report logs -
#73484
31.0.50; Abolishing etags-regen-file-extensions
On 07/10/2024 19:05, Eli Zaretskii wrote:
> So you are comparing the speed of scanning ~60K files with the speed
> of scanning ~375K files? I'm not generally surprised that the
> latter takes much longer, only that the slowdown is not proportional
> to the number of scanned files. But see below.
I forgot one thing: all .js files are actually set to be ignored there.
And my tree is a little old, so it's 200K files total. Otherwise -- yes.
Note, however, that the time is really not proportional: 30 s vs 15 min
is a 30x difference.
And I've been assuming that the "other" files would mostly fall in the
non-recognized category, and that most of them would only have their
first 2 characters read (then, recognizing that those characters are
not '#!', etags would skip the file).
> Btw, did you exclude the .git/* files from the list submitted to
> etags?
Yes, it's excluded. And the files matching the .gitignore entries are
excluded as well.
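A rough sketch of that kind of exclusion, using fnmatch-style patterns (simplified; real .gitignore semantics are much richer, with negation, anchoring, and per-directory files):

```python
import fnmatch

def exclude_ignored(paths, patterns):
    """Drop paths under .git/ or matching any ignore pattern (simplified)."""
    kept = []
    for p in paths:
        if p.startswith(".git/"):
            continue  # never submit repository metadata to etags
        if any(fnmatch.fnmatch(p, pat) for pat in patterns):
            continue  # matches an ignore entry such as "*.js"
        kept.append(p)
    return kept
```

With `"*.js"` in the pattern list, all the test and generated JavaScript files mentioned above drop out of the list before etags ever sees them.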
> Here, scanning, with the unmodified etags from Emacs 30, of only those
> files with extensions in etags-regen-file-extensions takes 16.7 sec
> and produces an 80.5MB tags table, whereas scanning all the files with
> the same etags takes almost 16 min and produces a 304MB tags table, of
> which more than 200MB are from files whose language is not recognized.
My result in the latter case was only 88 MB. Maybe the many .js files
make the difference. I put them into the "ignored" category long ago
because most of them are used for tests, there are a lot of them, and
some are generated one-long-line files.
> From my testing, it seems like the elapsed time depends non-linearly
> on the length of the list of files submitted to etags. For example,
> if I break the list of files in two, I get 3 min 20 sec and 1 min 40
> sec, together 5 min. But if I submit a single list with all the files
> in those two lists, I get 14 min 30 sec. I guess some internal
> processing etags does depends non-linearly on the number of files it
> scans. The various loops in etags that scan all of the known files
> and/or the tags it previously found seem to confirm this hypothesis.
Makes sense! It sounds like there's some O(N^2) complexity somewhere.
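As an illustration of how that shape arises (a generic sketch, not etags's actual code): if each newly scanned file triggers a linear pass over everything seen so far, the total work grows quadratically, and splitting the input in two roughly halves it:

```python
def total_work_quadratic(n_files):
    """Count comparisons when each file is checked against all earlier ones."""
    work = 0
    seen = []
    for f in range(n_files):
        work += len(seen)   # linear rescan of the entries accumulated so far
        seen.append(f)
    return work
```

For n files this is n*(n-1)/2 comparisons, so `2 * total_work_quadratic(n // 2)` is about half of `total_work_quadratic(n)`. That matches the observation above: two half-lists cost 3:20 + 1:40 = 5 min, while the combined list costs 14:30.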
> So what is the conclusion from this? Are you saying that the long
> scan times in this large tree basically make this new no-fallbacks
> option not very useful, since we still need to carefully include or
> exclude certain files from the scan? Or should I go ahead and install
> these changes?
I think that option will be useful, but both for better benchmarks and
for end usability, we need the N^2 issue fixed too. Maybe before the
rest of the changes.
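Until that is fixed, one workaround consistent with the measurement above is to feed etags the file list in batches, appending to a single tags table with its `--append` option; a rough sketch (the batch size is an arbitrary assumption, and duplicate-entry handling across batches is left to etags):

```python
import subprocess

def run_etags_in_batches(files, tags_file="TAGS", batch=5000):
    """Invoke etags on successive chunks, appending to one tags table."""
    for i in range(0, len(files), batch):
        chunk = files[i:i + batch]
        args = ["etags", "-o", tags_file]
        if i > 0:
            args.append("--append")  # keep the entries from earlier batches
        subprocess.run(args + chunk, check=True)
```

If the internal processing really is quadratic in the number of files per invocation, k batches of n/k files cost roughly 1/k of one n-file run, at the price of k process startups.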