GNU bug report logs -
#73484
31.0.50; Abolishing etags-regen-file-extensions
On 07/10/2024 19:05, Eli Zaretskii wrote:
> So you are comparing the speed of scanning ~60K files with the speed
> of scanning ~375K files? I'm not generally surprised that the
> latter takes much longer, only that the slowdown is not proportional
> to the number of scanned files. But see below.
I forgot one thing: all .js files are actually set to be ignored there.
And my tree is a little old, so it's 200K files total. Otherwise -- yes.
Note, however, that the time is really not proportional: 30 s vs 15 min
is a 30x difference.
And I've been assuming that the "other" files would mostly fall in the
non-recognized category, and that most of them would only have their
first 2 characters read (then, recognizing that those characters are
not '#!', etags would skip the file).
> Btw, did you exclude the .git/* files from the list submitted to
> etags?
Yes, it's excluded. And the files matching the .gitignore entries are
excluded as well.
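A rough sketch of that kind of exclusion, using fnmatch-style patterns (simplified; real .gitignore semantics are much richer, with negation, anchoring, and per-directory files):

```python
import fnmatch

def exclude_ignored(paths, patterns):
    """Drop paths under .git/ or matching any ignore pattern (simplified)."""
    kept = []
    for p in paths:
        if p.startswith(".git/"):
            continue  # never submit repository metadata to etags
        if any(fnmatch.fnmatch(p, pat) for pat in patterns):
            continue  # matches an ignore entry such as "*.js"
        kept.append(p)
    return kept
```

With `"*.js"` in the pattern list, all the test and generated JavaScript files mentioned above drop out of the list before etags ever sees them.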
> Here, scanning, with the unmodified etags from Emacs 30, of only those
> files with extensions in etags-regen-file-extensions takes 16.7 sec
> and produces an 80.5MB tags table, whereas scanning all the files with
> the same etags takes almost 16 min and produces a 304MB tags table, of
> which more than 200MB are from files whose language is not recognized.
My result in the latter case was only 88 MB. Maybe the many .js files
make the difference. I put them into the "ignored" category long ago
because most of them are used for tests, there are a lot of them, and
some are generated one-long-line files.
> From my testing, it seems like the elapsed time depends non-linearly
> on the length of the list of files submitted to etags. For example,
> if I break the list of files in two, I get 3 min 20 sec and 1 min 40
> sec, together 5 min. But if I submit a single list with all the files
> in those two lists, I get 14 min 30 sec. I guess some internal
> processing etags does depends non-linearly on the number of files it
> scans. The various loops in etags that scan all of the known files
> and/or the tags it previously found seem to confirm this hypothesis.
Makes sense! It sounds like there's some O(N^2) complexity somewhere.
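As an illustration of how that shape arises (a generic sketch, not etags's actual code): if each newly scanned file triggers a linear pass over everything seen so far, the total work grows quadratically, and splitting the input in two roughly halves it:

```python
def total_work_quadratic(n_files):
    """Count comparisons when each file is checked against all earlier ones."""
    work = 0
    seen = []
    for f in range(n_files):
        work += len(seen)   # linear rescan of the entries accumulated so far
        seen.append(f)
    return work
```

For n files this is n*(n-1)/2 comparisons, so `2 * total_work_quadratic(n // 2)` is about half of `total_work_quadratic(n)`. That matches the observation above: two half-lists cost 3:20 + 1:40 = 5 min, while the combined list costs 14:30.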
> So what is the conclusion from this? Are you saying that the long
> scan times in this large tree basically make this new no-fallbacks
> option not very useful, since we still need to carefully include or
> exclude certain files from the scan? Or should I go ahead and install
> these changes?
I think that option will be useful, but both for better benchmarks and
for end usability, we need the N^2 issue fixed too. Maybe before the
rest of the changes.
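Until that is fixed, one workaround consistent with the measurement above is to feed etags the file list in batches, appending to a single tags table with its `--append` option; a rough sketch (the batch size is an arbitrary assumption, and duplicate-entry handling across batches is left to etags):

```python
import subprocess

def run_etags_in_batches(files, tags_file="TAGS", batch=5000):
    """Invoke etags on successive chunks, appending to one tags table."""
    for i in range(0, len(files), batch):
        chunk = files[i:i + batch]
        args = ["etags", "-o", tags_file]
        if i > 0:
            args.append("--append")  # keep the entries from earlier batches
        subprocess.run(args + chunk, check=True)
```

If the internal processing really is quadratic in the number of files per invocation, k batches of n/k files cost roughly 1/k of one n-file run, at the price of k process startups.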