GNU bug report logs - #73484
31.0.50; Abolishing etags-regen-file-extensions

Package: emacs;

Reported by: Sean Whitton <spwhitton <at> spwhitton.name>

Date: Wed, 25 Sep 2024 19:41:01 UTC

Severity: wishlist

Found in version 31.0.50

From: Dmitry Gutov <dmitry <at> gutov.dev>
To: Francesco Potortì <pot <at> gnu.org>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 73484 <at> debbugs.gnu.org, spwhitton <at> spwhitton.name
Subject: bug#73484: 31.0.50; Abolishing etags-regen-file-extensions
Date: Sun, 6 Oct 2024 03:56:58 +0300

On 05/10/2024 19:38, Francesco Potortì wrote:
> Eli Zaretskii:
>>> How hard is it to add to a live TAGS file fake lines which look like
>>> this:
>>>
>>>     ^L
>>>     foo,0
>>>
>>> (with random strings instead of "foo"), and then time some TAGS-using
>>> commands with and without these additions?
> 
> Dmitry Gutov:
>> Okay, done that.
>>
>> 'M-.' takes more or less the same time.
>>
>> The file size of TAGS increased from 66 MB to 85 MiB.
>>
>> Won't measure time to generate now - because the current method and the
>> "real" one will be different, but note that it's more relevant with
>> etags-regen-mode because the scan is performed lazily: every time the
>> user does the first search in a new project.
> 
> Removing the Fortran and C/C++ fallbacks just for testing requires recompiling etags.c after removing the code beginning with /* Else try Fortran or C. */.  This would avoid parsing the file (except for detecting the sharp-bang) and would leave the file name in the tags file, without tags.

Thank you, this is useful for another kind of test: parsing the same
project with all file types enabled. The change below was also needed
to avoid a segfault:

diff --git a/lib-src/etags.c b/lib-src/etags.c
index 7f652790261..08c6037b9d7 100644
--- a/lib-src/etags.c
+++ b/lib-src/etags.c
@@ -1830,6 +1830,7 @@ process_file (FILE *fh, char *fn, language *lang)
      curfdp. */
   if (!CTAGS
       && curfdp->usecharno	/* no #line directives in this file */
+      && curfdp->lang
       && !curfdp->lang->metasource)
     {
       node *np, *prev;
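
To illustrate the guard in isolation: with the fallbacks removed, a file
with no detected language presumably leaves curfdp->lang as NULL, so the
unguarded curfdp->lang->metasource dereferences a null pointer. Below is
a minimal sketch of the same short-circuit check. The types and the
function name condition_holds are simplified, hypothetical stand-ins,
not the real etags.c definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified, hypothetical stand-ins for the etags.c types. */
typedef struct
{
  const char *name;
  bool metasource;
} language;

typedef struct
{
  bool usecharno;		/* no #line directives in this file */
  language *lang;		/* NULL when no language was detected */
} fdesc;

/* Mirror the shape of the patched condition: the NULL test comes
   before the dereference, so && short-circuits and lang->metasource
   is never read through a null pointer. */
static bool
condition_holds (const fdesc *fdp)
{
  assert (fdp != NULL);
  return fdp->usecharno
	 && fdp->lang != NULL
	 && !fdp->lang->metasource;
}
```

For a descriptor whose lang is NULL, the function returns false instead
of crashing.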

With that, the total time increased a lot: from 30 s to 30-40 min. The
following change cuts it roughly in half, if I measured correctly:

diff --git a/lib-src/etags.c b/lib-src/etags.c
index 7f652790261..5c2be2b9574 100644
--- a/lib-src/etags.c
+++ b/lib-src/etags.c
@@ -1902,21 +1903,21 @@ find_entries (FILE *inf)

   /* Else look for sharp-bang as the first two characters. */
   if (parser == NULL
+      && getc (inf) == '#'
+      && getc (inf) == '!'
       && readline_internal (&lb, inf, infilename, false) > 0
-      && lb.len >= 2
-      && lb.buffer[0] == '#'
-      && lb.buffer[1] == '!')
+      )
     {
       char *lp;

       /* Set lp to point at the first char after the last slash in the
          line or, if no slashes, at the first nonblank.  Then set cp to
 	 the first successive blank and terminate the string. */
-      lp = strrchr (lb.buffer+2, '/');
+      lp = strrchr (lb.buffer, '/');
       if (lp != NULL)
 	lp += 1;
       else
-	lp = skip_spaces (lb.buffer + 2);
+	lp = skip_spaces (lb.buffer);
       cp = skip_non_spaces (lp);
       /* If the "interpreter" turns out to be "env", the real interpreter is
 	 the next word.  */
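
The idea of this patch, taken on its own, is to look at the first two
bytes before reading a whole line, so that files which do not start with
"#!" are rejected after at most two getc calls. A minimal sketch follows.
starts_with_sharp_bang is a hypothetical helper name, not a function in
etags.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: return true if the stream starts with "#!".
   Reads at most two bytes.  The consumed bytes are not pushed back,
   so a caller that needs the start of the file again must rewind. */
static bool
starts_with_sharp_bang (FILE *inf)
{
  assert (inf != NULL);
  if (getc (inf) != '#')
    return false;
  return getc (inf) == '!';
}
```

On success the stream is positioned just after "#!", which is why the
patch drops the +2 offsets when scanning the rest of the line.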

Parsing HTML files still seems to be the slowest part, though. There
are a lot of them in that project (many test cases), but maybe 3x as
many as code files, not 60x. And they're pretty small, on average.
If somebody wants to test that locally, here's the repository:
https://github.com/mozilla/gecko-dev




This bug report was last modified 224 days ago.
