GNU bug report logs - #73484
31.0.50; Abolishing etags-regen-file-extensions
>Here is the nested loop, which if I comment out, makes the parse finish
>in ~20 seconds, with all the extra files (except *.js), or in 15s when
>using with new flags.
>
>diff --git a/lib-src/etags.c b/lib-src/etags.c
>index a822a823a90..331e3ffe816 100644
>--- a/lib-src/etags.c
>+++ b/lib-src/etags.c
>@@ -1697,14 +1697,14 @@ process_file_name (char *file, language *lang)
> uncompressed_name = file;
> }
>
>- /* If the canonicalized uncompressed name
>- has already been dealt with, skip it silently. */
>- for (fdp = fdhead; fdp != NULL; fdp = fdp->next)
>- {
>- assert (fdp->infname != NULL);
>- if (streq (uncompressed_name, fdp->infname))
>- goto cleanup;
>- }
>+ /* /\* If the canonicalized uncompressed name */
>+ /* has already been dealt with, skip it silently. *\/ */
>+ /* for (fdp = fdhead; fdp != NULL; fdp = fdp->next) */
>+ /* { */
>+ /* assert (fdp->infname != NULL); */
>+ /* if (streq (uncompressed_name, fdp->infname)) */
>+ /* goto cleanup; */
>+ /* } */
>
> inf = fopen (file, "r" FOPEN_BINARY);
> if (inf)
>
>This is basically a "uniqueness" operation using linear search, O(N^2).
This is only for dealing with the case where the same file exists in both compressed and uncompressed form, and we are currently hitting the second one. In that case, we should skip it. Yes, this is a uniqueness test and yes, it is O(N^2) in the number of file names, but I doubt that this can explain a serious slowdown.
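To put a number on the quadratic cost, here is a rough sketch that mimics the loop quoted above and counts the string comparisons it performs. It is not the etags.c code: `fdesc_sketch`, `seen_before` and `count_comparisons` are hypothetical names made up for the illustration, and the file names are synthetic.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the file-description list in etags.c;
   only the fields needed for the illustration are kept.  */
struct fdesc_sketch
{
  struct fdesc_sketch *next;
  const char *infname;
};

/* Number of string comparisons performed so far.  */
static unsigned long comparisons;

/* Return 1 if NAME is already in the list at HEAD, else 0.
   This is a linear scan, like the loop in the quoted diff.  */
static int
seen_before (const struct fdesc_sketch *head, const char *name)
{
  for (const struct fdesc_sketch *p = head; p != NULL; p = p->next)
    {
      comparisons++;
      if (strcmp (name, p->infname) == 0)
        return 1;
    }
  return 0;
}

/* Register N distinct synthetic names, checking each one against the
   names registered before it, and return the number of comparisons.  */
static unsigned long
count_comparisons (unsigned n)
{
  if (n == 0)
    return 0;

  struct fdesc_sketch *head = NULL;
  char (*names)[32] = malloc (n * sizeof *names);
  struct fdesc_sketch *nodes = malloc (n * sizeof *nodes);
  if (names == NULL || nodes == NULL)
    abort ();

  comparisons = 0;
  for (unsigned i = 0; i < n; i++)
    {
      snprintf (names[i], sizeof names[i], "file%u.c", i);
      if (!seen_before (head, names[i]))
        {
          nodes[i].infname = names[i];
          nodes[i].next = head;
          head = &nodes[i];
        }
    }

  free (nodes);
  free (names);
  return comparisons;
}
```

With N distinct names the scan makes N(N-1)/2 comparisons, so `count_comparisons (1000)` returns 499500. Whether that matters in practice depends on how many file names are passed and how long their common prefixes are, which this sketch does not try to model.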
>Is there a hash table we could use?
No, we have a hash table for C tags, and that's all. It is useful because there are 34 keywords against which most strings in a C/C++ file are compared. It would make sense to build hash tables for other languages where a similar situation arises.
I do not think that it makes sense to build a hash table for file names given on the command line, because the number of comparisons made on those names is generally vastly smaller than the number of comparisons used to search for tags.
>> . Some files have their language identified by means other than their
>> names or extensions: those are the languages that have
>> "interpreters" defined in etags.c
The interpreter is the token that comes after #!, with the possible exception of "env", in which case the interpreter is the second token after #!.
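As an illustration of that rule, here is a minimal sketch of how the interpreter name could be pulled out of the first line of a file. It is not the code from etags.c: the function names are hypothetical, and stripping the directory part of the token (so that "/usr/bin/perl" yields "perl") is my own assumption about what one would want to compare against a list of interpreter names.

```c
#include <stddef.h>
#include <string.h>

/* Copy the next whitespace-delimited token starting at *PP into BUF,
   advance *PP past it, and return BUF.  Return NULL if no token
   remains or if it does not fit in BUFSIZE bytes.  */
static char *
next_token (const char **pp, char *buf, size_t bufsize)
{
  const char *p = *pp;
  p += strspn (p, " \t");
  size_t len = strcspn (p, " \t\r\n");
  if (len == 0 || len >= bufsize)
    return NULL;
  memcpy (buf, p, len);
  buf[len] = '\0';
  *pp = p + len;
  return buf;
}

/* Strip any leading directory part from TOKEN, in place.
   (Assumption: only the base name identifies the interpreter.)  */
static void
strip_directory (char *token)
{
  char *slash = strrchr (token, '/');
  if (slash != NULL)
    memmove (token, slash + 1, strlen (slash + 1) + 1);
}

/* Store the interpreter named on FIRST_LINE in BUF and return BUF.
   Return NULL if FIRST_LINE does not start with "#!" or names no
   interpreter.  If the first token is "env", use the second one.  */
static char *
interpreter_from_first_line (const char *first_line,
                             char *buf, size_t bufsize)
{
  if (strncmp (first_line, "#!", 2) != 0)
    return NULL;

  const char *p = first_line + 2;
  if (next_token (&p, buf, bufsize) == NULL)
    return NULL;
  strip_directory (buf);

  if (strcmp (buf, "env") == 0)
    {
      if (next_token (&p, buf, bufsize) == NULL)
        return NULL;
      strip_directory (buf);
    }
  return buf;
}
```

For example, with a 64-byte buffer, the line "#!/usr/bin/perl -w" gives "perl" and the line "#! /usr/bin/env python3" gives "python3". The sketch does not handle options passed to env itself, such as "env -S".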
There are two O(N^2) tests in the number of tags in C/C++ files, which depend on the two options "no-line-directive" and "no-duplicates". Both options can be used to disable those checks, and both are off by default because the checks help produce a saner tags file and have no practical impact in most cases. The options are there because, in principle, the checks can cause a significant slowdown with huge tags files.
This bug report was last modified 225 days ago.