GNU bug report logs -
#73484
31.0.50; Abolishing etags-regen-file-extensions
Previous Next
Full log
View this message in rfc822 format
>> I had a quick look at the whole code and in fact the only place I can find where ou have O^2 behaviour seems to be file name comparison, and it still looks so strange to me that this can in facrt cause significant delay.
>
>We are using etags on a huge tree: about 375K files. I think that's
>the reason, because non-linear behaviors are like that: they are
>insignificant with small sets, but huge with larger ones...
>
>Profiles don't lie...
Ok, makes sense. I must have missed the number of files in your previous explanations, sorry. The only other place where I found O^2 behaviour is when managing #line directives, but you already tried to disable them without much change. So let's concentrate on file name comparison which is done in process_file_name at
for (fdp = fdhead; fdp != NULL; fdp = fdp->next)
{
assert (fdp->infname != NULL);
if (streq (uncompressed_name, fdp->infname))
goto cleanup;
}
This is a simple O^2 comparison, which is repeated sum(1,N,N-1)=~N^2/2, which for ~375k files means ~70G comparisons. If you can count the number of times streq is called and 70G is a substantial portion of that number, then we have the culprit. To check, just remove the above test and see if the running time drops.
In that case, using a hash rather than a comparison would probably make sense. Alternatively, rather than managing file names in a single loop, do a first loop on all file names to canonicalise them, but without searching for tags (essentially, remove the call to process_file from process_file_name), then uniquify the list of canonicalised file names, then run process_file on them.
This bug report was last modified 224 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.