GNU bug report logs - #50733
28.0.1; project-find-regexp can block Emacs for a long time

Package: emacs;

Reported by: Daniel Martín <mardani29 <at> yahoo.es>

Date: Wed, 22 Sep 2021 09:31:02 UTC

Severity: normal

Found in version 28.0.1


From: Gregory Heytings <gregory <at> heytings.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 50733 <at> debbugs.gnu.org, mardani29 <at> yahoo.es, dgutov <at> yandex.ru
Subject: bug#50733: 28.0.1; project-find-regexp can block Emacs for a long time
Date: Mon, 27 Sep 2021 00:43:05 +0000
>> Out of curiosity, because of your "it doesn't scale" remark, I just 
>> compared the efficiency of ripgrep and idutils on the latest Linux 
>> kernel tarball (1.4 GB in 78464 files):
>>
>> mkid takes 31 seconds
>>
>> rg O_CREAT takes 0.18 seconds
>> gid O_CREAT takes 0.02 seconds
>> rg O.?CREAT takes 0.18 seconds
>> gid O.?CREAT takes 0.93 seconds
>> rg O.*CREAT takes 0.19 seconds
>> gid O.*CREAT takes 1.73 seconds
>>
>> Isn't idutils the one that doesn't scale?
>
> No.  You compare apples with oranges.
>

No.  I compare apples with apples.  I compare regexp searches in a code 
base with regexp searches in a code base, because this is a thread about 
regexp searches in a code base.  It's you who started talking about 
oranges instead, namely searching for identifiers in a code base.

>> The only case in which idutils is faster (if one does not take the time 
>> that was spent to build the database into account, and if one considers 
>> that it's okay to ignore some matches in comments) is a plain 
>> identifier; from a user viewpoint getting an answer in 0.2 seconds on 
>> such a big code base is as good as getting an answer in 0.02 seconds. 
>> It's slower, much slower in all other cases, whenever a regexp is used 
>> --- which is what project-find-regexp is all about.
>
> See what I mean?  Even when it's better, it's worse.  Perfect reasoning.
>

Perfect reading.  Nowhere did I say that it's worse when it's better.  I 
said that from a user viewpoint, a tool that is 155 ms faster in one (and 
only one) case (gid 25 ms vs. rg 180 ms for O_CREAT on the Linux kernel 
tree), and slower in all other cases, is worse, and that this single 
"155 ms faster" case does not matter enough to justify the use of a more 
complex tool.

Note that Emacs itself takes some time (55 ms for a search for O_CREAT on 
the Emacs trunk) to read, process and display the output.  This must be 
taken into account when calculating the perceived difference between 
candidate search tools.

Some more detailed numbers:

1. On the Emacs trunk (4759 files, 174 MB):

   pattern           gid      rg   rg -w
   O_CREAT         10 ms   25 ms   30 ms
   O[A-Z_]CREAT    75 ms   25 ms   30 ms
   O.?CREAT        70 ms   25 ms   30 ms
   O.*CREAT        70 ms   25 ms   30 ms

2. On the latest Linux kernel tarball (78464 files, 1.4 GB):

   pattern            gid       rg    rg -w
   O_CREAT          25 ms   180 ms   185 ms
   O[A-Z_]CREAT   1375 ms   185 ms   190 ms
   O.?CREAT        930 ms   185 ms   190 ms
   O.*CREAT       1730 ms   185 ms   190 ms
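The message does not say how these timings were taken.  One plausible way to gather comparable figures is to run each command several times and keep the median wall-clock time.  The Python helper below sketches that approach.  The function name `time_command` and the default of five runs are illustrative choices, not the author's method.

```python
import statistics
import subprocess
import time


def time_command(argv: list[str], runs: int = 5) -> float:
    """Return the median wall-clock time of running `argv`, in milliseconds.

    Output is discarded so that terminal rendering is not measured.  The
    exit status is deliberately ignored: rg, like grep, exits with status 1
    when nothing matches, which is still a valid timing.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        samples.append((time.perf_counter() - start) * 1000.0)
    # The median is less sensitive than the mean to one slow cold-cache run.
    return statistics.median(samples)
```

For example, call `time_command(["rg", "-w", "O.*CREAT"])` from the top of a source tree.  The pattern is passed as a list element rather than through a shell, so there is no risk of the shell expanding `O.*CREAT` as a filename glob.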

I initially reacted to your paragraph:

>
> Btw, I don't understand why we focus on general-purpose text-searching 
> tools for these features.  Why not focus on packages like ID Utils 
> instead, they are so much faster.  Daniel, could you time the same 
> search in that large tree when xref-search-program is 'gid'?  (You'd 
> need to run 'mkid' first, to create the ID database, but that is 
> one-time, and is very fast.)  As I told many times, I think this is the 
> future: program language sensitive tools that use a precomputed DB.
>

It should now be clear that idutils is not "so much faster": it is 
marginally faster in one case, and slower in all other cases.  And it 
doesn't do what project-find-regexp needs, because it ignores (most, but 
not all) tokens in comments (oh, BTW, including tokens in comments has 
been on its TODO list for at least 20 years).  Creating the ID database is 
also not "very fast", and the ID database cannot be updated incrementally 
(oh, BTW, incremental database updates have been on its TODO list for at 
least 20 years).  In short, it's an outdated tool that is no longer 
maintained, and it can't be the future.
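The asymmetry in the numbers above fits how a precomputed token index is typically queried.  The toy below is not idutils' code or database format.  It assumes a simple in-memory mapping from each identifier to the files that contain it.

- An exact identifier is a single dictionary lookup.
- A regexp has to be tried against every token in the index, so its cost grows with the size of the vocabulary.

The toy also shows that such an index only knows whole tokens.  A pattern is matched against individual identifiers, never against lines of text, so a pattern spanning two tokens finds nothing.  The sketch does not assume that gid applies regexps exactly this way; it uses Python's `re.search` on each token.  Unlike idutils as described above, this toy also indexes tokens in comments.

```python
import re
from collections import defaultdict

# Hypothetical tokenizer: C-like identifiers only, so the index records
# whole tokens and nothing about the text between them.
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_index(files: dict[str, str]) -> dict[str, set[str]]:
    """Map each identifier to the set of file names that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, text in files.items():
        for token in IDENT.findall(text):
            index[token].add(name)
    return dict(index)


def lookup_exact(index: dict[str, set[str]], identifier: str) -> set[str]:
    """Plain identifier: a single hash lookup, whatever the index size."""
    return set(index.get(identifier, set()))


def lookup_regexp(index: dict[str, set[str]], pattern: str) -> set[str]:
    """Regexp: the pattern must be tried against every token in the index."""
    rx = re.compile(pattern)
    hits: set[str] = set()
    for token, names in index.items():
        if rx.search(token):
            hits |= names
    return hits


if __name__ == "__main__":
    files = {
        "a.c": "int fd = open(path, O_CREAT | O_RDWR);",
        "b.c": "/* O_CREAT is not used here */ int x;",
        "c.c": "flags = O_EXCL;",
    }
    index = build_index(files)
    print(sorted(lookup_exact(index, "O_CREAT")))    # ['a.c', 'b.c']
    print(sorted(lookup_regexp(index, "O.?CREAT")))  # ['a.c', 'b.c']
    # Spans two tokens, so no single token matches, although a.c has the text.
    print(sorted(lookup_regexp(index, "O_CREAT.*O_RDWR")))  # []
```

A line-oriented tool like rg would report a.c for the last pattern, because it matches against the text of each line.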




This bug report was last modified 3 years and 261 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.