#50733 - 28.0.1; project-find-regexp can block Emacs for a long time

GNU bug report logs - #50733
28.0.1; project-find-regexp can block Emacs for a long time

Package: emacs;

Reported by: Daniel Martín <mardani29 <at> yahoo.es>

Date: Wed, 22 Sep 2021 09:31:02 UTC

Severity: normal

Found in version 28.0.1


From: Gregory Heytings <gregory <at> heytings.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 50733 <at> debbugs.gnu.org, mardani29 <at> yahoo.es, dgutov <at> yandex.ru
Subject: bug#50733: 28.0.1; project-find-regexp can block Emacs for a long time
Date: Mon, 27 Sep 2021 00:43:05 +0000

>> Out of curiosity, because of your "it doesn't scale" remark, I just
>> compared the efficiency of ripgrep and idutils on the latest Linux
>> kernel tarball (1.4 GB in 78464 files):
>>
>> mkid takes 31 seconds
>>
>> rg O_CREAT takes 0.18 seconds
>> gid O_CREAT takes 0.02 seconds
>> rg O.?CREAT takes 0.18 seconds
>> gid O.?CREAT takes 0.93 seconds
>> rg O.*CREAT takes 0.19 seconds
>> gid O.*CREAT takes 1.73 seconds
>>
>> Isn't idutils the one that doesn't scale?
>
> No. You compare apples with oranges.
>

No. I compare apples with apples. I compare regexp searches in a code
base with regexp searches in a code base. Because this is a thread about
regexp searches in a code base. It's you who started talking about
oranges instead, namely searching for identifiers in a code base.

>> The only case in which idutils is faster (if one does not take the time
>> that was spent to build the database into account, and if one considers
>> that it's okay to ignore some matches in comments) is a plain
>> identifier; from a user viewpoint getting an answer in 0.2 seconds on
>> such a big code base is as good as getting an answer in 0.02 seconds.
>> It's slower, much slower in all other cases, whenever a regexp is used
>> --- which is what project-find-regexp is all about.
>
> See what I mean? Even when it's better, it's worse. Perfect reasoning.
>

Perfect reading. Nowhere did I say that it's worse when it's better. I
said that from a user viewpoint, a tool that is 155 ms faster in one (and
only one) case, and slower in all other cases, is worse, and that from a
user viewpoint this single "155 ms faster case" does not matter enough to
justify the use of a more complex tool.

Note that Emacs takes some time (55 ms for a search for O_CREAT on the
Emacs trunk) to read, process and display the output, which must be taken
into account to calculate the perceived difference between search tool
candidates.

Some more detailed numbers:

1. on Emacs' trunk (4759 files, 174 MB)

gid O_CREAT         :   10 ms
gid O[A-Z_]CREAT    :   75 ms
gid O.?CREAT        :   70 ms
gid O.*CREAT        :   70 ms
rg O_CREAT          :   25 ms
rg O[A-Z_]CREAT     :   25 ms
rg O.?CREAT         :   25 ms
rg O.*CREAT         :   25 ms
rg -w O_CREAT       :   30 ms
rg -w O[A-Z_]CREAT  :   30 ms
rg -w O.?CREAT      :   30 ms
rg -w O.*CREAT      :   30 ms

2. on the latest Linux kernel tarball (78464 files, 1.4 GB)

gid O_CREAT         :   25 ms
gid O[A-Z_]CREAT    : 1375 ms
gid O.?CREAT        :  930 ms
gid O.*CREAT        : 1730 ms
rg O_CREAT          :  180 ms
rg O[A-Z_]CREAT     :  185 ms
rg O.?CREAT         :  185 ms
rg O.*CREAT         :  185 ms
rg -w O_CREAT       :  185 ms
rg -w O[A-Z_]CREAT  :  190 ms
rg -w O.?CREAT      :  190 ms
rg -w O.*CREAT      :  190 ms

I initially reacted to your paragraph:

>
> Btw, I don't understand why we focus on general-purpose text-searching
> tools for these features. Why not focus on packages like ID Utils
> instead, they are so much faster. Daniel, could you time the same
> search in that large tree when xref-search-program is 'gid'? (You'd
> need to run 'mkid' first, to create the ID database, but that is
> one-time, and is very fast.) As I told many times, I think this is the
> future: program language sensitive tools that use a precomputed DB.
>

It should now be clear that idutils is not "so much faster", it is
marginally faster in one case, and slower in all other cases. And it
doesn't do what project-find-regexp needs, because it ignores (most, but
not all) tokens in comments (oh, BTW, including tokens in comments has
been on its TODO for at least 20 years). Creating the ID database is also
not "very fast", and the ID database cannot be updated incrementally (oh,
BTW, incremental database updates has been on its TODO list for at least
20 years). In short, it's an outdated tool, that isn't maintained
anymore, and that can't be the future.
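
For readers who want to reproduce measurements of this kind, here is a minimal sketch of a timing harness. The message does not say how its numbers were obtained, so everything below is an assumption: the function names `time_command` and `compare` are hypothetical, the median of a few runs is just one reasonable way to reduce noise, and the commands are expected to be run from the root of a tree in which `mkid` has already built the ID database.

```python
import statistics
import subprocess
import sys
import time


def time_command(argv, runs=5, cwd=None):
    """Return the median wall-clock time of a command, in milliseconds.

    Hypothetical helper, not taken from the bug report.
    """
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        # Output is discarded and stdin is closed off; a non-zero exit
        # status (for example "no match") is not treated as an error.
        subprocess.run(
            argv,
            cwd=cwd,
            stdin=subprocess.DEVNULL,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def compare(tools, patterns, runs=5, cwd=None):
    """Time every (tool, pattern) pair.

    Returns a dict mapping (tool command line, pattern) to median ms.
    """
    results = {}
    for tool in tools:
        for pattern in patterns:
            key = (" ".join(tool), pattern)
            results[key] = time_command(tool + [pattern], runs, cwd)
    return results
```

Calling `compare([["gid"], ["rg"], ["rg", "-w"]], ["O_CREAT", "O[A-Z_]CREAT", "O.?CREAT", "O.*CREAT"], cwd=tree)`, where `tree` is the path to the source tree, would produce one figure per row of the tables above. A missing tool raises `FileNotFoundError`, and the figures measure only the external process, not the time Emacs needs to read, process and display the output.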

This bug report was last modified 3 years and 358 days ago.

