GNU bug report logs - #50733
28.0.1; project-find-regexp can block Emacs for a long time


Package: emacs;

Reported by: Daniel Martín <mardani29 <at> yahoo.es>

Date: Wed, 22 Sep 2021 09:31:02 UTC

Severity: normal

Found in version 28.0.1


From: Gregory Heytings <gregory <at> heytings.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 50733 <at> debbugs.gnu.org, mardani29 <at> yahoo.es, dgutov <at> yandex.ru
Subject: bug#50733: 28.0.1; project-find-regexp can block Emacs for a long time
Date: Mon, 27 Sep 2021 09:23:28 +0000
>
> To get back to the issue at hand: we are talking (or at least I was 
> talking) about scalability of an algorithm, not about some particular 
> implementation of the algorithm.
>

Are you now again shifting the discussion to something else, a theoretical 
comparison between various algorithms?

>
> Ripgrep is a multithreaded program, whereas idutils is single-threaded. 
> So for a fair comparison of scalability of these two main ideas: 
> file-based search vs DB search, you need at the very least to limit 
> ripgrep to a single thread.  And then you need to run each program on 
> code bases of various sizes, preferably those which differ by orders of 
> magnitude or close to that, and see their O(n) behavior.  And exclude 
> from your comparison command-line options that require IDUtils to access 
> the files in addition to the DB.  That would be at least an 
> approximation to comparing apples to apples.
>

You're asking me to disable everything that makes ripgrep a modern tool, 
and to disable everything that makes idutils an outdated tool, to make the 
outdated tool shine in comparison?  Interesting viewpoint.
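
For illustration only, the single-threaded scaling measurement described in
the quoted paragraph could be sketched roughly as follows. This is a minimal
sketch under stated assumptions: it models only the file-scanning side with
Python's own `re` module on a synthetic corpus, it is not a measurement of
ripgrep or idutils, and the names `make_corpus`, `scan` and `time_scan` are
hypothetical helpers invented for this example.

```python
import os
import re
import tempfile
import time


def make_corpus(root, n_files, lines_per_file=200):
    """Write n_files small synthetic text files under root (hypothetical data)."""
    for i in range(n_files):
        path = os.path.join(root, f"file_{i}.txt")
        with open(path, "w", encoding="utf-8") as f:
            for j in range(lines_per_file):
                # Exactly one line per file contains the token we search for.
                token = "needle_symbol" if j == 0 else f"filler_{j}"
                f.write(f"int {token} = {j};\n")


def scan(root, pattern):
    """Single-threaded file-based search: read every file, count matching lines."""
    regex = re.compile(pattern)
    matches = 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            with open(os.path.join(dirpath, name), encoding="utf-8") as f:
                matches += sum(1 for line in f if regex.search(line))
    return matches


def time_scan(n_files, pattern="needle_symbol"):
    """Return (match_count, elapsed_seconds) for a corpus of n_files files."""
    with tempfile.TemporaryDirectory() as root:
        make_corpus(root, n_files)
        start = time.perf_counter()
        count = scan(root, pattern)
        return count, time.perf_counter() - start


if __name__ == "__main__":
    # Corpus sizes differ by an order of magnitude, as the quoted text suggests.
    for n in (10, 100, 1000):
        count, elapsed = time_scan(n)
        print(f"{n:5d} files: {count} matches in {elapsed * 1000:.1f} ms")
```

Timings from such a sketch depend heavily on the machine and on the file
cache, so they say something about the shape of the curve at best, not about
any particular tool.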

>
> But frankly, I don't understand why this all would be needed at all, 
> because it should be absolutely clear that searching the files in the 
> filesystem will always scale worse than reading a well-indexed DB.
>

Which is precisely what I don't believe.  It is, at least to me, not at 
all "absolutely clear" when you look at the whole picture, IOW, when you 
include the necessity to create and keep a database up to date in your 
comparison, the added complexity of that solution, and the purpose of the 
tool.
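
The "whole picture" argument above can be made concrete with a small
break-even calculation. This is only a sketch: the function names and every
number in it are hypothetical placeholders, not measurements of ripgrep,
idutils or any other tool. It asks how many queries are needed before the
cost of building and updating an index is repaid by faster lookups.

```python
import math


def total_cost(n_queries, per_query_s, overhead_s=0.0):
    """Total seconds spent: fixed overhead plus n_queries searches."""
    return overhead_s + n_queries * per_query_s


def break_even_queries(scan_query_s, index_query_s, index_overhead_s):
    """Smallest query count at which the indexed approach is strictly cheaper.

    index_overhead_s lumps together building the index and keeping it up to
    date.  Returns None if an indexed query is not faster than a scan, in
    which case the overhead is never repaid.
    """
    saving = scan_query_s - index_query_s
    if saving <= 0:
        return None
    return math.floor(index_overhead_s / saving) + 1


if __name__ == "__main__":
    # Hypothetical values: 0.5 s per scan, 0.25 s per indexed query,
    # 10 s spent creating and refreshing the index.
    n = break_even_queries(0.5, 0.25, 10.0)
    print(n)                                        # prints 41
    print(total_cost(n, 0.25, overhead_s=10.0))     # indexed total
    print(total_cost(n, 0.5))                       # scan total
```

If the files change often, the overhead term grows with every re-index and
the break-even point moves further out.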

>
> IDUtils is an example of the latter, and it beats many utilities that 
> search the files, including ripgrep, as long as it doesn't need to 
> access the files themselves.  But even if it doesn't always beat them 
> (which you didn't yet demonstrate), it just means the ideas of its 
> design should be taken further and/or implemented better, that's all.
>

I provided you with many numbers and comparisons, which IMO demonstrate 
what they were meant to demonstrate.  A tool which finds matches for a 
regexp in an O(100 MB) code base in O(10 ms), and in an O(1 GB) code base 
in O(100 ms), is clearly good enough in practice.  (Note that I made these 
comparisons on a six- or seven-year-old laptop; these numbers would be 
even lower on a more recent machine.)

I'm still waiting for some numbers from you to demonstrate *your* 
viewpoint.

>
> I said that such tools are the future, not that IDUtils itself is 
> necessarily the future (though it could be, if someone picks up its 
> development).
>

Is it not simply because it's not useful/better in practice that nobody is 
picking up its development (and pretty much nobody is using it)?

>
> Again, this is about looking for the best tools for this job, and I 
> still stand by my opinion: focusing only on general-purpose search tools 
> is sub-optimal.
>

The message to which you replied and which started this subthread did not 
suggest that we "focus only on general-purpose search tools"; it suggested 
focusing on *one* particular general-purpose search tool, ripgrep, which 
is currently the best tool for the job, and bundling it with Emacs.  It 
has a public domain license, so I guess it should be possible.




This bug report was last modified 3 years and 261 days ago.
