GNU bug report logs -
#39258
Faster guix search using an sqlite cache
Previous Next
Full log
View this message in rfc822 format
Hi Arun,
On Mon, 1 Jun 2020 at 02:00, Arun Isaac <arunisaac <at> systemreboot.net> wrote:
> Sorry for the long delay in replying to this thread.
Based on the Ludo's comments [1] on v4 which is a simple re-write of
your v3, I am finishing a vN+1.. but time flies and I am late on the
topic too. :-)
Well, this still unsent vN+1 series has the same performance of v4 on
"guix pull" which is a key point compared to v3. Obviously, the
performance on "guix search" are equivalent on both version. This
vN+1 builds two caches -- to avoid binary breakage -- in only one go;
the consuming 'fold-modules-public-variables*' is applied only once.
[1] http://issues.guix.gnu.org/39258#93
> I think Ludo is right in that we can improve guix search performance with only
> simple code improvements rather than including xapian or improving our
> existing cache. Here are a few patches on those lines.
Well, improving the cache is easy; at least as you did in v3 by adding
another one.
The most annoying part is the arguments rewrite of 'package->recutils'
to be compliant.
However after some comparisons, I am not convinced that BM25 will be
worth to implement...
> In `relevance`, we set our score to 0 if any of the regexps don't match. Then,
> we might as well not match the remaining regexps. Patch 1 does this early cut
> off optimization.
Interesting.
> Often our search strings are only literal strings. So, we can save some time
> by using string-contains instead of invoking the regexp engine. Patch 2 does
> this. In addition, guile's string-contains uses a naive O(n^2) string search
> algorithm. We should perhaps use the O(n) Knuth-Morris-Pratt algorithm[1]. In
> fact, a comment on line 2006 of libguile/srfi-13.c in the guile source code
> mentions this. If implemented, the KMP algorithm could speed up guix search
> further.
>
> [1]: https://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm
Really interesting idea,
> Patch 3 and 4 are minor improvements.
>
> Here's a rough performance comparison.
On cold or warm cache?
> --8<---------------cut here---------------start------------->8---
> time ./pre-inst-env guix search game
>
> real 0m2.261s
> user 0m2.351s
> sys 0m0.104s
> --8<---------------cut here---------------end--------------->8---
>
> --8<---------------cut here---------------start------------->8---
> time guix search game
>
> real 0m2.661s
> user 0m2.843s
> sys 0m0.080s
> --8<---------------cut here---------------end--------------->8---
>
> --8<---------------cut here---------------start------------->8---
> time ./pre-inst-env guix search strategy game
>
> real 0m1.613s
> user 0m1.635s
> sys 0m0.096s
> --8<---------------cut here---------------end--------------->8---
>
> --8<---------------cut here---------------start------------->8---
> time guix search strategy game
>
> real 0m2.520s
> user 0m2.583s
> sys 0m0.112s
> --8<---------------cut here---------------end--------------->8---
So in the best case, you have the ratio old/new is 1.5; this new
version is 1.5 faster.
Well, in the extra cache approach (v3 or v4) the ration old/new is
really higher: 3.1 faster on cold cache (which is the one I am
interested in) and 2.4 faster on warm cache.
I will give a look to this new series and report what happens on my
laptop. But basically, I would like "guix search" under the 1.0
second on my machine. ;-)
Thank you for this new input.
Cheers,
simon
This bug report was last modified 37 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.