GNU bug report logs - #39258
Faster guix search using an sqlite cache

Previous Next

Package: guix-patches;

Reported by: Arun Isaac <arunisaac <at> systemreboot.net>

Date: Thu, 23 Jan 2020 19:53:02 UTC

Severity: important

Done: Arun Isaac <arunisaac <at> systemreboot.net>

Bug is archived. No further changes may be made.

Full log


View this message in rfc822 format

From: zimoun <zimon.toutoune <at> gmail.com>
To: Arun Isaac <arunisaac <at> systemreboot.net>
Cc: Ludovic Courtès <ludo <at> gnu.org>, 39258 <at> debbugs.gnu.org
Subject: [bug#39258] [PATCH v5 0/4] Optimize guix search
Date: Mon, 1 Jun 2020 03:25:09 +0200
Hi Arun,

On Mon, 1 Jun 2020 at 02:00, Arun Isaac <arunisaac <at> systemreboot.net> wrote:

> Sorry for the long delay in replying to this thread.

Based on the Ludo's comments [1] on v4 which is a simple re-write of
your v3, I am finishing a vN+1.. but time flies and I am late on the
topic too. :-)

Well, this still unsent vN+1 series has the same performance of v4 on
"guix pull" which is a key point compared to v3.  Obviously, the
performance on "guix search" are equivalent on both version.  This
vN+1 builds two caches -- to avoid binary breakage -- in only one go;
the consuming 'fold-modules-public-variables*' is applied only once.

[1] http://issues.guix.gnu.org/39258#93


> I think Ludo is right in that we can improve guix search performance with only
> simple code improvements rather than including xapian or improving our
> existing cache. Here are a few patches on those lines.

Well, improving the cache is easy; at least as you did in v3 by adding
another one.
The most annoying part is the arguments rewrite of 'package->recutils'
to be compliant.

However after some comparisons, I am not convinced that BM25 will be
worth to implement...


> In `relevance`, we set our score to 0 if any of the regexps don't match. Then,
> we might as well not match the remaining regexps. Patch 1 does this early cut
> off optimization.

Interesting.


> Often our search strings are only literal strings. So, we can save some time
> by using string-contains instead of invoking the regexp engine. Patch 2 does
> this. In addition, guile's string-contains uses a naive O(n^2) string search
> algorithm. We should perhaps use the O(n) Knuth-Morris-Pratt algorithm[1]. In
> fact, a comment on line 2006 of libguile/srfi-13.c in the guile source code
> mentions this. If implemented, the KMP algorithm could speed up guix search
> further.
>
> [1]: https://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm

Really interesting idea,


> Patch 3 and 4 are minor improvements.
>
> Here's a rough performance comparison.

On cold or warm cache?


> --8<---------------cut here---------------start------------->8---
> time ./pre-inst-env guix search game
>
> real    0m2.261s
> user    0m2.351s
> sys     0m0.104s
> --8<---------------cut here---------------end--------------->8---
>
> --8<---------------cut here---------------start------------->8---
> time guix search game
>
> real    0m2.661s
> user    0m2.843s
> sys     0m0.080s
> --8<---------------cut here---------------end--------------->8---
>
> --8<---------------cut here---------------start------------->8---
> time ./pre-inst-env guix search strategy game
>
> real    0m1.613s
> user    0m1.635s
> sys     0m0.096s
> --8<---------------cut here---------------end--------------->8---
>
> --8<---------------cut here---------------start------------->8---
> time guix search strategy game
>
> real    0m2.520s
> user    0m2.583s
> sys     0m0.112s
> --8<---------------cut here---------------end--------------->8---

So in the best case, you have the ratio old/new is 1.5; this new
version is 1.5 faster.

Well, in the extra cache approach (v3 or v4) the ration old/new is
really higher: 3.1 faster on cold cache (which is the one I am
interested in) and 2.4 faster on warm cache.


I will give a look to this new series and report what happens on my
laptop.  But basically, I would like "guix search" under the 1.0
second on my machine. ;-)


Thank you for this new input.

Cheers,
simon




This bug report was last modified 37 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.