GNU bug report logs - #39258
Faster guix search using an sqlite cache

Package: guix-patches

Reported by: Arun Isaac <arunisaac <at> systemreboot.net>

Date: Thu, 23 Jan 2020 19:53:02 UTC

Severity: important

Done: Arun Isaac <arunisaac <at> systemreboot.net>

Bug is archived. No further changes may be made.


Message #137 received at 39258 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Arun Isaac <arunisaac <at> systemreboot.net>
Cc: mail <at> ambrevar.xyz, 39258 <at> debbugs.gnu.org, zimon.toutoune <at> gmail.com
Subject: Re: [PATCH v2 0/3] Xapian for Guix package search
Date: Sun, 08 Mar 2020 12:33:44 +0100

Hi,

Arun Isaac <arunisaac <at> systemreboot.net> skribis:

>>> It turns out that most of the time is spent in printing and texinfo
>>> rendering of the search results.
>
> Also, when we put all package metadata into the Xapian index, we don't
> have to look up any of the package variables in (gnu packages *) during
> `guix search` time. This also contributes substantially to the speedup.

Yup.

>> In general, pre-rendering doesn’t seem practical to me: the output of
>> ‘guix search’ is locale-dependent (it speaks the user’s language) and
>
> Note that we already need to index package synopses and descriptions in
> all languages. I still haven't implemented this, though.

Oh, right.  Tricky!

>> adjusts to the terminal width (well, this is temporarily broken on
>> Guile 3.0.0, but see ‘%text-width’ in (guix ui)).
>
> This could be accomplished even with pre-rendering. Xapian provides
> "slots" to store arbitrary strings with a document. Instead of storing
> the pre-rendered document as a whole, we could store pre-rendered fields
> in separate slots. Then, during `guix search` time, we can assemble the
> result from these pre-rendered fields.

I’m not sure I understand.  The index wouldn’t store pre-rendered
strings for every possible terminal width, right?
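To make the concern concrete: line filling depends on a width that is only known when the command runs, so an index could at most hold text that has been rendered but not yet filled. Below is a minimal sketch in Python, not Guix code; `STORED_DESCRIPTION` and `display` are hypothetical names, and the stored string is invented for illustration.

```python
import shutil
import textwrap

# Hypothetical stand-in for a field stored unfilled in the index.
STORED_DESCRIPTION = (
    "This is an invented package description that is long enough to span "
    "several lines once it is filled to the width of a narrow terminal."
)

def display(text, width):
    """Fill TEXT to WIDTH columns; this step has to happen at search time."""
    return textwrap.fill(text, width=width)

# The width is only known when the user runs the command.
columns = shutil.get_terminal_size(fallback=(80, 24)).columns
print(display(STORED_DESCRIPTION, columns))
```

Under this assumption only the filling step stays at search time, and whether that step is the expensive part is exactly what would need measuring.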

>> Also, if the 12K+ descriptions need to be rendered at the time the user
>> runs ‘guix pull’, the experience may not be great, because it could take
>> a bit of time.
>
> This is a problem, but I would see it as a necessary "compilation"
> step. :-P In fact, this whole patchset speeds up `guix search` by doing
> part of the work of `guix search` ahead of time. So, some such cost is
> unavoidable.

Yeah.  I think we need to take the whole user experience into account,
not just ‘guix search’.  ‘guix pull’ already feels very slow, and it’s a
fairly common operation.  Conversely, ‘guix search’ takes roughly
between 0.5 and 2 seconds and is an uncommon operation on a “slow path”
(in the sense that when you’re searching for software, you’ll probably
have to spend more than a couple of seconds to find what you’re looking
for.)

>> What I like about the recutils format in this context is that it’s both
>> human- and machine-readable.  The examples in the manual show how it can
>> be useful to select the information displayed or to refine the search
>> (info "(guix) Invoking guix package").
>
> Xapian's query language is much more natural (as in natural language)
> than the regexp based techniques we need to use with recutils. I have
> hardly ever used the regexp based search and I suspect many others
> haven't either. Also, refining the search query should be easier to do
> with Xapian. We could even use Xapian's query expansion feature to
> suggest improved queries to the user.

I’m not sufficiently familiar with Xapian’s query language.  The
examples I had in mind were:

  guix search malloc | recsel -p name,version,relevance
  guix search | recsel -p name -e 'license ~ "LGPL 3"'
  guix search crypto library | \
    recsel -e '! (name ~ "^(ghc|perl|python|ruby)")' -p name,synopsis

It’s not so much about regexps as it is about selecting individual
fields.
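What field selection amounts to can be sketched roughly as follows. This is an illustrative Python approximation of what the `recsel -p` and `-e` options do in the examples above, not how recutils is implemented; the sample records are invented, and `parse_records` and `select` are hypothetical helpers.

```python
import re

# Invented sample in recutils-like "field: value" form;
# this is not real `guix search` output.
SAMPLE = """\
name: example-malloc
version: 1.0
license: LGPL 3+
synopsis: Hypothetical memory allocator

name: python-example
version: 2.1
license: Expat
synopsis: Hypothetical Python binding
"""

def parse_records(text):
    """Parse blank-line-separated records into dicts of field -> value.

    Simplification: a repeated field name overwrites the earlier value.
    """
    records = []
    for block in re.split(r"\n\s*\n", text.strip()):
        record = {}
        field = None
        for line in block.splitlines():
            if line.startswith("+") and field is not None:
                # Continuation line of a multi-line value.
                record[field] += "\n" + line[1:].lstrip(" ")
            elif ":" in line:
                field, _, value = line.partition(":")
                field = field.strip()
                record[field] = value.strip()
        records.append(record)
    return records

def select(records, fields, predicate=lambda record: True):
    """Keep records matching PREDICATE, projected onto FIELDS."""
    return [{f: r[f] for f in fields if f in r}
            for r in records if predicate(r)]

records = parse_records(SAMPLE)

# Roughly: recsel -p name -e 'license ~ "LGPL 3"'
lgpl = select(records, ["name"],
              lambda r: re.search(r"LGPL 3", r.get("license", "")))

# Roughly: recsel -e '! (name ~ "^(ghc|perl|python|ruby)")' -p name,synopsis
not_bindings = select(
    records, ["name", "synopsis"],
    lambda r: not re.search(r"^(ghc|perl|python|ruby)", r.get("name", "")))

print(lgpl)
print(not_bindings)
```

The point is that each record exposes named fields, so projection and filtering compose independently of how the search itself ranked the results.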

>> Were you able to measure the cost of rendering specifically?
>
> generate-package-search-index takes around 50 seconds. If I modify
> generate-package-search-index to not pre-render but simply store the
> package description alone, it takes around 20 seconds. That gives us a
> rough idea of the cost of pre-rendering.

To me, adding 20–50 seconds on ‘guix pull’ would be undesirable.  :-/

>> I think we should look at a profile of ‘package->recutils’, there’s
>> probably room for improvement there.
>
> On quick inspection, most of the time in package->recutils is spent in
> texinfo rendering the description. Unless we use the simplified search
> results format as discussed above, we cannot avoid it.

What I meant was that we could use (statprof) to see whether/how Texinfo
rendering/parsing can be optimized.

Thanks,
Ludo’.




This bug report was last modified 37 days ago.
