GNU bug report logs - #77833
Xapian cache/search proof of concept
[Message part 1 (text/plain, inline)]
Hi Arun, Ludo, and everyone,
I just stumbled upon Arun’s suggestion in #39258 of a guix xsearch
extension that searches with Xapian, so I gave it a try tonight. Here’s
the proof of concept.
Search takes less than a second (most of that is loading Guile modules;
the actual search is instantaneous), and building the cache takes ~20
seconds.
The whole thing was very easy to make by following the guile-xapian
example, and it shows: it is only 71 lines of code.
So, what do you think? Is it an idea worth pursuing?
[guix-xsearch.scm (text/plain, inline)]
(define-module (guix-xsearch)
  #:use-module ((gnu packages) #:select (fold-packages
                                         find-packages-by-name))
  #:use-module ((guix build utils) #:select (package-name->name+version))
  #:use-module (guix packages)
  #:use-module (ice-9 format)
  #:use-module (ice-9 match)
  #:use-module (srfi srfi-11)
  #:use-module (xapian xapian)
  #:use-module (statprof))

(define %database-path "guix-xsearch.xapian")
(define (index)
  (call-with-writable-database
   %database-path
   (lambda (database)
     (fold-packages
      (lambda (package _)
        (let* ((name (package-name package))
               (version (package-version package))
               (description (package-description package))
               (synopsis (package-synopsis package))
               (name+version (string-append name "@" version))
               (idterm (string-append "Q" name+version))
               (document (make-document #:data name+version
                                        #:terms `((,idterm . 0))))
               (term-generator (make-term-generator #:stem (make-stem "en")
                                                    #:document document)))
          ;; Index title and description with a suitable
          ;; prefix. This is used to allow for searching separate
          ;; fields as in name:openttd, description:leather,
          ;; etc.
          ;; Disabled for performance.
          (index-text! term-generator name #:prefix "S")
          (index-text! term-generator synopsis #:prefix "B")
          (index-text! term-generator description #:prefix "XD")
          ;; Index title and description without prefixes for
          ;; general search.
          (index-text! term-generator name)
          (increase-termpos! term-generator)
          (index-text! term-generator synopsis)
          (index-text! term-generator description)
          ;; Add the document to the database. The unique idterm
          ;; ensures each object ends up in the database only once
          ;; no matter how many times we run the indexer.
          (replace-document! database idterm document)
          #nil))
      #nil))))

(define (search query-string)
  (call-with-database
   %database-path
   (lambda (database)
     (let* ((query (parse-query query-string
                                #:stemmer (make-stem "en")
                                #:prefixes '(("name" . "S")
                                             ("synopsis" . "B")
                                             ("description" . "XD"))))
            (mset (enquire-mset (enquire database query)
                                #:maximum-items 10)))
       (mset-fold
        (lambda (item _)
          (let* ((name+version (string-split (document-data (mset-item-document item))
                                             #\@))
                 ;; FIXME: Use a more precise way to restore the
                 ;; package, like the package cache does.
                 (packages (find-packages-by-name
                            (car name+version)
                            (cadr name+version))))
            (for-each
             (lambda (package)
               (format #t "~a: #~3,'0d ~a~%"
                       (mset-item-rank item)
                       (mset-item-docid item)
                       package))
             packages))
          #nil)
        #nil
        mset)))))

(match (command-line)
  ((_ "index")
   (index))
  ((_ "search" query)
   (search query))
  ((program . _)
   (format (current-error-port) "Usage: ~a index | search <query>~%" program)))
This bug report was last modified 67 days ago.