GNU bug report logs - #77833
Xapian cache/search proof of concept

Package: guix-patches;

Reported by: Noé Lopez <noe <at> xn--no-cja.eu>

Date: Tue, 15 Apr 2025 21:16:03 UTC

Severity: normal

From: Noé Lopez <noe <at> xn--no-cja.eu>
To: 77833 <at> debbugs.gnu.org
Cc: Arun Isaac <arunisaac <at> systemreboot.net>, Ludovic Courtès <ludo <at> gnu.org>
Subject: [bug#77833] Xapian cache/search proof of concept
Date: Tue, 15 Apr 2025 23:14:41 +0200
Hi Arun, Ludo, and everyone,

I just stumbled upon Arun’s suggestion in #39258 of writing a guix
xsearch extension that uses Xapian for search, so I gave it a try
tonight. Here’s the proof of concept.

Search takes less than a second, most of it spent loading Guile modules
(the actual search is instantaneous), and building the cache takes ~20
seconds.

The whole thing was very easy to write by following the guile-xapian
example, and it shows: it is only 71 lines of code.

So, what do you think? Is it an idea worth pursuing?

[guix-xsearch.scm (text/plain, inline)]
(define-module (guix-xsearch)
  #:use-module ((gnu packages) #:select (fold-packages
					 find-packages-by-name))
  #:use-module ((guix build utils) #:select (package-name->name+version))
  #:use-module (guix packages)
  #:use-module (ice-9 format)
  #:use-module (ice-9 match)
  #:use-module (srfi srfi-11)
  #:use-module (xapian xapian)
  #:use-module (statprof))

(define %database-path "guix-xsearch.xapian")

(define (index)
  (call-with-writable-database
   %database-path
   (lambda (database)
     (fold-packages
      (lambda (package _)
	(let* ((name (package-name package))
	       (version (package-version package))
	       (description (package-description package))
	       (synopsis (package-synopsis package))
	       (name+version (string-append name "@" version))
	       (idterm (string-append "Q" name+version))
	       (document (make-document #:data name+version
					#:terms `((,idterm . 0))))
	       (term-generator (make-term-generator #:stem (make-stem "en")
						    #:document document)))
          ;; Index name, synopsis and description with a suitable
          ;; prefix. This is used to allow for searching separate
          ;; fields as in name:openttd, description:leather,
          ;; etc.
	  (index-text! term-generator name #:prefix "S")
	  (index-text! term-generator synopsis #:prefix "B")
          (index-text! term-generator description #:prefix "XD")
          ;; Index name, synopsis and description without prefixes
          ;; for general search.
	  (index-text! term-generator name)
          (increase-termpos! term-generator)
	  (index-text! term-generator synopsis)
          (index-text! term-generator description)
          ;; Add the document to the database. The unique idterm
          ;; ensures each object ends up in the database only once
          ;; no matter how many times we run the indexer.
          (replace-document! database idterm document)
	  #nil))
      #nil))))

(define (search query-string)
  (call-with-database
   %database-path
   (lambda (database)
     (let* ((query (parse-query query-string
			       #:stemmer (make-stem "en")
			       #:prefixes '(("name" . "S")
					    ("synopsis" . "B")
					    ("description" . "XD"))))
	    (mset (enquire-mset (enquire database query)
				#:maximum-items 10)))
       (mset-fold
	(lambda (item _)
	  (let* ((name+version (string-split (document-data (mset-item-document item))
					     #\@))
		 ;; FIXME: Use a more precise way to restore the
		 ;; package, like the package cache does.
		 (packages (find-packages-by-name
			    (car name+version)
			    (cadr name+version))))
	    (for-each
	     (lambda (package)
	       (format #t "~a: #~3,'0d ~a~%"
		       (mset-item-rank item)
		       (mset-item-docid item)
		       package))
	     packages))
	  #nil)
	#nil
	mset)))))

(match (command-line)
  ((_ "index")
   (index))
  ((_ "search" query)
   (search query))
  ((program . _)
   (format (current-error-port) "Usage: ~a index | search <query>~%" program)))
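
Regarding the FIXME in search: the attachment recovers the package name and version by splitting the stored name@version string on every #\@ and taking the first two pieces, which would fail if the string had no #\@ and would drop text after a second one. A minimal sketch of a more defensive splitter in plain Guile follows. The helper name split-name+version and the sample string "openttd@14.1" are illustrative assumptions, not part of the attachment. The attachment already imports package-name->name+version and (srfi srfi-11) without using them, which may have been meant for the same job.

```scheme
(use-modules (srfi srfi-11))   ; provides let-values

;; Hypothetical helper, not part of the attachment: split "NAME@VERSION"
;; at the last #\@.  Returns two values, NAME and VERSION; VERSION is #f
;; when SPEC contains no #\@.
(define (split-name+version spec)
  (let ((at (string-rindex spec #\@)))
    (if at
        (values (substring spec 0 at)
                (substring spec (+ at 1)))
        (values spec #f))))

;; Usage: bind both values at once ("openttd@14.1" is a made-up sample).
(let-values (((name version) (split-name+version "openttd@14.1")))
  (format #t "name: ~a, version: ~a~%" name version))
```

In search, the two values bound this way could be passed to find-packages-by-name in place of the car/cadr of the string-split result.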

This bug report was last modified 67 days ago.
