Package: emacs
Reported by: barry <barry.krofchick <at> sympatico.ca>
Date: Fri, 26 Sep 2008 02:55:04 UTC
Severity: wishlist
Tags: patch, wontfix
Done: Lars Ingebrigtsen <larsi <at> gnus.org>
Bug is archived. No further changes may be made.
Message #5 received at submit <at> emacsbugs.donarmstrong.com:
From: barry <barry.krofchick <at> sympatico.ca>
To: bug-gnu-emacs <at> gnu.org
Subject: Improvement: Persistent Hash Store with GDBM
Date: Thu, 25 Sep 2008 22:44:56 -0400

This is not a bug but an enhancement, which implements a persistent hash
store using gdbm for the storage.

The code is in the file gdbm.c (below), with some minor changes to
Makefile.in (in the emacs/src directory) and emacs.c to include the new
functions in the emacs build. The new functions are as follows and
mirror the equivalent functions in the gdbm package itself:

 1. gdbm-open
 2. gdbm-close
 3. gdbm-fetch
 4. gdbm-store
 5. gdbm-delete
 6. gdbm-exists
 7. gdbm-firstkey
 8. gdbm-nextkey
 9. gdbm-sync
10. gdbm-reorganize

The doc strings for each of these functions are included below.

The array gdbm-open-files (maximum size set by the configuration
parameter max_gdbm_open_files in syms_of_gdbm, set to 10 in gdbm.c
below) contains a cons cell for each open gdbm file (referenced by an
integer file id number). The car of the cons cell is the gdbm data file
pointer, and the cdr is the name of the opened hash file.

The values written to the hash files are all strings. Normally one would
use prin1-to-string to store arbitrary lisp expressions and
read-from-string to recover them.

This mechanism allows for simple and fast persistent hash storage of
lisp data, directly within emacs lisp code, without the need to resort
to external databases.

The doc strings for the functions are shown below:

1. gdbm-open is a built-in function in `C source code'.
(gdbm-open IDNO FILE ACCESS &optional MODE)

Open FILENAME as a gdbm database and assign it
the ID IDNO where IDNO is an integer in range
0 to max-gdbm-open-files - 1
ACCESS specifies access rights as one of strings:
r for read
w for read/write
c for create (if none exists)
n for force create a new one even if one exists
MODE if present on new db create specifies the
file permissions as a number ala chmod
Returns: gdbm file reference ID on success or
nil on failure

2. gdbm-close is a built-in function in `C source code'.
(gdbm-close DBF)

Close a gdbm database of the specified number.

3. gdbm-fetch is a built-in function in `C source code'.
(gdbm-fetch DBF KEY)

Fetch data from a gdbm database.
Returns: string data stored under KEY or nil
if no data under that key.

4. gdbm-store is a built-in function in `C source code'.
(gdbm-store DBF KEY DATA)

Store data in a gdbm database.
KEY and DATA must be strings
(to save binary data use prin1-to-string on
key and/or data)
If KEY already exists in the database it will
be replaced with the new DATA
If DATA is nil or empty then KEY will be deleted.
Returns: 0 on successful insert
-1 if open for read and tries insert.

5. gdbm-delete is a built-in function in `C source code'.
(gdbm-delete DBF KEY)

Delete data from a gdbm database.
KEY must be a string
Returns: 0 on successful delete
-1 if key not in database

6. gdbm-exists is a built-in function in `C source code'.
(gdbm-exists DBF KEY)

Returns t if KEY is in the hash otherwise nil

7. gdbm-firstkey is a built-in function in `C source code'.
(gdbm-firstkey DBF)

Fetch first key data from a gdbm database.
Returns: first key in GDBM hash or nil if none

8. gdbm-nextkey is a built-in function in `C source code'.
(gdbm-nextkey DBF KEY)

Fetch next key data from a gdbm database.
Returns: the key following KEY in the gdbm hash table
or nil if KEY is the last key.

9. gdbm-sync is a built-in function in `C source code'.
(gdbm-sync DBF)

Sync a gdbm database.
Writes all buffered data to disk.

10. gdbm-reorganize is a built-in function in `C source code'.
(gdbm-reorganize DBF)

Reorganize a gdbm database.

-------------------------------------------------------------
Following is the file emacs-22.1/src/gdbm.c to effect the above functions
-------------------------------------------------------------
/* GDBM Library Interface */
#include <config.h>
#include "lisp.h"
#include "blockinput.h"
#include "commands.h"
#include "keyboard.h"
#include "dispextern.h"
#include "charset.h"
#include "coding.h"
#include <gdbm.h>
#include <string.h>

int max_gdbm_open_files;
Lisp_Object Qgdbm_open_files,Vgdbm_open_files;

DEFUN ("gdbm-open", Fgdbm_open, Sgdbm_open, 3, 4, 0,
       "Open FILENAME as a gdbm database and assign it \n\
the ID IDNO where IDNO is an integer in range \n\
0 to max-gdbm-open-files - 1 \n\
ACCESS specifies access rights as one of strings: \n\
r for read \n\
w for read/write \n\
c for create (if none exists)\n \
n for force create a new one even if one exists\n\
MODE if present on new db create specifies the \n\
file permissions as a number ala chmod\n\
Returns: gdbm file reference ID on success or\n\
nil on failure")
     (idno,file,access,mode)
     Lisp_Object idno, file, access, mode;
{
  int imode,iaccess;
  GDBM_FILE dbf;
  unsigned char *caccess;
  struct gcpro gcpro1, gcpro2, gcpro3;
  Lisp_Object ef, ef1, val;

  ef = Qnil;
  GCPRO3 (file, ef, ef1);

  //ensure id number is in range
  CHECK_NUMBER(idno);
  if((XINT(idno) < 0) || XINT(idno) >= max_gdbm_open_files)
    error("gdbm ID out of range");

  //if we haven't yet set up the open files vector
  //do it now
  if(!VECTORP (Vgdbm_open_files))
    Vgdbm_open_files=Fmake_vector(make_number(max_gdbm_open_files), Qnil);

  //see if there is an open file at the idno
  ef = AREF(Vgdbm_open_files, XINT(idno));
  if(!NILP (ef)){
    if(!CONSP(ef) || !NUMBERP(CAR(ef)))
      error("gdbm-open-files corrupted");
    //if so close it
    gdbm_close((GDBM_FILE) XPNTR(CAR(ef)));
    ASET(Vgdbm_open_files,XINT(idno),Qnil);
  }

  CHECK_STRING (file);
  CHECK_STRING (access);
  if(NILP (file))return Qnil;
  if(!NILP (mode)){
    CHECK_NUMBER(mode);
    imode = XUINT (mode);
  }
  else imode = 0666;

  ef = Fexpand_file_name (file, Qnil);
  ef1 = ENCODE_FILE (ef);
  caccess = XSTRING (access)->data;
  if(NILP (access))iaccess = GDBM_READER;
  else {
    switch (caccess[0]) {
    case 'r':
    case 'R':
      iaccess = GDBM_READER;
      break;
    case 'w':
    case 'W':
      iaccess = GDBM_WRITER;
      break;
    case 'c':
    case 'C':
      iaccess = GDBM_WRCREAT;
      break;
    case 'n':
    case 'N':
      iaccess = GDBM_NEWDB;
      break;
    default:
      iaccess = GDBM_READER;
    }
  }

  dbf = gdbm_open((char *)XSTRING(ef1)->data,0,iaccess,imode,0);
  if(!dbf)return(Qnil);
  val = XPNTR((unsigned)dbf);
  ASET(Vgdbm_open_files,XINT(idno),Fcons(val,ef1));
  UNGCPRO;
  return idno;
}

static Lisp_Object idToGdbmKey(Lisp_Object dbf)
{
  Lisp_Object val;

  //ensure id number is in range
  CHECK_NUMBER(dbf);
  if((XINT(dbf) < 0) || XINT(dbf) >= max_gdbm_open_files)
    error("gdbm ID out of range");
  if(!VECTORP (Vgdbm_open_files))
    error("no open files");

  //see if there is an open file at the idno
  val = AREF(Vgdbm_open_files, XINT(dbf));
  if(NILP(val))error("operation but no gdbm file open");
  if(!CONSP(val) || !NUMBERP(CAR(val)))
    error("gdbm-open-files corrupted");
  return(XPNTR(CAR(val)));
}

DEFUN ("gdbm-close", Fgdbm_close, Sgdbm_close, 1, 1, 0,
       "Close a gdbm database of the specified number.")
     (dbf)
     Lisp_Object dbf;
{
  GDBM_FILE idbf;
  int ival;
  Lisp_Object val;

  val = idToGdbmKey(dbf);
  gdbm_close((GDBM_FILE) val);
  ASET(Vgdbm_open_files,XINT(dbf),Qnil);
  return (Qt);
}

DEFUN ("gdbm-delete", Fgdbm_delete, Sgdbm_delete, 2, 2, 0,
       "Delete data from a gdbm database.\n\
KEY must be a string\n\
Returns: 0 on successful delete \n\
-1 if key not in database")
     (dbf, key)
     Lisp_Object dbf, key;
{
  Lisp_Object val;
  int oval;
  GDBM_FILE odbf;
  datum okey;

  val = idToGdbmKey(dbf);
  CHECK_STRING (key);
  odbf = (GDBM_FILE)XPNTR (val);
  okey.dptr = (char*)XSTRING (key)->data;
  //okey.dsize = XINT(Flength(key));
  okey.dsize = STRING_BYTES(XSTRING (key));
  oval = gdbm_delete(odbf, okey);
  val = make_number(oval);
  return(val);
}

DEFUN ("gdbm-store", Fgdbm_store, Sgdbm_store, 3, 3, 0,
       "Store data in a gdbm database.\n\
KEY and DATA must be strings \n\
(to save binary data use prin1-to-string on \n\
key and/or data)\n\
If KEY already exists in the database it will\n\
be replaced with the new DATA \n\
If DATA is nil or empty then KEY will be deleted.\n\
Returns: 0 on successful insert\n\
-1 if open for read and tries insert.")
     (dbf, key, data)
     Lisp_Object dbf, key, data;
{
  Lisp_Object val;
  datum okey, odata;
  GDBM_FILE odbf;
  int ival;

  val = idToGdbmKey(dbf);
  CHECK_STRING (key);
  if(NILP(data))return(Fgdbm_delete(dbf, key));
  CHECK_STRING (data);
  odbf = (GDBM_FILE)XPNTR (val);
  okey.dptr = (char *)XSTRING (key)->data;
  okey.dsize = STRING_BYTES(XSTRING (key));
  odata.dptr = (char *)XSTRING (data)->data;
  odata.dsize = STRING_BYTES(XSTRING (data));
  if(okey.dsize == 0)ival=0;
  else ival = gdbm_store(odbf,okey,odata,GDBM_REPLACE);
  val = make_number(XUINT(ival));
  return val;
}

DEFUN ("gdbm-fetch", Fgdbm_fetch, Sgdbm_fetch, 2, 2, 0,
       "Fetch data from a gdbm database.\n\
Returns: string data stored under KEY or nil \n\
if no data under that key.")
     (dbf, key)
     Lisp_Object dbf, key;
{
  Lisp_Object val;
  GDBM_FILE odbf;
  datum okey,oval;

  val = idToGdbmKey(dbf);
  CHECK_STRING (key);
  odbf = (GDBM_FILE)XPNTR (val);
  okey.dptr = (char *)XSTRING (key)->data;
  okey.dsize = STRING_BYTES(XSTRING (key));
  oval = gdbm_fetch(odbf, okey);
  if(oval.dptr == NULL)return Qnil;
  val = make_string(oval.dptr, oval.dsize);
  free(oval.dptr);
  return val;
}

DEFUN ("gdbm-firstkey", Fgdbm_firstkey, Sgdbm_firstkey, 1, 1, 0,
       "Fetch first key data from a gdbm database.\n\
Returns: first key in GDBM hash or nil if none")
     (dbf)
     Lisp_Object dbf;
{
  Lisp_Object val;
  GDBM_FILE odbf;
  datum oval;

  val = idToGdbmKey(dbf);
  odbf = (GDBM_FILE)XPNTR (val);
  oval = gdbm_firstkey(odbf);
  if(oval.dptr == NULL)return Qnil;
  val = make_string(oval.dptr, oval.dsize);
  free(oval.dptr);
  return val;
}

DEFUN ("gdbm-nextkey", Fgdbm_nextkey, Sgdbm_nextkey, 2, 2, 0,
       "Fetch next key data from a gdbm database.\n\
Returns: the key following KEY in the gdbm hash table\n\
or nil if KEY is the last key.")
     (dbf, key)
     Lisp_Object dbf, key;
{
  Lisp_Object val;
  GDBM_FILE odbf;
  datum okey,oval;
  struct gcpro gcpro1;

  val = idToGdbmKey(dbf);
  GCPRO1 (val);
  CHECK_STRING (key);
  odbf = (GDBM_FILE)XPNTR (val);
  okey.dptr = (char *)XSTRING (key)->data;
  okey.dsize = STRING_BYTES(XSTRING (key));
  oval = gdbm_nextkey(odbf, okey);
  if(oval.dptr == NULL)return Qnil;
  val = make_string(oval.dptr, oval.dsize);
  free(oval.dptr);
  UNGCPRO;
  return val;
}

DEFUN ("gdbm-exists", Fgdbm_exists, Sgdbm_exists, 2, 2, 0,
       "Returns t if KEY is in the hash otherwise nil")
     (dbf, key)
     Lisp_Object dbf, key;
{
  Lisp_Object val;
  GDBM_FILE odbf;
  datum okey;
  int oval;

  val = idToGdbmKey(dbf);
  CHECK_STRING (key);
  odbf = (GDBM_FILE) XPNTR (val);
  okey.dptr = (char *)XSTRING (key)->data;
  okey.dsize = STRING_BYTES(XSTRING (key));
  oval = gdbm_exists(odbf, okey);
  if(oval)return Qt;
  return Qnil;
}

DEFUN ("gdbm-reorganize", Fgdbm_reorganize, Sgdbm_reorganize, 1, 1, 0,
       "Reorganize a gdbm database.")
     (dbf)
     Lisp_Object dbf;
{
  Lisp_Object val;
  int ival;
  GDBM_FILE odbf;

  val = idToGdbmKey(dbf);
  odbf = (GDBM_FILE)XPNTR (val);
  ival = gdbm_reorganize(odbf);
  val = make_number(ival);
  return val;
}

DEFUN ("gdbm-sync", Fgdbm_sync, Sgdbm_sync, 1, 1, 0,
       "Sync a gdbm database.\n\
Writes all buffered data to disk.")
     (dbf)
     Lisp_Object dbf;
{
  Lisp_Object val;
  int ival;
  GDBM_FILE odbf;

  val = idToGdbmKey(dbf);
  odbf = (GDBM_FILE)XPNTR (val);
  gdbm_sync(odbf);
  return Qt;
}

void
syms_of_gdbm ()
{
  DEFVAR_INT ("max-gdbm-open-files", &max_gdbm_open_files,
              "*Maximum number of open gdbm files.");
  max_gdbm_open_files=10;
  DEFVAR_INT ("gdbm_errno",(int *)&gdbm_errno,
              "*GDBM returned error number");
  DEFVAR_LISP ("gdbm-open-files", &Vgdbm_open_files,
               "List of open GDBM files");
  Vgdbm_open_files = Fmake_vector(make_number(max_gdbm_open_files),Qnil);
  Qgdbm_open_files = intern("gdbm-open-files");
  staticpro(&Qgdbm_open_files);
  defsubr (&Sgdbm_open);
  defsubr (&Sgdbm_close);
  defsubr (&Sgdbm_store);
  defsubr (&Sgdbm_fetch);
  defsubr (&Sgdbm_delete);
  defsubr (&Sgdbm_firstkey);
  defsubr (&Sgdbm_nextkey);
  defsubr (&Sgdbm_exists);
  defsubr (&Sgdbm_reorganize);
  defsubr (&Sgdbm_sync);
}
-------------------------------------------------------------
Following are the changes to Makefile.in in emacs-22.1/src to include
the gdbm.c module and the gdbm library in the build
(Note that this could be handled better, along with max_gdbm_open_files,
as a configuration parameter/option)
--------------------------------------------------------------
diff -r emacs-22.1/src/Makefile.in /users/barry/emacs-special/emacs-22.1/src/Makefile.in
589c589
< minibuf.o fileio.o dired.o filemode.o \
---
> minibuf.o fileio.o dired.o filemode.o gdbm.o\
938c938
< LIBS_DEBUG $(GETLOADAVG_LIBS) $(GNULIB_VAR) LIB_MATH LIB_STANDARD \
---
> LIBS_DEBUG -lgdbm $(GETLOADAVG_LIBS) $(GNULIB_VAR) LIB_MATH LIB_STANDARD \
1141a1142
> gdbm.o: gdbm.c $(config_h) blockinput.h commands.h keyboard.h dispextern.h charset.h coding.h
---------------------------------------------------------------
---------------------------------------------------------------
Following are the changes to emacs.c to reference the gdbm.c module in
the build:
----------------------------------------------------------------
diff -r emacs-22.1/src/emacs.c /users/barry/emacs-special/emacs-22.1/src/emacs.c
1562a1563
> syms_of_gdbm ();
----------------------------------------------------------------
Changelog entry:

2008-09-21 Barry Krofchick <barry.krofchick <at> sympatico.ca>

* gdbm.c Added built-in gdbm-based persistent hash tables
for lisp and other data
----------------------------------------------------------------

That's it. Thanks for all the great work on emacs, a beautiful piece of
software. I hope you can include the gdbm hash tables in future releases.
They are extremely useful for managing large persistent lisp knowledge
bases quickly and easily from within emacs lisp code. Prior to this
implementation, I used an external custom server to do the same job, at
a significant cost in performance.

Thanks,
Barry
barry.krofchick <at> sympatico.ca

------------------------------------------------------------------------
In GNU Emacs 22.1.1 (i686-pc-linux-gnu, X toolkit)
 of 2008-01-23 on benny
Windowing system distributor `The XFree86 Project, Inc', version 11.0.40500000
Important settings:
  value of $LC_ALL: nil
  value of $LC_COLLATE: nil
  value of $LC_CTYPE: nil
  value of $LC_MESSAGES: nil
  value of $LC_MONETARY: nil
  value of $LC_NUMERIC: nil
  value of $LC_TIME: nil
  value of $LANG: nil
  locale-coding-system: nil
  default-enable-multibyte-characters: t

Major mode: Info

Minor modes in effect:
  shell-dirtrack-mode: t
  tooltip-mode: t
  tool-bar-mode: t
  mouse-wheel-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  unify-8859-on-encoding-mode: t
  utf-translate-cjk-mode: t
  auto-compression-mode: t
  line-number-mode: t
  abbrev-mode: t
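
Editorial sketch, not part of the submitted patch: the ACCESS argument
of gdbm-open is interpreted in gdbm.c by looking only at the first
character of the string, in either case, and anything unrecognised is
treated as read-only. The stand-alone C fragment below illustrates that
mapping. The names access_mode, access_to_mode, mode_name and
show_access_examples are hypothetical, and the enum values are stand-ins
for GDBM_READER, GDBM_WRITER, GDBM_WRCREAT and GDBM_NEWDB so that the
fragment builds without <gdbm.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for GDBM_READER, GDBM_WRITER, GDBM_WRCREAT and
   GDBM_NEWDB, so that the sketch compiles without <gdbm.h>.  */
enum access_mode { MODE_READER, MODE_WRITER, MODE_WRCREAT, MODE_NEWDB };

/* Map an ACCESS string to an open mode.  Only the first character is
   examined, in either case; NULL, "" and anything unrecognised fall
   back to read-only, following the switch statement in Fgdbm_open.  */
static enum access_mode
access_to_mode (const char *access)
{
  if (access == NULL)
    return MODE_READER;
  switch (access[0])
    {
    case 'r': case 'R': return MODE_READER;
    case 'w': case 'W': return MODE_WRITER;
    case 'c': case 'C': return MODE_WRCREAT;
    case 'n': case 'N': return MODE_NEWDB;
    default:            return MODE_READER;
    }
}

/* Human-readable description of a mode, used by the demonstration.  */
static const char *
mode_name (enum access_mode mode)
{
  switch (mode)
    {
    case MODE_READER:  return "read only";
    case MODE_WRITER:  return "read/write";
    case MODE_WRCREAT: return "create if none exists";
    case MODE_NEWDB:   return "always create new";
    }
  assert (0 && "invalid access_mode");
  return "unknown";
}

/* Print the mapping for a few sample ACCESS strings and return the
   number of samples printed.  */
static int
show_access_examples (void)
{
  static const char *const samples[] = { "r", "W", "create", "new", "x", "" };
  size_t i, n = sizeof samples / sizeof samples[0];

  for (i = 0; i < n; i++)
    printf ("\"%s\" -> %s\n", samples[i],
            mode_name (access_to_mode (samples[i])));
  return (int) n;
}
```

One consequence of testing only the first character is that longer
words such as "create" or "new" are accepted, while a misspelt string
silently opens the database read-only instead of signalling an error.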