GNU bug report logs - #28302
26.0.50; [PATCH] Make ucs-names a hash table

Previous Next

Package: emacs;

Reported by: Mark Oteiza <mvoteiza <at> udel.edu>

Date: Thu, 31 Aug 2017 05:05:01 UTC

Severity: wishlist

Tags: patch

Found in version 26.0.50

Done: Mark Oteiza <mvoteiza <at> udel.edu>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 28302 in the body.
You can then email your comments to 28302 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#28302; Package emacs. (Thu, 31 Aug 2017 05:05:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Mark Oteiza <mvoteiza <at> udel.edu>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Thu, 31 Aug 2017 05:05:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Mark Oteiza <mvoteiza <at> udel.edu>
To: bug-gnu-emacs <at> gnu.org
Subject: 26.0.50; [PATCH] Make ucs-names a hash table
Date: Thu, 31 Aug 2017 01:04:15 -0400
Hi,

I seem to remember there having been complaints about ucs-names preview
being slow.  I was curious about how much of that time was spent
assoc'ing every element of a roughly n = 42k element long alist, and so
tried making it a hash table instead.  The result is a drastic speedup
of C-x 8 RET TAB, presumably this makes the operation O(n) vs O(n^2).

diff --git a/lisp/international/mule-cmds.el b/lisp/international/mule-cmds.el
index 338ca6a6e3..8c5fcf319b 100644
--- a/lisp/international/mule-cmds.el
+++ b/lisp/international/mule-cmds.el
@@ -2923,10 +2923,10 @@ nonascii-translation-table
 (make-obsolete-variable 'nonascii-translation-table "do not use it." "23.1")
 
 (defvar ucs-names nil
-  "Alist of cached (CHAR-NAME . CHAR-CODE) pairs.")
+  "Hash table of cached (CHAR-NAME . CHAR-CODE) pairs.")
 
 (defun ucs-names ()
-  "Return alist of (CHAR-NAME . CHAR-CODE) pairs cached in `ucs-names'."
+  "Return hash table of (CHAR-NAME . CHAR-CODE) pairs cached in `ucs-names'."
   (or ucs-names
       (let ((ranges
 	     '((#x0000 . #x33FF)
@@ -2954,7 +2954,7 @@ ucs-names
 	       ;; (#x20000 . #xDFFFF) CJK Ideograph Extension A, B, etc, unused
 	       (#xE0000 . #xE01FF)))
 	    (gc-cons-threshold 10000000)
-	    names)
+	    (names (make-hash-table :size 42943 :test #'equal)))
 	(dolist (range ranges)
 	  (let ((c (car range))
 		(end (cdr range)))
@@ -2965,27 +2965,28 @@ ucs-names
 		;; shadows a "new-name" but in practice every time an
 		;; `old-name' conflicts with a `new-name', the newer one has a
 		;; higher code, so it gets pushed later!
-		(if new-name (push (cons new-name c) names))
-		(if old-name (push (cons old-name c) names))
+		(if new-name (puthash new-name c names))
+		(if old-name (puthash old-name c names))
 		(setq c (1+ c))))))
 	;; Special case for "BELL" which is apparently the only char which
 	;; doesn't have a new name and whose old-name is shadowed by a newer
 	;; char with that name.
-	(setq ucs-names `(("BELL (BEL)" . 7) ,@names)))))
+	(puthash "BELL (BEL)" ?\a names)
+        (setq ucs-names names))))
 
 (defun mule--ucs-names-annotation (name)
   ;; FIXME: It would be much better to add this annotation before rather than
   ;; after the char name, so the annotations are aligned.
   ;; FIXME: The default behavior of displaying annotations in italics
   ;; doesn't work well here.
-  (let ((char (assoc name ucs-names)))
-    (when char (format " (%c)" (cdr char)))))
+  (let ((char (gethash name ucs-names)))
+    (when char (format " (%c)" char))))
 
 (defun char-from-name (string &optional ignore-case)
   "Return a character as a number from its Unicode name STRING.
 If optional IGNORE-CASE is non-nil, ignore case in STRING.
 Return nil if STRING does not name a character."
-  (or (cdr (assoc-string string (ucs-names) ignore-case))
+  (or (gethash (if ignore-case (upcase string) string) (ucs-names))
       (let ((minus (string-match-p "-[0-9A-F]+\\'" string)))
         (when minus
           ;; Parse names like "VARIATION SELECTOR-17" and "CJK




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#28302; Package emacs. (Thu, 31 Aug 2017 10:06:02 GMT) Full text and rfc822 format available.

Message #8 received at 28302 <at> debbugs.gnu.org (full text, mbox):

From: Robert Pluim <rpluim <at> gmail.com>
To: Mark Oteiza <mvoteiza <at> udel.edu>
Cc: 28302 <at> debbugs.gnu.org
Subject: Re: bug#28302: 26.0.50; [PATCH] Make ucs-names a hash table
Date: Thu, 31 Aug 2017 12:05:46 +0200
Mark Oteiza <mvoteiza <at> udel.edu> writes:

> Hi,
>
> I seem to remember there having been complaints about ucs-names preview
> being slow.  I was curious about how much of that time was spent
> assoc'ing every element of a roughly n = 42k element long alist, and so
> tried making it a hash table instead.  The result is a drastic speedup
> of C-x 8 RET TAB, presumably this makes the operation O(n) vs O(n^2).

I haven't timed it exactly, but it makes a *very* noticeable
difference here. Thanks for this.

Regards

Robert




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#28302; Package emacs. (Thu, 31 Aug 2017 14:02:02 GMT) Full text and rfc822 format available.

Message #11 received at 28302 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Mark Oteiza <mvoteiza <at> udel.edu>
Cc: 28302 <at> debbugs.gnu.org
Subject: Re: bug#28302: 26.0.50; [PATCH] Make ucs-names a hash table
Date: Thu, 31 Aug 2017 17:00:19 +0300
> From: Mark Oteiza <mvoteiza <at> udel.edu>
> Date: Thu, 31 Aug 2017 01:04:15 -0400
> 
> I seem to remember there having been complaints about ucs-names preview
> being slow.  I was curious about how much of that time was spent
> assoc'ing every element of a roughly n = 42k element long alist, and so
> tried making it a hash table instead.  The result is a drastic speedup
> of C-x 8 RET TAB, presumably this makes the operation O(n) vs O(n^2).

Thanks, this is a very good change.  Please make sure (if you haven't
already) that it survives bootstrap.

Also, there are other places which assume that ucs-names is an alist,
so I guess this is not the full final patch?

And this should be mentioned in NEWS under incompatible Lisp changes,
as ucs-names debuted in Emacs 23.1, and there could be some uses of it
outside Emacs proper.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#28302; Package emacs. (Thu, 31 Aug 2017 14:02:02 GMT) Full text and rfc822 format available.

Message #14 received at 28302 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Robert Pluim <rpluim <at> gmail.com>
Cc: mvoteiza <at> udel.edu, 28302 <at> debbugs.gnu.org
Subject: Re: bug#28302: 26.0.50; [PATCH] Make ucs-names a hash table
Date: Thu, 31 Aug 2017 17:01:10 +0300
> From: Robert Pluim <rpluim <at> gmail.com>
> Date: Thu, 31 Aug 2017 12:05:46 +0200
> Cc: 28302 <at> debbugs.gnu.org
> 
> I haven't timed it exactly, but it makes a *very* noticeable
> difference here.

About ten-fold, according to my measurements.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#28302; Package emacs. (Thu, 31 Aug 2017 14:47:02 GMT) Full text and rfc822 format available.

Message #17 received at 28302 <at> debbugs.gnu.org (full text, mbox):

From: Mark Oteiza <mvoteiza <at> udel.edu>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 28302 <at> debbugs.gnu.org
Subject: Re: bug#28302: 26.0.50; [PATCH] Make ucs-names a hash table
Date: Thu, 31 Aug 2017 10:46:16 -0400
Eli Zaretskii <eliz <at> gnu.org> writes:

>> From: Mark Oteiza <mvoteiza <at> udel.edu>
>> Date: Thu, 31 Aug 2017 01:04:15 -0400
>> 
>> I seem to remember there having been complaints about ucs-names preview
>> being slow.  I was curious about how much of that time was spent
>> assoc'ing every element of a roughly n = 42k element long alist, and so
>> tried making it a hash table instead.  The result is a drastic speedup
>> of C-x 8 RET TAB, presumably this makes the operation O(n) vs O(n^2).
>
> Thanks, this is a very good change.  Please make sure (if you haven't
> already) that it survives bootstrap.
>
> Also, there are other places which assume that ucs-names is an alist,
> so I guess this is not the full final patch?

Yes, since posting I found the other two places where ucs-names is used
and have made changes there.

> And this should be mentioned in NEWS under incompatible Lisp changes,
> as ucs-names debuted in Emacs 23.1, and there could be some uses of it
> outside Emacs proper.

Thanks, patch with additional changes attached.  I don't really like
repeating the string in descr-text.el, but BEL is the only outlier and I
think it would be overkill to write a rassoc analog.

diff --git a/etc/NEWS b/etc/NEWS
index e8d6ea9c6d..e8f6aec9ef 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -1153,6 +1153,9 @@ table implementation. This uses a new bytecode op 'switch', which isn't
 compatible with previous Emacs versions. This functionality can be disabled
 by setting 'byte-compile-cond-use-jump-table' to nil.
 
+---
+** The variable 'ucs-names' is now a hash table.
+
 ** 'C-up', 'C-down', 'C-left' and 'C-right' are now defined in term
 mode to send the same escape sequences that xterm does.  This makes
 things like forward-word in readline work.
diff --git a/lisp/descr-text.el b/lisp/descr-text.el
index 6f36bbed68..b3c96988dd 100644
--- a/lisp/descr-text.el
+++ b/lisp/descr-text.el
@@ -617,16 +617,16 @@ describe-char
 			 (list
                           (let* ((names (ucs-names))
                                  (name
-                                  (or (when (= char 7)
+                                  (or (when (= char ?\a)
 				       ;; Special case for "BELL" which is
 				       ;; apparently the only char which
 				       ;; doesn't have a new name and whose
 				       ;; old-name is shadowed by a newer char
 				       ;; with that name (bug#25641).
-				       (car (rassoc char names)))
+				       "BELL (BEL)")
                                       (get-char-code-property char 'name)
                                       (get-char-code-property char 'old-name))))
-                            (if (and name (assoc-string name names))
+                            (if (and name (gethash name names))
                                 (format
                                  "type \"C-x 8 RET %x\" or \"C-x 8 RET %s\""
                                  char name)
diff --git a/lisp/leim/quail/latin-ltx.el b/lisp/leim/quail/latin-ltx.el
index 6c5afcd4f9..778706a451 100644
--- a/lisp/leim/quail/latin-ltx.el
+++ b/lisp/leim/quail/latin-ltx.el
@@ -75,20 +75,20 @@
           (`(,seq ,re)
            (let ((count 0)
                  (re (eval re t)))
-             (dolist (pair (ucs-names))
-               (let ((name (car pair))
-                     (char (cdr pair)))
-                 (when (and (characterp char) ;; Ignore char-ranges.
-                            (string-match re name))
-                   (let ((keys (if (stringp seq)
-                                   (replace-match seq nil nil name)
-                                 (funcall seq name char))))
-                     (if (listp keys)
-                         (dolist (x keys)
-                           (setq count (1+ count))
-                           (push (list x char) newrules))
-                       (setq count (1+ count))
-                       (push (list keys char) newrules))))))
+             (maphash
+              (lambda (name char) 
+                (when (and (characterp char) ;; Ignore char-ranges.
+                           (string-match re name))
+                  (let ((keys (if (stringp seq)
+                                  (replace-match seq nil nil name)
+                                (funcall seq name char))))
+                    (if (listp keys)
+                        (dolist (x keys)
+                          (setq count (1+ count))
+                          (push (list x char) newrules))
+                      (setq count (1+ count))
+                      (push (list keys char) newrules)))))
+               (ucs-names))
              ;; (message "latin-ltx: %d mappings for %S" count re)
 	     ))))
       (setq newrules (delete-dups newrules))
@@ -206,7 +206,7 @@
 
  ((lambda (name char)
     (let* ((base (concat (match-string 1 name) (match-string 3 name)))
-           (basechar (cdr (assoc base (ucs-names)))))
+           (basechar (gethash base (ucs-names))))
       (when (latin-ltx--ascii-p basechar)
         (string (if (match-end 2) ?^ ?_) basechar))))
   "\\(.*\\)SU\\(?:B\\|\\(PER\\)\\)SCRIPT \\(.*\\)")




Reply sent to Mark Oteiza <mvoteiza <at> udel.edu>:
You have taken responsibility. (Thu, 31 Aug 2017 21:31:01 GMT) Full text and rfc822 format available.

Notification sent to Mark Oteiza <mvoteiza <at> udel.edu>:
bug acknowledged by developer. (Thu, 31 Aug 2017 21:31:02 GMT) Full text and rfc822 format available.

Message #22 received at 28302-done <at> debbugs.gnu.org (full text, mbox):

From: Mark Oteiza <mvoteiza <at> udel.edu>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 28302-done <at> debbugs.gnu.org
Subject: Re: bug#28302: 26.0.50; [PATCH] Make ucs-names a hash table
Date: Thu, 31 Aug 2017 17:30:49 -0400
Pushed as 96c2c09.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#28302; Package emacs. (Thu, 31 Aug 2017 22:16:01 GMT) Full text and rfc822 format available.

Message #25 received at 28302-done <at> debbugs.gnu.org (full text, mbox):

From: Drew Adams <drew.adams <at> oracle.com>
To: Mark Oteiza <mvoteiza <at> udel.edu>, Eli Zaretskii <eliz <at> gnu.org>
Cc: 28302-done <at> debbugs.gnu.org
Subject: RE: bug#28302: 26.0.50; [PATCH] Make ucs-names a hash table
Date: Thu, 31 Aug 2017 15:15:07 -0700 (PDT)
> Pushed as 96c2c09.

Judging only by the Subject line, without looking at the code,
I'd guess that this will break user code that counts on `ucs-names'
being an alist.

I, for one, have 3 libraries that do that: `ucs-cmds.el', `apu.el',
and Icicles.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#28302; Package emacs. (Fri, 01 Sep 2017 06:24:01 GMT) Full text and rfc822 format available.

Message #28 received at 28302 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Drew Adams <drew.adams <at> oracle.com>
Cc: mvoteiza <at> udel.edu, 28302 <at> debbugs.gnu.org
Subject: Re: bug#28302: 26.0.50; [PATCH] Make ucs-names a hash table
Date: Fri, 01 Sep 2017 09:23:05 +0300
> Date: Thu, 31 Aug 2017 15:15:07 -0700 (PDT)
> From: Drew Adams <drew.adams <at> oracle.com>
> Cc: 28302-done <at> debbugs.gnu.org
> 
> Judging only by the Subject line, without looking at the code,
> I'd guess that this will break user code that counts on `ucs-names'
> being an alist.

Yes, which is why it is mentioned as an incompatible change in NEWS.

> I, for one, have 3 libraries that do that: `ucs-cmds.el', `apu.el',
> and Icicles.

I suggest to update them to support a hash table as well as an alist.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#28302; Package emacs. (Fri, 01 Sep 2017 10:39:02 GMT) Full text and rfc822 format available.

Message #31 received at 28302 <at> debbugs.gnu.org (full text, mbox):

From: Thien-Thi Nguyen <ttn <at> gnu.org>
To: 28302 <at> debbugs.gnu.org
Subject: Re: bug#28302: 26.0.50; [PATCH] Make ucs-names a hash table
Date: Fri, 01 Sep 2017 12:43:16 +0200
[Message part 1 (text/plain, inline)]
() Eli Zaretskii <eliz <at> gnu.org>
() Fri, 01 Sep 2017 09:23:05 +0300

   I suggest to update them to support a hash table as well as an
   alist.

Tangent: Maybe we could extend ‘ucs-names’ (the function) to
take args and DTRT accordingly.  Something like:

 ;; backward compatible
 (ucs-names) => ALIST

 ;; lookup
 (ucs-names 'forward-lookup CHAR-NAME) => CHAR-CODE
 (ucs-names 'reverse-lookup CHAR-CODE) => CHAR-NAME

 ;; bonus: reflection
 (ucs-names 'as-alist) => ALIST
 (ucs-names 'as-hash-table) => HASH-TABLE

Admittedly, this is not very idiomatic Emacs Lisp.  OTOH, i
would guess the vast majority of callers use it for lookup, so
centralizing that functionality would be a net win, despite the
style drift.

Really, given the rise of lexical binding and w/ niceties like
‘apply-partially’ already in the mix, i expect that sooner or
later, someone will put into place something like:

 (defun callable-hash-table (source)
   (let ((ht (ELABORATE source)))
     ;; rv
     (lambda (&optional cmd)
       (case cmd
         (as-hash-table ht)
         (as-alist ...)
         (forward-lookup ...)
         (reverse-lookup ...)
         ...))))

 (fset ucs-names (callable-hash-table ucs-names))

IOW:  Style drift be damned!  Up the idioms!  HOP things now!
(Insert more FP slogans here. :-D)

Or maybe this is already done?  What am i missing?  More coffee!

-- 
Thien-Thi Nguyen -----------------------------------------------
 (defun responsep (query)
   (pcase (context query)
     (`(technical ,ml) (correctp ml))
     ...))                              748E A0E8 1CB8 A748 9BFA
--------------------------------------- 6CE4 6703 2224 4C80 7502

[signature.asc (application/pgp-signature, inline)]

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 29 Sep 2017 11:24:03 GMT) Full text and rfc822 format available.

This bug report was last modified 7 years and 322 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.