GNU bug report logs -
#28302
26.0.50; [PATCH] Make ucs-names a hash table
Previous Next
Reported by: Mark Oteiza <mvoteiza <at> udel.edu>
Date: Thu, 31 Aug 2017 05:05:01 UTC
Severity: wishlist
Tags: patch
Found in version 26.0.50
Done: Mark Oteiza <mvoteiza <at> udel.edu>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 28302 in the body.
You can then email your comments to 28302 AT debbugs.gnu.org in the normal way.
Toggle the display of automated, internal messages from the tracker.
Report forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#28302
; Package
emacs
.
(Thu, 31 Aug 2017 05:05:02 GMT)
Full text and
rfc822 format available.
Acknowledgement sent
to
Mark Oteiza <mvoteiza <at> udel.edu>
:
New bug report received and forwarded. Copy sent to
bug-gnu-emacs <at> gnu.org
.
(Thu, 31 Aug 2017 05:05:02 GMT)
Full text and
rfc822 format available.
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
Hi,
I seem to remember there having been complaints about ucs-names preview
being slow. I was curious about how much of that time was spent
assoc'ing every element of a roughly n = 42k element long alist, and so
tried making it a hash table instead. The result is a drastic speedup
of C-x 8 RET TAB, presumably this makes the operation O(n) vs O(n^2).
diff --git a/lisp/international/mule-cmds.el b/lisp/international/mule-cmds.el
index 338ca6a6e3..8c5fcf319b 100644
--- a/lisp/international/mule-cmds.el
+++ b/lisp/international/mule-cmds.el
@@ -2923,10 +2923,10 @@ nonascii-translation-table
(make-obsolete-variable 'nonascii-translation-table "do not use it." "23.1")
(defvar ucs-names nil
- "Alist of cached (CHAR-NAME . CHAR-CODE) pairs.")
+ "Hash table of cached (CHAR-NAME . CHAR-CODE) pairs.")
(defun ucs-names ()
- "Return alist of (CHAR-NAME . CHAR-CODE) pairs cached in `ucs-names'."
+ "Return hash table of (CHAR-NAME . CHAR-CODE) pairs cached in `ucs-names'."
(or ucs-names
(let ((ranges
'((#x0000 . #x33FF)
@@ -2954,7 +2954,7 @@ ucs-names
;; (#x20000 . #xDFFFF) CJK Ideograph Extension A, B, etc, unused
(#xE0000 . #xE01FF)))
(gc-cons-threshold 10000000)
- names)
+ (names (make-hash-table :size 42943 :test #'equal)))
(dolist (range ranges)
(let ((c (car range))
(end (cdr range)))
@@ -2965,27 +2965,28 @@ ucs-names
;; shadows a "new-name" but in practice every time an
;; `old-name' conflicts with a `new-name', the newer one has a
;; higher code, so it gets pushed later!
- (if new-name (push (cons new-name c) names))
- (if old-name (push (cons old-name c) names))
+ (if new-name (puthash new-name c names))
+ (if old-name (puthash old-name c names))
(setq c (1+ c))))))
;; Special case for "BELL" which is apparently the only char which
;; doesn't have a new name and whose old-name is shadowed by a newer
;; char with that name.
- (setq ucs-names `(("BELL (BEL)" . 7) ,@names)))))
+ (puthash "BELL (BEL)" ?\a names)
+ (setq ucs-names names))))
(defun mule--ucs-names-annotation (name)
;; FIXME: It would be much better to add this annotation before rather than
;; after the char name, so the annotations are aligned.
;; FIXME: The default behavior of displaying annotations in italics
;; doesn't work well here.
- (let ((char (assoc name ucs-names)))
- (when char (format " (%c)" (cdr char)))))
+ (let ((char (gethash name ucs-names)))
+ (when char (format " (%c)" char))))
(defun char-from-name (string &optional ignore-case)
"Return a character as a number from its Unicode name STRING.
If optional IGNORE-CASE is non-nil, ignore case in STRING.
Return nil if STRING does not name a character."
- (or (cdr (assoc-string string (ucs-names) ignore-case))
+ (or (gethash (if ignore-case (upcase string) string) (ucs-names))
(let ((minus (string-match-p "-[0-9A-F]+\\'" string)))
(when minus
;; Parse names like "VARIATION SELECTOR-17" and "CJK
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#28302
; Package
emacs
.
(Thu, 31 Aug 2017 10:06:02 GMT)
Full text and
rfc822 format available.
Message #8 received at 28302 <at> debbugs.gnu.org (full text, mbox):
Mark Oteiza <mvoteiza <at> udel.edu> writes:
> Hi,
>
> I seem to remember there having been complaints about ucs-names preview
> being slow. I was curious about how much of that time was spent
> assoc'ing every element of a roughly n = 42k element long alist, and so
> tried making it a hash table instead. The result is a drastic speedup
> of C-x 8 RET TAB, presumably this makes the operation O(n) vs O(n^2).
I haven't timed it exactly, but it makes a *very* noticeable
difference here. Thanks for this.
Regards
Robert
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#28302
; Package
emacs
.
(Thu, 31 Aug 2017 14:02:02 GMT)
Full text and
rfc822 format available.
Message #11 received at 28302 <at> debbugs.gnu.org (full text, mbox):
> From: Mark Oteiza <mvoteiza <at> udel.edu>
> Date: Thu, 31 Aug 2017 01:04:15 -0400
>
> I seem to remember there having been complaints about ucs-names preview
> being slow. I was curious about how much of that time was spent
> assoc'ing every element of a roughly n = 42k element long alist, and so
> tried making it a hash table instead. The result is a drastic speedup
> of C-x 8 RET TAB, presumably this makes the operation O(n) vs O(n^2).
Thanks, this is a very good change. Please make sure (if you haven't
already) that it survives bootstrap.
Also, there are other places which assume that ucs-names is an alist,
so I guess this is not the full final patch?
And this should be mentioned in NEWS under incompatible Lisp changes,
as ucs-names debuted in Emacs 23.1, and there could be some uses of it
outside Emacs proper.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#28302
; Package
emacs
.
(Thu, 31 Aug 2017 14:02:02 GMT)
Full text and
rfc822 format available.
Message #14 received at 28302 <at> debbugs.gnu.org (full text, mbox):
> From: Robert Pluim <rpluim <at> gmail.com>
> Date: Thu, 31 Aug 2017 12:05:46 +0200
> Cc: 28302 <at> debbugs.gnu.org
>
> I haven't timed it exactly, but it makes a *very* noticeable
> difference here.
About ten-fold, according to my measurements.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#28302
; Package
emacs
.
(Thu, 31 Aug 2017 14:47:02 GMT)
Full text and
rfc822 format available.
Message #17 received at 28302 <at> debbugs.gnu.org (full text, mbox):
Eli Zaretskii <eliz <at> gnu.org> writes:
>> From: Mark Oteiza <mvoteiza <at> udel.edu>
>> Date: Thu, 31 Aug 2017 01:04:15 -0400
>>
>> I seem to remember there having been complaints about ucs-names preview
>> being slow. I was curious about how much of that time was spent
>> assoc'ing every element of a roughly n = 42k element long alist, and so
>> tried making it a hash table instead. The result is a drastic speedup
>> of C-x 8 RET TAB, presumably this makes the operation O(n) vs O(n^2).
>
> Thanks, this is a very good change. Please make sure (if you haven't
> already) that it survives bootstrap.
>
> Also, there are other places which assume that ucs-names is an alist,
> so I guess this is not the full final patch?
Yes, since posting I found the other two places where ucs-names is used
and have made changes there.
> And this should be mentioned in NEWS under incompatible Lisp changes,
> as ucs-names debuted in Emacs 23.1, and there could be some uses of it
> outside Emacs proper.
Thanks, patch with additional changes attached. I don't really like
repeating the string in descr-text.el, but BEL is the only outlier and I
think it would be overkill to write a rassoc analog.
diff --git a/etc/NEWS b/etc/NEWS
index e8d6ea9c6d..e8f6aec9ef 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -1153,6 +1153,9 @@ table implementation. This uses a new bytecode op 'switch', which isn't
compatible with previous Emacs versions. This functionality can be disabled
by setting 'byte-compile-cond-use-jump-table' to nil.
+---
+** The variable 'ucs-names' is now a hash table.
+
** 'C-up', 'C-down', 'C-left' and 'C-right' are now defined in term
mode to send the same escape sequences that xterm does. This makes
things like forward-word in readline work.
diff --git a/lisp/descr-text.el b/lisp/descr-text.el
index 6f36bbed68..b3c96988dd 100644
--- a/lisp/descr-text.el
+++ b/lisp/descr-text.el
@@ -617,16 +617,16 @@ describe-char
(list
(let* ((names (ucs-names))
(name
- (or (when (= char 7)
+ (or (when (= char ?\a)
;; Special case for "BELL" which is
;; apparently the only char which
;; doesn't have a new name and whose
;; old-name is shadowed by a newer char
;; with that name (bug#25641).
- (car (rassoc char names)))
+ "BELL (BEL)")
(get-char-code-property char 'name)
(get-char-code-property char 'old-name))))
- (if (and name (assoc-string name names))
+ (if (and name (gethash name names))
(format
"type \"C-x 8 RET %x\" or \"C-x 8 RET %s\""
char name)
diff --git a/lisp/leim/quail/latin-ltx.el b/lisp/leim/quail/latin-ltx.el
index 6c5afcd4f9..778706a451 100644
--- a/lisp/leim/quail/latin-ltx.el
+++ b/lisp/leim/quail/latin-ltx.el
@@ -75,20 +75,20 @@
(`(,seq ,re)
(let ((count 0)
(re (eval re t)))
- (dolist (pair (ucs-names))
- (let ((name (car pair))
- (char (cdr pair)))
- (when (and (characterp char) ;; Ignore char-ranges.
- (string-match re name))
- (let ((keys (if (stringp seq)
- (replace-match seq nil nil name)
- (funcall seq name char))))
- (if (listp keys)
- (dolist (x keys)
- (setq count (1+ count))
- (push (list x char) newrules))
- (setq count (1+ count))
- (push (list keys char) newrules))))))
+ (maphash
+ (lambda (name char)
+ (when (and (characterp char) ;; Ignore char-ranges.
+ (string-match re name))
+ (let ((keys (if (stringp seq)
+ (replace-match seq nil nil name)
+ (funcall seq name char))))
+ (if (listp keys)
+ (dolist (x keys)
+ (setq count (1+ count))
+ (push (list x char) newrules))
+ (setq count (1+ count))
+ (push (list keys char) newrules)))))
+ (ucs-names))
;; (message "latin-ltx: %d mappings for %S" count re)
))))
(setq newrules (delete-dups newrules))
@@ -206,7 +206,7 @@
((lambda (name char)
(let* ((base (concat (match-string 1 name) (match-string 3 name)))
- (basechar (cdr (assoc base (ucs-names)))))
+ (basechar (gethash base (ucs-names))))
(when (latin-ltx--ascii-p basechar)
(string (if (match-end 2) ?^ ?_) basechar))))
"\\(.*\\)SU\\(?:B\\|\\(PER\\)\\)SCRIPT \\(.*\\)")
Reply sent
to
Mark Oteiza <mvoteiza <at> udel.edu>
:
You have taken responsibility.
(Thu, 31 Aug 2017 21:31:01 GMT)
Full text and
rfc822 format available.
Notification sent
to
Mark Oteiza <mvoteiza <at> udel.edu>
:
bug acknowledged by developer.
(Thu, 31 Aug 2017 21:31:02 GMT)
Full text and
rfc822 format available.
Message #22 received at 28302-done <at> debbugs.gnu.org (full text, mbox):
Pushed as 96c2c09.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#28302
; Package
emacs
.
(Thu, 31 Aug 2017 22:16:01 GMT)
Full text and
rfc822 format available.
Message #25 received at 28302-done <at> debbugs.gnu.org (full text, mbox):
> Pushed as 96c2c09.
Judging only by the Subject line, without looking at the code,
I'd guess that this will break user code that counts on `ucs-names'
being an alist.
I, for one, have 3 libraries that do that: `ucs-cmds.el', `apu.el',
and Icicles.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#28302
; Package
emacs
.
(Fri, 01 Sep 2017 06:24:01 GMT)
Full text and
rfc822 format available.
Message #28 received at 28302 <at> debbugs.gnu.org (full text, mbox):
> Date: Thu, 31 Aug 2017 15:15:07 -0700 (PDT)
> From: Drew Adams <drew.adams <at> oracle.com>
> Cc: 28302-done <at> debbugs.gnu.org
>
> Judging only by the Subject line, without looking at the code,
> I'd guess that this will break user code that counts on `ucs-names'
> being an alist.
Yes, which is why it is mentioned as an incompatible change in NEWS.
> I, for one, have 3 libraries that do that: `ucs-cmds.el', `apu.el',
> and Icicles.
I suggest to update them to support a hash table as well as an alist.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#28302
; Package
emacs
.
(Fri, 01 Sep 2017 10:39:02 GMT)
Full text and
rfc822 format available.
Message #31 received at 28302 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
() Eli Zaretskii <eliz <at> gnu.org>
() Fri, 01 Sep 2017 09:23:05 +0300
I suggest to update them to support a hash table as well as an
alist.
Tangent: Maybe we could extend ‘ucs-names’ (the function) to
take args and DTRT accordingly. Something like:
;; backward compatible
(ucs-names) => ALIST
;; lookup
(ucs-names 'forward-lookup CHAR-NAME) => CHAR-CODE
(ucs-names 'reverse-lookup CHAR-CODE) => CHAR-NAME
;; bonus: reflection
(ucs-names 'as-alist) => ALIST
(ucs-names 'as-hash-table) => HASH-TABLE
Admittedly, this is not very idiomatic Emacs Lisp. OTOH, i
would guess the vast majority of callers use it for lookup, so
centralizing that functionality would be a net win, despite the
style drift.
Really, given the rise of lexical binding and w/ niceties like
‘apply-partially’ already in the mix, i expect that sooner or
later, someone will put into place something like:
(defun callable-hash-table (source)
(let ((ht (ELABORATE source)))
;; rv
(lambda (&optional cmd)
(case cmd
(as-hash-table ht)
(as-alist ...)
(forward-lookup ...)
(reverse-lookup ...)
...))))
(fset ucs-names (callable-hash-table ucs-names))
IOW: Style drift be damned! Up the idioms! HOP things now!
(Insert more FP slogans here. :-D)
Or maybe this is already done? What am i missing? More coffee!
--
Thien-Thi Nguyen -----------------------------------------------
(defun responsep (query)
(pcase (context query)
(`(technical ,ml) (correctp ml))
...)) 748E A0E8 1CB8 A748 9BFA
--------------------------------------- 6CE4 6703 2224 4C80 7502
[signature.asc (application/pgp-signature, inline)]
bug archived.
Request was from
Debbugs Internal Request <help-debbugs <at> gnu.org>
to
internal_control <at> debbugs.gnu.org
.
(Fri, 29 Sep 2017 11:24:03 GMT)
Full text and
rfc822 format available.
This bug report was last modified 7 years and 322 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.