GNU bug report logs - #65997
29.1; ?\N{char_name} reference is wrong
Reported by: awrhygty <at> outlook.com
Date: Fri, 15 Sep 2023 13:04:01 UTC
Severity: normal
Tags: fixed
Found in version 29.1
Fixed in version 29.2
Done: Robert Pluim <rpluim <at> gmail.com>
Bug is archived. No further changes may be made.
>>>>> On Fri, 15 Sep 2023 22:02:37 +0900, awrhygty <at> outlook.com said:
awrhygty> S-exps in the form of ?\N{char_name} return wrong values for some
awrhygty> characters.
awrhygty> The S-exp below inserts a whole list of such characters.
awrhygty> (dotimes (u (1+ (max-char 'ucs)))
awrhygty>   (let* ((name (get-char-code-property u 'name)))
awrhygty>     (when (and name (not (<= #xD800 u #xDFFF)))
awrhygty>       (let ((u2 (condition-case err
awrhygty>                     (read (format "?\\N{%s}" name))
awrhygty>                   (error 0))))
awrhygty>         (unless (eq u u2)
awrhygty>           (insert (format "%X\t%s\t%X\t%s\n" u name u2
awrhygty>                           (if (= 0 u2)
awrhygty>                               "error"
awrhygty>                             (get-char-code-property u2 'name)))))))))
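To test a single character instead of sweeping the whole Unicode range, the same round trip can be wrapped in a predicate. This is a minimal sketch, not code from the report or from Emacs: `my-char-name-round-trips-p` is a hypothetical helper, and it returns nil (rather than using the 0 sentinel above) when the reader signals an error.

```elisp
;; Hypothetical helper (not part of Emacs): does reading ?\N{NAME}
;; for CHAR's Unicode name give back CHAR itself?
(defun my-char-name-round-trips-p (char)
  "Return t if reading ?\\N{NAME} for CHAR's Unicode name yields CHAR.
Return nil if CHAR has no name, if the reader signals an error,
or if the name resolves to a different character."
  (let ((name (get-char-code-property char 'name)))
    (and name
         ;; Build the source text ?\N{NAME} and read it back.
         (eq char (condition-case nil
                      (read (format "?\\N{%s}" name))
                    (error nil))))))

(my-char-name-round-trips-p ?A)  ; => t
```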
For a minute there I thought our hash tables were broken :-). Stefan,
it only took 9 years, but this is no longer true:
lisp/international/mule-cmds.el:
;; In theory this code could end up pushing an "old-name" that
;; shadows a "new-name" but in practice every time an
;; `old-name' conflicts with a `new-name', the newer one has a
;; higher code, so it gets pushed later!
The patch below fixes that issue.
awrhygty> output (TANGUT COMPONENTs are omitted):
I donʼt know why the ranges in `ucs-names' donʼt cover these
code-points. Itʼs easy enough to change them, but theyʼre
explicitly commented out.
awrhygty> 16FE4 KHITAN SMALL SCRIPT FILLER 0 error
awrhygty> 16FF0 VIETNAMESE ALTERNATE READING MARK CA 0 error
awrhygty> 16FF1 VIETNAMESE ALTERNATE READING MARK NHAY 0 error
awrhygty> 1B132 HIRAGANA LETTER SMALL KO 0 error
And similarly for these 4.
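Whether these code points are covered can be probed by looking their names up directly in the table that `ucs-names` builds. A minimal sketch, assuming (as the `puthash` calls in the patch suggest) that `(ucs-names)` returns a hash table keyed by name; `my-ucs-names-covers-p` is a hypothetical helper, and the code points are the four from the report. No result is shown because it depends on the Emacs version.

```elisp
;; Hypothetical helper (not part of Emacs): is CHAR's current Unicode
;; name present in the `ucs-names' table, mapped back to CHAR?
(defun my-ucs-names-covers-p (char)
  "Return t if CHAR's Unicode name maps to CHAR in the `ucs-names' table."
  (let ((name (get-char-code-property char 'name)))
    (and name
         (eq char (gethash name (ucs-names))))))

;; The four code points reported as read errors above.
(mapcar (lambda (c) (cons (format "%X" c) (my-ucs-names-covers-p c)))
        '(#x16FE4 #x16FF0 #x16FF1 #x1B132))
```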
Robert
--
diff --git a/lisp/international/mule-cmds.el b/lisp/international/mule-cmds.el
index c26898f7649..254ecae5bd5 100644
--- a/lisp/international/mule-cmds.el
+++ b/lisp/international/mule-cmds.el
@@ -3135,7 +3135,9 @@ ucs-names
;; `old-name' conflicts with a `new-name', the newer one has a
;; higher code, so it gets pushed later!
(if new-name (puthash new-name c names))
- (if old-name (puthash old-name c names))
+ (when (and old-name
+ (not (gethash old-name names)))
+ (puthash old-name c names))
;; Unicode uses the spelling "lamda" in character
;; names, instead of "lambda", due to "preferences
;; expressed by the Greek National Body" (Bug#30513).
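The precedence rule the hunk introduces can be illustrated on toy data. This sketch only mimics the two `puthash` calls above, not the real `ucs-names` loop; the helper `my-build-names` and the names "FOO" and "BAR" are made up, not real Unicode entries.

```elisp
;; Hypothetical sketch of the patch's precedence rule on made-up data.
(defun my-build-names (entries guard-old-names)
  "Build a name->code table from ENTRIES, a list of (CODE NEW-NAME OLD-NAME).
Entries are processed in order.  If GUARD-OLD-NAMES is non-nil, an
old name is stored only when no entry for that name exists yet."
  (let ((names (make-hash-table :test #'equal)))
    (dolist (e entries names)
      (let ((c (nth 0 e)) (new-name (nth 1 e)) (old-name (nth 2 e)))
        ;; Current names are always stored, as in the patch.
        (if new-name (puthash new-name c names))
        ;; Old names may not overwrite an existing entry when guarded.
        (when (and old-name
                   (or (not guard-old-names)
                       (not (gethash old-name names))))
          (puthash old-name c names))))))

;; Code 10 is currently named "FOO"; the higher code 20 used to be "FOO".
(let ((entries '((10 "FOO" nil) (20 "BAR" "FOO"))))
  (list (gethash "FOO" (my-build-names entries nil))   ; => 20, old name wins
        (gethash "FOO" (my-build-names entries t))))    ; => 10, current name kept
```

This is exactly the case the mule-cmds.el comment assumed never happens: the character whose old name collides has the higher code, so without the guard its old name is pushed later and shadows the other character's current name.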
This bug report was last modified 1 year and 297 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.