GNU bug report logs -
#65996
29.1; UCS normalization is wrong
Reported by: awrhygty <at> outlook.com
Date: Fri, 15 Sep 2023 12:51:02 UTC
Severity: normal
Found in version 29.1
Done: Eli Zaretskii <eliz <at> gnu.org>
Bug is archived. No further changes may be made.
Message #10 received at 65996-done <at> debbugs.gnu.org (full text, mbox):
> From: awrhygty <at> outlook.com
> Date: Fri, 15 Sep 2023 21:49:38 +0900
>
>
> UCS normalization is wrong for some characters.
>
> (1) NFD/NFKD decomposition is not done
> U+1112E 𑄮 CHAKMA VOWEL SIGN O
> U+1112F 𑄯 CHAKMA VOWEL SIGN AU
> U+1134B 𑍋 GRANTHA VOWEL SIGN OO
> U+1134C 𑍌 GRANTHA VOWEL SIGN AU
> U+114BB 𑒻 TIRHUTA VOWEL SIGN AI
> U+114BC 𑒼 TIRHUTA VOWEL SIGN O
> U+114BE 𑒾 TIRHUTA VOWEL SIGN AU
> U+115BA 𑖺 SIDDHAM VOWEL SIGN O
> U+115BB 𑖻 SIDDHAM VOWEL SIGN AU
> U+11938 𑤸 DIVES AKURU VOWEL SIGN O
>
> (let ((s "\U0001112E\U0001112F\U0001134B\U0001134C\
> \U000114BB\U000114BC\U000114BE\U000115BA\U000115BB\U00011938"))
>   (require 'ucs-normalize)
>   (list (equal s (ucs-normalize-NFD-string s))
>         (equal s (ucs-normalize-NFKD-string s))))
> =>(t t)
>
> (2) NFKC/NFKD replacement is not done
> U+1E030..U+1E06D Cyrillic MODIFIER LETTER or SUBSCRIPT
> U+1EE00..U+1EEBB ARABIC MATHEMATICAL *
> U+1FBF0..U+1FBF9 SEGMENTED DIGIT *
>
> (let* ((f (lambda (cell)
>             (apply #'string (number-sequence (car cell) (cdr cell)))))
>        (s (mapconcat f '((#x1E030 . #x1E06D)
>                          (#x1EE00 . #x1EEBB)
>                          (#x1FBF0 . #x1FBF9)))))
>   (require 'ucs-normalize)
>   (list (equal s (ucs-normalize-NFKC-string s))
>         (equal s (ucs-normalize-NFKD-string s))))
> =>(t t)
Thanks, fixed on the emacs-29 branch.
Once again, if (as I'm guessing) you found these problems by examining
the data in ucs-normalize.el, it would have greatly helped if you'd
pointed to the problematic data in your report. Reverse-engineering
the sources of the problem from the behavior takes time, especially
when the relevant code is not trivial and was written by someone else.
This bug report was last modified 1 year and 251 days ago.