GNU bug report logs - #51733
27.1; Detect impossible email addresses better
Reported by: 積丹尼 Dan Jacobson <jidanni <at> jidanni.org>
Date: Wed, 10 Nov 2021 00:29:01 UTC
Severity: wishlist
Found in version 27.1
Fixed in version 29.1
Done: Lars Ingebrigtsen <larsi <at> gnus.org>
Bug is archived. No further changes may be made.
Message #140 received at 51733 <at> debbugs.gnu.org:
I'm not quite sure I understand this bit here:
https://www.unicode.org/reports/tr39/#Confusable_Detection
---
For an input string X, define skeleton(X) to be the following transformation on the string:
Convert X to NFD format, as described in [UAX15].
Concatenate the prototypes for each character in X according to the specified data, producing a string of exemplar characters.
Reapply NFD.
---
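The three quoted steps can be sketched roughly as follows. This is only an illustration in Python (standard library `unicodedata`), not the Emacs implementation; `PROTOTYPES`, `skeleton` and `confusable` are hypothetical names, and the one-entry table is an assumed stand-in for the real TR39 confusables data, whose actual contents are not checked here. It also assumes the TR39 reading that two strings are confusable when their skeletons are equal.

```python
import unicodedata

# Hypothetical stand-in for the TR39 confusables data (character -> prototype).
# The single entry below is an assumption made purely for illustration.
PROTOTYPES = {
    "\u01c9": "lj",  # U+01C9 LATIN SMALL LETTER LJ -> "l" + "j" (assumed entry)
}


def skeleton(text, prototypes=PROTOTYPES):
    """Apply the quoted steps: NFD, map each character to its prototype, NFD again."""
    decomposed = unicodedata.normalize("NFD", text)
    mapped = "".join(prototypes.get(ch, ch) for ch in decomposed)
    return unicodedata.normalize("NFD", mapped)


def confusable(x, y, prototypes=PROTOTYPES):
    """Treat two strings as confusable when their skeletons are identical."""
    return skeleton(x, prototypes) == skeleton(y, prototypes)


if __name__ == "__main__":
    # With the assumed table entry the two spellings share a skeleton;
    # with an empty table, NFD alone leaves U+01C9 untouched.
    print(confusable("\u01c9eto", "ljeto"))
    print(confusable("\u01c9eto", "ljeto", prototypes={}))
```

In this sketch the outcome depends entirely on the prototype table: NFD by itself never splits U+01C9, so any match has to come from the mapping step.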
I mean, that sounds OK in and of itself, but then:
---
X and Y are single-script confusables if and only if they are confusable, and their resolved script sets have at least one element in common.
Examples: “ǉeto” and “ljeto” in Latin (the Croatian word for “summer”), where the first word uses only four codepoints, the first of which is U+01C9 (ǉ) LATIN SMALL LETTER LJ.
---
But:
(ucs-normalize-NFD-string "ǉeto")
=> "ǉeto"
So according to that algo "ǉeto" and "ljeto" are not confusable.
But if we use NFKD instead, they are:
(ucs-normalize-NFKD-string "ǉeto")
=> "ljeto"
It seems unlikely to be a typo in this document, surely? But NFKD seems
to make a whole lot more sense than NFD for this usage. I must be
missing or misreading something.
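The NFD/NFKD difference can also be cross-checked outside Emacs. The snippet below is a small sketch using Python's standard `unicodedata` module; the variable names are illustrative only, and it says nothing about what the TR39 data files themselves map.

```python
import unicodedata

word = "\u01c9eto"  # U+01C9 LATIN SMALL LETTER LJ followed by "eto"

nfd = unicodedata.normalize("NFD", word)
nfkd = unicodedata.normalize("NFKD", word)

# U+01C9 carries only a compatibility decomposition, so canonical
# decomposition (NFD) leaves it alone, while NFKD expands it to "l" + "j".
print(unicodedata.decomposition("\u01c9"))
print(len(word), len(nfd), len(nfkd))
print(nfd == word, nfkd == "ljeto")
```

The `<compat>` tag reported for U+01C9 is what separates the two forms: NFD applies canonical decompositions only, whereas NFKD applies compatibility decompositions as well.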
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no
This bug report was last modified 3 years and 124 days ago.