GNU bug report logs - #51733
27.1; Detect impossible email addresses better


Packages: gnus, emacs;

Reported by: 積丹尼 Dan Jacobson <jidanni <at> jidanni.org>

Date: Wed, 10 Nov 2021 00:29:01 UTC

Severity: wishlist

Found in version 27.1

Fixed in version 29.1

Done: Lars Ingebrigtsen <larsi <at> gnus.org>

Bug is archived. No further changes may be made.

Full log



From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 51733 <at> debbugs.gnu.org
Subject: bug#51733: 27.1; Detect impossible email addresses better
Date: Mon, 17 Jan 2022 21:22:58 +0100

I'm not quite sure I understand this bit here:
https://www.unicode.org/reports/tr39/#Confusable_Detection

---
For an input string X, define skeleton(X) to be the following transformation on the string:

    Convert X to NFD format, as described in [UAX15].
    Concatenate the prototypes for each character in X according to the specified data, producing a string of exemplar characters.
    Reapply NFD.
---
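As a rough illustration of the three quoted steps, here is a minimal Python sketch built on the standard unicodedata module. The PROTOTYPES table is a hypothetical two-entry stand-in for the confusables data that UTS #39 refers to, and the names skeleton and confusable are chosen here for illustration; treating two strings as confusable when their skeletons are equal follows the definition in the same report.

```python
import unicodedata

# Hypothetical two-entry excerpt standing in for the real confusables data;
# a real implementation would load the full table published with UTS #39.
PROTOTYPES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
}


def skeleton(text: str) -> str:
    """Apply NFD, map each character to its prototype, then reapply NFD."""
    decomposed = unicodedata.normalize("NFD", text)
    mapped = "".join(PROTOTYPES.get(ch, ch) for ch in decomposed)
    return unicodedata.normalize("NFD", mapped)


def confusable(x: str, y: str) -> bool:
    """Treat two strings as confusable when their skeletons are equal."""
    return skeleton(x) == skeleton(y)
```

With this toy table, confusable("p\u0430ypal", "paypal") holds because the Cyrillic letter is mapped to its Latin prototype in the second step, not because of the normalization steps.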

I mean, that sounds OK in and of itself, but then:

---
 X and Y are single-script confusables if and only if they are confusable, and their resolved script sets have at least one element in common.

    Examples: “ǉeto” and “ljeto” in Latin (the Croatian word for “summer”), where the first word uses only four codepoints, the first of which is U+01C9 (ǉ) LATIN SMALL LETTER LJ.
---

But:

(ucs-normalize-NFD-string "ǉeto")
=> "ǉeto"

So according to that algo "ǉeto" and "ljeto" are not confusable.

But if we use NFKD instead, they are:

(ucs-normalize-NFKD-string "ǉeto")
=> "ljeto"

It seems unlikely to be a typo in this document, surely?  But NFKD seems
to make a whole lot more sense than NFD for this usage.  I must be
missing or misreading something.
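
The NFD and NFKD results above can be cross-checked outside Emacs. The following is a small Python sketch using the standard unicodedata module; the variable names are chosen here for illustration. It only looks at the two normalization forms and does not apply the prototype mapping step of the quoted algorithm.

```python
import unicodedata

# U+01C9 LATIN SMALL LETTER LJ followed by "eto": four code points in total.
LJ_WORD = "\u01c9eto"

# Canonical decomposition: U+01C9 has no canonical mapping, so it is kept.
nfd = unicodedata.normalize("NFD", LJ_WORD)

# Compatibility decomposition: U+01C9 is split into "l" followed by "j".
nfkd = unicodedata.normalize("NFKD", LJ_WORD)

print(len(nfd), len(nfkd))
print(nfd == LJ_WORD, nfkd == "ljeto")
```

This agrees with the Emacs output: only the compatibility form turns the single digraph character into the two-letter sequence.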

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no





This bug report was last modified 3 years and 124 days ago.


