#51733 - 27.1; Detect impossible email addresses better

GNU bug report logs - #51733
27.1; Detect impossible email addresses better

Reported by: 積丹尼 Dan Jacobson <jidanni <at> jidanni.org>

Date: Wed, 10 Nov 2021 00:29:01 UTC

Severity: wishlist

Found in version 27.1

Fixed in version 29.1

Done: Lars Ingebrigtsen <larsi <at> gnus.org>

Bug is archived. No further changes may be made.


From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 51733 <at> debbugs.gnu.org, jidanni <at> jidanni.org
Subject: bug#51733: 27.1; Detect impossible email addresses better
Date: Wed, 19 Jan 2022 14:55:35 +0100

Eli Zaretskii <eliz <at> gnu.org> writes:

> I think we should first determine what kinds of applications may need
> this, and take it from there.  The initial number of "confusability
> with" classes can be very small, and we can add more as we discover
> interesting use cases.  The full number is pretty much infinite, I
> think, but I'm not sure Emacs needs to support all of them OOTB.  We
> could support some of the popular ones, and provide infrastructure for
> developing more.

Yes.  I was thinking about this bit, which isn't implemented yet
(although the utility functions for it basically are).

----
The process of determining suspect usage of whole-script confusables
is more complicated than simply looking at the scripts of the labels
in a domain name.  For example, it can be perfectly legitimate to have
scripts in a SLD (second level domain) not be the same as scripts in a
TLD (top-level domain), such as:

    Cyrillic labels in a domain name with a TLD of .ru or .рф
    Chinese labels in a domain name with a TLD of .com.au or .com
    Cyrillic labels that aren’t confusable with Latin with a TLD of
    .com.au or .com

The following high-level algorithm can be used to determine all
scripts that contain a whole-script confusable with a string X:

    1. Consider Q, the set of all strings confusable with X.
    2. Remove all strings from Q whose resolved script set is ∅ or ALL
       (that is, keep only single-script strings plus those with
       characters only in Common).
    3. Take the union of the resolved script sets of all strings
       remaining in Q.

As usual, this algorithm is intended only as a definition;
implementations should use an optimized routine that produces the same
result.
----

I'm not sure I understand the algorithm they're proposing.  I think
this shouldn't be suspicious?  But I may be wrong:

(textsec-domain-suspicious-p "Сгсе.рф")
=> nil

But this should be, but isn't currently:

(textsec-domain-suspicious-p "Сгсе.ru")
=> nil

Now,

(textsec-ascii-confusable-p "Сгсе.ru")
=> t

and

(textsec-ascii-confusable-p "Сгсе.рф")
=> nil

Is that what they mean here?  I'm not finding the logic overly clear
here.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no
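
One possible reading of the three-step algorithm quoted in the message above, written as a small Python sketch rather than Emacs Lisp so it can be run standalone. Everything in it is an assumption made for illustration: the `CONFUSABLES` table is a tiny hand-picked, one-directional (Cyrillic to Latin) subset and not the real Unicode confusables data, `script_of` guesses the script from the character name instead of using the Script property, and none of the function names correspond to the `textsec` functions in Emacs.

```python
import unicodedata
from itertools import product

# Toy confusables table (assumption: hand-picked for this example only).
CONFUSABLES = {
    "\u0421": ["C"],  # CYRILLIC CAPITAL LETTER ES
    "\u0433": ["r"],  # CYRILLIC SMALL LETTER GHE
    "\u0441": ["c"],  # CYRILLIC SMALL LETTER ES
    "\u0435": ["e"],  # CYRILLIC SMALL LETTER IE
    "\u0440": ["p"],  # CYRILLIC SMALL LETTER ER
}


def script_of(ch):
    """Crude script guess from the Unicode character name (assumption)."""
    name = unicodedata.name(ch, "")
    if name.startswith("CYRILLIC"):
        return "Cyrillic"
    if name.startswith("LATIN"):
        return "Latin"
    return "Common"


def resolved_script_set(s):
    """Intersect the scripts of all characters in s.

    Returns None for ALL (only Common characters), an empty frozenset
    for the empty set (mixed scripts), else a one-element frozenset.
    """
    result = None
    for ch in s:
        script = script_of(ch)
        if script == "Common":
            continue  # Common characters are compatible with every script
        result = {script} if result is None else result & {script}
    return None if result is None else frozenset(result)


def confusable_strings(x):
    """Step 1: Q, every string obtained by swapping in confusables (plus x)."""
    choices = [[ch] + CONFUSABLES.get(ch, []) for ch in x]
    return {"".join(combo) for combo in product(*choices)}


def whole_script_confusable_scripts(x):
    """Steps 2 and 3: drop empty/ALL strings, union the remaining script sets."""
    scripts = set()
    for q in confusable_strings(x):
        resolved = resolved_script_set(q)
        if resolved is None or not resolved:
            continue  # step 2: skip ALL and the empty set
        scripts |= resolved  # step 3
    return scripts
```

Under these toy assumptions, the label `"\u0421\u0433\u0441\u0435"` (the Cyrillic label from the examples above) yields both Cyrillic and Latin, because every one of its characters has a Latin look-alike in the table, while `"\u0440\u0444"` (the .рф TLD) yields only Cyrillic, because ф has no Latin entry. This brute-force enumeration is exponential in the label length, which is why the quoted text says real implementations should use an optimized routine.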

This bug report was last modified 3 years and 172 days ago.

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.
