GNU bug report logs - #51733
27.1; Detect impossible email addresses better
Reported by: 積丹尼 Dan Jacobson <jidanni <at> jidanni.org>
Date: Wed, 10 Nov 2021 00:29:01 UTC
Severity: wishlist
Found in version 27.1
Fixed in version 29.1
Done: Lars Ingebrigtsen <larsi <at> gnus.org>
Bug is archived. No further changes may be made.
Eli Zaretskii <eliz <at> gnu.org> writes:
> I think we should first determine what kinds of applications may need
> this, and take it from there. The initial number of "confusability
> with" classes can be very small, and we can add more as we discover
> interesting use cases. The full number is pretty much infinite, I
> think, but I'm not sure Emacs needs to support all of them OOTB. We
> could support some of the popular ones, and provide infrastructure for
> developing more.
Yes.
I was thinking about this bit, which isn't implemented yet (although the
utility functions for it basically are).
----
The process of determining suspect usage of whole-script confusables is more complicated than simply looking at the scripts of the labels in a domain name. For example, it can be perfectly legitimate to have scripts in a SLD (second level domain) not be the same as scripts in a TLD (top-level domain), such as:
  * Cyrillic labels in a domain name with a TLD of .ru or .рф
  * Chinese labels in a domain name with a TLD of .com.au or .com
  * Cyrillic labels that aren’t confusable with Latin with a TLD of .com.au or .com
The following high-level algorithm can be used to determine all scripts that contain a whole-script confusable with a string X:
  1. Consider Q, the set of all strings confusable with X.
  2. Remove all strings from Q whose resolved script set is ∅ or ALL (that is, keep only single-script strings plus those with characters only in Common).
  3. Take the union of the resolved script sets of all strings remaining in Q.
As usual, this algorithm is intended only as a definition;
implementations should use an optimized routine that produces the same
result.
----
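Read literally, the three quoted steps amount to a brute-force enumeration. Below is a minimal Emacs Lisp sketch of that reading, not the textsec implementation. The `toy-` functions and the tiny `toy-classes` table are hypothetical stand-ins for the real Unicode confusables and script data, each character is assumed to have exactly one script (no Script_Extensions), and Q is assumed to include X itself. Step two follows the "remove ∅ or ALL" wording.

```elisp
;;; -*- lexical-binding: t; -*-
(require 'cl-lib)
(require 'seq)

;; Hypothetical toy data, NOT the Unicode confusables tables.
;; Each class lists mutually confusable characters as (CHAR . SCRIPT).
(defconst toy-classes
  '(((?С . cyrillic) (?C . latin))
    ((?г . cyrillic) (?r . latin))
    ((?с . cyrillic) (?c . latin))
    ((?е . cyrillic) (?e . latin))
    ((?ф . cyrillic))))   ; no lookalike in this toy table

(defun toy-alternatives (char)
  "Return the toy confusable class containing CHAR."
  (or (seq-find (lambda (class) (assq char class)) toy-classes)
      (error "Character %c is not in the toy table" char)))

(defun toy-product (lists)
  "Return the Cartesian product of LISTS as a list of lists."
  (if (null lists)
      (list nil)
    (let ((rest (toy-product (cdr lists))))
      (cl-mapcan (lambda (item)
                   (mapcar (lambda (tail) (cons item tail)) rest))
                 (car lists)))))

(defun toy-resolved-script-set (entries)
  "Intersect the scripts of ENTRIES, a list of (CHAR . SCRIPT).
Return the symbol `all' if every script is `common', nil for ∅,
and a one-element list otherwise."
  (let ((result 'all))
    (dolist (entry entries result)
      (let ((script (cdr entry)))
        (unless (eq script 'common)
          (setq result
                (cond ((eq result 'all) (list script))
                      ((memq script result) (list script))
                      (t nil))))))))

(defun toy-whole-script-confusable-scripts (string)
  "Return the scripts containing a whole-script confusable of STRING."
  (let ((q (toy-product                       ; step 1: build Q
            (mapcar #'toy-alternatives (string-to-list string))))
        (scripts nil))
    (dolist (candidate q)
      (let ((resolved (toy-resolved-script-set candidate)))
        ;; Step 2: drop candidates whose resolved set is ∅ or ALL.
        (when (and resolved (not (eq resolved 'all)))
          ;; Step 3: union of the remaining resolved script sets.
          (dolist (script resolved)
            (cl-pushnew script scripts)))))
    (sort scripts #'string<)))

(toy-whole-script-confusable-scripts "Сгсе")
;; => (cyrillic latin)
(toy-whole-script-confusable-scripts "Сгсеф")
;; => (cyrillic)
```

Enumerating Q grows exponentially with the length of X, which is presumably why the quoted text asks implementations for an optimized routine with the same result.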
I'm not sure I understand the algorithm they're proposing. I think this
shouldn't be suspicious? But I may be wrong:
(textsec-domain-suspicious-p "Сгсе.рф")
=> nil
This one should be suspicious, but currently isn't:
(textsec-domain-suspicious-p "Сгсе.ru")
=> nil
Now,
(textsec-ascii-confusable-p "Сгсе.ru")
=> t
and
(textsec-ascii-confusable-p "Сгсе.рф")
=> nil
Is that what they mean here? I'm not finding the logic overly clear here.
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no
This bug report was last modified 3 years and 124 days ago.