GNU bug report logs - #51733
27.1; Detect impossible email addresses better


Packages: gnus, emacs;

Reported by: 積丹尼 Dan Jacobson <jidanni <at> jidanni.org>

Date: Wed, 10 Nov 2021 00:29:01 UTC

Severity: wishlist

Found in version 27.1

Fixed in version 29.1

Done: Lars Ingebrigtsen <larsi <at> gnus.org>

Bug is archived. No further changes may be made.


From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 51733 <at> debbugs.gnu.org, jidanni <at> jidanni.org
Subject: bug#51733: 27.1; Detect impossible email addresses better
Date: Wed, 19 Jan 2022 14:55:35 +0100
Eli Zaretskii <eliz <at> gnu.org> writes:

> I think we should first determine what kinds of applications may need
> this, and take it from there.  The initial number of "confusability
> with" classes can be very small, and we can add more as we discover
> interesting use cases.  The full number is pretty much infinite, I
> think, but I'm not sure Emacs needs to support all of them OOTB.  We
> could support some of the popular ones, and provide infrastructure for
> developing more.

Yes.

I was thinking about this bit, which isn't implemented yet (although the
utility functions for it basically are).

----
The process of determining suspect usage of whole-script confusables is more complicated than simply looking at the scripts of the labels in a domain name. For example, it can be perfectly legitimate to have scripts in a SLD (second level domain) not be the same as scripts in a TLD (top-level domain), such as:

    Cyrillic labels in a domain name with a TLD of .ru or .рф
    Chinese labels in a domain name with a TLD of .com.au or .com
    Cyrillic labels that aren’t confusable with Latin with a TLD of .com.au or .com

The following high-level algorithm can be used to determine all scripts that contain a whole-script confusable with a string X:

    Consider Q, the set of all strings confusable with X.
    Remove all strings from Q whose resolved script set is ∅ or ALL (that is, keep only single-script strings plus those with characters only in Common).
    Take the union of the resolved script sets of all strings remaining in Q.

As usual, this algorithm is intended only as a definition;
implementations should use an optimized routine that produces the same
result.
----
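
To make the three quoted steps concrete, here is a minimal Python sketch of the definition as I read it. Everything in it is an assumption for illustration: the `SCRIPT` and `CONFUSABLE_CLASSES` tables are tiny hand-picked stand-ins for the real Unicode data files, the function names are hypothetical, and this is not how textsec is implemented. It enumerates Q by brute force, which is fine as a definition but not as an optimized routine.

```python
from itertools import product

# Toy data, hand-picked for this example; real code would read the
# Unicode script and confusables data files instead.
SCRIPT = {
    "C": "Latin", "r": "Latin", "c": "Latin", "e": "Latin",
    "\u0421": "Cyrillic",  # CYRILLIC CAPITAL LETTER ES
    "\u0433": "Cyrillic",  # CYRILLIC SMALL LETTER GHE
    "\u0441": "Cyrillic",  # CYRILLIC SMALL LETTER ES
    "\u0435": "Cyrillic",  # CYRILLIC SMALL LETTER IE
    "\u0444": "Cyrillic",  # CYRILLIC SMALL LETTER EF (no look-alike in this table)
    "-": "Common",
}
CONFUSABLE_CLASSES = [
    {"C", "\u0421"}, {"r", "\u0433"}, {"c", "\u0441"}, {"e", "\u0435"},
]


def alternatives(ch):
    """Return every character the toy table treats as confusable with ch."""
    for cls in CONFUSABLE_CLASSES:
        if ch in cls:
            return sorted(cls)
    return [ch]


def resolved_script_set(s):
    """Intersect the per-character script sets; None stands for ALL."""
    resolved = None  # ALL: nothing has constrained the set yet
    for ch in s:
        script = SCRIPT[ch]
        if script == "Common":
            continue  # Common characters are compatible with every script
        resolved = {script} if resolved is None else resolved & {script}
    return resolved


def whole_script_confusable_scripts(x):
    """Return the scripts that contain a whole-script confusable with x."""
    scripts = set()
    # Step 1: Q, every string obtained by swapping in look-alikes.
    for chars in product(*(alternatives(ch) for ch in x)):
        resolved = resolved_script_set("".join(chars))
        # Step 2: skip the empty set (mixed script) and ALL (None).
        if resolved:
            # Step 3: take the union of what remains.
            scripts |= resolved
    return scripts
```

With this toy table, the label "Сгсе" yields both Cyrillic and Latin, since Q contains the all-Latin string "Crce" as well as the original. A label containing a character without a look-alike, such as "ф", yields only Cyrillic. How that result should be combined with the script of the TLD is a separate question, which the sketch does not try to answer.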

I'm not sure I understand the algorithm they're proposing.  I think this
shouldn't be suspicious?  But I may be wrong:

(textsec-domain-suspicious-p "Сгсе.рф")
=> nil

This one should be suspicious, but currently isn't:

(textsec-domain-suspicious-p "Сгсе.ru")
=> nil

Now,

(textsec-ascii-confusable-p "Сгсе.ru")
=> t

and

(textsec-ascii-confusable-p "Сгсе.рф")
=> nil

Is that what they mean here?  I'm not finding the logic overly clear here.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




