GNU bug report logs - #51733
27.1; Detect impossible email addresses better
Reported by: 積丹尼 Dan Jacobson <jidanni <at> jidanni.org>
Date: Wed, 10 Nov 2021 00:29:01 UTC
Severity: wishlist
Found in version 27.1
Fixed in version 29.1
Done: Lars Ingebrigtsen <larsi <at> gnus.org>
Bug is archived. No further changes may be made.
> From: Lars Ingebrigtsen <larsi <at> gnus.org>
> Cc: 51733 <at> debbugs.gnu.org, jidanni <at> jidanni.org
> Date: Wed, 19 Jan 2022 16:45:29 +0100
>
> Eli Zaretskii <eliz <at> gnu.org> writes:
>
> > OK, but why do you think "Сгсе.ru" is confusable? The SLD part is
> > entirely made of single-script characters, and UTS#39 explicitly
> > allows that:
> >
> > [...] it can be perfectly legitimate to have scripts in a SLD
> > (second level domain) not be the same as scripts in a TLD (top-level
> > domain), such as:
> >
> > Cyrillic labels in a domain name with a TLD of .ru or .рф
> >
> > That's your case, isn't it?
>
> Yes, indeed. But:
>
> ---
> For some applications, it is useful to determine if a given input string has any whole-script confusable. For example, the identifier "ѕсоре" using Cyrillic characters would pass the single-script test described in Section 5.2, Restriction-Level Detection, even though it is likely to be a spoof attempt.
> ---
>
> So "Сгсе.ru" is suspicious in most contexts.
Right, but the functions we had back then didn't yet support that
part.
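
As a rough illustration of why the single-script test alone does not
catch the UTS#39 example, here is a minimal Python sketch. It is not
the Emacs code: `rough_scripts` is a hypothetical helper that guesses a
character's script from the first word of its Unicode name, which is
only an approximation of the real Script property.

```python
import unicodedata


def rough_scripts(label):
    """Approximate the set of scripts used in LABEL.

    Assumption: the first word of a character's Unicode name stands in
    for its script. This is a heuristic, not the Unicode Script property.
    """
    return {unicodedata.name(ch).split()[0] for ch in label}


def is_single_script(label):
    """Return True if every character of LABEL appears to share one script."""
    return len(rough_scripts(label)) == 1


# The UTS#39 example, written with escapes so the code points are explicit:
# DZE, ES, O, ER, IE, all Cyrillic, yet it reads like Latin "scope".
cyrillic_scope = "\u0455\u0441\u043e\u0440\u0435"

print(rough_scripts(cyrillic_scope), is_single_script(cyrillic_scope))
print(rough_scripts("scope"), is_single_script("scope"))
```

Both strings pass the single-script check, so that check cannot tell
the spoof from the original; only a confusables lookup can.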
> > Regardless of what they are saying, I don't think the above is
> > suitable for production. I think it should be enough to see whether
> > there could be confusion with the corresponding ASCII characters from
> > confusables.txt.
>
> Yes, so that's what I've done now, but... I'd feel slightly better if I
> knew what they were actually getting at. I think they're saying that if
> "foo" is confusable with anything in any other scripts, then it's
> suspicious?
Yes, that's what they meant.
> But that sounds unworkable. For instance, "circle.ru" is
> confusable with "СігсӀе.ru", and perhaps it's suspicious to a Russian,
> but I don't see how to make a workable function from that.
They've left that to the implementation...
Anyway, I think confusable to ASCII is good enough for Emacs for now.
> So perhaps what I've implemented now is sufficient for domains.
I think it is, yes. It definitely covers a very large chunk of the
problem.
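
For readers who want to see the shape of the "confusable to ASCII"
check, here is a minimal Python sketch. It is not the Emacs
implementation. `ASCII_LOOKALIKES` is a small hand-picked table assumed
for illustration; a real check would load the ASCII-target entries from
confusables.txt. The function names are likewise hypothetical.

```python
# Assumption: a tiny, hand-written stand-in for the entries of
# confusables.txt whose target is an ASCII letter.
ASCII_LOOKALIKES = {
    "\u0421": "C",  # CYRILLIC CAPITAL LETTER ES
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0433": "r",  # CYRILLIC SMALL LETTER GHE
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u0456": "i",  # CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
    "\u04c0": "l",  # CYRILLIC LETTER PALOCHKA
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0455": "s",  # CYRILLIC SMALL LETTER DZE
}


def ascii_skeleton(label):
    """Replace each known lookalike in LABEL with its ASCII counterpart."""
    return "".join(ASCII_LOOKALIKES.get(ch, ch) for ch in label)


def ascii_confusable(label):
    """Return True if LABEL is non-ASCII but maps entirely onto ASCII."""
    if label.isascii():
        return False
    return ascii_skeleton(label).isascii()


def suspicious_labels(domain):
    """Return the dot-separated labels of DOMAIN that mimic ASCII text."""
    return [label for label in domain.split(".") if ascii_confusable(label)]


spoof = "\u0421\u0433\u0441\u0435.ru"      # Cyrillic lookalike of "Crce.ru"
print(ascii_skeleton(spoof))                # Crce.ru
print(len(suspicious_labels(spoof)))        # 1
print(len(suspicious_labels("circle.ru")))  # 0
```

A label that mixes in Cyrillic letters with no ASCII lookalike is not
flagged by this sketch, which matches the narrower goal discussed here:
flag only what could be mistaken for ASCII.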
This bug report was last modified 3 years and 124 days ago.