GNU bug report logs - #58472
[PATCH] Make `message-unique-id' less prone to collisions

Package: emacs;

Reported by: Stefan Kangas <stefankangas <at> gmail.com>

Date: Wed, 12 Oct 2022 16:09:01 UTC

Severity: wishlist

Tags: patch

Done: Stefan Kangas <stefankangas <at> gmail.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 58472 in the body.
You can then email your comments to 58472 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox

Report forwarded to eggert <at> cs.ucla.edu, bug-gnu-emacs <at> gnu.org:
bug#58472; Package emacs. (Wed, 12 Oct 2022 16:09:01 GMT) Full text and rfc822 format available.

Acknowledgement sent to Stefan Kangas <stefankangas <at> gmail.com>:
New bug report received and forwarded. Copy sent to eggert <at> cs.ucla.edu, bug-gnu-emacs <at> gnu.org. (Wed, 12 Oct 2022 16:09:01 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Stefan Kangas <stefankangas <at> gmail.com>
To: bug-gnu-emacs <at> gnu.org
Subject: [PATCH] Make `message-unique-id' less prone to collisions
Date: Wed, 12 Oct 2022 09:07:53 -0700

[Message part 1 (text/plain, inline)]

This is a proposal to make `message-unique-id' less prone to collisions.

For the first 1-2 characters, it uses the return value `user-uid', which
on most single-user systems is a fixed value (for example, this gives us
"87" on almost any single-user Debian machine).[1]  It's also
unnecessarily leaky of potentially privacy sensitive information.

The next 8 characters are the current time, with some gymnastics to
emulate support for a fractional part to seconds.  This seems
unnecessary now that, AFAIU, `time-convert' can do that for us portably
(please correct me if I'm wrong, Paul).

I suggest that we instead base the left-hand side of the Message-ID on:

  1. (time-convert nil (expt 10 9))
  2. 2^N bits of pseudo random data (e.g. N=32)

We can then ignore the effective user id, while significantly decreasing
the risk of a Message-ID collision.[2]

Currently, we get values like:

    (message-unique-id)
    => "87o7uhi3at.fsf"            ; length 10

With the attached patch, we have instead:

    (message-unique-id)
    => "cnk29wgg1a4nvrpqcy.fsf"   ; length ~22

Note also that `message-number-base36' uses a Caesar cipher for some
reason:

    (message-number-base36 5 -1)
    => "u"    ; expect "5"
    (message-number-base36 (expt 36 3) -1)
    => "yzzz" ; expect "1000"

The patch fixes this also.

I don't know if this change should be in NEWS or not.

Footnotes:
[1]  Just for fun, you can search for 87 on
     https://en.wikipedia.org/wiki/Message-ID

[2]  See also: https://www.jwz.org/doc/mid.html

[0001-Make-message-unique-id-less-prone-to-collisions.patch (text/x-diff, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58472; Package emacs. (Wed, 12 Oct 2022 18:09:02 GMT) Full text and rfc822 format available.

Message #8 received at 58472 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Stefan Kangas <stefankangas <at> gmail.com>
Cc: 58472 <at> debbugs.gnu.org
Subject: Re: bug#58472: [PATCH] Make `message-unique-id' less prone to
 collisions
Date: Wed, 12 Oct 2022 11:08:12 -0700

On 2022-10-12 09:07, Stefan Kangas wrote:

> This seems
> unnecessary now that, AFAIU, `time-convert' can do that for us portably
> (please correct me if I'm wrong, Paul).
> 
> I suggest that we instead base the left-hand side of the Message-ID on:
> 
>    1. (time-convert nil (expt 10 9))
>    2. 2^N bits of pseudo random data (e.g. N=32)

(time-convert nil t) would be a bit more efficient.

If the goal is to avoid collisions, though, wouldn't it be better to 
avoid the time entirely? time is not very random in the high order bits.

Also, the comment about ".fsf" and other newsreaders is wrong. If we're 
generating IDs at random it doesn't matter whether we append ".fsf" as 
the ".fsf" itself could be randomly generated by another newsreader. The 
".fsf" is just a comment or advertisement or debugging aid or what have 
you. And if we're changing the algorithm perhaps we should change ".fsf" 
to something else.

Something like this, perhaps, where you can choose LEN as you like:

  (concat
     (let ((len 18))
       ;; Pass LEN, not -1, to message-number-base36 so that it never
       ;; returns "" which would make the message-ID nonconforming.
       (message-number-base36 (random (expt 36 len)) len))
     ;; Append ".gnu" to advertise that we're GNU Emacs.
     ".gnu")

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58472; Package emacs. (Thu, 13 Oct 2022 02:47:01 GMT) Full text and rfc822 format available.

Message #11 received at 58472 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Kangas <stefankangas <at> gmail.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 58472 <at> debbugs.gnu.org
Subject: Re: bug#58472: [PATCH] Make `message-unique-id' less prone to
 collisions
Date: Wed, 12 Oct 2022 19:46:14 -0700

[Message part 1 (text/plain, inline)]

Paul Eggert <eggert <at> cs.ucla.edu> writes:

> If the goal is to avoid collisions, though, wouldn't it be better to
> avoid the time entirely? time is not very random in the high order bits.

The main goal is to avoid collisions, but using the time also gives an
idea of when the message was sent, which is kind of nice.  Time also
guarantees a somewhat unique value even if the user has happened to set
the random seed.

If we limit the length of the time string to 12 characters, and the
total length to 25 characters (including the ".gnu" part), we still have
a guaranteed 9 characters of random data, or 46 bits of entropy.

    ⌊log₂ 36⁹⌋ = 46

If we want to guarantee 62 bits of entropy, we could do a total length
of 28 characters instead, but that might be overkill.

> Also, the comment about ".fsf" and other newsreaders is wrong [...]
> And if we're changing the algorithm perhaps we should change ".fsf" to
> something else.

Changing it to ".gnu" makes sense to me.

> Something like this, perhaps, where you can choose LEN as you like:
>
>    (concat
>       (let ((len 18))
>         ;; Pass LEN, not -1, to message-number-base36 so that it never
>         ;; returns "" which would make the message-ID nonconforming.
>         (message-number-base36 (random (expt 36 len)) len))
>       ;; Append ".gnu" to advertise that we're GNU Emacs.
>       ".gnu")

I think making sure it is always the same length is a good idea.

How about the attached?

[0001-Make-message-unique-id-less-prone-to-collisions.patch (text/x-diff, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58472; Package emacs. (Thu, 13 Oct 2022 04:54:01 GMT) Full text and rfc822 format available.

Message #14 received at 58472 <at> debbugs.gnu.org (full text, mbox):

From: Matt Armstrong <matt <at> rfc20.org>
To: Stefan Kangas <stefankangas <at> gmail.com>
Cc: 58472 <at> debbugs.gnu.org, Paul Eggert <eggert <at> cs.ucla.edu>
Subject: Re: bug#58472: [PATCH] Make `message-unique-id' less prone to
 collisions
Date: Wed, 12 Oct 2022 21:53:44 -0700

Stefan Kangas <stefankangas <at> gmail.com> writes:

> Paul Eggert <eggert <at> cs.ucla.edu> writes:
>
>> If the goal is to avoid collisions, though, wouldn't it be better to
>> avoid the time entirely? time is not very random in the high order bits.
>
> The main goal is to avoid collisions, but using the time also gives an
> idea of when the message was sent, which is kind of nice.  Time also
> guarantees a somewhat unique value even if the user has happened to set
> the random seed.

Most email I get today to use a UUID or UUID-like Message-ID, like this:

Message-ID: <736d10a6-001f-4a29-a1d4-554f58733b69 <at> dfw1s10mta1086.xt.local>
Message-ID: <1815053947.8446619.1665544925708 <at> lor1-app45123.prod.linkedin.com>
Message-ID: <01000183b9eaa6f8-411d1f4c-b573-472d-b45f-47b0c4eb6ace-000000 <at> email.amazonses.com>
Message-ID: <CABqZ1wa8MxrieVKZ11adZUV2qB_CnpMJoFEn-U3d5CQ7z7smWw <at> mail.gmail.com>


> If we limit the length of the time string to 12 characters, and the
> total length to 25 characters (including the ".gnu" part), we still have
> a guaranteed 9 characters of random data, or 46 bits of entropy.
>
>     ⌊log₂ 36⁹⌋ = 46
>
> If we want to guarantee 62 bits of entropy, we could do a total length
> of 28 characters instead, but that might be overkill.

I suspect that most mailers use more randomness than that.

Some of the SHA hash algorithms are in the public domain.  Could they be
added to Emacs and used for UUID generation here?


-- 
matt (sent from an Emacs running the feature/noverlay branch)

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58472; Package emacs. (Thu, 13 Oct 2022 11:47:01 GMT) Full text and rfc822 format available.

Message #17 received at 58472 <at> debbugs.gnu.org (full text, mbox):

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Stefan Kangas <stefankangas <at> gmail.com>
Cc: 58472 <at> debbugs.gnu.org, Paul Eggert <eggert <at> cs.ucla.edu>
Subject: Re: bug#58472: [PATCH] Make `message-unique-id' less prone to
 collisions
Date: Thu, 13 Oct 2022 13:45:53 +0200

Stefan Kangas <stefankangas <at> gmail.com> writes:

> For the first 1-2 characters, it uses the return value `user-uid', which
> on most single-user systems is a fixed value (for example, this gives us
> "87" on almost any single-user Debian machine).[1]  It's also
> unnecessarily leaky of potentially privacy sensitive information.

Well, since it is "87" on most machines, it's not very leaky.  😀

It's documented to be this way, though?  That is, we've got tricks like
being able to score on References based on your Message-ID to score up
threads you've been a part of, etc.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58472; Package emacs. (Thu, 13 Oct 2022 12:11:01 GMT) Full text and rfc822 format available.

Message #20 received at 58472 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Kangas <stefankangas <at> gmail.com>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: 58472 <at> debbugs.gnu.org, Paul Eggert <eggert <at> cs.ucla.edu>
Subject: Re: bug#58472: [PATCH] Make `message-unique-id' less prone to
 collisions
Date: Thu, 13 Oct 2022 12:10:19 +0000

Lars Ingebrigtsen <larsi <at> gnus.org> writes:

> It's documented to be this way, though?

Hmm, right (from `(message) News Headers'):

    ‘Message-ID’
         This required header will be generated by Message.  A unique ID
         will be created based on the date, time, user name (for the local
         part) and the domain part.

Actually, the user name is currently only used on MS-DOS, AFAICT.
So I guess the documentation is already wrong?

> Well, since it is "87" on most machines, it's not very leaky.  😀

Until your name is "88", of course.  ;-(

> That is, we've got tricks like being able to score on References based
> on your Message-ID to score up threads you've been a part of, etc.

It seems flaky to use it for such scoring purposes though, as almost
everyone is named "87"...  Do we really have code that does that?

Should we care?

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58472; Package emacs. (Thu, 13 Oct 2022 12:11:01 GMT) Full text and rfc822 format available.

Message #23 received at 58472 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Kangas <stefankangas <at> gmail.com>
To: Matt Armstrong <matt <at> rfc20.org>
Cc: 58472 <at> debbugs.gnu.org, Paul Eggert <eggert <at> cs.ucla.edu>
Subject: Re: bug#58472: [PATCH] Make `message-unique-id' less prone to
 collisions
Date: Thu, 13 Oct 2022 12:10:39 +0000

Matt Armstrong <matt <at> rfc20.org> writes:

> Most email I get today to use a UUID or UUID-like Message-ID, like this:
>
> Message-ID: <736d10a6-001f-4a29-a1d4-554f58733b69 <at> dfw1s10mta1086.xt.local>
> Message-ID: <1815053947.8446619.1665544925708 <at> lor1-app45123.prod.linkedin.com>
> Message-ID: <01000183b9eaa6f8-411d1f4c-b573-472d-b45f-47b0c4eb6ace-000000 <at> email.amazonses.com>
> Message-ID: <CABqZ1wa8MxrieVKZ11adZUV2qB_CnpMJoFEn-U3d5CQ7z7smWw <at> mail.gmail.com>

Those are 30-51 characters in length.  I also note that Gmail uses both
lower case and upper case characters.

>> If we limit the length of the time string to 12 characters, and the
>> total length to 25 characters (including the ".gnu" part), we still have
>> a guaranteed 9 characters of random data, or 46 bits of entropy.
>
> I suspect that most mailers use more randomness than that.

So I guess we might as well bump this up to 30 characters in total,
which gives us 72 bits.  The Message-IDs would look like:

    cnkrs75yamag1k7x8rnt3y50za.gnu <at> stefankangas.se
    cnkrifkirauwuwfkzs3rcit8cq.gnu <at> stefankangas.se

We could go longer, but it's also nice to have something which is not an
absolute abomination to look at.

If we add in upper case characters too, we can encode the time with one
less character.  So we end up with 89 bits of randomness and this:

    1Z2KnqE1t2bSgUWkcu53M34Y4y.gnu <at> stefankangas.se
    1Z2KbUgleGoe0WRJ3jbiM0mE7W.gnu <at> stefankangas.se

If we don't want to always start the Message-ID with the same characters
(which makes them more distinct, at a glance), we could just reverse the
time string:

    QlRXPpmK2Z1kUklxIpMNZpChOu.gnu <at> stefankangas.se
    Z59YikmK2Z1FSmYj172SAdPpuX.gnu <at> stefankangas.se

> Some of the SHA hash algorithms are in the public domain.  Could they be
> added to Emacs and used for UUID generation here?

We have `secure-hash'.  Is that what you mean?  Or do you mean to use a
proper RFC 4122 UUID?

All we need is for the Message-ID to be unique though, so some ad hoc
solution is probably fine.

Severity set to 'wishlist' from 'normal' Request was from Stefan Kangas <stefankangas <at> gmail.com> to control <at> debbugs.gnu.org. (Thu, 13 Oct 2022 13:49:06 GMT) Full text and rfc822 format available.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58472; Package emacs. (Thu, 13 Oct 2022 16:22:02 GMT) Full text and rfc822 format available.

Message #28 received at 58472 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Stefan Kangas <stefankangas <at> gmail.com>
Cc: 58472 <at> debbugs.gnu.org
Subject: Re: bug#58472: [PATCH] Make `message-unique-id' less prone to
 collisions
Date: Thu, 13 Oct 2022 09:21:05 -0700

On 2022-10-12 19:46, Stefan Kangas wrote:

> The main goal is to avoid collisions, but using the time also gives an
> idea of when the message was sent, which is kind of nice.

That info is in the Date: line, along with zillions of other Received: 
lines. There should be no need to repeat it in the Message-ID line.

> Time also
> guarantees a somewhat unique value even if the user has happened to set
> the random seed.

If that's a concern, we should be using more-random data, e.g., with

   (base64-encode-string
    (secure-hash 'md5 'iv-auto 128 nil t))

if we want 128 bits of randomness (this yields a string like 
"B8a3usyu5QSE/rTLu0nIHg==").

As an aside, it's weird that there's no easy way to ask Emacs for an 
N-bit random integer, where the randomness is taken from system entropy. 
Shouldn't we extend Emacs to support that? E.g., (make-string 128 
'iv-auto) could give you an N-byte entropy-derived random string, or 
(random -N) could give you an N-bit entropy-derived random nonnegative 
integer, or something like that. Then we could write something like this:

  (base64-encode-string (make-string 16 'iv-auto))

to get a Message-ID component with 16 bytes (128 bits) of entropy.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58472; Package emacs. (Thu, 13 Oct 2022 16:36:02 GMT) Full text and rfc822 format available.

Message #31 received at 58472 <at> debbugs.gnu.org (full text, mbox):

From: Matt Armstrong <matt <at> rfc20.org>
To: Stefan Kangas <stefankangas <at> gmail.com>
Cc: 58472 <at> debbugs.gnu.org, Paul Eggert <eggert <at> cs.ucla.edu>
Subject: Re: bug#58472: [PATCH] Make `message-unique-id' less prone to
 collisions
Date: Thu, 13 Oct 2022 09:35:06 -0700

Stefan Kangas <stefankangas <at> gmail.com> writes:

> Matt Armstrong <matt <at> rfc20.org> writes:
>
>> Most email I get today to use a UUID or UUID-like Message-ID, like this:
>>
>> Message-ID: <736d10a6-001f-4a29-a1d4-554f58733b69 <at> dfw1s10mta1086.xt.local>
>> Message-ID: <1815053947.8446619.1665544925708 <at> lor1-app45123.prod.linkedin.com>
>> Message-ID: <01000183b9eaa6f8-411d1f4c-b573-472d-b45f-47b0c4eb6ace-000000 <at> email.amazonses.com>
>> Message-ID: <CABqZ1wa8MxrieVKZ11adZUV2qB_CnpMJoFEn-U3d5CQ7z7smWw <at> mail.gmail.com>
>
> Those are 30-51 characters in length.  I also note that Gmail uses both
> lower case and upper case characters.

Yes, the Google example is probably a crypto secure hash base64url
encoded.

>>> If we limit the length of the time string to 12 characters, and the
>>> total length to 25 characters (including the ".gnu" part), we still have
>>> a guaranteed 9 characters of random data, or 46 bits of entropy.
>>
>> I suspect that most mailers use more randomness than that.
>
> So I guess we might as well bump this up to 30 characters in total,
> which gives us 72 bits.  The Message-IDs would look like:
>
>     cnkrs75yamag1k7x8rnt3y50za.gnu <at> stefankangas.se
>     cnkrifkirauwuwfkzs3rcit8cq.gnu <at> stefankangas.se

You said you like the timestamp prefix.  How many times would that
actually be useful, and for what?

I am not seeing the utility myself, but I might be missing something.

> We could go longer, but it's also nice to have something which is not an
> absolute abomination to look at.
>
> If we add in upper case characters too, we can encode the time with one
> less character.  So we end up with 89 bits of randomness and this:
>
>     1Z2KnqE1t2bSgUWkcu53M34Y4y.gnu <at> stefankangas.se
>     1Z2KbUgleGoe0WRJ3jbiM0mE7W.gnu <at> stefankangas.se
>
> If we don't want to always start the Message-ID with the same characters
> (which makes them more distinct, at a glance), we could just reverse the
> time string:
>
>     QlRXPpmK2Z1kUklxIpMNZpChOu.gnu <at> stefankangas.se
>     Z59YikmK2Z1FSmYj172SAdPpuX.gnu <at> stefankangas.se

Maybe use (base64url-encode-string str t) is an option?

>> Some of the SHA hash algorithms are in the public domain.  Could they be
>> added to Emacs and used for UUID generation here?
>
> We have `secure-hash'.  Is that what you mean?  Or do you mean to use a
> proper RFC 4122 UUID?
>
> All we need is for the Message-ID to be unique though, so some ad hoc
> solution is probably fine.

I am no security expert, so I'll end with this and then be quiet on this
issue.  :-)

I agree that globally unique (with very high probability) is pretty easy
to design in ad hoc ways within a private namespace.  But Message-ID is
not that.  The Message-ID space consists of all Message-ID generated by
all programs past, present and future.  An impossible goal, given that
the RFC basically says "go make something up, have fun with that", so it
is a free for all.

Given that, timestamp + short-rng + ".gnu" doesn't feel good enough to
me, but I won't lose sleep if you use that solution.

In these situations I usually feel better going with what seems like an
existing practice: encode a crypto hash sourced from (possibly security
sensitive) entropy available on the local machine.

Inspired by https://nullprogram.com/blog/2010/05/11/ and the code linked
there (https://nullprogram.com/download/uuid.el) I came up with this:

;; Written while discussing Emacs bug#58472
(defun my-message-unique-id()
  (let* ((object (format "%S|%s|%s|%s"
                         (time-convert nil t)
                         (random (expt 2 512))
                         (emacs-uptime)
                         (emacs-pid)))
         (hash (secure-hash 'sha512 object nil nil t))
         (encoded-hash (base64url-encode-string hash t))
         (id (concat encoded-hash ".gnu")))
    ;; Replace leading and trailing non-alnum with ?x.  This is not
    ;; necessary to create strictly conforming Message-Id, but it makes
    ;; them less confusing in practice.
    (dolist (index `(0 ,(1- (length id))))
      (let ((char (aref id index)))
        (when (or (eq char ?-)
                  (eq char ?_))
          (aset id index ?x))))
    id))

This gives IDs like:

   lzD9K4s2GkiGmZhmP46M4ezv37sGEQO2tU2GX2mx-rs.gnu <at> stefankangas.se
   wEUY87gazgDLnsARlJZ5MEY4pgFZmDhxy9IZhbDLpfA.gnu <at> stefankangas.se
   VbUzrNkjJiCnZYiDzMdcqz2BqibzismlaR8ZkcU6O7E.gnu <at> stefankangas.se
   XX8f7gOrT1Iv-NLdZBjHgz8s4u4EF2uE7o-UttFgF0U.gnu <at> stefankangas.se
   tp9f3Up7ilsrH8XqtmQT4_evPkOO-EteN127d9bt85s.gnu <at> stefankangas.se

One advantage of this approach is that it is possible to add more
sources of entropy, or grow the hash function, without necessarily
requiring a "format" version change, because we're relying on the hash
for uniqueness in a sort of brute force way, not a schema and a hope
that only Emacs uses it.

One question: I'm not sure that (random (exp 2 512)) actually gives more
entropy than a simple call to (random).  It depends on Emacs' RNG.  Last
I checked there was code still calling something like rand(), which
generally not intended for this purpose.

P.S. if you go with your original approach, I think you want (expt 2 X)
instead of (exp X).

-- 
matt (sent from an Emacs running the feature/noverlay branch)

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58472; Package emacs. (Thu, 13 Oct 2022 16:39:03 GMT) Full text and rfc822 format available.

Message #34 received at 58472 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Matt Armstrong <matt <at> rfc20.org>, Stefan Kangas <stefankangas <at> gmail.com>
Cc: 58472 <at> debbugs.gnu.org
Subject: Re: bug#58472: [PATCH] Make `message-unique-id' less prone to
 collisions
Date: Thu, 13 Oct 2022 09:38:34 -0700

On 2022-10-13 09:35, Matt Armstrong wrote:
> One question: I'm not sure that (random (exp 2 512)) actually gives more
> entropy than a simple call to (random).

It does not. (Please see my previous email for how to get more entropy 
out of Emacs; it's kinda ugly now.)

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58472; Package emacs. (Thu, 13 Oct 2022 19:17:02 GMT) Full text and rfc822 format available.

Message #37 received at 58472 <at> debbugs.gnu.org (full text, mbox):

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Stefan Kangas <stefankangas <at> gmail.com>
Cc: 58472 <at> debbugs.gnu.org, Paul Eggert <eggert <at> cs.ucla.edu>
Subject: Re: bug#58472: [PATCH] Make `message-unique-id' less prone to
 collisions
Date: Thu, 13 Oct 2022 21:15:59 +0200

Stefan Kangas <stefankangas <at> gmail.com> writes:

> Hmm, right (from `(message) News Headers'):
> 
>     ‘Message-ID’
>          This required header will be generated by Message.  A unique ID
>          will be created based on the date, time, user name (for the local
>          part) and the domain part.
> 
> Actually, the user name is currently only used on MS-DOS, AFAICT.
> So I guess the documentation is already wrong?

Is says "based on", not that the user name is actually in the string.
(And it uses the uid instead, but same same.)

>> That is, we've got tricks like being able to score on References based
>> on your Message-ID to score up threads you've been a part of, etc.
>
> It seems flaky to use it for such scoring purposes though, as almost
> everyone is named "87"...  Do we really have code that does that?
>
> Should we care?

I thought I remembered it being a part of the scoring section of the
manual, so either it's been removed or I misremember.

But that's the reason the algo is the way it is now -- it allows you do
do stuff based on <ID.*@big-domain.com> in References/In-reply-to etc.
If there's only one UID per domain (which is the case these days),
that's less of a thing, of course.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58472; Package emacs. (Fri, 14 Oct 2022 09:23:02 GMT) Full text and rfc822 format available.

Message #40 received at 58472 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Kangas <stefankangas <at> gmail.com>
To: Matt Armstrong <matt <at> rfc20.org>
Cc: 58472 <at> debbugs.gnu.org, Paul Eggert <eggert <at> cs.ucla.edu>
Subject: Re: bug#58472: [PATCH] Make `message-unique-id' less prone to
 collisions
Date: Fri, 14 Oct 2022 11:22:08 +0200

Matt Armstrong <matt <at> rfc20.org> writes:

> Inspired by https://nullprogram.com/blog/2010/05/11/ and the code linked
> there (https://nullprogram.com/download/uuid.el) I came up with this:

That's pretty neat!

> One question: I'm not sure that (random (exp 2 512)) actually gives more
> entropy than a simple call to (random).  It depends on Emacs' RNG.  Last
> I checked there was code still calling something like rand(), which
> generally not intended for this purpose.

Ouch, indeed.  I think we should use the stuff Paul pointed to, which
AFAICT just punts to Gnulib.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58472; Package emacs. (Fri, 14 Oct 2022 09:23:02 GMT) Full text and rfc822 format available.

Message #43 received at 58472 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Kangas <stefankangas <at> gmail.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 58472 <at> debbugs.gnu.org, Matt Armstrong <matt <at> rfc20.org>
Subject: Re: bug#58472: [PATCH] Make `message-unique-id' less prone to
 collisions
Date: Fri, 14 Oct 2022 11:22:15 +0200

Paul Eggert <eggert <at> cs.ucla.edu> writes:

> That info is in the Date: line, along with zillions of other Received:
> lines. There should be no need to repeat it in the Message-ID line.

OK, let's use only random data.

> If that's a concern, we should be using more-random data, e.g., with
>
>     (base64-encode-string
>      (secure-hash 'md5 'iv-auto 128 nil t))
>
> if we want 128 bits of randomness (this yields a string like
> "B8a3usyu5QSE/rTLu0nIHg==").

Sounds good to me, but:

The only reference I find to `iv-auto' is in (info "(elisp) Format of
GnuTLS Cryptography Inputs").  The `secure-hash' OBJECT parameter is
documented like this:

    The argument OBJECT should be a buffer or a string.

Is the documentation incomplete?

> As an aside, it's weird that there's no easy way to ask Emacs for an
> N-bit random integer, where the randomness is taken from system entropy.
> Shouldn't we extend Emacs to support that? E.g., (make-string 128
> 'iv-auto) could give you an N-byte entropy-derived random string, or
> (random -N) could give you an N-bit entropy-derived random nonnegative
> integer, or something like that. Then we could write something like this:
>
>    (base64-encode-string (make-string 16 'iv-auto))
>
> to get a Message-ID component with 16 bytes (128 bits) of entropy.

Yes, we should find a better interface here.

We also have `cl-random', which I think support float values.  Perhaps
`random' should do the same?  (See also Bug#9103.)

Oh, and whatever we do, we should probably also document it in `(elisp)
Random Numbers'.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58472; Package emacs. (Sun, 16 Oct 2022 07:34:01 GMT) Full text and rfc822 format available.

Message #46 received at 58472 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Kangas <stefankangas <at> gmail.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 58472 <at> debbugs.gnu.org, Matt Armstrong <matt <at> rfc20.org>
Subject: Re: bug#58472: [PATCH] Make `message-unique-id' less prone to
 collisions
Date: Sun, 16 Oct 2022 00:32:58 -0700

Stefan Kangas <stefankangas <at> gmail.com> writes:

>> If that's a concern, we should be using more-random data, e.g., with
>>
>>     (base64-encode-string
>>      (secure-hash 'md5 'iv-auto 128 nil t))
>>
>> if we want 128 bits of randomness (this yields a string like
>> "B8a3usyu5QSE/rTLu0nIHg==").
>
> Sounds good to me, but:

Looking at the implementation of `secure-hash', doesn't this run the
random number through the MD5 function?  I guess that will reduce
entropy, as AFAIU hash functions like MD5 are not really bijective
functions f: ℕ → ℕ.  (In other words, it does not give you a permutation
of the set of all natural numbers.)

I don't think it matters for generating a Message-ID, but I thought it
was worth pointing out.  So if I'm right, you might not want to use this
particular method for generating cryptographic keys, if anyone happens
to be doing stuff like that in ELisp.

In any case, I do think we should add an easy way of directly accessing
the getrandom(2) syscall [or equivalent].

>> As an aside, it's weird that there's no easy way to ask Emacs for an
>> N-bit random integer, where the randomness is taken from system entropy.
>> Shouldn't we extend Emacs to support that? E.g., (make-string 128
>> 'iv-auto) could give you an N-byte entropy-derived random string, or
>> (random -N) could give you an N-bit entropy-derived random nonnegative
>> integer, or something like that. Then we could write something like this:
>>
>>    (base64-encode-string (make-string 16 'iv-auto))
>>
>> to get a Message-ID component with 16 bytes (128 bits) of entropy.
>
> Yes, we should find a better interface here.

How about `secure-random' (named in analogy with `secure-hash')?

    (secure-random 8)
    => <8 byte number>

This is the same interface as the Linux getrandom(2) system call, and
the portable function in Gnulib, so it's easy enough to implement.

BTW, we currently call getrandom with flags 0 in
extract_data_from_object.  Should we consider using GRND_RANDOM?  The
Linux kernel uses the same code path for /dev/random and /dev/urandom
these days, IIUC, so maybe it doesn't matter.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58472; Package emacs. (Sun, 16 Oct 2022 15:21:02 GMT) Full text and rfc822 format available.

Message #49 received at 58472 <at> debbugs.gnu.org (full text, mbox):

From: Matt Armstrong <matt <at> rfc20.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>, Stefan Kangas <stefankangas <at> gmail.com>
Cc: 58472 <at> debbugs.gnu.org
Subject: Re: bug#58472: [PATCH] Make `message-unique-id' less prone to
 collisions
Date: Sun, 16 Oct 2022 08:19:57 -0700

Paul Eggert <eggert <at> cs.ucla.edu> writes:

> On 2022-10-12 19:46, Stefan Kangas wrote:
>
>> The main goal is to avoid collisions, but using the time also gives an
>> idea of when the message was sent, which is kind of nice.
>
> That info is in the Date: line, along with zillions of other Received: 
> lines. There should be no need to repeat it in the Message-ID line.
>
>> Time also
>> guarantees a somewhat unique value even if the user has happened to set
>> the random seed.
>
> If that's a concern, we should be using more-random data, e.g., with
>
>     (base64-encode-string
>      (secure-hash 'md5 'iv-auto 128 nil t))
>
> if we want 128 bits of randomness (this yields a string like 
> "B8a3usyu5QSE/rTLu0nIHg==").

Small suggestion: `base64-url-encode-string` avoids any trailing `===`
and uses slightly preferable non-alnum chars (I think).

But both can generate Message-Id with "command line switch" chars:
either ?- or ?/.  Since some tools expect users to work directly with
Message-ID at times (https://notmuchmail.org/) it might be nice to avoid
leading non-alnum chars, and possibly avoid '-' to avoid any confusion
with a UUID (in the sense of the schema defined by the RFC standard).

Maybe a base 62 encoder could be written just for this, as Emacs'
version of this doesn't need to be fast.  Can a string be turned into a
non-negative bignum integer in (simple) elisp?

-- 
matt (sent from an Emacs running the feature/noverlay branch)

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58472; Package emacs. (Sun, 16 Oct 2022 16:50:02 GMT) Full text and rfc822 format available.

Message #52 received at 58472 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Kangas <stefankangas <at> gmail.com>
To: Matt Armstrong <matt <at> rfc20.org>, Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 58472 <at> debbugs.gnu.org
Subject: Re: bug#58472: [PATCH] Make `message-unique-id' less prone to
 collisions
Date: Sun, 16 Oct 2022 16:49:45 +0000

Matt Armstrong <matt <at> rfc20.org> writes:

> Since some tools expect users to work directly with Message-ID at
> times (https://notmuchmail.org/) it might be nice to avoid leading
> non-alnum chars,

I agree that it'd be nice to just use alpha-numeric characters.

> Maybe a base 62 encoder could be written just for this, as Emacs'
> version of this doesn't need to be fast.

A base62 encoder is just `message-number-base36' with the A-Z range
added.  I think I included that in a previous patch.

> Can a string be turned into a non-negative bignum integer in
> (simple) elisp?

Does this look reasonable?

    (seq-reduce (lambda (a i) (+ (ash a 8) i))
                (secure-hash 'md5 'iv-auto 128 nil t) 0)

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58472; Package emacs. (Sun, 16 Oct 2022 17:06:02 GMT) Full text and rfc822 format available.

Message #55 received at 58472 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Kangas <stefankangas <at> gmail.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 58472 <at> debbugs.gnu.org, Matt Armstrong <matt <at> rfc20.org>
Subject: Re: bug#58472: [PATCH] Make `message-unique-id' less prone to
 collisions
Date: Sun, 16 Oct 2022 17:05:36 +0000

Stefan Kangas <stefankangas <at> gmail.com> writes:

> BTW, we currently call getrandom with flags 0 in
> extract_data_from_object.  Should we consider using GRND_RANDOM?

linux.git/include/uapi/linux/random.h says that:

/*
 * Flags for getrandom(2)
 *
 * GRND_NONBLOCK	Don't block and return EAGAIN instead
 * GRND_RANDOM		No effect
 * GRND_INSECURE	Return non-cryptographic random bytes
 */

But the story is probably not the same on other systems, according to
the glibc documentation, so we should probably not change this.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58472; Package emacs. (Mon, 17 Oct 2022 06:18:02 GMT) Full text and rfc822 format available.

Message #58 received at 58472 <at> debbugs.gnu.org (full text, mbox):

From: Matt Armstrong <matt <at> rfc20.org>
To: Stefan Kangas <stefankangas <at> gmail.com>, Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 58472 <at> debbugs.gnu.org
Subject: Re: bug#58472: [PATCH] Make `message-unique-id' less prone to
 collisions
Date: Sun, 16 Oct 2022 23:17:28 -0700

Stefan Kangas <stefankangas <at> gmail.com> writes:

> Matt Armstrong <matt <at> rfc20.org> writes:
>
>> Can a string be turned into a non-negative bignum integer in
>> (simple) elisp?
>
> Does this look reasonable?
>
>     (seq-reduce (lambda (a i) (+ (ash a 8) i))
>                 (secure-hash 'md5 'iv-auto 128 nil t) 0)

Yep, not too bad, thanks.

-- 
matt (sent from an Emacs running the feature/noverlay branch)

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58472; Package emacs. (Mon, 17 Oct 2022 07:32:01 GMT) Full text and rfc822 format available.

Message #61 received at 58472 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Stefan Kangas <stefankangas <at> gmail.com>, Matt Armstrong <matt <at> rfc20.org>
Cc: 58472 <at> debbugs.gnu.org
Subject: Re: bug#58472: [PATCH] Make `message-unique-id' less prone to
 collisions
Date: Mon, 17 Oct 2022 00:30:49 -0700

[Message part 1 (text/plain, inline)]

I've been looking into this and have several patches along these lines. 
None of them address message-unique-id directly yet (I plan to tackle 
this soon) but they do address the general problem area. The basic idea 
is to use a new make-nonce primitive.

I thought I'd email the patches now to see what others think about the 
direction they're headed.

[0001-No-need-to-call-w32_init_random.patch (text/x-patch, attachment)]

[0002-Simplify-feedmail-Message-ID-generation.patch (text/x-patch, attachment)]

[0003-Simplify-calc-comb-by-using-random.patch (text/x-patch, attachment)]

[0004-Simplify-calc-prog-by-using-random.patch (text/x-patch, attachment)]

[0005-doc-lispref-numbers.texi-Improve-random-doc.patch (text/x-patch, attachment)]

[0006-New-function-make-nonce.patch (text/x-patch, attachment)]

[0007-Simplify-auth-source-obfuscate-via-make-nonce.patch (text/x-patch, attachment)]

[0008-Simplify-ntml-by-using-make-nonce.patch (text/x-patch, attachment)]

[0009-Improve-randomness-of-server.el.patch (text/x-patch, attachment)]

[0010-Improve-randomness-of-message-canlock-generate.patch (text/x-patch, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58472; Package emacs. (Mon, 17 Oct 2022 08:15:01 GMT) Full text and rfc822 format available.

Message #64 received at 58472 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Kangas <stefankangas <at> gmail.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>, Matt Armstrong <matt <at> rfc20.org>
Cc: 58472 <at> debbugs.gnu.org
Subject: Re: bug#58472: [PATCH] Make `message-unique-id' less prone to
 collisions
Date: Mon, 17 Oct 2022 08:14:03 +0000

[Message part 1 (text/plain, inline)]

Paul Eggert <eggert <at> cs.ucla.edu> writes:

> I've been looking into this and have several patches along these lines.
> None of them address message-unique-id directly yet (I plan to tackle
> this soon) but they do address the general problem area. The basic idea
> is to use a new make-nonce primitive.

Thanks!  I have read your patchset, which looks good to me.

I have also attached my latest patch for `message-unique-id', but I'm
not married to it if you have something better in mind.  It could easily
be updated to use `make-nonce' though.

>  (defun math-init-random-base ()
[...snip...]
> +  (declare (obsolete nil "29.1")))

This is a nit, but perhaps this could be simplified to:

    (declare-obsolete-function-alias 'math-init-random-base
       #'ignore "29.1)

> diff --git a/src/sysdep.c b/src/sysdep.c
> index 4786c8fa4f..5117460fc0 100644
> --- a/src/sysdep.c
> +++ b/src/sysdep.c
> @@ -2159,6 +2159,22 @@ seed_random (void *seed, ptrdiff_t seed_size)
>    set_random_seed (arg);
>  }
>
> +/* Set BUF, of size BUFSIZE, to random data derived from system entropy.  */
> +
> +void
> +get_entropy (void *buf, ptrdiff_t bufsize)
> +{
> +  char *p = buf, *lim = p + bufsize;
> +  while (p < lim)
> +    {
> +      ssize_t gotten = getrandom (p, lim - p, 0);
> +      if (0 <= gotten)
> +	p += gotten;
> +      else if (errno != EINTR)
> +	report_file_error ("Getting random data", Qnil);
> +    }
> +}

If we claim that the random data is suitable for cryptographic purposes,
should we be using the GRND_RANDOM flag here?

On Linux, flags 0 and GRND_RANDOM are equivalent (I read the most recent
kernel code to verify this).  But I have no idea about other platforms.

[0001-Make-message-unique-id-less-prone-to-collisions.patch (text/x-diff, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58472; Package emacs. (Mon, 17 Oct 2022 08:18:02 GMT) Full text and rfc822 format available.

Message #67 received at 58472 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: matt <at> rfc20.org, stefankangas <at> gmail.com, 58472 <at> debbugs.gnu.org
Subject: Re: bug#58472: [PATCH] Make `message-unique-id' less prone to
 collisions
Date: Mon, 17 Oct 2022 11:16:53 +0300

> Cc: 58472 <at> debbugs.gnu.org
> Date: Mon, 17 Oct 2022 00:30:49 -0700
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> 
> I've been looking into this and have several patches along these lines. 
> None of them address message-unique-id directly yet (I plan to tackle 
> this soon) but they do address the general problem area. The basic idea 
> is to use a new make-nonce primitive.
> 
> I thought I'd email the patches now to see what others think about the 
> direction they're headed.

IMNSHO, this thread has long passed the point of being reasonable.
There's nothing particularly wrong with the current message-id we use,
and as mentioned several times, its exact form and contents are not
very important anyway.

So I'm objected to any of these wide-sweeping changes for a reason
that is so minor it IMO shouldn't have been brought up in the first
place.  I regret I didn't stop this discussion back then, because it
has now snowballed into a monster.  But better late than never.

> --- a/src/sysdep.c
> +++ b/src/sysdep.c
> @@ -2163,17 +2163,11 @@ seed_random (void *seed, ptrdiff_t seed_size)
>  init_random (void)
>  {
>    random_seed v;
> -  bool success = false;
>  
>    /* First, try seeding the PRNG from the operating system's entropy
>       source.  This approach is both fast and secure.  */
> -#ifdef WINDOWSNT
> -  /* FIXME: Perhaps getrandom can be used here too?  */
> -  success = w32_init_random (&v, sizeof v) == 0;
> -#else
>    verify (sizeof v <= 256);
> -  success = getrandom (&v, sizeof v, 0) == sizeof v;
> -#endif
> +  bool success = getrandom (&v, sizeof v, 0) == sizeof v;
>  
>    /* If that didn't work, just use the current time value and PID.
>       It's at least better than XKCD 221.  */

Please never replace w32-specific code with Gnulib without auditing.
Gnulib doesn't support old versions of Windows which we still do, and
so its replacement break Emacs on those old platforms.

> * lisp/calc/calc-comb.el (math-random-table, math-last-RandSeed)
> (math-random-ptr1, math-random-ptr2, math-random-shift)
> (var-RandSeed, math-random-cache, math-init-random-base)
> (math-random-base, math-random-last)
> (math-random-three-digit-number):
> Now obsolete, as we can assume that ‘random’ is good enough.
> (math-random-digits): Simplify by using ‘random’.

Why do we need to touch Calc, for crying out loud?!

> From 7113ce5ab4a265db7f2870c6614da88d09407604 Mon Sep 17 00:00:00 2001
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> Date: Sun, 16 Oct 2022 16:33:05 -0700
> Subject: [PATCH 06/10] New function make-nonce
> 
> * src/alloc.c (clear_nonce, Fmake_nonce): New functions.
> * src/fns.c: Do not include <sys/random.h>.
> (extract_data_from_object): Simplify by calling get_entropy.
> * src/sysdep.c (get_entropy): New function, taken from
> the old extract_data_from_object.

I don't want this new function in Emacs, with all the code churn and
other strings with which it comes attached.  Please leave our random
functions alone, they do their job just fine!

Bottom line: please don't install any of this, certainly not so close
to cutting a release branch, and hopefully not ever.  There were much
easier and smaller changes proposed for message-id; let's use one of
those, or even leave the original message-id intact, as there's
nothing particularly wrong with it.  We have much more important jobs
to do.

TIA.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58472; Package emacs. (Mon, 17 Oct 2022 08:25:02 GMT) Full text and rfc822 format available.

Message #70 received at 58472 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Stefan Kangas <stefankangas <at> gmail.com>
Cc: matt <at> rfc20.org, eggert <at> cs.ucla.edu, 58472 <at> debbugs.gnu.org
Subject: Re: bug#58472: [PATCH] Make `message-unique-id' less prone to
 collisions
Date: Mon, 17 Oct 2022 11:23:47 +0300

> Cc: 58472 <at> debbugs.gnu.org
> From: Stefan Kangas <stefankangas <at> gmail.com>
> Date: Mon, 17 Oct 2022 08:14:03 +0000
> 
> Paul Eggert <eggert <at> cs.ucla.edu> writes:
> 
> > I've been looking into this and have several patches along these lines.
> > None of them address message-unique-id directly yet (I plan to tackle
> > this soon) but they do address the general problem area. The basic idea
> > is to use a new make-nonce primitive.
> 
> Thanks!  I have read your patchset, which looks good to me.

It doesn't look good to me at all, and I'm against installing any of
that stuff, certainly at this time, hopefully never.

Please stop pushing this issue, as I will not agree to installing
anything that complex.  The only changes I'm willing to consider wrt
this issue are local changes in message.el that affect only the
message-id.  Please drop any wider and more general changes, as I will
not agree to them.

TIA

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58472; Package emacs. (Mon, 17 Oct 2022 08:30:02 GMT) Full text and rfc822 format available.

Message #73 received at 58472 <at> debbugs.gnu.org (full text, mbox):

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: matt <at> rfc20.org, Paul Eggert <eggert <at> cs.ucla.edu>, stefankangas <at> gmail.com,
 58472 <at> debbugs.gnu.org
Subject: Re: bug#58472: [PATCH] Make `message-unique-id' less prone to
 collisions
Date: Mon, 17 Oct 2022 10:29:07 +0200

Eli Zaretskii <eliz <at> gnu.org> writes:

> So I'm objected to any of these wide-sweeping changes for a reason
> that is so minor it IMO shouldn't have been brought up in the first
> place.  I regret I didn't stop this discussion back then, because it
> has now snowballed into a monster.  But better late than never.

I sort of agree with you here, but not totally -- I think a `make-nonce'
function would be useful in general, because this is an area that's
genuinely difficult to get right, and having a function that does this
for you -- correctly -- is good.

But, like you, I'm not sure about the proposed changes otherwise.

And, like I've said before, there's a reason the Message-ID is on the
format it's on now -- it has information that allows users to do work on
it (so changing it will break some use cases), and it's short (which
makes it efficient in many algos), and it's obviously "good enough" --
it's been this way for decades without any problems.

So I'd prefer not to change `message-make-id', but adding a `make-nonce'
function would be nice anyway.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58472; Package emacs. (Mon, 17 Oct 2022 08:36:02 GMT) Full text and rfc822 format available.

Message #76 received at 58472 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: matt <at> rfc20.org, eggert <at> cs.ucla.edu, stefankangas <at> gmail.com,
 58472 <at> debbugs.gnu.org
Subject: Re: bug#58472: [PATCH] Make `message-unique-id' less prone to
 collisions
Date: Mon, 17 Oct 2022 11:34:44 +0300

> From: Lars Ingebrigtsen <larsi <at> gnus.org>
> Cc: Paul Eggert <eggert <at> cs.ucla.edu>,  matt <at> rfc20.org,
>   stefankangas <at> gmail.com,  58472 <at> debbugs.gnu.org
> Date: Mon, 17 Oct 2022 10:29:07 +0200
> 
> I sort of agree with you here, but not totally -- I think a `make-nonce'
> function would be useful in general, because this is an area that's
> genuinely difficult to get right, and having a function that does this
> for you -- correctly -- is good.
> 
> But, like you, I'm not sure about the proposed changes otherwise.
> 
> And, like I've said before, there's a reason the Message-ID is on the
> format it's on now -- it has information that allows users to do work on
> it (so changing it will break some use cases), and it's short (which
> makes it efficient in many algos), and it's obviously "good enough" --
> it's been this way for decades without any problems.

Agreed.

> So I'd prefer not to change `message-make-id', but adding a `make-nonce'
> function would be nice anyway.

If we want a make-nonce function for unrelated reasons, by all means
let's discuss that -- but in a separate thread, and with the reasons
and use cases spelled out.  Doing it as a side effect of what was a
wishlist bug report for a minor feature to begin with doesn't sound
right to me.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58472; Package emacs. (Mon, 17 Oct 2022 09:31:02 GMT) Full text and rfc822 format available.

Message #79 received at 58472 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Kangas <stefankangas <at> gmail.com>
To: Lars Ingebrigtsen <larsi <at> gnus.org>, Eli Zaretskii <eliz <at> gnu.org>
Cc: matt <at> rfc20.org, Paul Eggert <eggert <at> cs.ucla.edu>, 58472 <at> debbugs.gnu.org
Subject: Re: bug#58472: [PATCH] Make `message-unique-id' less prone to
 collisions
Date: Mon, 17 Oct 2022 09:30:22 +0000

Lars Ingebrigtsen <larsi <at> gnus.org> writes:

> And, like I've said before, there's a reason the Message-ID is on the
> format it's on now -- it has information that allows users to do work on
> it (so changing it will break some use cases), and it's short (which
> makes it efficient in many algos), and it's obviously "good enough" --
> it's been this way for decades without any problems.

It still has the privacy issues I've indicated.  Leaking the euid for no
good reason leaves you vulnerable to fingerprinting (or even attack, in
the worst case scenario).

I also don't think the kind of processing you propose on the Message-ID
is useful, as most people end up with euid 1000 these days.  You have
other headers that are more suitable for that.

> So I'd prefer not to change `message-make-id',

If my most recent patch is not acceptable, could you agree with any of
the earlier ones?  We can make it as short as you want to, or even keep
it at 10 characters.  The important part is to not include the euid.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58472; Package emacs. (Mon, 17 Oct 2022 11:23:01 GMT) Full text and rfc822 format available.

Message #82 received at 58472 <at> debbugs.gnu.org (full text, mbox):

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Stefan Kangas <stefankangas <at> gmail.com>
Cc: matt <at> rfc20.org, Eli Zaretskii <eliz <at> gnu.org>,
 Paul Eggert <eggert <at> cs.ucla.edu>, 58472 <at> debbugs.gnu.org
Subject: Re: bug#58472: [PATCH] Make `message-unique-id' less prone to
 collisions
Date: Mon, 17 Oct 2022 13:22:17 +0200

Stefan Kangas <stefankangas <at> gmail.com> writes:

> It still has the privacy issues I've indicated.  Leaking the euid for no
> good reason leaves you vulnerable to fingerprinting (or even attack, in
> the worst case scenario).

I don't think the privacy issues here are compelling.  And it's not for
"no good reason", as previously explained:

> I also don't think the kind of processing you propose on the Message-ID
> is useful, as most people end up with euid 1000 these days.  You have
> other headers that are more suitable for that.

No, matching on References/In-reply-to is the only way to get at that
functionality.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58472; Package emacs. (Mon, 17 Oct 2022 15:42:02 GMT) Full text and rfc822 format available.

Message #85 received at 58472 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Kangas <stefankangas <at> gmail.com>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: matt <at> rfc20.org, Eli Zaretskii <eliz <at> gnu.org>,
 Paul Eggert <eggert <at> cs.ucla.edu>, 58472 <at> debbugs.gnu.org
Subject: Re: bug#58472: [PATCH] Make `message-unique-id' less prone to
 collisions
Date: Mon, 17 Oct 2022 15:40:55 +0000

Lars Ingebrigtsen <larsi <at> gnus.org> writes:

> Stefan Kangas <stefankangas <at> gmail.com> writes:
>
>> It still has the privacy issues I've indicated.  Leaking the euid for no
>> good reason leaves you vulnerable to fingerprinting (or even attack, in
>> the worst case scenario).
>
> I don't think the privacy issues here are compelling.

Do you find it problematic that it's very easy for us to have
collisions?  We will have a 1 in 625 chance for a _partial_ Message-ID
collision every time two users:

1. send an email the same second, and
2. have the same euid (e.g. 1000 on Ubuntu, or 501 or whatever it is on
   macOS, etc.).

Just try this:

    (let ((tim (time-convert nil 'integer))
          (i 0) ids)
      (while t
        (cl-incf i)
        (cl-flet ((time-convert (lambda (_ _) tim)))
          (let ((id (message-unique-id)))
            (if (member id ids)
                (error "oops after %d tries" i)
              (push id ids))))))

We will have a _full_ Message-ID collision if they also:

3. have the same host (e.g. it's misconfigured [a not insignificant
   number of desktops, mind you, so it says "tickle-me" or whatever
   non-hilarious thing we use now], or they are on the same big domain
   like eecs.mit.edu).

Is that really "good enough"?  We could do drastically better here, with
very small means, so I'm not sure why we wouldn't.

> No, matching on References/In-reply-to is the only way to get at that
> functionality.

I still don't know which functionality that is.  Getting an In-reply-to
for your highly unique euid 1000?  What's wrong with just checking if
your email address is in To/Cc?

If this use-case is important, wouldn't it be much better to use a
defcustom that you could at least set yourself to something somewhat
unique to you?  We could just set it to 1000 or whatever by default (so
we're not worse off than today, but also not leaking information by
default), and then users could set that to whatever they like (even to
their euid).

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58472; Package emacs. (Mon, 17 Oct 2022 18:42:02 GMT) Full text and rfc822 format available.

Message #88 received at 58472 <at> debbugs.gnu.org (full text, mbox):

From: Matt Armstrong <matt <at> rfc20.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>, Stefan Kangas <stefankangas <at> gmail.com>
Cc: 58472 <at> debbugs.gnu.org
Subject: Re: bug#58472: [PATCH] Make `message-unique-id' less prone to
 collisions
Date: Mon, 17 Oct 2022 11:40:54 -0700

Paul Eggert <eggert <at> cs.ucla.edu> writes:

> I've been looking into this and have several patches along these lines.
> None of them address message-unique-id directly yet (I plan to tackle
> this soon) but they do address the general problem area. The basic idea
> is to use a new make-nonce primitive.

I like it.

> ---
>  doc/lispref/numbers.texi | 18 +++++++++++++-----
>  1 file changed, 13 insertions(+), 5 deletions(-)
>
> diff --git a/doc/lispref/numbers.texi b/doc/lispref/numbers.texi
> index fdcda328d8..c1a1349d1a 100644
> --- a/doc/lispref/numbers.texi
> +++ b/doc/lispref/numbers.texi
>
> -If @var{limit} is a string, it means to choose a new seed based on the
> -string's contents.
> +If you need a random nonce for cryptographic purposes, @code{(random
> +t)} is typically not the best approach, as it can adversely affect other
> +parts of your program that benefit from reproducible results, and it can
> +leave information about the nonce scattered about Emacs's internal state.

Mention the new `make-nonce'?

With respect to "cryptographic purposes" how about mentioning that
`random' itself is potentially seeded from a cryptographically weak
source and makes no promise to use a PRNG suitable for cryptography?  If
I'm right about those two assertions, I think they are important to
mention.

> diff --git a/doc/lispref/strings.texi b/doc/lispref/strings.texi
> index cf961e9e7c..0f3e0ae213 100644
> --- a/doc/lispref/strings.texi
> +++ b/doc/lispref/strings.texi
> @@ -455,6 +455,18 @@ Creating Strings
>  Remove the final newline, if any, from @var{string}.
>  @end defun
> 
> +@defun make-nonce length &optional function
> +Return a newly created random string of length @var{length}.
> +The string is unibyte, with bytes taken from system entropy,
> +and with each string value equally likely.
> +
> +If @var{function} is given, call it with the newly created string as
> +an argument and return the value that @var{function} returns.
> +When the function exits, overwrite the string's random contents with
> +unspecified bytes, to lessen information leakage in insecure code.
> +The string's contents are therefore valid only during the function call.
> +@end defun

First question I'll have as a reader: what happens if the system has low
entropy?  Does this block?  Signal an error?

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58472; Package emacs. (Mon, 17 Oct 2022 18:48:01 GMT) Full text and rfc822 format available.

Message #91 received at 58472 <at> debbugs.gnu.org (full text, mbox):

From: Matt Armstrong <matt <at> rfc20.org>
To: Stefan Kangas <stefankangas <at> gmail.com>, Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 58472 <at> debbugs.gnu.org
Subject: Re: bug#58472: [PATCH] Make `message-unique-id' less prone to
 collisions
Date: Mon, 17 Oct 2022 11:47:30 -0700

Stefan Kangas <stefankangas <at> gmail.com> writes:

> On Linux, flags 0 and GRND_RANDOM are equivalent (I read the most
> recent kernel code to verify this).  But I have no idea about other
> platforms.

That is interesting, since they are documented quite differently with no
hint that they are the same:

https://www.kernel.org/doc/man-pages/ ->
https://man7.org/linux/man-pages/man2/getrandom.2.html

But, yet:

https://lore.kernel.org/lkml/531cfcd62151916cc7fbade2ecd0311fbafc02a9.1567126741.git.luto <at> kernel.org/

...would not be the first time a developer changed code without updating
the documentation.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58472; Package emacs. (Tue, 18 Oct 2022 01:39:02 GMT) Full text and rfc822 format available.

Message #94 received at 58472 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Matt Armstrong <matt <at> rfc20.org>
Cc: 58472 <at> debbugs.gnu.org, Stefan Kangas <stefankangas <at> gmail.com>
Subject: Re: bug#58472: [PATCH] Make `message-unique-id' less prone to
 collisions
Date: Mon, 17 Oct 2022 18:38:48 -0700

[Message part 1 (text/plain, inline)]

On 10/17/22 11:40, Matt Armstrong wrote:

> I like it.

Eli doesn't, so I'll drop the idea for now. I didn't realize we were 
close to releasing 29.1, and I agree with Eli that adding a make-nonce 
primitive is not something to do close to a release.

> With respect to "cryptographic purposes" how about mentioning that
> `random' itself is potentially seeded from a cryptographically weak
> source and makes no promise to use a PRNG suitable for cryptography?  If
> I'm right about those two assertions, I think they are important to
> mention.

Good point. This can be done in the documentation now: this doesn't hurt 
anything release-relevant, as it's simply documenting what we have. I 
installed the attached.

[0001-Improve-random-doc-re-nonces.patch (text/x-patch, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#58472; Package emacs. (Tue, 18 Oct 2022 14:07:02 GMT) Full text and rfc822 format available.

Message #97 received at 58472 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: matt <at> rfc20.org, stefankangas <at> gmail.com, 58472 <at> debbugs.gnu.org
Subject: Re: bug#58472: [PATCH] Make `message-unique-id' less prone to
 collisions
Date: Tue, 18 Oct 2022 17:05:28 +0300

> Cc: 58472 <at> debbugs.gnu.org, Stefan Kangas <stefankangas <at> gmail.com>
> Date: Mon, 17 Oct 2022 18:38:48 -0700
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> 
> > I like it.
> 
> Eli doesn't, so I'll drop the idea for now. I didn't realize we were 
> close to releasing 29.1, and I agree with Eli that adding a make-nonce 
> primitive is not something to do close to a release.

I actually don't mind adding a new primitive, and Lars says it could
be useful.  A new primitive cannot do any harm; all I'm asking is not
to start using it right away in places where we have solid code that
worked for years.  And the particular feature for which make-nonce was
intended in this case definitely doesn't need it.

However, I would like to have at least a short discussion of the
potential needs for make-nonce, and to have it in a separate thread,
so that we could be all on the same page about its need, use, and
semantics.

Reply sent to Stefan Kangas <stefankangas <at> gmail.com>:
You have taken responsibility. (Fri, 25 Nov 2022 01:27:02 GMT) Full text and rfc822 format available.

Notification sent to Stefan Kangas <stefankangas <at> gmail.com>:
bug acknowledged by developer. (Fri, 25 Nov 2022 01:27:02 GMT) Full text and rfc822 format available.

Message #102 received at 58472-done <at> debbugs.gnu.org (full text, mbox):

From: Stefan Kangas <stefankangas <at> gmail.com>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: matt <at> rfc20.org, Eli Zaretskii <eliz <at> gnu.org>,
 Paul Eggert <eggert <at> cs.ucla.edu>, 58472-done <at> debbugs.gnu.org
Subject: Re: bug#58472: [PATCH] Make `message-unique-id' less prone to
 collisions
Date: Thu, 24 Nov 2022 17:26:31 -0800

I'm guessing that none of this is happening, so I'm closing this bug
report.

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 23 Dec 2022 12:24:21 GMT) Full text and rfc822 format available.

This bug report was last modified 2 years and 276 days ago.

Previous Next

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.

GNU bug report logs - #58472 [PATCH] Make `message-unique-id' less prone to collisions

GNU bug report logs - #58472
[PATCH] Make `message-unique-id' less prone to collisions