GNU bug report logs - #78340
[PATCH] New option for comp *.eln file name by the file timestamp of *.el

Previous Next

Package: emacs;

Reported by: Lin Sun <sunlin7 <at> hotmail.com>

Date: Sat, 10 May 2025 00:14:02 UTC

Severity: normal

Tags: patch

Done: Eli Zaretskii <eliz <at> gnu.org>

Full log


Message #53 received at 78340 <at> debbugs.gnu.org (full text, mbox):

From: Lynn Winebarger <owinebar <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: sunlin7 <at> hotmail.com, acorallo <at> gnu.org, 78340 <at> debbugs.gnu.org
Subject: Re: bug#78340: [PATCH] New option for comp *.eln file name by the
 file timestamp of *.el
Date: Mon, 12 May 2025 11:37:13 -0400
[Message part 1 (text/plain, inline)]
On Mon, May 12, 2025, 10:24 AM Eli Zaretskii <eliz <at> gnu.org> wrote:

> > From: Lynn Winebarger <owinebar <at> gmail.com>
> > Date: Mon, 12 May 2025 10:05:36 -0400
> > Cc: acorallo <at> gnu.org, sunlin7 <at> hotmail.com, 78340 <at> debbugs.gnu.org
> >
> > On Mon, May 12, 2025 at 9:53 AM Eli Zaretskii <eliz <at> gnu.org> wrote:
> > >
> > > > From: Lynn Winebarger <owinebar <at> gmail.com>
> > > > Date: Mon, 12 May 2025 09:42:06 -0400
> > > > Cc: acorallo <at> gnu.org, sunlin7 <at> hotmail.com, 78340 <at> debbugs.gnu.org
> > > >
> > > > I don't really follow the logic on why the content hash of the
> current
> > > > source is required to keep emacs from *crashing*, as opposed to just
> > > > being inconsistent with the current source.
> > >
> > > How else can you make sure the native code is safe to run?  E.g., what
> > > if the code calls primitives that don't exist in Emacs, or their
> > > signature is different?
> >
> > That's what the ABI hash does.  The content hash doesn't prove
> > anything about that.
>
> No, the ABI hash is the evidence for the Emacs binary.  We need
> evidence for the .eln file's code


So the idea is that the content hash is supposed to certify that the ELN
was derived from some valid lisp code, and the existence of that source in
the path is the trusted authority for valid lisp code?
Looking in src/comp.c,
------------------
  /* We create eln filenames with an hash in order to look-up these
     starting from the source filename, IOW have a relation

     /absolute/path/filename.el + content ->
     eln-cache/filename-path_hash-content_hash.eln.

     'dlopen' can return the same handle if two shared with the same
     filename are loaded in two different times (even if the first was
     deleted!).  To prevent this scenario the source file content is
     included in the hashing algorithm.

     As at any point in time no more then one file can exist with the
     same filename, should be possible to clean up all
     filename-path_hash-* except the most recent one (or the new one
     being recompiled).

     As installing .eln files compiled during the build changes their
     absolute path we need an hashing mechanism that is not sensitive
     to that.  For this we replace if match PATH_DUMPLOADSEARCH or
     *PATH_REL_LOADSEARCH with '//' before computing the hash.  */
------------------------

There's nothing there saying the check is to make sure that the code in the
.eln is safe, only that the eln filename is unique when a library is
recompiled with different source.  There's nothing that implies that it
would be *unsafe* to load a library generated from a version with a
different content hash than the one that happens to exist on disk at the
moment.  A compile-time time stamp would probably be adequate for the
stated purpose of ensuring a unqiue file name.
I don't see anything in comp.c storing the content hash in the .eln file
for verification purposes.  It looks like if I modify an elisp library,
then copy the existing .eln to have a filename with the content hash from
the new version, but without any modification of the eln contents, it
should load the same way the older byte-compiled version would.  Are you
(or Andrea) saying that would risk a crash?
Am I missing something?


> > > > So, instead of checking file times, why not just put the content hash
> > > > in the .elc, and if loading the elc would be acceptable (determined
> by
> > > > file times), then use the value from the elc file (stored in the
> > > > comment block at the start so the Fread procedure is not required).
> > >
> > > I don't see how the time stamp of the .elc file is more reliable than
> > > of the .el file.
> >
> > If the byte-compiler notes in its output (by this comment tag) "this
> > elc was compiled from a source file with content hash X", then why
> > would X not be satisfactory for determining if the .eln was generated
> > from valid elisp code?  If someone can write to the elc file, they can
> > probably write to the .eln file as well and change the embedded
> > content hash to whatever they want if that's the concern.
>
> You've changed the subject: I was talking about the time stamp.
>
> As for your proposal, I'm not sure it is better, let alone
> significantly better.  Emacs currently looks for the .eln without
> reading the .elc file, only because the .elc file was found.  You
> suggest reading the .elc file, which will probably be slow on Windows
> for the same reason computing the content hash is slow.  (We currently
> don't open the .elc file on Windows because of this slowness, we just
> ask the filesystem whether the file exists.)
>
> So it is not clear to me that this complication (which will as a side
> effect make the .elc files incompatible with previous Emacs versions)
> is justified.  Someone will have to show a significant speedup to make
> it worth considering.
>


I didn't suggest reading the whole file, only the leading comment block. If
the performance impact is from reading a single block of a file, the you're
right.  I thought the issue was having to do a computation linear in the
size of the file.

For your parenthetical, how would adding a line like
; content-hash: ...
after the existing comment block make the file incompatible with older
versions?  load just processes the file through "read" anyway, and that
wouldn't be affected by a comment line.  Unless I've
seriously misread src/lread.c.

If the only purpose of the content hash is to certify that some valid elisp
source (with respect to the running emacs) is known to exist, then emacs
should be able to keep a table in memory loaded at startup and just consult
that for verification.  That should address any concern about crashing, at
least, unless I have completely misunderstood the concern about (which is
very possible).

Lynn
[Message part 2 (text/html, inline)]

This bug report was last modified 24 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.