GNU bug report logs - #73318
31.0.50; with-native-compilation=aot breaks exec -a emacs


Package: emacs;

Reported by: Spencer Baugh <sbaugh <at> janestreet.com>

Date: Tue, 17 Sep 2024 15:20:01 UTC

Severity: normal

Found in version 31.0.50



Message #56 received at 73318 <at> debbugs.gnu.org (full text, mbox):

From: Spencer Baugh <sbaugh <at> janestreet.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 73318 <at> debbugs.gnu.org, larsi <at> gnus.org, acorallo <at> gnu.org,
 schwab <at> linux-m68k.org, shipmints <at> gmail.com
Subject: Re: bug#73318: 31.0.50; with-native-compilation=aot breaks exec -a emacs
Date: Fri, 04 Oct 2024 09:22:08 -0400

Eli Zaretskii <eliz <at> gnu.org> writes:

>> From: Spencer Baugh <sbaugh <at> janestreet.com>
>> Cc: 73318 <at> debbugs.gnu.org,  larsi <at> gnus.org,  acorallo <at> gnu.org,
>>    shipmints <at> gmail.com, schwab <at> linux-m68k.org
>> Date: Fri, 04 Oct 2024 08:09:59 -0400
>> 
>> Eli Zaretskii <eliz <at> gnu.org> writes:
>> 
>> > A more interesting discussion starts here:
>> >
>> >   https://lists.gnu.org/archive/html/emacs-devel/2019-01/msg00635.html
>> >
>> > That discussion is about finding the pdumper file, but the side effect
>> > of looking for pdumper file is the directory where we think the Emacs
>> > executable file is.  That discussion mentions several issues related
>> > to finding the leading directories of the Emacs executable.
>> 
>> Ah, this is indeed an interesting discussion.  I have to say, I agree
>> with Andreas Schwab when they say that argv[0] is not reliable :)
>
> For some reason that evades me, you take only part of the discussion
> and ignore the rest.
>
> The specific issue with finding the pdumper file more reliably is
> solved in Emacs 30 in the way Andreas suggested, but it has no effect
> on the problem which you describe, with finding the preloaded *.eln
> files.

Ah, interesting.  So Emacs 30 falls back on looking up the pdmp in
PATH_EXEC, a path compiled into the Emacs binary.

Should we perhaps do the same for the native-lisp directory?  If we
can't find it in other ways, look it up relative to a path compiled into
the Emacs binary?  I don't know if that should be PATH_EXEC or some
other path.

That would work on my system.  Then we wouldn't need to use
/proc/self/exe at all.

Might that be the best solution?
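
The proposed lookup order can be sketched roughly as follows. This is an illustration in Python, not Emacs's C code: the function name `find_native_lisp_dir` and both parameters are hypothetical, and `compiled_in_dir` merely stands in for whatever build-time path (PATH_EXEC or another) would be chosen.

```python
import os


def find_native_lisp_dir(argv0_dir, compiled_in_dir):
    """Return the first candidate native-lisp directory that exists.

    argv0_dir: directory derived from argv[0]; may be None or may point
        at a wrapper script's directory instead of the real binary's.
    compiled_in_dir: hypothetical stand-in for a directory fixed at
        build time (the thread leaves open which path this should be).
    """
    candidates = []
    if argv0_dir is not None:
        # Behaviour visible in the error message in this thread:
        # native-lisp is expected at <execdir>/../native-lisp.
        candidates.append(os.path.join(argv0_dir, "..", "native-lisp"))
    # Proposed fallback: a location known when Emacs was built.
    candidates.append(os.path.join(compiled_in_dir, "native-lisp"))

    for candidate in candidates:
        if os.path.isdir(candidate):
            return os.path.normpath(candidate)
    return None
```

With a wrapper directory as `argv0_dir`, the first candidate does not exist, so the function would return the compiled-in location instead of failing.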

> It is true that argv[0] alone is not reliable enough, which is why we
> use other techniques when argv[0] cannot help.  But /proc/self/exe is
> not reliable enough, either.

Yes, I'm persuaded now that if we use /proc/self/exe, we should also use
argv[0].  As Po Lu said, /proc can be unmounted, so we must have some
other mechanism besides just /proc/self/exe.
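
A minimal sketch of treating /proc/self/exe as only one of several sources, again in Python rather than Emacs's C. The helper name `executable_dir` and its `proc_link` parameter are hypothetical, and the sketch deliberately omits the PATH search that Emacs performs for a bare argv[0].

```python
import os


def executable_dir(argv0, proc_link="/proc/self/exe"):
    """Best-effort guess at the directory of the running executable.

    Uses argv0 when it carries a directory part; otherwise tries the
    /proc link. Returns None when neither source helps, so the caller
    can move on to yet another mechanism.
    """
    if os.path.dirname(argv0):
        return os.path.dirname(os.path.abspath(argv0))
    try:
        # os.readlink raises OSError when /proc is not mounted, the
        # link cannot be read, or the file does not exist at all.
        target = os.readlink(proc_link)
    except OSError:
        return None
    # Note: the target is returned as the kernel reports it; symlinks
    # in its leading directories are not resolved here.
    return os.path.dirname(target)
```

The `None` result is the important part: a missing or unreadable /proc becomes one more failed attempt rather than a fatal error.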

>> >   . what if /proc/self/exe is unreadable? AFAIK, on some systems you
>> >     need special privileges to follow its symlink
>> >   . what if /proc/self/exe points to a file name that is a symlink, or
>> >     some of its leading directories are symlinks?
>> >   . what if Emacs is invoked via a script which is in the correct
>> >     installation directory, but the actual binary the script invokes
>> >     is not in the expected location relative to the native-lisp/
>> >     directory where we have the preloaded *.eln files?
>> >
>> > The existing code handles all these cases, and some others.  We could
>> > perhaps _add_ the use of /proc/self/exe to what we have, but we'd need
>> > to be sure that it doesn't break for the above situations.
>
> I'm still waiting for some answers to these.

If we use /proc/self/exe, I'm fine with it being a fallback if all other
mechanisms fail.  That should make these cases still work fine, right?

>> > I also don't understand why your script insists on removing the
>> > leading directories from argv[0] of Emacs.  Is there any problem for
>> > you to modify your script such that the leading directories would
>> > still be present in argv[0]?
>> 
>> We have a generic script, part of our packaging system, which wraps most
>> executables, and does "exec -a executable-name /path/to/executable".
>> The motivation, quoting one of the developers of said script, is:
>> 
>>   For context, I believe the reason why we pass `-a` is to make the prog
>>   more identifiable when users try to find it in the output of
>>   `ps`. That still sounds like the right thing to do in the majority of
>>   the cases.
>> 
>> Since this is a generic script used for everything, it's difficult to
>> modify it just for Emacs.
>
> Why cannot you modify the script for all the commands to include the
> leading directories in executable-name?  That is all that is needed
> for Emacs to find its *.eln files.

See the motivation that I quoted above:

  For context, I believe the reason why we pass `-a` is to make the prog
  more identifiable when users try to find it in the output of
  `ps`. That still sounds like the right thing to do in the majority of
  the cases.

Including the leading directories would make them show up in the output
of "ps", which is uglier.

I realize this might not seem like an important justification, but it
works for every other program we run, and has worked for decades.  And
other distributors might be doing this too, so I think it's reasonable
to make Emacs robust to this by having it fall back to looking up
native-lisp in something like PATH_EXEC.

>> $ sh -c "exec -a emacs /home/sbaugh/prefix/bin/emacs -Q --batch"
>> Error using execdir /home/sbaugh/.dispatch/bin/:
>> emacs: /home/sbaugh/.dispatch/bin/../native-lisp/31.0.50-ef69cec6/preloaded/simple-e50f0a67-a4bb4e10.eln: cannot open shared object file: No such file or directory
>> 
>> > When Emacs does not find its executable file using argv[0], it assumes
>> > that the executable is in PATH_EXEC/../../../../bin/.  Since you are
>> > running an installed Emacs, that should have worked, unless you also
>> > somehow changed the relative path from $prefix/bin to the directory
>> > where the native-lisp/ directory is installed.  Why didn't it work?
>> 
>> Since argv[0] is "emacs", Emacs searched PATH for "emacs" and found it:
>> 
>> $ type emacs
>> emacs is /home/sbaugh/.dispatch/bin/emacs
>> 
>> Unfortunately, that file is not the actual Emacs executable, it's the
>> aforementioned generic wrapper script, which in this case ends with
>> this:
>> 
>> exec -a emacs /j/office/app/emacs/dev/bin/emacs "$@"
>> 
>> So Emacs looks for native-lisp in /home/sbaugh/.dispatch/bin/, but
>> that's wrong.  It should be looking in /j/office/app/emacs/dev/bin/.
>
> If you include the leading directories in the "-a NAME" switch to
> 'exec', the problem will be solved.  Since the script evidently knows
> the exact absolute file name of the program it invokes, there should
> be no problem including that absolute file name in the NAME argument
> of the -a option.
>
>> > Bottom line: I think there are still unclear aspects of what happened
>> > in your case, and using /proc/self/exe to fix that is not as simple as
>> > it might seem, especially since we don't yet understand fully what
>> > failed and why.
>
> Using /proc/self/exe to fix this kind of situation is not trivial,
> due to the issues with /proc/self/exe I mentioned in my previous
> email.  So my suggestion would be first to try to adapt your generic
> script to the Emacs expectations, namely, to make sure argv[0] of
> Emacs includes the leading directories of the real installation tree.
> This is by far the simplest solution, and will not require any changes
> in Emacs.

Sure, see the motivation I quoted above.

But now I think we may not need to use /proc/self/exe at all, and
can just have Emacs fall back on something like PATH_EXEC when it fails
to find the native lisp files.
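
The failure mode from the quoted exchange, where a bare argv[0] is looked up on PATH and resolves to the wrapper script, can be modelled with a short Python sketch. The function name `guess_execdir_from_argv0` is hypothetical and the logic is a simplification of what the thread describes, not Emacs's actual code.

```python
import os
import shutil


def guess_execdir_from_argv0(argv0, search_path):
    """Guess the executable's directory from argv[0] alone.

    A name with a directory part is taken at face value; a bare name
    is searched for on search_path (an os.pathsep-separated string),
    as a shell would do.
    """
    if os.path.dirname(argv0):
        return os.path.dirname(os.path.abspath(argv0))
    found = shutil.which(argv0, path=search_path)
    return os.path.dirname(found) if found else None
```

If the first "emacs" on `search_path` is a wrapper that runs `exec -a emacs` on the real binary, this guess yields the wrapper's directory, which is why the native-lisp lookup relative to it fails.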




This bug report was last modified 248 days ago.


