GNU bug report logs - #34655
26.1.92; Segfault in module with --module-assertions

Package: emacs

Reported by: "Basil L. Contovounesios" <contovob <at> tcd.ie>

Date: Mon, 25 Feb 2019 21:02:01 UTC

Severity: normal

Merged with 31238

Found in version 26.1.92

Done: Eli Zaretskii <eliz <at> gnu.org>

Bug is archived. No further changes may be made.

From: "Basil L. Contovounesios" <contovob <at> tcd.ie>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 34655 <at> debbugs.gnu.org, p.stephani2 <at> gmail.com
Subject: bug#34655: 26.1.92; Segfault in module with --module-assertions
Date: Sun, 17 Mar 2019 23:52:55 +0000

Eli Zaretskii <eliz <at> gnu.org> writes:

>> From: "Basil L. Contovounesios" <contovob <at> tcd.ie>
>> Cc: <34655 <at> debbugs.gnu.org>, Philipp Stephani <p.stephani2 <at> gmail.com>
>> Date: Sun, 17 Mar 2019 16:38:58 +0000
>> 
>> These reveal that value_to_lisp eventually returns a corrupted string,
>> but I don't know why.
>
> Did you try to identify the code which causes the corruption, i.e. the
> data is valid before that code runs, and invalid after that?  If not,
> can you try?  The way to do that is by painstakingly stepping through
> the code while examining the relevant data, possibly with help of
> watchpoints and displays set up by the GDB "display" command.

The patch adding assertions to emacs-module.c narrows the problematic
code to lines 123--127 of the dynamic module[1]:

      if (rp_lisp_string (env, &file, nbuf)
          && rp_funcall (env, &dir, "directory-name-p", 1, &dir)
          && env->is_not_nil (env, dir))
        /* Return directory name when given one à la Ffile_truename.  */
        rp_funcall (env, &file, "file-name-as-directory", 1, &file);

[1]: https://gitlab.com/basil-conto/realpath/blob/master/realpath.c#L123-127

On line 123, 'file' is set to an Emacs string created from the C string
'nbuf' ('rp_lisp_string' wraps 'module_make_string' along with a
nonlocal exit check, and similarly 'rp_funcall' wraps 'module_funcall').
On line 127, 'file' is passed to 'file-name-as-directory'.
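
For readers without the module source at hand, the two wrappers
described above might look roughly like the following.  This is only a
sketch, not the actual realpath.c code: the bodies of 'rp_lisp_string'
and 'rp_funcall' are guessed from the call sites quoted above, 'rp_ok'
is a hypothetical helper, and the emacs_env declaration is a cut-down
stand-in for <emacs-module.h> so that the example compiles without
Emacs.  The fake environment at the end exists only to exercise the
wrappers outside Emacs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Stand-in declarations so that this sketch compiles without Emacs.
   A real module would #include <emacs-module.h> instead; only the
   members used below are declared here, under their real names.  */
typedef struct emacs_value_tag *emacs_value;
enum emacs_funcall_exit
{
  emacs_funcall_exit_return = 0,
  emacs_funcall_exit_signal = 1,
  emacs_funcall_exit_throw = 2
};
typedef struct emacs_env emacs_env;
struct emacs_env
{
  enum emacs_funcall_exit (*non_local_exit_check) (emacs_env *);
  emacs_value (*intern) (emacs_env *, const char *);
  emacs_value (*funcall) (emacs_env *, emacs_value, ptrdiff_t,
                          emacs_value *);
  emacs_value (*make_string) (emacs_env *, const char *, ptrdiff_t);
};

/* Hypothetical helper: true if no signal or throw is pending.  */
static bool
rp_ok (emacs_env *env)
{
  return env->non_local_exit_check (env) == emacs_funcall_exit_return;
}

/* Guess at rp_lisp_string: make a Lisp string from the C string STR,
   store it in *OUT, and report whether that succeeded.  */
static bool
rp_lisp_string (emacs_env *env, emacs_value *out, const char *str)
{
  assert (str != NULL);
  *out = env->make_string (env, str, (ptrdiff_t) strlen (str));
  return rp_ok (env);
}

/* Guess at rp_funcall: call the Lisp function named NAME on ARGS,
   store the result in *OUT, and report whether that succeeded.
   ARGS is read before *OUT is written, so the two may alias.  */
static bool
rp_funcall (emacs_env *env, emacs_value *out, const char *name,
            ptrdiff_t nargs, emacs_value *args)
{
  emacs_value fn = env->intern (env, name);
  if (!rp_ok (env))
    return false;
  *out = env->funcall (env, fn, nargs, args);
  return rp_ok (env);
}

/* Fake environment, for exercising the wrappers outside Emacs only.
   Setting fake_exit simulates a pending nonlocal exit.  */
static enum emacs_funcall_exit fake_exit = emacs_funcall_exit_return;
static struct emacs_value_tag { int unused; } fake_object;

static enum emacs_funcall_exit
fake_check (emacs_env *env)
{
  (void) env;
  return fake_exit;
}

static emacs_value
fake_intern (emacs_env *env, const char *name)
{
  (void) env;
  (void) name;
  return &fake_object;
}

static emacs_value
fake_funcall (emacs_env *env, emacs_value fn, ptrdiff_t nargs,
              emacs_value *args)
{
  (void) env;
  (void) fn;
  return nargs > 0 ? args[0] : &fake_object;
}

static emacs_value
fake_make_string (emacs_env *env, const char *str, ptrdiff_t len)
{
  (void) env;
  (void) str;
  (void) len;
  return &fake_object;
}

static emacs_env fake_env
  = { fake_check, fake_intern, fake_funcall, fake_make_string };
```

With wrappers of this shape, each step of the '&&' chain quoted above
is skipped as soon as an earlier call leaves a nonlocal exit pending.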

The assertions added to 'module_make_string' and 'lisp_to_value' never
fail, suggesting the string returned by them is fine (though the
assertions in 'lisp_to_value' only target intermediate Lisp_Objects, not
the returned emacs_value).

The assertion added to 'value_to_lisp' via 'module_funcall', OTOH, does
fail.  I'll see if I can step through this, though I'm not yet sure how
I'll distinguish the problematic call to the module function from the
hundreds of unproblematic ones before it.  There's probably a way to
teach GDB how to inspect emacs_values which I'm not yet familiar with.

>> I've seen comments in src/fileio.c referring to string-relocation
>> during GC; could this be at play here?
>
> It could be, if your module code holds onto C pointers to Lisp string
> data while Emacs runs parts of the interpreter which could GC.  Does
> that happen anywhere in your code or in the code involved in
> module-assertions?

I can't speak for emacs-module.c (I haven't yet understood how
Vmodule_environments and its save pointers work), but the only exchange
between C and Lisp strings in my code is via the module API,
i.e. module_make_string and module_copy_string_contents.  I would hope
the API and its opaque emacs_value type make it difficult for such
issues to arise.
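
To illustrate what going through the API looks like on the Lisp-to-C
side, here is a sketch of the usual two-call pattern for
copy_string_contents: ask for the required size first, then copy into
a buffer the module owns.  The helper name 'rp_c_string' is
hypothetical and not taken from realpath.c, and the emacs_env
declaration is again a cut-down stand-in for <emacs-module.h>, with a
fake environment added only so the example runs outside Emacs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in declarations so that this sketch compiles without Emacs.
   A real module would #include <emacs-module.h> instead.  */
typedef struct emacs_value_tag *emacs_value;
typedef struct emacs_env emacs_env;
struct emacs_env
{
  bool (*copy_string_contents) (emacs_env *, emacs_value, char *,
                                ptrdiff_t *);
};

/* Hypothetical helper: return a malloc'd, NUL-terminated copy of the
   Lisp string VALUE, or NULL on failure.  The caller owns the copy,
   so no C pointer into Lisp string data is ever held.  */
static char *
rp_c_string (emacs_env *env, emacs_value value)
{
  ptrdiff_t len = 0;
  /* With a NULL buffer, the call only stores the required size
     (including the terminating NUL) in LEN.  */
  if (!env->copy_string_contents (env, value, NULL, &len) || len <= 0)
    return NULL;
  char *buf = malloc ((size_t) len);
  if (buf == NULL)
    return NULL;
  if (!env->copy_string_contents (env, value, buf, &len))
    {
      free (buf);
      return NULL;
    }
  return buf;
}

/* Fake environment, for exercising the helper outside Emacs only.
   Its "Lisp strings" are plain C strings.  */
struct emacs_value_tag { const char *text; };
static struct emacs_value_tag fake_string = { "/tmp/" };

static bool
fake_copy (emacs_env *env, emacs_value value, char *buf, ptrdiff_t *len)
{
  (void) env;
  assert (value != NULL);
  ptrdiff_t need = (ptrdiff_t) strlen (value->text) + 1;
  if (buf != NULL)
    {
      if (*len < need)
        return false;
      memcpy (buf, value->text, (size_t) need);
    }
  *len = need;
  return true;
}

static emacs_env fake_env = { fake_copy };
```

Because the result is a private copy, it stays valid whatever Emacs
later does with the original Lisp string.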

>> Either way, do you have any suggestions on how to proceed?
>
> See above.
>
> I tried at the time to reproduce your problem, and failed.  But I did
> that on Windows, where I needed to replace the non-existent realpath
> by an equivalent function, so it's not a faithful reproduction.  I
> will see if I can find time to look at this on a GNU machine, unless
> someone beats me to it.

Replacing 'canonicalize_file_name' with 'strdup' still reproduces the
issue for me.  Perhaps increasing the number of calls to
realpath-truename from 1000 to 5000 will also help.

>> 8. bt full
>> 9. f 2
>> 10. p p
>> 11. pr [#<INVALID_LISP_OBJECT 0x03059c90>]
>> 12. xpr
>
> Why did you expect 'p' to be a valid Lisp object?  It's actually a
> pointer to a Lisp object, i.e. try
>
>   (gdb) p *p
>   (gdb) xpr

Oops, that was a thinko.  The only difference, though, is that GDB now
reports XIL(...) instead of (Lisp_Object *).

Thank you for your help, I'll report more as time allows.

-- 
Basil




This bug report was last modified 6 years and 61 days ago.

