GNU bug report logs - #75845
mapconcat crashes for unreasonable self-modifications

Previous Next

Package: emacs;

Reported by: Pip Cet <pipcet <at> protonmail.com>

Date: Sun, 26 Jan 2025 02:03:02 UTC

Severity: normal

To reply to this bug, email your comments to 75845 AT debbugs.gnu.org.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#75845; Package emacs. (Sun, 26 Jan 2025 02:03:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Pip Cet <pipcet <at> protonmail.com>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Sun, 26 Jan 2025 02:03:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> protonmail.com>
To: bug-gnu-emacs <at> gnu.org
Subject: mapconcat crashes for unreasonable self-modifications
Date: Sun, 26 Jan 2025 02:02:22 +0000
If the mapping function supplied to mapconcat modifies the sequence
being mapped over, Emacs can crash.

This seems easily fixable, but I wanted to file a bug to make sure we
get it right: our current highly-optimized implementation of mapconcat
will fail in many cases in which a naive implementation of mapconcat (in
Lisp, say) would succeed.  Some of those limitations are documented (the
argument must be a sequence when the function is called), some aren't
(the sequence must not become shorter or longer).

My suggestion is to make the ordinary no-modification case work as fast
as is possible under the constraint that extraordinary
self-modifications should not result in crashes.  They can and will
result in unpredictable behavior, though.

These crash:

(let ((l (make-list 1000 nil))) (mapconcat (lambda (x) (ntake 1 l)) l))

(let ((s (string ?a 1000))) (mapconcat (lambda (x) (aset s 0 1000) (string x)) s))

Patch will follow once this has a bug number.





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#75845; Package emacs. (Sun, 26 Jan 2025 02:38:02 GMT) Full text and rfc822 format available.

Message #8 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> protonmail.com>
To: bug-gnu-emacs <at> gnu.org
Subject: Re: bug#75845: mapconcat crashes for unreasonable self-modifications
Date: Sun, 26 Jan 2025 02:37:37 +0000
Pip Cet <pipcet <at> protonmail.com> writes:

> If the mapping function supplied to mapconcat modifies the sequence
> being mapped over, Emacs can crash.
>
> This seems easily fixable, but I wanted to file a bug to make sure we
> get it right: our current highly-optimized implementation of mapconcat
> will fail in many cases in which a naive implementation of mapconcat (in
> Lisp, say) would succeed.  Some of those limitations are documented (the
> argument must be a sequence when the function is called), some aren't
> (the sequence must not become shorter or longer).
>
> My suggestion is to make the ordinary no-modification case work as fast
> as is possible under the constraint that extraordinary
> self-modifications should not result in crashes.  They can and will
> result in unpredictable behavior, though.
>
> These crash:
>
> (let ((l (make-list 1000 nil))) (mapconcat (lambda (x) (ntake 1 l)) l))
>
> (let ((s (string ?a 1000))) (mapconcat (lambda (x) (aset s 0 1000) (string x)) s))
>
> Patch will follow once this has a bug number.

This patch does the minimum work necessary; while the previous code
attempted to return the "correct" length of a list which was modified,
crashing in an eassert in mapconcat, the new code errors instead.  As
error is marked with ATTRIBUTE_COLD, this should result in better code.

The string case is slightly trickier: we don't keep around an SDATA
pointer, which is good, but we do remember byte and charpos index
values, and those might get out of sync.  So verify we're looking at a
CHAR_HEAD when we need to, which is for multibyte strings.

mapconcat itself is also fixed not to assert but to use whatever mapcar1
returns.

From cf861bc042650c968f2702d842ffa4a36f0105e9 Mon Sep 17 00:00:00 2001
From: Pip Cet <pipcet <at> protonmail.com>
Subject: [PATCH] Fix crashes in mapconcat etc. (bug#75845)

This doesn't attempt to catch all cases in which the mapping function
modifies the sequence, only those which would cause crashes.

* src/fns.c (mapcar1): Error if we detect modifications by the mapping
function.
(mapconcat): Don't eassert that 'mapcar1' never returns a
partially-filled vector, even though that is currently correct.
---
 src/fns.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/src/fns.c b/src/fns.c
index 081ed2b9f51..8e8b9584509 100644
--- a/src/fns.c
+++ b/src/fns.c
@@ -3385,7 +3385,7 @@ mapcar1 (EMACS_INT leni, Lisp_Object *vals, Lisp_Object fn, Lisp_Object seq)
       for (ptrdiff_t i = 0; i < leni; i++)
 	{
 	  if (! CONSP (tail))
-	    return i;
+	    error ("list modified by mapping function");
 	  Lisp_Object dummy = calln (fn, XCAR (tail));
 	  if (vals)
 	    vals[i] = dummy;
@@ -3403,16 +3403,21 @@ mapcar1 (EMACS_INT leni, Lisp_Object *vals, Lisp_Object fn, Lisp_Object seq)
     }
   else if (STRINGP (seq))
     {
+      ptrdiff_t i = 0;
       ptrdiff_t i_byte = 0;
 
-      for (ptrdiff_t i = 0; i < leni;)
+      while (i_byte < SBYTES (seq))
 	{
 	  ptrdiff_t i_before = i;
+	  if (STRING_MULTIBYTE (seq) && !CHAR_HEAD_P (SREF (seq, i_byte)))
+	    error ("string modified by mapping function");
 	  int c = fetch_string_char_advance (seq, &i, &i_byte);
 	  Lisp_Object dummy = calln (fn, make_fixnum (c));
 	  if (vals)
 	    vals[i_before] = dummy;
 	}
+      if (i < leni)
+	error ("string modified by mapping function");
     }
   else
     {
@@ -3473,8 +3478,7 @@ DEFUN ("mapconcat", Fmapconcat, Smapconcat, 2, 3, 0,
 	  goto concat;
 	}
     }
-  ptrdiff_t nmapped = mapcar1 (leni, args, function, sequence);
-  eassert (nmapped == leni);
+  leni = mapcar1 (leni, args, function, sequence);
 
  concat: ;
   ptrdiff_t nargs = args_alloc;
-- 
2.47.1





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#75845; Package emacs. (Sun, 26 Jan 2025 06:57:02 GMT) Full text and rfc822 format available.

Message #11 received at 75845 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Pip Cet <pipcet <at> protonmail.com>, Stefan Monnier <monnier <at> iro.umontreal.ca>,
 Mattias Engdegård <mattiase <at> acm.org>
Cc: 75845 <at> debbugs.gnu.org
Subject: Re: bug#75845: mapconcat crashes for unreasonable self-modifications
Date: Sun, 26 Jan 2025 08:56:03 +0200
> Date: Sun, 26 Jan 2025 02:37:37 +0000
> From:  Pip Cet via "Bug reports for GNU Emacs,
>  the Swiss army knife of text editors" <bug-gnu-emacs <at> gnu.org>
> 
> Pip Cet <pipcet <at> protonmail.com> writes:
> 
> > If the mapping function supplied to mapconcat modifies the sequence
> > being mapped over, Emacs can crash.
> >
> > This seems easily fixable, but I wanted to file a bug to make sure we
> > get it right: our current highly-optimized implementation of mapconcat
> > will fail in many cases in which a naive implementation of mapconcat (in
> > Lisp, say) would succeed.  Some of those limitations are documented (the
> > argument must be a sequence when the function is called), some aren't
> > (the sequence must not become shorter or longer).
> >
> > My suggestion is to make the ordinary no-modification case work as fast
> > as is possible under the constraint that extraordinary
> > self-modifications should not result in crashes.  They can and will
> > result in unpredictable behavior, though.
> >
> > These crash:
> >
> > (let ((l (make-list 1000 nil))) (mapconcat (lambda (x) (ntake 1 l)) l))
> >
> > (let ((s (string ?a 1000))) (mapconcat (lambda (x) (aset s 0 1000) (string x)) s))
> >
> > Patch will follow once this has a bug number.
> 
> This patch does the minimum work necessary; while the previous code
> attempted to return the "correct" length of a list which was modified,
> crashing in an eassert in mapconcat, the new code errors instead.  As
> error is marked with ATTRIBUTE_COLD, this should result in better code.
> 
> The string case is slightly trickier: we don't keep around an SDATA
> pointer, which is good, but we do remember byte and charpos index
> values, and those might get out of sync.  So verify we're looking at a
> CHAR_HEAD when we need to, which is for multibyte strings.
> 
> mapconcat itself is also fixed not to assert but to use whatever mapcar1
> returns.

Thanks, adding Stefan and Mattias to the discussion, in the hope that
they could review the patch and comment.

>  	  if (! CONSP (tail))
> -	    return i;
> +	    error ("list modified by mapping function");

I think this error message is not explicit enough: it doesn't tell
which list was modified by what function.  mapcar and mapconcat are
many times invoked by code that has little to do with the level which
could make sense to the user, so we should try to provide as much
contextual information as possible.  Perhaps something like

  sequence arg to `mapcar' or `mapconcat' modified by its mapping function arg

Better suggestions are welcome.

> +	  if (STRING_MULTIBYTE (seq) && !CHAR_HEAD_P (SREF (seq, i_byte)))
> +	    error ("string modified by mapping function");

AFAIU, this doesn't check whether the string was modified, it checks
for modifications that invalidate the relation between i and i_byte,
right?  So I would suggest

  string arg of mapcar or mapconcat clobbered by its mapping function arg

Because if we want to disallow _any_ modifications of the string, we'd
need to use a different test (e.g., unibyte ASCII strings could also
be modified by the mapping function).

> +      if (i < leni)
> +	error ("string modified by mapping function");

Why do we need this test?  Does it prevent us from crashing in some
cases?  If that replaces the below:

> @@ -3473,8 +3478,7 @@ DEFUN ("mapconcat", Fmapconcat, Smapconcat, 2, 3, 0,
>  	  goto concat;
>  	}
>      }
> -  ptrdiff_t nmapped = mapcar1 (leni, args, function, sequence);
> -  eassert (nmapped == leni);
> +  leni = mapcar1 (leni, args, function, sequence);

then can we have the test as eassert?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#75845; Package emacs. (Sun, 26 Jan 2025 09:50:02 GMT) Full text and rfc822 format available.

Message #14 received at 75845 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> protonmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: Mattias Engdegård <mattiase <at> acm.org>,
 Stefan Monnier <monnier <at> iro.umontreal.ca>, 75845 <at> debbugs.gnu.org
Subject: Re: bug#75845: mapconcat crashes for unreasonable self-modifications
Date: Sun, 26 Jan 2025 09:48:48 +0000
R"Eli Zaretskii" <eliz <at> gnu.org> writes:

>> Date: Sun, 26 Jan 2025 02:37:37 +0000
>> From:  Pip Cet via "Bug reports for GNU Emacs,
>>  the Swiss army knife of text editors" <bug-gnu-emacs <at> gnu.org>
>>
>> Pip Cet <pipcet <at> protonmail.com> writes:
>>
>> > If the mapping function supplied to mapconcat modifies the sequence
>> > being mapped over, Emacs can crash.
>> >
>> > This seems easily fixable, but I wanted to file a bug to make sure we
>> > get it right: our current highly-optimized implementation of mapconcat
>> > will fail in many cases in which a naive implementation of mapconcat (in
>> > Lisp, say) would succeed.  Some of those limitations are documented (the
>> > argument must be a sequence when the function is called), some aren't
>> > (the sequence must not become shorter or longer).
>> >
>> > My suggestion is to make the ordinary no-modification case work as fast
>> > as is possible under the constraint that extraordinary
>> > self-modifications should not result in crashes.  They can and will
>> > result in unpredictable behavior, though.
>> >
>> > These crash:
>> >
>> > (let ((l (make-list 1000 nil))) (mapconcat (lambda (x) (ntake 1 l)) l))
>> >
>> > (let ((s (string ?a 1000))) (mapconcat (lambda (x) (aset s 0 1000) (string x)) s))
>> >
>> > Patch will follow once this has a bug number.
>>
>> This patch does the minimum work necessary; while the previous code
                       ^^^^^^^


I don't think this situation is expected to be common enough to warrant
a more specific error message, TBH, particularly not if it then fails to
apply to all possible cases.

My suggestion is to merge the two error messages to save some space,
apply the patch, and make a mental note to include both
mapconcat/mapcar1 and the error handling situation for further review
when we have time.

The rest of this email mostly consists of notes for that (future)
discussion, except for this, which I've reordered:

>> +      if (i < leni)
>> +	error ("string modified by mapping function");
>
> Why do we need this test?

mapconcat assumed the wrong thing, I think other future callers might,
as well.  It's not obvious that mapcar1's return value can be different
from its first argument, and given the semi-optimized state of this
function I think future work is more likely.

> Does it prevent us from crashing in some cases?

Yes.  Silly programmers like myself (see first suggested patch in
another bug) may attempt to use out-of-bounds values in the array, which
are uninitialized and would cause crashes.

Also, a minor point in performance-sensitive code is that conditional
branches are bad, but not if they call error().  Such branches will be
optimized by GCC to be predicted not to call error in a static way,
independently of the CPU's limited branch prediction cache.

Here's the rest:

>> attempted to return the "correct" length of a list which was modified,
>> crashing in an eassert in mapconcat, the new code errors instead.  As
>> error is marked with ATTRIBUTE_COLD, this should result in better code.
>>
>> The string case is slightly trickier: we don't keep around an SDATA
>> pointer, which is good, but we do remember byte and charpos index
>> values, and those might get out of sync.  So verify we're looking at a
>> CHAR_HEAD when we need to, which is for multibyte strings.
>>
>> mapconcat itself is also fixed not to assert but to use whatever mapcar1
>> returns.
>
> Thanks, adding Stefan and Mattias to the discussion, in the hope that
> they could review the patch and comment.

Thanks!

>>  	  if (! CONSP (tail))
>> -	    return i;
>> +	    error ("list modified by mapping function");
>
> I think this error message is not explicit enough: it doesn't tell
> which list was modified by what function.  mapcar and mapconcat are
> many times invoked by code that has little to do with the level which
> could make sense to the user, so we should try to provide as much
> contextual information as possible.  Perhaps something like
>
>   sequence arg to `mapcar' or `mapconcat' modified by its mapping function arg

I like the idea of using the same message for both, reducing the memory
requirement for the unused error string.  (I may be unaware of a good
error function to use in this case; I ended up not providing additional
arguments because it seems unlikely to me that this problem, if really
accidental, is something the programmer can fix without a Lisp
backtrace, which indicates the situation more clearly).

However, all functions using mapcar1 are affected by the fix, if not the
current bug: mapcan and mapc are as well.  Making a message more
specific is bad if that makes it incorrect, and mapc is the most likely
candidate for modifying its sequence argument.  (I've never seen (mapc
function "string") though, that I can recall).

>> +	  if (STRING_MULTIBYTE (seq) && !CHAR_HEAD_P (SREF (seq, i_byte)))
>> +	    error ("string modified by mapping function");
>
> AFAIU, this doesn't check whether the string was modified, it checks
> for modifications that invalidate the relation between i and i_byte,
> right?  So I would suggest

Not even all of those, really, only if they happen to leave us looking
at the middle of a multibyte sequence.  Again, minimal effort.

> Because if we want to disallow _any_ modifications of the string, we'd
> need to use a different test (e.g., unibyte ASCII strings could also
> be modified by the mapping function).

I don't think we do want that.

However, my minority opinion on the question of read-only objects is
just that, so I'm not a good person to ask if you want to reintroduce
such objects after most of them are removed when scratch/no-purespace is
merged.

> If that replaces the below:

>> @@ -3473,8 +3478,7 @@ DEFUN ("mapconcat", Fmapconcat, Smapconcat, 2, 3, 0,
>>  	  goto concat;
>>  	}
>>      }
>> -  ptrdiff_t nmapped = mapcar1 (leni, args, function, sequence);
>> -  eassert (nmapped == leni);
>> +  leni = mapcar1 (leni, args, function, sequence);
>
> then can we have the test as eassert?

I see no reason to set this trap for the next person who modifies
mapcar1.  If they see the return value is checked by some callers, they
might miss the one caller which crashes.  If they remove the return
value, they might miss that the eassert, too, needs to be removed,
making their job harder.

If we want that eassert (why?), it belongs in mapcar1, not Fmapconcat.

mapconcat in general seems to be semi-optimized to me, to be honest, and
it makes the code very long and (apparently) bug-prone.  Until we have
better structures for growing strings, we could do worse than to call
Fappend on the sequence argument and handle only the (simple) list case.

If there are significant users of the vector case, we could add that,
too, but strings and bool vectors add many lines of unnecessary code.

(Bool vectors are obsolete, I'm not aware of any new users since I
suggested obsoleting and removing them long ago.  IMHO, that remains the
right thing to do, but no new arguments for removing them have appeared,
either.  I'd do the change locally, but can't as bool vectors complicate
alloc.c in so many places (which is why I want to remove them in the
first place) that this would introduce too many merge conflicts.)

Again, it's unclear to me which cases mapconcat has chosen to optimize,
and why.  I'm not proposing to optimize it for other cases at this point
because I don't know what the criterion is, but I think we went
overboard there supporting weird sequences instead of making the common
case fast and reducing all other cases to it.

I understand we can't go with the 7-line Lisp version of it, but maybe a
compromise can be found.

Pip





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#75845; Package emacs. (Sun, 26 Jan 2025 10:25:01 GMT) Full text and rfc822 format available.

Message #17 received at 75845 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Pip Cet <pipcet <at> protonmail.com>
Cc: mattiase <at> acm.org, monnier <at> iro.umontreal.ca, 75845 <at> debbugs.gnu.org
Subject: Re: bug#75845: mapconcat crashes for unreasonable self-modifications
Date: Sun, 26 Jan 2025 12:24:23 +0200
> Date: Sun, 26 Jan 2025 09:48:48 +0000
> From: Pip Cet <pipcet <at> protonmail.com>
> Cc: Stefan Monnier <monnier <at> iro.umontreal.ca>, Mattias Engdegård <mattiase <at> acm.org>, 75845 <at> debbugs.gnu.org
> 
> I don't think this situation is expected to be common enough to warrant
> a more specific error message, TBH, particularly not if it then fails to
> apply to all possible cases.

I didn't want the message to be specific.  I wanted it to explain
which function signaled the error, so the bug could be easier to look
for and spot.

> My suggestion is to merge the two error messages to save some space,

I'm okay with using a single error message.  But let's make its text
provide more context than just saying that some list was modified by
some function.

> > Does it prevent us from crashing in some cases?
> 
> Yes.  Silly programmers like myself (see first suggested patch in
> another bug) may attempt to use out-of-bounds values in the array, which
> are uninitialized and would cause crashes.

No, I meant can we crash in the following code.  If the caller then
uses the value somewhere else, without validating it, it's a separate
problem that should be perhaps dealt with at that other place.

> >   sequence arg to `mapcar' or `mapconcat' modified by its mapping function arg
> 
> I like the idea of using the same message for both, reducing the memory
> requirement for the unused error string.

Then how about

  sequence arg to `mapcar' etc. modified by its mapping function arg

> (I may be unaware of a good
> error function to use in this case; I ended up not providing additional
> arguments because it seems unlikely to me that this problem, if really
> accidental, is something the programmer can fix without a Lisp
> backtrace, which indicates the situation more clearly).

The above just provides more context, to make it easier to look for
the likely culprits.  Arguments are not provided, just mentioned.

> >> -  ptrdiff_t nmapped = mapcar1 (leni, args, function, sequence);
> >> -  eassert (nmapped == leni);
> >> +  leni = mapcar1 (leni, args, function, sequence);
> >
> > then can we have the test as eassert?
> 
> I see no reason to set this trap for the next person who modifies
> mapcar1.  If they see the return value is checked by some callers, they
> might miss the one caller which crashes.  If they remove the return
> value, they might miss that the eassert, too, needs to be removed,
> making their job harder.
> 
> If we want that eassert (why?), it belongs in mapcar1, not Fmapconcat.

That's what I meant: to make the test an assertion in mapcar1.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#75845; Package emacs. (Sun, 26 Jan 2025 10:56:02 GMT) Full text and rfc822 format available.

Message #20 received at 75845 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: Mattias Engdegård <mattiase <at> acm.org>,
 Pip Cet <pipcet <at> protonmail.com>, 75845 <at> debbugs.gnu.org
Subject: Re: bug#75845: mapconcat crashes for unreasonable self-modifications
Date: Sun, 26 Jan 2025 05:55:44 -0500
> Thanks, adding Stefan and Mattias to the discussion, in the hope that
> they could review the patch and comment.

The patch LGTM.

>>  	  if (! CONSP (tail))
>> -	    return i;
>> +	    error ("list modified by mapping function");
>
> I think this error message is not explicit enough: it doesn't tell
> which list was modified by what function.  mapcar and mapconcat are
> many times invoked by code that has little to do with the level which
> could make sense to the user, so we should try to provide as much
> contextual information as possible.  Perhaps something like
>
>   sequence arg to `mapcar' or `mapconcat' modified by its mapping function arg

Since this is in the list-part of the code, I'd prefer "list" rather
than "sequence", since it's both shorter and more precise.   But I'd
capitalize that first word.

>   string arg of mapcar or mapconcat clobbered by its mapping function arg

+1 for "clobbered"


        Stefan





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#75845; Package emacs. (Sun, 26 Jan 2025 12:09:02 GMT) Full text and rfc822 format available.

Message #23 received at 75845 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> protonmail.com>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: Mattias Engdegård <mattiase <at> acm.org>,
 Eli Zaretskii <eliz <at> gnu.org>, 75845 <at> debbugs.gnu.org
Subject: Re: bug#75845: mapconcat crashes for unreasonable self-modifications
Date: Sun, 26 Jan 2025 12:07:50 +0000
"Stefan Monnier" <monnier <at> iro.umontreal.ca> writes:

>> Thanks, adding Stefan and Mattias to the discussion, in the hope that
>> they could review the patch and comment.
>
> The patch LGTM.

Thanks.  I'll wait for a decision to be made on which precise error to
throw in this extremely rare situation.

>
>>>  	  if (! CONSP (tail))
>>> -	    return i;
>>> +	    error ("list modified by mapping function");
>>
>> I think this error message is not explicit enough: it doesn't tell

FWIW, I started out by copying

    signal_error ("hash table test modifies table", obj);

since that's the only similar case I could think of right away, and I
thought this might not require additional thought.  Upon reflection, my
current suggestion is to define an error symbol vaguely equivalent to
Java's ConcurrentModificationException, but with a less puzzling name
(also, hopefully, this would happen much less often than that common
exception).

Having a single error symbol and documenting it once, for all cases,
with general advice for what might have gone wrong and how to fix it
(rewrite callbacks to be as pure as possible) doesn't seem harder than
arguing about this specific corner case, but does provide more benefits.

>>   string arg of mapcar or mapconcat clobbered by its mapping function arg

TBH, "string arg of ... mapconcat" sounds like it's referring to the
separator string, to me.

> +1 for "clobbered"

+1 from me, too.  "Concurrent modification exceptions" are puzzling to
programmers partly because the name doesn't make it clear who was at
fault.  "Clobbered" clearly indicates that the modifying code was
careless and is the part that needs to be fixed, and that we were lucky
to detect this was attempted or happened.

PUT_ERROR (Qdata_clobbered_error, "Lisp callback unexpectedly clobbered data");

on the assumption that xsignal2 is the right thing to use here, and that
the reference to the clobbering function would be the first and the data
clobbered would be the second argument.

No objections to reversing the string and arguments if that's clearer,
or any rewording that makes sense to you.

(There's a tiny white lie here: I don't think it's worth it to
distinguish cases in which the clobberer was successful from those in
which an illicit modification was prevented.  Even if it's a single
error for all cases, the combined probability isn't great enough to
warrant two error symbols).

As a tangent, I'd prefer providing format arguments in PUT_ERROR, on the
understanding that expanding them is optional: omitting the format
arguments would yield the current error message, and including them
would provide additional output in situations where the user desires
this.

Pip





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#75845; Package emacs. (Sun, 26 Jan 2025 12:17:01 GMT) Full text and rfc822 format available.

Message #26 received at 75845 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Pip Cet <pipcet <at> protonmail.com>
Cc: mattiase <at> acm.org, monnier <at> iro.umontreal.ca, 75845 <at> debbugs.gnu.org
Subject: Re: bug#75845: mapconcat crashes for unreasonable self-modifications
Date: Sun, 26 Jan 2025 14:16:03 +0200
> Date: Sun, 26 Jan 2025 12:07:50 +0000
> From: Pip Cet <pipcet <at> protonmail.com>
> Cc: Eli Zaretskii <eliz <at> gnu.org>, Mattias Engdegård <mattiase <at> acm.org>, 75845 <at> debbugs.gnu.org
> 
> >>   string arg of mapcar or mapconcat clobbered by its mapping function arg
> 
> TBH, "string arg of ... mapconcat" sounds like it's referring to the
> separator string, to me.

If that is a concern, we could say

  First arg of mapcar etc. clobbered ...




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#75845; Package emacs. (Sun, 26 Jan 2025 15:39:01 GMT) Full text and rfc822 format available.

Message #29 received at 75845 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Pip Cet <pipcet <at> protonmail.com>
Cc: mattiase <at> acm.org, monnier <at> iro.umontreal.ca, 75845 <at> debbugs.gnu.org
Subject: Re: bug#75845: mapconcat crashes for unreasonable self-modifications
Date: Sun, 26 Jan 2025 17:37:45 +0200
> Date: Sun, 26 Jan 2025 12:07:50 +0000
> From: Pip Cet <pipcet <at> protonmail.com>
> Cc: Eli Zaretskii <eliz <at> gnu.org>, Mattias Engdegård <mattiase <at> acm.org>, 75845 <at> debbugs.gnu.org
> 
> "Stefan Monnier" <monnier <at> iro.umontreal.ca> writes:
> 
> >> Thanks, adding Stefan and Mattias to the discussion, in the hope that
> >> they could review the patch and comment.
> >
> > The patch LGTM.
> 
> Thanks.  I'll wait for a decision to be made on which precise error to
> throw in this extremely rare situation.

I'd like also to hear from Mattias, if possible, before we install
this.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#75845; Package emacs. (Sat, 08 Feb 2025 09:28:02 GMT) Full text and rfc822 format available.

Message #32 received at 75845 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: mattiase <at> acm.org
Cc: pipcet <at> protonmail.com, monnier <at> iro.umontreal.ca, 75845 <at> debbugs.gnu.org
Subject: Re: bug#75845: mapconcat crashes for unreasonable self-modifications
Date: Sat, 08 Feb 2025 11:27:40 +0200
Ping!  Mattias, any comments to thiws discussion and to the patch?

> Cc: mattiase <at> acm.org, monnier <at> iro.umontreal.ca, 75845 <at> debbugs.gnu.org
> Date: Sun, 26 Jan 2025 17:37:45 +0200
> From: Eli Zaretskii <eliz <at> gnu.org>
> 
> > Date: Sun, 26 Jan 2025 12:07:50 +0000
> > From: Pip Cet <pipcet <at> protonmail.com>
> > Cc: Eli Zaretskii <eliz <at> gnu.org>, Mattias Engdegård <mattiase <at> acm.org>, 75845 <at> debbugs.gnu.org
> > 
> > "Stefan Monnier" <monnier <at> iro.umontreal.ca> writes:
> > 
> > >> Thanks, adding Stefan and Mattias to the discussion, in the hope that
> > >> they could review the patch and comment.
> > >
> > > The patch LGTM.
> > 
> > Thanks.  I'll wait for a decision to be made on which precise error to
> > throw in this extremely rare situation.
> 
> I'd like also to hear from Mattias, if possible, before we install
> this.
> 
> 
> 
> 




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#75845; Package emacs. (Sat, 08 Feb 2025 11:06:02 GMT) Full text and rfc822 format available.

Message #35 received at 75845 <at> debbugs.gnu.org (full text, mbox):

From: Mattias Engdegård <mattias.engdegard <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: pipcet <at> protonmail.com, monnier <at> iro.umontreal.ca, 75845 <at> debbugs.gnu.org
Subject: Re: bug#75845: mapconcat crashes for unreasonable self-modifications
Date: Sat, 8 Feb 2025 12:04:56 +0100
8 feb. 2025 kl. 10.27 skrev Eli Zaretskii <eliz <at> gnu.org>:

>> I'd like also to hear from Mattias, if possible, before we install
>> this.

Sorry, I'm trying to catch up on my Emacs backlog after a hiatus. Will get back to this shortly.
Thank you for the reminder.





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#75845; Package emacs. (Thu, 20 Feb 2025 18:17:02 GMT) Full text and rfc822 format available.

Message #38 received at 75845 <at> debbugs.gnu.org (full text, mbox):

From: Mattias Engdegård <mattias.engdegard <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: Pip Cet <pipcet <at> protonmail.com>, monnier <at> iro.umontreal.ca,
 75845 <at> debbugs.gnu.org
Subject: Re: bug#75845: mapconcat crashes for unreasonable self-modifications
Date: Thu, 20 Feb 2025 19:16:42 +0100
>>>> Thanks, adding Stefan and Mattias to the discussion, in the hope that
>>>> they could review the patch and comment.

Finally I got some time to look at this, and the patch looks fine, largely.

Now that `mapcar1` always returns its first argument (leni), the return value can be eliminated.

The change in the list loop is cheap and fine.

The new tests and logic in the string loop are more expensive (5-10 % overhead, from a quick test) but given that strings are less common in calls to mapcar etc, it's probably not so bad. More significantly, these checks will go away entirely when we abolish size-changing string mutation -- mentioned elsewhere but there is no need for that issue to be resolved before this bug.

We should probably amend the documentation for mapcar & co with respect to the restrictions on mutating the input sequence. (Common Lisp has basically the same restrictions.) At least the manual would do well to include a short notice about this; we do mention it for `maphash`.

We could just contend ourselves with stating that the input sequence shouldn't be mutated and that's it, but if you really want to spell it out in detail (allowed to change the CARs but not the CDRs etc) then that's fine. Again, strings are messy here because it depends on internal representation, so maybe keep it simple.





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#75845; Package emacs. (Fri, 21 Feb 2025 12:36:03 GMT) Full text and rfc822 format available.

Message #41 received at 75845 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> protonmail.com>
To: Mattias Engdegård <mattias.engdegard <at> gmail.com>
Cc: Eli Zaretskii <eliz <at> gnu.org>, monnier <at> iro.umontreal.ca,
 75845 <at> debbugs.gnu.org
Subject: Re: bug#75845: mapconcat crashes for unreasonable self-modifications
Date: Fri, 21 Feb 2025 12:35:10 +0000
Mattias Engdegård <mattias.engdegard <at> gmail.com> writes:

>>>>> Thanks, adding Stefan and Mattias to the discussion, in the hope that
>>>>> they could review the patch and comment.
>
> Finally I got some time to look at this, and the patch looks fine, largely.

Thanks!  I'll probably apply something like it, but see below: eassert
is now officially used in places where Lisp code can trigger it, so
there's little point fixing eassert crashes in one place while new ones
are being deliberately added in others.

> Now that `mapcar1` always returns its first argument (leni), the return value can be eliminated.

Certainly, but this entire code is so strange that I don't really want
to touch it more than necessary, and this particular redundancy seems
potentially useful to me.

(If we really want to optimize it further, let's get rid of the strange
double loop in mapconcat first, which spreads the arguments then goes
back over the list in reverse order to fill in the separators).

> The change in the list loop is cheap and fine.
>
> The new tests and logic in the string loop are more expensive (5-10 %
> overhead, from a quick test) but given that strings are less common in
> calls to mapcar etc, it's probably not so bad. More significantly,

I maintain that using strings as a sequence argument to the map*
functions is so rare we should just turn them into a list or vector
instead and drop the special-casing code.

> these checks will go away entirely when we abolish size-changing
                                     ^^^^

I still think that's an "if".

> string mutation -- mentioned elsewhere but there is no need for that

I'm still not convinced abolishing string mutation is a good idea, but
if we do, we should not special-case "size-changing" mutations, because
how many bytes we use to encode each character is an implementation
detail that shouldn't be exposed to Lisp.

Indeed, the code becomes a lot simpler if we make all forms of string
mutation allocate new SDATA blocks while keeping the old ones alive, but
that needs GC changes.

> We should probably amend the documentation for mapcar & co with
> respect to the restrictions on mutating the input sequence. (Common
> Lisp has basically the same restrictions.) At least the manual would
> do well to include a short notice about this; we do mention it for
> `maphash`.

With maphash, this actually happens (it does, or did, in the nativecomp
code) by accident.  I'm not sure the same is true for sequences, where
"don't crash" is probably the best we can hope for.

That said, we've since introduced some new code which crashes because
Lisp code misbehaves (608113628c2750b09b925b17c96a29b2dc9abc37, which
easserts that a Lisp function returns an integer in the desired range),
so there appears to be no consensus that Lisp code shouldn't cause
eassert crashes at this point.  My idea remains that one should be able
to experiment with Lisp without running into eassert crashes (or
ordinary non-eassert-mediated ones), but if there's no consensus
supporting that, fixing mapcar while leaving readcharfun broken seems
pointless.

> We could just contend ourselves with stating that the input sequence
> shouldn't be mutated and that's it, but if you really want to spell it

I think that would be best.  If you need fine-grained control, just
don't use mapcar and open-code it instead.

Pip





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#75845; Package emacs. (Fri, 21 Feb 2025 15:12:06 GMT) Full text and rfc822 format available.

Message #44 received at 75845 <at> debbugs.gnu.org (full text, mbox):

From: Mattias Engdegård <mattias.engdegard <at> gmail.com>
To: Pip Cet <pipcet <at> protonmail.com>
Cc: Eli Zaretskii <eliz <at> gnu.org>, monnier <at> iro.umontreal.ca,
 75845 <at> debbugs.gnu.org
Subject: Re: bug#75845: mapconcat crashes for unreasonable self-modifications
Date: Fri, 21 Feb 2025 16:10:45 +0100
21 feb. 2025 kl. 13.35 skrev Pip Cet <pipcet <at> protonmail.com>:

> (If we really want to optimize it further, let's get rid of the strange
> double loop in mapconcat first, which spreads the arguments then goes
> back over the list in reverse order to fill in the separators).

I don't think it's really a cache killer but will give it a look.

> I maintain that using strings as a sequence argument to the map*
> functions is so rare we should just turn them into a list or vector
> instead and drop the special-casing code.

I'll think about it.

> I'm still not convinced abolishing string mutation is a good idea,

Let's try to solve this bug in isolation. We're almost done!

> With maphash, this actually happens (it does, or did, in the nativecomp
> code) by accident.  I'm not sure the same is true for sequences, where
> "don't crash" is probably the best we can hope for.

Lisp shouldn't crash but it's impractical to be absolute about it. The programming interface is so rich that it's almost impossible to prevent crashes altogether, and performance considerations further reduce what checks we want to perform. For instance, the user can easily crash Emacs by running bad bytecode. But users expect more about basic Lisp primitives like mapcar & co and since it seems that a reasonable effort can prevent it, and it's not (too) expensive, let's try.

Feel free to commit your patch, or post it here if you want us to poke fun at it beforehand.





This bug report was last modified 210 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.