GNU bug report logs - #75845
mapconcat crashes for unreasonable self-modifications

Previous Next

Package: emacs;

Reported by: Pip Cet <pipcet <at> protonmail.com>

Date: Sun, 26 Jan 2025 02:03:02 UTC

Severity: normal

Full log


View this message in rfc822 format

From: Pip Cet <pipcet <at> protonmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: Mattias EngdegÄrd <mattiase <at> acm.org>, Stefan Monnier <monnier <at> iro.umontreal.ca>, 75845 <at> debbugs.gnu.org
Subject: bug#75845: mapconcat crashes for unreasonable self-modifications
Date: Sun, 26 Jan 2025 09:48:48 +0000
R"Eli Zaretskii" <eliz <at> gnu.org> writes:

>> Date: Sun, 26 Jan 2025 02:37:37 +0000
>> From:  Pip Cet via "Bug reports for GNU Emacs,
>>  the Swiss army knife of text editors" <bug-gnu-emacs <at> gnu.org>
>>
>> Pip Cet <pipcet <at> protonmail.com> writes:
>>
>> > If the mapping function supplied to mapconcat modifies the sequence
>> > being mapped over, Emacs can crash.
>> >
>> > This seems easily fixable, but I wanted to file a bug to make sure we
>> > get it right: our current highly-optimized implementation of mapconcat
>> > will fail in many cases in which a naive implementation of mapconcat (in
>> > Lisp, say) would succeed.  Some of those limitations are documented (the
>> > argument must be a sequence when the function is called), some aren't
>> > (the sequence must not become shorter or longer).
>> >
>> > My suggestion is to make the ordinary no-modification case work as fast
>> > as is possible under the constraint that extraordinary
>> > self-modifications should not result in crashes.  They can and will
>> > result in unpredictable behavior, though.
>> >
>> > These crash:
>> >
>> > (let ((l (make-list 1000 nil))) (mapconcat (lambda (x) (ntake 1 l)) l))
>> >
>> > (let ((s (string ?a 1000))) (mapconcat (lambda (x) (aset s 0 1000) (string x)) s))
>> >
>> > Patch will follow once this has a bug number.
>>
>> This patch does the minimum work necessary; while the previous code
                       ^^^^^^^


I don't think this situation is expected to be common enough to warrant
a more specific error message, TBH, particularly not if it then fails to
apply to all possible cases.

My suggestion is to merge the two error messages to save some space,
apply the patch, and make a mental note to include both
mapconcat/mapcar1 and the error handling situation for further review
when we have time.

The rest of this email mostly consists of notes for that (future)
discussion, except for this, which I've reordered:

>> +      if (i < leni)
>> +	error ("string modified by mapping function");
>
> Why do we need this test?

mapconcat assumed the wrong thing, I think other future callers might,
as well.  It's not obvious that mapcar1's return value can be different
from its first argument, and given the semi-optimized state of this
function I think future work is more likely.

> Does it prevent us from crashing in some cases?

Yes.  Silly programmers like myself (see first suggested patch in
another bug) may attempt to use out-of-bounds values in the array, which
are uninitialized and would cause crashes.

Also, a minor point in performance-sensitive code is that conditional
branches are bad, but not if they call error().  Such branches will be
optimized by GCC to be predicted not to call error in a static way,
independently of the CPU's limited branch prediction cache.

Here's the rest:

>> attempted to return the "correct" length of a list which was modified,
>> crashing in an eassert in mapconcat, the new code errors instead.  As
>> error is marked with ATTRIBUTE_COLD, this should result in better code.
>>
>> The string case is slightly trickier: we don't keep around an SDATA
>> pointer, which is good, but we do remember byte and charpos index
>> values, and those might get out of sync.  So verify we're looking at a
>> CHAR_HEAD when we need to, which is for multibyte strings.
>>
>> mapconcat itself is also fixed not to assert but to use whatever mapcar1
>> returns.
>
> Thanks, adding Stefan and Mattias to the discussion, in the hope that
> they could review the patch and comment.

Thanks!

>>  	  if (! CONSP (tail))
>> -	    return i;
>> +	    error ("list modified by mapping function");
>
> I think this error message is not explicit enough: it doesn't tell
> which list was modified by what function.  mapcar and mapconcat are
> many times invoked by code that has little to do with the level which
> could make sense to the user, so we should try to provide as much
> contextual information as possible.  Perhaps something like
>
>   sequence arg to `mapcar' or `mapconcat' modified by its mapping function arg

I like the idea of using the same message for both, reducing the memory
requirement for the unused error string.  (I may be unaware of a good
error function to use in this case; I ended up not providing additional
arguments because it seems unlikely to me that this problem, if really
accidental, is something the programmer can fix without a Lisp
backtrace, which indicates the situation more clearly).

However, all functions using mapcar1 are affected by the fix, if not the
current bug: mapcan and mapc are as well.  Making a message more
specific is bad if that makes it incorrect, and mapc is the most likely
candidate for modifying its sequence argument.  (I've never seen (mapc
function "string") though, that I can recall).

>> +	  if (STRING_MULTIBYTE (seq) && !CHAR_HEAD_P (SREF (seq, i_byte)))
>> +	    error ("string modified by mapping function");
>
> AFAIU, this doesn't check whether the string was modified, it checks
> for modifications that invalidate the relation between i and i_byte,
> right?  So I would suggest

Not even all of those, really, only if they happen to leave us looking
at the middle of a multibyte sequence.  Again, minimal effort.

> Because if we want to disallow _any_ modifications of the string, we'd
> need to use a different test (e.g., unibyte ASCII strings could also
> be modified by the mapping function).

I don't think we do want that.

However, my minority opinion on the question of read-only objects is
just that, so I'm not a good person to ask if you want to reintroduce
such objects after most of them are removed when scratch/no-purespace is
merged.

> If that replaces the below:

>> @@ -3473,8 +3478,7 @@ DEFUN ("mapconcat", Fmapconcat, Smapconcat, 2, 3, 0,
>>  	  goto concat;
>>  	}
>>      }
>> -  ptrdiff_t nmapped = mapcar1 (leni, args, function, sequence);
>> -  eassert (nmapped == leni);
>> +  leni = mapcar1 (leni, args, function, sequence);
>
> then can we have the test as eassert?

I see no reason to set this trap for the next person who modifies
mapcar1.  If they see the return value is checked by some callers, they
might miss the one caller which crashes.  If they remove the return
value, they might miss that the eassert, too, needs to be removed,
making their job harder.

If we want that eassert (why?), it belongs in mapcar1, not Fmapconcat.

mapconcat in general seems to be semi-optimized to me, to be honest, and
it makes the code very long and (apparently) bug-prone.  Until we have
better structures for growing strings, we could do worse than to call
Fappend on the sequence argument and handle only the (simple) list case.

If there are significant users of the vector case, we could add that,
too, but strings and bool vectors add many lines of unnecessary code.

(Bool vectors are obsolete, I'm not aware of any new users since I
suggested obsoleting and removing them long ago.  IMHO, that remains the
right thing to do, but no new arguments for removing them have appeared,
either.  I'd do the change locally, but can't as bool vectors complicate
alloc.c in so many places (which is why I want to remove them in the
first place) that this would introduce too many merge conflicts.)

Again, it's unclear to me which cases mapconcat has chosen to optimize,
and why.  I'm not proposing to optimize it for other cases at this point
because I don't know what the criterion is, but I think we went
overboard there supporting weird sequences instead of making the common
case fast and reducing all other cases to it.

I understand we can't go with the 7-line Lisp version of it, but maybe a
compromise can be found.

Pip





This bug report was last modified 210 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.