GNU bug report logs -
#76517
31.0.50; feature/igc 6ff509af3d31 crash on Wayland KDE, (with -g3
Reported by: Eval Exec <execvy <at> gmail.com>
Date: Mon, 24 Feb 2025 02:28:02 UTC
Severity: normal
Found in version 31.0.50
Done: Pip Cet <pipcet <at> protonmail.com>
Bug is archived. No further changes may be made.
"Eli Zaretskii" <eliz <at> gnu.org> writes:
>> Cc: 76517 <at> debbugs.gnu.org
>> Date: Mon, 24 Feb 2025 15:49:38 +0000
>> From: Pip Cet via "Bug reports for GNU Emacs,
>> the Swiss army knife of text editors" <bug-gnu-emacs <at> gnu.org>
>>
>> "Eval Exec" <execvy <at> gmail.com> writes:
>>
>> > Hello,
>> > I'm helping to test feature/igc branch
>>
>> Thanks for the report!
>>
>> At first glance, the problem doesn't seem to be specific to feature/igc.
>>
>> > #16 0x00000000004933d8 in c_string_width (nbytes=<synthetic pointer>,
>> > nchars=<synthetic pointer>, precision=<optimized out>, len=69,
>> > str=0x7f1ec16860fa "\274\214我中间使用 充电宝给电脑冲了一次电。还行。。。") at
>>
>> This string starts with an incomplete character sequence.
>
> The evaluation of the header-line-format shows the complete string.
The relevant part in the screenshot is "%,我中间使用"
>> As the screenshot at https://imgur.com/a/tON6P7w (why a screenshot?)
>> shows, the last character before that is "%", followed by what looks
>> like ",", a fullwidth comma.
>>
>> It seems the "%" was interpreted as introducing a mode line escape,
>> which used the first byte of the three-byte encoding used for the
>> fullwidth comma. The remaining bytes were then interpreted as the
>> beginning of a multi-byte character, which ended up out of range and
>> accessing an element of the display_table_ chartab which wasn't defined.
>>
>> So I guess our mode line escapes need to be fixed for multibyte
>> characters, and hopefully no further action is necessary (you might also
>> want to consider not making mode line escapes part of your header
>> lines).
>
> I don't see any "%", but are you saying that some UTF-8 byte sequence
Look at the screenshot.
> of a non-ASCII character that is not the character '%' itself could
> have the '%' byte as part of it? I thought that was impossible,
No. I'm saying that display_mode_element scans for a '%', finds it,
takes the next byte, which is the first byte of the fullwidth comma,
passes it to decode_mode_spec, then leaves offset pointing to the second
byte of the multi-byte sequence following the %, and attempts to
continue printing the modeline from that offset, in the middle of a
multi-byte sequence.
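The scan described above can be sketched in isolation. This is only an illustration, not the code in xdisp.c: `resume_offset_after_percent` and `utf8_continuation_p` are hypothetical helpers, and the sketch assumes the fullwidth comma is stored as the three bytes \357\274\214, which matches the \274\214 prefix seen in the backtrace.

```c
#include <stddef.h>

/* Hypothetical stand-in for the scan described above (not xdisp.c).
   Find the first '%' in S, consume exactly one byte after it as the
   spec character, and return the offset where scanning would resume.
   Return -1 if no '%' is followed by another byte.  */
static long
resume_offset_after_percent (const unsigned char *s, size_t len)
{
  for (size_t i = 0; i + 1 < len; i++)
    if (s[i] == '%')
      return (long) (i + 2);	/* '%' plus one spec byte */
  return -1;
}

/* Return nonzero if B is a UTF-8 continuation byte (10xxxxxx),
   i.e. a byte that cannot start a character.  */
static int
utf8_continuation_p (unsigned char b)
{
  return (b & 0xC0) == 0x80;
}
```

For the bytes "%\357\274\214\346\210\221" ('%', the fullwidth comma, then 我), the resume offset is 2, and the byte there is \274, a continuation byte. Scanning therefore resumes in the middle of the comma's encoding.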
The multi-byte sequence decodes to an out-of-range character (in my
case, c = 0xc427df80), and char_table_ref makes no attempt to verify the
character is in range; CHARTAB_IDX doesn't either, so this code:
  #define CHARTAB_IDX(c, depth, min_char) \
    (((c) - (min_char)) >> chartab_bits[(depth)])

      {
        val = tbl->contents[CHARTAB_IDX (c, 0, 0)];
        if (SUB_CHAR_TABLE_P (val))
          val = sub_char_table_ref (val, c, UNIPROP_TABLE_P (table));
      }
just accesses random memory that isn't anywhere near the char table's
actual contents.
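To get a feel for how far out of bounds that access lands, here is a rough sketch of the top-level index computation. The constants are assumptions about the chartab layout (a shift of 16 at depth 0, 64 top-level entries, maximum character 0x3FFFFF), and `top_level_index` and `char_in_range_p` are hypothetical names. The sketch uses unsigned arithmetic to keep the C well-defined; in Emacs the character is a signed int, so the real index would come out negative rather than large.

```c
#include <stdint.h>

/* Assumed layout constants; illustrative, not copied from chartab.c.  */
enum { SKETCH_TOP_SHIFT = 16, SKETCH_TOP_ENTRIES = 64 };
#define SKETCH_MAX_CHAR 0x3FFFFFu

/* Index into the top-level table, computed as CHARTAB_IDX (c, 0, 0)
   would compute it under the assumed shift.  */
static uint32_t
top_level_index (uint32_t c)
{
  return c >> SKETCH_TOP_SHIFT;
}

/* The range check that the lookup path described above does not do.  */
static int
char_in_range_p (uint32_t c)
{
  return c <= SKETCH_MAX_CHAR;
}
```

Under these assumptions the largest valid character maps to index 63, the last of the 64 slots, while c = 0xc427df80 maps to 0xc427, far past the end of the table.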
> guaranteed by the way UTF-8 sequences are produced. AFAIK, ASCII
> bytes can only happen as themselves in UTF-8 encoding. So when we see
> '%', it cannot be anything but the ASCII character '%'.
It's the next character that matters, the fullwidth comma after the '%'.
Something like this should help:
diff --git a/src/xdisp.c b/src/xdisp.c
index 577d5b1b401..4ee47eea818 100644
--- a/src/xdisp.c
+++ b/src/xdisp.c
@@ -27933,6 +27933,12 @@ display_mode_element (struct it *it, int depth, int field_width, int precision,
             while ((c = SREF (elt, offset++)) >= '0' && c <= '9')
               field = field * 10 + c - '0';
+            if (c > 127)
+              {
+                offset--;
+                continue;
+              }
+
             /* Don't pad beyond the total padding allowed.  */
             if (field_width - n > 0 && field > field_width - n)
               field = field_width - n;
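
The effect of that guard can be sketched outside of Emacs. This is a simplified model, not the patched function: `resume_offset_guarded` is a hypothetical helper that skips an optional decimal field width after '%' and, when the byte that follows is non-ASCII, steps back so that the byte is not consumed as a spec character.

```c
#include <stddef.h>

/* Hypothetical model of the guarded scan (not the patched xdisp.c).
   Find the first '%' in S, skip any decimal digits after it, and
   return the offset where scanning resumes.  If the byte after the
   digits is non-ASCII, it is left unconsumed.  Return -1 if no '%'
   is followed by another byte.  */
static long
resume_offset_guarded (const unsigned char *s, size_t len)
{
  for (size_t i = 0; i + 1 < len; i++)
    if (s[i] == '%')
      {
	size_t offset = i + 1;
	unsigned char c = s[offset++];

	/* Skip the field width, staying inside the buffer.  */
	while (c >= '0' && c <= '9' && offset < len)
	  c = s[offset++];

	if (c > 127)
	  offset--;		/* leave the lead byte for normal output */
	return (long) offset;
      }
  return -1;
}
```

With the bytes "%\357\274\214" the function returns 1, the offset of the comma's lead byte \357, so the character stays intact. For an ASCII spec such as "%12b" it returns 4, just past the spec character.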
Pip