GNU bug report logs -
#70988
(read FUNCTION) uses Latin-1 [PATCH]
Message #44 received at 70988 <at> debbugs.gnu.org (full text, mbox):
"Eli Zaretskii" <eliz <at> gnu.org> writes:
>> Date: Wed, 12 Feb 2025 20:27:58 +0000
>> From: Pip Cet <pipcet <at> protonmail.com>
>> Cc: stefankangas <at> gmail.com, mattias.engdegard <at> gmail.com, 70988 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca
>>
>> "Eli Zaretskii" <eliz <at> gnu.org> writes:
>>
>> >> --- a/src/lread.c
>> >> +++ b/src/lread.c
>> >> @@ -398,9 +398,12 @@ readchar (Lisp_Object readcharfun, bool *multibyte)
>> >>
>> >>    tem = call0 (readcharfun);
>> >>
>> >> -  if (NILP (tem))
>> >> +  if (!CHARACTERP (tem))
>> >>      return -1;
>> >> -  return XFIXNUM (tem);
>> >> +  if (multibyte && !ASCII_CHAR_P (XFIXNAT (tem)))
>> >> +    *multibyte = true;
>> >> +
>> >> +  return XFIXNAT (tem);
>> >
>> > AFAIU, the proposed patch was just a bugfix, whereas the above also
>> > changes behavior in backward-incompatible ways.
>>
>> The other way around, I think: the first proposed patch changed the
>> behavior of readchar to always set the multibyte flag when a function
>> was used, resulting in the creation of symbols whose ASCII names are
>> multibyte strings. The previous behavior was never to set the multibyte
>> flag, which was correct for ASCII strings but not multibyte ones.
>>
>> This patch retains the previous behavior for ASCII symbols, but sets the
>> multibyte flag for non-ASCII symbols, which seems the best we can do if
>> we're given a simple function.
>
> I'm talking about the CHARACTERP test (why not FIXNUMP?), and the

The function is supposed to return a character, not just any fixnum.

> addition of ASCII_CHAR_P test (why would we want an ASCII character
> to never be considered multibyte?).

It's the other way around, again: if there's a non-ASCII character, we
treat the stream as multibyte; if there are ONLY ASCII characters, we
treat it as unibyte.
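
To make the rule concrete, here is a minimal standalone C sketch of the patched branch. It is not Emacs code: the `sketch_*` names are hypothetical stand-ins for CHARACTERP and ASCII_CHAR_P, the reader function's return value is modeled as a plain `long` rather than a Lisp_Object, and the character range 0..0x3FFFFF is assumed to match Emacs's MAX_CHAR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the Emacs macros.  Assumption: characters
   are integers in 0..0x3FFFFF and ASCII is 0..0x7F.  */
enum { SKETCH_MAX_CHAR = 0x3FFFFF };

static bool
sketch_characterp (long c)
{
  return 0 <= c && c <= SKETCH_MAX_CHAR;
}

static bool
sketch_ascii_char_p (long c)
{
  return 0 <= c && c < 0x80;
}

/* Model of the patched branch: VALUE is what the reader function
   returned.  Return -1 for anything that is not a character; otherwise
   return the character, setting *MULTIBYTE only when it is non-ASCII.
   MULTIBYTE may be NULL, as in the patch.  */
static long
sketch_readchar (long value, bool *multibyte)
{
  if (!sketch_characterp (value))
    return -1;
  if (multibyte && !sketch_ascii_char_p (value))
    *multibyte = true;
  return value;
}

/* Feed N values through sketch_readchar and report whether the stream
   ended up flagged multibyte.  The flag is never cleared once set.  */
static bool
sketch_stream_is_multibyte (const long *values, size_t n)
{
  assert (values != NULL || n == 0);

  bool multibyte = false;
  for (size_t i = 0; i < n; i++)
    if (sketch_readchar (values[i], &multibyte) < 0)
      break;
  return multibyte;
}
```

Under these assumptions, a stream of 'f', 'o', 'o' leaves the flag unset (unibyte), while 'f', 0xE9, 'o' sets it at the second character and it stays set. Out-of-range values such as -5 or 0x400000 are reported as -1 instead of being passed on to callers.
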
>> If we want to change symbol names to always be multibyte strings, we can
>> do that, but then we probably want to do that for all streams.
>
> I don't understand why you are talking about symbols: AFAIU this code
> is used in many other cases as well. But even for symbols: why change
> the current behavior of making their names multibyte?

The current behavior is to make their names unibyte! The current
behavior is *changed* by the first patch, and *retained* by my patch.

>> It also fixes yet another XFIXNUM crash, but those (there are more in
>> lread.c, it seems) should be fixed independently.
>
> I'm okay with adding a FIXNUMP test (which happens in the debugging
> builds anyway, so any violations probably never happen), but using
> CHARACTERP changes behavior.

If you count "avoids further crashes" as "changes behavior", yes.
readcharfun is supposed to return a character or -1. Some callers
assume the return value is a valid character, and will crash otherwise.
I haven't checked all of them because there are many.

>> However, it does give us the ability to extend the API so
>> readcharfun could return a single character string, unibyte or
>> multibyte, to be handled appropriately.
>
> This is also a change in behavior.

Yes, of course, which is why it's a separate proposal and not part of
the patch.

Pip
This bug report was last modified 10 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.