#70988 - (read FUNCTION) uses Latin-1 [PATCH]

GNU bug report logs - #70988
(read FUNCTION) uses Latin-1 [PATCH]

Package: emacs;

Reported by: Mattias Engdegård <mattias.engdegard <at> gmail.com>

Date: Thu, 16 May 2024 18:14:01 UTC

Severity: normal

Tags: patch

Done: Mattias Engdegård <mattias.engdegard <at> gmail.com>

Bug is archived. No further changes may be made.


From: Eli Zaretskii <eliz <at> gnu.org>
To: Pip Cet <pipcet <at> protonmail.com>
Cc: 70988 <at> debbugs.gnu.org, mattias.engdegard <at> gmail.com, stefankangas <at> gmail.com, monnier <at> iro.umontreal.ca
Subject: bug#70988: (read FUNCTION) uses Latin-1 [PATCH]
Date: Thu, 13 Feb 2025 08:00:57 +0200

> Date: Wed, 12 Feb 2025 20:27:58 +0000
> From: Pip Cet <pipcet <at> protonmail.com>
> Cc: stefankangas <at> gmail.com, mattias.engdegard <at> gmail.com,
>  70988 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca
>
> "Eli Zaretskii" <eliz <at> gnu.org> writes:
>
> >> --- a/src/lread.c
> >> +++ b/src/lread.c
> >> @@ -398,9 +398,12 @@ readchar (Lisp_Object readcharfun, bool *multibyte)
> >>
> >>    tem = call0 (readcharfun);
> >>
> >> -  if (NILP (tem))
> >> +  if (!CHARACTERP (tem))
> >>      return -1;
> >> -  return XFIXNUM (tem);
> >> +  if (multibyte && !ASCII_CHAR_P (XFIXNAT (tem)))
> >> +    *multibyte = true;
> >> +
> >> +  return XFIXNAT (tem);
> >
> > AFAIU, the proposed patch was just a bugfix, whereas the above also
> > changes behavior in backward-incompatible ways.
>
> The other way around, I think: the first proposed patch changed the
> behavior of readchar to always set the multibyte flag when a function
> was used, resulting in the creation of symbols whose ASCII names are
> multibyte strings.  The previous behavior was never to set the multibyte
> flag, which was correct for ASCII strings but not multibyte ones.
>
> This patch retains the previous behavior for ASCII symbols, but sets the
> multibyte flag for non-ASCII symbols, which seems the best we can do if
> we're given a simple function.

I'm talking about the CHARACTERP test (why not FIXNUMP?), and the
addition of ASCII_CHAR_P test (why would we want an ASCII character to
never be considered multibyte?).

> If we want to change symbol names to always be multibyte strings, we can
> do that, but then we probably want to do that or all streams.

I don't understand why you are talking about symbols: AFAIU this code
is used in many other cases as well.  But even for symbols: why change
the current behavior of making their names multibyte?

> It also fixes yet another XFIXNUM crash, but those (there are more in
> lread.c, it seems) should be fixed independently.

I'm okay with adding a FIXNUMP test (which happens in the debugging
builds anyway, so any violations probably never happen), but using
CHARACTERP changes behavior.

> However, it does give us the ability to extend the API so
> readcharfun could return a single character string, unibyte or
> multibyte, to be handled appropriately.

This is also a change in behavior.
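
The difference between the two checks discussed above can be shown with a small standalone C model. This is only a sketch, not Emacs source: `struct value`, `make_nil`, `make_fixnum`, the `model_*` predicates and both `readchar_*` functions are hypothetical stand-ins for `Lisp_Object`, `FIXNUMP`, `CHARACTERP`, `ASCII_CHAR_P` and the function-stream branch of `readchar`. The first variant follows the quoted hunk. The second assumes that a FIXNUMP-only check would treat any non-fixnum as end of input and otherwise leave the value and the multibyte flag alone, which the message does not spell out.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the value returned by the reader function.
   This is NOT Emacs's Lisp_Object; it only distinguishes the cases
   that matter for the discussion.  */
enum kind { KIND_NIL, KIND_FIXNUM, KIND_OTHER };
struct value { enum kind kind; long n; };

/* Largest Emacs character code (MAX_CHAR in src/character.h).  */
#define MODEL_MAX_CHAR 0x3FFFFF

static struct value make_nil (void)
{
  return (struct value) { KIND_NIL, 0 };
}

static struct value make_fixnum (long n)
{
  return (struct value) { KIND_FIXNUM, n };
}

/* Models FIXNUMP: any fixnum, including negative ones.  */
static bool model_fixnump (struct value v)
{
  return v.kind == KIND_FIXNUM;
}

/* Models CHARACTERP: a non-negative fixnum no larger than MAX_CHAR.  */
static bool model_characterp (struct value v)
{
  return model_fixnump (v) && v.n >= 0 && v.n <= MODEL_MAX_CHAR;
}

/* Models ASCII_CHAR_P: a code in the range 0..127.  */
static bool model_ascii_char_p (long c)
{
  return c >= 0 && c < 0x80;
}

/* Variant following the quoted hunk: anything that is not a character
   is end of input, and a non-ASCII character sets the flag.  */
static long readchar_characterp (struct value tem, bool *multibyte)
{
  if (!model_characterp (tem))
    return -1;
  if (multibyte && !model_ascii_char_p (tem.n))
    *multibyte = true;
  return tem.n;
}

/* Assumed FIXNUMP-only variant: non-fixnums (including nil) are end of
   input, every fixnum is passed through, the flag is never touched.  */
static long readchar_fixnump_only (struct value tem, bool *multibyte)
{
  (void) multibyte;
  if (!model_fixnump (tem))
    return -1;
  return tem.n;
}
```

In this model both variants agree on nil and on ordinary characters, apart from the multibyte flag. They differ for fixnums outside the character range, such as 0x400000 or a negative number: the CHARACTERP variant reports end of input, while the FIXNUMP-only variant returns the number unchanged.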

This bug report was last modified 36 days ago.

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.
