GNU bug report logs - #70988
(read FUNCTION) uses Latin-1 [PATCH]

Package: emacs

Reported by: Mattias Engdegård <mattias.engdegard <at> gmail.com>

Date: Thu, 16 May 2024 18:14:01 UTC

Severity: normal

Tags: patch

Done: Mattias Engdegård <mattias.engdegard <at> gmail.com>

From: Eli Zaretskii <eliz <at> gnu.org>
To: Pip Cet <pipcet <at> protonmail.com>
Cc: 70988 <at> debbugs.gnu.org, mattias.engdegard <at> gmail.com, stefankangas <at> gmail.com, monnier <at> iro.umontreal.ca
Subject: bug#70988: (read FUNCTION) uses Latin-1 [PATCH]
Date: Thu, 13 Feb 2025 08:00:57 +0200
> Date: Wed, 12 Feb 2025 20:27:58 +0000
> From: Pip Cet <pipcet <at> protonmail.com>
> Cc: stefankangas <at> gmail.com, mattias.engdegard <at> gmail.com, 70988 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca
> 
> "Eli Zaretskii" <eliz <at> gnu.org> writes:
> 
> >> --- a/src/lread.c
> >> +++ b/src/lread.c
> >> @@ -398,9 +398,12 @@ readchar (Lisp_Object readcharfun, bool *multibyte)
> >>
> >>    tem = call0 (readcharfun);
> >>
> >> -  if (NILP (tem))
> >> +  if (!CHARACTERP (tem))
> >>      return -1;
> >> -  return XFIXNUM (tem);
> >> +  if (multibyte && !ASCII_CHAR_P (XFIXNAT (tem)))
> >> +    *multibyte = true;
> >> +
> >> +  return XFIXNAT (tem);
> >
> > AFAIU, the proposed patch was just a bugfix, whereas the above also
> > changes behavior in backward-incompatible ways.
> 
> The other way around, I think: the first proposed patch changed the
> behavior of readchar to always set the multibyte flag when a function
> was used, resulting in the creation of symbols whose ASCII names are
> multibyte strings.  The previous behavior was never to set the multibyte
> flag, which was correct for ASCII strings but not multibyte ones.
> 
> This patch retains the previous behavior for ASCII symbols, but sets the
> multibyte flag for non-ASCII symbols, which seems the best we can do if
> we're given a simple function.

I'm talking about the CHARACTERP test (why not FIXNUMP?), and the
addition of ASCII_CHAR_P test (why would we want an ASCII character
to never be considered multibyte?).
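
For reference, the two tests in question can be modelled outside Emacs. This is only a minimal sketch in plain C, not the lread.c code: model_value, model_characterp, model_ascii_char_p and model_readchar are hypothetical stand-ins for Lisp_Object, CHARACTERP, ASCII_CHAR_P and the function branch of readchar, and the sketch assumes the 0x3FFFFF upper bound that Emacs uses for MAX_CHAR.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the Lisp_Object that the reader function
   returns; only the cases relevant to this discussion are modelled.  */
enum model_tag { MODEL_NIL, MODEL_FIXNUM, MODEL_OTHER };

struct model_value
{
  enum model_tag tag;
  long fixnum;			/* Meaningful only for MODEL_FIXNUM.  */
};

#define MODEL_MAX_CHAR 0x3FFFFF	/* Assumed to match Emacs's MAX_CHAR.  */

/* Model of CHARACTERP: a non-negative fixnum no larger than MAX_CHAR.  */
static bool
model_characterp (struct model_value v)
{
  return v.tag == MODEL_FIXNUM && v.fixnum >= 0 && v.fixnum <= MODEL_MAX_CHAR;
}

/* Model of ASCII_CHAR_P: code points below 0x80.  */
static bool
model_ascii_char_p (long c)
{
  return c >= 0 && c < 0x80;
}

/* Model of the patched branch: anything that is not a character is
   reported as end of input (-1), and the multibyte flag is raised
   only for non-ASCII characters.  */
static int
model_readchar (struct model_value tem, bool *multibyte)
{
  if (!model_characterp (tem))
    return -1;
  if (multibyte && !model_ascii_char_p (tem.fixnum))
    *multibyte = true;
  return (int) tem.fixnum;
}
```

In this model, a function returning 'a' leaves the flag untouched, one returning 0xE5 sets it, and nil, a negative fixnum, or a non-fixnum all yield -1.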

> If we want to change symbol names to always be multibyte strings, we can
> do that, but then we probably want to do that for all streams.

I don't understand why you are talking about symbols: AFAIU this code
is used in many other cases as well.  But even for symbols: why change
the current behavior of making their names multibyte?

> It also fixes yet another XFIXNUM crash, but those (there are more in
> lread.c, it seems) should be fixed independently.

I'm okay with adding a FIXNUMP test (which happens in the debugging
builds anyway, so any violations probably never happen), but using
CHARACTERP changes behavior.
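
To make the difference concrete, here is a rough sketch in plain C of where the two checks would disagree. It is a model under stated assumptions, not Emacs code: result_with_fixnump, result_with_characterp and checks_disagree are hypothetical names, the input is assumed to be a fixnum already, and 0x3FFFFF is assumed to be the MAX_CHAR bound.

```c
#include <stdbool.h>

#define MODEL_MAX_CHAR 0x3FFFFF	/* Assumed to match Emacs's MAX_CHAR.  */

/* Hypothetical model of a FIXNUMP-only check: the value is already
   known to be a fixnum, so it is passed through unchanged.  */
static int
result_with_fixnump (long v)
{
  return (int) v;
}

/* Hypothetical model of the CHARACTERP variant: anything outside
   0..MAX_CHAR is reported as end of input (-1).  */
static int
result_with_characterp (long v)
{
  if (v < 0 || v > MODEL_MAX_CHAR)
    return -1;
  return (int) v;
}

/* True when the two variants return different results for V.  */
static bool
checks_disagree (long v)
{
  return result_with_fixnump (v) != result_with_characterp (v);
}
```

Under these assumptions the variants agree on every valid character and on -1 itself, and differ only for other negative fixnums and for fixnums above MAX_CHAR.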

> However, it does give us the ability to extend the API so
> readcharfun could return a single character string, unibyte or
> multibyte, to be handled appropriately.

This is also a change in behavior.



