GNU bug report logs - #70988
(read FUNCTION) uses Latin-1 [PATCH]

Package: emacs;

Reported by: Mattias Engdegård <mattias.engdegard <at> gmail.com>

Date: Thu, 16 May 2024 18:14:01 UTC

Severity: normal

Tags: patch

Done: Mattias Engdegård <mattias.engdegard <at> gmail.com>


Message #32 received at 70988 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> protonmail.com>
To: Stefan Kangas <stefankangas <at> gmail.com>
Cc: 70988 <at> debbugs.gnu.org,
 Mattias Engdegård <mattias.engdegard <at> gmail.com>,
 Eli Zaretskii <eliz <at> gnu.org>, monnier <at> iro.umontreal.ca
Subject: Re: bug#70988: (read FUNCTION) uses Latin-1 [PATCH]
Date: Wed, 12 Feb 2025 16:42:43 +0000

"Stefan Kangas" <stefankangas <at> gmail.com> writes:

> Mattias Engdegård <mattias.engdegard <at> gmail.com> writes:
>
>> After looking further into the Lisp reader/printer I found two more
>> silent Latin-1 assumptions. In all three cases, I firmly believe the
>> following to be true:
>>
>> * The behaviour is not intended but just code accidents.
>> * They should hardly affect any user code at all.
>> * They are nevertheless clear bugs which should be fixed.
>>
>> Further on this will have to wait until after Emacs 30 has been
>> branched to avoid delaying that more important task.
>
> FWIW, the proposed patch looks like a bug fix to me as well, so I think
> we should install it.

I think we should consider whether we want to force multibyte to true
for all functions, even those never returning non-ASCII chars.  Also,
the code appears to use XFIXNUM on a Lisp_Object that might not be a
fixnum.

IIUC, the difference is that all-ASCII strings would be unibyte strings
in some circumstances.

The alternative patch would look something like this:

From bbc65c9be7ccebf034f4d10f018a076ef1e8a4e9 Mon Sep 17 00:00:00 2001
From: Pip Cet <pipcet <at> protonmail.com>
Subject: [PATCH] Auto-detect multibyteness of readchar funs (bug#70988)

* src/lread.c (readchar): Set *MULTIBYTE if we detect a multibyte
character.  Return -1 for non-characters rather than crashing.
---
 src/lread.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/src/lread.c b/src/lread.c
index 6af95873bb8..c18c1be3cf5 100644
--- a/src/lread.c
+++ b/src/lread.c
@@ -398,9 +398,12 @@ readchar (Lisp_Object readcharfun, bool *multibyte)
 
   tem = call0 (readcharfun);
 
-  if (NILP (tem))
+  if (!CHARACTERP (tem))
     return -1;
-  return XFIXNUM (tem);
+  if (multibyte && !ASCII_CHAR_P (XFIXNAT (tem)))
+    *multibyte = true;
+
+  return XFIXNAT (tem);
 
  read_multibyte:
   if (unread_char >= 0)
-- 
2.48.1
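
The control flow of the patched branch can be modelled outside Emacs. The following is only a minimal sketch under stated assumptions: `struct value`, `char_source`, `read_one`, `array_state` and `array_source` are hypothetical names that do not exist in src/lread.c. The `is_char` flag stands in for the CHARACTERP test and the comparison against 0x7F stands in for ASCII_CHAR_P.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a Lisp_Object returned by a reader
   function: either a character code or "something else" (nil, a
   string, a negative number, ...).  */
struct value
{
  bool is_char;        /* models the result of CHARACTERP */
  unsigned int code;   /* meaningful only when is_char is true */
};

/* Hypothetical type for the FUNCTION handed to the reader.  */
typedef struct value (*char_source) (void *state);

enum { ASCII_MAX = 0x7F };

/* Model of the patched branch: return -1 for anything that is not a
   character, and set *MULTIBYTE the first time a non-ASCII character
   is seen.  *MULTIBYTE is never reset to false here.  */
int
read_one (char_source source, void *state, bool *multibyte)
{
  struct value v = source (state);

  if (!v.is_char)
    return -1;
  if (multibyte && v.code > ASCII_MAX)
    *multibyte = true;

  return (int) v.code;
}

/* Example source: yields the codes of a fixed array, then a
   non-character to signal the end of input.  */
struct array_state
{
  const unsigned int *codes;
  size_t len;
  size_t pos;
};

struct value
array_source (void *state)
{
  struct array_state *s = state;

  if (s->pos >= s->len)
    return (struct value) { .is_char = false, .code = 0 };
  return (struct value) { .is_char = true, .code = s->codes[s->pos++] };
}
```

In this model, a source that yields only ASCII codes leaves the caller's flag untouched, which corresponds to the "all-ASCII strings would be unibyte" case mentioned above, while a single code above 0x7F flips the flag for the rest of the read.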






This bug report was last modified 10 days ago.
