GNU bug report logs - #70988
(read FUNCTION) uses Latin-1 [PATCH]

Package: emacs;

Reported by: Mattias Engdegård <mattias.engdegard <at> gmail.com>

Date: Thu, 16 May 2024 18:14:01 UTC

Severity: normal

Tags: patch

Done: Mattias Engdegård <mattias.engdegard <at> gmail.com>


Message #62 received at 70988 <at> debbugs.gnu.org (full text, mbox):

From: Mattias Engdegård <mattias.engdegard <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 70988 <at> debbugs.gnu.org, Pip Cet <pipcet <at> protonmail.com>,
 Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#70988: (read FUNCTION) uses Latin-1 [PATCH]
Date: Sat, 5 Jul 2025 13:27:26 +0200
[Message part 1 (text/plain, inline)]
Sorry about the delay. The bugs I was talking about earlier are:

1. (read FUNCTION) assumes latin-1, as discussed earlier in this bug.

   The code in readchar() just forgets to set the multibyte flag for function sources.

2. (read UNIBYTE-STRING) assumes latin-1:

   (read "\"a\xff\"") -> "aÿ"

   For buffer and marker sources, readchar() does

	  if (! ASCII_CHAR_P (c))
	    c = BYTE8_TO_CHAR (c);

   but this is missing for string sources.

3. (print UNIBYTE-SYM) assumes latin-1:

   (prin1-to-string (make-symbol "a\xff")) -> "aÿ"

   Here the reason is that print_object() calls `fetch_string_char_advance` instead of `fetch_string_char_as_multibyte_advance`.

The above three bugs are clear omissions and were never intended behaviour; a lot happened in the switch to multibyte and bugs were bound to appear in the cracks. There should be no downside to fixing them.
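
To make item 2 concrete, here is a minimal standalone sketch of the conversion that buffer and marker sources already get and string sources lack. It is not the attached patch. The helpers ascii_char_p() and byte8_to_char() are local stand-ins modelled on the ASCII_CHAR_P and BYTE8_TO_CHAR macros quoted above, with the raw-byte offset 0x3FFF00 assumed to match Emacs's character.h, and read_unibyte_string_char() is a hypothetical name used only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for ASCII_CHAR_P: true for character codes 0..0x7F.  */
bool
ascii_char_p (int c)
{
  return 0 <= c && c < 0x80;
}

/* Stand-in for BYTE8_TO_CHAR: map a raw byte 0x80..0xFF to the
   raw-byte character range (offset assumed to be 0x3FFF00).  */
int
byte8_to_char (int byte)
{
  return byte + 0x3FFF00;
}

/* Hypothetical helper: fetch the byte at index I of the unibyte
   string S and return it as a character code.  Non-ASCII bytes
   become raw-byte characters instead of being taken as Latin-1.  */
int
read_unibyte_string_char (const unsigned char *s, size_t i)
{
  int c = s[i];
  if (!ascii_char_p (c))
    c = byte8_to_char (c);
  return c;
}
```

Under these assumptions, the byte 0xff in "a\xff" comes back as the raw-byte character 0x3FFFFF rather than U+00FF (ÿ), while the ASCII 'a' is returned unchanged.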

We may want to ask ourselves whether it's reasonable that read sources have a multibyteness, which affects how symbols are read but not string literals. I don't think it should affect either. However, I'm leaving this concern out of the immediate discussion.

I also have a patch that improves reader performance while cleaning up some parts of the code, but it can be applied before or after fixing the three bugs above.

[0001-Read-characters-from-functions-as-multibyte.patch (application/octet-stream, attachment)]
[0002-Print-non-ASCII-chars-in-unibyte-symbols-as-raw-byte.patch (application/octet-stream, attachment)]
[0003-Read-non-ASCII-chars-from-unibyte-string-sources-as-.patch (application/octet-stream, attachment)]


This bug report was last modified 10 days ago.


