#58168 - string-lessp glitches and inconsistencies

GNU bug report logs - #58168
string-lessp glitches and inconsistencies

Package: emacs;

Reported by: Mattias Engdegård <mattias.engdegard <at> gmail.com>

Date: Thu, 29 Sep 2022 16:25:01 UTC

Severity: normal


From: Eli Zaretskii <eliz <at> gnu.org>
To: Mattias Engdegård <mattias.engdegard <at> gmail.com>
Cc: 58168 <at> debbugs.gnu.org, larsi <at> gnus.org
Subject: bug#58168: string-lessp glitches and inconsistencies
Date: Tue, 04 Oct 2022 19:24:52 +0300

> From: Mattias Engdegård <mattias.engdegard <at> gmail.com>
> Date: Tue, 4 Oct 2022 16:44:17 +0200
> Cc: larsi <at> gnus.org,
>     58168 <at> debbugs.gnu.org
>
> On 4 Oct 2022, at 13:37, Eli Zaretskii <eliz <at> gnu.org> wrote:
>
> > First I needed to fix fallout from making STRING_CHAR intolerant of
> > unibyte text, because redisplay-testsuite caused assertion violations
> > in string_char_and_length.
>
> Good catch! Just to satisfy my curiosity:
>
> >             error ("Invalid format operation %%%c",
> > -                  STRING_CHAR ((unsigned char *) format - 1));
> > +                  multibyte_format
> > +                  ? STRING_CHAR ((unsigned char *) format - 1)
> > +                  : *((unsigned char *) format - 1));
>
> This treats unibyte format strings as if they were Latin-1 for the purpose of the error message.

No, it doesn't. It shows the problematic characters as raw bytes, as in "%\200" (where \200 is a single character). If you see something different, please show the recipe.

> Not very important, of course, but maybe there should be a UNIBYTE_TO_CHAR in the alternative branch?

No, that would show the multibyte codepoint, and will confuse users, because the result would look very different from the problematic format spec in this case.

> > (Doesn't it abort for you? or do you not
> > build Emacs with --enable-checking?)
>
> Oh I certainly do that occasionally, but it's mostly when I've changed something at the C level or have reason to believe that something is broken there.

Please _always_ test changes related to encoding/decoding and character representation conversions in a --enable-checking build. We should have discovered these bugs in time for Emacs 28.2 to be devoid of them.

> > I could understand why you'd want to _add_ the larger values, but why
> > replace?
>
> Because it seemed pretty clear that the old code intended to use #x3ffffc for testing display of raw bytes but a typo turned it into #x3fffc instead which isn't a raw byte but a multibyte character.
> That it's an easy mistake to make (done so several times myself).

Who said anything about #x3fffc? The original code had #xfc, the unibyte code for #x3ffffc. I don't see why we shouldn't test both.

In the other problematic hunk you replaced \777774 with \374 -- why?

> I've now pushed the patch; the code can be improved further if necessary.

I've reverted it. Please stop this madness of rushing into installing changes that are still under controversy.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.
