#16048 - 24.3.50; String compare surprise

GNU bug report logs - #16048
24.3.50; String compare surprise

Package: emacs;

Reported by: michael.albinus <at> gmx.de

Date: Wed, 4 Dec 2013 11:45:03 UTC

Severity: normal

Found in version 24.3.50

Done: Michael Albinus <michael.albinus <at> gmx.de>

Bug is archived. No further changes may be made.

Message #28 received at 16048 <at> debbugs.gnu.org (full text, mbox):

From: Josh <josh <at> foxtail.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 16048 <at> debbugs.gnu.org, Michael Albinus <michael.albinus <at> gmx.de>, Andreas Schwab <schwab <at> linux-m68k.org>
Subject: Re: bug#16048: 24.3.50; String compare surprise
Date: Wed, 4 Dec 2013 12:13:42 -0800


On Wed, Dec 4, 2013 at 9:29 AM, Eli Zaretskii <eliz <at> gnu.org> wrote:

> > From: Josh <josh <at> foxtail.org>
> > Date: Wed, 4 Dec 2013 06:00:46 -0800
> > Cc: Michael Albinus <michael.albinus <at> gmx.de>, 16048 <at> debbugs.gnu.org
> >
> > On Wed, Dec 4, 2013 at 5:07 AM, Andreas Schwab <schwab <at> linux-m68k.org> wrote:
> >
> > > michael.albinus <at> gmx.de writes:
> > >
> > > > The following form evals to nil:
> > > >
> > > > (string-equal "\377" "ÿ")
> > >
> > > "\377" is a unibyte string. When converted to multibyte it yields
> > > "\x3fffff".
> >
> > At least as of 24.3, the manual[0] suggests that such a conversion
> > should not occur in this case:
>
> And it doesn't occur, indeed:
>
>   (multibyte-string-p "\377")
>
>   => nil
>
> > You can also use hexadecimal escape sequences (`\xN') and octal
> > escape sequences (`\N') in string constants. *But beware:* If a
> > string constant contains hexadecimal or octal escape sequences,
> > and these escape sequences all specify unibyte characters (i.e.,
> > less than 256), and there are no other literal non-ASCII
> > characters or Unicode-style escape sequences in the string, then
> > Emacs automatically assumes that it is a unibyte string. That is
> > to say, it assumes that all non-ASCII characters occurring in the
> > string are 8-bit raw bytes.
> >
> > [0] (info "(elisp) Non-ASCII in Strings")
>
> Best citation contest? you're on!

No, thanks. I haven't entered such contests in many years.

> -- Function: string= string1 string2
>     This function returns `t' if the characters of the two strings
>     match exactly. Symbols are also allowed as arguments, in which
>     case the symbol names are used. Case is always significant,
>     regardless of `case-fold-search'.
>
>     [...]
>
>     For technical reasons, a unibyte and a multibyte string are
>     `equal' if and only if they contain the same sequence of character
>     codes and all these codes are either in the range 0 through 127
>     (ASCII) or 160 through 255 (`eight-bit-graphic'). However, when a
>     unibyte string is converted to a multibyte string, all characters
>     with codes in the range 160 through 255 are converted to
>     characters with higher codes, whereas ASCII characters remain
>     unchanged. Thus, a unibyte string and its conversion to multibyte
>     are only `equal' if the string is all ASCII.
>
> Note the last sentence.

Yes, I must have misunderstood Andreas' meaning; I believed he was
suggesting that the two strings compared differently due to "\377"
having been converted to a multibyte string and therefore miscomparing
with the unibyte (or so I thought) string "ÿ". I see now that I had it
exactly backwards. Thanks for setting me straight.
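
The distinction discussed in this message can be checked with a minimal sketch like the following. It assumes an Emacs Lisp file saved as UTF-8, so that the literal "ÿ" is read as a multibyte string; the helper `my-describe-string' is hypothetical and introduced only for illustration, and the Latin-1 decoding at the end is one possible way to make the comparison succeed, not something proposed in the thread.

```elisp
;; -*- coding: utf-8 -*-
;; Illustrative sketch only; `my-describe-string' is a hypothetical helper,
;; not part of Emacs.
(defun my-describe-string (s)
  "Return a list whose car says whether S is multibyte, followed by its codes."
  (cons (multibyte-string-p s) (append s nil)))

;; Octal escape below 256 and nothing else non-ASCII: read as a unibyte string.
(my-describe-string "\377")                        ; => (nil 255)

;; Literal non-ASCII character: read as a multibyte string holding U+00FF.
(my-describe-string "ÿ")                           ; => (t 255)

;; Converting the unibyte string maps the raw byte to a much higher code.
(my-describe-string (string-to-multibyte "\377"))  ; => (t 4194303), i.e. #x3FFFFF

;; The comparison from the original report: raw byte 255 is not character U+00FF.
(string-equal "\377" "ÿ")                          ; => nil

;; Assumption: the byte was meant as Latin-1 text, so decode it before comparing.
(string-equal (decode-coding-string "\377" 'latin-1) "ÿ")  ; => t
```

Both literals report the code 255 when inspected element by element, which is why the result looks surprising; the difference lies in whether 255 is a raw byte or a character.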


This bug report was last modified 11 years and 221 days ago.
