GNU bug report logs -
#16048
24.3.50; String compare surprise
Reported by: michael.albinus <at> gmx.de
Date: Wed, 4 Dec 2013 11:45:03 UTC
Severity: normal
Found in version 24.3.50
Done: Michael Albinus <michael.albinus <at> gmx.de>
Bug is archived. No further changes may be made.
Message #19 received at 16048 <at> debbugs.gnu.org (full text, mbox):
> From: Josh <josh <at> foxtail.org>
> Date: Wed, 4 Dec 2013 06:00:46 -0800
> Cc: Michael Albinus <michael.albinus <at> gmx.de>, 16048 <at> debbugs.gnu.org
>
> On Wed, Dec 4, 2013 at 5:07 AM, Andreas Schwab <schwab <at> linux-m68k.org> wrote:
>
> > michael.albinus <at> gmx.de writes:
> >
> > > The following form evals to nil:
> > >
> > > (string-equal "\377" "ÿ")
> >
> > "\377" is a unibyte string. When converted to multibyte it yields
> > "\x3fffff".
>
>
> At least as of 24.3, the manual[0] suggests that such a conversion
> should not occur in this case:
And it doesn't occur, indeed:
(multibyte-string-p "\377")
=> nil
> You can also use hexadecimal escape sequences (`\xN') and octal
> escape sequences (`\N') in string constants. *But beware:* If a
> string constant contains hexadecimal or octal escape sequences,
> and these escape sequences all specify unibyte characters (i.e.,
> less than 256), and there are no other literal non-ASCII
> characters or Unicode-style escape sequences in the string, then
> Emacs automatically assumes that it is a unibyte string. That is
> to say, it assumes that all non-ASCII characters occurring in the
> string are 8-bit raw bytes.
>
> [0] (info "(elisp) Non-ASCII in Strings")
Best citation contest? You're on!
 -- Function: string= string1 string2
     This function returns `t' if the characters of the two strings
     match exactly. Symbols are also allowed as arguments, in which
     case the symbol names are used. Case is always significant,
     regardless of `case-fold-search'.
     [...]
     For technical reasons, a unibyte and a multibyte string are
     `equal' if and only if they contain the same sequence of character
     codes and all these codes are either in the range 0 through 127
     (ASCII) or 160 through 255 (`eight-bit-graphic'). However, when a
     unibyte string is converted to a multibyte string, all characters
     with codes in the range 160 through 255 are converted to
     characters with higher codes, whereas ASCII characters remain
     unchanged. Thus, a unibyte string and its conversion to multibyte
     are only `equal' if the string is all ASCII.
Note the last sentence.
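The conversion that last sentence describes can be checked directly. This is a minimal sketch, assuming an Emacs where `string-to-multibyte' is available; the variable names `raw' and `converted' are only illustrative:

```elisp
;; Illustrative sketch; `raw' and `converted' are hypothetical names.
(let* ((raw "\377")                          ; octal escape below 256: read as unibyte
       (converted (string-to-multibyte raw))) ; raw byte becomes a character with a higher code
  (list (multibyte-string-p raw)             ; nil
        (multibyte-string-p converted)       ; t
        (format "#x%X" (aref converted 0)))) ; "#x3FFFFF"
```

The last element is the "\x3fffff" mentioned earlier in the thread.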
This bug report was last modified 11 years and 172 days ago.