GNU bug report logs - #103
23.0.60; Segmentation fault loading auto-lang.el
Reported by: intrigeri <intrigeri <at> boum.org>
Date: Sun, 30 Mar 2008 22:15:12 UTC
Severity: normal
Done: Chong Yidong <cyd <at> stupidchicken.com>
Bug is archived. No further changes may be made.
Message #30 received at 103 <at> emacsbugs.donarmstrong.com:
>>> (let ((str (string-as-unibyte "ä")))
>>> (string-match (char-to-string (string-to-char str)) str))
>>
>>> evaluates to 0 in Emacs 22, and to nil in Emacs 23. It turns out that
>>> this screws up the use of all-completions in regexp-opt-group.
>>
>>> Anyone have any idea what's going on here?
>>
>> (string-as-unibyte "ä") => "\303\244"
>> (string-to-char "\303\244") => 195 (because ?\303 == 195)
>> (char-to-string 195) => "Ã" (because 195 == 0xC3, and U+00C3 is 'Ã')
>> (string-match "Ã" "ä") => nil (obvious)
>>
>> Any Lisp program that depends on the result of
>> string-as-unibyte (thus Emacs' internal character
>> representation) won't work in Emacs 23.
Notice that the problem is unrelated to string-as-unibyte:
(string-match (char-to-string (string-to-char str)) str)
This should intuitively always return 0. Of course, once you replace
`char-to-string' with just `string', you may be reminded that Emacs-23
introduced `unibyte-string', which leads you to the key: if `str' is
unibyte, you need to do
(string-match (unibyte-string (string-to-char str)) str)
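The sketch below reproduces the mismatch and the fix. It assumes Emacs 23 or later. It uses `encode-coding-string' in place of `string-as-unibyte', which newer Emacs versions discourage; for "ä" both yield the unibyte string "\303\244". The variable names `demo-str' and `demo-first' are hypothetical and exist only for this illustration.

```elisp
;; Sketch only: assumes Emacs 23 or later.  DEMO-STR and DEMO-FIRST
;; are hypothetical names used for illustration.
(defvar demo-str (encode-coding-string "\u00e4" 'utf-8)
  "Unibyte string holding the UTF-8 bytes of U+00E4, i.e. \"\\303\\244\".")

(defvar demo-first (string-to-char demo-str)
  "195, which is a byte here because `demo-str' is unibyte.")

;; `char-to-string' treats 195 as the character U+00C3, so the
;; resulting pattern cannot match the raw byte \303 in DEMO-STR.
(string-match (char-to-string demo-first) demo-str)  ; => nil

;; `unibyte-string' treats 195 as a byte, just as DEMO-STR does,
;; so the pattern matches at position 0.
(string-match (unibyte-string demo-first) demo-str)  ; => 0
```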
In Emacs-22, `string' used a heuristic to decide whether to build
a unibyte or multibyte string, and more importantly, the character
representing byte code 209 had code 209, whereas in Emacs-23, we have
the strange situation that byte 209 is character 4194257.
So an integer <256 needs to be accompanied by some contextual info
that says whether it represents a char or a byte; otherwise you get
ambiguities that lead to bugs. And string-to-char returns either a byte
or a char depending on whether the string was unibyte or multibyte.
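Both points can be checked directly, again assuming Emacs 23 or later: a raw byte moved into a multibyte string becomes the character obtained by adding #x3FFF00 (so byte 209 becomes 4194257), and `string-to-char' returns a character or a byte depending on the string it is given. This is a minimal sketch, not code from the thread.

```elisp
;; Sketch only: assumes Emacs 23 or later.
;; Converting a unibyte string to multibyte turns raw byte 209 into
;; the eight-bit character (+ 209 #x3FFF00), i.e. 4194257.
(aref (string-to-multibyte (unibyte-string 209)) 0)     ; => 4194257
;; `get-byte' recovers the original byte value from that character.
(get-byte 0 (string-to-multibyte (unibyte-string 209))) ; => 209

;; The integer from `string-to-char' depends on the string's representation:
(string-to-char "\u00e4")                               ; => 228, the character U+00E4
(string-to-char (encode-coding-string "\u00e4" 'utf-8)) ; => 195, the first UTF-8 byte
```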
> I see. However, maybe the following change to regexp-opt-group in
> regexp-opt.el would make things a little more predictable. What do you
> think?
Yes, it looks like a good fix. Maybe "-no-properties" would be even
better.
Stefan
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.