GNU bug report logs -
#103
23.0.60; Segmentation fault loading auto-lang.el
Reported by: intrigeri <intrigeri <at> boum.org>
Date: Sun, 30 Mar 2008 22:15:12 UTC
Severity: normal
Done: Chong Yidong <cyd <at> stupidchicken.com>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 103 in the body.
You can then email your comments to 103 AT debbugs.gnu.org in the normal way.
Report forwarded to bug-submit-list <at> lists.donarmstrong.com, Emacs Bugs <bug-gnu-emacs <at> gnu.org>: bug#103; Package emacs.
Acknowledgement sent to intrigeri <intrigeri <at> boum.org>: New bug report received and forwarded. Copy sent to Emacs Bugs <bug-gnu-emacs <at> gnu.org>.
Message #5 received at submit <at> emacsbugs.donarmstrong.com (full text, mbox):
Hello,
First attempt:
- download http://www.marquardt-home.de/auto-lang.el to ~/.elisp/
- run emacs -Q
- M-x load-file
- choose file ~/.elisp/auto-lang.el
=> Emacs segfaults (same result with emacs -Q -nw)
Narrowing it down:
- download http://www.marquardt-home.de/auto-lang.el to ~/.elisp/
- run emacs -Q
- C-x C-f ~/.elisp/auto-lang.el
- select the region from the beginning of the file up to and including line 1398
- eval-region
=> Emacs evaluates the region just fine
- then evaluate the next sexp: (defvar al-german-common-8bit-regexp ... )
=> Emacs segfaults (same result with emacs -Q -nw)
I know that auto-lang.el is not part of GNU Emacs, but I would expect
Emacs not to segfault when loading an arbitrary *.el file.
I’m running Romain Françoise’s emacs-snapshot Debian package, based on
Emacs CVS (2008-03-28):
In GNU Emacs 23.0.60.1 (i486-pc-linux-gnu, GTK+ Version 2.12.9)
of 2008-03-28 on elegiac, modified by Debian
(emacs-snapshot package, version 1:20080328-1)
configured using `configure '--build' 'i486-linux-gnu' '--host' 'i486-linux-gnu' '--prefix=/usr' '--sharedstatedir=/var/lib' '--libexecdir=/usr/lib' '--localstatedir=/var' '--infodir=/usr/share/info' '--mandir=/usr/share/man' '--with-pop=yes' '--enable-locallisppath=/etc/emacs-snapshot:/etc/emacs:/usr/local/share/emacs/23.0.60/site-lisp:/usr/local/share/emacs/site-lisp:/usr/share/emacs/23.0.60/site-lisp:/usr/share/emacs/site-lisp:/usr/share/emacs/23.0.60/leim' '--with-x=yes' '--with-x-toolkit=gtk' 'build_alias=i486-linux-gnu' 'host_alias=i486-linux-gnu' 'CFLAGS=-DDEBIAN -DSITELOAD_PURESIZE_EXTRA=5000 -g -O2''
Important settings:
value of $LC_ALL: fr_FR.UTF-8
value of $LC_COLLATE: nil
value of $LC_CTYPE: UTF-8
value of $LC_MESSAGES: nil
value of $LC_MONETARY: nil
value of $LC_NUMERIC: nil
value of $LC_TIME: nil
value of $LANG: fr_FR.UTF-8
value of $XMODIFIERS: nil
locale-coding-system: utf-8-unix
default-enable-multibyte-characters: t
Major mode: Lisp Interaction
Minor modes in effect:
menu-bar-mode: t
file-name-shadow-mode: t
global-font-lock-mode: t
font-lock-mode: t
global-auto-composition-mode: t
auto-composition-mode: t
auto-compression-mode: t
line-number-mode: t
transient-mark-mode: t
Recent input:
ESC x r e p o r t - e m TAB RET
Recent messages:
("emacs" "-Q")
Bye,
--
intrigeri <intrigeri <at> boum.org>
Acknowledgement sent to intrigeri <intrigeri <at> boum.org>: Extra info received and forwarded to list.
Message #10 received at 103 <at> emacsbugs.donarmstrong.com (full text, mbox):
Hello,
I tried to further isolate the segfault cause.
I first ran a clean Emacs: emacs -Q -nw
Then I eval’d only the
(defcustom al-german-common-words ...)
and the
(defcustom al-german-8bit-words ...)
forms from auto-lang.el.
Then I tried to eval parts of (defvar al-german-common-8bit-regexp ... )
to find out which part of it makes Emacs segfault.
I could eval the following without any issue:
(mapcar 'string-as-unibyte
        (append
         al-german-common-words
         al-german-8bit-words
         nil))
But the following triggers the segfault:
(regexp-opt
 (mapcar 'string-as-unibyte
         (append
          al-german-common-words
          al-german-8bit-words
          nil)))
So it seems the regexp-opt function call is the one that triggers
the segfault.
Bye,
--
intrigeri <intrigeri <at> boum.org>
Acknowledgement sent to Chong Yidong <cyd <at> stupidchicken.com>: Extra info received and forwarded to list.
Message #15 received at 103 <at> emacsbugs.donarmstrong.com (full text, mbox):
> - download http://www.marquardt-home.de/auto-lang.el to ~/.elisp/
> - run emacs -Q
> - M-x load-file
> - choose file ~/.elisp/auto-lang.el
> => Emacs segfaults (same result with emacs -Q -nw)
This is due to an infinite nesting depth in regexp-opt, which can be
tracked down to the following problem:
(let ((str (string-as-unibyte "ä")))
  (string-match (char-to-string (string-to-char str)) str))
evaluates to 0 in Emacs 22, and to nil in Emacs 23. It turns out that
this screws up the use of all-completions in regexp-opt-group.
Anyone have any idea what's going on here?
Acknowledgement sent to Kenichi Handa <handa <at> m17n.org>: Extra info received and forwarded to list.
Message #20 received at 103 <at> emacsbugs.donarmstrong.com (full text, mbox):
In article <87r6dg3oe2.fsf <at> stupidchicken.com>, Chong Yidong <cyd <at> stupidchicken.com> writes:
> > - download http://www.marquardt-home.de/auto-lang.el to ~/.elisp/
> > - run emacs -Q
> > - M-x load-file
> > - choose file ~/.elisp/auto-lang.el
> > => Emacs segfaults (same result with emacs -Q -nw)
> This is due to an infinite nesting depth in regexp-opt, which can be
> tracked down to the following problem:
> (let ((str (string-as-unibyte "ä")))
> (string-match (char-to-string (string-to-char str)) str))
> evaluates to 0 in Emacs 22, and to nil in Emacs 23. It turns out that
> this screws up the use of all-completions in regexp-opt-group.
> Anyone have any idea what's going on here?
(string-as-unibyte "ä") => "\303\244"
(string-to-char "\303\244") => 195 (because ?\303 == 195)
(char-to-string 195) => "Ã" (because 195 == 0xC3 and U+00C3 == 'Ã')
(string-match "Ã" "ä") => nil (obvious)
Any Lisp program that depends on the result of
string-as-unibyte (thus Emacs' internal character
representation) won't work in Emacs 23.
---
Kenichi Handa
handa <at> ni.aist.go.jp
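The four steps above can be retraced in one form. This is only a minimal sketch: `bug103-trace-steps' is a hypothetical name, and `encode-coding-string' with utf-8 is assumed here as a stand-in for `string-as-unibyte', which recent Emacs releases mark obsolete. The last element is the lookup that the thread reports as nil in Emacs 23.

```elisp
;; Sketch only; `bug103-trace-steps' is a hypothetical helper name.
;; Assumption: encoding "ä" as UTF-8 yields the same two bytes that
;; `string-as-unibyte' produced in the thread ("\303\244").
(defun bug103-trace-steps ()
  "Return (BYTES-MULTIBYTE-P BYTE CHAR-MULTIBYTE-P MATCH) for the steps above."
  (let* ((bytes (encode-coding-string "\u00e4" 'utf-8)) ; unibyte "\303\244"
         (byte (string-to-char bytes))                  ; first byte, 195
         (as-char (char-to-string byte)))               ; "Ã", i.e. U+00C3
    (list (multibyte-string-p bytes)    ; nil: still a string of bytes
          byte
          (multibyte-string-p as-char)  ; t: now a string of characters
          ;; The lookup from the report; nil in Emacs 23 per the thread.
          (string-match as-char bytes))))

(bug103-trace-steps)
```

The point of returning the two `multibyte-string-p' values is to show where the representation changes: the integer 195 carries no record of having been a byte.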
Acknowledgement sent to Chong Yidong <cyd <at> stupidchicken.com>: Extra info received and forwarded to list.
Message #25 received at 103 <at> emacsbugs.donarmstrong.com (full text, mbox):
Kenichi Handa <handa <at> m17n.org> writes:
> In article <87r6dg3oe2.fsf <at> stupidchicken.com>, Chong Yidong <cyd <at> stupidchicken.com> writes:
>
>> > - download http://www.marquardt-home.de/auto-lang.el to ~/.elisp/
>> > - run emacs -Q
>> > - M-x load-file
>> > - choose file ~/.elisp/auto-lang.el
>> > => Emacs segfaults (same result with emacs -Q -nw)
>
>> This is due to an infinite nesting depth in regexp-opt, which can be
>> tracked down to the following problem:
>
>> (let ((str (string-as-unibyte "ä")))
>> (string-match (char-to-string (string-to-char str)) str))
>
>> evaluates to 0 in Emacs 22, and to nil in Emacs 23. It turns out that
>> this screws up the use of all-completions in regexp-opt-group.
>
>> Anyone have any idea what's going on here?
>
> (string-as-unibyte "ä") => "\303\244"
> (string-to-char "\303\244") => 195 (because ?\303 == 195)
> (char-to-string 195) => "Ã" (because 195==0xC3 U+00C3=='Ã')
> (string-match "Ã" "ä") => nil (obvious)
>
> Any Lisp program that depends on the result of
> string-as-unibyte (thus Emacs' internal character
> representation) won't work in Emacs 23.
I see. However, maybe the following change to regexp-opt-group in
regexp-opt.el would make things a little more predictable. What do you
think?
*** trunk/lisp/emacs-lisp/regexp-opt.el.~1.37.~ 2008-03-14 17:17:34.000000000 -0400
--- trunk/lisp/emacs-lisp/regexp-opt.el 2008-04-08 12:46:49.000000000 -0400
***************
*** 226,232 ****
;; Otherwise, divide the list into those that start with a
;; particular letter and those that do not, and recurse on them.
! (let* ((char (char-to-string (string-to-char (car strings))))
(half1 (all-completions char strings))
(half2 (nthcdr (length half1) strings)))
(concat open-group
--- 226,232 ----
;; Otherwise, divide the list into those that start with a
;; particular letter and those that do not, and recurse on them.
! (let* ((char (substring (car strings) 0 1))
(half1 (all-completions char strings))
(half2 (nthcdr (length half1) strings)))
(concat open-group
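The effect of the one-line change can be sketched as follows. This is an illustration under stated assumptions, not the checked-in code: `bug103-prefixes' is a hypothetical helper, and a byte string built with `unibyte-string' stands in for the `string-as-unibyte' output in auto-lang.el. The old prefix goes through an integer and comes back as a multibyte string; the new one stays in the representation of the original string.

```elisp
;; Sketch only; `bug103-prefixes' is a hypothetical helper name.
(defun bug103-prefixes (str)
  "Return (OLD NEW): the prefix of STR computed before and after the patch."
  (list (char-to-string (string-to-char str)) ; old: round trip via an integer
        (substring str 0 1)))                 ; new: keeps STR's representation

(let* ((word (unibyte-string #xC3 #xA4 ?r))   ; UTF-8 bytes of "är", unibyte
       (strings (list word "und"))
       (prefixes (bug103-prefixes word)))
  (list (mapcar #'multibyte-string-p prefixes)  ; => (t nil)
        ;; With the new prefix, `all-completions' finds WORD itself.
        (length (all-completions (nth 1 prefixes) strings)))) ; => 1
```

If the old prefix matches nothing, as the thread describes for Emacs 23, half1 is empty and half2 (computed with nthcdr from the length of half1) is the whole list again, which would explain the unbounded nesting.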
Acknowledgement sent to Stefan Monnier <monnier <at> iro.umontreal.ca>: Extra info received and forwarded to list.
Message #30 received at 103 <at> emacsbugs.donarmstrong.com (full text, mbox):
>>> (let ((str (string-as-unibyte "ä")))
>>> (string-match (char-to-string (string-to-char str)) str))
>>
>>> evaluates to 0 in Emacs 22, and to nil in Emacs 23. It turns out that
>>> this screws up the use of all-completions in regexp-opt-group.
>>
>>> Anyone have any idea what's going on here?
>>
>> (string-as-unibyte "ä") => "\303\244"
>> (string-to-char "\303\244") => 195 (because ?\303 == 195)
>> (char-to-string 195) => "Ã" (because 195==0xC3 U+00C3=='Ã')
>> (string-match "Ã" "ä") => nil (obvious)
>>
>> Any Lisp program that depends on the result of
>> string-as-unibyte (thus Emacs' internal character
>> representation) won't work in Emacs 23.
Notice that the problem is unrelated to string-as-unibyte:
(string-match (char-to-string (string-to-char str)) str)
this should intuitively always return 0. Of course, once you replace
`char-to-string' with just `string', you may be reminded that Emacs-23
introduced `unibyte-string', which leads you to the key: if `str' is
unibyte, you need to do
(string-match (unibyte-string (string-to-char str)) str)
In Emacs-22, `string' used a heuristic to decide whether to build
a unibyte or multibyte string, and more importantly, the character
representing byte code 209 had code 209, whereas in Emacs-23, we have
the strange situation that byte 209 is character 4194257.
So an integer <256 needs to be accompanied by some contextual info
that says whether it represents a char or a byte, otherwise you get
ambiguities that lead to bugs. And string-to-char returns either a byte
or a char depending on whether the string was unibyte or multibyte.
> I see. However, maybe the following change to regexp-opt-group in
> regexp-opt.el would make things a little more predictable. What do you
> think?
Yes, it looks like a good fix. Maybe the "-no-properties" variant
(`substring-no-properties') would be even better.
Stefan
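The byte-versus-character ambiguity described above can be made concrete with a small sketch. `bug103-byte-vs-char' is a hypothetical name used only for illustration; the functions it calls (`unibyte-string', `string', `string-to-multibyte') are the standard ones, and the example assumes an Emacs 23 or later character representation.

```elisp
;; Sketch only; `bug103-byte-vs-char' is a hypothetical helper name.
(defun bug103-byte-vs-char (n)
  "Return facts about integer N (128..255) read as a byte and as a character."
  (let ((as-byte (unibyte-string n)) ; one raw byte in a unibyte string
        (as-char (string n)))        ; one character in a multibyte string
    (list (multibyte-string-p as-byte)
          (multibyte-string-p as-char)
          ;; Character code used internally for the raw byte N.
          (string-to-char (string-to-multibyte as-byte)))))

(bug103-byte-vs-char 209) ; => (nil t 4194257)
```

The third value is #x3FFF00 + 209, which matches the figure quoted in the message: the same integer 209 names a different object depending on whether it is taken as a byte or as a character.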
Acknowledgement sent to Kenichi Handa <handa <at> m17n.org>: Extra info received and forwarded to list.
Message #35 received at 103 <at> emacsbugs.donarmstrong.com (full text, mbox):
In article <87skxwl29o.fsf <at> stupidchicken.com>, Chong Yidong <cyd <at> stupidchicken.com> writes:
> > Any Lisp program that depends on the result of
> > string-as-unibyte (thus Emacs' internal character
> > representation) won't work in Emacs 23.
> I see. However, maybe the following change to regexp-opt-group in
> regexp-opt.el would make things a little more predictable. What do you
> think?
I agree because that change will avoid a unibyte string
being changed to multibyte by accident.
But I've just downloaded auto-lang.el and found that it
contains code like this:
(string-as-multibyte
 (regexp-opt
  (mapcar 'string-as-unibyte
          (append
           al-german-common-words
           al-german-8bit-words
           nil))))
All such forms should be changed to this simple form:
(regexp-opt (append al-german-common-words al-german-8bit-words))
The German case above works just by chance, but
al-danish-common-words doesn't. You'll see peculiar 8-bit
codes in it.
Also, the file should have a coding tag.
---
Kenichi Handa
handa <at> ni.aist.go.jp
> *** trunk/lisp/emacs-lisp/regexp-opt.el.~1.37.~ 2008-03-14 17:17:34.000000000 -0400
> --- trunk/lisp/emacs-lisp/regexp-opt.el 2008-04-08 12:46:49.000000000 -0400
> ***************
> *** 226,232 ****
> ;; Otherwise, divide the list into those that start with a
> ;; particular letter and those that do not, and recurse on them.
> ! (let* ((char (char-to-string (string-to-char (car strings))))
> (half1 (all-completions char strings))
> (half2 (nthcdr (length half1) strings)))
> (concat open-group
> --- 226,232 ----
> ;; Otherwise, divide the list into those that start with a
> ;; particular letter and those that do not, and recurse on them.
> ! (let* ((char (substring (car strings) 0 1))
> (half1 (all-completions char strings))
> (half2 (nthcdr (length half1) strings)))
> (concat open-group
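The simpler form suggested for auto-lang.el, calling regexp-opt directly on the multibyte word lists, can be sketched like this. The word list and both variable names are made up for illustration and are not taken from auto-lang.el; the coding tag on the first line is likewise an assumption, since the right value depends on how the file is actually encoded.

```elisp
;; -*- coding: utf-8 -*-
;; Sketch only; `bug103-words' and `bug103-regexp' are hypothetical names.
(defvar bug103-words '("f\u00fcr" "\u00fcber" "und")
  "Made-up sample words (für, über, und), written with escapes.")

;; No `string-as-unibyte' / `string-as-multibyte' round trip:
;; the strings stay multibyte from start to finish.
(defvar bug103-regexp (regexp-opt bug103-words)
  "Regexp matching any of `bug103-words'.")

(string-match bug103-regexp "\u00fcber") ; => 0
```

Because nothing here depends on the internal byte representation of the strings, the same form should behave identically whichever version of Emacs reads it.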
Reply sent to Chong Yidong <cyd <at> stupidchicken.com>: You have taken responsibility.
Notification sent to intrigeri <intrigeri <at> boum.org>: bug acknowledged by developer.
Message #40 received at 103-close <at> emacsbugs.donarmstrong.com (full text, mbox):
I've checked in the fix to regexp-opt.
Bug archived. Request was from Debbugs Internal Request <don <at> donarmstrong.com> to internal_control <at> emacsbugs.donarmstrong.com (Wed, 07 May 2008 14:24:03 GMT).
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.