Package: emacs;
Reported by: rasmith <at> tamu.edu
Date: Mon, 29 Mar 2010 16:35:01 UTC
Severity: normal
Tags: fixed
Merged with 5799
Fixed in version 24.1
Done: Lars Magne Ingebrigtsen <larsi <at> gnus.org>
Bug is archived. No further changes may be made.
From: rasmith <at> tamu.edu
To: 5797 <at> debbugs.gnu.org
Subject: bug#5797: 23.1; search-forward in unibyte buffer for \377
Date: Mon, 29 Mar 2010 10:09:19 -0500 (CDT)

Please write in English if possible, because the Emacs maintainers
usually do not have translators to read other languages for them.

Your bug report will be posted to the bug-gnu-emacs <at> gnu.org mailing list,
and to the gnu.emacs.bug news group.

Please describe exactly what actions triggered the bug
and the precise symptoms of the bug:

search-forward fails to find a unibyte \377 in a raw unibyte buffer.

I use "cgreek", a package written by Naoto Takahashi for handling
polytonic (ancient, fully accented) Greek. It includes a file,
cgreek-tlg.el, for processing the files in the Thesaurus Linguae
Graecae, which have their own unique formats. In these files, the byte
\377 is used as a string terminator.

Prior to Emacs 23, these files could be processed by reading the file in
with insert-file-contents-literally, making the buffer unibyte with
(set-buffer-multibyte nil), and searching for the string terminator with
(search-forward (char-to-string ?\xff)). However, that search now fails
to find a single byte \377 and instead matches on the two-byte sequence
\231\277. Changing the search function to
(search-forward (unibyte-string ?\377)) has the same result.

On further investigation, I'm not certain it's a bug: it may be an
intentional part of the modifications to accommodate utf-8. Here are the
details:

In a multibyte buffer (set-buffer-multibyte t):

  (search-forward (char-to-string ?\xff))   matches utf-8 "ÿ" (i.e. \303\277)
  (search-forward (char-to-string ?\377))   matches utf-8 "ÿ"
  (search-forward (unibyte-string ?\377))   matches byte \377

In a unibyte buffer (set-buffer-multibyte nil):

  (search-forward (char-to-string ?\xff))   matches \231\277
  (search-forward (char-to-string ?\377))   matches \231\277
  (search-forward (unibyte-string ?\377))   matches \231\277

In other words, search-forward cannot find byte \377 when searching in a
*unibyte* buffer, but it can find that same byte if the buffer is
changed to multibyte.

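The cases listed above can be re-run with a small harness along these lines. This is only a sketch: the function name bug5797-search-demo and the three sample bytes are made up for illustration and do not come from cgreek-tlg.el. It builds a unibyte buffer containing the bytes a, \377, b, optionally switches it to multibyte, and returns whatever search-forward reports.

```elisp
;; Hypothetical helper for re-running the cases in this report.
;; The name and the sample bytes are assumptions, not from cgreek-tlg.el.
(defun bug5797-search-demo (needle multibyte)
  "Search for NEEDLE in a temp buffer holding the bytes a, octal 377, b.
The bytes are inserted while the buffer is unibyte.  When MULTIBYTE is
non-nil the buffer is switched to multibyte before searching.
Return the position after the match, or nil if nothing matched."
  (with-temp-buffer
    (set-buffer-multibyte nil)
    (insert (unibyte-string ?a #xff ?b))
    (when multibyte
      (set-buffer-multibyte t))
    (goto-char (point-min))
    ;; NOERROR is t so a failed search returns nil instead of signaling.
    (search-forward needle nil t)))
```

For example, (bug5797-search-demo (unibyte-string ?\377) nil) corresponds to the last unibyte case and (bug5797-search-demo (char-to-string ?\xff) t) to the first multibyte case. The values returned for the \377 needles depend on the Emacs version: the behavior described here was observed on 23.1, and the tracker marks the bug as fixed in 24.1.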
The reason is that in a unibyte buffer, search-forward apparently
changes byte \377 to a two-byte representation (but not to utf-8, which
would be \303\277). This may be exactly the intended behavior of
search-forward, but it breaks scripts expecting search-forward to be
able to find a single high 8-bit byte in a unibyte buffer. In context,
changing the buffer to multibyte is not a solution.

The code in which I found this error can be fixed by replacing

  (search-forward (char-to-string ?\xff))

with

  (skip-chars-forward "^\377")
  (forward-char 1)

(fix provided by Naoto Takahashi)

However, that means that scripts counting on the old behavior of
search-forward will have to be modified.

If Emacs crashed, and you have the Emacs process in the gdb debugger,
please include the output from the following gdb commands:
    `bt full' and `xbacktrace'.
If you would like to further debug the crash, please read the file
/usr/local/share/emacs/23.1/etc/DEBUG for instructions.

In GNU Emacs 23.1.1 (amd64-portbld-freebsd8.0, GTK+ Version 2.18.7)
 of 2010-03-25 on aristotle.tamu.edu
Windowing system distributor `The X.Org Foundation', version 11.0.10605000
configured using `configure '--with-x-toolkit=gtk' '--x-libraries=/usr/local/lib' '--x-includes=/usr/local/include' '--prefix=/usr/local' '--mandir=/usr/local/man' '--infodir=/usr/local/info/' '--build=amd64-portbld-freebsd8.0' 'build_alias=amd64-portbld-freebsd8.0' 'CC=cc' 'CFLAGS=-O2 -pipe -fno-strict-aliasing' 'LDFLAGS=-L/usr/local/lib -lintl' 'CPPFLAGS=-I/usr/local/include''

Important settings:
  value of $LC_ALL: en_US.UTF-8
  value of $LC_COLLATE: nil
  value of $LC_CTYPE: nil
  value of $LC_MESSAGES: nil
  value of $LC_MONETARY: nil
  value of $LC_NUMERIC: nil
  value of $LC_TIME: nil
  value of $LANG: en_US.UTF-8
  value of $XMODIFIERS: nil
  locale-coding-system: utf-8-unix
  default-enable-multibyte-characters: t

Major mode: Lisp Interaction

Minor modes in effect:
  tooltip-mode: t
  tool-bar-mode: t
  mouse-wheel-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  global-auto-composition-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  line-number-mode: t
  transient-mark-mode: t

Recent input:
o <down> <down> <down> <return> C-q 0 0 0 <return> C-q 3 7 7 <return>
<up> <up> <up> <left> <up> C-x C-e C-x o <down> <down> <down> <down>
<backspace> <backspace> C-q 2 3 1 <return> ] <backspace> C-q 2 7 7 <return>
<up> <up> <up> <up> C-e C-x C-e <up> <up> <left> C-x C-e <up> <up>
<switch-frame> <down-mouse-1> <mouse-movement> <switch-frame> <mouse-1>
<help-echo> <switch-frame> <switch-frame> <switch-frame> <switch-frame>
<switch-frame> <switch-frame> <switch-frame> <switch-frame> <help-echo>
<up> <up> <left> <up> <right> C-k C-y <return> C-y <left> <backspace>
<backspace> <backspace> t <right> C-x C-e <down> <right> <right> <right>
<right> <right> <right> <right> <right> <right> <right> <right> <right>
<right> <right> <right> C-x C-e C-x o <down> C-x C-e <up> <up> <up>
<left> <left> <left> <left> <return> <up> ( s e a r c h - f o r w a r d
SPC ( c h a r - t o - s t r i o n g <backspace> <backspace> <backspace> g
<backspace> g SPC <backspace> <backspace> n g SPC ? \ x f f ) ) C-x C-e
C-x o <up> <up> <down> <up> C-x C-e <down> <down> C-e C-x C-e <up> <up>
<up> <up> C-e C-x C-e <up> <up> <left> C-x C-e <up> <up> <up> <up> <up>
<up> C-e C-x C-e <down> C-e C-x C-e C-x o <down> <down> <down> <down>
<down> <down> <return> C-q 3 7 7 <return> <up> <up> <up> <up> <up> <up>
<left> <left> C-x C-e <up> <up> <up> <up> <up> <up> <down> <left> <left>
C-x C-e <up> <up> <up> <up> <left> C-x C-e <up> <up> <up> <up> <up>
<left> <left> <left> <left> <left> C-x C-e <down> <down> C-e C-x C-e
<up> <up> <up> <up> C-e C-x C-e <up> <up> <up> C-e C-x C-e <down>
<switch-frame> <switch-frame> <help-echo> <help-echo> <help-echo>
M-x r e p o r t <tab> b <tab> <return>

Recent messages:
Entering debugger...
326
Entering debugger...
nil
369 [3 times]
t
Entering debugger...
374 [2 times]
366
nil
369 [3 times]

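The skip-chars-forward workaround quoted in the report can be wrapped as below. This is a minimal sketch, assuming a unibyte buffer; the names bug5797-skip-past-terminator and bug5797-workaround-demo and the sample bytes are hypothetical and are not taken from cgreek-tlg.el.

```elisp
;; Hypothetical wrapper around the workaround quoted in the report.
;; Function names and sample bytes are illustrative assumptions.
(defun bug5797-skip-past-terminator ()
  "Move point just past the next byte \\377 and return the new position.
Uses `skip-chars-forward' followed by `forward-char', as in the report."
  ;; Skip every character that is not the terminator byte.
  (skip-chars-forward "^\377")
  ;; Step over the terminator itself.
  (forward-char 1)
  (point))

(defun bug5797-workaround-demo ()
  "Run the workaround on the bytes a, b, \\377, c in a unibyte buffer."
  (with-temp-buffer
    (set-buffer-multibyte nil)
    (insert (unibyte-string ?a ?b #xff ?c))
    (goto-char (point-min))
    (bug5797-skip-past-terminator)))
```

Unlike search-forward, this form does not report a missing terminator on its own: if no \377 follows point, skip-chars-forward stops at the end of the buffer and forward-char then signals an end-of-buffer error, so callers that relied on a failed search may need an explicit check.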
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.