GNU bug report logs -
#5797
23.1; search-forward in unibyte buffer for \377
Reported by: rasmith <at> tamu.edu
Date: Mon, 29 Mar 2010 16:35:01 UTC
Severity: normal
Tags: fixed
Merged with 5799
Fixed in version 24.1
Done: Lars Magne Ingebrigtsen <larsi <at> gnus.org>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 5797 in the body.
You can then email your comments to 5797 AT debbugs.gnu.org in the normal way.
Report forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org: bug#5797; Package emacs. (Mon, 29 Mar 2010 16:35:01 GMT)
Acknowledgement sent to rasmith <at> tamu.edu: New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Mon, 29 Mar 2010 16:35:01 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
Please write in English if possible, because the Emacs maintainers
usually do not have translators to read other languages for them.
Your bug report will be posted to the bug-gnu-emacs <at> gnu.org mailing list,
and to the gnu.emacs.bug news group.
Please describe exactly what actions triggered the bug
and the precise symptoms of the bug:
search-forward fails to find a unibyte \377 in a raw unibyte buffer.
I use "cgreek", a package written by Naoto Takahashi for handling
polytonic (ancient, fully accented) Greek. It includes a file,
cgreek-tlg.el, for processing the files in the Thesaurus Linguae
Graecae, which have their own unique formats. In these files, the
byte \377 is used as a string terminator. Prior to Emacs 23, these
files could be processed by reading the file in with
insert-file-contents-literally, making the buffer unibyte with
(set-buffer-multibyte nil), and searching for the string terminator
with (search-forward (char-to-string ?\xff)). However, that search
now fails to find a single byte \377 and instead matches on the
two-byte sequence \231\277.
Changing the search function to (search-forward (unibyte-string ?\377))
has the same result.
On further investigation, I am not certain this is a bug: it may be an
intentional part of the modifications to accommodate utf-8. Here are
the details:
In a multibyte buffer (set-buffer-multibyte t):
  (search-forward (char-to-string ?\xff)) matches utf-8 "ÿ" (i.e. \303\277)
  (search-forward (char-to-string ?\377)) matches utf-8 "ÿ"
  (search-forward (unibyte-string ?\377)) matches byte \377
In a unibyte buffer (set-buffer-multibyte nil):
  (search-forward (char-to-string ?\xff)) matches \231\277
  (search-forward (char-to-string ?\377)) matches \231\277
  (search-forward (unibyte-string ?\377)) matches \231\277
In other words, search-forward cannot find byte \377 when searching in
a *unibyte* buffer, but it can find that same byte if the buffer is
changed to multibyte. The reason is that in a unibyte buffer,
search-forward apparently changes byte \377 to a two-byte
representation (but not to utf-8, which would be \303\277).
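To make the byte sequences mentioned above concrete, here is a minimal sketch that prints the bytes of a string in the octal notation used in this report. The helper name my-octal-bytes is hypothetical and is not part of Emacs or cgreek; it only illustrates that the utf-8 encoding of "ÿ" is two bytes while the raw byte \377 is one.

```elisp
;; Hypothetical helper, for illustration only.
(defun my-octal-bytes (unibyte)
  "Return the bytes of the unibyte string UNIBYTE as a list of octal strings."
  (mapcar (lambda (byte) (format "%o" byte)) unibyte))

;; The character U+00FF ("ÿ") encoded as utf-8 is two bytes.
(my-octal-bytes (encode-coding-string (string #xff) 'utf-8)) ; => ("303" "277")

;; A unibyte string built from the raw byte is a single byte.
(my-octal-bytes (unibyte-string #xff))                       ; => ("377")
```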
This may be exactly the intended behavior of search-forward, but it
breaks scripts expecting search-forward to be able to find a single
high 8-bit byte in a unibyte buffer. In context, changing the buffer
to multibyte is not a solution.
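The failing case can be sketched as a small self-contained recipe. The function name my-find-byte-377 and the three-byte buffer contents are assumptions made for illustration, not taken from cgreek-tlg.el. In the Emacs 23.1 build described in this report the search reportedly fails to find the byte; a build containing the fix should return a non-nil position.

```elisp
;; Hypothetical reproduction helper, for illustration only.
(defun my-find-byte-377 ()
  "Search a small unibyte buffer for the byte \\377.
Return the position after the match, or nil if the search fails."
  (with-temp-buffer
    (set-buffer-multibyte nil)              ; raw unibyte buffer
    (insert (unibyte-string ?a #xff ?b))    ; one \377 byte between two ASCII bytes
    (goto-char (point-min))
    ;; Third argument t: return nil instead of signaling an error.
    (search-forward (unibyte-string #xff) nil t)))

(my-find-byte-377)
```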
The code in which I found this error can be fixed by replacing
(search-forward (char-to-string ?\xff))
with
(skip-chars-forward "^\377")
(forward-char 1)
(fix provided by Naoto Takahashi)
However, that means that scripts counting on the old behavior of
search-forward will have to be modified.
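For scripts that need to be modified, the workaround above can be wrapped in a small function. This is only a sketch: the name my-skip-past-byte-377 is hypothetical, and returning nil when no terminator is found is a design choice made here, not part of the original fix.

```elisp
;; Hypothetical wrapper around the skip-chars-forward workaround.
(defun my-skip-past-byte-377 ()
  "Move point just past the next \\377 byte in the current buffer.
Return the new position, or nil if no such byte follows point."
  (skip-chars-forward "^\377")   ; stop in front of the terminator, or at end of buffer
  (unless (eobp)
    (forward-char 1)             ; step over the terminator itself
    (point)))

;; Usage in a unibyte buffer:
(with-temp-buffer
  (set-buffer-multibyte nil)
  (insert (unibyte-string ?a ?b #xff ?c))
  (goto-char (point-min))
  (my-skip-past-byte-377))
```

Unlike the plain two-line replacement, this version does not signal an end-of-buffer error when the terminator is missing.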
If Emacs crashed, and you have the Emacs process in the gdb debugger,
please include the output from the following gdb commands:
`bt full' and `xbacktrace'.
If you would like to further debug the crash, please read the file
/usr/local/share/emacs/23.1/etc/DEBUG for instructions.
In GNU Emacs 23.1.1 (amd64-portbld-freebsd8.0, GTK+ Version 2.18.7)
of 2010-03-25 on aristotle.tamu.edu
Windowing system distributor `The X.Org Foundation', version 11.0.10605000
configured using `configure '--with-x-toolkit=gtk' '--x-libraries=/usr/local/lib' '--x-includes=/usr/local/include' '--prefix=/usr/local' '--mandir=/usr/local/man' '--infodir=/usr/local/info/' '--build=amd64-portbld-freebsd8.0' 'build_alias=amd64-portbld-freebsd8.0' 'CC=cc' 'CFLAGS=-O2 -pipe -fno-strict-aliasing' 'LDFLAGS=-L/usr/local/lib -lintl' 'CPPFLAGS=-I/usr/local/include''
Important settings:
value of $LC_ALL: en_US.UTF-8
value of $LC_COLLATE: nil
value of $LC_CTYPE: nil
value of $LC_MESSAGES: nil
value of $LC_MONETARY: nil
value of $LC_NUMERIC: nil
value of $LC_TIME: nil
value of $LANG: en_US.UTF-8
value of $XMODIFIERS: nil
locale-coding-system: utf-8-unix
default-enable-multibyte-characters: t
Major mode: Lisp Interaction
Minor modes in effect:
tooltip-mode: t
tool-bar-mode: t
mouse-wheel-mode: t
menu-bar-mode: t
file-name-shadow-mode: t
global-font-lock-mode: t
font-lock-mode: t
blink-cursor-mode: t
global-auto-composition-mode: t
auto-composition-mode: t
auto-encryption-mode: t
auto-compression-mode: t
line-number-mode: t
transient-mark-mode: t
Recent input:
o <down> <down> <down> <return> C-q 0 0 0 <return>
C-q 3 7 7 <return> <up> <up> <up> <left> <up> C-x C-e
C-x o <down> <down> <down> <down> <backspace> <backspace>
C-q 2 3 1 <return> ] <backspace> C-q 2 7 7 <return>
<up> <up> <up> <up> C-e C-x C-e <up> <up> <left> C-x
C-e <up> <up> <switch-frame> <down-mouse-1> <mouse-movement>
<switch-frame> <mouse-1> <help-echo> <switch-frame>
<switch-frame> <switch-frame> <switch-frame> <switch-frame>
<switch-frame> <switch-frame> <switch-frame> <help-echo>
<up> <up> <left> <up> <right> C-k C-y <return> C-y
<left> <backspace> <backspace> <backspace> t <right>
C-x C-e <down> <right> <right> <right> <right> <right>
<right> <right> <right> <right> <right> <right> <right>
<right> <right> <right> C-x C-e C-x o <down> C-x C-e
<up> <up> <up> <left> <left> <left> <left> <return>
<up> ( s e a r c h - f o r w a r d SPC ( c h a r -
t o - s t r i o n g <backspace> <backspace> <backspace>
g <backspace> g SPC <backspace> <backspace> n g SPC
? \ x f f ) ) C-x C-e C-x o <up> <up> <down> <up> C-x
C-e <down> <down> C-e C-x C-e <up> <up> <up> <up> C-e
C-x C-e <up> <up> <left> C-x C-e <up> <up> <up> <up>
<up> <up> C-e C-x C-e <down> C-e C-x C-e C-x o <down>
<down> <down> <down> <down> <down> <return> C-q 3 7
7 <return> <up> <up> <up> <up> <up> <up> <left> <left>
C-x C-e <up> <up> <up> <up> <up> <up> <down> <left>
<left> C-x C-e <up> <up> <up> <up> <left> C-x C-e <up>
<up> <up> <up> <up> <left> <left> <left> <left> <left>
C-x C-e <down> <down> C-e C-x C-e <up> <up> <up> <up>
C-e C-x C-e <up> <up> <up> C-e C-x C-e <down> <switch-frame>
<switch-frame> <help-echo> <help-echo> <help-echo>
M-x r e p o r t <tab> b <tab> <return>
Recent messages:
Entering debugger...
326
Entering debugger...
nil
369 [3 times]
t
Entering debugger...
374 [2 times]
366
nil
369 [3 times]
Merged 5797 5799.
Request was from Glenn Morris <rgm <at> gnu.org> to control <at> debbugs.gnu.org. (Wed, 31 Mar 2010 18:01:01 GMT)
Added tag(s) fixed.
Request was from Lars Magne Ingebrigtsen <larsi <at> gnus.org> to control <at> debbugs.gnu.org. (Sun, 18 Sep 2011 20:10:01 GMT)
Bug marked as fixed in version 24.1; send any further explanations to
5799 <at> debbugs.gnu.org and bojohan <at> gnu.org.
Request was from Lars Magne Ingebrigtsen <larsi <at> gnus.org> to control <at> debbugs.gnu.org. (Sun, 18 Sep 2011 20:10:02 GMT)
Bug archived.
Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Mon, 17 Oct 2011 11:24:08 GMT)