GNU bug report logs -
#4209
23.1; Emacs 23.1 regression in re-search-forward
Previous Next
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 4209 in the body.
You can then email your comments to 4209 AT debbugs.gnu.org in the normal way.
Toggle the display of automated, internal messages from the tracker.
Report forwarded
to
bug-submit-list <at> lists.donarmstrong.com, Emacs Bugs <bug-gnu-emacs <at> gnu.org>
:
bug#4209
; Package
emacs
.
(Thu, 20 Aug 2009 20:35:06 GMT)
Full text and
rfc822 format available.
Acknowledgement sent
to
"Christopher J. Madsen" <cjm <at> cjmweb.net>
:
New bug report received and forwarded. Copy sent to
Emacs Bugs <bug-gnu-emacs <at> gnu.org>
.
(Thu, 20 Aug 2009 20:35:06 GMT)
Full text and
rfc822 format available.
Message #5 received at submit <at> emacsbugs.donarmstrong.com (full text, mbox):
I've found a regression in Emacs 23.1 (versus Emacs 22.3). I've
narrowed it down to this test case:
;--- re-bug.el starts here
(set-buffer (get-buffer-create "*Test Buffer*"))
(insert "\xC2\x4C\xEF\x77\xC6\x69\x8C\x0A")
(goto-char (point-min))
(message "looking-at: %s" (looking-at "\\`\xC2\x4C\xEF\x77\xC6\x69\x8C\x0A"))
(message "re-search-forward: %s"
(re-search-forward "\\`\xC2\x4C\xEF\x77\xC6\x69\x8C\x0A" 100 t))
;--- re-bug.el ends here
Then at the command line:
$ emacs-22 --batch -Q -l re-bug.el
looking-at: t
re-search-forward: 9
$ emacs-23 --batch -Q -l re-bug.el
looking-at: t
re-search-forward: nil
As you can see, looking-at succeeds in both versions, but
re-search-forward fails in Emacs 23. I don't know why. It seems like
the functions should either both succeed or both fail.
For comparison, here's the version of Emacs 22 that I'm using:
GNU Emacs 22.3.1 (i686-pc-linux-gnu, GTK+ Version 2.12.11)
of 2009-04-06 on byte
Windowing system distributor `The X.Org Foundation', version 11.0.10503000
configured using `configure '--prefix=/usr' '--host=i686-pc-linux-gnu' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--datadir=/usr/share' '--sysconfdir=/etc' '--localstatedir=/var/lib' '--program-suffix=-emacs-22' '--infodir=/usr/share/info/emacs-22' '--without-carbon' '--with-sound' '--with-x' '--with-toolkit-scroll-bars' '--with-jpeg' '--with-tiff' '--with-gif' '--with-png' '--with-xpm' '--with-x-toolkit=gtk' '--without-hesiod' '--without-kerberos' '--without-kerberos5' '--build=i686-pc-linux-gnu' 'build_alias=i686-pc-linux-gnu' 'host_alias=i686-pc-linux-gnu' 'CFLAGS=-march=prescott -O2 -pipe' 'LDFLAGS=-Wl,-O1''
And this is the Emacs 23 information:
In GNU Emacs 23.1.1 (i686-pc-linux-gnu, GTK+ Version 2.16.5)
of 2009-08-10 on byte
Windowing system distributor `The X.Org Foundation', version 11.0.10503000
configured using `configure '--prefix=/usr' '--host=i686-pc-linux-gnu' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--datadir=/usr/share' '--sysconfdir=/etc' '--localstatedir=/var/lib' '--program-suffix=-emacs-23' '--infodir=/usr/share/info/emacs-23' '--with-sound' '--with-x' '--with-toolkit-scroll-bars' '--with-gif' '--with-jpeg' '--with-png' '--with-rsvg' '--with-tiff' '--with-xpm' '--without-xft' '--without-libotf' '--without-m17n-flt' '--with-x-toolkit=gtk' '--without-hesiod' '--without-kerberos' '--without-kerberos5' '--with-gpm' '--with-dbus' '--build=i686-pc-linux-gnu' 'build_alias=i686-pc-linux-gnu' 'host_alias=i686-pc-linux-gnu' 'CFLAGS=-march=core2 -O2 -pipe' 'LDFLAGS=-Wl,-O1''
Important settings:
value of $LC_ALL: nil
value of $LC_COLLATE: nil
value of $LC_CTYPE: nil
value of $LC_MESSAGES: nil
value of $LC_MONETARY: nil
value of $LC_NUMERIC: nil
value of $LC_TIME: nil
value of $LANG: en_US.utf8
value of $XMODIFIERS: nil
locale-coding-system: utf-8-unix
default-enable-multibyte-characters: t
Major mode: Fundamental
Severity set to 'serious' from 'normal'
Request was from
Chong Yidong <cyd <at> stupidchicken.com>
to
control <at> emacsbugs.donarmstrong.com
.
(Sat, 12 Sep 2009 01:00:05 GMT)
Full text and
rfc822 format available.
Information forwarded
to
bug-submit-list <at> lists.donarmstrong.com, Emacs Bugs <bug-gnu-emacs <at> gnu.org>
:
bug#4209
; Package
emacs
.
(Wed, 02 Dec 2009 00:30:04 GMT)
Full text and
rfc822 format available.
Acknowledgement sent
to
Matthew Dempsky <matthew <at> dempsky.org>
:
Extra info received and forwarded to list. Copy sent to
Emacs Bugs <bug-gnu-emacs <at> gnu.org>
.
(Wed, 02 Dec 2009 00:30:04 GMT)
Full text and
rfc822 format available.
Message #12 received at 4209 <at> emacsbugs.donarmstrong.com (full text, mbox):
This is a stab in the dark, but the patch below corrects this issue for me:
$ ./retest.sh
looking-at: t
re-search-forward: 9
I don't see any reason this should cause regressions (searching
forward 0 steps seems to me it should be the same as searching
backward 0 steps), but I've only casually looked over regex.c.
--- a/src/regex.c
+++ b/src/regex.c
@@ -4524,7 +4524,7 @@ re_search_2 (bufp, str1, size1, str2, size2,
startpos, range, regs, stop)
d = POS_ADDR_VSTRING (startpos);
- if (range > 0) /* Searching forwards. */
+ if (range >= 0) /* Searching forwards. */
{
register int lim = 0;
int irange = range;
Information forwarded
to
bug-submit-list <at> lists.donarmstrong.com, Emacs Bugs <bug-gnu-emacs <at> gnu.org>
:
bug#4209
; Package
emacs
.
(Mon, 07 Dec 2009 03:35:04 GMT)
Full text and
rfc822 format available.
Acknowledgement sent
to
"Christopher J. Madsen" <cjm <at> cjmweb.net>
:
Extra info received and forwarded to list. Copy sent to
Emacs Bugs <bug-gnu-emacs <at> gnu.org>
.
(Mon, 07 Dec 2009 03:35:04 GMT)
Full text and
rfc822 format available.
Message #17 received at 4209 <at> emacsbugs.donarmstrong.com (full text, mbox):
Matthew's patch corrects the problem for me, too. (Even though that
line did not change between 22.3 and 23.1.)
Thanks, Matthew. This bug had been preventing me from upgrading to 23.
Information forwarded
to
owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org
:
bug#4209
; Package
emacs
.
(Tue, 26 Jan 2010 20:39:02 GMT)
Full text and
rfc822 format available.
Message #20 received at 4209 <at> debbugs.gnu.org (full text, mbox):
Hi Handa-san,
Could you try to investigate Bug#4209? I took a quick look, and the
contents of the Lisp string passed to Fre_search_forward in Emacs 23 is
$2 = (struct Lisp_String *) 0x86765b8
"\\`\302L\357w\306i\214\n"
but in Emacs 22 (where this test works) it's
$2 = (struct Lisp_String *) 0x86290e8
"\\`\302L\357w\306i\236\254\n"
which seems a little strange to me.
> I've found a regression in Emacs 23.1 (versus Emacs 22.3). I've
> narrowed it down to this test case:
>
> ;--- re-bug.el starts here
> (set-buffer (get-buffer-create "*Test Buffer*"))
>
> (insert "\xC2\x4C\xEF\x77\xC6\x69\x8C\x0A")
>
> (goto-char (point-min))
>
> (message "looking-at: %s" (looking-at "\\`\xC2\x4C\xEF\x77\xC6\x69\x8C\x0A"))
>
> (message "re-search-forward: %s"
> (re-search-forward "\\`\xC2\x4C\xEF\x77\xC6\x69\x8C\x0A" 100 t))
> ;--- re-bug.el ends here
>
> Then at the command line:
>
> $ emacs-22 --batch -Q -l re-bug.el
> looking-at: t
> re-search-forward: 9
>
> $ emacs-23 --batch -Q -l re-bug.el
> looking-at: t
> re-search-forward: nil
Information forwarded
to
owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org
:
bug#4209
; Package
emacs
.
(Wed, 27 Jan 2010 03:44:02 GMT)
Full text and
rfc822 format available.
Message #23 received at 4209 <at> debbugs.gnu.org (full text, mbox):
In article <87ljfkha9i.fsf <at> stupidchicken.com>, Chong Yidong <cyd <at> stupidchicken.com> writes:
> Hi Handa-san,
> Could you try to investigate Bug#4209?
Ok, I'll work on it.
> I took a quick look, and the
> contents of the Lisp string passed to Fre_search_forward in Emacs 23 is
> $2 = (struct Lisp_String *) 0x86765b8
> "\\`\302L\357w\306i\214\n"
> but in Emacs 22 (where this test works) it's
> $2 = (struct Lisp_String *) 0x86290e8
> "\\`\302L\357w\306i\236\254\n"
> which seems a little strange to me.
It seems that Emacs 22 provoides a multibyte string (perhaps
because the searching buffer is multibyte) and Emacs 23
provoides a unibyte string. But, I think that difference is
not important here.
---
Kenichi Handa
handa <at> m17n.org
Information forwarded
to
owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org
:
bug#4209
; Package
emacs
.
(Wed, 27 Jan 2010 05:42:01 GMT)
Full text and
rfc822 format available.
Message #26 received at 4209 <at> debbugs.gnu.org (full text, mbox):
In article <tl7my00w6tw.fsf <at> m17n.org>, Kenichi Handa <handa <at> m17n.org> writes:
> In article <87ljfkha9i.fsf <at> stupidchicken.com>, Chong Yidong <cyd <at> stupidchicken.com> writes:
> > Hi Handa-san,
> > Could you try to investigate Bug#4209?
> Ok, I'll work on it.
I fixed it as below.
=== modified file 'src/regex.c'
--- src/regex.c 2010-01-13 08:35:10 +0000
+++ src/regex.c 2010-01-27 03:57:03 +0000
@@ -4083,8 +4083,7 @@
the corresponding multibyte character. */
int c = RE_CHAR_TO_MULTIBYTE (p[1]);
- if (! CHAR_BYTE8_P (c))
- fastmap[CHAR_LEADING_CODE (c)] = 1;
+ fastmap[CHAR_LEADING_CODE (c)] = 1;
}
}
break;
But, first of all, I don't know (remember) why there was this check:
if (! CHAR_BYTE8_P (c))
I may have overlooked something. Stefan, could you please
confirm that the above change is correct?
---
Kenichi Handa
handa <at> m17n.org
Information forwarded
to
owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org
:
bug#4209
; Package
emacs
.
(Wed, 27 Jan 2010 14:35:02 GMT)
Full text and
rfc822 format available.
Message #29 received at 4209 <at> debbugs.gnu.org (full text, mbox):
>> > Could you try to investigate Bug#4209?
>> Ok, I'll work on it.
> I fixed it as below.
> === modified file 'src/regex.c'
> --- src/regex.c 2010-01-13 08:35:10 +0000
> +++ src/regex.c 2010-01-27 03:57:03 +0000
> @@ -4083,8 +4083,7 @@
> the corresponding multibyte character. */
> int c = RE_CHAR_TO_MULTIBYTE (p[1]);
> - if (! CHAR_BYTE8_P (c))
> - fastmap[CHAR_LEADING_CODE (c)] = 1;
> + fastmap[CHAR_LEADING_CODE (c)] = 1;
> }
> }
> break;
> But, first of all, I don't know (remember) why there was this check:
> if (! CHAR_BYTE8_P (c))
> I may have overlooked something. Stefan, could you please
> confirm that the above change is correct?
The preceding comment keeps me puzzled. I thought that we only ever
matched re_patterns and buffers of the same multibyteness, i.e. if
a unibyte regexp is matched against a multibyte buffer it should first
be turned into a multibyte regexp and then re_compiled, so the case of:
/* For the case of matching this unibyte regex
against multibyte, we must set a leading code of
the corresponding multibyte character. */
should never happen in analyse_first. Yet, if your patch fixes the bug,
that indicates that apparently it *does* happen.
Stefan
Information forwarded
to
owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org
:
bug#4209
; Package
emacs
.
(Wed, 27 Jan 2010 16:44:02 GMT)
Full text and
rfc822 format available.
Message #32 received at 4209 <at> debbugs.gnu.org (full text, mbox):
Stefan Monnier <monnier <at> iro.umontreal.ca> writes:
> The preceding comment keeps me puzzled. I thought that we only ever
> matched re_patterns and buffers of the same multibyteness, i.e. if
> a unibyte regexp is matched against a multibyte buffer it should first
> be turned into a multibyte regexp and then re_compiled, so the case of:
>
> /* For the case of matching this unibyte regex
> against multibyte, we must set a leading code of
> the corresponding multibyte character. */
>
> should never happen in analyse_first. Yet, if your patch fixes the bug,
> that indicates that apparently it *does* happen.
I observe that in the original bug recipe:
(set-buffer (get-buffer-create "*Test Buffer*"))
(insert "\xC2\x4C\xEF\x77\xC6\x69\x8C\x0A")
(goto-char (point-min))
(message "looking-at: %s" (looking-at "\\`\xC2\x4C\xEF\x77\xC6\x69\x8C\x0A"))
(message "re-search-forward: %s"
(re-search-forward "\\`\xC2\x4C\xEF\x77\xC6\x69\x8C\x0A" 100 t))
If we replace
(re-search-forward "\\`\xC2\x4C\xEF\x77\xC6\x69\x8C\x0A" 100 t))
with
(re-search-forward (string-to-multibyte
"\\`\xC2\x4C\xEF\x77\xC6\x69\x8C\x0A") 100 t))
then the regexp match takes places correctly. I'm not sure why the
looking-at call works, tho.
Information forwarded
to
owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org
:
bug#4209
; Package
emacs
.
(Thu, 28 Jan 2010 01:20:03 GMT)
Full text and
rfc822 format available.
Message #35 received at 4209 <at> debbugs.gnu.org (full text, mbox):
In article <jwviqanliww.fsf-monnier+emacs <at> gnu.org>, Stefan Monnier <monnier <at> iro.umontreal.ca> writes:
> The preceding comment keeps me puzzled. I thought that we only ever
> matched re_patterns and buffers of the same multibyteness, i.e. if
> a unibyte regexp is matched against a multibyte buffer it should first
> be turned into a multibyte regexp and then re_compiled, so the case of:
Before we changed the behavour of unibyte->multibyte
conversion, that conversion depended on the preferred
charset (thus on lang. env.). But, Emacs 22 wrongly cached
the pattern converted at some point, and reused it without
checking the change of preferred charset.
So, in emacs-unicode branch, I fixed the regex code so that
unibyte pattern can be directry used for multibyte buffer
search by doing unibyte->multibyte conversion on the fly.
And that code was merged to trunk.
So,
> /* For the case of matching this unibyte regex
> against multibyte, we must set a leading code of
> the corresponding multibyte character. */
really happens.
---
Kenichi Handa
handa <at> m17n.org
bug closed, send any further explanations to "Christopher J. Madsen" <cjm <at> cjmweb.net>
Request was from
Chong Yidong <cyd <at> stupidchicken.com>
to
control <at> debbugs.gnu.org
.
(Thu, 28 Jan 2010 17:28:02 GMT)
Full text and
rfc822 format available.
Information forwarded
to
owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org
:
bug#4209
; Package
emacs
.
(Thu, 28 Jan 2010 19:03:02 GMT)
Full text and
rfc822 format available.
Message #40 received at 4209 <at> debbugs.gnu.org (full text, mbox):
> So, in emacs-unicode branch, I fixed the regex code so that
> unibyte pattern can be directry used for multibyte buffer
> search by doing unibyte->multibyte conversion on the fly.
> And that code was merged to trunk.
Hmm... that's too bad since the subsequent change to get rid of the
dependency on locales made this change unnecessary.
But given this, yes, the patch looks right, and no, I have no idea what
the CHAR_BYTE8_P test was trying to do.
Stefan
Information forwarded
to
owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org
:
bug#4209
; Package
emacs
.
(Fri, 29 Jan 2010 06:16:02 GMT)
Full text and
rfc822 format available.
Message #43 received at 4209 <at> debbugs.gnu.org (full text, mbox):
In article <jwvvdemoy0b.fsf-monnier+emacs <at> gnu.org>, Stefan Monnier <monnier <at> IRO.UMontreal.CA> writes:
> > So, in emacs-unicode branch, I fixed the regex code so that
> > unibyte pattern can be directry used for multibyte buffer
> > search by doing unibyte->multibyte conversion on the fly.
> > And that code was merged to trunk.
> Hmm... that's too bad since the subsequent change to get rid of the
> dependency on locales made this change unnecessary.
Yes. I'll put this in my todo list (but with lower priority).
* avoid on-the-fly uni<->multi conversion in regex.c.
> But given this, yes, the patch looks right, and no, I have no idea what
> the CHAR_BYTE8_P test was trying to do.
Ok, thank you for the confirmation.
---
Kenichi Handa
handa <at> m17n.org
bug archived.
Request was from
Debbugs Internal Request <bug-gnu-emacs <at> gnu.org>
to
internal_control <at> debbugs.gnu.org
.
(Fri, 26 Feb 2010 12:24:03 GMT)
Full text and
rfc822 format available.
This bug report was last modified 15 years and 114 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.