GNU bug report logs - #4209
23.1; Emacs 23.1 regression in re-search-forward

Package: emacs;

Reported by: "Christopher J. Madsen" <cjm <at> cjmweb.net>

Date: Thu, 20 Aug 2009 20:35:06 UTC

Severity: serious

Done: Chong Yidong <cyd <at> stupidchicken.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 4209 in the body.
You can then email your comments to 4209 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-submit-list <at> lists.donarmstrong.com, Emacs Bugs <bug-gnu-emacs <at> gnu.org>:
bug#4209; Package emacs. (Thu, 20 Aug 2009 20:35:06 GMT) Full text and rfc822 format available.

Acknowledgement sent to "Christopher J. Madsen" <cjm <at> cjmweb.net>:
New bug report received and forwarded. Copy sent to Emacs Bugs <bug-gnu-emacs <at> gnu.org>. (Thu, 20 Aug 2009 20:35:06 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> emacsbugs.donarmstrong.com (full text, mbox):

From: "Christopher J. Madsen" <cjm <at> cjmweb.net>
To: bug-gnu-emacs <at> gnu.org
Cc: cjm <at> byte.mynet
Subject: 23.1; Emacs 23.1 regression in re-search-forward
Date: 20 Aug 2009 00:50:33 -0000
I've found a regression in Emacs 23.1 (versus Emacs 22.3).  I've
narrowed it down to this test case:

;--- re-bug.el starts here
(set-buffer (get-buffer-create "*Test Buffer*"))

(insert "\xC2\x4C\xEF\x77\xC6\x69\x8C\x0A")

(goto-char (point-min))

(message "looking-at: %s" (looking-at "\\`\xC2\x4C\xEF\x77\xC6\x69\x8C\x0A"))

(message "re-search-forward: %s"
         (re-search-forward "\\`\xC2\x4C\xEF\x77\xC6\x69\x8C\x0A" 100 t))
;--- re-bug.el ends here

Then at the command line:

$ emacs-22 --batch -Q -l re-bug.el
looking-at: t
re-search-forward: 9

$ emacs-23 --batch -Q -l re-bug.el
looking-at: t
re-search-forward: nil


As you can see, looking-at succeeds in both versions, but
re-search-forward fails in Emacs 23.  I don't know why.  It seems like
the functions should either both succeed or both fail.


For comparison, here's the version of Emacs 22 that I'm using:
GNU Emacs 22.3.1 (i686-pc-linux-gnu, GTK+ Version 2.12.11)
 of 2009-04-06 on byte
Windowing system distributor `The X.Org Foundation', version 11.0.10503000
configured using `configure  '--prefix=/usr' '--host=i686-pc-linux-gnu' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--datadir=/usr/share' '--sysconfdir=/etc' '--localstatedir=/var/lib' '--program-suffix=-emacs-22' '--infodir=/usr/share/info/emacs-22' '--without-carbon' '--with-sound' '--with-x' '--with-toolkit-scroll-bars' '--with-jpeg' '--with-tiff' '--with-gif' '--with-png' '--with-xpm' '--with-x-toolkit=gtk' '--without-hesiod' '--without-kerberos' '--without-kerberos5' '--build=i686-pc-linux-gnu' 'build_alias=i686-pc-linux-gnu' 'host_alias=i686-pc-linux-gnu' 'CFLAGS=-march=prescott -O2 -pipe' 'LDFLAGS=-Wl,-O1''


And this is the Emacs 23 information:

In GNU Emacs 23.1.1 (i686-pc-linux-gnu, GTK+ Version 2.16.5)
 of 2009-08-10 on byte
Windowing system distributor `The X.Org Foundation', version 11.0.10503000
configured using `configure  '--prefix=/usr' '--host=i686-pc-linux-gnu' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--datadir=/usr/share' '--sysconfdir=/etc' '--localstatedir=/var/lib' '--program-suffix=-emacs-23' '--infodir=/usr/share/info/emacs-23' '--with-sound' '--with-x' '--with-toolkit-scroll-bars' '--with-gif' '--with-jpeg' '--with-png' '--with-rsvg' '--with-tiff' '--with-xpm' '--without-xft' '--without-libotf' '--without-m17n-flt' '--with-x-toolkit=gtk' '--without-hesiod' '--without-kerberos' '--without-kerberos5' '--with-gpm' '--with-dbus' '--build=i686-pc-linux-gnu' 'build_alias=i686-pc-linux-gnu' 'host_alias=i686-pc-linux-gnu' 'CFLAGS=-march=core2 -O2 -pipe' 'LDFLAGS=-Wl,-O1''

Important settings:
  value of $LC_ALL: nil
  value of $LC_COLLATE: nil
  value of $LC_CTYPE: nil
  value of $LC_MESSAGES: nil
  value of $LC_MONETARY: nil
  value of $LC_NUMERIC: nil
  value of $LC_TIME: nil
  value of $LANG: en_US.utf8
  value of $XMODIFIERS: nil
  locale-coding-system: utf-8-unix
  default-enable-multibyte-characters: t

Major mode: Fundamental





Severity set to 'serious' from 'normal' Request was from Chong Yidong <cyd <at> stupidchicken.com> to control <at> emacsbugs.donarmstrong.com. (Sat, 12 Sep 2009 01:00:05 GMT) Full text and rfc822 format available.

Information forwarded to bug-submit-list <at> lists.donarmstrong.com, Emacs Bugs <bug-gnu-emacs <at> gnu.org>:
bug#4209; Package emacs. (Wed, 02 Dec 2009 00:30:04 GMT) Full text and rfc822 format available.

Acknowledgement sent to Matthew Dempsky <matthew <at> dempsky.org>:
Extra info received and forwarded to list. Copy sent to Emacs Bugs <bug-gnu-emacs <at> gnu.org>. (Wed, 02 Dec 2009 00:30:04 GMT) Full text and rfc822 format available.

Message #12 received at 4209 <at> emacsbugs.donarmstrong.com (full text, mbox):

From: Matthew Dempsky <matthew <at> dempsky.org>
To: 4209 <at> debbugs.gnu.org
Subject: Re: 23.1; Emacs 23.1 regression in re-search-forward
Date: Tue, 1 Dec 2009 16:21:07 -0800
This is a stab in the dark, but the patch below corrects this issue for me:

    $ ./retest.sh
    looking-at: t
    re-search-forward: 9

I don't see any reason this should cause regressions (searching
forward 0 steps seems to me it should be the same as searching
backward 0 steps), but I've only casually looked over regex.c.

--- a/src/regex.c
+++ b/src/regex.c
@@ -4524,7 +4524,7 @@ re_search_2 (bufp, str1, size1, str2, size2, startpos, range, regs, stop)

          d = POS_ADDR_VSTRING (startpos);

-         if (range > 0)        /* Searching forwards.  */
+         if (range >= 0)       /* Searching forwards.  */
            {
              register int lim = 0;
              int irange = range;



Information forwarded to bug-submit-list <at> lists.donarmstrong.com, Emacs Bugs <bug-gnu-emacs <at> gnu.org>:
bug#4209; Package emacs. (Mon, 07 Dec 2009 03:35:04 GMT) Full text and rfc822 format available.

Acknowledgement sent to "Christopher J. Madsen" <cjm <at> cjmweb.net>:
Extra info received and forwarded to list. Copy sent to Emacs Bugs <bug-gnu-emacs <at> gnu.org>. (Mon, 07 Dec 2009 03:35:04 GMT) Full text and rfc822 format available.

Message #17 received at 4209 <at> emacsbugs.donarmstrong.com (full text, mbox):

From: "Christopher J. Madsen" <cjm <at> cjmweb.net>
To: 4209 <at> debbugs.gnu.org
Subject: Re: 23.1; Emacs 23.1 regression in re-search-forward
Date: Sun, 06 Dec 2009 21:30:43 -0600
Matthew's patch corrects the problem for me, too.  (Even though that
line did not change between 22.3 and 23.1.)

Thanks, Matthew.  This bug had been preventing me from upgrading to 23.




Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#4209; Package emacs. (Tue, 26 Jan 2010 20:39:02 GMT) Full text and rfc822 format available.

Message #20 received at 4209 <at> debbugs.gnu.org (full text, mbox):

From: Chong Yidong <cyd <at> stupidchicken.com>
To: Kenichi Handa  <handa <at> m17n.org>
Cc: 4209 <at> debbugs.gnu.org
Subject: Re: 23.1; Emacs 23.1 regression in re-search-forward
Date: Tue, 26 Jan 2010 15:38:33 -0500
Hi Handa-san,

Could you try to investigate Bug#4209?  I took a quick look, and the
contents of the Lisp string passed to Fre_search_forward in Emacs 23 is

$2 = (struct Lisp_String *) 0x86765b8
"\\`\302L\357w\306i\214\n"

but in Emacs 22 (where this test works) it's

$2 = (struct Lisp_String *) 0x86290e8
"\\`\302L\357w\306i\236\254\n"

which seems a little strange to me.


> I've found a regression in Emacs 23.1 (versus Emacs 22.3).  I've
> narrowed it down to this test case:
>
> ;--- re-bug.el starts here
> (set-buffer (get-buffer-create "*Test Buffer*"))
>
> (insert "\xC2\x4C\xEF\x77\xC6\x69\x8C\x0A")
>
> (goto-char (point-min))
>
> (message "looking-at: %s" (looking-at "\\`\xC2\x4C\xEF\x77\xC6\x69\x8C\x0A"))
>
> (message "re-search-forward: %s"
>          (re-search-forward "\\`\xC2\x4C\xEF\x77\xC6\x69\x8C\x0A" 100 t))
> ;--- re-bug.el ends here
>
> Then at the command line:
>
> $ emacs-22 --batch -Q -l re-bug.el
> looking-at: t
> re-search-forward: 9
>
> $ emacs-23 --batch -Q -l re-bug.el
> looking-at: t
> re-search-forward: nil




Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#4209; Package emacs. (Wed, 27 Jan 2010 03:44:02 GMT) Full text and rfc822 format available.

Message #23 received at 4209 <at> debbugs.gnu.org (full text, mbox):

From: Kenichi Handa <handa <at> m17n.org>
To: Chong Yidong <cyd <at> stupidchicken.com>
Cc: 4209 <at> debbugs.gnu.org
Subject: Re: 23.1; Emacs 23.1 regression in re-search-forward
Date: Wed, 27 Jan 2010 12:43:39 +0900
In article <87ljfkha9i.fsf <at> stupidchicken.com>, Chong Yidong <cyd <at> stupidchicken.com> writes:

> Hi Handa-san,
> Could you try to investigate Bug#4209?

Ok, I'll work on it.

> I took a quick look, and the
> contents of the Lisp string passed to Fre_search_forward in Emacs 23 is

> $2 = (struct Lisp_String *) 0x86765b8
> "\\`\302L\357w\306i\214\n"

> but in Emacs 22 (where this test works) it's

> $2 = (struct Lisp_String *) 0x86290e8
> "\\`\302L\357w\306i\236\254\n"

> which seems a little strange to me.

It seems that Emacs 22 provides a multibyte string (perhaps
because the searching buffer is multibyte) and Emacs 23
provides a unibyte string.  But, I think that difference is
not important here.

---
Kenichi Handa
handa <at> m17n.org




Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#4209; Package emacs. (Wed, 27 Jan 2010 05:42:01 GMT) Full text and rfc822 format available.

Message #26 received at 4209 <at> debbugs.gnu.org (full text, mbox):

From: Kenichi Handa <handa <at> m17n.org>
To: Kenichi Handa <handa <at> m17n.org>
Cc: 4209 <at> debbugs.gnu.org, cyd <at> stupidchicken.com
Subject: Re: bug#4209: 23.1; Emacs 23.1 regression in re-search-forward
Date: Wed, 27 Jan 2010 14:41:11 +0900
In article <tl7my00w6tw.fsf <at> m17n.org>, Kenichi Handa <handa <at> m17n.org> writes:

> In article <87ljfkha9i.fsf <at> stupidchicken.com>, Chong Yidong <cyd <at> stupidchicken.com> writes:
> > Hi Handa-san,
> > Could you try to investigate Bug#4209?

> Ok, I'll work on it.

I fixed it as below.

=== modified file 'src/regex.c'
--- src/regex.c	2010-01-13 08:35:10 +0000
+++ src/regex.c	2010-01-27 03:57:03 +0000
@@ -4083,8 +4083,7 @@
 		     the corresponding multibyte character.  */
 		  int c = RE_CHAR_TO_MULTIBYTE (p[1]);
 
-		  if (! CHAR_BYTE8_P (c))
-		    fastmap[CHAR_LEADING_CODE (c)] = 1;
+		  fastmap[CHAR_LEADING_CODE (c)] = 1;
 		}
 	    }
 	  break;

But, first of all, I don't know (remember) why there was this check:

   if (! CHAR_BYTE8_P (c))

I may have overlooked something.  Stefan, could you please
confirm that the above change is correct?

---
Kenichi Handa
handa <at> m17n.org
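
For readers following the thread, the fastmap step touched by the patch
above can be modelled with a minimal C sketch.  This is not the regex.c
code: the character constants are written from memory of Emacs 23's
character.h and should be treated as assumptions, and fill_fastmap with
its skip_byte8 flag is a hypothetical name introduced only for
illustration.  It shows that a unibyte pattern starting with byte 0xC2
converts to a raw-byte character whose multibyte leading code would be
0xC1, and that a CHAR_BYTE8_P-style check leaves that entry unset.

```c
#include <assert.h>
#include <string.h>

/* Assumed limits of Emacs 23's internal character encoding.  */
#define MAX_1_BYTE_CHAR 0x7F
#define MAX_2_BYTE_CHAR 0x7FF
#define MAX_3_BYTE_CHAR 0xFFFF
#define MAX_4_BYTE_CHAR 0x1FFFFF
#define MAX_5_BYTE_CHAR 0x3FFF7F
#define MAX_CHAR        0x3FFFFF

/* Assumption: raw bytes 0x80..0xFF map to chars 0x3FFF80..0x3FFFFF.  */
static int byte8_to_char(unsigned char b) { return b + 0x3FFF00; }
static int char_byte8_p(int c) { return c > MAX_5_BYTE_CHAR; }

/* Unibyte -> multibyte conversion of a single pattern byte.  */
static int unibyte_to_char(unsigned char b)
{
  return b < 0x80 ? b : byte8_to_char(b);
}

/* First byte of the (assumed) multibyte encoding of character C.  */
static unsigned char char_leading_code(int c)
{
  assert(c >= 0 && c <= MAX_CHAR);
  if (c <= MAX_1_BYTE_CHAR) return (unsigned char) c;
  if (c <= MAX_2_BYTE_CHAR) return (unsigned char) (0xC0 | (c >> 6));
  if (c <= MAX_3_BYTE_CHAR) return (unsigned char) (0xE0 | (c >> 12));
  if (c <= MAX_4_BYTE_CHAR) return (unsigned char) (0xF0 | (c >> 18));
  if (c <= MAX_5_BYTE_CHAR) return 0xF8;
  /* Raw byte: leading code is 0xC0 or 0xC1.  */
  return (unsigned char) (0xC0 | ((c >> 6) & 0x01));
}

/* Hypothetical model of the fastmap update for the first byte FIRST of
   a unibyte pattern.  A nonzero SKIP_BYTE8 reproduces the check that
   the patch removes.  */
void fill_fastmap(unsigned char fastmap[256], unsigned char first,
                  int skip_byte8)
{
  int c = unibyte_to_char(first);

  memset(fastmap, 0, 256);
  fastmap[first] = 1;
  if (!skip_byte8 || !char_byte8_p(c))
    fastmap[char_leading_code(c)] = 1;
}
```

Under these assumptions, calling fill_fastmap with first = 0xC2 and
skip_byte8 nonzero leaves fastmap[0xC1] at 0, so a fastmap-driven search
over a multibyte buffer whose first byte is 0xC1 would reject the only
possible match position.  With skip_byte8 set to 0 the entry is set.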




Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#4209; Package emacs. (Wed, 27 Jan 2010 14:35:02 GMT) Full text and rfc822 format available.

Message #29 received at 4209 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Kenichi Handa <handa <at> m17n.org>
Cc: 4209 <at> debbugs.gnu.org, cyd <at> stupidchicken.com
Subject: Re: bug#4209: 23.1; Emacs 23.1 regression in re-search-forward
Date: Wed, 27 Jan 2010 09:34:38 -0500
>> > Could you try to investigate Bug#4209?
>> Ok, I'll work on it.
> I fixed it as below.

> === modified file 'src/regex.c'
> --- src/regex.c	2010-01-13 08:35:10 +0000
> +++ src/regex.c	2010-01-27 03:57:03 +0000
> @@ -4083,8 +4083,7 @@
>  		     the corresponding multibyte character.  */
>  		  int c = RE_CHAR_TO_MULTIBYTE (p[1]);
 
> -		  if (! CHAR_BYTE8_P (c))
> -		    fastmap[CHAR_LEADING_CODE (c)] = 1;
> +		  fastmap[CHAR_LEADING_CODE (c)] = 1;
>  		}
>  	    }
>  	  break;

> But, first of all, I don't know (remember) why there was this check:

>    if (! CHAR_BYTE8_P (c))

> I may have overlooked something.  Stefan, could you please
> confirm that the above change is correct?

The preceding comment keeps me puzzled.  I thought that we only ever
matched re_patterns and buffers of the same multibyteness, i.e. if
a unibyte regexp is matched against a multibyte buffer it should first
be turned into a multibyte regexp and then re_compiled, so the case of:

		  /* For the case of matching this unibyte regex
		     against multibyte, we must set a leading code of
		     the corresponding multibyte character.  */

should never happen in analyse_first.  Yet, if your patch fixes the bug,
that indicates that apparently it *does* happen.


        Stefan




Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#4209; Package emacs. (Wed, 27 Jan 2010 16:44:02 GMT) Full text and rfc822 format available.

Message #32 received at 4209 <at> debbugs.gnu.org (full text, mbox):

From: Chong Yidong <cyd <at> stupidchicken.com>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: 4209 <at> debbugs.gnu.org, Kenichi Handa <handa <at> m17n.org>
Subject: Re: bug#4209: 23.1; Emacs 23.1 regression in re-search-forward
Date: Wed, 27 Jan 2010 11:43:30 -0500
Stefan Monnier <monnier <at> iro.umontreal.ca> writes:

> The preceding comment keeps me puzzled.  I thought that we only ever
> matched re_patterns and buffers of the same multibyteness, i.e. if
> a unibyte regexp is matched against a multibyte buffer it should first
> be turned into a multibyte regexp and then re_compiled, so the case of:
>
> 		  /* For the case of matching this unibyte regex
> 		     against multibyte, we must set a leading code of
> 		     the corresponding multibyte character.  */
>
> should never happen in analyse_first.  Yet, if your patch fixes the bug,
> that indicates that apparently it *does* happen.

I observe that in the original bug recipe:

  (set-buffer (get-buffer-create "*Test Buffer*"))
  (insert "\xC2\x4C\xEF\x77\xC6\x69\x8C\x0A")
  (goto-char (point-min))
  (message "looking-at: %s" (looking-at "\\`\xC2\x4C\xEF\x77\xC6\x69\x8C\x0A"))
  (message "re-search-forward: %s"
           (re-search-forward "\\`\xC2\x4C\xEF\x77\xC6\x69\x8C\x0A" 100 t))

If we replace

  (re-search-forward "\\`\xC2\x4C\xEF\x77\xC6\x69\x8C\x0A" 100 t))

with

  (re-search-forward (string-to-multibyte
                       "\\`\xC2\x4C\xEF\x77\xC6\x69\x8C\x0A") 100 t))

then the regexp match takes place correctly.  I'm not sure why the
looking-at call works, though.
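
The difference between the two calls can be pictured with a short C
sketch of a raw-byte-preserving unibyte to multibyte conversion.  It
assumes the Emacs 23 internal encoding (ASCII bytes unchanged, each byte
0x80..0xFF stored as a leading 0xC0 or 0xC1 plus one trailing byte), and
to_multibyte is a hypothetical helper, not an Emacs function.

```c
#include <assert.h>
#include <stddef.h>

/* Convert N unibyte bytes in SRC to the assumed multibyte encoding.
   DST must have room for 2 * N bytes.  Returns the bytes written.  */
size_t to_multibyte(const unsigned char *src, size_t n, unsigned char *dst)
{
  size_t out = 0;
  size_t i;

  assert(src != NULL && dst != NULL);
  for (i = 0; i < n; i++)
    {
      unsigned char b = src[i];

      if (b < 0x80)
        dst[out++] = b;                         /* ASCII: one byte */
      else
        {
          /* Raw byte: leading code 0xC0/0xC1, then a trailing byte.  */
          dst[out++] = (unsigned char) (0xC0 | ((b >> 6) & 0x01));
          dst[out++] = (unsigned char) (0x80 | (b & 0x3F));
        }
    }
  return out;
}
```

Under that assumption the eight bytes of the test string become twelve
bytes beginning with 0xC1 0x82, so a pattern converted up front already
starts with the same leading byte as the buffer text.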




Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#4209; Package emacs. (Thu, 28 Jan 2010 01:20:03 GMT) Full text and rfc822 format available.

Message #35 received at 4209 <at> debbugs.gnu.org (full text, mbox):

From: Kenichi Handa <handa <at> m17n.org>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: 4209 <at> debbugs.gnu.org, cyd <at> stupidchicken.com
Subject: Re: bug#4209: 23.1; Emacs 23.1 regression in re-search-forward
Date: Thu, 28 Jan 2010 10:18:55 +0900
In article <jwviqanliww.fsf-monnier+emacs <at> gnu.org>, Stefan Monnier <monnier <at> iro.umontreal.ca> writes:

> The preceding comment keeps me puzzled.  I thought that we only ever
> matched re_patterns and buffers of the same multibyteness, i.e. if
> a unibyte regexp is matched against a multibyte buffer it should first
> be turned into a multibyte regexp and then re_compiled, so the case of:

Before we changed the behavior of unibyte->multibyte
conversion, that conversion depended on the preferred
charset (thus on lang. env.).  But, Emacs 22 wrongly cached
the pattern converted at some point, and reused it without
checking the change of preferred charset.

So, in emacs-unicode branch, I fixed the regex code so that
unibyte pattern can be directly used for multibyte buffer
search by doing unibyte->multibyte conversion on the fly.
And that code was merged to trunk.

So, 

> 		  /* For the case of matching this unibyte regex
> 		     against multibyte, we must set a leading code of
> 		     the corresponding multibyte character.  */

really happens.

---
Kenichi Handa
handa <at> m17n.org
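
The on-the-fly conversion described above can be sketched in C as
follows.  This is only an illustration under the same assumed raw-byte
encoding (leading 0xC0/0xC1 plus one trailing byte); decode_char and
match_unibyte_pattern are hypothetical names, and the decoder handles
only ASCII and raw-byte sequences.

```c
#include <assert.h>
#include <stddef.h>

/* Decode one character from P with AVAIL bytes available.  Handles
   ASCII and raw-byte sequences only; returns -1 for anything else.  */
static int decode_char(const unsigned char *p, size_t avail, size_t *len)
{
  if (avail >= 1 && p[0] < 0x80)
    {
      *len = 1;
      return p[0];
    }
  if (avail >= 2 && (p[0] == 0xC0 || p[0] == 0xC1)
      && (p[1] & 0xC0) == 0x80)
    {
      *len = 2;
      return 0x3FFF00 + (0x80 | ((p[0] & 0x01) << 6) | (p[1] & 0x3F));
    }
  return -1;
}

/* Return 1 if the unibyte pattern PAT (N bytes) matches the start of
   the multibyte TEXT (TEXT_LEN bytes), converting each pattern byte
   to a character as it is compared.  */
int match_unibyte_pattern(const unsigned char *pat, size_t n,
                          const unsigned char *text, size_t text_len)
{
  size_t pos = 0;
  size_t i;

  assert(pat != NULL && text != NULL);
  for (i = 0; i < n; i++)
    {
      size_t len = 0;
      int pc = pat[i] < 0x80 ? pat[i] : 0x3FFF00 + pat[i];
      int tc = decode_char(text + pos, text_len - pos, &len);

      if (tc < 0 || tc != pc)
        return 0;
      pos += len;
    }
  return 1;
}
```

A matcher of this shape compares characters rather than bytes, so it
does not depend on the pattern and the text sharing a first byte.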




bug closed, send any further explanations to "Christopher J. Madsen" <cjm <at> cjmweb.net> Request was from Chong Yidong <cyd <at> stupidchicken.com> to control <at> debbugs.gnu.org. (Thu, 28 Jan 2010 17:28:02 GMT) Full text and rfc822 format available.

Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#4209; Package emacs. (Thu, 28 Jan 2010 19:03:02 GMT) Full text and rfc822 format available.

Message #40 received at 4209 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Monnier <monnier <at> IRO.UMontreal.CA>
To: Kenichi Handa <handa <at> m17n.org>
Cc: 4209 <at> debbugs.gnu.org, cyd <at> stupidchicken.com
Subject: Re: bug#4209: 23.1; Emacs 23.1 regression in re-search-forward
Date: Thu, 28 Jan 2010 14:01:55 -0500
> So, in emacs-unicode branch, I fixed the regex code so that
> unibyte pattern can be directly used for multibyte buffer
> search by doing unibyte->multibyte conversion on the fly.
> And that code was merged to trunk.

Hmm... that's too bad since the subsequent change to get rid of the
dependency on locales made this change unnecessary.

But given this, yes, the patch looks right, and no, I have no idea what
the CHAR_BYTE8_P test was trying to do.


        Stefan




Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#4209; Package emacs. (Fri, 29 Jan 2010 06:16:02 GMT) Full text and rfc822 format available.

Message #43 received at 4209 <at> debbugs.gnu.org (full text, mbox):

From: Kenichi Handa <handa <at> m17n.org>
To: Stefan Monnier <monnier <at> IRO.UMontreal.CA>
Cc: 4209 <at> debbugs.gnu.org, cyd <at> stupidchicken.com
Subject: Re: bug#4209: 23.1; Emacs 23.1 regression in re-search-forward
Date: Fri, 29 Jan 2010 15:15:28 +0900
In article <jwvvdemoy0b.fsf-monnier+emacs <at> gnu.org>, Stefan Monnier <monnier <at> IRO.UMontreal.CA> writes:

> > So, in emacs-unicode branch, I fixed the regex code so that
> > unibyte pattern can be directly used for multibyte buffer
> > search by doing unibyte->multibyte conversion on the fly.
> > And that code was merged to trunk.

> Hmm... that's too bad since the subsequent change to get rid of the
> dependency on locales made this change unnecessary.

Yes.  I'll put this in my todo list (but with lower priority).

   * avoid on-the-fly uni<->multi conversion in regex.c.

> But given this, yes, the patch looks right, and no, I have no idea what
> the CHAR_BYTE8_P test was trying to do.

Ok, thank you for the confirmation.

---
Kenichi Handa
handa <at> m17n.org




bug archived. Request was from Debbugs Internal Request <bug-gnu-emacs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 26 Feb 2010 12:24:03 GMT) Full text and rfc822 format available.

This bug report was last modified 15 years and 114 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.