GNU bug report logs - #9681
Broken behaviour of re-search-backward (.+ matching only a single character)


Package: emacs;

Reported by: Štěpán Němec <stepnem <at> gmail.com>

Date: Thu, 6 Oct 2011 09:20:02 UTC

Severity: minor

Tags: notabug

Merged with 11025, 24801

Found in versions 23.1, 25.1

Done: npostavs <at> users.sourceforge.net

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 9681 in the body.
You can then email your comments to 9681 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#9681; Package emacs. (Thu, 06 Oct 2011 09:20:02 GMT)

Acknowledgement sent to Štěpán Němec <stepnem <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Thu, 06 Oct 2011 09:20:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Štěpán Němec <stepnem <at> gmail.com>
To: bug-gnu-emacs <bug-gnu-emacs <at> gnu.org>
Subject: Broken behaviour of re-search-backward (.+ matching only a single
	character)
Date: Thu, 06 Oct 2011 11:13:26 +0200
Quoting from <http://permalink.gmane.org/gmane.emacs.gnus.user/15052>:

===========

> What am I doing wrong?

Nothing, I think :-). I personally don't use fancy splitting, but a
deeper look at (at least Gnus 5.13's) code seems to locate the culprit
in Emacs' *backward* regular expression "non-greedity": Position point
at the end of "bugzilla.gdm", C-u C-r "\w+" - et voilà, only one
character is matched.

============
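The interactive recipe above (C-u C-r is a backward regexp isearch) can be approximated non-interactively with the underlying search functions. A minimal sketch, assuming a temporary buffer in the default fundamental mode, where letters are word constituents and "." is punctuation; the function name `my-bug-9681-demo' is hypothetical:

```elisp
(defun my-bug-9681-demo ()
  "Hypothetical demo: search for a word backward and forward.
Return a list (BACKWARD-MATCH FORWARD-MATCH) for \"bugzilla.gdm\"."
  (with-temp-buffer
    (insert "bugzilla.gdm")             ; point is now at the end
    (list
     ;; Backward from the end: the search stops at the last start
     ;; position where \w+ can match, i.e. on the final "m".
     (save-excursion
       (re-search-backward "\\w+")
       (match-string 0))
     ;; Forward from the beginning: greedy \w+ takes the whole word.
     (save-excursion
       (goto-char (point-min))
       (re-search-forward "\\w+")
       (match-string 0)))))

(my-bug-9681-demo) ; => ("m" "bugzilla")
```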

If this curious inconsistency of `re-search-backward' with
`re-search-forward' is intentional (which I hope it is not), it should
be documented, but I couldn't find anything in the manuals or
docstrings.

-- 
Štěpán




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#9681; Package emacs. (Thu, 06 Oct 2011 12:58:02 GMT)

Message #8 received at 9681 <at> debbugs.gnu.org:

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Štěpán Němec <stepnem <at> gmail.com>
Cc: 9681 <at> debbugs.gnu.org
Subject: Re: bug#9681: Broken behaviour of re-search-backward (.+ matching
	only a single character)
Date: Thu, 06 Oct 2011 08:57:09 -0400
> If this curious inconsistency of `re-search-backward' with
> `re-search-forward' is intentional (which I hope it is not), it should
> be documented, but I couldn't find anything in the manuals or
> docstrings.

re-search-* stops at the first character position that has a match.
And then it chooses the longest match at that position.
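Read literally, this rule is a two-level loop: step the start position backward from point, and at each start run an ordinary forward match. A minimal model of that in Lisp, assuming that `looking-at' at each start is a fair stand-in for the engine's own matching and that narrowing to the origin approximates the rule that a match may not run past it. This is not Emacs's actual C implementation, and `my-backward-search-model' is a hypothetical name:

```elisp
(defun my-backward-search-model (regexp)
  "Hypothetical model of how a backward regexp search picks its match.
Try each start position from point back to `point-min'; at the first
one where REGEXP matches forward, return the matched string."
  (let ((origin (point)))
    (save-restriction
      ;; Assumption: the match may not extend past the search origin.
      (narrow-to-region (point-min) origin)
      (catch 'found
        (while t
          (when (looking-at regexp)
            ;; Whatever the forward matcher produces at this start
            ;; (the longest run for a greedy pattern like \w+).
            (throw 'found (match-string 0)))
          (if (bobp)
              (throw 'found nil)     ; no start position matched
            (backward-char 1)))))))

(with-temp-buffer
  (insert "foobar")
  (my-backward-search-model "\\w+")) ; => "r"
```

The model stops at the first start position that works, which is why the result is "r" rather than "foobar".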


        Stefan




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#9681; Package emacs. (Thu, 06 Oct 2011 18:55:02 GMT)

Message #11 received at 9681 <at> debbugs.gnu.org:

From: Štěpán Němec <stepnem <at> gmail.com>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: 9681 <at> debbugs.gnu.org
Subject: Re: bug#9681: Broken behaviour of re-search-backward (.+ matching
	only a single character)
Date: Thu, 06 Oct 2011 20:48:51 +0200
[Stefan: sorry for the two replies; I forgot to cc the bug list in my
first reply.  Also, I've changed my mind on some of the points since
then, see below.]

On Thu, Oct 06, 2011 at 08:57:09AM -0400, Stefan Monnier wrote:
> > If this curious inconsistency of `re-search-backward' with
> > `re-search-forward' is intentional (which I hope it is not), it should
> > be documented, but I couldn't find anything in the manuals or
> > docstrings.
>
> re-search-* stops at the first character position that has a match.
> And then it chooses the longest match at that position.

Thanks, but I'm not sure I understand what you mean here. Naturally, the
longest match for `re-search-backward' should be backward, not forward,
i.e. using your wording above, when searching _backward_ for \w+ in
"foobar|" where "|" is point, the "first character position that has a
match" might be "r", but it's hardly the longest match.

If I'm the only one who considers this behaviour broken (by design?[1]),
which I very much doubt, it definitely needs to at least be documented,
as I'm certainly not the only one who is very surprised by this
behaviour. In my opinion it should be fixed, though.

[1] Cf. e.g. ?\w\+ in Vim, which does the right thing.

--
Štěpán




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#9681; Package emacs. (Thu, 06 Oct 2011 19:47:01 GMT)

Message #14 received at 9681 <at> debbugs.gnu.org:

From: Johan Bockgård <bojohan <at> gnu.org>
To: Štěpán Němec <stepnem <at> gmail.com>
Cc: 9681 <at> debbugs.gnu.org
Subject: Re: bug#9681: Broken behaviour of re-search-backward (.+ matching
	only a single character)
Date: Thu, 06 Oct 2011 21:46:32 +0200
Štěpán Němec <stepnem <at> gmail.com> writes:

> If this curious inconsistency of `re-search-backward' with
> `re-search-forward' is intentional (which I hope it is not), it should
> be documented, but I couldn't find anything in the manuals or
> docstrings.

Then you must not have looked very hard.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#9681; Package emacs. (Fri, 07 Oct 2011 13:03:02 GMT)

Message #17 received at 9681 <at> debbugs.gnu.org:

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Štěpán Němec <stepnem <at> gmail.com>
Cc: 9681 <at> debbugs.gnu.org
Subject: Re: bug#9681: Broken behaviour of re-search-backward (.+ matching
	only a single character)
Date: Fri, 07 Oct 2011 09:02:18 -0400
>> re-search-* stops at the first character position that has a match.
>> And then it chooses the longest match at that position.
> Thanks, but I'm not sure I understand what you mean here. Naturally, the
> longest match for `re-search-backward' should be backward, not forward,

Ah, yes, sorry for being unclear: the search for a match goes backward,
but the matching itself goes forward.

The docstring of re-search-backward is more clear about that:

   The match found is the one starting last in the buffer
   and yet ending before the origin of the search.
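One consequence of "starting last" is that a backward search only returns a whole word when the regexp itself refuses to start in the middle of one. A minimal sketch contrasting \w+ with \<\w+ (\< is the standard word-start construct); the helper name `my-match-before-point' is hypothetical and the examples assume the default syntax table:

```elisp
(defun my-match-before-point (text regexp)
  "Hypothetical helper: search backward for REGEXP from the end of TEXT.
Return the matched string, or nil if there is no match."
  (with-temp-buffer
    (insert text)                       ; point is at the end of TEXT
    (when (re-search-backward regexp nil t)
      (match-string 0))))

;; The match returned is the one *starting* last, so an unanchored
;; \w+ starts on the final character ...
(my-match-before-point "foobar" "\\w+")          ; => "r"
;; ... whereas \< allows a start only at the beginning of a word.
(my-match-before-point "foobar" "\\<\\w+")       ; => "foobar"
(my-match-before-point "bugzilla.gdm" "\\<\\w+") ; => "gdm"
```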

> If I'm the only one who considers this behaviour broken (by design?[1]),

It's not the ideal behavior, admittedly.  It's even more obvious in
`looking-back'.  But fixing it would require the implementation of
a backward regexp matcher.
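The `looking-back' case can be seen directly. Its optional GREEDY argument, per its docstring, extends the match backwards as far as possible after the initial search. A minimal sketch; `my-looking-back-demo' is a hypothetical name, and LIMIT is passed as `point-min' only to keep the example short:

```elisp
(defun my-looking-back-demo (greedy)
  "Hypothetical demo: what `looking-back' matches at the end of \"foobar\"."
  (with-temp-buffer
    (insert "foobar")                   ; point at end of buffer
    (when (looking-back "\\w+" (point-min) greedy)
      (match-string 0))))

(my-looking-back-demo nil) ; => "r"
(my-looking-back-demo t)   ; => "foobar"
```

GREEDY stops as soon as one more preceding character cannot be part of a match, so it is a heuristic rather than a true backward matcher.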


        Stefan




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#9681; Package emacs. (Fri, 07 Oct 2011 13:26:02 GMT)

Message #20 received at 9681 <at> debbugs.gnu.org:

From: Štěpán Němec <stepnem <at> gmail.com>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: 9681 <at> debbugs.gnu.org
Subject: Re: bug#9681: Broken behaviour of re-search-backward (.+ matching
	only a single character)
Date: Fri, 7 Oct 2011 15:19:56 +0200
On Fri, Oct 07, 2011 at 09:02:18AM -0400, Stefan Monnier wrote:
> >> re-search-* stops at the first character position that has a match.
> >> And then it chooses the longest match at that position.
> > Thanks, but I'm not sure I understand what you mean here. Naturally, the
> > longest match for `re-search-backward' should be backward, not forward,
> 
> Ah, yes, sorry for being unclear: the search for a match goes backward,
> but the matching itself goes forward.
> 
> The docstring of re-search-backward is more clear about that:
> 
>    The match found is the one starting last in the buffer
>    and yet ending before the origin of the search.

I suppose that is more clear if you already know the behaviour, but I
didn't understand it that way, either. I think it should at least add
that the match is still forward, not backward, and that it might not
behave as expected for regexps containing constructs like * and +.
 
> > If I'm the only one who considers this behaviour broken (by design?[1]),
> 
> It's not the ideal behavior, admittedly.  It's even more obvious in
> `looking-back'.  But fixing it would require the implementation of
> a backward regexp matcher.

Yeah, as I said above (and as is obvious in the message quoted in the
bug report), the set of regexps usable with `re-search-backward' seems
to be quite limited, and one has to be very careful when using it (and
even some developers apparently fail at that).

So, again: it definitely needs better documentation, and IMO it also
needs fixing.

-- 
Štěpán




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#9681; Package emacs. (Tue, 11 Oct 2011 02:04:02 GMT)

Message #23 received at 9681 <at> debbugs.gnu.org:

From: Kenichi Handa <handa <at> m17n.org>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: 9681 <at> debbugs.gnu.org, stepnem <at> gmail.com
Subject: Re: bug#9681: Broken behaviour of re-search-backward (.+ matching
	only a single character)
Date: Tue, 11 Oct 2011 11:03:26 +0900
In article <jwvr52pj706.fsf-monnier+emacs <at> gnu.org>, Stefan Monnier <monnier <at> iro.umontreal.ca> writes:

> It's not the ideal behavior, admittedly.  It's even more obvious in
> `looking-back'.  But fixing it would require the implementation of
> a backward regexp matcher.

FYI, in Mule (the version before it was integrated into Emacs), we
implemented such a feature as follows:

  o A regular expression compiler, written in Elisp, that
    generates both forward-matching and backward-matching
    compiled patterns.

  o A modified regex.c that accepts the above patterns and does
    backward matching when necessary.
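Short of a real backward matcher like Mule's (which compiled reversed patterns), a cheaper approximation is to take the match `re-search-backward' finds and push its start back one character at a time for as long as REGEXP still matches up to the same end position; this is the same idea the GREEDY argument of `looking-back' uses. A minimal sketch; `my-re-search-backward-greedy' is hypothetical and, unlike `re-search-backward', takes no NOERROR or COUNT argument (it simply returns nil on failure):

```elisp
(defun my-re-search-backward-greedy (regexp &optional bound)
  "Hypothetical: like `re-search-backward', but extend the match start
backward while REGEXP still matches up to the same end.  Return t and
leave point at the match start, or return nil if there is no match."
  (when (re-search-backward regexp bound t)
    (let ((start (match-beginning 0))
          (limit (or bound (point-min)))
          ;; Force the pattern to match all the way to the end of the
          ;; accessible region.
          (anchored (concat "\\(?:" regexp "\\)\\'")))
      (save-restriction
        ;; Hide the text after the match, so that the end-of-text
        ;; anchor in ANCHORED matches exactly at the old match end.
        (narrow-to-region (point-min) (match-end 0))
        ;; Stop at the first character that cannot extend the match.
        (while (and (> start limit)
                    (save-excursion
                      (goto-char (1- start))
                      (looking-at anchored)))
          (setq start (1- start)))
        (goto-char start)
        ;; Recompute the match data for the final, extended match.
        (looking-at anchored)))))

(with-temp-buffer
  (insert "bugzilla.gdm")
  (my-re-search-backward-greedy "\\w+")
  (match-string 0)) ; => "gdm"
```

Because it stops at the first character that cannot extend the match, it can still miss a longer match that starts further back; it approximates backward matching without implementing it.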

I vaguely remember that I discussed this feature with RMS
when we were going to integrate Mule's multilingual features
into Emacs, and it was rejected because it was not related to the
multilingual features.  And actually, that feature had been
used very rarely.

---
Kenichi Handa
handa <at> m17n.org




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#9681; Package emacs. (Tue, 11 Oct 2011 03:57:01 GMT)

Message #26 received at 9681 <at> debbugs.gnu.org:

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Kenichi Handa <handa <at> m17n.org>
Cc: 9681 <at> debbugs.gnu.org, stepnem <at> gmail.com
Subject: Re: bug#9681: Broken behaviour of re-search-backward (.+ matching
	only a single character)
Date: Mon, 10 Oct 2011 23:56:27 -0400
>> It's not the ideal behavior, admittedly.  It's even more obvious in
>> `looking-back'.  But fixing it would require the implementation of
>> a backward regexp matcher.
> FYI, in Mule (the version before it was integrated into Emacs), we
> implemented such a feature as follows:

I actually think it would be nice to have such a thing, but I also think
it'd be more important to have a non-backtracking regexp matcher.

> multilingual features.  And actually, that feature had been
> used very rarely.

Indeed, it's not often needed, but those few cases can be significant.


        Stefan




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#9681; Package emacs. (Fri, 16 Mar 2012 16:50:02 GMT)

Message #29 received at submit <at> debbugs.gnu.org:

From: Jack Duthen <duthen.mac.01 <at> gmail.com>
To: bug-gnu-emacs <at> gnu.org
Subject: Re: bug#9681: Broken behaviour of re-search-backward (.+ matching
	only a single character)
Date: Fri, 16 Mar 2012 16:49:37 +0100
On Thu, Oct 06, 2011 at 08:57:09AM -0400, Stefan Monnier wrote:
> re-search-* stops at the first character position that has a match.
> And then it chooses the longest match at that position.

Stepan wrote:
> So, again: it definitely needs better documentation,
> and IMO it also needs fixing.

Hi!

For my own imenu-prev-index-position-function, I needed
a backward regexp search that would match something like ".+"
the way one (like Stepan) would expect, rather than the way it
actually does (as described by Stefan).

So, I just wrote a function to do that.

The way it handles the COUNT argument is not as good as one could want,
but as I almost never use it, I don't care.
It's not very efficient, but since I can't notice the time it takes
when used from the "*Rescan*" menu item, and since I can't think of a
better algorithm, it's OK for me.

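(defun jd-re-search-backward (regexp &optional bound noerror count)
  "Like `re-search-backward', but prefer the start of the last forward match.
Forward matches are scanned from BOUND (or `point-min') up to point."
  (let ((orig-point (point)) bom)
    (when (re-search-backward regexp bound noerror count)
      (setq bom (point)) ; fallback, normally overwritten below
      (goto-char (or bound (point-min)))
      (while (and (< (point) orig-point)
                  (re-search-forward regexp orig-point t))
        ;; remember the last beginning of match
        (setq bom (match-beginning 0))
        ;; step over an empty match, or the loop never terminates
        (when (and (= (match-beginning 0) (match-end 0))
                   (< (point) orig-point))
          (forward-char 1)))
      (goto-char bom)
      ;; set match data (erased by the last failing search), keeping the
      ;; match from running past the original point, and return T
      (save-restriction
        (narrow-to-region (point-min) orig-point)
        (looking-at regexp)))))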

HTH
)jack(




Forcibly Merged 9681 11025. Request was from Glenn Morris <rgm <at> gnu.org> to control <at> debbugs.gnu.org. (Fri, 16 Mar 2012 17:49:02 GMT)

Forcibly Merged 9681 11025 24801. Request was from npostavs <at> users.sourceforge.net to control <at> debbugs.gnu.org. (Sat, 29 Oct 2016 01:19:01 GMT)

Bug closed; send any further explanations to 24801 <at> debbugs.gnu.org and Drew Adams <drew.adams <at> oracle.com>. Request was from npostavs <at> users.sourceforge.net to control <at> debbugs.gnu.org. (Sat, 29 Oct 2016 01:19:02 GMT)

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sat, 26 Nov 2016 12:24:03 GMT)




GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.