GNU bug report logs -
#9681
Broken behaviour of re-search-backward (.+ matching only a single character)
Reported by: Štěpán Němec <stepnem <at> gmail.com>
Date: Thu, 6 Oct 2011 09:20:02 UTC
Severity: minor
Tags: notabug
Merged with 11025, 24801
Found in versions 23.1, 25.1
Done: npostavs <at> users.sourceforge.net
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 9681 in the body.
You can then email your comments to 9681 AT debbugs.gnu.org in the normal way.
Report forwarded to bug-gnu-emacs <at> gnu.org: bug#9681; Package emacs. (Thu, 06 Oct 2011 09:20:02 GMT)
Acknowledgement sent to Štěpán Němec <stepnem <at> gmail.com>: New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Thu, 06 Oct 2011 09:20:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
Quoting from <http://permalink.gmane.org/gmane.emacs.gnus.user/15052>:
===========
> What am I doing wrong?
Nothing, I think :-). I personally don't use fancy splitting,
but a deeper look at (at least Gnus 5.13's) code seems to
locate the culprit in Emacs' *backward* regular expression
"non-greedity": Position point at the end of "bugzilla.gdm",
C-u C-r "\w+" - et voilà, only one character is matched.
============
If this curious inconsistency of `re-search-backward' with
`re-search-forward' is intentional (which I hope it is not), it should
be documented, but I couldn't find anything in the manuals or
docstrings.
--
Štěpán
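
The report above can be reproduced non-interactively. The following is a minimal sketch, assuming a temporary buffer with the default syntax table (where "." is punctuation); the function name `demo-backward-vs-forward` is made up for illustration and is not part of Emacs.

```elisp
;; Hypothetical demo function; only `re-search-backward',
;; `re-search-forward' and `match-string' are real Emacs APIs.
(defun demo-backward-vs-forward ()
  "Return the text matched by \\w+ searching backward, then forward."
  (with-temp-buffer
    (insert "bugzilla.gdm")
    (list
     ;; Backward from the end of the buffer: only one character matches.
     (progn (goto-char (point-max))
            (re-search-backward "\\w+")
            (match-string 0))
     ;; Forward from the start of the buffer: the whole word matches.
     (progn (goto-char (point-min))
            (re-search-forward "\\w+")
            (match-string 0)))))

(demo-backward-vs-forward) ; => ("m" "bugzilla")
```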
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#9681; Package emacs. (Thu, 06 Oct 2011 12:58:02 GMT)
Message #8 received at 9681 <at> debbugs.gnu.org (full text, mbox):
> If this curious inconsistency of `re-search-backward' with
> `re-search-forward' is intentional (which I hope it is not), it should
> be documented, but I couldn't find anything in the manuals or
> docstrings.
re-search-* stops at the first character position that has a match.
And then it chooses the longest match at that position.
Stefan
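
Both halves of that rule can be seen in one search. This is only an illustrative sketch: the buffer text, the regexp and the function name `demo-backward-match-extent` are assumptions chosen for the example.

```elisp
;; Hypothetical demo function, not part of Emacs.
(defun demo-backward-match-extent ()
  "Return (START TEXT) for a backward search in \"foo123bar\"."
  (with-temp-buffer
    (insert "foo123bar")
    (goto-char (point-max))
    ;; Scanning backward, the first position with a match is the
    ;; second "o" (buffer position 3).  From that position the match
    ;; is extended forward as far as possible, so it covers "o123".
    (re-search-backward "o+[0-9]*")
    (list (match-beginning 0) (match-string 0))))

(demo-backward-match-extent) ; => (3 "o123")
```

The match does not reach back to the first "o", because the scan stops as soon as some position matches.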
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#9681; Package emacs. (Thu, 06 Oct 2011 18:55:02 GMT)
Message #11 received at 9681 <at> debbugs.gnu.org (full text, mbox):
[Stefan: sorry for the two replies. I forgot to cc the bug list in my
first reply, and I've changed my mind on some of the points since then,
see below.]
On Thu, Oct 06, 2011 at 08:57:09AM -0400, Stefan Monnier wrote:
> > If this curious inconsistency of `re-search-backward' with
> > `re-search-forward' is intentional (which I hope it is not), it should
> > be documented, but I couldn't find anything in the manuals or
> > docstrings.
>
> re-search-* stops at the first character position that has a match.
> And then it chooses the longest match at that position.
Thanks, but I'm not sure I understand what you mean here. Naturally, the
longest match for `re-search-backward' should be backward, not forward,
i.e. using your wording above, when searching _backward_ for \w+ in
"foobar|" where "|" is point, the "first character position that has a
match" might be "r", but it's hardly the longest match.
If I'm the only one who considers this behaviour broken (by design?[1]),
which I very much doubt, it definitely needs to at least be documented,
as I'm certainly not the only one who is very surprised by this
behaviour. In my opinion it should be fixed, though.
[1] Cf. e.g. ?\w\+ in Vim, which does the right thing.
--
Štěpán
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#9681; Package emacs. (Thu, 06 Oct 2011 19:47:01 GMT)
Message #14 received at 9681 <at> debbugs.gnu.org (full text, mbox):
Štěpán Němec <stepnem <at> gmail.com> writes:
> If this curious inconsistency of `re-search-backward' with
> `re-search-forward' is intentional (which I hope it is not), it should
> be documented, but I couldn't find anything in the manuals or
> docstrings.
Then you must not have looked very hard.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#9681; Package emacs. (Fri, 07 Oct 2011 13:03:02 GMT)
Message #17 received at 9681 <at> debbugs.gnu.org (full text, mbox):
>> re-search-* stops at the first character position that has a match.
>> And then it chooses the longest match at that position.
> Thanks, but I'm not sure I understand what you mean here. Naturally, the
> longest match for `re-search-backward' should be backward, not forward,
Ah, yes, sorry for being unclear: the search for a match goes backward,
but the matching itself goes forward.
The docstring of re-search-backward is more clear about that:
The match found is the one starting last in the buffer
and yet ending before the origin of the search.
> If I'm the only one who considers this behaviour broken (by design?[1]),
It's not the ideal behavior, admittedly. It's even more obvious in
`looking-back'. But fixing it would require the implementation of
a backward regexp matcher.
Stefan
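
The `looking-back' case mentioned above can be sketched as follows. The example assumes the optional GREEDY argument of `looking-back', which extends the match backward as far as possible; the function name `demo-looking-back-starts` is invented for this illustration.

```elisp
;; Hypothetical demo function, not part of Emacs.
(defun demo-looking-back-starts ()
  "Return the match starts of plain and greedy `looking-back'."
  (with-temp-buffer
    (insert "foobar")
    ;; Point is now at the end of the buffer, right after "foobar".
    (list
     ;; Without GREEDY the match starts as late as possible: only "r".
     (progn (looking-back "\\w+" (point-min))
            (match-beginning 0))
     ;; With GREEDY non-nil the match is extended backward to "foobar".
     (progn (looking-back "\\w+" (point-min) t)
            (match-beginning 0)))))

(demo-looking-back-starts) ; => (6 1)
```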
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#9681; Package emacs. (Fri, 07 Oct 2011 13:26:02 GMT)
Message #20 received at 9681 <at> debbugs.gnu.org (full text, mbox):
On Fri, Oct 07, 2011 at 09:02:18AM -0400, Stefan Monnier wrote:
> >> re-search-* stops at the first character position that has a match.
> >> And then it chooses the longest match at that position.
> > Thanks, but I'm not sure I understand what you mean here. Naturally, the
> > longest match for `re-search-backward' should be backward, not forward,
>
> Ah, yes, sorry for being unclear: the search for a match goes backward,
> but the matching itself goes forward.
>
> The docstring of re-search-backward is more clear about that:
>
> The match found is the one starting last in the buffer
> and yet ending before the origin of the search.
I suppose that is more clear if you already know the behaviour, but I
didn't understand it that way, either. I think it should at least add
that the match is still forward, not backward, and that it might not
behave as expected for regexps containing constructs like * and +.
> > If I'm the only one who considers this behaviour broken (by design?[1]),
>
> It's not the ideal behavior, admittedly. It's even more obvious in
> `looking-back'. But fixing it would require the implementation of
> a backward regexp matcher.
Yeah, as I said above (and as is obvious in the message quoted in the
bug report), the set of regexps usable with `re-search-backward' seems
to be quite limited, and one has to be very careful when using it (and
even some developers apparently fail at that).
So, again: it definitely needs better documentation, and IMO it also
needs fixing.
--
Štěpán
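
For the specific \w+ case, two common ways of being "careful" are to anchor the regexp so that only one start position can match, or to avoid regexps and skip by syntax class. The sketch below assumes the default syntax table; the function name `demo-anchored-backward-search` is hypothetical and this is not presented as a general fix.

```elisp
;; Hypothetical demo function, not part of Emacs.
(defun demo-anchored-backward-search ()
  "Return the word before point, found in two different ways."
  (with-temp-buffer
    (insert "bugzilla.gdm")
    (list
     ;; "\\<" only matches at the beginning of a word, so the backward
     ;; scan cannot stop in the middle of "gdm".
     (progn (goto-char (point-max))
            (re-search-backward "\\<\\w+")
            (match-string 0))
     ;; No regexp at all: move back over word-constituent characters.
     (progn (goto-char (point-max))
            (skip-syntax-backward "w")
            (buffer-substring (point) (point-max))))))

(demo-anchored-backward-search) ; => ("gdm" "gdm")
```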
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#9681; Package emacs. (Tue, 11 Oct 2011 02:04:02 GMT)
Message #23 received at 9681 <at> debbugs.gnu.org (full text, mbox):
In article <jwvr52pj706.fsf-monnier+emacs <at> gnu.org>, Stefan Monnier <monnier <at> iro.umontreal.ca> writes:
> It's not the ideal behavior, admittedly. It's even more obvious in
> `looking-back'. But fixing it would require the implementation of
> a backward regexp matcher.
FYI, in Mule (the version before integrating into Emacs), we
implemented such a feature by doing these:
o Regular expression compiler written in Elisp which
generates both forward matching and backward matching
compiled patterns.
o Modify regex.c to accept the above patterns and do
backward matching if necessary.
I vaguely remember that I discussed this feature with RMS
when we were going to integrate Mule's multilingual feature
into Emacs, and it was rejected because it's not related to
multilingual feature. And actually, that feature had been
used very rarely.
---
Kenichi Handa
handa <at> m17n.org
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#9681; Package emacs. (Tue, 11 Oct 2011 03:57:01 GMT)
Message #26 received at 9681 <at> debbugs.gnu.org (full text, mbox):
>> It's not the ideal behavior, admittedly. It's even more obvious in
>> `looking-back'. But fixing it would require the implementation of
>> a backward regexp matcher.
> FYI, in Mule (the version before integrating into Emacs), we
> implemented such a feature by doing these:
I actually think it would be nice to have such a thing, but I also think
it'd be more important to have a non-backtracking regexp matcher.
> multilingual feature. And actually, that feature had been
> used very rarely.
Indeed, it's not often needed, but those few cases can be significant.
Stefan
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#9681; Package emacs. (Fri, 16 Mar 2012 16:50:02 GMT)
Message #29 received at submit <at> debbugs.gnu.org (full text, mbox):
On Thu, Oct 06, 2011 at 08:57:09AM -0400, Stefan Monnier wrote:
> re-search-* stops at the first character position that has a match.
> And then it chooses the longest match at that position.
Stepan wrote:
> So, again: it definitely needs better documentation,
> and IMO it also needs fixing.
Hi!
For my own imenu-prev-index-position-function, I needed
a backward regexp search which would match something like ".+"
the way one (like Stepan) can expect rather than the way it actually
does (as described by Stefan).
So, I just wrote a function to do that.
The way it handles the COUNT variable is not as good as one could want
but, as I almost never use it, I don't care.
It's not very efficient but, since I can't notice the time it takes
when used in the "*rescan" menu and since I can't imagine a better algorithm,
it's ok for me.
(defun jd-re-search-backward (regexp &optional bound noerror count)
  (let ((orig-point (point)) bom)
    (when (re-search-backward regexp bound noerror count)
      (setq bom (point)) ; should not be useful
      (goto-char (point-min))
      (while (re-search-forward regexp orig-point 'noerror)
        ;; remember the last beginning of match
        (setq bom (match-beginning 0)))
      (goto-char bom)
      ;; set match data (erased by the last failing search) and return T
      (looking-at regexp))))
HTH
)jack(
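
A different way to get a similar effect, without rescanning from the start of the buffer, is to extend the match start backward one character at a time, in the spirit of the GREEDY option of `looking-back'. This is only a sketch under stated assumptions: the name `my-re-search-backward-greedy` is made up, COUNT is not supported, and the extension stops as soon as one additional preceding character cannot be part of a match, so it is not a true backward regexp matcher.

```elisp
;; Hypothetical helper, not part of Emacs; the name is invented here.
(defun my-re-search-backward-greedy (regexp &optional bound noerror)
  "Search backward for REGEXP, then extend the match start backward.
BOUND and NOERROR are passed to `re-search-backward'.
Return the start of the extended match, or nil if nothing matched."
  (when (re-search-backward regexp bound noerror)
    (let ((start (match-beginning 0))
          (end (match-end 0))
          (limit (or bound (point-min)))
          ;; "\\'" forces a match to end at the end of the accessible
          ;; region, which is narrowed to the original match end below.
          (anchored (concat "\\(?:" regexp "\\)\\'")))
      (save-restriction
        (narrow-to-region (point-min) end)
        ;; Step back while REGEXP still matches text ending at END.
        (while (and (> start limit)
                    (save-excursion
                      (goto-char (1- start))
                      (looking-at anchored)))
          (setq start (1- start)))
        (goto-char start)
        ;; Refresh the match data so it describes the extended match.
        (looking-at anchored))
      start)))

(with-temp-buffer
  (insert "bugzilla.gdm")
  (my-re-search-backward-greedy "\\w+")
  (match-string 0)) ; => "gdm"
```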
Forcibly Merged 9681 11025. Request was from Glenn Morris <rgm <at> gnu.org> to control <at> debbugs.gnu.org. (Fri, 16 Mar 2012 17:49:02 GMT)
Forcibly Merged 9681 11025 24801. Request was from npostavs <at> users.sourceforge.net to control <at> debbugs.gnu.org. (Sat, 29 Oct 2016 01:19:01 GMT)
bug closed, send any further explanations to 24801 <at> debbugs.gnu.org and Drew Adams <drew.adams <at> oracle.com>. Request was from npostavs <at> users.sourceforge.net to control <at> debbugs.gnu.org. (Sat, 29 Oct 2016 01:19:02 GMT)
bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sat, 26 Nov 2016 12:24:03 GMT)
This bug report was last modified 8 years and 205 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.