GNU bug report logs - #31584
27.0.50; Document again what match re-search-backward finds
Reported by: Michael Heerdegen <michael_heerdegen <at> web.de>
Date: Thu, 24 May 2018 21:32:02 UTC
Severity: minor
Tags: fixed
Found in version 27.0.50
Fixed in version 26.1
Done: Noam Postavsky <npostavs <at> gmail.com>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 31584 in the body.
You can then email your comments to 31584 AT debbugs.gnu.org in the normal way.
Report forwarded to bug-gnu-emacs <at> gnu.org: bug#31584; Package emacs. (Thu, 24 May 2018 21:32:02 GMT)
Acknowledgement sent to Michael Heerdegen <michael_heerdegen <at> web.de>: New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Thu, 24 May 2018 21:32:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
Hello,
a user asked in emacs-help why
(re-search-backward "a*")
at the end of a line consisting only of "a"s didn't move point. With
today's documentation, that question can't be answered.
Some time ago, we had this sentence in the docstring:
The match found is the one starting last in the buffer
and yet ending before the origin of the search.
but it has been removed. I think we need to say something like that,
otherwise the semantics of backward re search is unclear.
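A minimal reproduction of the behavior asked about, assuming a fresh temporary buffer whose only line is "aaaa" with point at its end. Because "a*" can match the empty string, the very first start position tried (point itself) already yields a match that ends at the origin, so point stays where it was:

```elisp
(with-temp-buffer
  (insert "aaaa")                  ; point is now at the end of the line
  (let ((origin (point)))
    (re-search-backward "a*")
    ;; Point did not move, and the match found is the empty string.
    (list (= (point) origin) (match-string 0))))
;; => (t "")
```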
TIA,
Michael.
In GNU Emacs 27.0.50 (build 30, x86_64-pc-linux-gnu, GTK+ Version 3.22.29)
of 2018-05-24 built on drachen
Repository revision: 98c624708a37bc306130e1499fb1a0c5339a50af
Windowing system distributor 'The X.Org Foundation', version 11.0.11906000
System Description: Debian GNU/Linux buster/sid
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#31584; Package emacs. (Thu, 24 May 2018 21:46:02 GMT)
Message #8 received at submit <at> debbugs.gnu.org (full text, mbox):
Michael Heidegger <michael_heerdegen <at> web.de> writes:
> Hello,
>
> a user asked in emacs-help why
>
> (re-search-backward "a*")
>
> at the end of a line consisting only of "a"s didn't move point. With
> today's documentation, that question can't be answered.
>
> Some time ago, we had this sentence in the docstring:
>
> The match found is the one starting last in the buffer
> and yet ending before the origin of the search.
>
> but it has been removed. I think we need to say something like that,
> otherwise the semantics of backward re search is unclear.
I've been bitten by this before. I'm sure the sentence you cite is
correct, but I would suggest something more explicit about backwards
searches. The most useful thing I could have read when I was wondering
why this didn't work would be something like: "re-search-backward always
behaves "non-greedily", i.e., it will find the shortest match before
point".
That might not be technically correct, but those are the terms that
would have made sense to me: in particular, the "*" token is supposed to
be "greedy", so why isn't it greedy backwards? This doesn't explain why
it isn't, but it would have explicitly told me that it wouldn't be.
Eric
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#31584; Package emacs. (Thu, 24 May 2018 22:00:02 GMT)
Message #11 received at 31584 <at> debbugs.gnu.org (full text, mbox):
Eric Abrahamsen <eric <at> ericabrahamsen.net> writes:
> Michael Heidegger <michael_heerdegen <at> web.de> writes:
FWIW, my last name is "Heerdegen" AFAIK.
> > The match found is the one starting last in the buffer
> > and yet ending before the origin of the search.
> I've been bitten by this before. I'm sure the sentence you cite is
> correct, but I would suggest something more explicit about backwards
> searches. The most useful thing I could have read when I was wondering
> why this didn't work would be something like: "re-search-backward always
> behaves "non-greedily", i.e., it will find the shortest match before
> point".
>
> That might not be technically correct, but those are the terms that
> would have made sense to me: in particular, the "*" token is supposed to
> be "greedy", so why isn't it greedy backwards? This doesn't explain why
> it isn't, but it would have explicitly told me that it wouldn't be.
Without thinking long about it, I guess the above definition, and greedy
operators behaving non-greedy for backwards search, could be equivalent,
more or less.
Michael.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#31584; Package emacs. (Thu, 24 May 2018 22:14:01 GMT)
Message #14 received at submit <at> debbugs.gnu.org (full text, mbox):
Michael Heerdegen <michael_heerdegen <at> web.de> writes:
> Eric Abrahamsen <eric <at> ericabrahamsen.net> writes:
>
>> Michael Heidegger <michael_heerdegen <at> web.de> writes:
>
> FWIW, my last name is "Heerdegen" AFAIK.
It's not too late to change!
I blame `flyspell-auto-correct-previous-word' for this stuff the same
way that other people blame autocorrect on iOS. Apparently I randomly
hit "C-;" a lot.
>> > The match found is the one starting last in the buffer
>> > and yet ending before the origin of the search.
>
>> I've been bitten by this before. I'm sure the sentence you cite is
>> correct, but I would suggest something more explicit about backwards
>> searches. The most useful thing I could have read when I was wondering
>> why this didn't work would be something like: "re-search-backward always
>> behaves "non-greedily", i.e., it will find the shortest match before
>> point".
>>
>> That might not be technically correct, but those are the terms that
>> would have made sense to me: in particular, the "*" token is supposed to
>> be "greedy", so why isn't it greedy backwards? This doesn't explain why
>> it isn't, but it would have explicitly told me that it wouldn't be.
>
> Without thinking long about it, I guess the above definition, and greedy
> operators behaving non-greedy for backwards search, could be equivalent,
> more or less.
I agree they're equivalent, but it would take me longer to think about
it, particularly when I'm trying to make a regexp match and am already
annoyed. But it was just a suggestion -- so long as something gets in
there, I don't mind.
Eric
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#31584; Package emacs. (Thu, 24 May 2018 22:15:02 GMT)
Message #17 received at 31584 <at> debbugs.gnu.org (full text, mbox):
Eric Abrahamsen <eric <at> ericabrahamsen.net> writes:
> Michael Heidegger <michael_heerdegen <at> web.de> writes:
>
>>
>> (re-search-backward "a*")
>>
>> at the end of a line consisting only of "a"s didn't move point. With
>> today's documentation, that question can't be answered.
The docstring should definitely be clarified, but technically it can
still be answered, if you read very carefully:
(re-search-backward REGEXP &optional BOUND NOERROR COUNT)
Search backward from point for regular expression REGEXP.
This function is almost identical to ‘re-search-forward’, except that
by default it searches backward instead of forward, and the sign of
COUNT also indicates exactly the opposite searching direction.
(re-search-forward REGEXP &optional BOUND NOERROR COUNT)
[...]
With COUNT positive/negative, the match found is [...] located
entirely after/before the origin of the search.
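The COUNT clause quoted above implies that a negative COUNT turns `re-search-forward' into a backward search. A small check, assuming a buffer containing "xxxxyyyy" with point at its end; both calls should report the same match:

```elisp
(with-temp-buffer
  (insert "xxxxyyyy")
  (let (backward forward-neg)
    (re-search-backward "x+y*" nil t)
    (setq backward (match-string 0))
    (goto-char (point-max))        ; the first search moved point to 4
    ;; COUNT = -1: search backward, match located before the origin.
    (re-search-forward "x+y*" nil t -1)
    (setq forward-neg (match-string 0))
    (list backward forward-neg)))
;; => ("xyyyy" "xyyyy")
```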
>> Some time ago, we had this sentence in the docstring:
>>
>> The match found is the one starting last in the buffer
>> and yet ending before the origin of the search.
>>
>> but it has been removed. I think we need to say something like that,
>> otherwise the semantics of backward re search is unclear.
Yeah, it is sufficiently surprising that it should be called out
specifically.
> I've been bitten by this before. I'm sure the sentence you cite is
> correct, but I would suggest something more explicit about backwards
> searches. The most useful thing I could have read when I was wondering
> why this didn't work would be something like: "re-search-backward always
> behaves "non-greedily", i.e., it will find the shortest match before
> point".
It is greedy:
(with-temp-buffer
  (insert "xxxxyyyy")
  (and (re-search-backward "x+y*" nil t)
       (match-string 0))) ;=> "xyyyy"
Non-greedy wouldn't match any "y"s. It's a bit tricky to explain both
correctly and clearly...
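To make the contrast concrete, the same backward search can be repeated with Emacs's explicitly non-greedy operator `*?' on the "y"s. A sketch, assuming the same "xxxxyyyy" buffer: greedy `y*' takes all four "y"s, while non-greedy `y*?' takes none.

```elisp
(with-temp-buffer
  (insert "xxxxyyyy")
  (let (greedy lazy)
    (re-search-backward "x+y*" nil t)   ; greedy y*
    (setq greedy (match-string 0))
    (goto-char (point-max))
    (re-search-backward "x+y*?" nil t)  ; non-greedy y*?
    (setq lazy (match-string 0))
    (list greedy lazy)))
;; => ("xyyyy" "x")
```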
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#31584; Package emacs. (Thu, 24 May 2018 22:49:01 GMT)
Message #20 received at 31584 <at> debbugs.gnu.org (full text, mbox):
Noam Postavsky <npostavs <at> gmail.com> writes:
> The docstring should definitely be clarified, but technically it can
> still be answered, if you read very carefully:
>
> (re-search-backward REGEXP &optional BOUND NOERROR COUNT)
>
> Search backward from point for regular expression REGEXP.
> This function is almost identical to ‘re-search-forward’, except that
> by default it searches backward instead of forward, and the sign of
> COUNT also indicates exactly the opposite searching direction.
>
> (re-search-forward REGEXP &optional BOUND NOERROR COUNT)
>
> [...]
> With COUNT positive/negative, the match found is [...] located
> entirely after/before the origin of the search.
You mean the sentence about the COUNT arg? Yes, _very_ carefully.
> It is greedy:
>
> (with-temp-buffer
>   (insert "xxxxyyyy")
>   (and (re-search-backward "x+y*" nil t)
>        (match-string 0))) ;=> "xyyyy"
>
> Non-greedy wouldn't match any "y"s. It's a bit tricky to explain both
> correctly and clearly...
Ok, good example. You convinced me that the sentence we once had was
actually quite good.
Michael.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#31584; Package emacs. (Thu, 24 May 2018 22:51:02 GMT)
Message #23 received at 31584 <at> debbugs.gnu.org (full text, mbox):
On 05/24/18 18:14 PM, Noam Postavsky wrote:
> The docstring should definitely be clarified, but technically it can
> still be answered, if you read very carefully:
If reading the documentation takes only a little less effort than
reading the code...
>> I've been bitten by this before. I'm sure the sentence you cite is
>> correct, but I would suggest something more explicit about backwards
>> searches. The most useful thing I could have read when I was wondering
>> why this didn't work would be something like: "re-search-backward always
>> behaves "non-greedily", i.e., it will find the shortest match before
>> point".
>
> It is greedy:
>
> (with-temp-buffer
>   (insert "xxxxyyyy")
>   (and (re-search-backward "x+y*" nil t)
>        (match-string 0))) ;=> "xyyyy"
>
> Non-greedy wouldn't match any "y"s. It's a bit tricky to explain both
> correctly and clearly...
Yeah, my wording is bad. I think an example might be most clear. Maybe:
#+BEGIN_SRC elisp
(with-temp-buffer
  (let ((re "x+y+")
        forward-match)
    (insert "xxxxyyyy")
    (goto-char (point-min))
    (re-search-forward re nil t)
    (setq forward-match (match-string 0))  ; "xxxxyyyy"
    (goto-char (point-max))
    (re-search-backward re nil t)
    ;; Returns ("xxxxyyyy" "xyyyy").
    (list forward-match (match-string 0))))
#+END_SRC
Or if there's something more concise...
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#31584; Package emacs. (Thu, 24 May 2018 23:56:01 GMT)
Message #26 received at 31584 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Michael Heerdegen <michael_heerdegen <at> web.de> writes:
>> (with-temp-buffer
>>   (insert "xxxxyyyy")
>>   (and (re-search-backward "x+y*" nil t)
>>        (match-string 0))) ;=> "xyyyy"
>>
>> Non-greedy wouldn't match any "y"s. It's a bit tricky to explain both
>> correctly and clearly...
>
> Ok, good example. You convinced me that the sentence we once had was
> actually quite good.
Actually, the manual has a pretty good explanation, maybe we can just
link to it:
[v1-0001-Note-caveat-for-backward-regexp-searching-in-docs.patch (text/x-diff, inline)]
From 8caeb0df40fc1cc34cd165d68238216198e01169 Mon Sep 17 00:00:00 2001
From: Noam Postavsky <npostavs <at> gmail.com>
Date: Thu, 24 May 2018 19:49:11 -0400
Subject: [PATCH v1] Note caveat for backward regexp searching in docstring
(Bug#31584)
* src/search.c (Fre_search_backward): Emphasize that backwards
searches may give shorter than expected matches.
* doc/lispref/searching.texi (Regexp Search): Add an anchor for
re-search-backward to reference.
---
doc/lispref/searching.texi | 2 ++
src/search.c | 5 ++++-
2 files changed, 6 insertions(+), 1 deletion(-)
diff --git a/doc/lispref/searching.texi b/doc/lispref/searching.texi
index fca877117d..6c1ebb22b5 100644
--- a/doc/lispref/searching.texi
+++ b/doc/lispref/searching.texi
@@ -1102,6 +1102,8 @@ Regexp Search
@end example
@end deffn
+@c This anchor is referenced by re-search-backward's docstring.
+@anchor{re-search-backward}
@deffn Command re-search-backward regexp &optional limit noerror count
This function searches backward in the current buffer for a string of
text that is matched by the regular expression @var{regexp}, leaving
diff --git a/src/search.c b/src/search.c
index 842e9309a2..0600e1a4e3 100644
--- a/src/search.c
+++ b/src/search.c
@@ -2233,8 +2233,11 @@ DEFUN ("re-search-backward", Fre_search_backward, Sre_search_backward, 1, 4,
This function is almost identical to `re-search-forward', except that
by default it searches backward instead of forward, and the sign of
COUNT also indicates exactly the opposite searching direction.
+See `re-search-forward' for details.
-See `re-search-forward' for details. */)
+Note that searching backwards may give a shorter match than expected,
+because the matching still happens in the forward direction. See Info
+anchor `(elisp) re-search-backward' for details. */)
(Lisp_Object regexp, Lisp_Object bound, Lisp_Object noerror, Lisp_Object count)
{
return search_command (regexp, bound, noerror, count, -1, 1, 0);
--
2.11.0
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#31584; Package emacs. (Fri, 25 May 2018 00:24:02 GMT)
Message #29 received at 31584 <at> debbugs.gnu.org (full text, mbox):
Noam Postavsky <npostavs <at> gmail.com> writes:
> Actually, the manual has a pretty good explanation, maybe we can just
> link to it:
> -See `re-search-forward' for details. */)
> +Note that searching backwards may give a shorter match than expected,
> +because the matching still happens in the forward direction. See Info
> +anchor `(elisp) re-search-backward' for details. */)
> (Lisp_Object regexp, Lisp_Object bound, Lisp_Object noerror, Lisp_Object count)
> {
> return search_command (regexp, bound, noerror, count, -1, 1, 0);
Too bad that the anchor is located after the relevant description.
FWIW, I still prefer the original sentence; I find it describes the
behavior best and is short (which is good for a docstring). It is also
good to have an alternative and more verbose explanation in the manual.
Michael.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#31584; Package emacs. (Fri, 25 May 2018 00:30:02 GMT)
Message #32 received at 31584 <at> debbugs.gnu.org (full text, mbox):
> It is greedy:
> (with-temp-buffer
> (insert "xxxxyyyy")
> (and (re-search-backward "x+y*" nil t) (match-string 0))) ;=> "xyyyy"
>
> Non-greedy wouldn't match any "y"s. It's a bit tricky to explain both
> correctly and clearly...
Maybe it would help to say that the pattern is always matched in a forward direction, even when it matches text that is before point.
The pattern itself is not read backward (you don't write +x*y
for the reverse search of x+y*), and it doesn't match backward.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#31584; Package emacs. (Fri, 25 May 2018 00:32:01 GMT)
Message #35 received at 31584 <at> debbugs.gnu.org (full text, mbox):
Michael Heerdegen <michael_heerdegen <at> web.de> writes:
>> +Note that searching backwards may give a shorter match than expected,
>> +because the matching still happens in the forward direction. See Info
>> +anchor `(elisp) re-search-backward' for details. */)
> Too bad that the anchor is located after the relevant description.
I don't understand what you mean.
> FWIW, I still prefer the original sentence; I find it describes the
> behavior best and is short (which is good for a docstring). It is also
> good to have an alternative and more verbose explanation in the manual.
I find the original sentence kind of cryptic, but I'm okay to be
outvoted on this.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#31584; Package emacs. (Fri, 25 May 2018 00:40:02 GMT)
Message #38 received at submit <at> debbugs.gnu.org (full text, mbox):
Noam Postavsky <npostavs <at> gmail.com> writes:
> Michael Heerdegen <michael_heerdegen <at> web.de> writes:
>
>>> +Note that searching backwards may give a shorter match than expected,
>>> +because the matching still happens in the forward direction. See Info
>>> +anchor `(elisp) re-search-backward' for details. */)
>
>> Too bad that the anchor is located after the relevant description.
>
> I don't understand what you mean.
>
>> FWIW, I still prefer the original sentence; I find it describes the
>> behavior best and is short (which is good for a docstring). It is also
>> good to have an alternative and more verbose explanation in the manual.
>
> I find the original sentence kind of cryptic, but I'm okay to be
> outvoted on this.
I think Drew's statement that "the pattern is always matched in a
forward direction, even when it matches text that is before point" makes
a lot of sense. That's just my 2¢, though; that's all I've got for this
issue.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#31584; Package emacs. (Fri, 25 May 2018 01:12:02 GMT)
Message #41 received at 31584 <at> debbugs.gnu.org (full text, mbox):
Noam Postavsky <npostavs <at> gmail.com> writes:
> > Too bad that the anchor is located after the relevant description.
>
> I don't understand what you mean.
The anchor is in this paragraph:
Nonincremental search for a regexp is done with the commands
‘re-search-forward’ and ‘re-search-backward’. [...]
But I thought you wanted to refer to the description in the paragraph
before that, that is:
Forward and backward regexp search are not symmetrical, because
regexp matching in Emacs always operates forward, starting with the
beginning of the regexp. Thus, forward regexp search scans forward,
trying a forward match at each possible starting position. Backward
regexp search scans backward, trying a forward match at each possible
starting position. These search methods are not mirror images.
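The manual paragraph quoted above ("scans backward, trying a forward match at each possible starting position") can be modeled in a few lines of Lisp. This is a hypothetical helper for illustration only (`my/re-search-backward-sketch' is not part of Emacs, and the real search is done in C); it uses `looking-at', which always matches forward, and narrows the buffer to end at the origin as a stand-in for the rule that a backward match may not extend past the starting point. BOUND, NOERROR and COUNT are ignored.

```elisp
;; Hypothetical helper, NOT part of Emacs: an approximation of
;; backward regexp search, for illustration only.
(defun my/re-search-backward-sketch (regexp)
  "Move point to the start of a backward match for REGEXP.
Return that position, or nil (leaving point unchanged) if none."
  (let ((origin (point))
        (pos (point))
        found)
    (save-restriction
      ;; Hide text after the origin, so no match can end past it.
      (narrow-to-region (point-min) origin)
      ;; Scan backward one position at a time, starting at the origin.
      (while (and (not found) (>= pos (point-min)))
        (goto-char pos)
        ;; The match itself is always attempted forward from POS.
        (if (looking-at regexp)
            (setq found pos)
          (setq pos (1- pos)))))
    (goto-char (or found origin))
    found))
```

For example, in a buffer containing "xxxxyyyy" with point at the end, the sketch returns 4 and `(match-string 0)' is "xyyyy", the same result `re-search-backward' gives; with "aaaa" and the regexp "a*" it matches the empty string at point without moving.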
The problem is that most people will only read forward. Did I miss
something?
> > FWIW, I still prefer the original sentence; I find it describes the
> > behavior best and is short (which is good for a docstring). It is also
> > good to have an alternative and more verbose explanation in the manual.
>
> I find the original sentence kind of cryptic, but I'm okay to be
> outvoted on this.
So far, I'm outvoted.
Michael.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#31584; Package emacs. (Fri, 25 May 2018 01:28:02 GMT)
Message #44 received at 31584 <at> debbugs.gnu.org (full text, mbox):
Michael Heerdegen <michael_heerdegen <at> web.de> writes:
> The anchor is in this paragraph:
>
> Nonincremental search for a regexp is done with the commands
> ‘re-search-forward’ and ‘re-search-backward’. [...]
Ah, no, the anchor (which I added as part of the patch) is in the elisp
manual, not the emacs manual.
<<<<<ANCHOR IS HERE>>>>>>
-- Command: re-search-backward regexp &optional limit noerror count
This function searches backward in the current buffer for a string
of text that is matched by the regular expression REGEXP, leaving
point at the beginning of the first text found.
This function is analogous to `re-search-forward', but they are not
simple mirror images. `re-search-forward' finds the match whose
beginning is as close as possible to the starting point. If
`re-search-backward' were a perfect mirror image, it would find the
match whose end is as close as possible. However, in fact it
finds the match whose beginning is as close as possible (and yet
ends before the starting point). The reason for this is that
matching a regular expression at a given spot always works from
beginning to end, and starts at a specified beginning position.
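The parenthetical "and yet ends before the starting point" has a visible consequence: the matcher never looks past the origin, so a greedy operator stops there. An illustration, assuming point is placed between the second and third "y" (position 7) of a buffer containing "xxxxyyyy":

```elisp
(with-temp-buffer
  (insert "xxxxyyyy")
  (goto-char 7)                    ; between "xxxxyy" and "yy"
  (re-search-backward "x+y*" nil t)
  ;; y* is still greedy, but it cannot consume text after the origin.
  (list (point) (match-string 0)))
;; => (4 "xyy")
```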
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#31584; Package emacs. (Fri, 25 May 2018 01:49:01 GMT)
Message #47 received at 31584 <at> debbugs.gnu.org (full text, mbox):
Noam Postavsky <npostavs <at> gmail.com> writes:
> Ah, no, the anchor (which I added as part of the patch) is in the elisp
> manual, not the emacs manual.
>
> <<<<<ANCHOR IS HERE>>>>>>
> -- Command: re-search-backward regexp &optional limit noerror count
Ah ok, thanks, then I don't have any objections.
Michael.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#31584; Package emacs. (Fri, 25 May 2018 06:24:01 GMT)
Message #50 received at 31584 <at> debbugs.gnu.org (full text, mbox):
> From: Noam Postavsky <npostavs <at> gmail.com>
> Date: Thu, 24 May 2018 19:55:18 -0400
> Cc: Eric Abrahamsen <eric <at> ericabrahamsen.net>, 31584 <at> debbugs.gnu.org
>
> Actually, the manual has a pretty good explanation, maybe we can just
> link to it:
Yes, this LGTM.
Thanks, please push to the release branch.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#31584; Package emacs. (Fri, 25 May 2018 12:01:02 GMT)
Message #53 received at 31584 <at> debbugs.gnu.org (full text, mbox):
tags 31584 fixed
close 31584 26.1
quit
Eli Zaretskii <eliz <at> gnu.org> writes:
>> From: Noam Postavsky <npostavs <at> gmail.com>
>> Date: Thu, 24 May 2018 19:55:18 -0400
>> Cc: Eric Abrahamsen <eric <at> ericabrahamsen.net>, 31584 <at> debbugs.gnu.org
>>
>> Actually, the manual has a pretty good explanation, maybe we can just
>> link to it:
>
> Yes, this LGTM.
>
> Thanks, please push to the release branch.
Done (with a slight tweak to the docstring phrasing)
[1: 2f44d2d5b1]: 2018-05-25 07:54:30 -0400
Note caveat for backward regexp searching in docstring (Bug#31584)
https://git.savannah.gnu.org/cgit/emacs.git/commit/?id=2f44d2d5b15008fde44a56ca24f0c3b6b9e63faf
Added tag(s) fixed.
Request was from Noam Postavsky <npostavs <at> gmail.com> to control <at> debbugs.gnu.org. (Fri, 25 May 2018 12:01:02 GMT)
bug marked as fixed in version 26.1, send any further explanations to
31584 <at> debbugs.gnu.org and Michael Heerdegen <michael_heerdegen <at> web.de>
Request was from Noam Postavsky <npostavs <at> gmail.com> to control <at> debbugs.gnu.org. (Fri, 25 May 2018 12:01:02 GMT)
bug archived.
Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sat, 23 Jun 2018 11:24:05 GMT)
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.