GNU bug report logs - #12375
Broken matching of regexps in fancy splitting
Reported by: jathd <jathdr <at> gmail.com>
Date: Fri, 7 Sep 2012 05:31:02 UTC
Severity: normal
Tags: fixed
Done: Lars Ingebrigtsen <larsi <at> gnus.org>
Bug is archived. No further changes may be made.
Report forwarded to bugs <at> gnus.org: bug#12375; Package gnus. (Fri, 07 Sep 2012 05:31:02 GMT)
Acknowledgement sent to jathd <jathdr <at> gmail.com>: New bug report received and forwarded. Copy sent to bugs <at> gnus.org. (Fri, 07 Sep 2012 05:31:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
Hi.
I'm not sure if this counts as a bug, but it's certainly an unexpected
behaviour (to me) to which I have found no reference in the manual.
The situation:
* In .gnus.el, specify splitting with
    (setq nnmail-split-methods 'nnmail-split-fancy)
    (setq nnmail-split-fancy
          '(to "\\(\\w+\\)-devel@.*" "list.devel.\\1"))
* Visit a message whose only relevant header (relative to the "to"
abbreviation) is
To: Batteries-devel <batteries-devel <at> lists.forge.ocamlcore.org>
* Press B q and read the message in the minibuffer.
Current message: This message would go to list.devel.s
Expected message: This message would go to list.devel.batteries
I believe the problem comes from the way the regexp value I gave (the
VALUE field in the split) is matched against the contents of the header,
as defined in the function nnmail-split-it in nnmail.el. Once the
interesting bit of the header has been found, the match is done with
re-search-backward, which explains why the "\\w+" bit only matches the
last letter of "batteries".
This conflicts with what is usually meant by a regexp matching a
string. I know I was expecting the match to be done as if by
(string-match regexp relevant-header-part). Here's a simple patch that
changes re-search-backward to re-search-forward:
========================================================================
--- a/lisp/nnmail.el
+++ b/lisp/nnmail.el
@@ -1463,13 +1463,13 @@ See the documentation for the variable `nnmail-split-fancy' for details."
                 (setq split-rest nil)
               (setq split-rest (cddr split-rest))))
           (when split-rest
-            (goto-char end)
+            (goto-char start-of-value)
             (let ((value (nth 1 split)))
               (if (symbolp value)
                   (setq value (cdr (assq value nnmail-split-abbrev-alist))))
               ;; Someone might want to do a \N sub on this match, so get the
               ;; correct match positions.
-              (re-search-backward value start-of-value))
+              (re-search-forward value end))
             (dolist (sp (nnmail-split-it (car split-rest)))
               (unless (member sp split-result)
                 (push sp split-result))))))
========================================================================
This produces the expected message, and I don't think it should break
anything.
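
The difference between the two search directions can be reproduced outside Gnus. The following is only a minimal sketch: `demo-capture-group-1' is a hypothetical helper written for this illustration, not a function from nnmail.el, and the address is typed with a literal "@". It runs the regexp from the report over the address in a temporary buffer and returns whatever group 1 captured.

```elisp
;; Hypothetical helper for illustration only; not part of Gnus.
(defun demo-capture-group-1 (regexp text direction)
  "Search TEXT for REGEXP and return the text captured by group 1.
DIRECTION is the symbol `forward' or `backward'.
Return nil when REGEXP does not match."
  (with-temp-buffer
    (insert text)
    (if (eq direction 'backward)
        (progn
          ;; A backward search tries start positions from the end of the
          ;; buffer toward the beginning and stops at the first position
          ;; where the regexp matches, so \\w+ starts as late as possible.
          (goto-char (point-max))
          (when (re-search-backward regexp nil t)
            (match-string 1)))
      ;; A forward search stops at the earliest start position instead.
      (goto-char (point-min))
      (when (re-search-forward regexp nil t)
        (match-string 1)))))

(demo-capture-group-1 "\\(\\w+\\)-devel@.*"
                      "batteries-devel@lists.forge.ocamlcore.org"
                      'backward)
;; => "s"

(demo-capture-group-1 "\\(\\w+\\)-devel@.*"
                      "batteries-devel@lists.forge.ocamlcore.org"
                      'forward)
;; => "batteries"
```

Both calls use the same regexp and the same text; only the direction of the search changes which start position is found first.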
[As a side note: is there a particular reason why the file nnmail.el
uses re-search-backward everywhere and not re-search-forward, or does it
come down to taste?]
Ma Gnus v0.6
GNU Emacs 24.2.50.1 (x86_64-apple-darwin10.8.0, NS apple-appkit-1038.36)
of 2012-09-07 on mbk.local
200 news-1.free.fr (5-1) NNRP Service Ready - newsmaster <at> proxad.net (posting ok)
500 What?
--
jathd
Information forwarded to bugs <at> gnus.org: bug#12375; Package gnus. (Tue, 25 Dec 2012 15:15:01 GMT)
Message #8 received at 12375 <at> debbugs.gnu.org (full text, mbox):
jathd <jathdr <at> gmail.com> writes:
> I'm not sure if this counts as a bug, but it's certainly an unexpected
> behaviour (to me) to which I have found no reference in the manual.
>
> The situation:
>
> * In .gnus.el, specify splitting with
>
> (setq nnmail-split-methods 'nnmail-split-fancy)
> (setq nnmail-split-fancy
> '(to "\\(\\w+\\)-devel@.*" "list.devel.\\1"))
>
> * Visit a message whose only relevant header (relative to the "to"
> abbreviation) is
>
> To: Batteries-devel <batteries-devel <at> lists.forge.ocamlcore.org>
>
> * Press B q and read the message in the minibuffer.
>
> Current message: This message would go to list.devel.s
> Expected message: This message would go to list.devel.batteries
>
> I believe the problem comes from the way the regexp value I gave (the
> VALUE field in the split) is matched against the contents of the header,
> as defined in the function nnmail-split-it in nnmail.el. Once the
> interesting bit of the header has been found, the match is done with
> re-search-backward, which explains why the "\\w+" bit only matches the
> last letter of "batteries".
Do you have `nnmail-split-fancy-match-partial-words' set? If not, the
splitting machinery will put a \\< in front of the regexp, leading to
"batteries" being matched.
--
(domestic pets only, the antidote for overdose, milk.)
http://lars.ingebrigtsen.no * Lars Magne Ingebrigtsen
Information forwarded to bugs <at> gnus.org: bug#12375; Package gnus. (Sat, 02 Feb 2013 19:04:01 GMT)
Message #11 received at 12375 <at> debbugs.gnu.org (full text, mbox):
Lars Ingebrigtsen <larsi <at> gnus.org> writes:
> Do you have `nnmail-split-fancy-match-partial-words' set? If not, the
> splitting machinery will put a \\< in front of the regexp, leading to
> "batteries" being matched.
I have `nnmail-split-fancy-match-partial-words' set to `nil', and I
checked that the splitting process does put a "\\<" in the regexp.
The problem is that this regexp is used to decide whether a message
follows a rule (in other words, that regexp is matched against the
message header), but when extracting the value with which to replace
"\\1" in the group name "list.devel.\\1" I gave, it matches a second
time with the *original regexp* I gave (and not the whole header this
time, it narrows the search to the interesting part of the relevant
header field). So this time, it uses the regexp without the "\\<", which
explains why it replaces "\\1" with "s" instead of "batteries".
So there's a simple workaround: I just put a "\\<" in the regexp in
`nnmail-split-fancy', and it works fine. So I don't know if this is a
bug or not; the workaround is simple, but I still find the behaviour of
the matcher a bit unexpected. Namely, I would expect it to match as if
using `(string-match regexp-I-gave contents-of-relevant-header-field)'.
Maybe the manual should have a word about this?
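
For readers who want to try the workaround, here is a sketch of what it might look like. The message does not quote the modified rule, so placing "\\<" at the very front of the regexp is an assumption, and `demo-backward-group-1' is a hypothetical helper (not part of Gnus) that only imitates a backward search over the header line.

```elisp
;; Assumed form of the workaround: "\\<" added at the front of the
;; VALUE regexp, so a match can only begin at the start of a word.
(setq nnmail-split-fancy
      '(to "\\<\\(\\w+\\)-devel@.*" "list.devel.\\1"))

;; Hypothetical helper for illustration only; not part of Gnus.
(defun demo-backward-group-1 (regexp text)
  "Search TEXT backward from its end for REGEXP; return group 1 or nil."
  (with-temp-buffer
    (insert text)
    (goto-char (point-max))
    (when (re-search-backward regexp nil t)
      (match-string 1))))

(demo-backward-group-1
 (nth 1 nnmail-split-fancy)
 "To: Batteries-devel <batteries-devel@lists.forge.ocamlcore.org>")
;; => "batteries"
```

With the word-start anchor in place, the position just before the final "s" no longer qualifies as a match start, so the backward search keeps going until it reaches the beginning of the word.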
--
jathd
Information forwarded to bugs <at> gnus.org: bug#12375; Package gnus. (Sat, 06 Jul 2013 17:55:02 GMT)
Message #14 received at 12375 <at> debbugs.gnu.org (full text, mbox):
jathd <jathdr <at> gmail.com> writes:
> The problem is that this regexp is used to decide whether a message
> follows a rule (in other words, that regexp is matched against the
> message header), but when extracting the value with which to replace
> "\\1" in the group name "list.devel.\\1" I gave, it matches a second
> time with the *original regexp* I gave (and not the whole header this
> time, it narrows the search to the interesting part of the relevant
> header field). So this time, it uses the regexp without the "\\<", which
> explains why it replaces "\\1" with "s" instead of "batteries".
Hm. Looking at the code, I can't see where it's re-matching against the
original regexp. `nnmail-expand-newtext' just uses the result from
`match-beginning'/`match-end', which should be the right thing to do.
If I'm reading the code correctly, which I may not be doing.
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog http://lars.ingebrigtsen.no/
Information forwarded to bugs <at> gnus.org: bug#12375; Package gnus. (Sun, 11 Aug 2013 22:08:01 GMT)
Message #17 received at 12375 <at> debbugs.gnu.org (full text, mbox):
Lars Ingebrigtsen <larsi <at> gnus.org> writes:
> Hm. Looking at the code, I can't see where it's re-matching against the
> original regexp.
It's around line 1470:
    (let ((value (nth 1 split)))
      (if (symbolp value)
          (setq value (cdr (assq value nnmail-split-abbrev-alist))))
      ;; Someone might want to do a \N sub on this match, so get the
      ;; correct match positions.
      (re-search-backward value start-of-value))
The `value' is the regexp I specified in `nnmail-split-fancy'. Before
that, to see whether a header matches, the function does
(re-search-backward (cdr cached-pair) nil t)
where (cdr cached-pair) is a regexp it constructed the first time
around. Since it prepended a "\\<" to *that* regexp, the header is
correctly identified; but it didn't prepend anything to `value', so the
extraction goes wrong.
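
The two-step behaviour described above can be imitated in a few lines. This is a simplified sketch, not the code from nnmail.el: `demo-two-stage-match' is a hypothetical name, and the first-stage regexp is assumed to be just "\\<" followed by the user's regexp, whereas the regexp Gnus actually constructs may contain more than that.

```elisp
;; Hypothetical helper for illustration only; not the nnmail.el code.
(defun demo-two-stage-match (value text)
  "Imitate the two searches described in the report on TEXT.
VALUE is the user's regexp.  Return a list (FIRST SECOND) holding
group 1 from each search, or nil if the first search fails."
  (with-temp-buffer
    (insert text)
    (goto-char (point-max))
    ;; Stage 1: decide whether the header matches, using the regexp
    ;; with the word-start anchor prepended (assumed shape).
    (when (re-search-backward (concat "\\<" value) nil t)
      (let ((first (match-string 1))
            (start-of-value (match-beginning 0))
            (end (match-end 0)))
        ;; Stage 2: search again with the bare VALUE, backward from the
        ;; end of the first match, bounded by its start.
        (goto-char end)
        (when (re-search-backward value start-of-value t)
          (list first (match-string 1)))))))

(demo-two-stage-match
 "\\(\\w+\\)-devel@.*"
 "To: Batteries-devel <batteries-devel@lists.forge.ocamlcore.org>")
;; => ("batteries" "s")
```

The first element is what the anchored regexp captured; the second is what the unanchored second search leaves in the match data, which is the value that ends up substituted for "\\1".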
--
jathd
Information forwarded to bugs <at> gnus.org: bug#12375; Package gnus. (Thu, 30 Jan 2014 23:39:02 GMT)
Message #20 received at 12375 <at> debbugs.gnu.org (full text, mbox):
jathd <jathdr <at> gmail.com> writes:
> The `value' is the regexp I specified in `nnmail-split-fancy'. Before
> that, to see whether a header matches, the function does
>
> (re-search-backward (cdr cached-pair) nil t)
>
> where (cdr cached-pair) is a regexp it constructed the first time
> around. Since it prepended a "\\<" to *that* regexp, the header is
> correctly identified; but it didn't prepend anything to `value', so the
> extraction goes wrong.
Aha.
I've now rewritten that function to save the match data explicitly after
doing the first match so that we don't have to redo the search. It
should be faster, too.
Does this fix the problem for you? It's in git Gnus, but should be in
bzr Emacs soon...
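
The rewritten function itself is not quoted in this thread. As a rough illustration of the general idea only, the sketch below saves the match data right after the first search and restores it later with `match-data' and `set-match-data', so no second search is needed. `demo-saved-match-group-1' is a hypothetical name, and the intervening search for "@" merely stands in for whatever other searching happens in between.

```elisp
;; Hypothetical helper for illustration only; not the rewritten
;; nnmail.el function.
(defun demo-saved-match-group-1 (regexp text)
  "Match REGEXP backward in TEXT, save the match data, and reuse it.
Return group 1 of the original match, or nil if there is none."
  (with-temp-buffer
    (insert text)
    (goto-char (point-max))
    (when (re-search-backward regexp nil t)
      ;; Save the positions of the first match explicitly.
      (let ((saved (match-data)))
        ;; Any later search overwrites the global match data ...
        (goto-char (point-min))
        (re-search-forward "@" nil t)
        ;; ... so restore the saved positions instead of searching again.
        (set-match-data saved)
        (match-string 1)))))

(demo-saved-match-group-1
 "\\<\\(\\w+\\)-devel@.*"
 "To: Batteries-devel <batteries-devel@lists.forge.ocamlcore.org>")
;; => "batteries"
```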
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog http://lars.ingebrigtsen.no/
Information forwarded to bugs <at> gnus.org: bug#12375; Package gnus. (Mon, 10 Feb 2014 18:22:02 GMT)
Message #23 received at 12375 <at> debbugs.gnu.org (full text, mbox):
On 2014-01-31 00:37 +0100, Lars Ingebrigtsen wrote:
> jathd <jathdr <at> gmail.com> writes:
>
>> The `value' is the regexp I specified in `nnmail-split-fancy'. Before
>> that, to see whether a header matches, the function does
>>
>> (re-search-backward (cdr cached-pair) nil t)
>>
>> where (cdr cached-pair) is a regexp it constructed the first time
>> around. Since it prepended a "\\<" to *that* regexp, the header is
>> correctly identified; but it didn't prepend anything to `value', so the
>> extraction goes wrong.
>
> Aha.
>
> I've now rewritten that function to save the match data explicitly after
> doing the first match so that we don't have to redo the search. It
> should be faster, too.
>
> Does this fix the problem for you? It's in git Gnus, but should be in
> bzr Emacs soon...
It seems this happened recently, and it apparently broke something for
me: I'm subscribed to Debian bug 715194[1], and messages coming from
that bug have the following header line:
,----
| Resent-CC: Debian OpenSSH Maintainers <debian-ssh <at> lists.debian.org>
`----
Now I'm using the following split rule, copied from the Gnus manual:
,----
| (to "debian-\\b\\(\\w+\\)@lists.debian.org" "mail.debian.\\1")
`----
With that rule, messages to the above bug used to appear under
mail.debian.ssh, but since my latest Emacs upgrade yesterday, they show
up under "mail.debian.resent-cc: debian openssh maintainers <" instead.
Huh?
Cheers,
Sven
1. http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=715194
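
The thread does not say what caused this regression. One reading that is consistent with the symptom, offered purely as an assumption, is that the regexp used for the first match wraps the header name in extra groups, so that in the saved match data "\\1" refers to the header prefix and the user's own first group has moved to a higher number. The sketch below uses a made-up wrapper of that shape; `demo-wrapped-groups' and the exact form of the wrapped regexp are hypothetical.

```elisp
;; Hypothetical helper for illustration only.  The shape of WRAPPED is
;; an assumption, not a quotation from nnmail.el.
(defun demo-wrapped-groups (field value header)
  "Match HEADER against VALUE wrapped in a FIELD prefix with two groups.
Return a list (GROUP-1 GROUP-3), or nil if there is no match."
  (with-temp-buffer
    (insert header)
    (goto-char (point-min))
    (let ((wrapped (concat "^\\(\\(" field "\\):.*\\)\\<" value)))
      (when (re-search-forward wrapped nil t)
        ;; Groups 1 and 2 belong to the wrapper, so the first group of
        ;; VALUE is group 3 of the wrapped regexp.
        (list (match-string 1) (match-string 3))))))

(demo-wrapped-groups
 "Resent-CC"
 "debian-\\b\\(\\w+\\)@lists.debian.org"
 "Resent-CC: Debian OpenSSH Maintainers <debian-ssh@lists.debian.org>")
;; => ("Resent-CC: Debian OpenSSH Maintainers <" "ssh")
```

Apart from letter case, the first element is the string that showed up in the group name above, while the third group holds the expected "ssh".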
Information forwarded to bugs <at> gnus.org: bug#12375; Package gnus. (Wed, 12 Feb 2014 05:54:02 GMT)
Message #26 received at 12375 <at> debbugs.gnu.org (full text, mbox):
Sven Joachim <svenjoac <at> gmx.de> writes:
> With that rule, messages to the above bug used to appear under
> mail.debian.ssh, but since my latest Emacs upgrade yesterday, they show
> up under "mail.debian.resent-cc: debian openssh maintainers <" instead.
> Huh?
Oops. Fixed in bzr Emacs now, so it'll be fixed in git Gnus soon.
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog http://lars.ingebrigtsen.no/
bug reassigned from package 'gnus' to 'emacs,gnus'. Request was from Lars Ingebrigtsen <larsi <at> gnus.org> to control <at> debbugs.gnu.org. (Wed, 25 Jan 2017 14:51:02 GMT)
bug No longer marked as found in versions 5.130006. Request was from Lars Ingebrigtsen <larsi <at> gnus.org> to control <at> debbugs.gnu.org. (Wed, 25 Jan 2017 14:51:02 GMT)
Information forwarded to bug-gnu-emacs <at> gnu.org, bugs <at> gnus.org: bug#12375; Package emacs,gnus. (Wed, 25 Jan 2017 15:04:03 GMT)
Message #33 received at 12375 <at> debbugs.gnu.org (full text, mbox):
Lars Ingebrigtsen <larsi <at> gnus.org> writes:
> Sven Joachim <svenjoac <at> gmx.de> writes:
>
>> With that rule, messages to the above bug used to appear under
>> mail.debian.ssh, but since my latest Emacs upgrade yesterday, they show
>> up under "mail.debian.resent-cc: debian openssh maintainers <" instead.
>> Huh?
>
> Oops. Fixed in bzr Emacs now, so it'll be fixed in git Gnus soon.
I guess this can be closed now...
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no
Added tag(s) fixed. Request was from Lars Ingebrigtsen <larsi <at> gnus.org> to control <at> debbugs.gnu.org. (Wed, 25 Jan 2017 15:04:04 GMT)
bug closed, send any further explanations to 12375 <at> debbugs.gnu.org and jathd <jathdr <at> gmail.com>. Request was from Lars Ingebrigtsen <larsi <at> gnus.org> to control <at> debbugs.gnu.org. (Wed, 25 Jan 2017 15:04:04 GMT)
bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 23 Feb 2017 12:24:07 GMT)