GNU bug report logs - #29343
Match data doesn't contain elements for trailing non-matched subgroups


Package: emacs

Reported by: Philipp Stephani <p.stephani2 <at> gmail.com>

Date: Fri, 17 Nov 2017 20:12:01 UTC

Severity: minor

Found in version 27.0.50

Fixed in version 29.1

Done: Lars Ingebrigtsen <larsi <at> gnus.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 29343 in the body.
You can then email your comments to 29343 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#29343; Package emacs. (Fri, 17 Nov 2017 20:12:02 GMT)

Acknowledgement sent to Philipp Stephani <p.stephani2 <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Fri, 17 Nov 2017 20:12:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Philipp Stephani <p.stephani2 <at> gmail.com>
To: bug-gnu-emacs <at> gnu.org
Subject: 27.0.50; Match data doesn't contain elements for non-matched subgroups
Date: Fri, 17 Nov 2017 21:11:13 +0100
$ emacs -Q -batch -eval '(progn (string-match "^\\(a\\)?\\(b\\)\\(c\\)?$" "b") (print (match-data)))'
(0 1 nil nil 0 1)

Note that neither the `a` nor the `c` group matched, but there are
entries for `a` in `match-data`, but not for `c`.  This makes working
with the match data unnecessarily hard because its length depends on
whether certain optional groups have matched or not.  I haven't seen any
discussion about this behavior in either the manual or the docstring.  I
think the match data in this case should be (0 1 nil nil 0 1 nil nil).
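
Editorial illustration (not part of the original report): the sketch
below shows how the length of the match data for the same regexp varies
with the input.  The helper name `my-match-data-length` is hypothetical,
and the expected result assumes the behavior described in this report.

```elisp
;; Hypothetical helper, introduced only for this illustration.
(defun my-match-data-length (regexp string)
  "Return the length of the match data after matching REGEXP against STRING.
Return nil if REGEXP does not match."
  (save-match-data
    (when (string-match regexp string)
      (length (match-data)))))

;; Pair each input with the length of the resulting match data.
(let ((re "^\\(a\\)?\\(b\\)\\(c\\)?$"))
  (mapcar (lambda (s) (cons s (my-match-data-length re s)))
          '("b" "ab" "bc" "abc")))
;; With the behavior reported here, this should evaluate to
;; (("b" . 6) ("ab" . 6) ("bc" . 8) ("abc" . 8))
```

The list is two elements shorter whenever the trailing `c` group does
not participate in the match, while a non-participating leading `a`
group still occupies two nil slots.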


In GNU Emacs 27.0.50 (build 12, x86_64-pc-linux-gnu, GTK+ Version 3.22.17)
 of 2017-11-16 built on localhost
Repository revision: bc462efec89c3317a6ee3ef9404356c1c7e52bda
Windowing system distributor 'The X.Org Foundation', version 11.0.11903000
System Description:	Debian GNU/Linux

Recent messages:
For information about GNU Emacs and the GNU system, type C-h C-a.

Configured using:
 'configure --enable-gcc-warnings=warn-only
 --enable-gtk-deprecation-warnings --without-pop --with-mailutils
 --enable-checking --enable-check-lisp-object-type --with-modules
 'CFLAGS=-O0 -ggdb3''

Configured features:
XPM JPEG TIFF GIF PNG SOUND DBUS GSETTINGS NOTIFY GNUTLS FREETYPE XFT
ZLIB TOOLKIT_SCROLL_BARS GTK3 X11 MODULES

Important settings:
  value of $LANG: en_US.UTF-8
  locale-coding-system: utf-8-unix

Major mode: Lisp Interaction

Minor modes in effect:
  tooltip-mode: t
  global-eldoc-mode: t
  eldoc-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  line-number-mode: t
  transient-mark-mode: t

Load-path shadows:
None found.

Features:
(shadow sort mail-extr emacsbug message rmc puny seq byte-opt gv
bytecomp byte-compile cconv cl-loaddefs cl-lib dired dired-loaddefs
format-spec rfc822 mml easymenu mml-sec password-cache epa derived epg
epg-config gnus-util rmail rmail-loaddefs mm-decode mm-bodies mm-encode
mail-parse rfc2231 mailabbrev gmm-utils mailheader sendmail rfc2047
rfc2045 ietf-drums mm-util mail-prsvr mail-utils elec-pair time-date
mule-util tooltip eldoc electric uniquify ediff-hook vc-hooks
lisp-float-type mwheel term/x-win x-win term/common-win x-dnd tool-bar
dnd fontset image regexp-opt fringe tabulated-list replace newcomment
text-mode elisp-mode lisp-mode prog-mode register page menu-bar
rfn-eshadow isearch timer select scroll-bar mouse jit-lock font-lock
syntax facemenu font-core term/tty-colors frame cl-generic cham georgian
utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao korean
japanese eucjp-ms cp51932 hebrew greek romanian slovak czech european
ethiopic indian cyrillic chinese composite charscript charprop
case-table epa-hook jka-cmpr-hook help simple abbrev obarray minibuffer
cl-preloaded nadvice loaddefs button faces cus-face macroexp files
text-properties overlay sha1 md5 base64 format env code-pages mule
custom widget hashtable-print-readable backquote dbusbind inotify
dynamic-setting system-font-setting font-render-setting move-toolbar gtk
x-toolkit x multi-tty make-network-process emacs)

Memory information:
((conses 16 95129 7264)
 (symbols 48 20393 1)
 (miscs 40 41 120)
 (strings 32 28284 1631)
 (string-bytes 1 747257)
 (vectors 16 14056)
 (vector-slots 8 497402 8748)
 (floats 8 49 68)
 (intervals 56 224 0)
 (buffers 992 12))





Severity set to 'minor' from 'normal' Request was from Noam Postavsky <npostavs <at> users.sourceforge.net> to control <at> debbugs.gnu.org. (Sun, 19 Nov 2017 15:32:02 GMT)

Changed bug title to 'Match data doesn't contain elements for trailing non-matched subgroups' from '27.0.50; Match data doesn't contain elements for non-matched subgroups' Request was from Noam Postavsky <npostavs <at> users.sourceforge.net> to control <at> debbugs.gnu.org. (Sun, 19 Nov 2017 15:32:02 GMT)

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#29343; Package emacs. (Sat, 16 Dec 2017 14:30:02 GMT)

Message #12 received at 29343 <at> debbugs.gnu.org:

From: Philipp Stephani <p.stephani2 <at> gmail.com>
To: 29343 <at> debbugs.gnu.org
Subject: Re: bug#29343: 27.0.50; Match data doesn't contain elements for
 non-matched subgroups
Date: Sat, 16 Dec 2017 14:29:12 +0000
Philipp Stephani <p.stephani2 <at> gmail.com> wrote on Fri, 17 Nov 2017 at
21:12:

>
> $ emacs -Q -batch -eval '(progn (string-match "^\\(a\\)?\\(b\\)\\(c\\)?$"
> "b") (print (match-data)))'
> (0 1 nil nil 0 1)
>
> Note that neither the `a` nor the `c` group matched, but there are
> entries for `a` in `match-data`, but not for `c`.  This makes working
> with the match data unnecessarily hard because its length depends on
> whether certain optional groups have matched or not.  I haven't seen any
> discussion about this behavior in either the manual or the docstring.  I
> think the match data in this case should be (0 1 nil nil 0 1 nil nil).
>
>
It turns out that this is harder than I expected, because the information
about the number of groups in the pattern isn't stored anywhere, and
search_regs.num_regs may be different from the group count. If it turns out
too hard to fix, the behavior should at least be documented.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#29343; Package emacs. (Sat, 17 Mar 2018 00:38:02 GMT)

Message #15 received at 29343 <at> debbugs.gnu.org:

From: Noam Postavsky <npostavs <at> gmail.com>
To: Philipp Stephani <p.stephani2 <at> gmail.com>
Cc: 29343 <at> debbugs.gnu.org
Subject: Re: bug#29343: 27.0.50;
 Match data doesn't contain elements for non-matched subgroups
Date: Fri, 16 Mar 2018 20:37:41 -0400
Philipp Stephani <p.stephani2 <at> gmail.com> writes:

> $ emacs -Q -batch -eval '(progn (string-match "^\\(a\\)?\\(b\\)\\(c\\)?$" "b") (print (match-data)))'
> (0 1 nil nil 0 1)
>
> Note that neither the `a` nor the `c` group matched, but there are
> entries for `a` in `match-data`, but not for `c`.  This makes working
> with the match data unnecessarily hard because its length depends on
> whether certain optional groups have matched or not.  I haven't seen any
> discussion about this behavior in either the manual or the docstring.  I
> think the match data in this case should be (0 1 nil nil 0 1 nil nil).

You can get that result by passing a list of the expected length as the
REUSE argument to match-data:

(progn
  (string-match "^\\(a\\)?\\(b\\)\\(c\\)?$" "b")
  (match-data t (make-list 8 nil)))
  ;=> (0 1 nil nil 0 1 nil nil)




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#29343; Package emacs. (Fri, 19 Apr 2019 18:23:02 GMT)

Message #18 received at 29343 <at> debbugs.gnu.org:

From: Philipp Stephani <p.stephani2 <at> gmail.com>
To: Noam Postavsky <npostavs <at> gmail.com>
Cc: 29343 <at> debbugs.gnu.org
Subject: Re: bug#29343: 27.0.50; Match data doesn't contain elements for
 non-matched subgroups
Date: Fri, 19 Apr 2019 20:22:23 +0200
On Sat, 17 Mar 2018 at 01:37, Noam Postavsky <npostavs <at> gmail.com> wrote:
> You can get that result by passing a list of the expected length as the
> REUSE argument to match-data:

True, but that also requires knowing the expected length. In the most
general case this should work for unknown regular expressions.
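
Editorial sketch (not proposed by anyone in this thread): one possible
way to obtain the expected length for an arbitrary regexp is to count
its capturing groups with `regexp-opt-depth` and pass a list of that
size as REUSE.  The helper name `my-padded-match-data` is hypothetical.
This assumes `regexp-opt-depth` counts every group of interest; it
works on the regexp string and, as far as I know, does not count
explicitly numbered groups, so treat it as an approximation.

```elisp
(require 'regexp-opt)

;; Hypothetical helper, not part of Emacs.
(defun my-padded-match-data (regexp string)
  "Match REGEXP against STRING and return match data padded with nils.
The result has two elements for the whole match plus two for each
capturing group counted by `regexp-opt-depth'.  Return nil if
REGEXP does not match."
  (save-match-data
    (when (string-match regexp string)
      (let ((wanted (* 2 (1+ (regexp-opt-depth regexp)))))
        ;; A list passed as REUSE is filled in place; slots beyond the
        ;; actual match data are set to nil.
        (match-data t (make-list wanted nil))))))

(my-padded-match-data "^\\(a\\)?\\(b\\)\\(c\\)?$" "b")
;; Expected, in line with the REUSE example earlier in the thread:
;; (0 1 nil nil 0 1 nil nil)
```

The group count is derived from the regexp source, so the caller no
longer has to know it in advance.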




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#29343; Package emacs. (Fri, 19 Apr 2019 18:30:02 GMT)

Message #21 received at 29343 <at> debbugs.gnu.org:

From: Noam Postavsky <npostavs <at> gmail.com>
To: Philipp Stephani <p.stephani2 <at> gmail.com>
Cc: 29343 <at> debbugs.gnu.org
Subject: Re: bug#29343: 27.0.50;
 Match data doesn't contain elements for non-matched subgroups
Date: Fri, 19 Apr 2019 14:29:27 -0400
Philipp Stephani <p.stephani2 <at> gmail.com> writes:

> On Sat, 17 Mar 2018 at 01:37, Noam Postavsky <npostavs <at> gmail.com> wrote:
>> You can get that result by passing a list of the expected length as the
>> REUSE argument to match-data:
>
> True, but that also requires knowing the expected length. In the most
> general case this should work for unknown regular expressions.

I don't understand how the general case you describe could occur.  If
you don't know the expected length, that means you don't know what groups are
in the regexp, so you can only rely on group 0 existing, i.e., you only
care about the first two elements in the match-data.





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#29343; Package emacs. (Fri, 19 Apr 2019 18:43:02 GMT)

Message #24 received at 29343 <at> debbugs.gnu.org:

From: Philipp Stephani <p.stephani2 <at> gmail.com>
To: Noam Postavsky <npostavs <at> gmail.com>
Cc: 29343 <at> debbugs.gnu.org
Subject: Re: bug#29343: 27.0.50; Match data doesn't contain elements for
 non-matched subgroups
Date: Fri, 19 Apr 2019 20:42:23 +0200
On Fri, 19 Apr 2019 at 20:29, Noam Postavsky <npostavs <at> gmail.com> wrote:
>
> Philipp Stephani <p.stephani2 <at> gmail.com> writes:
>
> > On Sat, 17 Mar 2018 at 01:37, Noam Postavsky <npostavs <at> gmail.com> wrote:
> >> You can get that result by passing a list of the expected length as the
> >> REUSE argument to match-data:
> >
> > True, but that also requires knowing the expected length. In the most
> > general case this should work for unknown regular expressions.
>
> I don't understand how the general case you describe could occur.  If
> you don't know the expected length, that means you don't know what groups are
> in the regexp, so you can only rely on group 0 existing, i.e., you only
> care about the first two elements in the match-data.
>

The context here is https://github.com/magnars/s.el/pull/117. Normally
you'd expect something like Python's Match.group
(https://docs.python.org/3/library/re.html#re.Match.group), i.e. a
group match per defined group, even if the group didn't match. That
Emacs doesn't behave this way is surprising and should at least be
documented.
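
Editorial sketch (not from the thread): when the number of groups is
known, a Python-like result with one entry per defined group can be
built from the per-group accessor `match-string`, which returns nil for
a group that did not participate, trailing or not.  The helper name
`my-match-groups` and its NGROUPS argument are hypothetical.

```elisp
;; Hypothetical helper, not part of Emacs or s.el.
(defun my-match-groups (regexp string ngroups)
  "Match REGEXP against STRING and return a list of NGROUPS + 1 entries.
Element 0 is the whole match; element N is the text of group N, or
nil if that group did not participate.  Return nil if REGEXP does
not match."
  (save-match-data
    (when (string-match regexp string)
      (let (groups)
        ;; Query each group individually instead of walking the
        ;; variable-length list returned by `match-data'.
        (dotimes (i (1+ ngroups))
          (push (match-string i string) groups))
        (nreverse groups)))))

(my-match-groups "^\\(a\\)?\\(b\\)\\(c\\)?$" "b" 3)
;; => ("b" nil "b" nil)
```

Unlike the raw match data, the result always has the same length for a
given NGROUPS, regardless of which optional groups matched.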




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#29343; Package emacs. (Fri, 19 Apr 2019 18:55:02 GMT)

Message #27 received at 29343 <at> debbugs.gnu.org:

From: Noam Postavsky <npostavs <at> gmail.com>
To: Philipp Stephani <p.stephani2 <at> gmail.com>
Cc: 29343 <at> debbugs.gnu.org
Subject: Re: bug#29343: 27.0.50;
 Match data doesn't contain elements for non-matched subgroups
Date: Fri, 19 Apr 2019 14:54:01 -0400
Philipp Stephani <p.stephani2 <at> gmail.com> writes:

>> >> You can get that result by passing a list of the expected length as the
>> >> REUSE argument to match-data:
>> >
>> > True, but that also requires knowing the expected length. In the most
>> > general case this should work for unknown regular expressions.

> The context here is https://github.com/magnars/s.el/pull/117.

Ah, I see, the problem is that s-match is trying to present a "nicer"
interface, so it doesn't have a REUSE argument.

> That Emacs doesn't behave this way is surprising and should at least
> be documented.

Yeah, no argument there.





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#29343; Package emacs. (Sat, 29 Jan 2022 15:41:01 GMT)

Message #30 received at 29343 <at> debbugs.gnu.org:

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Philipp Stephani <p.stephani2 <at> gmail.com>
Cc: 29343 <at> debbugs.gnu.org
Subject: Re: bug#29343: Match data doesn't contain elements for trailing
 non-matched subgroups
Date: Sat, 29 Jan 2022 16:40:05 +0100
Philipp Stephani <p.stephani2 <at> gmail.com> writes:

> It turns out that this is harder than I expected, because the
> information about the number of groups in the pattern isn't stored
> anywhere, and search_regs.num_regs may be different from the group
> count. If it turns out too hard to fix, the behavior should at least
> be documented.

I've now mentioned this in the doc string in Emacs 29.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




bug marked as fixed in version 29.1, send any further explanations to 29343 <at> debbugs.gnu.org and Philipp Stephani <p.stephani2 <at> gmail.com> Request was from Lars Ingebrigtsen <larsi <at> gnus.org> to control <at> debbugs.gnu.org. (Sat, 29 Jan 2022 15:41:02 GMT)

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sun, 27 Feb 2022 12:24:05 GMT)
