GNU bug report logs - #17373
24.3.50; match data is incorrect if there are too many groups


Package: emacs;

Reported by: Nicolas Richard <theonewiththeevillook <at> yahoo.fr>

Date: Tue, 29 Apr 2014 19:20:02 UTC

Severity: minor

Tags: confirmed

Found in versions 24.3.50, 25.0.94

To reply to this bug, email your comments to 17373 AT debbugs.gnu.org.




Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#17373; Package emacs. (Tue, 29 Apr 2014 19:20:03 GMT)

Acknowledgement sent to Nicolas Richard <theonewiththeevillook <at> yahoo.fr>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Tue, 29 Apr 2014 19:20:03 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Nicolas Richard <theonewiththeevillook <at> yahoo.fr>
To: bug-gnu-emacs <at> gnu.org
Subject: 24.3.50; match data is incorrect if there are too many groups
Date: Tue, 29 Apr 2014 21:19:11 +0200
Hi,

The following returns 2.  Replace 255 with 254 and it returns 512, as
expected (a start/end pair for each of the 256 groups, counting group 0):
#+BEGIN_SRC emacs-lisp
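  (with-temp-buffer
    (insert "bar")
    (when
        (re-search-backward
         (concat
          ;; Groups 1-255: one \(foo\) each; none of them matches here...
          (mapconcat (lambda (x) (format "\\(%s\\)" x)) (make-list 255 "foo") "\\|")
          "\\|"
          ;; ...and group 256, \(bar\), is the alternative that matches.
          "\\(bar\\)")
         nil t)
      ;; Two entries (start and end) per recorded group, group 0 included.
      (length (match-data))))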
#+END_SRC

Regexps with many groups are used in AUCTeX, in
TeX-auto-parse-region.  That function builds a big regexp out of a
list of smaller ones (each small one is made into a group); when the
big regexp matches, it finds out which of the smaller regexps actually
matched by checking which group is non-nil.
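The approach described above can be sketched as follows.  This is a minimal illustration, not AUCTeX's actual code: `my-combine-regexps` and `my-which-regexp-matched` are hypothetical names, and the sketch assumes the small regexps contain no capturing groups of their own, so that element N (0-based) becomes group N+1.

#+BEGIN_SRC emacs-lisp
(defun my-combine-regexps (regexps)
  "Join REGEXPS into one alternation, wrapping each element in a group.
Hypothetical helper.  Assumes no element contains capturing groups of
its own, so the Nth element (0-based) becomes group N+1 of the result."
  (mapconcat (lambda (re) (concat "\\(" re "\\)")) regexps "\\|"))

(defun my-which-regexp-matched (regexps)
  "Return the 0-based index of the element of REGEXPS that matched.
Hypothetical helper.  Call it right after a successful search for the
result of `my-combine-regexps' on REGEXPS; return nil if no group
matched."
  (let ((n (length regexps))
        (i 0))
    ;; Group I+1 corresponds to element I; stop at the first one that
    ;; participated in the match.
    (while (and (< i n) (not (match-beginning (1+ i))))
      (setq i (1+ i)))
    (and (< i n) i)))
#+END_SRC

For example, searching "xx baz yy" with `'("foo" "ba[rz]" "qux")` lets `my-which-regexp-matched` report index 1.  Once the list has more than 255 elements, the groups for the later elements are the ones this report shows being lost, so the lookup cannot identify them.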

In GNU Emacs 24.3.50.7 (i686-pc-linux-gnu, GTK+ Version 2.24.20)
 of 2014-04-10 on LDLC-portable
Windowing system distributor `The X.Org Foundation', version 11.0.11405000
System Description:	Ubuntu 13.10

Configured using:
 `configure 'CFLAGS=-g3 -O2''

Important settings:
  value of $LANG: fr_BE.UTF-8
  locale-coding-system: utf-8-unix

-- 
Nico.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#17373; Package emacs. (Mon, 19 May 2014 05:48:02 GMT)

Message #8 received at 17373 <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: 17373 <at> debbugs.gnu.org
Subject: Re: 24.3.50; match data is incorrect if there are too many groups
Date: Sun, 18 May 2014 22:47:33 -0700
Yes, unfortunately Emacs currently has a limit of at most 256 groups of 
match data: one for the entire pattern, and 255 for parenthesized 
subpatterns.  If you go over the limit, the excess matches are silently 
discarded.  I don't see this limitation documented anywhere; it should 
be.  Or better yet, the limitation should be removed.
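Since the excess groups are discarded silently, code that builds regexps programmatically can at least check for the limit before searching.  A minimal sketch, assuming the 255-group figure given here (it may not hold for every Emacs version); `my-max-regexp-groups` and `my-check-group-limit` are hypothetical names.  It relies on `regexp-opt-depth`, which counts the non-shy groups in a regexp.

#+BEGIN_SRC emacs-lisp
(defconst my-max-regexp-groups 255
  "Parenthesized groups recorded in match data (hypothetical constant).
The value comes from the limit described in bug#17373 and may not hold
for every Emacs version.")

(defun my-check-group-limit (regexp)
  "Return REGEXP, or signal an error if it has too many groups.
Hypothetical helper: counts the non-shy groups with `regexp-opt-depth'
and compares the count against `my-max-regexp-groups'."
  (let ((depth (regexp-opt-depth regexp)))
    (when (> depth my-max-regexp-groups)
      (error "Regexp has %d groups, but match data records at most %d"
             depth my-max-regexp-groups))
    regexp))
#+END_SRC

Because it returns its argument unchanged, it can wrap the regexp argument of a search call, e.g. `(re-search-forward (my-check-group-limit big-regexp) nil t)`, turning the silent truncation into an explicit error.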

The limitation is wired into the representation of the 'start_memory' 
code in compiled regular expressions: this code has a one-byte operand. 
 As far as I know, the limitation is specific to Emacs, and is not 
present in the Gnulib or glibc versions of the regexp matcher.
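Because the limit is part of the compiled regexp format, one Lisp-level workaround is to keep each regexp under it: split the list of alternatives into chunks of at most 255, search each chunk with its own combined regexp, and keep the earliest match.  The following is a hedged sketch of that idea, not an existing Emacs or AUCTeX function; `my-search-in-chunks` is a hypothetical name, and it assumes the individual regexps contain no capturing groups.

#+BEGIN_SRC emacs-lisp
(defun my-search-in-chunks (regexps &optional bound chunk-size)
  "Search forward from point for the earliest match of any of REGEXPS.
Hypothetical helper.  REGEXPS is split into chunks of at most
CHUNK-SIZE elements (default 255), and each chunk is searched with its
own combined regexp, so no single regexp carries more groups than the
match data can record.  Assumes the elements of REGEXPS contain no
capturing groups.  Return a cons (POS . INDEX), where POS is the start
of the earliest match and INDEX is the 0-based position in REGEXPS of
the element that matched, or nil if nothing matched before BOUND.
Point is not moved."
  (let ((size (or chunk-size 255))
        (offset 0)
        (best nil))
    (while regexps
      ;; Take the next chunk of at most SIZE elements off REGEXPS.
      (let ((chunk nil) (n 0))
        (while (and regexps (< n size))
          (push (pop regexps) chunk)
          (setq n (1+ n)))
        (setq chunk (nreverse chunk))
        (save-excursion
          (when (re-search-forward
                 (mapconcat (lambda (re) (concat "\\(" re "\\)")) chunk "\\|")
                 bound t)
            ;; Find the 1-based group of the alternative that matched.
            (let ((i 1))
              (while (and (<= i n) (not (match-beginning i)))
                (setq i (1+ i)))
              ;; Keep this match only if it starts strictly earlier, so
              ;; ties go to the element listed first in REGEXPS.
              (when (or (null best) (< (match-beginning 0) (car best)))
                (setq best (cons (match-beginning 0) (+ offset i -1)))))))
        (setq offset (+ offset n))))
    best))
#+END_SRC

The cost is one search per chunk, so a few hundred alternatives mean two or three passes over the region instead of one.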




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#17373; Package emacs. (Mon, 19 May 2014 13:49:02 GMT)

Message #11 received at 17373 <at> debbugs.gnu.org:

From: Drew Adams <drew.adams <at> oracle.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>, 17373 <at> debbugs.gnu.org
Subject: RE: bug#17373: 24.3.50; match data is incorrect if there are too many
 groups
Date: Mon, 19 May 2014 06:48:16 -0700 (PDT)
> Yes, unfortunately Emacs currently has a limit of at most 256 groups of
> match data: one for the entire pattern, and 255 for parenthesized
> subpatterns.  If you go over the limit, the excess matches are silently
> discarded.  I don't see this limitation documented anywhere; it should
> be.  Or better yet, the limitation should be removed.

Good to know.  +1, to documenting it, at least.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#17373; Package emacs. (Wed, 10 Feb 2016 17:12:02 GMT)

Message #14 received at 17373 <at> debbugs.gnu.org:

From: Marcin Borkowski <mbork <at> wmi.amu.edu.pl>
To: 17373 <at> debbugs.gnu.org
Cc: Paul Eggert <eggert <at> cs.ucla.edu>, Drew Adams <drew.adams <at> oracle.com>
Subject: Re: bug#17373: 24.3.50;
 match data is incorrect if there are too many groups
Date: Wed, 10 Feb 2016 18:11:54 +0100
On 2014-05-19, at 07:48, Drew Adams <drew.adams <at> oracle.com> wrote:

>> Yes, unfortunately Emacs currently has a limit of at most 256 groups of
>> match data: one for the entire pattern, and 255 for parenthesized
>> subpatterns.  If you go over the limit, the excess matches are silently
>> discarded.  I don't see this limitation documented anywhere; it should
>> be.  Or better yet, the limitation should be removed.
>
> Good to know.  +1, to documenting it, at least.

I can write a patch to the manual, but I'm a bit afraid that if this
gets documented, the limit will stay there forever.  Is there a chance
that someone fluent in C could fix this?

(Incidentally, I have one package of mine where this limit could strike,
too.)

Best,

--
Marcin Borkowski




bug Marked as found in versions 25.0.94. Request was from Noam Postavsky <npostavs <at> users.sourceforge.net> to control <at> debbugs.gnu.org. (Sat, 04 Jun 2016 22:48:02 GMT)

Severity set to 'minor' from 'normal' Request was from Noam Postavsky <npostavs <at> users.sourceforge.net> to control <at> debbugs.gnu.org. (Sat, 04 Jun 2016 22:48:02 GMT)

Added tag(s) confirmed. Request was from Noam Postavsky <npostavs <at> users.sourceforge.net> to control <at> debbugs.gnu.org. (Sat, 04 Jun 2016 22:51:02 GMT)

This bug report was last modified 9 years and 9 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.