From unknown Thu Aug 14 21:49:56 2025 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-Mailer: MIME-tools 5.509 (Entity 5.509) Content-Type: text/plain; charset=utf-8 From: bug#24071 <24071@debbugs.gnu.org> To: bug#24071 <24071@debbugs.gnu.org> Subject: Status: [PATCH] Refactor regex character class parsing in [:name:] Reply-To: bug#24071 <24071@debbugs.gnu.org> Date: Fri, 15 Aug 2025 04:49:56 +0000 retitle 24071 [PATCH] Refactor regex character class parsing in [:name:] reassign 24071 emacs submitter 24071 Michal Nazarewicz severity 24071 wishlist tag 24071 patch thanks From debbugs-submit-bounces@debbugs.gnu.org Mon Jul 25 18:54:34 2016 Received: (at submit) by debbugs.gnu.org; 25 Jul 2016 22:54:34 +0000 Received: from localhost ([127.0.0.1]:37300 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bRold-00050t-Nd for submit@debbugs.gnu.org; Mon, 25 Jul 2016 18:54:34 -0400 Received: from eggs.gnu.org ([208.118.235.92]:55824) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bRolb-00050f-3C for submit@debbugs.gnu.org; Mon, 25 Jul 2016 18:54:31 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1bRolT-0005sH-Dg for submit@debbugs.gnu.org; Mon, 25 Jul 2016 18:54:25 -0400 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=0.8 required=5.0 tests=BAYES_50,T_DKIM_INVALID autolearn=disabled version=3.3.2 Received: from lists.gnu.org ([2001:4830:134:3::11]:42783) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bRolT-0005r0-AK for submit@debbugs.gnu.org; Mon, 25 Jul 2016 18:54:23 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:46781) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bRolP-0007ax-Pr for bug-gnu-emacs@gnu.org; Mon, 25 Jul 2016 18:54:21 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1bRolN-0005oH-Ap for bug-gnu-emacs@gnu.org; Mon, 25 Jul 2016 18:54:19 -0400 Received: from mail-wm0-x231.google.com ([2a00:1450:400c:c09::231]:38653) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bRolM-0005nJ-RY for bug-gnu-emacs@gnu.org; Mon, 25 Jul 2016 18:54:17 -0400 Received: by mail-wm0-x231.google.com with SMTP id o80so177033260wme.1 for ; Mon, 25 Jul 2016 15:54:15 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=sender:from:to:subject:date:message-id:mime-version :content-transfer-encoding; bh=FcEjFQ+83mh++7v+kFomi+vajanEptI4Ru6908zsEqY=; b=ZsT0HoecG0SMNwu/vqCFqGH7O25ioJb5b2PBoVK18dOYTl0vbwRoFF++hLC2ohJjN3 6J3PxK8FBzfshgeiv2SRFUL7DyBdKDS5CU17GH9Gh/KGaplSgg4otCHyf7lHxwWipk7E bSxRblcVS63U+kc1frwBoou5AMz7RqZDNLcHV/df6HOL9yVto+GgRV0BZuIA/555iuGD eUhPdq/adMKFU+kALXOzT3oHdD3geYYkLJKDuJPVao+l1v03WMnlutz2VWjGEDXNWyOq IB7orx1syhLsUnA+t6ozTst3HwMl+Xn/btohypqAoFNh7cbphmA61s4jSHhLyjvYjP/4 2wxA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:sender:from:to:subject:date:message-id :mime-version:content-transfer-encoding; bh=FcEjFQ+83mh++7v+kFomi+vajanEptI4Ru6908zsEqY=; b=E63BUVx4JXhyTDcVwAh8EmUNeGmPh//Pj/LEdoivl5/crch/5k4v3T6Ikh14zpxTJa ERjCirGVsHWpX7aDKNYe/hY3mfDMLu0PQHfhl26QZWWxAXjZA/zPWw1nXmL4MINBx6Ik ro/K0FXwy+Hoy3Ei5RM1RU2b1LA3aUL6XsoYPE6QLlA2DpjPUzAL9Y9ydaYKYYyYWoW2 PUhzEJKmhuT7imJCXObN585iVYarH4I8pg6Mxp16K2774FJvMYS2oP/0OBtGizAN56K3 KsBYooI/wrjv+WhEUbbNN8twfvUrWB3WXz69FGr6GJi9jOgdnnk/m1/eHCpM063DYCap Vx6A== X-Gm-Message-State: ALyK8tJq6QZWepkuAFBhTuEH9lhyKtwAPOraNPGGE18G7mGKdjsDeChS/r5U5XmKo301DJc4 X-Received: by 10.28.199.4 with SMTP id x4mr40468597wmf.70.1469487254383; Mon, 25 Jul 2016 15:54:14 -0700 (PDT) Received: from mpn.zrh.corp.google.com ([172.16.113.135]) by smtp.gmail.com with ESMTPSA id c8sm17983949wjm.19.2016.07.25.15.54.12 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 25 Jul 2016 15:54:12 -0700 (PDT) Received: by mpn.zrh.corp.google.com (Postfix, from userid 126942) id D022B1E35E8; Tue, 26 Jul 2016 00:54:11 +0200 (CEST) From: Michal Nazarewicz To: bug-gnu-emacs@gnu.org Subject: [PATCH] Refactor regex character class parsing in [:name:] Date: Tue, 26 Jul 2016 00:54:05 +0200 Message-Id: <1469487245-11126-1-git-send-email-mina86@mina86.com> X-Mailer: git-send-email 2.8.0.rc3.226.g39d4020 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6.x X-Received-From: 2001:4830:134:3::11 X-Spam-Score: -4.0 (----) X-Debbugs-Envelope-To: submit X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -4.0 (----) re_wctype function is used in three separate places and in all of those places almost exact code extracting the name from [:name:] surrounds it. Furthermore, re_wctype requires a NUL-terminated string, so the name of the character class is copied to a temporary buffer. The code duplication and unnecessary memory copying can be avoided by pushing the responsibility of parsing the whole [:name:] sequence to the function. Furthermore, since now the function has access to the length of the character class name (since it’s doing the parsing), it can take advantage of that information in skipping some string comparisons and using a constant-length memcmp instead of strcmp which needs to take care of NUL bytes. * src/regex.c (re_wctype): Delete function. Replace it with: (re_wctype_parse): New function which parses a whole [:name:] string and returns a RECC_* constant or -1 if the string is not of [:name:] format. (regex_compile): Use re_wctype_parse. * src/syntax.c (skip_chars): Use re_wctype_parse. --- src/regex.c | 312 +++++++++++++++++++++++++++++------------------------------ src/regex.h | 14 +-- src/syntax.c | 96 +++++------------- 3 files changed, 183 insertions(+), 239 deletions(-) Unless there is some opposition to this patch, I’ll submit it in a week or so. diff --git a/src/regex.c b/src/regex.c index 297bf71..50e6862 100644 --- a/src/regex.c +++ b/src/regex.c @@ -1969,29 +1969,98 @@ struct range_table_work_area #if ! WIDE_CHAR_SUPPORT -/* Map a string to the char class it names (if any). */ +/* Parse a character class, i.e. string such as "[:name:]". *strp + points to the string to be parsed and limit is length, in bytes, of + that string. + + If *strp point to a string that begins with "[:name:]", where name is + a non-empty sequence of lower case letters, *strp will be advanced past the + closing square bracket and RECC_* constant which maps to the name will be + returned. If name is not a valid character class name zero, or RECC_ERROR, + is returned. + + Otherwise, if *strp doesn’t begin with "[:name:]", -1 is returned. + + The function can be used on ASCII as well as multitude strings. + */ re_wctype_t -re_wctype (const_re_char *str) +re_wctype_parse (const unsigned char **strp, unsigned limit) { - const char *string = (const char *) str; - if (STREQ (string, "alnum")) return RECC_ALNUM; - else if (STREQ (string, "alpha")) return RECC_ALPHA; - else if (STREQ (string, "word")) return RECC_WORD; - else if (STREQ (string, "ascii")) return RECC_ASCII; - else if (STREQ (string, "nonascii")) return RECC_NONASCII; - else if (STREQ (string, "graph")) return RECC_GRAPH; - else if (STREQ (string, "lower")) return RECC_LOWER; - else if (STREQ (string, "print")) return RECC_PRINT; - else if (STREQ (string, "punct")) return RECC_PUNCT; - else if (STREQ (string, "space")) return RECC_SPACE; - else if (STREQ (string, "upper")) return RECC_UPPER; - else if (STREQ (string, "unibyte")) return RECC_UNIBYTE; - else if (STREQ (string, "multibyte")) return RECC_MULTIBYTE; - else if (STREQ (string, "digit")) return RECC_DIGIT; - else if (STREQ (string, "xdigit")) return RECC_XDIGIT; - else if (STREQ (string, "cntrl")) return RECC_CNTRL; - else if (STREQ (string, "blank")) return RECC_BLANK; - else return 0; + const char *beg = (const char *)*strp, *end, *it; + + if (limit < 5 || beg[0] != '[' || beg[1] != ':') + return -1; + + end = beg + limit - 2; /* ‘- 2’ for the closing ":]" */ + beg += 2; /* skip opening "[:" */ + it = beg; + while (it != end && *it >= 'a' && *it <= 'z') + ++it; + if (it[0] != ':' || it[1] != ']') + return -1; + + *strp = (const unsigned char *)(it + 2); + + /* Sort tests in the length=five case by frequency the classes to minimise + number of times we fail the comparison. The frequencies of character class + names used in Emacs sources as of 2016-07-15: + + $ find \( -name \*.c -o -name \*.el \) -exec grep -h '\[:[a-z]*:]' {} + | + sed 's/]/]\n/g' |grep -o '\[:[a-z]*:]' |sort |uniq -c |sort -nr + 213 [:alnum:] + 104 [:alpha:] + 62 [:space:] + 39 [:digit:] + 36 [:blank:] + 26 [:upper:] + 24 [:word:] + 21 [:lower:] + 10 [:punct:] + 10 [:ascii:] + 9 [:xdigit:] + 4 [:nonascii:] + 4 [:graph:] + 2 [:print:] + 2 [:cntrl:] + 1 [:ff:] + + If you update this list, consider also updating chain of or’ed conditions + in execute_charset function. + */ + + switch (it - beg) { + case 4: + if (!memcmp (beg, "word", 4)) return RECC_WORD; + break; + case 5: + if (!memcmp (beg, "alnum", 5)) return RECC_ALNUM; + if (!memcmp (beg, "alpha", 5)) return RECC_ALPHA; + if (!memcmp (beg, "space", 5)) return RECC_SPACE; + if (!memcmp (beg, "digit", 5)) return RECC_DIGIT; + if (!memcmp (beg, "blank", 5)) return RECC_BLANK; + if (!memcmp (beg, "upper", 5)) return RECC_UPPER; + if (!memcmp (beg, "lower", 5)) return RECC_LOWER; + if (!memcmp (beg, "punct", 5)) return RECC_PUNCT; + if (!memcmp (beg, "ascii", 5)) return RECC_ASCII; + if (!memcmp (beg, "graph", 5)) return RECC_GRAPH; + if (!memcmp (beg, "print", 5)) return RECC_PRINT; + if (!memcmp (beg, "cntrl", 5)) return RECC_CNTRL; + break; + case 6: + if (!memcmp (beg, "xdigit", 6)) return RECC_XDIGIT; + break; + case 7: + if (!memcmp (beg, "unibyte", 7)) return RECC_UNIBYTE; + break; + case 8: + if (!memcmp (beg, "nonascii", 8)) return RECC_NONASCII; + break; + case 9: + if (!memcmp (beg, "multibyte", 9)) return RECC_MULTIBYTE; + break; + } + + return RECC_ERROR; } /* True if CH is in the char class CC. */ @@ -2776,10 +2845,74 @@ regex_compile (const_re_char *pattern, size_t size, reg_syntax_t syntax, { boolean escaped_char = false; const unsigned char *p2 = p; + re_wctype_t cc; re_wchar_t ch; if (p == pend) FREE_STACK_RETURN (REG_EBRACK); + /* See if we're at the beginning of a possible character + class. */ + if (syntax & RE_CHAR_CLASSES && + (cc = re_wctype_parse(&p, pend - p)) != -1) + { + if (cc == 0) + FREE_STACK_RETURN (REG_ECTYPE); + + if (p == pend) + FREE_STACK_RETURN (REG_EBRACK); + +#ifndef emacs + for (ch = 0; ch < (1 << BYTEWIDTH); ++ch) + if (re_iswctype (btowc (ch), cc)) + { + c = TRANSLATE (ch); + if (c < (1 << BYTEWIDTH)) + SET_LIST_BIT (c); + } +#else /* emacs */ + /* Most character classes in a multibyte match just set + a flag. Exceptions are is_blank, is_digit, is_cntrl, and + is_xdigit, since they can only match ASCII characters. + We don't need to handle them for multibyte. They are + distinguished by a negative wctype. */ + + /* Setup the gl_state object to its buffer-defined value. + This hardcodes the buffer-global syntax-table for ASCII + chars, while the other chars will obey syntax-table + properties. It's not ideal, but it's the way it's been + done until now. */ + SETUP_BUFFER_SYNTAX_TABLE (); + + for (ch = 0; ch < 256; ++ch) + { + c = RE_CHAR_TO_MULTIBYTE (ch); + if (! CHAR_BYTE8_P (c) + && re_iswctype (c, cc)) + { + SET_LIST_BIT (ch); + c1 = TRANSLATE (c); + if (c1 == c) + continue; + if (ASCII_CHAR_P (c1)) + SET_LIST_BIT (c1); + else if ((c1 = RE_CHAR_TO_UNIBYTE (c1)) >= 0) + SET_LIST_BIT (c1); + } + } + SET_RANGE_TABLE_WORK_AREA_BIT + (range_table_work, re_wctype_to_bit (cc)); +#endif /* emacs */ + /* In most cases the matching rule for char classes only + uses the syntax table for multibyte chars, so that the + content of the syntax-table is not hardcoded in the + range_table. SPACE and WORD are the two exceptions. */ + if ((1 << cc) & ((1 << RECC_SPACE) | (1 << RECC_WORD))) + bufp->used_syntax = 1; + + /* Repeat the loop. */ + continue; + } + /* Don't translate yet. The range TRANSLATE(X..Y) cannot always be determined from TRANSLATE(X) and TRANSLATE(Y) So the translation is done later in a loop. Example: @@ -2803,119 +2936,6 @@ regex_compile (const_re_char *pattern, size_t size, reg_syntax_t syntax, break; } - /* See if we're at the beginning of a possible character - class. */ - - if (!escaped_char && - syntax & RE_CHAR_CLASSES && c == '[' && *p == ':') - { - /* Leave room for the null. */ - unsigned char str[CHAR_CLASS_MAX_LENGTH + 1]; - const unsigned char *class_beg; - - PATFETCH (c); - c1 = 0; - class_beg = p; - - /* If pattern is `[[:'. */ - if (p == pend) FREE_STACK_RETURN (REG_EBRACK); - - for (;;) - { - PATFETCH (c); - if ((c == ':' && *p == ']') || p == pend) - break; - if (c1 < CHAR_CLASS_MAX_LENGTH) - str[c1++] = c; - else - /* This is in any case an invalid class name. */ - str[0] = '\0'; - } - str[c1] = '\0'; - - /* If isn't a word bracketed by `[:' and `:]': - undo the ending character, the letters, and - leave the leading `:' and `[' (but set bits for - them). */ - if (c == ':' && *p == ']') - { - re_wctype_t cc = re_wctype (str); - - if (cc == 0) - FREE_STACK_RETURN (REG_ECTYPE); - - /* Throw away the ] at the end of the character - class. */ - PATFETCH (c); - - if (p == pend) FREE_STACK_RETURN (REG_EBRACK); - -#ifndef emacs - for (ch = 0; ch < (1 << BYTEWIDTH); ++ch) - if (re_iswctype (btowc (ch), cc)) - { - c = TRANSLATE (ch); - if (c < (1 << BYTEWIDTH)) - SET_LIST_BIT (c); - } -#else /* emacs */ - /* Most character classes in a multibyte match - just set a flag. Exceptions are is_blank, - is_digit, is_cntrl, and is_xdigit, since - they can only match ASCII characters. We - don't need to handle them for multibyte. - They are distinguished by a negative wctype. */ - - /* Setup the gl_state object to its buffer-defined - value. This hardcodes the buffer-global - syntax-table for ASCII chars, while the other chars - will obey syntax-table properties. It's not ideal, - but it's the way it's been done until now. */ - SETUP_BUFFER_SYNTAX_TABLE (); - - for (ch = 0; ch < 256; ++ch) - { - c = RE_CHAR_TO_MULTIBYTE (ch); - if (! CHAR_BYTE8_P (c) - && re_iswctype (c, cc)) - { - SET_LIST_BIT (ch); - c1 = TRANSLATE (c); - if (c1 == c) - continue; - if (ASCII_CHAR_P (c1)) - SET_LIST_BIT (c1); - else if ((c1 = RE_CHAR_TO_UNIBYTE (c1)) >= 0) - SET_LIST_BIT (c1); - } - } - SET_RANGE_TABLE_WORK_AREA_BIT - (range_table_work, re_wctype_to_bit (cc)); -#endif /* emacs */ - /* In most cases the matching rule for char classes - only uses the syntax table for multibyte chars, - so that the content of the syntax-table is not - hardcoded in the range_table. SPACE and WORD are - the two exceptions. */ - if ((1 << cc) & ((1 << RECC_SPACE) | (1 << RECC_WORD))) - bufp->used_syntax = 1; - - /* Repeat the loop. */ - continue; - } - else - { - /* Go back to right after the "[:". */ - p = class_beg; - SET_LIST_BIT ('['); - - /* Because the `:' may start the range, we - can't simply set bit and repeat the loop. - Instead, just set it to C and handle below. */ - c = ':'; - } - } - if (p < pend && p[0] == '-' && p[1] != ']') { @@ -4659,28 +4679,8 @@ execute_charset (const_re_char **pp, unsigned c, unsigned corig, bool unibyte) re_wchar_t range_start, range_end; /* Sort tests by the most commonly used classes with some adjustment to which - tests are easiest to perform. Frequencies of character class names used in - Emacs sources as of 2016-07-15: - - $ find \( -name \*.c -o -name \*.el \) -exec grep -h '\[:[a-z]*:]' {} + | - sed 's/]/]\n/g' |grep -o '\[:[a-z]*:]' |sort |uniq -c |sort -nr - 213 [:alnum:] - 104 [:alpha:] - 62 [:space:] - 39 [:digit:] - 36 [:blank:] - 26 [:upper:] - 24 [:word:] - 21 [:lower:] - 10 [:punct:] - 10 [:ascii:] - 9 [:xdigit:] - 4 [:nonascii:] - 4 [:graph:] - 2 [:print:] - 2 [:cntrl:] - 1 [:ff:] - */ + tests are easiest to perform. Take a look at comment in re_wctype_parse + for table with frequencies of character class names. */ if ((class_bits & BIT_MULTIBYTE) || (class_bits & BIT_ALNUM && ISALNUM (c)) || diff --git a/src/regex.h b/src/regex.h index 817167a..01b659a 100644 --- a/src/regex.h +++ b/src/regex.h @@ -585,25 +585,13 @@ extern void regfree (regex_t *__preg); /* Solaris 2.5 has a bug: must be included before . */ # include # include -#endif -#if WIDE_CHAR_SUPPORT -/* The GNU C library provides support for user-defined character classes - and the functions from ISO C amendment 1. */ -# ifdef CHARCLASS_NAME_MAX -# define CHAR_CLASS_MAX_LENGTH CHARCLASS_NAME_MAX -# else -/* This shouldn't happen but some implementation might still have this - problem. Use a reasonable default value. */ -# define CHAR_CLASS_MAX_LENGTH 256 -# endif typedef wctype_t re_wctype_t; typedef wchar_t re_wchar_t; # define re_wctype wctype # define re_iswctype iswctype # define re_wctype_to_bit(cc) 0 #else -# define CHAR_CLASS_MAX_LENGTH 9 /* Namely, `multibyte'. */ # ifndef emacs # define btowc(c) c # endif @@ -621,7 +609,7 @@ typedef enum { RECC_ERROR = 0, } re_wctype_t; extern char re_iswctype (int ch, re_wctype_t cc); -extern re_wctype_t re_wctype (const unsigned char* str); +extern re_wctype_t re_wctype_parse (const unsigned char **strp, unsigned limit); typedef int re_wchar_t; diff --git a/src/syntax.c b/src/syntax.c index f8d987b..667de40 100644 --- a/src/syntax.c +++ b/src/syntax.c @@ -1691,44 +1691,22 @@ skip_chars (bool forwardp, Lisp_Object string, Lisp_Object lim, /* At first setup fastmap. */ while (i_byte < size_byte) { - c = str[i_byte++]; - - if (handle_iso_classes && c == '[' - && i_byte < size_byte - && str[i_byte] == ':') + if (handle_iso_classes) { - const unsigned char *class_beg = str + i_byte + 1; - const unsigned char *class_end = class_beg; - const unsigned char *class_limit = str + size_byte - 2; - /* Leave room for the null. */ - unsigned char class_name[CHAR_CLASS_MAX_LENGTH + 1]; - re_wctype_t cc; - - if (class_limit - class_beg > CHAR_CLASS_MAX_LENGTH) - class_limit = class_beg + CHAR_CLASS_MAX_LENGTH; - - while (class_end < class_limit - && *class_end >= 'a' && *class_end <= 'z') - class_end++; - - if (class_end == class_beg - || *class_end != ':' || class_end[1] != ']') - goto not_a_class_name; - - memcpy (class_name, class_beg, class_end - class_beg); - class_name[class_end - class_beg] = 0; - - cc = re_wctype (class_name); + const unsigned char *ch = str + i_byte; + re_wctype_t cc = re_wctype_parse (&ch, size_byte - i_byte); if (cc == 0) error ("Invalid ISO C character class"); - - iso_classes = Fcons (make_number (cc), iso_classes); - - i_byte = class_end + 2 - str; - continue; + if (cc != -1) + { + iso_classes = Fcons (make_number (cc), iso_classes); + i_byte = ch - str; + continue; + } } - not_a_class_name: + c = str[i_byte++]; + if (c == '\\') { if (i_byte == size_byte) @@ -1808,54 +1786,32 @@ skip_chars (bool forwardp, Lisp_Object string, Lisp_Object lim, while (i_byte < size_byte) { int leading_code = str[i_byte]; - c = STRING_CHAR_AND_LENGTH (str + i_byte, len); - i_byte += len; - if (handle_iso_classes && c == '[' - && i_byte < size_byte - && STRING_CHAR (str + i_byte) == ':') + if (handle_iso_classes) { - const unsigned char *class_beg = str + i_byte + 1; - const unsigned char *class_end = class_beg; - const unsigned char *class_limit = str + size_byte - 2; - /* Leave room for the null. */ - unsigned char class_name[CHAR_CLASS_MAX_LENGTH + 1]; - re_wctype_t cc; - - if (class_limit - class_beg > CHAR_CLASS_MAX_LENGTH) - class_limit = class_beg + CHAR_CLASS_MAX_LENGTH; - - while (class_end < class_limit - && *class_end >= 'a' && *class_end <= 'z') - class_end++; - - if (class_end == class_beg - || *class_end != ':' || class_end[1] != ']') - goto not_a_class_name_multibyte; - - memcpy (class_name, class_beg, class_end - class_beg); - class_name[class_end - class_beg] = 0; - - cc = re_wctype (class_name); + const unsigned char *ch = str + i_byte; + re_wctype_t cc = re_wctype_parse (&ch, size_byte - i_byte); if (cc == 0) error ("Invalid ISO C character class"); - - iso_classes = Fcons (make_number (cc), iso_classes); - - i_byte = class_end + 2 - str; - continue; + if (cc != -1) + { + iso_classes = Fcons (make_number (cc), iso_classes); + i_byte = ch - str; + continue; + } } - not_a_class_name_multibyte: - if (c == '\\') + if (leading_code== '\\') { - if (i_byte == size_byte) + if (++i_byte == size_byte) break; leading_code = str[i_byte]; - c = STRING_CHAR_AND_LENGTH (str + i_byte, len); - i_byte += len; } + c = STRING_CHAR_AND_LENGTH (str + i_byte, len); + i_byte += len; + + /* Treat `-' as range character only if another character follows. */ if (i_byte + 1 < size_byte -- 2.8.0.rc3.226.g39d4020 From debbugs-submit-bounces@debbugs.gnu.org Tue Jul 26 10:47:15 2016 Received: (at 24071) by debbugs.gnu.org; 26 Jul 2016 14:47:16 +0000 Received: from localhost ([127.0.0.1]:38240 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bS3dX-0007Fp-Ee for submit@debbugs.gnu.org; Tue, 26 Jul 2016 10:47:15 -0400 Received: from eggs.gnu.org ([208.118.235.92]:47955) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bS3dR-0007FH-57 for 24071@debbugs.gnu.org; Tue, 26 Jul 2016 10:47:09 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1bS3dI-0003Xy-51 for 24071@debbugs.gnu.org; Tue, 26 Jul 2016 10:47:00 -0400 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=-1.3 required=5.0 tests=BAYES_40,RP_MATCHES_RCVD autolearn=disabled version=3.3.2 Received: from fencepost.gnu.org ([2001:4830:134:3::e]:40765) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bS3dI-0003Xn-1Q; Tue, 26 Jul 2016 10:46:56 -0400 Received: from 84.94.185.246.cable.012.net.il ([84.94.185.246]:1682 helo=home-c4e4a596f7) by fencepost.gnu.org with esmtpsa (TLS1.2:RSA_AES_128_CBC_SHA1:128) (Exim 4.82) (envelope-from ) id 1bS3dG-0001GE-3L; Tue, 26 Jul 2016 10:46:54 -0400 Date: Tue, 26 Jul 2016 17:46:42 +0300 Message-Id: <83d1m0tq25.fsf@gnu.org> From: Eli Zaretskii To: Michal Nazarewicz , Dima Kogan In-reply-to: <1469487245-11126-1-git-send-email-mina86@mina86.com> (message from Michal Nazarewicz on Tue, 26 Jul 2016 00:54:05 +0200) Subject: Re: bug#24071: [PATCH] Refactor regex character class parsing in [:name:] References: <1469487245-11126-1-git-send-email-mina86@mina86.com> MIME-version: 1.0 Content-type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 2001:4830:134:3::e X-Spam-Score: -6.3 (------) X-Debbugs-Envelope-To: 24071 Cc: 24071@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: Eli Zaretskii Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -6.3 (------) > From: Michal Nazarewicz > Date: Tue, 26 Jul 2016 00:54:05 +0200 > > re_wctype function is used in three separate places and in all of > those places almost exact code extracting the name from [:name:] > surrounds it. Furthermore, re_wctype requires a NUL-terminated > string, so the name of the character class is copied to a temporary > buffer. > > The code duplication and unnecessary memory copying can be avoided by > pushing the responsibility of parsing the whole [:name:] sequence to > the function. > > Furthermore, since now the function has access to the length of the > character class name (since it’s doing the parsing), it can take > advantage of that information in skipping some string comparisons and > using a constant-length memcmp instead of strcmp which needs to take > care of NUL bytes. Thanks. If we are going to make some serious refactoring in regex.c, I think we should start with having a test suite for it. The dima_regex_embedded_modifiers branch, created by Dima Kogan (CC'ed) in the Emacs repository includes a suite taken from glibc. Dima, could you perhaps merge the parts of the test suite that can already be used to the master branch, so that we could use them to verify changes in regex.c? From debbugs-submit-bounces@debbugs.gnu.org Wed Jul 27 11:29:15 2016 Received: (at 24071) by debbugs.gnu.org; 27 Jul 2016 15:29:15 +0000 Received: from localhost ([127.0.0.1]:39337 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bSQlm-0005pP-Re for submit@debbugs.gnu.org; Wed, 27 Jul 2016 11:29:15 -0400 Received: from mail-wm0-f53.google.com ([74.125.82.53]:38452) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bSQll-0005pC-6A for 24071@debbugs.gnu.org; Wed, 27 Jul 2016 11:29:13 -0400 Received: by mail-wm0-f53.google.com with SMTP id o80so68052261wme.1 for <24071@debbugs.gnu.org>; Wed, 27 Jul 2016 08:29:13 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=sender:from:to:cc:subject:in-reply-to:organization:references :user-agent:face:date:message-id:mime-version :content-transfer-encoding; bh=w9xu1YsGA+69JGs5bbW50UOn44W+A46xjJA9FjULTCk=; b=I3d8zuqvoD9rE6nbK5BL9wOKWdnEu0FCbI81o3M+27L2UAMszoMfHNRCgGNnSS1dD3 OrS5JS2Hj2R9EqOmv2nmcHGzTAtY/J1TquW+hwymy7rDWsoUdoZ1uwP/UUDsutkpuh8V aDY+4p5P4kPxCCmACDKkUmeFz661CoZ5krkqcKVsU9CKQ8Q2swgTeYhzQlQgkWS2foO7 /HhCsSng7Hs+CDu8N49vTheLeMjDFkZzWaz7oH2DxNj7+FL+tuWDRA2sk+Rjx78pgbbj AKqRb/KfIMnAzt6PCItvTblC/QweFxHjzeomN+XB3UTBdrtKg8fgRKK/1F7TjGZZilPd 095A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:sender:from:to:cc:subject:in-reply-to :organization:references:user-agent:face:date:message-id :mime-version:content-transfer-encoding; bh=w9xu1YsGA+69JGs5bbW50UOn44W+A46xjJA9FjULTCk=; b=EFylWE8NqjuoCyt/vSqskXUuQoKfs4aZhfy41Xdf98xPS/HSr+oVaKuiQq2FX338Zk uboHAykblph6HUMKFGmb1ZrquJVvtF8fpW16zAoBW/X/Mb/77R0mIsD12sbhPqc8vMfZ it5NCdoWaY9BWnHtdX8DKn1njROt877s1dHGQOwYrXLa3vzE2Wkn9xwPUdnHzBTkTShp h325ibr+Fdgkhl7vBqks3r8DeNo/OtOe9YZhieLjGiIKtOBFnl0ZmNWrLta/GxV2N+Z7 oaJfFFcmMLL+R7OrZnXoLlh9FuMs21zZXGNoE4rVebu0bVw26XybsGqBwzZIUOJ2dfVA pZpw== X-Gm-Message-State: AEkoouujoirj/0B3f/BBcFX8jpAZnoR3S9HT5d+L1I+//Spt90Qq/r/X3RMLRfJoGHJd7TJy X-Received: by 10.28.94.18 with SMTP id s18mr31820029wmb.44.1469633346999; Wed, 27 Jul 2016 08:29:06 -0700 (PDT) Received: from mpn-glaptop ([2620:0:105f:301:e136:c882:c6f0:97af]) by smtp.gmail.com with ESMTPSA id d80sm38922789wmd.14.2016.07.27.08.29.04 (version=TLS1_2 cipher=AES128-SHA bits=128/128); Wed, 27 Jul 2016 08:29:05 -0700 (PDT) From: Michal Nazarewicz To: Eli Zaretskii , Dima Kogan Subject: Re: bug#24071: [PATCH] Refactor regex character class parsing in [:name:] In-Reply-To: <83d1m0tq25.fsf@gnu.org> Organization: http://mina86.com/ References: <1469487245-11126-1-git-send-email-mina86@mina86.com> <83d1m0tq25.fsf@gnu.org> User-Agent: Notmuch/0.19+53~g2e63a09 (http://notmuchmail.org) Emacs/25.1.50.1 (x86_64-unknown-linux-gnu) Face: iVBORw0KGgoAAAANSUhEUgAAADAAAAAwBAMAAAClLOS0AAAAJFBMVEWbfGlUPDDHgE57V0jUupKjgIObY0PLrom9mH4dFRK4gmjPs41MxjOgAAACP0lEQVQ4T23Sv2vbQBQHcBk1xE6WyALX107VUEgmn6+ouUwpEQQ6uRjttkWP4CkBg2M0BQLBdPFZYPsyFYo7qEtKDQ7on+t7+nF2Ux8ahD587717OmNYrOvycHsZ+o2r051wHTHysAvGb8ygvgu4QWT0sCmkgZCIEnlV2X8BtyraazFGDuxhmKSQJMlwHQ7v5MHSNxmz78rfElwAa3ieVD9e+hBhjaPDDG6NgFo2f4wBMNIo5YmRtF0RyDgFjJjlMIWbnuM4x9MMfABGTlN4qgIQB4A1DEyA1BHWtfeWNUMwiVJKoqh97KrkOO+qzgluVYLvFCUKAX73nONeBr7BGMdM6Sg0kuep03VywLaIzRiVr+GAzKlpQIsAFnWAG2e6DT5WmWDiudZMIc6hYrMOmeMQK9WX0B+/RfjzL9DI7Y9/Iayn29Ci0r2i4f9gMimMSZLCDMalgQGU5hnUtqAN0OGvEmO1Wnl0C0wWSCEHnuHBqmygxdxA8oWXwbipoc1EoNR9DqOpBpOJrnr0criQab9ZT4LL+wI+K7GBQH30CrhUruilgP9DRTrhVWZCiAyILP+wiuLeCKGTD6r/nc8LOJcAwR6IBTUs+7CASw3QFZ0MdA2PI3zNziH4ZKVhXCRMBjeZ1DWMekKwDCASwExy+NQ86TaykaDAFHO4aP48y4fIcDM5yOG8GcTLbOyp8A8azjJI93JFd1EA6yN8sSxMQJWoABqniRZVykYgRXErzrdqExAoUrRb0xfRp8p2A/4XmfilTtkDZ4cAAAAASUVORK5CYII= X-Face: -TR8(rDTHy/(xl?SfWd1|3:TTgDIatE^t'vop%*gVg[kn$t{EpK(P"VQ=~T2#ysNmJKN$"yTRLB4YQs$4{[.]Fc1)*O]3+XO^oXM>Q#b^ix, O)Zbn)q[y06$`e3?C)`CwR9y5riE=fv^X@x$y?D:XO6L&x4f-}}I4=VRNwiA^t1-ZrVK^07.Pi/57c_du'& X-PGP: 50751FF4 X-PGP-FP: AC1F 5F5C D418 88F8 CC84 5858 2060 4012 5075 1FF4 X-Hashcash: 1:20:160727:24071@debbugs.gnu.org::BL+8XuAiFsGF1NEz:00000000000000000000000000000000000000003117 X-Hashcash: 1:20:160727:lists@dima.secretsauce.net::T12Tw6AFunOmJAEc:000000000000000000000000000000000004Ib1 X-Hashcash: 1:20:160727:eliz@gnu.org::3M8fa9mCXybaSBJ7:000001oNm Date: Wed, 27 Jul 2016 17:29:04 +0200 Message-ID: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spam-Score: -2.0 (--) X-Debbugs-Envelope-To: 24071 Cc: 24071@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -2.0 (--) On Tue, Jul 26 2016, Eli Zaretskii wrote: >> From: Michal Nazarewicz >> Date: Tue, 26 Jul 2016 00:54:05 +0200 >>=20 >> re_wctype function is used in three separate places and in all of >> those places almost exact code extracting the name from [:name:] >> surrounds it. Furthermore, re_wctype requires a NUL-terminated >> string, so the name of the character class is copied to a temporary >> buffer. >>=20 >> The code duplication and unnecessary memory copying can be avoided by >> pushing the responsibility of parsing the whole [:name:] sequence to >> the function. >>=20 >> Furthermore, since now the function has access to the length of the >> character class name (since it=E2=80=99s doing the parsing), it can take >> advantage of that information in skipping some string comparisons and >> using a constant-length memcmp instead of strcmp which needs to take >> care of NUL bytes. > > Thanks. > > If we are going to make some serious refactoring in regex.c, I think > we should start with having a test suite for it. I agree. Which is why I started test/src/regex-tests.el=C2=B9. Since this patch touches only character classes I limited the tests to character classes. =C2=B9 If fact, the bug I=E2=80=99ve fixed with the previous patch was disc= overed precisely because I=E2=80=99ve written tests for this patch. > The dima_regex_embedded_modifiers branch, created by Dima Kogan > (CC'ed) in the Emacs repository includes a suite taken from glibc. > Dima, could you perhaps merge the parts of the test suite that can > already be used to the master branch, so that we could use them to > verify changes in regex.c? This looks relatively straightforward; I can take care of it. I=E2=80=99ll send a link to the result soon. --=20 Best regards =E3=83=9F=E3=83=8F=E3=82=A6 =E2=80=9C=F0=9D=93=B6=F0=9D=93=B2=F0=9D=93=B7= =F0=9D=93=AA86=E2=80=9D =E3=83=8A=E3=82=B6=E3=83=AC=E3=83=B4=E3=82=A4=E3=83= =84 =C2=ABIf at first you don=E2=80=99t succeed, give up skydiving=C2=BB From debbugs-submit-bounces@debbugs.gnu.org Wed Jul 27 12:28:48 2016 Received: (at 24071) by debbugs.gnu.org; 27 Jul 2016 16:28:48 +0000 Received: from localhost ([127.0.0.1]:39364 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bSRhP-0007JU-VT for submit@debbugs.gnu.org; Wed, 27 Jul 2016 12:28:48 -0400 Received: from eggs.gnu.org ([208.118.235.92]:35896) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bSRhO-0007JH-7d for 24071@debbugs.gnu.org; Wed, 27 Jul 2016 12:28:46 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1bSRhE-0005KR-Q5 for 24071@debbugs.gnu.org; Wed, 27 Jul 2016 12:28:41 -0400 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=-0.5 required=5.0 tests=BAYES_50,RP_MATCHES_RCVD autolearn=disabled version=3.3.2 Received: from fencepost.gnu.org ([2001:4830:134:3::e]:59067) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bSRhE-0005KC-Ma; Wed, 27 Jul 2016 12:28:36 -0400 Received: from 84.94.185.246.cable.012.net.il ([84.94.185.246]:3074 helo=home-c4e4a596f7) by fencepost.gnu.org with esmtpsa (TLS1.2:RSA_AES_128_CBC_SHA1:128) (Exim 4.82) (envelope-from ) id 1bSRhC-00019O-PU; Wed, 27 Jul 2016 12:28:35 -0400 Date: Wed, 27 Jul 2016 19:28:25 +0300 Message-Id: <83fuqvrqom.fsf@gnu.org> From: Eli Zaretskii To: Michal Nazarewicz In-reply-to: (message from Michal Nazarewicz on Wed, 27 Jul 2016 17:29:04 +0200) Subject: Re: bug#24071: [PATCH] Refactor regex character class parsing in [:name:] References: <1469487245-11126-1-git-send-email-mina86@mina86.com> <83d1m0tq25.fsf@gnu.org> MIME-version: 1.0 Content-type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 2001:4830:134:3::e X-Spam-Score: -6.3 (------) X-Debbugs-Envelope-To: 24071 Cc: lists@dima.secretsauce.net, 24071@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: Eli Zaretskii Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -6.3 (------) > From: Michal Nazarewicz > Cc: 24071@debbugs.gnu.org > Date: Wed, 27 Jul 2016 17:29:04 +0200 > > > If we are going to make some serious refactoring in regex.c, I think > > we should start with having a test suite for it. > > I agree. Which is why I started test/src/regex-tests.el¹. Since this > patch touches only character classes I limited the tests to character > classes. I know. What I wrote was not a complaint about the past, it was a suggestion for the future. A single localized change doesn't yet justify importing a large test suite. But this later patch looks like a beginning of a series of refactoring (is it?), hopefully followed by more features, so I thought we should have a firm ground first. > > The dima_regex_embedded_modifiers branch, created by Dima Kogan > > (CC'ed) in the Emacs repository includes a suite taken from glibc. > > Dima, could you perhaps merge the parts of the test suite that can > > already be used to the master branch, so that we could use them to > > verify changes in regex.c? > > This looks relatively straightforward; I can take care of it. I’ll > send a link to the result soon. Thanks. From debbugs-submit-bounces@debbugs.gnu.org Wed Jul 27 12:50:59 2016 Received: (at 24071) by debbugs.gnu.org; 27 Jul 2016 16:51:00 +0000 Received: from localhost ([127.0.0.1]:39378 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bSS2t-0007r1-Ec for submit@debbugs.gnu.org; Wed, 27 Jul 2016 12:50:59 -0400 Received: from mail-wm0-f52.google.com ([74.125.82.52]:37670) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bSS2s-0007qd-Kr for 24071@debbugs.gnu.org; Wed, 27 Jul 2016 12:50:58 -0400 Received: by mail-wm0-f52.google.com with SMTP id i5so71437389wmg.0 for <24071@debbugs.gnu.org>; Wed, 27 Jul 2016 09:50:58 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=sender:from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=dczkYOoJJZtLS+B+E5bX3x3LsEyaGqovxha4SDZ21vQ=; b=JAWs8mgF9AVzLbj7zPt7wX4yid7T/zjJFGRia+1Ge5PqY4PcM/32a5ziCtcApwNMdk 8etmbHnTnMIUWWq6G+yJnKvNu7ZBsqHpRIifho+9dV0VXoFLjifZyrATUygt7NULCupI UXGkR8NP5ZKFueTjAdszPdF+hxqoYO6WXNj7nq1/Xb/Fwk2FZNNt4F19nwuR2xmtz4bO F9+cudYZkx3yVizhRpt3DVRgZSx3AH4B21IIQ1H+TyxKCkZjs/Mu3Yk6PH5PIifCwsdO oLtOD8kO8kCW6KOLg9zqyHROnBQJqZCV4X6XLj/FGQAJECQ8V78j2XLFCreiXLeEoroD WNYQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:sender:from:to:cc:subject:date:message-id :in-reply-to:references:mime-version:content-transfer-encoding; bh=dczkYOoJJZtLS+B+E5bX3x3LsEyaGqovxha4SDZ21vQ=; b=cs+e7bioOHA/EbgFuTnl17bk2DluQXFxG2ZM02tkkBond+bSB+kxq3XdeIUZLfe5b3 S2vOHmeitfLIBeJ9ZdMCgF2Vz3d4wqewNrbJvW215WphSSnwhxBB30hY9+dXb6M2j94n D3H93p12IbfOTGP63pmZcUnZxRXiXEphbM5TeLKCMnfHfOO1hei9OO8Xxw8m1xKiiZ9I 67rJFbQW9d2mRtn0/OQwO25MlvFyPu7gzWobf2JO6kyW4OMGMDv+Yf7DTYfvY3x8V0YU /R57CCJ2pC+5ceu22JRZ89uO0BfxJcxGcSb/VXBRii9Zc/lF0O8nJU0k0pJNYKDpIKrp uBxw== X-Gm-Message-State: ALyK8tJKvvFU8oDhRHs+A1vUNQwNSah3R+U1CKC4bCPPuYTSssEiwobBtA1AmXTcSNBd2mR+ X-Received: by 10.28.17.9 with SMTP id 9mr55339255wmr.73.1469638252347; Wed, 27 Jul 2016 09:50:52 -0700 (PDT) Received: from mpn.zrh.corp.google.com ([172.16.113.135]) by smtp.gmail.com with ESMTPSA id a2sm7221626wjg.46.2016.07.27.09.50.50 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 27 Jul 2016 09:50:51 -0700 (PDT) Received: by mpn.zrh.corp.google.com (Postfix, from userid 126942) id 44AE81E021A; Wed, 27 Jul 2016 18:50:50 +0200 (CEST) From: Michal Nazarewicz To: 24071@debbugs.gnu.org Subject: [PATCH 3/7] Fix reading of regex-resources in regex-tests Date: Wed, 27 Jul 2016 18:50:43 +0200 Message-Id: <1469638247-20131-3-git-send-email-mina86@mina86.com> X-Mailer: git-send-email 2.8.0.rc3.226.g39d4020 In-Reply-To: <1469638247-20131-1-git-send-email-mina86@mina86.com> References: <83d1m0tq25.fsf@gnu.org> <1469638247-20131-1-git-send-email-mina86@mina86.com> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Spam-Score: -2.0 (--) X-Debbugs-Envelope-To: 24071 Cc: Eli Zaretskii X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -2.0 (--) * test/src/regex-tests.el (regex-tests-generic-line): Referring to ‘buffer-file-name’ does not work when running the test from command line, i.e. via make, which results in (wrong-type-argument stringp nil) failures. Replace it with hard-coded path. (regex-tests-BOOST, regex-tests-PCRE, regex-tests-PTESTS-whitelist, regex-tests-TESTS-whitelist): ‘regex-tests-generic-line’ now includes the ‘regex-resources’ path component so the tests don’t need to specify it explicitly. --- test/src/regex-tests.el | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/test/src/regex-tests.el b/test/src/regex-tests.el index 13a9f86..1407441 100644 --- a/test/src/regex-tests.el +++ b/test/src/regex-tests.el @@ -99,8 +99,7 @@ regex-tests-generic-line `(with-temp-buffer (modify-syntax-entry ?_ "w;; ") ; tests expect _ to be a word - (insert-file-contents ,(concat (file-name-directory (buffer-file-name)) test-file)) - + (insert-file-contents ,(concat "src/regex-resources/" test-file)) (let ((case-fold-search nil) (line-number 1) (whitelist-idx 0)) @@ -419,7 +418,7 @@ regex-tests-BOOST (let (failures basic icase newline notbol noteol) (regex-tests-generic-line - ?; "regex-resources/BOOST.tests" regex-tests-BOOST-whitelist + ?; "BOOST.tests" regex-tests-BOOST-whitelist (if (save-excursion (re-search-forward "^-" nil t)) (setq basic (save-excursion (re-search-forward "REG_BASIC" nil t)) icase (save-excursion (re-search-forward "REG_ICASE" nil t)) @@ -496,7 +495,7 @@ regex-tests-PCRE (let (failures pattern icase string what-failed matches-observed) (regex-tests-generic-line - ?# "regex-resources/PCRE.tests" regex-tests-PCRE-whitelist + ?# "PCRE.tests" regex-tests-PCRE-whitelist (cond @@ -570,7 +569,7 @@ regex-tests-PTESTS-whitelist (defun regex-tests-PTESTS () (let (failures) (regex-tests-generic-line - ?# "regex-resources/PTESTS" regex-tests-PTESTS-whitelist + ?# "PTESTS" regex-tests-PTESTS-whitelist (let* ((fields (split-string (buffer-string) "¦")) ;; string has 1-based index of first char in the @@ -632,7 +631,7 @@ regex-tests-TESTS-whitelist (defun regex-tests-TESTS () (let (failures) (regex-tests-generic-line - ?# "regex-resources/TESTS" regex-tests-TESTS-whitelist + ?# "TESTS" regex-tests-TESTS-whitelist (if (save-excursion (re-search-forward "^\\([^:]+\\):\\(.*\\):\\([^:]*\\)$" nil t)) (let* ((what-failed (let ((raw (string-to-number (match-string 1)))) -- 2.8.0.rc3.226.g39d4020 From debbugs-submit-bounces@debbugs.gnu.org Wed Jul 27 12:51:02 2016 Received: (at 24071) by debbugs.gnu.org; 27 Jul 2016 16:51:02 +0000 Received: from localhost ([127.0.0.1]:39384 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bSS2v-0007rY-TQ for submit@debbugs.gnu.org; Wed, 27 Jul 2016 12:51:02 -0400 Received: from mail-wm0-f46.google.com ([74.125.82.46]:37674) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bSS2s-0007qe-Su for 24071@debbugs.gnu.org; Wed, 27 Jul 2016 12:50:59 -0400 Received: by mail-wm0-f46.google.com with SMTP id i5so71437607wmg.0 for <24071@debbugs.gnu.org>; Wed, 27 Jul 2016 09:50:58 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=sender:from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=UsZmDC0Nj2FbrnEn+HcXW9psZjaNw91rX/RCzePsJZk=; b=UTWb1SEdXm/DwpagrtcuM2abYOQi/X3O96V0sgacGLxYyewImNxJ3oOKB2osJzQB0l 2tw6h2g+wk+MHTrhZWD5pNCqOwC89CSaznmtXA26vBIom4K6Jp2XFteJLgewfSRbOns3 yaQhwFAq80/FQ08S3KF5jO4Rlsfg/aeSiIk0snwntPRYeqEZSuAzULLPyR0UjsBvHBtA aOHtDO0p29Qv/TJn90Vm0grvSyCOrFypuyRb4NIWwBwMS/5sSsh4HQUQDmIT0za5/S+V DcHX2YUAQLJuWK/pZlendP4Ctp3r10r0v+rYtZU+9uqR7bk2JWQ7Lk0Hh3sJqPz5YHsH eJNQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:sender:from:to:cc:subject:date:message-id :in-reply-to:references:mime-version:content-transfer-encoding; bh=UsZmDC0Nj2FbrnEn+HcXW9psZjaNw91rX/RCzePsJZk=; b=WOBLPDlYEEc+dpBLW6h6sn0CNX13zTo7xoeFSR/EtJfnzh7Ztp0wry7G8vEh/OXfOI ceHrpLr7Q23ApnH2FXARZl/CqU9LUVMqVB+sU2GutCDTGjE/8AV1jYph3y53hya841gn 1cunZsDMKYlYt42xgwrlU9as8aToWAoIwKfbbedw2k6zaWFsgQGRnWbZYS83VYhjERU1 YRAQ2a8FkPNFy8aA9jdmNFFRGit3jelbRP1DX+vmuUIwuN6x0UQquW+Bv0OK34eSspvL Bx3rd67EkjKCZ0j7qkRAVxk/OTKWAP+ZgEOB3Ojn3/EIDyNp6aZ7DvpBr9G3LH22TCW9 ngVA== X-Gm-Message-State: AEkoousVX0sH+aHaTjJZNxHnnYG2C7UaUYUiTYDCnuoPWPlR1AA8QF8A24qOOuIYZwMxI6jX X-Received: by 10.194.104.197 with SMTP id gg5mr28375110wjb.6.1469638252717; Wed, 27 Jul 2016 09:50:52 -0700 (PDT) Received: from mpn.zrh.corp.google.com ([2620:0:105f:301:2815:7585:2e57:6c3b]) by smtp.gmail.com with ESMTPSA id f3sm7301601wjh.2.2016.07.27.09.50.51 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 27 Jul 2016 09:50:51 -0700 (PDT) Received: by mpn.zrh.corp.google.com (Postfix, from userid 126942) id 634981E35E4; Wed, 27 Jul 2016 18:50:50 +0200 (CEST) From: Michal Nazarewicz To: 24071@debbugs.gnu.org Subject: [PATCH 4/7] =?UTF-8?q?Don=E2=80=99t=20(require=20'cl)?= Date: Wed, 27 Jul 2016 18:50:44 +0200 Message-Id: <1469638247-20131-4-git-send-email-mina86@mina86.com> X-Mailer: git-send-email 2.8.0.rc3.226.g39d4020 In-Reply-To: <1469638247-20131-1-git-send-email-mina86@mina86.com> References: <83d1m0tq25.fsf@gnu.org> <1469638247-20131-1-git-send-email-mina86@mina86.com> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Spam-Score: -2.0 (--) X-Debbugs-Envelope-To: 24071 Cc: Eli Zaretskii X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -2.0 (--) * test/src/regex-test.el: Don’t (require 'cl). (regex-tests-PCRE): s/loop/cl-loop/ --- test/src/regex-tests.el | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/test/src/regex-tests.el b/test/src/regex-tests.el index 1407441..97b9633 100644 --- a/test/src/regex-tests.el +++ b/test/src/regex-tests.el @@ -20,7 +20,6 @@ ;;; Code: (require 'ert) -(require 'cl) (ert-deftest regex-word-cc-fallback-test () "Test that ‘[[:cc:]]*x’ matches ‘x’ (bug#24020). @@ -516,9 +515,9 @@ regex-tests-PCRE ('invalid-regexp 'compilation-failed)) matches-observed - (loop for x from 0 to 20 - collect (and (not what-failed) - (or (match-string x string) ""))))) + (cl-loop for x from 0 to 20 + collect (and (not what-failed) + (or (match-string x string) ""))))) nil) ;; verification line: failed match -- 2.8.0.rc3.226.g39d4020 From debbugs-submit-bounces@debbugs.gnu.org Wed Jul 27 12:51:03 2016 Received: (at 24071) by debbugs.gnu.org; 27 Jul 2016 16:51:03 +0000 Received: from localhost ([127.0.0.1]:39386 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bSS2w-0007re-BM for submit@debbugs.gnu.org; Wed, 27 Jul 2016 12:51:02 -0400 Received: from mail-wm0-f45.google.com ([74.125.82.45]:33212) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bSS2t-0007qh-KT for 24071@debbugs.gnu.org; Wed, 27 Jul 2016 12:51:00 -0400 Received: by mail-wm0-f45.google.com with SMTP id p129so32246141wmp.0 for <24071@debbugs.gnu.org>; Wed, 27 Jul 2016 09:50:59 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=sender:from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=kk++d/v4vdWXv1gW3p8WEcOmSTeuGxnnpJo4o2x8DUQ=; b=FLOKfokEU7Q4IgxUC+lfbKp1VzzzdCVTtgg44Jkz0q/FWTL4SR/N5VqVollkWUwID5 kSNjCtMTlptiweFW5TOfD2SA0aYiLUcEZc3F1Keo6acJzLhXywkt4FgKhO36uRzzZBVa 7enzBeU/6Ygcpg/Mlqwz334ze5otvBAxwGlafV1BaC4uaJrez+VWUun6VX/2r1fbAnqr SM127nxv/rvXb7p9b1bLdzFd1QTMW97xmSu8liNWh/HB2LsiwxB6iHJGh+Wi1UrQTXy0 sV1BOdYmI77XQVFKfzWSrPldlE8N+yeuT9lzjXFx/ihXMIaMw8ZofSsjGL4W+B/zl+Lj fd+Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:sender:from:to:cc:subject:date:message-id :in-reply-to:references:mime-version:content-transfer-encoding; bh=kk++d/v4vdWXv1gW3p8WEcOmSTeuGxnnpJo4o2x8DUQ=; b=Ev8wHUvPOII5GhPhbg+AkMVC/TWPl/41YTaTAYkFMOnKXJ73MtSYlGxUYJ65x+akid DNEtocFwrhNXdEuWIhI9OXWsJ/wlHSyJ6GXIDAmb+825PK2JofisL9gLl2BcQX53BgRq WTturX51w0336YtatSPSZa9gTp6jC/jlNytwwedaYGmxYon2EBO02i3ZJtfLiVLEEw56 vJx+eBAGPu+yTl8ySMkMrh/CwDYtBD2i2/KoxcUHqZBKLDaKzOHz1GzoB5k83BCib/LJ W2ThzwVspTo4fDbq0wr4zCeud47CXfJVR2zVjeK8MigSX7Q2w/y4fhb3Mx85Yy7WXZeJ SBlg== X-Gm-Message-State: AEkoouv7ikKgjphP9Bn+PCKmx9/8VIbhwFzrRcPt7wz0LZ5pVep9nVhQI5zoOF6DoVE2WLBU X-Received: by 10.28.64.86 with SMTP id n83mr35314755wma.52.1469638253498; Wed, 27 Jul 2016 09:50:53 -0700 (PDT) Received: from mpn.zrh.corp.google.com ([2620:0:105f:301:2815:7585:2e57:6c3b]) by smtp.gmail.com with ESMTPSA id r13sm7997050wmf.12.2016.07.27.09.50.51 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 27 Jul 2016 09:50:51 -0700 (PDT) Received: by mpn.zrh.corp.google.com (Postfix, from userid 126942) id 739E21E35E5; Wed, 27 Jul 2016 18:50:50 +0200 (CEST) From: Michal Nazarewicz To: 24071@debbugs.gnu.org Subject: [PATCH 5/7] Split regex glibc test cases into separet tests Date: Wed, 27 Jul 2016 18:50:45 +0200 Message-Id: <1469638247-20131-5-git-send-email-mina86@mina86.com> X-Mailer: git-send-email 2.8.0.rc3.226.g39d4020 In-Reply-To: <1469638247-20131-1-git-send-email-mina86@mina86.com> References: <83d1m0tq25.fsf@gnu.org> <1469638247-20131-1-git-send-email-mina86@mina86.com> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Spam-Score: -2.0 (--) X-Debbugs-Envelope-To: 24071 Cc: Eli Zaretskii X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -2.0 (--) * test/src/regex-tests.el (regex-tests): Remove and split into multiple tests cases. (regex-tests-glbic-BOOST, regex-tests-glibc-BOOST, regex-tests-glibc-PTESTS, regex-tests-glibc-TESTS): New test cases split from ‘regex-tests’. --- test/src/regex-tests.el | 24 ++++++++++++++++++------ 1 file changed, 18 insertions(+), 6 deletions(-) diff --git a/test/src/regex-tests.el b/test/src/regex-tests.el index 97b9633..8981267 100644 --- a/test/src/regex-tests.el +++ b/test/src/regex-tests.el @@ -651,12 +651,24 @@ regex-tests-TESTS (error "Error parsing TESTS file line: '%s'" (buffer-string)))) failures)) -(ert-deftest regex-tests () - "Tests of the regular expression engine. This evaluates the -BOOST, PCRE, PTESTS and TESTS test cases from glibc." - (should-not (regex-tests-BOOST)) - (should-not (regex-tests-PCRE)) - (should-not (regex-tests-PTESTS)) +(ert-deftest regex-tests-BOOST () + "Tests of the regular expression engine. +This evaluates the BOOST test cases from glibc." + (should-not (regex-tests-BOOST))) + +(ert-deftest regex-tests-BOOST () + "Tests of the regular expression engine. +This evaluates the PCRE test cases from glibc." + (should-not (regex-tests-PCRE))) + +(ert-deftest regex-tests-PTESTS () + "Tests of the regular expression engine. +This evaluates the PTESTS test cases from glibc." + (should-not (regex-tests-PTESTS))) + +(ert-deftest regex-tests-TESTS () + "Tests of the regular expression engine. +This evaluates the TESTS test cases from glibc." (should-not (regex-tests-TESTS))) ;;; regex-tests.el ends here -- 2.8.0.rc3.226.g39d4020 From debbugs-submit-bounces@debbugs.gnu.org Wed Jul 27 12:51:04 2016 Received: (at 24071) by debbugs.gnu.org; 27 Jul 2016 16:51:04 +0000 Received: from localhost ([127.0.0.1]:39389 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bSS2x-0007rx-0Z for submit@debbugs.gnu.org; Wed, 27 Jul 2016 12:51:04 -0400 Received: from mail-wm0-f41.google.com ([74.125.82.41]:37677) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bSS2t-0007qf-5J for 24071@debbugs.gnu.org; Wed, 27 Jul 2016 12:51:00 -0400 Received: by mail-wm0-f41.google.com with SMTP id i5so71437753wmg.0 for <24071@debbugs.gnu.org>; Wed, 27 Jul 2016 09:50:59 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=sender:from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=X9AirU5k8HCGqbBmBGP5anjRjS5bxxg/L/E1h0CnxkM=; b=EXFMkuXhdgYJrqnAXkClGZBz2fVDL2K2Gq1u2aPMReJ2NqiiLohmqCRiIVFeExW315 ayUKs7RCdaFsLtMVFfrA0Do6cZyoFMbYExZ9bZhKXeSwr9AfdR7JcOz+KAQmEhK56PIq aoON1rRs/RN4xd8pPsIwzo5LARaQG2oySxlU/HTVF70DaYMw3WtmFZpmExVvUqmRjgbm 3PGEYLnY3jUm9lVWfDEIWDulaOjpKemdQhe3+nvCZZZRGON5b0BiM/hJmhuRJGpRvBYr ywrYsLQMSvNbgWleIPMkBVksqngdp+ssd1D2HdjamxOgKzPL1hZewg59HDO4DdYByYpv oWyg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:sender:from:to:cc:subject:date:message-id :in-reply-to:references:mime-version:content-transfer-encoding; bh=X9AirU5k8HCGqbBmBGP5anjRjS5bxxg/L/E1h0CnxkM=; b=VFxdGStjEPjh15rV4vr814A5hL0Ods8sqzLbhydX/8UXB76IPMdT1dZQc6Js+HyDY3 x4W/V8KijUtQaIcgrfFezpLFkiNRfjO8MxWtm3UIZ6kQyk09MVtfaZjmR/OzZBDGvD4I tzvBHuwx4RGD/Wh16VJf1Ws+HZuTqXq2yyBW3ViyLfxhLw2oHHU9r9n8YVY1v+8ev5+X yQpvxfyhe1Gjtvwq1y2tnlfheZqpBHIo1w9ck4/tXybbgAfXucsnAb4MMdR7ULbrv9VY rLQiE3kfq4gP4fsUnO2MD+6SQFSeC94v3drhlsnxljgnNQrplPVBrKnNfudMCI/oZHzt RbIw== X-Gm-Message-State: AEkoousRBL1ICGXjAT/R43L9dwYjYLxkCC58Syu3mdDI/Rc4Ogmw8gn2vpIyJjLKOWuVDG+I X-Received: by 10.28.18.11 with SMTP id 11mr32508673wms.11.1469638253052; Wed, 27 Jul 2016 09:50:53 -0700 (PDT) Received: from mpn.zrh.corp.google.com ([2620:0:105f:301:2815:7585:2e57:6c3b]) by smtp.gmail.com with ESMTPSA id pm1sm7221024wjb.40.2016.07.27.09.50.50 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 27 Jul 2016 09:50:51 -0700 (PDT) Received: by mpn.zrh.corp.google.com (Postfix, from userid 126942) id 3C73C1E35E3; Wed, 27 Jul 2016 18:50:50 +0200 (CEST) From: Michal Nazarewicz To: 24071@debbugs.gnu.org Subject: [PATCH 2/7] Added driver for the regex tests Date: Wed, 27 Jul 2016 18:50:42 +0200 Message-Id: <1469638247-20131-2-git-send-email-mina86@mina86.com> X-Mailer: git-send-email 2.8.0.rc3.226.g39d4020 In-Reply-To: <1469638247-20131-1-git-send-email-mina86@mina86.com> References: <83d1m0tq25.fsf@gnu.org> <1469638247-20131-1-git-send-email-mina86@mina86.com> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Spam-Score: -2.0 (--) X-Debbugs-Envelope-To: 24071 Cc: Dima Kogan , Eli Zaretskii X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -2.0 (--) From: Dima Kogan * test/src/regex-tests.el (regex-tests): Test executing glibc tests cases. [mina86@mina86.com: merged test with existing file] --- test/src/regex-tests.el | 572 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 572 insertions(+) diff --git a/test/src/regex-tests.el b/test/src/regex-tests.el index 00165ab..13a9f86 100644 --- a/test/src/regex-tests.el +++ b/test/src/regex-tests.el @@ -20,6 +20,7 @@ ;;; Code: (require 'ert) +(require 'cl) (ert-deftest regex-word-cc-fallback-test () "Test that ‘[[:cc:]]*x’ matches ‘x’ (bug#24020). @@ -89,4 +90,575 @@ regex--test-cc (regex--test-cc "unibyte" "abcABC012 \t\n\1" "łą\u2622") (regex--test-cc "multibyte" "łą\u2622" "abcABC012 \t\n\1"))) + +(defmacro regex-tests-generic-line (comment-char test-file whitelist &rest body) + "Reads a line of the test file TEST-FILE, skipping +comments (defined by COMMENT-CHAR), and evaluates the tests in +this line as defined in the BODY. Line numbers in the WHITELIST +are known failures, and are skipped." + + `(with-temp-buffer + (modify-syntax-entry ?_ "w;; ") ; tests expect _ to be a word + (insert-file-contents ,(concat (file-name-directory (buffer-file-name)) test-file)) + + (let ((case-fold-search nil) + (line-number 1) + (whitelist-idx 0)) + + (goto-char (point-min)) + + (while (not (eobp)) + (let ((start (point))) + (end-of-line) + (narrow-to-region start (point)) + + (goto-char (point-min)) + + (when + (and + ;; ignore comments + (save-excursion + (re-search-forward ,(concat "^[^" (string comment-char) "]") nil t)) + + ;; skip lines in the whitelist + (let ((whitelist-next + (condition-case nil + (aref ,whitelist whitelist-idx) (args-out-of-range nil)))) + (cond + ;; whitelist exhausted. do process this line + ((null whitelist-next) t) + + ;; we're not yet at the next whitelist element. do + ;; process this line + ((< line-number whitelist-next) t) + + ;; we're past the next whitelist element. This + ;; shouldn't happen + ((> line-number whitelist-next) + (error + (format + "We somehow skipped the next whitelist element: line %d" whitelist-next))) + + ;; we're at the next whitelist element. Skip this + ;; line, and advance the whitelist index + (t + (setq whitelist-idx (1+ whitelist-idx)) nil)))) + ,@body) + + (widen) + (forward-line) + (beginning-of-line) + (setq line-number (1+ line-number))))))) + +(defun regex-tests-compare (string what-failed bounds-ref &optional substring-ref) + "I just ran a search, looking at STRING. WHAT-FAILED describes +what failed, if anything; valid values are 'search-failed, +'compilation-failed and nil. I compare the beginning/end of each +group with their expected values. This is done with either +BOUNDS-REF or SUBSTRING-REF; one of those should be non-nil. +BOUNDS-REF is a sequence \[start-ref0 end-ref0 start-ref1 +end-ref1 ....] while SUBSTRING-REF is the expected substring +obtained by indexing the input string by start/end-ref. + +If the search was supposed to fail then start-ref0/substring-ref0 +is 'search-failed. If the search wasn't even supposed to compile +successfully, then start-ref0/substring-ref0 is +'compilation-failed. If I only care about a match succeeding, +this can be set to t. + +This function returns a string that describes the failure, or nil +on success" + + (when (or + (and bounds-ref substring-ref) + (not (or bounds-ref substring-ref))) + (error "Exactly one of bounds-ref and bounds-ref should be non-nil")) + + (let ((what-failed-ref (car (or bounds-ref substring-ref)))) + + (cond + ((eq what-failed 'search-failed) + (cond + ((eq what-failed-ref 'search-failed) + nil) + ((eq what-failed-ref 'compilation-failed) + "Expected pattern failure; but no match") + (t + "Expected match; but no match"))) + + ((eq what-failed 'compilation-failed) + (cond + ((eq what-failed-ref 'search-failed) + "Expected no match; but pattern failure") + ((eq what-failed-ref 'compilation-failed) + nil) + (t + "Expected match; but pattern failure"))) + + ;; The regex match succeeded + ((eq what-failed-ref 'search-failed) + "Expected no match; but match") + ((eq what-failed-ref 'compilation-failed) + "Expected pattern failure; but match") + + ;; The regex match succeeded, as expected. I now check all the + ;; bounds + (t + (let ((idx 0) + msg + ref next-ref-function compare-ref-function mismatched-ref-function) + + (if bounds-ref + (setq ref bounds-ref + next-ref-function (lambda (x) (cddr x)) + compare-ref-function (lambda (ref start-pos end-pos) + (or (eq (car ref) t) + (and (eq start-pos (car ref)) + (eq end-pos (cadr ref))))) + mismatched-ref-function (lambda (ref start-pos end-pos) + (format + "beginning/end positions: %d/%s and %d/%s" + start-pos (car ref) end-pos (cadr ref)))) + (setq ref substring-ref + next-ref-function (lambda (x) (cdr x)) + compare-ref-function (lambda (ref start-pos end-pos) + (or (eq (car ref) t) + (string= (substring string start-pos end-pos) (car ref)))) + mismatched-ref-function (lambda (ref start-pos end-pos) + (format + "beginning/end positions: %d/%s and %d/%s" + start-pos (car ref) end-pos (cadr ref))))) + + (while (not (or (null ref) msg)) + + (let ((start (match-beginning idx)) + (end (match-end idx))) + + (when (not (funcall compare-ref-function ref start end)) + (setq msg + (format + "Have expected match, but mismatch in group %d: %s" idx (funcall mismatched-ref-function ref start end)))) + + (setq ref (funcall next-ref-function ref) + idx (1+ idx)))) + + (or msg + nil)))))) + + + +(defun regex-tests-match (pattern string bounds-ref &optional substring-ref) + "I match the given STRING against PATTERN. I compare the +beginning/end of each group with their expected values. +BOUNDS-REF is a sequence [start-ref0 end-ref0 start-ref1 end-ref1 +....]. + +If the search was supposed to fail then start-ref0 is +'search-failed. If the search wasn't even supposed to compile +successfully, then start-ref0 is 'compilation-failed. + +This function returns a string that describes the failure, or nil +on success" + + (if (string-match "\\[\\([\\.=]\\)..?\\1\\]" pattern) + ;; Skipping test: [.x.] and [=x=] forms not supported by emacs + nil + + (regex-tests-compare + string + (condition-case nil + (if (string-match pattern string) nil 'search-failed) + ('invalid-regexp 'compilation-failed)) + bounds-ref substring-ref))) + + +(defconst regex-tests-re-even-escapes + "\\(?:^\\|[^\\\\]\\)\\(?:\\\\\\\\\\)*" + "Regex that matches an even number of \\ characters") + +(defconst regex-tests-re-odd-escapes + (concat regex-tests-re-even-escapes "\\\\") + "Regex that matches an odd number of \\ characters") + + +(defun regex-tests-unextend (pattern) + "Basic conversion from extended regexen to emacs ones. This is +mostly a hack that adds \\ to () and | and {}, and removes it if +it already exists. We also change \\S (and \\s) to \\S- (and +\\s-) because extended regexen see the former as whitespace, but +emacs requires an extra symbol character" + + (with-temp-buffer + (insert pattern) + (goto-char (point-min)) + + (while (re-search-forward "[()|{}]" nil t) + ;; point is past special character. If it is escaped, unescape + ;; it + + (if (save-excursion + (re-search-backward (concat regex-tests-re-odd-escapes ".\\=") nil t)) + + ;; This special character is preceded by an odd number of \, + ;; so I unescape it by removing the last one + (progn + (forward-char -2) + (delete-char 1) + (forward-char 1)) + + ;; This special character is preceded by an even (possibly 0) + ;; number of \. I add an escape + (forward-char -1) + (insert "\\") + (forward-char 1))) + + ;; convert \s to \s- + (goto-char (point-min)) + (while (re-search-forward (concat regex-tests-re-odd-escapes "[Ss]") nil t) + (insert "-")) + + (buffer-string))) + +(defun regex-tests-BOOST-frob-escapes (s ispattern) + "Mangle \\ the way it is done in frob_escapes() in +regex-tests-BOOST.c in glibc: \\t, \\n, \\r are interpreted; +\\\\, \\^, \{, \\|, \} are unescaped for the string (not +pattern)" + + ;; this is all similar to (regex-tests-unextend) + (with-temp-buffer + (insert s) + + (let ((interpret-list (list "t" "n" "r"))) + (while interpret-list + (goto-char (point-min)) + (while (re-search-forward + (concat "\\(" regex-tests-re-even-escapes "\\)" + "\\\\" (car interpret-list)) + nil t) + (replace-match (concat "\\1" (car (read-from-string + (concat "\"\\" (car interpret-list) "\"")))))) + + (setq interpret-list (cdr interpret-list)))) + + (when (not ispattern) + ;; unescape \\, \^, \{, \|, \} + (let ((unescape-list (list "\\\\" "^" "{" "|" "}"))) + (while unescape-list + (goto-char (point-min)) + (while (re-search-forward + (concat "\\(" regex-tests-re-even-escapes "\\)" + "\\\\" (car unescape-list)) + nil t) + (replace-match (concat "\\1" (car unescape-list)))) + + (setq unescape-list (cdr unescape-list)))) + ) + (buffer-string))) + + + + +(defconst regex-tests-BOOST-whitelist + [ + ;; emacs is more stringent with regexen involving unbalanced ) + 63 65 69 + + ;; in emacs, regex . doesn't match \n + 91 + + ;; emacs is more forgiving with * and ? that don't apply to + ;; characters + 107 108 109 122 123 124 140 141 142 + + ;; emacs accepts regexen with {} + 161 + + ;; emacs doesn't fail on bogus ranges such as [3-1] or [1-3-5] + 222 223 + + ;; emacs doesn't match (ab*)[ab]*\1 greedily: only 4 chars of + ;; ababaaa match + 284 294 + + ;; ambiguous groupings are ambiguous + 443 444 445 446 448 449 450 + + ;; emacs doesn't know how to handle weird ranges such as [a-Z] and + ;; [[:alpha:]-a] + 539 580 581 + + ;; emacs matches non-greedy regex ab.*? non-greedily + 639 677 712 + ] + "Line numbers in the boost test that should be skipped. These +are false-positive test failures that represent known/benign +differences in behavior.") + +;; - Format +;; - Comments are lines starting with ; +;; - Lines starting with - set options passed to regcomp() and regexec(): +;; - if no "REG_BASIC" is found, with have an extended regex +;; - These set a flag: +;; - REG_ICASE +;; - REG_NEWLINE +;; - REG_NOTBOL +;; - REG_NOTEOL +;; +;; - Test lines are +;; pattern string start0 end0 start1 end1 ... +;; +;; - pattern, string can have escapes +;; - string can have whitespace if enclosed in "" +;; - if string is "!", then the pattern is supposed to fail compilation +;; - start/end are of group0, group1, etc. group 0 is the full match +;; - start<0 indicates "no match" +;; - start is the 0-based index of the first character +;; - end is the 0-based index of the first character past the group +(defun regex-tests-BOOST () + (let (failures + basic icase newline notbol noteol) + (regex-tests-generic-line + ?; "regex-resources/BOOST.tests" regex-tests-BOOST-whitelist + (if (save-excursion (re-search-forward "^-" nil t)) + (setq basic (save-excursion (re-search-forward "REG_BASIC" nil t)) + icase (save-excursion (re-search-forward "REG_ICASE" nil t)) + newline (save-excursion (re-search-forward "REG_NEWLINE" nil t)) + notbol (save-excursion (re-search-forward "REG_NOTBOL" nil t)) + noteol (save-excursion (re-search-forward "REG_NOTEOL" nil t))) + + (save-excursion + (or (re-search-forward "\\(\\S-+\\)\\s-+\"\\(.*\\)\"\\s-+?\\(.+\\)" nil t) + (re-search-forward "\\(\\S-+\\)\\s-+\\(\\S-+\\)\\s-+?\\(.+\\)" nil t) + (re-search-forward "\\(\\S-+\\)\\s-+\\(!\\)" nil t))) + + (let* ((pattern-raw (match-string 1)) + (string-raw (match-string 2)) + (positions-raw (match-string 3)) + (pattern (regex-tests-BOOST-frob-escapes pattern-raw t)) + (string (regex-tests-BOOST-frob-escapes string-raw nil)) + (positions + (if (string= string "!") + (list 'compilation-failed 0) + (mapcar + (lambda (x) + (let ((x (string-to-number x))) + (if (< x 0) nil x))) + (split-string positions-raw))))) + + (when (null (car positions)) + (setcar positions 'search-failed)) + + (when (not basic) + (setq pattern (regex-tests-unextend pattern))) + + ;; great. I now have all the data parsed. Let's use it to do + ;; stuff + (let* ((case-fold-search icase) + (msg (regex-tests-match pattern string positions))) + + (if (and + ;; Skipping test: notbol/noteol not supported + (not notbol) (not noteol) + + msg) + + ;; store failure + (setq failures + (cons (format "line number %d: Regex '%s': %s" + line-number pattern msg) + failures))))))) + + failures)) + +(defconst regex-tests-PCRE-whitelist + [ + ;; ambiguous groupings are ambiguous + 610 611 1154 1157 1160 1168 1171 1176 1179 1182 1185 1188 1193 1196 1203 + ] + "Line numbers in the PCRE test that should be skipped. These +are false-positive test failures that represent known/benign +differences in behavior.") + +;; - Format +;; +;; regex +;; input_string +;; group_num: group_match | "No match" +;; input_string +;; group_num: group_match | "No match" +;; input_string +;; group_num: group_match | "No match" +;; input_string +;; group_num: group_match | "No match" +;; ... +(defun regex-tests-PCRE () + (let (failures + pattern icase string what-failed matches-observed) + (regex-tests-generic-line + ?# "regex-resources/PCRE.tests" regex-tests-PCRE-whitelist + + (cond + + ;; pattern + ((save-excursion (re-search-forward "^/\\(.*\\)/\\(.*i?\\)$" nil t)) + (setq icase (string= "i" (match-string 2)) + pattern (regex-tests-unextend (match-string 1)))) + + ;; string. read it in, match against pattern, and save all the results + ((save-excursion (re-search-forward "^ \\(.*\\)" nil t)) + (let ((case-fold-search icase)) + (setq string (match-string 1) + + ;; the regex match under test + what-failed + (condition-case nil + (if (string-match pattern string) nil 'search-failed) + ('invalid-regexp 'compilation-failed)) + + matches-observed + (loop for x from 0 to 20 + collect (and (not what-failed) + (or (match-string x string) ""))))) + nil) + + ;; verification line: failed match + ((save-excursion (re-search-forward "^No match" nil t)) + (unless what-failed + (setq failures + (cons (format "line number %d: Regex '%s': Expected no match; but match" + line-number pattern) + failures)))) + + ;; verification line: succeeded match + ((save-excursion (re-search-forward "^ *\\([0-9]+\\): \\(.*\\)" nil t)) + (let* ((match-ref (match-string 2)) + (idx (string-to-number (match-string 1)))) + + (if what-failed + "Expected match; but no match" + (unless (string= match-ref (elt matches-observed idx)) + (setq failures + (cons (format "line number %d: Regex '%s': Have expected match, but group %d is wrong: '%s'/'%s'" + line-number pattern + idx match-ref (elt matches-observed idx)) + failures)))))) + + ;; reset + (t (setq pattern nil) nil))) + + failures)) + +(defconst regex-tests-PTESTS-whitelist + [ + ;; emacs doesn't barf on weird ranges such as [b-a], but simply + ;; fails to match + 138 + + ;; emacs doesn't see DEL (0x78) as a [:cntrl:] character + 168 + ] + "Line numbers in the PTESTS test that should be skipped. These +are false-positive test failures that represent known/benign +differences in behavior.") + +;; - Format +;; - fields separated by ¦ (note: this is not a |) +;; - start¦end¦pattern¦string +;; - start is the 1-based index of the first character +;; - end is the 1-based index of the last character +(defun regex-tests-PTESTS () + (let (failures) + (regex-tests-generic-line + ?# "regex-resources/PTESTS" regex-tests-PTESTS-whitelist + (let* ((fields (split-string (buffer-string) "¦")) + + ;; string has 1-based index of first char in the + ;; match. -1 means "no match". -2 means "invalid + ;; regex". + ;; + ;; start-ref is 0-based index of first char in the + ;; match + ;; + ;; string==0 is a special case, and I have to treat + ;; it as start-ref = 0 + (start-ref (let ((raw (string-to-number (elt fields 0)))) + (cond + ((= raw -2) 'compilation-failed) + ((= raw -1) 'search-failed) + ((= raw 0) 0) + (t (1- raw))))) + + ;; string has 1-based index of last char in the + ;; match. end-ref is 0-based index of first char past + ;; the match + (end-ref (string-to-number (elt fields 1))) + (pattern (elt fields 2)) + (string (elt fields 3))) + + (let ((msg (regex-tests-match pattern string (list start-ref end-ref)))) + (when msg + (setq failures + (cons (format "line number %d: Regex '%s': %s" + line-number pattern msg) + failures)))))) + failures)) + +(defconst regex-tests-TESTS-whitelist + [ + ;; emacs doesn't barf on weird ranges such as [b-a], but simply + ;; fails to match + 42 + + ;; emacs is more forgiving with * and ? that don't apply to + ;; characters + 57 58 59 60 + + ;; emacs is more stringent with regexen involving unbalanced ) + 67 + ] + "Line numbers in the TESTS test that should be skipped. These +are false-positive test failures that represent known/benign +differences in behavior.") + +;; - Format +;; - fields separated by :. Watch for [\[:xxx:]] +;; - expected:pattern:string +;; +;; expected: +;; | 0 | successful match | +;; | 1 | failed match | +;; | 2 | regcomp() should fail | +(defun regex-tests-TESTS () + (let (failures) + (regex-tests-generic-line + ?# "regex-resources/TESTS" regex-tests-TESTS-whitelist + (if (save-excursion (re-search-forward "^\\([^:]+\\):\\(.*\\):\\([^:]*\\)$" nil t)) + (let* ((what-failed + (let ((raw (string-to-number (match-string 1)))) + (cond + ((= raw 2) 'compilation-failed) + ((= raw 1) 'search-failed) + (t t)))) + (string (match-string 3)) + (pattern (regex-tests-unextend (match-string 2)))) + + (let ((msg (regex-tests-match pattern string nil (list what-failed)))) + (when msg + (setq failures + (cons (format "line number %d: Regex '%s': %s" + line-number pattern msg) + failures))))) + + (error "Error parsing TESTS file line: '%s'" (buffer-string)))) + failures)) + +(ert-deftest regex-tests () + "Tests of the regular expression engine. This evaluates the +BOOST, PCRE, PTESTS and TESTS test cases from glibc." + (should-not (regex-tests-BOOST)) + (should-not (regex-tests-PCRE)) + (should-not (regex-tests-PTESTS)) + (should-not (regex-tests-TESTS))) + ;;; regex-tests.el ends here -- 2.8.0.rc3.226.g39d4020 From debbugs-submit-bounces@debbugs.gnu.org Wed Jul 27 12:51:04 2016 Received: (at 24071) by debbugs.gnu.org; 27 Jul 2016 16:51:05 +0000 Received: from localhost ([127.0.0.1]:39391 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bSS2y-0007sE-Gb for submit@debbugs.gnu.org; Wed, 27 Jul 2016 12:51:04 -0400 Received: from mail-wm0-f43.google.com ([74.125.82.43]:37690) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bSS2u-0007qi-86 for 24071@debbugs.gnu.org; Wed, 27 Jul 2016 12:51:00 -0400 Received: by mail-wm0-f43.google.com with SMTP id i5so71438374wmg.0 for <24071@debbugs.gnu.org>; Wed, 27 Jul 2016 09:51:00 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=sender:from:to:cc:subject:date:message-id:in-reply-to:references; bh=X0x6qNUD/ei9JG110ErSTsWelEZuEUFqFoghci+Kwlc=; b=nDfPKJVwnnEg3ze1eH6o/3VwJYvTnUgDn1V+NZWR+xnjCTUaYqfMphF9SMFwW4EKHb sPmlRQbiMbXzqEKja9sh8sjP5UBmVMIr5sCkz6yUH0BAEz59SbAA1ekiB8qfOK19vjAA UzoU0BbesVDG2G7Mpkcx6W8QQeEkLrm6agMu7AJJYNMuu0HNIoRumGpnkz4a7T8Hap6Z 5YhnbWKr+apZIMS4fmOdfiaxAOCjAxNa9m5gtdASVyGYW6FTh2nnq1c8SbugNWir3pOa ma+a1EYYGmCKW6pdHmTNmh3fDmdOzYbF2KpArWJ5tdYWHy61LwOntXtqbxKtOYogD/ox Zdlw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:sender:from:to:cc:subject:date:message-id :in-reply-to:references; bh=X0x6qNUD/ei9JG110ErSTsWelEZuEUFqFoghci+Kwlc=; b=VolAhJRMdg+2r0BDXQoCe7Wn0NTLK+uZlRQO4HxougRb6YoiYEw91A+MmO/FHOxO6I 7/gzpkXE94eVwK3UUBzx63NnWFoA6/4KTfLo/tp8zgLppTXHO1uZSKpmWyJmmz68RXBU EN969cd1FEBvEhnl356JHbzMqH5aP+29WWD5vOSckr5wwnakal2Xu+xEdQ7LO78soqwk lP2VODyaN31Z8173+GhzWZSHtkvf6U3LBodyihqlI8tULLblx5gY1T7S9c/SR+4erCbN 4MHdUt6gSkKQgmmAIXYIaBk/EvsyelAWhH8LlwEOS3PaMMjxyD5TGLbOgeeRMWK43omI 0RCA== X-Gm-Message-State: ALyK8tKj6VEANDqAbMPPbWOTaNjGhAr0XjxBt+URhsZgiX88u+s2nAKi3jsZQjFT3lp+KDZo X-Received: by 10.28.131.199 with SMTP id f190mr54335682wmd.30.1469638254170; Wed, 27 Jul 2016 09:50:54 -0700 (PDT) Received: from mpn.zrh.corp.google.com ([2620:0:105f:301:2815:7585:2e57:6c3b]) by smtp.gmail.com with ESMTPSA id q4sm7263095wjk.24.2016.07.27.09.50.52 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 27 Jul 2016 09:50:53 -0700 (PDT) Received: by mpn.zrh.corp.google.com (Postfix, from userid 126942) id 83FB31E35E8; Wed, 27 Jul 2016 18:50:50 +0200 (CEST) From: Michal Nazarewicz To: 24071@debbugs.gnu.org Subject: [PATCH 6/7] Remove dead opcodes in regex bytecode Date: Wed, 27 Jul 2016 18:50:46 +0200 Message-Id: <1469638247-20131-6-git-send-email-mina86@mina86.com> X-Mailer: git-send-email 2.8.0.rc3.226.g39d4020 In-Reply-To: <1469638247-20131-1-git-send-email-mina86@mina86.com> References: <83d1m0tq25.fsf@gnu.org> <1469638247-20131-1-git-send-email-mina86@mina86.com> X-Spam-Score: -2.0 (--) X-Debbugs-Envelope-To: 24071 Cc: Eli Zaretskii X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -2.0 (--) There is no way to specify before_dot and after_dot opcodes in a regex so code handling those ends up being dead. Remove it. * src/regex.c (print_partial_compiled_pattern, regex_compile, analyze_first, re_match_2_internal): Remove handling and references to before_dot and after_dot opcodes. --- src/regex.c | 28 +--------------------------- 1 file changed, 1 insertion(+), 27 deletions(-) diff --git a/src/regex.c b/src/regex.c index 1f2a1f08..cca00e9 100644 --- a/src/regex.c +++ b/src/regex.c @@ -669,9 +669,7 @@ typedef enum notsyntaxspec #ifdef emacs - ,before_dot, /* Succeeds if before point. */ - at_dot, /* Succeeds if at point. */ - after_dot, /* Succeeds if after point. */ + , at_dot, /* Succeeds if at point. */ /* Matches any character whose category-set contains the specified category. The operator is followed by a byte which contains a @@ -1053,18 +1051,10 @@ print_partial_compiled_pattern (re_char *start, re_char *end) break; # ifdef emacs - case before_dot: - fprintf (stderr, "/before_dot"); - break; - case at_dot: fprintf (stderr, "/at_dot"); break; - case after_dot: - fprintf (stderr, "/after_dot"); - break; - case categoryspec: fprintf (stderr, "/categoryspec"); mcnt = *p++; @@ -3422,8 +3412,6 @@ regex_compile (const_re_char *pattern, size_t size, reg_syntax_t syntax, goto normal_char; #ifdef emacs - /* There is no way to specify the before_dot and after_dot - operators. rms says this is ok. --karl */ case '=': laststart = b; BUF_PUSH (at_dot); @@ -4000,9 +3988,7 @@ analyze_first (const_re_char *p, const_re_char *pend, char *fastmap, /* All cases after this match the empty string. These end with `continue'. */ - case before_dot: case at_dot: - case after_dot: #endif /* !emacs */ case no_op: case begline: @@ -6150,24 +6136,12 @@ re_match_2_internal (struct re_pattern_buffer *bufp, const_re_char *string1, break; #ifdef emacs - case before_dot: - DEBUG_PRINT ("EXECUTING before_dot.\n"); - if (PTR_BYTE_POS (d) >= PT_BYTE) - goto fail; - break; - case at_dot: DEBUG_PRINT ("EXECUTING at_dot.\n"); if (PTR_BYTE_POS (d) != PT_BYTE) goto fail; break; - case after_dot: - DEBUG_PRINT ("EXECUTING after_dot.\n"); - if (PTR_BYTE_POS (d) <= PT_BYTE) - goto fail; - break; - case categoryspec: case notcategoryspec: { -- 2.8.0.rc3.226.g39d4020 From debbugs-submit-bounces@debbugs.gnu.org Wed Jul 27 12:51:10 2016 Received: (at 24071) by debbugs.gnu.org; 27 Jul 2016 16:51:10 +0000 Received: from localhost ([127.0.0.1]:39393 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bSS33-0007sa-TS for submit@debbugs.gnu.org; Wed, 27 Jul 2016 12:51:10 -0400 Received: from mail-wm0-f46.google.com ([74.125.82.46]:38418) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bSS2v-0007qm-HH for 24071@debbugs.gnu.org; Wed, 27 Jul 2016 12:51:03 -0400 Received: by mail-wm0-f46.google.com with SMTP id o80so71418779wme.1 for <24071@debbugs.gnu.org>; Wed, 27 Jul 2016 09:51:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=sender:from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=1g8VfmYLBC8/5cV7O+2AWcpG8IkNKCk51i7GoYB5NW4=; b=a8SCOK4eNmZwDE8ghHJQSZn1blJk5uz63/HQfho3dQRypENgWaEtxo4GzBKMi7V1z7 lcvkKvxDhZjy1UrL7uB2AAA+Eks7YWhnNbATIwOLIemIJyjMvatSf5SDqd2E9NL2ctLH SV01eWkJNtA5QMMzIP9DrqKyjDi2pZ72JRSdTuzSsHm7Ugy0WPyGll7ZrKVnV9xhbBgJ 6JpwU1PM3AdPh7aCMC+1CpdASk3h6PsWL6ZHqvD+TzSvkEO0q0CU+hAo74TNrs9jQ0QO Y//jMRrOZ2hg/bhEEchsIDX23GlSHOR/PFQVpxmEAN9EhEMvAnM9PUGCWwShxpl+149D lBog== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:sender:from:to:cc:subject:date:message-id :in-reply-to:references:mime-version:content-transfer-encoding; bh=1g8VfmYLBC8/5cV7O+2AWcpG8IkNKCk51i7GoYB5NW4=; b=kUpiMCB/05rQZ+PImxgNzuiHsmEx0IXcH+7w6l78IZk7esq3pvBtn4aYGQIfd7D28k 4N8ZtyNuy3C//6bKskOQH/3+khUObad6WtZpKuqb49rrWI5F2Z84bpXNOEDu7NkHYLiD Qn4Y8dRHgIOAIwTnujNvoC/qHeiWJ1DDL4k/v0JDjKcpyTNMiE1ulMIqX+RkUvZHVwix iklV/7xmerAv9znvUIN1/D2YpbCVl+x8h48vUAkzqfS/HyIiCAxqr1dbyOzIhy1PZWUr nNj8GpazXVS7kCtEuNA52uSVWHGNQVIqP2GuyAYf5H08pBGxYZjZu2QG8M99Q96PAGL8 PYdQ== X-Gm-Message-State: AEkooutuE81QvaId0RaHqnpWv22d4fi7P4byyjVtVLcsMwnb49Prnb54zeC6VOrlRr0X/D7y X-Received: by 10.28.168.150 with SMTP id r144mr28415814wme.66.1469638254809; Wed, 27 Jul 2016 09:50:54 -0700 (PDT) Received: from mpn.zrh.corp.google.com ([2620:0:105f:301:2815:7585:2e57:6c3b]) by smtp.gmail.com with ESMTPSA id b203sm7975702wmh.20.2016.07.27.09.50.52 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 27 Jul 2016 09:50:53 -0700 (PDT) Received: by mpn.zrh.corp.google.com (Postfix, from userid 126942) id A65F91E35EA; Wed, 27 Jul 2016 18:50:50 +0200 (CEST) From: Michal Nazarewicz To: 24071@debbugs.gnu.org Subject: [PATCH 7/7] Refactor regex character class parsing in [:name:] Date: Wed, 27 Jul 2016 18:50:47 +0200 Message-Id: <1469638247-20131-7-git-send-email-mina86@mina86.com> X-Mailer: git-send-email 2.8.0.rc3.226.g39d4020 In-Reply-To: <1469638247-20131-1-git-send-email-mina86@mina86.com> References: <83d1m0tq25.fsf@gnu.org> <1469638247-20131-1-git-send-email-mina86@mina86.com> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Spam-Score: -2.0 (--) X-Debbugs-Envelope-To: 24071 Cc: Eli Zaretskii X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -2.0 (--) re_wctype function is used in three separate places and in all of those places almost exact code extracting the name from [:name:] surrounds it. Furthermore, re_wctype requires a NUL-terminated string, so the name of the character class is copied to a temporary buffer. The code duplication and unnecessary memory copying can be avoided by pushing the responsibility of parsing the whole [:name:] sequence to the function. Furthermore, since now the function has access to the length of the character class name (since it’s doing the parsing), it can take advantage of that information in skipping some string comparisons and using a constant-length memcmp instead of strcmp which needs to take care of NUL bytes. * src/regex.c (re_wctype): Delete function. Replace it with: (re_wctype_parse): New function which parses a whole [:name:] string and returns a RECC_* constant or -1 if the string is not of [:name:] format. (regex_compile): Use re_wctype_parse. * src/syntax.c (skip_chars): Use re_wctype_parse. --- src/regex.c | 312 +++++++++++++++++++++++++++++------------------------------ src/regex.h | 14 +-- src/syntax.c | 96 +++++------------- 3 files changed, 183 insertions(+), 239 deletions(-) diff --git a/src/regex.c b/src/regex.c index cca00e9..d482316 100644 --- a/src/regex.c +++ b/src/regex.c @@ -1959,29 +1959,98 @@ struct range_table_work_area #if ! WIDE_CHAR_SUPPORT -/* Map a string to the char class it names (if any). */ +/* Parse a character class, i.e. string such as "[:name:]". *strp + points to the string to be parsed and limit is length, in bytes, of + that string. + + If *strp point to a string that begins with "[:name:]", where name is + a non-empty sequence of lower case letters, *strp will be advanced past the + closing square bracket and RECC_* constant which maps to the name will be + returned. If name is not a valid character class name zero, or RECC_ERROR, + is returned. + + Otherwise, if *strp doesn’t begin with "[:name:]", -1 is returned. + + The function can be used on ASCII and multibyte (UTF-8-encoded) strings. + */ re_wctype_t -re_wctype (const_re_char *str) +re_wctype_parse (const unsigned char **strp, unsigned limit) { - const char *string = (const char *) str; - if (STREQ (string, "alnum")) return RECC_ALNUM; - else if (STREQ (string, "alpha")) return RECC_ALPHA; - else if (STREQ (string, "word")) return RECC_WORD; - else if (STREQ (string, "ascii")) return RECC_ASCII; - else if (STREQ (string, "nonascii")) return RECC_NONASCII; - else if (STREQ (string, "graph")) return RECC_GRAPH; - else if (STREQ (string, "lower")) return RECC_LOWER; - else if (STREQ (string, "print")) return RECC_PRINT; - else if (STREQ (string, "punct")) return RECC_PUNCT; - else if (STREQ (string, "space")) return RECC_SPACE; - else if (STREQ (string, "upper")) return RECC_UPPER; - else if (STREQ (string, "unibyte")) return RECC_UNIBYTE; - else if (STREQ (string, "multibyte")) return RECC_MULTIBYTE; - else if (STREQ (string, "digit")) return RECC_DIGIT; - else if (STREQ (string, "xdigit")) return RECC_XDIGIT; - else if (STREQ (string, "cntrl")) return RECC_CNTRL; - else if (STREQ (string, "blank")) return RECC_BLANK; - else return 0; + const char *beg = (const char *)*strp, *end, *it; + + if (limit < 5 || beg[0] != '[' || beg[1] != ':') + return -1; + + end = beg + limit - 2; /* ‘- 2’ for the closing ":]" */ + beg += 2; /* skip opening "[:" */ + it = beg; + while (it != end && *it >= 'a' && *it <= 'z') + ++it; + if (it[0] != ':' || it[1] != ']') + return -1; + + *strp = (const unsigned char *)(it + 2); + + /* Sort tests in the length=five case by frequency the classes to minimise + number of times we fail the comparison. The frequencies of character class + names used in Emacs sources as of 2016-07-27: + + $ find \( -name \*.c -o -name \*.el \) -exec grep -h '\[:[a-z]*:]' {} + | + sed 's/]/]\n/g' |grep -o '\[:[a-z]*:]' |sort |uniq -c |sort -nr + 213 [:alnum:] + 104 [:alpha:] + 62 [:space:] + 39 [:digit:] + 36 [:blank:] + 26 [:word:] + 26 [:upper:] + 21 [:lower:] + 10 [:xdigit:] + 10 [:punct:] + 10 [:ascii:] + 4 [:nonascii:] + 4 [:graph:] + 2 [:print:] + 2 [:cntrl:] + 1 [:ff:] + + If you update this list, consider also updating chain of or’ed conditions + in execute_charset function. + */ + + switch (it - beg) { + case 4: + if (!memcmp (beg, "word", 4)) return RECC_WORD; + break; + case 5: + if (!memcmp (beg, "alnum", 5)) return RECC_ALNUM; + if (!memcmp (beg, "alpha", 5)) return RECC_ALPHA; + if (!memcmp (beg, "space", 5)) return RECC_SPACE; + if (!memcmp (beg, "digit", 5)) return RECC_DIGIT; + if (!memcmp (beg, "blank", 5)) return RECC_BLANK; + if (!memcmp (beg, "upper", 5)) return RECC_UPPER; + if (!memcmp (beg, "lower", 5)) return RECC_LOWER; + if (!memcmp (beg, "punct", 5)) return RECC_PUNCT; + if (!memcmp (beg, "ascii", 5)) return RECC_ASCII; + if (!memcmp (beg, "graph", 5)) return RECC_GRAPH; + if (!memcmp (beg, "print", 5)) return RECC_PRINT; + if (!memcmp (beg, "cntrl", 5)) return RECC_CNTRL; + break; + case 6: + if (!memcmp (beg, "xdigit", 6)) return RECC_XDIGIT; + break; + case 7: + if (!memcmp (beg, "unibyte", 7)) return RECC_UNIBYTE; + break; + case 8: + if (!memcmp (beg, "nonascii", 8)) return RECC_NONASCII; + break; + case 9: + if (!memcmp (beg, "multibyte", 9)) return RECC_MULTIBYTE; + break; + } + + return RECC_ERROR; } /* True if CH is in the char class CC. */ @@ -2766,10 +2835,74 @@ regex_compile (const_re_char *pattern, size_t size, reg_syntax_t syntax, { boolean escaped_char = false; const unsigned char *p2 = p; + re_wctype_t cc; re_wchar_t ch; if (p == pend) FREE_STACK_RETURN (REG_EBRACK); + /* See if we're at the beginning of a possible character + class. */ + if (syntax & RE_CHAR_CLASSES && + (cc = re_wctype_parse(&p, pend - p)) != -1) + { + if (cc == 0) + FREE_STACK_RETURN (REG_ECTYPE); + + if (p == pend) + FREE_STACK_RETURN (REG_EBRACK); + +#ifndef emacs + for (ch = 0; ch < (1 << BYTEWIDTH); ++ch) + if (re_iswctype (btowc (ch), cc)) + { + c = TRANSLATE (ch); + if (c < (1 << BYTEWIDTH)) + SET_LIST_BIT (c); + } +#else /* emacs */ + /* Most character classes in a multibyte match just set + a flag. Exceptions are is_blank, is_digit, is_cntrl, and + is_xdigit, since they can only match ASCII characters. + We don't need to handle them for multibyte. They are + distinguished by a negative wctype. */ + + /* Setup the gl_state object to its buffer-defined value. + This hardcodes the buffer-global syntax-table for ASCII + chars, while the other chars will obey syntax-table + properties. It's not ideal, but it's the way it's been + done until now. */ + SETUP_BUFFER_SYNTAX_TABLE (); + + for (ch = 0; ch < 256; ++ch) + { + c = RE_CHAR_TO_MULTIBYTE (ch); + if (! CHAR_BYTE8_P (c) + && re_iswctype (c, cc)) + { + SET_LIST_BIT (ch); + c1 = TRANSLATE (c); + if (c1 == c) + continue; + if (ASCII_CHAR_P (c1)) + SET_LIST_BIT (c1); + else if ((c1 = RE_CHAR_TO_UNIBYTE (c1)) >= 0) + SET_LIST_BIT (c1); + } + } + SET_RANGE_TABLE_WORK_AREA_BIT + (range_table_work, re_wctype_to_bit (cc)); +#endif /* emacs */ + /* In most cases the matching rule for char classes only + uses the syntax table for multibyte chars, so that the + content of the syntax-table is not hardcoded in the + range_table. SPACE and WORD are the two exceptions. */ + if ((1 << cc) & ((1 << RECC_SPACE) | (1 << RECC_WORD))) + bufp->used_syntax = 1; + + /* Repeat the loop. */ + continue; + } + /* Don't translate yet. The range TRANSLATE(X..Y) cannot always be determined from TRANSLATE(X) and TRANSLATE(Y) So the translation is done later in a loop. Example: @@ -2793,119 +2926,6 @@ regex_compile (const_re_char *pattern, size_t size, reg_syntax_t syntax, break; } - /* See if we're at the beginning of a possible character - class. */ - - if (!escaped_char && - syntax & RE_CHAR_CLASSES && c == '[' && *p == ':') - { - /* Leave room for the null. */ - unsigned char str[CHAR_CLASS_MAX_LENGTH + 1]; - const unsigned char *class_beg; - - PATFETCH (c); - c1 = 0; - class_beg = p; - - /* If pattern is `[[:'. */ - if (p == pend) FREE_STACK_RETURN (REG_EBRACK); - - for (;;) - { - PATFETCH (c); - if ((c == ':' && *p == ']') || p == pend) - break; - if (c1 < CHAR_CLASS_MAX_LENGTH) - str[c1++] = c; - else - /* This is in any case an invalid class name. */ - str[0] = '\0'; - } - str[c1] = '\0'; - - /* If isn't a word bracketed by `[:' and `:]': - undo the ending character, the letters, and - leave the leading `:' and `[' (but set bits for - them). */ - if (c == ':' && *p == ']') - { - re_wctype_t cc = re_wctype (str); - - if (cc == 0) - FREE_STACK_RETURN (REG_ECTYPE); - - /* Throw away the ] at the end of the character - class. */ - PATFETCH (c); - - if (p == pend) FREE_STACK_RETURN (REG_EBRACK); - -#ifndef emacs - for (ch = 0; ch < (1 << BYTEWIDTH); ++ch) - if (re_iswctype (btowc (ch), cc)) - { - c = TRANSLATE (ch); - if (c < (1 << BYTEWIDTH)) - SET_LIST_BIT (c); - } -#else /* emacs */ - /* Most character classes in a multibyte match - just set a flag. Exceptions are is_blank, - is_digit, is_cntrl, and is_xdigit, since - they can only match ASCII characters. We - don't need to handle them for multibyte. - They are distinguished by a negative wctype. */ - - /* Setup the gl_state object to its buffer-defined - value. This hardcodes the buffer-global - syntax-table for ASCII chars, while the other chars - will obey syntax-table properties. It's not ideal, - but it's the way it's been done until now. */ - SETUP_BUFFER_SYNTAX_TABLE (); - - for (ch = 0; ch < 256; ++ch) - { - c = RE_CHAR_TO_MULTIBYTE (ch); - if (! CHAR_BYTE8_P (c) - && re_iswctype (c, cc)) - { - SET_LIST_BIT (ch); - c1 = TRANSLATE (c); - if (c1 == c) - continue; - if (ASCII_CHAR_P (c1)) - SET_LIST_BIT (c1); - else if ((c1 = RE_CHAR_TO_UNIBYTE (c1)) >= 0) - SET_LIST_BIT (c1); - } - } - SET_RANGE_TABLE_WORK_AREA_BIT - (range_table_work, re_wctype_to_bit (cc)); -#endif /* emacs */ - /* In most cases the matching rule for char classes - only uses the syntax table for multibyte chars, - so that the content of the syntax-table is not - hardcoded in the range_table. SPACE and WORD are - the two exceptions. */ - if ((1 << cc) & ((1 << RECC_SPACE) | (1 << RECC_WORD))) - bufp->used_syntax = 1; - - /* Repeat the loop. */ - continue; - } - else - { - /* Go back to right after the "[:". */ - p = class_beg; - SET_LIST_BIT ('['); - - /* Because the `:' may start the range, we - can't simply set bit and repeat the loop. - Instead, just set it to C and handle below. */ - c = ':'; - } - } - if (p < pend && p[0] == '-' && p[1] != ']') { @@ -4645,28 +4665,8 @@ execute_charset (const_re_char **pp, unsigned c, unsigned corig, bool unibyte) re_wchar_t range_start, range_end; /* Sort tests by the most commonly used classes with some adjustment to which - tests are easiest to perform. Frequencies of character class names used in - Emacs sources as of 2016-07-15: - - $ find \( -name \*.c -o -name \*.el \) -exec grep -h '\[:[a-z]*:]' {} + | - sed 's/]/]\n/g' |grep -o '\[:[a-z]*:]' |sort |uniq -c |sort -nr - 213 [:alnum:] - 104 [:alpha:] - 62 [:space:] - 39 [:digit:] - 36 [:blank:] - 26 [:upper:] - 24 [:word:] - 21 [:lower:] - 10 [:punct:] - 10 [:ascii:] - 9 [:xdigit:] - 4 [:nonascii:] - 4 [:graph:] - 2 [:print:] - 2 [:cntrl:] - 1 [:ff:] - */ + tests are easiest to perform. Take a look at comment in re_wctype_parse + for table with frequencies of character class names. */ if ((class_bits & BIT_MULTIBYTE) || (class_bits & BIT_ALNUM && ISALNUM (c)) || diff --git a/src/regex.h b/src/regex.h index 817167a..01b659a 100644 --- a/src/regex.h +++ b/src/regex.h @@ -585,25 +585,13 @@ extern void regfree (regex_t *__preg); /* Solaris 2.5 has a bug: must be included before . */ # include # include -#endif -#if WIDE_CHAR_SUPPORT -/* The GNU C library provides support for user-defined character classes - and the functions from ISO C amendment 1. */ -# ifdef CHARCLASS_NAME_MAX -# define CHAR_CLASS_MAX_LENGTH CHARCLASS_NAME_MAX -# else -/* This shouldn't happen but some implementation might still have this - problem. Use a reasonable default value. */ -# define CHAR_CLASS_MAX_LENGTH 256 -# endif typedef wctype_t re_wctype_t; typedef wchar_t re_wchar_t; # define re_wctype wctype # define re_iswctype iswctype # define re_wctype_to_bit(cc) 0 #else -# define CHAR_CLASS_MAX_LENGTH 9 /* Namely, `multibyte'. */ # ifndef emacs # define btowc(c) c # endif @@ -621,7 +609,7 @@ typedef enum { RECC_ERROR = 0, } re_wctype_t; extern char re_iswctype (int ch, re_wctype_t cc); -extern re_wctype_t re_wctype (const unsigned char* str); +extern re_wctype_t re_wctype_parse (const unsigned char **strp, unsigned limit); typedef int re_wchar_t; diff --git a/src/syntax.c b/src/syntax.c index f8d987b..667de40 100644 --- a/src/syntax.c +++ b/src/syntax.c @@ -1691,44 +1691,22 @@ skip_chars (bool forwardp, Lisp_Object string, Lisp_Object lim, /* At first setup fastmap. */ while (i_byte < size_byte) { - c = str[i_byte++]; - - if (handle_iso_classes && c == '[' - && i_byte < size_byte - && str[i_byte] == ':') + if (handle_iso_classes) { - const unsigned char *class_beg = str + i_byte + 1; - const unsigned char *class_end = class_beg; - const unsigned char *class_limit = str + size_byte - 2; - /* Leave room for the null. */ - unsigned char class_name[CHAR_CLASS_MAX_LENGTH + 1]; - re_wctype_t cc; - - if (class_limit - class_beg > CHAR_CLASS_MAX_LENGTH) - class_limit = class_beg + CHAR_CLASS_MAX_LENGTH; - - while (class_end < class_limit - && *class_end >= 'a' && *class_end <= 'z') - class_end++; - - if (class_end == class_beg - || *class_end != ':' || class_end[1] != ']') - goto not_a_class_name; - - memcpy (class_name, class_beg, class_end - class_beg); - class_name[class_end - class_beg] = 0; - - cc = re_wctype (class_name); + const unsigned char *ch = str + i_byte; + re_wctype_t cc = re_wctype_parse (&ch, size_byte - i_byte); if (cc == 0) error ("Invalid ISO C character class"); - - iso_classes = Fcons (make_number (cc), iso_classes); - - i_byte = class_end + 2 - str; - continue; + if (cc != -1) + { + iso_classes = Fcons (make_number (cc), iso_classes); + i_byte = ch - str; + continue; + } } - not_a_class_name: + c = str[i_byte++]; + if (c == '\\') { if (i_byte == size_byte) @@ -1808,54 +1786,32 @@ skip_chars (bool forwardp, Lisp_Object string, Lisp_Object lim, while (i_byte < size_byte) { int leading_code = str[i_byte]; - c = STRING_CHAR_AND_LENGTH (str + i_byte, len); - i_byte += len; - if (handle_iso_classes && c == '[' - && i_byte < size_byte - && STRING_CHAR (str + i_byte) == ':') + if (handle_iso_classes) { - const unsigned char *class_beg = str + i_byte + 1; - const unsigned char *class_end = class_beg; - const unsigned char *class_limit = str + size_byte - 2; - /* Leave room for the null. */ - unsigned char class_name[CHAR_CLASS_MAX_LENGTH + 1]; - re_wctype_t cc; - - if (class_limit - class_beg > CHAR_CLASS_MAX_LENGTH) - class_limit = class_beg + CHAR_CLASS_MAX_LENGTH; - - while (class_end < class_limit - && *class_end >= 'a' && *class_end <= 'z') - class_end++; - - if (class_end == class_beg - || *class_end != ':' || class_end[1] != ']') - goto not_a_class_name_multibyte; - - memcpy (class_name, class_beg, class_end - class_beg); - class_name[class_end - class_beg] = 0; - - cc = re_wctype (class_name); + const unsigned char *ch = str + i_byte; + re_wctype_t cc = re_wctype_parse (&ch, size_byte - i_byte); if (cc == 0) error ("Invalid ISO C character class"); - - iso_classes = Fcons (make_number (cc), iso_classes); - - i_byte = class_end + 2 - str; - continue; + if (cc != -1) + { + iso_classes = Fcons (make_number (cc), iso_classes); + i_byte = ch - str; + continue; + } } - not_a_class_name_multibyte: - if (c == '\\') + if (leading_code== '\\') { - if (i_byte == size_byte) + if (++i_byte == size_byte) break; leading_code = str[i_byte]; - c = STRING_CHAR_AND_LENGTH (str + i_byte, len); - i_byte += len; } + c = STRING_CHAR_AND_LENGTH (str + i_byte, len); + i_byte += len; + + /* Treat `-' as range character only if another character follows. */ if (i_byte + 1 < size_byte -- 2.8.0.rc3.226.g39d4020 From debbugs-submit-bounces@debbugs.gnu.org Wed Jul 27 12:51:12 2016 Received: (at 24071) by debbugs.gnu.org; 27 Jul 2016 16:51:12 +0000 Received: from localhost ([127.0.0.1]:39395 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bSS34-0007sc-Jf for submit@debbugs.gnu.org; Wed, 27 Jul 2016 12:51:12 -0400 Received: from mail-wm0-f45.google.com ([74.125.82.45]:34891) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bSS2u-0007qj-BQ for 24071@debbugs.gnu.org; Wed, 27 Jul 2016 12:51:04 -0400 Received: by mail-wm0-f45.google.com with SMTP id f65so220701811wmi.0 for <24071@debbugs.gnu.org>; Wed, 27 Jul 2016 09:51:00 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=sender:from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=0hjq6ProBu8qsYQ8sPvkU66iD/dGFRF3KvWu9bOx1Zg=; b=QYh4xoKWnUyvTKGuI6IlnbWbujnZqIKOWvl6jzkvGBN1/3z4VauAgFNQPDxK5//Sdw a4FN1iXZeOzS17xGjB+FnNywaK28YmH8x3E4mYL99mCz2hq7OoV5j2J6ndgIRkJ9rvqM 853x+SKkEOXRK1KmfTcMvuaiupjrNWONjEBvRU2rKg5EZyX7i53P9nrCg5Hti4OkD9pe agFfr6SUuCY7+MSkqNcEKy+ZUH1Xfb66KmuUenvujbiugcufU5bbP6T0b+LgbE2rebMf ZMxxmFzl2pHxAt9JFW+9UMA95WfG5P5ynkglB8scQKAMGc0Uthus9pYpb8iCdwQ7oDe+ NMWg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:sender:from:to:cc:subject:date:message-id :in-reply-to:references:mime-version:content-transfer-encoding; bh=0hjq6ProBu8qsYQ8sPvkU66iD/dGFRF3KvWu9bOx1Zg=; b=UDy0+98zWclkuKohwOTzjyTbkQxxAx5ZKzNGqmmJIqRIpipVqpS+GCGxb3gN9hMPNg behLzXelDb19iVrQIGCzhHiAVG5SHSG4R/W72HXFQTNaOpUpioA+JpgMH2n0sncI3eFT reSaZMTlJg2iv+UscZbUxk0+T2y0HeiXkHqXaFU4d3idqEEU8zDMikRg+kHetZC7zv49 wN5XZK2SLcHCPrMKMaVAK+xVmvO1jCQW3tYVUstiAaJFONN/1xtE+UwYmu0YzmTDwmHp VaywY0JjlIvxiyi+k9lwkhylXY//NxFuGEJzSTHiQuj00v3wUPCMxeMHYWKTEl839kjy q60w== X-Gm-Message-State: ALyK8tLgFfyz13y57JAdUxLfvU/RrNWOLmrb7z7O+8plmYM4UkUb9O2kd55NZADdsMb+oiLV X-Received: by 10.28.101.5 with SMTP id z5mr55875894wmb.9.1469638251821; Wed, 27 Jul 2016 09:50:51 -0700 (PDT) Received: from mpn.zrh.corp.google.com ([2620:0:105f:301:2815:7585:2e57:6c3b]) by smtp.gmail.com with ESMTPSA id a3sm7231338wjm.30.2016.07.27.09.50.49 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 27 Jul 2016 09:50:50 -0700 (PDT) Received: by mpn.zrh.corp.google.com (Postfix, from userid 126942) id 1D1641E0268; Wed, 27 Jul 2016 18:50:48 +0200 (CEST) From: Michal Nazarewicz To: 24071@debbugs.gnu.org Subject: [PATCH 1/7] New regex tests imported from glibc 2.21 Date: Wed, 27 Jul 2016 18:50:41 +0200 Message-Id: <1469638247-20131-1-git-send-email-mina86@mina86.com> X-Mailer: git-send-email 2.8.0.rc3.226.g39d4020 In-Reply-To: <83d1m0tq25.fsf@gnu.org> References: <83d1m0tq25.fsf@gnu.org> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Spam-Score: -1.3 (-) X-Debbugs-Envelope-To: 24071 Cc: Dima Kogan , Eli Zaretskii X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -2.0 (--) From: Dima Kogan * test/src/regex-resources/BOOST.tests: * test/src/regex-resources/PCRE.tests: * test/src/regex-resources/PTESTS: * test/src/regex-resources/TESTS: New test data files [mina86@mina86.com: Moved files from test/src/regex/* to test/src/*.] --- test/src/regex-resources/BOOST.tests | 829 ++++++++++++ test/src/regex-resources/PCRE.tests | 2386 ++++++++++++++++++++++++++++++++++ test/src/regex-resources/PTESTS | 341 +++++ test/src/regex-resources/TESTS | 167 +++ 4 files changed, 3723 insertions(+) create mode 100644 test/src/regex-resources/BOOST.tests create mode 100644 test/src/regex-resources/PCRE.tests create mode 100644 test/src/regex-resources/PTESTS create mode 100644 test/src/regex-resources/TESTS diff --git a/test/src/regex-resources/BOOST.tests b/test/src/regex-resources/BOOST.tests new file mode 100644 index 0000000..98fd3b6 --- /dev/null +++ b/test/src/regex-resources/BOOST.tests @@ -0,0 +1,829 @@ +; +; +; this file contains a script of tests to run through regress.exe +; +; comments start with a semicolon and proceed to the end of the line +; +; changes to regular expression compile flags start with a "-" as the first +; non-whitespace character and consist of a list of the printable names +; of the flags, for example "match_default" +; +; Other lines contain a test to perform using the current flag status +; the first token contains the expression to compile, the second the string +; to match it against. If the second string is "!" then the expression should +; not compile, that is the first string is an invalid regular expression. +; This is then followed by a list of integers that specify what should match, +; each pair represents the starting and ending positions of a subexpression +; starting with the zeroth subexpression (the whole match). +; A value of -1 indicates that the subexpression should not take part in the +; match at all, if the first value is -1 then no part of the expression should +; match the string. +; +; Tests taken from BOOST testsuite and adapted to glibc regex. +; +; Boost Software License - Version 1.0 - August 17th, 2003 +; +; Permission is hereby granted, free of charge, to any person or organization +; obtaining a copy of the software and accompanying documentation covered by +; this license (the "Software") to use, reproduce, display, distribute, +; execute, and transmit the Software, and to prepare derivative works of the +; Software, and to permit third-parties to whom the Software is furnished to +; do so, all subject to the following: +; +; The copyright notices in the Software and this entire statement, including +; the above license grant, this restriction and the following disclaimer, +; must be included in all copies of the Software, in whole or in part, and +; all derivative works of the Software, unless such copies or derivative +; works are solely in the form of machine-executable object code generated by +; a source language processor. +; +; THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +; IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +; FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. IN NO EVENT +; SHALL THE COPYRIGHT HOLDERS OR ANYONE DISTRIBUTING THE SOFTWARE BE LIABLE +; FOR ANY DAMAGES OR OTHER LIABILITY, WHETHER IN CONTRACT, TORT OR OTHERWISE, +; ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER +; DEALINGS IN THE SOFTWARE. +; + +- match_default normal REG_EXTENDED + +; +; try some really simple literals: +a a 0 1 +Z Z 0 1 +Z aaa -1 -1 +Z xxxxZZxxx 4 5 + +; and some simple brackets: +(a) zzzaazz 3 4 3 4 +() zzz 0 0 0 0 +() "" 0 0 0 0 +( ! +) ) 0 1 +(aa ! +aa) baa)b 1 4 +a b -1 -1 +\(\) () 0 2 +\(a\) (a) 0 3 +\() () 0 2 +(\) ! +p(a)rameter ABCparameterXYZ 3 12 4 5 +[pq](a)rameter ABCparameterXYZ 3 12 4 5 + +; now try escaped brackets: +- match_default bk_parens REG_BASIC +\(a\) zzzaazz 3 4 3 4 +\(\) zzz 0 0 0 0 +\(\) "" 0 0 0 0 +\( ! +\) ! +\(aa ! +aa\) ! +() () 0 2 +(a) (a) 0 3 +(\) ! +\() ! + +; now move on to "." wildcards +- match_default normal REG_EXTENDED REG_STARTEND +. a 0 1 +. \n 0 1 +. \r 0 1 +. \0 0 1 + +; +; now move on to the repetion ops, +; starting with operator * +- match_default normal REG_EXTENDED +a* b 0 0 +ab* a 0 1 +ab* ab 0 2 +ab* sssabbbbbbsss 3 10 +ab*c* a 0 1 +ab*c* abbb 0 4 +ab*c* accc 0 4 +ab*c* abbcc 0 5 +*a ! +\<* ! +\>* ! +\n* \n\n 0 2 +\** ** 0 2 +\* * 0 1 + +; now try operator + +ab+ a -1 -1 +ab+ ab 0 2 +ab+ sssabbbbbbsss 3 10 +ab+c+ a -1 -1 +ab+c+ abbb -1 -1 +ab+c+ accc -1 -1 +ab+c+ abbcc 0 5 ++a ! +\<+ ! +\>+ ! +\n+ \n\n 0 2 +\+ + 0 1 +\+ ++ 0 1 +\++ ++ 0 2 + +; now try operator ? +- match_default normal REG_EXTENDED +a? b 0 0 +ab? a 0 1 +ab? ab 0 2 +ab? sssabbbbbbsss 3 5 +ab?c? a 0 1 +ab?c? abbb 0 2 +ab?c? accc 0 2 +ab?c? abcc 0 3 +?a ! +\? ! +\n? \n\n 0 1 +\? ? 0 1 +\? ?? 0 1 +\?? ?? 0 1 + +; now try operator {} +- match_default normal REG_EXTENDED +a{2} a -1 -1 +a{2} aa 0 2 +a{2} aaa 0 2 +a{2,} a -1 -1 +a{2,} aa 0 2 +a{2,} aaaaa 0 5 +a{2,4} a -1 -1 +a{2,4} aa 0 2 +a{2,4} aaa 0 3 +a{2,4} aaaa 0 4 +a{2,4} aaaaa 0 4 +a{} ! +a{2 ! +a} a} 0 2 +\{\} {} 0 2 + +- match_default normal REG_BASIC +a\{2\} a -1 -1 +a\{2\} aa 0 2 +a\{2\} aaa 0 2 +a\{2,\} a -1 -1 +a\{2,\} aa 0 2 +a\{2,\} aaaaa 0 5 +a\{2,4\} a -1 -1 +a\{2,4\} aa 0 2 +a\{2,4\} aaa 0 3 +a\{2,4\} aaaa 0 4 +a\{2,4\} aaaaa 0 4 +{} {} 0 2 + +; now test the alternation operator | +- match_default normal REG_EXTENDED +a|b a 0 1 +a|b b 0 1 +a(b|c) ab 0 2 1 2 +a(b|c) ac 0 2 1 2 +a(b|c) ad -1 -1 -1 -1 +a\| a| 0 2 + +; now test the set operator [] +- match_default normal REG_EXTENDED +; try some literals first +[abc] a 0 1 +[abc] b 0 1 +[abc] c 0 1 +[abc] d -1 -1 +[^bcd] a 0 1 +[^bcd] b -1 -1 +[^bcd] d -1 -1 +[^bcd] e 0 1 +a[b]c abc 0 3 +a[ab]c abc 0 3 +a[^ab]c adc 0 3 +a[]b]c a]c 0 3 +a[[b]c a[c 0 3 +a[-b]c a-c 0 3 +a[^]b]c adc 0 3 +a[^-b]c adc 0 3 +a[b-]c a-c 0 3 +a[b ! +a[] ! + +; then some ranges +[b-e] a -1 -1 +[b-e] b 0 1 +[b-e] e 0 1 +[b-e] f -1 -1 +[^b-e] a 0 1 +[^b-e] b -1 -1 +[^b-e] e -1 -1 +[^b-e] f 0 1 +a[1-3]c a2c 0 3 +a[3-1]c ! +a[1-3-5]c ! +a[1- ! + +; and some classes +a[[:alpha:]]c abc 0 3 +a[[:unknown:]]c ! +a[[: ! +a[[:alpha ! +a[[:alpha:] ! +a[[:alpha,:] ! +a[[:]:]]b ! +a[[:-:]]b ! +a[[:alph:]] ! +a[[:alphabet:]] ! +[[:alnum:]]+ -%@a0X_- 3 6 +[[:alpha:]]+ -%@aX_0- 3 5 +[[:blank:]]+ "a \tb" 1 4 +[[:cntrl:]]+ a\n\tb 1 3 +[[:digit:]]+ a019b 1 4 +[[:graph:]]+ " a%b " 1 4 +[[:lower:]]+ AabC 1 3 +; This test fails with STLPort, disable for now as this is a corner case anyway... +;[[:print:]]+ "\na b\n" 1 4 +[[:punct:]]+ " %-&\t" 1 4 +[[:space:]]+ "a \n\t\rb" 1 5 +[[:upper:]]+ aBCd 1 3 +[[:xdigit:]]+ p0f3Cx 1 5 + +; now test flag settings: +- escape_in_lists REG_NO_POSIX_TEST +[\n] \n 0 1 +- REG_NO_POSIX_TEST + +; line anchors +- match_default normal REG_EXTENDED +^ab ab 0 2 +^ab xxabxx -1 -1 +ab$ ab 0 2 +ab$ abxx -1 -1 +- match_default match_not_bol match_not_eol normal REG_EXTENDED REG_NOTBOL REG_NOTEOL +^ab ab -1 -1 +^ab xxabxx -1 -1 +ab$ ab -1 -1 +ab$ abxx -1 -1 + +; back references +- match_default normal REG_PERL +a(b)\2c ! +a(b\1)c ! +a(b*)c\1d abbcbbd 0 7 1 3 +a(b*)c\1d abbcbd -1 -1 +a(b*)c\1d abbcbbbd -1 -1 +^(.)\1 abc -1 -1 +a([bc])\1d abcdabbd 4 8 5 6 +; strictly speaking this is at best ambiguous, at worst wrong, this is what most +; re implimentations will match though. +a(([bc])\2)*d abbccd 0 6 3 5 3 4 + +a(([bc])\2)*d abbcbd -1 -1 +a((b)*\2)*d abbbd 0 5 1 4 2 3 +; perl only: +(ab*)[ab]*\1 ababaaa 0 7 0 1 +(a)\1bcd aabcd 0 5 0 1 +(a)\1bc*d aabcd 0 5 0 1 +(a)\1bc*d aabd 0 4 0 1 +(a)\1bc*d aabcccd 0 7 0 1 +(a)\1bc*[ce]d aabcccd 0 7 0 1 +^(a)\1b(c)*cd$ aabcccd 0 7 0 1 4 5 + +; posix only: +- match_default extended REG_EXTENDED +(ab*)[ab]*\1 ababaaa 0 7 0 1 + +; +; word operators: +\w a 0 1 +\w z 0 1 +\w A 0 1 +\w Z 0 1 +\w _ 0 1 +\w } -1 -1 +\w ` -1 -1 +\w [ -1 -1 +\w @ -1 -1 +; non-word: +\W a -1 -1 +\W z -1 -1 +\W A -1 -1 +\W Z -1 -1 +\W _ -1 -1 +\W } 0 1 +\W ` 0 1 +\W [ 0 1 +\W @ 0 1 +; word start: +\ abc 0 3 +abc\> abcd -1 -1 +abc\> abc\n 0 3 +abc\> abc:: 0 3 +; word boundary: +\babcd " abcd" 2 6 +\bab cab -1 -1 +\bab "\nab" 1 3 +\btag ::tag 2 5 +abc\b abc 0 3 +abc\b abcd -1 -1 +abc\b abc\n 0 3 +abc\b abc:: 0 3 +; within word: +\B ab 1 1 +a\Bb ab 0 2 +a\B ab 0 1 +a\B a -1 -1 +a\B "a " -1 -1 + +; +; buffer operators: +\`abc abc 0 3 +\`abc \nabc -1 -1 +\`abc " abc" -1 -1 +abc\' abc 0 3 +abc\' abc\n -1 -1 +abc\' "abc " -1 -1 + +; +; now follows various complex expressions designed to try and bust the matcher: +a(((b)))c abc 0 3 1 2 1 2 1 2 +a(b|(c))d abd 0 3 1 2 -1 -1 +a(b|(c))d acd 0 3 1 2 1 2 +a(b*|c)d abbd 0 4 1 3 +; just gotta have one DFA-buster, of course +a[ab]{20} aaaaabaaaabaaaabaaaab 0 21 +; and an inline expansion in case somebody gets tricky +a[ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab] aaaaabaaaabaaaabaaaab 0 21 +; and in case somebody just slips in an NFA... +a[ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab][ab](wee|week)(knights|night) aaaaabaaaabaaaabaaaabweeknights 0 31 21 24 24 31 +; one really big one +1234567890123456789012345678901234567890123456789012345678901234567890 a1234567890123456789012345678901234567890123456789012345678901234567890b 1 71 +; fish for problems as brackets go past 8 +[ab][cd][ef][gh][ij][kl][mn] xacegikmoq 1 8 +[ab][cd][ef][gh][ij][kl][mn][op] xacegikmoq 1 9 +[ab][cd][ef][gh][ij][kl][mn][op][qr] xacegikmoqy 1 10 +[ab][cd][ef][gh][ij][kl][mn][op][q] xacegikmoqy 1 10 +; and as parenthesis go past 9: +(a)(b)(c)(d)(e)(f)(g)(h) zabcdefghi 1 9 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 +(a)(b)(c)(d)(e)(f)(g)(h)(i) zabcdefghij 1 10 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 10 +(a)(b)(c)(d)(e)(f)(g)(h)(i)(j) zabcdefghijk 1 11 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 10 10 11 +(a)(b)(c)(d)(e)(f)(g)(h)(i)(j)(k) zabcdefghijkl 1 12 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 10 10 11 11 12 +(a)d|(b)c abc 1 3 -1 -1 1 2 +_+((www)|(ftp)|(mailto)):_* "_wwwnocolon _mailto:" 12 20 13 19 -1 -1 -1 -1 13 19 + +; subtleties of matching +;a(b)?c\1d acd 0 3 -1 -1 +; POSIX is about the following test: +a(b)?c\1d acd -1 -1 -1 -1 +a(b?c)+d accd 0 4 2 3 +(wee|week)(knights|night) weeknights 0 10 0 3 3 10 +.* abc 0 3 +a(b|(c))d abd 0 3 1 2 -1 -1 +a(b|(c))d acd 0 3 1 2 1 2 +a(b*|c|e)d abbd 0 4 1 3 +a(b*|c|e)d acd 0 3 1 2 +a(b*|c|e)d ad 0 2 1 1 +a(b?)c abc 0 3 1 2 +a(b?)c ac 0 2 1 1 +a(b+)c abc 0 3 1 2 +a(b+)c abbbc 0 5 1 4 +a(b*)c ac 0 2 1 1 +(a|ab)(bc([de]+)f|cde) abcdef 0 6 0 1 1 6 3 5 +a([bc]?)c abc 0 3 1 2 +a([bc]?)c ac 0 2 1 1 +a([bc]+)c abc 0 3 1 2 +a([bc]+)c abcc 0 4 1 3 +a([bc]+)bc abcbc 0 5 1 3 +a(bb+|b)b abb 0 3 1 2 +a(bbb+|bb+|b)b abb 0 3 1 2 +a(bbb+|bb+|b)b abbb 0 4 1 3 +a(bbb+|bb+|b)bb abbb 0 4 1 2 +(.*).* abcdef 0 6 0 6 +(a*)* bc 0 0 0 0 +xyx*xz xyxxxxyxxxz 5 11 + +; do we get the right subexpression when it is used more than once? +a(b|c)*d ad 0 2 -1 -1 +a(b|c)*d abcd 0 4 2 3 +a(b|c)+d abd 0 3 1 2 +a(b|c)+d abcd 0 4 2 3 +a(b|c?)+d ad 0 2 1 1 +a(b|c){0,0}d ad 0 2 -1 -1 +a(b|c){0,1}d ad 0 2 -1 -1 +a(b|c){0,1}d abd 0 3 1 2 +a(b|c){0,2}d ad 0 2 -1 -1 +a(b|c){0,2}d abcd 0 4 2 3 +a(b|c){0,}d ad 0 2 -1 -1 +a(b|c){0,}d abcd 0 4 2 3 +a(b|c){1,1}d abd 0 3 1 2 +a(b|c){1,2}d abd 0 3 1 2 +a(b|c){1,2}d abcd 0 4 2 3 +a(b|c){1,}d abd 0 3 1 2 +a(b|c){1,}d abcd 0 4 2 3 +a(b|c){2,2}d acbd 0 4 2 3 +a(b|c){2,2}d abcd 0 4 2 3 +a(b|c){2,4}d abcd 0 4 2 3 +a(b|c){2,4}d abcbd 0 5 3 4 +a(b|c){2,4}d abcbcd 0 6 4 5 +a(b|c){2,}d abcd 0 4 2 3 +a(b|c){2,}d abcbd 0 5 3 4 +; perl only: these conflict with the POSIX test below +;a(b|c?)+d abcd 0 4 3 3 +;a(b+|((c)*))+d abd 0 3 2 2 2 2 -1 -1 +;a(b+|((c)*))+d abcd 0 4 3 3 3 3 2 3 + +; posix only: +- match_default extended REG_EXTENDED REG_STARTEND + +a(b|c?)+d abcd 0 4 2 3 +a(b|((c)*))+d abcd 0 4 2 3 2 3 2 3 +a(b+|((c)*))+d abd 0 3 1 2 -1 -1 -1 -1 +a(b+|((c)*))+d abcd 0 4 2 3 2 3 2 3 +a(b|((c)*))+d ad 0 2 1 1 1 1 -1 -1 +a(b|((c)*))*d abcd 0 4 2 3 2 3 2 3 +a(b+|((c)*))*d abd 0 3 1 2 -1 -1 -1 -1 +a(b+|((c)*))*d abcd 0 4 2 3 2 3 2 3 +a(b|((c)*))*d ad 0 2 1 1 1 1 -1 -1 + +- match_default normal REG_PERL +; try to match C++ syntax elements: +; line comment: +//[^\n]* "++i //here is a line comment\n" 4 28 +; block comment: +/\*([^*]|\*+[^*/])*\*+/ "/* here is a block comment */" 0 29 26 27 +/\*([^*]|\*+[^*/])*\*+/ "/**/" 0 4 -1 -1 +/\*([^*]|\*+[^*/])*\*+/ "/***/" 0 5 -1 -1 +/\*([^*]|\*+[^*/])*\*+/ "/****/" 0 6 -1 -1 +/\*([^*]|\*+[^*/])*\*+/ "/*****/" 0 7 -1 -1 +/\*([^*]|\*+[^*/])*\*+/ "/*****/*/" 0 7 -1 -1 +; preprossor directives: +^[[:blank:]]*#([^\n]*\\[[:space:]]+)*[^\n]* "#define some_symbol" 0 19 -1 -1 +^[[:blank:]]*#([^\n]*\\[[:space:]]+)*[^\n]* "#define some_symbol(x) #x" 0 25 -1 -1 +; perl only: +^[[:blank:]]*#([^\n]*\\[[:space:]]+)*[^\n]* "#define some_symbol(x) \\ \r\n foo();\\\r\n printf(#x);" 0 53 30 42 +; literals: +((0x[[:xdigit:]]+)|([[:digit:]]+))u?((int(8|16|32|64))|L)? 0xFF 0 4 0 4 0 4 -1 -1 -1 -1 -1 -1 -1 -1 +((0x[[:xdigit:]]+)|([[:digit:]]+))u?((int(8|16|32|64))|L)? 35 0 2 0 2 -1 -1 0 2 -1 -1 -1 -1 -1 -1 +((0x[[:xdigit:]]+)|([[:digit:]]+))u?((int(8|16|32|64))|L)? 0xFFu 0 5 0 4 0 4 -1 -1 -1 -1 -1 -1 -1 -1 +((0x[[:xdigit:]]+)|([[:digit:]]+))u?((int(8|16|32|64))|L)? 0xFFL 0 5 0 4 0 4 -1 -1 4 5 -1 -1 -1 -1 +((0x[[:xdigit:]]+)|([[:digit:]]+))u?((int(8|16|32|64))|L)? 0xFFFFFFFFFFFFFFFFuint64 0 24 0 18 0 18 -1 -1 19 24 19 24 22 24 +; strings: +'([^\\']|\\.)*' '\\x3A' 0 6 4 5 +'([^\\']|\\.)*' '\\'' 0 4 1 3 +'([^\\']|\\.)*' '\\n' 0 4 1 3 + +; finally try some case insensitive matches: +- match_default normal REG_EXTENDED REG_ICASE +; upper and lower have no meaning here so they fail, however these +; may compile with other libraries... +;[[:lower:]] ! +;[[:upper:]] ! +0123456789@abcdefghijklmnopqrstuvwxyz\[\\\]\^_`ABCDEFGHIJKLMNOPQRSTUVWXYZ\{\|\} 0123456789@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]\^_`abcdefghijklmnopqrstuvwxyz\{\|\} 0 72 + +; known and suspected bugs: +- match_default normal REG_EXTENDED +\( ( 0 1 +\) ) 0 1 +\$ $ 0 1 +\^ ^ 0 1 +\. . 0 1 +\* * 0 1 +\+ + 0 1 +\? ? 0 1 +\[ [ 0 1 +\] ] 0 1 +\| | 0 1 +\\ \\ 0 1 +# # 0 1 +\# # 0 1 +a- a- 0 2 +\- - 0 1 +\{ { 0 1 +\} } 0 1 +0 0 0 1 +1 1 0 1 +9 9 0 1 +b b 0 1 +B B 0 1 +< < 0 1 +> > 0 1 +w w 0 1 +W W 0 1 +` ` 0 1 +' ' 0 1 +\n \n 0 1 +, , 0 1 +a a 0 1 +f f 0 1 +n n 0 1 +r r 0 1 +t t 0 1 +v v 0 1 +c c 0 1 +x x 0 1 +: : 0 1 +(\.[[:alnum:]]+){2} "w.a.b " 1 5 3 5 + +- match_default normal REG_EXTENDED REG_ICASE +a A 0 1 +A a 0 1 +[abc]+ abcABC 0 6 +[ABC]+ abcABC 0 6 +[a-z]+ abcABC 0 6 +[A-Z]+ abzANZ 0 6 +[a-Z]+ abzABZ 0 6 +[A-z]+ abzABZ 0 6 +[[:lower:]]+ abyzABYZ 0 8 +[[:upper:]]+ abzABZ 0 6 +[[:alpha:]]+ abyzABYZ 0 8 +[[:alnum:]]+ 09abyzABYZ 0 10 + +; word start: +\ abc 0 3 +abc\> abcd -1 -1 +abc\> abc\n 0 3 +abc\> abc:: 0 3 + +; collating elements and rewritten set code: +- match_default normal REG_EXTENDED REG_STARTEND +;[[.zero.]] 0 0 1 +;[[.one.]] 1 0 1 +;[[.two.]] 2 0 1 +;[[.three.]] 3 0 1 +[[.a.]] baa 1 2 +;[[.right-curly-bracket.]] } 0 1 +;[[.NUL.]] \0 0 1 +[[:<:]z] ! +[a[:>:]] ! +[[=a=]] a 0 1 +;[[=right-curly-bracket=]] } 0 1 +- match_default normal REG_EXTENDED REG_STARTEND REG_ICASE +[[.A.]] A 0 1 +[[.A.]] a 0 1 +[[.A.]-b]+ AaBb 0 4 +[A-[.b.]]+ AaBb 0 4 +[[.a.]-B]+ AaBb 0 4 +[a-[.B.]]+ AaBb 0 4 +- match_default normal REG_EXTENDED REG_STARTEND +[[.a.]-c]+ abcd 0 3 +[a-[.c.]]+ abcd 0 3 +[[:alpha:]-a] ! +[a-[:alpha:]] ! + +; try mutli-character ligatures: +;[[.ae.]] ae 0 2 +;[[.ae.]] aE -1 -1 +;[[.AE.]] AE 0 2 +;[[.Ae.]] Ae 0 2 +;[[.ae.]-b] a -1 -1 +;[[.ae.]-b] b 0 1 +;[[.ae.]-b] ae 0 2 +;[a-[.ae.]] a 0 1 +;[a-[.ae.]] b -1 -1 +;[a-[.ae.]] ae 0 2 +- match_default normal REG_EXTENDED REG_STARTEND REG_ICASE +;[[.ae.]] AE 0 2 +;[[.ae.]] Ae 0 2 +;[[.AE.]] Ae 0 2 +;[[.Ae.]] aE 0 2 +;[[.AE.]-B] a -1 -1 +;[[.Ae.]-b] b 0 1 +;[[.Ae.]-b] B 0 1 +;[[.ae.]-b] AE 0 2 + +- match_default normal REG_EXTENDED REG_STARTEND REG_NO_POSIX_TEST +\s+ "ab ab" 2 5 +\S+ " abc " 2 5 + +- match_default normal REG_EXTENDED REG_STARTEND +\`abc abc 0 3 +\`abc aabc -1 -1 +abc\' abc 0 3 +abc\' abcd -1 -1 +abc\' abc\n\n -1 -1 +abc\' abc 0 3 + +; extended repeat checking to exercise new algorithms: +ab.*xy abxy_ 0 4 +ab.*xy ab_xy_ 0 5 +ab.*xy abxy 0 4 +ab.*xy ab_xy 0 5 +ab.* ab 0 2 +ab.* ab__ 0 4 + +ab.{2,5}xy ab__xy_ 0 6 +ab.{2,5}xy ab____xy_ 0 8 +ab.{2,5}xy ab_____xy_ 0 9 +ab.{2,5}xy ab__xy 0 6 +ab.{2,5}xy ab_____xy 0 9 +ab.{2,5} ab__ 0 4 +ab.{2,5} ab_______ 0 7 +ab.{2,5}xy ab______xy -1 -1 +ab.{2,5}xy ab_xy -1 -1 + +ab.*?xy abxy_ 0 4 +ab.*?xy ab_xy_ 0 5 +ab.*?xy abxy 0 4 +ab.*?xy ab_xy 0 5 +ab.*? ab 0 2 +ab.*? ab__ 0 4 + +ab.{2,5}?xy ab__xy_ 0 6 +ab.{2,5}?xy ab____xy_ 0 8 +ab.{2,5}?xy ab_____xy_ 0 9 +ab.{2,5}?xy ab__xy 0 6 +ab.{2,5}?xy ab_____xy 0 9 +ab.{2,5}? ab__ 0 4 +ab.{2,5}? ab_______ 0 7 +ab.{2,5}?xy ab______xy -1 -1 +ab.{2,5}xy ab_xy -1 -1 + +; again but with slower algorithm variant: +- match_default REG_EXTENDED +; now again for single character repeats: + +ab_*xy abxy_ 0 4 +ab_*xy ab_xy_ 0 5 +ab_*xy abxy 0 4 +ab_*xy ab_xy 0 5 +ab_* ab 0 2 +ab_* ab__ 0 4 + +ab_{2,5}xy ab__xy_ 0 6 +ab_{2,5}xy ab____xy_ 0 8 +ab_{2,5}xy ab_____xy_ 0 9 +ab_{2,5}xy ab__xy 0 6 +ab_{2,5}xy ab_____xy 0 9 +ab_{2,5} ab__ 0 4 +ab_{2,5} ab_______ 0 7 +ab_{2,5}xy ab______xy -1 -1 +ab_{2,5}xy ab_xy -1 -1 + +ab_*?xy abxy_ 0 4 +ab_*?xy ab_xy_ 0 5 +ab_*?xy abxy 0 4 +ab_*?xy ab_xy 0 5 +ab_*? ab 0 2 +ab_*? ab__ 0 4 + +ab_{2,5}?xy ab__xy_ 0 6 +ab_{2,5}?xy ab____xy_ 0 8 +ab_{2,5}?xy ab_____xy_ 0 9 +ab_{2,5}?xy ab__xy 0 6 +ab_{2,5}?xy ab_____xy 0 9 +ab_{2,5}? ab__ 0 4 +ab_{2,5}? ab_______ 0 7 +ab_{2,5}?xy ab______xy -1 -1 +ab_{2,5}xy ab_xy -1 -1 + +; and again for sets: +ab[_,;]*xy abxy_ 0 4 +ab[_,;]*xy ab_xy_ 0 5 +ab[_,;]*xy abxy 0 4 +ab[_,;]*xy ab_xy 0 5 +ab[_,;]* ab 0 2 +ab[_,;]* ab__ 0 4 + +ab[_,;]{2,5}xy ab__xy_ 0 6 +ab[_,;]{2,5}xy ab____xy_ 0 8 +ab[_,;]{2,5}xy ab_____xy_ 0 9 +ab[_,;]{2,5}xy ab__xy 0 6 +ab[_,;]{2,5}xy ab_____xy 0 9 +ab[_,;]{2,5} ab__ 0 4 +ab[_,;]{2,5} ab_______ 0 7 +ab[_,;]{2,5}xy ab______xy -1 -1 +ab[_,;]{2,5}xy ab_xy -1 -1 + +ab[_,;]*?xy abxy_ 0 4 +ab[_,;]*?xy ab_xy_ 0 5 +ab[_,;]*?xy abxy 0 4 +ab[_,;]*?xy ab_xy 0 5 +ab[_,;]*? ab 0 2 +ab[_,;]*? ab__ 0 4 + +ab[_,;]{2,5}?xy ab__xy_ 0 6 +ab[_,;]{2,5}?xy ab____xy_ 0 8 +ab[_,;]{2,5}?xy ab_____xy_ 0 9 +ab[_,;]{2,5}?xy ab__xy 0 6 +ab[_,;]{2,5}?xy ab_____xy 0 9 +ab[_,;]{2,5}? ab__ 0 4 +ab[_,;]{2,5}? ab_______ 0 7 +ab[_,;]{2,5}?xy ab______xy -1 -1 +ab[_,;]{2,5}xy ab_xy -1 -1 + +; and again for tricky sets with digraphs: +;ab[_[.ae.]]*xy abxy_ 0 4 +;ab[_[.ae.]]*xy ab_xy_ 0 5 +;ab[_[.ae.]]*xy abxy 0 4 +;ab[_[.ae.]]*xy ab_xy 0 5 +;ab[_[.ae.]]* ab 0 2 +;ab[_[.ae.]]* ab__ 0 4 + +;ab[_[.ae.]]{2,5}xy ab__xy_ 0 6 +;ab[_[.ae.]]{2,5}xy ab____xy_ 0 8 +;ab[_[.ae.]]{2,5}xy ab_____xy_ 0 9 +;ab[_[.ae.]]{2,5}xy ab__xy 0 6 +;ab[_[.ae.]]{2,5}xy ab_____xy 0 9 +;ab[_[.ae.]]{2,5} ab__ 0 4 +;ab[_[.ae.]]{2,5} ab_______ 0 7 +;ab[_[.ae.]]{2,5}xy ab______xy -1 -1 +;ab[_[.ae.]]{2,5}xy ab_xy -1 -1 + +;ab[_[.ae.]]*?xy abxy_ 0 4 +;ab[_[.ae.]]*?xy ab_xy_ 0 5 +;ab[_[.ae.]]*?xy abxy 0 4 +;ab[_[.ae.]]*?xy ab_xy 0 5 +;ab[_[.ae.]]*? ab 0 2 +;ab[_[.ae.]]*? ab__ 0 2 + +;ab[_[.ae.]]{2,5}?xy ab__xy_ 0 6 +;ab[_[.ae.]]{2,5}?xy ab____xy_ 0 8 +;ab[_[.ae.]]{2,5}?xy ab_____xy_ 0 9 +;ab[_[.ae.]]{2,5}?xy ab__xy 0 6 +;ab[_[.ae.]]{2,5}?xy ab_____xy 0 9 +;ab[_[.ae.]]{2,5}? ab__ 0 4 +;ab[_[.ae.]]{2,5}? ab_______ 0 4 +;ab[_[.ae.]]{2,5}?xy ab______xy -1 -1 +;ab[_[.ae.]]{2,5}xy ab_xy -1 -1 + +; new bugs detected in spring 2003: +- normal match_continuous REG_NO_POSIX_TEST +b abc 1 2 + +() abc 0 0 0 0 +^() abc 0 0 0 0 +^()+ abc 0 0 0 0 +^(){1} abc 0 0 0 0 +^(){2} abc 0 0 0 0 +^((){2}) abc 0 0 0 0 0 0 +() "" 0 0 0 0 +()\1 "" 0 0 0 0 +()\1 a 0 0 0 0 +a()\1b ab 0 2 1 1 +a()b\1 ab 0 2 1 1 + +; subtleties of matching with no sub-expressions marked +- normal match_nosubs REG_NO_POSIX_TEST +a(b?c)+d accd 0 4 +(wee|week)(knights|night) weeknights 0 10 +.* abc 0 3 +a(b|(c))d abd 0 3 +a(b|(c))d acd 0 3 +a(b*|c|e)d abbd 0 4 +a(b*|c|e)d acd 0 3 +a(b*|c|e)d ad 0 2 +a(b?)c abc 0 3 +a(b?)c ac 0 2 +a(b+)c abc 0 3 +a(b+)c abbbc 0 5 +a(b*)c ac 0 2 +(a|ab)(bc([de]+)f|cde) abcdef 0 6 +a([bc]?)c abc 0 3 +a([bc]?)c ac 0 2 +a([bc]+)c abc 0 3 +a([bc]+)c abcc 0 4 +a([bc]+)bc abcbc 0 5 +a(bb+|b)b abb 0 3 +a(bbb+|bb+|b)b abb 0 3 +a(bbb+|bb+|b)b abbb 0 4 +a(bbb+|bb+|b)bb abbb 0 4 +(.*).* abcdef 0 6 +(a*)* bc 0 0 + +- normal nosubs REG_NO_POSIX_TEST +a(b?c)+d accd 0 4 +(wee|week)(knights|night) weeknights 0 10 +.* abc 0 3 +a(b|(c))d abd 0 3 +a(b|(c))d acd 0 3 +a(b*|c|e)d abbd 0 4 +a(b*|c|e)d acd 0 3 +a(b*|c|e)d ad 0 2 +a(b?)c abc 0 3 +a(b?)c ac 0 2 +a(b+)c abc 0 3 +a(b+)c abbbc 0 5 +a(b*)c ac 0 2 +(a|ab)(bc([de]+)f|cde) abcdef 0 6 +a([bc]?)c abc 0 3 +a([bc]?)c ac 0 2 +a([bc]+)c abc 0 3 +a([bc]+)c abcc 0 4 +a([bc]+)bc abcbc 0 5 +a(bb+|b)b abb 0 3 +a(bbb+|bb+|b)b abb 0 3 +a(bbb+|bb+|b)b abbb 0 4 +a(bbb+|bb+|b)bb abbb 0 4 +(.*).* abcdef 0 6 +(a*)* bc 0 0 + diff --git a/test/src/regex-resources/PCRE.tests b/test/src/regex-resources/PCRE.tests new file mode 100644 index 0000000..0fb9cad --- /dev/null +++ b/test/src/regex-resources/PCRE.tests @@ -0,0 +1,2386 @@ +# PCRE version 4.4 21-August-2003 + +# Tests taken from PCRE and modified to suit glibc regex. +# +# PCRE LICENCE +# ------------ +# +# PCRE is a library of functions to support regular expressions whose syntax +# and semantics are as close as possible to those of the Perl 5 language. +# +# Written by: Philip Hazel +# +# University of Cambridge Computing Service, +# Cambridge, England. Phone: +44 1223 334714. +# +# Copyright (c) 1997-2003 University of Cambridge +# +# Permission is granted to anyone to use this software for any purpose on any +# computer system, and to redistribute it freely, subject to the following +# restrictions: +# +# 1. This software is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. +# +# 2. The origin of this software must not be misrepresented, either by +# explicit claim or by omission. In practice, this means that if you use +# PCRE in software that you distribute to others, commercially or +# otherwise, you must put a sentence like this +# +# Regular expression support is provided by the PCRE library package, +# which is open source software, written by Philip Hazel, and copyright +# by the University of Cambridge, England. +# +# somewhere reasonably visible in your documentation and in any relevant +# files or online help data or similar. A reference to the ftp site for +# the source, that is, to +# +# ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/ +# +# should also be given in the documentation. However, this condition is not +# intended to apply to whole chains of software. If package A includes PCRE, +# it must acknowledge it, but if package B is software that includes package +# A, the condition is not imposed on package B (unless it uses PCRE +# independently). +# +# 3. Altered versions must be plainly marked as such, and must not be +# misrepresented as being the original software. +# +# 4. If PCRE is embedded in any software that is released under the GNU +# General Purpose Licence (GPL), or Lesser General Purpose Licence (LGPL), +# then the terms of that licence shall supersede any condition above with +# which it is incompatible. +# +# The documentation for PCRE, supplied in the "doc" directory, is distributed +# under the same terms as the software itself. +# +# End +# + +/the quick brown fox/ + the quick brown fox + 0: the quick brown fox + The quick brown FOX +No match + What do you know about the quick brown fox? + 0: the quick brown fox + What do you know about THE QUICK BROWN FOX? +No match + +/The quick brown fox/i + the quick brown fox + 0: the quick brown fox + The quick brown FOX + 0: The quick brown FOX + What do you know about the quick brown fox? + 0: the quick brown fox + What do you know about THE QUICK BROWN FOX? + 0: THE QUICK BROWN FOX + +/a*abc?xyz+pqr{3}ab{2,}xy{4,5}pq{0,6}AB{0,}zz/ + abxyzpqrrrabbxyyyypqAzz + 0: abxyzpqrrrabbxyyyypqAzz + abxyzpqrrrabbxyyyypqAzz + 0: abxyzpqrrrabbxyyyypqAzz + aabxyzpqrrrabbxyyyypqAzz + 0: aabxyzpqrrrabbxyyyypqAzz + aaabxyzpqrrrabbxyyyypqAzz + 0: aaabxyzpqrrrabbxyyyypqAzz + aaaabxyzpqrrrabbxyyyypqAzz + 0: aaaabxyzpqrrrabbxyyyypqAzz + abcxyzpqrrrabbxyyyypqAzz + 0: abcxyzpqrrrabbxyyyypqAzz + aabcxyzpqrrrabbxyyyypqAzz + 0: aabcxyzpqrrrabbxyyyypqAzz + aaabcxyzpqrrrabbxyyyypAzz + 0: aaabcxyzpqrrrabbxyyyypAzz + aaabcxyzpqrrrabbxyyyypqAzz + 0: aaabcxyzpqrrrabbxyyyypqAzz + aaabcxyzpqrrrabbxyyyypqqAzz + 0: aaabcxyzpqrrrabbxyyyypqqAzz + aaabcxyzpqrrrabbxyyyypqqqAzz + 0: aaabcxyzpqrrrabbxyyyypqqqAzz + aaabcxyzpqrrrabbxyyyypqqqqAzz + 0: aaabcxyzpqrrrabbxyyyypqqqqAzz + aaabcxyzpqrrrabbxyyyypqqqqqAzz + 0: aaabcxyzpqrrrabbxyyyypqqqqqAzz + aaabcxyzpqrrrabbxyyyypqqqqqqAzz + 0: aaabcxyzpqrrrabbxyyyypqqqqqqAzz + aaaabcxyzpqrrrabbxyyyypqAzz + 0: aaaabcxyzpqrrrabbxyyyypqAzz + abxyzzpqrrrabbxyyyypqAzz + 0: abxyzzpqrrrabbxyyyypqAzz + aabxyzzzpqrrrabbxyyyypqAzz + 0: aabxyzzzpqrrrabbxyyyypqAzz + aaabxyzzzzpqrrrabbxyyyypqAzz + 0: aaabxyzzzzpqrrrabbxyyyypqAzz + aaaabxyzzzzpqrrrabbxyyyypqAzz + 0: aaaabxyzzzzpqrrrabbxyyyypqAzz + abcxyzzpqrrrabbxyyyypqAzz + 0: abcxyzzpqrrrabbxyyyypqAzz + aabcxyzzzpqrrrabbxyyyypqAzz + 0: aabcxyzzzpqrrrabbxyyyypqAzz + aaabcxyzzzzpqrrrabbxyyyypqAzz + 0: aaabcxyzzzzpqrrrabbxyyyypqAzz + aaaabcxyzzzzpqrrrabbxyyyypqAzz + 0: aaaabcxyzzzzpqrrrabbxyyyypqAzz + aaaabcxyzzzzpqrrrabbbxyyyypqAzz + 0: aaaabcxyzzzzpqrrrabbbxyyyypqAzz + aaaabcxyzzzzpqrrrabbbxyyyyypqAzz + 0: aaaabcxyzzzzpqrrrabbbxyyyyypqAzz + aaabcxyzpqrrrabbxyyyypABzz + 0: aaabcxyzpqrrrabbxyyyypABzz + aaabcxyzpqrrrabbxyyyypABBzz + 0: aaabcxyzpqrrrabbxyyyypABBzz + >>>aaabxyzpqrrrabbxyyyypqAzz + 0: aaabxyzpqrrrabbxyyyypqAzz + >aaaabxyzpqrrrabbxyyyypqAzz + 0: aaaabxyzpqrrrabbxyyyypqAzz + >>>>abcxyzpqrrrabbxyyyypqAzz + 0: abcxyzpqrrrabbxyyyypqAzz + *** Failers +No match + abxyzpqrrabbxyyyypqAzz +No match + abxyzpqrrrrabbxyyyypqAzz +No match + abxyzpqrrrabxyyyypqAzz +No match + aaaabcxyzzzzpqrrrabbbxyyyyyypqAzz +No match + aaaabcxyzzzzpqrrrabbbxyyypqAzz +No match + aaabcxyzpqrrrabbxyyyypqqqqqqqAzz +No match + +/^(abc){1,2}zz/ + abczz + 0: abczz + 1: abc + abcabczz + 0: abcabczz + 1: abc + *** Failers +No match + zz +No match + abcabcabczz +No match + >>abczz +No match + +/^(b+|a){1,2}c/ + bc + 0: bc + 1: b + bbc + 0: bbc + 1: bb + bbbc + 0: bbbc + 1: bbb + bac + 0: bac + 1: a + bbac + 0: bbac + 1: a + aac + 0: aac + 1: a + abbbbbbbbbbbc + 0: abbbbbbbbbbbc + 1: bbbbbbbbbbb + bbbbbbbbbbbac + 0: bbbbbbbbbbbac + 1: a + *** Failers +No match + aaac +No match + abbbbbbbbbbbac +No match + +/^[]cde]/ + ]thing + 0: ] + cthing + 0: c + dthing + 0: d + ething + 0: e + *** Failers +No match + athing +No match + fthing +No match + +/^[^]cde]/ + athing + 0: a + fthing + 0: f + *** Failers + 0: * + ]thing +No match + cthing +No match + dthing +No match + ething +No match + +/^[0-9]+$/ + 0 + 0: 0 + 1 + 0: 1 + 2 + 0: 2 + 3 + 0: 3 + 4 + 0: 4 + 5 + 0: 5 + 6 + 0: 6 + 7 + 0: 7 + 8 + 0: 8 + 9 + 0: 9 + 10 + 0: 10 + 100 + 0: 100 + *** Failers +No match + abc +No match + +/^.*nter/ + enter + 0: enter + inter + 0: inter + uponter + 0: uponter + +/^xxx[0-9]+$/ + xxx0 + 0: xxx0 + xxx1234 + 0: xxx1234 + *** Failers +No match + xxx +No match + +/^.+[0-9][0-9][0-9]$/ + x123 + 0: x123 + xx123 + 0: xx123 + 123456 + 0: 123456 + *** Failers +No match + 123 +No match + x1234 + 0: x1234 + +/^([^!]+)!(.+)=apquxz\.ixr\.zzz\.ac\.uk$/ + abc!pqr=apquxz.ixr.zzz.ac.uk + 0: abc!pqr=apquxz.ixr.zzz.ac.uk + 1: abc + 2: pqr + *** Failers +No match + !pqr=apquxz.ixr.zzz.ac.uk +No match + abc!=apquxz.ixr.zzz.ac.uk +No match + abc!pqr=apquxz:ixr.zzz.ac.uk +No match + abc!pqr=apquxz.ixr.zzz.ac.ukk +No match + +/:/ + Well, we need a colon: somewhere + 0: : + *** Fail if we don't +No match + +/([0-9a-f:]+)$/i + 0abc + 0: 0abc + 1: 0abc + abc + 0: abc + 1: abc + fed + 0: fed + 1: fed + E + 0: E + 1: E + :: + 0: :: + 1: :: + 5f03:12C0::932e + 0: 5f03:12C0::932e + 1: 5f03:12C0::932e + fed def + 0: def + 1: def + Any old stuff + 0: ff + 1: ff + *** Failers +No match + 0zzz +No match + gzzz +No match + Any old rubbish +No match + +/^.*\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})$/ + .1.2.3 + 0: .1.2.3 + 1: 1 + 2: 2 + 3: 3 + A.12.123.0 + 0: A.12.123.0 + 1: 12 + 2: 123 + 3: 0 + *** Failers +No match + .1.2.3333 +No match + 1.2.3 +No match + 1234.2.3 +No match + +/^([0-9]+)\s+IN\s+SOA\s+(\S+)\s+(\S+)\s*\(\s*$/ + 1 IN SOA non-sp1 non-sp2( + 0: 1 IN SOA non-sp1 non-sp2( + 1: 1 + 2: non-sp1 + 3: non-sp2 + 1 IN SOA non-sp1 non-sp2 ( + 0: 1 IN SOA non-sp1 non-sp2 ( + 1: 1 + 2: non-sp1 + 3: non-sp2 + *** Failers +No match + 1IN SOA non-sp1 non-sp2( +No match + +/^[a-zA-Z0-9][a-zA-Z0-9-]*(\.[a-zA-Z0-9][a-zA-z0-9-]*)*\.$/ + a. + 0: a. + Z. + 0: Z. + 2. + 0: 2. + ab-c.pq-r. + 0: ab-c.pq-r. + 1: .pq-r + sxk.zzz.ac.uk. + 0: sxk.zzz.ac.uk. + 1: .uk + x-.y-. + 0: x-.y-. + 1: .y- + *** Failers +No match + -abc.peq. +No match + +/^\*\.[a-z]([a-z0-9-]*[a-z0-9]+)?(\.[a-z]([a-z0-9-]*[a-z0-9]+)?)*$/ + *.a + 0: *.a + *.b0-a + 0: *.b0-a + 1: 0-a + *.c3-b.c + 0: *.c3-b.c + 1: 3-b + 2: .c + *.c-a.b-c + 0: *.c-a.b-c + 1: -a + 2: .b-c + 3: -c + *** Failers +No match + *.0 +No match + *.a- +No match + *.a-b.c- +No match + *.c-a.0-c +No match + +/^[0-9a-f](\.[0-9a-f])*$/i + a.b.c.d + 0: a.b.c.d + 1: .d + A.B.C.D + 0: A.B.C.D + 1: .D + a.b.c.1.2.3.C + 0: a.b.c.1.2.3.C + 1: .C + +/^".*"\s*(;.*)?$/ + "1234" + 0: "1234" + "abcd" ; + 0: "abcd" ; + 1: ; + "" ; rhubarb + 0: "" ; rhubarb + 1: ; rhubarb + *** Failers +No match + "1234" : things +No match + +/^(a(b(c)))(d(e(f)))(h(i(j)))(k(l(m)))$/ + abcdefhijklm + 0: abcdefhijklm + 1: abc + 2: bc + 3: c + 4: def + 5: ef + 6: f + 7: hij + 8: ij + 9: j +10: klm +11: lm +12: m + +/^a*\w/ + z + 0: z + az + 0: az + aaaz + 0: aaaz + a + 0: a + aa + 0: aa + aaaa + 0: aaaa + a+ + 0: a + aa+ + 0: aa + +/^a+\w/ + az + 0: az + aaaz + 0: aaaz + aa + 0: aa + aaaa + 0: aaaa + aa+ + 0: aa + +/^[0-9]{8}\w{2,}/ + 1234567890 + 0: 1234567890 + 12345678ab + 0: 12345678ab + 12345678__ + 0: 12345678__ + *** Failers +No match + 1234567 +No match + +/^[aeiou0-9]{4,5}$/ + uoie + 0: uoie + 1234 + 0: 1234 + 12345 + 0: 12345 + aaaaa + 0: aaaaa + *** Failers +No match + 123456 +No match + +/\`(abc|def)=(\1){2,3}\'/ + abc=abcabc + 0: abc=abcabc + 1: abc + 2: abc + def=defdefdef + 0: def=defdefdef + 1: def + 2: def + *** Failers +No match + abc=defdef +No match + +/(cat(a(ract|tonic)|erpillar)) \1()2(3)/ + cataract cataract23 + 0: cataract cataract23 + 1: cataract + 2: aract + 3: ract + 4: + 5: 3 + catatonic catatonic23 + 0: catatonic catatonic23 + 1: catatonic + 2: atonic + 3: tonic + 4: + 5: 3 + caterpillar caterpillar23 + 0: caterpillar caterpillar23 + 1: caterpillar + 2: erpillar + 3: + 4: + 5: 3 + + +/^From +([^ ]+) +[a-zA-Z][a-zA-Z][a-zA-Z] +[a-zA-Z][a-zA-Z][a-zA-Z] +[0-9]?[0-9] +[0-9][0-9]:[0-9][0-9]/ + From abcd Mon Sep 01 12:33:02 1997 + 0: From abcd Mon Sep 01 12:33 + 1: abcd + +/^From\s+\S+\s+([a-zA-Z]{3}\s+){2}[0-9]{1,2}\s+[0-9][0-9]:[0-9][0-9]/ + From abcd Mon Sep 01 12:33:02 1997 + 0: From abcd Mon Sep 01 12:33 + 1: Sep + From abcd Mon Sep 1 12:33:02 1997 + 0: From abcd Mon Sep 1 12:33 + 1: Sep + *** Failers +No match + From abcd Sep 01 12:33:02 1997 +No match + +/^(a)\1{2,3}(.)/ + aaab + 0: aaab + 1: a + 2: b + aaaab + 0: aaaab + 1: a + 2: b + aaaaab + 0: aaaaa + 1: a + 2: a + aaaaaab + 0: aaaaa + 1: a + 2: a + +/^[ab]{1,3}(ab*|b)/ + aabbbbb + 0: aabbbbb + 1: abbbbb + +/^(cow|)\1(bell)/ + cowcowbell + 0: cowcowbell + 1: cow + 2: bell + bell + 0: bell + 1: + 2: bell + *** Failers +No match + cowbell +No match + +/^(a|)\1+b/ + aab + 0: aab + 1: a + aaaab + 0: aaaab + 1: a + b + 0: b + 1: + *** Failers +No match + ab +No match + +/^(a|)\1{2}b/ + aaab + 0: aaab + 1: a + b + 0: b + 1: + *** Failers +No match + ab +No match + aab +No match + aaaab +No match + +/^(a|)\1{2,3}b/ + aaab + 0: aaab + 1: a + aaaab + 0: aaaab + 1: a + b + 0: b + 1: + *** Failers +No match + ab +No match + aab +No match + aaaaab +No match + +/ab{1,3}bc/ + abbbbc + 0: abbbbc + abbbc + 0: abbbc + abbc + 0: abbc + *** Failers +No match + abc +No match + abbbbbc +No match + +/([^.]*)\.([^:]*):[T ]+(.*)/ + track1.title:TBlah blah blah + 0: track1.title:TBlah blah blah + 1: track1 + 2: title + 3: Blah blah blah + +/([^.]*)\.([^:]*):[T ]+(.*)/i + track1.title:TBlah blah blah + 0: track1.title:TBlah blah blah + 1: track1 + 2: title + 3: Blah blah blah + +/([^.]*)\.([^:]*):[t ]+(.*)/i + track1.title:TBlah blah blah + 0: track1.title:TBlah blah blah + 1: track1 + 2: title + 3: Blah blah blah + +/^abc$/ + abc + 0: abc + *** Failers +No match + +/[-az]+/ + az- + 0: az- + *** Failers + 0: a + b +No match + +/[az-]+/ + za- + 0: za- + *** Failers + 0: a + b +No match + +/[a-z]+/ + abcdxyz + 0: abcdxyz + +/[0-9-]+/ + 12-34 + 0: 12-34 + *** Failers +No match + aaa +No match + +/(abc)\1/i + abcabc + 0: abcabc + 1: abc + ABCabc + 0: ABCabc + 1: ABC + abcABC + 0: abcABC + 1: abc + +/a{0}bc/ + bc + 0: bc + +/^([^a])([^b])([^c]*)([^d]{3,4})/ + baNOTccccd + 0: baNOTcccc + 1: b + 2: a + 3: NOT + 4: cccc + baNOTcccd + 0: baNOTccc + 1: b + 2: a + 3: NOT + 4: ccc + baNOTccd + 0: baNOTcc + 1: b + 2: a + 3: NO + 4: Tcc + bacccd + 0: baccc + 1: b + 2: a + 3: + 4: ccc + *** Failers + 0: *** Failers + 1: * + 2: * + 3: * Fail + 4: ers + anything +No match + baccd +No match + +/[^a]/ + Abc + 0: A + +/[^a]/i + Abc + 0: b + +/[^a]+/ + AAAaAbc + 0: AAA + +/[^a]+/i + AAAaAbc + 0: bc + +/[^k]$/ + abc + 0: c + *** Failers + 0: s + abk +No match + +/[^k]{2,3}$/ + abc + 0: abc + kbc + 0: bc + kabc + 0: abc + *** Failers + 0: ers + abk +No match + akb +No match + akk +No match + +/^[0-9]{8,}@.+[^k]$/ + 12345678@a.b.c.d + 0: 12345678@a.b.c.d + 123456789@x.y.z + 0: 123456789@x.y.z + *** Failers +No match + 12345678@x.y.uk +No match + 1234567@a.b.c.d +No match + +/(a)\1{8,}/ + aaaaaaaaa + 0: aaaaaaaaa + 1: a + aaaaaaaaaa + 0: aaaaaaaaaa + 1: a + *** Failers +No match + aaaaaaa +No match + +/[^a]/ + aaaabcd + 0: b + aaAabcd + 0: A + +/[^a]/i + aaaabcd + 0: b + aaAabcd + 0: b + +/[^az]/ + aaaabcd + 0: b + aaAabcd + 0: A + +/[^az]/i + aaaabcd + 0: b + aaAabcd + 0: b + +/P[^*]TAIRE[^*]{1,6}LL/ + xxxxxxxxxxxPSTAIREISLLxxxxxxxxx + 0: PSTAIREISLL + +/P[^*]TAIRE[^*]{1,}LL/ + xxxxxxxxxxxPSTAIREISLLxxxxxxxxx + 0: PSTAIREISLL + +/(\.[0-9][0-9][1-9]?)[0-9]+/ + 1.230003938 + 0: .230003938 + 1: .23 + 1.875000282 + 0: .875000282 + 1: .875 + 1.235 + 0: .235 + 1: .23 + +/\b(foo)\s+(\w+)/i + Food is on the foo table + 0: foo table + 1: foo + 2: table + +/foo(.*)bar/ + The food is under the bar in the barn. + 0: food is under the bar in the bar + 1: d is under the bar in the + +/(.*)([0-9]*)/ + I have 2 numbers: 53147 + 0: I have 2 numbers: 53147 + 1: I have 2 numbers: 53147 + 2: + +/(.*)([0-9]+)/ + I have 2 numbers: 53147 + 0: I have 2 numbers: 53147 + 1: I have 2 numbers: 5314 + 2: 7 + +/(.*)([0-9]+)$/ + I have 2 numbers: 53147 + 0: I have 2 numbers: 53147 + 1: I have 2 numbers: 5314 + 2: 7 + +/(.*)\b([0-9]+)$/ + I have 2 numbers: 53147 + 0: I have 2 numbers: 53147 + 1: I have 2 numbers: + 2: 53147 + +/(.*[^0-9])([0-9]+)$/ + I have 2 numbers: 53147 + 0: I have 2 numbers: 53147 + 1: I have 2 numbers: + 2: 53147 + +/[[:digit:]][[:digit:]]\/[[:digit:]][[:digit:]]\/[[:digit:]][[:digit:]][[:digit:]][[:digit:]]/ + 01/01/2000 + 0: 01/01/2000 + +/^(a){0,0}/ + bcd + 0: + abc + 0: + aab + 0: + +/^(a){0,1}/ + bcd + 0: + abc + 0: a + 1: a + aab + 0: a + 1: a + +/^(a){0,2}/ + bcd + 0: + abc + 0: a + 1: a + aab + 0: aa + 1: a + +/^(a){0,3}/ + bcd + 0: + abc + 0: a + 1: a + aab + 0: aa + 1: a + aaa + 0: aaa + 1: a + +/^(a){0,}/ + bcd + 0: + abc + 0: a + 1: a + aab + 0: aa + 1: a + aaa + 0: aaa + 1: a + aaaaaaaa + 0: aaaaaaaa + 1: a + +/^(a){1,1}/ + bcd +No match + abc + 0: a + 1: a + aab + 0: a + 1: a + +/^(a){1,2}/ + bcd +No match + abc + 0: a + 1: a + aab + 0: aa + 1: a + +/^(a){1,3}/ + bcd +No match + abc + 0: a + 1: a + aab + 0: aa + 1: a + aaa + 0: aaa + 1: a + +/^(a){1,}/ + bcd +No match + abc + 0: a + 1: a + aab + 0: aa + 1: a + aaa + 0: aaa + 1: a + aaaaaaaa + 0: aaaaaaaa + 1: a + +/^[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]/ + 123456654321 + 0: 123456654321 + +/^[[:digit:]][[:digit:]][[:digit:]][[:digit:]][[:digit:]][[:digit:]][[:digit:]][[:digit:]][[:digit:]][[:digit:]][[:digit:]][[:digit:]]/ + 123456654321 + 0: 123456654321 + +/^[abc]{12}/ + abcabcabcabc + 0: abcabcabcabc + +/^[a-c]{12}/ + abcabcabcabc + 0: abcabcabcabc + +/^(a|b|c){12}/ + abcabcabcabc + 0: abcabcabcabc + 1: c + +/^[abcdefghijklmnopqrstuvwxy0123456789]/ + n + 0: n + *** Failers +No match + z +No match + +/abcde{0,0}/ + abcd + 0: abcd + *** Failers +No match + abce +No match + +/ab[cd]{0,0}e/ + abe + 0: abe + *** Failers +No match + abcde +No match + +/ab(c){0,0}d/ + abd + 0: abd + *** Failers +No match + abcd +No match + +/a(b*)/ + a + 0: a + 1: + ab + 0: ab + 1: b + abbbb + 0: abbbb + 1: bbbb + *** Failers + 0: a + 1: + bbbbb +No match + +/ab[0-9]{0}e/ + abe + 0: abe + *** Failers +No match + ab1e +No match + +/(A|B)*CD/ + CD + 0: CD + +/(AB)*\1/ + ABABAB + 0: ABABAB + 1: AB + +/([0-9]+)(\w)/ + 12345a + 0: 12345a + 1: 12345 + 2: a + 12345+ + 0: 12345 + 1: 1234 + 2: 5 + +/(abc|)+/ + abc + 0: abc + 1: abc + abcabc + 0: abcabc + 1: abc + abcabcabc + 0: abcabcabc + 1: abc + xyz + 0: + 1: + +/([a]*)*/ + a + 0: a + 1: a + aaaaa + 0: aaaaa + 1: aaaaa + +/([ab]*)*/ + a + 0: a + 1: a + b + 0: b + 1: b + ababab + 0: ababab + 1: ababab + aaaabcde + 0: aaaab + 1: aaaab + bbbb + 0: bbbb + 1: bbbb + +/([^a]*)*/ + b + 0: b + 1: b + bbbb + 0: bbbb + 1: bbbb + aaa + 0: + +/([^ab]*)*/ + cccc + 0: cccc + 1: cccc + abab + 0: + +/abc/ + abc + 0: abc + xabcy + 0: abc + ababc + 0: abc + *** Failers +No match + xbc +No match + axc +No match + abx +No match + +/ab*c/ + abc + 0: abc + +/ab*bc/ + abc + 0: abc + abbc + 0: abbc + abbbbc + 0: abbbbc + +/.{1}/ + abbbbc + 0: a + +/.{3,4}/ + abbbbc + 0: abbb + +/ab{0,}bc/ + abbbbc + 0: abbbbc + +/ab+bc/ + abbc + 0: abbc + *** Failers +No match + abc +No match + abq +No match + +/ab+bc/ + abbbbc + 0: abbbbc + +/ab{1,}bc/ + abbbbc + 0: abbbbc + +/ab{1,3}bc/ + abbbbc + 0: abbbbc + +/ab{3,4}bc/ + abbbbc + 0: abbbbc + +/ab{4,5}bc/ + *** Failers +No match + abq +No match + abbbbc +No match + +/ab?bc/ + abbc + 0: abbc + abc + 0: abc + +/ab{0,1}bc/ + abc + 0: abc + +/ab?c/ + abc + 0: abc + +/ab{0,1}c/ + abc + 0: abc + +/^abc$/ + abc + 0: abc + *** Failers +No match + abbbbc +No match + abcc +No match + +/^abc/ + abcc + 0: abc + +/abc$/ + aabc + 0: abc + *** Failers +No match + aabc + 0: abc + aabcd +No match + +/^/ + abc + 0: + +/$/ + abc + 0: + +/a.c/ + abc + 0: abc + axc + 0: axc + +/a.*c/ + axyzc + 0: axyzc + +/a[bc]d/ + abd + 0: abd + *** Failers +No match + axyzd +No match + abc +No match + +/a[b-d]e/ + ace + 0: ace + +/a[b-d]/ + aac + 0: ac + +/a[-b]/ + a- + 0: a- + +/a[b-]/ + a- + 0: a- + +/a[]]b/ + a]b + 0: a]b + +/a[^bc]d/ + aed + 0: aed + *** Failers +No match + abd +No match + abd +No match + +/a[^-b]c/ + adc + 0: adc + +/a[^]b]c/ + adc + 0: adc + *** Failers +No match + a-c + 0: a-c + a]c +No match + +/\ba\b/ + a- + 0: a + -a + 0: a + -a- + 0: a + +/\by\b/ + *** Failers +No match + xy +No match + yz +No match + xyz +No match + +/\Ba\B/ + *** Failers + 0: a + a- +No match + -a +No match + -a- +No match + +/\By\b/ + xy + 0: y + +/\by\B/ + yz + 0: y + +/\By\B/ + xyz + 0: y + +/\w/ + a + 0: a + +/\W/ + - + 0: - + *** Failers + 0: * + - + 0: - + a +No match + +/a\sb/ + a b + 0: a b + +/a\Sb/ + a-b + 0: a-b + *** Failers +No match + a-b + 0: a-b + a b +No match + +/[0-9]/ + 1 + 0: 1 + +/[^0-9]/ + - + 0: - + *** Failers + 0: * + - + 0: - + 1 +No match + +/ab|cd/ + abc + 0: ab + abcd + 0: ab + +/()ef/ + def + 0: ef + 1: + +/a\(b/ + a(b + 0: a(b + +/a\(*b/ + ab + 0: ab + a((b + 0: a((b + +/((a))/ + abc + 0: a + 1: a + 2: a + +/(a)b(c)/ + abc + 0: abc + 1: a + 2: c + +/a+b+c/ + aabbabc + 0: abc + +/a{1,}b{1,}c/ + aabbabc + 0: abc + +/(a+|b)*/ + ab + 0: ab + 1: b + +/(a+|b){0,}/ + ab + 0: ab + 1: b + +/(a+|b)+/ + ab + 0: ab + 1: b + +/(a+|b){1,}/ + ab + 0: ab + 1: b + +/(a+|b)?/ + ab + 0: a + 1: a + +/(a+|b){0,1}/ + ab + 0: a + 1: a + +/[^ab]*/ + cde + 0: cde + +/abc/ + *** Failers +No match + b +No match + + +/a*/ + + +/([abc])*d/ + abbbcd + 0: abbbcd + 1: c + +/([abc])*bcd/ + abcd + 0: abcd + 1: a + +/a|b|c|d|e/ + e + 0: e + +/(a|b|c|d|e)f/ + ef + 0: ef + 1: e + +/abcd*efg/ + abcdefg + 0: abcdefg + +/ab*/ + xabyabbbz + 0: ab + xayabbbz + 0: a + +/(ab|cd)e/ + abcde + 0: cde + 1: cd + +/[abhgefdc]ij/ + hij + 0: hij + +/(abc|)ef/ + abcdef + 0: ef + 1: + +/(a|b)c*d/ + abcd + 0: bcd + 1: b + +/(ab|ab*)bc/ + abc + 0: abc + 1: a + +/a([bc]*)c*/ + abc + 0: abc + 1: bc + +/a([bc]*)(c*d)/ + abcd + 0: abcd + 1: bc + 2: d + +/a([bc]+)(c*d)/ + abcd + 0: abcd + 1: bc + 2: d + +/a([bc]*)(c+d)/ + abcd + 0: abcd + 1: b + 2: cd + +/a[bcd]*dcdcde/ + adcdcde + 0: adcdcde + +/a[bcd]+dcdcde/ + *** Failers +No match + abcde +No match + adcdcde +No match + +/(ab|a)b*c/ + abc + 0: abc + 1: ab + +/((a)(b)c)(d)/ + abcd + 0: abcd + 1: abc + 2: a + 3: b + 4: d + +/[a-zA-Z_][a-zA-Z0-9_]*/ + alpha + 0: alpha + +/^a(bc+|b[eh])g|.h$/ + abh + 0: bh + +/(bc+d$|ef*g.|h?i(j|k))/ + effgz + 0: effgz + 1: effgz + ij + 0: ij + 1: ij + 2: j + reffgz + 0: effgz + 1: effgz + *** Failers +No match + effg +No match + bcdd +No match + +/((((((((((a))))))))))/ + a + 0: a + 1: a + 2: a + 3: a + 4: a + 5: a + 6: a + 7: a + 8: a + 9: a +10: a + +/((((((((((a))))))))))\9/ + aa + 0: aa + 1: a + 2: a + 3: a + 4: a + 5: a + 6: a + 7: a + 8: a + 9: a +10: a + +/(((((((((a)))))))))/ + a + 0: a + 1: a + 2: a + 3: a + 4: a + 5: a + 6: a + 7: a + 8: a + 9: a + +/multiple words of text/ + *** Failers +No match + aa +No match + uh-uh +No match + +/multiple words/ + multiple words, yeah + 0: multiple words + +/(.*)c(.*)/ + abcde + 0: abcde + 1: ab + 2: de + +/\((.*), (.*)\)/ + (a, b) + 0: (a, b) + 1: a + 2: b + +/abcd/ + abcd + 0: abcd + +/a(bc)d/ + abcd + 0: abcd + 1: bc + +/a[-]?c/ + ac + 0: ac + +/(abc)\1/ + abcabc + 0: abcabc + 1: abc + +/([a-c]*)\1/ + abcabc + 0: abcabc + 1: abc + +/(a)|\1/ + a + 0: a + 1: a + *** Failers + 0: a + 1: a + ab + 0: a + 1: a + x +No match + +/abc/i + ABC + 0: ABC + XABCY + 0: ABC + ABABC + 0: ABC + *** Failers +No match + aaxabxbaxbbx +No match + XBC +No match + AXC +No match + ABX +No match + +/ab*c/i + ABC + 0: ABC + +/ab*bc/i + ABC + 0: ABC + ABBC + 0: ABBC + +/ab+bc/i + *** Failers +No match + ABC +No match + ABQ +No match + +/ab+bc/i + ABBBBC + 0: ABBBBC + +/^abc$/i + ABC + 0: ABC + *** Failers +No match + ABBBBC +No match + ABCC +No match + +/^abc/i + ABCC + 0: ABC + +/abc$/i + AABC + 0: ABC + +/^/i + ABC + 0: + +/$/i + ABC + 0: + +/a.c/i + ABC + 0: ABC + AXC + 0: AXC + +/a.*c/i + *** Failers +No match + AABC + 0: AABC + AXYZD +No match + +/a[bc]d/i + ABD + 0: ABD + +/a[b-d]e/i + ACE + 0: ACE + *** Failers +No match + ABC +No match + ABD +No match + +/a[b-d]/i + AAC + 0: AC + +/a[-b]/i + A- + 0: A- + +/a[b-]/i + A- + 0: A- + +/a[]]b/i + A]B + 0: A]B + +/a[^bc]d/i + AED + 0: AED + +/a[^-b]c/i + ADC + 0: ADC + *** Failers +No match + ABD +No match + A-C +No match + +/a[^]b]c/i + ADC + 0: ADC + +/ab|cd/i + ABC + 0: AB + ABCD + 0: AB + +/()ef/i + DEF + 0: EF + 1: + +/$b/i + *** Failers +No match + A]C +No match + B +No match + +/a\(b/i + A(B + 0: A(B + +/a\(*b/i + AB + 0: AB + A((B + 0: A((B + +/((a))/i + ABC + 0: A + 1: A + 2: A + +/(a)b(c)/i + ABC + 0: ABC + 1: A + 2: C + +/a+b+c/i + AABBABC + 0: ABC + +/a{1,}b{1,}c/i + AABBABC + 0: ABC + +/(a+|b)*/i + AB + 0: AB + 1: B + +/(a+|b){0,}/i + AB + 0: AB + 1: B + +/(a+|b)+/i + AB + 0: AB + 1: B + +/(a+|b){1,}/i + AB + 0: AB + 1: B + +/(a+|b)?/i + AB + 0: A + 1: A + +/(a+|b){0,1}/i + AB + 0: A + 1: A + +/[^ab]*/i + CDE + 0: CDE + +/([abc])*d/i + ABBBCD + 0: ABBBCD + 1: C + +/([abc])*bcd/i + ABCD + 0: ABCD + 1: A + +/a|b|c|d|e/i + E + 0: E + +/(a|b|c|d|e)f/i + EF + 0: EF + 1: E + +/abcd*efg/i + ABCDEFG + 0: ABCDEFG + +/ab*/i + XABYABBBZ + 0: AB + XAYABBBZ + 0: A + +/(ab|cd)e/i + ABCDE + 0: CDE + 1: CD + +/[abhgefdc]ij/i + HIJ + 0: HIJ + +/^(ab|cd)e/i + ABCDE +No match + +/(abc|)ef/i + ABCDEF + 0: EF + 1: + +/(a|b)c*d/i + ABCD + 0: BCD + 1: B + +/(ab|ab*)bc/i + ABC + 0: ABC + 1: A + +/a([bc]*)c*/i + ABC + 0: ABC + 1: BC + +/a([bc]*)(c*d)/i + ABCD + 0: ABCD + 1: BC + 2: D + +/a([bc]+)(c*d)/i + ABCD + 0: ABCD + 1: BC + 2: D + +/a([bc]*)(c+d)/i + ABCD + 0: ABCD + 1: B + 2: CD + +/a[bcd]*dcdcde/i + ADCDCDE + 0: ADCDCDE + +/a[bcd]+dcdcde/i + +/(ab|a)b*c/i + ABC + 0: ABC + 1: AB + +/((a)(b)c)(d)/i + ABCD + 0: ABCD + 1: ABC + 2: A + 3: B + 4: D + +/[a-zA-Z_][a-zA-Z0-9_]*/i + ALPHA + 0: ALPHA + +/^a(bc+|b[eh])g|.h$/i + ABH + 0: BH + +/(bc+d$|ef*g.|h?i(j|k))/i + EFFGZ + 0: EFFGZ + 1: EFFGZ + IJ + 0: IJ + 1: IJ + 2: J + REFFGZ + 0: EFFGZ + 1: EFFGZ + *** Failers +No match + ADCDCDE +No match + EFFG +No match + BCDD +No match + +/((((((((((a))))))))))/i + A + 0: A + 1: A + 2: A + 3: A + 4: A + 5: A + 6: A + 7: A + 8: A + 9: A +10: A + +/((((((((((a))))))))))\9/i + AA + 0: AA + 1: A + 2: A + 3: A + 4: A + 5: A + 6: A + 7: A + 8: A + 9: A +10: A + +/(((((((((a)))))))))/i + A + 0: A + 1: A + 2: A + 3: A + 4: A + 5: A + 6: A + 7: A + 8: A + 9: A + +/multiple words of text/i + *** Failers +No match + AA +No match + UH-UH +No match + +/multiple words/i + MULTIPLE WORDS, YEAH + 0: MULTIPLE WORDS + +/(.*)c(.*)/i + ABCDE + 0: ABCDE + 1: AB + 2: DE + +/\((.*), (.*)\)/i + (A, B) + 0: (A, B) + 1: A + 2: B + +/abcd/i + ABCD + 0: ABCD + +/a(bc)d/i + ABCD + 0: ABCD + 1: BC + +/a[-]?c/i + AC + 0: AC + +/(abc)\1/i + ABCABC + 0: ABCABC + 1: ABC + +/([a-c]*)\1/i + ABCABC + 0: ABCABC + 1: ABC + +/((foo)|(bar))*/ + foobar + 0: foobar + 1: bar + 2: foo + 3: bar + +/^(.+)?B/ + AB + 0: AB + 1: A + +/^([^a-z])|(\^)$/ + . + 0: . + 1: . + +/^[<>]&/ + <&OUT + 0: <& + +/^(){3,5}/ + abc + 0: + 1: + +/^(a+)*ax/ + aax + 0: aax + 1: a + +/^((a|b)+)*ax/ + aax + 0: aax + 1: a + 2: a + +/^((a|bc)+)*ax/ + aax + 0: aax + 1: a + 2: a + +/(a|x)*ab/ + cab + 0: ab + +/(a)*ab/ + cab + 0: ab + +/(ab)[0-9]\1/i + Ab4ab + 0: Ab4ab + 1: Ab + ab4Ab + 0: ab4Ab + 1: ab + +/foo\w*[0-9]{4}baz/ + foobar1234baz + 0: foobar1234baz + +/(\w+:)+/ + one: + 0: one: + 1: one: + +/((\w|:)+::)?(\w+)$/ + abcd + 0: abcd + 1: + 2: + 3: abcd + xy:z:::abcd + 0: xy:z:::abcd + 1: xy:z::: + 2: : + 3: abcd + +/^[^bcd]*(c+)/ + aexycd + 0: aexyc + 1: c + +/(a*)b+/ + caab + 0: aab + 1: aa + +/((\w|:)+::)?(\w+)$/ + abcd + 0: abcd + 1: + 2: + 3: abcd + xy:z:::abcd + 0: xy:z:::abcd + 1: xy:z::: + 2: : + 3: abcd + *** Failers + 0: Failers + 1: + 2: + 3: Failers + abcd: +No match + abcd: +No match + +/^[^bcd]*(c+)/ + aexycd + 0: aexyc + 1: c + +/((Z)+|A)*/ + ZABCDEFG + 0: ZA + 1: A + 2: Z + +/(Z()|A)*/ + ZABCDEFG + 0: ZA + 1: A + 2: + +/(Z(())|A)*/ + ZABCDEFG + 0: ZA + 1: A + 2: + 3: + +/(.*)[0-9]+\1/ + abc123abc + 0: abc123abc + 1: abc + abc123bc + 0: bc123bc + 1: bc + +/((.*))[0-9]+\1/ + abc123abc + 0: abc123abc + 1: abc + 2: abc + abc123bc + 0: bc123bc + 1: bc + 2: bc + +/^a{2,5}$/ + aa + 0: aa + aaa + 0: aaa + aaaa + 0: aaaa + aaaaa + 0: aaaaa + *** Failers +No match + a +No match + b +No match + aaaaab +No match + aaaaaa diff --git a/test/src/regex-resources/PTESTS b/test/src/regex-resources/PTESTS new file mode 100644 index 0000000..02b357c --- /dev/null +++ b/test/src/regex-resources/PTESTS @@ -0,0 +1,341 @@ +# 2.8.2 Regular Expression General Requirement +2¦4¦bb*¦abbbc¦ +2¦2¦bb*¦ababbbc¦ +7¦9¦A#*::¦A:A#:qA::qA#::qA##::q¦ +1¦5¦A#*::¦A##::A#::qA::qA#:q¦ +# 2.8.3.1.2 BRE Special Characters +# GA108 +2¦2¦\.¦a.c¦ +2¦2¦\[¦a[c¦ +2¦2¦\\¦a\c¦ +2¦2¦\*¦a*c¦ +2¦2¦\^¦a^c¦ +2¦2¦\$¦a$c¦ +7¦11¦X\*Y\*8¦Y*8X*8X*Y*8¦ +# GA109 +2¦2¦[.]¦a.c¦ +2¦2¦[[]¦a[c¦ +-1¦-1¦[[]¦ac¦ +2¦2¦[\]¦a\c¦ +1¦1¦[\a]¦abc¦ +2¦2¦[\.]¦a\.c¦ +2¦2¦[\.]¦a.\c¦ +2¦2¦[*]¦a*c¦ +2¦2¦[$]¦a$c¦ +2¦2¦[X*Y8]¦7*8YX¦ +# GA110 +2¦2¦*¦a*c¦ +3¦4¦*a¦*b*a*c¦ +1¦5¦**9=¦***9=9¦ +# GA111 +1¦1¦^*¦*bc¦ +-1¦-1¦^*¦a*c¦ +-1¦-1¦^*¦^*ab¦ +1¦5¦^**9=¦***9=¦ +-1¦-1¦^*5<*9¦5<9*5<*9¦ +# GA112 +2¦3¦\(*b\)¦a*b¦ +-1¦-1¦\(*b\)¦ac¦ +1¦6¦A\(**9\)=¦A***9=79¦ +# GA113(1) +1¦3¦\(^*ab\)¦*ab¦ +-1¦-1¦\(^*ab\)¦^*ab¦ +-1¦-1¦\(^*b\)¦a*b¦ +-1¦-1¦\(^*b\)¦^*b¦ +### GA113(2) GNU regex implements GA113(1) +##-1¦-1¦\(^*ab\)¦*ab¦ +##-1¦-1¦\(^*ab\)¦^*ab¦ +##1¦1¦\(^*b\)¦b¦ +##1¦3¦\(^*b\)¦^^b¦ +# GA114 +1¦3¦a^b¦a^b¦ +1¦3¦a\^b¦a^b¦ +1¦1¦^^¦^bc¦ +2¦2¦\^¦a^c¦ +1¦1¦[c^b]¦^abc¦ +1¦1¦[\^ab]¦^ab¦ +2¦2¦[\^ab]¦c\d¦ +-1¦-1¦[^^]¦^¦ +1¦3¦\(a^b\)¦a^b¦ +1¦3¦\(a\^b\)¦a^b¦ +2¦2¦\(\^\)¦a^b¦ +# GA115 +3¦3¦$$¦ab$¦ +-1¦-1¦$$¦$ab¦ +2¦3¦$c¦a$c¦ +2¦2¦[$]¦a$c¦ +1¦2¦\$a¦$a¦ +3¦3¦\$$¦ab$¦ +2¦6¦A\([34]$[34]\)B¦XA4$3BY¦ +# 2.8.3.1.3 Periods in BREs +# GA116 +1¦1¦.¦abc¦ +-1¦-1¦.ab¦abc¦ +1¦3¦ab.¦abc¦ +1¦3¦a.b¦a,b¦ +-1¦-1¦.......¦PqRs6¦ +1¦7¦.......¦PqRs6T8¦ +# 2.8.3.2 RE Bracket Expression +# GA118 +2¦2¦[abc]¦xbyz¦ +-1¦-1¦[abc]¦xyz¦ +2¦2¦[abc]¦xbay¦ +# GA119 +2¦2¦[^a]¦abc¦ +4¦4¦[^]cd]¦cd]ef¦ +2¦2¦[^abc]¦axyz¦ +-1¦-1¦[^abc]¦abc¦ +3¦3¦[^[.a.]b]¦abc¦ +3¦3¦[^[=a=]b]¦abc¦ +2¦2¦[^-ac]¦abcde-¦ +2¦2¦[^ac-]¦abcde-¦ +3¦3¦[^a-b]¦abcde¦ +3¦3¦[^a-bd-e]¦dec¦ +2¦2¦[^---]¦-ab¦ +16¦16¦[^a-zA-Z0-9]¦pqrstVWXYZ23579#¦ +# GA120(1) +3¦3¦[]a]¦cd]ef¦ +1¦1¦[]-a]¦a_b¦ +3¦3¦[][.-.]-0]¦ab0-]¦ +1¦1¦[]^a-z]¦string¦ +# GA120(2) +4¦4¦[^]cd]¦cd]ef¦ +0¦0¦[^]]*¦]]]]]]]]X¦ +0¦0¦[^]]*¦]]]]]]]]¦ +9¦9¦[^]]\{1,\}¦]]]]]]]]X¦ +-1¦-1¦[^]]\{1,\}¦]]]]]]]]¦ +# GA120(3) +3¦3¦[c[.].]d]¦ab]cd¦ +2¦8¦[a-z]*[[.].]][A-Z]*¦Abcd]DEFg¦ +# GA121 +2¦2¦[[.a.]b]¦Abc¦ +1¦1¦[[.a.]b]¦aBc¦ +-1¦-1¦[[.a.]b]¦ABc¦ +3¦3¦[^[.a.]b]¦abc¦ +3¦3¦[][.-.]-0]¦ab0-]¦ +3¦3¦[A-[.].]c]¦ab]!¦ +# GA122 +-2¦-2¦[[.ch.]]¦abc¦ +-2¦-2¦[[.ab.][.CD.][.EF.]]¦yZabCDEFQ9¦ +# GA125 +2¦2¦[[=a=]b]¦Abc¦ +1¦1¦[[=a=]b]¦aBc¦ +-1¦-1¦[[=a=]b]¦ABc¦ +3¦3¦[^[=a=]b]¦abc¦ +# GA126 +#W the expected result for [[:alnum:]]* is 2-7 which is wrong +0¦0¦[[:alnum:]]*¦ aB28gH¦ +2¦7¦[[:alnum:]][[:alnum:]]*¦ aB28gH¦ +#W the expected result for [^[:alnum:]]* is 2-5 which is wrong +0¦0¦[^[:alnum:]]*¦2 ,a¦ +2¦5¦[^[:alnum:]][^[:alnum:]]*¦2 ,a¦ +#W the expected result for [[:alpha:]]* is 2-5 which is wrong +0¦0¦[[:alpha:]]*¦ aBgH2¦ +2¦5¦[[:alpha:]][[:alpha:]]*¦ aBgH2¦ +1¦6¦[^[:alpha:]]*¦2 8,a¦ +1¦2¦[[:blank:]]*¦ ¦ +1¦8¦[^[:blank:]]*¦aB28gH, ¦ +1¦2¦[[:cntrl:]]*¦  ¦ +1¦8¦[^[:cntrl:]]*¦aB2 8gh,¦ +#W the expected result for [[:digit:]]* is 2-3 which is wrong +0¦0¦[[:digit:]]*¦a28¦ +2¦3¦[[:digit:]][[:digit:]]*¦a28¦ +1¦8¦[^[:digit:]]*¦aB gH,¦ +1¦7¦[[:graph:]]*¦aB28gH, ¦ +1¦3¦[^[:graph:]]*¦ ,¦ +1¦2¦[[:lower:]]*¦agB¦ +1¦8¦[^[:lower:]]*¦B2 8H,a¦ +1¦8¦[[:print:]]*¦aB2 8gH, ¦ +1¦2¦[^[:print:]]*¦  ¦ +#W the expected result for [[:punct:]]* is 2-2 which is wrong +0¦0¦[[:punct:]]*¦a,2¦ +2¦3¦[[:punct:]][[:punct:]]*¦a,,2¦ +1¦9¦[^[:punct:]]*¦aB2 8gH¦ +1¦3¦[[:space:]]*¦ ¦ +#W the expected result for [^[:space:]]* is 2-9 which is wrong +0¦0¦[^[:space:]]*¦ aB28gH, ¦ +2¦9¦[^[:space:]][^[:space:]]*¦ aB28gH, ¦ +#W the expected result for [[:upper:]]* is 2-3 which is wrong +0¦0¦[[:upper:]]*¦aBH2¦ +2¦3¦[[:upper:]][[:upper:]]*¦aBH2¦ +1¦8¦[^[:upper:]]*¦a2 8g,B¦ +#W the expected result for [[:xdigit:]]* is 2-5 which is wrong +0¦0¦[[:xdigit:]]*¦gaB28h¦ +2¦5¦[[:xdigit:]][[:xdigit:]]*¦gaB28h¦ +#W the expected result for [^[:xdigit:]]* is 2-7 which is wrong +2¦7¦[^[:xdigit:]][^[:xdigit:]]*¦a gH,2¦ +# GA127 +-2¦-2¦[b-a]¦abc¦ +1¦1¦[a-c]¦bbccde¦ +2¦2¦[a-b]¦-bc¦ +3¦3¦[a-z0-9]¦AB0¦ +3¦3¦[^a-b]¦abcde¦ +3¦3¦[^a-bd-e]¦dec¦ +1¦1¦[]-a]¦a_b¦ +2¦2¦[+--]¦a,b¦ +2¦2¦[--/]¦a.b¦ +2¦2¦[^---]¦-ab¦ +3¦3¦[][.-.]-0]¦ab0-]¦ +3¦3¦[A-[.].]c]¦ab]!¦ +2¦6¦bc[d-w]xy¦abchxyz¦ +# GA129 +1¦1¦[a-cd-f]¦dbccde¦ +-1¦-1¦[a-ce-f]¦dBCCdE¦ +2¦4¦b[n-zA-M]Y¦absY9Z¦ +2¦4¦b[n-zA-M]Y¦abGY9Z¦ +# GA130 +3¦3¦[-xy]¦ac-¦ +2¦4¦c[-xy]D¦ac-D+¦ +2¦2¦[--/]¦a.b¦ +2¦4¦c[--/]D¦ac.D+b¦ +2¦2¦[^-ac]¦abcde-¦ +1¦3¦a[^-ac]c¦abcde-¦ +3¦3¦[xy-]¦zc-¦ +2¦4¦c[xy-]7¦zc-786¦ +2¦2¦[^ac-]¦abcde-¦ +2¦4¦a[^ac-]c¦5abcde-¦ +2¦2¦[+--]¦a,b¦ +2¦4¦a[+--]B¦Xa,By¦ +2¦2¦[^---]¦-ab¦ +4¦6¦X[^---]Y¦X-YXaYXbY¦ +# 2.8.3.3 BREs Matching Multiple Characters +# GA131 +3¦4¦cd¦abcdeabcde¦ +1¦2¦ag*b¦abcde¦ +-1¦-1¦[a-c][e-f]¦abcdef¦ +3¦4¦[a-c][e-f]¦acbedf¦ +4¦8¦abc*XYZ¦890abXYZ#*¦ +4¦9¦abc*XYZ¦890abcXYZ#*¦ +4¦15¦abc*XYZ¦890abcccccccXYZ#*¦ +-1¦-1¦abc*XYZ¦890abc*XYZ#*¦ +# GA132 +2¦4¦\(*bc\)¦a*bc¦ +1¦2¦\(ab\)¦abcde¦ +1¦10¦\(a\(b\(c\(d\(e\(f\(g\)h\(i\(j\)\)\)\)\)\)\)\)¦abcdefghijk¦ +3¦8¦43\(2\(6\)*0\)AB¦654320ABCD¦ +3¦9¦43\(2\(7\)*0\)AB¦6543270ABCD¦ +3¦12¦43\(2\(7\)*0\)AB¦6543277770ABCD¦ +# GA133 +1¦10¦\(a\(b\(c\(d\(e\(f\(g\)h\(i\(j\)\)\)\)\)\)\)\)¦abcdefghijk¦ +-1¦-1¦\(a\(b\(c\(d\(e\(f\(g\)h\(i\(k\)\)\)\)\)\)\)\)¦abcdefghijk¦ +# GA134 +2¦4¦\(bb*\)¦abbbc¦ +2¦2¦\(bb*\)¦ababbbc¦ +1¦6¦a\(.*b\)¦ababbbc¦ +1¦2¦a\(b*\)¦ababbbc¦ +1¦20¦a\(.*b\)c¦axcaxbbbcsxbbbbbbbbc¦ +# GA135 +1¦7¦\(a\(b\(c\(d\(e\)\)\)\)\)\4¦abcdededede¦ +#W POSIX does not really specify whether a\(b\)*c\1 matches acb. +#W back references are supposed to expand to the last match, but what +#W if there never was a match as in this case? +-1¦-1¦a\(b\)*c\1¦acb¦ +1¦11¦\(a\(b\(c\(d\(e\(f\(g\)h\(i\(j\)\)\)\)\)\)\)\)\9¦abcdefghijjk¦ +# GA136 +#W These two tests have the same problem as the test in GA135. No match +#W of a subexpression, why should the back reference be usable? +#W 1 2 a\(b\)*c\1 acb +#W 4 7 a\(b\(c\(d\(f\)*\)\)\)\4¦xYzabcdePQRST +-1¦-1¦a\(b\)*c\1¦acb¦ +-1¦-1¦a\(b\(c\(d\(f\)*\)\)\)\4¦xYzabcdePQRST¦ +# GA137 +-2¦-2¦\(a\(b\)\)\3¦foo¦ +-2¦-2¦\(a\(b\)\)\(a\(b\)\)\5¦foo¦ +# GA138 +1¦2¦ag*b¦abcde¦ +1¦10¦a.*b¦abababvbabc¦ +2¦5¦b*c¦abbbcdeabbbbbbcde¦ +2¦5¦bbb*c¦abbbcdeabbbbbbcde¦ +1¦5¦a\(b\)*c\1¦abbcbbb¦ +-1¦-1¦a\(b\)*c\1¦abbdbd¦ +0¦0¦\([a-c]*\)\1¦abcacdef¦ +1¦6¦\([a-c]*\)\1¦abcabcabcd¦ +1¦2¦a^*b¦ab¦ +1¦5¦a^*b¦a^^^b¦ +# GA139 +1¦2¦a\{2\}¦aaaa¦ +1¦7¦\([a-c]*\)\{0,\}¦aabcaab¦ +1¦2¦\(a\)\1\{1,2\}¦aabc¦ +1¦3¦\(a\)\1\{1,2\}¦aaaabc¦ +#W the expression \(\(a\)\1\)\{1,2\} is ill-formed, using \2 +1¦4¦\(\(a\)\2\)\{1,2\}¦aaaabc¦ +# GA140 +1¦2¦a\{2\}¦aaaa¦ +-1¦-1¦a\{2\}¦abcd¦ +0¦0¦a\{0\}¦aaaa¦ +1¦64¦a\{64\}¦aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa¦ +# GA141 +1¦7¦\([a-c]*\)\{0,\}¦aabcaab¦ +#W the expected result for \([a-c]*\)\{2,\} is failure which isn't correct +1¦3¦\([a-c]*\)\{2,\}¦abcdefg¦ +1¦3¦\([a-c]*\)\{1,\}¦abcdefg¦ +-1¦-1¦a\{64,\}¦aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa¦ +# GA142 +1¦3¦a\{2,3\}¦aaaa¦ +-1¦-1¦a\{2,3\}¦abcd¦ +0¦0¦\([a-c]*\)\{0,0\}¦foo¦ +1¦63¦a\{1,63\}¦aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa¦ +# 2.8.3.4 BRE Precedence +# GA143 +#W There are numerous bugs in the original version. +2¦19¦\^\[[[.].]]\\(\\1\\)\*\\{1,2\\}\$¦a^[]\(\1\)*\{1,2\}$b¦ +1¦6¦[[=*=]][[=\=]][[=]=]][[===]][[...]][[:punct:]]¦*\]=.;¦ +1¦6¦[$\(*\)^]*¦$\()*^¦ +1¦1¦[\1]¦1¦ +1¦1¦[\{1,2\}]¦{¦ +#W the expected result for \(*\)*\1* is 2-2 which isn't correct +0¦0¦\(*\)*\1*¦a*b*11¦ +2¦3¦\(*\)*\1*b¦a*b*11¦ +#W the expected result for \(a\(b\{1,2\}\)\{1,2\}\) is 1-5 which isn't correct +1¦3¦\(a\(b\{1,2\}\)\{1,2\}\)¦abbab¦ +1¦5¦\(a\(b\{1,2\}\)\)\{1,2\}¦abbab¦ +1¦1¦^\(^\(^a$\)$\)$¦a¦ +1¦2¦\(a\)\1$¦aa¦ +1¦3¦ab*¦abb¦ +1¦4¦ab\{2,4\}¦abbbc¦ +# 2.8.3.5 BRE Expression Anchoring +# GA144 +1¦1¦^a¦abc¦ +-1¦-1¦^b¦abc¦ +-1¦-1¦^[a-zA-Z]¦99Nine¦ +1¦4¦^[a-zA-Z]*¦Nine99¦ +# GA145(1) +1¦2¦\(^a\)\1¦aabc¦ +-1¦-1¦\(^a\)\1¦^a^abc¦ +1¦2¦\(^^a\)¦^a¦ +1¦1¦\(^^\)¦^^¦ +1¦3¦\(^abc\)¦abcdef¦ +-1¦-1¦\(^def\)¦abcdef¦ +### GA145(2) GNU regex implements GA145(1) +##-1¦-1¦\(^a\)\1¦aabc¦ +##1¦4¦\(^a\)\1¦^a^abc¦ +##-1¦-1¦\(^^a\)¦^a¦ +##1¦2¦\(^^\)¦^^¦ +# GA146 +3¦3¦a$¦cba¦ +-1¦-1¦a$¦abc¦ +5¦7¦[a-z]*$¦99ZZxyz¦ +#W the expected result for [a-z]*$ is failure which isn't correct +10¦9¦[a-z]*$¦99ZZxyz99¦ +3¦3¦$$¦ab$¦ +-1¦-1¦$$¦$ab¦ +3¦3¦\$$¦ab$¦ +# GA147(1) +-1¦-1¦\(a$\)\1¦bcaa¦ +-1¦-1¦\(a$\)\1¦ba$¦ +-1¦-1¦\(ab$\)¦ab$¦ +1¦2¦\(ab$\)¦ab¦ +4¦6¦\(def$\)¦abcdef¦ +-1¦-1¦\(abc$\)¦abcdef¦ +### GA147(2) GNU regex implements GA147(1) +##-1¦-1¦\(a$\)\1¦bcaa¦ +##2¦5¦\(a$\)\1¦ba$a$¦ +##-1¦-1¦\(ab$\)¦ab¦ +##1¦3¦\(ab$\)¦ab$¦ +# GA148 +0¦0¦^$¦¦ +1¦3¦^abc$¦abc¦ +-1¦-1¦^xyz$¦^xyz^¦ +-1¦-1¦^234$¦^234$¦ +1¦9¦^[a-zA-Z0-9]*$¦2aA3bB9zZ¦ +-1¦-1¦^[a-z0-9]*$¦2aA3b#B9zZ¦ diff --git a/test/src/regex-resources/TESTS b/test/src/regex-resources/TESTS new file mode 100644 index 0000000..f2c9886 --- /dev/null +++ b/test/src/regex-resources/TESTS @@ -0,0 +1,167 @@ +0:(.*)*\1:xx +0:^: +0:$: +0:^$: +0:^a$:a +0:abc:abc +1:abc:xbc +1:abc:axc +1:abc:abx +0:abc:xabcy +0:abc:ababc +0:ab*c:abc +0:ab*bc:abc +0:ab*bc:abbc +0:ab*bc:abbbbc +0:ab+bc:abbc +1:ab+bc:abc +1:ab+bc:abq +0:ab+bc:abbbbc +0:ab?bc:abbc +0:ab?bc:abc +1:ab?bc:abbbbc +0:ab?c:abc +0:^abc$:abc +1:^abc$:abcc +0:^abc:abcc +1:^abc$:aabc +0:abc$:aabc +0:^:abc +0:$:abc +0:a.c:abc +0:a.c:axc +0:a.*c:axyzc +1:a.*c:axyzd +1:a[bc]d:abc +0:a[bc]d:abd +1:a[b-d]e:abd +0:a[b-d]e:ace +0:a[b-d]:aac +0:a[-b]:a- +0:a[b-]:a- +2:a[b-a]:- +2:a[]b:- +2:a[:- +0:a]:a] +0:a[]]b:a]b +0:a[^bc]d:aed +1:a[^bc]d:abd +0:a[^-b]c:adc +1:a[^-b]c:a-c +1:a[^]b]c:a]c +0:a[^]b]c:adc +0:ab|cd:abc +0:ab|cd:abcd +0:()ef:def +0:()*:- +2:*a:- +2:^*:- +2:$*:- +2:(*)b:- +1:$b:b +2:a\:- +0:a\(b:a(b +0:a\(*b:ab +0:a\(*b:a((b +1:a\x:a\x +1:abc):- +2:(abc:- +0:((a)):abc +0:(a)b(c):abc +0:a+b+c:aabbabc +0:a**:- +0:a*?:- +0:(a*)*:- +0:(a*)+:- +0:(a|)*:- +0:(a*|b)*:- +0:(a+|b)*:ab +0:(a+|b)+:ab +0:(a+|b)?:ab +0:[^ab]*:cde +0:(^)*:- +0:(ab|)*:- +2:)(:- +1:abc: +1:abc: +0:a*: +0:([abc])*d:abbbcd +0:([abc])*bcd:abcd +0:a|b|c|d|e:e +0:(a|b|c|d|e)f:ef +0:((a*|b))*:- +0:abcd*efg:abcdefg +0:ab*:xabyabbbz +0:ab*:xayabbbz +0:(ab|cd)e:abcde +0:[abhgefdc]ij:hij +1:^(ab|cd)e:abcde +0:(abc|)ef:abcdef +0:(a|b)c*d:abcd +0:(ab|ab*)bc:abc +0:a([bc]*)c*:abc +0:a([bc]*)(c*d):abcd +0:a([bc]+)(c*d):abcd +0:a([bc]*)(c+d):abcd +0:a[bcd]*dcdcde:adcdcde +1:a[bcd]+dcdcde:adcdcde +0:(ab|a)b*c:abc +0:((a)(b)c)(d):abcd +0:[A-Za-z_][A-Za-z0-9_]*:alpha +0:^a(bc+|b[eh])g|.h$:abh +0:(bc+d$|ef*g.|h?i(j|k)):effgz +0:(bc+d$|ef*g.|h?i(j|k)):ij +1:(bc+d$|ef*g.|h?i(j|k)):effg +1:(bc+d$|ef*g.|h?i(j|k)):bcdd +0:(bc+d$|ef*g.|h?i(j|k)):reffgz +1:((((((((((a)))))))))):- +0:(((((((((a))))))))):a +1:multiple words of text:uh-uh +0:multiple words:multiple words, yeah +0:(.*)c(.*):abcde +1:\((.*),:(.*)\) +1:[k]:ab +0:abcd:abcd +0:a(bc)d:abcd +0:a[-]?c:ac +0:(....).*\1:beriberi +0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Muammar Qaddafi +0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Mo'ammar Gadhafi +0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Muammar Kaddafi +0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Muammar Qadhafi +0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Moammar El Kadhafi +0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Muammar Gadafi +0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Mu'ammar al-Qadafi +0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Moamer El Kazzafi +0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Moamar al-Gaddafi +0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Mu'ammar Al Qathafi +0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Muammar Al Qathafi +0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Mo'ammar el-Gadhafi +0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Moamar El Kadhafi +0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Muammar al-Qadhafi +0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Mu'ammar al-Qadhdhafi +0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Mu'ammar Qadafi +0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Moamar Gaddafi +0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Mu'ammar Qadhdhafi +0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Muammar Khaddafi +0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Muammar al-Khaddafi +0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Mu'amar al-Kadafi +0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Muammar Ghaddafy +0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Muammar Ghadafi +0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Muammar Ghaddafi +0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Muamar Kaddafi +0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Muammar Quathafi +0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Muammar Gheddafi +0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Muamar Al-Kaddafi +0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Moammar Khadafy +0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Moammar Qudhafi +0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Mu'ammar al-Qaddafi +0:M[ou]'?am+[ae]r .*([AEae]l[- ])?[GKQ]h?[aeu]+([dtz][dhz]?)+af[iy]:Mulazim Awwal Mu'ammar Muhammad Abu Minyar al-Qadhafi +0:[[:digit:]]+:01234 +1:[[:alpha:]]+:01234 +0:^[[:digit:]]*$:01234 +1:^[[:digit:]]*$:01234a +0:^[[:alnum:]]*$:01234a +0:^[[:xdigit:]]*$:01234a +1:^[[:xdigit:]]*$:01234g +0:^[[:alnum:][:space:]]*$:Hello world -- 2.8.0.rc3.226.g39d4020 From debbugs-submit-bounces@debbugs.gnu.org Wed Jul 27 14:30:37 2016 Received: (at 24071) by debbugs.gnu.org; 27 Jul 2016 18:30:37 +0000 Received: from localhost ([127.0.0.1]:39449 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bSTbI-0001s8-Sh for submit@debbugs.gnu.org; Wed, 27 Jul 2016 14:30:37 -0400 Received: from mail-wm0-f48.google.com ([74.125.82.48]:37350) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bSTbH-0001rv-Rr for 24071@debbugs.gnu.org; Wed, 27 Jul 2016 14:30:36 -0400 Received: by mail-wm0-f48.google.com with SMTP id i5so74963899wmg.0 for <24071@debbugs.gnu.org>; Wed, 27 Jul 2016 11:30:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=sender:from:to:cc:subject:in-reply-to:organization:references :user-agent:face:date:message-id:mime-version :content-transfer-encoding; bh=2Iiv1q38+xhkbpCxCywGoG1eeqcw6c5FXPwTEYLuKfA=; b=AbCS8JuKA4u8zrAUwLl5bL2nyphkEgMBGtGWDVb0M00pSwx2mAHzKJjFO5jK3TM68v 2CvEuM14fQf2Et0WigJkwc1Yshrq5PGpa5CdoSY9V9abIWewFy+YOnSrEDgU8QNo+60u Ttcxe2Hn0cnmrABK/PJ1tl++F4grTwr4wNg5LRztQXAwoXHrcvjzh+2CUHeivlt8f7VN mKf2rIfCEVve8XbCla6sMJcvX/2cmBJ4Zdpelbx+H1wEzr2pnsHfmmSMaWKaetR9RtoG pcPs1e0Ix/aoG8QcclmlEXuMfcBypMisLzCK31AwraUU6ql7yQkKaKrDBCuG8zvVJO8d Wicw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:sender:from:to:cc:subject:in-reply-to :organization:references:user-agent:face:date:message-id :mime-version:content-transfer-encoding; bh=2Iiv1q38+xhkbpCxCywGoG1eeqcw6c5FXPwTEYLuKfA=; b=S98/F2fwMuhGF+xoYO4Yw4nQaN16mvTAMnqMGFRmKMkMClu6JtThRYeGtcUJWp0lNj 9HjjAeudFcCtDgGbZ2viPMg9mseFiZBseqiio3GCRxeWcXDMlSpq85SchI/A8nJ7xR3S H2b4vmaM2ynlZwV/pvY9ovLKQO3xpIy1N/wgXZAFtKozm/B1iqRHm7g1hll3ZGXlBYbl Ur30EfQFJ+DCwf36sMtEdd1ttIGNvz03WfFMaIQRd3cXMYPEOEwo+jiyNBaE+/7kJAEd e4seomfVkSlvnfJ7tSQC0GXdyu4uSZM8Gy2PeMXNGQneKy5zTB8qzL0uNpS1LN8uzY5X N4hA== X-Gm-Message-State: AEkoouss9wqy2GpxkHf9xw0McMqSmhIFHYC7zimhvnF/NPfmMFIASZx1NGUmYw1wbiuYjbu4 X-Received: by 10.194.168.197 with SMTP id zy5mr32994883wjb.112.1469644229910; Wed, 27 Jul 2016 11:30:29 -0700 (PDT) Received: from mpn-glaptop ([2620:0:105f:301:e136:c882:c6f0:97af]) by smtp.gmail.com with ESMTPSA id 17sm8393709wmf.6.2016.07.27.11.30.29 (version=TLS1_2 cipher=AES128-SHA bits=128/128); Wed, 27 Jul 2016 11:30:29 -0700 (PDT) From: Michal Nazarewicz To: Eli Zaretskii Subject: Re: bug#24071: [PATCH] Refactor regex character class parsing in [:name:] In-Reply-To: <83fuqvrqom.fsf@gnu.org> Organization: http://mina86.com/ References: <1469487245-11126-1-git-send-email-mina86@mina86.com> <83d1m0tq25.fsf@gnu.org> <83fuqvrqom.fsf@gnu.org> User-Agent: Notmuch/0.19+53~g2e63a09 (http://notmuchmail.org) Emacs/25.1.50.1 (x86_64-unknown-linux-gnu) Face: iVBORw0KGgoAAAANSUhEUgAAADAAAAAwBAMAAAClLOS0AAAAJFBMVEWbfGlUPDDHgE57V0jUupKjgIObY0PLrom9mH4dFRK4gmjPs41MxjOgAAACP0lEQVQ4T23Sv2vbQBQHcBk1xE6WyALX107VUEgmn6+ouUwpEQQ6uRjttkWP4CkBg2M0BQLBdPFZYPsyFYo7qEtKDQ7on+t7+nF2Ux8ahD587717OmNYrOvycHsZ+o2r051wHTHysAvGb8ygvgu4QWT0sCmkgZCIEnlV2X8BtyraazFGDuxhmKSQJMlwHQ7v5MHSNxmz78rfElwAa3ieVD9e+hBhjaPDDG6NgFo2f4wBMNIo5YmRtF0RyDgFjJjlMIWbnuM4x9MMfABGTlN4qgIQB4A1DEyA1BHWtfeWNUMwiVJKoqh97KrkOO+qzgluVYLvFCUKAX73nONeBr7BGMdM6Sg0kuep03VywLaIzRiVr+GAzKlpQIsAFnWAG2e6DT5WmWDiudZMIc6hYrMOmeMQK9WX0B+/RfjzL9DI7Y9/Iayn29Ci0r2i4f9gMimMSZLCDMalgQGU5hnUtqAN0OGvEmO1Wnl0C0wWSCEHnuHBqmygxdxA8oWXwbipoc1EoNR9DqOpBpOJrnr0criQab9ZT4LL+wI+K7GBQH30CrhUruilgP9DRTrhVWZCiAyILP+wiuLeCKGTD6r/nc8LOJcAwR6IBTUs+7CASw3QFZ0MdA2PI3zNziH4ZKVhXCRMBjeZ1DWMekKwDCASwExy+NQ86TaykaDAFHO4aP48y4fIcDM5yOG8GcTLbOyp8A8azjJI93JFd1EA6yN8sSxMQJWoABqniRZVykYgRXErzrdqExAoUrRb0xfRp8p2A/4XmfilTtkDZ4cAAAAASUVORK5CYII= X-Face: -TR8(rDTHy/(xl?SfWd1|3:TTgDIatE^t'vop%*gVg[kn$t{EpK(P"VQ=~T2#ysNmJKN$"yTRLB4YQs$4{[.]Fc1)*O]3+XO^oXM>Q#b^ix, O)Zbn)q[y06$`e3?C)`CwR9y5riE=fv^X@x$y?D:XO6L&x4f-}}I4=VRNwiA^t1-ZrVK^07.Pi/57c_du'& X-PGP: 50751FF4 X-PGP-FP: AC1F 5F5C D418 88F8 CC84 5858 2060 4012 5075 1FF4 X-Hashcash: 1:20:160727:24071@debbugs.gnu.org::WWrY8T1AEYyN/zqI:00000000000000000000000000000000000000002UEL X-Hashcash: 1:20:160727:lists@dima.secretsauce.net::cf89/xWXpCCCGqCY:000000000000000000000000000000000004dUk X-Hashcash: 1:20:160727:eliz@gnu.org::gFMM2DqOjlKxOyOz:00000I1rw Date: Wed, 27 Jul 2016 20:30:28 +0200 Message-ID: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spam-Score: -2.0 (--) X-Debbugs-Envelope-To: 24071 Cc: lists@dima.secretsauce.net, 24071@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -2.0 (--) On Wed, Jul 27 2016, Eli Zaretskii wrote: > But this later patch looks like a beginning of a series of refactoring > (is it?), hopefully followed by more features, Well=E2=80=A6 I had bunch of patches prepared but then I discovered that =E2=80=98#ifndef emacs=E2=80=99 is used by etags and =E2=80=98#ifdef _LIBC= =E2=80=99 cannot go away either so all of them are now gone=E2=80=A6 :( But I am wondering about a way to turn all of the =E2=80=98syntax & RE_FOO= =E2=80=99 checks into compile-time expressions when compiling regex.c for Emacs. --=20 Best regards =E3=83=9F=E3=83=8F=E3=82=A6 =E2=80=9C=F0=9D=93=B6=F0=9D=93=B2=F0=9D=93=B7= =F0=9D=93=AA86=E2=80=9D =E3=83=8A=E3=82=B6=E3=83=AC=E3=83=B4=E3=82=A4=E3=83= =84 =C2=ABIf at first you don=E2=80=99t succeed, give up skydiving=C2=BB From debbugs-submit-bounces@debbugs.gnu.org Fri Jul 29 01:31:52 2016 Received: (at 24071) by debbugs.gnu.org; 29 Jul 2016 05:31:53 +0000 Received: from localhost ([127.0.0.1]:49897 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bT0Om-0004bk-Oh for submit@debbugs.gnu.org; Fri, 29 Jul 2016 01:31:52 -0400 Received: from out5-smtp.messagingengine.com ([66.111.4.29]:54037) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bT0Ol-0004bb-7D for 24071@debbugs.gnu.org; Fri, 29 Jul 2016 01:31:51 -0400 Received: from compute5.internal (compute5.nyi.internal [10.202.2.45]) by mailout.nyi.internal (Postfix) with ESMTP id B60F2203D3; Fri, 29 Jul 2016 01:31:50 -0400 (EDT) Received: from frontend2 ([10.202.2.161]) by compute5.internal (MEProxy); Fri, 29 Jul 2016 01:31:50 -0400 DKIM-Signature: v=1; a=rsa-sha1; c=relaxed/relaxed; d=secretsauce.net; h=cc:content-type:date:from:in-reply-to:message-id:mime-version :references:subject:to:x-sasl-enc:x-sasl-enc; s=mesmtp; bh=wGfJp C2mcab8csdEtzLY+rEenH8=; b=KWtjsLg/IQkKZ/ZJGD7Ly2R0/RUYE7xUWPlrl QlIHxfk6ITfhevjPN/cKOcGPDBdGtXajoZe6PLvwSsd7LXosQ8ILZ+0U39yx38Ug sbtZ1Rm8WRnLOR6lu4ZaUG6ANkcnNuxVKLHQIl2Dr5M64geOT2slBO5QzqTuYoek D7h+IU= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-type:date:from:in-reply-to :message-id:mime-version:references:subject:to:x-sasl-enc :x-sasl-enc; s=smtpout; bh=wGfJpC2mcab8csdEtzLY+rEenH8=; b=rKs1w 95Q9skx8ijWGmlcOsZQFnYjOaPRL42Y6ZH1fB7VxkeqB+BcL/7pOmWx3Kaii6GZW lT/48KvjB3TEw/SvDdw/+dzHAOef9DAyHoRVBym/PgS3mVTW25ZNa4RhJpzeqek7 FkSD1zhZYM1XYK2yX2Q1hCA/UeBmo5tdNC0HV8= X-Sasl-enc: VKRso9HSkiyCq37ZViPq20DsHM0/pHwlUhWnloHyJ/aQ 1469770310 Received: from shorty.local (50-1-153-216.dsl.dynamic.fusionbroadband.com [50.1.153.216]) by mail.messagingengine.com (Postfix) with ESMTPA id 5E526CCDC5; Fri, 29 Jul 2016 01:31:50 -0400 (EDT) Received: from dima by shorty.local with local (Exim 4.87) (envelope-from ) id 1bT0Oi-0001fi-SL; Thu, 28 Jul 2016 22:31:48 -0700 References: <1469487245-11126-1-git-send-email-mina86@mina86.com> <83d1m0tq25.fsf@gnu.org> User-agent: mu4e 0.9.17; emacs 25.0.95.1 From: Dima Kogan To: Eli Zaretskii Subject: Re: bug#24071: [PATCH] Refactor regex character class parsing in [:name:] In-reply-to: <83d1m0tq25.fsf@gnu.org> Date: Thu, 28 Jul 2016 22:31:48 -0700 Message-ID: <87zip1t3gb.fsf@secretsauce.net> MIME-Version: 1.0 Content-Type: text/plain X-Spam-Score: -0.7 (/) X-Debbugs-Envelope-To: 24071 Cc: 24071@debbugs.gnu.org, Michal Nazarewicz X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.7 (/) Eli Zaretskii writes: > If we are going to make some serious refactoring in regex.c, I think > we should start with having a test suite for it. The > dima_regex_embedded_modifiers branch, created by Dima Kogan (CC'ed) in > the Emacs repository includes a suite taken from glibc. Dima, could > you perhaps merge the parts of the test suite that can already be used > to the master branch, so that we could use them to verify changes in > regex.c? Hi. I just saw this. Do you still need me to get these merged? That branch is very clean, so just cherry-picking the specific commits should do it. From debbugs-submit-bounces@debbugs.gnu.org Fri Jul 29 01:53:47 2016 Received: (at 24071) by debbugs.gnu.org; 29 Jul 2016 05:53:47 +0000 Received: from localhost ([127.0.0.1]:49905 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bT0jy-000552-NP for submit@debbugs.gnu.org; Fri, 29 Jul 2016 01:53:47 -0400 Received: from eggs.gnu.org ([208.118.235.92]:43063) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bT0jw-00054p-Op for 24071@debbugs.gnu.org; Fri, 29 Jul 2016 01:53:45 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1bT0jn-0003HS-5m for 24071@debbugs.gnu.org; Fri, 29 Jul 2016 01:53:39 -0400 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=-3.2 required=5.0 tests=BAYES_00,RP_MATCHES_RCVD autolearn=disabled version=3.3.2 Received: from fencepost.gnu.org ([2001:4830:134:3::e]:55564) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bT0jn-0003Gy-22; Fri, 29 Jul 2016 01:53:35 -0400 Received: from 84.94.185.246.cable.012.net.il ([84.94.185.246]:4773 helo=home-c4e4a596f7) by fencepost.gnu.org with esmtpsa (TLS1.2:RSA_AES_128_CBC_SHA1:128) (Exim 4.82) (envelope-from ) id 1bT0jm-0003Rj-7H; Fri, 29 Jul 2016 01:53:34 -0400 Date: Fri, 29 Jul 2016 08:53:28 +0300 Message-Id: <83invpq9bb.fsf@gnu.org> From: Eli Zaretskii To: Dima Kogan In-reply-to: <87zip1t3gb.fsf@secretsauce.net> (message from Dima Kogan on Thu, 28 Jul 2016 22:31:48 -0700) Subject: Re: bug#24071: [PATCH] Refactor regex character class parsing in [:name:] References: <1469487245-11126-1-git-send-email-mina86@mina86.com> <83d1m0tq25.fsf@gnu.org> <87zip1t3gb.fsf@secretsauce.net> X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 2001:4830:134:3::e X-Spam-Score: -6.3 (------) X-Debbugs-Envelope-To: 24071 Cc: 24071@debbugs.gnu.org, mina86@mina86.com X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: Eli Zaretskii Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -6.3 (------) > From: Dima Kogan > Cc: Michal Nazarewicz , 24071@debbugs.gnu.org > Date: Thu, 28 Jul 2016 22:31:48 -0700 > > Eli Zaretskii writes: > > > If we are going to make some serious refactoring in regex.c, I think > > we should start with having a test suite for it. The > > dima_regex_embedded_modifiers branch, created by Dima Kogan (CC'ed) in > > the Emacs repository includes a suite taken from glibc. Dima, could > > you perhaps merge the parts of the test suite that can already be used > > to the master branch, so that we could use them to verify changes in > > regex.c? > > Hi. I just saw this. Do you still need me to get these merged? No, I think Michal already did that, although the stiff is not yet on the master branch. Thanks. From debbugs-submit-bounces@debbugs.gnu.org Fri Jul 29 09:07:50 2016 Received: (at 24071) by debbugs.gnu.org; 29 Jul 2016 13:07:51 +0000 Received: from localhost ([127.0.0.1]:50139 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bT7W2-0001ul-Mo for submit@debbugs.gnu.org; Fri, 29 Jul 2016 09:07:50 -0400 Received: from mail-wm0-f54.google.com ([74.125.82.54]:36543) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bT7W1-0001uW-1T for 24071@debbugs.gnu.org; Fri, 29 Jul 2016 09:07:49 -0400 Received: by mail-wm0-f54.google.com with SMTP id q128so289435488wma.1 for <24071@debbugs.gnu.org>; Fri, 29 Jul 2016 06:07:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=sender:from:to:cc:subject:in-reply-to:organization:references :user-agent:face:date:message-id:mime-version :content-transfer-encoding; bh=iRz5/cNXi79GgrrXPdrnzxRs7gt8ZI1tfJmCEnZuUHY=; b=UmDmzzLbaFHWYef9eMX/j3cLtd6u7peVP5NLGHZug6gz9qYiInw77BaF4x88jnlQSh SHc+/R7S0euZP8lF6FFkVh97dZ11fLVH8oPsshvSew3WHsIpc0uJ8RxErpPcpm6hWERv XNelqegp1NrCNbjcO23s9LClm//hQ4eszZxYoMwSdi0F8mrn2P5kRhvsoPJjrYLAGYff 5js+TdUHwVr7RaTxXYZG/PAtOXixQkgcceDmmTt0dZmbZ5nnpMzAo5ksNaBFrLROF9cx 0lLAAB+0M424d8nitzJpG5bMZiBb3fVeddYMW5quUT71XvK/9MPJ0tAjuGKZjBuCGIC3 xDxA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:sender:from:to:cc:subject:in-reply-to :organization:references:user-agent:face:date:message-id :mime-version:content-transfer-encoding; bh=iRz5/cNXi79GgrrXPdrnzxRs7gt8ZI1tfJmCEnZuUHY=; b=OCbyA4XwXH62s5R95ns1jCMwBG/gsz5ecWM9mbzq0MaQhxvXQCIqlAwo2+rxBnst3Y lFgFpu9YLdPsEC0+Ti04Dx0WlZe1PrWhzcmcKJReqbItlky0346icujoAH1htCqahQxD L9an2mluJuRZqiFv0vYpF8LcemAxHrF9VNsnGvPBvBTkAdwRgho8cMLhD0xTrcOD09ZY NoV4BWKudtQpUpw+hDgHF1xhKkL9dm+MC0BPhCT1Dushvhi3DFRUj30Y+Dk7F3zAG5ev RRTilg9kDfh9cBHcGb3N+lQAms1BdJ2CdRIK5XbhY2UkePG16nwH9lDMXs4P1dbvGymd BYuw== X-Gm-Message-State: AEkoouvSZMJqh+Tthg/cdqVWGy5a+1pooceQhUeP61USPLNCkdtrTCd6ZR8PBx4RQf/uTMAE X-Received: by 10.28.199.4 with SMTP id x4mr1037626wmf.70.1469797663197; Fri, 29 Jul 2016 06:07:43 -0700 (PDT) Received: from mpn-glaptop ([2620:0:105f:301:95ba:30b4:55b6:db93]) by smtp.gmail.com with ESMTPSA id o4sm16510000wjd.15.2016.07.29.06.07.42 (version=TLS1_2 cipher=AES128-SHA bits=128/128); Fri, 29 Jul 2016 06:07:42 -0700 (PDT) From: Michal Nazarewicz To: Dima Kogan , Eli Zaretskii Subject: Re: bug#24071: [PATCH] Refactor regex character class parsing in [:name:] In-Reply-To: <87zip1t3gb.fsf@secretsauce.net> Organization: http://mina86.com/ References: <1469487245-11126-1-git-send-email-mina86@mina86.com> <83d1m0tq25.fsf@gnu.org> <87zip1t3gb.fsf@secretsauce.net> User-Agent: Notmuch/0.19+53~g2e63a09 (http://notmuchmail.org) Emacs/25.1.50.45 (x86_64-unknown-linux-gnu) Face: iVBORw0KGgoAAAANSUhEUgAAADAAAAAwBAMAAAClLOS0AAAAJFBMVEWbfGlUPDDHgE57V0jUupKjgIObY0PLrom9mH4dFRK4gmjPs41MxjOgAAACP0lEQVQ4T23Sv2vbQBQHcBk1xE6WyALX107VUEgmn6+ouUwpEQQ6uRjttkWP4CkBg2M0BQLBdPFZYPsyFYo7qEtKDQ7on+t7+nF2Ux8ahD587717OmNYrOvycHsZ+o2r051wHTHysAvGb8ygvgu4QWT0sCmkgZCIEnlV2X8BtyraazFGDuxhmKSQJMlwHQ7v5MHSNxmz78rfElwAa3ieVD9e+hBhjaPDDG6NgFo2f4wBMNIo5YmRtF0RyDgFjJjlMIWbnuM4x9MMfABGTlN4qgIQB4A1DEyA1BHWtfeWNUMwiVJKoqh97KrkOO+qzgluVYLvFCUKAX73nONeBr7BGMdM6Sg0kuep03VywLaIzRiVr+GAzKlpQIsAFnWAG2e6DT5WmWDiudZMIc6hYrMOmeMQK9WX0B+/RfjzL9DI7Y9/Iayn29Ci0r2i4f9gMimMSZLCDMalgQGU5hnUtqAN0OGvEmO1Wnl0C0wWSCEHnuHBqmygxdxA8oWXwbipoc1EoNR9DqOpBpOJrnr0criQab9ZT4LL+wI+K7GBQH30CrhUruilgP9DRTrhVWZCiAyILP+wiuLeCKGTD6r/nc8LOJcAwR6IBTUs+7CASw3QFZ0MdA2PI3zNziH4ZKVhXCRMBjeZ1DWMekKwDCASwExy+NQ86TaykaDAFHO4aP48y4fIcDM5yOG8GcTLbOyp8A8azjJI93JFd1EA6yN8sSxMQJWoABqniRZVykYgRXErzrdqExAoUrRb0xfRp8p2A/4XmfilTtkDZ4cAAAAASUVORK5CYII= X-Face: -TR8(rDTHy/(xl?SfWd1|3:TTgDIatE^t'vop%*gVg[kn$t{EpK(P"VQ=~T2#ysNmJKN$"yTRLB4YQs$4{[.]Fc1)*O]3+XO^oXM>Q#b^ix, O)Zbn)q[y06$`e3?C)`CwR9y5riE=fv^X@x$y?D:XO6L&x4f-}}I4=VRNwiA^t1-ZrVK^07.Pi/57c_du'& X-PGP: 50751FF4 X-PGP-FP: AC1F 5F5C D418 88F8 CC84 5858 2060 4012 5075 1FF4 X-Hashcash: 1:20:160729:24071@debbugs.gnu.org::BP4sqTq+vmXMhNxF:00000000000000000000000000000000000000000DNp X-Hashcash: 1:20:160729:eliz@gnu.org::CjVhrPswd6Rywpzn:0000019hh X-Hashcash: 1:20:160729:dima@secretsauce.net::8h04ZQFDZX33Yy8u:000000000000000000000000000000000000000001h1L Date: Fri, 29 Jul 2016 15:07:41 +0200 Message-ID: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spam-Score: -2.0 (--) X-Debbugs-Envelope-To: 24071 Cc: 24071@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -2.0 (--) On Thu, Jul 28 2016, Dima Kogan wrote: > Hi. I just saw this. Do you still need me to get these merged? That > branch is very clean, so just cherry-picking the specific commits > should do it. It=E2=80=99s all sorted out. I did not cherry pick your last two commits though: regex: support for new case-fold embedded modifiers regex: added tests for the case-fold embedded modifiers The result is at https://github.com/mina86/emacs/commits/master and I=E2=80= =99ll push it next week or so. --=20 Best regards =E3=83=9F=E3=83=8F=E3=82=A6 =E2=80=9C=F0=9D=93=B6=F0=9D=93=B2=F0=9D=93=B7= =F0=9D=93=AA86=E2=80=9D =E3=83=8A=E3=82=B6=E3=83=AC=E3=83=B4=E3=82=A4=E3=83= =84 =C2=ABIf at first you don=E2=80=99t succeed, give up skydiving=C2=BB From debbugs-submit-bounces@debbugs.gnu.org Tue Aug 02 12:06:13 2016 Received: (at 24071-done) by debbugs.gnu.org; 2 Aug 2016 16:06:13 +0000 Received: from localhost ([127.0.0.1]:53796 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bUcCr-0005Cy-E3 for submit@debbugs.gnu.org; Tue, 02 Aug 2016 12:06:13 -0400 Received: from mail-wm0-f48.google.com ([74.125.82.48]:34593) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bUcCq-0005Cl-1Z for 24071-done@debbugs.gnu.org; Tue, 02 Aug 2016 12:06:12 -0400 Received: by mail-wm0-f48.google.com with SMTP id q128so60139021wma.1 for <24071-done@debbugs.gnu.org>; Tue, 02 Aug 2016 09:06:12 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=sender:from:to:subject:in-reply-to:organization:references :user-agent:face:date:message-id:mime-version :content-transfer-encoding; bh=nfoQon4tyv45eVCgXz/JX5++bl/EJ+BPN0P+OMeH+rU=; b=JN4nU7ZtGC0UXuFxLo+TGoJoAeG5JNUrAywijYob7U6ApiqQMpr2F1MhKm1aalF6G/ Sn6JxduZlTti07GhqSLUXbVQ1ODSDUuv3OuQP3XkCo2MC9B496lzPo5bh32PhGc/bXuY gD59BSxGnd/uaXQ8LZI1aC86JA2cdEsvt7h/48rO6gIcJwS3afn6ckJvzECJfa3vZNLM UhRuaZDeU+4Ku/AjUC2mQ60OdczZ6EMJu1+1NcnHQ6YVX+wUJqNk15RyIHg3PZalP6UM EtUNb5CFGh6gJEBpQKsf73qzlcVpgUC5bHI37CI+1oeeB4tKu9CM7nMZM5VqNulCjSGu ZZBw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:sender:from:to:subject:in-reply-to:organization :references:user-agent:face:date:message-id:mime-version :content-transfer-encoding; bh=nfoQon4tyv45eVCgXz/JX5++bl/EJ+BPN0P+OMeH+rU=; b=C0QecH5JEk1TcEgphjpzA0hAIqdKqbAGoy4zPqSMNqQRCATJJvE+c98KSukJNL8y/5 Ypae5LjzJDRi848Bjzg3/1Kl2HvNr59zozMUG+Ahhd/vzDRYS6dgM5afoN9JW/Y4GjjP Ij7gd2KDh5SoU1SWP6eW6/R9DWHcUzIjXuQ4xqCHoJXd52quaDwqErRHCom7fwBwnWeT oUuIFlScabvzL/yO4Kld5kF8c9SyYQq88dk4MmrzL+nKyNB2Ib3LwVyFUgqhNsNEeUNO UBeqp0F7cXPFQLrYgw+lQS/1Ehj32VUiy4+CWvwMJqBdLnwHUFDtY/xzlAtk7+XX8VR9 Lt+w== X-Gm-Message-State: AEkoouvhqbHHi5k+o5xDlEjRb+Ae17Zd9DqCu3k6jbPCh9HaPfGHHIQ5Ua1qFILoWQdN9xHl X-Received: by 10.194.173.4 with SMTP id bg4mr57411104wjc.28.1470153965981; Tue, 02 Aug 2016 09:06:05 -0700 (PDT) Received: from mpn-glaptop ([2620:0:105f:301:5e0:8629:cc48:7615]) by smtp.gmail.com with ESMTPSA id ub8sm3289486wjc.39.2016.08.02.09.06.04 for <24071-done@debbugs.gnu.org> (version=TLS1_2 cipher=AES128-SHA bits=128/128); Tue, 02 Aug 2016 09:06:05 -0700 (PDT) From: Michal Nazarewicz To: 24071-done@debbugs.gnu.org Subject: Re: [PATCH] Refactor regex character class parsing in [:name:] In-Reply-To: <1469487245-11126-1-git-send-email-mina86@mina86.com> Organization: http://mina86.com/ References: <1469487245-11126-1-git-send-email-mina86@mina86.com> User-Agent: Notmuch/0.19+53~g2e63a09 (http://notmuchmail.org) Emacs/25.1.50.1 (x86_64-unknown-linux-gnu) Face: iVBORw0KGgoAAAANSUhEUgAAADAAAAAwBAMAAAClLOS0AAAAJFBMVEWbfGlUPDDHgE57V0jUupKjgIObY0PLrom9mH4dFRK4gmjPs41MxjOgAAACP0lEQVQ4T23Sv2vbQBQHcBk1xE6WyALX107VUEgmn6+ouUwpEQQ6uRjttkWP4CkBg2M0BQLBdPFZYPsyFYo7qEtKDQ7on+t7+nF2Ux8ahD587717OmNYrOvycHsZ+o2r051wHTHysAvGb8ygvgu4QWT0sCmkgZCIEnlV2X8BtyraazFGDuxhmKSQJMlwHQ7v5MHSNxmz78rfElwAa3ieVD9e+hBhjaPDDG6NgFo2f4wBMNIo5YmRtF0RyDgFjJjlMIWbnuM4x9MMfABGTlN4qgIQB4A1DEyA1BHWtfeWNUMwiVJKoqh97KrkOO+qzgluVYLvFCUKAX73nONeBr7BGMdM6Sg0kuep03VywLaIzRiVr+GAzKlpQIsAFnWAG2e6DT5WmWDiudZMIc6hYrMOmeMQK9WX0B+/RfjzL9DI7Y9/Iayn29Ci0r2i4f9gMimMSZLCDMalgQGU5hnUtqAN0OGvEmO1Wnl0C0wWSCEHnuHBqmygxdxA8oWXwbipoc1EoNR9DqOpBpOJrnr0criQab9ZT4LL+wI+K7GBQH30CrhUruilgP9DRTrhVWZCiAyILP+wiuLeCKGTD6r/nc8LOJcAwR6IBTUs+7CASw3QFZ0MdA2PI3zNziH4ZKVhXCRMBjeZ1DWMekKwDCASwExy+NQ86TaykaDAFHO4aP48y4fIcDM5yOG8GcTLbOyp8A8azjJI93JFd1EA6yN8sSxMQJWoABqniRZVykYgRXErzrdqExAoUrRb0xfRp8p2A/4XmfilTtkDZ4cAAAAASUVORK5CYII= X-Face: -TR8(rDTHy/(xl?SfWd1|3:TTgDIatE^t'vop%*gVg[kn$t{EpK(P"VQ=~T2#ysNmJKN$"yTRLB4YQs$4{[.]Fc1)*O]3+XO^oXM>Q#b^ix, O)Zbn)q[y06$`e3?C)`CwR9y5riE=fv^X@x$y?D:XO6L&x4f-}}I4=VRNwiA^t1-ZrVK^07.Pi/57c_du'& X-PGP: 50751FF4 X-PGP-FP: AC1F 5F5C D418 88F8 CC84 5858 2060 4012 5075 1FF4 X-Hashcash: 1:20:160802:bug-gnu-emacs@gnu.org::ZEQeinxFelhVSFq/:00000000000000000000000000000000000000001Lul Date: Tue, 02 Aug 2016 18:06:04 +0200 Message-ID: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: base64 X-Spam-Score: -2.0 (--) X-Debbugs-Envelope-To: 24071-done X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -2.0 (--) UHVzaGVkLg0KDQotLSANCkJlc3QgcmVnYXJkcw0K44Of44OP44KmIOKAnPCdk7bwnZOy8J2Tt/Cd k6o4NuKAnSDjg4rjgrbjg6zjg7TjgqTjg4QNCsKrSWYgYXQgZmlyc3QgeW91IGRvbuKAmXQgc3Vj Y2VlZCwgZ2l2ZSB1cCBza3lkaXZpbmfCuw0K From debbugs-submit-bounces@debbugs.gnu.org Tue Aug 02 12:47:26 2016 Received: (at 24071) by debbugs.gnu.org; 2 Aug 2016 16:47:26 +0000 Received: from localhost ([127.0.0.1]:53828 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bUcqk-0006D6-6I for submit@debbugs.gnu.org; Tue, 02 Aug 2016 12:47:26 -0400 Received: from eggs.gnu.org ([208.118.235.92]:32857) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bUcqi-0006Ct-7m for 24071@debbugs.gnu.org; Tue, 02 Aug 2016 12:47:24 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1bUcqa-0008OX-28 for 24071@debbugs.gnu.org; Tue, 02 Aug 2016 12:47:19 -0400 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=-3.1 required=5.0 tests=BAYES_00,RP_MATCHES_RCVD autolearn=disabled version=3.3.2 Received: from fencepost.gnu.org ([2001:4830:134:3::e]:34785) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bUcqZ-0008OT-Uz; Tue, 02 Aug 2016 12:47:15 -0400 Received: from 84.94.185.246.cable.012.net.il ([84.94.185.246]:2558 helo=home-c4e4a596f7) by fencepost.gnu.org with esmtpsa (TLS1.2:RSA_AES_128_CBC_SHA1:128) (Exim 4.82) (envelope-from ) id 1bUcqY-0007M7-2i; Tue, 02 Aug 2016 12:47:15 -0400 Date: Tue, 02 Aug 2016 19:46:54 +0300 Message-Id: <83h9b3m83l.fsf@gnu.org> From: Eli Zaretskii To: Michal Nazarewicz In-reply-to: (message from Michal Nazarewicz on Tue, 02 Aug 2016 18:06:04 +0200) Subject: Re: bug#24071: [PATCH] Refactor regex character class parsing in [:name:] References: <1469487245-11126-1-git-send-email-mina86@mina86.com> X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 2001:4830:134:3::e X-Spam-Score: -6.2 (------) X-Debbugs-Envelope-To: 24071 Cc: 24071@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: Eli Zaretskii Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -6.2 (------) > From: Michal Nazarewicz > Date: Tue, 02 Aug 2016 18:06:04 +0200 > > Pushed. Thanks. I get the following warning when compiling regex.c: regex.c:516:0: warning: macro "STREQ" is not used [-Wunused-macros] #define STREQ(s1, s2) ((strcmp (s1, s2) == 0)) ^ When compiling the test suite, I get this warning: Compiling src/regex-tests.el In toplevel form: src/regex-tests.el:416:1:Warning: Unused lexical variable `newline' From debbugs-submit-bounces@debbugs.gnu.org Tue Aug 02 13:54:44 2016 Received: (at 24071) by debbugs.gnu.org; 2 Aug 2016 17:54:44 +0000 Received: from localhost ([127.0.0.1]:53843 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bUdts-0007ju-J2 for submit@debbugs.gnu.org; Tue, 02 Aug 2016 13:54:44 -0400 Received: from mail-wm0-f51.google.com ([74.125.82.51]:35178) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bUdtq-0007jh-K6 for 24071@debbugs.gnu.org; Tue, 02 Aug 2016 13:54:42 -0400 Received: by mail-wm0-f51.google.com with SMTP id f65so418690070wmi.0 for <24071@debbugs.gnu.org>; Tue, 02 Aug 2016 10:54:42 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=sender:from:to:cc:subject:in-reply-to:organization:references :user-agent:face:date:message-id:mime-version :content-transfer-encoding; bh=9nfzflBpsSE6S6mQd0SoO3ePIh+QoAIps3+WX60x6vI=; b=oWRpiL0e6s4LK09a5bo2KxeEH1tYQzKFdNCR4k0Ajneysv6CPN9XS/TuNYCtlwxef3 DbqSNpUJVgUfGqI+mTTcD5qqnmffOOrxbjqFAqPXCWsEilOwnmrf0rbV2QSdQ8Mu7+Zn AVFU6UgJJnlnpMRf+07x7GCRtxQP4iIGnKE7BsPiRYuFD+zuKhoL45IOerSRFfJ95/PH sn/FS4pJjzOi9cOssI+MRszw6CKxSvedECqIGx+7uYt7PHa5Rk79W0WtNmIwuphTCH9W mf+m/ta7s7uenUKgklipCV3qWvziUM1fkT8wjz9v9KOyDYf8Z+d0c3nEpvfKqy0IFjrk ElKg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:sender:from:to:cc:subject:in-reply-to :organization:references:user-agent:face:date:message-id :mime-version:content-transfer-encoding; bh=9nfzflBpsSE6S6mQd0SoO3ePIh+QoAIps3+WX60x6vI=; b=YMsEfLMZpUYRYEmyOP7zFoHiGTJ5uyrYoFQWFr+ZGlIca1N3XOuKddzLb7OUQRpdL4 JVSUEW+co5yTFVt9x4pqAsCJaNwtXYHzu0xCTEhcn/dng9gz5GHHVhE+qIwSYIp0CpD1 VYLCizF85tT9T/QsOLqYe/LM8iqYSZBf0G2p0br50K/jXE2PCmMC8l5eppeKejSge1gY 1krxFvBnBY/3AEYiTzPjlNF+ewEzF8igpdXXEHDJsTEAs2zHTe+/EDCIcpFC6rIakooc h8MGHyXvV+uvZUf6lY9YaCLQaqEs3JC08HDpRXnumF25vBWtnDjekjo6Lw8GfPi1FrRn WsHQ== X-Gm-Message-State: AEkoouv3oQ/4ky3OyCF7DiPmD4+pSBTG3iWO5rX7Im6ILp1EsXmp6cRG9AKIS5CozzXc2b1I X-Received: by 10.28.214.130 with SMTP id n124mr60200255wmg.37.1470160476732; Tue, 02 Aug 2016 10:54:36 -0700 (PDT) Received: from mpn-glaptop ([2620:0:105f:301:5e0:8629:cc48:7615]) by smtp.gmail.com with ESMTPSA id n6sm3726167wjj.5.2016.08.02.10.54.34 (version=TLS1_2 cipher=AES128-SHA bits=128/128); Tue, 02 Aug 2016 10:54:34 -0700 (PDT) From: Michal Nazarewicz To: Eli Zaretskii Subject: Re: bug#24071: [PATCH] Refactor regex character class parsing in [:name:] In-Reply-To: <83h9b3m83l.fsf@gnu.org> Organization: http://mina86.com/ References: <1469487245-11126-1-git-send-email-mina86@mina86.com> <83h9b3m83l.fsf@gnu.org> User-Agent: Notmuch/0.19+53~g2e63a09 (http://notmuchmail.org) Emacs/25.1.50.1 (x86_64-unknown-linux-gnu) Face: iVBORw0KGgoAAAANSUhEUgAAADAAAAAwBAMAAAClLOS0AAAAJFBMVEWbfGlUPDDHgE57V0jUupKjgIObY0PLrom9mH4dFRK4gmjPs41MxjOgAAACP0lEQVQ4T23Sv2vbQBQHcBk1xE6WyALX107VUEgmn6+ouUwpEQQ6uRjttkWP4CkBg2M0BQLBdPFZYPsyFYo7qEtKDQ7on+t7+nF2Ux8ahD587717OmNYrOvycHsZ+o2r051wHTHysAvGb8ygvgu4QWT0sCmkgZCIEnlV2X8BtyraazFGDuxhmKSQJMlwHQ7v5MHSNxmz78rfElwAa3ieVD9e+hBhjaPDDG6NgFo2f4wBMNIo5YmRtF0RyDgFjJjlMIWbnuM4x9MMfABGTlN4qgIQB4A1DEyA1BHWtfeWNUMwiVJKoqh97KrkOO+qzgluVYLvFCUKAX73nONeBr7BGMdM6Sg0kuep03VywLaIzRiVr+GAzKlpQIsAFnWAG2e6DT5WmWDiudZMIc6hYrMOmeMQK9WX0B+/RfjzL9DI7Y9/Iayn29Ci0r2i4f9gMimMSZLCDMalgQGU5hnUtqAN0OGvEmO1Wnl0C0wWSCEHnuHBqmygxdxA8oWXwbipoc1EoNR9DqOpBpOJrnr0criQab9ZT4LL+wI+K7GBQH30CrhUruilgP9DRTrhVWZCiAyILP+wiuLeCKGTD6r/nc8LOJcAwR6IBTUs+7CASw3QFZ0MdA2PI3zNziH4ZKVhXCRMBjeZ1DWMekKwDCASwExy+NQ86TaykaDAFHO4aP48y4fIcDM5yOG8GcTLbOyp8A8azjJI93JFd1EA6yN8sSxMQJWoABqniRZVykYgRXErzrdqExAoUrRb0xfRp8p2A/4XmfilTtkDZ4cAAAAASUVORK5CYII= X-Face: -TR8(rDTHy/(xl?SfWd1|3:TTgDIatE^t'vop%*gVg[kn$t{EpK(P"VQ=~T2#ysNmJKN$"yTRLB4YQs$4{[.]Fc1)*O]3+XO^oXM>Q#b^ix, O)Zbn)q[y06$`e3?C)`CwR9y5riE=fv^X@x$y?D:XO6L&x4f-}}I4=VRNwiA^t1-ZrVK^07.Pi/57c_du'& X-PGP: 50751FF4 X-PGP-FP: AC1F 5F5C D418 88F8 CC84 5858 2060 4012 5075 1FF4 X-Hashcash: 1:20:160802:eliz@gnu.org::HJyESDQA08qltaTP:00000031T X-Hashcash: 1:20:160802:24071@debbugs.gnu.org::YtD4CHnCtSdtQorY:00000000000000000000000000000000000000001UGB Date: Tue, 02 Aug 2016 19:54:33 +0200 Message-ID: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spam-Score: -2.0 (--) X-Debbugs-Envelope-To: 24071 Cc: 24071@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -2.0 (--) On Tue, Aug 02 2016, Eli Zaretskii wrote: >> From: Michal Nazarewicz >> Date: Tue, 02 Aug 2016 18:06:04 +0200 >>=20 >> Pushed. > > Thanks. I get the following warning when compiling regex.c: > > regex.c:516:0: warning: macro "STREQ" is not used [-Wunused-macros] > #define STREQ(s1, s2) ((strcmp (s1, s2) =3D=3D 0)) > ^ > > When compiling the test suite, I get this warning: > > Compiling src/regex-tests.el > > In toplevel form: > src/regex-tests.el:416:1:Warning: Unused lexical variable `newline' Both fixed. Thanks. --=20 Best regards =E3=83=9F=E3=83=8F=E3=82=A6 =E2=80=9C=F0=9D=93=B6=F0=9D=93=B2=F0=9D=93=B7= =F0=9D=93=AA86=E2=80=9D =E3=83=8A=E3=82=B6=E3=83=AC=E3=83=B4=E3=82=A4=E3=83= =84 =C2=ABIf at first you don=E2=80=99t succeed, give up skydiving=C2=BB From unknown Thu Aug 14 21:49:56 2025 Received: (at fakecontrol) by fakecontrolmessage; To: internal_control@debbugs.gnu.org From: Debbugs Internal Request Subject: Internal Control Message-Id: bug archived. Date: Wed, 31 Aug 2016 11:24:04 +0000 User-Agent: Fakemail v42.6.9 # This is a fake control message. # # The action: # bug archived. thanks # This fakemail brought to you by your local debbugs # administrator