From: Aharon Robbins
Message-id: <201401171339.s0HDdmwB017938@skeeve.com>
Date: Fri, 17 Jan 2014 15:39:48 +0200
To: bug-grep@gnu.org
Subject: dfa.c and Rational Range Interpretation

Hello All.

I believe that the code in dfa.c that deals with character ranges
is incorrect with respect to Rational Range Interpretation.
This shows up in the following test case:

    $ echo \\ | src/grep -Xawk '[\[-\]]'
    $

Whereas with gawk:

    $ echo \\ | gawk '/[\[-\]]/'
    \

From ascii(7):

    ...
    133   91    5B    [
    ...
    134   92    5C    \  '\\'
    ...
    135   93    5D    ]

So gawk is correct here.  (This is on a GLIBC system; in private email
Jim reported different behavior on Mac OS X.)

In the grep master, the code in question is in dfa.c:parse_bracket_exp,
lines 1110 - 1135:

    {
      /* Defer to the system regex library about the meaning
         of range expressions.  */
      regex_t re;
      char pattern[6] = { '[', 0, '-', 0, ']', 0 };
      char subject[2] = { 0, 0 };
      c1 = c;
      if (case_fold)
        {
          c1 = tolower (c1);
          c2 = tolower (c2);
        }

      pattern[1] = c1;
      pattern[3] = c2;
      regcomp (&re, pattern, REG_NOSUB);
      for (c = 0; c < NOTCHAR; ++c)
        {
          if ((case_fold && isupper (c)))
            continue;
          subject[0] = c;
          if (regexec (&re, subject, 0, NULL, 0) != REG_NOMATCH)
            setbit_case_fold_c (c, ccl);
        }
      regfree (&re);
    }

This code lets the regex routines decide what characters match a particular
range expression. If the regex routines are not obeying RRI, then dfa.c
will not either.  Yet, grep now supports RRI.

(To me this argues that grep's configure should be checking the system
regex routines for correct RRI support, and automatically using the
included routines if the system routines are not good. Gawk goes further
and simply always uses the included regex routines, *guaranteeing*
consistent behavior across systems. But that's a parenthetical issue.)

In addition, the call to regcomp could fail, but this isn't being checked.
When I add an error report to the call, I get the following on one
of the gawk test cases:

    "[.c.]" ~ /[a-[.e.]]/ --> 1
    dfa.c:1176: regcomp(/[a-[]/) failed: Invalid range end

Since this relates to [. and .] which dfa and regex don't really support,
there's a gap somewhere, but the point is that if regcomp fails, nobody
notices.  What does regexec do if regcomp fails? Beats me...

Next, let's take a harder look at this:

    for (c = 0; c < NOTCHAR; ++c)
      {
        if ((case_fold && isupper (c)))
          continue;
        subject[0] = c;
        if (regexec (&re, subject, 0, NULL, 0) != REG_NOMATCH)
          setbit_case_fold_c (c, ccl);
      }

Since c is 0 on the first iteration, regexec is called with subject
equal to [ '\0' '\0' ].  The first thing regexec does is

    length = strlen(string);

which in this case will be zero. We really want a length of 1 where the
first byte is zero (no arbitrary limits, eh?).  Bug in the regexec
interface, methinks, but in any case, testing 0 is fruitless.
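
A minimal sketch of the two points above, assuming a POSIX <regex.h>.
The helper names compile_checked and subject_length are hypothetical
and do not appear in dfa.c; the sketch only illustrates checking
regcomp's return value, and why a strlen-based interface never sees a
lone NUL byte as a one-byte subject.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from dfa.c): compile PATTERN and report
   failure instead of ignoring it.  Returns 0 on success; otherwise
   returns the regcomp error code and copies a message into ERRBUF.  */
static int
compile_checked (regex_t *re, char const *pattern,
                 char *errbuf, size_t errbuf_size)
{
  int err = regcomp (re, pattern, REG_NOSUB);
  if (err != 0)
    regerror (err, re, errbuf, errbuf_size);
  return err;
}

/* Hypothetical helper: the length that a strlen-based interface sees
   for a one-byte subject holding byte C.  For C == 0 the result is 0,
   so the NUL byte itself is never presented to the matcher.  */
static size_t
subject_length (unsigned char c)
{
  char subject[2] = { 0, 0 };
  subject[0] = (char) c;
  return strlen (subject);
}
```

With these helpers, a malformed pattern such as "[a-" makes
compile_checked return nonzero and leaves a message in the buffer,
while subject_length (0) is 0 and subject_length ('a') is 1.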

However, this code begs a deeper question.  If we're doing RRI, then
by definition only the values between the low member of the range and
the high member of the range can match the range expression. So why
loop over everything from 0 to 255?

Thus, gawk replaces the above code with the following:

    c1 = c;
    if (case_fold)
      {
        c1 = tolower (c1);
        c2 = tolower (c2);
      }
    for (c = c1; c <= c2; c++)
      setbit_case_fold_c (c, ccl);

This sets the bits for exactly those characters in the range. No more,
no less.  And it doesn't rely on the system regex routines, which makes
compiling the dfa go faster. Grep only compiles its dfa once, but gawk
can compile arbitrarily many dfa's, since it can match expressions that
are computed dynamically.

I'm not sure if this analysis covers all the problems with the current
code. But I do think that gawk's code is the correct thing to be doing
for RRI.

Additionally, I recommend that grep's configure check for good RRI
support in the system regex routines and switch to the included ones
if the system ones don't support it.

Finally, the following diff lets grep check the other awk syntax
variants.  Feel free to apply it.  For the above test case, all three
give the same results.

I hope all this is of interest.

Thanks!
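
The range loop described above can be sketched in isolation as follows.
This is an illustration under stated assumptions, not dfa.c itself: the
byteset type and its functions are hypothetical stand-ins for dfa.c's
charclass and setbit_case_fold_c, case folding is left out, and the
example values assume an ASCII-compatible character set.

```c
#include <limits.h>
#include <stdbool.h>
#include <string.h>

enum { NCHARS = UCHAR_MAX + 1 };

/* Hypothetical stand-in for dfa.c's charclass: one bit per byte value.  */
typedef struct
{
  unsigned char bits[NCHARS / CHAR_BIT];
} byteset;

static void
byteset_clear (byteset *s)
{
  memset (s, 0, sizeof *s);
}

static void
byteset_add (byteset *s, unsigned char c)
{
  s->bits[c / CHAR_BIT] |= 1u << (c % CHAR_BIT);
}

static bool
byteset_has (byteset const *s, unsigned char c)
{
  return (s->bits[c / CHAR_BIT] >> (c % CHAR_BIT)) & 1;
}

/* Rational range interpretation: add exactly the byte values LO..HI,
   in native character order.  The counter is an int so that the loop
   terminates even when HI is UCHAR_MAX.  */
static void
byteset_add_range (byteset *s, unsigned char lo, unsigned char hi)
{
  for (int c = lo; c <= hi; c++)
    byteset_add (s, (unsigned char) c);
}
```

For the test case at the top of this report, byteset_add_range (&s, '[', ']')
on a cleared set yields exactly the three bytes '[', '\\' and ']' in ASCII,
so the backslash matches without consulting the system regex routines.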

Arnold
-----------------------------------------------------
diff --git a/src/grep.c b/src/grep.c
index 1b2198f..12644a2 100644
--- a/src/grep.c
+++ b/src/grep.c
@@ -19,10 +19,24 @@ Acompile (char const *pattern, size_t size)
   GEAcompile (pattern, size, RE_SYNTAX_AWK);
 }
 
+static void
+GAcompile (char const *pattern, size_t size)
+{
+  GEAcompile (pattern, size, RE_SYNTAX_GNU_AWK);
+}
+
+static void
+PAcompile (char const *pattern, size_t size)
+{
+  GEAcompile (pattern, size, RE_SYNTAX_POSIX_AWK);
+}
+
 struct matcher const matchers[] = {
   { "grep", Gcompile, EGexecute },
   { "egrep", Ecompile, EGexecute },
   { "awk", Acompile, EGexecute },
+  { "gawk", GAcompile, EGexecute },
+  { "posixawk", PAcompile, EGexecute },
   { "fgrep", Fcompile, Fexecute },
   { "perl", Pcompile, Pexecute },
   { NULL, NULL, NULL },

Message-ID: <52D9B211.7050908@cs.ucla.edu>
Date: Fri, 17 Jan 2014 14:43:29 -0800
From: Paul Eggert
Organization: UCLA Computer Science Department
To: Aharon Robbins, 16481@debbugs.gnu.org
Subject: Re: bug#16481: dfa.c and Rational Range Interpretation
In-Reply-To: <201401171339.s0HDdmwB017938@skeeve.com>

Thanks for continuing to bird-dog this.

On 01/17/2014 05:39 AM, Aharon Robbins wrote:
> the following diff lets grep check the other awk syntax
> variants.  Feel free to apply it.

I did that (the first patch enclosed below).  Thanks.

> I do think that gawk's code is the correct thing to be doing for RRI.

I agree, and installed the second patch enclosed below to
implement this.  This patch also includes some documentation
changes -- if you have a bit of time to review them I'd
appreciate it.

Also, I notice that there are a few "#ifdef GREP"s in dfa.c
Do you happen to know why they're needed?  It'd be nice if
we could simplify dfa.c to omit the need for the GREP macro.

> Additionally, I recommend that grep's configure check for good RRI
> support in the system regex routines and switch to the included ones
> if the system ones don't support it.

Unfortunately that'd break support for equivalence classes
and multibyte collation symbols on GNU/Linux platforms, so
it may be a bridge too far.
Until we get glibc fixed, I
think it's OK to live with the situation where [a-z]
ordinarily has the rational range interpretation, and this
breaks down only for complicated matches where the DFA
doesn't suffice; at least it'll work in the usual case.

From c862ced6f31f0ccdf2505ac46e354a1a011149cd Mon Sep 17 00:00:00 2001
From: Aharon Robbins
Date: Fri, 17 Jan 2014 12:42:49 -0800
Subject: [PATCH 1/2] grep: add undocumented '-X gawk' and '-X posixawk'
 options

See .
* src/grep.c (GAcompile, PAcompile): New functions.
(const): Use them.
---
 src/grep.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/src/grep.c b/src/grep.c
index 1b2198f..12644a2 100644
--- a/src/grep.c
+++ b/src/grep.c
@@ -19,10 +19,24 @@ Acompile (char const *pattern, size_t size)
   GEAcompile (pattern, size, RE_SYNTAX_AWK);
 }
 
+static void
+GAcompile (char const *pattern, size_t size)
+{
+  GEAcompile (pattern, size, RE_SYNTAX_GNU_AWK);
+}
+
+static void
+PAcompile (char const *pattern, size_t size)
+{
+  GEAcompile (pattern, size, RE_SYNTAX_POSIX_AWK);
+}
+
 struct matcher const matchers[] = {
   { "grep", Gcompile, EGexecute },
   { "egrep", Ecompile, EGexecute },
   { "awk", Acompile, EGexecute },
+  { "gawk", GAcompile, EGexecute },
+  { "posixawk", PAcompile, EGexecute },
   { "fgrep", Fcompile, Fexecute },
   { "perl", Pcompile, Pexecute },
   { NULL, NULL, NULL },
-- 
1.8.4.2

From aba2c718908d6c8fcfd75d55a43a4c9b1e3405a3 Mon Sep 17 00:00:00 2001
From: Paul Eggert
Date: Fri, 17 Jan 2014 14:32:10 -0800
Subject: [PATCH 2/2] grep: DFA now uses rational ranges in unibyte locales

Problem reported by Aharon Robbins in .
* NEWS:
* doc/grep.texi (Environment Variables)
(Character Classes and Bracket Expressions): Document this.
* src/dfa.c (parse_bracket_exp): Treat unibyte locales like multibyte.
---
 NEWS          |  8 ++++++++
 doc/grep.texi | 19 +++++++++----------
 src/dfa.c     | 20 ++------------------
 3 files changed, 19 insertions(+), 28 deletions(-)

diff --git a/NEWS b/NEWS
index 6e46684..589b2ac 100644
--- a/NEWS
+++ b/NEWS
@@ -7,6 +7,14 @@ GNU grep NEWS -*- outline -*-
   grep -i in a multibyte locale is now typically 10 times faster
   for patterns that do not contain \ or [.
 
+  Range expressions in unibyte locales now ordinarily use the rational
+  range interpretation, in which [a-z] matches only lower-case ASCII
+  letters regardless of locale, and similarly for other ranges.  (This
+  was already true for multibyte locales.)  Portable programs should
+  continue to specify the C locale when using range expressions, since
+  these expressions have unspecified behavior in non-GNU systems and
+  are not yet guaranteed to use the rational range interpretation even
+  in GNU systems.
 
 * Noteworthy changes in release 2.16 (2014-01-01) [stable]
 
diff --git a/doc/grep.texi b/doc/grep.texi
index 473a181..42fb9a2 100644
--- a/doc/grep.texi
+++ b/doc/grep.texi
@@ -960,8 +960,8 @@ They are omitted (i.e., false) by default and become true when specified.
 @cindex national language support
 @cindex NLS
 These variables specify the locale for the @code{LC_COLLATE} category,
-which determines the collating sequence
-used to interpret range expressions like @samp{[a-z]}.
+which might affect how range expressions like @samp{[a-z]} are
+interpreted.
 
 @item LC_ALL
 @itemx LC_CTYPE
@@ -1223,14 +1223,13 @@ For example, the regular expression
 
 Within a bracket expression, a @dfn{range expression} consists of two
 characters separated by a hyphen.  It matches any single character that
-sorts between the two characters, inclusive, using the locale's
-collating sequence and character set.
-For example, in the default C
-locale, @samp{[a-d]} is equivalent to @samp{[abcd]}.
-Many locales sort
-characters in dictionary order, and in these locales @samp{[a-d]} is
-typically not equivalent to @samp{[abcd]};
-it might be equivalent to @samp{[aBbCcDd]}, for example.
+sorts between the two characters, inclusive.
+In the default C locale, the sorting sequence is the native character
+order; for example, @samp{[a-d]} is equivalent to @samp{[abcd]}.
+In other locales, the sorting sequence is not specified, and
+@samp{[a-d]} might be equivalent to @samp{[abcd]} or to
+@samp{[aBbCcDd]}, or it might fail to match any character, or the set of
+characters that it matches might even be erratic.
 To obtain the traditional interpretation
 of bracket expressions, you can use the @samp{C} locale by setting the
 @env{LC_ALL} environment variable to the value @samp{C}.
diff --git a/src/dfa.c b/src/dfa.c
index 6ab4e05..5e3140d 100644
--- a/src/dfa.c
+++ b/src/dfa.c
@@ -1108,30 +1108,14 @@ parse_bracket_exp (void)
                 }
               else
                 {
-                  /* Defer to the system regex library about the meaning
-                     of range expressions.  */
-                  regex_t re;
-                  char pattern[6] = { '[', 0, '-', 0, ']', 0 };
-                  char subject[2] = { 0, 0 };
                   c1 = c;
                   if (case_fold)
                     {
                       c1 = tolower (c1);
                       c2 = tolower (c2);
                     }
-
-                  pattern[1] = c1;
-                  pattern[3] = c2;
-                  regcomp (&re, pattern, REG_NOSUB);
-                  for (c = 0; c < NOTCHAR; ++c)
-                    {
-                      if ((case_fold && isupper (c)))
-                        continue;
-                      subject[0] = c;
-                      if (regexec (&re, subject, 0, NULL, 0) != REG_NOMATCH)
-                        setbit_case_fold_c (c, ccl);
-                    }
-                  regfree (&re);
+                  for (c = c1; c <= c2; c++)
+                    setbit_case_fold_c (c, ccl);
                 }
 
               colon_warning_state |= 8;
-- 
1.8.4.2

From: Aharon Robbins
Message-id: <201401181939.s0IJdDQx002998@skeeve.com>
Date: Sat, 18 Jan 2014 21:39:13 +0200
To: eggert@cs.ucla.edu, arnold@skeeve.com, 16481@debbugs.gnu.org
Subject: Re: bug#16481: dfa.c and Rational Range Interpretation
In-reply-to: <52D9B211.7050908@cs.ucla.edu>

Hi Paul.

> Thanks for continuing to bird-dog this.

It's either "tenacity" or "stubbornness". :-)

> > I do think that gawk's code is the correct thing to be doing for RRI.
>
> I agree, and installed the second patch enclosed below to
> implement this.

Cool! Hurray! One more bit that comes into sync.

> This patch also includes some documentation
> changes -- if you have a bit of time to review them I'd
> appreciate it.

It looks ok, but it doesn't really say anything about RRI - grep
does RRI in all locales now, which falls under the umbrella of
POSIXy implementation-defined behavior, but is just fine. That
should be explained.

> Also, I notice that there are a few "#ifdef GREP"s in dfa.c
> Do you happen to know why they're needed?

No idea.  They all seem to be related to case_fold. I had not really
noticed them, and they must be working fine for me since I don't
define GREP.  What happens if you compile them in and run the
grep test suite?

> > Additionally, I recommend that grep's configure check for good RRI
> > support in the system regex routines and switch to the included ones
> > if the system ones don't support it.
>
> Unfortunately that'd break support for equivalence classes
> and multibyte collation symbols on GNU/Linux platforms, so
> it may be a bridge too far.

Gawk has lived without these so far.
:-)

> Until we get glibc fixed, I
> think it's OK to live with the situation where [a-z]
> ordinarily has the rational range interpretation, and this
> breaks down only for complicated matches where the DFA
> doesn't suffice; at least it'll work in the usual case.

At least document it somewhere.

Thanks!

Arnold

Message-ID: <52DD5E6F.6030002@cs.ucla.edu>
Date: Mon, 20 Jan 2014 09:35:43 -0800
From: Paul Eggert
To: Aharon Robbins, 16481@debbugs.gnu.org
Subject: Re: bug#16481: dfa.c and Rational Range Interpretation
In-Reply-To: <201401181939.s0IJdDQx002998@skeeve.com>

Aharon Robbins wrote:
> What happens if you compile them in and run the grep test suite?

The test suite passes, but grep is bigger and (I presume) slower.  The
GREP-related changes are for performance, and shouldn't affect behavior.

How about if we apply the attached patch to dfa.c, in both gawk and
grep?  I tried it just now, and gawk passed all its tests too.  Or, if
there's some reason this patch would introduce a bug into gawk, I'd like
to fix the grep test cases to detect the bug.

Content-Disposition: attachment; filename="gawk.diff"

diff --git a/dfa.c b/dfa.c
index ac1cf9a..f84ccd2 100644
--- a/dfa.c
+++ b/dfa.c
@@ -1135,18 +1135,6 @@ parse_bracket_exp (void)
                         case_fold ? towlower (wc) : (wchar_t) wc;
                       work_mbc->range_ends[work_mbc->nranges++] =
                         case_fold ? towlower (wc2) : (wchar_t) wc2;
-
-#ifndef GREP
-                      if (case_fold && (iswalpha (wc) || iswalpha (wc2)))
-                        {
-                          REALLOC_IF_NECESSARY (work_mbc->range_sts,
-                                                range_sts_al, work_mbc->nranges + 1);
-                          work_mbc->range_sts[work_mbc->nranges] = towupper (wc);
-                          REALLOC_IF_NECESSARY (work_mbc->range_ends,
-                                                range_ends_al, work_mbc->nranges + 1);
-                          work_mbc->range_ends[work_mbc->nranges++] = towupper (wc2);
-                        }
-#endif
                     }
                   else
                     {
@@ -1182,11 +1170,7 @@ parse_bracket_exp (void)
                                                 work_mbc->nchars + 1);
                           work_mbc->chars[work_mbc->nchars++] = wc;
                         }
-#ifdef GREP
                       continue;
-#else
-                      wc = towupper (wc);
-#endif
                     }
                   if (!setbit_wc (wc, ccl))
                     {
@@ -1780,14 +1764,6 @@ atom (void)
   else if (MBS_SUPPORT && tok == WCHAR)
     {
       addtok_wc (case_fold ? towlower (wctok) : wctok);
-#ifndef GREP
-      if (case_fold && iswalpha (wctok))
-        {
-          addtok_wc (towupper (wctok));
-          addtok (OR);
-        }
-#endif
-
       tok = lex ();
     }
   else if (MBS_SUPPORT && tok == ANYCHAR && using_utf8 ())

From: Aharon Robbins
Message-id: <201401210421.s0L4Lxs5002369@skeeve.com>
Date: Tue, 21 Jan 2014 06:21:59 +0200
To: eggert@cs.ucla.edu, arnold@skeeve.com, 16481@debbugs.gnu.org
Subject: Re: bug#16481: dfa.c and Rational Range Interpretation
In-reply-to: <52DD5E6F.6030002@cs.ucla.edu>

Hi Paul.

> > What happens if you compile them in and run the grep test suite?
>
> The test suite passes, but grep is bigger and (I presume) slower.  The
> GREP-related changes are for performance, and shouldn't affect behavior.
>
> How about if we apply the attached patch to dfa.c, in both gawk and
> grep?  I tried it just now, and gawk passed all its tests too.  Or, if
> there's some reason this patch would introduce a bug into gawk, I'd like
> to fix the grep test cases to detect the bug.

Can you explain a bit more what the two different branches do?
In other words, I'm wondering why there are two different branches
through the code in the first place, and what are we throwing away by
your patch? (I have no preference either way, I just want to understand
the implications of the decision.
:-)

Thanks,

Arnold

Message-ID: <52DE0D64.8040406@cs.ucla.edu>
Date: Mon, 20 Jan 2014 22:02:12 -0800
From: Paul Eggert
To: Aharon Robbins, 16481@debbugs.gnu.org
Subject: Re: bug#16481: dfa.c and Rational Range Interpretation
In-Reply-To: <201401210421.s0L4Lxs5002369@skeeve.com>

Aharon Robbins wrote:
> Can you explain a bit more what the two different branches do?

Sorry, not easily; I'm not familiar with the code.  I assume it has
something to do with locales where there's not a one-to-one
correspondence between lower-case and upper-case letters.

From: Jim Meyering
Date: Tue, 21 Jan 2014 08:50:44 -0800
Subject: Re: bug#16481: dfa.c and Rational Range Interpretation
To: Paul Eggert, Norihiro Tanaka
Cc: Aharon Robbins, 16481@debbugs.gnu.org

On Mon, Jan 20, 2014 at 10:02 PM, Paul Eggert wrote:
> Aharon Robbins wrote:
>>
>> Can you explain a bit more what the two different branches do?
>
> Sorry, not easily; I'm not familiar with the code.  I assume it has something
> to do with locales where there's not a one-to-one correspondence between
> lower-case and upper-case letters.

Hi Paul,

A week or so ago, Norihiro Tanaka posted the patch in bug 16421,
which removes GREP-oriented dfa.c code in favor of what gawk has
been using, as well as ensuring that some of grep's case-insensitive
searches no longer have to case-convert the data being searched.  I was expecting
to apply it, along with another small change and a test, but now, feel
like I'll have to justify it with some performance data as well.
Assuming I find an improvement, expect a complete patch in a day or two.

Message-ID: <52DEEBA2.2070908@cs.ucla.edu>
Date: Tue, 21 Jan 2014 13:50:26 -0800
From: Paul Eggert
To: Jim Meyering, Norihiro Tanaka
Subject: Re: bug#16481: dfa.c and Rational Range Interpretation
Cc: Aharon Robbins, 16481@debbugs.gnu.org, 16421@debbugs.gnu.org
List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -2.9 (--) On 01/21/2014 08:50 AM, Jim Meyering wrote: > I was expecting > to apply it, along with another small change and a test, but now, feel > like I'll have to justify it with some performance data as well. Ouch, I wasn't intending to make work for you! Even if the patch in didn't improve performance, it makes grep simpler and that should be a win. Norihiro Tanaka's patch (which I'd forgotten about, but which is presumably better) also simplifies grep, so you shouldn't need to do a performance analysis to verify that it's a good idea. From debbugs-submit-bounces@debbugs.gnu.org Sat Jan 25 13:27:24 2014 Received: (at 16481) by debbugs.gnu.org; 25 Jan 2014 18:27:24 +0000 Received: from localhost ([127.0.0.1]:35877 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1W77wx-0005aq-BL for submit@debbugs.gnu.org; Sat, 25 Jan 2014 13:27:24 -0500 Received: from mxout4.netvision.net.il ([194.90.9.27]:52489) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1W77wu-0005ag-J1 for 16481@debbugs.gnu.org; Sat, 25 Jan 2014 13:27:21 -0500 MIME-version: 1.0 Content-transfer-encoding: 7BIT Content-type: text/plain; CHARSET=US-ASCII Received: from skeeve.com ([93.172.51.72]) by mxout4.netvision.net.il (Oracle Communications Messaging Server 7u4-24.01(7.0.4.24.0) 64bit (built Nov 17 2011)) with ESMTP id <0MZY00HZ8Z9HHNE0@mxout4.netvision.net.il> for 16481@debbugs.gnu.org; Sat, 25 Jan 2014 20:27:18 +0200 (IST) Received: from skeeve.com (skeeve.com [127.0.0.1]) by skeeve.com (8.14.4/8.14.4/Debian-2ubuntu2.1) with ESMTP id s0PIRGtw003443; Sat, 25 Jan 2014 20:27:16 +0200 Received: (from arnold@localhost) by skeeve.com (8.14.4/8.14.4/Submit) id s0PIRDMT003441; Sat, 25 Jan 2014 20:27:13 +0200 From: Aharon Robbins Message-id: <201401251827.s0PIRDMT003441@skeeve.com> Date: Sat, 25 Jan 2014 20:27:13 +0200 
To: eggert@cs.ucla.edu, arnold@skeeve.com, 16481@debbugs.gnu.org Subject: Re: bug#16481: dfa.c and Rational Range Interpretation References: <201401171339.s0HDdmwB017938@skeeve.com> <52D9B211.7050908@cs.ucla.edu> <201401181939.s0IJdDQx002998@skeeve.com> <52DD5E6F.6030002@cs.ucla.edu> In-reply-to: <52DD5E6F.6030002@cs.ucla.edu> User-Agent: Heirloom mailx 12.5 6/20/10 X-Spam-Score: 0.0 (/) X-Debbugs-Envelope-To: 16481 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 0.0 (/)

Hi Paul & Jim,

> > What happens if you compile them in and run the grep test suite?
>
> The test suite passes, but grep is bigger and (I presume) slower. The
> GREP-related changes are for performance, and shouldn't affect behavior.
>
> How about if we apply the attached patch to dfa.c, in both gawk and
> grep? I tried it just now, and gawk passed all its tests too. Or, if
> there's some reason this patch would introduce a bug into gawk, I'd like
> to fix the grep test cases to detect the bug.

The code in question occurs in two functions, parse_bracket_exp() and atom(). The first instance is in parse_bracket_exp(), building a range expression, where we may have multibyte characters.

          ....
          if (c1 == '-' && c2 != ']')
            {
              if (c2 == '\\' && (syntax_bits & RE_BACKSLASH_ESCAPE_IN_LISTS))
                FETCH_WC (c2, wc2, _("unbalanced ["));

              if (MB_CUR_MAX > 1)
                {
                  /* When case folding map a range, say [m-z] (or even [M-z])
                     to the pair of ranges, [m-z] [M-Z].  */
                  REALLOC_IF_NECESSARY (work_mbc->range_sts,
                                        range_sts_al, work_mbc->nranges + 1);
                  REALLOC_IF_NECESSARY (work_mbc->range_ends,
                                        range_ends_al, work_mbc->nranges + 1);
                  work_mbc->range_sts[work_mbc->nranges] =
                    case_fold ? towlower (wc) : (wchar_t) wc;
                  work_mbc->range_ends[work_mbc->nranges++] =
                    case_fold ? towlower (wc2) : (wchar_t) wc2;

#ifndef GREP
                  if (case_fold && (iswalpha (wc) || iswalpha (wc2)))
                    {
                      REALLOC_IF_NECESSARY (work_mbc->range_sts,
                                            range_sts_al, work_mbc->nranges + 1);
                      work_mbc->range_sts[work_mbc->nranges] = towupper (wc);
                      REALLOC_IF_NECESSARY (work_mbc->range_ends,
                                            range_ends_al, work_mbc->nranges + 1);
                      work_mbc->range_ends[work_mbc->nranges++] = towupper (wc2);
                    }
#endif
                }

To me this looks like when doing case folding (grep -i, IGNORECASE in gawk), we turn the multibyte equivalent of [a-c] into [a-cA-C]. This would seem to be necessary for correctness, and the question is why does grep not need it?

The next such bit is later on in the same function:

      if (case_fold && iswalpha (wc))
        {
          wc = towlower (wc);
          if (!setbit_wc (wc, ccl))
            {
              REALLOC_IF_NECESSARY (work_mbc->chars, chars_al,
                                    work_mbc->nchars + 1);
              work_mbc->chars[work_mbc->nchars++] = wc;
            }
#ifdef GREP
          continue;
#else
          wc = towupper (wc);
#endif
        }
      if (!setbit_wc (wc, ccl))
        {
          REALLOC_IF_NECESSARY (work_mbc->chars, chars_al,
                                work_mbc->nchars + 1);
          work_mbc->chars[work_mbc->nchars++] = wc;
        }
    }
  while ((wc = wc1, (c = c1) != ']'));

This too looks related to case folding and ranges; if I read it correctly, when case folding it has already added the lowercase version and now has to add the uppercase version of the character.

Then, in atom(): (Why the bizarre leading `if (0)'?)

static void
atom (void)
{
  if (0)
    {
      /* empty */
    }
  else if (MBS_SUPPORT && tok == WCHAR)
    {
      addtok_wc (case_fold ? towlower (wctok) : wctok);
#ifndef GREP
      if (case_fold && iswalpha (wctok))
        {
          addtok_wc (towupper (wctok));
          addtok (OR);
        }
#endif

      tok = lex ();
    }

Here too, we're doing case folding, have added the lowercase character and need to add the uppercase one.

I think to test out this code you'd need a character set where the lowercase and uppercase counterparts are multibyte characters and grep -i is in effect. But I suspect that grep has so much other code to special case grep -i that this code in dfa.c is never reached.
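
As a rough illustration of the range handling described above, here is a minimal standalone sketch. It is not the dfa.c code: the struct wrange type and the helpers fold_range() and in_ranges() are hypothetical names invented for this example. It only mirrors the idea of the non-GREP branch, namely that a case-folded range is stored as a lowercase range plus, when either endpoint is alphabetic, the corresponding uppercase range.

```c
#include <stdbool.h>
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical container for one wide-character range; not a dfa.c type.  */
struct wrange { wchar_t start, end; };

/* Write into OUT (room for two entries) the ranges that [lo-hi] stands
   for, and return how many were written.  When CASE_FOLD is set, the
   first entry is the lowercased range and, if either endpoint is
   alphabetic, a second entry holds the uppercased range.  */
static size_t
fold_range (wchar_t lo, wchar_t hi, bool case_fold, struct wrange out[2])
{
  size_t n = 0;

  out[n].start = case_fold ? (wchar_t) towlower ((wint_t) lo) : lo;
  out[n].end = case_fold ? (wchar_t) towlower ((wint_t) hi) : hi;
  n++;

  if (case_fold && (iswalpha ((wint_t) lo) || iswalpha ((wint_t) hi)))
    {
      out[n].start = (wchar_t) towupper ((wint_t) lo);
      out[n].end = (wchar_t) towupper ((wint_t) hi);
      n++;
    }
  return n;
}

/* Return true if WC falls inside any of the N ranges in R.  */
static bool
in_ranges (const struct wrange *r, size_t n, wchar_t wc)
{
  for (size_t i = 0; i < n; i++)
    if (r[i].start <= wc && wc <= r[i].end)
      return true;
  return false;
}
```

With case folding on, fold_range (L'a', L'c', true, r) yields the two ranges a-c and A-C, so in_ranges() accepts L'B'; with case folding off only a-c is stored and L'B' is rejected. Whether grep reaches the equivalent dfa.c path at all is exactly the open question in the message.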
In short, I don't think it's right to remove this code, but I don't know how to test it to prove that, either. HTH, Arnold From debbugs-submit-bounces@debbugs.gnu.org Sat Jan 25 13:56:34 2014 Received: (at 16481) by debbugs.gnu.org; 25 Jan 2014 18:56:34 +0000 Received: from localhost ([127.0.0.1]:35904 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1W78PB-0007Vq-Vc for submit@debbugs.gnu.org; Sat, 25 Jan 2014 13:56:34 -0500 Received: from smtp.cs.ucla.edu ([131.179.128.62]:37878) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1W78P9-0007Vi-NR for 16481@debbugs.gnu.org; Sat, 25 Jan 2014 13:56:32 -0500 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id C636EA60004; Sat, 25 Jan 2014 10:56:30 -0800 (PST) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id ljyFEWop1XJL; Sat, 25 Jan 2014 10:56:30 -0800 (PST) Received: from [192.168.1.9] (pool-108-0-233-62.lsanca.fios.verizon.net [108.0.233.62]) by smtp.cs.ucla.edu (Postfix) with ESMTPSA id 1CC0339E8011; Sat, 25 Jan 2014 10:56:30 -0800 (PST) Message-ID: <52E408DD.7070505@cs.ucla.edu> Date: Sat, 25 Jan 2014 10:56:29 -0800 From: Paul Eggert Organization: UCLA Computer Science Department User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.2.0 MIME-Version: 1.0 To: Aharon Robbins , 16481@debbugs.gnu.org Subject: Re: bug#16481: dfa.c and Rational Range Interpretation References: <201401171339.s0HDdmwB017938@skeeve.com> <52D9B211.7050908@cs.ucla.edu> <201401181939.s0IJdDQx002998@skeeve.com> <52DD5E6F.6030002@cs.ucla.edu> <201401251827.s0PIRDMT003441@skeeve.com> In-Reply-To: <201401251827.s0PIRDMT003441@skeeve.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: -2.9 (--) X-Debbugs-Envelope-To: 16481 X-BeenThere: 
debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -2.9 (--) Aharon Robbins wrote: > I don't think it's right to remove this code, but I don't know how to test it to prove that, either. Perhaps Norihiro Tanaka's recent patch makes this question moot; see: http://bugs.gnu.org/16421 From debbugs-submit-bounces@debbugs.gnu.org Sat Jan 25 14:25:07 2014 Received: (at 16481) by debbugs.gnu.org; 25 Jan 2014 19:25:07 +0000 Received: from localhost ([127.0.0.1]:35910 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1W78qn-0008Gx-O4 for submit@debbugs.gnu.org; Sat, 25 Jan 2014 14:25:06 -0500 Received: from mxout4.netvision.net.il ([194.90.9.27]:56993) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1W78qk-0008GR-OR for 16481@debbugs.gnu.org; Sat, 25 Jan 2014 14:25:03 -0500 MIME-version: 1.0 Content-transfer-encoding: 7BIT Content-type: text/plain; CHARSET=US-ASCII Received: from skeeve.com ([93.172.51.72]) by mxout4.netvision.net.il (Oracle Communications Messaging Server 7u4-24.01(7.0.4.24.0) 64bit (built Nov 17 2011)) with ESMTP id <0MZZ00HWO1XPENM0@mxout4.netvision.net.il> for 16481@debbugs.gnu.org; Sat, 25 Jan 2014 21:25:02 +0200 (IST) Received: from skeeve.com (skeeve.com [127.0.0.1]) by skeeve.com (8.14.4/8.14.4/Debian-2ubuntu2.1) with ESMTP id s0PJP0CI014634; Sat, 25 Jan 2014 21:25:00 +0200 Received: (from arnold@localhost) by skeeve.com (8.14.4/8.14.4/Submit) id s0PJOxRg014633; Sat, 25 Jan 2014 21:24:59 +0200 From: Aharon Robbins Message-id: <201401251924.s0PJOxRg014633@skeeve.com> Date: Sat, 25 Jan 2014 21:24:59 +0200 To: eggert@cs.ucla.edu, arnold@skeeve.com, 16481@debbugs.gnu.org Subject: Re: bug#16481: dfa.c and Rational Range Interpretation References: <201401171339.s0HDdmwB017938@skeeve.com> <52D9B211.7050908@cs.ucla.edu> 
<201401181939.s0IJdDQx002998@skeeve.com> <52DD5E6F.6030002@cs.ucla.edu> <201401251827.s0PIRDMT003441@skeeve.com> <52E408DD.7070505@cs.ucla.edu> In-reply-to: <52E408DD.7070505@cs.ucla.edu> User-Agent: Heirloom mailx 12.5 6/20/10 X-Spam-Score: 0.0 (/) X-Debbugs-Envelope-To: 16481 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 0.0 (/) Hi. > Date: Sat, 25 Jan 2014 10:56:29 -0800 > From: Paul Eggert > To: Aharon Robbins , 16481@debbugs.gnu.org > Subject: Re: bug#16481: dfa.c and Rational Range Interpretation > > Aharon Robbins wrote: > > I don't think it's right to remove this code, but I don't know how > > to test it to prove that, either. > > Perhaps Norihiro Tanaka's recent patch makes this question moot; see: > > http://bugs.gnu.org/16421 Yes, I think so. It keeps the non-GREP code. If y'all are going to apply it then I'd be happy with that. Thanks! 
Arnold From debbugs-submit-bounces@debbugs.gnu.org Sun Feb 09 18:18:49 2014 Received: (at 16481) by debbugs.gnu.org; 9 Feb 2014 23:18:49 +0000 Received: from localhost ([127.0.0.1]:60991 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WCdeD-0007ao-Gf for submit@debbugs.gnu.org; Sun, 09 Feb 2014 18:18:49 -0500 Received: from mail-ea0-f176.google.com ([209.85.215.176]:36768) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WCdeB-0007af-De for 16481@debbugs.gnu.org; Sun, 09 Feb 2014 18:18:47 -0500 Received: by mail-ea0-f176.google.com with SMTP id h14so2633399eaj.35 for <16481@debbugs.gnu.org>; Sun, 09 Feb 2014 15:18:46 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:message-id:date:from:user-agent:mime-version:to:subject :references:in-reply-to:content-type:content-transfer-encoding; bh=JVPIFjTZp7vqPY6dEVdV0MFWCVhbtzjZp6C/vFtmaxc=; b=RmnjYuNr/9U+zK3IT1+vAsrZAq7tHAeFb/V4e2FvMiqAXIN61EdJ/5geOExLG+5TIq TyJydMNoSS0hc/xCZLSVLRLltCZLu8wRYTOOv6/fMCHohAqVoce7nJk8qsBbdRuvBwbY UGit7Cjw+RlPQJCp02B8QtSPRSgv3R6wSG5pCR5Jaa0dy5DEd1awDLOHq88snnJO/v0e 7SlRdGFxFLIEzmQxUDYoiVgaaete/s6erjnVEikgLDco848a/LYYT7HAJPqVdnQLqIEi sNPUvQ7mDItYXUurvaDS9wBaBO9GDcZbszHIXXlxj5R3ESJoV8HVyuE2S1iVCtIJidrF w4Yw== X-Received: by 10.14.194.2 with SMTP id l2mr32065957een.39.1391987926580; Sun, 09 Feb 2014 15:18:46 -0800 (PST) Received: from yakj.usersys.redhat.com ([212.96.178.162]) by mx.google.com with ESMTPSA id q44sm12228918eez.1.2014.02.09.15.18.44 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Sun, 09 Feb 2014 15:18:45 -0800 (PST) Message-ID: <52F80CD0.8070607@gnu.org> Date: Mon, 10 Feb 2014 00:18:40 +0100 From: Paolo Bonzini User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.2.0 MIME-Version: 1.0 To: Paul Eggert , Aharon Robbins , 16481@debbugs.gnu.org Subject: Re: bug#16481: dfa.c and Rational Range Interpretation References: 
<201401171339.s0HDdmwB017938@skeeve.com> <52D9B211.7050908@cs.ucla.edu> In-Reply-To: <52D9B211.7050908@cs.ucla.edu> X-Enigmail-Version: 1.6 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: 0.6 (/) X-Debbugs-Envelope-To: 16481 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 0.6 (/)

On 17/01/2014 23:43, Paul Eggert wrote:
>> > I do think that gawk's code is the correct thing to be doing for RRI.
> I agree, and installed the second patch enclosed below to
> implement this. This patch also includes some documentation
> changes -- if you have a bit of time to review them I'd
> appreciate it.

Please revert commit 1078b64302bbf5c0a46635772808ff7f75171dbc.

The correct course of action for grep is to defer range interpretation to regex, because otherwise you can get mismatches between regexes with backreferences and those without. For example, [A-Z]. will use RRI but ([A-Z])\1 won't, with the confusing result that the first regex won't match a superset of the language described by the second regex.

For this reason, if you want to have RRI, then you need to make sure that you compile --with-included-regex.
Paolo From debbugs-submit-bounces@debbugs.gnu.org Sun Feb 09 21:35:54 2014 Received: (at 16481) by debbugs.gnu.org; 10 Feb 2014 02:35:54 +0000 Received: from localhost ([127.0.0.1]:33004 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WCgiv-0002en-Ij for submit@debbugs.gnu.org; Sun, 09 Feb 2014 21:35:53 -0500 Received: from smtp.cs.ucla.edu ([131.179.128.62]:49353) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WCgiu-0002ef-0G for 16481@debbugs.gnu.org; Sun, 09 Feb 2014 21:35:52 -0500 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id 8A7DA39E8016; Sun, 9 Feb 2014 18:35:51 -0800 (PST) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id eFFCLCvu6hqS; Sun, 9 Feb 2014 18:35:51 -0800 (PST) Received: from [192.168.1.9] (pool-108-0-233-62.lsanca.fios.verizon.net [108.0.233.62]) by smtp.cs.ucla.edu (Postfix) with ESMTPSA id 1650039E8014; Sun, 9 Feb 2014 18:35:51 -0800 (PST) Message-ID: <52F83B06.6010209@cs.ucla.edu> Date: Sun, 09 Feb 2014 18:35:50 -0800 From: Paul Eggert Organization: UCLA Computer Science Department User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.2.0 MIME-Version: 1.0 To: Paolo Bonzini , Aharon Robbins , 16481@debbugs.gnu.org Subject: Re: bug#16481: dfa.c and Rational Range Interpretation References: <201401171339.s0HDdmwB017938@skeeve.com> <52D9B211.7050908@cs.ucla.edu> <52F80CD0.8070607@gnu.org> In-Reply-To: <52F80CD0.8070607@gnu.org> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: -2.9 (--) X-Debbugs-Envelope-To: 16481 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org 
Sender: "Debbugs-submit" X-Spam-Score: -2.9 (--) Paolo Bonzini wrote: > The correct course of action for grep is to defer range interpretation > to regex, because otherwise you can get mismatches between regexes with > backreferences and those without. It depends on what one means by "correct". POSIX doesn't say what to do in this situation, so it's OK as far as POSIX is concerned for grep to use RRI in the typical case (i.e., without backreferences), and for grep to use some other interpretation in the rare cases when backreferences are used. The documentation for 'grep' attempts to address this issue, perhaps not as clearly as it could. Maybe the installation instructions should talk about it as well, and suggest --with-included-regex for people who care about this sort of thing. From debbugs-submit-bounces@debbugs.gnu.org Sun Feb 09 22:14:04 2014 Received: (at 16481) by debbugs.gnu.org; 10 Feb 2014 03:14:04 +0000 Received: from localhost ([127.0.0.1]:33103 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WChJr-0004tr-UL for submit@debbugs.gnu.org; Sun, 09 Feb 2014 22:14:04 -0500 Received: from mail-pd0-f182.google.com ([209.85.192.182]:46101) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WChJq-0004tL-01 for 16481@debbugs.gnu.org; Sun, 09 Feb 2014 22:14:02 -0500 Received: by mail-pd0-f182.google.com with SMTP id v10so5580129pde.27 for <16481@debbugs.gnu.org>; Sun, 09 Feb 2014 19:14:01 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:from:date:message-id :subject:to:cc:content-type; bh=HeI/4R7Z2lrpxe12j7+xvg1AZ4fcl5RXb/X1HylH+ms=; b=IHsb7sBnmOQQJQpbjRK6C8kucwAZeucKDeQJISzIGk4lb+UaEH66oHvdf5ruai7E41 DwcerQYlzk9DuDczjykjSetlw+QQxt0BgnzQI4tZRnktx4jxJg/l7biXUrnUwniEilWi b5RmKNNLz4JKI80IRDq8EgoJj9jI7KUxekARhabPJ4M3ROIKJf5nSE70tOiO2R6/VgqI 0iWhR6BECwIMk1Fni63q0wgr5s19sa2D8iqgH/vq1ndkqUcr36PZDdfvCkmUw4SqctNl 
lCg8hE9M9BCfvlh8z8A1IQwokOA5BVc/Z520SHa7M3EKtQ1WcSw1F4vqnqRlfvn5te7F CuRQ== X-Received: by 10.67.5.233 with SMTP id cp9mr5150117pad.147.1392002040988; Sun, 09 Feb 2014 19:14:00 -0800 (PST) MIME-Version: 1.0 Received: by 10.68.201.231 with HTTP; Sun, 9 Feb 2014 19:13:40 -0800 (PST) In-Reply-To: <52F83B06.6010209@cs.ucla.edu> References: <201401171339.s0HDdmwB017938@skeeve.com> <52D9B211.7050908@cs.ucla.edu> <52F80CD0.8070607@gnu.org> <52F83B06.6010209@cs.ucla.edu> From: Jim Meyering Date: Sun, 9 Feb 2014 19:13:40 -0800 X-Google-Sender-Auth: O7afI7iejjRwWJXbt3cLSlTSoDo Message-ID: Subject: Re: bug#16481: dfa.c and Rational Range Interpretation To: Paul Eggert Content-Type: text/plain; charset=ISO-8859-1 X-Spam-Score: 0.0 (/) X-Debbugs-Envelope-To: 16481 Cc: Paolo Bonzini , Aharon Robbins , 16481@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 0.0 (/) On Sun, Feb 9, 2014 at 6:35 PM, Paul Eggert wrote: > Paolo Bonzini wrote: >> >> The correct course of action for grep is to defer range interpretation >> to regex, because otherwise you can get mismatches between regexes with >> backreferences and those without. > > > It depends on what one means by "correct". POSIX doesn't say what to do in > this situation, so it's OK as far as POSIX is concerned for grep to use RRI > in the typical case (i.e., without backreferences), and for grep to use some > other interpretation in the rare cases when backreferences are used. > > The documentation for 'grep' attempts to address this issue, perhaps not as > clearly as it could. Maybe the installation instructions should talk about > it as well, and suggest --with-included-regex for people who care about this > sort of thing. Has anyone looked at making glibc's regex use RRI? 
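
For readers following the thread, the two interpretations being contrasted can be sketched roughly as follows. This is an illustration only, not code from dfa.c, gawk, or glibc: in_range_rri() and in_range_collate() are hypothetical helpers. The first treats [lo-hi] as a range of character codes, which is what Rational Range Interpretation means in this discussion; the second uses strcoll(), as one simple stand-in for a locale-dependent interpretation (a real regex library may consult collation data differently).

```c
#include <stdbool.h>
#include <string.h>

/* Rational Range Interpretation: membership by numeric character code.  */
static bool
in_range_rri (unsigned char lo, unsigned char hi, unsigned char c)
{
  return lo <= c && c <= hi;
}

/* Locale-dependent stand-in: membership by the current locale's
   collation order, as reported by strcoll() on one-character strings.
   This is an assumption for illustration, not glibc's algorithm.  */
static bool
in_range_collate (char lo, char hi, char c)
{
  char l[2] = { lo, '\0' };
  char h[2] = { hi, '\0' };
  char s[2] = { c, '\0' };

  return strcoll (l, s) <= 0 && strcoll (s, h) <= 0;
}
```

Under RRI the bracket expression from the original report, [\[-\]], contains the backslash because its code (92) lies between those of '[' (91) and ']' (93). In the default "C" locale the strcoll()-based helper agrees; after setlocale (LC_COLLATE, "") in a locale whose collation order differs from code order, the two helpers can disagree, which is the kind of mismatch between the DFA matcher and the regex matcher raised above.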
From debbugs-submit-bounces@debbugs.gnu.org Mon Feb 10 03:11:35 2014 Received: (at 16481) by debbugs.gnu.org; 10 Feb 2014 08:11:35 +0000 Received: from localhost ([127.0.0.1]:33973 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WClxl-000803-U4 for submit@debbugs.gnu.org; Mon, 10 Feb 2014 03:11:34 -0500 Received: from mail-ee0-f43.google.com ([74.125.83.43]:33381) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WClxh-0007zm-4r for 16481@debbugs.gnu.org; Mon, 10 Feb 2014 03:11:30 -0500 Received: by mail-ee0-f43.google.com with SMTP id c41so2761105eek.30 for <16481@debbugs.gnu.org>; Mon, 10 Feb 2014 00:11:28 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:message-id:date:from:user-agent:mime-version:to:subject :references:in-reply-to:content-type:content-transfer-encoding; bh=DRFFp9OafYwZSaGcwEg87wXrvlV1PtRNxG3lA4EjBaY=; b=yBoJ4MJnJoPql5Lqr/bJ42elxsomug+2fzPevKrKq5zA6KH1RQZJI9VTy7CIi3MLL8 wEg34Of8ykTLACGod+16Xif9ALhnySp27RFXdVTQ7dahFdbgQiBh72PJJWuyPKYDpHlV Wm+2N91oHw3G+wKd1XVSU4kFVdWlQ/T9NpdHq7w0mQGKrSyFJAK09jkb3KPLEqDyq6/L ZOPmn5WcKHlAcG9f2Ysma4+7k+DNJosZroQ61nq2C4AhfdlAsYmdG+Hpkvwjdj+fNPe/ 3nBV0elKVleVqyHQWK6Jau19YCw5ukN6cRYKwDlvqSN+dT17So+1v9pHWsLw54ARZQN+ hp1w== X-Received: by 10.15.51.196 with SMTP id n44mr34753880eew.27.1392019888516; Mon, 10 Feb 2014 00:11:28 -0800 (PST) Received: from yakj.usersys.redhat.com (nat-pool-brq-u.redhat.com. 
[209.132.186.35]) by mx.google.com with ESMTPSA id 8sm51566608eeq.15.2014.02.10.00.11.25 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Mon, 10 Feb 2014 00:11:27 -0800 (PST) Message-ID: <52F889A9.7040703@gnu.org> Date: Mon, 10 Feb 2014 09:11:21 +0100 From: Paolo Bonzini User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.2.0 MIME-Version: 1.0 To: Paul Eggert , Aharon Robbins , 16481@debbugs.gnu.org Subject: Re: bug#16481: dfa.c and Rational Range Interpretation References: <201401171339.s0HDdmwB017938@skeeve.com> <52D9B211.7050908@cs.ucla.edu> <52F80CD0.8070607@gnu.org> <52F83B06.6010209@cs.ucla.edu> In-Reply-To: <52F83B06.6010209@cs.ucla.edu> X-Enigmail-Version: 1.6 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: -0.7 (/) X-Debbugs-Envelope-To: 16481 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.7 (/)

On 10/02/2014 03:35, Paul Eggert wrote:
> Paolo Bonzini wrote:
>> The correct course of action for grep is to defer range interpretation
>> to regex, because otherwise you can get mismatches between regexes with
>> backreferences and those without.
>
> It depends on what one means by "correct". POSIX doesn't say what to do
> in this situation, so it's OK as far as POSIX is concerned for grep to
> use RRI in the typical case (i.e., without backreferences), and for grep
> to use some other interpretation in the rare cases when backreferences
> are used.
>
> The documentation for 'grep' attempts to address this issue, perhaps not
> as clearly as it could. Maybe the installation instructions should talk
> about it as well, and suggest --with-included-regex for people who care
> about this sort of thing.

Yeah, that makes sense. I will revert the commit.
Paolo

From debbugs-submit-bounces@debbugs.gnu.org Mon Feb 10 04:00:40 2014 Received: (at 16481) by debbugs.gnu.org; 10 Feb 2014 09:00:41 +0000 Received: from localhost ([127.0.0.1]:39086 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WCmjG-0002dq-ED for submit@debbugs.gnu.org; Mon, 10 Feb 2014 04:00:39 -0500 Received: from frenzy.freefriends.org ([66.54.153.139]:59257 helo=freefriends.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WCmjE-0002di-2E for 16481@debbugs.gnu.org; Mon, 10 Feb 2014 04:00:37 -0500 X-Envelope-From: arnold@skeeve.com Received: from freefriends.org (localhost [127.0.0.1]) by freefriends.org (8.14.8/8.14.8) with ESMTP id s1A907Yq023459; Mon, 10 Feb 2014 02:00:07 -0700 Received: (from arnold@localhost) by freefriends.org (8.14.8/8.14.8/submit) id s1A907aJ023458; Mon, 10 Feb 2014 09:00:07 GMT From: arnold@skeeve.com Message-Id: <201402100900.s1A907aJ023458@freefriends.org> X-Authentication-Warning: frenzy.freefriends.org: arnold set sender to arnold@skeeve.com using -f Date: Mon, 10 Feb 2014 02:00:07 -0700 To: eggert@cs.ucla.edu, bonzini@gnu.org, arnold@skeeve.com, 16481@debbugs.gnu.org Subject: Re: bug#16481: dfa.c and Rational Range Interpretation References: <201401171339.s0HDdmwB017938@skeeve.com> <52D9B211.7050908@cs.ucla.edu> <52F80CD0.8070607@gnu.org> <52F83B06.6010209@cs.ucla.edu> <52F889A9.7040703@gnu.org> In-Reply-To: <52F889A9.7040703@gnu.org> User-Agent: Heirloom mailx 12.4 7/29/08 MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-Spam-Score: 0.0 (/) X-Debbugs-Envelope-To: 16481 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 0.0 (/)

Paolo Bonzini wrote:

> On 10/02/2014 03:35, Paul Eggert wrote:
> > Paolo Bonzini wrote:
> >> The correct course of action for grep is to defer range interpretation
> >> to regex, because otherwise you can get mismatches between regexes with
> >> backreferences and those without.
> >
> > It depends on what one means by "correct". POSIX doesn't say what to do
> > in this situation, so it's OK as far as POSIX is concerned for grep to
> > use RRI in the typical case (i.e., without backreferences), and for grep
> > to use some other interpretation in the rare cases when backreferences
> > are used.
> >
> > The documentation for 'grep' attempts to address this issue, perhaps not
> > as clearly as it could. Maybe the installation instructions should talk
> > about it as well, and suggest --with-included-regex for people who care
> > about this sort of thing.
>
> Yeah, that makes sense. I will revert the commit.

I think this is the wrong course of action. Paul suggested updating the doc to be more clear, not reverting the code.

Personally, I think grep should always use the included regex so that the behavior is consistent across all platforms everywhere; this is why gawk always uses its own regex.

If the only way to use collating sequences and equivalence classes is with GLIBC, then I think it'd be better to pull the __LIBC bits out into the standalone regex somehow.

In response to another question: Making GLIBC's regex support RRI isn't hard - getting the GLIBC maintainers to accept the patch is. :-(

My two cents: Jim & Paul will have to decide.
Thanks, Arnold From debbugs-submit-bounces@debbugs.gnu.org Mon Feb 10 04:18:28 2014 Received: (at 16481) by debbugs.gnu.org; 10 Feb 2014 09:18:28 +0000 Received: from localhost ([127.0.0.1]:39102 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WCn0V-00036Q-Ib for submit@debbugs.gnu.org; Mon, 10 Feb 2014 04:18:27 -0500 Received: from mail-qc0-f176.google.com ([209.85.216.176]:53627) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WCn0T-00036B-L5 for 16481@debbugs.gnu.org; Mon, 10 Feb 2014 04:18:26 -0500 Received: by mail-qc0-f176.google.com with SMTP id e16so10007229qcx.21 for <16481@debbugs.gnu.org>; Mon, 10 Feb 2014 01:18:20 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:message-id:date:from:user-agent:mime-version:to:subject :references:in-reply-to:content-type:content-transfer-encoding; bh=FFivM1wPPEAaNafjOmF2eHxU+9OhtHCz4UnWcSLBX64=; b=t5rdIp7v1Ct2IXKclQ79+Cs+Xq3Rtv90/hXvB2v33O6FBhhQhRh7wPt8yfrlCKfpUg DJ7Stlvz22VNK0hUFjb70GSjskZjXqOmVxrwknRSem2jMO5xRcYtcVyPATW4BwlFKHwU 5ygS7aPuJbH1HzK4QaIcj2Qcy92xdWUBS9+Yqhtkj3SCfVVdhpha+pLai6rCPfl6DMoI wA7DZdW3wJB7ia5ZNPAAsl7wmDgOtPABCtqm+6p9I8lfuL69sp+3q87DYB8wcFn7WI9g zOAWIC1pAmVuF1XXA4HAdB069WE9AZlcaWmZlz6K12tl5jk1HX614YcYROHxwrcoEZfA 3Nbg== X-Received: by 10.224.167.84 with SMTP id p20mr45985479qay.24.1392023900009; Mon, 10 Feb 2014 01:18:20 -0800 (PST) Received: from yakj.usersys.redhat.com (nat-pool-brq-u.redhat.com. 
[209.132.186.35]) by mx.google.com with ESMTPSA id p10sm8381291qag.8.2014.02.10.01.18.18 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Mon, 10 Feb 2014 01:18:19 -0800 (PST) Message-ID: <52F89958.2050003@gnu.org> Date: Mon, 10 Feb 2014 10:18:16 +0100 From: Paolo Bonzini User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.2.0 MIME-Version: 1.0 To: arnold@skeeve.com, eggert@cs.ucla.edu, 16481@debbugs.gnu.org Subject: Re: bug#16481: dfa.c and Rational Range Interpretation References: <201401171339.s0HDdmwB017938@skeeve.com> <52D9B211.7050908@cs.ucla.edu> <52F80CD0.8070607@gnu.org> <52F83B06.6010209@cs.ucla.edu> <52F889A9.7040703@gnu.org> <201402100900.s1A907aJ023458@freefriends.org> In-Reply-To: <201402100900.s1A907aJ023458@freefriends.org> X-Enigmail-Version: 1.6 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: -0.7 (/) X-Debbugs-Envelope-To: 16481 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.7 (/)

On 10/02/2014 10:00, arnold@skeeve.com wrote:
> > > The documentation for 'grep' attempts to address this issue, perhaps not
> > > as clearly as it could. Maybe the installation instructions should talk
> > > about it as well, and suggest --with-included-regex for people who care
> > > about this sort of thing.
> >
> > Yeah, that makes sense. I will revert the commit.
> I think this is the wrong course of action. Paul suggested updating the
> doc to be more clear, not reverting the code.

If you use --with-included-regex, the patch is a no-op. Thus it can be reverted.

> Personally, I think grep should always use the included regex so that
> then the behavior is consistent across all platforms everywhere; this
> is why gawk always uses its own regex.

I wouldn't be surprised if GNU distros patch gawk's regex away to get consistency with grep, sed, etc.

Paolo

From debbugs-submit-bounces@debbugs.gnu.org Mon Feb 10 05:54:21 2014 Received: (at 16481) by debbugs.gnu.org; 10 Feb 2014 10:54:21 +0000 Received: from localhost ([127.0.0.1]:39202 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WCoVI-0006qR-IM for submit@debbugs.gnu.org; Mon, 10 Feb 2014 05:54:20 -0500 Received: from frenzy.freefriends.org ([66.54.153.139]:59992 helo=freefriends.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WCoVG-0006qE-1V for 16481@debbugs.gnu.org; Mon, 10 Feb 2014 05:54:18 -0500 X-Envelope-From: arnold@skeeve.com Received: from freefriends.org (localhost [127.0.0.1]) by freefriends.org (8.14.8/8.14.8) with ESMTP id s1AArwET027421; Mon, 10 Feb 2014 03:53:58 -0700 Received: (from arnold@localhost) by freefriends.org (8.14.8/8.14.8/submit) id s1AArw5O027420; Mon, 10 Feb 2014 10:53:58 GMT From: arnold@skeeve.com Message-Id: <201402101053.s1AArw5O027420@freefriends.org> X-Authentication-Warning: frenzy.freefriends.org: arnold set sender to arnold@skeeve.com using -f Date: Mon, 10 Feb 2014 03:53:58 -0700 To: eggert@cs.ucla.edu, bonzini@gnu.org, arnold@skeeve.com, 16481@debbugs.gnu.org Subject: Re: bug#16481: dfa.c and Rational Range Interpretation References: <201401171339.s0HDdmwB017938@skeeve.com> <52D9B211.7050908@cs.ucla.edu> <52F80CD0.8070607@gnu.org> <52F83B06.6010209@cs.ucla.edu> <52F889A9.7040703@gnu.org> <201402100900.s1A907aJ023458@freefriends.org> <52F89958.2050003@gnu.org> In-Reply-To: <52F89958.2050003@gnu.org> User-Agent: Heirloom mailx 12.4 7/29/08 MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-Spam-Score: 0.0 (/) X-Debbugs-Envelope-To: 16481
X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 0.0 (/) > If you use --with-included-regex, the patch is a no-op. Thus it can be > reverted. Whether or not you use the included regex, there are still problems, if not outright bugs, in that code that I pointed out in my initial mail several weeks ago. > > Personally, I think grep should always use the included regex so that > > then the behavior is consistent across all platforms everywhere; this > > is why gawk always uses its own regex. > > I wouldn't be surprised if GNU distros patch gawk's regex away to get > consistency with grep, sed, etc. Their loss. To date I know of no distro that does this. And the world is bigger than just GNU systems. We've gone around on this before and we continue to disagree. I have nothing else to add to this discussion.
Arnold From debbugs-submit-bounces@debbugs.gnu.org Mon Feb 10 14:50:18 2014 Received: (at 16481) by debbugs.gnu.org; 10 Feb 2014 19:50:18 +0000 Received: from localhost ([127.0.0.1]:41816 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WCwry-0004JI-5A for submit@debbugs.gnu.org; Mon, 10 Feb 2014 14:50:18 -0500 Received: from smtp.cs.ucla.edu ([131.179.128.62]:36902) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WCwru-0004J3-MW for 16481@debbugs.gnu.org; Mon, 10 Feb 2014 14:50:15 -0500 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id 797BA39E8018; Mon, 10 Feb 2014 11:50:08 -0800 (PST) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id eNxyxkY0J0Ol; Mon, 10 Feb 2014 11:50:07 -0800 (PST) Received: from penguin.cs.ucla.edu (Penguin.CS.UCLA.EDU [131.179.64.200]) by smtp.cs.ucla.edu (Postfix) with ESMTPSA id D87EA39E8012; Mon, 10 Feb 2014 11:50:07 -0800 (PST) Message-ID: <52F92D6F.9060101@cs.ucla.edu> Date: Mon, 10 Feb 2014 11:50:07 -0800 From: Paul Eggert Organization: UCLA Computer Science Department User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.3.0 MIME-Version: 1.0 To: Paolo Bonzini , arnold@skeeve.com, 16481@debbugs.gnu.org Subject: Re: bug#16481: dfa.c and Rational Range Interpretation References: <201401171339.s0HDdmwB017938@skeeve.com> <52D9B211.7050908@cs.ucla.edu> <52F80CD0.8070607@gnu.org> <52F83B06.6010209@cs.ucla.edu> <52F889A9.7040703@gnu.org> <201402100900.s1A907aJ023458@freefriends.org> <52F89958.2050003@gnu.org> In-Reply-To: <52F89958.2050003@gnu.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: -2.9 (--) X-Debbugs-Envelope-To: 16481 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list 
List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -2.9 (--) On 02/10/2014 01:18 AM, Paolo Bonzini wrote: > > If you use --with-included-regex, the patch is a no-op. Are we talking about the patch in git commit 1078b64302bbf5c0a46635772808ff7f75171dbc ? If so, then the above comment doesn't sound right. Without the patch, the DFA matcher mishandles expressions in some cases, as described in Bug#16481. For example, "grep -Xawk '[\[-\]]'" will cause dfa.c to try to compile the regular expression [[-]], which won't work regardless of whether --with-included-regex is being used. More generally, we already had the problem of subtle differences between dfa.c and full-regexp matching on platforms that do not observe RRI, because dfa.c already uses RRI in multibyte locales, regardless of whether the full matcher uses RRI. The change causes non-"C" unibyte locales to behave consistently with multibyte locales, which in some sense is an improvement (though obviously not ideal; it'd be better if it was RRI everywhere). Non-"C" unibyte locales are dying out, so to some extent this is a minor issue. In practice most users these days won't notice or care about this change.
From debbugs-submit-bounces@debbugs.gnu.org Mon Feb 10 17:13:55 2014 Received: (at 16481) by debbugs.gnu.org; 10 Feb 2014 22:13:55 +0000 Received: from localhost ([127.0.0.1]:41991 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WCz6w-0001xX-Sp for submit@debbugs.gnu.org; Mon, 10 Feb 2014 17:13:55 -0500 Received: from mail-ee0-f42.google.com ([74.125.83.42]:63462) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WCz6u-0001xJ-UO for 16481@debbugs.gnu.org; Mon, 10 Feb 2014 17:13:53 -0500 Received: by mail-ee0-f42.google.com with SMTP id b15so3242618eek.1 for <16481@debbugs.gnu.org>; Mon, 10 Feb 2014 14:13:47 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:message-id:date:from:user-agent:mime-version:to:subject :references:in-reply-to:content-type:content-transfer-encoding; bh=Jeh3dMXLzGzL0ykV+l5QTpmamXrgv2RIFQkrOxgBVnw=; b=uPbFhDwrXxGhOPyYRrSfxe7Noba5/N5g5Mpg80qu0eN320om9/yiGeYKSXQHn2K648 GNGSVzm8rK7aJ1KZeQWUPtxkIc5BOVGbrqbpb3uI2k2ajnkwSd9zt9A/zpfTJNgIGJyv Zpj/bm6dn/MoMxz5oYDZUExRkzMQkCRYgDLec2AKZRgo02RMdIxcIIWG635ZRa4FXjH5 t/tPK/FHJ+2F8jrupAfRKbAHg1gFHPTVZPGQiE3uvGzf9bSz3gK8bOnShkKk7zcb602t LXc0YtCcgxgZIzd0bYWgUOrN/HI7Z6SDf0fVGIC2pSVPypUUbEt93VAfOBYWsbfMbR5Q KVMA== X-Received: by 10.15.45.194 with SMTP id b42mr3507143eew.103.1392070426926; Mon, 10 Feb 2014 14:13:46 -0800 (PST) Received: from yakj.usersys.redhat.com (gw1.globalcom.cz. 
[212.96.178.162]) by mx.google.com with ESMTPSA id k41sm59834238een.19.2014.02.10.14.13.44 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Mon, 10 Feb 2014 14:13:45 -0800 (PST) Message-ID: <52F94F16.8090703@gnu.org> Date: Mon, 10 Feb 2014 23:13:42 +0100 From: Paolo Bonzini User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.2.0 MIME-Version: 1.0 To: Paul Eggert , arnold@skeeve.com, 16481@debbugs.gnu.org Subject: Re: bug#16481: dfa.c and Rational Range Interpretation References: <201401171339.s0HDdmwB017938@skeeve.com> <52D9B211.7050908@cs.ucla.edu> <52F80CD0.8070607@gnu.org> <52F83B06.6010209@cs.ucla.edu> <52F889A9.7040703@gnu.org> <201402100900.s1A907aJ023458@freefriends.org> <52F89958.2050003@gnu.org> <52F92D6F.9060101@cs.ucla.edu> In-Reply-To: <52F92D6F.9060101@cs.ucla.edu> X-Enigmail-Version: 1.6 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: 0.6 (/) X-Debbugs-Envelope-To: 16481 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 0.6 (/) Il 10/02/2014 20:50, Paul Eggert ha scritto: > > If so, then the above comment doesn't sound right. Without the patch, > the DFA matcher mishandles expressions in some cases, as described in > Bug#16481. For example, "grep -Xawk '[\[-\]]'" will cause dfa.c to try > to compile the regular expression [[-]], which won't work regardless of > whether --with-included-regex is being used. Ok, so there is a real bug. But it is not immediately obvious what the problem is, and the bug has (AFAICS) no test case and no mention in the commit message. Without this, I am not sure that the fix should not be the one in this commit.
> More generally, we already had the problem of subtle differences between > dfa.c and full-regexp matching on platforms that do not observe RRI, > because dfa.c already uses RRI in multibyte locales, regardless of > whether the full matcher uses RRI. It only does so if the fallback to regex is not requested (dfaexec invoked with backref = NULL). This is never the case for grep. In fact, as far as I know it is never the case, and I've been tempted many times to completely remove the mostly dead code dealing with multibyte ranges if backref = NULL. > The change causes non-"C" unibyte > locales to behave consistently with multibyte locales, which in some > sense is an improvement (though obviously not ideal; it'd be better if > it was RRI everywhere). It would be if glibc were fixed. For me, consistency with other GNU utilities---especially sed---trumps anything else, and this was the main point in fixing multibyte matching in GNU grep 2.6 and newer. > Non-"C" unibyte locales are dying out, so to some extent this is a minor > issue. In practice most users these days won't notice or care about > this change. That's true. 
Paolo From debbugs-submit-bounces@debbugs.gnu.org Tue Feb 11 16:42:17 2014 Received: (at 16481) by debbugs.gnu.org; 11 Feb 2014 21:42:17 +0000 Received: from localhost ([127.0.0.1]:47910 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WDL5s-00025F-Il for submit@debbugs.gnu.org; Tue, 11 Feb 2014 16:42:17 -0500 Received: from smtp.cs.ucla.edu ([131.179.128.62]:55838) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WDL5m-00024u-Uc for 16481@debbugs.gnu.org; Tue, 11 Feb 2014 16:42:12 -0500 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id F1A4CA60005; Tue, 11 Feb 2014 13:42:04 -0800 (PST) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id nCt3blPOefB4; Tue, 11 Feb 2014 13:42:04 -0800 (PST) Received: from penguin.cs.ucla.edu (Penguin.CS.UCLA.EDU [131.179.64.200]) by smtp.cs.ucla.edu (Postfix) with ESMTPSA id 3C9E5A60003; Tue, 11 Feb 2014 13:42:04 -0800 (PST) Message-ID: <52FA992C.20804@cs.ucla.edu> Date: Tue, 11 Feb 2014 13:42:04 -0800 From: Paul Eggert Organization: UCLA Computer Science Department User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.3.0 MIME-Version: 1.0 To: Paolo Bonzini , arnold@skeeve.com, 16481@debbugs.gnu.org Subject: Re: bug#16481: dfa.c and Rational Range Interpretation References: <201401171339.s0HDdmwB017938@skeeve.com> <52D9B211.7050908@cs.ucla.edu> <52F80CD0.8070607@gnu.org> <52F83B06.6010209@cs.ucla.edu> <52F889A9.7040703@gnu.org> <201402100900.s1A907aJ023458@freefriends.org> <52F89958.2050003@gnu.org> <52F92D6F.9060101@cs.ucla.edu> <52F94F16.8090703@gnu.org> In-Reply-To: <52F94F16.8090703@gnu.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: -3.0 (---) X-Debbugs-Envelope-To: 16481 X-BeenThere: 
debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.0 (---) On 02/10/2014 02:13 PM, Paolo Bonzini wrote: > Ok, so there is a real bug. But it is not immediately obvious what > the problem is, and the bug has (AFAICS) no test case and no mention > in the commit message. Without this, I am not sure that the fix > should not be the one in this commit. You're right, it should have had a test case. I'll add this to my to-do list. > It only does so if the fallback to regex is not requested (dfaexec > invoked with backref = NULL). This is never the case for grep. In > fact, as far as I know it is never the case, and I've been tempted > many times to completely remove the mostly dead code dealing with > multibyte ranges if backref = NULL. Ouch, I wasn't aware of this. Clearly the patch I put in was wrong -- at least for the documentation that got put into NEWS. Perhaps you're right, and the best thing to do for now is to revert the patch while we can think about a better solution. This should be done soon, since Jim wants to do a grep release. Please let me think about it for a day or two. I would like to fix the bug, anyway, even if that patch wasn't the right way to do it. Longer term, it'd be better to simplify the code (perhaps along the lines that you suggested) as it's too full of gotchas now.
From debbugs-submit-bounces@debbugs.gnu.org Tue Feb 11 16:44:21 2014 Received: (at 16481) by debbugs.gnu.org; 11 Feb 2014 21:44:21 +0000 Received: from localhost ([127.0.0.1]:47914 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WDL7s-00028w-Ob for submit@debbugs.gnu.org; Tue, 11 Feb 2014 16:44:21 -0500 Received: from mail-ea0-f177.google.com ([209.85.215.177]:41304) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WDL7q-00028h-Tu for 16481@debbugs.gnu.org; Tue, 11 Feb 2014 16:44:19 -0500 Received: by mail-ea0-f177.google.com with SMTP id m10so1391052eaj.36 for <16481@debbugs.gnu.org>; Tue, 11 Feb 2014 13:44:12 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:message-id:date:from:user-agent:mime-version:to:subject :references:in-reply-to:content-type:content-transfer-encoding; bh=cSv2YbTpKHu8HDMPBs+Ns3q3cBN04qhCLLReBzWa+74=; b=Lc76+0AZydmKd64Gn46HvZoSGxlwzmcaYfMO9xCv42eF6eOl3KR5Z2vXsKz8ynojuU NKTQzhtKAjzXm6dTKH42HNJZAxS+q0VHxyTizerIGHqaGMcCWZhXCWkQu/0OM/yBkmZ2 +/nAIptmHiNnje7oOeCoccIxE01C/q3+8RSh67PJlvWnr/D4k9asmeHnFXwvjZMfiuTT nwwU6DUAZ0mGJyvHZpbF4FjIH9/9z+xDzEctuOmvNAcE7YXyHoewnStfTe8q/7k2vot9 nJ2dps90a6BPxMiLNYm7O11AfkCT5JN8vmJ7pjORkOPPmFZ++Ph9FqhwGhGNvjDcho5v mkAw== X-Received: by 10.14.211.71 with SMTP id v47mr47550335eeo.37.1392155052643; Tue, 11 Feb 2014 13:44:12 -0800 (PST) Received: from yakj.usersys.redhat.com (gw1.globalcom.cz. 
[212.96.178.162]) by mx.google.com with ESMTPSA id q44sm37339802eez.1.2014.02.11.13.44.10 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Tue, 11 Feb 2014 13:44:11 -0800 (PST) Message-ID: <52FA99A9.7020103@gnu.org> Date: Tue, 11 Feb 2014 22:44:09 +0100 From: Paolo Bonzini User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.2.0 MIME-Version: 1.0 To: Paul Eggert , arnold@skeeve.com, 16481@debbugs.gnu.org Subject: Re: bug#16481: dfa.c and Rational Range Interpretation References: <201401171339.s0HDdmwB017938@skeeve.com> <52D9B211.7050908@cs.ucla.edu> <52F80CD0.8070607@gnu.org> <52F83B06.6010209@cs.ucla.edu> <52F889A9.7040703@gnu.org> <201402100900.s1A907aJ023458@freefriends.org> <52F89958.2050003@gnu.org> <52F92D6F.9060101@cs.ucla.edu> <52F94F16.8090703@gnu.org> <52FA992C.20804@cs.ucla.edu> In-Reply-To: <52FA992C.20804@cs.ucla.edu> X-Enigmail-Version: 1.6 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: 0.6 (/) X-Debbugs-Envelope-To: 16481 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 0.6 (/) Il 11/02/2014 22:42, Paul Eggert ha scritto: > Ouch, I wasn't aware of this. Clearly the patch I put in was wrong -- > at least for the documentation that got put into NEWS. Yeah, sorry for not spelling it out entirely. I worked on grep in bursts, and as a result I tend to take too many things for granted. > Perhaps you're right, and the best thing to do for now is to revert the > patch while we can think about a better solution. This should be done > soon, since Jim wants to do a grep release. Please let me think about it > for a day or two. I would like to fix the bug, anyway, even if that > patch wasn't the right way to do it. 
Longer term, it'd be better to > simplify the code (perhaps along the lines that you suggested) as it's > too full of gotchas now. I 100% agree with this. If I don't hear from you I'll revert the patch next Friday. Paolo From debbugs-submit-bounces@debbugs.gnu.org Sun Feb 16 23:45:23 2014 Received: (at 16481) by debbugs.gnu.org; 17 Feb 2014 04:45:23 +0000 Received: from localhost ([127.0.0.1]:55521 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WFG54-0003vC-Tq for submit@debbugs.gnu.org; Sun, 16 Feb 2014 23:45:23 -0500 Received: from mail-pa0-f42.google.com ([209.85.220.42]:39568) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WFG51-0003uu-NN for 16481@debbugs.gnu.org; Sun, 16 Feb 2014 23:45:20 -0500 Received: by mail-pa0-f42.google.com with SMTP id kl14so14781108pab.29 for <16481@debbugs.gnu.org>; Sun, 16 Feb 2014 20:45:13 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:from:date:message-id :subject:to:cc:content-type; bh=0Rw9LFXIVDdGrgyJkibG+cO5SKyaTyNFmQclj2o7tOU=; b=bCrw+tg0HzKGHV9MYgbBBmH2LHeOrGvOCHqvlUvNcT7B7swfNk7K5tm3rmVxoexv1b +nxVQeQBlNz9WnotwouyZ6nSj6jLwj/2IDdbAeJBso+jYm7iqvlJk+L2UplB+qYL1MCd 421XmETfZnz9bF3zGLrJEAyHIYIgX9bEeq48VzhOEClwDwQQJkPLkxCFSR++aRZRn3o9 miDyyGngrTDPojkaClGUcPZtuzkq7nIE80s9F89kr2LeAXUnUuE0PaRaJyVHsdAJQZGB HN7AEMojm0iBsfsHGOW6fGygXXsHKXraKBUeDhqMnOfI6SaCDndM/8szahkZ2jmKn+PH V6Bw== X-Received: by 10.66.240.4 with SMTP id vw4mr24120910pac.26.1392612313879; Sun, 16 Feb 2014 20:45:13 -0800 (PST) MIME-Version: 1.0 Received: by 10.68.201.231 with HTTP; Sun, 16 Feb 2014 20:44:53 -0800 (PST) In-Reply-To: <52FA99A9.7020103@gnu.org> References: <201401171339.s0HDdmwB017938@skeeve.com> <52D9B211.7050908@cs.ucla.edu> <52F80CD0.8070607@gnu.org> <52F83B06.6010209@cs.ucla.edu> <52F889A9.7040703@gnu.org> <201402100900.s1A907aJ023458@freefriends.org> <52F89958.2050003@gnu.org> <52F92D6F.9060101@cs.ucla.edu> 
<52F94F16.8090703@gnu.org> <52FA992C.20804@cs.ucla.edu> <52FA99A9.7020103@gnu.org> From: Jim Meyering Date: Sun, 16 Feb 2014 20:44:53 -0800 X-Google-Sender-Auth: fDG74nCbEbeGXPq2lkCB0Gjbjl4 Message-ID: Subject: Re: bug#16481: dfa.c and Rational Range Interpretation To: Paolo Bonzini Content-Type: text/plain; charset=ISO-8859-1 X-Spam-Score: -0.7 (/) X-Debbugs-Envelope-To: 16481 Cc: Paul Eggert , Aharon Robbins , 16481@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.7 (/) On Tue, Feb 11, 2014 at 1:44 PM, Paolo Bonzini wrote: > Il 11/02/2014 22:42, Paul Eggert ha scritto: > >> Ouch, I wasn't aware of this. Clearly the patch I put in was wrong -- >> at least for the documentation that got put into NEWS. > > > Yeah, sorry for not spelling it out entirely. I worked on grep in bursts, > and as a result I tend to take too many things for granted. > > >> Perhaps you're right, and the best thing to do for now is to revert the >> patch while we can think about a better solution. This should be done >> soon, since Jim wants to do a grep release. Please let me think about it >> for a day or two. I would like to fix the bug, anyway, even if that >> patch wasn't the right way to do it. Longer term, it'd be better to >> simplify the code (perhaps along the lines that you suggested) as it's >> too full of gotchas now. > > > I 100% agree with this. If I don't hear from you I'll revert the patch next > Friday. Hi guys, I confess that I do not feel strongly about this corner case, but do want to make a release very soon. Paolo, Paul, where do you stand? I would like to make the release by Monday evening. 
From debbugs-submit-bounces@debbugs.gnu.org Mon Feb 17 02:30:07 2014 Received: (at 16481) by debbugs.gnu.org; 17 Feb 2014 07:30:07 +0000 Received: from localhost ([127.0.0.1]:55689 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WFIeS-0000CH-Fr for submit@debbugs.gnu.org; Mon, 17 Feb 2014 02:30:06 -0500 Received: from mail-qc0-f175.google.com ([209.85.216.175]:38603) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WFIeO-0000B7-B7 for 16481@debbugs.gnu.org; Mon, 17 Feb 2014 02:30:01 -0500 Received: by mail-qc0-f175.google.com with SMTP id x13so23454172qcv.34 for <16481@debbugs.gnu.org>; Sun, 16 Feb 2014 23:29:54 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:message-id:date:from:user-agent:mime-version:to:cc:subject :references:in-reply-to:content-type:content-transfer-encoding; bh=FfQvFhCBGjqjMugCuHVRZgGsjIMmR2oeWayMok7zSVk=; b=H5vnCoRxO+VbxsAg1kTKcuKqPhZ1Yj0v/qKrVLHM6aWtpTFn3p2JLpeeQaseQPD8lS Ou5g5X7rg73704xn8p5IBsUwtEIq6FQSODAKKrT2NEZxAXR+5Nht4MdBRRyjLQMSBeSB wV7PgWCs4sceQTJfpmgE1F1SoeMmM6Zr3a7WkxZHbdSKTQS7PFEAN9qds/RUGNeaccE7 4C32tyPjmUi84kYutPxoBT1axO34uFOdwWetRfouHzIM1RGaPw6+Y45Z4ySWIPMY8oG9 cZb9wueKprvBBnG1GFJsQvgLkBD30UDcLAsLTXeB3I+NtjQW/9bhGrjJ0r1bGJuIisOp A52A== X-Received: by 10.224.12.20 with SMTP id v20mr32243965qav.47.1392622194691; Sun, 16 Feb 2014 23:29:54 -0800 (PST) Received: from yakj.usersys.redhat.com (net-37-117-154-249.cust.vodafonedsl.it. 
[37.117.154.249]) by mx.google.com with ESMTPSA id u20sm20984214qge.2.2014.02.16.23.29.52 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Sun, 16 Feb 2014 23:29:54 -0800 (PST) Message-ID: <5301BA6B.6040607@gnu.org> Date: Mon, 17 Feb 2014 08:29:47 +0100 From: Paolo Bonzini User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.2.0 MIME-Version: 1.0 To: Jim Meyering Subject: Re: bug#16481: dfa.c and Rational Range Interpretation References: <201401171339.s0HDdmwB017938@skeeve.com> <52D9B211.7050908@cs.ucla.edu> <52F80CD0.8070607@gnu.org> <52F83B06.6010209@cs.ucla.edu> <52F889A9.7040703@gnu.org> <201402100900.s1A907aJ023458@freefriends.org> <52F89958.2050003@gnu.org> <52F92D6F.9060101@cs.ucla.edu> <52F94F16.8090703@gnu.org> <52FA992C.20804@cs.ucla.edu> <52FA99A9.7020103@gnu.org> In-Reply-To: X-Enigmail-Version: 1.6 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: -0.7 (/) X-Debbugs-Envelope-To: 16481 Cc: Paul Eggert , Aharon Robbins , 16481@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.7 (/) Il 17/02/2014 05:44, Jim Meyering ha scritto: >> > I 100% agree with this. If I don't hear from you I'll revert the patch next >> > Friday. > Hi guys, > > I confess that I do not feel strongly about this corner case, but > do want to make a release very soon. Paolo, Paul, where do you stand? > > I would like to make the release by Monday evening. I'll revert the patch today. 
Paolo From debbugs-submit-bounces@debbugs.gnu.org Sun Mar 09 16:07:18 2014 Received: (at 16481-done) by debbugs.gnu.org; 9 Mar 2014 20:07:18 +0000 Received: from localhost ([127.0.0.1]:58392 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WMk0E-00052k-55 for submit@debbugs.gnu.org; Sun, 09 Mar 2014 16:07:18 -0400 Received: from smtp.cs.ucla.edu ([131.179.128.62]:55136) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WMk0A-00052W-Gl for 16481-done@debbugs.gnu.org; Sun, 09 Mar 2014 16:07:15 -0400 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id 8B81439E8012 for <16481-done@debbugs.gnu.org>; Sun, 9 Mar 2014 13:07:13 -0700 (PDT) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id t8T2mTKuMw-H for <16481-done@debbugs.gnu.org>; Sun, 9 Mar 2014 13:07:13 -0700 (PDT) Received: from [192.168.1.9] (pool-108-0-233-62.lsanca.fios.verizon.net [108.0.233.62]) by smtp.cs.ucla.edu (Postfix) with ESMTPSA id 42EEB39E8008 for <16481-done@debbugs.gnu.org>; Sun, 9 Mar 2014 13:07:13 -0700 (PDT) Message-ID: <531CC9F0.3010808@cs.ucla.edu> Date: Sun, 09 Mar 2014 13:07:12 -0700 From: Paul Eggert Organization: UCLA Computer Science Department User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.3.0 MIME-Version: 1.0 To: 16481-done@debbugs.gnu.org Subject: Re: dfa.c and Rational Range Interpretation Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 16481-done X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -2.3 (--) It seems that the issues in 
this bug report are all done in the savannah git master for grep, so I'm marking this as done. At some point I'd like to change the regex code to support RRI, at which point dfa.c should automatically adapt without our having to change dfa.c further. But that would be a matter for a gnulib and/or glibc bug report, not dfa and/or grep. From unknown Thu Aug 14 18:37:59 2025 Received: (at fakecontrol) by fakecontrolmessage; To: internal_control@debbugs.gnu.org From: Debbugs Internal Request Subject: Internal Control Message-Id: bug archived. Date: Mon, 07 Apr 2014 11:24:05 +0000 User-Agent: Fakemail v42.6.9 # This is a fake control message. # # The action: # bug archived. thanks # This fakemail brought to you by your local debbugs # administrator