From unknown Tue Jun 24 22:35:36 2025 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-Mailer: MIME-tools 5.509 (Entity 5.509) Content-Type: text/plain; charset=utf-8 From: bug#20657 <20657@debbugs.gnu.org> To: bug#20657 <20657@debbugs.gnu.org> Subject: Status: Traditional range expression not accepted in regex/dfa Reply-To: bug#20657 <20657@debbugs.gnu.org> Date: Wed, 25 Jun 2025 05:35:36 +0000 retitle 20657 Traditional range expression not accepted in regex/dfa reassign 20657 grep submitter 20657 arnold@skeeve.com severity 20657 wishlist thanks From debbugs-submit-bounces@debbugs.gnu.org Mon May 25 22:42:39 2015 Received: (at submit) by debbugs.gnu.org; 26 May 2015 02:42:39 +0000 Received: from localhost ([127.0.0.1]:56063 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1Yx4pD-0004VI-0t for submit@debbugs.gnu.org; Mon, 25 May 2015 22:42:39 -0400 Received: from eggs.gnu.org ([208.118.235.92]:52214) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1Yx4pA-0004V4-Rl for submit@debbugs.gnu.org; Mon, 25 May 2015 22:42:37 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1Yx4p4-0002lc-LN for submit@debbugs.gnu.org; Mon, 25 May 2015 22:42:31 -0400 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=0.8 required=5.0 tests=BAYES_50 autolearn=disabled version=3.3.2 Received: from lists.gnu.org ([2001:4830:134:3::11]:58845) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Yx4p4-0002lY-IB for submit@debbugs.gnu.org; Mon, 25 May 2015 22:42:30 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:37482) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Yx4p3-0001bU-LW for bug-grep@gnu.org; Mon, 25 May 2015 22:42:30 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1Yx4oz-0002kz-Lh for bug-grep@gnu.org; Mon, 
25 May 2015 22:42:29 -0400 Received: from [96.88.95.60] (port=57223 helo=freefriends.org) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Yx4oz-0002kk-Et for bug-grep@gnu.org; Mon, 25 May 2015 22:42:25 -0400 X-Envelope-From: arnold@skeeve.com X-Envelope-To: Received: from freefriends.org (localhost [127.0.0.1]) by freefriends.org (8.14.9/8.14.9) with ESMTP id t4Q2gKUt007025 for ; Mon, 25 May 2015 20:42:21 -0600 Received: (from arnold@localhost) by freefriends.org (8.14.9/8.14.9/submit) id t4Q2gKwH007024 for bug-grep@gnu.org; Tue, 26 May 2015 02:42:20 GMT From: arnold@skeeve.com Message-Id: <201505260242.t4Q2gKwH007024@freefriends.org> X-Authentication-Warning: frenzy.freefriends.org: arnold set sender to arnold@skeeve.com using -f Date: Tue, 26 May 2015 05:42:19 +0300 To: bug-grep@gnu.org Subject: Traditional range expression not accepted in regex/dfa User-Agent: Heirloom mailx 12.5 6/20/10 MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-detected-operating-system: by eggs.gnu.org: GNU/Linux 3.x X-detected-operating-system: by eggs.gnu.org: Error: Malformed IPv6 address (bad octet value). X-Received-From: 2001:4830:134:3::11 X-Spam-Score: -5.0 (-----) X-Debbugs-Envelope-To: submit X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -5.0 (-----) Hi. I received a bug report for gawk by private email that a regexp of this form: '[^0-9---]' wasn't accepted. The bugaboo here is the "---"; it's a range expression consisting of minus through minus, and apparently long ago was how one got a minus into a bracket expression. This can be seen in current grep also: $ ./src/grep --version ./src/grep (GNU grep) 2.21 Copyright (C) 2014 Free Software Foundation, Inc. ... 
$ ./src/grep '[^0-9---]' /dev/null ./src/grep: Invalid range end The underlying regex and, I believe, dfa routines don't accept this. Fixing either of them is beyond my skill range, so I thought I'd pass this one upstream to you folks. Thanks! Arnold From debbugs-submit-bounces@debbugs.gnu.org Tue May 26 02:53:43 2015 Received: (at 20657) by debbugs.gnu.org; 26 May 2015 06:53:43 +0000 Received: from localhost ([127.0.0.1]:56179 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1Yx8kA-0004wu-N6 for submit@debbugs.gnu.org; Tue, 26 May 2015 02:53:43 -0400 Received: from smtp.cs.ucla.edu ([131.179.128.62]:54271) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1Yx8k7-0004wc-DI for 20657@debbugs.gnu.org; Tue, 26 May 2015 02:53:40 -0400 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id 07DF6A60004; Mon, 25 May 2015 23:53:33 -0700 (PDT) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id oGFJCdhYL-x6; Mon, 25 May 2015 23:53:32 -0700 (PDT) Received: from [192.168.1.9] (pool-100-32-155-148.lsanca.fios.verizon.net [100.32.155.148]) by smtp.cs.ucla.edu (Postfix) with ESMTPSA id EE01CA60003; Mon, 25 May 2015 23:53:31 -0700 (PDT) Message-ID: <5564186B.90208@cs.ucla.edu> Date: Mon, 25 May 2015 23:53:31 -0700 From: Paul Eggert Organization: UCLA Computer Science Department User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0 MIME-Version: 1.0 To: arnold@skeeve.com, 20657@debbugs.gnu.org Subject: Re: bug#20657: Traditional range expression not accepted in regex/dfa References: <201505260242.t4Q2gKwH007024@freefriends.org> In-Reply-To: <201505260242.t4Q2gKwH007024@freefriends.org> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 20657 
X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -2.3 (--) arnold@skeeve.com wrote: > The bugaboo here is the "---"; it's > a range expression consisting of minus through minus, and apparently long > ago was how one got a minus into a bracket expression. Actually, long ago expressions like '[^0-9-]' worked just as they do now, and it wasn't ever necessary to use trailing "---". That being said, it is true that in 7th Edition Unix '[^0-9---]' meant the same thing as '[^0-9-]', so in that sense we have an incompatibility with 7th Edition Unix here. > $ ./src/grep '[^0-9---]' /dev/null > ./src/grep: Invalid range end > > The underlying regex and, I believe, dfa routines don't accept this. Yes, that's correct. It's not a bug, though, as the regexp is ambiguous and does not conform to POSIX, which says the following about RE bracket expressions: "To use a <hyphen-minus> as the starting range point, it shall either come first in the bracket expression or be specified as a collating symbol; for example, "[][.-.]-0]", which matches either a <right-square-bracket> or any character or collating element that collates between <hyphen-minus> and 0, inclusive." In your correspondent's example, the hyphen is a starting range point but is neither first in the bracket expression nor is specified as a collating symbol, so the regexp doesn't conform to POSIX. Even though it's not a bug I suppose it wouldn't hurt to make the GNU matchers compatible with 7th Edition Unix here, if someone really wants to take that task on; it's not urgent, though.
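For anyone who wants to probe their own matcher, here is a minimal sketch using the standard POSIX regcomp/regexec interface. The helper names bracket_compile_status and bracket_matches are made up for this illustration and are not part of grep, gawk, or Gnulib. The conforming spelling '[^0-9-]' should compile everywhere; what '[^0-9---]' does depends on which regex implementation and version is linked in, so the sketch only reports that status and assumes nothing about it.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: compile PATTERN as a POSIX basic regular
   expression and return regcomp's status, which is 0 on success and
   a REG_* error code otherwise.  */
int
bracket_compile_status (const char *pattern)
{
  regex_t re;
  assert (pattern != NULL);
  int status = regcomp (&re, pattern, REG_NOSUB);
  if (status == 0)
    regfree (&re);
  return status;
}

/* Hypothetical helper: return 1 if PATTERN matches somewhere in
   SUBJECT, 0 if it does not, and -1 if PATTERN fails to compile.  */
int
bracket_matches (const char *pattern, const char *subject)
{
  regex_t re;
  assert (pattern != NULL && subject != NULL);
  if (regcomp (&re, pattern, REG_NOSUB) != 0)
    return -1;
  int matched = regexec (&re, subject, 0, NULL, 0) == 0;
  regfree (&re);
  return matched;
}
```

Calling bracket_compile_status ("[^0-9---]") gives a nonzero status on a matcher that rejects the V7 spelling, as grep 2.21 did above, and 0 on one that accepts it. bracket_matches ("[^0-9-]", "5-7") is 0 on any conforming matcher because every byte of the subject is a digit or a hyphen.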
From debbugs-submit-bounces@debbugs.gnu.org Sat May 30 16:04:40 2015 Received: (at control) by debbugs.gnu.org; 30 May 2015 20:04:41 +0000 Received: from localhost ([127.0.0.1]:33783 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1Yymzo-0007X7-Fr for submit@debbugs.gnu.org; Sat, 30 May 2015 16:04:40 -0400 Received: from smtp.cs.ucla.edu ([131.179.128.62]:53645) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1Yymzm-0007Wi-MZ for control@debbugs.gnu.org; Sat, 30 May 2015 16:04:39 -0400 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id 3C93E39E801B for ; Sat, 30 May 2015 13:04:33 -0700 (PDT) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 9SwSbBss7Jia for ; Sat, 30 May 2015 13:04:32 -0700 (PDT) Received: from [192.168.1.9] (pool-100-32-155-148.lsanca.fios.verizon.net [100.32.155.148]) by smtp.cs.ucla.edu (Postfix) with ESMTPSA id 4CC9C39E8016 for ; Sat, 30 May 2015 13:04:32 -0700 (PDT) Message-ID: <556A17D0.4000303@cs.ucla.edu> Date: Sat, 30 May 2015 13:04:32 -0700 From: Paul Eggert Organization: UCLA Computer Science Department User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0 MIME-Version: 1.0 To: control@debbugs.gnu.org Subject: grep bug maintainance Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: control X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -2.3 (--) tag 20605 notabug close 20605 severity 20657 wishlist tag 20638 notabug close 20638 merge 20526 19985 19230 tag 19837 notabug close 19837 merge 
16444 19777 close 19563 close 19486 tag 19330 notabug close 19330 tag 19193 notabug close 19193 tag 19071 notabug close 19071 tag 19005 notabug close 19005 close 19000 tag 18888 notabug close 18888 From debbugs-submit-bounces@debbugs.gnu.org Thu Apr 21 22:09:05 2022 Received: (at 20657-done) by debbugs.gnu.org; 22 Apr 2022 02:09:05 +0000 Received: from localhost ([127.0.0.1]:51165 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1nhijI-0005py-Sn for submit@debbugs.gnu.org; Thu, 21 Apr 2022 22:09:05 -0400 Received: from zimbra.cs.ucla.edu ([131.179.128.68]:54858) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1nhijG-0005pR-Rp for 20657-done@debbugs.gnu.org; Thu, 21 Apr 2022 22:09:03 -0400 Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 6FA59160090; Thu, 21 Apr 2022 19:08:57 -0700 (PDT) Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id 1ky__GCbPFAc; Thu, 21 Apr 2022 19:08:56 -0700 (PDT) Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 6270116009A; Thu, 21 Apr 2022 19:08:56 -0700 (PDT) X-Virus-Scanned: amavisd-new at zimbra.cs.ucla.edu Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id Y77OIu5_thmm; Thu, 21 Apr 2022 19:08:56 -0700 (PDT) Received: from [131.179.64.200] (Penguin.CS.UCLA.EDU [131.179.64.200]) by zimbra.cs.ucla.edu (Postfix) with ESMTPSA id 0F558160090; Thu, 21 Apr 2022 19:08:56 -0700 (PDT) Content-Type: multipart/mixed; boundary="------------NQzlkdEpiK9JsEipzqpNOjdC" Message-ID: <89b7650f-bb7a-04d8-128c-e9d4977ed566@cs.ucla.edu> Date: Thu, 21 Apr 2022 19:08:55 -0700 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.8.0 Subject: Re: Accepting [xyz---abc] - three minus signs to mean one 
Content-Language: en-US To: Arnold Robbins References: From: Paul Eggert Organization: UCLA Computer Science Department In-Reply-To: X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 20657-done Cc: bug-gnulib@gnu.org, 20657-done@debbugs.gnu.org, beebe@math.utah.edu X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) This is a multi-part message in MIME format. --------------NQzlkdEpiK9JsEipzqpNOjdC Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit On 4/21/22 00:57, Arnold Robbins wrote: > As far as my testing indicates, dfa.c doesn't need a patch, it seems > to accept "---" inside brackets for a single minus. Yes, a brief perusal of the dfa.c source code suggests you're right. Thanks for looking into this. I tend to agree with you that POSIX is not likely to outlaw this extension. > If there are no objections, can we get this into Gnulib? Although the basic idea looks good, I see a few places where the patch can be improved. * The two calls to re_string_peek_byte might go past the end of the pattern (a subscript violation). This is possible because the pattern is not necessarily null-terminated. * The two calls to re_string_fetch_byte can be simplified into a single call to re_string_skip_bytes. * No need to assign to token->opr.c, as it already has the correct value. * Can fall through to the default case to save a bit of duplicate code. * glibc still uses comments /* like this */ for style reasons, and we should stick to that. I wrote a patch with these improvements in mind and installed it into Gnulib (see attached); hope it works for Gawk too. 
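The first point above (check the bounds before peeking, since the pattern need not be null-terminated) can be sketched in isolation as follows. This is a standalone illustration under assumed names, not the code in lib/regcomp.c: classify_minus, enum minus_kind, and the explicit (pat, len, idx) interface are all hypothetical stand-ins for the re_string_t accessors named in the list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token kinds for a '-' seen inside a bracket expression.  */
enum minus_kind { MINUS_RANGE_OPERATOR, MINUS_LITERAL };

/* Classify the '-' at PAT[IDX].  PAT holds LEN bytes and need not be
   null-terminated, so IDX + 2 is compared against LEN before either
   lookahead byte is read.  *CONSUMED receives the number of bytes the
   token occupies: 3 when "---" stands for one literal minus sign,
   otherwise 1.  */
enum minus_kind
classify_minus (const char *pat, size_t len, size_t idx, size_t *consumed)
{
  assert (idx < len && pat[idx] == '-');
  if (idx + 2 < len && pat[idx + 1] == '-' && pat[idx + 2] == '-')
    {
      *consumed = 3;
      return MINUS_LITERAL;
    }
  *consumed = 1;
  return MINUS_RANGE_OPERATOR;
}
```

With the pattern "[^0-9---]" (9 bytes), the '-' at index 3 is an ordinary range operator, while the one at index 5 starts the three-minus run and is reported as a literal spanning 3 bytes. The strict "idx + 2 < len" comparison follows the condition in the attached patch. A lone '-' just before the closing ']' is still reported as a range operator by this sketch, so treating it as a literal is left to the caller.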
--------------NQzlkdEpiK9JsEipzqpNOjdC Content-Type: text/x-patch; charset=UTF-8; name="0001-regex-match-.-.-like-V7-grep.patch" Content-Disposition: attachment; filename="0001-regex-match-.-.-like-V7-grep.patch" Content-Transfer-Encoding: base64 RnJvbSBkZDgzZGZiM2YyZDJlNTEzOWVhN2QwMDI0MGI1NDQxZGFhMGIzYTU2IE1vbiBTZXAg MTcgMDA6MDA6MDAgMjAwMQpGcm9tOiBQYXVsIEVnZ2VydCA8ZWdnZXJ0QGNzLnVjbGEuZWR1 PgpEYXRlOiBUaHUsIDIxIEFwciAyMDIyIDE4OjU2OjEyIC0wNzAwClN1YmplY3Q6IFtQQVRD SF0gcmVnZXg6IG1hdGNoIFsuLi4tLS0uLi5dIGxpa2UgVjcgZ3JlcAoKUHJvYmxlbSByZXBv cnRlZCBieSBBcm5vbGQgUm9iYmlucyBpbjoKaHR0cHM6Ly9idWdzLmdudS5vcmcvMjA2NTcK aHR0cHM6Ly9saXN0cy5nbnUub3JnL3IvYnVnLWdudWxpYi8yMDIyLTA0L21zZzAwMDUzLmh0 bWwKKiBsaWIvcmVnY29tcC5jIChwZWVrX3Rva2VuX2JyYWNrZXQpOiBMZXQgWy4uLi0tLS4u Ll0gbWF0Y2ggJy0nLgpUaGlzIGlzIGFuIGV4dGVuc2lvbiB0byBQT1NJWCwgYW5kIG1hdGNo ZXMgVjcgVW5peCBncmVwLgotLS0KIENoYW5nZUxvZyAgICAgfCAgOSArKysrKysrKysKIGxp Yi9yZWdjb21wLmMgfCAxNiArKysrKysrKysrKysrLS0tCiAyIGZpbGVzIGNoYW5nZWQsIDIy IGluc2VydGlvbnMoKyksIDMgZGVsZXRpb25zKC0pCgpkaWZmIC0tZ2l0IGEvQ2hhbmdlTG9n IGIvQ2hhbmdlTG9nCmluZGV4IGNkMTZiYmUwY2QuLmRkZDQ4MjZiY2YgMTAwNjQ0Ci0tLSBh L0NoYW5nZUxvZworKysgYi9DaGFuZ2VMb2cKQEAgLTEsMyArMSwxMiBAQAorMjAyMi0wNC0y MSAgUGF1bCBFZ2dlcnQgIDxlZ2dlcnRAY3MudWNsYS5lZHU+CisKKwlyZWdleDogbWF0Y2gg Wy4uLi0tLS4uLl0gbGlrZSBWNyBncmVwCisJUHJvYmxlbSByZXBvcnRlZCBieSBBcm5vbGQg Um9iYmlucyBpbjoKKwlodHRwczovL2J1Z3MuZ251Lm9yZy8yMDY1NworCWh0dHBzOi8vbGlz dHMuZ251Lm9yZy9yL2J1Zy1nbnVsaWIvMjAyMi0wNC9tc2cwMDA1My5odG1sCisJKiBsaWIv cmVnY29tcC5jIChwZWVrX3Rva2VuX2JyYWNrZXQpOiBMZXQgWy4uLi0tLS4uLl0gbWF0Y2gg Jy0nLgorCVRoaXMgaXMgYW4gZXh0ZW5zaW9uIHRvIFBPU0lYLCBhbmQgbWF0Y2hlcyBWNyBV bml4IGdyZXAuCisKIDIwMjItMDQtMjAgIFBhdWwgRWdnZXJ0ICA8ZWdnZXJ0QGNzLnVjbGEu ZWR1PgogCiAJYmFja3VwZmlsZTogZml4IGJ1ZyB3aGVuIHJlbmFtaW5nIHNpbXBsZSBiYWNr dXBzCmRpZmYgLS1naXQgYS9saWIvcmVnY29tcC5jIGIvbGliL3JlZ2NvbXAuYwppbmRleCBi NjA3Yzg1MzIwLi4xMjJjM2RlNThjIDEwMDY0NAotLS0gYS9saWIvcmVnY29tcC5jCisrKyBi L2xpYi9yZWdjb21wLmMKQEAgLTIwMzgsMTUgKzIwMzgsMjUgQEAgcGVla190b2tlbl9icmFj 
a2V0IChyZV90b2tlbl90ICp0b2tlbiwgcmVfc3RyaW5nX3QgKmlucHV0LCByZWdfc3ludGF4 X3Qgc3ludGF4KQogICAgIH0KICAgc3dpdGNoIChjKQogICAgIHsKLSAgICBjYXNlICctJzoK LSAgICAgIHRva2VuLT50eXBlID0gT1BfQ0hBUlNFVF9SQU5HRTsKLSAgICAgIGJyZWFrOwog ICAgIGNhc2UgJ10nOgogICAgICAgdG9rZW4tPnR5cGUgPSBPUF9DTE9TRV9CUkFDS0VUOwog ICAgICAgYnJlYWs7CiAgICAgY2FzZSAnXic6CiAgICAgICB0b2tlbi0+dHlwZSA9IE9QX05P Tl9NQVRDSF9MSVNUOwogICAgICAgYnJlYWs7CisgICAgY2FzZSAnLSc6CisgICAgICAvKiBJ biBWNyBVbml4IGdyZXAgYW5kIFVuaXggYXdrIGFuZCBtYXdrLCBbLi4uLS0tLi4uXQorICAg ICAgICAgKDMgYWRqYWNlbnQgbWludXMgc2lnbnMpIHN0YW5kcyBmb3IgYSBzaW5nbGUgbWlu dXMgc2lnbi4KKyAgICAgICAgIFN1cHBvcnQgdGhhdCB3aXRob3V0IGJyZWFraW5nIGFueXRo aW5nIGVsc2UuICAqLworICAgICAgaWYgKCEgKHJlX3N0cmluZ19jdXJfaWR4IChpbnB1dCkg KyAyIDwgcmVfc3RyaW5nX2xlbmd0aCAoaW5wdXQpCisgICAgICAgICAgICAgJiYgcmVfc3Ry aW5nX3BlZWtfYnl0ZSAoaW5wdXQsIDEpID09ICctJworICAgICAgICAgICAgICYmIHJlX3N0 cmluZ19wZWVrX2J5dGUgKGlucHV0LCAyKSA9PSAnLScpKQorICAgICAgICB7CisgICAgICAg ICAgdG9rZW4tPnR5cGUgPSBPUF9DSEFSU0VUX1JBTkdFOworICAgICAgICAgIGJyZWFrOwor ICAgICAgICB9CisgICAgICByZV9zdHJpbmdfc2tpcF9ieXRlcyAoaW5wdXQsIDIpOworICAg ICAgRkFMTFRIUk9VR0g7CiAgICAgZGVmYXVsdDoKICAgICAgIHRva2VuLT50eXBlID0gQ0hB UkFDVEVSOwogICAgIH0KLS0gCjIuMzUuMQoK --------------NQzlkdEpiK9JsEipzqpNOjdC-- From debbugs-submit-bounces@debbugs.gnu.org Sun Apr 24 09:21:20 2022 Received: (at 20657-done) by debbugs.gnu.org; 24 Apr 2022 13:21:20 +0000 Received: from localhost ([127.0.0.1]:57897 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1nicAy-0000xk-Bo for submit@debbugs.gnu.org; Sun, 24 Apr 2022 09:21:20 -0400 Received: from freefriends.org ([96.88.95.60]:56434) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1nicAv-0000xc-Qc for 20657-done@debbugs.gnu.org; Sun, 24 Apr 2022 09:21:18 -0400 X-Envelope-From: arnold@skeeve.com Received: from freefriends.org (freefriends.org [96.88.95.60]) by freefriends.org (8.14.7/8.14.7) with ESMTP id 23ODL6VJ016507 (version=TLSv1/SSLv3 
cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Sun, 24 Apr 2022 07:21:07 -0600 Received: (from arnold@localhost) by freefriends.org (8.14.7/8.14.7/Submit) id 23ODL68j016506; Sun, 24 Apr 2022 07:21:06 -0600 From: arnold@skeeve.com Message-Id: <202204241321.23ODL68j016506@freefriends.org> X-Authentication-Warning: frenzy.freefriends.org: arnold set sender to arnold@skeeve.com using -f Date: Sun, 24 Apr 2022 07:21:06 -0600 To: eggert@cs.ucla.edu, arnold@skeeve.com Subject: Re: Accepting [xyz---abc] - three minus signs to mean one References: <89b7650f-bb7a-04d8-128c-e9d4977ed566@cs.ucla.edu> In-Reply-To: <89b7650f-bb7a-04d8-128c-e9d4977ed566@cs.ucla.edu> User-Agent: Heirloom mailx 12.5 7/5/10 MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 20657-done Cc: bug-gnulib@gnu.org, 20657-done@debbugs.gnu.org, beebe@math.utah.edu X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) Hi Paul. Thanks for this. The patch looks good. I will (eventually) merge it into gawk instead of my change. I plan to add a test to gawk; perhaps grep would benefit from one as well? Thanks, Arnold Paul Eggert wrote: > On 4/21/22 00:57, Arnold Robbins wrote: > > > As far as my testing indicates, dfa.c doesn't need a patch, it seems > > to accept "---" inside brackets for a single minus. > > Yes, a brief perusal of the dfa.c source code suggests you're right. > Thanks for looking into this. I tend to agree with you that POSIX is not > likely to outlaw this extension. > > > > If there are no objections, can we get this into Gnulib? > > Although the basic idea looks good, I see a few places where the patch > can be improved. 
> > * The two calls to re_string_peek_byte might go past the end of the > pattern (a subscript violation). This is possible because the pattern is > not necessarily null-terminated. > > * The two calls to re_string_fetch_byte can be simplified into a single > call to re_string_skip_bytes. > > * No need to assign to token->opr.c, as it already has the correct value. > > * Can fall through to the default case to save a bit of duplicate code. > > * glibc still uses comments /* like this */ for style reasons, and we > should stick to that. > > I wrote a patch with these improvements in mind and installed it into > Gnulib (see attached); hope it works for Gawk too. From debbugs-submit-bounces@debbugs.gnu.org Sun Apr 24 15:07:49 2022 Received: (at 20657-done) by debbugs.gnu.org; 24 Apr 2022 19:07:49 +0000 Received: from localhost ([127.0.0.1]:60178 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1nihaH-0006qT-BP for submit@debbugs.gnu.org; Sun, 24 Apr 2022 15:07:49 -0400 Received: from zimbra.cs.ucla.edu ([131.179.128.68]:35964) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1nihaF-0006qE-EI for 20657-done@debbugs.gnu.org; Sun, 24 Apr 2022 15:07:48 -0400 Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 9B60A16009A; Sun, 24 Apr 2022 12:07:41 -0700 (PDT) Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id BA2elsKsavEK; Sun, 24 Apr 2022 12:07:40 -0700 (PDT) Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id E4F621600C5; Sun, 24 Apr 2022 12:07:40 -0700 (PDT) X-Virus-Scanned: amavisd-new at zimbra.cs.ucla.edu Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id BuwhlT1U90gI; Sun, 24 Apr 2022 12:07:40 -0700 (PDT) Received: from [192.168.1.9] 
(cpe-172-91-119-151.socal.res.rr.com [172.91.119.151]) by zimbra.cs.ucla.edu (Postfix) with ESMTPSA id B908016009A; Sun, 24 Apr 2022 12:07:40 -0700 (PDT) Message-ID: Date: Sun, 24 Apr 2022 12:07:40 -0700 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.7.0 Content-Language: en-US To: arnold@skeeve.com References: <89b7650f-bb7a-04d8-128c-e9d4977ed566@cs.ucla.edu> <202204241321.23ODL68j016506@freefriends.org> From: Paul Eggert Organization: UCLA Computer Science Department Subject: Re: bug#20657: Accepting [xyz---abc] - three minus signs to mean one In-Reply-To: <202204241321.23ODL68j016506@freefriends.org> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 20657-done Cc: bug-gnulib@gnu.org, 20657-done@debbugs.gnu.org, beebe@math.utah.edu X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) On 4/24/22 06:21, arnold@skeeve.com wrote: > I plan to add a test to gawk; perhaps grep would benefit from one as well? That'd need more than just a test, as we'd need to also modify regex.m4 to arrange to replace any system regex that has this incompatibility with gnulib regex. And we'd need to document the extension since we shouldn't test undocumented features. Although such work could be done, I expect it'd be a more productive use of our limited time to get this extension into glibc first. I'll add that to my (long) list of things to do. 
From debbugs-submit-bounces@debbugs.gnu.org Mon Apr 25 00:51:27 2022 Received: (at 20657-done) by debbugs.gnu.org; 25 Apr 2022 04:51:27 +0000 Received: from localhost ([127.0.0.1]:32781 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1niqh4-00013F-Of for submit@debbugs.gnu.org; Mon, 25 Apr 2022 00:51:26 -0400 Received: from freefriends.org ([96.88.95.60]:39204) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1niqh2-000136-LP for 20657-done@debbugs.gnu.org; Mon, 25 Apr 2022 00:51:25 -0400 X-Envelope-From: arnold@skeeve.com Received: from freefriends.org (freefriends.org [96.88.95.60]) by freefriends.org (8.14.7/8.14.7) with ESMTP id 23P4pD3c030853 (version=TLSv1/SSLv3 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Sun, 24 Apr 2022 22:51:14 -0600 Received: (from arnold@localhost) by freefriends.org (8.14.7/8.14.7/Submit) id 23P4pDYf030852; Sun, 24 Apr 2022 22:51:13 -0600 From: arnold@skeeve.com Message-Id: <202204250451.23P4pDYf030852@freefriends.org> X-Authentication-Warning: frenzy.freefriends.org: arnold set sender to arnold@skeeve.com using -f Date: Sun, 24 Apr 2022 22:51:13 -0600 To: eggert@cs.ucla.edu, arnold@skeeve.com Subject: Re: bug#20657: Accepting [xyz---abc] - three minus signs to mean one References: <89b7650f-bb7a-04d8-128c-e9d4977ed566@cs.ucla.edu> <202204241321.23ODL68j016506@freefriends.org> In-Reply-To: User-Agent: Heirloom mailx 12.5 7/5/10 MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 20657-done Cc: bug-gnulib@gnu.org, 20657-done@debbugs.gnu.org, beebe@math.utah.edu X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) Paul Eggert wrote: > On 4/24/22 06:21, arnold@skeeve.com 
wrote: > > I plan to add a test to gawk; perhaps grep would benefit from one as well? > > That'd need more than just a test, as we'd need to also modify regex.m4 > to arrange to replace any system regex that has this incompatibility > with gnulib regex. And we'd need to document the extension since we > shouldn't test undocumented features. Although such work could be done, > I expect it'd be a more productive use of our limited time to get this > extension into glibc first. I'll add that to my (long) list of things to do. OK - I agree that getting this into glibc is higher priority. Thanks, Arnold From unknown Tue Jun 24 22:35:36 2025 Received: (at fakecontrol) by fakecontrolmessage; To: internal_control@debbugs.gnu.org From: Debbugs Internal Request Subject: Internal Control Message-Id: bug archived. Date: Mon, 23 May 2022 11:24:06 +0000 User-Agent: Fakemail v42.6.9 # This is a fake control message. # # The action: # bug archived. thanks # This fakemail brought to you by your local debbugs # administrator