From unknown Sat Jun 14 19:12:49 2025
Subject: bug#68725: GNU grep and sed behaving unexpectedly with multiple 1-or-0 RE capture groups and backreferences
From: Ed Morton
To: 68725@debbugs.gnu.org, bug-grep@gnu.org
X-Debbugs-Original-To: bug-sed@gnu.org, bug-grep@gnu.org
Date: Thu, 25 Jan 2024 10:46:34 -0600
X-GNU-PR-Message: report 68725
X-GNU-PR-Package: sed
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="------------fVSzpxdOAXNSVCXm0u1O39fP"

--------------fVSzpxdOAXNSVCXm0u1O39fP
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: 7bit

There are issues (mostly common but some not) using a regexp like this:

    ^(.?)(.?)(.?)(.?)(.?).?\5\4\3\2\1$

with GNU grep and GNU sed, hence my contacting both mailing lists, but apologies if that was the wrong starting point.

This started out as a question on StackOverflow (https://stackoverflow.com/questions/77820540/searching-palindromes-with-grep-e-egrep/77861446?noredirect=1#comment137299746_77861446), but I've copied my "answer" and some comments from there below so you don't have to look anywhere else for a description of the issues.

Given this input file:

    a
    ab
    abba
    abcdef
    abcba
    zufolo
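
For anyone reproducing this, the input file can be created as follows. This is only a convenience sketch; the file name `sample` is the one used in the commands throughout this report.

```shell
# Create the six-line input file used by the commands in this report.
printf '%s\n' a ab abba abcdef abcba zufolo > sample
```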

Removing the `$` from the end of the regexp (i.e. making it less restrictive) produces fewer matches, which is the opposite of what it should do:

a) With the `$` at the end of the regexp:

    $ grep -E '^(.?)(.?)(.?)(.?)(.?).?\5\4\3\2\1$' sample
    a
    abba
    abcba
    zufolo

b) Without the `$` at the end of the regexp:

    $ grep -E '^(.?)(.?)(.?)(.?)(.?).?\5\4\3\2\1' sample
    a
    abba
    abcba
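
For comparison, here is a sketch of the result I'd expect for case a), computed without any backreferences. It assumes the anchored regexp is equivalent to "the whole line is a palindrome of at most 11 characters" (5 captured characters, 1 optional middle character, 5 backreferences) and uses awk instead of grep or sed for the check. The variable names `lines` and `expected` are made up for this example.

```shell
# Input lines from the sample file above.
lines=$(printf '%s\n' a ab abba abcdef abcba zufolo)

# Reference for case a): print lines that are palindromes of length <= 11.
expected=$(printf '%s\n' "$lines" | awk '{
    n = length($0)
    ok = (n <= 11)
    # Compare characters pairwise from both ends.
    for (i = 1; ok && i <= n / 2; i++)
        if (substr($0, i, 1) != substr($0, n - i + 1, 1)) ok = 0
    if (ok) print
}')
printf '%s\n' "$expected"
```

That prints `a`, `abba` and `abcba`. For case b), as far as I can tell every group and the `.?` can match the empty string, so without the `$` I'd expect all six lines to match.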

It's not just GNU grep that behaves strangely. GNU sed has the same behavior from the question as GNU `grep` does when just matching with `sed -nE '/.../p' sample`, and sed also behaves differently depending on whether we're just doing a match or doing a match + replace.

For example here's `sed` doing a match+replacement and behaving the same way as `grep` above:

a) With the `$` at the end of the regexp:

    $ sed -nE 's/^(.?)(.?)(.?)(.?)(.?).?\5\4\3\2\1$/&/p' sample
    a
    abba
    abcba
    zufolo

b) Without the `$` at the end of the regexp:

    $ sed -nE 's/^(.?)(.?)(.?)(.?)(.?).?\5\4\3\2\1/&/p' sample
    a
    abba
    abcba

but here's sed just doing a match and behaving differently from any of the above:

a) With the `$` at the end of the regexp (note the extra `ab` in the output):

    $ sed -nE '/^(.?)(.?)(.?)(.?)(.?).?\5\4\3\2\1$/p' sample
    a
    ab
    abba
    abcba
    zufolo

b) Without the `$` at the end of the regexp (note the extra `ab` and `abcdef` in the output):

    $ sed -nE '/^(.?)(.?)(.?)(.?)(.?).?\5\4\3\2\1/p' sample
    a
    ab
    abba
    abcdef
    abcba
    zufolo

Also interestingly this:

    $ sed -nE 's/^(.?)(.?)(.?)(.?)(.?).?\5\4\3\2\1$/<&>/p' sample

outputs:

    <a>
    <abba>
    <abcba>
    <>zufolo

the last line of which means the regexp is apparently matching the start of the line and ignoring the `$` end-of-string metachar present in the regexp! 
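
To illustrate why I read `<>zufolo` as an empty match at the start of the line, here is a sketch using deliberately simple regexps. `^x*` is chosen just for this example because it can only match the empty string on this input; the variable names are made up as well.

```shell
# x* matches the empty string at the start of "zufolo", so & is empty
# and the replacement <&> inserts just "<>" before the line.
empty_match=$(printf '%s\n' zufolo | sed 's/^x*/<&>/')
printf '%s\n' "$empty_match"    # <>zufolo

# For contrast, a regexp that matches the whole line wraps all of it.
full_match=$(printf '%s\n' abba | sed 's/^.*$/<&>/')
printf '%s\n' "$full_match"     # <abba>
```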

The odd behavior isn't just associated with using `-E`, though. If I remove `-E` and just use [POSIX compliant BREs](https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03) then:

a) With the `$` at the end of the regexp:

    $ grep '^\(.\{0,1\}\)\(.\{0,1\}\)\(.\{0,1\}\)\(.\{0,1\}\)\(.\{0,1\}\).\{0,1\}\5\4\3\2\1$' sample
    a
    abba
    abcba
    zufolo

    $ sed -n 's/^\(.\{0,1\}\)\(.\{0,1\}\)\(.\{0,1\}\)\(.\{0,1\}\)\(.\{0,1\}\).\{0,1\}\5\4\3\2\1$/&/p' sample
    a
    abba
    abcba
    zufolo

b) Without the `$` at the end of the regexp:

    $ grep '^\(.\{0,1\}\)\(.\{0,1\}\)\(.\{0,1\}\)\(.\{0,1\}\)\(.\{0,1\}\).\{0,1\}\5\4\3\2\1' sample
    a
    abba
    abcba

    $ sed -n 's/^\(.\{0,1\}\)\(.\{0,1\}\)\(.\{0,1\}\)\(.\{0,1\}\)\(.\{0,1\}\).\{0,1\}\5\4\3\2\1/&/p' sample
    a
    abba
    abcba

and again just doing a match in sed below behaves differently from the sed match+replacements above:

a) With the `$` at the end of the regexp:

    $ sed -n '/^\(.\{0,1\}\)\(.\{0,1\}\)\(.\{0,1\}\)\(.\{0,1\}\)\(.\{0,1\}\).\{0,1\}\5\4\3\2\1$/p' sample
    a
    ab
    abba
    abcba
    zufolo

b) Without the `$` at the end of the regexp:

    $ sed -n '/^\(.\{0,1\}\)\(.\{0,1\}\)\(.\{0,1\}\)\(.\{0,1\}\)\(.\{0,1\}\).\{0,1\}\5\4\3\2\1/p' sample
    a
    ab
    abba
    abcdef
    abcba
    zufolo

The above shows that, given the same regexp, sed is apparently matching different strings depending on whether it's doing a substitution or not.
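
A quick way to check this on another system is to run both forms with the same regexp and compare the results. This is only a sketch: the file names `match.out` and `subst.out` and the variable `verdict` are made up for this example, and the outcome depends on the sed version in use.

```shell
# Recreate the input file so the example is self-contained.
printf '%s\n' a ab abba abcdef abcba zufolo > sample

re='^(.?)(.?)(.?)(.?)(.?).?\5\4\3\2\1$'

# Same regexp, once as an address (match only), once in s///.
sed -nE "/$re/p" sample > match.out
sed -nE "s/$re/&/p" sample > subst.out

# cmp -s is silent and only sets the exit status.
if cmp -s match.out subst.out; then
    verdict="match and substitution agree"
else
    verdict="match and substitution disagree"
fi
printf '%s\n' "$verdict"
```

Since `s/.../&/p` replaces each match with itself, the two output files should be identical whenever both forms select the same lines.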

These are the versions I was using when testing above:

    $ grep --version | head -1
    grep (GNU grep) 3.11

    $ sed --version | head -1
    sed (GNU sed) 4.9

It was later pointed out that grep in git-bash produces an error message and dumps core given the original regexp above, e.g.

    grep -E '^(.?)(.?)(.?)(.?)(.?).?\5\4\3\2\1' sample

and

    grep -E '^(.?)(.?)(.?)(.?)(.?).?\5\4\3\2\1$' sample

both output:

    a
    assertion "num >= 0" failed: file "regexec.c", line 1394, function: pop_fail_stack
    Aborted (core dumped)

Sorry, I can't copy the core off that machine for corporate reasons.

Those git-bash tests were using

    $ echo $BASH_VERSION
    5.2.15(1)-release

    $ grep --version
    grep (GNU grep) 3.0

Regards,

	Ed Morton
--------------fVSzpxdOAXNSVCXm0u1O39fP--

From unknown Sat Jun 14 19:12:49 2025
Subject: bug#68725: GNU grep and sed behaving unexpectedly with multiple 1-or-0 RE capture groups and backreferences
From: Jim Meyering
To: Ed Morton
Cc: 68725@debbugs.gnu.org, bug-grep@gnu.org
Date: Mon, 5 Feb 2024 23:02:42 -0800
X-GNU-PR-Message: followup 68725
X-GNU-PR-Package: sed
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"

On Fri, Jan 26, 2024 at 6:51 AM Ed Morton wrote:
>
> There are issues (mostly common but some not) using a regexp like this:
>
> |^(.?)(.?)(.?)(.?)(.?).?\5\4\3\2\1$|
>
> with GNU grep and GNU sed, hence my contacting both mailing lists but
> apologies if that was the wrong starting point.
>
> This started out as a question on StackOverflow,
> (https://stackoverflow.com/questions/77820540/searching-palindromes-with-grep-e-egrep/77861446?noredirect=1#comment137299746_77861446)
> but my "answer" and some comments from there copied below so you don't
> have to look anywhere else for a description of the issues.
>
> Given this input file:
>
> |a|
> |ab|
> |abba|
> |abcdef|
> |abcba|
> |zufolo|
>
> |||Removing the `$` from the end of the regexp (i.e. making it less
> restrictive) produces fewer matches, which is the opposite of what it
> should do: a) With the `$` at the end of the regexp: $ grep -E
> '^(.?)(.?)(.?)(.?)(.?).?\5\4\3\2\1$' sample a abba abcba zufolo b)
> Without the `$` at the end of the regexp: $ grep -E
> '^(.?)(.?)(.?)(.?)(.?).?\5\4\3\2\1' sample a abba abcba

Thanks for reporting that. This is as far as I've gotten for now, but
this sure looks like a bug:

  $ echo zufolo | grep -E '^(.?)(.?)(.?)(.?)(.?).?\5\4\3\2\1$'
  zufolo

Obviously, that string should not match.
Note that it works properly with the -P option in place of that -E.

> It's not just
> GNU grep that behaves strangely, GNU sed has the same behavior from the
> question when just matching with `sed -nE '/.../p' sample` as GNU `grep`
> does AND sed behaves differently if we're just doing a match vs if we're
> doing a match + replace. For example here's `sed` doing a
> match+replacement and behaving the same way as `grep` above: a) With the
> `$` at the end of the regexp: $ sed -nE
> 's/^(.?)(.?)(.?)(.?)(.?).?\5\4\3\2\1$/&/p' sample a abba abcba zufolo b)
> Without the `$` at the end of the regexp: $ sed -nE
> 's/^(.?)(.?)(.?)(.?)(.?).?\5\4\3\2\1/&/p' sample a abba abcba but here's
> sed just doing a match and behaving differently from any of the above:
> a) With the `$` at the end of the regexp (note the extra `ab` in the
> output): $ sed -nE '/^(.?)(.?)(.?)(.?)(.?).?\5\4\3\2\1$/p' sample a ab
> abba abcba zufolo b) Without the `$` at the end of the regexp (note the
> extra `ab` and `abcdef` in the output): $ sed -nE
> '/^(.?)(.?)(.?)(.?)(.?).?\5\4\3\2\1/p' sample a ab abba abcdef abcba
> zufolo Also interestingly this: $ sed -nE
> 's/^(.?)(.?)(.?)(.?)(.?).?\5\4\3\2\1$/<&>/p' sample outputs: <a> <abba> <abcba>
> <>zufolo the last line of which means the regexp is apparently
> matching the start of the line and ignoring the `$` end-of-string
> metachar present in the regexp!