From unknown Fri Jun 20 07:09:57 2025 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-Mailer: MIME-tools 5.509 (Entity 5.509) Content-Type: text/plain; charset=utf-8 From: bug#42764 <42764@debbugs.gnu.org> To: bug#42764 <42764@debbugs.gnu.org> Subject: Status: csplit does not suppress the last match when not using {*} Reply-To: bug#42764 <42764@debbugs.gnu.org> Date: Fri, 20 Jun 2025 14:09:57 +0000 retitle 42764 csplit does not suppress the last match when not using {*} reassign 42764 coreutils submitter 42764 Emanuele Giacomelli severity 42764 normal thanks From debbugs-submit-bounces@debbugs.gnu.org Sat Aug 08 10:51:24 2020 Received: (at submit) by debbugs.gnu.org; 8 Aug 2020 14:51:24 +0000 Received: from localhost ([127.0.0.1]:59456 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1k4QBv-00022B-6M for submit@debbugs.gnu.org; Sat, 08 Aug 2020 10:51:24 -0400 Received: from lists.gnu.org ([209.51.188.17]:33800) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1k4Kx6-0002nc-It for submit@debbugs.gnu.org; Sat, 08 Aug 2020 05:15:46 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:55124) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1k4Kx6-0001Ta-C9 for bug-coreutils@gnu.org; Sat, 08 Aug 2020 05:15:44 -0400 Received: from sonic307-53.consmr.mail.ir2.yahoo.com ([87.248.110.30]:39038) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1k4Kx3-000199-Ds for bug-coreutils@gnu.org; Sat, 08 Aug 2020 05:15:44 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yahoo.it; s=s2048; t=1596878136; bh=Ht4F+bUguyWk3nZYT0R9X7eCMBTPWNzOdnZJIZXUhoA=; h=Date:From:To:Subject:References:From:Subject; 
b=UQtMCnfeqMWImnjp7GcHl7suISMUCEKvAB0s6VobGbmYUTMU7m/AUKHq9dAoqQTT3qQI/yd1weSN8QVqzirZ6LYVz/bQQzF6JGU31kfYcmW2AXXCEs3CJF2SxJ6o1t5Y2LxDsKrbpPYyekpFGYxI5oMT1A2sPyGhAjDy6d+atpJPGJjyLoId/YzuU1CrTwe8sAFtvj1GyRmmAs8QimV+A0Uzi4nVo7lsEHv/DD5eJhBVygE9cHM5Q32hJU+7jNKZefaWNQ4Wjt1LJJ7MhgWLCgXuNQmHYgSRscH4luB6LwfDn4ZW/IYYuVjnlLssXM1wZIicgPCbeDF5tR0A3a10Ig== X-YMail-OSG: _Fxlsd4VM1mroHJb69zBpfWlGxo8q6.0usGmUSvUyYIPwT.9xj2COhtWOQuMmE5 lHNjHj1gs_VQwSeetHdQsccdl.2LNJrtfYQ1_VXSedZsWft.G3tm4VF_CLmsZKJkUFfOrWFC2wh4 DVV_zM28HAIIz1ZsgP8QJO4NVVE951DipQdXcipoPAnPURXQoa9S0y5vdnmfajacHotVb.MQfkko l9AZbhj9aHHVddB4pCAwbvXfnZulE4T2qcdDFmJHt3mYYdlYSH6mPre0aUus.y.O8ehCn0e9B14a pPxauL1Z1_p.wcs2aq7HUXtYK1jCCr0PQNKH_TcvL95gN9a9U43gcIpRbYgGk63WdvB4HhOi.K_Q AXZGFJ65N6hDcyk7l96O3HrGphwxuRgTjUi65HcidjWm1cXj0Wp7Q3HfSM.m_eHxIyAFYxmXi3VZ fMHP2c55ks5pmZmRMocAMUvmT_PYDwAonQ4HvElV6WSQQ6CMppRDZNnIIgyzlmzjyOy3oy..NYR6 bxkGc2jByVGb69EDCrkqUa4O4kp12BwE1nyXPXHcJY5S9PUF5IHdcJ7G36pKg9jfH5Xs5HKf5qA. nJYNuM7sk5PyJfv94I5D6iz8llkyCZaI.UfwY0MIsfVlXWYdBboQs6NHdGVckC5jowBJK3T0Gzq4 r7bn5bevAAllPBjzLxgUnZXvykkXpNC9LjvELt9QHLPBf1IT5D0sMvw8pvMJMP_whJygWx99TMsd rfVGRxTxkGt_X8MWkFZl5CisCtsJq6titgPQXAk7ETKSvhxfyyED5Le7vvzxGLjzmNNJ_qIVB9Mk J7DrnbfhLK0FklkwlMa0WC_erLTjKZ0f.6flxLS3kXpL7g1Jjngme18AgaAb73cHJ4x.JVS6zW8W t1sjgcz0i3CFSgiwjynfht52arC6AvW64Ldgq2pSAbvKbe10oEhDqjQJMqGKX8kW21mn5kgepcY0 0kCAv1Kh3kS5myyn.Ead.iX8V3qjTU2ZLeLu2cR5X9v9avMg4G2frnvc9eNqAUadjzMTPTnjccth jZKcFW4SvHv8B2XUmj1QWEVSSYLeq5ZJo5rBvhINuifIK.XhJnsRicu1ydhyvW_xlZRyqnv5EyaI WxdsE.idrG4aW3BQPeszySt0HtPkR8qgVt.t3WJfIWA_O6PQS4MVydgNxRaKyW1TndwuPE1K4Tmt XeR0nfQjXbTmdJi_ykqIkZYmUnILYqM4sjEMbfFujK2whaH1q3DzxeI9Ty3ZDtdU.A45P6_r9LO6 vWE1dw5jVndMSXDLIENkRmxggzMaYRDnaLiDnCB3aCk58MHGFD9cvzaiAq9jxoB_lhuvNulE0k56 .bQyGLDdAh2k5coaEME6cj.JaUZjUCh.MxzcKRpG.J9_NmPbpFgZLKoiqi6elz4Yk3fpIYVrluQ8 w Received: from sonic.gate.mail.ne1.yahoo.com by sonic307.consmr.mail.ir2.yahoo.com with HTTP; Sat, 8 Aug 2020 09:15:36 +0000 Date: Sat, 8 Aug 2020 09:12:51 +0000 (UTC) From: Emanuele 
Giacomelli To: "bug-coreutils@gnu.org" Message-ID: <1958929830.2286133.1596877971233@mail.yahoo.com> Subject: csplit does not suppress the last match when not using {*} MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="----=_Part_2286132_1667625265.1596877971233" References: <1958929830.2286133.1596877971233.ref@mail.yahoo.com> X-Mailer: WebService/1.1.16436 YMailNorrin Mozilla/5.0 (X11; Linux x86_64; rv:79.0) Gecko/20100101 Firefox/79.0 Content-Length: 7399 Received-SPF: pass client-ip=87.248.110.30; envelope-from=vpooldyn-linux@yahoo.it; helo=sonic307-53.consmr.mail.ir2.yahoo.com X-detected-operating-system: by eggs.gnu.org: First seen = 2020/08/08 05:15:36 X-ACL-Warn: Detected OS = Linux 3.11 and newer [fuzzy] X-Spam_score_int: -30 X-Spam_score: -3.1 X-Spam_bar: --- X-Spam_report: (-3.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_FROM=0.001, HTML_MESSAGE=0.001, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H2=-1, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, URIBL_BLOCKED=0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-Spam-Score: -1.6 (-) X-Debbugs-Envelope-To: submit X-Mailman-Approved-At: Sat, 08 Aug 2020 10:51:22 -0400 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -2.6 (--) ------=_Part_2286132_1667625265.1596877971233 Content-Type: multipart/alternative; boundary="----=_Part_2286131_974301236.1596877971231" ------=_Part_2286131_974301236.1596877971231 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Good day, I am experiencing an odd behaviour in csplit which may actually be a bug. I am testing this against the code cloned from https://github.com/coreutils/coreutils.git, on the commit described by git as v8.32-52-gc0e5f8c59. 
Suppose I have the following YAML file:

==> test.yaml <==
value1: 123
---
value2: 456
---
value3: 789

and I want to split it at '---' lines. First I would try the following:

    csplit -z --suppress-matched test.yaml '/^---$/' '{1}'

which outputs:

    12
    12
    16

and creates the following files:

    ==> xx00 <==
    value1: 123

    ==> xx01 <==
    value2: 456

    ==> xx02 <==
    ---
    value3: 789

The last portion still contains the '---', despite it being suppressed
from the second part.

Now, if I try again with:

    csplit -z --suppress-matched test.yaml '/^---$/' '{*}'

I get:

    12
    12
    12

and:

    ==> xx00 <==
    value1: 123

    ==> xx01 <==
    value2: 456

    ==> xx02 <==
    value3: 789

where the last part does not contain the matched line, as expected.

While trying to figure out the problem, I noticed that match suppression
is done at the beginning of process_regexp. For a match-twice scenario
like the first one, the function is called twice, then the rest of the
file is simply dumped by split_file.

This means that the two calls to process_regexp will:

* suppress nothing for call #1 because nothing has been matched yet;
* suppress the first match in call #2.

Then, the rest of the file is dumped but no one actually suppressed the
second match, which appears in the last segment. When using asterisk
repetition, the file is instead dumped by process_regexp, which gets its
chance to suppress the matched line.

I came up with the attached patch, which simply moves match suppression
at the end of process_regexp. With this modification, the invocation:

    csplit -z --suppress-matched test.yaml '/^---$/' '{1}'

now produces:

    12
    12
    12

and:

==> xx00 <==
value1: 123

==> xx01 <==
value2: 456

==> xx02 <==
value3: 789

which is what I would expect.

------=_Part_2286131_974301236.1596877971231
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
------=_Part_2286131_974301236.1596877971231-- ------=_Part_2286132_1667625265.1596877971233 Content-Type: text/x-patch Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename="=?UTF-8?b?cGF0Y2gucGF0Y2g=?=" Content-ID: <613cbfdd-d68a-6cdd-840c-aa4c66639c56@yahoo.com> ZGlmZiAtLWdpdCBhL3NyYy9jc3BsaXQuYyBiL3NyYy9jc3BsaXQuYwppbmRleCA5YmQ5YzQzYjUu LjkzZmY2MGRjNiAxMDA2NDQKLS0tIGEvc3JjL2NzcGxpdC5jCisrKyBiL3NyYy9jc3BsaXQuYwpA QCAtODAzLDkgKzgwMyw2IEBAIHByb2Nlc3NfcmVnZXhwIChzdHJ1Y3QgY29udHJvbCAqcCwgdWlu dG1heF90IHJlcGV0aXRpb24pCiAgIGlmICghaWdub3JlKQogICAgIGNyZWF0ZV9vdXRwdXRfZmls ZSAoKTsKIAotICBpZiAoc3VwcHJlc3NfbWF0Y2hlZCAmJiBjdXJyZW50X2xpbmUgPiAwKQotICAg IHJlbW92ZV9saW5lICgpOwotCiAgIC8qIElmIHRoZXJlIGlzIG5vIG9mZnNldCBmb3IgdGhlIHJl Z3VsYXIgZXhwcmVzc2lvbiwgb3IKICAgICAgaXQgaXMgcG9zaXRpdmUsIHRoZW4gaXQgaXMgbm90 IG5lY2Vzc2FyeSB0byBidWZmZXIgdGhlIGxpbmVzLiAqLwogCkBAIC04OTMsNiArODkwLDkgQEAg cHJvY2Vzc19yZWdleHAgKHN0cnVjdCBjb250cm9sICpwLCB1aW50bWF4X3QgcmVwZXRpdGlvbikK IAogICBpZiAocC0+b2Zmc2V0ID4gMCkKICAgICBjdXJyZW50X2xpbmUgPSBicmVha19saW5lOwor CisgIGlmIChzdXBwcmVzc19tYXRjaGVkKQorICAgIHJlbW92ZV9saW5lICgpOwogfQogCiAvKiBT cGxpdCB0aGUgaW5wdXQgZmlsZSBhY2NvcmRpbmcgdG8gdGhlIGNvbnRyb2wgcmVjb3JkcyB3ZSBo YXZlIGJ1aWx0LiAqLwo= ------=_Part_2286132_1667625265.1596877971233-- From debbugs-submit-bounces@debbugs.gnu.org Sat Aug 08 16:57:03 2020 Received: (at 42764-done) by debbugs.gnu.org; 8 Aug 2020 20:57:03 +0000 Received: from localhost ([127.0.0.1]:59666 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1k4Vtm-0004Wp-Pc for submit@debbugs.gnu.org; Sat, 08 Aug 2020 16:57:03 -0400 Received: from mail-wm1-f52.google.com ([209.85.128.52]:35921) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1k4Vti-0004WE-Do; Sat, 08 Aug 2020 16:57:00 -0400 Received: by mail-wm1-f52.google.com with SMTP id 3so4886846wmi.1; Sat, 08 Aug 2020 13:56:58 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; 
h=sender:subject:to:references:from:message-id:date:user-agent :mime-version:in-reply-to:content-language; bh=hliAMcX/c6iY/KYtI3jJRimBJ6+tK9oSPhLOOTCcziw=; b=ZU3PmJ0paaoINgdgl3F/wmMziEsdjEY+JQXx2EcjMbawW7mktWPXYcBj75Pmn2D4ym euQQLMeQws+m/U/NJTR+7nJWp7TUCx+YdpoKoWlOfl1IjlLdB4HEaRD8/DRbI78c7NPG rM97wyQihgFM8RCjpBUCCyVkpBR6uajgq4RHN/eV3zaQcZYsZH/yqX2n4U+L6gwCET7v XkBHWWXyEh8IkEfEq9F8Jdue8ZVx61Yd3co/in9O3yhRNtgmKCeo8/kQjf//fnT30HE4 Mp85qykTqLZZ62vfAMtwzM9TPB90VwiXV+KMZc5a/4YEEaPPEDqNRwCXKXWB3DdHGz1R R2cw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:sender:subject:to:references:from:message-id :date:user-agent:mime-version:in-reply-to:content-language; bh=hliAMcX/c6iY/KYtI3jJRimBJ6+tK9oSPhLOOTCcziw=; b=Nd2SYrcQb069QVlb/XUEkWCZ+yqfubq5U61piTXjMMJd1OZGc2r9OM99bbvfQmrziK QPdVQAhygQ+ujPVQlCNjUW0OT5fhS+uuI0iBcWGbUUOAhRVRTku4hPQJit04zDVCFYJy 6g09+uDFzoHWZuXTkNMGDqMvEkUQw2/5xEWqIwUSeaBma+FhMxigsG8LKEW2ptBTBij9 Ff/D7fi9V4+K+FEl6WSg8xJsQj6hu9A7XbPHp6QYA1w+TgXnbUv7q9ra270kYUooOpnS fHAaxxMZaSGtLfjTSrDTuVPuyvGg3u1JnaFfyzoiTj93tHXk+kWALDRG6YZULH8WxNdF ejpQ== X-Gm-Message-State: AOAM531E5cv378gJmKasGSuisfRhJsaPzNRblKrtJz/Nkaptf/fe7ZRr a1Az0RnJMf1vdyiWAV7ze8GVMy3p X-Google-Smtp-Source: ABdhPJy4mkzxEMa2aLUukBElAuPoFId8MNhnjob3gvHrsTitPSkgfTVMR/4bwfC4/6OOnU6yUzzHug== X-Received: by 2002:a1c:23c4:: with SMTP id j187mr18619998wmj.58.1596920211661; Sat, 08 Aug 2020 13:56:51 -0700 (PDT) Received: from localhost.localdomain (86-42-14-227-dynamic.agg2.lod.rsl-rtd.eircom.net. 
[86.42.14.227]) by smtp.googlemail.com with ESMTPSA id 111sm15399528wrc.53.2020.08.08.13.56.50 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Sat, 08 Aug 2020 13:56:50 -0700 (PDT)
Subject: Re: bug#42764: csplit does not suppress the last match when not using {*}
To: Emanuele Giacomelli , 42764-done@debbugs.gnu.org
References: <1958929830.2286133.1596877971233.ref@mail.yahoo.com> <1958929830.2286133.1596877971233@mail.yahoo.com>
From: Pádraig Brady
Message-ID: <820d4ead-bed8-0ae6-90a3-5bcc3e056e5e@draigBrady.com>
Date: Sat, 8 Aug 2020 21:56:48 +0100
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:80.0) Gecko/20100101 Thunderbird/80.0
MIME-Version: 1.0
In-Reply-To: <1958929830.2286133.1596877971233@mail.yahoo.com>
Content-Type: multipart/mixed; boundary="------------8E53F67AD5701282DF57B5A6"
Content-Language: en-US
X-Spam-Score: -1.5 (-)
X-Debbugs-Envelope-To: 42764-done
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit"
X-Spam-Score: -2.5 (--)

This is a multi-part message in MIME format.
--------------8E53F67AD5701282DF57B5A6
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit

On 08/08/2020 10:12, Emanuele Giacomelli via GNU coreutils Bug Reports wrote:
> Good day,
>
> I am experiencing an odd behaviour in csplit which may actually be a
> bug.
>
> I am testing this against the code cloned from
> https://github.com/coreutils/coreutils.git, on the commit described by
> git as v8.32-52-gc0e5f8c59.
>
> Suppose I have the following YAML file:
>
> ==> test.yaml <==
> value1: 123
> ---
> value2: 456
> ---
> value3: 789
>
> and I want to split it at '---' lines. First I would try the following:
>
>     csplit -z --suppress-matched test.yaml '/^---$/' '{1}'
>
> which outputs:
>
>     12
>     12
>     16
>
> and creates the following files:
>
>     ==> xx00 <==
>     value1: 123
>
>     ==> xx01 <==
>     value2: 456
>
>     ==> xx02 <==
>     ---
>     value3: 789
>
> The last portion still contains the '---', despite it being suppressed
> from the second part.
>
> Now, if I try again with:
>
>     csplit -z --suppress-matched test.yaml '/^---$/' '{*}'
>
> I get:
>
>     12
>     12
>     12
>
> and:
>
>     ==> xx00 <==
>     value1: 123
>
>     ==> xx01 <==
>     value2: 456
>
>     ==> xx02 <==
>     value3: 789
>
> where the last part does not contain the matched line, as expected.
>
> While trying to figure out the problem, I noticed that match suppression
> is done at the beginning of process_regexp. For a match-twice scenario
> like the first one, the function is called twice, then the rest of the
> file is simply dumped by split_file.
>
> This means that the two calls to process_regexp will:
>
> * suppress nothing for call #1 because nothing has been matched yet;
> * suppress the first match in call #2.
>
> Then, the rest of the file is dumped but no one actually suppressed the
> second match, which appears in the last segment. When using asterisk
> repetition, the file is instead dumped by process_regexp, which gets its
> chance to suppress the matched line.
>
> I came up with the attached patch, which simply moves match suppression
> at the end of process_regexp. With this modification, the invocation:
>
>     csplit -z --suppress-matched test.yaml '/^---$/' '{1}'
>
> now produces:
>
>     12
>     12
>     12
>
> and:
>
> ==> xx00 <==
> value1: 123
>
> ==> xx01 <==
> value2: 456
>
> ==> xx02 <==
> value3: 789
>
> which is what I would expect.
>

I agree with this analysis.
The usual manifestation would probably be when there was only a single match.
I.E. when not specifying a repetition count, we were not suppressing the single match.

I'll apply the attached in your name later today
(which also adds a test).

Marking this as done.

thanks!
Pádraig

--------------8E53F67AD5701282DF57B5A6
Content-Type: text/x-patch; charset=UTF-8; name="csplit--suppress-last.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="csplit--suppress-last.patch"

>From 7cf45f4f6a093a927d3139c87f52999dd2c750ec Mon Sep 17 00:00:00 2001
From: Emanuele Giacomelli
Date: Sat, 8 Aug 2020 21:29:13 +0100
Subject: [PATCH] csplit: fix regex suppression with specific match count

* src/csplit.c (process_regexp): Process the line suppression
in all invocations so that the last match is suppressed.
Previously with a non infinite match count, the last regex pattern
was not suppressed.
* NEWS: Mention the bug fix.
* tests/misc/csplit-suppress-matched.pl: Add a test case.
Fixes https://bugs.gnu.org/42764
---
 NEWS                                  |  4 ++++
 src/csplit.c                          |  6 +++---
 tests/misc/csplit-suppress-matched.pl | 12 +++++++++---
 3 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/NEWS b/NEWS
index 1881de115..61b711611 100644
--- a/NEWS
+++ b/NEWS
@@ -9,6 +9,10 @@ GNU coreutils NEWS -*- outline -*-
   is a non regular file.
   [bug introduced in coreutils-8.6]
 
+  csplit --suppress-matched now elides the last matched line
+  when a specific number of pattern matches are performed.
+  [bug introduced with the --suppress-matched feature in coreutils-8.22]
+
   du no longer crashes on XFS file systems when the directory hierarchy
   is heavily changed during the run.
   [bug introduced in coreutils-8.25]
diff --git a/src/csplit.c b/src/csplit.c
index 9bd9c43b5..93ff60dc6 100644
--- a/src/csplit.c
+++ b/src/csplit.c
@@ -803,9 +803,6 @@ process_regexp (struct control *p, uintmax_t repetition)
   if (!ignore)
     create_output_file ();
 
-  if (suppress_matched && current_line > 0)
-    remove_line ();
-
   /* If there is no offset for the regular expression, or
      it is positive, then it is not necessary to buffer the lines. */
 
@@ -893,6 +890,9 @@ process_regexp (struct control *p, uintmax_t repetition)
 
   if (p->offset > 0)
     current_line = break_line;
+
+  if (suppress_matched)
+    remove_line ();
 }
 
 /* Split the input file according to the control records we have built. */
diff --git a/tests/misc/csplit-suppress-matched.pl b/tests/misc/csplit-suppress-matched.pl
index 80f5299d0..e15ebb0f2 100755
--- a/tests/misc/csplit-suppress-matched.pl
+++ b/tests/misc/csplit-suppress-matched.pl
@@ -67,21 +67,27 @@ my @csplit_tests =
    {OUTPUTS => [ "a\na\nYY\n", "\nXX\nb\nb\nYY\n","\nXX\nc\nYY\n",
                  "\nXX\nd\nd\nd\n" ] }],
 
-  # the newline (matched line) does not appears in the output files
+  # the newline (matched line) does not appear in the output files
   ["re-1", " --suppress-matched -q - '/^\$/' '{*}'", {IN_PIPE => $IN_UNIQ},
    {OUTPUTS => ["a\na\nYY\n", "XX\nb\nb\nYY\n", "XX\nc\nYY\n",
                 "XX\nd\nd\nd\n"]}],
 
-  # the 'XX' (matched line + offset 1) does not appears in the output files.
+  # the 'XX' (matched line + offset 1) does not appear in the output files.
   # the newline appears in the files (before each split, at the end of the file)
   ["re-2", "--suppress-matched -q - '/^\$/1' '{*}'", {IN_PIPE => $IN_UNIQ},
    {OUTPUTS => ["a\na\nYY\n\n","b\nb\nYY\n\n","c\nYY\n\n","d\nd\nd\n"]}],
 
-  # the 'YY' (matched line + offset of -1) does not appears in the output files
+  # the 'YY' (matched line + offset of -1) does not appear in the output files
   # the newline appears in the files (as the first line of the new split)
   ["re-3", " --suppress-matched -q - '/^\$/-1' '{*}'", {IN_PIPE => $IN_UNIQ},
    {OUTPUTS => ["a\na\n", "\nXX\nb\nb\n", "\nXX\nc\n", "\nXX\nd\nd\nd\n"]}],
 
+  # the last matched line for a non infinite match repetition is suppressed.
+  # Up to and including coreutils 8.32, the last match was output.
+  ["re-4", " --suppress-matched -q - '/^\$/' '{2}'", {IN_PIPE => $IN_UNIQ},
+   {OUTPUTS => ["a\na\nYY\n", "XX\nb\nb\nYY\n", "XX\nc\nYY\n",
+                "XX\nd\nd\nd\n"]}],
+
   # Test two consecutive matched lines
   # without suppress-matched, the second file should contain a single newline.
   ["re-4.1", "-q - '/^\$/' '{*}'", {IN_PIPE => "a\n\n\nb\n"},
-- 
2.26.2

--------------8E53F67AD5701282DF57B5A6--

From unknown Fri Jun 20 07:09:57 2025
Received: (at fakecontrol) by fakecontrolmessage;
To: internal_control@debbugs.gnu.org
From: Debbugs Internal Request
Subject: Internal Control
Message-Id: bug archived.
Date: Sun, 06 Sep 2020 11:24:07 +0000
User-Agent: Fakemail v42.6.9

# This is a fake control message.
#
# The action:
# bug archived.
thanks
# This fakemail brought to you by your local debbugs
# administrator
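
The two invocations compared in the report can be replayed side by side with a short script. This is a minimal sketch and not part of the original messages: the csplit-demo directory, its fixed and star subdirectories, and the use of -f to keep the two runs apart are assumptions made for illustration. The input file, the pattern, and the -z and --suppress-matched options are taken from the report.

```shell
#!/bin/sh
# Replays the two csplit invocations from the bug report side by side.
# The directory names used here are illustrative, not from the thread.
set -eu

rm -rf csplit-demo
mkdir -p csplit-demo/fixed csplit-demo/star

# The three-document YAML file shown in the report.
printf 'value1: 123\n---\nvalue2: 456\n---\nvalue3: 789\n' > csplit-demo/test.yaml

# Fixed repetition count: the case reported as leaving '---' in the last piece.
csplit -z --suppress-matched -f csplit-demo/fixed/xx \
  csplit-demo/test.yaml '/^---$/' '{1}'

# Asterisk repetition: the case reported as behaving as expected.
csplit -z --suppress-matched -f csplit-demo/star/xx \
  csplit-demo/test.yaml '/^---$/' '{*}'

# Print every piece under a "==> name <==" header, as in the report.
head csplit-demo/fixed/xx* csplit-demo/star/xx*
```

According to the thread, a csplit built without the patch should still show '---' as the first line of csplit-demo/fixed/xx02, while a patched build should produce the same three pieces for both runs.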