From unknown Sat Jun 14 03:49:38 2025
From: bug#31979 <31979@debbugs.gnu.org>
To: bug#31979 <31979@debbugs.gnu.org>
Subject: Status: csplit: a regexp pattern does not consider the negative offset of a previous regexp pattern
Date: Sat, 14 Jun 2025 10:49:38 +0000

retitle 31979 csplit: a regexp pattern does not consider the negative offset of a previous regexp pattern
reassign 31979 coreutils
submitter 31979 Stéphane Campinas
severity 31979 normal
tag 31979 notabug
thanks


From debbugs-submit-bounces@debbugs.gnu.org Tue Jun 26 11:11:51 2018
Date: Tue, 26 Jun 2018 10:24:29 +0200
From: Stéphane Campinas
To: bug-coreutils@gnu.org
Subject: csplit: a regexp pattern does not consider the negative offset of a previous regexp pattern
Message-ID: <20180626082429.om5ld4xqqxmx6bh6@mars.localdomain>

Hi,

When using two consecutive regexp patterns with a negative offset
applied to the first one, the second one doesn't start its input
section after the offset. From the invocation [0] page it should:

> [...] If it is given, the input up to (but not including) the
> matching line plus or minus offset is put into the output file, and
> the line after that begins the next section of input.

Here is an example of the problem, where I want to split a file of 50
lines having a number on each line, ranging from 1 to 50.

# My environment:

- Linux mars 4.17.2-1-ARCH #1 SMP PREEMPT Sat Jun 16 11:08:59 UTC 2018 x86_64 GNU/Linux
- csplit (GNU coreutils) 8.29

# A failing example with the unexpected behavior

$ csplit numbers50.txt /15/-5 /12/
18
csplit: ‘/12/’: match not found
123

# A working example when using a regexp pattern followed by a linenum pattern

$ csplit numbers50.txt /15/-5 12
18
6
117

$ head xx*
==> xx00 <==
1
2
3
4
5
6
7
8
9

==> xx01 <==
10
11

==> xx02 <==
12
13
14
15
16
17
18
19
20
21

I think that both should work and output the same thing.

I found this while trying to port csplit to Rust; see [1] for some more
information, as I have tried to understand the cause of this behavior
in the code.

Cheers,

[0] https://www.gnu.org/software/coreutils/manual/html_node/csplit-invocation.html#csplit-invocation
[1] https://github.com/uutils/coreutils/issues/501#issuecomment-399569870

-- 
Stephane Campinas


From debbugs-submit-bounces@debbugs.gnu.org Mon Sep 24 08:40:54 2018
Date: Mon, 24 Sep 2018 14:40:45 +0200
From: Stéphane Campinas
To: 31979@debbugs.gnu.org
Subject: csplit: a regexp pattern does not consider the negative offset of a
Message-ID: <20180924124045.bjbxwg4vovg7qnax@mars.localdomain>

Hi,

After attempting to port csplit, I think I understand why it is like
that: it is to stop the iteration in case a pattern should be executed
several times. Therefore, maybe an easy fix is to alter the
documentation to indicate that lines within a negative offset are not
matched in subsequent patterns, with the exception of the line-based
pattern.

Cheers,

-- 
Stephane Campinas


From debbugs-submit-bounces@debbugs.gnu.org Tue Sep 25 02:53:04 2018
Subject: Re: bug#31979: csplit: a regexp pattern does not consider the negative offset of a
To: Stéphane Campinas, 31979@debbugs.gnu.org
References: <20180626082429.om5ld4xqqxmx6bh6@mars.localdomain> <20180924124045.bjbxwg4vovg7qnax@mars.localdomain>
From: Pádraig Brady
Message-ID: <3312c5f8-2dda-7b26-59ab-41aac87cf0ef@draigBrady.com>
Date: Mon, 24 Sep 2018 23:52:57 -0700
In-Reply-To: <20180924124045.bjbxwg4vovg7qnax@mars.localdomain>

tag 31979 notabug
close 31979
stop

On 24/09/18 05:40, Stéphane Campinas wrote:
> Hi,
>
> After attempting to port csplit, I think I understand why it is like
> that: it is to stop the iteration in case a pattern should be executed
> several times. Therefore, maybe an easy fix is to alter the
> documentation to indicate that lines within a negative offset are not
> matched in subsequent patterns, with the exception of the line-based
> pattern.

Thanks for following up.
I pushed that clarification in your name at:
https://git.sv.gnu.org/cgit/coreutils.git/commit/?id=7262994

cheers,
Pádraig


From unknown Sat Jun 14 03:49:38 2025
To: internal_control@debbugs.gnu.org
From: Debbugs Internal Request
Subject: Internal Control
Message-Id: bug archived.
Date: Tue, 23 Oct 2018 11:24:05 +0000

# This is a fake control message.
#
# The action:
# bug archived.
thanks
# This fakemail brought to you by your local debbugs
# administrator
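

The behavior discussed in this thread can be illustrated with a small
Python model. This is only a sketch under stated assumptions, not the
GNU csplit implementation: the names csplit_model and byte_counts are
hypothetical, and the rule used for where a search resumes after a
line-number pattern or a non-negative offset is an assumption of the
sketch. The one rule taken from the thread is that a regexp search
does not revisit lines that fall within a previous negative offset,
while a line-number pattern can still split inside that range.

```python
import re


def csplit_model(lines, patterns):
    """Split `lines` into sections; a simplified, hypothetical model of csplit.

    `patterns` holds either an int (line-number pattern) or a
    (regex, offset) tuple (regexp pattern with an offset, 0 for none).
    """
    sections = []
    start = 1    # 1-based number of the first line not yet written out
    scanned = 0  # 1-based number of the last line a regexp search has consumed

    for pattern in patterns:
        if isinstance(pattern, int):
            break_line = pattern
        else:
            regex, offset = pattern
            compiled = re.compile(regex)
            # The search resumes after the last consumed line, not at `start`,
            # so lines inside a previous negative offset are never revisited.
            match_line = next(
                (n for n in range(scanned + 1, len(lines) + 1)
                 if compiled.search(lines[n - 1])),
                None,
            )
            if match_line is None:
                raise LookupError(f"/{regex}/: match not found")
            break_line = match_line + offset
            # Assumption of this sketch for offsets >= 0: resume after the
            # later of the matched line and the break line.
            scanned = max(match_line, break_line)

        if not start <= break_line <= len(lines) + 1:
            raise ValueError(f"{pattern!r}: line number out of range")
        sections.append(lines[start - 1:break_line - 1])
        start = break_line
        scanned = max(scanned, break_line - 1)

    sections.append(lines[start - 1:])  # whatever is left forms the last section
    return sections


def byte_counts(sections):
    """Size of each section in bytes, counting one newline per line."""
    return [sum(len(line.encode()) + 1 for line in section) for section in sections]


numbers = [str(n) for n in range(1, 51)]

# Regexp pattern followed by a line-number pattern: the split succeeds.
print(byte_counts(csplit_model(numbers, [("15", -5), 12])))

# Two regexp patterns: the search for /12/ starts after line 15 and fails.
try:
    csplit_model(numbers, [("15", -5), ("12", 0)])
except LookupError as error:
    print(error)
```

With the 50-line input from the report, the first call prints
[18, 6, 117], the same byte counts as the working csplit invocation,
and the second call prints "/12/: match not found". Resuming the search
at the start of the unwritten lines instead would let a repeated
pattern match the same line again, which is the iteration concern
raised in the follow-up message.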