GNU bug report logs - #42764
csplit does not suppress the last match when not using {*}


Package: coreutils

Reported by: Emanuele Giacomelli <vpooldyn-linux <at> yahoo.it>

Date: Sat, 8 Aug 2020 14:52:02 UTC

Severity: normal

Done: Pádraig Brady <P <at> draigBrady.com>

Bug is archived. No further changes may be made.



Message #10 received at 42764-done <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Emanuele Giacomelli <vpooldyn-linux <at> yahoo.it>, 42764-done <at> debbugs.gnu.org
Subject: Re: bug#42764: csplit does not suppress the last match when not using {*}
Date: Sat, 8 Aug 2020 21:56:48 +0100
On 08/08/2020 10:12, Emanuele Giacomelli via GNU coreutils Bug Reports wrote:
> Good day,
> 
> I am experiencing an odd behaviour in csplit which may actually be a
> bug.
> 
> I am testing this against the code cloned from
> https://github.com/coreutils/coreutils.git, on the commit described by
> git as v8.32-52-gc0e5f8c59.
> 
> Suppose I have the following YAML file:
> 
> ==> test.yaml <==
> value1: 123
> ---
> value2: 456
> ---
> value3: 789
> 
> and I want to split it at '---' lines. First I would try the following:
> 
>      csplit -z --suppress-matched test.yaml '/^---$/' '{1}'
> 
> which outputs:
> 
>      12
>      12
>      16
> 
> and creates the following files:
> 
>      ==> xx00 <==
>      value1: 123
> 
>      ==> xx01 <==
>      value2: 456
> 
>      ==> xx02 <==
>      ---
>      value3: 789
> 
> The last portion still contains the '---', despite it being suppressed
> from the second part.
> 
> Now, if I try again with:
> 
>      csplit -z --suppress-matched test.yaml '/^---$/' '{*}'
> 
> I get:
> 
>      12
>      12
>      12
> 
> and:
> 
>      ==> xx00 <==
>      value1: 123
> 
>      ==> xx01 <==
>      value2: 456
> 
>      ==> xx02 <==
>      value3: 789
> 
> where the last part does not contain the matched line, as expected.
> 
> While trying to figure out the problem, I noticed that match suppression
> is done at the beginning of process_regexp. For a match-twice scenario
> like the first one, the function is called twice, then the rest of the
> file is simply dumped by split_file.
> 
> This means that the two calls to process_regexp will:
> 
> * suppress nothing for call #1 because nothing has been matched yet;
> * suppress the first match in call #2.
> 
> Then, the rest of the file is dumped but no one actually suppressed the
> second match, which appears in the last segment. When using asterisk
> repetition, the file is instead dumped by process_regexp, which gets its
> chance to suppress the matched line.
> 
> I came up with the attached patch, which simply moves match suppression
> to the end of process_regexp. With this modification, the invocation:
> 
>      csplit -z --suppress-matched test.yaml '/^---$/' '{1}'
> 
> now produces:
> 
>      12
>      12
>      12
> 
> and:
> 
>      ==> xx00 <==
>      value1: 123
> 
>      ==> xx01 <==
>      value2: 456
> 
>      ==> xx02 <==
>      value3: 789
> 
> which is what I would expect.
> 


I agree with this analysis.
The most common manifestation is probably
the single-match case: i.e., with no repetition
count specified, the one matched line
was not being suppressed.
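
For illustration, the ordering issue described in the quoted analysis can be
modelled in a few lines of Python. This is a simplified sketch, not the csplit
source: split_suppressing, its suppress_at_end flag, byte_size and YAML_LINES
are hypothetical names introduced here, and the -z option is not modelled. It
assumes only what the report states: the old code dropped a matched line at the
start of the next regexp pass, the patch drops it at the end of the pass that
found it, and the remainder of the file is dumped without any suppression.

```python
import re

# Lines of the test.yaml file from the report, without trailing newlines.
YAML_LINES = ["value1: 123", "---", "value2: 456", "---", "value3: 789"]


def split_suppressing(lines, pattern, repeats, suppress_at_end):
    """Model `csplit --suppress-matched FILE /pattern/ {repeats}`.

    suppress_at_end=False models the reported behaviour: a matched line is
    dropped at the start of the *next* regexp pass.
    suppress_at_end=True models the patched behaviour: the matched line is
    dropped at the end of the pass that found it.
    """
    regex = re.compile(pattern)
    pieces = []
    pos = 0
    pending = False  # True while lines[pos] is a matched, not yet dropped line

    for _ in range(repeats + 1):
        if pending and not suppress_at_end:
            pos += 1  # old ordering: drop the previous match now
            pending = False
        start = pos
        while pos < len(lines) and not regex.search(lines[pos]):
            pos += 1
        if pos == len(lines):
            raise ValueError("match not found")
        pieces.append(lines[start:pos])
        pending = True
        if suppress_at_end:
            pos += 1  # patched ordering: drop the match immediately
            pending = False

    # Final dump of the remainder; no suppression happens here.
    pieces.append(lines[pos:])
    return pieces


def byte_size(piece):
    """Size of a piece in bytes, counting one newline per line (ASCII only)."""
    return sum(len(line) + 1 for line in piece)


if __name__ == "__main__":
    for at_end in (False, True):
        pieces = split_suppressing(YAML_LINES, r"^---$", 1, at_end)
        print(at_end, [byte_size(p) for p in pieces])
```

With repeats=1 the model gives sizes 12, 12, 16 for the old ordering and
12, 12, 12 for the patched one, matching the output in the report. With
repeats=0 (no repetition count) the old ordering leaves the single matched
line at the top of the last piece.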

I'll apply the attached in your name later today
(which also adds a test).

Marking this as done.

thanks!
Pádraig
[csplit--suppress-last.patch (text/x-patch, attachment)]

