GNU bug report logs - #42764
csplit does not suppress the last match when not using {*}
Message #10 received at 42764-done <at> debbugs.gnu.org (full text, mbox):
On 08/08/2020 10:12, Emanuele Giacomelli via GNU coreutils Bug Reports wrote:
> Good day,
>
> I am experiencing an odd behaviour in csplit which may actually be a
> bug.
>
> I am testing this against the code cloned from
> https://github.com/coreutils/coreutils.git, on the commit described by
> git as v8.32-52-gc0e5f8c59.
>
> Suppose I have the following YAML file:
>
> ==> test.yaml <==
> value1: 123
> ---
> value2: 456
> ---
> value3: 789
>
> and I want to split it at '---' lines. First I would try the following:
>
> csplit -z --suppress-matched test.yaml '/^---$/' '{1}'
>
> which outputs:
>
> 12
> 12
> 16
>
> and creates the following files:
>
> ==> xx00 <==
> value1: 123
>
> ==> xx01 <==
> value2: 456
>
> ==> xx02 <==
> ---
> value3: 789
>
> The last portion still contains the '---', despite it being suppressed
> from the second part.
>
> Now, if I try again with:
>
> csplit -z --suppress-matched test.yaml '/^---$/' '{*}'
>
> I get:
>
> 12
> 12
> 12
>
> and:
>
> ==> xx00 <==
> value1: 123
>
> ==> xx01 <==
> value2: 456
>
> ==> xx02 <==
> value3: 789
>
> where the last part does not contain the matched line, as expected.
>
> While trying to figure out the problem, I noticed that match suppression
> is done at the beginning of process_regexp. For a match-twice scenario
> like the first one, the function is called twice, then the rest of the
> file is simply dumped by split_file.
>
> This means that the two calls to process_regexp will:
>
> * suppress nothing for call #1 because nothing has been matched yet;
> * suppress the first match in call #2.
>
> Then the rest of the file is dumped, but nothing has suppressed the
> second match, so it appears in the last segment. When using asterisk
> repetition, the file is instead dumped by process_regexp, which gets its
> chance to suppress the matched line.
>
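The control flow described above can be illustrated with a small toy model. This is only a sketch in Python, not the csplit source: the function name csplit_model and the suppress_at_end flag are hypothetical, and the model covers just the case of one regexp pattern with a finite '{N}' repeat count under -z --suppress-matched.

```python
import re


def csplit_model(lines, pattern, repeats, suppress_at_end):
    """Toy model of: csplit -z --suppress-matched FILE /pattern/ {repeats}.

    suppress_at_end=False mimics suppression at the start of each regexp
    call (the behaviour described in the report); True mimics suppression
    right after the match is found (the proposed change).
    """
    regex = re.compile(pattern)
    pieces = []
    pos = 0                # index of the next line not yet written out
    pending_match = False  # True when lines[pos] is a matched, unsuppressed line

    for _ in range(repeats + 1):  # '{N}' applies the pattern N more times
        # Start-of-call suppression: only a match left by an earlier call
        # can be dropped here, so the final match is never reached.
        if pending_match and not suppress_at_end:
            pos += 1
            pending_match = False

        match = next(
            (i for i in range(pos, len(lines)) if regex.search(lines[i])), None
        )
        if match is None:
            # Real csplit reports an error in this case; the model just raises.
            raise ValueError("match not found")

        pieces.append(lines[pos:match])  # piece ends just before the match
        pos = match
        pending_match = True

        # End-of-call suppression: drop the match as soon as it is found.
        if suppress_at_end:
            pos += 1
            pending_match = False

    pieces.append(lines[pos:])         # remainder of the file is dumped as-is
    return [p for p in pieces if p]    # -z: drop empty pieces


doc = ["value1: 123", "---", "value2: 456", "---", "value3: 789"]
old = csplit_model(doc, r"^---$", 1, suppress_at_end=False)
new = csplit_model(doc, r"^---$", 1, suppress_at_end=True)
print(old[-1])  # ['---', 'value3: 789']
print(new[-1])  # ['value3: 789']
```

With start-of-call suppression the last piece keeps the '---' line, matching the 12/12/16 byte counts in the report; with end-of-call suppression all three pieces are 12 bytes.
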
> I came up with the attached patch, which simply moves match suppression
> to the end of process_regexp. With this modification, the invocation:
>
> csplit -z --suppress-matched test.yaml '/^---$/' '{1}'
>
> now produces:
>
> 12
> 12
> 12
>
> and:
>
> ==> xx00 <==
> value1: 123
>
> ==> xx01 <==
> value2: 456
>
> ==> xx02 <==
> value3: 789
>
> which is what I would expect.
>
I agree with this analysis.
The most common manifestation is probably the single-match case:
i.e., when no repetition count was specified,
the single match was not suppressed.
I'll apply the attached in your name later today
(which also adds a test).
Marking this as done.
thanks!
Pádraig
[csplit--suppress-last.patch (text/x-patch, attachment)]
This bug report was last modified 4 years and 315 days ago.