On 08/08/2020 10:12, Emanuele Giacomelli via GNU coreutils Bug Reports wrote:
> Good day,
>
> I am experiencing an odd behaviour in csplit which may actually be a
> bug.
>
> I am testing this against the code cloned from
> https://github.com/coreutils/coreutils.git, on the commit described by
> git as v8.32-52-gc0e5f8c59.
>
> Suppose I have the following YAML file:
>
> ==> test.yaml <==
> value1: 123
> ---
> value2: 456
> ---
> value3: 789
>
> and I want to split it at '---' lines. First I would try the following:
>
>     csplit -z --suppress-matched test.yaml '/^---$/' '{1}'
>
> which outputs:
>
>     12
>     12
>     16
>
> and creates the following files:
>
>     ==> xx00 <==
>     value1: 123
>
>     ==> xx01 <==
>     value2: 456
>
>     ==> xx02 <==
>     ---
>     value3: 789
>
> The last portion still contains the '---', despite it being suppressed
> from the second part.
>
> Now, if I try again with:
>
>     csplit -z --suppress-matched test.yaml '/^---$/' '{*}'
>
> I get:
>
>     12
>     12
>     12
>
> and:
>
>     ==> xx00 <==
>     value1: 123
>
>     ==> xx01 <==
>     value2: 456
>
>     ==> xx02 <==
>     value3: 789
>
> where the last part does not contain the matched line, as expected.
>
> While trying to figure out the problem, I noticed that match suppression
> is done at the beginning of process_regexp. For a match-twice scenario
> like the first one, the function is called twice, then the rest of the
> file is simply dumped by split_file.
>
> This means that the two calls to process_regexp will:
>
> * suppress nothing for call #1 because nothing has been matched yet;
> * suppress the first match in call #2.
>
> Then, the rest of the file is dumped but no one actually suppressed the
> second match, which appears in the last segment. When using asterisk
> repetition, the file is instead dumped by process_regexp, which gets its
> chance to suppress the matched line.
>
> I came up with the attached patch, which simply moves match suppression
> at the end of process_regexp. With this modification, the invocation:
>
>     csplit -z --suppress-matched test.yaml '/^---$/' '{1}'
>
> now produces:
>
>     12
>     12
>     12
>
> and:
>
> ==> xx00 <==
> value1: 123
>
> ==> xx01 <==
> value2: 456
>
> ==> xx02 <==
> value3: 789
>
> which is what I would expect.
>

I agree with this analysis.
The usual manifestation would probably be when there was only a single match.
I.E. when not specifying a repetition count,
we were not suppressing the single match.

I'll apply the attached in your name later today
(which also adds a test).

Marking this as done.

thanks!
Pádraig
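
For anyone following the thread, the ordering issue in the report can be illustrated with a small toy model. This is only a sketch in Python, not csplit's C code and not the attached patch: the function names `split_suppress_at_start` and `split_suppress_at_end` are hypothetical, lines are compared to a literal marker instead of a regex, and `calls` stands for the number of times the pattern is applied (2 for '/^---$/' '{1}'). Error handling for a missing match and the -z option are left out.

```python
def split_suppress_at_start(lines, marker, calls):
    """Toy model of the reported behaviour: each call drops the
    previous call's match before it starts scanning."""
    pieces, pos, pending_match = [], 0, False
    for _ in range(calls):
        if pending_match:
            pos += 1  # suppress the line matched by the previous call
        start = pos
        while pos < len(lines) and lines[pos] != marker:
            pos += 1
        pieces.append(lines[start:pos])
        pending_match = pos < len(lines)
    # The remainder is dumped as-is, so the last match is never suppressed.
    pieces.append(lines[pos:])
    return pieces


def split_suppress_at_end(lines, marker, calls):
    """Toy model of the proposed ordering: each call drops its own
    match before returning."""
    pieces, pos = [], 0
    for _ in range(calls):
        start = pos
        while pos < len(lines) and lines[pos] != marker:
            pos += 1
        pieces.append(lines[start:pos])
        if pos < len(lines):
            pos += 1  # suppress the match found by this call
    pieces.append(lines[pos:])
    return pieces


yaml_lines = ["value1: 123", "---", "value2: 456", "---", "value3: 789"]

# Two pattern applications, as with '/^---$/' '{1}'.
for piece in split_suppress_at_start(yaml_lines, "---", 2):
    print(piece)
for piece in split_suppress_at_end(yaml_lines, "---", 2):
    print(piece)
```

With the sample file from the report, the first function leaves '---' at the head of the last piece and the second does not. Calling either function with `calls=1` models the single-match case mentioned in the reply.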