So I reinstated the bit vector, which was a little tricky
to do while maintaining performance, but it works very well.
In summary, with the attached 3-patch series, the CPU
usage of the common cut path is nearly halved, while the
maximum memory allocated for the bit vector is 64KiB.

I'll apply this series in the morning.

thanks,
Pádraig.

p.s. I doubt adding a sentinel to the range pair structure
would outperform the bit vector approach, given the
significant benefit shown in the benchmark in the commit message.