GNU bug report logs - #13127
[PATCH] cut: use only one data structure

Package: coreutils

Reported by: xojoc <at> gmx.com

Date: Sun, 9 Dec 2012 10:29:01 UTC

Severity: normal

Tags: patch

Done: Pádraig Brady <P <at> draigBrady.com>

Bug is archived. No further changes may be made.

From: Pádraig Brady <P <at> draigBrady.com>
To: xojoc <at> gmx.com
Cc: 13127 <at> debbugs.gnu.org
Subject: bug#13127: [PATCH] cut: use only one data structure
Date: Sun, 28 Apr 2013 02:51:13 +0100

On 04/26/2013 05:11 PM, Pádraig Brady wrote:
> This separate patch to simplify the print_kth() function
> by removing the comparison from it, is simple and
> has a significant perf advantage. Tests pass so I'll apply.
> 
> I'll adjust the commit log to summarise the perf change,
> but I notice the change isn't as great as yours on my sandybridge i3 system.
> Benchmark results for both the rebased memory rework and
> the simple print_kth() optimization attached.

Looking in detail, this central print_kth function matters most for performance.
I thought that your simplification of it might allow it to be auto inlined,
but I confirmed that gcc 4.6.0 -O2 does not do this at present by doing:

  objdump -d src/cut.o | grep -q '<print_kth>:' && echo called || echo inlined

Marking it as inlined gives another gain as shown below.
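
To illustrate what marking the function as inlined means here, the following is a minimal sketch and not the actual cut.c code. The names selected_bits, mark_selected, is_selected and copy_selected, and the 64-position table, are assumptions made up for this example. It shows a small per-byte predicate declared 'static inline' and called from a hot loop, which is the situation where call overhead is visible in timings like those below.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a selection table: one bit per byte
   position, covering positions 0..63 only to keep the sketch short.  */
enum { MAX_POS = 64 };
static unsigned char selected_bits[MAX_POS / CHAR_BIT];

/* Record that position K should be output.  */
static void
mark_selected (size_t k)
{
  assert (k < (size_t) MAX_POS);
  selected_bits[k / CHAR_BIT] |= (unsigned char) (1u << (k % CHAR_BIT));
}

/* 'static inline' is a request, not a guarantee; the compiler may
   still emit a call.  With the comparison logic removed, the body is
   a single bit test, which makes it a good inlining candidate.  */
static inline bool
is_selected (size_t k)
{
  return k < (size_t) MAX_POS
         && ((selected_bits[k / CHAR_BIT] >> (k % CHAR_BIT)) & 1);
}

/* Hot loop: the predicate runs once per input byte, so a function
   call per byte is a measurable cost.  Positions are 1-based, as
   with 'cut -b'.  Returns the number of bytes written to OUT.  */
static size_t
copy_selected (const char *in, size_t len, char *out)
{
  size_t n = 0;
  for (size_t i = 0; i < len; i++)
    if (is_selected (i + 1))
      out[n++] = in[i];
  return n;
}
```

Whether the compiler actually inlined a function like this can be checked with the same objdump test as above, substituting the function name.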

Testing these combinations, we have:
orig = bit array implementation
split = orig + simplified print_kth
split-inline = split + inlined print_kth
mem = no bit array
mem-split = mem + simplified print_kth
mem-split-inline = mem-split + inlined print_kth

$ yes abcdfeg | head -n1MB > big-file
$ for c in orig split split-inline mem mem-split mem-split-inline; do
    src/cut-$c 2>/dev/null
    echo -ne "\n== $c =="
    time src/cut-$c -b1,3 big-file > /dev/null
  done

== orig ==
real	0m0.084s
user	0m0.078s
sys	0m0.006s

== split ==
real	0m0.077s
user	0m0.070s
sys	0m0.006s

== split-inline ==
real	0m0.055s
user	0m0.049s
sys	0m0.006s

== mem ==
real	0m0.111s
user	0m0.108s
sys	0m0.002s

== mem-split ==
real	0m0.088s
user	0m0.081s
sys	0m0.007s

== mem-split-inline ==
real	0m0.070s
user	0m0.060s
sys	0m0.009s

So in summary, removing the bit array does slow things down
(0.111s vs 0.084s real for mem vs orig above),
but with the advantage of disassociating mem usage from range width.
I'll split the patch into two for the mem change and the cpu change,
and might follow up with a subsequent patch to reinstate the bit array
for the common case of small -[bcf] and no --output-delim.
That's a common trend in these mem adjustment patches,
i.e. find a point at which to switch from the more CPU efficient method
to one which is more memory efficient.
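
The switch-over idea described above could look roughly like the following. This is a sketch under stated assumptions and not the code from any applied patch: the struct names, the function names and the BITMAP_LIMIT cutoff of 4096 are all hypothetical, and --output-delim handling is ignored entirely. A bit array is built only when the highest selected position is small; otherwise the sorted range list is searched directly, so memory use does not grow with range width.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Inclusive, 1-based range of selected positions.  */
struct range { size_t lo, hi; };

/* Hypothetical cutoff: above this, skip the bit array.  */
enum { BITMAP_LIMIT = 4096 };

struct selector
{
  const struct range *ranges;   /* sorted, non-overlapping */
  size_t n_ranges;
  unsigned char *bitmap;        /* NULL when ranges are searched directly */
  size_t bitmap_max;            /* highest position the bitmap covers */
};

/* Build the selector, choosing the CPU-efficient bit array only
   when the highest position is small enough to be cheap in memory.  */
static void
selector_init (struct selector *s, const struct range *r, size_t n)
{
  assert (s && (r || n == 0));
  s->ranges = r;
  s->n_ranges = n;
  s->bitmap = NULL;
  s->bitmap_max = 0;

  size_t max = n ? r[n - 1].hi : 0;
  if (max == 0 || max > BITMAP_LIMIT)
    return;                     /* memory-efficient path */

  s->bitmap = calloc (max / CHAR_BIT + 1, 1);
  if (!s->bitmap)
    return;                     /* allocation failed: fall back to ranges */
  for (size_t i = 0; i < n; i++)
    for (size_t k = r[i].lo; k <= r[i].hi; k++)
      s->bitmap[k / CHAR_BIT] |= (unsigned char) (1u << (k % CHAR_BIT));
  s->bitmap_max = max;
}

/* Return true if position K is selected.  */
static bool
selector_test (const struct selector *s, size_t k)
{
  if (s->bitmap)
    return k <= s->bitmap_max
           && ((s->bitmap[k / CHAR_BIT] >> (k % CHAR_BIT)) & 1);

  /* Binary search over the sorted ranges.  */
  size_t lo = 0, hi = s->n_ranges;
  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;
      if (k < s->ranges[mid].lo)
        hi = mid;
      else if (k > s->ranges[mid].hi)
        lo = mid + 1;
      else
        return true;
    }
  return false;
}

static void
selector_free (struct selector *s)
{
  free (s->bitmap);
  s->bitmap = NULL;
}
```

With this shape, a selection like -b1,3 takes the bit array path, while an open-ended or very wide range stays on the range list and allocates nothing proportional to its width. The right cutoff would have to come from benchmarks like those above.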

thanks,
Pádraig.




This bug report was last modified 12 years and 76 days ago.
