GNU bug report logs - #13127
[PATCH] cut: use only one data structure


Package: coreutils;

Reported by: xojoc <at> gmx.com

Date: Sun, 9 Dec 2012 10:29:01 UTC

Severity: normal

Tags: patch

Done: Pádraig Brady <P <at> draigBrady.com>

Bug is archived. No further changes may be made.



Message #40 received at 13127 <at> debbugs.gnu.org (full text, mbox):

From: Cojocaru Alexandru <xojoc <at> gmx.com>
To: Pádraig Brady <P <at> draigBrady.com>
Cc: 13127 <at> debbugs.gnu.org
Subject: Re: bug#13127: [PATCH] cut: use only one data structure
Date: Sun, 28 Apr 2013 13:44:09 +0200
On Sun, 28 Apr 2013 02:51:13 +0100
Pádraig Brady <P <at> draigBrady.com> wrote:
> So looking in detail, this central print_kth function is of most importance to performance.
I came to the same conclusion; see:
	http://lists.gnu.org/archive/html/bug-coreutils/2012-12/msg00045.html

> I thought that your simplification of it might allow it to be auto inlined,
> but I confirmed that gcc 4.6.0 -O2 does not do this at present by doing:
> 
>   objdump -d src/cut.o | grep -q '<print_kth>:' && echo called || echo inlined
With gcc 4.8.0 -O2, both `print_kth' and `is_range_start' are inlined
even without the `inline' keyword:
    nm src/cut | grep print_kth
    nm src/cut | grep is_range_start
Both of the above commands give no output.
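To illustrate why inlining matters here, below is a minimal sketch (not the actual coreutils code; the struct and function names are illustrative) of a print_kth-style membership test over a sorted, non-overlapping range list. A function this small and branch-light is exactly the kind gcc will auto-inline at -O2:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for cut's range list: sorted, non-overlapping
   [lo, hi] pairs, closed on both ends.  */
struct range { size_t lo; size_t hi; };

/* Is byte/field K among the selected ranges?  Since the ranges are
   sorted by LO, we can stop scanning as soon as LO exceeds K.  */
static bool
is_printable_kth (const struct range *rp, size_t n, size_t k)
{
  for (size_t i = 0; i < n && rp[i].lo <= k; i++)
    if (k <= rp[i].hi)
      return true;
  return false;
}
```

Compiled at -O2 and checked with `nm` as above, a helper of this shape typically leaves no standalone symbol behind.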

> Marking it as inlined gives another gain as shown below.
> 
> Testing these combinations, we have:
> orig = bit array implementation
> split = ditto + simplified print_kth
> split-inline = ditto + inlined print_kth
> mem = no bit array
> mem-split = ditto + simplified print_kth
> mem-split-inline = ditto + inlined print_kth
> 
> $ yes abcdfeg | head -n1MB > big-file
> $ for c in orig split split-inline mem mem-split mem-split-inline; do
>     src/cut-$c 2>/dev/null
>     echo -ne "\n== $c =="
>     time src/cut-$c -b1,3 big-file > /dev/null
>   done
> 
> == orig ==
> real	0m0.084s
> user	0m0.078s
> sys	0m0.006s
> 
> == split ==
> real	0m0.077s
> user	0m0.070s
> sys	0m0.006s
> 
> == split-inline ==
> real	0m0.055s
> user	0m0.049s
> sys	0m0.006s
> 
> == mem ==
> real	0m0.111s
> user	0m0.108s
> sys	0m0.002s
> 
> == mem-split ==
> real	0m0.088s
> user	0m0.081s
> sys	0m0.007s
> 
> == mem-split-inline ==
> real	0m0.070s
> user	0m0.060s
> sys	0m0.009s
> 
> So in summary, removing the bit array does slow things down,
I think the problem lies in `print_kth' again: I mistakenly put
a useless branch in it. See the attached patch.
Another problem may be the merging of ranges and the call to `xrealloc'
that we do at the end of `set_fields'.
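For context, the merging referred to above can be done in a single in-place pass once the parsed ranges are sorted. The sketch below is illustrative only (names and layout are assumptions, not the actual coreutils code); after such a pass, the lookup in print_kth never has to skip merged-away entries:

```c
#include <stddef.h>

/* Sorted, possibly overlapping [lo, hi] pairs, closed on both ends.  */
struct range { size_t lo; size_t hi; };

/* Coalesce overlapping or adjacent ranges in place, returning the new
   count.  E.g. {1,3},{2,5} becomes {1,5}, and {7,7},{8,9} becomes
   {7,9} since 8 is adjacent to 7.  */
static size_t
merge_ranges (struct range *rp, size_t n)
{
  if (n == 0)
    return 0;
  size_t out = 0;
  for (size_t i = 1; i < n; i++)
    {
      if (rp[i].lo <= rp[out].hi + 1)   /* overlaps or touches */
        {
          if (rp[out].hi < rp[i].hi)
            rp[out].hi = rp[i].hi;
        }
      else
        rp[++out] = rp[i];
    }
  return out + 1;
}
```

One pass like this after parsing keeps the hot loop simple; the cost is paid once per invocation rather than once per input byte.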

> but with the advantage of disassociating mem usage from range width.
> I'll split the patch into two for the mem change and the cpu change,
> and might follow up with a subsequent patch to reinstate the bit array
> for the common case of small -[bcf] and no --output-delim.
My primary goal was to simplify the code. Even if the attached patch
doesn't work, I think that special-casing small -[bcf] ranges would just
make the code more cumbersome.

> That's a common trend in these mem adjustment patches.
> I.E. Find a point to switch from the more CPU efficient method,
> to one which is more memory efficient.
> 
> thanks,
> Pádraig.

Could you please re-run the benchmarks after applying the patch?
Could you also try with a bigger file (for example 100 MB), so as
to make the differences among the various patches clearer?
Unfortunately I'm running under an emulator, so my benchmarks aren't
very reliable.

Best regards,
Cojocaru Alexandru







GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.