This separate patch, which simplifies the print_kth() function by removing the comparison from it, is simple and gives a significant performance improvement. Tests pass, so I'll apply it. I'll adjust the commit log to summarise the performance change, though I notice the speedup on my Sandy Bridge i3 system isn't as large as the one you measured. Benchmark results for both the rebased memory rework and the simple print_kth() optimization are attached.

thanks!
Pádraig.