GNU bug report logs - #46169
Parallelize merge sort

Previous Next

Package: coreutils;

Reported by: Ole Tange <tange <at> gnu.org>

Date: Fri, 29 Jan 2021 09:08:02 UTC

Severity: wishlist

Full log


Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Ole Tange <tange <at> gnu.org>
To: bug-coreutils <at> gnu.org
Subject: Parallelize merge sort
Date: Fri, 29 Jan 2021 10:07:12 +0100
During the past 6 months I have been using GNU sort on a lot of files in
the GByte range on my 64 core machine. I have discovered, that GNU sort
is bad at using all the cores.

I have built parsort (now distributed as part of GNU parallel) as a
small wrapper around GNU sort and it performs 2-3x better than GNU sort
for files of this size on this machine.

The first step works great in GNU sort: Here you can really use all
your cores. But after the first step, it seems GNU sort does a merge
sort of all the results. And this pegs a single CPU at 100% while the
rest of the cores are pretty lazy.

It seems the merging is not parallelized.

parsort does that a little better than GNU sort. Instead of doing a
k-way merge at the final step, it does a 2-way merge on each core. It
still has to a 2-way merge of all data on a single core in the end, so
it is not even close to optimal.

Wikipedia describes a way of doing parallel merge:

https://en.wikipedia.org/wiki/Merge_sort#Parallel_multiway_merge_sort

I am no C-coder, and I respect that we want coreutils to be written in
C. Unfortunately that means I cannot really help doing the actual
coding (at least not in a way that would not introduce a plethora of
bugs).

But looking at the pseudocode
https://en.wikipedia.org/wiki/Merge_sort#Pseudocode it does not look
that hard to implement.

This method requires all data to fit in memory, but there are even
more advanced methods (such as
https://www.sciencedirect.com/science/article/pii/S0304397500002875)
that would not require this.

Could you consider implementing a parallel merge, so I can retire
parsort?

(Also: Thanks for maintaining a vital piece of software).

/Ole




This bug report was last modified 3 years and 120 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.