GNU bug report logs -
#46169
Parallelize merge sort
Previous Next
To reply to this bug, email your comments to 46169 AT debbugs.gnu.org.
Toggle the display of automated, internal messages from the tracker.
Report forwarded
to
bug-coreutils <at> gnu.org
:
bug#46169
; Package
coreutils
.
(Fri, 29 Jan 2021 09:08:02 GMT)
Full text and
rfc822 format available.
Acknowledgement sent
to
Ole Tange <tange <at> gnu.org>
:
New bug report received and forwarded. Copy sent to
bug-coreutils <at> gnu.org
.
(Fri, 29 Jan 2021 09:08:02 GMT)
Full text and
rfc822 format available.
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
During the past 6 months I have been using GNU sort on a lot of files in
the GByte range on my 64 core machine. I have discovered, that GNU sort
is bad at using all the cores.
I have built parsort (now distributed as part of GNU parallel) as a
small wrapper around GNU sort and it performs 2-3x better than GNU sort
for files of this size on this machine.
The first step works great in GNU sort: Here you can really use all
your cores. But after the first step, it seems GNU sort does a merge
sort of all the results. And this pegs a single CPU at 100% while the
rest of the cores are pretty lazy.
It seems the merging is not parallelized.
parsort does that a little better than GNU sort. Instead of doing a
k-way merge at the final step, it does a 2-way merge on each core. It
still has to a 2-way merge of all data on a single core in the end, so
it is not even close to optimal.
Wikipedia describes a way of doing parallel merge:
https://en.wikipedia.org/wiki/Merge_sort#Parallel_multiway_merge_sort
I am no C-coder, and I respect that we want coreutils to be written in
C. Unfortunately that means I cannot really help doing the actual
coding (at least not in a way that would not introduce a plethora of
bugs).
But looking at the pseudocode
https://en.wikipedia.org/wiki/Merge_sort#Pseudocode it does not look
that hard to implement.
This method requires all data to fit in memory, but there are even
more advanced methods (such as
https://www.sciencedirect.com/science/article/pii/S0304397500002875)
that would not require this.
Could you consider implementing a parallel merge, so I can retire
parsort?
(Also: Thanks for maintaining a vital piece of software).
/Ole
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#46169
; Package
coreutils
.
(Fri, 29 Jan 2021 22:20:01 GMT)
Full text and
rfc822 format available.
Message #8 received at 46169 <at> debbugs.gnu.org (full text, mbox):
On 1/29/21 1:07 AM, Ole Tange wrote:
> Could you consider implementing a parallel merge, so I can retire
> parsort?
Yes, improving that part of 'sort' performance has been on my long list
of things to do for quite some time. If someone else could take up the
task it'd be done quicker.... Anyway, thanks for reporting it as a bug,
so that we can track it in our bug database.
Severity set to 'wishlist' from 'normal'
Request was from
Paul Eggert <eggert <at> cs.ucla.edu>
to
control <at> debbugs.gnu.org
.
(Mon, 21 Feb 2022 10:08:01 GMT)
Full text and
rfc822 format available.
This bug report was last modified 3 years and 120 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.