GNU bug report logs - #6600
[PATCH] sort: add --threads option to parallelize internal sort.

Previous Next

Package: coreutils;

Reported by: Pádraig Brady <P <at> draigBrady.com>

Date: Sat, 10 Jul 2010 01:09:02 UTC

Severity: normal

Tags: patch

Done: Pádraig Brady <P <at> draigBrady.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 6600 in the body.
You can then email your comments to 6600 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6600; Package coreutils. (Sat, 10 Jul 2010 01:09:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Pádraig Brady <P <at> draigBrady.com>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Sat, 10 Jul 2010 01:09:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Chen Guo <chenguo4 <at> yahoo.com>
Cc: Paul Eggert <eggert <at> cs.ucla.edu>, Bug Coreutils <bug-coreutils <at> gnu.org>,
	Glen Lenker <glen.lenker <at> gmail.com>, Mike Nichols <mykphyre <at> gmail.com>,
	Gene Auyeung <quaker4lyf <at> gmail.com>, Chris Dickens <cdickens <at> ucla.edu>
Subject: Re: [PATCH] sort: add --threads option to parallelize internal sort.
Date: Sat, 10 Jul 2010 02:07:37 +0100
On 08/03/10 10:39, Chen Guo wrote:
> Hi Padraig,
> 
>> You previously mentioned a thread bug with memcoll. Is that worked around?
>
> That happened when more than one instance of memcoll is called on the same
> line at once, since memcoll replaces the eolchar with '\0'. Under our approach,
> the same line shouldn't ever be compared at the same time, so we're fine.
> On top of that, Professor Eggert suggested NUL delimiting all lines as they're
> read in, so memcoll doesn't have to; hence the patch to gnulib, which introduces
> xmemcoll_nul and memcoll_nul, for when input is known to be NUL delimited, thus
> no replacement of the eolchar is needed, making memcoll threadsafe.

Note the current xmemcoll0() in gnulib requires the length
_including_ the terminating NUL to be passed, whereas one
usually does not include the terminating char in the length
passed to xmemcoll(). I accordingly updated the lengths
passed to xmemcoll0() by your latest patch.

However there are still writes done to the source text
in the keycompare() function. So I'm thinking of dropping
the whole xmemcoll0() thing altogether assuming your
statement above is correct, that a particular line will
not be used at the same time by multiple threads.
I did try to copy the text to the stack before comparing,
but that introduced a significant overhead noted below.

Your patch is still performing well on a single core machine:

----------- before ---------------------
$ time ./src/sort < nums.list >/dev/null
real    0m8.644s
user    0m8.307s
sys     0m0.292s

$ time ./src/sort -g < nums.list >/dev/null
real    0m11.046s
user    0m10.652s
sys     0m0.295s

$ time ./src/sort -n < nums.list >/dev/null
real    0m4.909s
user    0m4.567s
sys     0m0.298s

$ time LANG=C ./src/sort < nums.list >/dev/null
real    0m1.959s
user    0m1.657s
sys     0m0.285s


------------ after ---------------------
$ time ./src/sort < nums.list >/dev/null
real    0m8.686s
user    0m8.300s
sys     0m0.232s

$ time ./src/sort -g < nums.list >/dev/null
real    0m10.196s
user    0m9.850s
sys     0m0.221s

$ time ./src/sort -n < nums.list >/dev/null
real    0m2.958s
user    0m2.664s
sys     0m0.221s

$ time LANG=C ./src/sort < nums.list >/dev/null
real    0m1.985s
user    0m1.750s
sys     0m0.217s

After copying the text to the stack as mentioned above
there is a significant performance drop:

$ time ./src/sort -n < nums.list >/dev/null

real    0m4.086s
user    0m3.848s
sys     0m0.218s

cheers,
Pádraig.




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6600; Package coreutils. (Sat, 10 Jul 2010 02:55:02 GMT) Full text and rfc822 format available.

Message #8 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> CS.UCLA.EDU>
To: Pádraig Brady <P <at> draigBrady.com>
Cc: Bug Coreutils <bug-coreutils <at> gnu.org>, Chen Guo <chenguo4 <at> yahoo.com>,
	Glen Lenker <glen.lenker <at> gmail.com>, Mike Nichols <mykphyre <at> gmail.com>,
	Gene Auyeung <quaker4lyf <at> gmail.com>, Chris Dickens <cdickens <at> ucla.edu>
Subject: Re: [PATCH] sort: add --threads option to parallelize internal sort.
Date: Fri, 09 Jul 2010 19:54:26 -0700
On 07/09/10 18:07, Pádraig Brady wrote:
> Chen Guo wrote:
>> That happened when more than one instance of memcoll is called on the same
>> line at once, since memcoll replaces the eolchar with '\0'. Under our approach,
>> the same line shouldn't ever be compared at the same time, so we're fine.

Ah, sorry, I wasn't aware of that.

> I'm thinking of dropping
> the whole xmemcoll0() thing altogether assuming your
> statement above is correct, that a particular line will
> not be used at the same time by multiple threads.

Yes, that makes sense.  We can revert that change from gnulib, since it
makes gnulib bigger unnecessarily.




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6600; Package coreutils. (Sat, 10 Jul 2010 09:31:02 GMT) Full text and rfc822 format available.

Message #11 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Chen Guo <chen.guo.0625 <at> gmail.com>
Cc: Paul Eggert <eggert <at> cs.ucla.edu>, Bug Coreutils <bug-coreutils <at> gnu.org>,
	Glen Lenker <glen.lenker <at> gmail.com>, Mike Nichols <mykphyre <at> gmail.com>,
	Gene Auyeung <quaker4lyf <at> gmail.com>, Chris Dickens <cdickens <at> ucla.edu>
Subject: Re: [PATCH] sort: add --threads option to parallelize internal sort.
Date: Sat, 10 Jul 2010 10:29:03 +0100
On 10/07/10 05:23, Chen Guo wrote:
> 2010/7/9 Paul Eggert <eggert <at> cs.ucla.edu>:
>> On 07/09/10 18:07, Pádraig Brady wrote:
>>> Chen Guo wrote:
>>>> That happened when more than one instance of memcoll is called on the same
>>>> line at once, since memcoll replaces the eolchar with '\0'. Under our approach,
>>>> the same line shouldn't ever be compared at the same time, so we're fine.
>>
>> Ah, sorry, I wasn't aware of that.
>>
>>> I'm thinking of dropping
>>> the whole xmemcoll0() thing altogether assuming your
>>> statement above is correct, that a particular line will
>>> not be used at the same time by multiple threads.
>>
>> Yes, that makes sense.  We can revert that change from gnulib, since it
>> makes gnulib bigger unnecessarily.
>>
> 
> Actually, the '\0' saves about 5% off runtime last I checked. This is because
> EACH TIME sort compares two lines memcoll would replace the last byte. If we
> set them all to NUL anyway at the start, memcoll_nul wouldn't need to do that
> replacement for each compare. When we output, we'd simply put the \n back.
> 
> I could be wrong though, this is going off memory from 4-5 months ago. But 5%
> is about what I remember, when sorting 1M lines on 8 cores.

Well for the whole line comparison where it works it's 2.9% faster
(or 2.3% adding in the NUL checks to xmemcoll0).
Also xmemcoll0() is probably generally useful since it's a readonly function.
So I'll leave it and fix up the keycompare() calls to it
(while documenting that keycompare() needs to work on a particular
line in line one thread.

thanks,
Pádraig.





Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6600; Package coreutils. (Sat, 10 Jul 2010 16:30:02 GMT) Full text and rfc822 format available.

Message #14 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Chen Guo <chen.guo.0625 <at> gmail.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: Bug Coreutils <bug-coreutils <at> gnu.org>, Glen Lenker <glen.lenker <at> gmail.com>,
	Mike Nichols <mykphyre <at> gmail.com>, Gene Auyeung <quaker4lyf <at> gmail.com>,
	Chris Dickens <cdickens <at> ucla.edu>,
	Pádraig Brady <P <at> draigbrady.com>
Subject: Re: [PATCH] sort: add --threads option to parallelize internal sort.
Date: Fri, 9 Jul 2010 21:23:46 -0700
2010/7/9 Paul Eggert <eggert <at> cs.ucla.edu>:
> On 07/09/10 18:07, Pádraig Brady wrote:
>> Chen Guo wrote:
>>> That happened when more than one instance of memcoll is called on the same
>>> line at once, since memcoll replaces the eolchar with '\0'. Under our approach,
>>> the same line shouldn't ever be compared at the same time, so we're fine.
>
> Ah, sorry, I wasn't aware of that.
>
>> I'm thinking of dropping
>> the whole xmemcoll0() thing altogether assuming your
>> statement above is correct, that a particular line will
>> not be used at the same time by multiple threads.
>
> Yes, that makes sense.  We can revert that change from gnulib, since it
> makes gnulib bigger unnecessarily.
>

Actually, the '\0' saves about 5% off runtime last I checked. This is because
EACH TIME sort compares two lines memcoll would replace the last byte. If we
set them all to NUL anyway at the start, memcoll_nul wouldn't need to do that
replacement for each compare. When we output, we'd simply put the \n back.

I could be wrong though, this is going off memory from 4-5 months ago. But 5%
is about what I remember, when sorting 1M lines on 8 cores.




Reply sent to Pádraig Brady <P <at> draigBrady.com>:
You have taken responsibility. (Tue, 13 Jul 2010 01:01:01 GMT) Full text and rfc822 format available.

Notification sent to Pádraig Brady <P <at> draigBrady.com>:
bug acknowledged by developer. (Tue, 13 Jul 2010 01:01:01 GMT) Full text and rfc822 format available.

Message #19 received at 6600-done <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: 6600-done <at> debbugs.gnu.org
Cc: Paul Eggert <eggert <at> cs.ucla.edu>, Chen Guo <chen.guo.0625 <at> gmail.com>,
	Glen Lenker <glen.lenker <at> gmail.com>, Mike Nichols <mykphyre <at> gmail.com>,
	Gene Auyeung <quaker4lyf <at> gmail.com>, Chris Dickens <cdickens <at> ucla.edu>
Subject: Re: bug#6600: [PATCH] sort: add --threads option to
	parallelize	internal sort.
Date: Tue, 13 Jul 2010 01:59:12 +0100
I've finally applied the patch.
http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=commit;h=9face836

I made a few comment tweaks and added
some dependencies for the heap module.

I also removed the xmemcoll0() calls
which are separate to this concurrent functionality.
I will add those back in Chen's name after updating to
the latest gnulib.

Thanks everyone for their work on this!

Pádraig.




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6600; Package coreutils. (Tue, 13 Jul 2010 09:15:02 GMT) Full text and rfc822 format available.

Message #22 received at 6600 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: 6600 <at> debbugs.gnu.org
Cc: Gene Auyeung <quaker4lyf <at> gmail.com>, Paul Eggert <eggert <at> CS.UCLA.EDU>,
	Chen Guo <chenguo4 <at> yahoo.com>
Subject: Re: bug#6600: [PATCH] sort: add --threads option
	to	parallelize	internal sort.
Date: Tue, 13 Jul 2010 10:13:40 +0100
Here's a small cleanup I missed.
Alternatively one could make heap() return NULL rather than aborting,
but since it already used xmalloc, I'm tending to this..

commit cec33eb226df63f406f7eb70cd46d960ee02a060
Author: Pádraig Brady <P <at> draigBrady.com>
Date:   Tue Jul 13 08:23:52 2010 +0100

    maint: heap.c: simplify heap_alloc

    * gl/lib/heap.c (heap_alloc): Use the fact that the xalloc
    routines will not return NULL.  Also remove the redundant
    temporary variables.

diff --git a/gl/lib/heap.c b/gl/lib/heap.c
index a37224f..f148434 100644
--- a/gl/lib/heap.c
+++ b/gl/lib/heap.c
@@ -36,22 +36,12 @@ static void heapify_up (void **, size_t,
 struct heap *
 heap_alloc (int (*compare)(const void *, const void *), size_t n_reserve)
 {
-  struct heap *heap;
-  void *xmalloc_ret = xmalloc (sizeof *heap);
-  heap = (struct heap *) xmalloc_ret;
-  if (!heap)
-    return NULL;
+  struct heap *heap = xmalloc (sizeof *heap);

-  if (n_reserve <= 0)
+  if (n_reserve == 0)
     n_reserve = 1;

-  xmalloc_ret = xmalloc (n_reserve * sizeof *(heap->array));
-  heap->array = (void **) xmalloc_ret;
-  if (!heap->array)
-    {
-      free (heap);
-      return NULL;
-    }
+  heap->array = xmalloc (n_reserve * sizeof *(heap->array));

   heap->array[0] = NULL;
   heap->capacity = n_reserve;
@@ -84,8 +74,7 @@ heap_insert (struct heap *heap, void *item)
   if (heap->capacity - 1 <= heap->count)
     {
       size_t new_size = (2 + heap->count) * sizeof *(heap->array);
-      void *realloc_ret = xrealloc (heap->array, new_size);
-      heap->array = (void **) realloc_ret;
+      heap->array = xrealloc (heap->array, new_size);





Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6600; Package coreutils. (Tue, 13 Jul 2010 14:36:02 GMT) Full text and rfc822 format available.

Message #25 received at 6600 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Pádraig Brady <P <at> draigBrady.com>
Cc: Gene Auyeung <quaker4lyf <at> gmail.com>, 6600 <at> debbugs.gnu.org,
	Paul Eggert <eggert <at> CS.UCLA.EDU>, Chen Guo <chenguo4 <at> yahoo.com>
Subject: Re: bug#6600: [PATCH] sort: add --threads option
	to	parallelize	internal sort.
Date: Tue, 13 Jul 2010 16:35:24 +0200
Pádraig Brady wrote:
> Here's a small cleanup I missed.
> Alternatively one could make heap() return NULL rather than aborting,
> but since it already used xmalloc, I'm tending to this..
>
> commit cec33eb226df63f406f7eb70cd46d960ee02a060
> Author: Pádraig Brady <P <at> draigBrady.com>
> Date:   Tue Jul 13 08:23:52 2010 +0100
>
>     maint: heap.c: simplify heap_alloc
>
>     * gl/lib/heap.c (heap_alloc): Use the fact that the xalloc
>     routines will not return NULL.  Also remove the redundant
>     temporary variables.
>
> diff --git a/gl/lib/heap.c b/gl/lib/heap.c
> index a37224f..f148434 100644
> --- a/gl/lib/heap.c
> +++ b/gl/lib/heap.c
> @@ -36,22 +36,12 @@ static void heapify_up (void **, size_t,
>  struct heap *
>  heap_alloc (int (*compare)(const void *, const void *), size_t n_reserve)
>  {
> -  struct heap *heap;
> -  void *xmalloc_ret = xmalloc (sizeof *heap);
> -  heap = (struct heap *) xmalloc_ret;
> -  if (!heap)
> -    return NULL;
> +  struct heap *heap = xmalloc (sizeof *heap);
>
> -  if (n_reserve <= 0)
> +  if (n_reserve == 0)
>      n_reserve = 1;
>
> -  xmalloc_ret = xmalloc (n_reserve * sizeof *(heap->array));
> -  heap->array = (void **) xmalloc_ret;
> -  if (!heap->array)
> -    {
> -      free (heap);
> -      return NULL;
> -    }
> +  heap->array = xmalloc (n_reserve * sizeof *(heap->array));
>
>    heap->array[0] = NULL;
>    heap->capacity = n_reserve;
> @@ -84,8 +74,7 @@ heap_insert (struct heap *heap, void *item)
>    if (heap->capacity - 1 <= heap->count)
>      {
>        size_t new_size = (2 + heap->count) * sizeof *(heap->array);
> -      void *realloc_ret = xrealloc (heap->array, new_size);
> -      heap->array = (void **) realloc_ret;
> +      heap->array = xrealloc (heap->array, new_size);

Thanks.
That looks good.  Please push.

I noticed that heap_insert's reallocation was awkward and inefficient.
Using x2nrealloc rather than xrealloc makes the code
cleaner as well as more efficient in the face of a growing
heap, and also handles integer overflow.

diff --git a/gl/lib/heap.c b/gl/lib/heap.c
index a37224f..12a7767 100644
--- a/gl/lib/heap.c
+++ b/gl/lib/heap.c
@@ -82,15 +82,8 @@ int
 heap_insert (struct heap *heap, void *item)
 {
   if (heap->capacity - 1 <= heap->count)
-    {
-      size_t new_size = (2 + heap->count) * sizeof *(heap->array);
-      void *realloc_ret = xrealloc (heap->array, new_size);
-      heap->array = (void **) realloc_ret;
-      heap->capacity = (2 + heap->count);
-
-      if (!heap->array)
-        return -1;
-    }
+    heap->array = x2nrealloc (heap->array, &heap->capacity,
+                              sizeof *(heap->array));

   heap->array[++heap->count] = item;
   heapify_up (heap->array, heap->count, heap->compare);




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6600; Package coreutils. (Tue, 13 Jul 2010 15:11:02 GMT) Full text and rfc822 format available.

Message #28 received at 6600 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Jim Meyering <jim <at> meyering.net>
Cc: Gene Auyeung <quaker4lyf <at> gmail.com>, 6600 <at> debbugs.gnu.org,
	Paul Eggert <eggert <at> CS.UCLA.EDU>, Chen Guo <chenguo4 <at> yahoo.com>
Subject: Re: bug#6600: [PATCH] sort: add --threads option
	to	parallelize	internal sort.
Date: Tue, 13 Jul 2010 16:09:08 +0100
On 13/07/10 15:35, Jim Meyering wrote:
> I noticed that heap_insert's reallocation was awkward and inefficient.
> Using x2nrealloc rather than xrealloc makes the code
> cleaner as well as more efficient in the face of a growing
> heap, and also handles integer overflow.
> 
> diff --git a/gl/lib/heap.c b/gl/lib/heap.c
> index a37224f..12a7767 100644
> --- a/gl/lib/heap.c
> +++ b/gl/lib/heap.c
> @@ -82,15 +82,8 @@ int
>  heap_insert (struct heap *heap, void *item)
>  {
>    if (heap->capacity - 1 <= heap->count)
> -    {
> -      size_t new_size = (2 + heap->count) * sizeof *(heap->array);
> -      void *realloc_ret = xrealloc (heap->array, new_size);
> -      heap->array = (void **) realloc_ret;
> -      heap->capacity = (2 + heap->count);
> -
> -      if (!heap->array)
> -        return -1;
> -    }
> +    heap->array = x2nrealloc (heap->array, &heap->capacity,
> +                              sizeof *(heap->array));
> 
>    heap->array[++heap->count] = item;
>    heapify_up (heap->array, heap->count, heap->compare);

Much cleaner and increases with n *= 1.5 rather than n += 2
Testing here shows no change in performance.

Do you want me to roll that into my patch?

cheers,
Pádraig.




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6600; Package coreutils. (Tue, 13 Jul 2010 15:17:02 GMT) Full text and rfc822 format available.

Message #31 received at 6600 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Pádraig Brady <P <at> draigBrady.com>
Cc: Gene Auyeung <quaker4lyf <at> gmail.com>, 6600 <at> debbugs.gnu.org,
	Paul Eggert <eggert <at> CS.UCLA.EDU>, Chen Guo <chenguo4 <at> yahoo.com>
Subject: Re: bug#6600: [PATCH] sort: add --threads option
	to	parallelize	internal sort.
Date: Tue, 13 Jul 2010 17:16:20 +0200
Pádraig Brady wrote:
> On 13/07/10 15:35, Jim Meyering wrote:
>> I noticed that heap_insert's reallocation was awkward and inefficient.
>> Using x2nrealloc rather than xrealloc makes the code
>> cleaner as well as more efficient in the face of a growing
>> heap, and also handles integer overflow.
>>
>> diff --git a/gl/lib/heap.c b/gl/lib/heap.c
>> index a37224f..12a7767 100644
>> --- a/gl/lib/heap.c
>> +++ b/gl/lib/heap.c
>> @@ -82,15 +82,8 @@ int
>>  heap_insert (struct heap *heap, void *item)
>>  {
>>    if (heap->capacity - 1 <= heap->count)
>> -    {
>> -      size_t new_size = (2 + heap->count) * sizeof *(heap->array);
>> -      void *realloc_ret = xrealloc (heap->array, new_size);
>> -      heap->array = (void **) realloc_ret;
>> -      heap->capacity = (2 + heap->count);
>> -
>> -      if (!heap->array)
>> -        return -1;
>> -    }
>> +    heap->array = x2nrealloc (heap->array, &heap->capacity,
>> +                              sizeof *(heap->array));
>>
>>    heap->array[++heap->count] = item;
>>    heapify_up (heap->array, heap->count, heap->compare);
>
> Much cleaner and increases with n *= 1.5 rather than n += 2
> Testing here shows no change in performance.

Thanks for the perf. testing.

> Do you want me to roll that into my patch?

Sure, thanks.




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6600; Package coreutils. (Wed, 14 Jul 2010 03:15:02 GMT) Full text and rfc822 format available.

Message #34 received at 6600 <at> debbugs.gnu.org (full text, mbox):

From: Chen Guo <chen.guo.0625 <at> gmail.com>
To: Jim Meyering <jim <at> meyering.net>
Cc: Gene Auyeung <quaker4lyf <at> gmail.com>, 6600 <at> debbugs.gnu.org,
	Paul Eggert <eggert <at> cs.ucla.edu>,
	Pádraig Brady <P <at> draigbrady.com>
Subject: Re: bug#6600: [PATCH] sort: add --threads option to parallelize 
	internal sort.
Date: Tue, 13 Jul 2010 20:14:35 -0700
Thanks a lot for all the hard work reviewing and revising this, Pádraig.




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6600; Package coreutils. (Thu, 15 Jul 2010 00:09:01 GMT) Full text and rfc822 format available.

Message #37 received at 6600 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: 6600 <at> debbugs.gnu.org
Subject: Re: bug#6600: [PATCH] sort: add --threads option
	to	parallelize	internal sort.
Date: Thu, 15 Jul 2010 01:07:16 +0100
On 13/07/10 01:59, Pádraig Brady wrote:
> I've finally applied the patch.
> http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=commit;h=9face836
> 
> I made a few comment tweaks and added
> some dependencies for the heap module.
> 
> I also removed the xmemcoll0() calls
> which are separate to this concurrent functionality.
> I will add those back in Chen's name after updating to
> the latest gnulib.
> 
> Thanks everyone for their work on this!
> 
> Pádraig.
> 
> 
> 
> 

Here's the xmemcoll0 follow up:

From: Chen Guo <chenguo4 <at> yahoo.com>
Date: Wed, 14 Jul 2010 07:41:05 +0100
Subject: [PATCH] sort: speed up default full line sorting

Don't write NUL after the comparison buffers on each compare,
which increases performance by about 3% for short lines
on a pentium-m with gcc-4.4.1

* src/sort.c: (fillbuf): Delimit input items with NUL.
(write_bytes): Restore the item delimiter char which was
replaced with NUL in fillbuf().
---
 src/sort.c |   18 +++++++++++++++---
 1 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/src/sort.c b/src/sort.c
index 5ea1b34..45cb78f 100644
--- a/src/sort.c
+++ b/src/sort.c
@@ -1743,13 +1743,17 @@ fillbuf (struct buffer *buf, FILE *fp, char const *file)
                   if (buf->buf == ptrlim)
                     return false;
                   if (ptrlim[-1] != eol)
-                    *ptrlim++ = eol;
+                    *ptrlim++ = '\0';
                 }
             }

           /* Find and record each line in the just-read input.  */
           while ((p = memchr (ptr, eol, ptrlim - ptr)))
             {
+              /* Delimit the line with NUL. This eliminates the need to
+                 temporarily replace the last byte with NUL when calling
+                 xmemcoll(), which increases performance.  */
+              *p = '\0';
               ptr = p + 1;
               line--;
               line->text = line_start;
@@ -2642,7 +2646,13 @@ compare (const struct line *a, const struct line *b, bool show_debug)
   else if (blen == 0)
     diff = 1;
   else if (hard_LC_COLLATE)
-    diff = xmemcoll (a->text, alen, b->text, blen);
+    {
+      /* Note xmemcoll0 is a performance enhancement as
+         it will not unconditionally write '\0' after the
+         passed in buffers, which was seen to give around
+         a 3% increase in performance for short lines.  */
+      diff = xmemcoll0 (a->text, alen + 1, b->text, blen + 1);
+    }
   else if (! (diff = memcmp (a->text, b->text, MIN (alen, blen))))
     diff = alen < blen ? -1 : alen != blen;

@@ -2652,9 +2662,11 @@ compare (const struct line *a, const struct line *b, bool show_debug)
 static void
 write_bytes (struct line const *line, FILE *fp, char const *output_file)
 {
-  char const *buf = line->text;
+  char *buf = line->text;
   size_t n_bytes = line->length;

+  *(buf + n_bytes - 1) = eolchar;
+
   /* Convert TABs to '>' and \0 to \n when -z specified.  */
   if (debug && fp == stdout)
     {
-- 
1.6.2.5





Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6600; Package coreutils. (Thu, 15 Jul 2010 11:13:02 GMT) Full text and rfc822 format available.

Message #40 received at 6600 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: 6600 <at> debbugs.gnu.org, Ludovic Courtès
 <ludo <at> gnu.org>
Subject: Re: bug#6600: [PATCH] sort: add --threads
	option	to	parallelize	internal sort.
Date: Thu, 15 Jul 2010 12:11:23 +0100
On 15/07/10 01:07, Pádraig Brady wrote:
> On 13/07/10 01:59, Pádraig Brady wrote:
>> I've finally applied the patch.
>> http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=commit;h=9face836
>>
>> I made a few comment tweaks and added
>> some dependencies for the heap module.
>>
>> I also removed the xmemcoll0() calls
>> which are separate to this concurrent functionality.
>> I will add those back in Chen's name after updating to
>> the latest gnulib.
>>
>> Thanks everyone for their work on this!
>>
>> Pádraig.
> 
> Here's the xmemcoll0 follow up:

And a follow up fix to that which
fixes 2 test failures noticed on our integration server:
http://hydra.nixos.org/build/486508

commit aadc67dfdb47f28bb8d1fa5e0fe0f52e2a8c51bf
Author: Pádraig Brady <P <at> draigBrady.com>
Date:   Thu Jul 15 12:06:04 2010 +0100

    sort: fix a bug when sorting unterminated lines

    * src/sort.c (fillbuf): The previous commit incorrectly
    terminated the buffer when the last line of input
    didn't contain a terminating character.

diff --git a/src/sort.c b/src/sort.c
index 45cb78f..7d31878 100644
--- a/src/sort.c
+++ b/src/sort.c
@@ -1743,7 +1743,7 @@ fillbuf (struct buffer *buf, FILE *fp, char const *file)
                   if (buf->buf == ptrlim)
                     return false;
                   if (ptrlim[-1] != eol)
-                    *ptrlim++ = '\0';
+                    *ptrlim++ = eol;
                 }
             }





Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6600; Package coreutils. (Thu, 15 Jul 2010 14:09:02 GMT) Full text and rfc822 format available.

Message #43 received at 6600 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Pádraig Brady <P <at> draigBrady.com>
Cc: 6600 <at> debbugs.gnu.org, Ludovic Courtès <ludo <at> gnu.org>
Subject: Re: bug#6600: [PATCH] sort: add --threads option to parallelize
	internal sort.
Date: Thu, 15 Jul 2010 16:08:29 +0200
Pádraig Brady wrote:

> On 15/07/10 01:07, Pádraig Brady wrote:
>> On 13/07/10 01:59, Pádraig Brady wrote:
>>> I've finally applied the patch.
>>> http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=commit;h=9face836
>>>
>>> I made a few comment tweaks and added
>>> some dependencies for the heap module.
>>>
>>> I also removed the xmemcoll0() calls
>>> which are separate to this concurrent functionality.
>>> I will add those back in Chen's name after updating to
>>> the latest gnulib.
>>>
>>> Thanks everyone for their work on this!
>>>
>>> Pádraig.
>>
>> Here's the xmemcoll0 follow up:
>
> And a follow up fix to that which
> fixes 2 test failures noticed on our integration server:
> http://hydra.nixos.org/build/486508
>
> commit aadc67dfdb47f28bb8d1fa5e0fe0f52e2a8c51bf
> Author: Pádraig Brady <P <at> draigBrady.com>
> Date:   Thu Jul 15 12:06:04 2010 +0100
>
>     sort: fix a bug when sorting unterminated lines
>
>     * src/sort.c (fillbuf): The previous commit incorrectly
>     terminated the buffer when the last line of input
>     didn't contain a terminating character.
>
> diff --git a/src/sort.c b/src/sort.c
> index 45cb78f..7d31878 100644
> --- a/src/sort.c
> +++ b/src/sort.c
> @@ -1743,7 +1743,7 @@ fillbuf (struct buffer *buf, FILE *fp, char const *file)
>                    if (buf->buf == ptrlim)
>                      return false;
>                    if (ptrlim[-1] != eol)
> -                    *ptrlim++ = '\0';
> +                    *ptrlim++ = eol;

Thanks.
FYI, here's a simple demo of that bug:

    $ printf a |src/sort
    src/sort: memory exhausted




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6600; Package coreutils. (Wed, 21 Jul 2010 08:14:02 GMT) Full text and rfc822 format available.

Message #46 received at 6600 <at> debbugs.gnu.org (full text, mbox):

From: Chen Guo <chen.guo.0625 <at> gmail.com>
To: 6600 <at> debbugs.gnu.org
Subject: Re: bug#6600: [PATCH] sort: add --threads option to parallelize 
	internal sort.
Date: Wed, 21 Jul 2010 01:13:50 -0700
Hi all,
    So the gcc compile farm just got a 24 core donation
from AMD (2x12 at 1.5 GHz), so I couldn't help but run
some results.
    As a way of isolating sort time versus I/O time, I installed
a timer output, such that sort prints to stderr the elapsed
time during sort()'s call to sortlines().
    There are other people using the machine, so the
timings wont be the most accurate, but I figure you guys
may still be interested. On 24 threads, the pure sort runs
in about 15% of the single threaded time, while with I/O
that figure's 17.5%.

$ for i in 1 2 4 8 12 16 24; do echo "T=$i: ";  /usr/bin/time -f "%e"
sort --parallel=$i 1M > /dev/null; echo ""; done
T=1:
Sort without I/O: 11.074463
11.38

T=2:
Sort without I/O: 5.672202
5.96

T=4:
Sort without I/O: 3.453562
3.71

T=8:
Sort without I/O: 2.351003
2.65

T=12:
Sort without I/O: 2.100730
2.39

T=16:
Sort without I/O: 1.821079
2.09

T=24:
Sort without I/O: 1.674106
1.99




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6600; Package coreutils. (Wed, 21 Jul 2010 08:59:02 GMT) Full text and rfc822 format available.

Message #49 received at 6600 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Chen Guo <chen.guo.0625 <at> gmail.com>
Cc: 6600 <at> debbugs.gnu.org
Subject: Re: bug#6600: [PATCH] sort: add --threads option to
	parallelize	internal sort.
Date: Wed, 21 Jul 2010 09:58:12 +0100
[Message part 1 (text/plain, inline)]
On 21/07/10 09:13, Chen Guo wrote:
> Hi all,
>     So the gcc compile farm just got a 24 core donation
> from AMD (2x12 at 1.5 GHz), so I couldn't help but run
> some results.
>     As a way of isolating sort time versus I/O time, I installed
> a timer output, such that sort prints to stderr the elapsed
> time during sort()'s call to sortlines().
>     There are other people using the machine, so the
> timings wont be the most accurate, but I figure you guys
> may still be interested. On 24 threads, the pure sort runs
> in about 15% of the single threaded time, while with I/O
> that figure's 17.5%.
> 
> $ for i in 1 2 4 8 12 16 24; do echo "T=$i: ";  /usr/bin/time -f "%e"
> sort --parallel=$i 1M > /dev/null; echo ""; done

Cool! I had intended to do this at the weekend on the
gcc niagra (32 processor) machine but my DSL died.
I also intended to plot the cumulative CPU used,
and also with using mutexes rather than spinlocks.

Reformatting your results as:

1 11.074463 11.38
2 5.672202 5.96
4 3.453562 3.71
8 2.351003 2.65
12 2.100730 2.39
16 1.821079 2.09
24 1.674106 1.99

And using this gnuplot script:

set term pngcairo font 'Sans,10' size 640,480
set output 'sort-amd-24way.png'
set title "Multicore sort time on 12x2 AMD"
set ylabel "time (seconds)"
set xlabel "processors"
set xtics (2,4,6,8,12,16,24,32)
set xrange [0:32]
plot "amd" using 1:2 with lines title "no I/O", \
     "amd" using 1:3 with lines title "with I/O"

Gives the attached plot.

cheers,
Pádraig.
[sort-amd-24way.png (image/png, attachment)]

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Wed, 18 Aug 2010 11:24:03 GMT) Full text and rfc822 format available.

This bug report was last modified 14 years and 313 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.