From unknown Tue Aug 19 22:00:58 2025 X-Loop: help-debbugs@gnu.org Subject: bug#7597: multi-threaded sort can segfault (unrelated to the sort -u segfault) Resent-From: Jim Meyering Original-Sender: debbugs-submit-bounces@debbugs.gnu.org Resent-To: owner@debbugs.gnu.org Resent-CC: bug-coreutils@gnu.org Resent-Date: Thu, 09 Dec 2010 12:11:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: report 7597 X-GNU-PR-Package: coreutils X-GNU-PR-Keywords: To: Chen Guo Cc: eggert@cs.ucla.edu, 7597@debbugs.gnu.org, dj@linuxfromscratch.org, coreutils@gnu.org X-Debbugs-Original-Cc: Paul Eggert , bug-coreutils@gnu.org, DJ Lucas , coreutils@gnu.org Received: via spool by submit@debbugs.gnu.org id=B.12918966307050 (code B ref -1); Thu, 09 Dec 2010 12:11:01 +0000 Received: (at submit) by debbugs.gnu.org; 9 Dec 2010 12:10:30 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PQfKP-0001pe-TT for submit@debbugs.gnu.org; Thu, 09 Dec 2010 07:10:30 -0500 Received: from eggs.gnu.org ([140.186.70.92]) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PQfKM-0001pO-KQ for submit@debbugs.gnu.org; Thu, 09 Dec 2010 07:10:28 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1PQfPs-0007Ui-QL for submit@debbugs.gnu.org; Thu, 09 Dec 2010 07:16:26 -0500 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00,RCVD_IN_DNSWL_MED, T_RP_MATCHES_RCVD autolearn=unavailable version=3.3.1 Received: from lists.gnu.org ([199.232.76.165]:45766) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1PQfPs-0007UT-K8 for submit@debbugs.gnu.org; Thu, 09 Dec 2010 07:16:08 -0500 Received: from [140.186.70.92] (port=47014 helo=eggs.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1PQeiU-0002pv-1t for bug-coreutils@gnu.org; Thu, 09 Dec 2010 06:31:19 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1PQeiS-0004ic-Hp for bug-coreutils@gnu.org; Thu, 09 Dec 2010 06:31:17 -0500 Received: from mx.meyering.net ([82.230.74.64]:33214) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1PQeiS-0004iP-6K; Thu, 09 Dec 2010 06:31:16 -0500 Received: by rho.meyering.net (Acme Bit-Twister, from userid 1000) id A2AA4600C7; Thu, 9 Dec 2010 12:31:14 +0100 (CET) From: Jim Meyering References: <4CEFF613.8060200@linuxfromscratch.org> <4CF07288.60902@draigBrady.com> <4CF0C15C.9080201@cs.ucla.edu> <4CF0D1BE.6070908@cs.ucla.edu> <4CF1A911.1000000@cs.ucla.edu> <4CF1BBF5.9080103@linuxfromscratch.org> <4CF352CD.7010206@linuxfromscratch.org> <4CF3FC08.3020705@cs.ucla.edu> <87r5dzc0m6.fsf@meyering.net> <4CF95CE2.8090602@cs.ucla.edu> <4CFC8A47.9000006@cs.ucla.edu> <877hflu8om.fsf@meyering.net> Date: Thu, 09 Dec 2010 12:31:14 +0100 Message-ID: <8762v39o8t.fsf_-_@meyering.net> Lines: 158 MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 3) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 2) X-Spam-Score: -5.7 (-----) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -5.7 (-----) [I've retitled and Cc'd bug-coreutils so this gets its own bug number] Chen Guo wrote: > On Tue, Dec 7, 2010 at 3:24 AM, Jim Meyering wrote: >> Thanks. =C2=A0What does your input file look like? >> I've been unable to reproduce the failure using the output of >> seq 1000000. =C2=A0I've tried a few different -S ... options, in case >> the amount of available memory is a factor: >> >> =C2=A0seq 1000000 > in-1M >> =C2=A0for i in $(seq 1000); do \ >> =C2=A0 =C2=A0printf '%03d\r' $i; src/sort --parallel=3D2 -S 1M in-1M > /= dev/null; done > > The input file I used was generated with gensort > (http://www.ordinal.com/gensort.html) > I used the -a 1000000 to generate a 1 million line ASCII file. Running > sort 10 times on 2 or 4 threads almost always triggered at least 1 > segfault. Thank you. With that, I've solved at least part of the problem. The segfault (and other strangeness we've witnessed) arises because each "node" struct is stored on the stack, and its address ends up being used by another thread after the thread that owns the stack in question has been "joined". My solution is to use the heap instead of the stack. However, for today I'm out of time and I have not yet found a way to free these newly-malloc'd "node" buffers. To test this, I've done the following: gensort -a 10000 > gensort-10k for i in $(seq 2000); do printf '% 4d\n' $i; valgrind --quiet src/sort -S 1= 00K \ --parallel=3D2 gensort-10k > k; test $(wc -c < k) =3D 1000000 || break; d= one for i in $(seq 2000); do printf '% 4d\n' $i; src/sort -S 100K \ --parallel=3D2 gensort-10k > j; test $(wc -c < j) =3D 1000000 || break; d= one Without the patch, the first would show errors for more than 50% of the runs and the second would rarely get to i=3D100 without generating a core file. With the patch, both complete error-free (not counting leaks). I'll add tests and a NEWS item, eventually. >From e3a0cb08ebcb7ad6638d022418488c1085838fc3 Mon Sep 17 00:00:00 2001 From: Jim Meyering Date: Thu, 9 Dec 2010 12:12:53 +0100 Subject: [PATCH] sort: avoid segfault when using two or more threads * src/sort.c (sortlines, sort): Store each "node" structure on the heap, not on the stack. Otherwise, a node from one thread's stack could be used in another thread after the first thread had expired (via pthread_join). --- src/sort.c | 56 ++++++++++++++++++++++++++++---------------------------- 1 files changed, 28 insertions(+), 28 deletions(-) diff --git a/src/sort.c b/src/sort.c index 742971e..9a7ecd4 100644 --- a/src/sort.c +++ b/src/sort.c @@ -3471,28 +3471,28 @@ sortlines (struct line *restrict lines, struct line= *restrict dest, struct line *hi =3D lo - nlo; struct line **parent_end =3D lo_child ? &parent->end_lo : &parent->end_h= i; - struct merge_node node; - node.lo =3D node.end_lo =3D lo; - node.hi =3D node.end_hi =3D hi; - node.dest =3D parent_end; - node.nlo =3D nlo; - node.nhi =3D nhi; - node.parent =3D parent; - node.level =3D parent->level + 1; - node.queued =3D false; - pthread_mutex_init (&node.lock, NULL); + struct merge_node *node =3D xmalloc (sizeof *node); + node->lo =3D node->end_lo =3D lo; + node->hi =3D node->end_hi =3D hi; + node->dest =3D parent_end; + node->nlo =3D nlo; + node->nhi =3D nhi; + node->parent =3D parent; + node->level =3D parent->level + 1; + node->queued =3D false; + pthread_mutex_init (&node->lock, NULL); /* Calculate thread arguments. */ unsigned long int lo_threads =3D nthreads / 2; unsigned long int hi_threads =3D nthreads - lo_threads; pthread_t thread; - struct thread_args args =3D {lines, lo, lo_threads, total_lines, &node, + struct thread_args args =3D {lines, lo, lo_threads, total_lines, node, true, queue, tfp, temp_output}; if (nthreads > 1 && SUBTHREAD_LINES_HEURISTIC <=3D nlines && pthread_create (&thread, NULL, sortlines_thread, &args) =3D=3D 0) { - sortlines (lines - nlo, hi, hi_threads, total_lines, &node, false, + sortlines (lines - nlo, hi, hi_threads, total_lines, node, false, queue, tfp, temp_output); pthread_join (thread, NULL); } @@ -3507,16 +3507,16 @@ sortlines (struct line *restrict lines, struct line= *restrict dest, sequential_sort (lines, nlo, temp, false); /* Update merge NODE. No need to lock yet. */ - node.lo =3D lines; - node.hi =3D lines - nlo; - node.end_lo =3D lines - nlo; - node.end_hi =3D lines - nlo - nhi; + node->lo =3D lines; + node->hi =3D lines - nlo; + node->end_lo =3D lines - nlo; + node->end_hi =3D lines - nlo - nhi; - queue_insert (queue, &node); + queue_insert (queue, node); merge_loop (queue, total_lines, tfp, temp_output); } - pthread_mutex_destroy (&node.lock); + pthread_mutex_destroy (&node->lock); } /* Scan through FILES[NTEMPS .. NFILES-1] looking for a file that is @@ -3793,19 +3793,19 @@ sort (char *const *files, size_t nfiles, char const= *output_file, struct merge_node_queue queue; queue_init (&queue, 2 * nthreads); - struct merge_node node; - node.lo =3D node.hi =3D node.end_lo =3D node.end_hi =3D NULL; - node.dest =3D NULL; - node.nlo =3D node.nhi =3D buf.nlines; - node.parent =3D NULL; - node.level =3D MERGE_END; - node.queued =3D false; - pthread_mutex_init (&node.lock, NULL); + struct merge_node *node =3D xmalloc (sizeof *node); + node->lo =3D node->hi =3D node->end_lo =3D node->end_hi =3D = NULL; + node->dest =3D NULL; + node->nlo =3D node->nhi =3D buf.nlines; + node->parent =3D NULL; + node->level =3D MERGE_END; + node->queued =3D false; + pthread_mutex_init (&node->lock, NULL); - sortlines (line, line, nthreads, buf.nlines, &node, true, + sortlines (line, line, nthreads, buf.nlines, node, true, &queue, tfp, temp_output); queue_destroy (&queue); - pthread_mutex_destroy (&node.lock); + pthread_mutex_destroy (&node->lock); } else write_unique (line - 1, tfp, temp_output); -- 1.7.3.3 From unknown Tue Aug 19 22:00:58 2025 X-Loop: help-debbugs@gnu.org Subject: bug#7597: [coreutils] multi-threaded sort can segfault (unrelated to the sort -u segfault) Resent-From: Jim Meyering Original-Sender: debbugs-submit-bounces@debbugs.gnu.org Resent-To: owner@debbugs.gnu.org Resent-CC: bug-coreutils@gnu.org Resent-Date: Thu, 09 Dec 2010 21:28:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 7597 X-GNU-PR-Package: coreutils X-GNU-PR-Keywords: To: Chen Guo Cc: Paul Eggert , bug-coreutils@gnu.org, DJ Lucas , coreutils@gnu.org Received: via spool by submit@debbugs.gnu.org id=B.129193006929954 (code B ref -1); Thu, 09 Dec 2010 21:28:02 +0000 Received: (at submit) by debbugs.gnu.org; 9 Dec 2010 21:27:49 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PQo1k-0007n3-MI for submit@debbugs.gnu.org; Thu, 09 Dec 2010 16:27:49 -0500 Received: from eggs.gnu.org ([140.186.70.92]) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PQo1i-0007mm-Lj for submit@debbugs.gnu.org; Thu, 09 Dec 2010 16:27:47 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1PQo7W-0004lu-SE for submit@debbugs.gnu.org; Thu, 09 Dec 2010 16:33:47 -0500 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00,RCVD_IN_DNSWL_MED, T_RP_MATCHES_RCVD autolearn=unavailable version=3.3.1 Received: from lists.gnu.org ([199.232.76.165]:53483) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1PQo7W-0004ll-Q5 for submit@debbugs.gnu.org; Thu, 09 Dec 2010 16:33:46 -0500 Received: from [140.186.70.92] (port=53684 helo=eggs.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1PQo7V-0006Cv-Hg for bug-coreutils@gnu.org; Thu, 09 Dec 2010 16:33:46 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1PQo7U-0004lK-M6 for bug-coreutils@gnu.org; Thu, 09 Dec 2010 16:33:45 -0500 Received: from mx.meyering.net ([82.230.74.64]:33212) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1PQo7U-0004kz-GP; Thu, 09 Dec 2010 16:33:44 -0500 Received: by rho.meyering.net (Acme Bit-Twister, from userid 1000) id F1E37600D8; Thu, 9 Dec 2010 22:33:42 +0100 (CET) From: Jim Meyering In-Reply-To: <8762v39o8t.fsf_-_@meyering.net> (Jim Meyering's message of "Thu, 09 Dec 2010 12:31:14 +0100") References: <4CEFF613.8060200@linuxfromscratch.org> <4CF07288.60902@draigBrady.com> <4CF0C15C.9080201@cs.ucla.edu> <4CF0D1BE.6070908@cs.ucla.edu> <4CF1A911.1000000@cs.ucla.edu> <4CF1BBF5.9080103@linuxfromscratch.org> <4CF352CD.7010206@linuxfromscratch.org> <4CF3FC08.3020705@cs.ucla.edu> <87r5dzc0m6.fsf@meyering.net> <4CF95CE2.8090602@cs.ucla.edu> <4CFC8A47.9000006@cs.ucla.edu> <877hflu8om.fsf@meyering.net> <8762v39o8t.fsf_-_@meyering.net> Date: Thu, 09 Dec 2010 22:33:42 +0100 Message-ID: <87r5dq7hs9.fsf@meyering.net> Lines: 28 MIME-Version: 1.0 Content-Type: text/plain X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 3) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 2) X-Spam-Score: -5.7 (-----) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -5.7 (-----) Jim Meyering wrote: ... > With that, I've solved at least part of the problem. > The segfault (and other strangeness we've witnessed) > arises because each "node" struct is stored on the stack, > and its address ends up being used by another thread after > the thread that owns the stack in question has been "joined". > > My solution is to use the heap instead of the stack. > However, for today I'm out of time and I have not yet found a > way to free these newly-malloc'd "node" buffers. > > To test this, I've done the following: > > gensort -a 10000 > gensort-10k > for i in $(seq 2000); do printf '% 4d\n' $i; valgrind --quiet src/sort -S 100K \ > --parallel=2 gensort-10k > k; test $(wc -c < k) = 1000000 || break; done > for i in $(seq 2000); do printf '% 4d\n' $i; src/sort -S 100K \ > --parallel=2 gensort-10k > j; test $(wc -c < j) = 1000000 || break; done > > Without the patch, the first would show errors for more than 50% of > the runs and the second would rarely get to i=100 without generating > a core file. With the patch, both complete error-free (not counting > leaks). FYI, while preparing a test, I've found that the latter test (without valgrind) passes 2000/2000 tests when compiled with -g -O2, yet fails in at least 10 of the 2000 when compiled with -ggdb3. From unknown Tue Aug 19 22:00:58 2025 X-Loop: help-debbugs@gnu.org Subject: bug#7597: multi-threaded sort can segfault (unrelated to the sort -u segfault) Resent-From: Paul Eggert Original-Sender: debbugs-submit-bounces@debbugs.gnu.org Resent-To: owner@debbugs.gnu.org Resent-CC: bug-coreutils@gnu.org Resent-Date: Fri, 10 Dec 2010 04:01:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 7597 X-GNU-PR-Package: coreutils X-GNU-PR-Keywords: To: Jim Meyering Cc: Chen Guo , bug-coreutils@gnu.org, DJ Lucas , coreutils@gnu.org Received: via spool by submit@debbugs.gnu.org id=B.12919536461437 (code B ref -1); Fri, 10 Dec 2010 04:01:02 +0000 Received: (at submit) by debbugs.gnu.org; 10 Dec 2010 04:00:46 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PQuA2-0000N8-6C for submit@debbugs.gnu.org; Thu, 09 Dec 2010 23:00:46 -0500 Received: from eggs.gnu.org ([140.186.70.92]) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PQuA0-0000Mw-5v for submit@debbugs.gnu.org; Thu, 09 Dec 2010 23:00:45 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1PQuFn-0002Ce-UT for submit@debbugs.gnu.org; Thu, 09 Dec 2010 23:06:46 -0500 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,T_RP_MATCHES_RCVD autolearn=unavailable version=3.3.1 Received: from lists.gnu.org ([199.232.76.165]:35595) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1PQuFn-0002CV-Rn for submit@debbugs.gnu.org; Thu, 09 Dec 2010 23:06:43 -0500 Received: from [140.186.70.92] (port=44486 helo=eggs.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1PQnJO-0002To-Nk for bug-coreutils@gnu.org; Thu, 09 Dec 2010 15:42:25 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1PQlCJ-0005xE-D1 for bug-coreutils@gnu.org; Thu, 09 Dec 2010 13:26:32 -0500 Received: from smtp.cs.ucla.edu ([131.179.128.62]:54688) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1PQlCJ-0005x7-4C; Thu, 09 Dec 2010 13:26:31 -0500 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id BCA3339E80F5; Thu, 9 Dec 2010 10:26:29 -0800 (PST) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id Ic1JcBkdULRq; Thu, 9 Dec 2010 10:26:29 -0800 (PST) Received: from [131.179.64.200] (Penguin.CS.UCLA.EDU [131.179.64.200]) by smtp.cs.ucla.edu (Postfix) with ESMTPSA id 2D8E239E80F2; Thu, 9 Dec 2010 10:26:29 -0800 (PST) Message-ID: <4D011F54.3050403@cs.ucla.edu> Date: Thu, 09 Dec 2010 10:26:28 -0800 From: Paul Eggert Organization: UCLA Computer Science Department User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.15) Gecko/20101027 Thunderbird/3.0.10 MIME-Version: 1.0 References: <4CEFF613.8060200@linuxfromscratch.org> <4CF07288.60902@draigBrady.com> <4CF0C15C.9080201@cs.ucla.edu> <4CF0D1BE.6070908@cs.ucla.edu> <4CF1A911.1000000@cs.ucla.edu> <4CF1BBF5.9080103@linuxfromscratch.org> <4CF352CD.7010206@linuxfromscratch.org> <4CF3FC08.3020705@cs.ucla.edu> <87r5dzc0m6.fsf@meyering.net> <4CF95CE2.8090602@cs.ucla.edu> <4CFC8A47.9000006@cs.ucla.edu> <877hflu8om.fsf@meyering.net> <8762v39o8t.fsf_-_@meyering.net> In-Reply-To: <8762v39o8t.fsf_-_@meyering.net> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 3) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 2) X-Spam-Score: -5.0 (-----) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -5.0 (-----) On 12/09/10 03:31, Jim Meyering wrote: > The segfault (and other strangeness we've witnessed) > arises because each "node" struct is stored on the stack, > and its address ends up being used by another thread after > the thread that owns the stack in question has been "joined". Ah, of *course*! > My solution is to use the heap instead of the stack. This can be simplified by allocating all the nodes at once, since the number of nodes is bounded above by the number of threads. I'll take a look at this, if nobody else gets to it first. I am also still looking at the problem with the compression/decompression hang. The code to do subprocesses is busted in more than one way: it does not always catch failures (nonzero exit status) in decompression, and it can falsely report failure even when compression and decompression are both perfectly OK. However, I still don't have a handle on the actual bug that causes the hang. From unknown Tue Aug 19 22:00:58 2025 X-Loop: help-debbugs@gnu.org Subject: bug#7597: multi-threaded sort can segfault (unrelated to the sort -u segfault) Resent-From: Chen Guo Original-Sender: debbugs-submit-bounces@debbugs.gnu.org Resent-To: owner@debbugs.gnu.org Resent-CC: bug-coreutils@gnu.org Resent-Date: Fri, 10 Dec 2010 21:18:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 7597 X-GNU-PR-Package: coreutils X-GNU-PR-Keywords: To: Paul Eggert Cc: coreutils@gnu.org, bug-coreutils@gnu.org, Jim Meyering , DJ Lucas Received: via spool by submit@debbugs.gnu.org id=B.12920158406684 (code B ref -1); Fri, 10 Dec 2010 21:18:02 +0000 Received: (at submit) by debbugs.gnu.org; 10 Dec 2010 21:17:20 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PRAL8-0001jk-Pr for submit@debbugs.gnu.org; Fri, 10 Dec 2010 16:17:19 -0500 Received: from eggs.gnu.org ([140.186.70.92]) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PRAL6-0001jY-HC for submit@debbugs.gnu.org; Fri, 10 Dec 2010 16:17:17 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1PRAQx-0001XH-A9 for submit@debbugs.gnu.org; Fri, 10 Dec 2010 16:23:20 -0500 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=-1.0 required=5.0 tests=BAYES_00, FREEMAIL_ENVFROM_END_DIGIT, FREEMAIL_FROM, RCVD_IN_DNSWL_LOW, T_DKIM_INVALID autolearn=no version=3.3.1 Received: from lists.gnu.org ([199.232.76.165]:54981) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1PRAQx-0001XB-1h for submit@debbugs.gnu.org; Fri, 10 Dec 2010 16:23:19 -0500 Received: from [140.186.70.92] (port=54829 helo=eggs.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1PRAQv-0005zE-8r for bug-coreutils@gnu.org; Fri, 10 Dec 2010 16:23:18 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1PRAQt-0001WJ-Rp for bug-coreutils@gnu.org; Fri, 10 Dec 2010 16:23:17 -0500 Received: from mail-wy0-f169.google.com ([74.125.82.169]:62165) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1PRAQq-0001Us-3T; Fri, 10 Dec 2010 16:23:12 -0500 Received: by wyj26 with SMTP id 26so4121436wyj.0 for ; Fri, 10 Dec 2010 13:23:10 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:received:in-reply-to :references:date:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=732nHKLSQFe0AybcYcgzFor8aNV2HLipp39hH3pLN10=; b=YiLZKPrh02mqz55jtjiPMFhtimVewaw/7VsyrnahfzbqLH931aD1iWQmIkjpvWo7hd ktlrnpjSwFnmmVrjOC0/sW19QaDoxh+X2Cmr34RE8nibQVkwd2C9lXvSlchD4IydMHiT to05C0uOVYfU3gqh5p5GItmEYv2kKF8TeWWBc= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; b=lX7Od9kQBP9JE/M0no5Tey3sBKyCcYYrhIC9CnwJG8OL6QqVYMv9xFrqlRx+U1olV6 IN9GvKwJlom3DrjK3G1GcFm4fFt92nlSJq9QSJBEun17zzF2G850x+vz0701Qgqwd1Zt QbYQhYosYB9SpH3QX8axG0MlbtP4CkAuwrmas= MIME-Version: 1.0 Received: by 10.216.168.78 with SMTP id j56mr1706910wel.45.1292016190479; Fri, 10 Dec 2010 13:23:10 -0800 (PST) Received: by 10.216.240.196 with HTTP; Fri, 10 Dec 2010 13:23:10 -0800 (PST) In-Reply-To: <4D011F54.3050403@cs.ucla.edu> References: <4CEFF613.8060200@linuxfromscratch.org> <4CF07288.60902@draigBrady.com> <4CF0C15C.9080201@cs.ucla.edu> <4CF0D1BE.6070908@cs.ucla.edu> <4CF1A911.1000000@cs.ucla.edu> <4CF1BBF5.9080103@linuxfromscratch.org> <4CF352CD.7010206@linuxfromscratch.org> <4CF3FC08.3020705@cs.ucla.edu> <87r5dzc0m6.fsf@meyering.net> <4CF95CE2.8090602@cs.ucla.edu> <4CFC8A47.9000006@cs.ucla.edu> <877hflu8om.fsf@meyering.net> <8762v39o8t.fsf_-_@meyering.net> <4D011F54.3050403@cs.ucla.edu> Date: Fri, 10 Dec 2010 13:23:10 -0800 Message-ID: From: Chen Guo Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 2) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 2) X-Spam-Score: -4.7 (----) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -4.8 (----) Hi Professor Eggert, On Thu, Dec 9, 2010 at 10:26 AM, Paul Eggert wrote: > This can be simplified by allocating all the nodes at once, > since the number of nodes is bounded above by the number > of threads. =A0I'll take a look at this, if nobody else gets > to it first. Got to it first :-) This approach has the side benefit of letting me refactor that whole node creation block to its own function, and thus cleaning up sortlines quite a bit. Over 60 runs each on 2 and 4 threads, I am no longer seeing any segfaults. >From 837b4ea56ba32220c5c09f21a8ab59e7c71824a9 Mon Sep 17 00:00:00 2001 From: Chen Guo Date: Fri, 10 Dec 2010 13:13:36 -0800 Subject: [PATCH] sort: preallocate merge tree nodes to heap. * src/sort.c: (merge_tree_init) New function. Allocates memory for merge tree nodes. (merge_tree_destory) New function. (init_node) New function. (sortlines) Refactor node creation code to init_node. Remove now superfluous arguments. All callers changed. (sort) Initialize/destory merge tree. Refactor root node creation to merge_tree_init. --- src/sort.c | 170 +++++++++++++++++++++++++++++++++++++++-----------------= ---- 1 files changed, 111 insertions(+), 59 deletions(-) diff --git a/src/sort.c b/src/sort.c index 9cfc814..165d8e8 100644 --- a/src/sort.c +++ b/src/sort.c @@ -239,6 +239,8 @@ struct merge_node size_t nlo; /* Total Lines remaining from LO. */ size_t nhi; /* Total lines remaining from HI. */ struct merge_node *parent; /* Parent node. */ + struct merge_node *lo_child; /* LO child node. */ + struct merge_node *hi_child; /* HI child node. */ unsigned int level; /* Level in merge tree. */ bool queued; /* Node is already in heap. */ pthread_mutex_t lock; /* Lock for node operations. */ @@ -3137,6 +3139,83 @@ sequential_sort (struct line *restrict lines, size_t nlines, } } +struct merge_node * init_node (struct merge_node *, struct merge_node *, + struct line *restrict, unsigned long int, + size_t, bool); + + +/* Initialize the merge tree. */ +static struct merge_node * +merge_tree_init (unsigned long int nthreads, size_t nlines, + struct line *restrict dest) +{ + struct merge_node *merge_tree =3D xmalloc (2 * nthreads * sizeof *merge_= tree); + + struct merge_node *root_node =3D merge_tree; + root_node->lo =3D root_node->hi =3D root_node->end_lo =3D root_node->end= _hi =3D NULL; + root_node->dest =3D NULL; + root_node->nlo =3D root_node->nhi =3D nlines; + root_node->parent =3D NULL; + root_node->level =3D MERGE_END; + root_node->queued =3D false; + pthread_mutex_init (&root_node->lock, NULL); + + init_node (root_node, root_node + 1, dest, nthreads, nlines, false); + return merge_tree; +} + +/* Destroy the merge tree. */ +static void +merge_tree_destroy (struct merge_node *merge_tree) +{ + free (merge_tree); +} + +/* Initialize a merge tree node. */ + +struct merge_node * +init_node (struct merge_node *parent, struct merge_node *node_pool, + struct line *restrict dest, unsigned long int nthreads, + size_t total_lines, bool lo_child) +{ + size_t nlines =3D (lo_child)? parent->nlo : parent->nhi; + size_t nlo =3D nlines / 2; + size_t nhi =3D nlines - nlo; + struct line *lo =3D dest - total_lines; + struct line *hi =3D lo - nlo; + struct line **parent_end =3D (lo_child)? &parent->end_lo : &parent->end_= hi; + + struct merge_node *node =3D node_pool++; + node->lo =3D node->end_lo =3D lo; + node->hi =3D node->end_hi =3D hi; + node->dest =3D parent_end; + node->nlo =3D nlo; + node->nhi =3D nhi; + node->parent =3D parent; + node->level =3D parent->level + 1; + node->queued =3D false; + pthread_mutex_init (&node->lock, NULL); + + if (nthreads > 1) + { + unsigned long int lo_threads =3D nthreads / 2; + unsigned long int hi_threads =3D nthreads - lo_threads; + node->lo_child =3D node_pool; + node_pool =3D init_node (node, node_pool, lo, lo_threads, + total_lines, true); + node->hi_child =3D node_pool; + node_pool =3D init_node (node, node_pool, hi, hi_threads, + total_lines, false); + } + else + { + node->lo_child =3D NULL; + node->hi_child =3D NULL; + } + return node_pool; +} + + /* Compare two merge nodes A and B for priority. */ static int @@ -3378,10 +3457,8 @@ merge_loop (struct merge_node_queue *queue, } -static void sortlines (struct line *restrict, struct line *restrict, - unsigned long int, size_t, - struct merge_node *, bool, - struct merge_node_queue *, +static void sortlines (struct line *restrict, unsigned long int, size_t, + struct merge_node *, bool, struct merge_node_queue = *, FILE *, char const *); /* Thread arguments for sortlines_thread. */ @@ -3392,19 +3469,15 @@ struct thread_args the end of the array. */ struct line *lines; - /* Destination, i.e., the array of lines to store result into. This - also points just past the end of the array. */ - struct line *dest; - /* Number of threads to use. If 0 or 1, sort single-threaded. */ unsigned long int nthreads; /* Number of lines in LINES and DEST. */ size_t const total_lines; - /* Parent node; it will merge this node's output to the output - of this node's sibling. */ - struct merge_node *const parent; + /* Merge node. Lines from this node and this node's sibling will merged + to this node's parent. */ + struct merge_node *const node; /* True if this node is sorting the lower half of the parent's work. */ bool lo_child; @@ -3425,9 +3498,9 @@ static void * sortlines_thread (void *data) { struct thread_args const *args =3D data; - sortlines (args->lines, args->dest, args->nthreads, args->total_lines, - args->parent, args->lo_child, args->queue, - args->tfp, args->output_temp); + sortlines (args->lines, args->nthreads, args->total_lines, + args->node, args->lo_child, args->queue, args->tfp, + args->output_temp); return NULL; } @@ -3456,49 +3529,32 @@ sortlines_thread (void *data) have been merged. */ static void -sortlines (struct line *restrict lines, struct line *restrict dest, - unsigned long int nthreads, size_t total_lines, - struct merge_node *parent, bool lo_child, - struct merge_node_queue *queue, - FILE *tfp, char const *temp_output) +sortlines (struct line *restrict lines, unsigned long int nthreads, + size_t total_lines, struct merge_node *node, bool lo_child, + struct merge_node_queue *queue, FILE *tfp, char const *temp_out= put) { - /* Create merge tree NODE. */ - size_t nlines =3D (lo_child)? parent->nlo : parent->nhi; - size_t nlo =3D nlines / 2; - size_t nhi =3D nlines - nlo; - struct line *lo =3D dest - total_lines; - struct line *hi =3D lo - nlo; - struct line **parent_end =3D (lo_child)? &parent->end_lo : &parent->end_= hi; - - struct merge_node node; - node.lo =3D node.end_lo =3D lo; - node.hi =3D node.end_hi =3D hi; - node.dest =3D parent_end; - node.nlo =3D nlo; - node.nhi =3D nhi; - node.parent =3D parent; - node.level =3D parent->level + 1; - node.queued =3D false; - pthread_mutex_init (&node.lock, NULL); + size_t nlines =3D (lo_child)? node->parent->nlo : node->parent->nhi; /* Calculate thread arguments. */ unsigned long int lo_threads =3D nthreads / 2; unsigned long int hi_threads =3D nthreads - lo_threads; pthread_t thread; - struct thread_args args =3D {lines, lo, lo_threads, total_lines, &node, - true, queue, tfp, temp_output}; + struct thread_args args =3D {lines, lo_threads, total_lines, + node->lo_child, true, queue, tfp, temp_output= }; if (nthreads > 1 && SUBTHREAD_LINES_HEURISTIC <=3D nlines && pthread_create (&thread, NULL, sortlines_thread, &args) =3D=3D 0) { - sortlines (lines - nlo, hi, hi_threads, total_lines, &node, false, - queue, tfp, temp_output); + sortlines (lines - node->nlo, hi_threads, total_lines, + node->hi_child, false, queue, tfp, temp_output); pthread_join (thread, NULL); } else { /* Nthreads =3D 1, this is a leaf NODE, or pthread_create failed. Sort with 1 thread. */ + size_t nlo =3D node->nlo; + size_t nhi =3D node->nhi; struct line *temp =3D lines - total_lines; if (1 < nhi) sequential_sort (lines - nlo, nhi, temp - nlo / 2, false); @@ -3506,16 +3562,16 @@ sortlines (struct line *restrict lines, struct line *restrict dest, sequential_sort (lines, nlo, temp, false); /* Update merge NODE. No need to lock yet. */ - node.lo =3D lines; - node.hi =3D lines - nlo; - node.end_lo =3D lines - nlo; - node.end_hi =3D lines - nlo - nhi; + node->lo =3D lines; + node->hi =3D lines - nlo; + node->end_lo =3D lines - nlo; + node->end_hi =3D lines - nlo - nhi; - queue_insert (queue, &node); + queue_insert (queue, node); merge_loop (queue, total_lines, tfp, temp_output); } - pthread_mutex_destroy (&node.lock); + pthread_mutex_destroy (&node->lock); } /* Scan through FILES[NTEMPS .. NFILES-1] looking for a file that is @@ -3791,20 +3847,16 @@ sort (char * const *files, size_t nfiles, char const *output_file, { struct merge_node_queue queue; queue_init (&queue, 2 * nthreads); + struct merge_node *merge_tree =3D + merge_tree_init (nthreads, buf.nlines, line); + struct merge_node *end_node =3D merge_tree; + struct merge_node *root_node =3D merge_tree + 1; - struct merge_node node; - node.lo =3D node.hi =3D node.end_lo =3D node.end_hi =3D NULL= ; - node.dest =3D NULL; - node.nlo =3D node.nhi =3D buf.nlines; - node.parent =3D NULL; - node.level =3D MERGE_END; - node.queued =3D false; - pthread_mutex_init (&node.lock, NULL); - - sortlines (line, line, nthreads, buf.nlines, &node, true, - &queue, tfp, temp_output); + sortlines (line, nthreads, buf.nlines, root_node, + true, &queue, tfp, temp_output); queue_destroy (&queue); - pthread_mutex_destroy (&node.lock); + pthread_mutex_destroy (&root_node->lock); + merge_tree_destroy (merge_tree); } else write_unique (line - 1, tfp, temp_output); --=20 1.7.1 From unknown Tue Aug 19 22:00:58 2025 X-Loop: help-debbugs@gnu.org Subject: bug#7597: multi-threaded sort can segfault (unrelated to the sort -u segfault) Resent-From: Chen Guo Original-Sender: debbugs-submit-bounces@debbugs.gnu.org Resent-To: owner@debbugs.gnu.org Resent-CC: bug-coreutils@gnu.org Resent-Date: Fri, 10 Dec 2010 21:27:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 7597 X-GNU-PR-Package: coreutils X-GNU-PR-Keywords: To: Paul Eggert Cc: coreutils@gnu.org, bug-coreutils@gnu.org, Jim Meyering , DJ Lucas Received: via spool by submit@debbugs.gnu.org id=B.12920163977535 (code B ref -1); Fri, 10 Dec 2010 21:27:01 +0000 Received: (at submit) by debbugs.gnu.org; 10 Dec 2010 21:26:37 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PRAU8-0001xU-Ou for submit@debbugs.gnu.org; Fri, 10 Dec 2010 16:26:37 -0500 Received: from eggs.gnu.org ([140.186.70.92]) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PRAU5-0001xH-HW for submit@debbugs.gnu.org; Fri, 10 Dec 2010 16:26:34 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1PRAZv-0004bS-Js for submit@debbugs.gnu.org; Fri, 10 Dec 2010 16:32:37 -0500 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=-1.0 required=5.0 tests=BAYES_00, FREEMAIL_ENVFROM_END_DIGIT, FREEMAIL_FROM, RCVD_IN_DNSWL_LOW, T_DKIM_INVALID autolearn=no version=3.3.1 Received: from lists.gnu.org ([199.232.76.165]:58870) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1PRAZv-0004b3-1b for submit@debbugs.gnu.org; Fri, 10 Dec 2010 16:32:35 -0500 Received: from [140.186.70.92] (port=53792 helo=eggs.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1PRAZt-0000vC-7I for bug-coreutils@gnu.org; Fri, 10 Dec 2010 16:32:34 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1PRAZr-0004a2-M9 for bug-coreutils@gnu.org; Fri, 10 Dec 2010 16:32:33 -0500 Received: from mail-wy0-f169.google.com ([74.125.82.169]:34862) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1PRAZn-0004ZC-NN; Fri, 10 Dec 2010 16:32:27 -0500 Received: by wyj26 with SMTP id 26so4130452wyj.0 for ; Fri, 10 Dec 2010 13:32:26 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:received:in-reply-to :references:date:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=LZDYGblaaqBQRVBqDrr8h+M914eM3G4ZSjy5eVDeofs=; b=btDtDPEUcHbDhKIZJgaq14n3fNWzmX4GTaw6Z6PIFmFBvl3BsE4lq71867aQ1MzHT3 FlTh2a5fwqVOWVKwEqV9BJHdCSHRddq3FeUKVMYkk3FAj+uVlvcJYnZ6kBiLWNTcKSBT zyfsvdxBmozPYYyDoASEie7x7MUTYz5WUDBqg= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; b=IzVWOfguEgBg9fbCQ7FfQ2wdU7dUV3SsT+PIqJJ87BRwChFebxfcw5ztr3Y/3/Kqkk f5TNciYuA91jcPTEk0ICl6t0n2COqhBMDzBTUs3+ZLBNgmUWW1vXMjU6JahOcFmNg0W3 jLuRYZrbJhyKT8EwD8nD7Pwi5KjinWGctU+HQ= MIME-Version: 1.0 Received: by 10.216.7.8 with SMTP id 8mr170578weo.30.1292016746807; Fri, 10 Dec 2010 13:32:26 -0800 (PST) Received: by 10.216.240.196 with HTTP; Fri, 10 Dec 2010 13:32:26 -0800 (PST) In-Reply-To: References: <4CEFF613.8060200@linuxfromscratch.org> <4CF07288.60902@draigBrady.com> <4CF0C15C.9080201@cs.ucla.edu> <4CF0D1BE.6070908@cs.ucla.edu> <4CF1A911.1000000@cs.ucla.edu> <4CF1BBF5.9080103@linuxfromscratch.org> <4CF352CD.7010206@linuxfromscratch.org> <4CF3FC08.3020705@cs.ucla.edu> <87r5dzc0m6.fsf@meyering.net> <4CF95CE2.8090602@cs.ucla.edu> <4CFC8A47.9000006@cs.ucla.edu> <877hflu8om.fsf@meyering.net> <8762v39o8t.fsf_-_@meyering.net> <4D011F54.3050403@cs.ucla.edu> Date: Fri, 10 Dec 2010 13:32:26 -0800 Message-ID: From: Chen Guo Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 2) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 2) X-Spam-Score: -4.8 (----) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -4.9 (----) > + =A0size_t nlines =3D (lo_child)? node->parent->nlo : node->parent->nhi; Small change... Remove unnecessary branch and redirection in sortlines: size_t nlines =3D node->nlo + node->nhi; >From c691813ecbfce60c207960fd8bd327cc7d99fe39 Mon Sep 17 00:00:00 2001 From: Chen Guo Date: Fri, 10 Dec 2010 13:13:36 -0800 Subject: [PATCH] sort: preallocate merge tree nodes to heap. * src/sort.c: (merge_tree_init) New function. Allocates memory for merge tree nodes. (merge_tree_destory) New function. (init_node) New function. (sortlines) Refactor node creation code to init_node. Remove now superfluous arguments. All callers changed. (sort) Initialize/destory merge tree. Refactor root node creation to merge_tree_init. --- src/sort.c | 170 +++++++++++++++++++++++++++++++++++++++-----------------= ---- 1 files changed, 111 insertions(+), 59 deletions(-) diff --git a/src/sort.c b/src/sort.c index 9cfc814..019b065 100644 --- a/src/sort.c +++ b/src/sort.c @@ -239,6 +239,8 @@ struct merge_node size_t nlo; /* Total Lines remaining from LO. */ size_t nhi; /* Total lines remaining from HI. */ struct merge_node *parent; /* Parent node. */ + struct merge_node *lo_child; /* LO child node. */ + struct merge_node *hi_child; /* HI child node. */ unsigned int level; /* Level in merge tree. */ bool queued; /* Node is already in heap. */ pthread_mutex_t lock; /* Lock for node operations. */ @@ -3137,6 +3139,83 @@ sequential_sort (struct line *restrict lines, size_t nlines, } } +struct merge_node * init_node (struct merge_node *, struct merge_node *, + struct line *restrict, unsigned long int, + size_t, bool); + + +/* Initialize the merge tree. */ +static struct merge_node * +merge_tree_init (unsigned long int nthreads, size_t nlines, + struct line *restrict dest) +{ + struct merge_node *merge_tree =3D xmalloc (2 * nthreads * sizeof *merge_= tree); + + struct merge_node *root_node =3D merge_tree; + root_node->lo =3D root_node->hi =3D root_node->end_lo =3D root_node->end= _hi =3D NULL; + root_node->dest =3D NULL; + root_node->nlo =3D root_node->nhi =3D nlines; + root_node->parent =3D NULL; + root_node->level =3D MERGE_END; + root_node->queued =3D false; + pthread_mutex_init (&root_node->lock, NULL); + + init_node (root_node, root_node + 1, dest, nthreads, nlines, false); + return merge_tree; +} + +/* Destroy the merge tree. */ +static void +merge_tree_destroy (struct merge_node *merge_tree) +{ + free (merge_tree); +} + +/* Initialize a merge tree node. */ + +struct merge_node * +init_node (struct merge_node *parent, struct merge_node *node_pool, + struct line *restrict dest, unsigned long int nthreads, + size_t total_lines, bool lo_child) +{ + size_t nlines =3D (lo_child)? parent->nlo : parent->nhi; + size_t nlo =3D nlines / 2; + size_t nhi =3D nlines - nlo; + struct line *lo =3D dest - total_lines; + struct line *hi =3D lo - nlo; + struct line **parent_end =3D (lo_child)? &parent->end_lo : &parent->end_= hi; + + struct merge_node *node =3D node_pool++; + node->lo =3D node->end_lo =3D lo; + node->hi =3D node->end_hi =3D hi; + node->dest =3D parent_end; + node->nlo =3D nlo; + node->nhi =3D nhi; + node->parent =3D parent; + node->level =3D parent->level + 1; + node->queued =3D false; + pthread_mutex_init (&node->lock, NULL); + + if (nthreads > 1) + { + unsigned long int lo_threads =3D nthreads / 2; + unsigned long int hi_threads =3D nthreads - lo_threads; + node->lo_child =3D node_pool; + node_pool =3D init_node (node, node_pool, lo, lo_threads, + total_lines, true); + node->hi_child =3D node_pool; + node_pool =3D init_node (node, node_pool, hi, hi_threads, + total_lines, false); + } + else + { + node->lo_child =3D NULL; + node->hi_child =3D NULL; + } + return node_pool; +} + + /* Compare two merge nodes A and B for priority. */ static int @@ -3378,10 +3457,8 @@ merge_loop (struct merge_node_queue *queue, } -static void sortlines (struct line *restrict, struct line *restrict, - unsigned long int, size_t, - struct merge_node *, bool, - struct merge_node_queue *, +static void sortlines (struct line *restrict, unsigned long int, size_t, + struct merge_node *, bool, struct merge_node_queue = *, FILE *, char const *); /* Thread arguments for sortlines_thread. */ @@ -3392,19 +3469,15 @@ struct thread_args the end of the array. */ struct line *lines; - /* Destination, i.e., the array of lines to store result into. This - also points just past the end of the array. */ - struct line *dest; - /* Number of threads to use. If 0 or 1, sort single-threaded. */ unsigned long int nthreads; /* Number of lines in LINES and DEST. */ size_t const total_lines; - /* Parent node; it will merge this node's output to the output - of this node's sibling. */ - struct merge_node *const parent; + /* Merge node. Lines from this node and this node's sibling will merged + to this node's parent. */ + struct merge_node *const node; /* True if this node is sorting the lower half of the parent's work. */ bool lo_child; @@ -3425,9 +3498,9 @@ static void * sortlines_thread (void *data) { struct thread_args const *args =3D data; - sortlines (args->lines, args->dest, args->nthreads, args->total_lines, - args->parent, args->lo_child, args->queue, - args->tfp, args->output_temp); + sortlines (args->lines, args->nthreads, args->total_lines, + args->node, args->lo_child, args->queue, args->tfp, + args->output_temp); return NULL; } @@ -3456,49 +3529,32 @@ sortlines_thread (void *data) have been merged. */ static void -sortlines (struct line *restrict lines, struct line *restrict dest, - unsigned long int nthreads, size_t total_lines, - struct merge_node *parent, bool lo_child, - struct merge_node_queue *queue, - FILE *tfp, char const *temp_output) +sortlines (struct line *restrict lines, unsigned long int nthreads, + size_t total_lines, struct merge_node *node, bool lo_child, + struct merge_node_queue *queue, FILE *tfp, char const *temp_out= put) { - /* Create merge tree NODE. */ - size_t nlines =3D (lo_child)? parent->nlo : parent->nhi; - size_t nlo =3D nlines / 2; - size_t nhi =3D nlines - nlo; - struct line *lo =3D dest - total_lines; - struct line *hi =3D lo - nlo; - struct line **parent_end =3D (lo_child)? &parent->end_lo : &parent->end_= hi; - - struct merge_node node; - node.lo =3D node.end_lo =3D lo; - node.hi =3D node.end_hi =3D hi; - node.dest =3D parent_end; - node.nlo =3D nlo; - node.nhi =3D nhi; - node.parent =3D parent; - node.level =3D parent->level + 1; - node.queued =3D false; - pthread_mutex_init (&node.lock, NULL); + size_t nlines =3D node->nlo + node->nhi; /* Calculate thread arguments. */ unsigned long int lo_threads =3D nthreads / 2; unsigned long int hi_threads =3D nthreads - lo_threads; pthread_t thread; - struct thread_args args =3D {lines, lo, lo_threads, total_lines, &node, - true, queue, tfp, temp_output}; + struct thread_args args =3D {lines, lo_threads, total_lines, + node->lo_child, true, queue, tfp, temp_output= }; if (nthreads > 1 && SUBTHREAD_LINES_HEURISTIC <=3D nlines && pthread_create (&thread, NULL, sortlines_thread, &args) =3D=3D 0) { - sortlines (lines - nlo, hi, hi_threads, total_lines, &node, false, - queue, tfp, temp_output); + sortlines (lines - node->nlo, hi_threads, total_lines, + node->hi_child, false, queue, tfp, temp_output); pthread_join (thread, NULL); } else { /* Nthreads =3D 1, this is a leaf NODE, or pthread_create failed. Sort with 1 thread. */ + size_t nlo =3D node->nlo; + size_t nhi =3D node->nhi; struct line *temp =3D lines - total_lines; if (1 < nhi) sequential_sort (lines - nlo, nhi, temp - nlo / 2, false); @@ -3506,16 +3562,16 @@ sortlines (struct line *restrict lines, struct line *restrict dest, sequential_sort (lines, nlo, temp, false); /* Update merge NODE. No need to lock yet. */ - node.lo =3D lines; - node.hi =3D lines - nlo; - node.end_lo =3D lines - nlo; - node.end_hi =3D lines - nlo - nhi; + node->lo =3D lines; + node->hi =3D lines - nlo; + node->end_lo =3D lines - nlo; + node->end_hi =3D lines - nlo - nhi; - queue_insert (queue, &node); + queue_insert (queue, node); merge_loop (queue, total_lines, tfp, temp_output); } - pthread_mutex_destroy (&node.lock); + pthread_mutex_destroy (&node->lock); } /* Scan through FILES[NTEMPS .. NFILES-1] looking for a file that is @@ -3791,20 +3847,16 @@ sort (char * const *files, size_t nfiles, char const *output_file, { struct merge_node_queue queue; queue_init (&queue, 2 * nthreads); + struct merge_node *merge_tree =3D + merge_tree_init (nthreads, buf.nlines, line); + struct merge_node *end_node =3D merge_tree; + struct merge_node *root_node =3D merge_tree + 1; - struct merge_node node; - node.lo =3D node.hi =3D node.end_lo =3D node.end_hi =3D NULL= ; - node.dest =3D NULL; - node.nlo =3D node.nhi =3D buf.nlines; - node.parent =3D NULL; - node.level =3D MERGE_END; - node.queued =3D false; - pthread_mutex_init (&node.lock, NULL); - - sortlines (line, line, nthreads, buf.nlines, &node, true, - &queue, tfp, temp_output); + sortlines (line, nthreads, buf.nlines, root_node, + true, &queue, tfp, temp_output); queue_destroy (&queue); - pthread_mutex_destroy (&node.lock); + pthread_mutex_destroy (&root_node->lock); + merge_tree_destroy (merge_tree); } else write_unique (line - 1, tfp, temp_output); --=20 1.7.1 From unknown Tue Aug 19 22:00:58 2025 X-Loop: help-debbugs@gnu.org Subject: bug#7597: multi-threaded sort can segfault (unrelated to the sort -u segfault) Resent-From: Paul Eggert Original-Sender: debbugs-submit-bounces@debbugs.gnu.org Resent-To: owner@debbugs.gnu.org Resent-CC: bug-coreutils@gnu.org Resent-Date: Sat, 11 Dec 2010 08:30:03 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 7597 X-GNU-PR-Package: coreutils X-GNU-PR-Keywords: To: Chen Guo Cc: coreutils@gnu.org, bug-coreutils@gnu.org, Jim Meyering , DJ Lucas Received: via spool by submit@debbugs.gnu.org id=B.1292056173910 (code B ref -1); Sat, 11 Dec 2010 08:30:03 +0000 Received: (at submit) by debbugs.gnu.org; 11 Dec 2010 08:29:33 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PRKpf-0000Ec-W1 for submit@debbugs.gnu.org; Sat, 11 Dec 2010 03:29:33 -0500 Received: from eggs.gnu.org ([140.186.70.92]) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PRKpc-0000EP-JS for submit@debbugs.gnu.org; Sat, 11 Dec 2010 03:29:30 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1PRKvT-00081k-2s for submit@debbugs.gnu.org; Sat, 11 Dec 2010 03:35:33 -0500 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,T_RP_MATCHES_RCVD autolearn=unavailable version=3.3.1 Received: from lists.gnu.org ([199.232.76.165]:41537) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1PRKvS-00081d-Rs for submit@debbugs.gnu.org; Sat, 11 Dec 2010 03:35:31 -0500 Received: from [140.186.70.92] (port=50085 helo=eggs.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1PRKvQ-00025D-8m for bug-coreutils@gnu.org; Sat, 11 Dec 2010 03:35:30 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1PRKvN-000802-L9 for bug-coreutils@gnu.org; Sat, 11 Dec 2010 03:35:28 -0500 Received: from smtp.cs.ucla.edu ([131.179.128.62]:39441) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1PRKvN-0007zL-08; Sat, 11 Dec 2010 03:35:25 -0500 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id AA6AD39E8109; Sat, 11 Dec 2010 00:35:22 -0800 (PST) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id cEuYX+N3cHWt; Sat, 11 Dec 2010 00:35:20 -0800 (PST) Received: from [192.168.1.10] (pool-71-189-109-235.lsanca.fios.verizon.net [71.189.109.235]) by smtp.cs.ucla.edu (Postfix) with ESMTPSA id 8734A39E80F7; Sat, 11 Dec 2010 00:35:20 -0800 (PST) Message-ID: <4D0337C8.1050005@cs.ucla.edu> Date: Sat, 11 Dec 2010 00:35:20 -0800 From: Paul Eggert Organization: UCLA Computer Science Department User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.13) Gecko/20101208 Thunderbird/3.1.7 MIME-Version: 1.0 References: <4CEFF613.8060200@linuxfromscratch.org> <4CF07288.60902@draigBrady.com> <4CF0C15C.9080201@cs.ucla.edu> <4CF0D1BE.6070908@cs.ucla.edu> <4CF1A911.1000000@cs.ucla.edu> <4CF1BBF5.9080103@linuxfromscratch.org> <4CF352CD.7010206@linuxfromscratch.org> <4CF3FC08.3020705@cs.ucla.edu> <87r5dzc0m6.fsf@meyering.net> <4CF95CE2.8090602@cs.ucla.edu> <4CFC8A47.9000006@cs.ucla.edu> <877hflu8om.fsf@meyering.net> <8762v39o8t.fsf_-_@meyering.net> <4D011F54.3050403@cs.ucla.edu> In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 3) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 2) X-Spam-Score: -4.6 (----) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -4.6 (----) Thanks, Chen, that was much nicer than what I was writing. I did some minor cleanups, mostly to do with catching an unlikely integer overflow that would have made 'sort' crash badly, and pushed the following set of patches. >From 780831a8602d9cdc22e7dcf44804e9c7183dd944 Mon Sep 17 00:00:00 2001 From: Chen Guo Date: Mon, 6 Dec 2010 00:15:42 -0800 Subject: [PATCH 1/4] sort: use mutexes, not spinlocks (avoid busy loop on blocked output) Running a command like this on a multi-core system sort < big-file | less would peg all processors at near 100% utilization. * src/sort.c: (struct merge_node) Change member lock to mutex. All uses changed. * tests/Makefile.am (XFAIL_TESTS): Remove definition, now that this test passes once again. I.e., the sort-spinlock-abuse test no longer fails. * NEWS (Bug reports): Mention this. Reported by DJ Lucas in http://debbugs.gnu.org/7489. --- NEWS | 5 +++++ src/sort.c | 14 +++++++------- tests/Makefile.am | 3 --- 3 files changed, 12 insertions(+), 10 deletions(-) diff --git a/NEWS b/NEWS index c3110a3..9f55cbb 100644 --- a/NEWS +++ b/NEWS @@ -13,6 +13,11 @@ GNU coreutils NEWS -*- outline -*- sort -u with at least two threads could attempt to read through a corrupted pointer. [bug introduced in coreutils-8.6] + sort with at least two threads and with blocked output would busy-loop + (spinlock) all threads, often using 100% of available CPU cycles to + do no work. I.e., "sort < big-file | less" could waste a lot of power. + [bug introduced in coreutils-8.6] + ** New features split accepts the --number option to generate a specific number of files. diff --git a/src/sort.c b/src/sort.c index a4c2cbf..9cfc814 100644 --- a/src/sort.c +++ b/src/sort.c @@ -241,7 +241,7 @@ struct merge_node struct merge_node *parent; /* Parent node. */ unsigned int level; /* Level in merge tree. */ bool queued; /* Node is already in heap. */ - pthread_spinlock_t lock; /* Lock for node operations. */ + pthread_mutex_t lock; /* Lock for node operations. */ }; /* Priority queue of merge nodes. */ @@ -3157,7 +3157,7 @@ compare_nodes (void const *a, void const *b) static inline void lock_node (struct merge_node *node) { - pthread_spin_lock (&node->lock); + pthread_mutex_lock (&node->lock); } /* Unlock a merge tree NODE. */ @@ -3165,7 +3165,7 @@ lock_node (struct merge_node *node) static inline void unlock_node (struct merge_node *node) { - pthread_spin_unlock (&node->lock); + pthread_mutex_unlock (&node->lock); } /* Destroy merge QUEUE. */ @@ -3479,7 +3479,7 @@ sortlines (struct line *restrict lines, struct line *restrict dest, node.parent = parent; node.level = parent->level + 1; node.queued = false; - pthread_spin_init (&node.lock, PTHREAD_PROCESS_PRIVATE); + pthread_mutex_init (&node.lock, NULL); /* Calculate thread arguments. */ unsigned long int lo_threads = nthreads / 2; @@ -3515,7 +3515,7 @@ sortlines (struct line *restrict lines, struct line *restrict dest, merge_loop (queue, total_lines, tfp, temp_output); } - pthread_spin_destroy (&node.lock); + pthread_mutex_destroy (&node.lock); } /* Scan through FILES[NTEMPS .. NFILES-1] looking for a file that is @@ -3799,12 +3799,12 @@ sort (char * const *files, size_t nfiles, char const *output_file, node.parent = NULL; node.level = MERGE_END; node.queued = false; - pthread_spin_init (&node.lock, PTHREAD_PROCESS_PRIVATE); + pthread_mutex_init (&node.lock, NULL); sortlines (line, line, nthreads, buf.nlines, &node, true, &queue, tfp, temp_output); queue_destroy (&queue); - pthread_spin_destroy (&node.lock); + pthread_mutex_destroy (&node.lock); } else write_unique (line - 1, tfp, temp_output); diff --git a/tests/Makefile.am b/tests/Makefile.am index d52f677..b573061 100644 --- a/tests/Makefile.am +++ b/tests/Makefile.am @@ -656,7 +656,4 @@ pr_data = \ pr/ttb3-FF \ pr/w72l24f-ll -XFAIL_TESTS = \ - misc/sort-spinlock-abuse - include $(srcdir)/check.mk -- 1.7.2 >From bcf9043b1f3983d099672047b36ad0a371af169d Mon Sep 17 00:00:00 2001 From: Paul Eggert Date: Fri, 10 Dec 2010 20:52:04 -0800 Subject: [PATCH 2/4] sort: comment fix * src/sort.c: Comment fix re spin locks. --- src/sort.c | 7 +------ 1 files changed, 1 insertions(+), 6 deletions(-) diff --git a/src/sort.c b/src/sort.c index 9cfc814..36e3b19 100644 --- a/src/sort.c +++ b/src/sort.c @@ -3149,10 +3149,7 @@ compare_nodes (void const *a, void const *b) return nodea->level < nodeb->level; } -/* Lock a merge tree NODE. - Spin locks were seen to perform better than mutexes when the number - of threads is limited to the number of processors, assuming 'sort' - never waits when writing to stdout. */ +/* Lock a merge tree NODE. */ static inline void lock_node (struct merge_node *node) @@ -4567,8 +4564,6 @@ main (int argc, char **argv) } else { - /* If NTHREADS > number of cores on the machine, spinlocking - could be wasteful. */ unsigned long int np2 = num_processors (NPROC_CURRENT_OVERRIDABLE); if (!nthreads || nthreads > np2) nthreads = np2; -- 1.7.2 >From a1f8177972fb3f864847dc45237ad7d4d6a7f395 Mon Sep 17 00:00:00 2001 From: Chen Guo Date: Fri, 10 Dec 2010 13:13:36 -0800 Subject: [PATCH 3/4] sort: preallocate merge tree nodes to heap. * src/sort.c: (merge_tree_init) New function. Allocates memory for merge tree nodes. (merge_tree_destory) New function. (init_node) New function. (sortlines) Refactor node creation code to init_node. Remove now superfluous arguments. All callers changed. (sort) Initialize/destory merge tree. Refactor root node creation to merge_tree_init. --- src/sort.c | 170 +++++++++++++++++++++++++++++++++++++++--------------------- 1 files changed, 111 insertions(+), 59 deletions(-) diff --git a/src/sort.c b/src/sort.c index 36e3b19..b724f3d 100644 --- a/src/sort.c +++ b/src/sort.c @@ -239,6 +239,8 @@ struct merge_node size_t nlo; /* Total Lines remaining from LO. */ size_t nhi; /* Total lines remaining from HI. */ struct merge_node *parent; /* Parent node. */ + struct merge_node *lo_child; /* LO child node. */ + struct merge_node *hi_child; /* HI child node. */ unsigned int level; /* Level in merge tree. */ bool queued; /* Node is already in heap. */ pthread_mutex_t lock; /* Lock for node operations. */ @@ -3137,6 +3139,83 @@ sequential_sort (struct line *restrict lines, size_t nlines, } } +struct merge_node * init_node (struct merge_node *, struct merge_node *, + struct line *restrict, unsigned long int, + size_t, bool); + + +/* Initialize the merge tree. */ +static struct merge_node * +merge_tree_init (unsigned long int nthreads, size_t nlines, + struct line *restrict dest) +{ + struct merge_node *merge_tree = xmalloc (2 * nthreads * sizeof *merge_tree); + + struct merge_node *root_node = merge_tree; + root_node->lo = root_node->hi = root_node->end_lo = root_node->end_hi = NULL; + root_node->dest = NULL; + root_node->nlo = root_node->nhi = nlines; + root_node->parent = NULL; + root_node->level = MERGE_END; + root_node->queued = false; + pthread_mutex_init (&root_node->lock, NULL); + + init_node (root_node, root_node + 1, dest, nthreads, nlines, false); + return merge_tree; +} + +/* Destroy the merge tree. */ +static void +merge_tree_destroy (struct merge_node *merge_tree) +{ + free (merge_tree); +} + +/* Initialize a merge tree node. */ + +struct merge_node * +init_node (struct merge_node *parent, struct merge_node *node_pool, + struct line *restrict dest, unsigned long int nthreads, + size_t total_lines, bool lo_child) +{ + size_t nlines = (lo_child)? parent->nlo : parent->nhi; + size_t nlo = nlines / 2; + size_t nhi = nlines - nlo; + struct line *lo = dest - total_lines; + struct line *hi = lo - nlo; + struct line **parent_end = (lo_child)? &parent->end_lo : &parent->end_hi; + + struct merge_node *node = node_pool++; + node->lo = node->end_lo = lo; + node->hi = node->end_hi = hi; + node->dest = parent_end; + node->nlo = nlo; + node->nhi = nhi; + node->parent = parent; + node->level = parent->level + 1; + node->queued = false; + pthread_mutex_init (&node->lock, NULL); + + if (nthreads > 1) + { + unsigned long int lo_threads = nthreads / 2; + unsigned long int hi_threads = nthreads - lo_threads; + node->lo_child = node_pool; + node_pool = init_node (node, node_pool, lo, lo_threads, + total_lines, true); + node->hi_child = node_pool; + node_pool = init_node (node, node_pool, hi, hi_threads, + total_lines, false); + } + else + { + node->lo_child = NULL; + node->hi_child = NULL; + } + return node_pool; +} + + /* Compare two merge nodes A and B for priority. */ static int @@ -3375,10 +3454,8 @@ merge_loop (struct merge_node_queue *queue, } -static void sortlines (struct line *restrict, struct line *restrict, - unsigned long int, size_t, - struct merge_node *, bool, - struct merge_node_queue *, +static void sortlines (struct line *restrict, unsigned long int, size_t, + struct merge_node *, bool, struct merge_node_queue *, FILE *, char const *); /* Thread arguments for sortlines_thread. */ @@ -3389,19 +3466,15 @@ struct thread_args the end of the array. */ struct line *lines; - /* Destination, i.e., the array of lines to store result into. This - also points just past the end of the array. */ - struct line *dest; - /* Number of threads to use. If 0 or 1, sort single-threaded. */ unsigned long int nthreads; /* Number of lines in LINES and DEST. */ size_t const total_lines; - /* Parent node; it will merge this node's output to the output - of this node's sibling. */ - struct merge_node *const parent; + /* Merge node. Lines from this node and this node's sibling will merged + to this node's parent. */ + struct merge_node *const node; /* True if this node is sorting the lower half of the parent's work. */ bool lo_child; @@ -3422,9 +3495,9 @@ static void * sortlines_thread (void *data) { struct thread_args const *args = data; - sortlines (args->lines, args->dest, args->nthreads, args->total_lines, - args->parent, args->lo_child, args->queue, - args->tfp, args->output_temp); + sortlines (args->lines, args->nthreads, args->total_lines, + args->node, args->lo_child, args->queue, args->tfp, + args->output_temp); return NULL; } @@ -3453,49 +3526,32 @@ sortlines_thread (void *data) have been merged. */ static void -sortlines (struct line *restrict lines, struct line *restrict dest, - unsigned long int nthreads, size_t total_lines, - struct merge_node *parent, bool lo_child, - struct merge_node_queue *queue, - FILE *tfp, char const *temp_output) +sortlines (struct line *restrict lines, unsigned long int nthreads, + size_t total_lines, struct merge_node *node, bool lo_child, + struct merge_node_queue *queue, FILE *tfp, char const *temp_output) { - /* Create merge tree NODE. */ - size_t nlines = (lo_child)? parent->nlo : parent->nhi; - size_t nlo = nlines / 2; - size_t nhi = nlines - nlo; - struct line *lo = dest - total_lines; - struct line *hi = lo - nlo; - struct line **parent_end = (lo_child)? &parent->end_lo : &parent->end_hi; - - struct merge_node node; - node.lo = node.end_lo = lo; - node.hi = node.end_hi = hi; - node.dest = parent_end; - node.nlo = nlo; - node.nhi = nhi; - node.parent = parent; - node.level = parent->level + 1; - node.queued = false; - pthread_mutex_init (&node.lock, NULL); + size_t nlines = node->nlo + node->nhi; /* Calculate thread arguments. */ unsigned long int lo_threads = nthreads / 2; unsigned long int hi_threads = nthreads - lo_threads; pthread_t thread; - struct thread_args args = {lines, lo, lo_threads, total_lines, &node, - true, queue, tfp, temp_output}; + struct thread_args args = {lines, lo_threads, total_lines, + node->lo_child, true, queue, tfp, temp_output}; if (nthreads > 1 && SUBTHREAD_LINES_HEURISTIC <= nlines && pthread_create (&thread, NULL, sortlines_thread, &args) == 0) { - sortlines (lines - nlo, hi, hi_threads, total_lines, &node, false, - queue, tfp, temp_output); + sortlines (lines - node->nlo, hi_threads, total_lines, + node->hi_child, false, queue, tfp, temp_output); pthread_join (thread, NULL); } else { /* Nthreads = 1, this is a leaf NODE, or pthread_create failed. Sort with 1 thread. */ + size_t nlo = node->nlo; + size_t nhi = node->nhi; struct line *temp = lines - total_lines; if (1 < nhi) sequential_sort (lines - nlo, nhi, temp - nlo / 2, false); @@ -3503,16 +3559,16 @@ sortlines (struct line *restrict lines, struct line *restrict dest, sequential_sort (lines, nlo, temp, false); /* Update merge NODE. No need to lock yet. */ - node.lo = lines; - node.hi = lines - nlo; - node.end_lo = lines - nlo; - node.end_hi = lines - nlo - nhi; + node->lo = lines; + node->hi = lines - nlo; + node->end_lo = lines - nlo; + node->end_hi = lines - nlo - nhi; - queue_insert (queue, &node); + queue_insert (queue, node); merge_loop (queue, total_lines, tfp, temp_output); } - pthread_mutex_destroy (&node.lock); + pthread_mutex_destroy (&node->lock); } /* Scan through FILES[NTEMPS .. NFILES-1] looking for a file that is @@ -3788,20 +3844,16 @@ sort (char * const *files, size_t nfiles, char const *output_file, { struct merge_node_queue queue; queue_init (&queue, 2 * nthreads); + struct merge_node *merge_tree = + merge_tree_init (nthreads, buf.nlines, line); + struct merge_node *end_node = merge_tree; + struct merge_node *root_node = merge_tree + 1; - struct merge_node node; - node.lo = node.hi = node.end_lo = node.end_hi = NULL; - node.dest = NULL; - node.nlo = node.nhi = buf.nlines; - node.parent = NULL; - node.level = MERGE_END; - node.queued = false; - pthread_mutex_init (&node.lock, NULL); - - sortlines (line, line, nthreads, buf.nlines, &node, true, - &queue, tfp, temp_output); + sortlines (line, nthreads, buf.nlines, root_node, + true, &queue, tfp, temp_output); queue_destroy (&queue); - pthread_mutex_destroy (&node.lock); + pthread_mutex_destroy (&root_node->lock); + merge_tree_destroy (merge_tree); } else write_unique (line - 1, tfp, temp_output); -- 1.7.2 >From 8413953717970bc1cf33c52a5882489717304624 Mon Sep 17 00:00:00 2001 From: Paul Eggert Date: Sat, 11 Dec 2010 00:27:05 -0800 Subject: [PATCH 4/4] sort: integer overflow checks in thread counts, etc. * src/sort.c (specify_nthreads, merge_tree_init, init_node): (queue_init, sortlines, struct thread_args, sort, main): Use size_t, not unsigned long int, for thread counts, since thread counts are now used to compute sizes. (specify_nthreads): Check for size_t overflow. (merge_tree_init, sort): Shorten name of local variable, for readability. (merge_tree_init): Move constants next to each other in product, so that the constant folding is easier to see. (init_node): Now static. Add 'restrict' only where it might be helpful for compiler optimization. (queue_init): 2nd arg is now nthreads, not "reserve", which is a bit harder to follow. All uses changed. (struct thread_args): Rename lo_child to is_lo_child, so that it's obvious to the reader when we're talking about this boolean as opposed to the new lo_child member of the other structure. All uses changed. (sort): Remove unused local variable end_node. (main): Don't allow large thread counts to cause undefined behavior later, due to integer overflow. --- src/sort.c | 115 +++++++++++++++++++++++++++++++++-------------------------- 1 files changed, 64 insertions(+), 51 deletions(-) diff --git a/src/sort.c b/src/sort.c index b724f3d..2c0f852 100644 --- a/src/sort.c +++ b/src/sort.c @@ -1379,15 +1379,17 @@ specify_sort_size (int oi, char c, char const *s) } /* Specify the number of threads to spawn during internal sort. */ -static unsigned long int +static size_t specify_nthreads (int oi, char c, char const *s) { unsigned long int nthreads; enum strtol_error e = xstrtoul (s, NULL, 10, &nthreads, ""); if (e == LONGINT_OVERFLOW) - return ULONG_MAX; + return SIZE_MAX; if (e != LONGINT_OK) xstrtol_fatal (e, oi, c, long_options, s); + if (SIZE_MAX < nthreads) + nthreads = SIZE_MAX; if (nthreads == 0) error (SORT_FAILURE, 0, _("number in parallel must be nonzero")); return nthreads; @@ -3139,28 +3141,28 @@ sequential_sort (struct line *restrict lines, size_t nlines, } } -struct merge_node * init_node (struct merge_node *, struct merge_node *, - struct line *restrict, unsigned long int, - size_t, bool); +static struct merge_node *init_node (struct merge_node *restrict, + struct merge_node *restrict, + struct line *, size_t, size_t, bool); -/* Initialize the merge tree. */ +/* Create and return a merge tree for NTHREADS threads, sorting NLINES + lines, with destination DEST. */ static struct merge_node * -merge_tree_init (unsigned long int nthreads, size_t nlines, - struct line *restrict dest) +merge_tree_init (size_t nthreads, size_t nlines, struct line *dest) { - struct merge_node *merge_tree = xmalloc (2 * nthreads * sizeof *merge_tree); - - struct merge_node *root_node = merge_tree; - root_node->lo = root_node->hi = root_node->end_lo = root_node->end_hi = NULL; - root_node->dest = NULL; - root_node->nlo = root_node->nhi = nlines; - root_node->parent = NULL; - root_node->level = MERGE_END; - root_node->queued = false; - pthread_mutex_init (&root_node->lock, NULL); - - init_node (root_node, root_node + 1, dest, nthreads, nlines, false); + struct merge_node *merge_tree = xmalloc (2 * sizeof *merge_tree * nthreads); + + struct merge_node *root = merge_tree; + root->lo = root->hi = root->end_lo = root->end_hi = NULL; + root->dest = NULL; + root->nlo = root->nhi = nlines; + root->parent = NULL; + root->level = MERGE_END; + root->queued = false; + pthread_mutex_init (&root->lock, NULL); + + init_node (root, root + 1, dest, nthreads, nlines, false); return merge_tree; } @@ -3171,19 +3173,25 @@ merge_tree_destroy (struct merge_node *merge_tree) free (merge_tree); } -/* Initialize a merge tree node. */ +/* Initialize a merge tree node and its descendants. The node's + parent is PARENT. The node and its descendants are taken from the + array of nodes NODE_POOL. Their destination starts at DEST; they + will consume NTHREADS threads. The total number of sort lines is + TOTAL_LINES. IS_LO_CHILD is true if the node is the low child of + its parent. */ -struct merge_node * -init_node (struct merge_node *parent, struct merge_node *node_pool, - struct line *restrict dest, unsigned long int nthreads, - size_t total_lines, bool lo_child) +static struct merge_node * +init_node (struct merge_node *restrict parent, + struct merge_node *restrict node_pool, + struct line *dest, size_t nthreads, + size_t total_lines, bool is_lo_child) { - size_t nlines = (lo_child)? parent->nlo : parent->nhi; + size_t nlines = (is_lo_child ? parent->nlo : parent->nhi); size_t nlo = nlines / 2; size_t nhi = nlines - nlo; struct line *lo = dest - total_lines; struct line *hi = lo - nlo; - struct line **parent_end = (lo_child)? &parent->end_lo : &parent->end_hi; + struct line **parent_end = (is_lo_child ? &parent->end_lo : &parent->end_hi); struct merge_node *node = node_pool++; node->lo = node->end_lo = lo; @@ -3198,8 +3206,8 @@ init_node (struct merge_node *parent, struct merge_node *node_pool, if (nthreads > 1) { - unsigned long int lo_threads = nthreads / 2; - unsigned long int hi_threads = nthreads - lo_threads; + size_t lo_threads = nthreads / 2; + size_t hi_threads = nthreads - lo_threads; node->lo_child = node_pool; node_pool = init_node (node, node_pool, lo, lo_threads, total_lines, true); @@ -3254,15 +3262,16 @@ queue_destroy (struct merge_node_queue *queue) pthread_mutex_destroy (&queue->mutex); } -/* Initialize merge QUEUE, allocating space for a maximum of RESERVE nodes. - Though it's highly unlikely all nodes are in the heap at the same time, - RESERVE should accommodate all of them. Counting a NULL dummy head for the - heap, RESERVE should be 2 * NTHREADS. */ +/* Initialize merge QUEUE, allocating space suitable for a maximum of + NTHREADS threads. */ static void -queue_init (struct merge_node_queue *queue, size_t reserve) +queue_init (struct merge_node_queue *queue, size_t nthreads) { - queue->priority_queue = heap_alloc (compare_nodes, reserve); + /* Though it's highly unlikely all nodes are in the heap at the same + time, the heap should accommodate all of them. Counting a NULL + dummy head for the heap, reserve 2 * NTHREADS nodes. */ + queue->priority_queue = heap_alloc (compare_nodes, 2 * nthreads); pthread_mutex_init (&queue->mutex, NULL); pthread_cond_init (&queue->cond, NULL); } @@ -3454,7 +3463,7 @@ merge_loop (struct merge_node_queue *queue, } -static void sortlines (struct line *restrict, unsigned long int, size_t, +static void sortlines (struct line *restrict, size_t, size_t, struct merge_node *, bool, struct merge_node_queue *, FILE *, char const *); @@ -3467,7 +3476,7 @@ struct thread_args struct line *lines; /* Number of threads to use. If 0 or 1, sort single-threaded. */ - unsigned long int nthreads; + size_t nthreads; /* Number of lines in LINES and DEST. */ size_t const total_lines; @@ -3477,7 +3486,7 @@ struct thread_args struct merge_node *const node; /* True if this node is sorting the lower half of the parent's work. */ - bool lo_child; + bool is_lo_child; /* The priority queue controlling available work for the entire internal sort. */ @@ -3496,7 +3505,7 @@ sortlines_thread (void *data) { struct thread_args const *args = data; sortlines (args->lines, args->nthreads, args->total_lines, - args->node, args->lo_child, args->queue, args->tfp, + args->node, args->is_lo_child, args->queue, args->tfp, args->output_temp); return NULL; } @@ -3526,15 +3535,15 @@ sortlines_thread (void *data) have been merged. */ static void -sortlines (struct line *restrict lines, unsigned long int nthreads, - size_t total_lines, struct merge_node *node, bool lo_child, +sortlines (struct line *restrict lines, size_t nthreads, + size_t total_lines, struct merge_node *node, bool is_lo_child, struct merge_node_queue *queue, FILE *tfp, char const *temp_output) { size_t nlines = node->nlo + node->nhi; /* Calculate thread arguments. */ - unsigned long int lo_threads = nthreads / 2; - unsigned long int hi_threads = nthreads - lo_threads; + size_t lo_threads = nthreads / 2; + size_t hi_threads = nthreads - lo_threads; pthread_t thread; struct thread_args args = {lines, lo_threads, total_lines, node->lo_child, true, queue, tfp, temp_output}; @@ -3774,7 +3783,7 @@ merge (struct sortfile *files, size_t ntemps, size_t nfiles, static void sort (char * const *files, size_t nfiles, char const *output_file, - unsigned long int nthreads) + size_t nthreads) { struct buffer buf; size_t ntemps = 0; @@ -3793,7 +3802,7 @@ sort (char * const *files, size_t nfiles, char const *output_file, if (nthreads > 1) { /* Get log P. */ - unsigned long int tmp = 1; + size_t tmp = 1; size_t mult = 1; while (tmp < nthreads) { @@ -3843,16 +3852,15 @@ sort (char * const *files, size_t nfiles, char const *output_file, if (1 < buf.nlines) { struct merge_node_queue queue; - queue_init (&queue, 2 * nthreads); + queue_init (&queue, nthreads); struct merge_node *merge_tree = merge_tree_init (nthreads, buf.nlines, line); - struct merge_node *end_node = merge_tree; - struct merge_node *root_node = merge_tree + 1; + struct merge_node *root = merge_tree + 1; - sortlines (line, nthreads, buf.nlines, root_node, + sortlines (line, nthreads, buf.nlines, root, true, &queue, tfp, temp_output); queue_destroy (&queue); - pthread_mutex_destroy (&root_node->lock); + pthread_mutex_destroy (&root->lock); merge_tree_destroy (merge_tree); } else @@ -4076,7 +4084,7 @@ main (int argc, char **argv) bool mergeonly = false; char *random_source = NULL; bool need_random = false; - unsigned long int nthreads = 0; + size_t nthreads = 0; size_t nfiles = 0; bool posixly_correct = (getenv ("POSIXLY_CORRECT") != NULL); bool obsolete_usage = (posix2_version () < 200112); @@ -4620,6 +4628,11 @@ main (int argc, char **argv) if (!nthreads || nthreads > np2) nthreads = np2; + /* Avoid integer overflow later. */ + size_t nthreads_max = SIZE_MAX / (2 * sizeof (struct merge_node)); + if (nthreads_max < nthreads) + nthreads = nthreads_max; + sort (files, nfiles, outfile, nthreads); } -- 1.7.2 From unknown Tue Aug 19 22:00:58 2025 X-Loop: help-debbugs@gnu.org Subject: bug#7597: multi-threaded sort can segfault (unrelated to the sort -u segfault) Resent-From: Jim Meyering Original-Sender: debbugs-submit-bounces@debbugs.gnu.org Resent-To: owner@debbugs.gnu.org Resent-CC: bug-coreutils@gnu.org Resent-Date: Sat, 11 Dec 2010 11:04:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 7597 X-GNU-PR-Package: coreutils X-GNU-PR-Keywords: To: Paul Eggert Cc: Chen Guo , bug-coreutils@gnu.org, DJ Lucas , coreutils@gnu.org Received: via spool by submit@debbugs.gnu.org id=B.129206540314198 (code B ref -1); Sat, 11 Dec 2010 11:04:01 +0000 Received: (at submit) by debbugs.gnu.org; 11 Dec 2010 11:03:23 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PRNEY-0003gx-Ie for submit@debbugs.gnu.org; Sat, 11 Dec 2010 06:03:22 -0500 Received: from eggs.gnu.org ([140.186.70.92]) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PRNEW-0003gk-0U for submit@debbugs.gnu.org; Sat, 11 Dec 2010 06:03:21 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1PRNKN-0007N2-Uo for submit@debbugs.gnu.org; Sat, 11 Dec 2010 06:09:25 -0500 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00,RCVD_IN_DNSWL_MED, T_RP_MATCHES_RCVD autolearn=unavailable version=3.3.1 Received: from lists.gnu.org ([199.232.76.165]:33497) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1PRNKN-0007Mp-A1 for submit@debbugs.gnu.org; Sat, 11 Dec 2010 06:09:23 -0500 Received: from [140.186.70.92] (port=53984 helo=eggs.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1PRNKM-0000kM-7g for bug-coreutils@gnu.org; Sat, 11 Dec 2010 06:09:23 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1PRNKK-0007M8-Dw for bug-coreutils@gnu.org; Sat, 11 Dec 2010 06:09:22 -0500 Received: from mx.meyering.net ([82.230.74.64]:40671) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1PRNKJ-0007LX-Gu; Sat, 11 Dec 2010 06:09:20 -0500 Received: by rho.meyering.net (Acme Bit-Twister, from userid 1000) id 7EAE66018B; Sat, 11 Dec 2010 12:09:17 +0100 (CET) From: Jim Meyering In-Reply-To: <4D0337C8.1050005@cs.ucla.edu> (Paul Eggert's message of "Sat, 11 Dec 2010 00:35:20 -0800") References: <4CEFF613.8060200@linuxfromscratch.org> <4CF0C15C.9080201@cs.ucla.edu> <4CF0D1BE.6070908@cs.ucla.edu> <4CF1A911.1000000@cs.ucla.edu> <4CF1BBF5.9080103@linuxfromscratch.org> <4CF352CD.7010206@linuxfromscratch.org> <4CF3FC08.3020705@cs.ucla.edu> <87r5dzc0m6.fsf@meyering.net> <4CF95CE2.8090602@cs.ucla.edu> <4CFC8A47.9000006@cs.ucla.edu> <877hflu8om.fsf@meyering.net> <8762v39o8t.fsf_-_@meyering.net> <4D011F54.3050403@cs.ucla.edu> <4D0337C8.1050005@cs.ucla.edu> Date: Sat, 11 Dec 2010 12:09:17 +0100 Message-ID: <8762v04lcy.fsf@meyering.net> Lines: 123 MIME-Version: 1.0 Content-Type: text/plain X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 3) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 2) X-Spam-Score: -5.7 (-----) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -5.7 (-----) Paul Eggert wrote: > Thanks, Chen, that was much nicer than what I was writing. > I did some minor cleanups, mostly to do with catching an > unlikely integer overflow that would have made 'sort' crash badly, > and pushed the following set of patches. Thanks for helping, but... Chen's log message would have been appropriate if that patch had been rebased to apply after my stack->heap bug fix. However, as a replacement for my fix, the description is lacking, since it says nothing about fixing the hard-to-reproduce (and harder-to-diagnose) segfault-inducing bug. Plus, the change set includes no NEWS or test suite addition. With each bug fix it is best to describe or at least mention the bug in the commit log. Also, there should be a NEWS entry and, preferably, a test or two to exercise the bug. Here at least is the NEWS addition and log from what I posted Thursday. I've also fixed a few syntax nits separately: >From 9a9d69e9e463ebfa67e90ecc061305532a91543e Mon Sep 17 00:00:00 2001 From: Jim Meyering Date: Sat, 11 Dec 2010 11:29:38 +0100 Subject: [PATCH 1/2] sort: syntax cleanup * src/sort.c (xfopen, debug_key, sortlines, sort, main): Adjust formatting: fix misplaced braces, use consistent spacing, split a 2-stmt line. --- src/sort.c | 11 ++++++----- 1 files changed, 6 insertions(+), 5 deletions(-) diff --git a/src/sort.c b/src/sort.c index 2c0f852..1de8115 100644 --- a/src/sort.c +++ b/src/sort.c @@ -939,7 +939,7 @@ stream_open (char const *file, char const *how) static FILE * xfopen (char const *file, char const *how) - { +{ FILE *fp = stream_open (file, how); if (!fp) die (_("open failed"), file); @@ -2207,7 +2207,8 @@ debug_key (struct line const *line, struct keyfield const *key) if (key->skipsblanks || key->month || key_numeric (key)) { - char saved = *lim; *lim = '\0'; + char saved = *lim; + *lim = '\0'; while (blanks[to_uchar (*beg)]) beg++; @@ -3782,7 +3783,7 @@ merge (struct sortfile *files, size_t ntemps, size_t nfiles, /* Sort NFILES FILES onto OUTPUT_FILE. Use at most NTHREADS threads. */ static void -sort (char * const *files, size_t nfiles, char const *output_file, +sort (char *const *files, size_t nfiles, char const *output_file, size_t nthreads) { struct buffer buf; @@ -4498,7 +4499,7 @@ main (int argc, char **argv) files = tok.tok; nfiles = tok.n_tok; for (i = 0; i < nfiles; i++) - { + { if (STREQ (files[i], "-")) error (SORT_FAILURE, 0, _("when reading file names from stdin, " "no file name of %s allowed"), @@ -4513,7 +4514,7 @@ main (int argc, char **argv) _("%s:%lu: invalid zero-length file name"), quotearg_colon (files_from), file_number); } - } + } } else error (SORT_FAILURE, 0, _("no input from %s"), -- 1.7.3.3 >From ad61335bf832937dd95739c70405db9b61ffda37 Mon Sep 17 00:00:00 2001 From: Jim Meyering Date: Sat, 11 Dec 2010 11:38:21 +0100 Subject: [PATCH 2/2] sort: avoid segfault when using two or more threads This change does not fix the actual bug. That was done by commit c9db0ac6, "sort: preallocate merge tree nodes to heap". The fix was to store each "node" structure on the heap, not on the stack. Otherwise, a node from one thread's stack could be used in another thread after the first thread had expired (via pthread_join). This bug was very hard to trigger when using spinlocks, but easier once we began using mutexes. * NEWS (Bug fixes): Mention it. For details, see http://debbugs.gnu.org/7597. --- NEWS | 3 +++ 1 files changed, 3 insertions(+), 0 deletions(-) diff --git a/NEWS b/NEWS index 9f55cbb..b0a11b1 100644 --- a/NEWS +++ b/NEWS @@ -18,6 +18,9 @@ GNU coreutils NEWS -*- outline -*- do no work. I.e., "sort < big-file | less" could waste a lot of power. [bug introduced in coreutils-8.6] + sort with at least two threads no longer segfaults due to use of pointers + into the stack of an expired thread. [bug introduced in coreutils-8.6] + ** New features split accepts the --number option to generate a specific number of files. -- 1.7.3.3 From unknown Tue Aug 19 22:00:58 2025 X-Loop: help-debbugs@gnu.org Subject: bug#7597: multi-threaded sort can segfault (unrelated to the sort -u segfault) Resent-From: Paul Eggert Original-Sender: debbugs-submit-bounces@debbugs.gnu.org Resent-To: owner@debbugs.gnu.org Resent-CC: bug-coreutils@gnu.org Resent-Date: Sat, 11 Dec 2010 20:07:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 7597 X-GNU-PR-Package: coreutils X-GNU-PR-Keywords: To: Jim Meyering Cc: Chen Guo , bug-coreutils@gnu.org, DJ Lucas , coreutils@gnu.org Received: via spool by submit@debbugs.gnu.org id=B.1292097992981 (code B ref -1); Sat, 11 Dec 2010 20:07:01 +0000 Received: (at submit) by debbugs.gnu.org; 11 Dec 2010 20:06:32 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PRViB-0000Fm-QK for submit@debbugs.gnu.org; Sat, 11 Dec 2010 15:06:32 -0500 Received: from eggs.gnu.org ([140.186.70.92]) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PRVi9-0000FT-5l for submit@debbugs.gnu.org; Sat, 11 Dec 2010 15:06:30 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1PRVnw-0001CU-GO for submit@debbugs.gnu.org; Sat, 11 Dec 2010 15:12:35 -0500 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,T_RP_MATCHES_RCVD autolearn=unavailable version=3.3.1 Received: from lists.gnu.org ([199.232.76.165]:55374) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1PRVnw-00015l-ET for submit@debbugs.gnu.org; Sat, 11 Dec 2010 15:12:28 -0500 Received: from [140.186.70.92] (port=36081 helo=eggs.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1PRVcp-0007O4-9V for bug-coreutils@gnu.org; Sat, 11 Dec 2010 15:01:00 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1PRVco-0006k0-5L for bug-coreutils@gnu.org; Sat, 11 Dec 2010 15:00:59 -0500 Received: from smtp.cs.ucla.edu ([131.179.128.62]:42910) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1PRVcn-0006jQ-QI; Sat, 11 Dec 2010 15:00:58 -0500 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id BEA0739E80FF; Sat, 11 Dec 2010 12:00:55 -0800 (PST) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id jF9xGlc8ytQm; Sat, 11 Dec 2010 12:00:54 -0800 (PST) Received: from [192.168.1.10] (pool-71-189-109-235.lsanca.fios.verizon.net [71.189.109.235]) by smtp.cs.ucla.edu (Postfix) with ESMTPSA id CAC5939E80B1; Sat, 11 Dec 2010 12:00:54 -0800 (PST) Message-ID: <4D03D870.9050807@cs.ucla.edu> Date: Sat, 11 Dec 2010 12:00:48 -0800 From: Paul Eggert Organization: UCLA Computer Science Department User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.13) Gecko/20101208 Thunderbird/3.1.7 MIME-Version: 1.0 References: <4CEFF613.8060200@linuxfromscratch.org> <4CF0C15C.9080201@cs.ucla.edu> <4CF0D1BE.6070908@cs.ucla.edu> <4CF1A911.1000000@cs.ucla.edu> <4CF1BBF5.9080103@linuxfromscratch.org> <4CF352CD.7010206@linuxfromscratch.org> <4CF3FC08.3020705@cs.ucla.edu> <87r5dzc0m6.fsf@meyering.net> <4CF95CE2.8090602@cs.ucla.edu> <4CFC8A47.9000006@cs.ucla.edu> <877hflu8om.fsf@meyering.net> <8762v39o8t.fsf_-_@meyering.net> <4D011F54.3050403@cs.ucla.edu> <4D0337C8.1050005@cs.ucla.edu> <8762v04lcy.fsf@meyering.net> In-Reply-To: <8762v04lcy.fsf@meyering.net> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 3) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 2) X-Spam-Score: -4.6 (----) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -4.6 (----) Sorry for botching the NEWS and the change log. To help make amends, how about if I add a test case for that? I'm thinking of the 2nd test case in , namely this one: gensort -a 10000 > gensort-10k for i in $(seq 2000); do printf '% 4d\n' $i; src/sort -S 100K \ --parallel=2 gensort-10k > j; test $(wc -c < j) = 1000000 || break; done Or if you have a better one in mind, please let me know. There are also some test cases I need to add for the (unrelated) sort-compression bug, which is next on my list of coreutils bugs to look at. From unknown Tue Aug 19 22:00:58 2025 X-Loop: help-debbugs@gnu.org Subject: bug#7597: multi-threaded sort can segfault (unrelated to the sort -u segfault) Resent-From: Jim Meyering Original-Sender: debbugs-submit-bounces@debbugs.gnu.org Resent-To: owner@debbugs.gnu.org Resent-CC: bug-coreutils@gnu.org Resent-Date: Sun, 12 Dec 2010 15:36:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 7597 X-GNU-PR-Package: coreutils X-GNU-PR-Keywords: To: Paul Eggert Cc: Chen Guo , bug-coreutils@gnu.org, DJ Lucas , coreutils@gnu.org Received: via spool by submit@debbugs.gnu.org id=B.129216811313880 (code B ref -1); Sun, 12 Dec 2010 15:36:02 +0000 Received: (at submit) by debbugs.gnu.org; 12 Dec 2010 15:35:13 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PRnxA-0003bo-F5 for submit@debbugs.gnu.org; Sun, 12 Dec 2010 10:35:12 -0500 Received: from eggs.gnu.org ([140.186.70.92]) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PRnx6-0003bb-So for submit@debbugs.gnu.org; Sun, 12 Dec 2010 10:35:10 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1PRo31-0003DJ-Uw for submit@debbugs.gnu.org; Sun, 12 Dec 2010 10:41:17 -0500 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00,RCVD_IN_DNSWL_MED, T_RP_MATCHES_RCVD autolearn=unavailable version=3.3.1 Received: from lists.gnu.org ([199.232.76.165]:51689) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1PRo31-0003D6-RJ for submit@debbugs.gnu.org; Sun, 12 Dec 2010 10:41:15 -0500 Received: from [140.186.70.92] (port=48012 helo=eggs.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1PRo2z-0005pG-Au for bug-coreutils@gnu.org; Sun, 12 Dec 2010 10:41:15 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1PRo2v-0003Ao-Lm for bug-coreutils@gnu.org; Sun, 12 Dec 2010 10:41:13 -0500 Received: from mx.meyering.net ([82.230.74.64]:60436) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1PRo2v-0003AA-Cy; Sun, 12 Dec 2010 10:41:09 -0500 Received: by rho.meyering.net (Acme Bit-Twister, from userid 1000) id C7CC460113; Sun, 12 Dec 2010 16:41:07 +0100 (CET) From: Jim Meyering In-Reply-To: <4D03D870.9050807@cs.ucla.edu> (Paul Eggert's message of "Sat, 11 Dec 2010 12:00:48 -0800") References: <4CEFF613.8060200@linuxfromscratch.org> <4CF1A911.1000000@cs.ucla.edu> <4CF1BBF5.9080103@linuxfromscratch.org> <4CF352CD.7010206@linuxfromscratch.org> <4CF3FC08.3020705@cs.ucla.edu> <87r5dzc0m6.fsf@meyering.net> <4CF95CE2.8090602@cs.ucla.edu> <4CFC8A47.9000006@cs.ucla.edu> <877hflu8om.fsf@meyering.net> <8762v39o8t.fsf_-_@meyering.net> <4D011F54.3050403@cs.ucla.edu> <4D0337C8.1050005@cs.ucla.edu> <8762v04lcy.fsf@meyering.net> <4D03D870.9050807@cs.ucla.edu> Date: Sun, 12 Dec 2010 16:41:07 +0100 Message-ID: <874oajyp64.fsf@meyering.net> Lines: 66 MIME-Version: 1.0 Content-Type: text/plain X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 3) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 2) X-Spam-Score: -5.7 (-----) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -5.7 (-----) Paul Eggert wrote: > Sorry for botching the NEWS and the change log. To help > make amends, how about if I add a test case for that? That would be welcome. Thanks. > I'm thinking of the 2nd test case in > , > namely this one: > > gensort -a 10000 > gensort-10k > for i in $(seq 2000); do printf '% 4d\n' $i; src/sort -S 100K \ > --parallel=2 gensort-10k > j; test $(wc -c < j) = 1000000 || break; done That sounds good, assuming it triggers the bug reliably for you. I was hoping to find a way to reproduce it without relying on gensort, but won't object if you want to do that. I ran the above a few times on a 6-core i7 970 and it would usually fail only 2 or 3 times out of 2000. For one trial, it failed only once, and that was on the 1932nd iteration. Interestingly, when I run enough of those 2-process jobs in parallel to keep all nominal 12 processors busy, I get 80-100 failures in 10,000 trials. cat <<\EOF > sort-test #!/bin/bash exp_output=$1 t=$(mktemp) || exit 2 src/sort --parallel=2 -S 100k in > $t || exit 3 cmp $t $exp_output || exit 4 rm -f $t exit 0 EOF chmod a+x sort-test i.e., by running that little script via GNU parallel: export TMPDIR=/dev/shm/z; mkdir $TMPDIR gensort -a 1000 in; sort in > exp; seq 10000 \ | parallel --eta -j $(($(nproc)/2)) ./sort-test exp This shows how many failed of the 10,000: ls -1 $TMPDIR/tmp.*|wc -l : I'm not suggesting to use GNU parallel in the test, but if you have a few spare cores or systems, it does make it easier to run large parallel tests in an attempt to find something more reliable. I noticed that decreasing the input size to 1000 lines had little effect on the number of failures, but obviously makes the test less expensive. > Or if you have a better one in mind, please let me know. > > There are also some test cases I need to add for the > (unrelated) sort-compression bug, which is next on my > list of coreutils bugs to look at. It would be great to fix that for 8.8, too, but don't worry if you don't get to it soon. I can always make an 8.9 not long afterward. Perhaps very few people use --compress-program=PROG, or this is due to a latent problem that is showing up only now. From unknown Tue Aug 19 22:00:58 2025 X-Loop: help-debbugs@gnu.org Subject: bug#7597: multi-threaded sort can segfault (unrelated to the sort -u segfault) Resent-From: Paul Eggert Original-Sender: debbugs-submit-bounces@debbugs.gnu.org Resent-To: owner@debbugs.gnu.org Resent-CC: bug-coreutils@gnu.org Resent-Date: Sun, 12 Dec 2010 21:37:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 7597 X-GNU-PR-Package: coreutils X-GNU-PR-Keywords: To: Jim Meyering Cc: Chen Guo , bug-coreutils@gnu.org, DJ Lucas , coreutils@gnu.org Received: via spool by submit@debbugs.gnu.org id=B.129218980120584 (code B ref -1); Sun, 12 Dec 2010 21:37:02 +0000 Received: (at submit) by debbugs.gnu.org; 12 Dec 2010 21:36:41 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PRtaz-0005Lu-2J for submit@debbugs.gnu.org; Sun, 12 Dec 2010 16:36:41 -0500 Received: from eggs.gnu.org ([140.186.70.92]) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PRtav-0005LR-UQ for submit@debbugs.gnu.org; Sun, 12 Dec 2010 16:36:39 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1PRtgr-0006mg-RU for submit@debbugs.gnu.org; Sun, 12 Dec 2010 16:42:47 -0500 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,T_RP_MATCHES_RCVD autolearn=unavailable version=3.3.1 Received: from lists.gnu.org ([199.232.76.165]:39666) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1PRtgr-0006mQ-CL for submit@debbugs.gnu.org; Sun, 12 Dec 2010 16:42:45 -0500 Received: from [140.186.70.92] (port=43098 helo=eggs.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1PRtgp-0005Zd-3d for bug-coreutils@gnu.org; Sun, 12 Dec 2010 16:42:45 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1PRtgn-0006lM-9Y for bug-coreutils@gnu.org; Sun, 12 Dec 2010 16:42:42 -0500 Received: from smtp.cs.ucla.edu ([131.179.128.62]:36331) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1PRtgm-0006l5-Te; Sun, 12 Dec 2010 16:42:41 -0500 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id B0E1739E80FF; Sun, 12 Dec 2010 13:42:37 -0800 (PST) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id Meke9gSKkpHL; Sun, 12 Dec 2010 13:42:36 -0800 (PST) Received: from [192.168.1.10] (pool-71-189-109-235.lsanca.fios.verizon.net [71.189.109.235]) by smtp.cs.ucla.edu (Postfix) with ESMTPSA id 32F5F39E80F7; Sun, 12 Dec 2010 13:42:36 -0800 (PST) Message-ID: <4D0541C7.3050508@cs.ucla.edu> Date: Sun, 12 Dec 2010 13:42:31 -0800 From: Paul Eggert Organization: UCLA Computer Science Department User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.13) Gecko/20101208 Thunderbird/3.1.7 MIME-Version: 1.0 References: <4CEFF613.8060200@linuxfromscratch.org> <4CF1A911.1000000@cs.ucla.edu> <4CF1BBF5.9080103@linuxfromscratch.org> <4CF352CD.7010206@linuxfromscratch.org> <4CF3FC08.3020705@cs.ucla.edu> <87r5dzc0m6.fsf@meyering.net> <4CF95CE2.8090602@cs.ucla.edu> <4CFC8A47.9000006@cs.ucla.edu> <877hflu8om.fsf@meyering.net> <8762v39o8t.fsf_-_@meyering.net> <4D011F54.3050403@cs.ucla.edu> <4D0337C8.1050005@cs.ucla.edu> <8762v04lcy.fsf@meyering.net> <4D03D870.9050807@cs.ucla.edu> <874oajyp64.fsf@meyering.net> In-Reply-To: <874oajyp64.fsf@meyering.net> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 3) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 2) X-Spam-Score: -4.6 (----) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -4.6 (----) On 12/12/2010 07:41 AM, Jim Meyering wrote: > That sounds good, assuming it triggers the bug reliably for you. > I was hoping to find a way to reproduce it without relying on gensort, > but won't object if you want to do that. In my attempts to reproduce the problem, it's pretty flaky. I think it depends on how busy the operating system is. Sometimes I'd get failures all the time; sometimes, almost never. (This was with valgrind; I had much less luck without valgrind.) Anyway, I pushed this, which seemed to work well enough on my host. It prefers gensort if available, but falls back on seq+shuf if not. >From 63d1b425976ccc0b89159d743e33eb5da634de3c Mon Sep 17 00:00:00 2001 From: Paul Eggert Date: Sun, 12 Dec 2010 13:38:19 -0800 Subject: [PATCH] tests: test for access to stale thread memory * tests/misc/sort-stale-thread-mem: New tests. * tests/Makefile.am (TESTS): Add it. --- tests/Makefile.am | 1 + tests/misc/sort-stale-thread-mem | 44 ++++++++++++++++++++++++++++++++++++++ 2 files changed, 45 insertions(+), 0 deletions(-) create mode 100755 tests/misc/sort-stale-thread-mem diff --git a/tests/Makefile.am b/tests/Makefile.am index b573061..f7a8af8 100644 --- a/tests/Makefile.am +++ b/tests/Makefile.am @@ -238,6 +238,7 @@ TESTS = \ misc/sort-month \ misc/sort-rand \ misc/sort-spinlock-abuse \ + misc/sort-stale-thread-mem \ misc/sort-unique \ misc/sort-unique-segv \ misc/sort-version \ diff --git a/tests/misc/sort-stale-thread-mem b/tests/misc/sort-stale-thread-mem new file mode 100755 index 0000000..c4f4fcb --- /dev/null +++ b/tests/misc/sort-stale-thread-mem @@ -0,0 +1,44 @@ +#!/bin/sh +# Trigger a bug that would cause 'sort' to reference stale thread stack memory. + +# Copyright (C) 2010 Free Software Foundation, Inc. + +# This program is free software: you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation, either version 3 of the License, or +# (at your option) any later version. + +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. + +# You should have received a copy of the GNU General Public License +# along with this program. If not, see . + +# written by Jim Meyering and Paul Eggert + +. "${srcdir=.}/init.sh"; path_prepend_ ../src +print_ver_ sort + +expensive_ + +valgrind --help >/dev/null || skip_ "requires valgrind" +test "$(nproc)" = 1 && skip_ "requires a multi-core system" + +# gensort output seems to trigger the failure more often, +# so prefer gensort if it is available. +(gensort -a 10000 in) 2>/dev/null || + seq -f %-98f 10000 | shuf > in || + framework_failure_ + +# With the bug, 'sort' would fail under valgrind about half the time, +# on some circa-2010 multicore Linux platforms. Run the test 10 times +# so that the probability of missing the bug should be about 1 in +# 2**100 on these hosts. +fail=0 +for i in $(seq 100); do + valgrind --quiet --error-exitcode=3 \ + sort -S 100K --parallel=2 in > /dev/null || + { fail=$?; echo iteration $i failed; Exit $fail; } +done -- 1.7.2 From unknown Tue Aug 19 22:00:58 2025 X-Loop: help-debbugs@gnu.org Subject: bug#7597: multi-threaded sort can segfault (unrelated to the sort -u segfault) Resent-From: Jim Meyering Original-Sender: debbugs-submit-bounces@debbugs.gnu.org Resent-To: owner@debbugs.gnu.org Resent-CC: bug-coreutils@gnu.org Resent-Date: Mon, 13 Dec 2010 07:25:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 7597 X-GNU-PR-Package: coreutils X-GNU-PR-Keywords: To: Paul Eggert Cc: Chen Guo , bug-coreutils@gnu.org, DJ Lucas , coreutils@gnu.org Received: via spool by submit@debbugs.gnu.org id=B.129222508515351 (code B ref -1); Mon, 13 Dec 2010 07:25:02 +0000 Received: (at submit) by debbugs.gnu.org; 13 Dec 2010 07:24:45 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PS2m4-0003zW-IV for submit@debbugs.gnu.org; Mon, 13 Dec 2010 02:24:45 -0500 Received: from eggs.gnu.org ([140.186.70.92]) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PS2m1-0003zH-4I for submit@debbugs.gnu.org; Mon, 13 Dec 2010 02:24:42 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1PS2ry-0003hK-9u for submit@debbugs.gnu.org; Mon, 13 Dec 2010 02:30:51 -0500 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00,RCVD_IN_DNSWL_MED, T_RP_MATCHES_RCVD autolearn=unavailable version=3.3.1 Received: from lists.gnu.org ([199.232.76.165]:50735) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1PS2ry-0003hG-6x for submit@debbugs.gnu.org; Mon, 13 Dec 2010 02:30:50 -0500 Received: from [140.186.70.92] (port=43551 helo=eggs.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1PS2rw-0005Oz-Sh for bug-coreutils@gnu.org; Mon, 13 Dec 2010 02:30:49 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1PS2rv-0003gl-1k for bug-coreutils@gnu.org; Mon, 13 Dec 2010 02:30:48 -0500 Received: from mx.meyering.net ([82.230.74.64]:52998) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1PS2ru-0003gO-Fd; Mon, 13 Dec 2010 02:30:47 -0500 Received: by rho.meyering.net (Acme Bit-Twister, from userid 1000) id 9958A60102; Mon, 13 Dec 2010 08:30:44 +0100 (CET) From: Jim Meyering In-Reply-To: <4D0541C7.3050508@cs.ucla.edu> (Paul Eggert's message of "Sun, 12 Dec 2010 13:42:31 -0800") References: <4CEFF613.8060200@linuxfromscratch.org> <4CF352CD.7010206@linuxfromscratch.org> <4CF3FC08.3020705@cs.ucla.edu> <87r5dzc0m6.fsf@meyering.net> <4CF95CE2.8090602@cs.ucla.edu> <4CFC8A47.9000006@cs.ucla.edu> <877hflu8om.fsf@meyering.net> <8762v39o8t.fsf_-_@meyering.net> <4D011F54.3050403@cs.ucla.edu> <4D0337C8.1050005@cs.ucla.edu> <8762v04lcy.fsf@meyering.net> <4D03D870.9050807@cs.ucla.edu> <874oajyp64.fsf@meyering.net> <4D0541C7.3050508@cs.ucla.edu> Date: Mon, 13 Dec 2010 08:30:44 +0100 Message-ID: <87r5dmdt97.fsf@meyering.net> Lines: 119 MIME-Version: 1.0 Content-Type: text/plain X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 3) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 2) X-Spam-Score: -5.7 (-----) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -5.7 (-----) Paul Eggert wrote: > On 12/12/2010 07:41 AM, Jim Meyering wrote: >> That sounds good, assuming it triggers the bug reliably for you. >> I was hoping to find a way to reproduce it without relying on gensort, >> but won't object if you want to do that. > > In my attempts to reproduce the problem, it's pretty flaky. > I think it depends on how busy the operating system is. > Sometimes I'd get failures all the time; sometimes, almost > never. (This was with valgrind; I had much less luck without > valgrind.) > > Anyway, I pushed this, which seemed to work well enough > on my host. It prefers gensort if available, but falls > back on seq+shuf if not. > > Subject: [PATCH] tests: test for access to stale thread memory > > * tests/misc/sort-stale-thread-mem: New tests. > * tests/Makefile.am (TESTS): Add it. Thank you! I did this, too: >From 8351407f874ab3d6fc0830e78a6c234bf1583e3f Mon Sep 17 00:00:00 2001 From: Jim Meyering Date: Mon, 13 Dec 2010 08:07:25 +0100 Subject: [PATCH 1/2] tests: mark new test as very expensive * tests/misc/sort-stale-thread-mem: Don't initialize fail=0 here; that is done in init.sh. This avoids a syntax-check failure. Invoke "Exit $fail" at end, too. Mark as a very expensive test. --- tests/misc/sort-stale-thread-mem | 5 +++-- 1 files changed, 3 insertions(+), 2 deletions(-) diff --git a/tests/misc/sort-stale-thread-mem b/tests/misc/sort-stale-thread-mem index c4f4fcb..8ad60ed 100755 --- a/tests/misc/sort-stale-thread-mem +++ b/tests/misc/sort-stale-thread-mem @@ -21,7 +21,7 @@ . "${srcdir=.}/init.sh"; path_prepend_ ../src print_ver_ sort -expensive_ +very_expensive_ valgrind --help >/dev/null || skip_ "requires valgrind" test "$(nproc)" = 1 && skip_ "requires a multi-core system" @@ -36,9 +36,10 @@ test "$(nproc)" = 1 && skip_ "requires a multi-core system" # on some circa-2010 multicore Linux platforms. Run the test 10 times # so that the probability of missing the bug should be about 1 in # 2**100 on these hosts. -fail=0 for i in $(seq 100); do valgrind --quiet --error-exitcode=3 \ sort -S 100K --parallel=2 in > /dev/null || { fail=$?; echo iteration $i failed; Exit $fail; } done + +Exit $fail -- 1.7.3.3.38.gc6d05 >From 0c70708db7ed32d2b379116dc6bf64f07539aaf1 Mon Sep 17 00:00:00 2001 From: Jim Meyering Date: Mon, 13 Dec 2010 08:19:12 +0100 Subject: [PATCH 2/2] tests: tweak basic-1 to use warn_ rather than literal "exit 77" * tests/install/basic-1 (just_built_dd): Use warn_, rather than cat and exit 77. --- tests/install/basic-1 | 24 ++++++------------------ 1 files changed, 6 insertions(+), 18 deletions(-) diff --git a/tests/install/basic-1 b/tests/install/basic-1 index 5e07bab..3c45c2a 100755 --- a/tests/install/basic-1 +++ b/tests/install/basic-1 @@ -39,28 +39,16 @@ dd2=dd2$EXEEXT just_built_dd=$abs_top_builddir/src/$dd -test -r "$just_built_dd" || \ - { - cat 1>&2 <&2 < Original-Sender: debbugs-submit-bounces@debbugs.gnu.org Resent-To: owner@debbugs.gnu.org Resent-CC: bug-coreutils@gnu.org Resent-Date: Mon, 13 Dec 2010 17:59:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 7597 X-GNU-PR-Package: coreutils X-GNU-PR-Keywords: To: Jim Meyering Cc: Chen Guo , bug-coreutils@gnu.org, DJ Lucas , coreutils@gnu.org Received: via spool by submit@debbugs.gnu.org id=B.129226310511507 (code B ref -1); Mon, 13 Dec 2010 17:59:01 +0000 Received: (at submit) by debbugs.gnu.org; 13 Dec 2010 17:58:25 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PSCfI-0002zX-7T for submit@debbugs.gnu.org; Mon, 13 Dec 2010 12:58:24 -0500 Received: from eggs.gnu.org ([140.186.70.92]) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PSCfF-0002zK-J9 for submit@debbugs.gnu.org; Mon, 13 Dec 2010 12:58:22 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1PSClE-0001kH-0H for submit@debbugs.gnu.org; Mon, 13 Dec 2010 13:04:32 -0500 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,T_RP_MATCHES_RCVD autolearn=unavailable version=3.3.1 Received: from lists.gnu.org ([199.232.76.165]:40877) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1PSClD-0001kB-SR for submit@debbugs.gnu.org; Mon, 13 Dec 2010 13:04:31 -0500 Received: from [140.186.70.92] (port=35179 helo=eggs.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1PSClC-0001Ry-O7 for bug-coreutils@gnu.org; Mon, 13 Dec 2010 13:04:31 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1PSClA-0001jc-Rj for bug-coreutils@gnu.org; Mon, 13 Dec 2010 13:04:30 -0500 Received: from smtp.cs.ucla.edu ([131.179.128.62]:41895) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1PSClA-0001jJ-Im; Mon, 13 Dec 2010 13:04:28 -0500 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id 403D039E80F0; Mon, 13 Dec 2010 10:04:27 -0800 (PST) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id h0Q3n9BVaR1m; Mon, 13 Dec 2010 10:04:26 -0800 (PST) Received: from [131.179.64.200] (Penguin.CS.UCLA.EDU [131.179.64.200]) by smtp.cs.ucla.edu (Postfix) with ESMTPSA id 3CE7139E80DF; Mon, 13 Dec 2010 10:04:26 -0800 (PST) Message-ID: <4D066029.6070706@cs.ucla.edu> Date: Mon, 13 Dec 2010 10:04:25 -0800 From: Paul Eggert Organization: UCLA Computer Science Department User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.13) Gecko/20101208 Thunderbird/3.1.7 MIME-Version: 1.0 References: <4CEFF613.8060200@linuxfromscratch.org> <4CF352CD.7010206@linuxfromscratch.org> <4CF3FC08.3020705@cs.ucla.edu> <87r5dzc0m6.fsf@meyering.net> <4CF95CE2.8090602@cs.ucla.edu> <4CFC8A47.9000006@cs.ucla.edu> <877hflu8om.fsf@meyering.net> <8762v39o8t.fsf_-_@meyering.net> <4D011F54.3050403@cs.ucla.edu> <4D0337C8.1050005@cs.ucla.edu> <8762v04lcy.fsf@meyering.net> <4D03D870.9050807@cs.ucla.edu> <874oajyp64.fsf@meyering.net> <4D0541C7.3050508@cs.ucla.edu> <87r5dmdt97.fsf@meyering.net> In-Reply-To: <87r5dmdt97.fsf@meyering.net> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 3) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 2) X-Spam-Score: -5.0 (-----) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -5.0 (-----) My recent patch had a typo in a comment, which I fixed as follows: >From 7e9599422e85be01dfceecf1f38ff2c2952a3f61 Mon Sep 17 00:00:00 2001 From: Paul Eggert Date: Mon, 13 Dec 2010 10:02:06 -0800 Subject: [PATCH] tests: typo fix * tests/misc/sort-stale-thread-mem: Fix typo in comment. --- tests/misc/sort-stale-thread-mem | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/tests/misc/sort-stale-thread-mem b/tests/misc/sort-stale-thread-mem index 8ad60ed..2955e22 100755 --- a/tests/misc/sort-stale-thread-mem +++ b/tests/misc/sort-stale-thread-mem @@ -33,7 +33,7 @@ test "$(nproc)" = 1 && skip_ "requires a multi-core system" framework_failure_ # With the bug, 'sort' would fail under valgrind about half the time, -# on some circa-2010 multicore Linux platforms. Run the test 10 times +# on some circa-2010 multicore Linux platforms. Run the test 100 times # so that the probability of missing the bug should be about 1 in # 2**100 on these hosts. for i in $(seq 100); do -- 1.7.2 From unknown Tue Aug 19 22:00:58 2025 X-Loop: help-debbugs@gnu.org Subject: bug#7597: multi-threaded sort can segfault (unrelated to the sort -u segfault) Resent-From: Paul Eggert Original-Sender: debbugs-submit-bounces@debbugs.gnu.org Resent-To: owner@debbugs.gnu.org Resent-CC: bug-coreutils@gnu.org Resent-Date: Thu, 16 Dec 2010 22:01:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 7597 X-GNU-PR-Package: coreutils X-GNU-PR-Keywords: To: Jim Meyering Cc: Chen Guo , bug-coreutils@gnu.org, DJ Lucas , coreutils@gnu.org Received: via spool by submit@debbugs.gnu.org id=B.129253680620343 (code B ref -1); Thu, 16 Dec 2010 22:01:02 +0000 Received: (at submit) by debbugs.gnu.org; 16 Dec 2010 22:00:06 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PTLrm-0005I4-J6 for submit@debbugs.gnu.org; Thu, 16 Dec 2010 17:00:03 -0500 Received: from eggs.gnu.org ([140.186.70.92]) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PTLrj-0005Ha-IT for submit@debbugs.gnu.org; Thu, 16 Dec 2010 17:00:01 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1PTLxp-0002rL-4s for submit@debbugs.gnu.org; Thu, 16 Dec 2010 17:06:19 -0500 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,T_RP_MATCHES_RCVD autolearn=unavailable version=3.3.1 Received: from lists.gnu.org ([199.232.76.165]:43835) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1PTLxo-0002rG-SA for submit@debbugs.gnu.org; Thu, 16 Dec 2010 17:06:17 -0500 Received: from [140.186.70.92] (port=57880 helo=eggs.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1PTLxk-0002Nx-Hs for bug-coreutils@gnu.org; Thu, 16 Dec 2010 17:06:16 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1PTLxh-0002qd-Nl for bug-coreutils@gnu.org; Thu, 16 Dec 2010 17:06:12 -0500 Received: from smtp.cs.ucla.edu ([131.179.128.62]:35248) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1PTLxh-0002qQ-2w; Thu, 16 Dec 2010 17:06:09 -0500 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id C88DF39E80DB; Thu, 16 Dec 2010 14:06:05 -0800 (PST) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id Ey6yvMUxWrst; Thu, 16 Dec 2010 14:06:02 -0800 (PST) Received: from [131.179.64.200] (Penguin.CS.UCLA.EDU [131.179.64.200]) by smtp.cs.ucla.edu (Postfix) with ESMTPSA id 7D3E739E80E1; Thu, 16 Dec 2010 14:06:02 -0800 (PST) Message-ID: <4D0A8D49.6070503@cs.ucla.edu> Date: Thu, 16 Dec 2010 14:06:01 -0800 From: Paul Eggert Organization: UCLA Computer Science Department User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.13) Gecko/20101208 Thunderbird/3.1.7 MIME-Version: 1.0 References: <4CEFF613.8060200@linuxfromscratch.org> <4CF1A911.1000000@cs.ucla.edu> <4CF1BBF5.9080103@linuxfromscratch.org> <4CF352CD.7010206@linuxfromscratch.org> <4CF3FC08.3020705@cs.ucla.edu> <87r5dzc0m6.fsf@meyering.net> <4CF95CE2.8090602@cs.ucla.edu> <4CFC8A47.9000006@cs.ucla.edu> <877hflu8om.fsf@meyering.net> <8762v39o8t.fsf_-_@meyering.net> <4D011F54.3050403@cs.ucla.edu> <4D0337C8.1050005@cs.ucla.edu> <8762v04lcy.fsf@meyering.net> <4D03D870.9050807@cs.ucla.edu> <874oajyp64.fsf@meyering.net> In-Reply-To: <874oajyp64.fsf@meyering.net> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 3) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 2) X-Spam-Score: -5.0 (-----) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -5.0 (-----) On 12/12/10 07:41, Jim Meyering wrote: > Paul Eggert wrote: >> There are also some test cases I need to add for the >> (unrelated) sort-compression bug, which is next on my >> list of coreutils bugs to look at. > > It would be great to fix that for 8.8, too, > but don't worry if you don't get to it soon. I've fixed that now, with the patch enclosed at the end of this message. I was trying just to fix bug #2 mentioned in . But I discovered that the patch also fixes bug #3 listed there, as that bug was also present in the old reference-count implementation of waiting for subprocesses. Bug #1 was fixed a day or so ago, so, as far as I know, all the 'sort' bugs we've discussed in the last month or so are fixed now. The new test takes up to 28 minutes when it fails, so I marked it as very expensive (I don't know what the boundary between "expensive" and "very expensive" is, but I'm kind of an impatient guy....). >From 10b004660cec5470aa685c4ab327994d14eafeab Mon Sep 17 00:00:00 2001 From: Paul Eggert Date: Thu, 16 Dec 2010 13:55:13 -0800 Subject: [PATCH] sort: fix hang with sort --compress * NEWS: Document this. * src/sort.c (UNCOMPRESSED, UNREAPED, REAPED): New constants. (struct tempnode): New member 'state', to hold these constants. The pid member is now undefined if state == UNCOMPRESSED. (struct sortfile): Replace member 'pid' with member 'temp'. (uintptr): Remove. (proctab_hasher, proctab_comparator, register_proc, delete_proc): Proctab entries are now struct tempnode *, not pid_t, to handle the case where multiple tempnode objects correspond to the same pid. This avoids a race condition that can cause a hang. (register_proc): Arg is now struct tempnode *, not pid_t. All callers changed. (delete_proc): Set tempnode state to REAPED. (create_temp_file): No need to set pid member here; it's now done when the pid is known. (maybe_create_temp, create_temp): Remove PPID arg. Return struct tempnode *, not char *. All callers changed. (maybe_create_temp): Set node state to UNCOMPRESSED or UNREAPED. No need to set node->pid to 0. (open_temp): Replace NAME and PID args with a single TEMP arg. All callers changed. Wait only for unreaped children. (zaptemp): Wait for decompressor to finish before removing its temporary-file input. This avoids .nfsXXXX hassles with NFS and fixes a race (leading to a hang) regardless of NFS. (open_input_files): Adjust to new way of dealing with temp files and their subprocesses. * tests/Makefile.am (TESTS): Add misc/sort-compress-hang. * tests/misc/sort-compress-hang: New file. --- NEWS | 3 +- src/sort.c | 159 +++++++++++++++++++++-------------------- tests/Makefile.am | 1 + tests/misc/sort-compress-hang | 53 ++++++++++++++ 4 files changed, 136 insertions(+), 80 deletions(-) create mode 100755 tests/misc/sort-compress-hang diff --git a/NEWS b/NEWS index 429a1b7..a69ef54 100644 --- a/NEWS +++ b/NEWS @@ -21,7 +21,8 @@ GNU coreutils NEWS -*- outline -*- sort with at least two threads no longer segfaults due to use of pointers into the stack of an expired thread. [bug introduced in coreutils-8.6] - sort --compress no longer mishandles subprocesses' exit statuses. + sort --compress no longer mishandles subprocesses' exit statuses, + and no longer hangs indefinitely due to a bug in waiting for subprocesses. sort -m -o f f ... f no longer dumps core when file descriptors are limited. diff --git a/src/sort.c b/src/sort.c index f53e64d..6bce49b 100644 --- a/src/sort.c +++ b/src/sort.c @@ -610,32 +610,33 @@ cs_leave (struct cs_status status) } } +/* Possible states for a temp file. If compressed, the file's status + is unreaped or reaped, depending on whether 'sort' has waited for + the subprocess to finish. */ +enum { UNCOMPRESSED, UNREAPED, REAPED }; + /* The list of temporary files. */ struct tempnode { struct tempnode *volatile next; - pid_t pid; /* If compressed, the pid of compressor, else zero */ + pid_t pid; /* The subprocess PID; undefined if state == UNCOMPRESSED. */ + char state; char name[1]; /* Actual size is 1 + file name length. */ }; static struct tempnode *volatile temphead; static struct tempnode *volatile *temptail = &temphead; +/* A file to be sorted. */ struct sortfile { + /* The file's name. */ char const *name; - pid_t pid; /* If compressed, the pid of compressor, else zero */ -}; -/* An integer that is the same size as a pointer. To avoid GCC warnings, - cast from void * to this type rather than directly to pid_t. */ -#ifdef UINTPTR_MAX -typedef uintptr_t uintptr; -#else -typedef size_t uintptr; -#endif -verify (sizeof (pid_t) <= sizeof (uintptr)); + /* Nonnull if this is a temporary file, in which case NAME == TEMP->name. */ + struct tempnode *temp; +}; -/* IDs of unreaped compression and decompression subprocesses. */ +/* Map PIDs of unreaped subprocesses to their struct tempnode objects. */ static Hash_table *proctab; enum { INIT_PROCTAB_SIZE = 47 }; @@ -643,16 +644,16 @@ enum { INIT_PROCTAB_SIZE = 47 }; static size_t proctab_hasher (void const *entry, size_t tabsize) { - pid_t pid = (uintptr) entry; - return pid % tabsize; + struct tempnode const *node = entry; + return node->pid % tabsize; } static bool proctab_comparator (void const *e1, void const *e2) { - pid_t p1 = (uintptr) e1; - pid_t p2 = (uintptr) e2; - return p1 == p2; + struct tempnode const *n1 = e1; + struct tempnode const *n2 = e2; + return n1->pid == n2->pid; } /* The number of unreaped child processes. */ @@ -690,11 +691,11 @@ reap (pid_t pid) return cpid; } -/* Add PID to the process table. Create the process table the first - time it's called. */ +/* TEMP represents a new process; add it to the process table. Create + the process table the first time it's called. */ static void -register_proc (pid_t pid) +register_proc (struct tempnode *temp) { if (! proctab) { @@ -706,8 +707,9 @@ register_proc (pid_t pid) xalloc_die (); } - uintptr p = pid; - if (! hash_insert (proctab, (void *) p)) + temp->state = UNREAPED; + + if (! hash_insert (proctab, temp)) xalloc_die (); } @@ -717,8 +719,14 @@ register_proc (pid_t pid) static bool delete_proc (pid_t pid) { - uintptr p = pid; - return !! hash_delete (proctab, (void *) p); + struct tempnode test; + + test.pid = pid; + struct tempnode *node = hash_delete (proctab, &test); + if (! node) + return false; + node->state = REAPED; + return true; } /* Remove PID from the process table, and wait for it to exit if it @@ -802,7 +810,6 @@ create_temp_file (int *pfd, bool survive_fd_exhaustion) memcpy (file, temp_dir, len); memcpy (file + len, slashbase, sizeof slashbase); node->next = NULL; - node->pid = 0; if (++temp_dir_index == temp_dir_count) temp_dir_index = 0; @@ -1010,23 +1017,21 @@ pipe_fork (int pipefds[2], size_t tries) #endif } -/* Create a temporary file and start a compression program to filter output - to that file. Set *PFP to the file handle and if PPID is non-NULL, - set *PPID to the PID of the newly-created process. If the creation +/* Create a temporary file and, if asked for, start a compressor + to that file. Set *PFP to the file handle and return + the address of the new temp node. If the creation fails, return NULL if the failure is due to file descriptor exhaustion and SURVIVE_FD_EXHAUSTION; otherwise, die. */ -static char * -maybe_create_temp (FILE **pfp, pid_t *ppid, bool survive_fd_exhaustion) +static struct tempnode * +maybe_create_temp (FILE **pfp, bool survive_fd_exhaustion) { int tempfd; struct tempnode *node = create_temp_file (&tempfd, survive_fd_exhaustion); - char *name; - if (! node) return NULL; - name = node->name; + node->state = UNCOMPRESSED; if (compress_program) { @@ -1039,7 +1044,7 @@ maybe_create_temp (FILE **pfp, pid_t *ppid, bool survive_fd_exhaustion) close (pipefds[0]); tempfd = pipefds[1]; - register_proc (node->pid); + register_proc (node); } else if (node->pid == 0) { @@ -1053,45 +1058,40 @@ maybe_create_temp (FILE **pfp, pid_t *ppid, bool survive_fd_exhaustion) error (SORT_FAILURE, errno, _("couldn't execute %s"), compress_program); } - else - node->pid = 0; } *pfp = fdopen (tempfd, "w"); if (! *pfp) - die (_("couldn't create temporary file"), name); - - if (ppid) - *ppid = node->pid; + die (_("couldn't create temporary file"), node->name); - return name; + return node; } -/* Create a temporary file and start a compression program to filter output - to that file. Set *PFP to the file handle and if *PPID is non-NULL, - set it to the PID of the newly-created process. Die on failure. */ +/* Create a temporary file and, if asked for, start a compressor + to that file. Set *PFP to the file handle and return the address + of the new temp node. Die on failure. */ -static char * -create_temp (FILE **pfp, pid_t *ppid) +static struct tempnode * +create_temp (FILE **pfp) { - return maybe_create_temp (pfp, ppid, false); + return maybe_create_temp (pfp, false); } /* Open a compressed temp file and start a decompression process through - which to filter the input. PID must be the valid processes ID of the - process used to compress the file. Return NULL (setting errno to + which to filter the input. Return NULL (setting errno to EMFILE) if we ran out of file descriptors, and die on any other kind of failure. */ static FILE * -open_temp (char const *name, pid_t pid) +open_temp (struct tempnode *temp) { int tempfd, pipefds[2]; FILE *fp = NULL; - wait_proc (pid); + if (temp->state == UNREAPED) + wait_proc (temp->pid); - tempfd = open (name, O_RDONLY); + tempfd = open (temp->name, O_RDONLY); if (tempfd < 0) return NULL; @@ -1119,7 +1119,8 @@ open_temp (char const *name, pid_t pid) compress_program); default: - register_proc (child); + temp->pid = child; + register_proc (temp); close (tempfd); close (pipefds[1]); @@ -1161,6 +1162,9 @@ zaptemp (char const *name) for (pnode = &temphead; (node = *pnode)->name != name; pnode = &node->next) continue; + if (node->state == UNREAPED) + wait_proc (node->pid); + /* Unlink the temporary file in a critical section to avoid races. */ next = node->next; cs = cs_enter (); @@ -2773,8 +2777,8 @@ open_input_files (struct sortfile *files, size_t nfiles, FILE ***pfps) /* Open as many input files as we can. */ for (i = 0; i < nfiles; i++) { - fps[i] = (files[i].pid - ? open_temp (files[i].name, files[i].pid) + fps[i] = (files[i].temp && files[i].temp->state != UNCOMPRESSED + ? open_temp (files[i].temp) : stream_open (files[i].name, "r")); if (!fps[i]) break; @@ -3573,8 +3577,7 @@ avoid_trashing_input (struct sortfile *files, size_t ntemps, size_t i; bool got_outstat = false; struct stat outstat; - char const *tempcopy = NULL; - pid_t pid IF_LINT (= 0); + struct tempnode *tempcopy = NULL; for (i = ntemps; i < nfiles; i++) { @@ -3608,12 +3611,12 @@ avoid_trashing_input (struct sortfile *files, size_t ntemps, if (! tempcopy) { FILE *tftp; - tempcopy = create_temp (&tftp, &pid); - mergefiles (&files[i], 0, 1, tftp, tempcopy); + tempcopy = create_temp (&tftp); + mergefiles (&files[i], 0, 1, tftp, tempcopy->name); } - files[i].name = tempcopy; - files[i].pid = pid; + files[i].name = tempcopy->name; + files[i].temp = tempcopy; } } } @@ -3648,13 +3651,12 @@ merge (struct sortfile *files, size_t ntemps, size_t nfiles, for (out = in = 0; nmerge <= nfiles - in; out++) { FILE *tfp; - pid_t pid; - char *temp = create_temp (&tfp, &pid); + struct tempnode *temp = create_temp (&tfp); size_t num_merged = mergefiles (&files[in], MIN (ntemps, nmerge), - nmerge, tfp, temp); + nmerge, tfp, temp->name); ntemps -= MIN (ntemps, num_merged); - files[out].name = temp; - files[out].pid = pid; + files[out].name = temp->name; + files[out].temp = temp; in += num_merged; } @@ -3668,13 +3670,12 @@ merge (struct sortfile *files, size_t ntemps, size_t nfiles, files as possible, to avoid needless I/O. */ size_t nshortmerge = remainder - cheap_slots + 1; FILE *tfp; - pid_t pid; - char *temp = create_temp (&tfp, &pid); + struct tempnode *temp = create_temp (&tfp); size_t num_merged = mergefiles (&files[in], MIN (ntemps, nshortmerge), - nshortmerge, tfp, temp); + nshortmerge, tfp, temp->name); ntemps -= MIN (ntemps, num_merged); - files[out].name = temp; - files[out++].pid = pid; + files[out].name = temp->name; + files[out++].temp = temp; in += num_merged; } @@ -3717,21 +3718,21 @@ merge (struct sortfile *files, size_t ntemps, size_t nfiles, (e.g., some other process could open a file between the time we closed and tried to create). */ FILE *tfp; - pid_t pid; - char *temp; + struct tempnode *temp; do { nopened--; xfclose (fps[nopened], files[nopened].name); - temp = maybe_create_temp (&tfp, &pid, ! (nopened <= 2)); + temp = maybe_create_temp (&tfp, ! (nopened <= 2)); } while (!temp); /* Merge into the newly allocated temporary. */ - mergefps (&files[0], MIN (ntemps, nopened), nopened, tfp, temp, fps); + mergefps (&files[0], MIN (ntemps, nopened), nopened, tfp, temp->name, + fps); ntemps -= MIN (ntemps, nopened); - files[0].name = temp; - files[0].pid = pid; + files[0].name = temp->name; + files[0].temp = temp; memmove (&files[1], &files[nopened], (nfiles - nopened) * sizeof *files); ntemps++; @@ -3807,7 +3808,7 @@ sort (char *const *files, size_t nfiles, char const *output_file, else { ++ntemps; - temp_output = create_temp (&tfp, NULL); + temp_output = create_temp (&tfp)->name; } if (1 < buf.nlines) { @@ -3849,7 +3850,7 @@ sort (char *const *files, size_t nfiles, char const *output_file, for (i = 0; node; i++) { tempfiles[i].name = node->name; - tempfiles[i].pid = node->pid; + tempfiles[i].temp = node; node = node->next; } merge (tempfiles, ntemps, ntemps, output_file); diff --git a/tests/Makefile.am b/tests/Makefile.am index de06704..06d81f0 100644 --- a/tests/Makefile.am +++ b/tests/Makefile.am @@ -228,6 +228,7 @@ TESTS = \ misc/sort \ misc/sort-benchmark-random \ misc/sort-compress \ + misc/sort-compress-hang \ misc/sort-compress-proc \ misc/sort-continue \ misc/sort-debug-keys \ diff --git a/tests/misc/sort-compress-hang b/tests/misc/sort-compress-hang new file mode 100755 index 0000000..a536b1f --- /dev/null +++ b/tests/misc/sort-compress-hang @@ -0,0 +1,53 @@ +#!/bin/sh +# Test for sort --compress hang + +# Copyright (C) 2010 Free Software Foundation, Inc. + +# This program is free software: you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation, either version 3 of the License, or +# (at your option) any later version. + +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. + +# You should have received a copy of the GNU General Public License +# along with this program. If not, see . + +. "${srcdir=.}/init.sh"; path_prepend_ ../src +print_ver_ sort +very_expensive_ + +cat <<\EOF >compress || framework_failure +#!/bin/sh +tr 41 14 || exit +touch ok +EOF + +chmod +x compress + +seq -w 200000 > exp || fail=1 +tac exp > in || fail=1 + +# When the bug occurs, 'sort' hangs forever. When it doesn't occur, +# 'sort' could be running slowly on an overburdened machine. +# On a circa-2010 Linux server using NFS, a successful test completes +# in about 170 seconds, so specify 1700 seconds as a safety margin. +timeout 1700 sort --compress-program=./compress -S 1k in > out || fail=1 + +compare exp out || fail=1 +test -f ok || fail=1 +rm -f compress ok + +# If $TMPDIR is relative, give subprocesses time to react when 'sort' exits. +# Otherwise, under NFS, when 'sort' unlinks the temp files and they +# are renamed to .nfsXXXX instead of being removed, the parent cleanup +# of this directory will fail because the files are still open. +case $TMPDIR in +/*) ;; +*) sleep 1;; +esac + +Exit $fail -- 1.7.2 From unknown Tue Aug 19 22:00:58 2025 X-Loop: help-debbugs@gnu.org Subject: bug#7597: multi-threaded sort can segfault (unrelated to the sort -u segfault) Resent-From: =?UTF-8?Q?P=C3=A1draig?= Brady Original-Sender: debbugs-submit-bounces@debbugs.gnu.org Resent-To: owner@debbugs.gnu.org Resent-CC: bug-coreutils@gnu.org Resent-Date: Thu, 16 Dec 2010 22:32:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 7597 X-GNU-PR-Package: coreutils X-GNU-PR-Keywords: To: Paul Eggert Cc: Chen Guo , coreutils@gnu.org, bug-coreutils@gnu.org, Jim Meyering , DJ Lucas Received: via spool by submit@debbugs.gnu.org id=B.129253871522988 (code B ref -1); Thu, 16 Dec 2010 22:32:02 +0000 Received: (at submit) by debbugs.gnu.org; 16 Dec 2010 22:31:55 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PTMMc-0005yj-6O for submit@debbugs.gnu.org; Thu, 16 Dec 2010 17:31:54 -0500 Received: from eggs.gnu.org ([140.186.70.92]) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PTMMa-0005yX-C2 for submit@debbugs.gnu.org; Thu, 16 Dec 2010 17:31:52 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1PTMSh-0007wv-7y for submit@debbugs.gnu.org; Thu, 16 Dec 2010 17:38:12 -0500 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=unavailable version=3.3.1 Received: from lists.gnu.org ([199.232.76.165]:60581) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1PTMSh-0007wl-5l for submit@debbugs.gnu.org; Thu, 16 Dec 2010 17:38:11 -0500 Received: from [140.186.70.92] (port=51459 helo=eggs.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1PTMSg-0004ex-3N for bug-coreutils@gnu.org; Thu, 16 Dec 2010 17:38:11 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1PTMSe-0007wK-L3 for bug-coreutils@gnu.org; Thu, 16 Dec 2010 17:38:09 -0500 Received: from mail1.slb.deg.dub.stisp.net ([84.203.253.98]:18937) by eggs.gnu.org with smtp (Exim 4.71) (envelope-from ) id 1PTMSe-0007w9-Au for bug-coreutils@gnu.org; Thu, 16 Dec 2010 17:38:08 -0500 Received: (qmail 92396 invoked from network); 16 Dec 2010 22:38:05 -0000 Received: from unknown (HELO ?192.168.2.25?) (84.203.137.218) by mail1.slb.deg.dub.stisp.net with SMTP; 16 Dec 2010 22:38:05 -0000 Message-ID: <4D0A9417.8050008@draigBrady.com> Date: Thu, 16 Dec 2010 22:35:03 +0000 From: =?UTF-8?Q?P=C3=A1draig?= Brady User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.8) Gecko/20100227 Thunderbird/3.0.3 MIME-Version: 1.0 References: <4CEFF613.8060200@linuxfromscratch.org> <4CF1A911.1000000@cs.ucla.edu> <4CF1BBF5.9080103@linuxfromscratch.org> <4CF352CD.7010206@linuxfromscratch.org> <4CF3FC08.3020705@cs.ucla.edu> <87r5dzc0m6.fsf@meyering.net> <4CF95CE2.8090602@cs.ucla.edu> <4CFC8A47.9000006@cs.ucla.edu> <877hflu8om.fsf@meyering.net> <8762v39o8t.fsf_-_@meyering.net> <4D011F54.3050403@cs.ucla.edu> <4D0337C8.1050005@cs.ucla.edu> <8762v04lcy.fsf@meyering.net> <4D03D870.9050807@cs.ucla.edu> <874oajyp64.fsf@meyering.net> <4D0A8D49.6070503@cs.ucla.edu> In-Reply-To: <4D0A8D49.6070503@cs.ucla.edu> X-Enigmail-Version: 1.0.1 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8bit X-detected-operating-system: by eggs.gnu.org: FreeBSD 4.6-4.9 X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 2) X-Spam-Score: -4.7 (----) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -4.7 (----) On 16/12/10 22:06, Paul Eggert wrote: > On 12/12/10 07:41, Jim Meyering wrote: >> Paul Eggert wrote: >>> There are also some test cases I need to add for the >>> (unrelated) sort-compression bug, which is next on my >>> list of coreutils bugs to look at. >> >> It would be great to fix that for 8.8, too, >> but don't worry if you don't get to it soon. > > I've fixed that now, with the patch enclosed >From a quick inspection, it looks good! All sort tests pass on 32 bit. $ time (cd tests && make check TESTS=misc/sort-compress-hang RUN_VERY_EXPENSIVE_TESTS=yes VERBOSE=yes) PASS: misc/sort-compress-hang real 1m53.543s user 0m18.475s sys 1m5.758s cheers, Pádraig. From debbugs-submit-bounces@debbugs.gnu.org Thu Oct 11 18:41:23 2018 Received: (at control) by debbugs.gnu.org; 11 Oct 2018 22:41:23 +0000 Received: from localhost ([127.0.0.1]:45718 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1gAjdz-0005AC-3c for submit@debbugs.gnu.org; Thu, 11 Oct 2018 18:41:23 -0400 Received: from mail-pl1-f174.google.com ([209.85.214.174]:37739) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1gAjdx-00059z-2T for control@debbugs.gnu.org; Thu, 11 Oct 2018 18:41:21 -0400 Received: by mail-pl1-f174.google.com with SMTP id u6-v6so2169559plz.4 for ; Thu, 11 Oct 2018 15:41:20 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=to:from:message-id:date:user-agent:mime-version:content-language :content-transfer-encoding; bh=uJKUdpAUHaUoHWmSrSdjOngBf5819XF15F7yXY2d6yk=; b=Ted2UJ0CXKcyLJ9bRmBhAY4LP/nopOPhaN8F3uo0Mc0fpqWCTMn3Mi6Qddn4kHDuv0 0EuMijAhVzTFETDOPm4uSRrh/wa3wvrQDCo79ZC6JtQ6AJfOHwxh3U8MnDfpjHplDrlz odoviTjdXT4MwzbkJ1hjTb3cQv/i7ENEUA/8Dut2AR/FQit/c5tZd3iyTRoZ8fwJGEB9 juw271JIkKxhmxCh0Nihp2zUoRJlEQg7UdTH1zsJ6yGa0om1OBD5sClO2deWvL78DiFI mQ2Ja0Zn++QMkLP5p2AmgulerKIDMpYexray475QFTbzHNe1Lu0D+dWyGDASymOtg57g uy/g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:to:from:message-id:date:user-agent:mime-version :content-language:content-transfer-encoding; bh=uJKUdpAUHaUoHWmSrSdjOngBf5819XF15F7yXY2d6yk=; b=Cu/4J3Ih6JT9tYmOYE1wbmFrHTJdy/lW/hvP0+tiPVs/9vj8pp2o5Ft0bocu2CsIAp v3VvSGCnUdvxX3+dhToQk0+LrfaPZcyb0KeZTjzOkkhu7vL12BIS85OthjEKcXsC5Xdo 2FGdazuROmUaLHgdbIG0VzJbyzEz5sZgTt/UHI0v7VlPfhvpgnI/+yXgGR0jft5S27Ss HUopiQOgp3IzyuDnNdHFSt2DMrWvb+h4DEUOyBflUJfDRwmsJqsFlbD/RpGh5jQMNUq7 /ShX8CKPnoR90Shj0GNn1EwIaRepIxe63p8MnU29GrEQeQMxOW9Rv6kGfHE0bze38/kM 74JQ== X-Gm-Message-State: ABuFfojfKoYWdznwia8ldeKWKGu6vqC/fSZKp3xvrQHzX48JoeDl3/v2 bs+3Zsp2wJAj7nS8nm+0K3fxRhar X-Google-Smtp-Source: ACcGV615390budogDdJ4QyFlFPD6w3s+H4at71Ss2fiEnDUzrvW9sskA0n/LxN0WITy1mM9Ig48daw== X-Received: by 2002:a17:902:6e17:: with SMTP id u23-v6mr3441305plk.28.1539297674702; Thu, 11 Oct 2018 15:41:14 -0700 (PDT) Received: from tomato.housegordon.com (moose.housegordon.com. [184.68.105.38]) by smtp.googlemail.com with ESMTPSA id x23-v6sm16505975pfh.56.2018.10.11.15.41.12 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 11 Oct 2018 15:41:13 -0700 (PDT) To: control@debbugs.gnu.org From: Assaf Gordon Message-ID: <82dcd5cf-e31a-1a83-8aa9-fedba61edde7@gmail.com> Date: Thu, 11 Oct 2018 16:41:11 -0600 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1 MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit X-Spam-Score: 2.0 (++) X-Spam-Report: Spam detection software, running on the system "debbugs.gnu.org", has NOT identified this incoming email as spam. The original message has been attached to this so you can view it or label similar future email. If you have any questions, see the administrator of that system for details. Content preview: close 10436 tags 10376 notabug close 10376 close 10430 [...] Content analysis details: (2.0 points, 10.0 required) pts rule name description ---- ---------------------- -------------------------------------------------- -0.0 RCVD_IN_DNSWL_NONE RBL: Sender listed at http://www.dnswl.org/, no trust [209.85.214.174 listed in list.dnswl.org] -0.0 SPF_PASS SPF: sender matches SPF record 0.0 FREEMAIL_FROM Sender email is commonly abused enduser mail provider (assafgordon[at]gmail.com) 0.0 RCVD_IN_MSPIKE_H3 RBL: Good reputation (+3) [209.85.214.174 listed in wl.mailspike.net] 1.8 MISSING_SUBJECT Missing Subject: header 0.2 NO_SUBJECT Extra score for no subject 0.0 RCVD_IN_MSPIKE_WL Mailspike good senders X-Debbugs-Envelope-To: control X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 1.0 (+) close 10436 tags 10376 notabug close 10376 close 10430 tags 7597 fixed close 7597