From unknown Mon Jun 23 18:27:31 2025 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-Mailer: MIME-tools 5.509 (Entity 5.509) Content-Type: text/plain; charset=utf-8 From: bug#7489 <7489@debbugs.gnu.org> To: bug#7489 <7489@debbugs.gnu.org> Subject: Status: [coreutils] over aggressive threads in sort Reply-To: bug#7489 <7489@debbugs.gnu.org> Date: Tue, 24 Jun 2025 01:27:31 +0000 retitle 7489 [coreutils] over aggressive threads in sort reassign 7489 coreutils submitter 7489 DJ Lucas severity 7489 normal tag 7489 fixed thanks From debbugs-submit-bounces@debbugs.gnu.org Fri Nov 26 14:40:00 2010 Received: (at submit) by debbugs.gnu.org; 26 Nov 2010 19:40:00 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PM49H-0005mJ-Ku for submit@debbugs.gnu.org; Fri, 26 Nov 2010 14:40:00 -0500 Received: from eggs.gnu.org ([140.186.70.92]) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PM2XI-0003Yg-0Q for submit@debbugs.gnu.org; Fri, 26 Nov 2010 12:56:40 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1PM2cX-0000bS-AI for submit@debbugs.gnu.org; Fri, 26 Nov 2010 13:02:06 -0500 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,FREEMAIL_FROM, RCVD_IN_DNSWL_NONE,T_DKIM_INVALID,T_TO_NO_BRKTS_FREEMAIL autolearn=unavailable version=3.3.1 Received: from lists.gnu.org ([199.232.76.165]:35966) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1PM2cX-0000bK-7i for submit@debbugs.gnu.org; Fri, 26 Nov 2010 13:02:05 -0500 Received: from [140.186.70.92] (port=46935 helo=eggs.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1PM2cW-0004EU-Ga for bug-coreutils@gnu.org; Fri, 26 Nov 2010 13:02:05 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1PM2cU-0000au-Q2 for bug-coreutils@gnu.org; Fri, 26 Nov 2010 13:02:03 -0500 Received: from mail-iw0-f169.google.com ([209.85.214.169]:34565) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1PM2cU-0000am-HR; Fri, 26 Nov 2010 13:02:02 -0500 Received: by iwn38 with SMTP id 38so1431761iwn.0 for ; Fri, 26 Nov 2010 10:02:01 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:sender:received:message-id :date:from:user-agent:mime-version:to:cc:subject:x-enigmail-version :content-type:content-transfer-encoding :x-lucasit-mailscanner-watermark:x-lucasit-mailscanner-information :x-mailscanner-id:x-lucasit-mailscanner :x-lucasit-mailscanner-spamcheck:x-lucasit-mailscanner-from :x-spam-status; bh=t1piAAJqZPMzNRQyABp3kUTnMOGvZDGFxOkY3DaSmuI=; b=Nrz0mUlH8lOkMWNX+2eHfFqB8g+zjogEGOz6w9HctrpY80W4Sn+ad0U1TeKIAYt6sI ZDCbIwhpGBtK6PqLVwASsP709cHUkhKvcTOmi3MsqAkE/Wlq6i28MN2bZMeyBEXTla6k NKYrFiIz9/iEAwKYFRB3nM5ztpgOMCMZDSSyo= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=sender:message-id:date:from:user-agent:mime-version:to:cc:subject :x-enigmail-version:content-type:content-transfer-encoding :x-lucasit-mailscanner-watermark:x-lucasit-mailscanner-information :x-mailscanner-id:x-lucasit-mailscanner :x-lucasit-mailscanner-spamcheck:x-lucasit-mailscanner-from :x-spam-status; b=raH07uC3IIAqTc3VZx6uYyimCDkVhlZdtr34QMN4ndBkdoR1iWqePjQhUIj3UKHqYo By4yVwwCf8TJYxYbGcxvLpT7ahTLyiPOeK8MpSKwe8RJSBJjeS8HdZLh5OySjdFSdnte /ITY2K3A9leLRHtrslu0osPxnSqT78JjLbz7Y= Received: by 10.231.169.74 with SMTP id x10mr1864569iby.65.1290794520705; Fri, 26 Nov 2010 10:02:00 -0800 (PST) Received: from postal.lucasit.com (99-62-160-221.lightspeed.stlsmo.sbcglobal.net [99.62.160.221]) by mx.google.com with ESMTPS id gy41sm2300127ibb.5.2010.11.26.10.01.59 (version=TLSv1/SSLv3 cipher=RC4-MD5); Fri, 26 Nov 2010 10:01:59 -0800 (PST) Received: from [192.168.143.240] (unknown [192.168.143.240]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by postal.lucasit.com (Postfix) with ESMTPSA id 09DA38158BA; Fri, 26 Nov 2010 18:01:55 +0000 (GMT) Message-ID: <4CEFF613.8060200@linuxfromscratch.org> Date: Fri, 26 Nov 2010 12:01:55 -0600 From: DJ Lucas User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.4) Gecko/20100710 Thunderbird/3.1 MIME-Version: 1.0 To: coreutils@gnu.org Subject: Re: [coreutils] over aggressive threads in sort X-Enigmail-Version: 1.1.2 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-LucasIT-MailScanner-Watermark: 1291399316.12005@ZdAqLVNdwGpo/U4yAqsZNw X-LucasIT-MailScanner-Information: Please contact the ISP for more information X-MailScanner-ID: 09DA38158BA.7D1D6 X-LucasIT-MailScanner: Clean X-LucasIT-MailScanner-SpamCheck: not spam, SpamAssassin (not cached, score=-16.8, required 2.7, autolearn=not spam, ALL_TRUSTED -1.80, BAYES_00 -15.00) X-LucasIT-MailScanner-From: dj@linuxfromscratch.org X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 2) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 2) X-Spam-Score: -5.9 (-----) X-Debbugs-Envelope-To: submit X-Mailman-Approved-At: Fri, 26 Nov 2010 14:39:58 -0500 Cc: bug-coreutils@gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -5.9 (-----) Sent too bug-coreutils too (no bug id currently AFAICT). Bug only affects multi-byte locales. Take the following samples: bash-4.1# zcat cracklib-words-20080507.gz | sort -u --debug > file && echo $? sort: using `en_US.UTF-8' sorting rules Segmentation fault bash-4.1# echo $? 139 bash-4.1# bash-4.1# zcat cracklib-words-20080507.gz | sort -u --parallel=1 --debug > file && echo $? sort: using `en_US.UTF-8' sorting rules 0 bash-4.1# bash-4.1# zcat cracklib-words-20080507.gz | LANG=C sort -u --debug > file && echo $? sort: using simple byte comparison 0 bash-4.1# bash-4.1# gzip -d cracklib-words-20080507.gz bash-4.1# sort -u --debug cracklib-words-20080507 > file && echo $? sort: using `en_US.UTF-8' sorting rules 0 bash-4.1# In the interim, for a quick and dirty hack, I've added an LC_COLLATE comparison and set nthreads to 1 in multibyte locales. Probably well known, but the test file that I used is available from: http://downloads.sourceforge.net/cracklib/cracklib-words-20080507.gz -- DJ Lucas -- This message has been scanned for viruses and dangerous content, and is believed to be clean. From debbugs-submit-bounces@debbugs.gnu.org Fri Nov 26 18:18:48 2010 Received: (at 7489) by debbugs.gnu.org; 26 Nov 2010 23:18:48 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PM7Z1-0002pa-4p for submit@debbugs.gnu.org; Fri, 26 Nov 2010 18:18:48 -0500 Received: from smtp.cs.ucla.edu ([131.179.128.62]) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PM7Yy-0002pM-J1 for 7489@debbugs.gnu.org; Fri, 26 Nov 2010 18:18:45 -0500 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id 45A3D39E80FF; Fri, 26 Nov 2010 15:24:11 -0800 (PST) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id xKru3lWrBjk3; Fri, 26 Nov 2010 15:24:11 -0800 (PST) Received: from [192.168.1.10] (pool-71-189-109-235.lsanca.fios.verizon.net [71.189.109.235]) by smtp.cs.ucla.edu (Postfix) with ESMTPSA id E557439E80DB; Fri, 26 Nov 2010 15:24:10 -0800 (PST) Message-ID: <4CF04195.50202@cs.ucla.edu> Date: Fri, 26 Nov 2010 15:24:05 -0800 From: Paul Eggert Organization: UCLA Computer Science Department User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.12) Gecko/20101027 Thunderbird/3.1.6 MIME-Version: 1.0 To: DJ Lucas Subject: Re: bug#7489: [coreutils] over aggressive threads in sort References: <4CEFF613.8060200@linuxfromscratch.org> In-Reply-To: <4CEFF613.8060200@linuxfromscratch.org> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Spam-Score: -2.9 (--) X-Debbugs-Envelope-To: 7489 Cc: coreutils@gnu.org, 7489@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -2.9 (--) Thanks for the bug report. Unfortunately, I cannot reproduce the problem with coreutils 8.7, either on RHEL 5.5 x86-64 or on Ubuntu 10.10 x86. Which version of coreutils are you running? And on what platform? How did you build it? Can you reproduce it with --parallel=2? If not, which value of --parallel is needed? If this is coreutils 8.7, can you get a core dump and a GDB backtrace? From debbugs-submit-bounces@debbugs.gnu.org Fri Nov 26 21:48:58 2010 Received: (at 7489) by debbugs.gnu.org; 27 Nov 2010 02:48:58 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PMAqQ-00081z-1v for submit@debbugs.gnu.org; Fri, 26 Nov 2010 21:48:58 -0500 Received: from mail1.slb.deg.dub.stisp.net ([84.203.253.98]) by debbugs.gnu.org with smtp (Exim 4.69) (envelope-from ) id 1PMAqN-00081n-RB for 7489@debbugs.gnu.org; Fri, 26 Nov 2010 21:48:56 -0500 Received: (qmail 2418 invoked from network); 27 Nov 2010 02:54:22 -0000 Received: from unknown (HELO ?192.168.2.25?) (84.203.137.218) by mail1.slb.deg.dub.stisp.net with SMTP; 27 Nov 2010 02:54:22 -0000 Message-ID: <4CF07288.60902@draigBrady.com> Date: Sat, 27 Nov 2010 02:52:56 +0000 From: =?ISO-8859-1?Q?P=E1draig_Brady?= User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.8) Gecko/20100227 Thunderbird/3.0.3 MIME-Version: 1.0 To: DJ Lucas Subject: Re: bug#7489: [coreutils] over aggressive threads in sort References: <4CEFF613.8060200@linuxfromscratch.org> In-Reply-To: <4CEFF613.8060200@linuxfromscratch.org> X-Enigmail-Version: 1.0.1 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Spam-Score: -2.7 (--) X-Debbugs-Envelope-To: 7489 Cc: coreutils@gnu.org, 7489@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -2.7 (--) On 26/11/10 18:01, DJ Lucas wrote: > Sent too bug-coreutils too (no bug id currently AFAICT). > > Bug only affects multi-byte locales. Take the following samples: > > > > bash-4.1# zcat cracklib-words-20080507.gz | sort -u --debug > file && > echo $? > sort: using `en_US.UTF-8' sorting rules > Segmentation fault > bash-4.1# echo $? > 139 > bash-4.1# > > > bash-4.1# zcat cracklib-words-20080507.gz | sort -u --parallel=1 > --debug > file && echo $? > sort: using `en_US.UTF-8' sorting rules > 0 > bash-4.1# > > > bash-4.1# zcat cracklib-words-20080507.gz | LANG=C sort -u --debug > > file && echo $? > sort: using simple byte comparison > 0 > bash-4.1# > > > bash-4.1# gzip -d cracklib-words-20080507.gz > bash-4.1# sort -u --debug cracklib-words-20080507 > file && echo $? > sort: using `en_US.UTF-8' sorting rules > 0 > bash-4.1# > > > In the interim, for a quick and dirty hack, I've added an LC_COLLATE > comparison and set nthreads to 1 in multibyte locales. > > Probably well known, but the test file that I used is available from: > http://downloads.sourceforge.net/cracklib/cracklib-words-20080507.gz > > -- DJ Lucas > 100% reproducible on a 32 bit F12 box (first I tried) # zcat cracklib-words-20080507.gz | ./coreutils-8.7/src/sort -u --parallel=1 > /dev/null # zcat cracklib-words-20080507.gz | ./coreutils-8.7/src/sort -u --parallel=2 > /dev/null Segmentation fault (core dumped) #0 0x00c4aecf in __strlen_ia32 () from /lib/libc.so.6 #1 0x00c4ecf1 in strcoll_l () from /lib/libc.so.6 #2 0x00c4a8b1 in strcoll () from /lib/libc.so.6 #3 0x0805682d in strcoll_loop (s1=, s1size=, s2=, s2size=1852142453) at memcoll.c:39 #4 memcoll0 (s1=, s1size=, s2=, s2size=1852142453) at memcoll.c:110 #5 0x08054263 in xmemcoll0 (s1=0xb5457008 "lauters", s1size=8, s2=0x6f626f79
, s2size=1852142453) at xmemcoll.c:71 #6 0x0804e091 in compare (a=0xb68cb558, b=0xb5c3e9b8) at sort.c:2653 #7 0x0804f320 in write_unique (line=0xb68cb558, tfp=0x9f2ebe0, temp_output=0x9f2e9c0 "/tmp/sortqd5BvL") at sort.c:3233 #8 0x0804f5b0 in mergelines_node (node=0xbfd06c0c, total_lines=822189, tfp=0x9f2ebe0, temp_output=0x9f2e9c0 "/tmp/sortqd5BvL") at sort.c:3279 #9 0x0804f8e4 in merge_loop (queue=0xbfd06ca4, total_lines=822189, tfp=0x9f2ebe0, temp_output=0x9f2e9c0 "/tmp/sortqd5BvL") at sort.c:3367 #10 0x0804fc52 in sortlines (lines=0xb7557038, dest=0xb68cb568, nthreads=1, total_lines=822189, parent=0xbfd06c0c, lo_child=true, merge_queue=0xbfd06ca4, tfp=0x9f2ebe0, temp_output=0x9f2e9c0 "/tmp/sortqd5BvL") at sort.c:3481 #11 0x0804f9a4 in sortlines_thread (data=0xbfd06be4) at sort.c:3404 #12 0x00d58ab5 in start_thread () from /lib/libpthread.so.0 #13 0x00caef1e in clone () from /lib/libc.so.6 (gdb) quit Looks like ASCII data passed on the stack for pointer and size (oboy, nesu) Hmm, seems like multiple threads are racing to update the static "saved" variable in write_unique() ? From debbugs-submit-bounces@debbugs.gnu.org Fri Nov 26 21:54:45 2010 Received: (at 7489) by debbugs.gnu.org; 27 Nov 2010 02:54:45 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PMAw1-00089o-A2 for submit@debbugs.gnu.org; Fri, 26 Nov 2010 21:54:45 -0500 Received: from mail-iw0-f172.google.com ([209.85.214.172]) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PMAw0-00089d-8Y for 7489@debbugs.gnu.org; Fri, 26 Nov 2010 21:54:44 -0500 Received: by iwn40 with SMTP id 40so3118107iwn.3 for <7489@debbugs.gnu.org>; Fri, 26 Nov 2010 19:00:11 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:sender:received:message-id :date:from:user-agent:mime-version:to:cc:subject:references :in-reply-to:x-enigmail-version:content-type :content-transfer-encoding:x-lucasit-mailscanner-watermark :x-lucasit-mailscanner-information:x-mailscanner-id :x-lucasit-mailscanner:x-lucasit-mailscanner-spamcheck :x-lucasit-mailscanner-from:x-spam-status; bh=SFvDWRtyx9BeRKSoLjvwSInOLFyBPanHt7yWOJPj5M0=; b=lPIl2T6lCdKLjN7pQkXIqhQGlEApFJwj/t8dSuh7vjgZ/I6j6Kf+tSddH+Yd3lPVxp HtoanabC2G6TrxMGKw43ctyAz5CIJRDCRjZbrPJ/HZ43mY6ipJqScl5IpYGjiTK/GvwI Ih87mOJdR+ULjCEgKvE2Jwu1eedN6di6qyFbY= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=sender:message-id:date:from:user-agent:mime-version:to:cc:subject :references:in-reply-to:x-enigmail-version:content-type :content-transfer-encoding:x-lucasit-mailscanner-watermark :x-lucasit-mailscanner-information:x-mailscanner-id :x-lucasit-mailscanner:x-lucasit-mailscanner-spamcheck :x-lucasit-mailscanner-from:x-spam-status; b=K/T6h/MOOSGKg4ogWQhhMMtRQc0lINsQ2ONiNwUbLfZ5jrSpi/MueheB3ohMlRnTLF vRFFVYa3TqlmHidOyC4wB0SLIN6fbMf4jM5V+tVi8LRNuKRX5KCm9vFI4bPZvv1rkfEN 98VjVqqG5CbKMSbR8a2SdRE3WhpznoSwusR2c= Received: by 10.42.218.134 with SMTP id hq6mr853172icb.492.1290826810994; Fri, 26 Nov 2010 19:00:10 -0800 (PST) Received: from postal.lucasit.com (99-62-160-221.lightspeed.stlsmo.sbcglobal.net [99.62.160.221]) by mx.google.com with ESMTPS id 8sm2734227iba.4.2010.11.26.19.00.09 (version=TLSv1/SSLv3 cipher=RC4-MD5); Fri, 26 Nov 2010 19:00:10 -0800 (PST) Received: from [192.168.143.240] (unknown [192.168.143.240]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by postal.lucasit.com (Postfix) with ESMTPSA id 2060F8158B2; Sat, 27 Nov 2010 03:00:01 +0000 (GMT) Message-ID: <4CF07431.2050509@linuxfromscratch.org> Date: Fri, 26 Nov 2010 21:00:01 -0600 From: DJ Lucas User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.4) Gecko/20100710 Thunderbird/3.1 MIME-Version: 1.0 To: Paul Eggert Subject: Re: bug#7489: [coreutils] over aggressive threads in sort References: <4CEFF613.8060200@linuxfromscratch.org> <4CF04195.50202@cs.ucla.edu> In-Reply-To: <4CF04195.50202@cs.ucla.edu> X-Enigmail-Version: 1.1.2 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-LucasIT-MailScanner-Watermark: 1291431605.50233@3cFf5PyJ0RFQqHV35mu6wA X-LucasIT-MailScanner-Information: Please contact the ISP for more information X-MailScanner-ID: 2060F8158B2.412AE X-LucasIT-MailScanner: Clean X-LucasIT-MailScanner-SpamCheck: not spam, SpamAssassin (not cached, score=-16.8, required 2.7, autolearn=not spam, ALL_TRUSTED -1.80, BAYES_00 -15.00) X-LucasIT-MailScanner-From: dj@linuxfromscratch.org X-Spam-Status: No X-Spam-Score: -4.3 (----) X-Debbugs-Envelope-To: 7489 Cc: coreutils@gnu.org, 7489@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -3.7 (---) On 11/26/2010 05:24 PM, Paul Eggert wrote: > Thanks for the bug report. Unfortunately, > I cannot reproduce the problem with coreutils 8.7, either on > RHEL 5.5 x86-64 or on Ubuntu 10.10 x86. > > Which version of coreutils are you running? 8.7. Haven't tested on 8.6 or 8.5. 8.4 worked correctly, but that was expected since I can get around it if not running in multiple threads. > And on what > platform? LFS 32bit on Athlon II 250 (64bit dual core CPU/hardware). gcc-4.5.1, glibc-2.12.1, binutils-2.20.1. > How did you build it? Basic CMMI --prefix=/usr. Built both with and without i18n RH patch and Linux only uname patches. > > Can you reproduce it with --parallel=2? If not, which value > of --parallel is needed? Yes, it is reproducible with --parallel=2. Seems this is a difficult one to track down. So far we've only confirmed that it happens in multibyte locales, with 32bit build on 64bit multi-core hardware, and only from stdin. Works fine if the file is passed as an argument to sort. Haven't confirmed 64bit build had multibyte yet, but the error did not occur on the report from that user. Setting LANG/LC_ALL to C or POSIX works around it, as well as --parallel=1. I've added a quick and really ugly workaround in the interim so that I can continue and get some work done (using getenv("LC_COLLATE"), and testing only against != POSIX and != C, setting nthreads=1). Confirmed that the correct encoding is displayed with --debug with the workaround in place. If you have a similar setup laying around (32bit installation on 64bit multi-core hardware), try cat bigfile.txt | LANG=en_US.UTF-8 sort -u (or similar) > > If this is coreutils 8.7, can you get a core dump and a GDB > backtrace? > Sure, give me a bit, but looks like Pádraig Brady beat me to it. -- DJ Lucas -- This message has been scanned for viruses and dangerous content, and is believed to be clean. From debbugs-submit-bounces@debbugs.gnu.org Fri Nov 26 22:28:38 2010 Received: (at 7489) by debbugs.gnu.org; 27 Nov 2010 03:28:39 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PMBSo-0000S4-2F for submit@debbugs.gnu.org; Fri, 26 Nov 2010 22:28:38 -0500 Received: from mail1.slb.deg.dub.stisp.net ([84.203.253.98]) by debbugs.gnu.org with smtp (Exim 4.69) (envelope-from ) id 1PMBSm-0000Rt-AA for 7489@debbugs.gnu.org; Fri, 26 Nov 2010 22:28:37 -0500 Received: (qmail 6875 invoked from network); 27 Nov 2010 03:34:03 -0000 Received: from unknown (HELO ?192.168.2.25?) (84.203.137.218) by mail1.slb.deg.dub.stisp.net with SMTP; 27 Nov 2010 03:34:03 -0000 Message-ID: <4CF07BD5.3090203@draigBrady.com> Date: Sat, 27 Nov 2010 03:32:37 +0000 From: =?UTF-8?B?UMOhZHJhaWcgQnJhZHk=?= User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.8) Gecko/20100227 Thunderbird/3.0.3 MIME-Version: 1.0 To: DJ Lucas Subject: Re: bug#7489: [coreutils] over aggressive threads in sort References: <4CEFF613.8060200@linuxfromscratch.org> <4CF04195.50202@cs.ucla.edu> <4CF07431.2050509@linuxfromscratch.org> In-Reply-To: <4CF07431.2050509@linuxfromscratch.org> X-Enigmail-Version: 1.0.1 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Spam-Score: -2.7 (--) X-Debbugs-Envelope-To: 7489 Cc: Paul Eggert , coreutils@gnu.org, 7489@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -2.7 (--) On 27/11/10 03:00, DJ Lucas wrote: > > Setting LANG/LC_ALL to C or > POSIX works around it, as well as --parallel=1. I've added a quick and > really ugly workaround in the interim so that I can continue and get > some work done (using getenv("LC_COLLATE"), and testing only against != > POSIX and != C, setting nthreads=1). Confirmed that the correct encoding > is displayed with --debug with the workaround in place. Best workaround I think is to leave the code alone, and set OMP_NUM_THREADS=1 in your env as otherwise you may get invalid results in any locale. cheers, Pádraig. From debbugs-submit-bounces@debbugs.gnu.org Sat Nov 27 03:23:57 2010 Received: (at 7489) by debbugs.gnu.org; 27 Nov 2010 08:23:57 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PMG4a-00078X-T3 for submit@debbugs.gnu.org; Sat, 27 Nov 2010 03:23:57 -0500 Received: from smtp.cs.ucla.edu ([131.179.128.62]) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PMG4Z-00078M-91 for 7489@debbugs.gnu.org; Sat, 27 Nov 2010 03:23:56 -0500 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id EB7DB39E8100; Sat, 27 Nov 2010 00:29:22 -0800 (PST) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 9ZrG+ftNmo6v; Sat, 27 Nov 2010 00:29:22 -0800 (PST) Received: from [192.168.1.10] (pool-71-189-109-235.lsanca.fios.verizon.net [71.189.109.235]) by smtp.cs.ucla.edu (Postfix) with ESMTPSA id 033D839E80DC; Sat, 27 Nov 2010 00:29:21 -0800 (PST) Message-ID: <4CF0C15C.9080201@cs.ucla.edu> Date: Sat, 27 Nov 2010 00:29:16 -0800 From: Paul Eggert Organization: UCLA Computer Science Department User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.12) Gecko/20101027 Thunderbird/3.1.6 MIME-Version: 1.0 To: =?UTF-8?B?UMOhZHJhaWcgQnJhZHk=?= Subject: Re: bug#7489: [coreutils] over aggressive threads in sort References: <4CEFF613.8060200@linuxfromscratch.org> <4CF07288.60902@draigBrady.com> In-Reply-To: <4CF07288.60902@draigBrady.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-Spam-Score: -2.9 (--) X-Debbugs-Envelope-To: 7489 Cc: Chen Guo , DJ Lucas , 7489@debbugs.gnu.org, coreutils@gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -2.9 (--) On 11/26/2010 06:52 PM, P=C3=A1draig Brady wrote: > Hmm, seems like multiple threads are racing to update the > static "saved" variable in write_unique() ? I don't think it's as simple as that. write_unique is generating output, and when it is run it is supposed to have exclusive access to the output file, which means it also has exclusive access to the "saved" variable. I suspect the problem is that the line structure is messed up when you have multiple threads and sufficiently large input, in that when one merge node is done and the next one is run, the "saved" variable (which points to storage managed by the earlier node) is pointing to storage that is no longer valid. I haven't had time to look into the details, though. Chen, do you have some insight into this? Here's the thread: http://lists.gnu.org/archive/html/bug-coreutils/2010-11/msg00209.html I have now reproduced the bug on RHEL 5.5 x86-64, using coreutils 8.7 built with "configure CC=3D'cc -m32'". The bug is also present in the savannah git master. From debbugs-submit-bounces@debbugs.gnu.org Sat Nov 27 04:33:46 2010 Received: (at 7489) by debbugs.gnu.org; 27 Nov 2010 09:33:46 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PMHA9-0000xo-QI for submit@debbugs.gnu.org; Sat, 27 Nov 2010 04:33:46 -0500 Received: from smtp.cs.ucla.edu ([131.179.128.62]) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PMHA7-0000xa-Op for 7489@debbugs.gnu.org; Sat, 27 Nov 2010 04:33:44 -0500 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id 509E539E8105; Sat, 27 Nov 2010 01:39:11 -0800 (PST) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id A3n-bzJJq0ZI; Sat, 27 Nov 2010 01:39:10 -0800 (PST) Received: from [192.168.1.10] (pool-71-189-109-235.lsanca.fios.verizon.net [71.189.109.235]) by smtp.cs.ucla.edu (Postfix) with ESMTPSA id B9E2839E80DC; Sat, 27 Nov 2010 01:39:10 -0800 (PST) Message-ID: <4CF0D1BE.6070908@cs.ucla.edu> Date: Sat, 27 Nov 2010 01:39:10 -0800 From: Paul Eggert Organization: UCLA Computer Science Department User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.12) Gecko/20101027 Thunderbird/3.1.6 MIME-Version: 1.0 To: =?UTF-8?B?UMOhZHJhaWcgQnJhZHk=?= Subject: Re: bug#7489: [coreutils] over aggressive threads in sort References: <4CEFF613.8060200@linuxfromscratch.org> <4CF07288.60902@draigBrady.com> <4CF0C15C.9080201@cs.ucla.edu> In-Reply-To: <4CF0C15C.9080201@cs.ucla.edu> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Spam-Score: -2.9 (--) X-Debbugs-Envelope-To: 7489 Cc: Chen Guo , DJ Lucas , 7489@debbugs.gnu.org, coreutils@gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -2.9 (--) Following up on my previous email, it appears to me that the following line in mergelines_node is weird: node->dest -= lo_orig - node->lo + hi_orig - node->hi; Surely there should be a "*" in front of that line? (This does not fix the bug; perhaps it is a different bug?) From debbugs-submit-bounces@debbugs.gnu.org Sat Nov 27 19:52:27 2010 Received: (at 7489) by debbugs.gnu.org; 28 Nov 2010 00:52:28 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PMVVD-0005XK-6x for submit@debbugs.gnu.org; Sat, 27 Nov 2010 19:52:27 -0500 Received: from smtp.cs.ucla.edu ([131.179.128.62]) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PMVVA-0005X8-Ud for 7489@debbugs.gnu.org; Sat, 27 Nov 2010 19:52:26 -0500 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id 6B83E39E8100; Sat, 27 Nov 2010 16:57:54 -0800 (PST) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id SHTCgpdCKnJ6; Sat, 27 Nov 2010 16:57:53 -0800 (PST) Received: from [192.168.1.10] (pool-71-189-109-235.lsanca.fios.verizon.net [71.189.109.235]) by smtp.cs.ucla.edu (Postfix) with ESMTPSA id A299239E80DB; Sat, 27 Nov 2010 16:57:53 -0800 (PST) Message-ID: <4CF1A911.1000000@cs.ucla.edu> Date: Sat, 27 Nov 2010 16:57:53 -0800 From: Paul Eggert Organization: UCLA Computer Science Department User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.12) Gecko/20101027 Thunderbird/3.1.6 MIME-Version: 1.0 To: DJ Lucas Subject: Re: bug#7489: [coreutils] over aggressive threads in sort References: <4CEFF613.8060200@linuxfromscratch.org> <4CF07288.60902@draigBrady.com> <4CF0C15C.9080201@cs.ucla.edu> <4CF0D1BE.6070908@cs.ucla.edu> In-Reply-To: <4CF0D1BE.6070908@cs.ucla.edu> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Spam-Score: -2.9 (--) X-Debbugs-Envelope-To: 7489 Cc: Chen Guo , =?UTF-8?B?UMOhZHJhaWcgQnJhZHk=?= , coreutils@gnu.org, 7489@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -2.9 (--) Could you please try this little patch? It should fix your problem. I came up with this fix in my sleep (literally! I woke up this morning and the patch was in my head), but haven't had time to look at the code in this area to see if it's the best fix. Clearly there's at least one more bug as noted in my previous email but I expect it's less likely to fire. diff --git a/src/sort.c b/src/sort.c index 7e25f6a..1aa1eb4 100644 --- a/src/sort.c +++ b/src/sort.c @@ -3226,13 +3226,13 @@ queue_pop (struct merge_node_queue *queue) static void write_unique (struct line const *line, FILE *tfp, char const *temp_output) { - static struct line const *saved = NULL; + static struct line saved; if (!unique) write_line (line, tfp, temp_output); - else if (!saved || compare (line, saved)) + else if (!saved.text || compare (line, &saved)) { - saved = line; + saved = *line; write_line (line, tfp, temp_output); } } From debbugs-submit-bounces@debbugs.gnu.org Sat Nov 27 20:30:40 2010 Received: (at 7489) by debbugs.gnu.org; 28 Nov 2010 01:30:41 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PMW6C-0006LT-FE for submit@debbugs.gnu.org; Sat, 27 Nov 2010 20:30:40 -0500 Received: from mail1.slb.deg.dub.stisp.net ([84.203.253.98]) by debbugs.gnu.org with smtp (Exim 4.69) (envelope-from ) id 1PMW69-0006LG-OV for 7489@debbugs.gnu.org; Sat, 27 Nov 2010 20:30:38 -0500 Received: (qmail 79783 invoked from network); 28 Nov 2010 01:36:07 -0000 Received: from unknown (HELO ?192.168.2.25?) (84.203.137.218) by mail1.slb.deg.dub.stisp.net with SMTP; 28 Nov 2010 01:36:07 -0000 Message-ID: <4CF1B1AC.7070905@draigBrady.com> Date: Sun, 28 Nov 2010 01:34:36 +0000 From: =?UTF-8?B?UMOhZHJhaWcgQnJhZHk=?= User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.8) Gecko/20100227 Thunderbird/3.0.3 MIME-Version: 1.0 To: Paul Eggert Subject: Re: bug#7489: [coreutils] over aggressive threads in sort References: <4CEFF613.8060200@linuxfromscratch.org> <4CF07288.60902@draigBrady.com> <4CF0C15C.9080201@cs.ucla.edu> <4CF0D1BE.6070908@cs.ucla.edu> <4CF1A911.1000000@cs.ucla.edu> In-Reply-To: <4CF1A911.1000000@cs.ucla.edu> X-Enigmail-Version: 1.0.1 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Spam-Score: -2.7 (--) X-Debbugs-Envelope-To: 7489 Cc: Chen Guo , DJ Lucas , 7489@debbugs.gnu.org, coreutils@gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -2.7 (--) On 28/11/10 00:57, Paul Eggert wrote: > Could you please try this little patch? It should fix your > problem. I came up with this fix in my sleep (literally! > I woke up this morning and the patch was in my head), but > haven't had time to look at the code in this area to see > if it's the best fix. > > Clearly there's at least one more bug as noted in my previous email > > but I expect it's less likely to fire. > > diff --git a/src/sort.c b/src/sort.c > index 7e25f6a..1aa1eb4 100644 > --- a/src/sort.c > +++ b/src/sort.c > @@ -3226,13 +3226,13 @@ queue_pop (struct merge_node_queue *queue) > static void > write_unique (struct line const *line, FILE *tfp, char const *temp_output) > { > - static struct line const *saved = NULL; > + static struct line saved; > > if (!unique) > write_line (line, tfp, temp_output); > - else if (!saved || compare (line, saved)) > + else if (!saved.text || compare (line, &saved)) > { > - saved = line; > + saved = *line; > write_line (line, tfp, temp_output); > } > } > > It passes a fleeting test between kids and sleep... $ zcat cracklib-words-20080507.gz | ./coreutils-8.7/src/sort -u > /dev/null Segmentation fault (core dumped) $ zcat cracklib-words-20080507.gz | ./coreutils-8.7/src/sort -u | wc -l 0 $ zcat cracklib-words-20080507.gz | ./coreutils-8.7/src/sort-paul -u | wc -l 1671704 cheers, Pádraig. From debbugs-submit-bounces@debbugs.gnu.org Sat Nov 27 21:13:11 2010 Received: (at 7489) by debbugs.gnu.org; 28 Nov 2010 02:13:12 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PMWlK-0007GW-R9 for submit@debbugs.gnu.org; Sat, 27 Nov 2010 21:13:11 -0500 Received: from mail-iw0-f172.google.com ([209.85.214.172]) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PMWlJ-0007GK-Hs for 7489@debbugs.gnu.org; Sat, 27 Nov 2010 21:13:10 -0500 Received: by iwn40 with SMTP id 40so4086719iwn.3 for <7489@debbugs.gnu.org>; Sat, 27 Nov 2010 18:18:39 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:sender:received:message-id :date:from:user-agent:mime-version:to:cc:subject:references :in-reply-to:x-enigmail-version:content-type :content-transfer-encoding:x-lucasit-mailscanner-watermark :x-lucasit-mailscanner-information:x-mailscanner-id :x-lucasit-mailscanner:x-lucasit-mailscanner-spamcheck :x-lucasit-mailscanner-from:x-spam-status; bh=AHGQJ/GS3JHwZijw5XL0kKaI6WbN1o5dcwn1maSaZHw=; b=E/2ttqoENSforChKXmGSWLPxkQWUYC6d7qbE5fIaxV2iVP3FhhqgpcD+if0gB0Opxj CWoN9ipu12f/1J3V2/8ZjmJIY+ebAOpBEQF/cRC+DF8BEfEO9fPbcX/SBmq++q1B5Do0 DVQW4EyGwJ9RHOYzq2PIcVdZCid5fTE5P4kPA= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=sender:message-id:date:from:user-agent:mime-version:to:cc:subject :references:in-reply-to:x-enigmail-version:content-type :content-transfer-encoding:x-lucasit-mailscanner-watermark :x-lucasit-mailscanner-information:x-mailscanner-id :x-lucasit-mailscanner:x-lucasit-mailscanner-spamcheck :x-lucasit-mailscanner-from:x-spam-status; b=qsH37wWyZAOzPKCyZAHVINu5+iciXY9sndqfvzu9SvswXb+3TMXjPrf+aNYGNzdD1u f2bJ7emXYnSYGMHHoWO3h5hftStLIUzUedRZvwdgadMW4GhMc5Ys7hoeOQy0MG6fiIdT UfRIQnW7PNvj6OsLvh2pw2+v0U31sHRtss/5Q= Received: by 10.42.179.129 with SMTP id bq1mr1168375icb.306.1290910718870; Sat, 27 Nov 2010 18:18:38 -0800 (PST) Received: from postal.lucasit.com (99-62-160-221.lightspeed.stlsmo.sbcglobal.net [99.62.160.221]) by mx.google.com with ESMTPS id gy41sm4026710ibb.23.2010.11.27.18.18.36 (version=TLSv1/SSLv3 cipher=RC4-MD5); Sat, 27 Nov 2010 18:18:37 -0800 (PST) Received: from [192.168.143.240] (unknown [192.168.143.240]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by postal.lucasit.com (Postfix) with ESMTPSA id 6B9F08158B2; Sun, 28 Nov 2010 02:18:30 +0000 (GMT) Message-ID: <4CF1BBF5.9080103@linuxfromscratch.org> Date: Sat, 27 Nov 2010 20:18:29 -0600 From: DJ Lucas User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.4) Gecko/20100710 Thunderbird/3.1 MIME-Version: 1.0 To: Paul Eggert Subject: Re: bug#7489: [coreutils] over aggressive threads in sort References: <4CEFF613.8060200@linuxfromscratch.org> <4CF07288.60902@draigBrady.com> <4CF0C15C.9080201@cs.ucla.edu> <4CF0D1BE.6070908@cs.ucla.edu> <4CF1A911.1000000@cs.ucla.edu> In-Reply-To: <4CF1A911.1000000@cs.ucla.edu> X-Enigmail-Version: 1.1.2 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-LucasIT-MailScanner-Watermark: 1291515512.71587@WjKuepXtFY9GHuZcKg1xjw X-LucasIT-MailScanner-Information: Please contact the ISP for more information X-MailScanner-ID: 6B9F08158B2.2AF3C X-LucasIT-MailScanner: Clean X-LucasIT-MailScanner-SpamCheck: not spam, SpamAssassin (not cached, score=-16.8, required 2.7, autolearn=not spam, ALL_TRUSTED -1.80, BAYES_00 -15.00) X-LucasIT-MailScanner-From: dj@linuxfromscratch.org X-Spam-Status: No X-Spam-Score: -3.4 (---) X-Debbugs-Envelope-To: 7489 Cc: coreutils@gnu.org, 7489@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -3.3 (---) On 11/27/2010 06:57 PM, Paul Eggert wrote: > Could you please try this little patch? It should fix your > problem. I came up with this fix in my sleep (literally! > I woke up this morning and the patch was in my head), but > haven't had time to look at the code in this area to see > if it's the best fix. > lfs [ /lfs-source-archive/coreutils-8.7-new/src ]$ cat /lfs-source-archive/cracklib-words-20080507 | sort -u > /dev/null; echo $? 0 lfs [ /lfs-source-archive/coreutils-8.7-new/src ]$ Appears to work as expected. Thanks for jumping on this so quickly. -- DJ Lucas -- This message has been scanned for viruses and dangerous content, and is believed to be clean. From debbugs-submit-bounces@debbugs.gnu.org Mon Nov 29 02:09:03 2010 Received: (at 7489) by debbugs.gnu.org; 29 Nov 2010 07:09:03 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PMxrC-0003zO-L1 for submit@debbugs.gnu.org; Mon, 29 Nov 2010 02:09:02 -0500 Received: from mail-iw0-f172.google.com ([209.85.214.172]) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PMxrA-0003yy-Qk for 7489@debbugs.gnu.org; Mon, 29 Nov 2010 02:09:01 -0500 Received: by iwn40 with SMTP id 40so5381992iwn.3 for <7489@debbugs.gnu.org>; Sun, 28 Nov 2010 23:14:33 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:sender:received:message-id :date:from:user-agent:mime-version:to:cc:subject:references :in-reply-to:x-enigmail-version:content-type :content-transfer-encoding:x-lucasit-mailscanner-watermark :x-lucasit-mailscanner-information:x-mailscanner-id :x-lucasit-mailscanner:x-lucasit-mailscanner-spamcheck :x-lucasit-mailscanner-from:x-spam-status; bh=AwBnVpvTVwcxnbSpS240kN/F/eUG/Dwe5+J9KgRO5Vs=; b=Pk8VYCBH2Yr2DhPQ+/lCWhEeaHi+kaAb9TgL1BsXoJjtiTahiBEe6g2nIvU4uQAPyX ofET87M+zq1Cth0tUBXFibzd+XUtxwYHB9oybhosSXLiA8mXjEyNVHFrV3ZtmAhBYTM5 48JML13yxmTYlU8yNQH7+1VNe8K6F6zjROh10= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=sender:message-id:date:from:user-agent:mime-version:to:cc:subject :references:in-reply-to:x-enigmail-version:content-type :content-transfer-encoding:x-lucasit-mailscanner-watermark :x-lucasit-mailscanner-information:x-mailscanner-id :x-lucasit-mailscanner:x-lucasit-mailscanner-spamcheck :x-lucasit-mailscanner-from:x-spam-status; b=UG+dXKmLZeklYF/2BWXuPFV6KI4/IEPZFeixm9ll+8lu0xoL4z9cMmcQ6BePVYrS0D 65oPZbcgHLwd9gWZGQ8m7hxI5mdapOKPviMdRDUTtp0bIVkz8r2YiE2DNBVyWgTIWK6V aWrfhlAwid6qGy9WBrv1vpvowczFaMTdxvJTI= Received: by 10.231.191.16 with SMTP id dk16mr2039992ibb.89.1291014873041; Sun, 28 Nov 2010 23:14:33 -0800 (PST) Received: from postal.lucasit.com (99-62-160-221.lightspeed.stlsmo.sbcglobal.net [99.62.160.221]) by mx.google.com with ESMTPS id d21sm5575025ibg.9.2010.11.28.23.14.30 (version=TLSv1/SSLv3 cipher=RC4-MD5); Sun, 28 Nov 2010 23:14:31 -0800 (PST) Received: from [192.168.143.240] (unknown [192.168.143.240]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by postal.lucasit.com (Postfix) with ESMTPSA id 3F04B8158B2; Mon, 29 Nov 2010 07:14:21 +0000 (GMT) Message-ID: <4CF352CD.7010206@linuxfromscratch.org> Date: Mon, 29 Nov 2010 01:14:21 -0600 From: DJ Lucas User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.4) Gecko/20100710 Thunderbird/3.1 MIME-Version: 1.0 To: 7489@debbugs.gnu.org Subject: Re: bug#7489: [coreutils] over aggressive threads in sort References: <4CEFF613.8060200@linuxfromscratch.org> <4CF07288.60902@draigBrady.com> <4CF0C15C.9080201@cs.ucla.edu> <4CF0D1BE.6070908@cs.ucla.edu> <4CF1A911.1000000@cs.ucla.edu> <4CF1BBF5.9080103@linuxfromscratch.org> In-Reply-To: <4CF1BBF5.9080103@linuxfromscratch.org> X-Enigmail-Version: 1.1.2 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-LucasIT-MailScanner-Watermark: 1291619664.97663@cEhjxOgibvm44K3vj4uA7w X-LucasIT-MailScanner-Information: Please contact the ISP for more information X-MailScanner-ID: 3F04B8158B2.309DD X-LucasIT-MailScanner: Clean X-LucasIT-MailScanner-SpamCheck: not spam, SpamAssassin (not cached, score=-16.8, required 2.7, autolearn=not spam, ALL_TRUSTED -1.80, BAYES_00 -15.00) X-LucasIT-MailScanner-From: dj@linuxfromscratch.org X-Spam-Status: No X-Spam-Score: -3.2 (---) X-Debbugs-Envelope-To: 7489 Cc: coreutils@gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -3.1 (---) On 11/27/2010 08:18 PM, DJ Lucas wrote: > > lfs [ /lfs-source-archive/coreutils-8.7-new/src ]$ cat > /lfs-source-archive/cracklib-words-20080507 | sort -u > /dev/null; echo $? > 0 > lfs [ /lfs-source-archive/coreutils-8.7-new/src ]$ > > Appears to work as expected. Thanks for jumping on this so quickly. > Okay, so that fixes the segfault for both the original example from the RedHat bug, and the example I submitted. However, CPU is still showing 100% utilization in the original test (from RedHat bz), and the test that, I believe (not sure of the quoting), was submitted by PÃdraig Brady still fails. I don't have a link to the original (maybe private message), but it is quoted here: http://lists.gnu.org/archive/html/coreutils/2010-11/msg00124.html. I ran only the quoted test. Unrelated bug? (again, not entirely sure of the context) lfs [ lfs-source-archive ]$ seq 100000 > in lfs [ lfs-source-archive ]$ mkfifo fifo lfs [ lfs-source-archive ]$ (for i in $(seq 12); do read line; echo $i; sleep .1; done > cat > /dev/null) < fifo & [1] 16123 lfs [ lfs-source-archive ]$ (ulimit -t 1; sort in > fifo \ > || echo killed via $(env kill -l $(expr $? - 128))) 1 2 3 4 5 6 7 8 9 killed via KILL lfs [ lfs-source-archive ]$ 10 11 12 ^C [1]+ Done ( for i in $(seq 12); do read line; echo $i; sleep .1; done; cat > /dev/null ) < fifo -- DJ Lucas -- This message has been scanned for viruses and dangerous content, and is believed to be clean. From debbugs-submit-bounces@debbugs.gnu.org Mon Nov 29 06:25:28 2010 Received: (at 7489) by debbugs.gnu.org; 29 Nov 2010 11:25:28 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PN1rM-00014l-04 for submit@debbugs.gnu.org; Mon, 29 Nov 2010 06:25:28 -0500 Received: from mail1.slb.deg.dub.stisp.net ([84.203.253.98]) by debbugs.gnu.org with smtp (Exim 4.69) (envelope-from ) id 1PN1rI-00014X-Q8 for 7489@debbugs.gnu.org; Mon, 29 Nov 2010 06:25:25 -0500 Received: (qmail 39560 invoked from network); 29 Nov 2010 11:30:57 -0000 Received: from unknown (HELO ?192.168.2.25?) (84.203.137.218) by mail1.slb.deg.dub.stisp.net with SMTP; 29 Nov 2010 11:30:57 -0000 Message-ID: <4CF38E90.2060804@draigBrady.com> Date: Mon, 29 Nov 2010 11:29:20 +0000 From: =?UTF-8?B?UMOhZHJhaWcgQnJhZHk=?= User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.8) Gecko/20100227 Thunderbird/3.0.3 MIME-Version: 1.0 To: DJ Lucas Subject: Re: bug#7489: [coreutils] over aggressive threads in sort References: <4CEFF613.8060200@linuxfromscratch.org> <4CF07288.60902@draigBrady.com> <4CF0C15C.9080201@cs.ucla.edu> <4CF0D1BE.6070908@cs.ucla.edu> <4CF1A911.1000000@cs.ucla.edu> <4CF1BBF5.9080103@linuxfromscratch.org> <4CF352CD.7010206@linuxfromscratch.org> In-Reply-To: <4CF352CD.7010206@linuxfromscratch.org> X-Enigmail-Version: 1.0.1 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Spam-Score: -2.7 (--) X-Debbugs-Envelope-To: 7489 Cc: coreutils@gnu.org, 7489@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -2.7 (--) On 29/11/10 07:14, DJ Lucas wrote: > On 11/27/2010 08:18 PM, DJ Lucas wrote: > >> >> lfs [ /lfs-source-archive/coreutils-8.7-new/src ]$ cat >> /lfs-source-archive/cracklib-words-20080507 | sort -u > /dev/null; echo $? >> 0 >> lfs [ /lfs-source-archive/coreutils-8.7-new/src ]$ >> >> Appears to work as expected. Thanks for jumping on this so quickly. >> > > Okay, so that fixes the segfault for both the original example from the > RedHat bug, and the example I submitted. However, CPU is still showing > 100% utilization in the original test (from RedHat bz), and the test > that, I believe (not sure of the quoting), was submitted by PÃdraig > Brady still fails. I don't have a link to the original (maybe private > message), but it is quoted here: > http://lists.gnu.org/archive/html/coreutils/2010-11/msg00124.html. I ran > only the quoted test. Unrelated bug? (again, not entirely sure of the > context) There was a suggested workaround for test here: https://bugzilla.redhat.com/show_bug.cgi?id=655096#c5 I've no time to try that at present though. sorry, Pádraig. From debbugs-submit-bounces@debbugs.gnu.org Mon Nov 29 14:10:59 2010 Received: (at 7489) by debbugs.gnu.org; 29 Nov 2010 19:10:59 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PN97q-0003VN-BH for submit@debbugs.gnu.org; Mon, 29 Nov 2010 14:10:58 -0500 Received: from smtp.cs.ucla.edu ([131.179.128.62]) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PN97o-0003VA-H6 for 7489@debbugs.gnu.org; Mon, 29 Nov 2010 14:10:57 -0500 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id A474139E80E0; Mon, 29 Nov 2010 11:16:30 -0800 (PST) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id GNaepDjPSvhy; Mon, 29 Nov 2010 11:16:29 -0800 (PST) Received: from [131.179.64.200] (Penguin.CS.UCLA.EDU [131.179.64.200]) by smtp.cs.ucla.edu (Postfix) with ESMTPSA id CD8D539E80DF; Mon, 29 Nov 2010 11:16:29 -0800 (PST) Message-ID: <4CF3FC08.3020705@cs.ucla.edu> Date: Mon, 29 Nov 2010 11:16:24 -0800 From: Paul Eggert Organization: UCLA Computer Science Department User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.15) Gecko/20101027 Thunderbird/3.0.10 MIME-Version: 1.0 To: DJ Lucas Subject: Re: bug#7489: [coreutils] over aggressive threads in sort References: <4CEFF613.8060200@linuxfromscratch.org> <4CF07288.60902@draigBrady.com> <4CF0C15C.9080201@cs.ucla.edu> <4CF0D1BE.6070908@cs.ucla.edu> <4CF1A911.1000000@cs.ucla.edu> <4CF1BBF5.9080103@linuxfromscratch.org> <4CF352CD.7010206@linuxfromscratch.org> In-Reply-To: <4CF352CD.7010206@linuxfromscratch.org> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-Spam-Score: -3.3 (---) X-Debbugs-Envelope-To: 7489 Cc: Chen Guo , coreutils@gnu.org, 7489@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -3.3 (---) On 11/28/10 23:14, DJ Lucas wrote: > http://lists.gnu.org/archive/html/coreutils/2010-11/msg00124.html Ah, sorry, I didn't understand that message and thought P=C3=A1draig had handled it. On an 8-core RHEL 5.5 x86-64 host I reproduced the problem with the stated test case: (for i in $(seq 12); do read line; echo $i; sleep .1; done cat > /dev/null) < fifo & (ulimit -t 1; ./sort in > fifo \ || echo killed via $(env kill -l $(expr $? - 128))) If I understand this other bug correctly, the problem is that one thread is using a spin lock to wait for another thread, which in turn is waiting to write to the pipe. Surely the simplest fix is to drop spin locks entirely and use mutexes instead. Perhaps a better fix would be to use mutexes at the top level (where threads can write to a file and therefore can wait) and to use spin locks at lower levels (where threads are merely storing into memory and thus can't wait). Chen, do you have any advice on this as well? I can look into either fix but am mildly inclined to take the simpler (albeit slower) approach, at least at first. Here's that thread again: http://lists.gnu.org/archive/html/bug-coreutils/2010-11/msg00209.html From debbugs-submit-bounces@debbugs.gnu.org Mon Nov 29 15:04:00 2010 Received: (at 7489) by debbugs.gnu.org; 29 Nov 2010 20:04:01 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PN9x9-0004ec-Ty for submit@debbugs.gnu.org; Mon, 29 Nov 2010 15:04:00 -0500 Received: from mx.meyering.net ([82.230.74.64]) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PN9x7-0004eP-99 for 7489@debbugs.gnu.org; Mon, 29 Nov 2010 15:03:59 -0500 Received: by rho.meyering.net (Acme Bit-Twister, from userid 1000) id 97E45600CC; Mon, 29 Nov 2010 21:09:30 +0100 (CET) From: Jim Meyering To: Paul Eggert Subject: Re: bug#7489: [coreutils] over aggressive threads in sort In-Reply-To: <4CF1A911.1000000@cs.ucla.edu> (Paul Eggert's message of "Sat, 27 Nov 2010 16:57:53 -0800") References: <4CEFF613.8060200@linuxfromscratch.org> <4CF07288.60902@draigBrady.com> <4CF0C15C.9080201@cs.ucla.edu> <4CF0D1BE.6070908@cs.ucla.edu> <4CF1A911.1000000@cs.ucla.edu> Date: Mon, 29 Nov 2010 21:09:30 +0100 Message-ID: <87vd3fuc1x.fsf@meyering.net> Lines: 133 MIME-Version: 1.0 Content-Type: text/plain X-Spam-Score: -5.6 (-----) X-Debbugs-Envelope-To: 7489 Cc: Chen Guo , =?utf-8?Q?P=C3=A1draig?= Brady , DJ Lucas , 7489@debbugs.gnu.org, coreutils@gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -5.6 (-----) Paul Eggert wrote: > Could you please try this little patch? It should fix your > problem. I came up with this fix in my sleep (literally! > I woke up this morning and the patch was in my head), but > haven't had time to look at the code in this area to see > if it's the best fix. > > Clearly there's at least one more bug as noted in my previous email > > but I expect it's less likely to fire. > > diff --git a/src/sort.c b/src/sort.c > index 7e25f6a..1aa1eb4 100644 > --- a/src/sort.c > +++ b/src/sort.c > @@ -3226,13 +3226,13 @@ queue_pop (struct merge_node_queue *queue) > static void > write_unique (struct line const *line, FILE *tfp, char const *temp_output) > { > - static struct line const *saved = NULL; > + static struct line saved; > > if (!unique) > write_line (line, tfp, temp_output); > - else if (!saved || compare (line, saved)) > + else if (!saved.text || compare (line, &saved)) > { > - saved = line; > + saved = *line; > write_line (line, tfp, temp_output); > } > } Nice. That looks right to me. FYI, what happens is the first fillbuf call gets a block of lines, and "saved" ends up pointing at one of them. The second fillbuf's fread races to overwrite a byte or two of the saved->text pointer that is being dereferenced by the other thread. The patch below adds a small test case to exercise it. It demonstrates the failure in all but ~20 trials out of 1000 for me. So far, I've reproduced this only on multi-core systems, with --parallel=2. >From 3b8f453a325fbd094e2605347b1d8a05efab9b0d Mon Sep 17 00:00:00 2001 From: Jim Meyering Date: Sun, 28 Nov 2010 12:59:38 +0100 Subject: [PATCH] tests: add test for parallel sort -u segfault bug * tests/misc/sort-unique-segv: New file. * tests/Makefile.am (TESTS): Add it. --- tests/Makefile.am | 1 + tests/misc/sort-unique-segv | 55 +++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 56 insertions(+), 0 deletions(-) create mode 100755 tests/misc/sort-unique-segv diff --git a/tests/Makefile.am b/tests/Makefile.am index b3be4df..d52f677 100644 --- a/tests/Makefile.am +++ b/tests/Makefile.am @@ -239,6 +239,7 @@ TESTS = \ misc/sort-rand \ misc/sort-spinlock-abuse \ misc/sort-unique \ + misc/sort-unique-segv \ misc/sort-version \ misc/split-a \ misc/split-bchunk \ diff --git a/tests/misc/sort-unique-segv b/tests/misc/sort-unique-segv new file mode 100755 index 0000000..0a1d4cb --- /dev/null +++ b/tests/misc/sort-unique-segv @@ -0,0 +1,55 @@ +#!/bin/sh +# parallel sort with --unique (-u) would segfault + +# Copyright (C) 2010 Free Software Foundation, Inc. + +# This program is free software: you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation, either version 3 of the License, or +# (at your option) any later version. + +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. + +# You should have received a copy of the GNU General Public License +# along with this program. If not, see . + +. "${srcdir=.}/init.sh"; path_prepend_ ../src +print_ver_ sort + +test "$(nproc)" = 1 && skip_ "requires a multi-core system" + +cat <<\EOF > in || framework_failure_ + + + + + + + +z +zzzzzz +zzzzzzz +zzzzzzz +zzzzzzz +zzzzzzzzz +zzzzzzzzzzz +zzzzzzzzzzzz +EOF + +cat <<\EOF > exp || fail=1 + +z +zzzzzz +zzzzzzz +zzzzzzzzz +zzzzzzzzzzz +zzzzzzzzzzzz +EOF + +sort --parallel=2 -u -S 10b < in > out || fail=1 +compare out exp || fail=1 + +Exit $fail -- 1.7.3.2.846.gf4b062 From debbugs-submit-bounces@debbugs.gnu.org Mon Nov 29 15:08:32 2010 Received: (at 7489) by debbugs.gnu.org; 29 Nov 2010 20:08:32 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PNA1X-0004kq-Ou for submit@debbugs.gnu.org; Mon, 29 Nov 2010 15:08:32 -0500 Received: from mx.meyering.net ([82.230.74.64]) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PNA1V-0004ke-7X for 7489@debbugs.gnu.org; Mon, 29 Nov 2010 15:08:29 -0500 Received: by rho.meyering.net (Acme Bit-Twister, from userid 1000) id 60A8E6008E; Mon, 29 Nov 2010 21:14:03 +0100 (CET) From: Jim Meyering To: Paul Eggert Subject: Re: bug#7489: [coreutils] over aggressive threads in sort In-Reply-To: <4CF1A911.1000000@cs.ucla.edu> (Paul Eggert's message of "Sat, 27 Nov 2010 16:57:53 -0800") References: <4CEFF613.8060200@linuxfromscratch.org> <4CF07288.60902@draigBrady.com> <4CF0C15C.9080201@cs.ucla.edu> <4CF0D1BE.6070908@cs.ucla.edu> <4CF1A911.1000000@cs.ucla.edu> Date: Mon, 29 Nov 2010 21:14:03 +0100 Message-ID: <87pqtnubuc.fsf@meyering.net> Lines: 13 MIME-Version: 1.0 Content-Type: text/plain X-Spam-Score: -5.6 (-----) X-Debbugs-Envelope-To: 7489 Cc: Chen Guo , =?utf-8?Q?P=C3=A1draig?= Brady , DJ Lucas , 7489@debbugs.gnu.org, coreutils@gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -5.6 (-----) Paul Eggert wrote: > Could you please try this little patch? It should fix your > problem. I came up with this fix in my sleep (literally! > I woke up this morning and the patch was in my head), but > haven't had time to look at the code in this area to see > if it's the best fix. > > Clearly there's at least one more bug as noted in my previous email > > but I expect it's less likely to fire. I haven't tried to trigger that one yet. Have you? From debbugs-submit-bounces@debbugs.gnu.org Mon Nov 29 17:41:06 2010 Received: (at 7489) by debbugs.gnu.org; 29 Nov 2010 22:41:06 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PNCPB-0008Lg-83 for submit@debbugs.gnu.org; Mon, 29 Nov 2010 17:41:06 -0500 Received: from smtp.cs.ucla.edu ([131.179.128.62]) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PNCP8-0008L3-Gk for 7489@debbugs.gnu.org; Mon, 29 Nov 2010 17:41:03 -0500 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id 021A739E80DC; Mon, 29 Nov 2010 14:46:37 -0800 (PST) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id R256twscY2f5; Mon, 29 Nov 2010 14:46:36 -0800 (PST) Received: from [131.179.64.200] (Penguin.CS.UCLA.EDU [131.179.64.200]) by smtp.cs.ucla.edu (Postfix) with ESMTPSA id 5202139E80DB; Mon, 29 Nov 2010 14:46:36 -0800 (PST) Message-ID: <4CF42D4B.7020904@cs.ucla.edu> Date: Mon, 29 Nov 2010 14:46:35 -0800 From: Paul Eggert Organization: UCLA Computer Science Department User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.15) Gecko/20101027 Thunderbird/3.0.10 MIME-Version: 1.0 To: Jim Meyering Subject: Re: bug#7489: [coreutils] over aggressive threads in sort References: <4CEFF613.8060200@linuxfromscratch.org> <4CF07288.60902@draigBrady.com> <4CF0C15C.9080201@cs.ucla.edu> <4CF0D1BE.6070908@cs.ucla.edu> <4CF1A911.1000000@cs.ucla.edu> <87pqtnubuc.fsf@meyering.net> In-Reply-To: <87pqtnubuc.fsf@meyering.net> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Spam-Score: -3.3 (---) X-Debbugs-Envelope-To: 7489 Cc: Chen Guo , =?ISO-8859-1?Q?P=E1draig_Brady?= , DJ Lucas , 7489@debbugs.gnu.org, coreutils@gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -3.3 (---) On 11/29/10 12:14, Jim Meyering wrote: > I haven't tried to trigger that one yet. > Have you? No, sorry, haven't had time. My current guess, by the way, is that it's not a bug that can be triggered: it's merely useless code that is harmless and can safely be removed. (This is a guess that I also came up with in my sleep, so take it for what it's worth. :-) From debbugs-submit-bounces@debbugs.gnu.org Mon Nov 29 19:28:42 2010 Received: (at 7489) by debbugs.gnu.org; 30 Nov 2010 00:28:42 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PNE5K-0002Jh-12 for submit@debbugs.gnu.org; Mon, 29 Nov 2010 19:28:42 -0500 Received: from mail-ww0-f42.google.com ([74.125.82.42]) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PNE5H-0002JW-Ps for 7489@debbugs.gnu.org; Mon, 29 Nov 2010 19:28:40 -0500 Received: by wwf26 with SMTP id 26so699229wwf.3 for <7489@debbugs.gnu.org>; Mon, 29 Nov 2010 16:34:14 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:received:in-reply-to :references:date:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=5dMPqfOTU7uCG+Q3L8TT8QUSy2c3OitDwiBIcVICxmg=; b=iXOnHA+laf2pBdeAWV6qXvnzL40oq7WlouKOaZON/PMLS/W+GJkLznlwAFbHG6WBWt IOUoBWPbDSUjpdWt4oULZaejQB1fc0ZdPGb3RIFNPiBIw5JApBNNPkpmcOykOwcZT8Xy Fmtsiq0l1D/hFP7LkVcFRI9DMBWVC90uEYzyM= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; b=BXZsZ0RecUhifJFH5Esc7B0FnhEi60TGX/vVSC05v0cCYyMsb4Yyn3iq76FDjY/1lO sFh2EVwcmtCrBvkFkr6CNOIm5aEtknYRIyVPnCOehJtA9uJJA4WvGGL4D8P4quiRcJ4A iVe2x3xOEFMwZenBqOdzwPqUyvyHv9hKBKr4s= MIME-Version: 1.0 Received: by 10.216.7.8 with SMTP id 8mr1199931weo.30.1291077253219; Mon, 29 Nov 2010 16:34:13 -0800 (PST) Received: by 10.216.240.196 with HTTP; Mon, 29 Nov 2010 16:34:13 -0800 (PST) In-Reply-To: <4CF3FC08.3020705@cs.ucla.edu> References: <4CEFF613.8060200@linuxfromscratch.org> <4CF07288.60902@draigBrady.com> <4CF0C15C.9080201@cs.ucla.edu> <4CF0D1BE.6070908@cs.ucla.edu> <4CF1A911.1000000@cs.ucla.edu> <4CF1BBF5.9080103@linuxfromscratch.org> <4CF352CD.7010206@linuxfromscratch.org> <4CF3FC08.3020705@cs.ucla.edu> Date: Mon, 29 Nov 2010 16:34:13 -0800 Message-ID: Subject: Re: bug#7489: [coreutils] over aggressive threads in sort From: Chen Guo To: Paul Eggert Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable X-Spam-Score: -3.3 (---) X-Debbugs-Envelope-To: 7489 Cc: DJ Lucas , 7489@debbugs.gnu.org, coreutils@gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -3.2 (---) Hi all, On Mon, Nov 29, 2010 at 11:16 AM, Paul Eggert wrote: > entirely and use mutexes instead. =A0Perhaps a better fix would be to > use mutexes at the top level (where threads can write to a file and > therefore can wait) and to use spin locks at lower levels (where > threads are merely storing into memory and thus can't wait). > > Chen, do you have any advice on this as well? =A0I can look into > either fix but am mildly inclined to take the simpler (albeit slower) > approach, at least at first. =A0Here's that thread again: > > http://lists.gnu.org/archive/html/bug-coreutils/2010-11/msg00209.html I haven't looked at the code in a couple of months, so I wont have a definite answer until tonight, but off the top of my head I'm not sure mixing spinlocks and mutexes will work, since they exist orthogonally. That is, a thread can lock the mutex of a structure, but the spin lock of the structure is still free to be acquired. The only way to ensure the struct is locked down is to lock both the mutex and the spinl= ock, in which case, what's the point? The only way this would work is if, when a struct is locked via mutex the o= nly threads trying to acquire the struct are trying to do so via mutex, and no threads are looking to lock via spinlock. I'll take a look when I get home tonight to see if this condition always holds. From debbugs-submit-bounces@debbugs.gnu.org Mon Nov 29 19:34:15 2010 Received: (at 7489) by debbugs.gnu.org; 30 Nov 2010 00:34:15 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PNEAf-0002Rk-C3 for submit@debbugs.gnu.org; Mon, 29 Nov 2010 19:34:14 -0500 Received: from smtp.cs.ucla.edu ([131.179.128.62]) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PNEAd-0002RY-TB for 7489@debbugs.gnu.org; Mon, 29 Nov 2010 19:34:12 -0500 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id 646BC39E80DC; Mon, 29 Nov 2010 16:39:46 -0800 (PST) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id dqMmTMVmulul; Mon, 29 Nov 2010 16:39:45 -0800 (PST) Received: from [131.179.64.200] (Penguin.CS.UCLA.EDU [131.179.64.200]) by smtp.cs.ucla.edu (Postfix) with ESMTPSA id C3F5439E80DB; Mon, 29 Nov 2010 16:39:45 -0800 (PST) Message-ID: <4CF447D0.9050907@cs.ucla.edu> Date: Mon, 29 Nov 2010 16:39:44 -0800 From: Paul Eggert Organization: UCLA Computer Science Department User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.15) Gecko/20101027 Thunderbird/3.0.10 MIME-Version: 1.0 To: Chen Guo Subject: Re: bug#7489: [coreutils] over aggressive threads in sort References: <4CEFF613.8060200@linuxfromscratch.org> <4CF07288.60902@draigBrady.com> <4CF0C15C.9080201@cs.ucla.edu> <4CF0D1BE.6070908@cs.ucla.edu> <4CF1A911.1000000@cs.ucla.edu> <4CF1BBF5.9080103@linuxfromscratch.org> <4CF352CD.7010206@linuxfromscratch.org> <4CF3FC08.3020705@cs.ucla.edu> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Spam-Score: -3.3 (---) X-Debbugs-Envelope-To: 7489 Cc: DJ Lucas , 7489@debbugs.gnu.org, coreutils@gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -3.3 (---) On 11/29/10 16:34, Chen Guo wrote: > The only way this would work is if, when a struct is locked via mutex the only > threads trying to acquire the struct are trying to do so via mutex, > and no threads > are looking to lock via spinlock. Yes, that's definitely the idea. Under either of my proposals, a particular object would be protected either by a spinlock, or by a mutex, but never by both; and the decision as to whether to use a spinlock or a mutex would be made by object-creation time at the latest. Thanks for looking into it. From debbugs-submit-bounces@debbugs.gnu.org Mon Nov 29 23:27:04 2010 Received: (at 7489) by debbugs.gnu.org; 30 Nov 2010 04:27:04 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PNHo0-0007Su-8N for submit@debbugs.gnu.org; Mon, 29 Nov 2010 23:27:04 -0500 Received: from mail-wy0-f172.google.com ([74.125.82.172]) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PNHny-0007SR-27 for 7489@debbugs.gnu.org; Mon, 29 Nov 2010 23:27:02 -0500 Received: by wyf23 with SMTP id 23so4409175wyf.3 for <7489@debbugs.gnu.org>; Mon, 29 Nov 2010 20:32:37 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:received:in-reply-to :references:date:message-id:subject:from:to:cc:content-type; bh=V4hZqrv0VN9g6el16Lj9shLpfyPMBfjYR+mZxn5s0lI=; b=hMhnHnHaq4jlrtEv2pUXQBsbdhbTHQPotAzM6dvxGmb4dex+MWpU43XnYu7J1t7INg av6bxpsfrEQK1u53E74EGwu4OzZkdlmyzYdstNFrO/n8RTjYgZsODLrqUKg/lzV9wpU3 akDOPA6gG0UHuErUWgRFnAhXD44CCgMnHeao0= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; b=JZYS9dqMpRSS+5v55WZfT/0MwEil0aXbY9JsaTqhgUQF654+QuoXssQX3v6AGmeasL 41oziFwU2656KsqMDXtZ9JTrjyr1hju+jWpmWAiWE6zn/WZf1Uekktd+2/CnE63ovoI0 GHIfn2nSBQJzycZYMZdexTxACHV7S3OoRMsHg= MIME-Version: 1.0 Received: by 10.227.68.143 with SMTP id v15mr7029655wbi.12.1291091556708; Mon, 29 Nov 2010 20:32:36 -0800 (PST) Received: by 10.216.240.196 with HTTP; Mon, 29 Nov 2010 20:32:36 -0800 (PST) In-Reply-To: <4CF447D0.9050907@cs.ucla.edu> References: <4CEFF613.8060200@linuxfromscratch.org> <4CF07288.60902@draigBrady.com> <4CF0C15C.9080201@cs.ucla.edu> <4CF0D1BE.6070908@cs.ucla.edu> <4CF1A911.1000000@cs.ucla.edu> <4CF1BBF5.9080103@linuxfromscratch.org> <4CF352CD.7010206@linuxfromscratch.org> <4CF3FC08.3020705@cs.ucla.edu> <4CF447D0.9050907@cs.ucla.edu> Date: Mon, 29 Nov 2010 20:32:36 -0800 Message-ID: Subject: Re: bug#7489: [coreutils] over aggressive threads in sort From: Chen Guo To: Paul Eggert Content-Type: text/plain; charset=ISO-8859-1 X-Spam-Score: -3.1 (---) X-Debbugs-Envelope-To: 7489 Cc: DJ Lucas , 7489@debbugs.gnu.org, coreutils@gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -3.1 (---) Hi guys, Is something up with Savannah? I just tried a git clone and got connection time out; I cant even reach git.sv.gnu.org via ping. I'll try again at work tomorrow. From debbugs-submit-bounces@debbugs.gnu.org Tue Nov 30 02:33:01 2010 Received: (at 7489) by debbugs.gnu.org; 30 Nov 2010 07:33:02 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PNKhw-00030d-83 for submit@debbugs.gnu.org; Tue, 30 Nov 2010 02:33:01 -0500 Received: from smtp.cs.ucla.edu ([131.179.128.62]) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PNKhu-00030S-AC for 7489@debbugs.gnu.org; Tue, 30 Nov 2010 02:32:59 -0500 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id 922D339E80DC; Mon, 29 Nov 2010 23:38:33 -0800 (PST) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id Nn1ML0LJ3BVN; Mon, 29 Nov 2010 23:38:32 -0800 (PST) Received: from [192.168.1.10] (pool-71-189-109-235.lsanca.fios.verizon.net [71.189.109.235]) by smtp.cs.ucla.edu (Postfix) with ESMTPSA id A2AFB39E80B1; Mon, 29 Nov 2010 23:38:32 -0800 (PST) Message-ID: <4CF4A9F1.6080107@cs.ucla.edu> Date: Mon, 29 Nov 2010 23:38:25 -0800 From: Paul Eggert Organization: UCLA Computer Science Department User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.12) Gecko/20101027 Thunderbird/3.1.6 MIME-Version: 1.0 To: Chen Guo Subject: Re: bug#7489: [coreutils] over aggressive threads in sort References: <4CEFF613.8060200@linuxfromscratch.org> <4CF07288.60902@draigBrady.com> <4CF0C15C.9080201@cs.ucla.edu> <4CF0D1BE.6070908@cs.ucla.edu> <4CF1A911.1000000@cs.ucla.edu> <4CF1BBF5.9080103@linuxfromscratch.org> <4CF352CD.7010206@linuxfromscratch.org> <4CF3FC08.3020705@cs.ucla.edu> <4CF447D0.9050907@cs.ucla.edu> In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Spam-Score: -2.9 (--) X-Debbugs-Envelope-To: 7489 Cc: DJ Lucas , 7489@debbugs.gnu.org, coreutils@gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -2.9 (--) On 11/29/2010 08:32 PM, Chen Guo wrote: > Hi guys, > Is something up with Savannah? I just tried a git clone and got > connection time out; I cant even reach git.sv.gnu.org via ping. There was a breakin, which led to leaking of encrypted account passwords, some of them discovered via a brute-force attack. It's pretty bad. They're reinstalling and restoring the data from a November 24 backup. No firm ETA yet. Expect an official announcement soon. You can find the current status at . From debbugs-submit-bounces@debbugs.gnu.org Tue Nov 30 16:35:38 2010 Received: (at 7489) by debbugs.gnu.org; 30 Nov 2010 21:35:38 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PNXrO-00061m-Dv for submit@debbugs.gnu.org; Tue, 30 Nov 2010 16:35:38 -0500 Received: from mx.meyering.net ([82.230.74.64]) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PNXrL-00061a-92 for 7489@debbugs.gnu.org; Tue, 30 Nov 2010 16:35:36 -0500 Received: by rho.meyering.net (Acme Bit-Twister, from userid 1000) id 056AE603BB; Tue, 30 Nov 2010 22:41:11 +0100 (CET) From: Jim Meyering To: Paul Eggert Subject: Re: bug#7489: [coreutils] over aggressive threads in sort In-Reply-To: <87vd3fuc1x.fsf@meyering.net> (Jim Meyering's message of "Mon, 29 Nov 2010 21:09:30 +0100") References: <4CEFF613.8060200@linuxfromscratch.org> <4CF07288.60902@draigBrady.com> <4CF0C15C.9080201@cs.ucla.edu> <4CF0D1BE.6070908@cs.ucla.edu> <4CF1A911.1000000@cs.ucla.edu> <87vd3fuc1x.fsf@meyering.net> Date: Tue, 30 Nov 2010 22:41:11 +0100 Message-ID: <87lj4apk08.fsf@meyering.net> Lines: 86 MIME-Version: 1.0 Content-Type: text/plain X-Spam-Score: -5.6 (-----) X-Debbugs-Envelope-To: 7489 Cc: Chen Guo , =?utf-8?Q?P=C3=A1draig?= Brady , DJ Lucas , 7489@debbugs.gnu.org, coreutils@gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -5.6 (-----) Jim Meyering wrote: > Paul Eggert wrote: >> Could you please try this little patch? It should fix your >> problem. I came up with this fix in my sleep (literally! >> I woke up this morning and the patch was in my head), but >> haven't had time to look at the code in this area to see >> if it's the best fix. >> >> Clearly there's at least one more bug as noted in my previous email >> >> but I expect it's less likely to fire. >> >> diff --git a/src/sort.c b/src/sort.c >> index 7e25f6a..1aa1eb4 100644 >> --- a/src/sort.c >> +++ b/src/sort.c >> @@ -3226,13 +3226,13 @@ queue_pop (struct merge_node_queue *queue) >> static void >> write_unique (struct line const *line, FILE *tfp, char const *temp_output) >> { >> - static struct line const *saved = NULL; >> + static struct line saved; >> >> if (!unique) >> write_line (line, tfp, temp_output); >> - else if (!saved || compare (line, saved)) >> + else if (!saved.text || compare (line, &saved)) >> { >> - saved = line; >> + saved = *line; >> write_line (line, tfp, temp_output); >> } >> } > > Nice. > That looks right to me. > > FYI, what happens is the first fillbuf call gets a block > of lines, and "saved" ends up pointing at one of them. > The second fillbuf's fread races to overwrite a byte or two of the > saved->text pointer that is being dereferenced by the other thread. ... Hi Paul, I'm getting ready to push something like the following (I'll add a NEWS entry first, though) Is there anything you'd like to add? >From 46589f6be5ce6cd7ff474059a33f47b57094ff21 Mon Sep 17 00:00:00 2001 From: Paul Eggert Date: Tue, 30 Nov 2010 22:30:12 +0100 Subject: [PATCH] sort -u: fix a thread-race pointer corruption bug * src/sort.c (write_unique): Save the entire "struct line", not just a pointer to one. Otherwise, with a multi-thread run, sometimes, with some inputs, fillbuf would would win a race and clobber a "saved->text" pointer in one thread just before it was dereferenced in a comparison in another thread. --- src/sort.c | 6 +++--- 1 files changed, 3 insertions(+), 3 deletions(-) diff --git a/src/sort.c b/src/sort.c index 7e25f6a..1aa1eb4 100644 --- a/src/sort.c +++ b/src/sort.c @@ -3226,13 +3226,13 @@ queue_pop (struct merge_node_queue *queue) static void write_unique (struct line const *line, FILE *tfp, char const *temp_output) { - static struct line const *saved = NULL; + static struct line saved; if (!unique) write_line (line, tfp, temp_output); - else if (!saved || compare (line, saved)) + else if (!saved.text || compare (line, &saved)) { - saved = line; + saved = *line; write_line (line, tfp, temp_output); } } -- 1.7.3.2.846.gf4b062 From debbugs-submit-bounces@debbugs.gnu.org Tue Nov 30 17:16:50 2010 Received: (at 7489) by debbugs.gnu.org; 30 Nov 2010 22:16:50 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PNYVF-0006vq-Mb for submit@debbugs.gnu.org; Tue, 30 Nov 2010 17:16:50 -0500 Received: from smtp.cs.ucla.edu ([131.179.128.62]) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PNYVC-0006vZ-N0 for 7489@debbugs.gnu.org; Tue, 30 Nov 2010 17:16:48 -0500 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id 57AE239E80E0; Tue, 30 Nov 2010 14:22:23 -0800 (PST) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id yo7VpMdPzxrM; Tue, 30 Nov 2010 14:22:21 -0800 (PST) Received: from [131.179.64.200] (Penguin.CS.UCLA.EDU [131.179.64.200]) by smtp.cs.ucla.edu (Postfix) with ESMTPSA id 0C85439E80DC; Tue, 30 Nov 2010 14:22:21 -0800 (PST) Message-ID: <4CF57916.6050103@cs.ucla.edu> Date: Tue, 30 Nov 2010 14:22:14 -0800 From: Paul Eggert Organization: UCLA Computer Science Department User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.15) Gecko/20101027 Thunderbird/3.0.10 MIME-Version: 1.0 To: Jim Meyering Subject: Re: bug#7489: [coreutils] over aggressive threads in sort References: <4CEFF613.8060200@linuxfromscratch.org> <4CF07288.60902@draigBrady.com> <4CF0C15C.9080201@cs.ucla.edu> <4CF0D1BE.6070908@cs.ucla.edu> <4CF1A911.1000000@cs.ucla.edu> <87vd3fuc1x.fsf@meyering.net> <87lj4apk08.fsf@meyering.net> In-Reply-To: <87lj4apk08.fsf@meyering.net> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Spam-Score: -3.3 (---) X-Debbugs-Envelope-To: 7489 Cc: Chen Guo , =?ISO-8859-1?Q?P=E1draig_Brady?= , DJ Lucas , 7489@debbugs.gnu.org, coreutils@gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -3.3 (---) On 11/30/10 13:41, Jim Meyering wrote: > Is there anything you'd like to add? No, thanks, that looks good. I have some other patches to clean things up in this area, but they can wait. I hate to tease, so here is a draft of the cleanup patches. Most of this stuff is cleanup, but the first line of the change does fix a bug with very large sorts: when level > 14 on a typical 64-bit host, there is an unwanted integer overflow and 'sort' will divide by zero. Admittedly this bug is unlikely (doesn't this mean you need thousands of threads?). Anyway, perhaps Chen can review them (I don't have time to test them right now). Plus we have to fix the other bug. I wrote these cleanup patches while trying to understand the code well enough to fix the other bug. diff --git a/src/sort.c b/src/sort.c index 1aa1eb4..708cc3c 100644 --- a/src/sort.c +++ b/src/sort.c @@ -107,7 +107,7 @@ struct rlimit { size_t rlim_cur; }; /* Maximum number of lines to merge every time a NODE is taken from the MERGE_QUEUE. Node is at LEVEL in the binary merge tree, and is responsible for merging TOTAL lines. */ -#define MAX_MERGE(total, level) ((total) / ((2 << level) * (2 << level)) + 1) +#define MAX_MERGE(total, level) (((total) >> (((level) << 1) + 2)) + 1) /* Heuristic value for the number of lines for which it is worth creating a subthread, during an internal merge sort, on a machine @@ -237,10 +237,10 @@ struct merge_node struct line **dest; /* Pointer to destination of merge. */ size_t nlo; /* Total Lines remaining from LO. */ size_t nhi; /* Total lines remaining from HI. */ - size_t level; /* Level in merge tree. */ struct merge_node *parent; /* Parent node. */ + unsigned int level; /* Level in merge tree. */ bool queued; /* Node is already in heap. */ - pthread_spinlock_t *lock; /* Lock for node operations. */ + pthread_spinlock_t lock; /* Lock for node operations. */ }; /* Priority queue of merge nodes. */ @@ -3156,7 +3156,7 @@ compare_nodes (void const *a, void const *b) static inline void lock_node (struct merge_node *node) { - pthread_spin_lock (node->lock); + pthread_spin_lock (&node->lock); } /* Unlock a merge tree NODE. */ @@ -3164,7 +3164,7 @@ lock_node (struct merge_node *node) static inline void unlock_node (struct merge_node *node) { - pthread_spin_unlock (node->lock); + pthread_spin_unlock (&node->lock); } /* Destroy merge QUEUE. */ @@ -3240,69 +3240,38 @@ write_unique (struct line const *line, FILE *tfp, char const *temp_output) /* Merge the lines currently available to a NODE in the binary merge tree, up to a maximum specified by MAX_MERGE. */ -static size_t +static void mergelines_node (struct merge_node *restrict node, size_t total_lines, FILE *tfp, char const *temp_output) { - struct line *lo_orig = node->lo; - struct line *hi_orig = node->hi; + bool merge_to_dest = (MERGE_ROOT < node->level); + struct line const *lo_orig = node->lo; + struct line const *hi_orig = node->hi; + struct line *dest = *node->dest; size_t to_merge = MAX_MERGE (total_lines, node->level); - size_t merged_lo; - size_t merged_hi; - if (node->level > MERGE_ROOT) + while (to_merge--) { - /* Merge to destination buffer. */ - struct line *dest = *node->dest; - while (node->lo != node->end_lo && node->hi != node->end_hi && to_merge--) - if (compare (node->lo - 1, node->hi - 1) <= 0) - *--dest = *--node->lo; - else - *--dest = *--node->hi; - - merged_lo = lo_orig - node->lo; - merged_hi = hi_orig - node->hi; - - if (node->nhi == merged_hi) - while (node->lo != node->end_lo && to_merge--) - *--dest = *--node->lo; - else if (node->nlo == merged_lo) - while (node->hi != node->end_hi && to_merge--) - *--dest = *--node->hi; - } - else - { - /* Merge directly to output. */ - while (node->lo != node->end_lo && node->hi != node->end_hi && to_merge--) + bool lo_exhausted = (node->lo == node->end_lo); + bool hi_exhausted = (node->hi == node->end_hi); + int cmp = lo_exhausted - hi_exhausted; + if (! cmp) { - if (compare (node->lo - 1, node->hi - 1) <= 0) - write_unique (--node->lo, tfp, temp_output); - else - write_unique (--node->hi, tfp, temp_output); + if (lo_exhausted) + break; + cmp = compare (node->lo - 1, node->hi - 1); } - merged_lo = lo_orig - node->lo; - merged_hi = hi_orig - node->hi; - - if (node->nhi == merged_hi) - { - while (node->lo != node->end_lo && to_merge--) - write_unique (--node->lo, tfp, temp_output); - } - else if (node->nlo == merged_lo) - { - while (node->hi != node->end_hi && to_merge--) - write_unique (--node->hi, tfp, temp_output); - } - node->dest -= lo_orig - node->lo + hi_orig - node->hi; + struct line const *lesser = (cmp <= 0 ? --node->lo : --node->hi); + if (merge_to_dest) + *--dest = *lesser; + else + write_unique (lesser, tfp, temp_output); } - /* Update NODE. */ - merged_lo = lo_orig - node->lo; - merged_hi = hi_orig - node->hi; - node->nlo -= merged_lo; - node->nhi -= merged_hi; - return merged_lo + merged_hi; + *node->dest = dest; + node->nlo -= lo_orig - node->lo; + node->nhi -= hi_orig - node->hi; } /* Insert NODE into QUEUE if it passes insertion checks. */ @@ -3328,13 +3297,11 @@ check_insert (struct merge_node *node, struct merge_node_queue *queue) /* Update parent merge tree NODE. */ static void -update_parent (struct merge_node *node, size_t merged, - struct merge_node_queue *queue) +update_parent (struct merge_node *node, struct merge_node_queue *queue) { if (node->level > MERGE_ROOT) { lock_node (node->parent); - *node->dest -= merged; check_insert (node->parent, queue); unlock_node (node->parent); } @@ -3364,10 +3331,9 @@ merge_loop (struct merge_node_queue *queue, queue_insert (queue, node); break; } - size_t merged_lines = mergelines_node (node, total_lines, tfp, - temp_output); + mergelines_node (node, total_lines, tfp, temp_output); check_insert (node, queue); - update_parent (node, merged_lines, queue); + update_parent (node, queue); unlock_node (node); } @@ -3442,10 +3408,9 @@ sortlines (struct line *restrict lines, struct line *restrict dest, struct line *lo = dest - total_lines; struct line *hi = lo - nlo; struct line **parent_end = (lo_child)? &parent->end_lo : &parent->end_hi; - pthread_spinlock_t lock; - pthread_spin_init (&lock, PTHREAD_PROCESS_PRIVATE); struct merge_node node = {lo, hi, lo, hi, parent_end, nlo, nhi, - parent->level + 1, parent, false, &lock}; + parent, parent->level + 1, false, }; + pthread_spin_init (&node.lock, PTHREAD_PROCESS_PRIVATE); /* Calculate thread arguments. */ unsigned long int lo_threads = nthreads / 2; @@ -3481,7 +3446,7 @@ sortlines (struct line *restrict lines, struct line *restrict dest, merge_loop (merge_queue, total_lines, tfp, temp_output); } - pthread_spin_destroy (&lock); + pthread_spin_destroy (&node.lock); } /* Scan through FILES[NTEMPS .. NFILES-1] looking for a file that is @@ -3758,16 +3723,15 @@ sort (char * const *files, size_t nfiles, char const *output_file, struct merge_node_queue merge_queue; queue_init (&merge_queue, 2 * nthreads); - pthread_spinlock_t lock; - pthread_spin_init (&lock, PTHREAD_PROCESS_PRIVATE); struct merge_node node = {NULL, NULL, NULL, NULL, NULL, buf.nlines, - buf.nlines, MERGE_END, NULL, false, &lock}; + buf.nlines, NULL, MERGE_END, false, }; + pthread_spin_init (&node.lock, PTHREAD_PROCESS_PRIVATE); sortlines (line, line, nthreads, buf.nlines, &node, true, &merge_queue, tfp, temp_output); queue_destroy (&merge_queue); - pthread_spin_destroy (&lock); + pthread_spin_destroy (&node.lock); } else write_unique (line - 1, tfp, temp_output); From debbugs-submit-bounces@debbugs.gnu.org Tue Nov 30 19:14:10 2010 Received: (at 7489) by debbugs.gnu.org; 1 Dec 2010 00:14:10 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PNaKn-000115-R7 for submit@debbugs.gnu.org; Tue, 30 Nov 2010 19:14:10 -0500 Received: from mail-wy0-f172.google.com ([74.125.82.172]) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PNaKl-00010r-FT for 7489@debbugs.gnu.org; Tue, 30 Nov 2010 19:14:08 -0500 Received: by wyf23 with SMTP id 23so5491303wyf.3 for <7489@debbugs.gnu.org>; Tue, 30 Nov 2010 16:19:44 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:received:in-reply-to :references:date:message-id:subject:from:to:cc:content-type; bh=kib6yJEGfK4lURqfcnTzZHopUOguHWV2uf0Z2oQHAu8=; b=s6jnrK1fKbJTO9q05LQBlXod2ORMazllr9DMo1N7xSZnEcEndJkkcJHD3urkJR1cmx r0NwEXnnxOgQlw3bsDzK8BLznYC4TLfYOZ5P46lrwrUz6GSvBH1wkcbvObny2yGwTxlW gEdMyXamJhFJ1FTRQFmcpz0L/h5g6eVmYUw9g= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; b=PxqlMh58Hkgy3tKn5hNwamKTm3cpjeKAev6O4EXTt4kMFSvgB6Fm733fb24Vjr7IfH z/gDiRv38XFjLJiIz25XJlzVe69zf9206LF9OzXn9ZsKrXMQwozzxsCTuzUo4bIHCuz8 FsAS5CE2YfsLFOMIC1QOIRySsm8FPGwEN/9yc= MIME-Version: 1.0 Received: by 10.216.175.18 with SMTP id y18mr8209199wel.30.1291162784396; Tue, 30 Nov 2010 16:19:44 -0800 (PST) Received: by 10.216.240.196 with HTTP; Tue, 30 Nov 2010 16:19:44 -0800 (PST) In-Reply-To: <4CF57916.6050103@cs.ucla.edu> References: <4CEFF613.8060200@linuxfromscratch.org> <4CF07288.60902@draigBrady.com> <4CF0C15C.9080201@cs.ucla.edu> <4CF0D1BE.6070908@cs.ucla.edu> <4CF1A911.1000000@cs.ucla.edu> <87vd3fuc1x.fsf@meyering.net> <87lj4apk08.fsf@meyering.net> <4CF57916.6050103@cs.ucla.edu> Date: Tue, 30 Nov 2010 16:19:44 -0800 Message-ID: Subject: Re: bug#7489: [coreutils] over aggressive threads in sort From: Chen Guo To: Paul Eggert Content-Type: text/plain; charset=ISO-8859-1 X-Spam-Score: -3.6 (---) X-Debbugs-Envelope-To: 7489 Cc: coreutils@gnu.org, =?ISO-8859-1?Q?P=E1draig_Brady?= , Jim Meyering , 7489@debbugs.gnu.org, DJ Lucas X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -3.6 (---) On Tue, Nov 30, 2010 at 2:22 PM, Paul Eggert wrote: Hi Professor Eggert, > Anyway, perhaps Chen can review them (I don't have time > to test them right now). I'll look at it as soon as Savannah's back up; I never actually pulled from coreutils after the original patch was submitted. Professor Eggert, could you detail how you can trigger the divide-by-zero bug? From debbugs-submit-bounces@debbugs.gnu.org Wed Dec 01 01:10:36 2010 Received: (at 7489) by debbugs.gnu.org; 1 Dec 2010 06:10:36 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PNftj-0000Kd-HS for submit@debbugs.gnu.org; Wed, 01 Dec 2010 01:10:36 -0500 Received: from smtp.cs.ucla.edu ([131.179.128.62]) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PNfth-0000KR-U6 for 7489@debbugs.gnu.org; Wed, 01 Dec 2010 01:10:34 -0500 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id 85D6739E80DC; Tue, 30 Nov 2010 22:16:11 -0800 (PST) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id Nn0mXnOlQGnb; Tue, 30 Nov 2010 22:16:11 -0800 (PST) Received: from [192.168.1.10] (pool-71-189-109-235.lsanca.fios.verizon.net [71.189.109.235]) by smtp.cs.ucla.edu (Postfix) with ESMTPSA id 010C839E80B1; Tue, 30 Nov 2010 22:16:10 -0800 (PST) Message-ID: <4CF5E825.1010807@cs.ucla.edu> Date: Tue, 30 Nov 2010 22:16:05 -0800 From: Paul Eggert Organization: UCLA Computer Science Department User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.12) Gecko/20101027 Thunderbird/3.1.6 MIME-Version: 1.0 To: Chen Guo Subject: Re: bug#7489: [coreutils] over aggressive threads in sort References: <4CEFF613.8060200@linuxfromscratch.org> <4CF07288.60902@draigBrady.com> <4CF0C15C.9080201@cs.ucla.edu> <4CF0D1BE.6070908@cs.ucla.edu> <4CF1A911.1000000@cs.ucla.edu> <87vd3fuc1x.fsf@meyering.net> <87lj4apk08.fsf@meyering.net> <4CF57916.6050103@cs.ucla.edu> In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Spam-Score: -2.9 (--) X-Debbugs-Envelope-To: 7489 Cc: coreutils@gnu.org, =?UTF-8?B?UMOhZHJhaWcgQnJhZHk=?= , Jim Meyering , 7489@debbugs.gnu.org, DJ Lucas X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -2.9 (--) On 11/30/2010 04:19 PM, Chen Guo wrote: > could you detail how you can trigger the divide-by-zero bug? Invoke MAX_MERGE(total, level) with level == 15. 2 << level yields 65536, and 65536 * 65536 overflows to zero. From debbugs-submit-bounces@debbugs.gnu.org Wed Dec 01 02:31:38 2010 Received: (at 7489) by debbugs.gnu.org; 1 Dec 2010 07:31:38 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PNhAA-00025B-Kw for submit@debbugs.gnu.org; Wed, 01 Dec 2010 02:31:38 -0500 Received: from mx.meyering.net ([82.230.74.64]) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PNhA8-00024w-Fl for 7489@debbugs.gnu.org; Wed, 01 Dec 2010 02:31:37 -0500 Received: by rho.meyering.net (Acme Bit-Twister, from userid 1000) id 6098E600E3; Wed, 1 Dec 2010 08:37:14 +0100 (CET) From: Jim Meyering To: Paul Eggert Subject: Re: bug#7489: [coreutils] over aggressive threads in sort In-Reply-To: <4CF4A9F1.6080107@cs.ucla.edu> (Paul Eggert's message of "Mon, 29 Nov 2010 23:38:25 -0800") References: <4CEFF613.8060200@linuxfromscratch.org> <4CF07288.60902@draigBrady.com> <4CF0C15C.9080201@cs.ucla.edu> <4CF0D1BE.6070908@cs.ucla.edu> <4CF1A911.1000000@cs.ucla.edu> <4CF1BBF5.9080103@linuxfromscratch.org> <4CF352CD.7010206@linuxfromscratch.org> <4CF3FC08.3020705@cs.ucla.edu> <4CF447D0.9050907@cs.ucla.edu> <4CF4A9F1.6080107@cs.ucla.edu> Date: Wed, 01 Dec 2010 08:37:14 +0100 Message-ID: <87wrnundud.fsf@meyering.net> Lines: 20 MIME-Version: 1.0 Content-Type: text/plain X-Spam-Score: -5.6 (-----) X-Debbugs-Envelope-To: 7489 Cc: Chen Guo , DJ Lucas , 7489@debbugs.gnu.org, coreutils@gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -5.6 (-----) Paul Eggert wrote: > On 11/29/2010 08:32 PM, Chen Guo wrote: >> Hi guys, >> Is something up with Savannah? I just tried a git clone and got >> connection time out; I cant even reach git.sv.gnu.org via ping. > > There was a breakin, which led to leaking of encrypted account > passwords, some of them discovered via a brute-force attack. Note that it was web passwords. > It's pretty bad. They're reinstalling and restoring the data > from a November 24 backup. No firm ETA yet. Expect an > official announcement soon. > > You can find the current status at . Here's the chronology: http://www.fsf.org/blogs/sysadmin/savannah-and-www.gnu.org-downtime From debbugs-submit-bounces@debbugs.gnu.org Thu Dec 02 00:51:34 2010 Received: (at 7489) by debbugs.gnu.org; 2 Dec 2010 05:51:34 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PO24s-0001wa-1e for submit@debbugs.gnu.org; Thu, 02 Dec 2010 00:51:34 -0500 Received: from smtp.cs.ucla.edu ([131.179.128.62]) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PO24p-0001wO-Se for 7489@debbugs.gnu.org; Thu, 02 Dec 2010 00:51:32 -0500 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id 5E2F039E80E0; Wed, 1 Dec 2010 21:57:12 -0800 (PST) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 6DQeQXlZzomT; Wed, 1 Dec 2010 21:57:11 -0800 (PST) Received: from [192.168.1.10] (pool-71-189-109-235.lsanca.fios.verizon.net [71.189.109.235]) by smtp.cs.ucla.edu (Postfix) with ESMTPSA id 9338239E80DC; Wed, 1 Dec 2010 21:57:11 -0800 (PST) Message-ID: <4CF73537.3050403@cs.ucla.edu> Date: Wed, 01 Dec 2010 21:57:11 -0800 From: Paul Eggert Organization: UCLA Computer Science Department User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.12) Gecko/20101027 Thunderbird/3.1.6 MIME-Version: 1.0 To: Chen Guo Subject: [PATCH] sort: fix bug on 64-bit hosts with at least 32768 processors References: <4CEFF613.8060200@linuxfromscratch.org> <4CF07288.60902@draigBrady.com> <4CF0C15C.9080201@cs.ucla.edu> <4CF0D1BE.6070908@cs.ucla.edu> <4CF1A911.1000000@cs.ucla.edu> <87vd3fuc1x.fsf@meyering.net> <87lj4apk08.fsf@meyering.net> <4CF57916.6050103@cs.ucla.edu> <4CF5E825.1010807@cs.ucla.edu> In-Reply-To: <4CF5E825.1010807@cs.ucla.edu> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Spam-Score: -2.9 (--) X-Debbugs-Envelope-To: 7489 Cc: Jim Meyering , =?UTF-8?B?UMOhZHJhaWcgQnJhZHk=?= , DJ Lucas , 7489@debbugs.gnu.org, coreutils@gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -2.9 (--) On 11/30/2010 10:16 PM, Paul Eggert wrote: > Invoke MAX_MERGE(total, level) with level == 15. > 2 << level yields 65536, and 65536 * 65536 overflows to zero. I managed to reproduce this bug on a (faked) host with 32768 processors, using a command like this: seq 1000000000 | sort --parallel=32768 -S 10G The result was a floating point exception (actually, a division by zero) and 'sort' crashed. However, the bug is timing dependent and is very hard to reproduce. I tried many more times to reproduce it, and they all failed. This proved to my satisfaction that it is a real bug, though, so I pushed the following patch. >From 1561c2b228d93a049e527824e14ad4fe8c256b52 Mon Sep 17 00:00:00 2001 From: Paul Eggert Date: Wed, 1 Dec 2010 21:50:00 -0800 Subject: [PATCH] sort: fix bug on 64-bit hosts with at least 32768 processors * src/sort.c (MAX_MERGE): Avoid integer overflow when on a machine with (say) 32-bit int and 64-bit size_t and when level == 15. Without this fix, on such a machine with 32768 or more processors, the level computation could overflow on large input, and this would result in division by zero. --- src/sort.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/src/sort.c b/src/sort.c index 1aa1eb4..5c368cd 100644 --- a/src/sort.c +++ b/src/sort.c @@ -107,7 +107,7 @@ struct rlimit { size_t rlim_cur; }; /* Maximum number of lines to merge every time a NODE is taken from the MERGE_QUEUE. Node is at LEVEL in the binary merge tree, and is responsible for merging TOTAL lines. */ -#define MAX_MERGE(total, level) ((total) / ((2 << level) * (2 << level)) + 1) +#define MAX_MERGE(total, level) (((total) >> (2 * ((level) + 1))) + 1) /* Heuristic value for the number of lines for which it is worth creating a subthread, during an internal merge sort, on a machine -- 1.7.2 From debbugs-submit-bounces@debbugs.gnu.org Thu Dec 02 05:17:19 2010 Received: (at 7489) by debbugs.gnu.org; 2 Dec 2010 10:17:19 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PO6E3-0007oE-0z for submit@debbugs.gnu.org; Thu, 02 Dec 2010 05:17:19 -0500 Received: from mail-ww0-f46.google.com ([74.125.82.46]) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PO6E1-0007o1-D9 for 7489@debbugs.gnu.org; Thu, 02 Dec 2010 05:17:17 -0500 Received: by wwj40 with SMTP id 40so1425245wwj.15 for <7489@debbugs.gnu.org>; Thu, 02 Dec 2010 02:22:58 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:received:in-reply-to :references:date:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=Ru0+0Jrch/hhMPFPGPx43CI0Gd/s2BWkwxO0DLFrCss=; b=gM8QmmDiYk+LNrjtexC62OHd47B965Cm1lg3Y1h1+LIe2V5rwzncbeCUoglT1t/nvc WeGFHauFWGN3Nv7jiEjKw+x5h8vddT6cINWpu/4plYYz3JodQui43wS4lLl0lzhfUMt5 XccD7Q95GJPRxyTEVeAlKDKfTs3IoJkueJT+I= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; b=gxQy7gu0wNkFUAOZuQl/RylZ6R5iEsHCnvhfOjCnFqgQTO1N4r4VR2GFdTTl2tW+7C XGo9i3hFz0Vue92fxVFBzYTwDV976AYY0Kw2VlhJm4fjdNxZzq5nUFUfDlptBMIH6fgD ZKlqg9rlAGU0RylgiN86L2sA7v2/tdZj35thk= MIME-Version: 1.0 Received: by 10.216.46.200 with SMTP id r50mr3431807web.45.1291285378162; Thu, 02 Dec 2010 02:22:58 -0800 (PST) Received: by 10.216.240.196 with HTTP; Thu, 2 Dec 2010 02:22:58 -0800 (PST) In-Reply-To: <4CF3FC08.3020705@cs.ucla.edu> References: <4CEFF613.8060200@linuxfromscratch.org> <4CF07288.60902@draigBrady.com> <4CF0C15C.9080201@cs.ucla.edu> <4CF0D1BE.6070908@cs.ucla.edu> <4CF1A911.1000000@cs.ucla.edu> <4CF1BBF5.9080103@linuxfromscratch.org> <4CF352CD.7010206@linuxfromscratch.org> <4CF3FC08.3020705@cs.ucla.edu> Date: Thu, 2 Dec 2010 02:22:58 -0800 Message-ID: Subject: Re: bug#7489: [coreutils] over aggressive threads in sort From: Chen Guo To: Paul Eggert Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable X-Spam-Score: -3.6 (---) X-Debbugs-Envelope-To: 7489 Cc: DJ Lucas , 7489@debbugs.gnu.org, coreutils@gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -3.6 (---) Hi Professor Eggert, On Mon, Nov 29, 2010 at 11:16 AM, Paul Eggert wrote: > =A0(for i in $(seq 12); do read line; echo $i; sleep .1; done > =A0cat > /dev/null) < fifo & > =A0(ulimit -t 1; ./sort in > fifo \ > =A0|| echo killed via $(env kill -l $(expr $? - 128))) I ran this 10 times or so on an i7 and couldn't trigger anything. Is seq 12 supposed to vary depending on the number of cores? From debbugs-submit-bounces@debbugs.gnu.org Thu Dec 02 12:43:22 2010 Received: (at 7489) by debbugs.gnu.org; 2 Dec 2010 17:43:22 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PODBi-0001TR-Fw for submit@debbugs.gnu.org; Thu, 02 Dec 2010 12:43:22 -0500 Received: from smtp.cs.ucla.edu ([131.179.128.62]) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PODBg-0001TF-J2 for 7489@debbugs.gnu.org; Thu, 02 Dec 2010 12:43:21 -0500 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id 51A2939E80F2; Thu, 2 Dec 2010 09:49:02 -0800 (PST) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id iKpyC2eanaXT; Thu, 2 Dec 2010 09:49:01 -0800 (PST) Received: from [131.179.64.200] (Penguin.CS.UCLA.EDU [131.179.64.200]) by smtp.cs.ucla.edu (Postfix) with ESMTPSA id 935EE39E80E1; Thu, 2 Dec 2010 09:49:01 -0800 (PST) Message-ID: <4CF7DC08.30002@cs.ucla.edu> Date: Thu, 02 Dec 2010 09:48:56 -0800 From: Paul Eggert Organization: UCLA Computer Science Department User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.15) Gecko/20101027 Thunderbird/3.0.10 MIME-Version: 1.0 To: Chen Guo Subject: Re: bug#7489: [coreutils] over aggressive threads in sort References: <4CEFF613.8060200@linuxfromscratch.org> <4CF07288.60902@draigBrady.com> <4CF0C15C.9080201@cs.ucla.edu> <4CF0D1BE.6070908@cs.ucla.edu> <4CF1A911.1000000@cs.ucla.edu> <4CF1BBF5.9080103@linuxfromscratch.org> <4CF352CD.7010206@linuxfromscratch.org> <4CF3FC08.3020705@cs.ucla.edu> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Spam-Score: -3.3 (---) X-Debbugs-Envelope-To: 7489 Cc: DJ Lucas , 7489@debbugs.gnu.org, coreutils@gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -3.3 (---) On 12/02/10 02:22, Chen Guo wrote: > On Mon, Nov 29, 2010 at 11:16 AM, Paul Eggert wrote: >> (for i in $(seq 12); do read line; echo $i; sleep .1; done >> cat > /dev/null) < fifo & >> (ulimit -t 1; ./sort in > fifo \ >> || echo killed via $(env kill -l $(expr $? - 128))) > > I ran this 10 times or so on an i7 and couldn't trigger anything. Is > seq 12 supposed to vary depending on the number of cores? I imagine it does. It's timing-dependent; I can't always reproduce it on my test host (Intel Xeon E5620 2.4 GHz). I just now realized that the above doesn't say what "in" is; it's the output of "seq 100000". Also, it may help to use an explicit --parallel=2 or whatever. What happens if you do the following with the latest git version (savannah's back up, by the way)? cd tests && make check TESTS=misc/sort-spinlock-abuse This gets an XFAIL fairly reliably on my test host. From debbugs-submit-bounces@debbugs.gnu.org Thu Dec 02 16:37:49 2010 Received: (at 7489) by debbugs.gnu.org; 2 Dec 2010 21:37:49 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1POGqa-0000Rc-Rm for submit@debbugs.gnu.org; Thu, 02 Dec 2010 16:37:49 -0500 Received: from mx.meyering.net ([82.230.74.64]) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1POGqX-0000RF-Nm for 7489@debbugs.gnu.org; Thu, 02 Dec 2010 16:37:46 -0500 Received: by rho.meyering.net (Acme Bit-Twister, from userid 1000) id ED5E860171; Thu, 2 Dec 2010 22:42:57 +0100 (CET) From: Jim Meyering To: Chen Guo Subject: Re: bug#7489: [coreutils] over aggressive threads in sort In-Reply-To: (Chen Guo's message of "Thu, 2 Dec 2010 02:22:58 -0800") References: <4CEFF613.8060200@linuxfromscratch.org> <4CF07288.60902@draigBrady.com> <4CF0C15C.9080201@cs.ucla.edu> <4CF0D1BE.6070908@cs.ucla.edu> <4CF1A911.1000000@cs.ucla.edu> <4CF1BBF5.9080103@linuxfromscratch.org> <4CF352CD.7010206@linuxfromscratch.org> <4CF3FC08.3020705@cs.ucla.edu> Date: Thu, 02 Dec 2010 22:42:57 +0100 Message-ID: <87r5dzc0m6.fsf@meyering.net> Lines: 45 MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spam-Score: -5.6 (-----) X-Debbugs-Envelope-To: 7489 Cc: Paul Eggert , DJ Lucas , 7489@debbugs.gnu.org, coreutils@gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -5.6 (-----) Chen Guo wrote: > Hi Professor Eggert, > On Mon, Nov 29, 2010 at 11:16 AM, Paul Eggert wrote: >> =C2=A0(for i in $(seq 12); do read line; echo $i; sleep .1; done >> =C2=A0cat > /dev/null) < fifo & >> =C2=A0(ulimit -t 1; ./sort in > fifo \ >> =C2=A0|| echo killed via $(env kill -l $(expr $? - 128))) > > I ran this 10 times or so on an i7 and couldn't trigger anything. Is > seq 12 supposed to vary depending on the number of cores? Hi Chen, Here's a stand-alone command-line test using the sort in your PATH: rm -f in fifo seq 100000 > in mkfifo fifo (for i in $(seq 12); do read line; echo $i; sleep .1; done cat > /dev/null) < fifo & (ulimit -t 1; sort in > fifo \ || echo killed via $(env kill -l $(expr $? - 128))) The 12 x 0.1-second sleeps are to ensure that the busy-waiting sort will accumulate more than 1.0 seconds of CPU time, and thus surpass the CPU time limit imposed by "ulimit -t 1". With more processes, then you may use a number smaller than 12. Running on a 6-core i7, I see the "killed via XCPU" message anywhere between the "1" and "4" output lines. E.g., 1 2 killed via XCPU $ 3 = : 4 5 6 7 8 9 10 11 12 From debbugs-submit-bounces@debbugs.gnu.org Fri Dec 03 15:13:04 2010 Received: (at 7489) by debbugs.gnu.org; 3 Dec 2010 20:13:04 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1POc07-0001FV-QR for submit@debbugs.gnu.org; Fri, 03 Dec 2010 15:13:03 -0500 Received: from mail-ww0-f46.google.com ([74.125.82.46]) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1POc06-0001F2-ID for 7489@debbugs.gnu.org; Fri, 03 Dec 2010 15:13:02 -0500 Received: by wwj40 with SMTP id 40so3324432wwj.15 for <7489@debbugs.gnu.org>; Fri, 03 Dec 2010 12:18:47 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:received:in-reply-to :references:date:message-id:subject:from:to:cc:content-type; bh=HpaW+nET/napdJEwrfpDoZEPKg98C+vH90u/he6Vu+M=; b=Kj1hzzwd07+PUEppad29NesmjYqcuj5zPPKLNmQ8a41bGyNcQjpnjDhaD3iLAoBCQY TAozaZBTsFqv30utkhfR2WGZuR2jCR8yRpFx05hQFRzy+bchlIZOSSMKyVlZ+OWzfE7G KR8j1lUT0XDYZc5xqLCKvEqIY9jR07dAOcxKU= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; b=Qmku+rYEnWbhM8ZklYMX6ps372JggYl+AC0dCt5YXsRiED61GaIJPtmQR999yFRPe9 Wz9BO9x8kgF1Hc221wQaUxEn4n/hK0NPrsf5m2rYTpeQ7JEJTft43CvPv3yMwbV6hl0k nZKrL1yW2QyWaD1EPccD/3VdalrgEqXfRQShM= MIME-Version: 1.0 Received: by 10.216.11.203 with SMTP id 53mr2620625wex.15.1291407526646; Fri, 03 Dec 2010 12:18:46 -0800 (PST) Received: by 10.216.240.196 with HTTP; Fri, 3 Dec 2010 12:18:46 -0800 (PST) In-Reply-To: <87r5dzc0m6.fsf@meyering.net> References: <4CEFF613.8060200@linuxfromscratch.org> <4CF07288.60902@draigBrady.com> <4CF0C15C.9080201@cs.ucla.edu> <4CF0D1BE.6070908@cs.ucla.edu> <4CF1A911.1000000@cs.ucla.edu> <4CF1BBF5.9080103@linuxfromscratch.org> <4CF352CD.7010206@linuxfromscratch.org> <4CF3FC08.3020705@cs.ucla.edu> <87r5dzc0m6.fsf@meyering.net> Date: Fri, 3 Dec 2010 12:18:46 -0800 Message-ID: Subject: Re: bug#7489: [coreutils] over aggressive threads in sort From: Chen Guo To: Jim Meyering Content-Type: text/plain; charset=ISO-8859-1 X-Spam-Score: -3.6 (---) X-Debbugs-Envelope-To: 7489 Cc: Paul Eggert , DJ Lucas , 7489@debbugs.gnu.org, coreutils@gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -3.6 (---) Thanks Jim, that helped a lot. I'll try out Professor Eggert's suggestion, of switching to mutexes only at the top level merge. Of the following approaches, which would you guys consider better practice? 1) void pointer, cast as either mutex or spinlock in lock function 2) union of mutex and spinlock, chosen in lock function The idea is basically, in the lock function if we see the node is MERGE_LEVEL_ROOT we'd lock the mutex either via cast or union member, and likewise for other merge levels for spinlock. From debbugs-submit-bounces@debbugs.gnu.org Fri Dec 03 16:05:18 2010 Received: (at 7489) by debbugs.gnu.org; 3 Dec 2010 21:05:18 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1POcog-00038l-3t for submit@debbugs.gnu.org; Fri, 03 Dec 2010 16:05:18 -0500 Received: from smtp.cs.ucla.edu ([131.179.128.62]) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1POcod-00038Y-39 for 7489@debbugs.gnu.org; Fri, 03 Dec 2010 16:05:16 -0500 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id 01E3B39E80F2; Fri, 3 Dec 2010 13:11:00 -0800 (PST) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id DvjuW1MPjco8; Fri, 3 Dec 2010 13:10:59 -0800 (PST) Received: from [131.179.64.200] (Penguin.CS.UCLA.EDU [131.179.64.200]) by smtp.cs.ucla.edu (Postfix) with ESMTPSA id 2B12E39E80DC; Fri, 3 Dec 2010 13:10:59 -0800 (PST) Message-ID: <4CF95CE2.8090602@cs.ucla.edu> Date: Fri, 03 Dec 2010 13:10:58 -0800 From: Paul Eggert Organization: UCLA Computer Science Department User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.15) Gecko/20101027 Thunderbird/3.0.10 MIME-Version: 1.0 To: Chen Guo Subject: Re: bug#7489: [coreutils] over aggressive threads in sort References: <4CEFF613.8060200@linuxfromscratch.org> <4CF07288.60902@draigBrady.com> <4CF0C15C.9080201@cs.ucla.edu> <4CF0D1BE.6070908@cs.ucla.edu> <4CF1A911.1000000@cs.ucla.edu> <4CF1BBF5.9080103@linuxfromscratch.org> <4CF352CD.7010206@linuxfromscratch.org> <4CF3FC08.3020705@cs.ucla.edu> <87r5dzc0m6.fsf@meyering.net> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Spam-Score: -3.3 (---) X-Debbugs-Envelope-To: 7489 Cc: coreutils@gnu.org, Jim Meyering , 7489@debbugs.gnu.org, DJ Lucas X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -3.3 (---) On 12/03/10 12:18, Chen Guo wrote: > I'll try out Professor Eggert's suggestion, of switching to mutexes > only at the top level merge. I'm having second thoughts about that. Yes, that'll prevent the top-level merge (which is generating the actual output) from chewing up CPU time. But it already has that property, since it's outputting to stdout. And if second-level merges use mutexes, then the third-level merges will spin. We'll have to use mutexes at all levels, unless I'm missing something. How about this idea instead. Keep using spin locks everywhere, but have the top-level merge output to memory, as the lower merges already do. The main thread can wait for the top level merge to finish and then generate output. That way, none of the merges will have to wait on an output pipe (or a slow output file). Either option (either switch to mutexes everywhere, or have the top-level merge go to memory) should work. Perhaps we should try both and benchmark them. From debbugs-submit-bounces@debbugs.gnu.org Sat Dec 04 03:37:39 2010 Received: (at 7489) by debbugs.gnu.org; 4 Dec 2010 08:37:39 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1POncg-0001Ie-8Z for submit@debbugs.gnu.org; Sat, 04 Dec 2010 03:37:39 -0500 Received: from smtp.cs.ucla.edu ([131.179.128.62]) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1POncd-0001IU-IK for 7489@debbugs.gnu.org; Sat, 04 Dec 2010 03:37:37 -0500 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id 7EF7539E80F9; Sat, 4 Dec 2010 00:43:21 -0800 (PST) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id I2wQNLyYn6ol; Sat, 4 Dec 2010 00:43:19 -0800 (PST) Received: from [192.168.1.10] (pool-71-189-109-235.lsanca.fios.verizon.net [71.189.109.235]) by smtp.cs.ucla.edu (Postfix) with ESMTPSA id 23FDF39E80E0; Sat, 4 Dec 2010 00:43:18 -0800 (PST) Message-ID: <4CF9FF26.7010204@cs.ucla.edu> Date: Sat, 04 Dec 2010 00:43:18 -0800 From: Paul Eggert Organization: UCLA Computer Science Department User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.12) Gecko/20101027 Thunderbird/3.1.6 MIME-Version: 1.0 To: Chen Guo Subject: Re: bug#7489: [coreutils] over aggressive threads in sort References: <4CEFF613.8060200@linuxfromscratch.org> <4CF07288.60902@draigBrady.com> <4CF0C15C.9080201@cs.ucla.edu> <4CF0D1BE.6070908@cs.ucla.edu> <4CF1A911.1000000@cs.ucla.edu> <87pqtnubuc.fsf@meyering.net> <4CF42D4B.7020904@cs.ucla.edu> In-Reply-To: <4CF42D4B.7020904@cs.ucla.edu> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Spam-Score: -2.9 (--) X-Debbugs-Envelope-To: 7489 Cc: coreutils@gnu.org, =?UTF-8?B?UMOhZHJhaWcgQnJhZHk=?= , Jim Meyering , 7489@debbugs.gnu.org, DJ Lucas X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -2.9 (--) On 11/29/2010 02:46 PM, Paul Eggert wrote: > My current guess, by the way, > is that it's not a bug that can be triggered: it's merely > useless code that is harmless and can safely be removed. I removed it as part of the following series of cleanup patches. These are intended merely to refactor the code and simplify it a bit, to make it easier to fix the CPU spinlock bug. Please feel free to undo anything that looks at all questionable. While we're on that subject, I assume that if we want to fix it by doing the last pass in memory rather than to stdout, we need to allocate a bit more memory at the start, basically N * (1 + log2(N)) cells instead of N (log2(N)) cells. This is enough to hold the extra pass of output. If this isn't right, Chen, please let me know. >From 6c2452cd9f58c30e2e451c57d49384a2df785e3e Mon Sep 17 00:00:00 2001 From: Paul Eggert Date: Fri, 3 Dec 2010 14:27:02 -0800 Subject: [PATCH 1/7] sort: Clarify comments * src/sort.c: Improve comments a bit. --- src/sort.c | 81 +++++++++++++++++++++++++++++++++++++++++++---------------- 1 files changed, 59 insertions(+), 22 deletions(-) diff --git a/src/sort.c b/src/sort.c index 5c368cd..3d89a32 100644 --- a/src/sort.c +++ b/src/sort.c @@ -105,7 +105,7 @@ struct rlimit { size_t rlim_cur; }; #endif /* Maximum number of lines to merge every time a NODE is taken from - the MERGE_QUEUE. Node is at LEVEL in the binary merge tree, + the merge queue. Node is at LEVEL in the binary merge tree, and is responsible for merging TOTAL lines. */ #define MAX_MERGE(total, level) (((total) >> (2 * ((level) + 1))) + 1) @@ -196,6 +196,7 @@ struct buffer bool eof; /* An EOF has been read. */ }; +/* Sort key. */ struct keyfield { size_t sword; /* Zero-origin 'word' to start at. */ @@ -3072,7 +3073,9 @@ mergelines (struct line *restrict t, size_t nlines, } /* Sort the array LINES with NLINES members, using TEMP for temporary space. - NLINES must be at least 2. + Do this all within one thread. NLINES must be at least 2. + If TO_TEMP, put the sorted output into TEMP, and TEMP is as large as LINES. + Otherwise the sort is in-place and TEMP is half-sized. The input and output arrays are in reverse order, and LINES and TEMP point just past the end of their respective arrays. @@ -3134,9 +3137,7 @@ sequential_sort (struct line *restrict lines, size_t nlines, } } -/* Compare two NODEs for priority. The NODE with the higher (numerically - lower) level has priority. If tie, the NODE with the most remaining - lines has priority. */ +/* Compare two merge nodes A and B for priority. */ static int compare_nodes (void const *a, void const *b) @@ -3149,9 +3150,9 @@ compare_nodes (void const *a, void const *b) } /* Lock a merge tree NODE. - Note spin locks were seen to perform better than mutexes - as long as the number of threads is limited to the - number of processors. */ + Spin locks were seen to perform better than mutexes when the number + of threads is limited to the number of processors, assuming 'sort' + never waits when writing to stdout. */ static inline void lock_node (struct merge_node *node) @@ -3190,8 +3191,8 @@ queue_init (struct merge_node_queue *queue, size_t reserve) pthread_cond_init (&queue->cond, NULL); } -/* Insert NODE into priority QUEUE. Assume caller either holds lock on NODE - or does not need to lock NODE. */ +/* Insert NODE into QUEUE. The caller either holds a lock on NODE, or + does not need to lock NODE. */ static void queue_insert (struct merge_node_queue *queue, struct merge_node *node) @@ -3203,7 +3204,7 @@ queue_insert (struct merge_node_queue *queue, struct merge_node *node) pthread_cond_signal (&queue->cond); } -/* Pop NODE off priority QUEUE. Guarantee a non-null, spinlocked NODE. */ +/* Pop the top node off the priority QUEUE, lock the node, return it. */ static struct merge_node * queue_pop (struct merge_node_queue *queue) @@ -3218,10 +3219,12 @@ queue_pop (struct merge_node_queue *queue) return node; } -/* If UNQIUE is set, checks to make sure line isn't a duplicate before - outputting. If UNIQUE is not set, output the passed in line. Note that - this function does not actually save the line, nor any key information, - thus is only appropriate for internal sort. */ +/* Output LINE to TFP, unless -u is specified and the line compares + equal to the previous line. TEMP_OUTPUT is the name of TFP, or + is null if TFP is standard output. + + This function does not save the line for comparison later, so it is + appropriate only for internal sort. */ static void write_unique (struct line const *line, FILE *tfp, char const *temp_output) @@ -3238,7 +3241,11 @@ write_unique (struct line const *line, FILE *tfp, char const *temp_output) } /* Merge the lines currently available to a NODE in the binary - merge tree, up to a maximum specified by MAX_MERGE. */ + merge tree. Merge a number of lines appropriate for this merge + level, assuming TOTAL_LINES is the total number of lines. + + If merging at the top level, send output to TFP. TEMP_OUTPUT is + the name of TFP, or is null if TFP is standard output. */ static size_t mergelines_node (struct merge_node *restrict node, size_t total_lines, @@ -3305,7 +3312,9 @@ mergelines_node (struct merge_node *restrict node, size_t total_lines, return merged_lo + merged_hi; } -/* Insert NODE into QUEUE if it passes insertion checks. */ +/* Insert NODE into QUEUE if it is not already queued, and if one of + NODE's children has available lines and the other either has + available lines or has exhausted its lines. */ static void check_insert (struct merge_node *node, struct merge_node_queue *queue) @@ -3325,7 +3334,7 @@ check_insert (struct merge_node *node, struct merge_node_queue *queue) } } -/* Update parent merge tree NODE. */ +/* Insert NODE's parent into QUEUE if the parent can now be worked on. */ static void update_parent (struct merge_node *node, size_t merged, @@ -3346,8 +3355,11 @@ update_parent (struct merge_node *node, size_t merged, } } -/* Repeatedly pop QUEUE for a NODE with lines to merge, and merge at least - some of those lines, until the MERGE_END node is popped. */ +/* Repeatedly pop QUEUE for a node with lines to merge, and merge at least + some of those lines, until the MERGE_END node is popped. + TOTAL_LINES is the total number of lines. If merging at the top + level, send output to TFP. TEMP_OUTPUT is the name of TFP, or is + null if TFP is standard output. */ static void merge_loop (struct merge_node_queue *queue, @@ -3384,13 +3396,33 @@ static void sortlines (struct line *restrict, struct line *restrict, struct thread_args { + /* Source, i.e., the array of lines to sort. This points just past + the end of the array. */ struct line *lines; + + /* Destination, i.e., the array of lines to store result into. This + also points just past the end of the array. */ struct line *dest; + + /* Number of threads to use. If 0 or 1, sort single-threaded. */ unsigned long int nthreads; + + /* Number of lines in LINES and DEST. */ size_t const total_lines; + + /* Parent node; it will merge this node's output to the output + of this node's sibling. */ struct merge_node *const parent; + + /* True if this node is sorting the lower half of the parent's work. */ bool lo_child; + + /* The priority queue controlling available work for the entire + internal sort. */ struct merge_node_queue *const merge_queue; + + /* If at the top level, the file to output to, and the file's name. + If the file is standard output, the file's name is null. */ FILE *tfp; char const *output_temp; }; @@ -3407,7 +3439,10 @@ sortlines_thread (void *data) return NULL; } -/* There are three phases to the algorithm: node creation, sequential sort, +/* Sort lines, possibly in parallel. The arguments are as in struct + thread_args above. + + The algorithm has three phases: node creation, sequential sort, and binary merge. During node creation, sortlines recursively visits each node in the @@ -3683,7 +3718,7 @@ merge (struct sortfile *files, size_t ntemps, size_t nfiles, } } -/* Sort NFILES FILES onto OUTPUT_FILE. */ +/* Sort NFILES FILES onto OUTPUT_FILE. Use at most NTHREADS threads. */ static void sort (char * const *files, size_t nfiles, char const *output_file, @@ -3967,6 +4002,8 @@ set_ordering (char const *s, struct keyfield *key, enum blanktype blanktype) return (char *) s; } +/* Initialize KEY. */ + static struct keyfield * key_init (struct keyfield *key) { -- 1.7.2 >From 5e87272634c4f5459808ec5d5c15d1710a2a5bbe Mon Sep 17 00:00:00 2001 From: Paul Eggert Date: Fri, 3 Dec 2010 14:39:23 -0800 Subject: [PATCH 2/7] sort: tune struct_merge_node slightly * src/sort.c (struct merge_node): 'lock' is now the actual lock, not a pointer to the lock; there's no need for indirection here. Make 'level' unsigned int instead of size_t, since it is a bit-shift count; also, move it next to a bool so that it's more likely to take less space. All uses changed. (sortlines, sort): Spell out initialization instead of using an initializer. This makes the initializer a bit easier to understand, and avoids unnecessary stores into the spin lock. --- src/sort.c | 40 +++++++++++++++++++++++++--------------- 1 files changed, 25 insertions(+), 15 deletions(-) diff --git a/src/sort.c b/src/sort.c index 3d89a32..141af52 100644 --- a/src/sort.c +++ b/src/sort.c @@ -238,10 +238,10 @@ struct merge_node struct line **dest; /* Pointer to destination of merge. */ size_t nlo; /* Total Lines remaining from LO. */ size_t nhi; /* Total lines remaining from HI. */ - size_t level; /* Level in merge tree. */ struct merge_node *parent; /* Parent node. */ + unsigned int level; /* Level in merge tree. */ bool queued; /* Node is already in heap. */ - pthread_spinlock_t *lock; /* Lock for node operations. */ + pthread_spinlock_t lock; /* Lock for node operations. */ }; /* Priority queue of merge nodes. */ @@ -3157,7 +3157,7 @@ compare_nodes (void const *a, void const *b) static inline void lock_node (struct merge_node *node) { - pthread_spin_lock (node->lock); + pthread_spin_lock (&node->lock); } /* Unlock a merge tree NODE. */ @@ -3165,7 +3165,7 @@ lock_node (struct merge_node *node) static inline void unlock_node (struct merge_node *node) { - pthread_spin_unlock (node->lock); + pthread_spin_unlock (&node->lock); } /* Destroy merge QUEUE. */ @@ -3477,10 +3477,17 @@ sortlines (struct line *restrict lines, struct line *restrict dest, struct line *lo = dest - total_lines; struct line *hi = lo - nlo; struct line **parent_end = (lo_child)? &parent->end_lo : &parent->end_hi; - pthread_spinlock_t lock; - pthread_spin_init (&lock, PTHREAD_PROCESS_PRIVATE); - struct merge_node node = {lo, hi, lo, hi, parent_end, nlo, nhi, - parent->level + 1, parent, false, &lock}; + + struct merge_node node; + node.lo = node.end_lo = lo; + node.hi = node.end_hi = hi; + node.dest = parent_end; + node.nlo = nlo; + node.nhi = nhi; + node.parent = parent; + node.level = parent->level + 1; + node.queued = false; + pthread_spin_init (&node.lock, PTHREAD_PROCESS_PRIVATE); /* Calculate thread arguments. */ unsigned long int lo_threads = nthreads / 2; @@ -3516,7 +3523,7 @@ sortlines (struct line *restrict lines, struct line *restrict dest, merge_loop (merge_queue, total_lines, tfp, temp_output); } - pthread_spin_destroy (&lock); + pthread_spin_destroy (&node.lock); } /* Scan through FILES[NTEMPS .. NFILES-1] looking for a file that is @@ -3793,16 +3800,19 @@ sort (char * const *files, size_t nfiles, char const *output_file, struct merge_node_queue merge_queue; queue_init (&merge_queue, 2 * nthreads); - pthread_spinlock_t lock; - pthread_spin_init (&lock, PTHREAD_PROCESS_PRIVATE); - struct merge_node node = - {NULL, NULL, NULL, NULL, NULL, buf.nlines, - buf.nlines, MERGE_END, NULL, false, &lock}; + struct merge_node node; + node.lo = node.hi = node.end_lo = node.end_hi = NULL; + node.dest = NULL; + node.nlo = node.nhi = buf.nlines; + node.parent = NULL; + node.level = MERGE_END; + node.queued = false; + pthread_spin_init (&node.lock, PTHREAD_PROCESS_PRIVATE); sortlines (line, line, nthreads, buf.nlines, &node, true, &merge_queue, tfp, temp_output); queue_destroy (&merge_queue); - pthread_spin_destroy (&lock); + pthread_spin_destroy (&node.lock); } else write_unique (line - 1, tfp, temp_output); -- 1.7.2 >From a7beff9385827f47b97e089cfe15f2761c9a1efd Mon Sep 17 00:00:00 2001 From: Paul Eggert Date: Fri, 3 Dec 2010 15:01:21 -0800 Subject: [PATCH 3/7] sort: put queue arg first * src/sort.c (queue_check_insert, queue_check_insert_parent): Make the queue arg first, for consistency with other functions such as queue_insert that put the queue arg first. Rename from check_insert and update_parent, respectively. All callers changed. --- src/sort.c | 16 ++++++++-------- 1 files changed, 8 insertions(+), 8 deletions(-) diff --git a/src/sort.c b/src/sort.c index 141af52..af4b20c 100644 --- a/src/sort.c +++ b/src/sort.c @@ -3312,12 +3312,12 @@ mergelines_node (struct merge_node *restrict node, size_t total_lines, return merged_lo + merged_hi; } -/* Insert NODE into QUEUE if it is not already queued, and if one of +/* Into QUEUE, insert NODE if it is not already queued, and if one of NODE's children has available lines and the other either has available lines or has exhausted its lines. */ static void -check_insert (struct merge_node *node, struct merge_node_queue *queue) +queue_check_insert (struct merge_node_queue *queue, struct merge_node *node) { size_t lo_avail = node->lo - node->end_lo; size_t hi_avail = node->hi - node->end_hi; @@ -3334,17 +3334,17 @@ check_insert (struct merge_node *node, struct merge_node_queue *queue) } } -/* Insert NODE's parent into QUEUE if the parent can now be worked on. */ +/* Into QUEUE, insert NODE's parent if the parent can now be worked on. */ static void -update_parent (struct merge_node *node, size_t merged, - struct merge_node_queue *queue) +queue_check_insert_parent (struct merge_node_queue *queue, + struct merge_node *node, size_t merged) { if (node->level > MERGE_ROOT) { lock_node (node->parent); *node->dest -= merged; - check_insert (node->parent, queue); + queue_check_insert (queue, node->parent); unlock_node (node->parent); } else if (node->nlo + node->nhi == 0) @@ -3378,8 +3378,8 @@ merge_loop (struct merge_node_queue *queue, } size_t merged_lines = mergelines_node (node, total_lines, tfp, temp_output); - check_insert (node, queue); - update_parent (node, merged_lines, queue); + queue_check_insert (queue, node); + queue_check_insert_parent (queue, node, merged_lines); unlock_node (node); } -- 1.7.2 >From 66d448d0757c3ad3cd11f20142aeeefbdcdd354e Mon Sep 17 00:00:00 2001 From: Paul Eggert Date: Fri, 3 Dec 2010 15:04:31 -0800 Subject: [PATCH 4/7] sort: simplify write_unique * src/sort.c (write_unique): Simplify slightly so that there is just one call to write_line, not two. --- src/sort.c | 9 +++++---- 1 files changed, 5 insertions(+), 4 deletions(-) diff --git a/src/sort.c b/src/sort.c index af4b20c..1faf171 100644 --- a/src/sort.c +++ b/src/sort.c @@ -3231,13 +3231,14 @@ write_unique (struct line const *line, FILE *tfp, char const *temp_output) { static struct line saved; - if (!unique) - write_line (line, tfp, temp_output); - else if (!saved.text || compare (line, &saved)) + if (unique) { + if (saved.text && ! compare (line, &saved)) + return; saved = *line; - write_line (line, tfp, temp_output); } + + write_line (line, tfp, temp_output); } /* Merge the lines currently available to a NODE in the binary -- 1.7.2 >From 809431a68369189a3bc4be2388fdb0f20922c4e0 Mon Sep 17 00:00:00 2001 From: Paul Eggert Date: Fri, 3 Dec 2010 15:11:46 -0800 Subject: [PATCH 5/7] sort: fix problems with merge node dest pointer * src/sort.c (mergelines_node): Return void, not size_t. All callers changed. Change *node->dest here, not in caller. Do not change node->dest: it's not needed and could cause problems on (mostly theoretical) hosts that do not allow adding integers to null pointers. (queue_check_insert_parent): Omit MERGED parameter; no longer needed. All callers changed. --- src/sort.c | 13 +++++-------- 1 files changed, 5 insertions(+), 8 deletions(-) diff --git a/src/sort.c b/src/sort.c index 1faf171..f7296d6 100644 --- a/src/sort.c +++ b/src/sort.c @@ -3248,7 +3248,7 @@ write_unique (struct line const *line, FILE *tfp, char const *temp_output) If merging at the top level, send output to TFP. TEMP_OUTPUT is the name of TFP, or is null if TFP is standard output. */ -static size_t +static void mergelines_node (struct merge_node *restrict node, size_t total_lines, FILE *tfp, char const *temp_output) { @@ -3277,6 +3277,7 @@ mergelines_node (struct merge_node *restrict node, size_t total_lines, else if (node->nlo == merged_lo) while (node->hi != node->end_hi && to_merge--) *--dest = *--node->hi; + *node->dest = dest; } else { @@ -3302,7 +3303,6 @@ mergelines_node (struct merge_node *restrict node, size_t total_lines, while (node->hi != node->end_hi && to_merge--) write_unique (--node->hi, tfp, temp_output); } - node->dest -= lo_orig - node->lo + hi_orig - node->hi; } /* Update NODE. */ @@ -3310,7 +3310,6 @@ mergelines_node (struct merge_node *restrict node, size_t total_lines, merged_hi = hi_orig - node->hi; node->nlo -= merged_lo; node->nhi -= merged_hi; - return merged_lo + merged_hi; } /* Into QUEUE, insert NODE if it is not already queued, and if one of @@ -3339,12 +3338,11 @@ queue_check_insert (struct merge_node_queue *queue, struct merge_node *node) static void queue_check_insert_parent (struct merge_node_queue *queue, - struct merge_node *node, size_t merged) + struct merge_node *node) { if (node->level > MERGE_ROOT) { lock_node (node->parent); - *node->dest -= merged; queue_check_insert (queue, node->parent); unlock_node (node->parent); } @@ -3377,10 +3375,9 @@ merge_loop (struct merge_node_queue *queue, queue_insert (queue, node); break; } - size_t merged_lines = mergelines_node (node, total_lines, tfp, - temp_output); + mergelines_node (node, total_lines, tfp, temp_output); queue_check_insert (queue, node); - queue_check_insert_parent (queue, node, merged_lines); + queue_check_insert_parent (queue, node); unlock_node (node); } -- 1.7.2 >From 68bdf2752625e74a4783b441f69e79a7bd65ab5c Mon Sep 17 00:00:00 2001 From: Paul Eggert Date: Fri, 3 Dec 2010 15:23:43 -0800 Subject: [PATCH 6/7] sort: clarify queue_check_insert * src/sort.c (queue_check_insert): Clarify body a bit, and remove no-longer-needed comment. --- src/sort.c | 16 +++++----------- 1 files changed, 5 insertions(+), 11 deletions(-) diff --git a/src/sort.c b/src/sort.c index f7296d6..3ed7c5b 100644 --- a/src/sort.c +++ b/src/sort.c @@ -3319,18 +3319,12 @@ mergelines_node (struct merge_node *restrict node, size_t total_lines, static void queue_check_insert (struct merge_node_queue *queue, struct merge_node *node) { - size_t lo_avail = node->lo - node->end_lo; - size_t hi_avail = node->hi - node->end_hi; - - /* Conditions for insertion: - 1. NODE is not already in heap. - 2. NODE has available lines from both it's children, OR one child has - available lines, but the other has exhausted all its lines. */ - if ((!node->queued) - && ((lo_avail && (hi_avail || !(node->nhi))) - || (hi_avail && !(node->nlo)))) + if (! node->queued) { - queue_insert (queue, node); + bool lo_avail = (node->lo - node->end_lo) != 0; + bool hi_avail = (node->hi - node->end_hi) != 0; + if (lo_avail ? hi_avail || ! node->nhi : hi_avail && ! node->nlo) + queue_insert (queue, node); } } -- 1.7.2 >From 49c7f475ddcba334d76ef364d65882802c26163e Mon Sep 17 00:00:00 2001 From: Paul Eggert Date: Fri, 3 Dec 2010 15:39:50 -0800 Subject: [PATCH 7/7] sort: merge_queue -> queue * src/sort.c (struct thread_args, sortlines_thread, sortlines, sort): Rename "merge_queue" to "queue", for consistency with other functions that just use the name "queue" for these things. --- src/sort.c | 22 +++++++++++----------- 1 files changed, 11 insertions(+), 11 deletions(-) diff --git a/src/sort.c b/src/sort.c index 3ed7c5b..a4c2cbf 100644 --- a/src/sort.c +++ b/src/sort.c @@ -3411,7 +3411,7 @@ struct thread_args /* The priority queue controlling available work for the entire internal sort. */ - struct merge_node_queue *const merge_queue; + struct merge_node_queue *const queue; /* If at the top level, the file to output to, and the file's name. If the file is standard output, the file's name is null. */ @@ -3426,7 +3426,7 @@ sortlines_thread (void *data) { struct thread_args const *args = data; sortlines (args->lines, args->dest, args->nthreads, args->total_lines, - args->parent, args->lo_child, args->merge_queue, + args->parent, args->lo_child, args->queue, args->tfp, args->output_temp); return NULL; } @@ -3459,7 +3459,7 @@ static void sortlines (struct line *restrict lines, struct line *restrict dest, unsigned long int nthreads, size_t total_lines, struct merge_node *parent, bool lo_child, - struct merge_node_queue *merge_queue, + struct merge_node_queue *queue, FILE *tfp, char const *temp_output) { /* Create merge tree NODE. */ @@ -3486,13 +3486,13 @@ sortlines (struct line *restrict lines, struct line *restrict dest, unsigned long int hi_threads = nthreads - lo_threads; pthread_t thread; struct thread_args args = {lines, lo, lo_threads, total_lines, &node, - true, merge_queue, tfp, temp_output}; + true, queue, tfp, temp_output}; if (nthreads > 1 && SUBTHREAD_LINES_HEURISTIC <= nlines && pthread_create (&thread, NULL, sortlines_thread, &args) == 0) { sortlines (lines - nlo, hi, hi_threads, total_lines, &node, false, - merge_queue, tfp, temp_output); + queue, tfp, temp_output); pthread_join (thread, NULL); } else @@ -3511,8 +3511,8 @@ sortlines (struct line *restrict lines, struct line *restrict dest, node.end_lo = lines - nlo; node.end_hi = lines - nlo - nhi; - queue_insert (merge_queue, &node); - merge_loop (merge_queue, total_lines, tfp, temp_output); + queue_insert (queue, &node); + merge_loop (queue, total_lines, tfp, temp_output); } pthread_spin_destroy (&node.lock); @@ -3789,8 +3789,8 @@ sort (char * const *files, size_t nfiles, char const *output_file, } if (1 < buf.nlines) { - struct merge_node_queue merge_queue; - queue_init (&merge_queue, 2 * nthreads); + struct merge_node_queue queue; + queue_init (&queue, 2 * nthreads); struct merge_node node; node.lo = node.hi = node.end_lo = node.end_hi = NULL; @@ -3802,8 +3802,8 @@ sort (char * const *files, size_t nfiles, char const *output_file, pthread_spin_init (&node.lock, PTHREAD_PROCESS_PRIVATE); sortlines (line, line, nthreads, buf.nlines, &node, true, - &merge_queue, tfp, temp_output); - queue_destroy (&merge_queue); + &queue, tfp, temp_output); + queue_destroy (&queue); pthread_spin_destroy (&node.lock); } else -- 1.7.2 From debbugs-submit-bounces@debbugs.gnu.org Sun Dec 05 10:31:35 2010 Received: (at 7489) by debbugs.gnu.org; 5 Dec 2010 15:31:37 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PPGYp-0002NL-4r for submit@debbugs.gnu.org; Sun, 05 Dec 2010 10:31:35 -0500 Received: from mx.meyering.net ([82.230.74.64]) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PPGYl-0002N4-E7 for 7489@debbugs.gnu.org; Sun, 05 Dec 2010 10:31:33 -0500 Received: by rho.meyering.net (Acme Bit-Twister, from userid 1000) id F3456602B5; Sun, 5 Dec 2010 12:21:01 +0100 (CET) From: Jim Meyering To: Paul Eggert Subject: Re: bug#7489: [coreutils] over aggressive threads in sort In-Reply-To: <4CF9FF26.7010204@cs.ucla.edu> (Paul Eggert's message of "Sat, 04 Dec 2010 00:43:18 -0800") References: <4CEFF613.8060200@linuxfromscratch.org> <4CF07288.60902@draigBrady.com> <4CF0C15C.9080201@cs.ucla.edu> <4CF0D1BE.6070908@cs.ucla.edu> <4CF1A911.1000000@cs.ucla.edu> <87pqtnubuc.fsf@meyering.net> <4CF42D4B.7020904@cs.ucla.edu> <4CF9FF26.7010204@cs.ucla.edu> Date: Sun, 05 Dec 2010 12:21:01 +0100 Message-ID: <87sjyc5uua.fsf@meyering.net> Lines: 65 MIME-Version: 1.0 Content-Type: text/plain X-Spam-Score: -5.6 (-----) X-Debbugs-Envelope-To: 7489 Cc: Chen Guo , =?utf-8?Q?P=C3=A1draig?= Brady , DJ Lucas , 7489@debbugs.gnu.org, coreutils@gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -5.6 (-----) Paul Eggert wrote: > On 11/29/2010 02:46 PM, Paul Eggert wrote: >> My current guess, by the way, >> is that it's not a bug that can be triggered: it's merely >> useless code that is harmless and can safely be removed. > > I removed it as part of the following series of cleanup > patches. These are intended merely to refactor the code > and simplify it a bit, to make it easier to fix the CPU > spinlock bug. Please feel free to undo anything that > looks at all questionable. Hi Paul, Thanks for all the clean-up. I have no idea if the following is as a result of your changes, since the segfault failure has been hard to reproduce. It is from the sort-compress test, and has happened so far only twice during "make -j9 check" on a quad-core F14 system: Core was generated by `sort --compress-program=./dzip -S 1k in'. Program terminated with signal 11, Segmentation fault. #0 queue_check_insert (queue=0x7fffdbdc5620, node=0x5) at sort.c:3322 3322 if (! node->queued) (gdb) p node $1 = (struct merge_node *) 0x5 (gdb) bt #0 queue_check_insert (queue=0x7fffdbdc5620, node=0x5) at sort.c:3322 #1 0x00000000004055a9 in queue_check_insert_parent ( lines=, dest=, nthreads=140173261458952, total_lines=10, parent=, lo_child=, queue=0x7fffdbdc5620, tfp=0x1c2fb90, temp_output=0x1c2f72c "./sortpns55x") at sort.c:3340 #2 merge_loop (lines=, dest=, nthreads=140173261458952, total_lines=10, parent=, lo_child=, queue=0x7fffdbdc5620, tfp=0x1c2fb90, temp_output=0x1c2f72c "./sortpns55x") at sort.c:3374 #3 sortlines (lines=, dest=, nthreads=140173261458952, total_lines=10, parent=, lo_child=, queue=0x7fffdbdc5620, tfp=0x1c2fb90, temp_output=0x1c2f72c "./sortpns55x") at sort.c:3515 #4 0x00000000004059cb in sortlines_thread (data=) at sort.c:3428 #5 0x0000003f49806d5b in start_thread () from /lib64/libpthread-2.12.90.so #6 0x0000003f48ce4aad in clone () from /lib64/libc-2.12.90.so However, there is another failure that makes me suspicious: (also based on sort-compress): seq -w 200000 > exp && tac exp > in PATH=.:$PATH ./sort --compress-program=dzip -S 1k in > out That gets stuck in waitpid (from sort.c's reap), waiting for a dzip invocation that appears will never terminate. This is also on that same 4-core system, and is relatively easy to reproduce, so it should be easy to identify the offending change, but I'm out of time, now. The hang is also reproducible with just 2000 input lines, but then it doesn't arise as consistently. I'll note in passing that the spinlock CPU utilization problem is particularly noticeable when using --compress-program= because there is a lot more waiting. From debbugs-submit-bounces@debbugs.gnu.org Mon Dec 06 00:10:31 2010 Received: (at 7489) by debbugs.gnu.org; 6 Dec 2010 05:10:31 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PPTLL-0003ID-02 for submit@debbugs.gnu.org; Mon, 06 Dec 2010 00:10:31 -0500 Received: from mail-wy0-f172.google.com ([74.125.82.172]) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PPTLJ-0003Hz-Ng for 7489@debbugs.gnu.org; Mon, 06 Dec 2010 00:10:30 -0500 Received: by wyf23 with SMTP id 23so10934714wyf.3 for <7489@debbugs.gnu.org>; Sun, 05 Dec 2010 21:16:20 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:received:in-reply-to :references:date:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=RFkIro1QcmafqARqID8cHhXjeHxhxd1mxFtuN6iwjKM=; b=HSeUfJZE7MXVfzRyOxFHfoev/PvsyLtQ7WcP3g1ubtlM/1zDUKaYmLHF5q25tmdHZQ wiRzd0VCIAOsLiw7oLbvJXroG+T1EYhoezh20lwIADQNNzKzEsdPftLblayB1kgAjbCP 28J8q1zC0GigcN0QUoxCCx4V2OYyBX+i/BaME= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; b=N+HktPwRIZfSKBXRYY3IWGLdVOz+ErFlreXWCVZpoJsXUAPbxmoXeepzqlD3q2c6F8 EQ8xv4eSSvZP/Rz2c6+t1z34TeaZ2+wSilSIy2SDYxWNEirfwQRFK8hM8c1zqaZNUwXc 4CD6TL+osniBQywN/aZ0icbiLyUrAUKlk+Lus= MIME-Version: 1.0 Received: by 10.216.7.8 with SMTP id 8mr2384403weo.30.1291612580398; Sun, 05 Dec 2010 21:16:20 -0800 (PST) Received: by 10.216.240.196 with HTTP; Sun, 5 Dec 2010 21:16:20 -0800 (PST) In-Reply-To: <4CF95CE2.8090602@cs.ucla.edu> References: <4CEFF613.8060200@linuxfromscratch.org> <4CF07288.60902@draigBrady.com> <4CF0C15C.9080201@cs.ucla.edu> <4CF0D1BE.6070908@cs.ucla.edu> <4CF1A911.1000000@cs.ucla.edu> <4CF1BBF5.9080103@linuxfromscratch.org> <4CF352CD.7010206@linuxfromscratch.org> <4CF3FC08.3020705@cs.ucla.edu> <87r5dzc0m6.fsf@meyering.net> <4CF95CE2.8090602@cs.ucla.edu> Date: Sun, 5 Dec 2010 21:16:20 -0800 Message-ID: Subject: Re: bug#7489: [coreutils] over aggressive threads in sort From: Chen Guo To: Paul Eggert Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable X-Spam-Score: -3.6 (---) X-Debbugs-Envelope-To: 7489 Cc: coreutils@gnu.org, Jim Meyering , 7489@debbugs.gnu.org, DJ Lucas X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -3.6 (---) Hi Professor Eggert, On Fri, Dec 3, 2010 at 1:10 PM, Paul Eggert wrote: > On 12/03/10 12:18, Chen Guo wrote: > Either option (either switch to mutexes everywhere, or have the top-level > merge go to memory) should work. =A0Perhaps we should try both and benchm= ark > them. Test machine is 4 core i7. The numbers I'm giving are averaged over 20 runs, given in seconds, and are of the form elapsed / user + system. spinlock: 1 thread: 3.354 / 3.349 2 threads: 1.960 / 3.812 4 threads: 1.366 / 5.085 mutex: 1 thread: 3.354 / 3.350 2 threads: 2.062 / 3.628 4 threads: 1.497 / 4.172 spin/ output after 1 thread: 3.519 / 3.517 2 threads: 2.098 / 3.996 4 threads: 1.488 / 5.347 It seems if we have to choose between mutex and output post-sort, mutex is the way to go. Mutex is faster in the single threaded case, while in multithreaded the elapsed time is negligibly different, the user time is much greater. With spinlocks only, the greater system time was justified (though some might disagree) by the lower elapsed time. With spinlock outputting post-sort, there is no more justification for the higher user time. Before saying anything else, I should note that for mutexes, on 4 threads 20% of the time there's a segfault on a seemingly innocuous line in queue_insert (): node->queued =3D true GDB shows that pointers all look normal, and I could not trigger this over 10 runs with valgrind (it seems valgrind is singlethreaded). If we do decide to go back to mutexes, I'll look into this issue more. From debbugs-submit-bounces@debbugs.gnu.org Mon Dec 06 01:55:44 2010 Received: (at 7489) by debbugs.gnu.org; 6 Dec 2010 06:55:44 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PPUz9-0005Y3-SH for submit@debbugs.gnu.org; Mon, 06 Dec 2010 01:55:44 -0500 Received: from smtp.cs.ucla.edu ([131.179.128.62]) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PPUz7-0005Xs-HM for 7489@debbugs.gnu.org; Mon, 06 Dec 2010 01:55:42 -0500 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id BDA1939E80FA; Sun, 5 Dec 2010 23:01:32 -0800 (PST) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id MentTdKQbia7; Sun, 5 Dec 2010 23:01:32 -0800 (PST) Received: from [192.168.1.10] (pool-71-189-109-235.lsanca.fios.verizon.net [71.189.109.235]) by smtp.cs.ucla.edu (Postfix) with ESMTPSA id 17E0239E80DC; Sun, 5 Dec 2010 23:01:32 -0800 (PST) Message-ID: <4CFC8A47.9000006@cs.ucla.edu> Date: Sun, 05 Dec 2010 23:01:27 -0800 From: Paul Eggert Organization: UCLA Computer Science Department User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.12) Gecko/20101027 Thunderbird/3.1.6 MIME-Version: 1.0 To: Chen Guo Subject: Re: bug#7489: [coreutils] over aggressive threads in sort References: <4CEFF613.8060200@linuxfromscratch.org> <4CF07288.60902@draigBrady.com> <4CF0C15C.9080201@cs.ucla.edu> <4CF0D1BE.6070908@cs.ucla.edu> <4CF1A911.1000000@cs.ucla.edu> <4CF1BBF5.9080103@linuxfromscratch.org> <4CF352CD.7010206@linuxfromscratch.org> <4CF3FC08.3020705@cs.ucla.edu> <87r5dzc0m6.fsf@meyering.net> <4CF95CE2.8090602@cs.ucla.edu> In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Spam-Score: -2.9 (--) X-Debbugs-Envelope-To: 7489 Cc: coreutils@gnu.org, Jim Meyering , 7489@debbugs.gnu.org, DJ Lucas X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -2.9 (--) On 12/05/2010 09:16 PM, Chen Guo wrote: > Before saying anything else, I should note that for mutexes, on 4 > threads 20% of the time there's a segfault on a seemingly innocuous > line in queue_insert (): > node->queued = true It does sound like mutexes are the way to go, and that this bug needs to be fixed. I assume that this is the call to queue_insert from queue_check_insert_parent? What's the backtrace? (And what patch are you using, to get mutexes?) From debbugs-submit-bounces@debbugs.gnu.org Mon Dec 06 03:19:37 2010 Received: (at 7489) by debbugs.gnu.org; 6 Dec 2010 08:19:37 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PPWIK-0007O8-RP for submit@debbugs.gnu.org; Mon, 06 Dec 2010 03:19:37 -0500 Received: from mail-ww0-f46.google.com ([74.125.82.46]) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PPWIJ-0007Nv-8B for 7489@debbugs.gnu.org; Mon, 06 Dec 2010 03:19:36 -0500 Received: by wwj40 with SMTP id 40so5294510wwj.15 for <7489@debbugs.gnu.org>; Mon, 06 Dec 2010 00:25:26 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:received:in-reply-to :references:date:message-id:subject:from:to:cc:content-type; bh=YsM8zRrd7fHUx+q97ciYss0+bXauzuvKqXrcbzZYD4Q=; b=Df6RloyPRqjBmhMoTRBr64zGbSoeJeml34roSArxedhX7jV+Kr5EYbdd0lsIucr/28 vED8KlhBbiErcMLG7V7ZUbq2FHAxyhPS0/SotbrlC1jlysp1XwqP6nFNEFt3HNvvNT+j jrPXf1JVXA3Ju8FSbckzV4mG0NyaMYKT9Qgvs= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; b=cT55sv23BBscVTnFq16Zk4Tkgyg2+BLjEl2VwZ02oMLrcoKPIHq1WcwHq65YUHtkJh BqoThdSSpuo1s3vksKjj2IJr93ZLQJJrV3CFtLpZlY6neDr1nwTXT/yJLo306tWvMcMf 3kwL1iTe67LUpeGqtFKLbhOYjpq4cyoWi+xi8= MIME-Version: 1.0 Received: by 10.216.11.203 with SMTP id 53mr5427903wex.15.1291623926376; Mon, 06 Dec 2010 00:25:26 -0800 (PST) Received: by 10.216.240.196 with HTTP; Mon, 6 Dec 2010 00:25:26 -0800 (PST) In-Reply-To: <4CFC8A47.9000006@cs.ucla.edu> References: <4CEFF613.8060200@linuxfromscratch.org> <4CF07288.60902@draigBrady.com> <4CF0C15C.9080201@cs.ucla.edu> <4CF0D1BE.6070908@cs.ucla.edu> <4CF1A911.1000000@cs.ucla.edu> <4CF1BBF5.9080103@linuxfromscratch.org> <4CF352CD.7010206@linuxfromscratch.org> <4CF3FC08.3020705@cs.ucla.edu> <87r5dzc0m6.fsf@meyering.net> <4CF95CE2.8090602@cs.ucla.edu> <4CFC8A47.9000006@cs.ucla.edu> Date: Mon, 6 Dec 2010 00:25:26 -0800 Message-ID: Subject: Re: bug#7489: [coreutils] over aggressive threads in sort From: Chen Guo To: Paul Eggert Content-Type: multipart/mixed; boundary=0016364d2ebf50aa0a0496b9a1f0 X-Spam-Score: -3.6 (---) X-Debbugs-Envelope-To: 7489 Cc: coreutils@gnu.org, Jim Meyering , 7489@debbugs.gnu.org, DJ Lucas X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -3.6 (---) --0016364d2ebf50aa0a0496b9a1f0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Hi Professor Eggert, On Sun, Dec 5, 2010 at 11:01 PM, Paul Eggert wrote: > On 12/05/2010 09:16 PM, Chen Guo wrote: >> Before saying anything else, I should note that for mutexes, on 4 >> threads 20% of the time there's a segfault on a seemingly innocuous >> line in queue_insert (): >> =A0 node->queued =3D true > > It does sound like mutexes are the way to go, and that this bug > needs to be fixed. =A0I assume that this is the call to queue_insert > from queue_check_insert_parent? =A0What's the backtrace? =A0(And > what patch are you using, to get mutexes?) > I've attached the patch (inlined at the bottom). Here's the GDB crash, with backtrace. I also printed node->queued in GDB, so it's evident that its accessible. (gdb) run --parallel 2 rec_1M > /dev/null Starting program: /data/chen/Coding/Coreutils/test/sort-mutex --parallel 2 rec_1M > /dev/null [Thread debugging using libthread_db enabled] [New Thread 0x7fffcbb95710 (LWP 19850)] Program received signal SIGSEGV, Segmentation fault. queue_insert (queue=3D0x7fffffffe0c0, node=3D0x7ffff7ddc560) at sort.c:3202 3202 node->queued =3D true; (gdb) bt #0 queue_insert (queue=3D0x7fffffffe0c0, node=3D0x7ffff7ddc560) at sort.c:= 3202 #1 0x0000000000405844 in queue_check_insert_parent (lines=3D, dest=3D, nthreads=3D, total_lines=3D, parent=3D, lo_child=3D, queue=3D0x7fffffffe0c0, tfp=3D0x7ffff7bb9780, temp_output=3D0x0) at sort.c:3340 #2 merge_loop (lines=3D, dest=3D, nthreads=3D, total_lines=3D, parent=3D, lo_child=3D, queue=3D0x7fffffffe0c0, tfp=3D0x7ffff7bb9780, temp_output=3D0x0) at sort.c:3374 #3 sortlines (lines=3D, dest=3D, nthreads=3D, total_lines=3D, parent=3D, lo_child=3D, queue=3D0x7fffffffe0c0, tfp=3D0x7ffff7bb9780, temp_output=3D0x0) at sort.c:3515 #4 0x0000000000405c2b in sortlines (lines=3D0x7ffff7344030, dest=3D, nthreads=3D, total_lines=3D, parent=3D, lo_child=3D, queue=3D0x7fffffffe0c0, tfp=3D0x7ffff7bb9780, temp_output=3D0x0) at sor= t.c:3494 #5 0x00000000004095e8 in sort (argc=3D, argv=3D) at sort.c:3804 #6 main (argc=3D, argv=3D) at so= rt.c:4576 (gdb) print node $1 =3D (struct merge_node *) 0x7ffff7ddc560 (gdb) print node->queued $2 =3D false And here's the patch: >From 5a1d0b8f1a4c6d7cd99da7376f8c03f8a4cc2be9 Mon Sep 17 00:00:00 2001 From: Chen Guo Date: Mon, 6 Dec 2010 00:15:42 -0800 Subject: [PATCH] sort: change spinlocks to mutexes. * src/sort.c: (struct merge_node) Change member lock to mutex. All uses changed. --- src/sort.c | 14 +++++++------- 1 files changed, 7 insertions(+), 7 deletions(-) diff --git a/src/sort.c b/src/sort.c index a4c2cbf..9cfc814 100644 --- a/src/sort.c +++ b/src/sort.c @@ -241,7 +241,7 @@ struct merge_node struct merge_node *parent; /* Parent node. */ unsigned int level; /* Level in merge tree. */ bool queued; /* Node is already in heap. */ - pthread_spinlock_t lock; /* Lock for node operations. */ + pthread_mutex_t lock; /* Lock for node operations. */ }; /* Priority queue of merge nodes. */ @@ -3157,7 +3157,7 @@ compare_nodes (void const *a, void const *b) static inline void lock_node (struct merge_node *node) { - pthread_spin_lock (&node->lock); + pthread_mutex_lock (&node->lock); } /* Unlock a merge tree NODE. */ @@ -3165,7 +3165,7 @@ lock_node (struct merge_node *node) static inline void unlock_node (struct merge_node *node) { - pthread_spin_unlock (&node->lock); + pthread_mutex_unlock (&node->lock); } /* Destroy merge QUEUE. */ @@ -3479,7 +3479,7 @@ sortlines (struct line *restrict lines, struct line *restrict dest, node.parent =3D parent; node.level =3D parent->level + 1; node.queued =3D false; - pthread_spin_init (&node.lock, PTHREAD_PROCESS_PRIVATE); + pthread_mutex_init (&node.lock, NULL); /* Calculate thread arguments. */ unsigned long int lo_threads =3D nthreads / 2; @@ -3515,7 +3515,7 @@ sortlines (struct line *restrict lines, struct line *restrict dest, merge_loop (queue, total_lines, tfp, temp_output); } - pthread_spin_destroy (&node.lock); + pthread_mutex_destroy (&node.lock); } /* Scan through FILES[NTEMPS .. NFILES-1] looking for a file that is @@ -3799,12 +3799,12 @@ sort (char * const *files, size_t nfiles, char const *output_file, node.parent =3D NULL; node.level =3D MERGE_END; node.queued =3D false; - pthread_spin_init (&node.lock, PTHREAD_PROCESS_PRIVATE); + pthread_mutex_init (&node.lock, NULL); sortlines (line, line, nthreads, buf.nlines, &node, true, &queue, tfp, temp_output); queue_destroy (&queue); - pthread_spin_destroy (&node.lock); + pthread_mutex_destroy (&node.lock); } else write_unique (line - 1, tfp, temp_output); --=20 1.7.1 --0016364d2ebf50aa0a0496b9a1f0 Content-Type: text/x-patch; charset=US-ASCII; name="mutex.patch" Content-Disposition: attachment; filename="mutex.patch" Content-Transfer-Encoding: base64 X-Attachment-Id: f_ghd3fofz0 RnJvbSA1YTFkMGI4ZjFhNGM2ZDdjZDk5ZGE3Mzc2ZjhjMDNmOGE0Y2MyYmU5IE1vbiBTZXAgMTcg MDA6MDA6MDAgMjAwMQpGcm9tOiBDaGVuIEd1byA8Y2hlbmd1bzRAdWNsYS5lZHU+CkRhdGU6IE1v biwgNiBEZWMgMjAxMCAwMDoxNTo0MiAtMDgwMApTdWJqZWN0OiBbUEFUQ0hdIHNvcnQ6IGNoYW5n ZSBzcGlubG9ja3MgdG8gbXV0ZXhlcy4KCiogc3JjL3NvcnQuYzogKHN0cnVjdCBtZXJnZV9ub2Rl KSBDaGFuZ2UgbWVtYmVyIGxvY2sgdG8gbXV0ZXguIEFsbAp1c2VzIGNoYW5nZWQuCi0tLQogc3Jj L3NvcnQuYyB8ICAgMTQgKysrKysrKy0tLS0tLS0KIDEgZmlsZXMgY2hhbmdlZCwgNyBpbnNlcnRp b25zKCspLCA3IGRlbGV0aW9ucygtKQoKZGlmZiAtLWdpdCBhL3NyYy9zb3J0LmMgYi9zcmMvc29y dC5jCmluZGV4IGE0YzJjYmYuLjljZmM4MTQgMTAwNjQ0Ci0tLSBhL3NyYy9zb3J0LmMKKysrIGIv c3JjL3NvcnQuYwpAQCAtMjQxLDcgKzI0MSw3IEBAIHN0cnVjdCBtZXJnZV9ub2RlCiAgIHN0cnVj dCBtZXJnZV9ub2RlICpwYXJlbnQ7ICAgIC8qIFBhcmVudCBub2RlLiAqLwogICB1bnNpZ25lZCBp bnQgbGV2ZWw7ICAgICAgICAgICAvKiBMZXZlbCBpbiBtZXJnZSB0cmVlLiAqLwogICBib29sIHF1 ZXVlZDsgICAgICAgICAgICAgICAgICAvKiBOb2RlIGlzIGFscmVhZHkgaW4gaGVhcC4gKi8KLSAg cHRocmVhZF9zcGlubG9ja190IGxvY2s7ICAgICAgLyogTG9jayBmb3Igbm9kZSBvcGVyYXRpb25z LiAqLworICBwdGhyZWFkX211dGV4X3QgbG9jazsgICAgICAgICAvKiBMb2NrIGZvciBub2RlIG9w ZXJhdGlvbnMuICovCiB9OwogCiAvKiBQcmlvcml0eSBxdWV1ZSBvZiBtZXJnZSBub2Rlcy4gKi8K QEAgLTMxNTcsNyArMzE1Nyw3IEBAIGNvbXBhcmVfbm9kZXMgKHZvaWQgY29uc3QgKmEsIHZvaWQg Y29uc3QgKmIpCiBzdGF0aWMgaW5saW5lIHZvaWQKIGxvY2tfbm9kZSAoc3RydWN0IG1lcmdlX25v ZGUgKm5vZGUpCiB7Ci0gIHB0aHJlYWRfc3Bpbl9sb2NrICgmbm9kZS0+bG9jayk7CisgIHB0aHJl YWRfbXV0ZXhfbG9jayAoJm5vZGUtPmxvY2spOwogfQogCiAvKiBVbmxvY2sgYSBtZXJnZSB0cmVl IE5PREUuICovCkBAIC0zMTY1LDcgKzMxNjUsNyBAQCBsb2NrX25vZGUgKHN0cnVjdCBtZXJnZV9u b2RlICpub2RlKQogc3RhdGljIGlubGluZSB2b2lkCiB1bmxvY2tfbm9kZSAoc3RydWN0IG1lcmdl X25vZGUgKm5vZGUpCiB7Ci0gIHB0aHJlYWRfc3Bpbl91bmxvY2sgKCZub2RlLT5sb2NrKTsKKyAg cHRocmVhZF9tdXRleF91bmxvY2sgKCZub2RlLT5sb2NrKTsKIH0KIAogLyogRGVzdHJveSBtZXJn ZSBRVUVVRS4gKi8KQEAgLTM0NzksNyArMzQ3OSw3IEBAIHNvcnRsaW5lcyAoc3RydWN0IGxpbmUg KnJlc3RyaWN0IGxpbmVzLCBzdHJ1Y3QgbGluZSAqcmVzdHJpY3QgZGVzdCwKICAgbm9kZS5wYXJl bnQgPSBwYXJlbnQ7CiAgIG5vZGUubGV2ZWwgPSBwYXJlbnQtPmxldmVsICsgMTsKICAgbm9kZS5x dWV1ZWQgPSBmYWxzZTsKLSAgcHRocmVhZF9zcGluX2luaXQgKCZub2RlLmxvY2ssIFBUSFJFQURf UFJPQ0VTU19QUklWQVRFKTsKKyAgcHRocmVhZF9tdXRleF9pbml0ICgmbm9kZS5sb2NrLCBOVUxM KTsKIAogICAvKiBDYWxjdWxhdGUgdGhyZWFkIGFyZ3VtZW50cy4gKi8KICAgdW5zaWduZWQgbG9u ZyBpbnQgbG9fdGhyZWFkcyA9IG50aHJlYWRzIC8gMjsKQEAgLTM1MTUsNyArMzUxNSw3IEBAIHNv cnRsaW5lcyAoc3RydWN0IGxpbmUgKnJlc3RyaWN0IGxpbmVzLCBzdHJ1Y3QgbGluZSAqcmVzdHJp Y3QgZGVzdCwKICAgICAgIG1lcmdlX2xvb3AgKHF1ZXVlLCB0b3RhbF9saW5lcywgdGZwLCB0ZW1w X291dHB1dCk7CiAgICAgfQogCi0gIHB0aHJlYWRfc3Bpbl9kZXN0cm95ICgmbm9kZS5sb2NrKTsK KyAgcHRocmVhZF9tdXRleF9kZXN0cm95ICgmbm9kZS5sb2NrKTsKIH0KIAogLyogU2NhbiB0aHJv dWdoIEZJTEVTW05URU1QUyAuLiBORklMRVMtMV0gbG9va2luZyBmb3IgYSBmaWxlIHRoYXQgaXMK QEAgLTM3OTksMTIgKzM3OTksMTIgQEAgc29ydCAoY2hhciAqIGNvbnN0ICpmaWxlcywgc2l6ZV90 IG5maWxlcywgY2hhciBjb25zdCAqb3V0cHV0X2ZpbGUsCiAgICAgICAgICAgICAgIG5vZGUucGFy ZW50ID0gTlVMTDsKICAgICAgICAgICAgICAgbm9kZS5sZXZlbCA9IE1FUkdFX0VORDsKICAgICAg ICAgICAgICAgbm9kZS5xdWV1ZWQgPSBmYWxzZTsKLSAgICAgICAgICAgICAgcHRocmVhZF9zcGlu X2luaXQgKCZub2RlLmxvY2ssIFBUSFJFQURfUFJPQ0VTU19QUklWQVRFKTsKKyAgICAgICAgICAg ICAgcHRocmVhZF9tdXRleF9pbml0ICgmbm9kZS5sb2NrLCBOVUxMKTsKIAogICAgICAgICAgICAg ICBzb3J0bGluZXMgKGxpbmUsIGxpbmUsIG50aHJlYWRzLCBidWYubmxpbmVzLCAmbm9kZSwgdHJ1 ZSwKICAgICAgICAgICAgICAgICAgICAgICAgICAmcXVldWUsIHRmcCwgdGVtcF9vdXRwdXQpOwog ICAgICAgICAgICAgICBxdWV1ZV9kZXN0cm95ICgmcXVldWUpOwotICAgICAgICAgICAgICBwdGhy ZWFkX3NwaW5fZGVzdHJveSAoJm5vZGUubG9jayk7CisgICAgICAgICAgICAgIHB0aHJlYWRfbXV0 ZXhfZGVzdHJveSAoJm5vZGUubG9jayk7CiAgICAgICAgICAgICB9CiAgICAgICAgICAgZWxzZQog ICAgICAgICAgICAgd3JpdGVfdW5pcXVlIChsaW5lIC0gMSwgdGZwLCB0ZW1wX291dHB1dCk7Ci0t IAoxLjcuMQoK --0016364d2ebf50aa0a0496b9a1f0-- From debbugs-submit-bounces@debbugs.gnu.org Mon Dec 06 21:17:50 2010 Received: (at 7489) by debbugs.gnu.org; 7 Dec 2010 02:17:50 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PPn7l-0005q5-Oe for submit@debbugs.gnu.org; Mon, 06 Dec 2010 21:17:50 -0500 Received: from smtp.cs.ucla.edu ([131.179.128.62]) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PPn7i-0005pr-Lt for 7489@debbugs.gnu.org; Mon, 06 Dec 2010 21:17:47 -0500 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id 6BBE539E80DF; Mon, 6 Dec 2010 18:23:39 -0800 (PST) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id KKpzV8IHUBD7; Mon, 6 Dec 2010 18:23:38 -0800 (PST) Received: from [131.179.64.200] (Penguin.CS.UCLA.EDU [131.179.64.200]) by smtp.cs.ucla.edu (Postfix) with ESMTPSA id D4F7E39E80DC; Mon, 6 Dec 2010 18:23:38 -0800 (PST) Message-ID: <4CFD9AAA.9080302@cs.ucla.edu> Date: Mon, 06 Dec 2010 18:23:38 -0800 From: Paul Eggert Organization: UCLA Computer Science Department User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.15) Gecko/20101027 Thunderbird/3.0.10 MIME-Version: 1.0 To: Jim Meyering Subject: Re: bug#7489: [coreutils] over aggressive threads in sort References: <4CEFF613.8060200@linuxfromscratch.org> <4CF07288.60902@draigBrady.com> <4CF0C15C.9080201@cs.ucla.edu> <4CF0D1BE.6070908@cs.ucla.edu> <4CF1A911.1000000@cs.ucla.edu> <87pqtnubuc.fsf@meyering.net> <4CF42D4B.7020904@cs.ucla.edu> <4CF9FF26.7010204@cs.ucla.edu> <87sjyc5uua.fsf@meyering.net> In-Reply-To: <87sjyc5uua.fsf@meyering.net> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Spam-Score: -3.3 (---) X-Debbugs-Envelope-To: 7489 Cc: Chen Guo , =?ISO-8859-1?Q?P=E1draig_Brady?= , DJ Lucas , 7489@debbugs.gnu.org, coreutils@gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -3.3 (---) On 12/05/10 03:21, Jim Meyering wrote: > seq -w 200000 > exp && tac exp > in > PATH=.:$PATH ./sort --compress-program=dzip -S 1k in > out > > That gets stuck in waitpid (from sort.c's reap), waiting for a > dzip invocation that appears will never terminate. This is also > on that same 4-core system, and is relatively easy to reproduce, > so it should be easy to identify the offending change, but I'm > out of time, now. > I've verified this bug. It occurs regardless of whether the December 3 changes are applied, so it looks like it was an earlier change. I've also verified that it occurs with --parallel=1 so it seems unlikely it would be thread-related. What happens is that 'sort' creates a pipe to a compressor but then never closes the pipe. When it waits for the compressor to finish there is an obvious deadlock. I'll see if I can track it down, but my guess it's somewhere in the compressor/decompressor code. It's doing some lazy evaluation and I wouldn't be surprised if that goes haywire in some cases. I'd guess there's also a bug in the thread code too, having to do with your core dump (and Chen's; could be two bugs). From debbugs-submit-bounces@debbugs.gnu.org Tue Dec 07 05:01:03 2010 Received: (at 7489) by debbugs.gnu.org; 7 Dec 2010 10:01:03 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PPuM2-0007ZG-L5 for submit@debbugs.gnu.org; Tue, 07 Dec 2010 05:01:03 -0500 Received: from mx.meyering.net ([82.230.74.64]) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PPuLz-0007Ym-Rb for 7489@debbugs.gnu.org; Tue, 07 Dec 2010 05:01:01 -0500 Received: by rho.meyering.net (Acme Bit-Twister, from userid 1000) id EBAED60190; Tue, 7 Dec 2010 11:06:53 +0100 (CET) From: Jim Meyering To: Chen Guo Subject: Re: bug#7489: [coreutils] over aggressive threads in sort In-Reply-To: (Chen Guo's message of "Mon, 6 Dec 2010 00:25:26 -0800") References: <4CEFF613.8060200@linuxfromscratch.org> <4CF07288.60902@draigBrady.com> <4CF0C15C.9080201@cs.ucla.edu> <4CF0D1BE.6070908@cs.ucla.edu> <4CF1A911.1000000@cs.ucla.edu> <4CF1BBF5.9080103@linuxfromscratch.org> <4CF352CD.7010206@linuxfromscratch.org> <4CF3FC08.3020705@cs.ucla.edu> <87r5dzc0m6.fsf@meyering.net> <4CF95CE2.8090602@cs.ucla.edu> <4CFC8A47.9000006@cs.ucla.edu> Date: Tue, 07 Dec 2010 11:06:53 +0100 Message-ID: <87ipz5ucaq.fsf@meyering.net> Lines: 160 MIME-Version: 1.0 Content-Type: text/plain X-Spam-Score: -5.6 (-----) X-Debbugs-Envelope-To: 7489 Cc: Paul Eggert , DJ Lucas , 7489@debbugs.gnu.org, coreutils@gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -5.7 (-----) Chen Guo wrote: ... > I've attached the patch (inlined at the bottom). Here's the GDB > crash, with backtrace. I also printed node->queued in GDB, so it's > evident that its accessible. > > (gdb) run --parallel 2 rec_1M > /dev/null > Starting program: /data/chen/Coding/Coreutils/test/sort-mutex > --parallel 2 rec_1M > /dev/null > [Thread debugging using libthread_db enabled] > [New Thread 0x7fffcbb95710 (LWP 19850)] > > Program received signal SIGSEGV, Segmentation fault. > queue_insert (queue=0x7fffffffe0c0, node=0x7ffff7ddc560) at sort.c:3202 > 3202 node->queued = true; > (gdb) bt ... > (gdb) print node > $1 = (struct merge_node *) 0x7ffff7ddc560 > (gdb) print node->queued > $2 = false Could it be that you're looking at one thread, while the segfault happened in another? In my core dump, the offending "node" pointer had a value of 5. ... > And here's the patch: > > Subject: [PATCH] sort: change spinlocks to mutexes. > > * src/sort.c: (struct merge_node) Change member lock to mutex. All > uses changed. Hi Chen, Thanks for the patch. Since this fixes a bug, I've added a NEWS entry. Also, since with your patch, the sort-spinlock-abuse test now passes, I've adjusted tests/Makefile.am to reflect that. The segfault may be easier to reproduce with mutexes, but I've seen the same one also with spinlocks (albeit rarely). Here's the amended patch: >From 7a80bc63e1411f0a2ed7c4259e852de34591a65a Mon Sep 17 00:00:00 2001 From: Chen Guo Date: Mon, 6 Dec 2010 00:15:42 -0800 Subject: [PATCH] sort: use mutexes, not spinlocks (avoid busy loop on blocked output) Running a command like this on a multi-core system sort < big-file | less would peg all processors at near 100% utilization. * src/sort.c: (struct merge_node) Change member lock to mutex. All uses changed. * tests/Makefile.am (XFAIL_TESTS): Remove definition, now that this test passes once again. I.e., the sort-spinlock-abuse test no longer fails. * NEWS (Bug reports): Mention this. Reported by DJ Lucas in http://debbugs.gnu.org/7489. --- NEWS | 5 +++++ src/sort.c | 14 +++++++------- tests/Makefile.am | 3 --- 3 files changed, 12 insertions(+), 10 deletions(-) diff --git a/NEWS b/NEWS index c3110a3..9f55cbb 100644 --- a/NEWS +++ b/NEWS @@ -13,6 +13,11 @@ GNU coreutils NEWS -*- outline -*- sort -u with at least two threads could attempt to read through a corrupted pointer. [bug introduced in coreutils-8.6] + sort with at least two threads and with blocked output would busy-loop + (spinlock) all threads, often using 100% of available CPU cycles to + do no work. I.e., "sort < big-file | less" could waste a lot of power. + [bug introduced in coreutils-8.6] + ** New features split accepts the --number option to generate a specific number of files. diff --git a/src/sort.c b/src/sort.c index a4c2cbf..9cfc814 100644 --- a/src/sort.c +++ b/src/sort.c @@ -241,7 +241,7 @@ struct merge_node struct merge_node *parent; /* Parent node. */ unsigned int level; /* Level in merge tree. */ bool queued; /* Node is already in heap. */ - pthread_spinlock_t lock; /* Lock for node operations. */ + pthread_mutex_t lock; /* Lock for node operations. */ }; /* Priority queue of merge nodes. */ @@ -3157,7 +3157,7 @@ compare_nodes (void const *a, void const *b) static inline void lock_node (struct merge_node *node) { - pthread_spin_lock (&node->lock); + pthread_mutex_lock (&node->lock); } /* Unlock a merge tree NODE. */ @@ -3165,7 +3165,7 @@ lock_node (struct merge_node *node) static inline void unlock_node (struct merge_node *node) { - pthread_spin_unlock (&node->lock); + pthread_mutex_unlock (&node->lock); } /* Destroy merge QUEUE. */ @@ -3479,7 +3479,7 @@ sortlines (struct line *restrict lines, struct line *restrict dest, node.parent = parent; node.level = parent->level + 1; node.queued = false; - pthread_spin_init (&node.lock, PTHREAD_PROCESS_PRIVATE); + pthread_mutex_init (&node.lock, NULL); /* Calculate thread arguments. */ unsigned long int lo_threads = nthreads / 2; @@ -3515,7 +3515,7 @@ sortlines (struct line *restrict lines, struct line *restrict dest, merge_loop (queue, total_lines, tfp, temp_output); } - pthread_spin_destroy (&node.lock); + pthread_mutex_destroy (&node.lock); } /* Scan through FILES[NTEMPS .. NFILES-1] looking for a file that is @@ -3799,12 +3799,12 @@ sort (char * const *files, size_t nfiles, char const *output_file, node.parent = NULL; node.level = MERGE_END; node.queued = false; - pthread_spin_init (&node.lock, PTHREAD_PROCESS_PRIVATE); + pthread_mutex_init (&node.lock, NULL); sortlines (line, line, nthreads, buf.nlines, &node, true, &queue, tfp, temp_output); queue_destroy (&queue); - pthread_spin_destroy (&node.lock); + pthread_mutex_destroy (&node.lock); } else write_unique (line - 1, tfp, temp_output); diff --git a/tests/Makefile.am b/tests/Makefile.am index d52f677..b573061 100644 --- a/tests/Makefile.am +++ b/tests/Makefile.am @@ -656,7 +656,4 @@ pr_data = \ pr/ttb3-FF \ pr/w72l24f-ll -XFAIL_TESTS = \ - misc/sort-spinlock-abuse - include $(srcdir)/check.mk -- 1.7.3.2.92.g7e4eb From debbugs-submit-bounces@debbugs.gnu.org Tue Dec 07 06:19:06 2010 Received: (at 7489) by debbugs.gnu.org; 7 Dec 2010 11:19:06 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PPvZa-0000t9-2Q for submit@debbugs.gnu.org; Tue, 07 Dec 2010 06:19:06 -0500 Received: from mx.meyering.net ([82.230.74.64]) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PPvZX-0000sg-Dr for 7489@debbugs.gnu.org; Tue, 07 Dec 2010 06:19:04 -0500 Received: by rho.meyering.net (Acme Bit-Twister, from userid 1000) id 96DF860190; Tue, 7 Dec 2010 12:24:57 +0100 (CET) From: Jim Meyering To: Chen Guo Subject: Re: bug#7489: [coreutils] over aggressive threads in sort In-Reply-To: (Chen Guo's message of "Mon, 6 Dec 2010 00:25:26 -0800") References: <4CEFF613.8060200@linuxfromscratch.org> <4CF07288.60902@draigBrady.com> <4CF0C15C.9080201@cs.ucla.edu> <4CF0D1BE.6070908@cs.ucla.edu> <4CF1A911.1000000@cs.ucla.edu> <4CF1BBF5.9080103@linuxfromscratch.org> <4CF352CD.7010206@linuxfromscratch.org> <4CF3FC08.3020705@cs.ucla.edu> <87r5dzc0m6.fsf@meyering.net> <4CF95CE2.8090602@cs.ucla.edu> <4CFC8A47.9000006@cs.ucla.edu> Date: Tue, 07 Dec 2010 12:24:57 +0100 Message-ID: <877hflu8om.fsf@meyering.net> Lines: 33 MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spam-Score: -5.7 (-----) X-Debbugs-Envelope-To: 7489 Cc: Paul Eggert , DJ Lucas , 7489@debbugs.gnu.org, coreutils@gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -5.7 (-----) Chen Guo wrote: > Hi Professor Eggert, > On Sun, Dec 5, 2010 at 11:01 PM, Paul Eggert wrote: >> On 12/05/2010 09:16 PM, Chen Guo wrote: >>> Before saying anything else, I should note that for mutexes, on 4 >>> threads 20% of the time there's a segfault on a seemingly innocuous >>> line in queue_insert (): >>> =C2=A0 node->queued =3D true >> >> It does sound like mutexes are the way to go, and that this bug >> needs to be fixed. =C2=A0I assume that this is the call to queue_insert >> from queue_check_insert_parent? =C2=A0What's the backtrace? =C2=A0(And >> what patch are you using, to get mutexes?) >> > > I've attached the patch (inlined at the bottom). Here's the GDB > crash, with backtrace. I also printed node->queued in GDB, so it's > evident that its accessible. > > (gdb) run --parallel 2 rec_1M > /dev/null > Starting program: /data/chen/Coding/Coreutils/test/sort-mutex > --parallel 2 rec_1M > /dev/null Hi Chen, Thanks. What does your input file look like? I've been unable to reproduce the failure using the output of seq 1000000. I've tried a few different -S ... options, in case the amount of available memory is a factor: seq 1000000 > in-1M for i in $(seq 1000); do \ printf '%03d\r' $i; src/sort --parallel=3D2 -S 1M in-1M > /dev/null; do= ne From debbugs-submit-bounces@debbugs.gnu.org Tue Dec 07 15:56:16 2010 Received: (at 7489) by debbugs.gnu.org; 7 Dec 2010 20:56:16 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PQ4a8-0000dC-CI for submit@debbugs.gnu.org; Tue, 07 Dec 2010 15:56:16 -0500 Received: from mail-wy0-f172.google.com ([74.125.82.172]) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PQ4a5-0000cx-J2 for 7489@debbugs.gnu.org; Tue, 07 Dec 2010 15:56:14 -0500 Received: by wyf23 with SMTP id 23so377191wyf.3 for <7489@debbugs.gnu.org>; Tue, 07 Dec 2010 13:02:08 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:received:in-reply-to :references:date:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=JTySsq9PA89eNetMuvsYamOob3n3MoSP1yAWEsH6wto=; b=hXgudchkRX5WmR3O7wZL7olz0cyFmCaBYmsisSVijVrLWHcsW4ecOoehBdCRz8lL+H Z+faiRjCzFMehG2nclppGhdRyTccWDN1xlR0cCee+hOj8N9WeU0/2y6yvrC8V3mPO0Ot f82W9eD/AeSgZx8PvmQbfjeZ8qBwrSWj47KGo= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; b=a6ptZNlDhsfU5in6krfEt8ir6SEWzWM9tjyGMI7mrnngfLEoyQOFWuDQE2ULLRNUQn 9OAUkausbeuJM+Q1kICdlz6t84FjIbso1NV3c8BKYX/A3lONjqmYOaBHMP3EAUaoi+pV UjG7eC43l19Y5nlIIX7g1yTbfz/N46WTXcftc= MIME-Version: 1.0 Received: by 10.216.71.209 with SMTP id r59mr1163102wed.15.1291755716878; Tue, 07 Dec 2010 13:01:56 -0800 (PST) Received: by 10.216.240.196 with HTTP; Tue, 7 Dec 2010 13:01:56 -0800 (PST) In-Reply-To: <877hflu8om.fsf@meyering.net> References: <4CEFF613.8060200@linuxfromscratch.org> <4CF07288.60902@draigBrady.com> <4CF0C15C.9080201@cs.ucla.edu> <4CF0D1BE.6070908@cs.ucla.edu> <4CF1A911.1000000@cs.ucla.edu> <4CF1BBF5.9080103@linuxfromscratch.org> <4CF352CD.7010206@linuxfromscratch.org> <4CF3FC08.3020705@cs.ucla.edu> <87r5dzc0m6.fsf@meyering.net> <4CF95CE2.8090602@cs.ucla.edu> <4CFC8A47.9000006@cs.ucla.edu> <877hflu8om.fsf@meyering.net> Date: Tue, 7 Dec 2010 13:01:56 -0800 Message-ID: Subject: Re: bug#7489: [coreutils] over aggressive threads in sort From: Chen Guo To: Jim Meyering Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable X-Spam-Score: -3.6 (---) X-Debbugs-Envelope-To: 7489 Cc: Paul Eggert , DJ Lucas , 7489@debbugs.gnu.org, coreutils@gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -3.6 (---) Hi Jim, On Tue, Dec 7, 2010 at 3:24 AM, Jim Meyering wrote: > Hi Chen, > > Thanks. =A0What does your input file look like? > I've been unable to reproduce the failure using the output of > seq 1000000. =A0I've tried a few different -S ... options, in case > the amount of available memory is a factor: > > =A0seq 1000000 > in-1M > =A0for i in $(seq 1000); do \ > =A0 =A0printf '%03d\r' $i; src/sort --parallel=3D2 -S 1M in-1M > /dev/nul= l; done > The input file I used was generated with gensort (http://www.ordinal.com/gensort.html) I used the -a 1000000 to generate a 1 million line ASCII file. Running sort 10 times on 2 or 4 threads almost always triggered at least 1 segfault. From debbugs-submit-bounces@debbugs.gnu.org Tue Oct 30 03:47:46 2018 Received: (at 7489) by debbugs.gnu.org; 30 Oct 2018 07:47:46 +0000 Received: from localhost ([127.0.0.1]:53181 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1gHOkb-0005Ra-OQ for submit@debbugs.gnu.org; Tue, 30 Oct 2018 03:47:45 -0400 Received: from mail-io1-f65.google.com ([209.85.166.65]:39333) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1gHOkZ-0005RN-A8 for 7489@debbugs.gnu.org; Tue, 30 Oct 2018 03:47:43 -0400 Received: by mail-io1-f65.google.com with SMTP id n11-v6so6668664iob.6 for <7489@debbugs.gnu.org>; Tue, 30 Oct 2018 00:47:43 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=subject:to:cc:references:from:message-id:date:user-agent :mime-version:in-reply-to:content-language:content-transfer-encoding; bh=arQhnWo6+vuZKqqRhYgeNDoImuCOsXuy2CDdvtQ/fAk=; b=McbCz6lWs7jibF6X0yQwXjly5bgfQ+Lct6LxAfIfHmnT7C5ub7929PgjvYeCs9RViy 6eE3+GJ91sZG+YBDCEoXRPNej0je3prA0OUw5nhUAZbPOcQt96wb1WoavDooZQigpI5D 0dY/FE8zmRHMvsNUHlz17ZKOBBmn+0I88XmFU+fkZdvwpPy38StRtSssiQ0ItxLoqvzL 30cPAZz6wdyX6SdM3midH8XgWBpF7E3MWdFl+MmM09kQxLtH4eAWorp+NKiieAloZ9RM KDQ+jzJMnZjdoZZBfxaZGAJfsIG4ETB1DO0Y847DcG55R0kkt7HI0O+7kgiwINHFNa10 ydNw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:cc:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-language :content-transfer-encoding; bh=arQhnWo6+vuZKqqRhYgeNDoImuCOsXuy2CDdvtQ/fAk=; b=VzH0ulYLuZrwci2YtjRtD0OWF1Ec95d6E7bASB8yZqSjnZpm7YJkioTT+7ELdON6nr OGNP6iuqHVniRUWd3dxgl/aOi7/bf2oJS+yTdXHiMfcivfpRQFRo9+MjJEymDrOzDn/q vEZYFNMOQZlnH6SAow1WeTPmiWdLXz8ooFaV6KCboz+xhEoMqsuZd/K5KuVknJcwEcN1 p3ldycAaV8ECErtf+O48Juh8vJm3oO2W0iUQvUyizkNTVBNTq/MKcA+p30Wq9RuXia8E fkM17CvpdFIRa0gT6P6sWPE4RSda1Et8uhIeKhd77q6NK1OjJXvEsz41U5qlKgas1LA1 NvDg== X-Gm-Message-State: AGRZ1gLSS0nYyCpOjw/CETXTxwHjanuNHzH8zl9NtU5fRqyCfutfXhYL n8HXHTdd2FivnQL/UY+3FxmmiQc23WU= X-Google-Smtp-Source: AJdET5elIJGEX5d24UhgIilDRDHTgAJBw6dXOwrZjCYoorL9qdDz+dDVPtNbXJYgjzEJiDgq6H+ugg== X-Received: by 2002:a6b:8cc2:: with SMTP id o185-v6mr10053714iod.202.1540885656660; Tue, 30 Oct 2018 00:47:36 -0700 (PDT) Received: from tomato.housegordon.com (moose.housegordon.com. [184.68.105.38]) by smtp.googlemail.com with ESMTPSA id x21-v6sm7773939ita.6.2018.10.30.00.47.27 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 30 Oct 2018 00:47:34 -0700 (PDT) Subject: Re: bug#7489: [coreutils] over aggressive threads in sort To: Chen Guo , Jim Meyering , Paul Eggert , DJ Lucas References: <4CEFF613.8060200@linuxfromscratch.org> <4CF07288.60902@draigBrady.com> <4CF0C15C.9080201@cs.ucla.edu> <4CF0D1BE.6070908@cs.ucla.edu> <4CF1A911.1000000@cs.ucla.edu> <4CF1BBF5.9080103@linuxfromscratch.org> <4CF352CD.7010206@linuxfromscratch.org> <4CF3FC08.3020705@cs.ucla.edu> <87r5dzc0m6.fsf@meyering.net> <4CF95CE2.8090602@cs.ucla.edu> <4CFC8A47.9000006@cs.ucla.edu> <877hflu8om.fsf@meyering.net> From: Assaf Gordon Message-ID: <625ed2be-b945-4808-92de-694587c7d8af@gmail.com> Date: Tue, 30 Oct 2018 01:47:25 -0600 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.2.1 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 8bit X-Spam-Score: -0.0 (/) X-Debbugs-Envelope-To: 7489 Cc: 7489@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-) (triaging old bugs) Hello, This long thread ( http://bugs.gnu.org/7489 ) deals with multiple parallel-sort bugs, resulting in many commits: ==== 1d0a12037 Paul Eggert 2010-12-22 sort: minor performance tweak with num_processors 41159f960 Pádraig Brady 2010-12-20 maint: fix a typo in sort --parallel help message 0e181024c Pádraig Brady 2010-12-18 sort: use at most 8 threads by default 8e81a99c2 Paul Eggert 2010-12-16 sort: do not generate thousands of subprocesses for 16-way merge 1b31ce698 Paul Eggert 2010-12-16 sort: fix hang with sort --compress f3c584d1e Paul Eggert 2010-12-16 sort: don't dump core when merging from input twice 14ad7a255 Paul Eggert 2010-12-14 sort: fix very-unlikely buffer overrun when merging to input file 8f40ed634 Paul Eggert 2010-12-14 sort: document --compress reaper fixes 0da4d8430 Paul Eggert 2010-12-13 sort: fix some --compress reaper bugs ad61335bf Jim Meyering 2010-12-11 sort: avoid segfault when using two or more threads 9a9d69e9e Jim Meyering 2010-12-11 sort: syntax cleanup 27e997d0e Paul Eggert 2010-12-11 sort: integer overflow checks in thread counts, etc. c9db0ac6d Chen Guo 2010-12-10 sort: preallocate merge tree nodes to heap. d1f700355 Paul Eggert 2010-12-10 sort: comment fix 621876ff4 Chen Guo 2010-12-06 sort: use mutexes, not spinlocks (avoid busy loop on blocked output) a1629ba1e Jim Meyering 2010-12-05 tests: remove useless definition of $SORT in sort-compress cd00fa6ee Paul Eggert 2010-12-03 sort: merge_queue -> queue fb282c7b3 Paul Eggert 2010-12-03 sort: clarify queue_check_insert fb41e82c7 Paul Eggert 2010-12-03 sort: fix problems with merge node dest pointer f2d977aff Paul Eggert 2010-12-03 sort: simplify write_unique 6f4279421 Paul Eggert 2010-12-03 sort: put queue arg first f35f4b339 Paul Eggert 2010-12-03 sort: tune struct_merge_node slightly eb989f4b7 Paul Eggert 2010-12-03 sort: Clarify comments 0ec869e8b Paul Eggert 2010-12-01 sort: fix bug on 64-bit hosts with at least 32768 processors === All the above were included in coreutils 8.8. I assume the sort bugs are fixed, despite no 'final' message in the thread. If there are no objections, I'll close this bug in couple of days. -assaf From debbugs-submit-bounces@debbugs.gnu.org Tue Nov 06 12:55:46 2018 Received: (at control) by debbugs.gnu.org; 6 Nov 2018 17:55:46 +0000 Received: from localhost ([127.0.0.1]:37664 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1gK5Zq-0002Yi-0F for submit@debbugs.gnu.org; Tue, 06 Nov 2018 12:55:46 -0500 Received: from mail-pg1-f174.google.com ([209.85.215.174]:37425) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1gK5Zo-0002YW-A1 for control@debbugs.gnu.org; Tue, 06 Nov 2018 12:55:44 -0500 Received: by mail-pg1-f174.google.com with SMTP id c10-v6so6145497pgq.4 for ; Tue, 06 Nov 2018 09:55:44 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=to:from:message-id:date:user-agent:mime-version:content-language :content-transfer-encoding; bh=l1h081PofiAxnwTT5r8TqfZaNHXXaS7AF+dQdPc8zDc=; b=CQJ/0QWJ0kdMFppukF4A6C5xTWFjOYYo0EyttRT5WBUUvxcDz15Z+PCzp6ueznXPhl AYT7Nnu0LQdA7uYvBVP9tsbLcpMqYVIEIAne2G/Q7EJuMEOIiMA6ZaTuBGTc2pFTmYEK hr3GuyIgWx+m7QenYbqAn2hASEfGrA5sSa0jN5MWk6BIPtOJMCMdPSxEMdgxQNvWlhw5 ySSR0NVBJVvOj1QyqOt9Ltkki1ta/uqMzpMdsESLElh/ZIVo/2DAtd+o2lVzztBwLops fSnJmmuz1e/ke/9EXHVVQao8B4TQfJYmMDS1cNboQWYVc35E9WgnQB+03qxX7cQJ6Vpk e6vA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:to:from:message-id:date:user-agent:mime-version :content-language:content-transfer-encoding; bh=l1h081PofiAxnwTT5r8TqfZaNHXXaS7AF+dQdPc8zDc=; b=mkrw1sM8podWdoyO1/RRFN2lhJ37mZTYQwWorqabCG7LVmePD31aeZwv087UnKZJbM LoTJ8P5pGJkBh8WCvOEM/LGvekfzNJCukVAk20fHSKNzqk9tpZeCqKsFWcNk1EHk5ylQ w3Vg0wp8tn+LzomuKs1nk76wXhll8+awSgBsY6NCSWo3YIKJPKMx3rcKBQ7o9jZidpPh DodlQUT5k6wpqEWC8bLlAhzFo8ZATzW6jc1/CQ0Y3eLtN2nkOgv3OOYB0jhwsR2p0CaW 2kHi3UV8DIQhTXVcNvPW2WwRVX1WRF5L/T3Y9vpZAbCxBli6Bzdv4IDIUAJqopmoTqtW LqXw== X-Gm-Message-State: AGRZ1gJNO8Rd50MUxkYYt5Qa/jyPRtJ6FuhwlUqEjWFsqSb99glyC+ZL NaaXjQ74Pzwb/5igpjUISQYuFC6g X-Google-Smtp-Source: AJdET5cVRJk1/ZLLm+OLBL/UUnxQky3mHc44EGq2f/z1bDpmp+BDkFAhA8lomMp9/1x1pAF4UZUeFg== X-Received: by 2002:a63:4926:: with SMTP id w38mr3819779pga.353.1541526937787; Tue, 06 Nov 2018 09:55:37 -0800 (PST) Received: from tomato.housegordon.com (moose.housegordon.com. [184.68.105.38]) by smtp.googlemail.com with ESMTPSA id p87-v6sm48284957pfk.186.2018.11.06.09.55.36 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 06 Nov 2018 09:55:36 -0800 (PST) To: control@debbugs.gnu.org From: Assaf Gordon Message-ID: <977c78f9-60a1-51c5-b745-61deab7bb0cf@gmail.com> Date: Tue, 6 Nov 2018 10:55:35 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.2.1 MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit X-Spam-Score: 2.0 (++) X-Spam-Report: Spam detection software, running on the system "debbugs.gnu.org", has NOT identified this incoming email as spam. The original message has been attached to this so you can view it or label similar future email. If you have any questions, see the administrator of that system for details. Content preview: tags 7489 fixed close 7489 [...] Content analysis details: (2.0 points, 10.0 required) pts rule name description ---- ---------------------- -------------------------------------------------- -0.0 RCVD_IN_DNSWL_NONE RBL: Sender listed at http://www.dnswl.org/, no trust [209.85.215.174 listed in list.dnswl.org] 0.0 FREEMAIL_FROM Sender email is commonly abused enduser mail provider (assafgordon[at]gmail.com) -0.0 SPF_PASS SPF: sender matches SPF record 0.0 RCVD_IN_MSPIKE_H3 RBL: Good reputation (+3) [209.85.215.174 listed in wl.mailspike.net] 0.0 RCVD_IN_MSPIKE_WL Mailspike good senders 1.8 MISSING_SUBJECT Missing Subject: header 0.2 NO_SUBJECT Extra score for no subject X-Debbugs-Envelope-To: control X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 1.0 (+) tags 7489 fixed close 7489 From unknown Mon Jun 23 18:27:31 2025 Received: (at fakecontrol) by fakecontrolmessage; To: internal_control@debbugs.gnu.org From: Debbugs Internal Request Subject: Internal Control Message-Id: bug archived. Date: Wed, 05 Dec 2018 12:24:06 +0000 User-Agent: Fakemail v42.6.9 # This is a fake control message. # # The action: # bug archived. thanks # This fakemail brought to you by your local debbugs # administrator