From unknown Sun Jun 15 08:45:08 2025
From: bug#9321 <9321@debbugs.gnu.org>
To: bug#9321 <9321@debbugs.gnu.org>
Subject: Status: repeated segfaults sorting large files in 8.12
Reply-To: bug#9321 <9321@debbugs.gnu.org>
Date: Sun, 15 Jun 2025 15:45:08 +0000

retitle 9321 repeated segfaults sorting large files in 8.12
reassign 9321 coreutils
submitter 9321 Andras Salamon
severity 9321 normal
tag 9321 notabug
thanks

From debbugs-submit-bounces@debbugs.gnu.org Thu Aug 18 12:10:14 2011
Date: Thu, 18 Aug 2011 15:30:05 +0100
From: Andras Salamon
To: bug-coreutils@gnu.org
Subject: repeated segfaults sorting large files in 8.12
Message-ID: <20110818143005.GA59624@gaon.net>
User-Agent: Mutt/1.5.21 (2010-09-15)
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.11
Precedence: list
Sender: debbugs-submit-bounces@debbugs.gnu.org
Errors-To: debbugs-submit-bounces@debbugs.gnu.org

I am seeing repeated (but not reliably repeatable) segmentation faults
sorting datasets in the 100MB-100GB range on a 64-bit Debian system
using GNU sort 8.12 (and also 8.9). Stack traces seem to indicate
problems during the merge phase, usually when the temporary files
are being combined.

This may or may not be related to the recent discussion about
#9307, but I am definitely using 8.12, rebuilt with CFLAGS=-g since
several indicative values were otherwise optimised out, configured
with --disable-nls --disable-threads, and am running with a fixed
buffer -S 100M and also --parallel=1 to try to isolate problems from
possible threading issues. I was seeing these crashes with a vanilla
build also.

At least one crash occurred when comparing the very last entry in
the memory buffer to a non-existent entry, when merging large files.

There was also a crash with total_lines=851122 in mergelines_node,
which leads to node->hi containing what appears to be garbage, with
length=2882303761517117516.

The repository changelog seems to indicate that the current development
release of sort has not changed since 8.12. Will attempting to track
the problem down with 8.12 be useful? If so I can post stack traces
and values of relevant variables from the core dump, or post a new
issue in the tracker, or reopen #9307. If not, please suggest some
specific actions I should take to generate useful information.

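As a side note on the garbage length reported above, the value is easier to read in hexadecimal. The following is only a minimal sketch: the helper name split_top_byte is hypothetical, and treating the top byte separately from the low 56 bits is an assumption made for illustration, not something the core dump itself establishes.

```python
# Hypothetical helper: split a 64-bit unsigned value into its top byte
# and its remaining low 56 bits, to make stray high bits easy to spot.
def split_top_byte(value: int) -> tuple[int, int]:
    """Return (top byte, low 56 bits) of a 64-bit unsigned value."""
    return value >> 56, value & ((1 << 56) - 1)


# Length reported for node->hi in mergelines_node.
reported_length = 2882303761517117516

top, low = split_top_byte(reported_length)
print(hex(reported_length))  # 0x280000000000004c
print(hex(top), low)         # 0x28 76
```

Read this way, everything except the top byte amounts to 76, which would be an unremarkable line length; whether the top byte was altered after the fact is exactly the open question.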
Thanks,

-- Andras Salamon andras@dns.net

From debbugs-submit-bounces@debbugs.gnu.org Thu Aug 18 13:16:40 2011
Message-ID: <4E4D486C.2080506@cs.ucla.edu>
Date: Thu, 18 Aug 2011 10:14:20 -0700
From: Paul Eggert
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.18) Gecko/20110617 Thunderbird/3.1.11
To: Andras Salamon
Subject: Re: bug#9321: repeated segfaults sorting large files in 8.12
References: <20110818143005.GA59624@gaon.net>
In-Reply-To: <20110818143005.GA59624@gaon.net>
Cc: 9321@debbugs.gnu.org

On 08/18/2011 07:30 AM, Andras Salamon wrote:
> The repository changelog seems to indicate that the current development
> release of sort has not changed since 8.12. Will attempting to track
> the problem down with 8.12 be useful?

Yes, I think so; thanks.

From debbugs-submit-bounces@debbugs.gnu.org Fri Aug 19 18:57:09 2011
Message-ID: <4E4EE9B6.2090202@draigBrady.com>
Date: Fri, 19 Aug 2011 23:54:46 +0100
From: Pádraig Brady
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:5.0) Gecko/20110707 Thunderbird/5.0
To: Andras Salamon
Subject: Re: bug#9321: repeated segfaults sorting large files in 8.12
References: <20110818143005.GA59624@gaon.net>
In-Reply-To: <20110818143005.GA59624@gaon.net>
Cc: 9321@debbugs.gnu.org

On 08/18/2011 03:30 PM, Andras Salamon wrote:
> I am seeing repeated (but not reliably repeatable) segmentation faults
> sorting datasets in the 100MB-100GB range on a 64-bit Debian system
> using GNU sort 8.12 (and also 8.9). Stack traces seem to indicate
> problems during the merge phase, usually when the temporary files
> are being combined.
>
> This may or may not be related to the recent discussion about
> #9307, but I am definitely using 8.12, rebuilt with CFLAGS=-g since
> several indicative values were otherwise optimised out, configured
> with --disable-nls --disable-threads, and am running with a fixed
> buffer -S 100M and also --parallel=1 to try to isolate problems from
> possible threading issues. I was seeing these crashes with a vanilla
> build also.
>
> At least one crash occurred when comparing the very last entry in
> the memory buffer to a non-existent entry, when merging large files.
>
> There was also a crash with total_lines=851122 in mergelines_node,
> which leads to node->hi containing what appears to be garbage, with
> length=2882303761517117516.
>
> The repository changelog seems to indicate that the current development
> release of sort has not changed since 8.12. Will attempting to track
> the problem down with 8.12 be useful? If so I can post stack traces
> and values of relevant variables from the core dump, or post a new
> issue in the tracker, or reopen #9307. If not, please suggest some
> specific actions I should take to generate useful information.
>
> Thanks,
>
> -- Andras Salamon andras@dns.net
>
>
>
>

Andras, could you give the exact command line you're having issues with,
and perhaps make sort inputs available too?

Also could you try to bisect the issue?
You say it happens even with --parallel=1, but could you
try to reproduce without the threading changes at all,
i.e. with: ftp://ftp.gnu.org/gnu/coreutils/coreutils-8.5.tar.gz

Also there were temp file handling changes made in 7.2 so could you try:
ftp://ftp.gnu.org/gnu/coreutils/coreutils-7.1.tar.gz

Do the --batch-size=NMERGE or --compress-program=PROG options change anything?

cheers,
Pádraig.

From debbugs-submit-bounces@debbugs.gnu.org Sat Aug 20 02:34:01 2011
From: Jim Meyering
To: Andras Salamon
Subject: Re: bug#9321: repeated segfaults sorting large files in 8.12
In-Reply-To: <20110818143005.GA59624@gaon.net> (Andras Salamon's message of "Thu, 18 Aug 2011 15:30:05 +0100")
References: <20110818143005.GA59624@gaon.net>
Date: Sat, 20 Aug 2011 08:31:46 +0200
Message-ID: <87ty9c61bh.fsf@rho.meyering.net>
Cc: 9321@debbugs.gnu.org, Pádraig Brady

Andras Salamon wrote:
> I am seeing repeated (but not reliably repeatable) segmentation faults
> sorting datasets in the 100MB-100GB range on a 64-bit Debian system
> using GNU sort 8.12 (and also 8.9). Stack traces seem to indicate
> problems during the merge phase, usually when the temporary files
> are being combined.
>
> This may or may not be related to the recent discussion about
> #9307, but I am definitely using 8.12, rebuilt with CFLAGS=-g since
> several indicative values were otherwise optimised out, configured
> with --disable-nls --disable-threads, and am running with a fixed
> buffer -S 100M and also --parallel=1 to try to isolate problems from
> possible threading issues. I was seeing these crashes with a vanilla
> build also.
>
> At least one crash occurred when comparing the very last entry in
> the memory buffer to a non-existent entry, when merging large files.
>
> There was also a crash with total_lines=851122 in mergelines_node,
> which leads to node->hi containing what appears to be garbage, with
> length=2882303761517117516.
>
> The repository changelog seems to indicate that the current development
> release of sort has not changed since 8.12. Will attempting to track
> the problem down with 8.12 be useful?

Yes, most definitely.
As Pádraig already mentioned, most useful would be instructions
showing how to reproduce the failure, even if part of that is something
like "run this command 30 times" to provoke the rare failure.

> If so I can post stack traces
> and values of relevant variables from the core dump, or post a new
> issue in the tracker, or reopen #9307. If not, please suggest some
> specific actions I should take to generate useful information.

Thanks for the detailed report and investigation.
Have you reproduced the problem on more than one system?
If not, have you recently run any tests of your system's hardware?
It would be a shame to invest a lot of debugging effort,
if it ends up being a hardware problem with one specific system.

From debbugs-submit-bounces@debbugs.gnu.org Sat Aug 20 17:01:15 2011
Date: Sat, 20 Aug 2011 21:58:57 +0100
From: Andras Salamon
To: Pádraig Brady
Subject: Re: bug#9321: repeated segfaults sorting large files in 8.12
Message-ID: <20110820205857.GA69145@gaon.net>
References: <20110818143005.GA59624@gaon.net> <4E4EE9B6.2090202@draigBrady.com>
In-Reply-To: <4E4EE9B6.2090202@draigBrady.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: 9321@debbugs.gnu.org

On Fri, Aug 19, 2011 at 11:54:46PM +0100, Pádraig Brady wrote:
>On 08/18/2011 03:30 PM, Andras Salamon wrote:
>> I am seeing repeated (but not reliably repeatable) segmentation faults
>> sorting datasets in the 100MB-100GB range on a 64-bit Debian system
>> using GNU sort 8.12 (and also 8.9). Stack traces seem to indicate
>> problems during the merge phase, usually when the temporary files
>> are being combined.

>Andras, could you give the exact command line you're having issues with,
>and perhaps make sort inputs available too?

The sort inputs are several-gigabyte-range files containing strings,
each typically 60 to 140 bytes long, one per line. There are
many duplicates, and the first reason to sort is to establish the
distribution of duplicates. I would be happy to make available data
if I could find a reasonably sized file that causes a reproducible
segfault. The problem seems easier to reproduce with larger files,
unfortunately.

>Do the --batch-size=NMERGE or --compress-program=PROG options change anything?

Thanks for the suggestion, I will try forcing smaller batches.

Compressing batches was something I tried early on with no apparent
change in likelihood of failure, but it led to much slower runtimes.

>Also there were temp file handling changes made in 7.2 so could you try:
>ftp://ftp.gnu.org/gnu/coreutils/coreutils-7.1.tar.gz

Here are some of the relevant-seeming parts of a gdb session for
coreutils-7.1. Here ?.xz is a compressed file which has already been
sorted, around 35MB in size.

Built with:
configure CFLAGS=-g --disable-nls

Commandline:
% nohup xzcat 1.xz 2.xz 3.xz 4.xz | sort -S 100M -T /home/a/tmp | xz > o.xz &
Segmentation fault ../bin/sort -T /home/a/tmp -S 100M | (core dumped)

During the run there were 435 temp files active at one point. There
may have been more at a later stage, but these were reduced to a
final 32 which remained after the crash. There is around 600GB free
disk space on this volume.

% du -smc sort* | tail -1
29556 total
% ls -sktr sort*
  62776 sortR07gPu    62056 sortS3H1Mu    10848 sortECN8Nx   951020 sortlk9Xd1
1001668 sortrDhnFQ  1001420 sortItDvPu  1001216 sortIBlIVY  1001500 sortDWg5Vj
1012504 sortOulxqu   916424 sortOTNgnn   907976 sortRlRPsA   997840 sortuQbWXj
1001328 sortoWTS4K  1001436 sort3GpGf2  1001544 sortVudEk7  1009412 sortJou3Y3
 926628 sortL2SeVF   950584 sortSTuAkJ  1001376 sortX9rCaf  1000928 sortAjXZkz
1001120 sortQzXcgK  1001412 sortLwoe9K  1012704 sortM4WHnD   955044 sort1c8ja8
 981680 sortJhX3rd  1001040 sortqGq4yV  1000596 sort7obBHs  1000540 sortW4fLHR
1000800 sortSzB3s6   999624 sortMD7K0b   305892 sortqSxpe4  3183480 sortcOqzkh

(gdb) bt
#0  0x000000000040e6bc in memcoll (
    s1=0x7800000005824d58 [...],
    s1len=15564440312192434243,
    s2=0x2b2a1a0 "\n\n[...],
    s1len=15564440312192434243,
    s2=0x2b2a1a0 "\n\n[...],
  length = 15564440312192434244, keybeg = 0x0, keylim = 0x0}
(gdb) print *(cur[7]-1)
$54 = {
  text = 0x5824d9c "\n\n[...],
  keylim = 0x8900000000000000 [...]}
(gdb) print *(cur[7]+1)
$55 = {
  text = 0x5824d14 "\n\n\n\n[...]text, alen + 1, b->text, blen + 1);
#6  0x000000000040837b in mergefps (files=0x119e230, ntemps=11, nfiles=11,
    ofp=0x11978b0, output_file=0x119787d "/home/a/tmp/sort1mESrU",
    fps=0x1197af0) at sort.c:2995
2995          int cmp = compare (cur[ord0], cur[ord[probe]]);

In frame 6:

(gdb) p cur[0]@11
$6 = {0x228b5e0, 0x2a9ff30, 0x30dff60, 0x35293b0, 0x4913940, 0x5020050,
  0x5660080, 0x5bd0290, 0x68ce2d0, 0x6f60140, 0x75a0170}
(gdb) p ord[0]@11
$8 = {0, 8, 4, 9, 1, 5, 10, 2, 6, 3, 7}
(gdb) p ord0
$9 = 0
(gdb) p probe
$10 = 1
(gdb) p *(const struct line *)0x2a9ff30
$15 = {
  text = 0x245ff30 "_:httpx3Ax2Fx2Fapix2Ehi5x2Ecomx2Frestx2Fprofilex2Ffoafx2F350598182xxbnode337", length = 77, keybeg = 0x0, keylim = 0x0}
(gdb) p *(const struct line *)0x228b5e0
$16 = {text = 0x600000000226d720 [...],
  length = 14843864371813154892,
  keybeg = 0x756566736f4e2f72 [...],
  keylim = 0x66626f5f6f746c61 [...]
}
(gdb) p *(const struct line *)0x75a0170
$18 = {
  text = 0x6f60170 "_:httpx3Ax2Fx2Fapix2Ehi5x2Ecomx2Frestx2Fprofilex2Ffoafx2F492419832xxbnode215", length = 77, keybeg = 0x0, keylim = 0x0}
(gdb) p *buffer
$33 = {
  buf = 0x1e1ff00 "_:httpx3Ax2Fx2Fapix2Ehi5x2Ecomx2Frestx2Fprofilex2Ffoafx2F104700830xxbnode271", used = 4596991, nlines = 61144, alloc = 6553632, left = 62,
  line_bytes = 32, eof = false}

-- Andras Salamon andras@dns.net

From debbugs-submit-bounces@debbugs.gnu.org Sat Aug 20 17:05:30 2011
Date: Sat, 20 Aug 2011 22:03:10 +0100
From: Andras Salamon
To: Jim Meyering
Subject: Re: bug#9321: repeated segfaults sorting large files in 8.12
Message-ID: <20110820210310.GB69145@gaon.net>
References: <20110818143005.GA59624@gaon.net> <87ty9c61bh.fsf@rho.meyering.net>
In-Reply-To: <87ty9c61bh.fsf@rho.meyering.net>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: 9321@debbugs.gnu.org, Pádraig Brady

On Sat, Aug 20, 2011 at 08:31:46AM +0200, Jim Meyering wrote:
>As Pádraig already mentioned, most useful would be instructions
>showing how to reproduce the failure, even if part of that is something
>like "run this command 30 times" to provoke the rare failure.

I'm seeing roughly 1 in 5 failures with the larger runs.

>Have you reproduced the problem on more than one system?
>If not, have you recently run any tests of your system's hardware?
>It would be a shame to invest a lot of debugging effort,
>if it ends up being a hardware problem with one specific system.

Good point, thanks for the suggestion. I hope to have access next
week to a different system with enough free space to try to reproduce.
Will run some hardware tests in the meantime.

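To put the "1 in 5" rate next to the "run this command 30 times" suggestion, here is a rough back-of-the-envelope sketch. It assumes each run fails independently with the same probability, which may well not hold if the cause is hardware or data dependent; the function names are made up for this illustration.

```python
import math


def prob_at_least_one_failure(p_fail: float, runs: int) -> float:
    """Chance of seeing one or more failures in `runs` independent runs."""
    return 1.0 - (1.0 - p_fail) ** runs


def runs_needed(p_fail: float, confidence: float) -> int:
    """Smallest number of independent runs that shows at least one
    failure with the given confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_fail))


p = 0.2  # "roughly 1 in 5" from the larger runs
print(round(prob_at_least_one_failure(p, 30), 4))  # 0.9988
print(runs_needed(p, 0.99))                        # 21
```

Under that assumption, a couple of dozen clean runs on a second machine would already be fairly strong evidence that the problem does not follow the software.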
-- Andras Salamon andras@dns.net

From debbugs-submit-bounces@debbugs.gnu.org Sat Aug 20 19:31:02 2011
Message-ID: <4E50432B.4050909@draigBrady.com>
Date: Sun, 21 Aug 2011 00:28:43 +0100
From: Pádraig Brady
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:5.0) Gecko/20110707 Thunderbird/5.0
To: Andras Salamon
Subject: Re: bug#9321: repeated segfaults sorting large files in 8.12
References: <20110818143005.GA59624@gaon.net> <4E4EE9B6.2090202@draigBrady.com> <20110820205857.GA69145@gaon.net>
In-Reply-To: <20110820205857.GA69145@gaon.net>
Cc: 9321@debbugs.gnu.org

On 08/20/2011 09:58 PM, Andras Salamon wrote:
> On Fri, Aug 19, 2011 at 11:54:46PM +0100, Pádraig Brady wrote:
>> On 08/18/2011 03:30 PM, Andras Salamon wrote:
>>> I am seeing repeated (but not reliably repeatable) segmentation faults
>>> sorting datasets in the 100MB-100GB range on a 64-bit Debian system
>>> using GNU sort 8.12 (and also 8.9). Stack traces seem to indicate
>>> problems during the merge phase, usually when the temporary files
>>> are being combined.
>
>> Andras, could you give the exact command line you're having issues with,
>> and perhaps make sort inputs available too?
>
> The sort inputs are several-gigabyte-range files containing strings,
> each typically 60 to 140 bytes long, one per line. There are
> many duplicates, and the first reason to sort is to establish the
> distribution of duplicates. I would be happy to make available data
> if I could find a reasonably sized file that causes a reproducible
> segfault. The problem seems easier to reproduce with larger files,
> unfortunately.
>
>> Do the --batch-size=NMERGE or --compress-program=PROG options change anything?
>
> Thanks for the suggestion, I will try forcing smaller batches.
>
> Compressing batches was something I tried early on with no apparent
> change in likelihood of failure, but it led to much slower runtimes.
>
>> Also there were temp file handling changes made in 7.2 so could you try:
>> ftp://ftp.gnu.org/gnu/coreutils/coreutils-7.1.tar.gz
>
> Here are some of the relevant-seeming parts of a gdb session for
> coreutils-7.1.

If this happens with a 2.5-year-old sort,
I'd be leaning towards a local issue.

> (gdb) bt
> #0  0x000000000040e6bc in memcoll (
>     s1=0x7800000005824d58 [...], s1len=15564440312192434243, s2=0x2b2a1a0 "\n\n[...] at memcoll.c:50
> #1  0x000000000040af4c in xmemcoll (
>     s1=0x7800000005824d58 [...], s1len=15564440312192434243, s2=0x2b2a1a0 "\n\n[...] at xmemcoll.c:43
> #2  0x00000000004059ee in compare (a=0x5b4a7f0, b=0x301dfc0) at sort.c:2059
> #3  0x0000000000406815 in mergefps (files=0x24063e0, ntemps=15, nfiles=15, ofp=0x23ff8e0, output_file=0x24062ec "/home/a/tmp/sortcOqzkh")
>     at sort.c:2326
> #4  0x000000000040708f in merge (files=0x24063e0, ntemps=16, nfiles=32, output_file=0x0) at sort.c:2567
> #5  0x000000000040766a in sort (files=0x61c660, nfiles=0, output_file=0x0)
>     at sort.c:2699
> #6  0x000000000040908c in main (argc=5, argv=0x7fff149247a8) at sort.c:3425

So the 'a' line struct is corrupted.
a->text = 7800000005824D58
a->length = D800000000000043
Notice the 0x78 and 0xD8. They should be 0x00.

Now, is this software or hardware?
It looks like hardware TBH as there are 4 bits
incorrectly set in each of those bytes
(which ECC couldn't correct if you have that),
and also each incorrect bit is beside another.

cheers,
Pádraig.

From debbugs-submit-bounces@debbugs.gnu.org Sat Aug 27 19:51:31 2011
From: Jim Meyering
To: Andras Salamon
Subject: Re: bug#9321: repeated segfaults sorting large files in 8.12
In-Reply-To: <87ty9c61bh.fsf@rho.meyering.net> (Jim Meyering's message of "Sat, 20 Aug 2011 08:31:46 +0200")
References: <20110818143005.GA59624@gaon.net> <87ty9c61bh.fsf@rho.meyering.net>
Date: Sun, 28 Aug 2011 01:48:31 +0200
Message-ID: <87bovaquuo.fsf@rho.meyering.net>
Cc: 9321@debbugs.gnu.org, Pádraig Brady

tags 9321 + notabug
close 9321
thanks

Jim Meyering wrote:
> Andras Salamon wrote:
>
>> I am seeing repeated (but not reliably repeatable) segmentation faults
>> sorting datasets in the 100MB-100GB range on a 64-bit Debian system
>> using GNU sort 8.12 (and also 8.9). Stack traces seem to indicate
>> problems during the merge phase, usually when the temporary files
>> are being combined.
>>
>> This may or may not be related to the recent discussion about
>> #9307, but I am definitely using 8.12, rebuilt with CFLAGS=-g since
>> several indicative values were otherwise optimised out, configured
>> with --disable-nls --disable-threads, and am running with a fixed
>> buffer -S 100M and also --parallel=1 to try to isolate problems from
>> possible threading issues. I was seeing these crashes with a vanilla
>> build also.
>>
>> At least one crash occurred when comparing the very last entry in
>> the memory buffer to a non-existent entry, when merging large files.
>>
>> There was also a crash with total_lines=851122 in mergelines_node,
>> which leads to node->hi containing what appears to be garbage, with
>> length=2882303761517117516.
>>
>> The repository changelog seems to indicate that the current development
>> release of sort has not changed since 8.12. Will attempting to track
>> the problem down with 8.12 be useful?
>
> Yes, most definitely.
> As Pádraig already mentioned, most useful would be instructions
> showing how to reproduce the failure, even if part of that is something
> like "run this command 30 times" to provoke the rare failure.
>
>> If so I can post stack traces
>> and values of relevant variables from the core dump, or post a new
>> issue in the tracker, or reopen #9307. If not, please suggest some
>> specific actions I should take to generate useful information.
>
> Thanks for the detailed report and investigation.
> Have you reproduced the problem on more than one system?
> If not, have you recently run any tests of your system's hardware?
> It would be a shame to invest a lot of debugging effort,
> if it ends up being a hardware problem with one specific system.

Per http://thread.gmane.org/gmane.comp.gnu.coreutils.general/1527/focus=1551
I'm closing this.

From unknown Sun Jun 15 08:45:08 2025
Received: (at fakecontrol) by fakecontrolmessage;
To: internal_control@debbugs.gnu.org
From: Debbugs Internal Request
Subject: Internal Control
Message-Id: bug archived.
Date: Sun, 25 Sep 2011 11:24:03 +0000
User-Agent: Fakemail v42.6.9

# This is a fake control message.
#
# The action:
# bug archived.
thanks
# This fakemail brought to you by your local debbugs
# administrator
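The bit-pattern observation in Pádraig's analysis of the backtrace (four stray bits in each high byte, each next to another) can be checked mechanically. This is only a minimal sketch: the helper names are hypothetical, and it assumes, as that message does, that the top byte of both fields should have been 0x00.

```python
def high_byte(value: int) -> int:
    """Top 8 bits of a 64-bit unsigned value."""
    return (value >> 56) & 0xFF


def set_bits(byte: int) -> list[int]:
    """Positions (0 = least significant) of the bits set in a byte."""
    return [i for i in range(8) if (byte >> i) & 1]


def every_bit_has_neighbour(bits: list[int]) -> bool:
    """True if each set bit sits directly beside another set bit."""
    return all((b - 1 in bits) or (b + 1 in bits) for b in bits)


# Values quoted in the analysis of the corrupted 'a' line struct.
a_text = 0x7800000005824D58
a_length = 0xD800000000000043  # s1len=15564440312192434243 in the backtrace

for name, value in (("a->text", a_text), ("a->length", a_length)):
    hb = high_byte(value)
    bits = set_bits(hb)
    print(name, hex(hb), f"{hb:08b}", len(bits), every_bit_has_neighbour(bits))
# a->text 0x78 01111000 4 True
# a->length 0xd8 11011000 4 True
```

With the top byte cleared, the length field reads 0x43, that is 67, which is consistent with the 60 to 140 byte lines described earlier in the thread.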