GNU bug report logs - #9321
repeated segfaults sorting large files in 8.12


Package: coreutils

Reported by: Andras Salamon <andras <at> dns.net>

Date: Thu, 18 Aug 2011 16:11:01 UTC

Severity: normal

Tags: notabug

Done: Jim Meyering <jim <at> meyering.net>

Bug is archived. No further changes may be made.



Message #14 received at 9321 <at> debbugs.gnu.org:

From: Jim Meyering <jim <at> meyering.net>
To: Andras Salamon <andras <at> dns.net>
Cc: 9321 <at> debbugs.gnu.org, Pádraig Brady <P <at> draigBrady.com>
Subject: Re: bug#9321: repeated segfaults sorting large files in 8.12
Date: Sat, 20 Aug 2011 08:31:46 +0200

Andras Salamon wrote:

> I am seeing repeated (but not reliably repeatable) segmentation faults
> sorting datasets in the 100MB-100GB range on a 64-bit Debian system
> using GNU sort 8.12 (and also 8.9).  Stack traces seem to indicate
> problems during the merge phase, usually when the temporary files
> are being combined.
>
> This may or may not be related to the recent discussion about
> #9307, but I am definitely using 8.12, rebuilt with CFLAGS=-g since
> several indicative values were otherwise optimised out, configured
> with --disable-nls --disable-threads, and am running with a fixed
> buffer -S 100M and also --parallel=1 to try to isolate problems from
> possible threading issues.  I was seeing these crashes with a vanilla
> build also.
>
> At least one crash occurred when comparing the very last entry in
> the memory buffer to a non-existent entry, when merging large files.
>
> There was also a crash with total_lines=851122 in mergelines_node,
> which leads to node->hi containing what appears to be garbage, with
> length=2882303761517117516.
>
> The repository changelog seems to indicate that the current development
> release of sort has not changed since 8.12.  Will attempting to track
> the problem down with 8.12 be useful?

Yes, most definitely.
As Pádraig already mentioned, most useful would be instructions
showing how to reproduce the failure, even if part of that is something
like "run this command 30 times" to provoke the rare failure.
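
A "run this command N times" recipe could be packaged as a small harness.
The Python sketch below is only an illustration, not something posted in
this thread: the helper name run_repeatedly and the input file name are
hypothetical, while the -S 100M and --parallel=1 options are the ones
quoted in the report. On POSIX systems, subprocess reports a child killed
by signal N as return code -N, so a segfault would normally appear as
-signal.SIGSEGV.

```python
import signal
import subprocess
import sys


def run_repeatedly(cmd, attempts=30):
    """Run cmd `attempts` times; return (attempt, returncode) for each failure."""
    failures = []
    for attempt in range(1, attempts + 1):
        # Discard the command's stdout; only the exit status matters here.
        result = subprocess.run(cmd, stdout=subprocess.DEVNULL)
        if result.returncode != 0:
            failures.append((attempt, result.returncode))
    return failures


def describe(returncode):
    """Label a failing return code; negative values mean death by signal (POSIX)."""
    if returncode == -signal.SIGSEGV:
        return "segmentation fault"
    if returncode < 0:
        return "killed by signal %d" % -returncode
    return "exit status %d" % returncode


# Example invocation (input.txt is a hypothetical data file):
#   for attempt, code in run_repeatedly(
#           ["sort", "-S", "100M", "--parallel=1", "input.txt"]):
#       print(attempt, describe(code), file=sys.stderr)
```

Reporting the attempt number along with the status gives a rough idea of
how rare the failure is, which is useful in the reproduction instructions.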

> If so I can post stack traces
> and values of relevant variables from the core dump, or post a new
> issue in the tracker, or reopen #9307.  If not, please suggest some
> specific actions I should take to generate useful information.

Thanks for the detailed report and investigation.
Have you reproduced the problem on more than one system?
If not, have you recently run any tests of your system's hardware?
It would be a shame to invest a lot of debugging effort,
if it ends up being a hardware problem with one specific system.
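
The garbage length quoted above, 2882303761517117516, is easier to judge
in hexadecimal. The short Python sketch below only restates that number
in another base and splits off its top byte. The variable names are
illustrative, and the result does not by itself show whether the cause
is hardware or software.

```python
# Length reported for node->hi in the core dump, copied from the report.
length = 2882303761517117516

# Split the 64-bit value into its top byte and the remaining 56 bits.
high_byte = length >> 56
low_part = length & ((1 << 56) - 1)

print(hex(length))     # 0x280000000000004c
print(hex(high_byte))  # 0x28
print(low_part)        # 76
```

Read this way, the value is a small number (76) with a single stray
byte, 0x28, at the top of the 64-bit word.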






