GNU bug report logs - #9321
repeated segfaults sorting large files in 8.12
Reported by: Andras Salamon <andras <at> dns.net>
Date: Thu, 18 Aug 2011 16:11:01 UTC
Severity: normal
Tags: notabug
Done: Jim Meyering <jim <at> meyering.net>
Bug is archived. No further changes may be made.
On 08/20/2011 09:58 PM, Andras Salamon wrote:
> On Fri, Aug 19, 2011 at 11:54:46PM +0100, Pádraig Brady wrote:
>> On 08/18/2011 03:30 PM, Andras Salamon wrote:
>>> I am seeing repeated (but not reliably repeatable) segmentation faults
>>> sorting datasets in the 100MB-100GB range on a 64-bit Debian system
>>> using GNU sort 8.12 (and also 8.9). Stack traces seem to indicate
>>> problems during the merge phase, usually when the temporary files
>>> are being combined.
>
>> Andras, could you give the exact command line you're having issues with,
>> and perhaps make sort inputs available too?
>
> The sort inputs are several-gigabyte-range files containing strings,
> each typically 60 to 140 bytes long, one per line. There are
> many duplicates, and the first reason to sort is to establish the
> distribution of duplicates. I would be happy to make available data
> if I could find a reasonably sized file that causes a reproducible
> segfault. The problem seems easier to reproduce with larger files,
> unfortunately.
>
>> Do the --batch-size=NMERGE or --compress-program=PROG options change anything?
>
> Thanks for the suggestion, I will try forcing smaller batches.
>
> Compressing batches was something I tried early on with no apparent
> change in likelihood of failure, but it led to much slower runtimes.
>
>> Also there were temp file handling changes made in 7.2 so could you try:
>> ftp://ftp.gnu.org/gnu/coreutils/coreutils-7.1.tar.gz
>
> Here are some of the relevant-seeming parts of a gdb session for
> coreutils-7.1.
If this happens with a 2.5-year-old sort, I'd be leaning
towards a local issue.
> (gdb) bt
> #0 0x000000000040e6bc in memcoll (
> s1=0x7800000005824d58 <Address 0x7800000005824d58 out of bounds>, s1len=15564440312192434243, s2=0x2b2a1a0 "<http://scinets.org/item/cria214s2ria214u225719i~files~P58066.fasta>\n<http://scinets.org/item/cria214s2ria214u225719i~files~P58066.fasta>\n<http://scinets.org/item/cria214s2ria214u225719i~files~P58066."..., s2len=68)
> at memcoll.c:50
> #1 0x000000000040af4c in xmemcoll (
> s1=0x7800000005824d58 <Address 0x7800000005824d58 out of bounds>, s1len=15564440312192434243, s2=0x2b2a1a0 "<http://scinets.org/item/cria214s2ria214u225719i~files~P58066.fasta>\n<http://scinets.org/item/cria214s2ria214u225719i~files~P58066.fasta>\n<http://scinets.org/item/cria214s2ria214u225719i~files~P58066."..., s2len=68)
> at xmemcoll.c:43
> #2 0x00000000004059ee in compare (a=0x5b4a7f0, b=0x301dfc0) at sort.c:2059
> #3 0x0000000000406815 in mergefps (files=0x24063e0, ntemps=15, nfiles=15, ofp=0x23ff8e0, output_file=0x24062ec "/home/a/tmp/sortcOqzkh")
> at sort.c:2326
> #4 0x000000000040708f in merge (files=0x24063e0, ntemps=16, nfiles=32, output_file=0x0) at sort.c:2567
> #5 0x000000000040766a in sort (files=0x61c660, nfiles=0, output_file=0x0)
> at sort.c:2699
> #6 0x000000000040908c in main (argc=5, argv=0x7fff149247a8) at sort.c:3425
So the 'a' line struct is corrupted.
a->text = 7800000005824D58
a->length = D800000000000043
Notice the 0x78 and 0xD8.
They should be 0x00.
Now, is this software or hardware?
It looks like hardware TBH, as there are 4
bits incorrectly set in each of those bytes
(which ECC couldn't correct, if you have that),
and each incorrect bit is beside another.
cheers,
Pádraig.
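
The bit-level reasoning above can be checked with a short sketch. It takes the two corrupted fields from the backtrace, isolates the top byte of each, and lists which bits differ from the expected 0x00. The helper names (`top_byte`, `flipped_bits`, `all_adjacent`) are made up for illustration and are not part of coreutils or gdb; the expected value of 0x00 is the assumption stated in the message.

```python
# Values copied from the gdb backtrace and the analysis in the message.
A_TEXT = 0x7800000005824D58            # a->text as reported
A_LENGTH_DEC = 15564440312192434243    # s1len as printed by gdb (decimal)

TOP_BYTE_SHIFT = 56                    # top byte of a 64-bit value
LOW_56_MASK = (1 << TOP_BYTE_SHIFT) - 1


def top_byte(value):
    """Return the most significant byte of a 64-bit value."""
    return (value >> TOP_BYTE_SHIFT) & 0xFF


def flipped_bits(observed, expected=0x00):
    """List bit positions (0 = least significant) where the bytes differ.

    expected=0x00 is an assumption taken from the message ("They should be 0x00").
    """
    diff = observed ^ expected
    return [bit for bit in range(8) if diff & (1 << bit)]


def all_adjacent(bits):
    """True if every flipped bit has at least one flipped neighbour."""
    present = set(bits)
    return all((b - 1 in present) or (b + 1 in present) for b in bits)


text_top = top_byte(A_TEXT)
length_top = top_byte(A_LENGTH_DEC)

for name, byte in (("a->text", text_top), ("a->length", length_top)):
    bits = flipped_bits(byte)
    print(f"{name}: top byte 0x{byte:02X} = {byte:08b}, "
          f"{len(bits)} bits set, all adjacent: {all_adjacent(bits)}")

# With the top byte cleared, the length is a plausible line length.
print("a->length without top byte:", A_LENGTH_DEC & LOW_56_MASK)
```

With the top byte masked off, the length comes out as 67 (0x43), which sits inside the 60 to 140 byte range given for the input lines and next to the s2len=68 of the other operand. That is consistent with only the top byte being damaged, though it does not by itself distinguish a hardware fault from a software one.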
This bug report was last modified 13 years and 267 days ago.