GNU bug report logs - #18121
A bug in sort.

Package: coreutils

Reported by: Tom Bryant <mainsequence <at> verizon.net>

Date: Mon, 28 Jul 2014 01:10:01 UTC

Severity: normal

Tags: notabug

Done: Pádraig Brady <P <at> draigBrady.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 18121 in the body.
You can then email your comments to 18121 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-coreutils <at> gnu.org:
bug#18121; Package coreutils. (Mon, 28 Jul 2014 01:10:02 GMT)

Acknowledgement sent to Tom Bryant <mainsequence <at> verizon.net>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Mon, 28 Jul 2014 01:10:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Tom Bryant <mainsequence <at> verizon.net>
To: bug-coreutils <at> gnu.org
Subject: A bug in sort.
Date: Sun, 27 Jul 2014 20:05:42 -0400
I issued a "sort -n hugeFile > sortedHugeFile" and it introduced a very 
occasional but destructive "x" into the data.

The original data consisted of numeric fields, separated by the vertical 
bar, "|", and +, - and spaces.  It was 25861964610 bytes in size.

The final file had around 10 "x" characters overwritten in it.  It too 
was 25861964610 bytes in size.  I copy the first few lines to give you 
an idea of what sort was sorting:

0.01996377896414875189|-1.56937596815334989842|13950|13860|9|0|0|146|158|8|6|2|9697|59367|119|65406|159|161|1101364107|12467|12131|11963|5|5|5|2|3|3|20000|20000|20000|20000|20000|99|99|99|99|99|3|1|0|0|1000076|1|2|1 1
0.05686181376938173604|-1.56865877357861105423|14858|14817|7|0|0|158|160|6|6|2|9584|16962|42|65512|167|167|1229086934|12870|12167|12014|5|5|5|2|2|2|20000|20000|20000|20000|20000|99|99|99|99|99|3|1|0|0|1000185|1|7|1 2
0.08867460878463592766|-1.56748967932357308186|10400|10375|2|0|0|141|140|8|8|5|9290|56797|26|36|141|139|1181024763|7516|6675|6389|5|5|5|2|3|3|13182|10986|20000|20000|20000|99|99|99|99|99|2|310000001|0|0|1000431|1|10|1 3
0.13659213373632231314|-1.56927658619685916896|14012|13924|9|0|0|151|148|8|8|2|9611|52428|153|65530|160|159|1127037907|12431|12038|11937|5|5|5|2|3|3|20000|20000|20000|20000|20000|99|99|99|99|99|3|1|0|0|1000084|1|14|1 4
0.15088146914756625505|-1.57030633530367280670|16079|16329|99|5|0|223|226|1|1|1|9874|37522|0|0|127|127|1085342271|15299|14894|14657|25|25|26|7|10|13|20000|20000|20000|20000|20000|99|99|99|99|99|0|0|0|0|1000007|0|0|1 5
0.17178172876255659585|-1.56903360727616081327|13032|12989|5|0|0|148|145|8|8|2|9647|57825|126|0|157|157|1085364212|11087|10514|10353|5|5|5|2|3|3|20000|20000|20000|20000|20000|99|99|99|99|99|3|1|0|0|1000121|1|24|1 6
0.18379604688637316001|-1.56836539827576126882|15692|16287|39|0|0|195|200|2|2|2|9341|13621|65514|2|197|198|1085364149|14738|14268|13997|5|5|5|3|6|7|20000|20000|20000|20000|20000|99|99|99|99|99|4|1|0|0|1000243|0|0|1 7

The data, FWIW, is an ASCII representation of the UCAC4 star catalog.

Here is an example of a record with the "x" added to it by sort:
                                                                   V
2.04433377497687374102|0.22403821980488977661|16454|20000|99|1|0|23x24|1|1|2|8603|20560|141|65324|192|191|111893392|14129|13386|13099|25|2|2|4|99|99|20000|20000|20000|20000|20000|99|99|99|99|99|0|30|0|0|118728360|0|0|515 44588

I still have the original and flawed sort if you're interested.

The computer this error occurred on was a 16 GB machine with a 2 TB drive 
and an Intel Quad core processor running Slackware Linux 13.0.

Tom
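
Because the report describes the data as numeric fields plus "|", "+", "-" and spaces, damaged records like the one above can be located mechanically. The following is a minimal Python sketch, not part of coreutils. It assumes the allowed byte set is exactly digits, ".", "|", "+", "-" and space (the "." is inferred from the sample rows), and the names `find_suspect_lines` and `scan_file` are illustrative.

```python
import re

# Assumption: a valid record contains only these bytes (inferred from the report).
ALLOWED = re.compile(rb"[0-9.|+\- ]*")


def find_suspect_lines(lines):
    """Yield (line_number, line) for each record containing an unexpected byte."""
    for number, line in enumerate(lines, start=1):
        if not ALLOWED.fullmatch(line.rstrip(b"\r\n")):
            yield number, line


def scan_file(path):
    """Stream a file in binary mode so a 25 GB input is never held in memory."""
    with open(path, "rb") as handle:
        yield from find_suspect_lines(handle)
```

Iterating over `scan_file("sortedHugeFile")` would list the damaged records. A scan like this only catches flips that land outside the allowed set; a digit flipped into another digit passes unnoticed.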


Information forwarded to bug-coreutils <at> gnu.org:
bug#18121; Package coreutils. (Mon, 28 Jul 2014 08:42:02 GMT)

Message #8 received at 18121 <at> debbugs.gnu.org:

From: Pádraig Brady <P <at> draigBrady.com>
To: Tom Bryant <mainsequence <at> verizon.net>
Cc: 18121 <at> debbugs.gnu.org
Subject: Re: bug#18121: A bug in sort.
Date: Mon, 28 Jul 2014 09:41:08 +0100
On 07/28/2014 01:05 AM, Tom Bryant wrote:
> I issued a "sort -n hugeFile > sortedHugeFile" and it introduced a very occasional but destructive "x" into the data.
> 
> The original data consisted of numeric fields, separated by the vertical bar, "|", and +, - and spaces.  It was 25861964610 bytes in size.
> 
> The final file had around 10 "x" characters overwritten in it.  It too was 25861964610 bytes in size.  I copy the first few lines to give you an idea of what sort was sorting:
> 
> 0.01996377896414875189|-1.56937596815334989842|13950|13860|9|0|0|146|158|8|6|2|9697|59367|119|65406|159|161|1101364107|12467|12131|11963|5|5|5|2|3|3|20000|20000|20000|20000|20000|99|99|99|99|99|3|1|0|0|1000076|1|2|1 1
> 0.05686181376938173604|-1.56865877357861105423|14858|14817|7|0|0|158|160|6|6|2|9584|16962|42|65512|167|167|1229086934|12870|12167|12014|5|5|5|2|2|2|20000|20000|20000|20000|20000|99|99|99|99|99|3|1|0|0|1000185|1|7|1 2
> 0.08867460878463592766|-1.56748967932357308186|10400|10375|2|0|0|141|140|8|8|5|9290|56797|26|36|141|139|1181024763|7516|6675|6389|5|5|5|2|3|3|13182|10986|20000|20000|20000|99|99|99|99|99|2|310000001|0|0|1000431|1|10|1 3
> 0.13659213373632231314|-1.56927658619685916896|14012|13924|9|0|0|151|148|8|8|2|9611|52428|153|65530|160|159|1127037907|12431|12038|11937|5|5|5|2|3|3|20000|20000|20000|20000|20000|99|99|99|99|99|3|1|0|0|1000084|1|14|1 4
> 0.15088146914756625505|-1.57030633530367280670|16079|16329|99|5|0|223|226|1|1|1|9874|37522|0|0|127|127|1085342271|15299|14894|14657|25|25|26|7|10|13|20000|20000|20000|20000|20000|99|99|99|99|99|0|0|0|0|1000007|0|0|1 5
> 0.17178172876255659585|-1.56903360727616081327|13032|12989|5|0|0|148|145|8|8|2|9647|57825|126|0|157|157|1085364212|11087|10514|10353|5|5|5|2|3|3|20000|20000|20000|20000|20000|99|99|99|99|99|3|1|0|0|1000121|1|24|1 6
> 0.18379604688637316001|-1.56836539827576126882|15692|16287|39|0|0|195|200|2|2|2|9341|13621|65514|2|197|198|1085364149|14738|14268|13997|5|5|5|3|6|7|20000|20000|20000|20000|20000|99|99|99|99|99|4|1|0|0|1000243|0|0|1 7
> 
> The data, FWIW, is an ASCII representation of the UCAC4 star catalog.
> 
> Here is an example of a record with the "x" added to it by sort:
>                                                                    V
> 2.04433377497687374102|0.22403821980488977661|16454|20000|99|1|0|23x24|1|1|2|8603|20560|141|65324|192|191|111893392|14129|13386|13099|25|2|2|4|99|99|20000|20000|20000|20000|20000|99|99|99|99|99|0|30|0|0|118728360|0|0|515 44588
> 
> I still have the original and flawed sort if you're interested.
> 
> The computer this error occurred on was a 16 GB machine with a 2 TB drive and an Intel Quad core processor running Slackware Linux 13.0.

When one processes a large amount of data (25G in this case)
and sees corruption in the content but not in the size,
it's worth considering hardware errors.

This case might be indicative of single bit errors in RAM,
as the difference between '|' and 'x' is only a single bit.
I would first eliminate that possibility with a RAM checker.

Note that sort uses a large memory buffer by default,
so it is more susceptible than most data processors to issues like this.

If you can reproduce the issue on another system, then
we can start looking at software errors.

thanks,
Pádraig.

p.s. please provide the version of sort
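
The single-bit observation in this message is easy to check: in ASCII, '|' is 0x7C and 'x' is 0x78, so the two differ only in the bit with value 4. A minimal Python sketch of that check follows; the helper name `differing_bits` is illustrative.

```python
def differing_bits(a, b):
    """Return the bit positions (0 = least significant) where two characters differ."""
    diff = ord(a) ^ ord(b)
    return [bit for bit in range(diff.bit_length()) if (diff >> bit) & 1]


print(f"{ord('|'):08b}")          # 01111100
print(f"{ord('x'):08b}")          # 01111000
print(differing_bits("|", "x"))   # [2]
```

Setting bit 2 of 'x' gives back '|', which is consistent with the damaged field "23x24" having originally been two fields, "23|24".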




Information forwarded to bug-coreutils <at> gnu.org:
bug#18121; Package coreutils. (Mon, 28 Jul 2014 15:51:02 GMT)

Message #11 received at 18121 <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Pádraig Brady <P <at> draigBrady.com>, 
 Tom Bryant <mainsequence <at> verizon.net>
Cc: 18121 <at> debbugs.gnu.org
Subject: Re: bug#18121: A bug in sort.
Date: Mon, 28 Jul 2014 11:50:14 -0400
On 07/28/2014 04:41 AM, Pádraig Brady wrote:
> This case might be indicative of single bit errors in RAM,
> as the difference between '|' and 'x' is only a single bit.
> I would first eliminate that possibility with a RAM checker.

Good diagnosis, thanks.  I use ECC RAM in machines I do nontrivial work 
on, so attempting to work around this is low priority for me, but 
someone with some free time on their hands might look into having 'sort' 
detect internal memory corruption via permutation-insensitive 
checksums.  This wouldn't catch all hardware errors, but it might help 
folks who are trying to push too much data through systems with 
unreliable memory.

For some eye-opening analysis on how unreliable non-ECC RAM can be, see:

http://www.pugetsystems.com/labs/articles/Advantages-of-ECC-Memory-520/
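
The idea of a permutation-insensitive checksum can be sketched outside of sort: because sorting only reorders lines, a checksum that ignores order must be identical for the input and the output. The Python sketch below is one possible construction, not how sort works and not a proposed patch. It assumes that summing 64-bit BLAKE2b line digests modulo 2^64 is an acceptable combiner, and the name `multiset_checksum` is illustrative.

```python
import hashlib

MASK = (1 << 64) - 1  # keep the running sum within 64 bits


def multiset_checksum(lines):
    """Return (line_count, order-independent checksum) for an iterable of byte lines."""
    total = 0
    count = 0
    for line in lines:
        digest = hashlib.blake2b(line, digest_size=8).digest()
        # Addition is commutative, so the order of the lines does not matter.
        total = (total + int.from_bytes(digest, "big")) & MASK
        count += 1
    return count, total


def file_checksum(path):
    """Stream a file in binary mode and checksum its lines."""
    with open(path, "rb") as handle:
        return multiset_checksum(handle)
```

Comparing `file_checksum("hugeFile")` with `file_checksum("sortedHugeFile")` would flag a run like the one reported here, since a single changed byte alters that line's digest. As noted above, such a check would not catch every hardware error.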





Information forwarded to bug-coreutils <at> gnu.org:
bug#18121; Package coreutils. (Wed, 30 Jul 2014 08:58:02 GMT)

Message #14 received at 18121 <at> debbugs.gnu.org:

From: Pádraig Brady <P <at> draigBrady.com>
To: 18121 <at> debbugs.gnu.org
Subject: Re: bug#18121: A bug in sort.
Date: Wed, 30 Jul 2014 09:57:09 +0100
tag 18121 notabug
close 18121
stop

It was confirmed off-list that this was a RAM issue.




Added tag(s) notabug. Request was from Pádraig Brady <P <at> draigBrady.com> to control <at> debbugs.gnu.org. (Wed, 30 Jul 2014 08:58:03 GMT)

bug closed, send any further explanations to 18121 <at> debbugs.gnu.org and Tom Bryant <mainsequence <at> verizon.net> Request was from Pádraig Brady <P <at> draigBrady.com> to control <at> debbugs.gnu.org. (Wed, 30 Jul 2014 08:58:03 GMT)

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Wed, 27 Aug 2014 11:24:04 GMT)

This bug report was last modified 11 years and 16 days ago.

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.