GNU bug report logs - #18121
A bug in sort.
Reported by: Tom Bryant <mainsequence <at> verizon.net>
Date: Mon, 28 Jul 2014 01:10:01 UTC
Severity: normal
Tags: notabug
Done: Pádraig Brady <P <at> draigBrady.com>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 18121 in the body.
You can then email your comments to 18121 AT debbugs.gnu.org in the normal way.
Report forwarded to bug-coreutils <at> gnu.org: bug#18121; Package coreutils. (Mon, 28 Jul 2014 01:10:02 GMT)
Acknowledgement sent to Tom Bryant <mainsequence <at> verizon.net>: New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Mon, 28 Jul 2014 01:10:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
I issued a "sort -n hugeFile > sortedHugeFile" and it introduced a very
occasional but destructive "x" into the data.
The original data consisted of numeric fields, separated by the vertical
bar, "|", and +, - and spaces. It was 25861964610 bytes in size.
The final file had around 10 "x" characters overwritten in it. It too
was 25861964610 bytes in size. I copy the first few lines to give you
an idea of what sort was sorting:
0.01996377896414875189|-1.56937596815334989842|13950|13860|9|0|0|146|158|8|6|2|9697|59367|119|65406|159|161|1101364107|12467|12131|11963|5|5|5|2|3|3|20000|20000|20000|20000|20000|99|99|99|99|99|3|1|0|0|1000076|1|2|1 1
0.05686181376938173604|-1.56865877357861105423|14858|14817|7|0|0|158|160|6|6|2|9584|16962|42|65512|167|167|1229086934|12870|12167|12014|5|5|5|2|2|2|20000|20000|20000|20000|20000|99|99|99|99|99|3|1|0|0|1000185|1|7|1 2
0.08867460878463592766|-1.56748967932357308186|10400|10375|2|0|0|141|140|8|8|5|9290|56797|26|36|141|139|1181024763|7516|6675|6389|5|5|5|2|3|3|13182|10986|20000|20000|20000|99|99|99|99|99|2|310000001|0|0|1000431|1|10|1 3
0.13659213373632231314|-1.56927658619685916896|14012|13924|9|0|0|151|148|8|8|2|9611|52428|153|65530|160|159|1127037907|12431|12038|11937|5|5|5|2|3|3|20000|20000|20000|20000|20000|99|99|99|99|99|3|1|0|0|1000084|1|14|1 4
0.15088146914756625505|-1.57030633530367280670|16079|16329|99|5|0|223|226|1|1|1|9874|37522|0|0|127|127|1085342271|15299|14894|14657|25|25|26|7|10|13|20000|20000|20000|20000|20000|99|99|99|99|99|0|0|0|0|1000007|0|0|1 5
0.17178172876255659585|-1.56903360727616081327|13032|12989|5|0|0|148|145|8|8|2|9647|57825|126|0|157|157|1085364212|11087|10514|10353|5|5|5|2|3|3|20000|20000|20000|20000|20000|99|99|99|99|99|3|1|0|0|1000121|1|24|1 6
0.18379604688637316001|-1.56836539827576126882|15692|16287|39|0|0|195|200|2|2|2|9341|13621|65514|2|197|198|1085364149|14738|14268|13997|5|5|5|3|6|7|20000|20000|20000|20000|20000|99|99|99|99|99|4|1|0|0|1000243|0|0|1 7
The data, FWIW, is an ASCII representation of the UCAC4 star catalog.
Here is an example of a record with the "x" added to it by sort:
                                                                   V
2.04433377497687374102|0.22403821980488977661|16454|20000|99|1|0|23x24|1|1|2|8603|20560|141|65324|192|191|111893392|14129|13386|13099|25|2|2|4|99|99|20000|20000|20000|20000|20000|99|99|99|99|99|0|30|0|0|118728360|0|0|515 44588
I still have the original and flawed sort if you're interested.
The computer this error occurred on was a 16GB machine with a 2TB drive
and an Intel quad-core processor running Slackware Linux 13.0.
Tom
Information forwarded to bug-coreutils <at> gnu.org: bug#18121; Package coreutils. (Mon, 28 Jul 2014 08:42:02 GMT)
Message #8 received at 18121 <at> debbugs.gnu.org (full text, mbox):
On 07/28/2014 01:05 AM, Tom Bryant wrote:
> I issued a "sort -n hugeFile > sortedHugeFile" and it introduced a very occasional but destructive "x" into the data.
>
> The original data consisted of numeric fields, separated by the vertical bar, "|", and +, - and spaces. It was 25861964610 bytes in size.
>
> The final file had around 10 "x" characters overwritten in it. It too was 25861964610 bytes in size. I copy the first few lines to give you an idea of what sort was sorting:
>
> 0.01996377896414875189|-1.56937596815334989842|13950|13860|9|0|0|146|158|8|6|2|9697|59367|119|65406|159|161|1101364107|12467|12131|11963|5|5|5|2|3|3|20000|20000|20000|20000|20000|99|99|99|99|99|3|1|0|0|1000076|1|2|1 1
> 0.05686181376938173604|-1.56865877357861105423|14858|14817|7|0|0|158|160|6|6|2|9584|16962|42|65512|167|167|1229086934|12870|12167|12014|5|5|5|2|2|2|20000|20000|20000|20000|20000|99|99|99|99|99|3|1|0|0|1000185|1|7|1 2
> 0.08867460878463592766|-1.56748967932357308186|10400|10375|2|0|0|141|140|8|8|5|9290|56797|26|36|141|139|1181024763|7516|6675|6389|5|5|5|2|3|3|13182|10986|20000|20000|20000|99|99|99|99|99|2|310000001|0|0|1000431|1|10|1 3
> 0.13659213373632231314|-1.56927658619685916896|14012|13924|9|0|0|151|148|8|8|2|9611|52428|153|65530|160|159|1127037907|12431|12038|11937|5|5|5|2|3|3|20000|20000|20000|20000|20000|99|99|99|99|99|3|1|0|0|1000084|1|14|1 4
> 0.15088146914756625505|-1.57030633530367280670|16079|16329|99|5|0|223|226|1|1|1|9874|37522|0|0|127|127|1085342271|15299|14894|14657|25|25|26|7|10|13|20000|20000|20000|20000|20000|99|99|99|99|99|0|0|0|0|1000007|0|0|1 5
> 0.17178172876255659585|-1.56903360727616081327|13032|12989|5|0|0|148|145|8|8|2|9647|57825|126|0|157|157|1085364212|11087|10514|10353|5|5|5|2|3|3|20000|20000|20000|20000|20000|99|99|99|99|99|3|1|0|0|1000121|1|24|1 6
> 0.18379604688637316001|-1.56836539827576126882|15692|16287|39|0|0|195|200|2|2|2|9341|13621|65514|2|197|198|1085364149|14738|14268|13997|5|5|5|3|6|7|20000|20000|20000|20000|20000|99|99|99|99|99|4|1|0|0|1000243|0|0|1 7
>
> The data, FWIW, is an ASCII representation of the UCAC4 star catalog.
>
> Here is an example of a record with the "x" added to it by sort:
> V
> 2.04433377497687374102|0.22403821980488977661|16454|20000|99|1|0|23x24|1|1|2|8603|20560|141|65324|192|191|111893392|14129|13386|13099|25|2|2|4|99|99|20000|20000|20000|20000|20000|99|99|99|99|99|0|30|0|0|118728360|0|0|515 44588
>
> I still have the original and flawed sort if you're interested.
>
> The computer this error occurred on was a 16GB machine with a 2TB drive and an Intel quad-core processor running Slackware Linux 13.0.
When processing large amounts of data (25G in this case)
and seeing corruption in the content but not the size,
it's worth considering hardware errors.
This case might be indicative of single bit errors in RAM,
as the difference between '|' and 'x' is only a single bit.
I would first eliminate that possibility with a RAM checker.
Note that sort uses a large memory buffer by default,
so it is more susceptible than most data processors to issues like this.
If you can reproduce the issue on another system, then
we can start looking at software errors.
thanks,
Pádraig.
p.s. please provide the version of sort
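The single-bit observation in the message above can be checked directly from the ASCII codes. The following is a minimal Python sketch added for illustration only; the variable names are arbitrary and nothing here comes from the sort source code.

```python
# Compare the ASCII codes of the expected field separator and the
# character that appeared in its place in the corrupted output.
expected = ord("|")  # 0x7C, the separator used in the data
observed = ord("x")  # 0x78, the character found in the sorted file

diff = expected ^ observed            # bits that differ between the two
flipped_bits = bin(diff).count("1")   # how many bits differ

print(f"'|' = {expected:08b}")
print(f"'x' = {observed:08b}")
print(f"xor = {diff:08b} ({flipped_bits} bit differs)")
```

A single differing bit (value 0x04) is consistent with a one-bit memory fault clearing that bit in a byte holding '|'.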
Information forwarded to bug-coreutils <at> gnu.org: bug#18121; Package coreutils. (Mon, 28 Jul 2014 15:51:02 GMT)
Message #11 received at 18121 <at> debbugs.gnu.org (full text, mbox):
On 07/28/2014 04:41 AM, Pádraig Brady wrote:
> This case might be indicative of single bit errors in RAM,
> as the difference between '|' and 'x' is only a single bit.
> I would first eliminate that possibility with a RAM checker.
Good diagnosis, thanks. I use ECC RAM in machines I do nontrivial work
on, so attempting to work around this is low priority for me, but
someone with some free time on their hands might look into having 'sort'
detect internal memory corruption via permutation-insensitive
checksums. This wouldn't catch all hardware errors, but it might help
folks who are trying to push too much data through systems with
unreliable memory.
For some eye-opening analysis on how unreliable non-ECC RAM can be, see:
http://www.pugetsystems.com/labs/articles/Advantages-of-ECC-Memory-520/
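As a rough illustration of the permutation-insensitive checksum idea mentioned above, the sketch below sums a 64-bit hash of every line modulo 2**64, so the result does not depend on line order. This is not how GNU sort works and is not a proposed patch; the function names, the choice of BLAKE2b, and the sample records are all assumptions made for the example. It also runs outside sort, so it only shows whether the output holds the same lines as the input.

```python
import hashlib

MASK = (1 << 64) - 1  # keep the running sum within 64 bits


def line_digest(line: bytes) -> int:
    """Hash one record to a 64-bit integer (BLAKE2b is an arbitrary choice)."""
    return int.from_bytes(hashlib.blake2b(line, digest_size=8).digest(), "big")


def multiset_checksum(lines) -> int:
    """Sum of per-line hashes modulo 2**64; addition makes it order-independent."""
    total = 0
    for line in lines:
        total = (total + line_digest(line)) & MASK
    return total


# Hypothetical sample records, not taken from the UCAC4 data.
original = [b"3|c\n", b"1|a\n", b"2|b\n"]
good = sorted(original)                    # same lines, different order
bad = [b"1|a\n", b"2xb\n", b"3|c\n"]       # one '|' corrupted to 'x'

print(multiset_checksum(original) == multiset_checksum(good))
print(multiset_checksum(original) == multiset_checksum(bad))
```

For real files, the same function could be fed a file object opened in binary mode, for example open(path, "rb"), once for the input and once for the output. A mismatch shows that the two files do not contain the same set of lines; a match does not strictly prove they do, since hash collisions are possible, only very unlikely.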
Information forwarded to bug-coreutils <at> gnu.org: bug#18121; Package coreutils. (Wed, 30 Jul 2014 08:58:02 GMT)
Message #14 received at 18121 <at> debbugs.gnu.org (full text, mbox):
tag 18121 notabug
close 18121
stop
It was confirmed off list that this was a RAM issue.
Added tag(s) notabug. Request was from Pádraig Brady <P <at> draigBrady.com> to control <at> debbugs.gnu.org. (Wed, 30 Jul 2014 08:58:03 GMT)
bug closed, send any further explanations to 18121 <at> debbugs.gnu.org and Tom Bryant <mainsequence <at> verizon.net>. Request was from Pádraig Brady <P <at> draigBrady.com> to control <at> debbugs.gnu.org. (Wed, 30 Jul 2014 08:58:03 GMT)
bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Wed, 27 Aug 2014 11:24:04 GMT)