GNU bug report logs - #7878
"sort" bug--inconsistent single-column sorting influenced by other columns?
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
"sort" does inconsistent sorting.
I'm pretty sure it has NOTHING to do with the following warning, although I could be totally wrong.
" *** WARNING ***
The locale specified by the environment affects sort order.
Set LC_ALL=C to get the traditional sort order that uses
native byte values. "
See the attached shell script and text files.
bash-3.2$
cat test1.txt
323|1
36|2
406|3
40|4
587|5
cat test7.txt
323|B1
36|C2
406|B3
40|B4
587|C5
Note that the first column is the same for both files.
sort test1.txt
323|1
36|2
40|4
406|3
587|5
sort test7.txt
323|B1
36|C2
406|B3
40|B4
587|C5
The rows are in a different order depending on the dataset--and it is NOT a numeric sort. I'm not even sure it is ANY type of sort.
sort -k1 test1.txt
323|1
36|2
40|4
406|3
587|5
sort -k1 test7.txt
323|B1
36|C2
406|B3
40|B4
587|C5
Trying to fix the problem by focusing on the first column doesn't work.
sort -t "|" test1.txt
323|1
36|2
40|4
406|3
587|5
sort -t "|" test7.txt
323|B1
36|C2
406|B3
40|B4
587|C5
sort -t '|' test1.txt
323|1
36|2
40|4
406|3
587|5
sort -t '|' test7.txt
323|B1
36|C2
406|B3
40|B4
587|C5
sort -k1 -t "|" test1.txt
323|1
36|2
40|4
406|3
587|5
sort -k1 -t "|" test7.txt
323|B1
36|C2
406|B3
40|B4
587|C5
sort -k1 -t '|' test1.txt
323|1
36|2
40|4
406|3
587|5
sort -k1 -t '|' test7.txt
323|B1
36|C2
406|B3
40|B4
587|C5
Trying to fix the problem by including delimiter information doesn't work.
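For what it's worth, `-k1` on its own names a key that starts at field 1 and runs to the end of the line, and without `-t` the fields are split on blanks, so none of the commands above actually limits the comparison to the first column. A minimal sketch of a key that both starts and ends at field 1 follows. The scratch directory and the `LC_ALL=C` setting are assumptions added here for reproducibility, not part of the original script.

```shell
# Recreate the two sample files in a scratch directory (hypothetical location).
dir=$(mktemp -d)
printf '%s\n' '323|1' '36|2' '406|3' '40|4' '587|5' > "$dir/test1.txt"
printf '%s\n' '323|B1' '36|C2' '406|B3' '40|B4' '587|C5' > "$dir/test7.txt"

# -t '|' sets the field separator; -k1,1 starts AND ends the key at field 1.
# LC_ALL=C is an assumption made here so that keys are compared by byte value.
keyed1=$(LC_ALL=C sort -t '|' -k1,1 "$dir/test1.txt")
keyed7=$(LC_ALL=C sort -t '|' -k1,1 "$dir/test7.txt")

printf '%s\n\n%s\n' "$keyed1" "$keyed7"
```

With the key restricted this way, both files should list the first column in the same order: 323, 36, 40, 406, 587.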
sort -k1d test1.txt
323|1
36|2
40|4
406|3
587|5
sort -k1d test7.txt
323|B1
36|C2
406|B3
40|B4
587|C5
sort -s test1.txt
323|1
36|2
40|4
406|3
587|5
sort -s test7.txt
323|B1
36|C2
406|B3
40|B4
587|C5
sort -s -k1 test1.txt
323|1
36|2
40|4
406|3
587|5
sort -s -k1 test7.txt
323|B1
36|C2
406|B3
40|B4
587|C5
Neither dictionary order (-d) nor a stable sort (-s) fixes it.
sort -g test1.txt
36|2
40|4
323|1
406|3
587|5
sort -g test7.txt
36|C2
40|B4
323|B1
406|B3
587|C5
sort -n test1.txt
36|2
40|4
323|1
406|3
587|5
sort -n test7.txt
36|C2
40|B4
323|B1
406|B3
587|C5
Using numeric or general sorting appears to fix the problem on this numeric example. But why did it sort inconsistently in the first place based on the other contents of the file rather than just focusing on the first column--even when I told it to?
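One quick way to check whether the locale plays a role after all is to rerun the plain whole-line sort with `LC_ALL=C`, as the warning text suggests. This is only a sketch: the scratch directory is hypothetical, and the file contents are copied from the listings above.

```shell
# Recreate the two sample files in a scratch directory (hypothetical location).
dir=$(mktemp -d)
printf '%s\n' '323|1' '36|2' '406|3' '40|4' '587|5' > "$dir/test1.txt"
printf '%s\n' '323|B1' '36|C2' '406|B3' '40|B4' '587|C5' > "$dir/test7.txt"

# Plain whole-line sort, but forced to compare native byte values.
plain1=$(LC_ALL=C sort "$dir/test1.txt")
plain7=$(LC_ALL=C sort "$dir/test7.txt")

printf '%s\n\n%s\n' "$plain1" "$plain7"
```

Under byte-value comparison both files should come out with the first column in the order 323, 36, 406, 40, 587, because `6` sorts before `|` as a byte. That is the same order as in the commented Cygwin output. If the order changes between this run and the default environment, the locale's collation rules would be the likely cause of the difference.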
sort test1.txt | join -a1 -a2 -t "\|" - test7.txt
323|1|B1
36|2|C2
40|4
406|3|B3
40|B4
587|5|C5
Inconsistent sorting when combined with 'join' provides incorrect matches and duplication of records. This is a mess.
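`join` expects both inputs to be sorted on the join field, using the same collation that `join` itself runs under. A sketch of one way to arrange that follows. It assumes that a `-k1,1` key with `-t '|'`, and `LC_ALL=C` on every command, is acceptable for this data. The scratch directory and the `left.txt`/`right.txt` names are hypothetical.

```shell
# Recreate the two sample files in a scratch directory (hypothetical location).
dir=$(mktemp -d)
printf '%s\n' '323|1' '36|2' '406|3' '40|4' '587|5' > "$dir/test1.txt"
printf '%s\n' '323|B1' '36|C2' '406|B3' '40|B4' '587|C5' > "$dir/test7.txt"

# Sort BOTH inputs on the join field only, under the same locale as join.
LC_ALL=C sort -t '|' -k1,1 "$dir/test1.txt" > "$dir/left.txt"
LC_ALL=C sort -t '|' -k1,1 "$dir/test7.txt" > "$dir/right.txt"

# Full outer join on field 1 (the default join field).
joined=$(LC_ALL=C join -a1 -a2 -t '|' "$dir/left.txt" "$dir/right.txt")
printf '%s\n' "$joined"
```

With both sides ordered the same way, every key should pair up once (for example `40|4|B4`), with no unmatched `40|4` and `40|B4` rows.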
sort test1.txt | sort -c
sort test7.txt | sort -c
Yet, sort -c says that it is sorted correctly.
sort test1.txt
323|1
36|2
40|4
406|3
587|5
sort test7.txt
323|B1
36|C2
406|B3
40|B4
587|C5
sort test1.txt | join -a1 -a2 -j1 -t "\|" -e "0" -o "1.1,1.2,2.2" - test7.txt
See COMMENTED Cygwin output.
# $ sort test1.txt
# 323|1
# 36|2
# 406|3
# 40|4
# 587|5
# $ sort test7.txt
# 323|B1
# 36|C2
# 406|B3
# 40|B4
# 587|C5
# $ sort test1.txt | join -a1 -a2 -j1 -t "|" -e "0" -o "1.1,1.2,2.2" - test7.txt
# |B1|1
# |C22
# |B3|3
# |B44
# |C5|5
And finally, Cygwin does this sort consistently across all three examples (but it does mess up the 'join'). ????? Sucks to be me with a defective Cygwin and an unreliable sort and work to get done. Any advice?
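The overwritten-looking Cygwin lines (`|B1|1`, `|C22`) are what a terminal would show if each field ended in a carriage return, so one guess is that the Cygwin copies of the files have CRLF line endings. That is an assumption, not something the output proves. A sketch of stripping the carriage returns before sorting and joining follows. The CRLF files are simulated here, and the scratch directory and the `left.txt`/`right.txt` names are hypothetical.

```shell
# Simulate copies with CRLF line endings (an ASSUMPTION about the Cygwin files).
dir=$(mktemp -d)
printf '%s\r\n' '323|1' '36|2' '406|3' '40|4' '587|5' > "$dir/test1.txt"
printf '%s\r\n' '323|B1' '36|C2' '406|B3' '40|B4' '587|C5' > "$dir/test7.txt"

# Drop the carriage returns, then sort each side on field 1 only.
tr -d '\r' < "$dir/test1.txt" | LC_ALL=C sort -t '|' -k1,1 > "$dir/left.txt"
tr -d '\r' < "$dir/test7.txt" | LC_ALL=C sort -t '|' -k1,1 > "$dir/right.txt"

# Same output format as the original command; field 1 is the default join field.
fixed=$(LC_ALL=C join -a1 -a2 -t '|' -e '0' -o '1.1,1.2,2.2' "$dir/left.txt" "$dir/right.txt")
printf '%s\n' "$fixed"
```

If the guess is right, the joined lines should then print in full, for example `323|1|B1`, instead of appearing to overwrite themselves.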
randall lewis
research scientist
ralewis <at> yahoo-inc.com
mobile 617-671-8294
4401 great america parkway, santa clara, ca, 95054, us
[SortBug.sh (application/octet-stream, attachment)]
[test7.txt (text/plain, attachment)]
[test1.txt (text/plain, attachment)]
This bug report was last modified 14 years and 126 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.