GNU bug report logs -
#76290
"sort -u" vs "sort -h -u": possible bug
On 2/18/25 7:45 PM, Rupert Gallagher via GNU coreutils Bug Reports wrote:
> By comparison, human (-h) and numeric (-n) sort cause data loss:
Not really. That's the difference between
a)
"I have a list containing numbers; I merely care about numbers and want to get a unique, sorted list of them."
('sort -h -u')
and
b)
"I have a list containing numbers; I want to have it sorted by numbers, and then throw away duplicates."
('sort -h | uniq')
The point is: in case a), the numerical value of each non-number entry is zero, so all such entries compare equal to 0 and to each other.
Consider the following:
$ printf "%s\n" 0 1 X-1 Ab2 3 ma | LC_ALL=C sort -nu
0
1
3
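Here, the entries 0, "X-1", "Ab2" and "ma" all have the numerical value 0.
With -u, only the first entry of each group of equal values is kept, which is why the literal 0 is output and the other three are dropped.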
Now let's remove the literal/numerical 0 from the input:
$ printf "%s\n" 1 X-1 Ab2 3 ma | LC_ALL=C sort -nu
X-1
1
3
Now, the first entry which represents numerically 0 is "X-1".
Now let's even put the 0 back into the input, but at the end:
$ printf "%s\n" 1 X-1 Ab2 3 ma 0 | LC_ALL=C sort -nu
X-1
1
3
Still, sort(1) outputs the first entry which has a numerical value of Zero: "X-1".
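For comparison, here is a minimal sketch of case b) run next to case a) on the same input as the first example. The helper name `input` and the variables `by_key` and `by_line` are only illustrative and are not part of sort(1) or uniq(1).

```shell
#!/bin/sh
# Same sample data as in the first example above.
input() { printf '%s\n' 0 1 X-1 Ab2 3 ma; }

# Case a): uniqueness is decided by the numeric key, so all
# non-number lines fall into the same group as the literal 0.
by_key=$(input | LC_ALL=C sort -n -u)

# Case b): sort numerically first, then let uniq(1) drop only
# adjacent lines that are identical as whole lines.
by_line=$(input | LC_ALL=C sort -n | uniq)

printf 'sort -n -u:\n%s\n\n' "$by_key"
printf 'sort -n | uniq:\n%s\n' "$by_line"
```

With this input, the second pipeline keeps all six lines, because no two of them are identical as text. The lines that tie on the numeric value 0 are then ordered by sort's last-resort whole-line comparison, which is not applied when -u is given.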
Have a nice day,
Berny
This bug report was last modified 142 days ago.