GNU bug report logs -
#76290
"sort -u" vs "sort -h -u": possible bug
Message #40 received at submit <at> debbugs.gnu.org (full text, mbox):
On 19.02.25 18:14, Bernhard Voelker wrote:
> On 2/18/25 7:45 PM, Rupert Gallagher via GNU coreutils Bug Reports wrote:
>> By comparison, human (-h) and numeric (-n) sort cause data loss:
>
> not really. That's the difference between
>
> a) "I have a list containing numbers; I merely care about numbers and
>    want to get a unique, sorted list of them."
>    ('sort -h -u')
>
> and
>
> b) "I have a list containing numbers; I want to have it sorted by
>    numbers, and then throw away duplicates."
>    ('sort -h | uniq')
>
> The point is: in case a), the numerical value of each non-number entry
> is Zero.

I have no issue with the way 'sort -u' is currently working, but the man
page isn't clear at all about the fact that 'sort -h -u' and 'sort -h |
uniq' behave differently.
Specifically, the explanation for -u

    -u, --unique
           with -c, check for strict ordering; without -c, output
           only the first of an equal run

does not explain what 'equal' or 'run' means. Maybe add something like
"where equality is assessed based only on the keys and rules used to
sort the output".
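
To illustrate the difference, here is a minimal sketch. The sample input is made up for this example, and it assumes GNU sort (for -h) running in the C locale:

```shell
# Hypothetical sample input: two non-numeric lines, a duplicated size,
# and one larger size.
input=$(printf 'apple\nbanana\n1K\n1K\n2M')

# Case a): uniqueness is judged by the -h key alone.
# 'apple' and 'banana' both have the numeric value 0, so they count as
# equal and only the first of them is kept.
by_key=$(printf '%s\n' "$input" | LC_ALL=C sort -h -u)

# Case b): sort by the -h key, then let uniq drop only adjacent lines
# that are identical as text.
by_line=$(printf '%s\n' "$input" | LC_ALL=C sort -h | uniq)

printf 'sort -h -u:\n%s\n\n' "$by_key"
printf 'sort -h | uniq:\n%s\n' "$by_line"
```

With this input, 'sort -h -u' should print apple, 1K, 2M, while 'sort -h | uniq' should print apple, banana, 1K, 2M. The line 'banana' disappears in the first case only because its key compares equal to that of 'apple'.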
Rainer
This bug report was last modified 92 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.