GNU bug report logs - #76290
"sort -u" vs "sort -h -u": possible bug


Package: coreutils;

Reported by: Rupert Gallagher <ruga <at> protonmail.com>

Date: Fri, 14 Feb 2025 17:01:01 UTC

Severity: normal

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.



Message #40 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Rainer Canavan <coreutils <at> canavan.de>
To: bug-coreutils <at> gnu.org
Subject: Re: bug#76290: "sort -u" vs "sort -h -u": possible bug
Date: Wed, 19 Feb 2025 20:15:17 +0100
On 19.02.25 18:14, Bernhard Voelker wrote:
> On 2/18/25 7:45 PM, Rupert Gallagher via GNU coreutils Bug Reports wrote:
>> By comparison, human (-h) and numeric (-n) sort cause data loss:
>
> not really.  That's the difference between
> a)
>   "I have a list containing numbers; I merely care about numbers and
>   want to get a unique, sorted list of them."
>   ('sort -h -u')
>
> and
> b)
>   "I have a list containing numbers; I want to have it sorted by
>   numbers, and then throw away duplicates."
>   ('sort -h | uniq')
>
> The point is: in case a), the numerical value of each non-number entry
> is Zero.


I have no issue with the way 'sort -u' currently works, but the man
page isn't clear at all about the fact that 'sort -h -u' and
'sort -h | uniq' behave differently.

Specifically, the explanation for -u

-u, --unique
             with -c, check for strict ordering; without -c, output only the first of an equal run

does not explain what 'equal' or 'run' means. Maybe add something like
"where equality is assessed based only on the keys and rules used to
sort the output".
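
To make the difference concrete, here is a minimal sketch that counts the lines each pipeline keeps. It assumes GNU sort (for -h) and uses made-up sample input (three non-numeric lines and one number); the variable names are purely illustrative.

```shell
# Hypothetical sample input: three non-numeric lines and one number.
input=$(printf '%s\n' b a 1 c)

# Case a): with -u, equality is judged only by the -h comparison,
# so the non-numeric lines (all valued zero) are treated as equal.
unique_count=$(printf '%s\n' "$input" | sort -h -u | wc -l)

# Case b): sort -h alone still orders equal-valued lines by a final
# whole-line comparison, and uniq then compares whole lines.
piped_count=$(printf '%s\n' "$input" | sort -h | uniq | wc -l)

echo "sort -h -u:     $unique_count lines"
echo "sort -h | uniq: $piped_count lines"
```

With GNU sort I would expect the first pipeline to keep 2 lines (one of the non-numeric lines plus "1") and the second to keep all 4, since no two input lines are textually identical.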


Rainer





This bug report was last modified 92 days ago.


