GNU bug report logs - #76290
"sort -u" vs "sort -h -u": possible bug

Package: coreutils;

Reported by: Rupert Gallagher <ruga <at> protonmail.com>

Date: Fri, 14 Feb 2025 17:01:01 UTC

Severity: normal

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.


Message #37 received at 76290-done <at> debbugs.gnu.org (full text, mbox):

From: Bernhard Voelker <mail <at> bernhard-voelker.de>
To: Rupert Gallagher <ruga <at> protonmail.com>,
 "eggert <at> cs.ucla.edu" <eggert <at> cs.ucla.edu>
Cc: "76290-done <at> debbugs.gnu.org" <76290-done <at> debbugs.gnu.org>
Subject: Re: bug#76290: "sort -u" vs "sort -h -u": possible bug
Date: Wed, 19 Feb 2025 18:14:13 +0100

On 2/18/25 7:45 PM, Rupert Gallagher via GNU coreutils Bug Reports wrote:
> By comparison, human (-h) and numeric (-n) sort cause data loss:

Not really.  That's the difference between
a)
  "I have a list containing numbers; I merely care about numbers and want to get a unique, sorted list of them."
  ('sort -h -u')

and
b)
  "I have a list containing numbers; I want to have it sorted by numbers, and then throw away duplicates."
  ('sort -h | uniq')

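The point is: in case a), the numerical value of each non-number entry is zero,
so all such entries compare equal to one another (and to a literal 0), and -u
keeps only the first of them.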

Consider the following:

  $ printf "%s\n" 0 1 X-1 Ab2 3 ma | LC_ALL=C sort -nu
  0
  1
  3

Here, the entries 0, "X-1", "Ab2" and "ma" all have the numerical value 0.
That's why only the first of them, the literal 0, is output.

Now let's remove the literal/numerical 0 from the input:

  $ printf "%s\n"  1 X-1 Ab2 3 ma | LC_ALL=C sort -nu
  X-1
  1
  3

Now the first entry with the numerical value 0 is "X-1".
Let's even put the 0 back into the input, but at the end:

  $ printf "%s\n"  1 X-1 Ab2 3 ma 0 | LC_ALL=C sort -nu
  X-1
  1
  3

Still, sort(1) outputs the first entry which has a numerical value of Zero: "X-1".
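
To see cases a) and b) side by side, here is a small sketch. It assumes GNU
sort and uniq; the function names unique_by_number and unique_by_line are made
up for this illustration and are not part of coreutils.

```shell
# Case a): uniqueness is decided by the numeric key alone, so every
# entry whose numerical value is 0 collapses into a single line.
# (hypothetical helper name)
unique_by_number() { LC_ALL=C sort -n -u; }

# Case b): sort numerically first, then let uniq drop only adjacent
# lines that are identical as whole lines.
# (hypothetical helper name)
unique_by_line() { LC_ALL=C sort -n | LC_ALL=C uniq; }

echo "a) sort -n -u:"
printf '%s\n' 1 X-1 Ab2 3 ma 0 | unique_by_number

echo "b) sort -n | uniq:"
printf '%s\n' 1 X-1 Ab2 3 ma 0 | unique_by_line
```

With the input from the last example, a) should print three lines as shown
above, while b) should print all six, because no two input lines are identical.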

Have a nice day,
Berny






