GNU bug report logs - #35939
version sort is incorrect with hyphen-minus

Previous Next

Package: coreutils;

Reported by: Vincent Lefevre <vincent <at> vinc17.net>

Date: Tue, 28 May 2019 00:55:01 UTC

Severity: normal

Full log


View this message in rfc822 format

From: Ian Jackson <ijackson <at> chiark.greenend.org.uk>
To: Vincent Lefevre <vincent <at> vinc17.net>
Cc: Assaf Gordon <assafgordon <at> gmail.com>, Paul Eggert <eggert <at> cs.ucla.edu>, 35939 <at> debbugs.gnu.org
Subject: bug#35939: version sort is incorrect with hyphen-minus
Date: Thu, 27 Jun 2019 11:25:01 +0100
Vincent Lefevre writes ("Re: bug#35939: version sort is incorrect with hyphen-minus"):
> On 2019-06-26 18:40:50 -0700, Paul Eggert wrote:
> > Perhaps the coreutils manual could be improved to make this all clearer, and
> > perhaps it should refer to the Debian manual if it doesn't already.
> 
> In this case, there should be a new ordering option to provide
> true numeric sort with strings mixing non-negative integers and
> characters.

I think the Debian algorithm is such an algorithm, but it has a
wrinkle which you are not expecting.  Here is the specification:
  https://www.debian.org/doc/debian-policy/ch-controlfields.html#version

Note in particular
  | The lexical comparison is a comparison of ASCII values modified so
  | that all the letters sort earlier than all the non-letters and so
  | that a tilde sorts before anything, even the end of a part

So in the Debian algorithm, `-' sorts after `a'.  I specified this
rule.  I did it mainly because of versions like `1.0beta3', which is
is probably a prerelease of `1.0' and therefore earlier than `1.0.3'.
So `b' has to sort before `.' and my rule seemed the simplest one to
achieve that.  (The version comparison algorithm is a tradeoff between
complexity, and breadth of support for people's then-existing
practices.)  Nowadays Debian invariably writes `1.0~beta3' but when I
invented this scheme I did not include the (invaluable) `~' feature.

When this is extended to UTF-8, presumably the ordering should be an
ordering of unicode scalar values, with the rule about letters
interpreted as referring to anything which Unicode considers a letter.

If you want to test the Debian algorithm and have access to a copy of
dpkg, you can append -1 to both strings to be the "Debian revision",
and prepend "1:" to be the "epoch", and then the middle part should be
compared the same way as sort -V etc.

Vincent, what is your use case for a comparison algorithm which is
like the Debian one but which sorts letters after punctuation ?

Ian.

-- 
Ian Jackson <ijackson <at> chiark.greenend.org.uk>   These opinions are my own.

If I emailed you from an address @fyvzl.net or @evade.org.uk, that is
a private address which bypasses my fierce spamfilter.




This bug report was last modified 5 years and 351 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.