GNU bug report logs - #35939
version sort is incorrect with hyphen-minus

Previous Next

Package: coreutils;

Reported by: Vincent Lefevre <vincent <at> vinc17.net>

Date: Tue, 28 May 2019 00:55:01 UTC

Severity: normal

Full log


Message #8 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Assaf Gordon <assafgordon <at> gmail.com>
To: Vincent Lefevre <vincent <at> vinc17.net>, bug-coreutils <at> gnu.org
Cc: Ian Jackson <ijackson <at> chiark.greenend.org.uk>
Subject: Re: bug#35939: version sort is incorrect with hyphen-minus
Date: Wed, 26 Jun 2019 12:25:26 -0600
(Adding Ian Jackson for dpkg/debian-version details)

Hello,

On Tue, May 28, 2019 at 02:53:39AM +0200, Vincent Lefevre wrote:
> With GNU coreutils 8.30 under Debian/unstable, I get:
> 
> $ LC_ALL=C ls
> ab-cd  abb  abe
> $ LC_ALL=C ls -v
> abb  abe  ab-cd
> 
> The hyphen-minus character should still be regarded as being less
> than the letters (there are no digits, so both are expected to be
> equivalent). The GNU coreutils manual says:
> 
[...]

Thanks for the report and the clear details.

To summarize,
"ls -v" and "sort -V" (coreutils' version sort) behaves differently than
other implementations in regards to minus character:

    $ printf "%s\n" abb ab-cd | sort -V
    abb
    ab-cd

    $ v1="abb"
    $ v2="ab-cd"
    $ dpkg --compare-versions "$v1" lt "$v2" && printf "$v1\n$v2\n" || printf "$v2\n$v1\n"
    ab-cd
    abb

If I understand correctly,
The reason is that in Debian's version comparison algorithm [1], the minus
character has a special meaning: it separates the "upstream version"
part from the "debian revision" part.

In Debian's implementation [2], a version string is first split into three
parts (epoch, upstream version, debian revision) using ":" for epoch
delimiter and "-" for revision delimiter. Only then the three parts are
compared, separately [3].

[1] https://www.debian.org/doc/debian-policy/ch-controlfields.html#version
[2] https://git.dpkg.org/cgit/dpkg/dpkg.git/tree/lib/dpkg/parsehelp.c#n191
[3] https://git.dpkg.org/cgit/dpkg/dpkg.git/tree/lib/dpkg/version.c#n140

On ther other hand, coreutils' implementation (from gnulib [4]) does not
break version string into three parts - it treats the entire string as a
single "upstream version" part.
The rules for sorting the "upstream version" string say:

  "... The lexical comparison is a comparison of ASCII values modified so
  that all the letters sort earlier than all the non-letters and so that a
  tilde sorts before anything" (from [1])

[4] https://git.savannah.gnu.org/cgit/gnulib.git/tree/lib/filevercmp.c

Therefore, dpkg first seprates "ab" from "cd", then compares "ab" to
"abb" - and 'ab' comes first;
Coreutils compare "ab-cd" to "abb" (or technically, just "ab-" to
"abb"), and because "letters sort earlier than all non-letters", "abb"
comes first.

I hope this helps explain the differences (I also hope this explanation is
correct, and I invite others to chime in).


regards,
 - assaf





This bug report was last modified 5 years and 351 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.