GNU bug report logs - #41563
Possible bug with 'sort -Vr' version sorting

Package: coreutils;

Reported by: Danie de Jager <danie.dejager <at> striata.com>

Date: Wed, 27 May 2020 15:04:02 UTC

Severity: normal

To reply to this bug, email your comments to 41563 AT debbugs.gnu.org.


Report forwarded to bug-coreutils <at> gnu.org:
bug#41563; Package coreutils. (Wed, 27 May 2020 15:04:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Danie de Jager <danie.dejager <at> striata.com>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Wed, 27 May 2020 15:04:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Danie de Jager <danie.dejager <at> striata.com>
To: bug-coreutils <at> gnu.org
Subject: Possible bug with 'sort -Vr' version sorting
Date: Wed, 27 May 2020 14:07:32 +0200
Hi,

I use sort -Vr to sort version numbers. I noticed this discrepancy on
the latest kernel version from CentOS 7.8.

command to get output:
# ls -t /boot/vmlinuz-* | sed "s/\/boot\/vmlinuz-//g" | grep -v rescue | sort -Vr
3.10.0-1127.el7.x86_64
3.10.0-1127.8.2.el7.x86_64
3.10.0-1062.18.1.el7.x86_64

I'd expect the middle value to be the highest version number. Is this
by design or a bug? If it is a bug please let me know if I must log it
somewhere.
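
The same ordering can be reproduced without access to /boot by feeding the three strings to sort directly. This is a minimal sketch, assuming GNU sort is installed; kernels is a made-up helper name, and LC_ALL=C is only there to pin the locale:

```shell
# Print the three kernel release strings from the report, one per line.
# "kernels" is a hypothetical helper, not an existing command.
kernels() {
    printf '%s\n' \
        3.10.0-1062.18.1.el7.x86_64 \
        3.10.0-1127.el7.x86_64 \
        3.10.0-1127.8.2.el7.x86_64
}

# Reverse version sort, as in the report.
kernels | LC_ALL=C sort -Vr
```

With GNU sort this should list 3.10.0-1127.el7.x86_64 ahead of 3.10.0-1127.8.2.el7.x86_64, which is the order reported above.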

Version details:
# sort --version
sort (GNU coreutils) 8.22
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Mike Haertel and Paul Eggert.

Regards,
Danie de Jager




Information forwarded to bug-coreutils <at> gnu.org:
bug#41563; Package coreutils. (Wed, 27 May 2020 15:24:02 GMT) Full text and rfc822 format available.

Message #8 received at 41563 <at> debbugs.gnu.org (full text, mbox):

From: Erik Auerswald <auerswal <at> unix-ag.uni-kl.de>
To: Danie de Jager <danie.dejager <at> striata.com>
Cc: 41563 <at> debbugs.gnu.org
Subject: Re: bug#41563: Possible bug with 'sort -Vr' version sorting
Date: Wed, 27 May 2020 17:23:27 +0200
Hi,

On Wed, May 27, 2020 at 02:07:32PM +0200, Danie de Jager via GNU coreutils Bug Reports wrote:
> I use sort -Vr to sort version numbers. I noticed this discrepancy on
> the latest kernel version from CentOS 7.8.
> 
> command to get output:
> # ls -t /boot/vmlinuz-* | sed "s/\/boot\/vmlinuz-//g" | grep -v rescue | sort -Vr
> 3.10.0-1127.el7.x86_64
> 3.10.0-1127.8.2.el7.x86_64
> 3.10.0-1062.18.1.el7.x86_64
> 
> I'd expect the middle value to be the highest version number. Is this
> by design or a bug? If it is a bug please let me know if I must log it
> somewhere.

I'd say this is by design:

Sorting compares runs of non-digits, then runs of digits.  Thus each
"dot" (.) terminates a run of digits.  The "problem" is an unbalanced
number of digit and non-digit runs in the version numbers.

See the following two manual sections:
http://www.gnu.org/software/coreutils/manual/coreutils.html#Version_002dsort-ordering-rules
http://www.gnu.org/software/coreutils/manual/coreutils.html#Punctuation-Characters

The "version sort" is based on Debian's version comparison, but differs from it.
It seems as if Red Hat version numbers follow different rules.
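
To see the unbalanced runs, each string can be split into its alternating digit and non-digit runs. This is only an illustration of the idea, not the code sort uses; runs is a hypothetical helper built on grep -o:

```shell
# Split a string into alternating runs of digits and non-digits,
# one run per line. "runs" is a hypothetical helper, not part of coreutils.
runs() {
    printf '%s\n' "$1" | grep -oE '[0-9]+|[^0-9]+'
}

# Show the runs of both strings side by side, separated by spaces.
runs 3.10.0-1127.el7.x86_64     | paste -sd' ' -
runs 3.10.0-1127.8.2.el7.x86_64 | paste -sd' ' -
```

Both strings agree up to the run "1127". The next non-digit run is ".el" in the first string but only "." in the second, so the comparison is decided there, before the 8.2 is ever compared as a number.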

HTH,
Erik
-- 
Be water, my friend.
                        -- Bruce Lee




Information forwarded to bug-coreutils <at> gnu.org:
bug#41563; Package coreutils. (Thu, 28 May 2020 06:49:01 GMT) Full text and rfc822 format available.

Message #11 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Kamil Dudka <kdudka <at> redhat.com>
To: Danie de Jager <danie.dejager <at> striata.com>
Cc: bug-coreutils <at> gnu.org, 41563 <at> debbugs.gnu.org
Subject: Re: bug#41563: Possible bug with 'sort -Vr' version sorting
Date: Thu, 28 May 2020 08:48:16 +0200
On Wednesday, May 27, 2020 2:07:32 PM CEST Danie de Jager via GNU coreutils 
Bug Reports wrote:
> Hi,
> 
> I use sort -Vr to sort version numbers. I noticed this discrepancy on
> the latest kernel version from CentOS 7.8.
> 
> command to get output:
> # ls -t /boot/vmlinuz-* | sed "s/\/boot\/vmlinuz-//g" | grep -v rescue | sort -Vr
> 3.10.0-1127.el7.x86_64
> 3.10.0-1127.8.2.el7.x86_64
> 3.10.0-1062.18.1.el7.x86_64

It is the underscore in the .x86_64 suffix that breaks the version compare
algorithm.  If you replace the underscore with an alphabetic character, it
sorts as you expect:

# ls -t /boot/vmlinuz-* | sed "s/\/boot\/vmlinuz-//g" | grep -v rescue | \
    sed 's/x86_64/x86X64/' | sort -Vr | sed 's/x86X64/x86_64/'

3.10.0-1127.8.2.el7.x86_64
3.10.0-1127.el7.x86_64
3.10.0-1062.18.1.el7.x86_64
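
The substitution can be wrapped in a small filter so that it does not have to be retyped. This is a sketch of the same workaround, assuming GNU sort and input lines that end in x86_64; sort_kernels is a made-up name, and the X placeholder is arbitrary:

```shell
# Sort RPM-style kernel release strings from stdin, newest first.
# "sort_kernels" is a hypothetical name. The underscore is swapped for
# an X before sorting and restored afterwards, as in the pipeline above.
sort_kernels() {
    sed 's/x86_64$/x86X64/' | LC_ALL=C sort -Vr | sed 's/x86X64$/x86_64/'
}

printf '%s\n' \
    3.10.0-1062.18.1.el7.x86_64 \
    3.10.0-1127.el7.x86_64 \
    3.10.0-1127.8.2.el7.x86_64 | sort_kernels
```

Anchoring both patterns with $ keeps the swap from touching anything other than the trailing architecture tag.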

Kamil







Information forwarded to bug-coreutils <at> gnu.org:
bug#41563; Package coreutils. (Thu, 28 May 2020 06:49:02 GMT) Full text and rfc822 format available.

Information forwarded to bug-coreutils <at> gnu.org:
bug#41563; Package coreutils. (Thu, 28 May 2020 08:17:02 GMT) Full text and rfc822 format available.

Message #17 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Danie de Jager <danie.dejager <at> striata.com>
To: Kamil Dudka <kdudka <at> redhat.com>
Cc: bug-coreutils <at> gnu.org, 41563 <at> debbugs.gnu.org
Subject: Re: bug#41563: Possible bug with 'sort -Vr' version sorting
Date: Thu, 28 May 2020 09:11:15 +0200
Thank you for your response! I'll use it accordingly.





Information forwarded to bug-coreutils <at> gnu.org:
bug#41563; Package coreutils. (Thu, 28 May 2020 09:14:01 GMT) Full text and rfc822 format available.

Message #23 received at 41563 <at> debbugs.gnu.org (full text, mbox):

From: Erik Auerswald <auerswal <at> unix-ag.uni-kl.de>
To: Kamil Dudka <kdudka <at> redhat.com>
Cc: Danie de Jager <danie.dejager <at> striata.com>, 41563 <at> debbugs.gnu.org
Subject: Re: bug#41563: Possible bug with 'sort -Vr' version sorting
Date: Thu, 28 May 2020 11:02:43 +0200
Hi,

On Thu, May 28, 2020 at 08:48:16AM +0200, Kamil Dudka wrote:
> On Wednesday, May 27, 2020 2:07:32 PM CEST Danie de Jager via GNU coreutils 
> Bug Reports wrote:
> > 
> > I use sort -Vr to sort version numbers. I noticed this discrepancy on
> > the latest kernel version from CentOS 7.8.
> > 
> > command to get output:
> > # ls -t /boot/vmlinuz-* | sed "s/\/boot\/vmlinuz-//g" | grep -v rescue | sort -Vr
> > 
> > 3.10.0-1127.el7.x86_64
> > 3.10.0-1127.8.2.el7.x86_64
> > 3.10.0-1062.18.1.el7.x86_64
> 
> It is the underscore in the .x86_64 suffix that breaks the version compare
> algorithm.  If you replace the underscore with an alphabetic character, it
> sorts as you expect:
> 
> # ls -t /boot/vmlinuz-* | sed "s/\/boot\/vmlinuz-//g" | grep -v rescue | \
>     sed 's/x86_64/x86X64/' | sort -Vr | sed 's/x86X64/x86_64/'
> 
> 3.10.0-1127.8.2.el7.x86_64
> 3.10.0-1127.el7.x86_64
> 3.10.0-1062.18.1.el7.x86_64

That is interesting.  The underscore can be replaced by a digit or even
removed as well.  Replacing it with a dot (.)  does not help.
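
The variants are easy to check in one loop. This is a sketch assuming GNU sort; newest is a hypothetical helper that prints whichever of the two 1127 strings sort -Vr ranks first for a given architecture tag:

```shell
# Print the string that sort -Vr puts first for a given architecture tag.
# "newest" is a hypothetical helper; $1 replaces the x86_64 part.
newest() {
    printf '%s\n' "3.10.0-1127.el7.$1" "3.10.0-1127.8.2.el7.$1" |
        LC_ALL=C sort -Vr | head -n 1
}

# Original tag, then underscore replaced by a letter, a digit,
# nothing at all, and a dot.
for arch in x86_64 x86X64 x86164 x8664 x86.64; do
    printf '%-8s %s\n' "$arch" "$(newest "$arch")"
done
```

Only the x86_64 and x86.64 rows should show the plain el7 string on top; the other three should show the 1127.8.2 string.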

This differs from Debian's "dpkg --compare-versions", where the results
of the comparison do not change by replacing the underscore with a
digit or character, or by removing it (the underscore is identified as
problematic, though):

    $ dpkg --compare-versions 3.10.0-1127.8.2.el7.x86_64 lt 3.10.0-1127.el7.x86_64 && echo less
    dpkg: warning: version '3.10.0-1127.8.2.el7.x86_64' has bad syntax: invalid character in revision number
    dpkg: warning: version '3.10.0-1127.el7.x86_64' has bad syntax: invalid character in revision number
    less
    $ dpkg --compare-versions 3.10.0-1127.8.2.el7.x86.64 lt 3.10.0-1127.el7.x86.64 && echo less
    less
    $ dpkg --compare-versions 3.10.0-1127.8.2.el7.x86X64 lt 3.10.0-1127.el7.x86X64 && echo less
    less
    $ dpkg --compare-versions 3.10.0-1127.8.2.el7.x86164 lt 3.10.0-1127.el7.x86164 && echo less
    less
    $ dpkg --compare-versions 3.10.0-1127.8.2.el7.x8664 lt 3.10.0-1127.el7.x8664 && echo less
    less

The way I read the GNU Coreutils documentation, removing the underscore
should not affect the version sort comparison result.

Thanks,
Erik
-- 
There is no remedy for anything in life.
                        -- Ernest Hemingway




Information forwarded to bug-coreutils <at> gnu.org:
bug#41563; Package coreutils. (Thu, 28 May 2020 11:02:01 GMT) Full text and rfc822 format available.

Message #26 received at 41563 <at> debbugs.gnu.org (full text, mbox):

From: Kamil Dudka <kdudka <at> redhat.com>
To: Erik Auerswald <auerswal <at> unix-ag.uni-kl.de>
Cc: Danie de Jager <danie.dejager <at> striata.com>, 41563 <at> debbugs.gnu.org
Subject: Re: bug#41563: Possible bug with 'sort -Vr' version sorting
Date: Thu, 28 May 2020 13:01:05 +0200
On Thursday, May 28, 2020 11:02:43 AM CEST Erik Auerswald wrote:
> On Thu, May 28, 2020 at 08:48:16AM +0200, Kamil Dudka wrote:
> > It is the underscore in the .x86_64 suffix that breaks the version compare
> > algorithm.  If you replace the underscore with an alphabetic character, it
> > sorts as you expect:
> > 
> > # ls -t /boot/vmlinuz-* | sed "s/\/boot\/vmlinuz-//g" | grep -v rescue | \
> >     sed 's/x86_64/x86X64/' | sort -Vr | sed 's/x86X64/x86_64/'
> > 
> > 3.10.0-1127.8.2.el7.x86_64
> > 3.10.0-1127.el7.x86_64
> > 3.10.0-1062.18.1.el7.x86_64
> 
> That is interesting.  The underscore can be replaced by a digit or even
> removed as well.  Replacing it with a dot (.)  does not help.

If there is no underscore, the .el7.x86X64 suffix is recognized as file
extension.  See the corresponding documentation:

https://www.gnu.org/software/coreutils/manual/html_node/Special-handling-of-file-extensions.html

> This differs from Debian's "dpkg --compare-versions", where the results
> of the comparison do not change by replacing the underscore with a
> digit or character, or by removing it (the underscore is identified as
> problematic, though):

The problem is that `dpkg --compare-versions` expects version numbers only.
It does not work well if you feed it with file names including extensions:

$ dpkg --compare-versions 3.10.0-1127.8.2 '>>' 3.10.0-1127 && echo '>>' || echo '<='
>>
$ dpkg --compare-versions 3.10.0-1127.8.2.bz2 '>>' 3.10.0-1127.bz2 && echo '>>' || echo '<='
<=

>     $ dpkg --compare-versions 3.10.0-1127.8.2.el7.x86_64 lt 3.10.0-1127.el7.x86_64 && echo less
>     dpkg: warning: version '3.10.0-1127.8.2.el7.x86_64' has bad syntax: invalid character in revision number
>     dpkg: warning: version '3.10.0-1127.el7.x86_64' has bad syntax: invalid character in revision number
>     less
>     $ dpkg --compare-versions 3.10.0-1127.8.2.el7.x86.64 lt 3.10.0-1127.el7.x86.64 && echo less
>     less
>     $ dpkg --compare-versions 3.10.0-1127.8.2.el7.x86X64 lt 3.10.0-1127.el7.x86X64 && echo less
>     less
>     $ dpkg --compare-versions 3.10.0-1127.8.2.el7.x86164 lt 3.10.0-1127.el7.x86164 && echo less
>     less
>     $ dpkg --compare-versions 3.10.0-1127.8.2.el7.x8664 lt 3.10.0-1127.el7.x8664 && echo less
>     less
> 
> The way I read the GNU Coreutils documentation, removing the underscore
> should not affect the version sort comparison result.

Not really.  See the link above to the documentation that covers this part.

Kamil

> Thanks,
> Erik






Information forwarded to bug-coreutils <at> gnu.org:
bug#41563; Package coreutils. (Thu, 28 May 2020 12:05:02 GMT) Full text and rfc822 format available.

Message #29 received at 41563 <at> debbugs.gnu.org (full text, mbox):

From: Erik Auerswald <auerswal <at> unix-ag.uni-kl.de>
To: Kamil Dudka <kdudka <at> redhat.com>
Cc: Danie de Jager <danie.dejager <at> striata.com>, 41563 <at> debbugs.gnu.org
Subject: Re: bug#41563: Possible bug with 'sort -Vr' version sorting
Date: Thu, 28 May 2020 14:04:22 +0200
Hi,

On Thu, May 28, 2020 at 01:01:05PM +0200, Kamil Dudka wrote:
> On Thursday, May 28, 2020 11:02:43 AM CEST Erik Auerswald wrote:
> > On Thu, May 28, 2020 at 08:48:16AM +0200, Kamil Dudka wrote:
> > > It is the underscore in the .x86_64 suffix that breaks the version compare
> > > algorithm.  If you replace the underscore with an alphabetic character, it
> > > sorts as you expect:
> > > 
> > > # ls -t /boot/vmlinuz-* | sed "s/\/boot\/vmlinuz-//g" | grep -v rescue | \
> > >     sed 's/x86_64/x86X64/' | sort -Vr | sed 's/x86X64/x86_64/'
> > > 
> > > 3.10.0-1127.8.2.el7.x86_64
> > > 3.10.0-1127.el7.x86_64
> > > 3.10.0-1062.18.1.el7.x86_64
> > 
> > That is interesting.  The underscore can be replaced by a digit or even
> > removed as well.  Replacing it with a dot (.)  does not help.
> 
> If there is no underscore, the .el7.x86X64 suffix is recognized as file
> extension.  See the corresponding documentation:
> 
> https://www.gnu.org/software/coreutils/manual/html_node/Special-handling-of-file-extensions.html

Ah, el7.x86X64 or el7.x86164 is seen as an extension (i.e., a sequence
of suffixes), but el7.x86.64 or el7.x86_64 is not.  Since .8.2 does not
contain a letter, it is not seen as part of the extension.  Very subtle,
but documented.

Trivia: the usual 7-Zip extension .7z is not a suffix (file extension)
for this algorithm, according to the documented definition.
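
The documented definition can be approximated with a regular expression: a trailing sequence of groups, each a dot, then a letter or tilde, then any letters, digits or tildes. This is a sketch of one reading of the manual, not code taken from coreutils; strip_ext is a hypothetical helper that removes whatever that pattern matches:

```shell
# Remove what the documented rule would treat as a file extension.
# "strip_ext" is a hypothetical helper; the pattern is an assumption
# based on the manual's description of suffixes.
strip_ext() {
    printf '%s\n' "$1" |
        LC_ALL=C sed -E 's/(\.[A-Za-z~][A-Za-z0-9~]*)*$//'
}

strip_ext 3.10.0-1127.8.2.el7.x86X64   # 3.10.0-1127.8.2
strip_ext 3.10.0-1127.el7.x86_64       # unchanged: "_" ends the match
strip_ext 3.10.0-1127.el7.x86.64       # unchanged: ".64" starts with a digit
strip_ext archive.7z                   # unchanged: ".7z" starts with a digit
```

What is left after stripping is the part that gets compared first, which is why 3.10.0-1127.8.2 beats 3.10.0-1127 once the suffix qualifies as an extension.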

Thus, by changing the platform indicator to look like a file extension,
and relying on the distribution version information being interpreted
as a file extension as well, you create a file extension where initially
there was none.  This file extension is then ignored for the comparison,
unless that comparison results in equality.  This seems to be a useful
hack when working with Red Hat products.
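
The "ignored unless equal" behavior shows up on a small generic input as well. This is a sketch assuming GNU sort; the file names and the demo_input helper are invented for illustration:

```shell
# Emit a few made-up file names whose extensions would otherwise
# interfere with the version comparison.
demo_input() {
    printf '%s\n' \
        hello-8.2.txt \
        hello-8.0.12.tar.gz \
        hello-8.txt \
        hello-8.2 \
        hello-8
}

# Forward version sort: the extension only breaks ties between
# names whose remaining parts compare equal.
demo_input | LC_ALL=C sort -V
```

The expected order is hello-8, hello-8.txt, hello-8.0.12.tar.gz, hello-8.2, hello-8.2.txt: .txt and .tar.gz are set aside first, and they only decide the order between hello-8 and hello-8.txt, and between hello-8.2 and hello-8.2.txt.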

Fascinating. :-)

> > This differs from Debian's "dpkg --compare-versions", where the results
> > of the comparison do not change by replacing the underscore with a
> > digit or character, or by removing it (the underscore is identified as
> > problematic, though):
> 
> The problem is that `dpkg --compare-versions` expects version numbers only.
> It does not work well if you feed it with file names including extensions:

I did not, as you can see in the examples.  I gave version information
to dpkg, though not Debian version information.  So of course this is
illegal input and the GIGO principle applies.

> $ dpkg --compare-versions 3.10.0-1127.8.2 '>>' 3.10.0-1127 && echo '>>' || echo '<='
> >>
> $ dpkg --compare-versions 3.10.0-1127.8.2.bz2 '>>' 3.10.0-1127.bz2 && echo '>>' || echo '<='
> <=
> 
> >     $ dpkg --compare-versions 3.10.0-1127.8.2.el7.x86_64 lt 3.10.0-1127.el7.x86_64 && echo less
> >     dpkg: warning: version '3.10.0-1127.8.2.el7.x86_64' has bad syntax: invalid character in revision number
> >     dpkg: warning: version '3.10.0-1127.el7.x86_64' has bad syntax: invalid character in revision number
> >     less
> >     $ dpkg --compare-versions 3.10.0-1127.8.2.el7.x86.64 lt 3.10.0-1127.el7.x86.64 && echo less
> >     less
> >     $ dpkg --compare-versions 3.10.0-1127.8.2.el7.x86X64 lt 3.10.0-1127.el7.x86X64 && echo less
> >     less
> >     $ dpkg --compare-versions 3.10.0-1127.8.2.el7.x86164 lt 3.10.0-1127.el7.x86164 && echo less
> >     less
> >     $ dpkg --compare-versions 3.10.0-1127.8.2.el7.x8664 lt 3.10.0-1127.el7.x8664 && echo less
> >     less
> > 
> > The way I read the GNU Coreutils documentation, removing the underscore
> > should not affect the version sort comparison result.
> 
> Not really.  See the link above to the documentation that covers this part.

Yes, you are correct.  I find this quite surprising, and see it as another
example where --version-sort fails to deliver on the short form promise
of "natural sort."  I am well aware that the long form description shows
that the sorting order is not "natural," but rather strange IMHO.

    $ sort --help | grep -- --version-sort
      -V, --version-sort          natural sort of (version) numbers within text

But then I do not even understand what is "natural" about version numbers
anyway. ;-)

Thanks,
Erik
-- 
[M]ost parts of this industry just work by chance.
                        -- Thomas Gleixner




This bug report was last modified 5 years and 25 days ago.
