GNU bug report logs - #49239
Unexpected results with sort -V


Package: coreutils

Reported by: Michael <michael.debertol <at> gmail.com>

Date: Sun, 27 Jun 2021 06:37:01 UTC

Severity: normal

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.




From: help-debbugs <at> gnu.org (GNU bug Tracking System)
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: tracker <at> debbugs.gnu.org
Subject: bug#49239: closed (Unexpected results with sort -V)
Date: Sun, 13 Feb 2022 05:32:02 +0000
[Message part 1 (text/plain, inline)]
Your message dated Sat, 12 Feb 2022 21:31:33 -0800
with message-id <80ac3d45-b23f-7730-f9dc-e2c86136a29a <at> cs.ucla.edu>
and subject line Re: bug#49239: Unexpected results with sort -V
has caused the debbugs.gnu.org bug report #49239,
regarding Unexpected results with sort -V
to be marked as done.

(If you believe you have received this mail in error, please contact
help-debbugs <at> gnu.org.)


-- 
49239: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=49239
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
From: Michael <michael.debertol <at> gmail.com>
To: bug-coreutils <at> gnu.org
Subject: Unexpected results with sort -V
Date: Sun, 27 Jun 2021 00:04:53 +0200
[Message part 3 (text/plain, inline)]
Hi,
I found some unexpected results with sort -V. I hope this is the correct
place to send a bug report [1].
They are caused by a bug in filevercmp inside gnulib, specifically in the
function match_suffix.
I assume it should, as documented, match a file ending as defined by this
regex: /(\.[A-Za-z~][A-Za-z0-9~]*)*$/
However, I found two cases where this does not happen:
1) Two consecutive dots. It is not checked if the character after a dot is
a dot. This results in nothing being matched in a case like "a..a", even
though it should match ".a" according to the regex.
Testcase: printf "a..a\na.+" | sort -V # a..a should be before a.+ I think
2) A trailing dot. If there is no additional character after a dot, it is
still matched (e.g. for "a." the . is matched).
Testcase: printf "a.\na+" | sort -V # I think a+ should be before a.
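
For reference, the regex quoted above can be checked directly against the two inputs. This is a minimal Python sketch, not the gnulib code: the helper name `documented_suffix` is made up for illustration, and Python's `re` module stands in for the regex notation used in the documentation.

```python
import re

# The suffix pattern as quoted above; (?: ) only avoids creating a capture group.
SUFFIX_RE = re.compile(r"(?:\.[A-Za-z~][A-Za-z0-9~]*)*$")

def documented_suffix(name: str) -> str:
    """Return the part of `name` matched by the documented regex (hypothetical helper)."""
    # search() returns the leftmost match. The empty match at the end of the
    # string always exists, so the result is never None.
    return SUFFIX_RE.search(name).group(0)

print(repr(documented_suffix("a..a")))      # '.a'
print(repr(documented_suffix("a.")))        # ''
print(repr(documented_suffix("a.tar.gz")))  # '.tar.gz'
```

So, going by the regex alone, "a..a" has the suffix ".a" and "a." has no suffix at all, which is what cases 1) and 2) expect.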

Additionally, I noticed that filevercmp ignores all characters after a NUL
byte.
This can be seen here: printf "a\0a\na" | sort -Vs
sort otherwise seems to take NUL bytes into account (that's why the --stable
flag is necessary in the above example). Is this the expected behavior?
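
This is the behavior one would expect if the comparison function receives NUL-terminated C strings: everything from the first NUL onward is invisible to it. A rough Python sketch of that truncation follows. It illustrates C string semantics only, not the gnulib source, and `as_c_string` is a hypothetical helper.

```python
def as_c_string(line: bytes) -> bytes:
    """Return the bytes a NUL-terminated C string API would see (hypothetical helper)."""
    return line.split(b"\0", 1)[0]

lines = [b"a\0a", b"a"]

# Both lines look identical once truncated at the first NUL ...
print([as_c_string(line) for line in lines])  # [b'a', b'a']
# ... although the full byte strings differ.
print(lines[0] == lines[1])                   # False
```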

Finally I wanted to ask if it is the expected behavior for filevercmp to do
a strcmp if it can't find another difference, at least from the perspective
of sort.
This means that the --stable flag for sort has no effect in combination
with --version-sort (well, except if the input contains NUL bytes, as
mentioned above :)
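
To illustrate why a byte-wise fallback makes a stable sort unobservable, here is a toy Python sketch. The comparison is deliberately simplistic (it compares whole strings as integers, so "1" and "01" tie) and is not filevercmp; both function names are made up for this example.

```python
from functools import cmp_to_key

def toy_cmp(a: str, b: str) -> int:
    """Toy comparison: compare as integers, so "1" and "01" tie (not filevercmp)."""
    return (int(a) > int(b)) - (int(a) < int(b))

def toy_cmp_with_fallback(a: str, b: str) -> int:
    """Same, but break ties with a plain string comparison, like a strcmp fallback."""
    return toy_cmp(a, b) or (a > b) - (a < b)

for data in (["1", "01"], ["01", "1"]):
    kept_order = sorted(data, key=cmp_to_key(toy_cmp))            # ties keep input order
    forced_order = sorted(data, key=cmp_to_key(toy_cmp_with_fallback))
    print(data, "->", kept_order, forced_order)
```

Without the fallback, Python's stable `sorted` keeps tied items in input order. With it, the two inputs always come out as ['01', '1'], so there are no ties left for stability to preserve.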

I'll attach a rather simple patch to fix 1) and 2) (including test), I hope
that's right.

Have a nice day,
Michael

[1]:
https://www.gnu.org/software/coreutils/manual/html_node/Reporting-bugs-or-incorrect-results.html#Reporting-bugs-or-incorrect-results
[Message part 4 (text/html, inline)]
[diff.txt (text/plain, attachment)]
[Message part 6 (message/rfc822, inline)]
From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Kamil Dudka <kdudka <at> redhat.com>,
 Michael Debertol <michael.debertol <at> gmail.com>
Cc: 49239-done <at> debbugs.gnu.org, Gnulib bugs <bug-gnulib <at> gnu.org>
Subject: Re: bug#49239: Unexpected results with sort -V
Date: Sat, 12 Feb 2022 21:31:33 -0800
[Message part 7 (text/plain, inline)]
On 6/28/21 10:54, Kamil Dudka wrote:
> You are right.  The matching algorithm was not implemented correctly and
> the patch you attached fixes it.

I looked into Bug#49239 and found some more places where the 
documentation disagreed with the code. I installed the attached patches 
into Gnulib and Coreutils, respectively, which should bring the two into 
agreement and should fix the bugs that Michael reported, albeit in a
different way than his proposed patch. Briefly:

* The code didn't allow file name suffixes to be the entire file name, 
but the documentation did. Here I went with the documentation. I could 
be talked into the other way; it shouldn't matter much either way.

* The code did the preliminary test (without suffixes) using strcmp, but
the documentation said it should use version comparison. Here I went with
the documentation.

* As Michael mentioned, sort -V mishandled NUL. I fixed this by adding a 
Gnulib function filenvercmp that treats NUL as just another character.

* As Michael also mentioned, filevercmp fell back on strcmp if version 
sort found no difference, which meant sort's --stable flag was 
ineffective. I fixed this by not having filevercmp fall back on strcmp.

* I fixed the consecutive-dots and trailing-dot bugs Michael
mentioned, by rewriting the suffix finder to not have that confusing 
READ_ALPHA state variable, and to instead implement the regular 
expression's nested * operators in the usual way with nested loops.
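
As a rough illustration of the nested-loop idea in the last item, the following Python sketch finds where the longest suffix matching (\.[A-Za-z~][A-Za-z0-9~]*)*$ begins. It is not the code from the attached patches: the function name `suffix_start` and the left-to-right scanning strategy are assumptions made for this example.

```python
import string

LETTERS = set(string.ascii_letters + "~")   # [A-Za-z~]
ALNUM = LETTERS | set(string.digits)        # [A-Za-z0-9~]

def suffix_start(name: str) -> int:
    """Index where the longest matching suffix begins; len(name) if it is empty.

    Hypothetical helper, written for illustration only.
    """
    n = len(name)
    i = 0
    while i < n:
        j = i
        # Outer loop: the regex's outer '*', one ".x..." group per iteration.
        while j + 1 < n and name[j] == "." and name[j + 1] in LETTERS:
            j += 2
            # Inner loop: the [A-Za-z0-9~]* tail of the current group.
            while j < n and name[j] in ALNUM:
                j += 1
        if j == n and j > i:
            return i            # groups starting at i run to the end of the name
        # The chain of groups broke at j, so no start in [i, j) can work.
        i = max(j, i + 1)
    return n

print(suffix_start("a..a"))      # 2, suffix ".a"
print(suffix_start("a."))        # 2, empty suffix
print(suffix_start("a.tar.gz"))  # 1, suffix ".tar.gz"
```

In this sketch a name such as ".a" is a suffix in its entirety (the function returns 0), which corresponds to the first item above, where the documentation rather than the old code was followed.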

Thanks, Michael, for reporting the problem. I'm boldly closing the 
Coreutils bug report as fixed.
[0001-filevercmp-fix-several-unexpected-results.patch (text/x-patch, attachment)]
[0001-sort-fix-several-version-sort-problems.patch (text/x-patch, attachment)]

This bug report was last modified 3 years and 94 days ago.


