GNU bug report logs -
#14226
Sort -c takes in account fields that were outside sorting scope
Report forwarded to bug-coreutils <at> gnu.org: bug#14226; Package coreutils. (Thu, 18 Apr 2013 17:11:02 GMT)
Acknowledgement sent to Camion SPAM <camion_spam-gnubugs <at> yahoo.fr>: New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Thu, 18 Apr 2013 17:11:03 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
The following commands report an error on lines whose keys are equal, because fields outside the sort key are not sorted:
$ cat <<'.' |
> AAA AAA
> BBB BBB
> ZZZ CCC
> DDD DDD
> BBC EEE
> BBD EEE
> BBC EEE
> BBE EEE
> CCC FFF
> DDD GGG
> EEE HHH
> .
> LANG=C sort -k 2,2 -c
sort: -:7: disorder: BBC EEE
Added tag(s) notabug. Request was from Eric Blake <eblake <at> redhat.com> to control <at> debbugs.gnu.org. (Thu, 18 Apr 2013 19:59:02 GMT)
Reply sent to Eric Blake <eblake <at> redhat.com>: You have taken responsibility. (Thu, 18 Apr 2013 19:59:03 GMT)
Notification sent to Camion SPAM <camion_spam-gnubugs <at> yahoo.fr>: bug acknowledged by developer. (Thu, 18 Apr 2013 19:59:03 GMT)
Message #12 received at 14226-done <at> debbugs.gnu.org (full text, mbox):
tag 14226 notabug
thanks
On 04/18/2013 09:04 AM, Camion SPAM wrote:
> The following commands report an error on lines whose keys are equal, because fields outside the sort key are not sorted:
How refreshing to get a non-FAQ report on sort - you made me actually do
some research! The fact that you used LANG=C to pin the locale is also
nice (most people aren't aware that most reported non-bugs in sort are
due to locale issues). However, I still think sort is doing the right
thing.
>
> $ cat <<'.' |
>> AAA AAA
>> BBB BBB
>> ZZZ CCC
>> DDD DDD
>> BBC EEE
>> BBD EEE
>> BBC EEE
>> BBE EEE
>> CCC FFF
>> DDD GGG
>> EEE HHH
>> .
>> LANG=C sort -k 2,2 -c
> sort: -:7: disorder: BBC EEE
POSIX says:
"Except when the -u option is specified, lines that otherwise compare
equal shall be ordered as if none of the options -d, -f, -i, -n, or -k
were present (but with -r still in effect, if it was specified) and with
all bytes in the lines significant to the comparison."
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/sort.html
In your example, you did not use -u, and the key you specified was
duplicated between two rows, so POSIX requires sort to break the tie by
comparing the entire line, and the entire line is indeed different.
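A minimal sketch of that tie-break, assuming GNU sort and a POSIX shell (the variable names are illustrative and not part of the report). Restricting the input to the four rows whose second field is EEE shows that a full sort orders them by the whole line, which is also the order -c expects:

```shell
# The four rows from the report that share the key "EEE" in field 2.
rows='BBC EEE
BBD EEE
BBC EEE
BBE EEE'

# With equal keys, sort falls back to comparing whole lines,
# so both "BBC EEE" rows end up ahead of "BBD EEE".
sorted=$(printf '%s\n' "$rows" | LC_ALL=C sort -k 2,2)
printf '%s\n' "$sorted"

# The check applies the same rule: row 3 (BBC EEE) follows row 2 (BBD EEE),
# which is out of order once the whole line is compared.
if printf '%s\n' "$rows" | LC_ALL=C sort -k 2,2 -c 2>/dev/null; then
  check_status=0
else
  check_status=$?
fi
echo "sort -c exit status: $check_status"
```

In the original eleven-line input the same pair of rows sits at lines 6 and 7, which matches the line number in the reported diagnostic.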
For comparison purposes, I checked out /usr/bin/sort on Solaris 10; it
has the same behavior of declaring your input unsorted.
/usr/xpg4/bin/sort on the same machine is not POSIX compliant, in that
it lacks -C, and treats -c like the POSIX -C; but it also had non-zero
exit status on your sample.
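For reference, the POSIX distinction between the two check options can be sketched as follows, assuming a sort that implements both (GNU sort does). -c prints a diagnostic for the first out-of-order line, while -C reports only through its exit status. The two-line input and the variable names are illustrative.

```shell
# Deliberately unsorted two-line input.
input='b
a'

# -c: diagnostic on stderr plus a non-zero exit status.
diag_msg=$(printf '%s\n' "$input" | LC_ALL=C sort -c 2>&1) && diag_status=0 || diag_status=$?

# -C: no diagnostic, exit status only.
quiet_msg=$(printf '%s\n' "$input" | LC_ALL=C sort -C 2>&1) && quiet_status=0 || quiet_status=$?

echo "-c: status=$diag_status message=${diag_msg:-<none>}"
echo "-C: status=$quiet_status message=${quiet_msg:-<none>}"
```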
If you don't like the POSIX behavior of a mandated entire line as a sort
key of final resort, then you should use the GNU extension of -s; I
tested that 'LC_ALL=C sort -k2,2 -c -s' has no problems with your
example. To see the difference between using and not using the entire line as
the final sort key, replace -c by --debug, both with and without -s (you
can't use -c and --debug at the same time, unfortunately). However,
remember that not all sort implementations have -s, so there is no
standard way to get the behavior you are after.
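The difference can be sketched on the full sample input, assuming GNU sort for the -s case. The third check is not from this thread: it is one possible workaround for sort implementations without -s, extracting the key column with awk and checking only that column. Note the assumption that awk's whitespace splitting is an acceptable stand-in for sort's field splitting, which by default keeps leading blanks as part of a field.

```shell
# The full sample input from the report.
input='AAA AAA
BBB BBB
ZZZ CCC
DDD DDD
BBC EEE
BBD EEE
BBC EEE
BBE EEE
CCC FFF
DDD GGG
EEE HHH'

# Default: ties on field 2 fall back to a whole-line comparison.
if printf '%s\n' "$input" | LC_ALL=C sort -k 2,2 -c 2>/dev/null; then
  default_status=0
else
  default_status=$?
fi

# With -s the last-resort comparison is disabled, so equal keys are accepted.
if printf '%s\n' "$input" | LC_ALL=C sort -k 2,2 -c -s; then
  stable_status=0
else
  stable_status=$?
fi

# Hypothetical workaround without -s: check only the extracted key column.
# Equal adjacent lines are accepted by -c as long as -u is not given.
if printf '%s\n' "$input" | awk '{ print $2 }' | LC_ALL=C sort -c; then
  key_only_status=0
else
  key_only_status=$?
fi

echo "default=$default_status stable=$stable_status key_only=$key_only_status"
```

Only the first check should report disorder; the other two look at nothing beyond the second field.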
I'm closing this as not a bug, although you may continue to add comments
or questions to this topic.
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
Information forwarded to bug-coreutils <at> gnu.org: bug#14226; Package coreutils. (Sat, 20 Apr 2013 01:01:02 GMT)
Message #15 received at 14226-done <at> debbugs.gnu.org (full text, mbox):
Well, I'm satisfied with your suggestion of adding -s.
I would just add that this might be something that should be added to the man page.
Information forwarded to bug-coreutils <at> gnu.org: bug#14226; Package coreutils. (Sat, 20 Apr 2013 16:00:01 GMT)
Message #18 received at 14226-done <at> debbugs.gnu.org (full text, mbox):
On 04/19/2013 06:56 PM, Camion SPAM wrote:
> Well, I'm satisfied with your suggestion of adding -s.
> I would just add that this might be something that should be added to the man page.
The man page is generated from 'sort --help', and there, we are trying
to favor brevity. But it might indeed be worth adding to the 'info
sort' page. I read that page, and notice that while '-c' and '-C' are
mentioned first, details about '-k' and '-s' are several screenfuls
away, so it is not obvious that those two options can affect the
behavior of -c. You could help by reading that page, and finding the
spot(s) where adding a sentence would have helped you; if you could
propose the location and wording to add, then we can work with that to
turn it into a formal patch.
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
Information forwarded to bug-coreutils <at> gnu.org: bug#14226; Package coreutils. (Sun, 21 Apr 2013 01:05:01 GMT)
Message #21 received at 14226-done <at> debbugs.gnu.org (full text, mbox):
Then I suggest adding, after -c:
  -c, --check, --check=diagnose-first  check for sorted input; do not sort
>> Note that when selecting fields with -k together with -c, the user should probably also disable the last-resort comparison with -s to obtain the expected result.
after -k:
  -k, --key=POS1[,POS2]  start a key at POS1 (origin 1), end it at POS2 (default end of line)
>> By default, if two lines compare equal under the key specification, sort resolves the tie by comparing the whole lines. This is called the "last-resort comparison" (and may be disabled with -s).
and after -s:
  -s, --stable  stabilize sort by disabling last-resort comparison
>> (See -k)
bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sun, 19 May 2013 11:24:03 GMT)
This bug report was last modified 12 years and 112 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.