GNU bug report logs - #6327
sort fails on some UTF-8 input
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 6327 in the body.
You can then email your comments to 6327 AT debbugs.gnu.org in the normal way.
Report forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org: bug#6327; Package coreutils. (Wed, 02 Jun 2010 07:40:03 GMT)
Acknowledgement sent to River Tarnell <river.tarnell <at> wikimedia.de>: New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Wed, 02 Jun 2010 07:40:03 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
I'm using coreutils 8.5 on Solaris 10.
GNU 'sort' fails to sort some input, while Solaris 'sort' handles it
correctly:
willow% /opt/ts/gnu/bin/sort sort_test.txt
/opt/ts/gnu/bin/sort: string comparison failed: Illegal byte sequence
/opt/ts/gnu/bin/sort: Set LC_ALL='C' to work around the problem.
/opt/ts/gnu/bin/sort: The strings compared were
`\360\222\203\276\360\222\205\226' and
`\360\222\200\255\360\222\213\253\360\222\213\253\360\222\200\255'.
willow% /usr/bin/sort sort_test.txt
𒃾𒅖
𒀭𒋫𒋫𒀭
willow%
I've attached the example file sort_test.txt.
- river.
[sort_test.txt (text/plain, attachment)]
Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org: bug#6327; Package coreutils. (Wed, 02 Jun 2010 14:41:01 GMT)
Message #8 received at 6327 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
[adding gnulib]
On 06/01/2010 10:51 PM, River Tarnell wrote:
> I'm using coreutils 8.5 on Solaris 10.
>
> GNU 'sort' fails to sort some input, while Solaris 'sort' handles it
> correctly:
>
> willow% /opt/ts/gnu/bin/sort sort_test.txt
> /opt/ts/gnu/bin/sort: string comparison failed: Illegal byte sequence
> /opt/ts/gnu/bin/sort: Set LC_ALL='C' to work around the problem.
> /opt/ts/gnu/bin/sort: The strings compared were
> `\360\222\203\276\360\222\205\226' and
> `\360\222\200\255\360\222\213\253\360\222\213\253\360\222\200\255'.
Thanks for the report. What locale are you using (that is, the entire
output of 'locale')? I could not reproduce the failure using:
$ export LC_ALL; for f in $(locale -a); do LC_ALL=$f || continue;
sort sort_test.txt >/dev/null || { echo $f; break; }; done
on a GNU/Linux system with 732 installed locales. But most likely,
either you are in a non-UTF-8 locale, or the Solaris multibyte
functions are not as robust as glibc's at detecting valid UTF-8
sequences. If it is indeed a bug in Solaris strcoll(), then gnulib can
probably be taught to work around it.
--
Eric Blake eblake <at> redhat.com +1-801-349-2682
Libvirt virtualization library http://libvirt.org
[signature.asc (application/pgp-signature, attachment)]
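Eric's question is really about which character encoding the active locale selects. On a POSIX system that can be checked from C with setlocale() followed by nl_langinfo(CODESET). The sketch below is a minimal illustration, not part of coreutils; the helper name locale_is_utf8 is hypothetical, and it assumes the codeset is spelled exactly "UTF-8", which is what glibc reports but which is implementation-specific elsewhere.

```c
#include <langinfo.h>
#include <locale.h>
#include <string.h>

/* Hypothetical helper: switch LC_CTYPE to the named locale ("" means
   "whatever LC_ALL / LC_CTYPE / LANG select") and report whether its
   character encoding is UTF-8.  Side effect: the process's LC_CTYPE
   setting is changed.  Assumes the codeset name is exactly "UTF-8". */
static int locale_is_utf8(const char *name)
{
    if (setlocale(LC_CTYPE, name) == NULL)
        return 0;                     /* locale is not installed */
    const char *codeset = nl_langinfo(CODESET);
    return strcmp(codeset, "UTF-8") == 0;
}
```

Calling locale_is_utf8("") checks whatever the environment selects, so running it under LC_ALL=en_CA.UTF-8 (the locale used in the Solaris transcript further down) would show whether that locale is installed and uses UTF-8 on the machine in question.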
Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org: bug#6327; Package coreutils. (Wed, 02 Jun 2010 15:33:02 GMT)
Message #11 received at 6327 <at> debbugs.gnu.org (full text, mbox):
On 02/06/10 05:51, River Tarnell wrote:
> I'm using coreutils 8.5 on Solaris 10.
>
> GNU 'sort' fails to sort some input, while Solaris 'sort' handles it
> correctly:
>
> willow% /opt/ts/gnu/bin/sort sort_test.txt
> /opt/ts/gnu/bin/sort: string comparison failed: Illegal byte sequence
> /opt/ts/gnu/bin/sort: Set LC_ALL='C' to work around the problem.
> /opt/ts/gnu/bin/sort: The strings compared were
> `\360\222\203\276\360\222\205\226' and
> `\360\222\200\255\360\222\213\253\360\222\213\253\360\222\200\255'.
> willow% /usr/bin/sort sort_test.txt
> 𒃾𒅖
> 𒀭𒋫𒋫𒀭
> willow%
>
> I've attached the example file sort_test.txt.
I'm not sure what those characters are, but they're valid UTF-8,
and my Linux system here has no issue sorting them.
Note that we just use strcoll() to do the comparison.
What strcoll() are you linking against?
cheers,
Pádraig.
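The claim that both strings are valid UTF-8 can be checked by decoding the octal escapes from sort's diagnostic. Each character starts with the lead byte \360 (0xF0), so each is a four-byte sequence for a code point outside the Basic Multilingual Plane: U+120FE and U+12156 in the first string, U+1202D and U+122EB (twice) in the second, all in the Cuneiform block. The strict decoder below is a minimal sketch written for this illustration; utf8_decode and utf8_valid are hypothetical names, not a library API. It only shows the bytes are well formed; it says nothing about how a particular strcoll() handles them.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence from the n bytes at s.  Returns the code
   point and stores the sequence length in *len, or returns -1 if the
   bytes are truncated, malformed, overlong, a surrogate, or > U+10FFFF. */
static int32_t utf8_decode(const unsigned char *s, size_t n, size_t *len)
{
    static const int32_t min_cp[] = { 0, 0, 0x80, 0x800, 0x10000 };
    int32_t cp;
    size_t need;

    if (n == 0)
        return -1;
    if (s[0] < 0x80) { *len = 1; return s[0]; }            /* ASCII */
    else if ((s[0] & 0xE0) == 0xC0) { need = 2; cp = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { need = 3; cp = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { need = 4; cp = s[0] & 0x07; }
    else return -1;            /* stray continuation byte or 0xF8..0xFF */

    if (n < need)
        return -1;             /* sequence cut off */
    for (size_t i = 1; i < need; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return -1;         /* expected a 10xxxxxx continuation byte */
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    /* Reject overlong encodings, UTF-16 surrogates and out-of-range values. */
    if (cp < min_cp[need] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return -1;
    *len = need;
    return cp;
}

/* Return 1 if all n bytes at s form valid UTF-8, else 0. */
static int utf8_valid(const unsigned char *s, size_t n)
{
    size_t len;
    while (n > 0) {
        if (utf8_decode(s, n, &len) < 0)
            return 0;
        s += len;
        n -= len;
    }
    return 1;
}
```

For example, decoding the first four bytes of `\360\222\203\276\360\222\205\226` yields 0x120FE with a length of 4.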
Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org: bug#6327; Package coreutils. (Wed, 02 Jun 2010 19:39:01 GMT)
Message #14 received at 6327 <at> debbugs.gnu.org (full text, mbox):
On 06/01/2010 09:51 PM, River Tarnell wrote:
> I'm using coreutils 8.5 on Solaris 10.
>
> GNU 'sort' fails to sort some input, while Solaris 'sort' handles it
> correctly:
Amusingly enough, on that same test case I found the same problem
with GNU 'sort' that you did, but I also found that Solaris 'sort'
reports that it runs out of memory, even in 64-bit mode. For example:
1010-kiwi $ LC_ALL=en_CA.UTF-8 /usr/bin/sparcv9/sort sort_test.txt
sort: insufficient memory; use -S option to increase allocation
1011-kiwi $ LC_ALL=en_CA.UTF-8 coreutils-8.5/src/sort sort_test.txt
coreutils-8.5/src/sort: string comparison failed: Illegal byte sequence
coreutils-8.5/src/sort: Set LC_ALL='C' to work around the problem.
coreutils-8.5/src/sort: The strings compared were `\360\222\203\276\360\222\205\226' and `\360\222\200\255\360\222\213\253\360\222\213\253\360\222\200\255'.
I expect that the exact failure mode depends on the
locale (and/or on whether you're using x86 or SPARC),
and that GNU 'sort' checks for strcoll failures whereas
Solaris 'sort' does not (and thus crashes). If my guess is right,
this appears to be a bug in the Solaris strcoll implementation.
I don't see a simple workaround. You might file a bug report
with Sun.
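The distinction drawn above, between a sort that checks for strcoll failures and one that does not, rests on how a caller can detect such a failure at all. POSIX reserves no return value of strcoll() for errors; it says an application wishing to check should set errno to 0, call strcoll(), and then inspect errno. The sketch below shows that pattern. checked_strcoll is a hypothetical name and this is not the actual coreutils or gnulib code, although the "string comparison failed: Illegal byte sequence" diagnostic is consistent with a check of this kind (POSIX describes EILSEQ as "Illegal byte sequence").

```c
#include <errno.h>
#include <string.h>

/* Compare a and b in the current LC_COLLATE locale and also report
   whether strcoll() signalled an error.  Because strcoll() has no
   reserved error return, errno is cleared before the call and read
   afterwards.  On return *err is 0 on success, or the errno value the
   C library set (for example EILSEQ, whose POSIX description matches
   the message GNU sort printed on Solaris). */
static int checked_strcoll(const char *a, const char *b, int *err)
{
    errno = 0;
    int diff = strcoll(a, b);
    *err = errno;
    return diff;
}
```

A caller such as sort can treat a nonzero *err as fatal and suggest LC_ALL='C', as in the transcripts above; in the C locale strcoll() compares bytes the way strcmp() does, which is why that workaround sidesteps the problem.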
Added tag(s) notabug. Request was from Jim Meyering <jim <at> meyering.net> to control <at> debbugs.gnu.org. (Mon, 08 Aug 2011 06:30:02 GMT)
Reply sent to Jim Meyering <jim <at> meyering.net>: You have taken responsibility. (Mon, 08 Aug 2011 06:30:04 GMT)
Notification sent to River Tarnell <river.tarnell <at> wikimedia.de>: bug acknowledged by developer. (Mon, 08 Aug 2011 06:30:04 GMT)
Message #21 received at 6327-done <at> debbugs.gnu.org (full text, mbox):
River Tarnell wrote:
> I'm using coreutils 8.5 on Solaris 10.
>
> GNU 'sort' fails to sort some input, while Solaris 'sort' handles it
> correctly:
>
> willow% /opt/ts/gnu/bin/sort sort_test.txt
> /opt/ts/gnu/bin/sort: string comparison failed: Illegal byte sequence
> /opt/ts/gnu/bin/sort: Set LC_ALL='C' to work around the problem.
> /opt/ts/gnu/bin/sort: The strings compared were
> `\360\222\203\276\360\222\205\226' and
> `\360\222\200\255\360\222\213\253\360\222\213\253\360\222\200\255'.
> willow% /usr/bin/sort sort_test.txt
> 𒃾𒅖
> 𒀭𒋫𒋫𒀭
> willow%
>
> I've attached the example file sort_test.txt.
Thanks for the report.
Since this appears to be due not to any problem
with GNU sort per se, but rather to Solaris'
strcoll implementation, I'm closing this coreutils "issue"
and Cc'ing bug-gnulib, in case someone there wants to
pursue the strcoll-replacement approach.
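One shape a strcoll replacement could take is a wrapper that uses the locale's collation when the C library manages it and falls back to byte order (what LC_ALL=C gives) when strcoll() reports an error. The sketch below is purely illustrative: strcoll_or_bytes is a hypothetical name, not a gnulib module or API, and it is not what gnulib actually does.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical wrapper (not gnulib's API): collate with strcoll() when
   the C library can, and fall back to strcmp() byte order when
   strcoll() sets errno.  The caller's errno is restored so a fallback
   does not leave a spurious error behind. */
static int strcoll_or_bytes(const char *a, const char *b)
{
    int saved_errno = errno;
    errno = 0;
    int diff = strcoll(a, b);
    if (errno != 0)
        diff = strcmp(a, b);   /* byte order, as in the C locale */
    errno = saved_errno;
    return diff;
}
```

The catch with this naive design is that only the failing pairs are compared bytewise, so over a whole input the mixed ordering need not be transitive, which a sort routine relies on. A real replacement would need a single consistent rule for strings the system strcoll() rejects.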
bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Mon, 05 Sep 2011 11:24:04 GMT)
This bug report was last modified 13 years and 351 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.