GNU bug report logs - #6327
sort fails on some UTF-8 input
[adding gnulib]
On 06/01/2010 10:51 PM, River Tarnell wrote:
> I'm using coreutils 8.5 on Solaris 10.
>
> GNU 'sort' fails to sort some input, while Solaris 'sort' handles it
> correctly:
>
> willow% /opt/ts/gnu/bin/sort sort_test.txt
> /opt/ts/gnu/bin/sort: string comparison failed: Illegal byte sequence
> /opt/ts/gnu/bin/sort: Set LC_ALL='C' to work around the problem.
> /opt/ts/gnu/bin/sort: The strings compared were
> `\360\222\203\276\360\222\205\226' and
> `\360\222\200\255\360\222\213\253\360\222\213\253\360\222\200\255'.
Thanks for the report. What locale are you using (that is, the entire
output of 'locale')? I could not reproduce failure using:
$ export LC_ALL; for f in $(locale -a); do LC_ALL=$f || continue;
sort sort_test.txt >/dev/null || { echo $f; break; }; done
on a GNU/Linux system with 732 installed locales. However, you may well
be in a non-UTF-8 locale, or the Solaris multibyte functions may be less
robust than glibc's at recognizing valid UTF-8 sequences. If it is indeed
a bug in Solaris strcoll(), then gnulib can probably be taught to work
around it.
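
To help rule out malformed input, here is a minimal sketch that checks
whether the two strings from the error message are well-formed UTF-8. It
assumes an iconv(1) that rejects invalid input with a non-zero exit
status; the helper name is_valid_utf8 is hypothetical, not something
shipped with coreutils or gnulib.

```shell
# Hypothetical helper: succeeds if the octal-escaped bytes in $1 decode
# as well-formed UTF-8. Assumes iconv exits non-zero on invalid input.
is_valid_utf8() {
    # $1 is used as the printf format so that \NNN escapes are expanded;
    # this is only safe for strings that contain no '%' characters.
    printf "$1" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# The two strings quoted in the error message from the report.
for s in '\360\222\203\276\360\222\205\226' \
         '\360\222\200\255\360\222\213\253\360\222\213\253\360\222\200\255'
do
    if is_valid_utf8 "$s"; then
        echo "valid UTF-8"
    else
        echo "invalid UTF-8"
    fi
done
```

If both strings pass this check, the input itself is probably fine, and
the locale setting or the platform's strcoll() becomes the more likely
suspect.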
--
Eric Blake eblake <at> redhat.com +1-801-349-2682
Libvirt virtualization library http://libvirt.org
This bug report was last modified 13 years and 351 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.