GNU bug report logs - #32472
sort doesn't sort and uniq loses data for many non-Latin scripts on UTF-8 locales


Package: coreutils;

Reported by: Vaayda Yaasra <vaaydayaasra <at> gmail.com>

Date: Sat, 18 Aug 2018 16:05:02 UTC

Severity: normal

Tags: notabug

Done: Assaf Gordon <assafgordon <at> gmail.com>

Bug is archived. No further changes may be made.



Message #8 received at 32472 <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Vaayda Yaasra <vaaydayaasra <at> gmail.com>, 32472 <at> debbugs.gnu.org
Subject: Re: bug#32472: sort doesn't sort and uniq loses data for many
 non-Latin scripts on UTF-8 locales
Date: Sat, 18 Aug 2018 10:34:31 -0700
Vaayda Yaasra wrote:
> Here’s an example in Syriac:
> 
> ܡܠܬܐ
> ܒܝܬܐ
> ܒܪܢܫܐ
> ܡܠܬܐ
> 
> Sort produces the following:
> 
> ܡܠܬܐ
> ܒܝܬܐ
> ܡܠܬܐ
> ܒܪܢܫܐ

This is a property of your locale, so I suggest sending a bug report to whoever 
maintains your locale. You should be able to reproduce the problem by bypassing 
GNU 'sort' entirely and using the C strcoll function.

For what it's worth, I observe the problem on Ubuntu 18.04 but not on Fedora 28. 
As Fedora tends to be more up-to-date, perhaps the problem is fixed already in 
glibc.




This bug report was last modified 6 years and 289 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.