From unknown Wed Sep 10 13:42:51 2025
Subject: bug#32472: sort doesn't sort and uniq loses data for many non-Latin scripts on UTF-8 locales
From: Vaayda Yaasra
To: 32472@debbugs.gnu.org
Date: Sat, 18 Aug 2018 15:53:44 +0000

I’ve found out that sort doesn’t sort strings for many non-Latin
scripts at all if the locale you’re using is one of en_US.UTF-8,
fr_FR.UTF-8 or fi_FI.UTF-8 (probably others, too, but these are the ones I
have tested). For locales ”C” and ko_KR.UTF-8, things work as expected.
Here’s a test case:

Open xterm, launch sort and input some lines of Syriac, Ethiopic, Korean,
Japanese (Hiragana or Katakana, not Han) or Thai text repeating one of the
lines twice. Here’s an example in Syriac:

ܡܠܬܐ
ܒܝܬܐ
ܒܪܢܫܐ
ܡܠܬܐ

Sort produces the following:

ܡܠܬܐ
ܒܝܬܐ
ܡܠܬܐ
ܒܪܢܫܐ

Here strings are ordered only according to their length but not characters.
Even the two instances of the word ܡܠܬܐ are found on non-adjacent
lines (1 and 3).
The expected sort order based on Unicode code points would be:

ܒܝܬܐ
ܒܪܢܫܐ
ܡܠܬܐ
ܡܠܬܐ

If you further pass sort’s output to uniq, it produces the following:

ܡܠܬܐ
ܒܪܢܫܐ

Here the word on line 2, ܒܝܬܐ, is completely lost since, like sort, uniq
seems to consider all Syriac strings of equal length as the same.

Although this issue depends on the locale, I think it is not a locale issue
per se, since perl seems to handle similar cases as expected. For instance,
the following command produces the expected result:

perl -CDS -e 'use locale; use utf8; @str = ("ܡܠܬܐ", "ܒܝܬܐ", "ܒܪܢܫܐ", "ܡܠܬܐ"); foreach $i (sort @str) { print "$i\n"; }'

Curiously enough, codepoints in Plane 1 seem to count as two codepoints of
the basic plane, so that if you sort | uniq the following (six codepoints
of Syriac and three codepoints of Phoenician):

ܥܠܝܟܘܢ
𐤁𐤉𐤕

you get ”ܥܠܝܟܘܢ” as the result whereas ”𐤁𐤉𐤕” is lost. This is presumably
because each Plane 1 character takes four bytes in UTF-8, twice as many as
a Syriac character, so both lines have the same byte length (UTF-8 does not
use surrogates; those belong to UTF-16).

Also curiously, LTR scripts seem to conflate with each other and RTL
scripts among themselves but not across the directionality line, so that if
you sort | uniq the following (three codepoints each in Ethiopic, Hangul,
Syriac, Hiragana and Thai):

ዘመን
스물셋
ܐܢܐ
わたし
ฟ้า

you are left with:

ܐܢܐ
ዘመን

That’s one line of Syriac and one line of Ethiopic; everything else was lost.
This issue does not seem to affect most Indic scripts (Devanagari, Bengali,
Telugu etc.) or Arabic. For CJK, things work as expected for the main
Unicode block (4E00..9FFF) but not for Extension A (3400..4DBF, such as 㗖
or 㡘 or 㰋). For Greek, monotonic accents work fine but all polytonic
letters are conflated (αὐλὸς and αὐλῆς conflate to αὐλῆς). For Hebrew,
letters and vowel marks work fine but cantillation marks are conflated.

I'm using coreutils 8.28 on Ubuntu 18.04. I first reported this bug on
Launchpad at https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/1774857
but since nobody has reacted for a couple of months, I decided to post the
report here.
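The samples in this report can be inspected without sort or any locale. The following Python sketch is an illustration only and not part of the original report; the variable names are made up for the example. It orders the four Syriac lines by code point, which is the order the report expects, and prints the code point count and UTF-8 byte length of each three-codepoint sample.

```python
# Sample lines from the report, in the order they were typed.
syriac = ["ܡܠܬܐ", "ܒܝܬܐ", "ܒܪܢܫܐ", "ܡܠܬܐ"]

# Python compares str values by code point, independent of the locale.
by_code_point = sorted(syriac)
print(by_code_point)

# Three-codepoint samples from the report, keyed by script name.
samples = {
    "Ethiopic": "ዘመን",
    "Hangul": "스물셋",
    "Syriac": "ܐܢܐ",
    "Hiragana": "わたし",
    "Thai": "ฟ้า",
    "Phoenician": "𐤁𐤉𐤕",
}

# Number of code points and number of UTF-8 bytes for each sample.
lengths = {name: (len(s), len(s.encode("utf-8"))) for name, s in samples.items()}
for name, (code_points, n_bytes) in lengths.items():
    print(f"{name}: {code_points} code points, {n_bytes} UTF-8 bytes")
```

Running it shows that the Ethiopic, Hangul, Hiragana and Thai samples each occupy 9 bytes in UTF-8, the Syriac sample 6, and the Phoenician line 12, which is also the byte length of the six-codepoint Syriac line ܥܠܝܟܘܢ. That matches the groups the report saw merged, though nothing in this thread confirms that byte length is what the affected locales compare.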

From unknown Wed Sep 10 13:42:51 2025
Subject: bug#32472: sort doesn't sort and uniq loses data for many non-Latin scripts on UTF-8 locales
From: Paul Eggert
Organization: UCLA Computer Science Department
To: Vaayda Yaasra, 32472@debbugs.gnu.org
Date: Sat, 18 Aug 2018 10:34:31 -0700
Message-ID: <4784f422-c037-1a04-f0ea-7e4085e1d192@cs.ucla.edu>

Vaayda Yaasra wrote:
> Here’s an example in Syriac:
>
> ܡܠܬܐ
> ܒܝܬܐ
> ܒܪܢܫܐ
> ܡܠܬܐ
>
> Sort produces the following:
>
> ܡܠܬܐ
> ܒܝܬܐ
> ܡܠܬܐ
> ܒܪܢܫܐ

This is a property of your locale, so I suggest sending a bug report to whoever
maintains your locale. You should be able to reproduce the problem by bypassing
GNU 'sort' entirely and using the C strcoll function.

For what it's worth, I observe the problem on Ubuntu 18.04 but not on Fedora 28.
As Fedora tends to be more up-to-date, perhaps the problem is fixed already in
glibc.
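The suggestion to bypass sort can also be tried from a script. The sketch below is an illustration rather than part of the thread: it assumes Python's `locale.strcoll`, which hands the comparison to the C library's collation routines, and uses a hypothetical helper named `collate`. The locale names are the ones mentioned in the report and may not be installed on a given system.

```python
import locale


def collate(a, b, locale_name):
    """Compare a and b using the C library's collation for locale_name.

    Returns a negative, zero, or positive number, or None when the
    locale is not available on this system.
    """
    saved = locale.setlocale(locale.LC_COLLATE)  # remember the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        return None
    try:
        return locale.strcoll(a, b)
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)  # restore it afterwards


# Two different Syriac words taken from the report.
pair = ("ܒܝܬܐ", "ܡܠܬܐ")
for name in ("C", "en_US.UTF-8", "ko_KR.UTF-8"):
    print(name, collate(*pair, name))
```

A result of 0 for two different words under one of the UTF-8 locales would reproduce the reported behaviour without involving sort or uniq, which would point at the locale data rather than at coreutils. What the script prints depends on the installed C library and locale definitions.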
From unknown Wed Sep 10 13:42:51 2025
Subject: bug#32472: sort doesn't sort and uniq loses data for many non-Latin scripts on UTF-8 locales
From: Assaf Gordon
To: Vaayda Yaasra, 32472@debbugs.gnu.org
Date: Mon, 29 Oct 2018 21:54:59 -0600
In-Reply-To: <4784f422-c037-1a04-f0ea-7e4085e1d192@cs.ucla.edu>

tags 32472 notabug
close 32472
stop

On 2018-08-18 11:34 a.m., Paul Eggert wrote:
> Vaayda Yaasra wrote:
>> Here’s an example in Syriac:
>>
>> ܡܠܬܐ
>> ܒܝܬܐ
>> ܒܪܢܫܐ
>> ܡܠܬܐ
>>
>> Sort produces the following:
>>
>> ܡܠܬܐ
>> ܒܝܬܐ
>> ܡܠܬܐ
>> ܒܪܢܫܐ
>
> This is a property of your locale, so I suggest sending a bug report to
> whoever maintains your locale. You should be able to reproduce the
> problem by bypassing GNU 'sort' entirely and using the C strcoll function.
>
> For what it's worth, I observe the problem on Ubuntu 18.04 but not on
> Fedora 28. As Fedora tends to be more up-to-date, perhaps the problem is
> fixed already in glibc.

Given the above, and with no further comments, I'm closing this bug.

-assaf