From unknown Mon Jun 23 00:37:20 2025
X-Loop: help-debbugs@gnu.org
Subject: bug#9740: Bug in sort
Resent-From: Lluís Padró
Resent-CC: bug-coreutils@gnu.org
Resent-Date: Wed, 12 Oct 2011 18:49:02 +0000
X-GNU-PR-Message: report 9740
X-GNU-PR-Package: coreutils
To: 9740@debbugs.gnu.org
X-Debbugs-Original-To: bug-coreutils@gnu.org
Message-ID: <4E95DF6A.5000703@lsi.upc.edu>
Date: Wed, 12 Oct 2011 20:41:46 +0200
From: Lluís Padró

I found a bug in the "sort" utility that happens under utf8 locales, though
no character beyond basic ascii is involved in it...

I'm using "sort (GNU coreutils) 7.4" from package
"coreutils-7.4-2ubuntu3" on ubuntu lucid 10.04.03 LTS

Short reproduction of the error follows below.

thank you

Lluis

------------------------------------------------

## test file for "sort"
~$ cat testfile
abc Z
ab Z
abcd Z
abce Z

## let's set C locale
~$ export LC_ALL="C"
~$ locale
LANG=en_US.UTF-8
LC_CTYPE="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_COLLATE="C"
LC_MONETARY="C"
LC_MESSAGES="C"
LC_PAPER="C"
LC_NAME="C"
LC_ADDRESS="C"
LC_TELEPHONE="C"
LC_MEASUREMENT="C"
LC_IDENTIFICATION="C"
LC_ALL=C

## sort works as expected
~$ sort testfile
ab Z
abc Z
abcd Z
abce Z

## Let's try another locale
~$ export LC_ALL="en_US.UTF-8"
~$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=en_US.UTF-8

## Sort fails. Shorter words are sorted after longer words with the same prefix.
~$ sort testfile
abcd Z
abce Z
abc Z
ab Z

From debbugs-submit-bounces@debbugs.gnu.org Wed Oct 12 15:03:11 2011
Message-ID: <4E95E446.9000402@redhat.com>
Date: Wed, 12 Oct 2011 13:02:30 -0600
From: Eric Blake
Organization: Red Hat
To: Lluís Padró
Subject: Re: bug#9740: Bug in sort
References: <4E95DF6A.5000703@lsi.upc.edu>
In-Reply-To: <4E95DF6A.5000703@lsi.upc.edu>
X-Debbugs-Envelope-To: control
Cc: 9740-done@debbugs.gnu.org

tag 9740 notabug
thanks

On 10/12/2011 12:41 PM, Lluís Padró wrote:
>
> I found a bug in the "sort" utility that happens under utf8 locales, though
> no character beyond basic ascii is involved in it...

Thanks for the report; however, this is almost certainly a case of your
locale defining a different collation order than what you were
expecting. See the FAQ:

https://www.gnu.org/software/coreutils/faq/#Sort-does-not-sort-in-normal-order_0021

>
> I'm using "sort (GNU coreutils) 7.4" from package
> "coreutils-7.4-2ubuntu3" on ubuntu lucid 10.04.03 LTS

The latest version of coreutils, 8.14, includes a --debug option that
makes it even more apparent why sort is behaving correctly:

> ## Let's try another locale
> ~$ export LC_ALL="en_US.UTF-8"
> ## Sort fails. Shorter words are sorted after longer words with the same
> prefix.
> ~$ sort testfile
> abcd Z
> abce Z
> abc Z
> ab Z

$ printf 'abc Z\nab Z\nabcd Z\nabce Z\n' | sort --debug
sort: using `en_US.UTF-8' sorting rules
abcd Z
______
abce Z
______
abc Z
_____
ab Z
____

So, what exactly is sort comparing? The entire line (because you didn't
specify any -k options to limit it to fields). And how does it do the
comparison? By strcoll("abcd Z", "abc Z"). And how does strcoll()
behave in the en_US.UTF-8 locale? By dictionary collation - that is,
case and punctuation (including space) are ignored. So you get the same
answer for both strcoll("abcd Z", "abc Z") and for strcoll("abcdz",
"abcz") in that locale, and sure enough, d comes before z, so the sort
is correct.

You already figured out that LC_ALL=C forces sorting to honor byte
values. But if you insist on using en_US collation, then maybe you
should also look at forcing the sort to honor specific fields:

$ printf 'abc Z\nab Z\nabcd Z\nabce Z\n' | sort --debug -sb -k1,1 -k2,2
sort: using `en_US.UTF-8' sorting rules
ab Z
__
   _
abc Z
___
    _
abcd Z
____
     _
abce Z
____
     _

-- 
Eric Blake   eblake@redhat.com   +1-801-349-2682
Libvirt virtualization library http://libvirt.org

From unknown Mon Jun 23 00:37:20 2025
MIME-Version: 1.0
X-Mailer: MIME-tools 5.427 (Entity 5.427)
X-Loop: help-debbugs@gnu.org
From: help-debbugs@gnu.org (GNU bug Tracking System)
To: Lluís Padró
Subject: bug#9740: closed (Re: bug#9740: Bug in sort)
References: <4E95E446.9000402@redhat.com> <4E95DF6A.5000703@lsi.upc.edu>
X-Gnu-PR-Message: they-closed 9740
X-Gnu-PR-Package: coreutils
X-Gnu-PR-Keywords: notabug
Reply-To: 9740@debbugs.gnu.org
Date: Wed, 12 Oct 2011 19:04:03 +0000
Content-Type: multipart/mixed; boundary="----------=_1318446243-20075-1"

This is a multi-part message in MIME format...
------------=_1318446243-20075-1
Content-Disposition: inline
Content-Type: text/plain; charset="utf-8"

Your bug report

#9740: Bug in sort

which was filed against the coreutils package, has been closed.

The explanation is attached below, along with your original report.
If you require more details, please reply to 9740@debbugs.gnu.org.

-- 
9740: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=9740
GNU Bug Tracking System
Contact help-debbugs@gnu.org with problems

------------=_1318446243-20075-1--
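
The whole-line comparison described in Eric Blake's reply can be
approximated in the shell without depending on the en_US.UTF-8 locale
being installed. The sketch below is only an emulation, under the
assumption stated in the reply that spaces and letter case are ignored
when lines are first compared. It does not call strcoll(), real locale
collation has further tie-breaking levels and depends on the C library's
locale data, and the variable names (input, byte_order, dict_order) are
illustrative.

```shell
#!/bin/sh
# The four lines from the report's test file.
input='abc Z
ab Z
abcd Z
abce Z'

# Byte-value order: the space (0x20) compares lower than any letter,
# so "ab Z" sorts before "abc Z".
byte_order=$(printf '%s\n' "$input" | LC_ALL=C sort)

# Emulated dictionary-style order (an approximation, not the locale's
# real rules): prepend a key with spaces removed and letters lowercased,
# sort on that key by byte value, then cut the key off again.
dict_order=$(printf '%s\n' "$input" |
  awk '{ key = tolower($0); gsub(/ /, "", key); print key "\t" $0 }' |
  LC_ALL=C sort -t "$(printf '\t')" -k1,1 |
  cut -f2-)

printf 'byte order:\n%s\n\ndictionary-style order:\n%s\n' \
  "$byte_order" "$dict_order"
```

The keys being compared in the second pipeline are "abcz", "abz", "abcdz"
and "abcez", so "abcd Z" and "abce Z" come first and "ab Z" comes last,
which is the order shown in the report under en_US.UTF-8. The first
pipeline gives the order the report shows under LC_ALL=C.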