From unknown Mon Aug 18 09:02:17 2025
From: Jaime Gaspar
To: 11968@debbugs.gnu.org
Subject: bug#11968: Bug in "uniq"
Date: Tue, 17 Jul 2012 10:18:23 -0800
Message-ID: <21CA1DE9547.000003FFmail@jaimegaspar.com>
X-GNU-PR-Message: report 11968
X-GNU-PR-Package: coreutils

Dear Sir or Madam,

I think that there is a bug in "uniq" (version 8.13).

The file "bug.txt" attached consists of two lines:
- the first one containing a character that
  looks like a "v" and a line break;
- the second one containing a character that
  looks like an upside-down "v" and a line break.
In hex:

E2 88 A8 0A
E2 88 A7 0A

When we run "uniq bug.txt" in a terminal, "uniq" outputs a single line, so "uniq" thinks that the two lines are equal, but they are not.

Regards,
Jaime Gaspar
_____________________________
Homepage: www.jaimegaspar.com
E-mail: mail@jaimegaspar.com

[Attachment: bug.txt (text/plain, base64): 4oioCuKIpwo=]

From debbugs-submit-bounces@debbugs.gnu.org Tue Jul 17 17:55:48 2012
From: Eric Blake
Organization: Red Hat
To: Jaime Gaspar
Cc: control@debbugs.gnu.org, 11967-done@debbugs.gnu.org
Subject: Re: bug#11967: Bug in "uniq"
Date: Tue, 17 Jul 2012 15:49:38 -0600
Message-ID: <5005DDF2.1040302@redhat.com>
In-Reply-To: <21C89D03B31.000003FCmail@jaimegaspar.com>

forcemerge 11967 11968
tag 11967 notabug
thanks

On 07/17/2012 12:17 PM, Jaime Gaspar wrote:
> I think that there is a bug in "uniq" (version 8.13).

Is this your distro's build?  However, I reproduced your result with the latest coreutils.git (post-8.17), so this is not likely to be a bug in a distro-specific multibyte patch.

>
> The file "bug.txt" attached consists of two lines:
> - the first one containing a character that
>   looks like a "v" and a line break;
> - the second one containing a character that
>   looks like an upside-down "v" and a line break.
> In hex:
>
> E2 88 A8 0A
> E2 88 A7 0A

The glyphs you describe line up with the Unicode characters U+2228 (∨) and U+2227 (∧), and the bytes you list are their UTF-8 encodings.  I bet you are using a locale with UTF-8 character encoding.

>
> When we run "uniq bug.txt" in a terminal, "uniq" outputs a single line, so "uniq" thinks that the two lines are equal, but they are not.

I can reproduce your symptoms, but only when I fudge my locale:

$ LC_ALL=C uniq ../bug.txt
∨
∧
$ LC_ALL=en_US.UTF-8 uniq ../bug.txt
∨
$

Remember, 'uniq' is required by POSIX to use the same line comparison techniques as 'sort'; and 'sort' is required to use strcoll() (not strcmp()) to compare lines.  And in your particular choice of locale, strcoll() happens to state that '∨' and '∧' collate identically; hence uniq is correct in stating that you have a duplicated line according to your current locale.

$ LC_ALL=en_US.UTF-8 sort ../bug.txt -u --debug
sort: using ‘en_US.UTF-8’ sorting rules
∨
_
$

So I'm closing this as not a bug, along with a final pointer to our FAQ:

https://www.gnu.org/software/coreutils/faq/#Sort-does-not-sort-in-normal-order_0021

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
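
For anyone who wants to check the locale dependence described in this thread, here is a minimal sketch. It rebuilds "bug.txt" from the hex dump in the report and counts the lines that uniq keeps under two locales. It assumes a POSIX shell and that the en_US.UTF-8 locale is installed; if it is not, the second command will probably behave like the first. The second count depends on the collation data shipped with the C library, so no expected value is given for it.

```shell
# Recreate bug.txt from the bytes in the report: E2 88 A8 0A E2 88 A7 0A.
# Octal escapes are used because they are portable across printf implementations.
printf '\342\210\250\n\342\210\247\n' > bug.txt

# C locale: lines are compared byte by byte, so the two lines stay distinct.
LC_ALL=C uniq bug.txt | wc -l

# UTF-8 locale: lines are compared with the locale's collation rules.
# In this thread the result was a single line; other C libraries or
# newer locale data may keep both lines.
LC_ALL=en_US.UTF-8 uniq bug.txt | wc -l
```

Setting LC_ALL=C (or only LC_COLLATE=C) for the one command is the usual way to get byte-wise comparison without changing the rest of the session's locale.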