Subject: bug#16168: uniq mis-handles UTF8 (8bit) characters
From: Shlomo Urbach
To: bug-coreutils@gnu.org
Date: Mon, 16 Dec 2013 15:50:15 +0200

Lines with CJK letters are deemed equal by length only, since the
characters seem to be ignored.
I understand this is due to locale.
But, it would be nice if a simple flag would do a locale-free comparison
(i.e. equal = all bytes are equal).

Subject: bug#16168: closed (Re: bug#16168: uniq mis-handles UTF8 (8bit) characters)
From: GNU bug Tracking System
To: Shlomo Urbach
Date: Mon, 16 Dec 2013 17:34:02 +0000

Your bug report #16168: uniq mis-handles UTF8 (8bit) characters, which was
filed against the coreutils package, has been closed.

The explanation is attached below, along with your original report.
If you require more details, please reply to 16168@debbugs.gnu.org.
--
16168: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=16168
GNU Bug Tracking System
Contact help-debbugs@gnu.org with problems

Subject: Re: bug#16168: uniq mis-handles UTF8 (8bit) characters
From: Pádraig Brady
To: Shlomo Urbach
Cc: 16168-done@debbugs.gnu.org
Date: Mon, 16 Dec 2013 17:33:23 +0000

tag 16168 notabug
close 16168
stop

On 12/16/2013 01:50 PM, Shlomo Urbach wrote:
> Lines with CJK letters are deemed equal by length only, since the
> characters seem to be ignored.
> I understand this is due to locale.
> But, it would be nice if a simple flag would do a locale-free comparison
> (i.e. equal = all bytes are equal).

If you want to compare byte by byte:

  LC_ALL=C uniq ....

thanks,
Pǽdraig.
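
A minimal sketch of the byte-wise workaround, assuming a POSIX shell and GNU coreutils; the CJK sample lines are made-up illustrative data of equal byte length (each character is 3 bytes in UTF-8). The first command compares lines using the current locale's collation, so its result depends on the system's locale tables; the second sets LC_ALL=C only for the uniq process, so two lines count as equal only when all their bytes are equal.

```shell
#!/usr/bin/env bash
# Illustrative input: two different CJK lines of the same byte length,
# followed by an exact duplicate of the second line.

# Locale-aware comparison: the outcome depends on the collation rules
# of the current locale, so it may differ between systems.
printf '日本\n中国\n中国\n' | uniq -c

# Byte-wise comparison: LC_ALL=C applies only to this uniq process,
# so only the exact duplicate "中国" lines are merged.
printf '日本\n中国\n中国\n' | LC_ALL=C uniq -c
```

The VAR=value prefix affects only the command it precedes. Since sort also orders lines by the locale's collation, a sort | uniq pipeline needs the prefix on each command (or an exported LC_ALL=C) to be byte-wise end to end.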

Subject: bug#16168: uniq mis-handles UTF8 (8bit) characters
From: Linda Walsh
To: 16168@debbugs.gnu.org
Date: Mon, 16 Dec 2013 10:02:08 -0800

Maybe he was hoping for a uniq [-b|--bytes] ?

Suggestion to Shlomo (if you use bash):

  alias uniq='LC_ALL=C \uniq'

or, if you want it in your shell scripts too:

  uniq() { LC_ALL=C "$(type -P uniq)" "$@"; }; export -f uniq
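
A quick check of a function wrapper of this kind, assuming bash (type -P and export -f are bash features) and a temporary sample file created with mktemp. Writing LC_ALL=C as a prefix on the command, rather than as a separate LC_ALL=C; statement, confines the C locale to the uniq process. The function is called directly (not inside a pipeline or $(...), which would run it in a subshell), so a leaked assignment would be visible in the calling shell afterwards.

```shell
#!/usr/bin/env bash
# Function wrapper: run the uniq executable found on PATH with LC_ALL=C.
# `type -P` searches PATH only, so this does not recurse into the function.
uniq() { LC_ALL=C "$(type -P uniq)" "$@"; }
export -f uniq  # bash-only: also makes the wrapper visible to child bash scripts

# Illustrative sample input in a temporary file.
tmp=$(mktemp)
printf '日本\n中国\n中国\n' > "$tmp"

# Runs in the current shell; adjacent lines are compared byte by byte.
uniq "$tmp"

# The prefix assignment applied only to the uniq process, so the
# shell's own LC_ALL is unchanged by the call.
echo "LC_ALL in this shell after the call: ${LC_ALL:-unset}"
rm -f "$tmp"
```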

Subject: Re: bug#16168: uniq mis-handles UTF8 (8bit) characters
From: Shlomo Urbach
To: Linda Walsh
Cc: 16168@debbugs.gnu.org
Date: Mon, 16 Dec 2013 22:19:31 +0200

Thanks,

this works great.
But, I'm sure the general public doesn't know of this issue.

Shlomo