From: Stéphane Blondon
To: bug-coreutils@gnu.org
Date: Mon, 12 Dec 2011 23:54:57 +0100
Subject: bug#10287: [wishlist] uniq can remove non adjacent lines

Tool: uniq
Priority: wishlist

Hello,

I think `uniq` should have an additional option (for example -a, --all) to remove identical lines even when they are not adjacent.

The man page explains a workaround based on `sort` but it can be complex to use. A few weeks ago, I had to `uniq`-ize random numbers and the sort couldn't really work.
Fortunately, the order was not important so using `sort | uniq | sort --random-sort` was an acceptable solution. I imagine cases based on other tools like `top` could be a problem too.

If you are interested, I could try to provide a patch. (I have learnt C but I don't use it today.)

I don't think the increase of memory use is a problem today, so a warning in the manpage should be enough.

Thanks for everything,
-- 
Stéphane

From: Bob Proulx
To: Stéphane Blondon
Cc: 10287@debbugs.gnu.org
Date: Mon, 12 Dec 2011 21:20:18 -0700
Subject: bug#10287: [wishlist] uniq can remove non adjacent lines

Stéphane Blondon wrote:
> I think `uniq` should have an additional option (for example -a,
> --all) to remove identical lines even when they are not adjacent.
>
> The man page explains a workaround based on `sort` but it can be
> complex to use. A few weeks ago, I had to `uniq`-ize random numbers
> and the sort couldn't really work. Fortunately, the order was not
> important so using `sort | uniq | sort --random-sort` was an
> acceptable solution. I imagine cases based on other tools like `top`
> could be a problem too.

If you want to print only the first occurrence of each line then this perl one-liner will do it.

  perl -lne 'print $_ if ! defined $a{$_}; $a{$_}=$_;'

Bob

From: Davide Brini
To: 10287@debbugs.gnu.org
Date: Tue, 13 Dec 2011 09:29:51 +0100
Subject: bug#10287: [wishlist] uniq can remove non adjacent lines

On Mon, 12 Dec 2011 21:20:18 -0700, Bob Proulx wrote:

> Stéphane Blondon wrote:
> > I think `uniq` should have an additional option (for example -a,
> > --all) to remove identical lines even when they are not adjacent.
> >
> > The man page explains a workaround based on `sort` but it can be
> > complex to use. A few weeks ago, I had to `uniq`-ize random numbers
> > and the sort couldn't really work. Fortunately, the order was not
> > important so using `sort | uniq | sort --random-sort` was an
> > acceptable solution. I imagine cases based on other tools like `top`
> > could be a problem too.
>
> If you want to print only the first occurrence of each line then this
> perl one-liner will do it.
>
>   perl -lne 'print $_ if ! defined $a{$_}; $a{$_}=$_;'

While we're at it, this is the typical awk way to do that:

  awk '!a[$0]++'

-- 
D.

From: help-debbugs@gnu.org (GNU bug Tracking System)
To: Stéphane Blondon
Date: Tue, 13 Dec 2011 08:47:02 +0000
Subject: bug#10287: closed (Re: bug#10287: [wishlist] uniq can remove non adjacent lines)

Your bug report

#10287: [wishlist] uniq can remove non adjacent lines

which was filed against the coreutils package, has been closed.

The explanation is attached below, along with your original report.
If you require more details, please reply to 10287@debbugs.gnu.org.

-- 
10287: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=10287
GNU Bug Tracking System
Contact help-debbugs@gnu.org with problems

From: Pádraig Brady
To: Stéphane Blondon
Cc: 10287-done@debbugs.gnu.org
Date: Tue, 13 Dec 2011 08:45:24 +0000
Subject: Re: bug#10287: [wishlist] uniq can remove non adjacent lines

On 12/12/2011 10:54 PM, Stéphane Blondon wrote:
> Tool: uniq
> Priority: wishlist
>
> Hello,
>
> I think `uniq` should have an additional option (for example -a,
> --all) to remove identical lines even when they are not adjacent.
>
> The man page explains a workaround based on `sort` but it can be
> complex to use. A few weeks ago, I had to `uniq`-ize random numbers
> and the sort couldn't really work. Fortunately, the order was not
> important so using `sort | uniq | sort --random-sort` was an
> acceptable solution. I imagine cases based on other tools like `top`
> could be a problem too.
>
> If you are interested, I could try to provide a patch. (I have learnt
> C but I don't use it today.)
>
> I don't think the increase of memory use is a problem today, so a
> warning in the manpage should be enough.

Well, that would increase the complexity of `uniq` a _lot_:
http://lists.gnu.org/archive/html/coreutils/2011-11/msg00018.html
For that reason I would be against adding such a feature.

Note that improving the field selection of `uniq` is appropriate, and would make DSU (decorate-sort-undecorate) solutions using `sort` easier to implement.

cheers,
Pádraig.
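
The DSU approach mentioned above can already be sketched with existing tools. The following is a minimal illustration, not something posted in the thread: it assumes a `cat` that supports `-n` and separates the number from the line with a tab (as GNU and BSD `cat` do), and `dedup_dsu` is a hypothetical helper name.

```shell
# Order-preserving dedup via decorate-sort-undecorate (DSU).
# dedup_dsu is a hypothetical name, not a coreutils command.
dedup_dsu() {
  # Decorate: prefix every line with its line number and a tab.
  cat -n |
    # Sort by content, breaking ties by line number, so the earliest
    # copy of each line comes first within its group.
    LC_ALL=C sort -t "$(printf '\t')" -k2 -k1,1n |
    # Keep the first line of each group, ignoring the number field.
    LC_ALL=C uniq -f 1 |
    # Restore the original order, then undecorate.
    sort -n | cut -f 2-
}
```

Used as `... | dedup_dsu | ...`, this keeps the first occurrence of each line. It takes four processes and two sorts, which gives an idea of why the workaround is felt to be complex, but the work is done by `sort` rather than by an in-memory table of every distinct line.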

From: Jim Meyering
To: Bob Proulx
Cc: 10287@debbugs.gnu.org, Stéphane Blondon
Date: Tue, 13 Dec 2011 09:46:14 +0100
Subject: bug#10287: [wishlist] uniq can remove non adjacent lines

Bob Proulx wrote:
> Stéphane Blondon wrote:
>> I think `uniq` should have an additional option (for example -a,
>> --all) to remove identical lines even when they are not adjacent.
>>
>> The man page explains a workaround based on `sort` but it can be
>> complex to use. A few weeks ago, I had to `uniq`-ize random numbers
>> and the sort couldn't really work. Fortunately, the order was not
>> important so using `sort | uniq | sort --random-sort` was an
>> acceptable solution. I imagine cases based on other tools like `top`
>> could be a problem too.
>
> If you want to print only the first occurrence of each line then this
> perl one-liner will do it.
>
>   perl -lne 'print $_ if ! defined $a{$_}; $a{$_}=$_;'

Thanks, but with large files, isn't it better to store not the full line, but rather a constant?

  perl -lne 'print $_ if ! defined $seen{$_}; $seen{$_}=1'

(actually, using "1" could be seen as misleading, since 0 or even undef would also work)

I think you can drop the "l".
I have a slight preference for this:

  perl -ne 'defined $seen{$_} or print; $seen{$_}=1'

From: Bob Proulx
To: 10287@debbugs.gnu.org, Stéphane Blondon
Date: Tue, 13 Dec 2011 11:06:50 -0700
Subject: bug#10287: [wishlist] uniq can remove non adjacent lines

Jim Meyering wrote:
> Bob Proulx wrote:
> > If you want to print only the first occurrence of each line then
> > this perl one-liner will do it.
> >
> >   perl -lne 'print $_ if ! defined $a{$_}; $a{$_}=$_;'
>
> Thanks, but with large files, isn't it better to store not
> the full line, but rather a constant?
>
>   perl -lne 'print $_ if ! defined $seen{$_}; $seen{$_}=1'

Good point! I hadn't given it much thought since it usually runs so quickly in my usage that I never worried about it.

> (actually, using "1" could be seen as misleading, since 0 or even undef
> would also work)
>
> I think you can drop the "l".
> I have a slight preference for this:
>
>   perl -ne 'defined $seen{$_} or print; $seen{$_}=1'

Referring to "print" v. "print $_" here: I have never liked implicit use of $_ and so I tend to avoid it. At one time there was a push in the perl community to make all uses explicit. And whether to use the 'if (expr) { stmt }' or 'stmt if expr' or 'expr or stmt' forms is a matter of taste. Might as well discuss the one true indention and brace styles. :-)

For one-liners I do tend to use short variables to keep the line length minimized. In order to compact a line I also sacrifice whitespace when required.

But you have me thinking about conserving memory. If the file was large due to long lines then memory use would be proportionately large due to the key storage needs. This could be reduced by using a hash of the line as the storage key instead of the entire line. But the savings would be relative to the average line size. If the average line size was smaller than the hash size then this would increase memory use.

  perl -MDigest::MD5=md5 -lne '$m=md5($_); print $_ if ! defined $a{$m}; $a{$m}=1'

If you are ever going to debug and print out the md5 value then substitute md5_hex for md5 to get a printable result.
Bob

From: Bob Proulx
To: bug-coreutils@gnu.org
Date: Tue, 13 Dec 2011 11:09:55 -0700
Subject: bug#10287: [wishlist] uniq can remove non adjacent lines

Davide Brini wrote:
> Bob Proulx wrote:
> >   perl -lne 'print $_ if ! defined $a{$_}; $a{$_}=$_;'
>
> While we're at it, this is the typical awk way to do that:
>
>   awk '!a[$0]++'

I like it! I will definitely be using that awk idiom in the future. It is simple and concise.
Bob

From: Stéphane Blondon
To: 10287@debbugs.gnu.org
Date: Wed, 14 Dec 2011 23:30:12 +0100
Subject: bug#10287: [wishlist] uniq can remove non adjacent lines

2011/12/13 Bob Proulx:
> Davide Brini wrote:
>> Bob Proulx wrote:
>> >   perl -lne 'print $_ if ! defined $a{$_}; $a{$_}=$_;'
>>
>> While we're at it, this is the typical awk way to do that:
>>
>>   awk '!a[$0]++'

Many thanks to you and Davide for providing a one-liner solution!

I've modified the awk version so that it works as an alias. I'm sending it in case someone asks the same question:

Copy-paste the next line into ~/.bash_aliases:

  alias uniqall='awk '"'"'! a[$0]++'"'"''

Then you can filter like this:

  cat file | ... | uniqall | ...

(tested with bash, version 4.2.20(1)-release under Debian Wheezy)

Thanks and goodbye,
-- 
Stéphane
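
A shell function is one possible alternative to the alias above. This is a minimal sketch rather than something proposed in the thread: it reuses the name `uniqall` from the alias, the array name `seen` is arbitrary, and it relies only on the awk idiom already shown. It avoids the nested quoting and also accepts file names as arguments.

```shell
# Print each distinct input line once, in order of first appearance.
# seen[$0]++ evaluates to 0 (false) the first time a line is met, so the
# negated pattern is true only then and awk's default action prints it.
uniqall() {
  awk '!seen[$0]++' "$@"
}
```

For example, `printf 'b\na\nb\n' | uniqall` prints `b` and then `a`. Define the function instead of the alias, not alongside it, since an interactive bash would expand the alias while parsing the function definition. As with the alias, memory use still grows with the number of distinct lines.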