From unknown Sun Sep 21 07:50:28 2025 X-Loop: help-debbugs@gnu.org Subject: bug#11220: uniq -d and -Du bug? Resent-From: phil colbourn Original-Sender: debbugs-submit-bounces@debbugs.gnu.org Resent-CC: bug-coreutils@gnu.org Resent-Date: Wed, 11 Apr 2012 06:25:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: report 11220 X-GNU-PR-Package: coreutils X-GNU-PR-Keywords: To: 11220@debbugs.gnu.org X-Debbugs-Original-To: bug-coreutils@gnu.org Received: via spool by submit@debbugs.gnu.org id=B.13341254814507 (code B ref -1); Wed, 11 Apr 2012 06:25:01 +0000 Received: (at submit) by debbugs.gnu.org; 11 Apr 2012 06:24:41 +0000 Received: from localhost ([127.0.0.1]:50124 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1SHqyu-0001Ad-Om for submit@debbugs.gnu.org; Wed, 11 Apr 2012 02:24:41 -0400 Received: from eggs.gnu.org ([208.118.235.92]:34148) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1SHqM8-0007gR-Lb for submit@debbugs.gnu.org; Wed, 11 Apr 2012 01:44:37 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1SHqL3-0006y7-QI for submit@debbugs.gnu.org; Wed, 11 Apr 2012 01:43:31 -0400 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=-6.9 required=5.0 tests=BAYES_00,FREEMAIL_FROM, HTML_MESSAGE,RCVD_IN_DNSWL_HI,T_DKIM_INVALID autolearn=unavailable version=3.3.2 Received: from lists.gnu.org ([208.118.235.17]:54138) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1SHqL3-0006y3-KC for submit@debbugs.gnu.org; Wed, 11 Apr 2012 01:43:29 -0400 Received: from eggs.gnu.org ([208.118.235.92]:45451) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1SHqL1-0005OF-Pq for bug-coreutils@gnu.org; Wed, 11 Apr 2012 01:43:29 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1SHqKz-0006xe-UA for bug-coreutils@gnu.org; Wed, 11 Apr 2012 
01:43:27 -0400 Received: from mail-ey0-f169.google.com ([209.85.215.169]:43323) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1SHqKz-0006xX-Kp for bug-coreutils@gnu.org; Wed, 11 Apr 2012 01:43:25 -0400 Received: by eaal1 with SMTP id l1so105086eaa.0 for ; Tue, 10 Apr 2012 22:43:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :content-type; bh=s8U+KpwHzIQghAfizD7m2fYPMgbbEhuR/mA5tr6Dg8Q=; b=ERupT2MzWJwKZeUYIoDt/gSKw7EsUzv0/0ExaiFbUYY31j20dIThko2Ml5/UsUvrmN YZ+rtuyIzripqos2Cjb3MsPSImo5RJ55t5xqEEBsiDBS0GIfKlpwoOf61QgwUXz6QjDq IJPb4R7Z6D8qJuFM3Ds5KHdbLCW/yEMoanK23UriPRXVBGn6v/aOycuuAvFHJsHdHuvT 80OWoFn+ktPUATgUzpQEoHWD3lNMXFAhVSmcUWwlZdQDd/V8vgaq7+0ropry/w/4duJm mDwNQddO5f0BhMErtlqemoZrONdQM+HtHt1SNKSZcLNcafUhjyZNqDmaTecstYhPIC9N YEkA== Received: by 10.213.22.9 with SMTP id l9mr871711ebb.287.1334123003054; Tue, 10 Apr 2012 22:43:23 -0700 (PDT) MIME-Version: 1.0 Received: by 10.213.2.133 with HTTP; Tue, 10 Apr 2012 22:43:02 -0700 (PDT) In-Reply-To: References: From: phil colbourn Date: Wed, 11 Apr 2012 15:43:02 +1000 Message-ID: Content-Type: multipart/alternative; boundary=0015174bf1d0af09ca04bd60b75d X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 3) X-Received-From: 208.118.235.17 X-Spam-Score: -6.1 (------) X-Mailman-Approved-At: Wed, 11 Apr 2012 02:24:39 -0400 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -6.1 (------) --0015174bf1d0af09ca04bd60b75d Content-Type: text/plain; charset=ISO-8859-1 What should this print? echo -e 'aa\naa\naa\n' | uniq -d To me this says: 1. 
uniqueness is defined by whole line so there is 1 unique value 'aa'; 2. -d option say to 'only print duplicate lines'; 3. 1st 'aa' is (so far) unique so it should NOT be printed; 4. 2nd 'aa' is not unique so it SHOULD be printed; and 5. 3rd 'aa' is not unique so it SHOULD also be printed. I think I should get this: aa aa But I get this: aa To see what duplicated line is printed I tried this: echo -e 'a1\na2\na3\na4\n' | uniq -d -w 1 a1 So, first line is printed. This is not what I expected at all. Now, -D means 'print all duplicate lines' and echo -e 'aa\naa\naa\n' | uniq -D prints what I expect it to: aa aa aa Now, -D and -u means 'print all duplicate lines' and 'only print unique lines'. I think this should print all lines since union of all unique lines and all duplicate lines is all lines. But, echo -e 'aa\naa\naa\n' | uniq -Du prints this: aa aa To see what lines are being printed I tried this: echo -e 'a1\na2\na3\na4\n' | uniq -Du -w 1 a1 a2 a3 Therefore -Du prints first N-1 matching lines and not last matching line. (Which is sort-of like what I expect -d to print) Are these bugs? --0015174bf1d0af09ca04bd60b75d Content-Type: text/html; charset=ISO-8859-1

What should this print?

echo -e 'aa\naa\naa\n' | uniq -d

To me this says:

1. uniqueness is defined by the whole line, so there is 1 unique value 'aa';
2. the -d option says to 'only print duplicate lines';
3. 1st 'aa' is (so far) unique so it should NOT be printed;
4. 2nd 'aa' is not unique so it SHOULD be printed; and
5. 3rd 'aa' is not unique so it SHOULD also be printed.

I think I should get this:

aa
aa

But I get this:

aa

To see which duplicated line is printed, I tried this:

echo -e 'a1\na2\na3\na4\n' | uniq -d -w 1
a1

So the first line is printed. This is not what I expected at all.



Now, -D means 'print all duplicate lines' and

echo -e 'aa\naa\naa\n' | uniq -D

prints what I expect it to:

aa
aa
aa

Now, -D and -u together mean 'print all duplicate lines' and 'only print unique lines'.

I think this should print all lines, since the union of all unique lines and all duplicate lines is all lines.

But,

echo -e 'aa\naa\naa\n' | uniq -Du

prints this:

aa
aa

To see which lines are being printed, I tried this:

echo -e 'a1\na2\na3\na4\n' | uniq -Du -w 1
a1
a2
a3

Therefore -Du prints the first N-1 matching lines and not the last matching line.

(Which is sort of what I expected -d to print.)

Are these bugs?



--0015174bf1d0af09ca04bd60b75d-- From unknown Sun Sep 21 07:50:28 2025 X-Loop: help-debbugs@gnu.org Subject: bug#11220: Sorry, forgot version. References: In-Reply-To: Resent-From: phil colbourn Original-Sender: debbugs-submit-bounces@debbugs.gnu.org Resent-CC: bug-coreutils@gnu.org Resent-Date: Wed, 11 Apr 2012 10:09:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 11220 X-GNU-PR-Package: coreutils X-GNU-PR-Keywords: To: 11220@debbugs.gnu.org Received: via spool by 11220-submit@debbugs.gnu.org id=B11220.13341389338058 (code B ref 11220); Wed, 11 Apr 2012 10:09:01 +0000 Received: (at 11220) by debbugs.gnu.org; 11 Apr 2012 10:08:53 +0000 Received: from localhost ([127.0.0.1]:50392 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1SHuTt-00025v-DJ for submit@debbugs.gnu.org; Wed, 11 Apr 2012 06:08:53 -0400 Received: from mail-ee0-f44.google.com ([74.125.83.44]:62942) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1SHuTq-00025n-WD for 11220@debbugs.gnu.org; Wed, 11 Apr 2012 06:08:51 -0400 Received: by eeke51 with SMTP id e51so176224eek.3 for <11220@debbugs.gnu.org>; Wed, 11 Apr 2012 03:07:43 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:from:date:message-id:subject:to:content-type; bh=pIQP4CwpUwSRt20yG443mRNOk++oTO0Nu7vL1ob0ShE=; b=kfJ32MCGnWhpTgwAdCGo+h/mpmOXcssRefVlXDkn0W5l9QFTdg9NZFhn5V71zCU6uy 6H95bf+TDqWCbB2pa0/0pe0GFZUiGymgGRBs1z6LZgmX7B5ZFGjXKxJhgMMPxK+DOX97 FuQcG6l3OvV7TRm7B7QdEfW2v19TyOrmDTNRChQSvcNtiaddifnB+el/sSgha9i1AFWf DDhMK1FDXi2VsRuo5CUs1vvrYuPn2vKYVEtYcDoFpL3Udh4jbMIi/my4urT2fA/7B3j2 0HvZwVM5/f1dNsVtXFnv751lyqyn/zG/rgKSH7eisxuWXEpIXIu/DoNN8zIzL60JXCiJ LOEg== Received: by 10.213.25.147 with SMTP id z19mr929475ebb.173.1334138863816; Wed, 11 Apr 2012 03:07:43 -0700 (PDT) MIME-Version: 1.0 Received: by 10.213.2.133 with HTTP; Wed, 11 Apr 2012 03:07:23 -0700 (PDT) From: phil colbourn Date: Wed, 11 Apr 2012 
20:07:23 +1000 Message-ID: Content-Type: multipart/alternative; boundary=0015174c3e2a0f0efb04bd6469c0 X-Spam-Score: -2.6 (--) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -2.6 (--) --0015174c3e2a0f0efb04bd6469c0 Content-Type: text/plain; charset=ISO-8859-1 uniq version 8.13 --0015174c3e2a0f0efb04bd6469c0 Content-Type: text/html; charset=ISO-8859-1 uniq version 8.13

--0015174c3e2a0f0efb04bd6469c0-- From unknown Sun Sep 21 07:50:28 2025 X-Loop: help-debbugs@gnu.org Subject: bug#11220: uniq -d and -Du bug? Resent-From: Jim Meyering Original-Sender: debbugs-submit-bounces@debbugs.gnu.org Resent-CC: bug-coreutils@gnu.org Resent-Date: Wed, 11 Apr 2012 12:01:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 11220 X-GNU-PR-Package: coreutils X-GNU-PR-Keywords: To: phil colbourn Cc: 11220@debbugs.gnu.org Received: via spool by 11220-submit@debbugs.gnu.org id=B11220.133414564328028 (code B ref 11220); Wed, 11 Apr 2012 12:01:01 +0000 Received: (at 11220) by debbugs.gnu.org; 11 Apr 2012 12:00:43 +0000 Received: from localhost ([127.0.0.1]:50531 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1SHwE6-0007I0-Hi for submit@debbugs.gnu.org; Wed, 11 Apr 2012 08:00:43 -0400 Received: from mx.meyering.net ([88.168.87.75]:32819) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1SHwE4-0007Hs-2E for 11220@debbugs.gnu.org; Wed, 11 Apr 2012 08:00:41 -0400 Received: from rho.meyering.net (localhost.localdomain [127.0.0.1]) by rho.meyering.net (Acme Bit-Twister) with ESMTP id 4994E60056; Wed, 11 Apr 2012 13:59:31 +0200 (CEST) From: Jim Meyering In-Reply-To: (phil colbourn's message of "Wed, 11 Apr 2012 15:43:02 +1000") References: Date: Wed, 11 Apr 2012 13:59:31 +0200 Message-ID: <87ehruiin0.fsf@rho.meyering.net> Lines: 52 MIME-Version: 1.0 Content-Type: text/plain X-Spam-Score: -1.9 (-) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -1.9 (-)

phil colbourn wrote:
> What should this print?
>
> echo -e 'aa\naa\naa\n' | uniq -d

It's better to avoid echo -e.
Use printf instead:

  printf 'aa\naa\naa\n' | uniq -d

> To me this says:
>
> 1. uniqueness is defined by whole line so there is 1 unique value 'aa';
> 2. -d option say to 'only print duplicate lines';

When in doubt, follow the advice at the bottom of the man page
and read the "real" (texinfo) documentation:

    The full documentation for uniq is maintained as a Texinfo manual.
    If the info and uniq programs are properly installed at your site,
    the command

        info coreutils 'uniq invocation'

    should give you access to the complete manual.

> 3. 1st 'aa' is (so far) unique so it should NOT be printed;
> 4. 2nd 'aa' is not unique so it SHOULD be printed; and
> 5. 3rd 'aa' is not unique so it SHOULD also be printed.
>
> I think I should get this:
>
> aa
> aa
>
> But I get this:
>
> aa

Thanks for the report.
The problem is that the description in the man page is too succinct,
perhaps because -d means different things, depending on what other
options you use it with.

How is what you see inconsistent with the documentation?
(info coreutils uniq)

`-d'
`--repeated'
     Discard lines that are not repeated.  When used by itself, this
     option causes `uniq' to print the first copy of each repeated line,
     and nothing else.
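The documented behaviour of -d quoted above can be checked with a short script. This is only a sketch, assuming GNU uniq (the -w option, which limits the comparison to the first N characters, is a GNU extension); the variable names `repeated` and `kept` are made up for illustration. It uses printf, as suggested, rather than echo -e.

```shell
# Assumes GNU uniq; -w N is a GNU extension (compare only the first N characters).

# Three identical lines: -d reports the repeated line once, not once per duplicate.
repeated=$(printf 'aa\naa\naa\n' | uniq -d)
printf '%s\n' "$repeated"

# Lines that differ after the first character reveal which copy of the group is kept.
kept=$(printf 'a1\na2\na3\na4\n' | uniq -d -w 1)
printf '%s\n' "$kept"
```

With GNU uniq the first command prints aa and the second prints a1, that is, the first copy of the repeated group, which matches both the Texinfo text and the output shown in the original report.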
From debbugs-submit-bounces@debbugs.gnu.org Wed Apr 11 08:08:44 2012 Received: (at control) by debbugs.gnu.org; 11 Apr 2012 12:08:44 +0000 Received: from localhost ([127.0.0.1]:50549 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1SHwLr-0007Tu-Mj for submit@debbugs.gnu.org; Wed, 11 Apr 2012 08:08:44 -0400 Received: from mx1.redhat.com ([209.132.183.28]:17847) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1SHwLo-0007Te-56; Wed, 11 Apr 2012 08:08:42 -0400 Received: from int-mx11.intmail.prod.int.phx2.redhat.com (int-mx11.intmail.prod.int.phx2.redhat.com [10.5.11.24]) by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id q3BC7VhT017695 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Wed, 11 Apr 2012 08:07:32 -0400 Received: from [10.3.113.67] (ovpn-113-67.phx2.redhat.com [10.3.113.67]) by int-mx11.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id q3BC7Vwj011677; Wed, 11 Apr 2012 08:07:31 -0400 Message-ID: <4F857402.2020806@redhat.com> Date: Wed, 11 Apr 2012 06:07:30 -0600 From: Eric Blake Organization: Red Hat User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:11.0) Gecko/20120329 Thunderbird/11.0.1 MIME-Version: 1.0 To: phil colbourn Subject: Re: bug#11220: uniq -d and -Du bug? 
References: In-Reply-To: X-Enigmail-Version: 1.4 OpenPGP: url=http://people.redhat.com/eblake/eblake.gpg Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="------------enig8F8BEC4A708E8197F9058536" X-Scanned-By: MIMEDefang 2.68 on 10.5.11.24 X-Spam-Score: -6.9 (------) X-Debbugs-Envelope-To: control Cc: 11220-done@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -6.9 (------) This is an OpenPGP/MIME signed message (RFC 2440 and 3156) --------------enig8F8BEC4A708E8197F9058536 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable

tag 11220 notabug
thanks

On 04/10/2012 11:43 PM, phil colbourn wrote:
> What should this print?
>
> echo -e 'aa\naa\naa\n' | uniq -d

Thanks for the report. POSIX requires this to print only a single instance of 'aa', whether or not -d is in effect; coreutils does this by outputting the last line in a series of duplicates. The point of -d is to suppress the single-line outputs that do not have a corresponding duplicate input, not to output all instances of a duplicated line.

By the way, 'echo -e' is not portable; POSIX recommends you use printf instead.

>
> Now, -D and -u means 'print all duplicate lines' and 'only print unique
> lines'.

-D is not specified by POSIX. However, -u is defined by POSIX to suppress output lines that have a corresponding duplicate input.

>
> I think this should print all lines since union of all unique lines and all
> duplicate lines is all lines.
>
>
> Therefore -Du prints first N-1 matching lines and not last matching line.

In isolation, uniq prints the last instance of the duplicated line, and uniq -u suppresses the output of the 4th line. In isolation, -D says to output the first three lines which are normally omitted because they have duplicates, in addition to the 4th line that is printed by default. So in combination, -Du says to print the lines with subsequent duplicates (the first three lines) but to suppress the output line that corresponds to the last input line that ends a sequence of duplicates (the 4th line).

Perhaps we can document this behavior better. Or perhaps we can change the behavior of -D (but at risk of breaking existing clients that depend on the current behavior). But we can't change -u or -d behavior.

Put another way, per POSIX, the default behavior is subtractive (remove any line with a subsequent duplicate), -d is subtractive (remove any line with no duplicate), and -u is subtractive (remove any last line that had a prior duplicate), and GNU -D is additive (print any line with a subsequent duplicate, to counter the initial default).

>
> Are these bugs?

At this point, I will claim that the behavior is intended, and therefore close out the bug. But if you are willing to submit documentation patches, or even code patches accompanied by extensive test cases to demonstrate the corner cases of any new behavior, feel free to continue to reply to this bug report.
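The four modes described above can be compared side by side on one small input. This is a minimal sketch, assuming GNU uniq (-D is a GNU extension, not specified by POSIX); the sample input and the variable names are hypothetical and chosen only for illustration. Identical lines are used, so the question of which copy in a group is printed does not arise here.

```shell
# Assumes GNU uniq; -D is a GNU extension not specified by POSIX.
# Hypothetical sample input: 'b' is the only repeated line.
input='a\nb\nb\nc\n'

default_out=$(printf "$input" | uniq)      # one line per group of adjacent equal lines
repeated_out=$(printf "$input" | uniq -d)  # one line per group that has duplicates
unique_out=$(printf "$input" | uniq -u)    # only lines that have no duplicate
all_dups_out=$(printf "$input" | uniq -D)  # every line of each duplicated group

printf 'default:\n%s\n-d:\n%s\n-u:\n%s\n-D:\n%s\n' \
  "$default_out" "$repeated_out" "$unique_out" "$all_dups_out"
```

On this input the default output is a, b, c; -d gives b; -u gives a and c; and -D gives b twice. Taken separately, the -u and -D outputs together account for every input line, which is the union the report had in mind, but that is not what combining the two flags in a single invocation does.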
--=20 Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org --------------enig8F8BEC4A708E8197F9058536 Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) Comment: Public key at http://people.redhat.com/eblake/eblake.gpg Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQEcBAEBCAAGBQJPhXQCAAoJEKeha0olJ0Nq1ZAH/j0z02Sw50wquAvbcfCJiH8p A/33/0CTlIuko7ZFvcoq2rIEeBOKNq5qh4ECFBNuiol50XkJnQMu8P3A5LsVHx+H e5qRtaFrRt68zyCCrrkqntbikHGrCQp6SqkYyrolOZ5R3veVaOg5tQpDopxvJVtg rd0eULthaJi4R/OrJsM3QZB1JUmYm0ETku8stzW0WTuzUCuhWNTG6tK6iihj+e4O opEOhohxkJfWloPioKJLIPT4IvFFxiS5Hi80FRi7uJHU4eyj3+/wCkIDRThGSBA7 lwIxlem2locxxbGoaWrkGvU3vOq7haAJBkIwLf/XiekxPKZfZtOR2mQPBwi8/7I= =hFdR -----END PGP SIGNATURE----- --------------enig8F8BEC4A708E8197F9058536-- From unknown Sun Sep 21 07:50:28 2025 MIME-Version: 1.0 X-Mailer: MIME-tools 5.428 (Entity 5.428) X-Loop: help-debbugs@gnu.org From: help-debbugs@gnu.org (GNU bug Tracking System) To: phil colbourn Subject: bug#11220: closed (Re: bug#11220: uniq -d and -Du bug?) Message-ID: References: <4F857402.2020806@redhat.com> X-Gnu-PR-Message: they-closed 11220 X-Gnu-PR-Package: coreutils X-Gnu-PR-Keywords: notabug Reply-To: 11220@debbugs.gnu.org Date: Wed, 11 Apr 2012 12:09:02 +0000 Content-Type: multipart/mixed; boundary="----------=_1334146142-28802-1" This is a multi-part message in MIME format... ------------=_1334146142-28802-1 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Your bug report #11220: uniq -d and -Du bug? which was filed against the coreutils package, has been closed. The explanation is attached below, along with your original report. If you require more details, please reply to 11220@debbugs.gnu.org. 
--=20 11220: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=3D11220 GNU Bug Tracking System Contact help-debbugs@gnu.org with problems ------------=_1334146142-28802-1 Content-Type: message/rfc822 Content-Disposition: inline Content-Transfer-Encoding: 7bit Received: (at 11220-done) by debbugs.gnu.org; 11 Apr 2012 12:08:43 +0000 Received: from localhost ([127.0.0.1]:50547 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1SHwLr-0007Ts-6b for submit@debbugs.gnu.org; Wed, 11 Apr 2012 08:08:43 -0400 Received: from mx1.redhat.com ([209.132.183.28]:17847) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1SHwLo-0007Te-56; Wed, 11 Apr 2012 08:08:42 -0400 Received: from int-mx11.intmail.prod.int.phx2.redhat.com (int-mx11.intmail.prod.int.phx2.redhat.com [10.5.11.24]) by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id q3BC7VhT017695 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Wed, 11 Apr 2012 08:07:32 -0400 Received: from [10.3.113.67] (ovpn-113-67.phx2.redhat.com [10.3.113.67]) by int-mx11.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id q3BC7Vwj011677; Wed, 11 Apr 2012 08:07:31 -0400 Message-ID: <4F857402.2020806@redhat.com> Date: Wed, 11 Apr 2012 06:07:30 -0600 From: Eric Blake Organization: Red Hat User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:11.0) Gecko/20120329 Thunderbird/11.0.1 MIME-Version: 1.0 To: phil colbourn Subject: Re: bug#11220: uniq -d and -Du bug? 
References: In-Reply-To: X-Enigmail-Version: 1.4 OpenPGP: url=http://people.redhat.com/eblake/eblake.gpg Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="------------enig8F8BEC4A708E8197F9058536" X-Scanned-By: MIMEDefang 2.68 on 10.5.11.24 X-Spam-Score: -6.9 (------) X-Debbugs-Envelope-To: 11220-done Cc: 11220-done@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -6.9 (------) This is an OpenPGP/MIME signed message (RFC 2440 and 3156) --------------enig8F8BEC4A708E8197F9058536 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable

tag 11220 notabug
thanks

On 04/10/2012 11:43 PM, phil colbourn wrote:
> What should this print?
>
> echo -e 'aa\naa\naa\n' | uniq -d

Thanks for the report. POSIX requires this to print only a single instance of 'aa', whether or not -d is in effect; coreutils does this by outputting the last line in a series of duplicates. The point of -d is to suppress the single-line outputs that do not have a corresponding duplicate input, not to output all instances of a duplicated line.

By the way, 'echo -e' is not portable; POSIX recommends you use printf instead.

>
> Now, -D and -u means 'print all duplicate lines' and 'only print unique
> lines'.

-D is not specified by POSIX. However, -u is defined by POSIX to suppress output lines that have a corresponding duplicate input.

>
> I think this should print all lines since union of all unique lines and all
> duplicate lines is all lines.
>
>
> Therefore -Du prints first N-1 matching lines and not last matching line.

In isolation, uniq prints the last instance of the duplicated line, and uniq -u suppresses the output of the 4th line. In isolation, -D says to output the first three lines which are normally omitted because they have duplicates, in addition to the 4th line that is printed by default. So in combination, -Du says to print the lines with subsequent duplicates (the first three lines) but to suppress the output line that corresponds to the last input line that ends a sequence of duplicates (the 4th line).

Perhaps we can document this behavior better. Or perhaps we can change the behavior of -D (but at risk of breaking existing clients that depend on the current behavior). But we can't change -u or -d behavior.

Put another way, per POSIX, the default behavior is subtractive (remove any line with a subsequent duplicate), -d is subtractive (remove any line with no duplicate), and -u is subtractive (remove any last line that had a prior duplicate), and GNU -D is additive (print any line with a subsequent duplicate, to counter the initial default).

>
> Are these bugs?

At this point, I will claim that the behavior is intended, and therefore close out the bug. But if you are willing to submit documentation patches, or even code patches accompanied by extensive test cases to demonstrate the corner cases of any new behavior, feel free to continue to reply to this bug report.
--=20 Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org --------------enig8F8BEC4A708E8197F9058536 Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) Comment: Public key at http://people.redhat.com/eblake/eblake.gpg Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQEcBAEBCAAGBQJPhXQCAAoJEKeha0olJ0Nq1ZAH/j0z02Sw50wquAvbcfCJiH8p A/33/0CTlIuko7ZFvcoq2rIEeBOKNq5qh4ECFBNuiol50XkJnQMu8P3A5LsVHx+H e5qRtaFrRt68zyCCrrkqntbikHGrCQp6SqkYyrolOZ5R3veVaOg5tQpDopxvJVtg rd0eULthaJi4R/OrJsM3QZB1JUmYm0ETku8stzW0WTuzUCuhWNTG6tK6iihj+e4O opEOhohxkJfWloPioKJLIPT4IvFFxiS5Hi80FRi7uJHU4eyj3+/wCkIDRThGSBA7 lwIxlem2locxxbGoaWrkGvU3vOq7haAJBkIwLf/XiekxPKZfZtOR2mQPBwi8/7I= =hFdR -----END PGP SIGNATURE----- --------------enig8F8BEC4A708E8197F9058536-- ------------=_1334146142-28802-1 Content-Type: message/rfc822 Content-Disposition: inline Content-Transfer-Encoding: 7bit Received: (at submit) by debbugs.gnu.org; 11 Apr 2012 06:24:41 +0000 Received: from localhost ([127.0.0.1]:50124 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1SHqyu-0001Ad-Om for submit@debbugs.gnu.org; Wed, 11 Apr 2012 02:24:41 -0400 Received: from eggs.gnu.org ([208.118.235.92]:34148) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1SHqM8-0007gR-Lb for submit@debbugs.gnu.org; Wed, 11 Apr 2012 01:44:37 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1SHqL3-0006y7-QI for submit@debbugs.gnu.org; Wed, 11 Apr 2012 01:43:31 -0400 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=-6.9 required=5.0 tests=BAYES_00,FREEMAIL_FROM, HTML_MESSAGE,RCVD_IN_DNSWL_HI,T_DKIM_INVALID autolearn=unavailable version=3.3.2 Received: from 
lists.gnu.org ([208.118.235.17]:54138) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1SHqL3-0006y3-KC for submit@debbugs.gnu.org; Wed, 11 Apr 2012 01:43:29 -0400 Received: from eggs.gnu.org ([208.118.235.92]:45451) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1SHqL1-0005OF-Pq for bug-coreutils@gnu.org; Wed, 11 Apr 2012 01:43:29 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1SHqKz-0006xe-UA for bug-coreutils@gnu.org; Wed, 11 Apr 2012 01:43:27 -0400 Received: from mail-ey0-f169.google.com ([209.85.215.169]:43323) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1SHqKz-0006xX-Kp for bug-coreutils@gnu.org; Wed, 11 Apr 2012 01:43:25 -0400 Received: by eaal1 with SMTP id l1so105086eaa.0 for ; Tue, 10 Apr 2012 22:43:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :content-type; bh=s8U+KpwHzIQghAfizD7m2fYPMgbbEhuR/mA5tr6Dg8Q=; b=ERupT2MzWJwKZeUYIoDt/gSKw7EsUzv0/0ExaiFbUYY31j20dIThko2Ml5/UsUvrmN YZ+rtuyIzripqos2Cjb3MsPSImo5RJ55t5xqEEBsiDBS0GIfKlpwoOf61QgwUXz6QjDq IJPb4R7Z6D8qJuFM3Ds5KHdbLCW/yEMoanK23UriPRXVBGn6v/aOycuuAvFHJsHdHuvT 80OWoFn+ktPUATgUzpQEoHWD3lNMXFAhVSmcUWwlZdQDd/V8vgaq7+0ropry/w/4duJm mDwNQddO5f0BhMErtlqemoZrONdQM+HtHt1SNKSZcLNcafUhjyZNqDmaTecstYhPIC9N YEkA== Received: by 10.213.22.9 with SMTP id l9mr871711ebb.287.1334123003054; Tue, 10 Apr 2012 22:43:23 -0700 (PDT) MIME-Version: 1.0 Received: by 10.213.2.133 with HTTP; Tue, 10 Apr 2012 22:43:02 -0700 (PDT) In-Reply-To: References: From: phil colbourn Date: Wed, 11 Apr 2012 15:43:02 +1000 Message-ID: Subject: uniq -d and -Du bug? To: bug-coreutils@gnu.org Content-Type: multipart/alternative; boundary=0015174bf1d0af09ca04bd60b75d X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. 
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 3) X-Received-From: 208.118.235.17 X-Spam-Score: -6.1 (------) X-Debbugs-Envelope-To: submit X-Mailman-Approved-At: Wed, 11 Apr 2012 02:24:39 -0400 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -6.1 (------) --0015174bf1d0af09ca04bd60b75d Content-Type: text/plain; charset=ISO-8859-1 What should this print? echo -e 'aa\naa\naa\n' | uniq -d To me this says: 1. uniqueness is defined by whole line so there is 1 unique value 'aa'; 2. -d option say to 'only print duplicate lines'; 3. 1st 'aa' is (so far) unique so it should NOT be printed; 4. 2nd 'aa' is not unique so it SHOULD be printed; and 5. 3rd 'aa' is not unique so it SHOULD also be printed. I think I should get this: aa aa But I get this: aa To see what duplicated line is printed I tried this: echo -e 'a1\na2\na3\na4\n' | uniq -d -w 1 a1 So, first line is printed. This is not what I expected at all. Now, -D means 'print all duplicate lines' and echo -e 'aa\naa\naa\n' | uniq -D prints what I expect it to: aa aa aa Now, -D and -u means 'print all duplicate lines' and 'only print unique lines'. I think this should print all lines since union of all unique lines and all duplicate lines is all lines. But, echo -e 'aa\naa\naa\n' | uniq -Du prints this: aa aa To see what lines are being printed I tried this: echo -e 'a1\na2\na3\na4\n' | uniq -Du -w 1 a1 a2 a3 Therefore -Du prints first N-1 matching lines and not last matching line. (Which is sort-of like what I expect -d to print) Are these bugs? --0015174bf1d0af09ca04bd60b75d Content-Type: text/html; charset=ISO-8859-1

--0015174bf1d0af09ca04bd60b75d-- ------------=_1334146142-28802-1--