From: phil colbourn
To: bug-coreutils@gnu.org
Date: Wed, 11 Apr 2012 15:43:02 +1000
Subject: uniq -d and -Du bug?

What should this print?

echo -e 'aa\naa\naa\n' | uniq -d

To me this says:

1. uniqueness is defined by the whole line, so there is one unique value, 'aa';
2. the -d option says to 'only print duplicate lines';
3. the 1st 'aa' is (so far) unique, so it should NOT be printed;
4. the 2nd 'aa' is not unique, so it SHOULD be printed; and
5. the 3rd 'aa' is not unique, so it SHOULD also be printed.

I think I should get this:

aa
aa

But I get this:

aa

To see which of the duplicated lines is printed, I tried this:

echo -e 'a1\na2\na3\na4\n' | uniq -d -w 1
a1

So the first line is printed. This is not what I expected at all.
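The reasoning above counts each later copy as a separate duplicate to print. uniq instead works on runs of adjacent lines that compare equal (with -w N, on no more than the first N characters), and the coreutils info manual says that -d, used by itself, prints the first copy of each repeated line and nothing else. A minimal Python sketch of that rule follows. It assumes ASCII input and models only -d and -w. The function name `repeated_first` is hypothetical and is not part of coreutils.

```python
from itertools import groupby
from typing import List, Optional


def repeated_first(lines: List[str], w: Optional[int] = None) -> List[str]:
    """Hypothetical model of `uniq -d [-w N]`, not the coreutils source."""
    # -w N compares no more than the first N characters of each line.
    key = (lambda s: s[:w]) if w is not None else (lambda s: s)
    out = []
    # groupby() yields runs of *adjacent* items with equal keys, as uniq does.
    for _, group in groupby(lines, key=key):
        run = list(group)
        if len(run) > 1:        # -d drops runs that have no duplicate
            out.append(run[0])  # one line, the first copy, stands for the run
    return out


print(repeated_first(["aa", "aa", "aa"]))             # ['aa']
print(repeated_first(["a1", "a2", "a3", "a4"], w=1))  # ['a1']
```

Both results match the outputs reported here. -d reports each repeated run once, not once per extra copy.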



Now, -D means 'print all duplicate lines' and

echo -e 'aa\naa\naa\n' | uniq -D

prints what I expect it to:

aa
aa
aa

Now, -D means 'print all duplicate lines' and -u means 'only print unique lines'.

I think this should print all lines, since the union of all unique lines and all duplicate lines is all lines.

But,

echo -e 'aa\naa\naa\n' | uniq -Du

prints this:

aa
aa

To see which lines are being printed, I tried this:

echo -e 'a1\na2\na3\na4\n' | uniq -Du -w 1
a1
a2
a3

Therefore -Du prints the first N-1 matching lines but not the last matching line.

(Which is sort of what I expected -d to print.)

Are these bugs?
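The echo -e commands above have a side effect. echo appends its own newline, so echo -e 'aa\naa\naa\n' feeds uniq four lines, and the last one is empty. By contrast, printf 'aa\naa\naa\n' feeds exactly three lines. The extra empty line forms a run of its own with no duplicate. It therefore never appears under -d or -D, but plain uniq or uniq -u would print it.

A small Python sketch of the difference follows. The helper names are hypothetical, and the strings stand for the arguments after escape processing.

```python
from itertools import groupby
from typing import List


def split_lines(data: str) -> List[str]:
    """Split text into lines the way a line-oriented tool sees them."""
    lines = data.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # a final "\n" ends the last line; it does not start a new one
    return lines


def unique_runs(lines: List[str]) -> List[str]:
    """Lines with no adjacent duplicate, i.e. what `uniq -u` would keep."""
    return [key for key, run in groupby(lines) if len(list(run)) == 1]


text = "aa\naa\naa\n"                 # the argument after escape processing
from_echo = split_lines(text + "\n")  # echo adds its own trailing newline
from_printf = split_lines(text)       # printf writes only what the format says

print(from_echo)                 # ['aa', 'aa', 'aa', '']
print(from_printf)               # ['aa', 'aa', 'aa']
print(unique_runs(from_echo))    # ['']
print(unique_runs(from_printf))  # []
```

Using printf, as the replies in this thread suggest, avoids the extra line.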



From: phil colbourn
To: 11220@debbugs.gnu.org
Date: Wed, 11 Apr 2012 20:07:23 +1000
Subject: Sorry, forgot version.

uniq version 8.13

From: Jim Meyering
To: phil colbourn
Cc: 11220@debbugs.gnu.org
Date: Wed, 11 Apr 2012 13:59:31 +0200
Subject: Re: bug#11220: uniq -d and -Du bug?

phil colbourn wrote:
> What should this print?
>
> echo -e 'aa\naa\naa\n' | uniq -d

It's better to avoid echo -e. Use printf instead:

    printf 'aa\naa\naa\n' | uniq -d

> To me this says:
>
> 1. uniqueness is defined by whole line so there is 1 unique value 'aa';
> 2. -d option say to 'only print duplicate lines';

When in doubt, follow the advice at the bottom of the man page and
read the "real" (texinfo) documentation:

    The full documentation for uniq is maintained as a Texinfo manual.
    If the info and uniq programs are properly installed at your site,
    the command

        info coreutils 'uniq invocation'

    should give you access to the complete manual.

> 3. 1st 'aa' is (so far) unique so it should NOT be printed;
> 4. 2nd 'aa' is not unique so it SHOULD be printed; and
> 5. 3rd 'aa' is not unique so it SHOULD also be printed.
>
> I think I should get this:
>
> aa
> aa
>
> But I get this:
>
> aa

Thanks for the report.
The problem is that the description in the man page is too succinct,
perhaps because -d means different things, depending on what other
options you use it with.

How is what you see inconsistent with the documentation?
(info coreutils uniq)

    `-d'
    `--repeated'
         Discard lines that are not repeated.  When used by itself, this
         option causes `uniq' to print the first copy of each repeated
         line, and nothing else.

From: Eric Blake
Organization: Red Hat
To: phil colbourn
Cc: 11220-done@debbugs.gnu.org
Date: Wed, 11 Apr 2012 06:07:30 -0600
Subject: Re: bug#11220: uniq -d and -Du bug?

tag 11220 notabug
thanks

On 04/10/2012 11:43 PM, phil colbourn wrote:
> What should this print?
>
> echo -e 'aa\naa\naa\n' | uniq -d

Thanks for the report.

POSIX requires this to print only a single instance of 'aa', whether or
not -d is in effect; coreutils does this by outputting the last line in
a series of duplicates. The point of -d is to suppress the single-line
outputs that do not have a corresponding duplicate input, not to output
all instances of a duplicated line.

By the way, 'echo -e' is not portable; POSIX recommends you use printf
instead.

>
> Now, -D and -u means 'print all duplicate lines' and 'only print unique
> lines'.

-D is not specified by POSIX. However, -u is defined by POSIX to
suppress output lines that have a corresponding duplicate input.

>
> I think this should print all lines since union of all unique lines and all
> duplicate lines is all lines.
>
>
> Therefore -Du prints first N-1 matching lines and not last matching line.

In isolation, uniq prints the last instance of the duplicated line, and
uniq -u suppresses the output of the 4th line. In isolation, -D says to
output the first three lines which are normally omitted because they
have duplicates, in addition to the 4th line that is printed by default.
So in combination, -Du says to print the lines with subsequent
duplicates (the first three lines) but to suppress the output line that
corresponds to the last input line that ends a sequence of duplicates
(the 4th line).

Perhaps we can document this behavior better. Or perhaps we can change
the behavior of -D (but at risk of breaking existing clients that depend
on the current behavior). But we can't change -u or -d behavior.

Put another way, per POSIX, the default behavior is subtractive (remove
any line with a subsequent duplicate), -d is subtractive (remove any
line with no duplicate), and -u is subtractive (remove any last line
that had a prior duplicate), and GNU -D is additive (print any line with
a subsequent duplicate, to counter the initial default).

>
> Are these bugs?

At this point, I will claim that the behavior is intended, and therefore
close out the bug. But if you are willing to submit documentation
patches, or even code patches accompanied by extensive test cases to
demonstrate the corner cases of any new behavior, feel free to continue
to reply to this bug report.
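Eric's description can be restated as three independent switches:

- whether to print lines that have no duplicate;
- whether to print the one line that stands for a repeated run;
- whether to print the other lines of a repeated run.

The Python sketch below is a simplified model written to reproduce the outputs reported in this thread. The switch names are borrowed from what appear to be variable names in GNU's src/uniq.c. Treat them, and the function `model_uniq`, as illustrative assumptions rather than the coreutils code.

In this model, the line that stands for a run depends on -D. Without -D it is the first line of the run. Under -D it is the last line, and that last line is the one -u removes.

```python
from itertools import groupby
from typing import List, Optional


def model_uniq(lines: List[str], d: bool = False, D: bool = False,
               u: bool = False, w: Optional[int] = None) -> List[str]:
    """Simplified, hypothetical model of GNU uniq's -d, -D, -u and -w options."""
    output_unique = not (d or D)   # runs of length 1 (lines with no duplicate)
    output_first_repeated = not u  # the single line that stands for a repeated run
    output_later_repeated = D      # the remaining lines of a repeated run
    key = (lambda s: s[:w]) if w is not None else (lambda s: s)
    out: List[str] = []
    for _, group in groupby(lines, key=key):
        run = list(group)
        if len(run) == 1:
            if output_unique:
                out.append(run[0])
        elif output_later_repeated:
            # Under -D every line except the last is printed regardless of -u;
            # the last line is the run's representative, so -u suppresses it.
            out.extend(run[:-1])
            if output_first_repeated:
                out.append(run[-1])
        elif output_first_repeated:
            # Without -D the first line of the run is its representative.
            out.append(run[0])
    return out


data = ["a1", "a2", "a3", "a4"]
print(model_uniq(data, w=1))                  # ['a1']
print(model_uniq(data, u=True, w=1))          # []
print(model_uniq(data, D=True, w=1))          # ['a1', 'a2', 'a3', 'a4']
print(model_uniq(data, D=True, u=True, w=1))  # ['a1', 'a2', 'a3']
```

The four results agree with the `uniq -w 1` runs Phil reports in his follow-up.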
From: phil colbourn
To: 11220-done@debbugs.gnu.org
Date: Thu, 12 Apr 2012 22:47:39 +1000
Subject: Re: bug#11220: uniq -d and -Du bug?

Thanks Jim and Eric for your replies.

Jim, perhaps info's version of option -d should be used in uniq's man page?

> In isolation, uniq prints the last instance of the duplicated line, and

I think it prints the first line, not the last:

printf "a1\na2\na3\na4" | uniq -w 1
a1

> uniq -u suppresses the output of the 4th line.

I have lost you here. -u suppresses any lines with duplicates.

printf "a1\na2\na3\na4" | uniq -u -w 1
(no output)

I suspect you mean -Du?

printf "a1\na2\na3\na4" | uniq -Du -w 1
a1
a2
a3

> In isolation, -D says to
> output the first three lines which are normally omitted because they
> have duplicates, in addition to the 4th line that is printed by default.

But,

printf "a1\na2\na3\na4" | uniq -D -w 1
a1
a2
a3
a4

and the default is this:

printf "a1\na2\na3\na4" | uniq -w 1
a1

So, if I understand your logic correctly, and if I correct it to refer to the 1st rather than the 4th duplicate, then -Du should give me

a2
a3
a4

> So in combination, -Du says to print the lines with subsequent
> duplicates (the first three lines) but to suppress the output line that
> corresponds to the last input line that ends a sequence of duplicates
> (the 4th line).
>
> Perhaps we can document this behavior better. Or perhaps we can change
> the behavior of -D (but at risk of breaking existing clients that depend
> on the current behavior). But we can't change -u or -d behavior.

I think changing the behaviour of a utility is dangerous. I thought it was a bug, but both you and Jim have indicated that it is poor documentation.

> Put another way, per POSIX, the default behavior is subtractive (remove
> any line with a subsequent duplicate), -d is subtractive (remove any
> line with no duplicate), and -u is subtractive (remove any last line
> that had a prior duplicate), and GNU -D is additive (print any line with
> a subsequent duplicate, to counter the initial default).

Although I'm not exactly following your previous notes, I don't think this explains uniq's behaviour.

> At this point, I will claim that the behavior is intended, and therefore
> close out the bug. But if you are willing to submit documentation
> patches, or even code patches accompanied by extensive test cases to
> demonstrate the corner cases of any new behavior, feel free to continue
> to reply to this bug report.

Having now read info's uniq pages, I think that

1) -Du is undefined behaviour
2) the current output makes no sense.

e.g.

printf "1\na1\na2\na3\na4\n2" | uniq -w 1
1
a1
2
(comply)

printf "1\na1\na2\na3\na4\n2" | uniq -u -w 1
1
2
(comply)

printf "1\na1\na2\na3\na4\n2" | uniq -D -w 1
a1
a2
a3
a4
(comply)

printf "1\na1\na2\na3\na4\n2" | uniq -Du -w 1
a1
a2
a3

I think this one makes no sense.

But... this behaviour IS exactly what I need, so if this can be documented then I would be happy - others might not be.

On a side point, why can't the official info pages be automagically converted into man pages to avoid discrepancies?
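Phil's last table can be reproduced with plain run arithmetic. The same arithmetic shows why his expectation that -Du is the union of -D and -u does not hold. That union would be every input line. The observed -Du output is instead the -D output with the last line of each repeated run removed.

A short Python sketch over his input follows. The helper `runs` is hypothetical and models only the -w 1 prefix comparison.

```python
from itertools import groupby
from typing import List


def runs(lines: List[str], w: int = 1) -> List[List[str]]:
    """Group adjacent lines whose first w characters match (like uniq -w w)."""
    return [list(g) for _, g in groupby(lines, key=lambda s: s[:w])]


# Output of printf "1\na1\na2\na3\na4\n2": six lines, the last without "\n".
data = "1\na1\na2\na3\na4\n2".split("\n")
groups = runs(data)

opt_u = [r[0] for r in groups if len(r) == 1]                # uniq -u -w 1
opt_D = [x for r in groups if len(r) > 1 for x in r]         # uniq -D -w 1
union = [x for r in groups for x in r]                       # Phil's expected -Du
opt_Du = [x for r in groups if len(r) > 1 for x in r[:-1]]   # matches observed -Du

print(opt_u)   # ['1', '2']
print(opt_D)   # ['a1', 'a2', 'a3', 'a4']
print(union)   # ['1', 'a1', 'a2', 'a3', 'a4', '2']
print(opt_Du)  # ['a1', 'a2', 'a3']
```

opt_Du is exactly opt_D minus the final line of each repeated run, which is why a1 survives and a4 does not.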

Jim, perhaps info&= #39;s version of option -d should be used in uniq's man page?

In isolation, uniq prints the last instance of the duplicated line, and
=

I think it prints the first line, not the = last:

printf "a1\na2\na3\na4" | uni= q -w 1
a1
=A0
uniq -u suppresses the output of the 4th line.

<= div>I have lost you here. -u suppresses any lines with duplicates.

printf "a1\na2\na3\na4" | uniq -u -w 1
(no output)

I suspect you mean -Du?

printf "a1\na2\na3\na4" | uniq -Du -w = 1
a1
a2
a3
=A0
=A0In isolation, -D says to
output the first three lines which are normally omitted because they
have duplicates, in addition to the 4th line that is printed by default.

But,

printf &quo= t;a1\na2\na3\na4" | uniq -D -w 1
a1
a2
a3
a4

and default is this:
printf "a1\na2\na3\na4" | uniq -w 1
a1=

So, if I understand your logic correctly an= d if I correct your logic by referring to the 1st and not the 4th duplicate= then -Du should give me

a2
a3
a4
=A0
=A0So in combination, -Du says to print the lines with subsequent
duplicates (the first three lines) but to suppress the output line that
corresponds to the last input line that ends a sequence of duplicates
(the 4th line).



=A0
Perhaps we can document this behavior better. =A0Or perhaps we can change the behavior of -D (but at risk of breaking existing clients that depend on the current behavior). =A0But we can't change -u or -d behavior.


I think changing=A0behaviour=A0of a ut= ility is dangerous - I thought it was a bug, but both you and Jim have indi= cated that it is poor documentation.

=A0
Put another way, per POSIX, the default behavior is subtractive (remove
any line with a subsequent duplicate), -d is subtractive (remove any
line with no duplicate), and -u is subtractive (remove any last line
that had a prior duplicate), and GNU -D is additive (print any line with a subsequent duplicate, to counter the initial default).


Whilst not exactly following your prev= ious notes, I don't think this explains uniq's behaviour.

At this point, I will claim that the behavior is intended, and therefore close out the bug. =A0But if you are willing to submit documentation
patches, or even code patches accompanied by extensive test cases to
demonstrate the corner cases of any new behavior, feel free to continue
to reply to this bug report.

=A0
Having now read info's uniq pages, I think that

1) -Du is undefined behaviour
2) current = output makes no sense.

eg.

printf "1\na1\na= 2\na3\na4\n2" | uniq -w 1
1
a1
2
(comply)

printf "1\na1\na2\na3\= na4\n2" | uniq -u -w 1
1
2
(comply)

p= rintf "1\na1\na2\na3\na4\n2" | uniq -D -w 1
a1
a2
a3
a4
(comply)

printf "1\na1\na2\na3\na4\n2" | uniq -Du -w 1
a1
a2
a3

I think this o= ne makes no sense.

But... this behaviour IS exactl= y what I need so if this can be documented then I would be happy - others m= ight not be.


On a side point, why can't the offic= ial info pages be automagically converted into man pages to avoid=A0discrep= ancies?

From: Debbugs Internal Request
Date: Fri, 11 May 2012 11:24:03 +0000
Subject: Internal Control

bug archived.