GNU bug report logs - #11220
uniq -d and -Du bug?


Package: coreutils;

Reported by: phil colbourn <philcolbourn <at> gmail.com>

Date: Wed, 11 Apr 2012 06:25:01 UTC

Severity: normal

Tags: notabug

Done: Eric Blake <eblake <at> redhat.com>

Bug is archived. No further changes may be made.




From: help-debbugs <at> gnu.org (GNU bug Tracking System)
To: phil colbourn <philcolbourn <at> gmail.com>
Subject: bug#11220: closed (Re: bug#11220: uniq -d and -Du bug?)
Date: Wed, 11 Apr 2012 12:09:02 +0000
[Message part 1 (text/plain, inline)]
Your bug report

#11220: uniq -d and -Du bug?

which was filed against the coreutils package, has been closed.

The explanation is attached below, along with your original report.
If you require more details, please reply to 11220 <at> debbugs.gnu.org.

-- 
11220: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=11220
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
From: Eric Blake <eblake <at> redhat.com>
To: phil colbourn <philcolbourn <at> gmail.com>
Cc: 11220-done <at> debbugs.gnu.org
Subject: Re: bug#11220: uniq -d and -Du bug?
Date: Wed, 11 Apr 2012 06:07:30 -0600
[Message part 3 (text/plain, inline)]
tag 11220 notabug
thanks

On 04/10/2012 11:43 PM, phil colbourn wrote:
> What should this print?
> 
> echo -e 'aa\naa\naa\n' | uniq -d

Thanks for the report.  POSIX requires this to print only a single
instance of 'aa', whether or not -d is in effect; coreutils does this by
outputting the last line in a series of duplicates.  The point of -d is
to suppress the single-line outputs that do not have a corresponding
duplicate input, not to output all instances of a duplicated line.
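A quick way to see this distinction is to run each option over a small input containing a singleton, a pair and a triple. The helper name 'sample' is hypothetical and used only for illustration. '-d' and '-u' are POSIX; '-D' is the GNU extension discussed in this thread, so the last line assumes GNU coreutils uniq.

```sh
# Hypothetical helper: one singleton (a), one pair (b), one triple (c).
sample() { printf 'a\nb\nb\nc\nc\nc\n'; }

sample | uniq       # a, b, c: one line per run of adjacent duplicates
sample | uniq -d    # b, c: one line per run that had a duplicate
sample | uniq -u    # a: only lines with no adjacent duplicate
sample | uniq -D    # b, b, c, c, c: every line of each duplicated run (GNU)
```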

By the way, 'echo -e' is not portable; POSIX recommends you use printf
instead.
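Note that "echo -e 'aa\naa\naa\n'" emits four lines, not three. echo appends its own newline after the final \n, so the input ends with an empty line. That empty line is a run of one, so '-d' never prints it, but plain 'uniq' or 'uniq -u' would. Some shells' echo does not accept -e at all and may print it literally. Below is a printf sketch of the same input, with and without the trailing empty line:

```sh
# Same bytes as echo -e 'aa\naa\naa\n': three "aa" lines plus one empty line.
printf 'aa\naa\naa\n\n' | uniq -d    # aa
printf 'aa\naa\naa\n\n' | uniq -u    # a single empty line (the trailing one)

# Probably what was intended: just the three lines.
printf 'aa\naa\naa\n' | uniq -d      # aa
```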

> 
> Now, -D and -u means 'print all duplicate lines' and 'only print unique
> lines'.

-D is not specified by POSIX.  However, -u is defined by POSIX to
suppress output lines that have a corresponding duplicate input.

> 
> I think this should print all lines since union of all unique lines and all
> duplicate lines is all lines.
> 

> 
> Therefore -Du prints first N-1 matching lines and not last matching line.

In isolation, uniq prints the last instance of the duplicated line, and
uniq -u suppresses the output of the 4th line.  In isolation, -D says to
output the first three lines which are normally omitted because they
have duplicates, in addition to the 4th line that is printed by default.
So in combination, -Du says to print the lines with subsequent
duplicates (the first three lines) but to suppress the output line that
corresponds to the last input line that ends a sequence of duplicates
(the 4th line).
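The '-w 1' trick from the original report makes this visible: the four lines form one run while keeping distinct text. This sketch assumes GNU coreutils uniq, since '-w' and '-D' are GNU extensions. The text printed for the run by default and with '-d' is 'a1', which is the same result the original report shows.

```sh
# -w 1: compare only the first character, so a1..a4 form a single run.
printf 'a1\na2\na3\na4\n' | uniq -w 1       # a1
printf 'a1\na2\na3\na4\n' | uniq -d -w 1    # a1
printf 'a1\na2\na3\na4\n' | uniq -u -w 1    # (no output)
printf 'a1\na2\na3\na4\n' | uniq -D -w 1    # a1, a2, a3, a4
printf 'a1\na2\na3\na4\n' | uniq -Du -w 1   # a1, a2, a3: the -D output minus the run's last line
```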

Perhaps we can document this behavior better.  Or perhaps we can change
the behavior of -D (but at risk of breaking existing clients that depend
on the current behavior).  But we can't change -u or -d behavior.

Put another way, per POSIX, the default behavior is subtractive (remove
any line with a subsequent duplicate), -d is subtractive (remove any
line with no duplicate), and -u is subtractive (remove any last line
that had a prior duplicate), and GNU -D is additive (print any line with
a subsequent duplicate, to counter the initial default).
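The subtractive/additive description can be turned into a small model. The awk-based 'uniq_model' function below is hypothetical. It is written from the behaviour described in this thread, not taken from the coreutils source. It compares whole lines only (no -w, -f or -s) and takes the flags as a single word such as 'Du'. One detail the prose leaves implicit: besides adding the earlier copies, '-D' also drops lines that have no duplicate, just as '-d' does.

```sh
# Hypothetical model of how -d, -u and GNU -D combine (whole-line runs only).
uniq_model() {
  awk -v flags="$1" '
    BEGIN {
      show_single  = 1   # print a line that has no duplicate
      show_end     = 1   # print one copy when a run of duplicates ends
      show_earlier = 0   # print the copies that have a later duplicate
      if (flags ~ /d/) show_single = 0                         # -d subtracts singletons
      if (flags ~ /u/) show_end = 0                            # -u subtracts run ends
      if (flags ~ /D/) { show_single = 0; show_earlier = 1 }   # GNU -D adds earlier copies, drops singletons
    }
    function flush(   i, k) {
      if (n == 1 && show_single) print cur
      if (n > 1) {
        k = (show_earlier ? n - 1 : 0) + (show_end ? 1 : 0)
        for (i = 0; i < k; i++) print cur
      }
      n = 0
    }
    # ($0 "") forces a string comparison of adjacent lines
    NR > 1 && ($0 "") != (cur "") { flush() }
    { cur = $0; n++ }
    END { flush() }
  '
}

printf 'aa\naa\naa\n\n' | uniq_model Du    # aa, aa
```

With the input from the report, this prints 'aa' twice, matching the 'uniq -Du' output in the original report. In the model, -D drops singletons and -u drops the end of each run, so all that -Du leaves is the earlier copies. That is why the combination is not the union the report expected.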

> 
> Are these bugs?

At this point, I will claim that the behavior is intended, and therefore
close out the bug.  But if you are willing to submit documentation
patches, or even code patches accompanied by extensive test cases to
demonstrate the corner cases of any new behavior, feel free to continue
to reply to this bug report.

-- 
Eric Blake   eblake <at> redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

[signature.asc (application/pgp-signature, attachment)]
[Message part 5 (message/rfc822, inline)]
From: phil colbourn <philcolbourn <at> gmail.com>
To: bug-coreutils <at> gnu.org
Subject: uniq -d and -Du bug?
Date: Wed, 11 Apr 2012 15:43:02 +1000
[Message part 6 (text/plain, inline)]
What should this print?

echo -e 'aa\naa\naa\n' | uniq -d

To me this says:

1. uniqueness is defined by the whole line, so there is 1 unique value, 'aa';
2. the -d option says to 'only print duplicate lines';
3. 1st 'aa' is (so far) unique so it should NOT be printed;
4. 2nd 'aa' is not unique so it SHOULD be printed; and
5. 3rd 'aa' is not unique so it SHOULD also be printed.
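For comparison, the selection this list describes (every copy in a run of adjacent duplicates except the first) can be expressed directly with awk. Below is a minimal sketch. The function name 'repeats_only' is hypothetical, and comparison is on whole adjacent lines.

```sh
# Hypothetical helper: print each line that equals the line just before it,
# i.e. every copy in a run of adjacent duplicates except the first.
# ($0 "") forces a string comparison; NR > 1 stops an empty first line
# from matching the initially empty prev.
repeats_only() { awk 'NR > 1 && ($0 "") == prev { print } { prev = $0 }'; }

printf 'aa\naa\naa\n' | repeats_only    # aa, aa
```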

I think I should get this:

aa
aa

But I get this:

aa

To see what duplicated line is printed I tried this:

echo -e 'a1\na2\na3\na4\n' | uniq -d -w 1
a1

So the first line is printed. This is not what I expected at all.



Now, -D means 'print all duplicate lines' and

echo -e 'aa\naa\naa\n' | uniq -D

prints what I expect it to:

aa
aa
aa

Now, -D and -u means 'print all duplicate lines' and 'only print unique
lines'.

I think this should print all lines since union of all unique lines and all
duplicate lines is all lines.

But,

echo -e 'aa\naa\naa\n' | uniq -Du

prints this:

aa
aa

To see what lines are being printed I tried this:

echo -e 'a1\na2\na3\na4\n' | uniq -Du -w 1
a1
a2
a3

Therefore -Du prints first N-1 matching lines and not last matching line.

(This is roughly what I expected -d to print.)

Are these bugs?

This bug report was last modified 13 years and 136 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.