GNU bug report logs - #44704
uniq: replace repeated lines with a message about how many repeated lines

Package: coreutils

Reported by: "Brian J. Murrell" <brian <at> interlinx.bc.ca>

Date: Tue, 17 Nov 2020 14:14:01 UTC

Severity: wishlist

Tags: notabug

To reply to this bug, email your comments to 44704 AT debbugs.gnu.org.


Report forwarded to bug-coreutils <at> gnu.org:
bug#44704; Package coreutils. (Tue, 17 Nov 2020 14:14:01 GMT)

Acknowledgement sent to "Brian J. Murrell" <brian <at> interlinx.bc.ca>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Tue, 17 Nov 2020 14:14:01 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: "Brian J. Murrell" <brian <at> interlinx.bc.ca>
To: bug-coreutils <at> gnu.org
Subject: uniq: replace repeated lines with a message about how many repeated
 lines
Date: Tue, 17 Nov 2020 08:32:36 -0500
It would be a useful enhancement to uniq to replace all lines
considered non-uniq (i.e. those that would be removed from the output)
with a message about how many times the previous line was repeated.

I.e.

$ cat <<EOF | uniq --replace-with-message '[previous line repeated %d times]'
first line
second line
repeated line
repeated line
repeated line
repeated line
repeated line
third line
EOF
first line
second line
repeated line
[previous line repeated 4 times]
third line

Cheers,
b.



Information forwarded to bug-coreutils <at> gnu.org:
bug#44704; Package coreutils. (Tue, 17 Nov 2020 15:06:02 GMT)

Message #8 received at 44704 <at> debbugs.gnu.org:

From: Assaf Gordon <assafgordon <at> gmail.com>
To: "Brian J. Murrell" <brian <at> interlinx.bc.ca>, 44704 <at> debbugs.gnu.org
Subject: Re: bug#44704: uniq: replace repeated lines with a message about how
 many repeated lines
Date: Tue, 17 Nov 2020 08:05:46 -0700
tag 44704 notabug
severity 44704 wishlist
stop

Hello,

On 2020-11-17 6:32 a.m., Brian J. Murrell wrote:
> It would be a useful enhancement to uniq to replace all lines
> considered non-uniq (i.e. those that would be removed from the output)
> with a message about how many times the previous line was repeated.
> 
> I.e.
> 
> $ cat <<EOF | uniq --replace-with-message '[previous line repeated %d times]'
[...]

uniq supports the "--group" option, which adds a blank line after each
group of identical lines - this can be used down-stream to process
groups in any way you want.

Example:
  $ cat <<EOF > in
  first line
  second line
  repeated line
  repeated line
  repeated line
  repeated line
  repeated line
  third line
  EOF

  $ cat in | uniq --group=append
  first line

  second line

  repeated line
  repeated line
  repeated line
  repeated line
  repeated line

  third line


  $ cat in | uniq --group=append \
      | awk '$0=="" { print "do something after group" ; next } ;
             1 { print }'
  first line
  do something after group
  second line
  do something after group
  repeated line
  repeated line
  repeated line
  repeated line
  repeated line
  do something after group
  third line
  do something after group

And with counting:

  $ cat in | uniq --group=append \
       | awk 'BEGIN { c = 0 } ;
              $0=="" { print "Group has " c " lines" ; c=0 ; next } ;
              1 { print ; c++ }'
  first line
  Group has 1 lines
  second line
  Group has 1 lines
  repeated line
  repeated line
  repeated line
  repeated line
  repeated line
  Group has 5 lines
  third line
  Group has 1 lines


Hope this helps.
More information about "uniq --group=X" is here:

https://www.gnu.org/software/coreutils/manual/html_node/uniq-invocation.html

I'm marking this as "notabug/wishlist", but will likely close it soon as
"wontfix" unless we come up with a convincing argument why "--group"
is not sufficient for your use case.

Regardless of the status, discussion can continue by replying to this 
thread.

regards,
 - assaf





Added tag(s) notabug. Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org. (Tue, 17 Nov 2020 15:06:03 GMT)

Severity set to 'wishlist' from 'normal'. Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org. (Tue, 17 Nov 2020 15:06:03 GMT)

Information forwarded to bug-coreutils <at> gnu.org:
bug#44704; Package coreutils. (Tue, 17 Nov 2020 15:29:01 GMT)

Message #15 received at 44704 <at> debbugs.gnu.org:

From: "Brian J. Murrell" <brian <at> interlinx.bc.ca>
To: Assaf Gordon <assafgordon <at> gmail.com>, 44704 <at> debbugs.gnu.org
Subject: Re: bug#44704: uniq: replace repeated lines with a message about
 how many repeated lines
Date: Tue, 17 Nov 2020 10:28:53 -0500
On Tue, 2020-11-17 at 08:05 -0700, Assaf Gordon wrote:
> 
> Hello,

Hi,

> uniq supports the "--group" option, which adds a blank line after each
> group of identical lines - this can be used down-stream to process
> groups in any way you want.

But there is no way to have it remove the repeated lines also, correct?

By down-stream process, I feel like you are leaving it up to the
down-stream to remove the duplicate lines as well as add the "repeated %s
times" messages.  Is that correct?

If so, uniq really adds no value.  The down-stream might as well just
do the adjacent line comparison also in such a case.

> And with counting:
> 
> $ cat in | uniq --group=append \
>       | awk 'BEGIN { c = 0 } ;
>              $0=="" { print "Group has " c " lines" ; c=0 ; next } ;
>              1 { print ; c++ }'
>    first line
>    Group has 1 lines
>    second line
>    Group has 1 lines
>    repeated line
>    repeated line
>    repeated line
>    repeated line
>    repeated line
>    Group has 5 lines
>    third line
>    Group has 1 lines

This still doesn't really achieve the original stated goal as the
repeated lines are not being replaced by your "Group has %d lines".

I think once you add the repeated-line suppression, you will see that
adding a simple adjacent-line comparison and not using uniq at all
takes only slightly more work in the down-stream (which is now
the main).
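
To make that comparison concrete, here is a rough sketch of what the
down-stream would have to do on top of "uniq --group=append" to get the
requested output.  The function name collapse_groups is made up for this
illustration, and --group is assumed to be the GNU coreutils extension
described earlier in the thread:

```shell
# Sketch only: collapse_groups is a hypothetical name, not part of coreutils.
# Relies on GNU uniq --group=append, which ends every group with an empty line.
collapse_groups() {
  uniq --group=append | awk '
    $0 == "" {                 # group terminator emitted by uniq
      if (c > 1)
        printf "[previous line repeated %d times]\n", c - 1
      c = 0
      next
    }
    c++ == 0 { print }         # keep only the first line of each group
  '
}

# Sample input from the original report.
printf '%s\n' 'first line' 'second line' 'repeated line' 'repeated line' \
  'repeated line' 'repeated line' 'repeated line' 'third line' | collapse_groups
```

One limitation of this approach: an empty input line looks exactly like
a group terminator to awk, so this sketch silently drops empty lines.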

Cheers,
b.


Information forwarded to bug-coreutils <at> gnu.org:
bug#44704; Package coreutils. (Tue, 17 Nov 2020 22:12:02 GMT)

Message #18 received at 44704 <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: "Brian J. Murrell" <brian <at> interlinx.bc.ca>
Cc: 44704 <at> debbugs.gnu.org
Subject: Re: bug#44704: uniq: replace repeated lines with a message about how
 many repeated lines
Date: Tue, 17 Nov 2020 14:10:56 -0800
On 11/17/20 5:32 AM, Brian J. Murrell wrote:
> [previous line repeated 4 times]

uniq -c already does something like that, though it outputs "5" instead of "4". 
Not sure it's worth gussying up 'uniq' to provide exactly the functionality 
requested, as output reformatting is easy enough to do yourself using awk or 
Python or whatever.




Information forwarded to bug-coreutils <at> gnu.org:
bug#44704; Package coreutils. (Tue, 17 Nov 2020 22:19:02 GMT)

Message #21 received at 44704 <at> debbugs.gnu.org:

From: "Brian J. Murrell" <brian <at> interlinx.bc.ca>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 44704 <at> debbugs.gnu.org
Subject: Re: bug#44704: uniq: replace repeated lines with a message about
 how many repeated lines
Date: Tue, 17 Nov 2020 17:18:07 -0500
On Tue, 2020-11-17 at 14:10 -0800, Paul Eggert wrote:
> On 11/17/20 5:32 AM, Brian J. Murrell wrote:
>  > [previous line repeated 4 times]
> 
> uniq -c already does something like that, though it outputs "5" instead of "4".

Right.  I had considered that.  Something like:

$ cat /tmp/in | uniq -c | while read c line; do
> echo $line
> if [ $c -gt 1 ]; then
> echo "Last line repeated $((c-1)) times"
> fi
> done

But that eats leading whitespace on $line.
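
For illustration, the whitespace problem can be avoided by splitting the
count from the line in awk instead of with "read".  This is only a
sketch: collapse_counts is a made-up name, and it assumes that "uniq -c"
writes the count, exactly one space, and then the line unchanged:

```shell
# Sketch only: collapse_counts is a hypothetical name.
# Assumes uniq -c output is "<padding><count><one space><line>".
collapse_counts() {
  uniq -c | awk '
    match($0, /^ *[0-9]+ /) {
      count = substr($0, 1, RLENGTH) + 0   # numeric value of the count field
      print substr($0, RLENGTH + 1)        # the line itself, whitespace intact
      if (count > 1)
        printf "Last line repeated %d times\n", count - 1
    }
  '
}

# Lines with leading blanks keep them.
printf '  indented\n  indented\n  indented\nplain\n' | collapse_counts
```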

> Not sure it's worth gussying up 'uniq' to provide exactly the functionality
> requested, as output reformatting is easy enough to do yourself using awk or
> Python or whatever.

Right.  But if I were going to pull out such a big hammer, I'd again
just eliminate uniq and do everything in awk or Python or whatever.

Anyway, it was just a suggestion.  Doesn't seem like it will go much of
anywhere.  That's fine.  If it really itched me enough, I guess I'd
just submit a patch.

Cheers,
b.


Information forwarded to bug-coreutils <at> gnu.org:
bug#44704; Package coreutils. (Wed, 18 Nov 2020 11:26:02 GMT)

Message #24 received at 44704 <at> debbugs.gnu.org:

From: Chris Elvidge <celvidge001 <at> gmail.com>
To: 44704 <at> debbugs.gnu.org
Cc: "Brian J. Murrell" <brian <at> interlinx.bc.ca>
Subject: Re: bug#44704: uniq: replace repeated lines with a message about how
 many repeated lines
Date: Wed, 18 Nov 2020 11:25:11 +0000
On 17/11/2020 01:32 pm, Brian J. Murrell wrote:
> It would be a useful enhancement to uniq to replace all lines
> considered non-uniq (i.e. those that would be removed from the output)
> with a message about how many times the previous line was repeated.
> 
> I.e.
> 
> $ cat <<EOF | uniq --replace-with-message '[previous line repeated %d times]'
> first line
> second line
> repeated line
> repeated line
> repeated line
> repeated line
> repeated line
> third line
> EOF
> first line
> second line
> repeated line
> [previous line repeated 4 times]
> third line
> 
> Cheers,
> b.
> 
> 

You could write your own function to do it. E.g.

unique() {
[ "$1" ] || { echo "Needs a readable file to test" && return 1; }
[ -r "$1" ] || { echo "Needs a readable file to test" && return 1; }
R=""; N=0
while IFS=$'\n' read L; do
[ "$L" = "$R" ] && { ((N++)); continue; }
[ "$N" -gt 0 ] && { echo "[Previous line repeated $N times]"; N=0; }
R="$L"
echo "$L"
done <$1
}
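
Note that the function as written only prints the message when a
different line follows, so a run of repeats at the very end of the file
is not reported, and an empty first line is counted as a repeat of the
initial R="".  A possible variant that handles both cases is sketched
below.  The name unique_runs is made up, and unlike the function above
it reads standard input rather than a file argument:

```shell
# Sketch only: unique_runs is a hypothetical name; reads lines from stdin.
unique_runs() {
  prev= n=0 first=1
  # "IFS= read -r" keeps leading whitespace and backslashes as they are;
  # the "|| [ -n ... ]" part also handles a last line without a newline.
  while IFS= read -r line || [ -n "$line" ]; do
    if [ "$first" -eq 0 ] && [ "$line" = "$prev" ]; then
      n=$((n + 1))
      continue
    fi
    [ "$n" -gt 0 ] && printf '[Previous line repeated %d times]\n' "$n"
    n=0 first=0 prev=$line
    printf '%s\n' "$line"
  done
  # Flush the count for a run that reaches the end of the input.
  [ "$n" -gt 0 ] && printf '[Previous line repeated %d times]\n' "$n"
  return 0
}

# A run at the end of the input is still reported.
printf 'a\nb\nb\nb\n' | unique_runs
```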


-- 

Chris Elvidge





Information forwarded to bug-coreutils <at> gnu.org:
bug#44704; Package coreutils. (Thu, 19 Nov 2020 00:24:02 GMT)

Message #27 received at 44704 <at> debbugs.gnu.org:

From: Bernhard Voelker <mail <at> bernhard-voelker.de>
To: Chris Elvidge <celvidge001 <at> gmail.com>, 44704 <at> debbugs.gnu.org
Cc: "Brian J. Murrell" <brian <at> interlinx.bc.ca>
Subject: Re: bug#44704: uniq: replace repeated lines with a message about how
 many repeated lines
Date: Thu, 19 Nov 2020 01:22:42 +0100
On 11/18/20 12:25 PM, Chris Elvidge wrote:
> You could write your own function to do it. E.g.
> 
> unique() {
> [ "$1" ] || { echo "Needs a readable file to test" && return 1; }
> [ -r "$1" ] || { echo "Needs a readable file to test" && return 1; }
> R=""; N=0
> while IFS=$'\n' read L; do
> [ "$L" = "$R" ] && { ((N++)); continue; }
> [ "$N" -gt 0 ] && { echo "[Previous line repeated $N times]"; N=0; }
> R="$L"
> echo "$L"
> done <$1
> }

Nice.

The UNIX toolbox is diverse. ;-)
I'd use:

awk '
  function p(n) {
    if (n > 1) {
      printf("[previous line repeated %d times]\n", n-1);
    }
  }
  {
    if (line != $0) {
      p(n);
      n = 0;
    }
    line = $0;
    if (n == 0)
      print
    n++;
  }
  END { p(n); }
'
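
As a usage sketch, the script can be wrapped in a shell function and fed
the sample input from the original report.  The function name squeeze is
made up for this example; the awk program is the one above, unchanged:

```shell
# Sketch only: "squeeze" is a hypothetical name wrapping the awk script above.
squeeze() {
  awk '
    function p(n) {
      if (n > 1) {
        printf("[previous line repeated %d times]\n", n-1);
      }
    }
    {
      if (line != $0) {
        p(n);
        n = 0;
      }
      line = $0;
      if (n == 0)
        print
      n++;
    }
    END { p(n); }
  '
}

# Sample input from the original report.
printf '%s\n' 'first line' 'second line' 'repeated line' 'repeated line' \
  'repeated line' 'repeated line' 'repeated line' 'third line' | squeeze
```

Because of the END rule, a run of repeats that reaches the end of the
input is reported as well.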

Have a nice day,
Berny




This bug report was last modified 4 years and 272 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.