GNU bug report logs - #44704
uniq: replace repeated lines with a message about how many repeated lines
To reply to this bug, email your comments to 44704 AT debbugs.gnu.org.
Report forwarded to bug-coreutils <at> gnu.org: bug#44704; Package coreutils. (Tue, 17 Nov 2020 14:14:01 GMT)
Acknowledgement sent to "Brian J. Murrell" <brian <at> interlinx.bc.ca>: New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Tue, 17 Nov 2020 14:14:01 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
It would be a useful enhancement to uniq to replace all lines
considered non-uniq (i.e. those that would be removed from the output)
with a message about how many times the previous line was repeated.
I.e.
$ cat <<EOF | uniq --replace-with-message '[previous line repeated %d times]'
first line
second line
repeated line
repeated line
repeated line
repeated line
repeated line
third line
EOF
first line
second line
repeated line
[previous line repeated 4 times]
third line
Cheers,
b.
Information forwarded to bug-coreutils <at> gnu.org: bug#44704; Package coreutils. (Tue, 17 Nov 2020 15:06:02 GMT)
Message #8 received at 44704 <at> debbugs.gnu.org (full text, mbox):
tag 44704 notabug
severity 44704 wishlist
stop
Hello,
On 2020-11-17 6:32 a.m., Brian J. Murrell wrote:
> It would be a useful enhancement to uniq to replace all lines
> considered non-uniq (i.e. those that would be removed from the output)
> with a message about how many times the previous line was repeated.
>
> I.e.
>
> $ cat <<EOF | uniq --replace-with-message '[previous line repeated %d times]'
[...]
uniq supports the "--group" option, which adds a blank line after each
group of identical lines - this can be used down-stream to process
groups in any way you want.
Example:
$ cat <<EOF > in
first line
second line
repeated line
repeated line
repeated line
repeated line
repeated line
third line
EOF
$ cat in | uniq --group=append
first line
second line
repeated line
repeated line
repeated line
repeated line
repeated line
third line
$ cat in | uniq --group=append \
    | awk '$0=="" { print "do something after group" ; next } ;
           1 { print }'
first line
do something after group
second line
do something after group
repeated line
repeated line
repeated line
repeated line
repeated line
do something after group
third line
do something after group
And with counting:
$ cat in | uniq --group=append \
    | awk 'BEGIN { c = 0 } ;
           $0=="" { print "Group has " c " lines" ; c=0 ; next } ;
           1 { print ; c++ }'
first line
Group has 1 lines
second line
Group has 1 lines
repeated line
repeated line
repeated line
repeated line
repeated line
Group has 5 lines
third line
Group has 1 lines
Hope this helps.
More information about "uniq --group=X" is here:
https://www.gnu.org/software/coreutils/manual/html_node/uniq-invocation.html
I'm marking this as "notabug/wishlist", but will likely close soon as
"wontfix" unless we come up with a convincing argument why "--group"
is not sufficient for your use case.
Regardless of the status, discussion can continue by replying to this
thread.
regards,
- assaf
Added tag(s) notabug. Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org. (Tue, 17 Nov 2020 15:06:03 GMT)
Severity set to 'wishlist' from 'normal'. Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org. (Tue, 17 Nov 2020 15:06:03 GMT)
Information forwarded to bug-coreutils <at> gnu.org: bug#44704; Package coreutils. (Tue, 17 Nov 2020 15:29:01 GMT)
Message #15 received at 44704 <at> debbugs.gnu.org (full text, mbox):
On Tue, 2020-11-17 at 08:05 -0700, Assaf Gordon wrote:
>
> Hello,
Hi,
> uniq supports the "--group" option, which adds a blank line after each
> group of identical lines - this can be used down-stream to process
> groups in any way you want.
But there is no way to have it remove the repeated lines also, correct?
By down-stream process, I feel like you are leaving it up to the down-
stream to remove the duplicate lines as well as add the "repeated %s
times" messages. Is that correct?
If so, uniq really adds no value. The down-stream might as well just
do the adjacent line comparison also in such a case.
> And with counting:
>
> $ cat in | uniq --group=append \
>     | awk 'BEGIN { c = 0 } ;
>            $0=="" { print "Group has " c " lines" ; c=0 ; next } ;
>            1 { print ; c++ }'
> first line
> Group has 1 lines
> second line
> Group has 1 lines
> repeated line
> repeated line
> repeated line
> repeated line
> repeated line
> Group has 5 lines
> third line
> Group has 1 lines
This still doesn't really achieve the original stated goal as the
repeated lines are not being replaced by your "Group has %d lines".
I think once you add the repeated line suppression, you will see that
adding a simple adjacent line comparison and not using uniq at all is
only slightly more work in the down-stream (which is now the main
program).
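To make the comparison concrete, here is a minimal sketch of the "--group" pipeline with the suppression step added. The function name collapse_with_group is made up for illustration. It assumes GNU uniq (for --group) and input without empty lines, because an empty input line cannot be told apart from a group separator in this scheme.

```shell
# Sketch only: collapse each run of identical lines using
# `uniq --group=append` plus a small awk filter.
# Assumption: the input contains no empty lines.
collapse_with_group() {
    uniq --group=append | awk '
        $0 == "" {                  # separator: the current group has ended
            if (c > 1)
                printf "[previous line repeated %d times]\n", c - 1
            c = 0
            next
        }
        c++ == 0 { print }          # print only the first line of each group
    '
}

printf '%s\n' 'first line' 'repeated line' 'repeated line' 'repeated line' 'third line' |
    collapse_with_group
```

The awk part still has to count lines and decide what to print, which is the point being argued: uniq contributes only the group boundaries.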
Cheers,
b.
Information forwarded to bug-coreutils <at> gnu.org: bug#44704; Package coreutils. (Tue, 17 Nov 2020 22:12:02 GMT)
Message #18 received at 44704 <at> debbugs.gnu.org (full text, mbox):
On 11/17/20 5:32 AM, Brian J. Murrell wrote:
> [previous line repeated 4 times]
uniq -c already does something like that, though it outputs "5" instead of "4".
Not sure it's worth gussying up 'uniq' to provide exactly the functionality
requested, as output reformatting is easy enough to do yourself using awk or
Python or whatever.
Information forwarded to bug-coreutils <at> gnu.org: bug#44704; Package coreutils. (Tue, 17 Nov 2020 22:19:02 GMT)
Message #21 received at 44704 <at> debbugs.gnu.org (full text, mbox):
On Tue, 2020-11-17 at 14:10 -0800, Paul Eggert wrote:
> On 11/17/20 5:32 AM, Brian J. Murrell wrote:
> > [previous line repeated 4 times]
>
> uniq -c already does something like that, though it outputs "5"
> instead of "4".
Right. I had considered that. Something like:
$ cat /tmp/in | uniq -c | while read c line; do
>     echo $line
>     if [ $c -gt 1 ]; then
>         echo "Last line repeated $((c-1)) times"
>     fi
> done
But that eats leading whitespace on $line.
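One possible way around the whitespace problem is to let awk strip only the count prefix that "uniq -c" adds. This is a sketch, not a tested replacement: collapse_with_count is a hypothetical name, and the pattern assumes the prefix is optional spaces, digits, and a single space.

```shell
# Sketch: post-process `uniq -c` while keeping each line's own leading
# whitespace. Assumes the count prefix has the form "<spaces><digits><space>".
collapse_with_count() {
    uniq -c | awk '
        {
            c = $1                      # repeat count reported by uniq -c
            sub(/^ *[0-9]+ /, "")       # remove only the count prefix
            print
            if (c > 1)
                printf "[previous line repeated %d times]\n", c - 1
        }
    '
}

printf '%s\n' '  indented' '  indented' 'plain' | collapse_with_count
```

This keeps uniq in the pipeline, but the reformatting again lives entirely in awk.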
> Not sure it's worth gussying up 'uniq' to provide exactly the functionality
> requested, as output reformatting is easy enough to do yourself using awk or
> Python or whatever.
Right. But if I were going to pull out such a big hammer, I'd just,
again, eliminate uniq and do everything in awk or Python or whatever.
Anyway, it was just a suggestion. Doesn't seem like it will go much of
anywhere. That's fine. If it really itched me enough, I guess I'd
just submit a patch.
Cheers,
b.
Information forwarded to bug-coreutils <at> gnu.org: bug#44704; Package coreutils. (Wed, 18 Nov 2020 11:26:02 GMT)
Message #24 received at 44704 <at> debbugs.gnu.org (full text, mbox):
On 17/11/2020 01:32 pm, Brian J. Murrell wrote:
> It would be a useful enhancement to uniq to replace all lines
> considered non-uniq (i.e. those that would be removed from the output)
> with a message about how many times the previous line was repeated.
>
> I.e.
>
> $ cat <<EOF | uniq --replace-with-message '[previous line repeated %d times]'
> first line
> second line
> repeated line
> repeated line
> repeated line
> repeated line
> repeated line
> third line
> EOF
> first line
> second line
> repeated line
> [previous line repeated 4 times]
> third line
>
> Cheers,
> b.
>
>
You could write your own function to do it. E.g.
unique() {
    [ "$1" ] || { echo "Needs a readable file to test" && return 1; }
    [ -r "$1" ] || { echo "Needs a readable file to test" && return 1; }
    R=""; N=0
    while IFS=$'\n' read L; do
        [ "$L" = "$R" ] && { ((N++)); continue; }
        [ "$N" -gt 0 ] && { echo "[Previous line repeated $N times]"; N=0; }
        R="$L"
        echo "$L"
    done <$1
}
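
As posted, the loop appears to have a few edge cases: a run that reaches the end of the input never gets its message, an empty first line is counted as a repeat of the initial R="", and read without -r interprets backslashes. A possible variant that addresses these is sketched below. The name collapse_repeats is hypothetical, and unlike the function above it reads standard input rather than a file argument.

```shell
# Sketch of a variant of the loop above; reads stdin, POSIX sh only.
collapse_repeats() {
    prev= count=0 first=1
    # `|| [ -n "$line" ]` also handles a final line without a newline.
    while IFS= read -r line || [ -n "$line" ]; do
        if [ "$first" -eq 0 ] && [ "$line" = "$prev" ]; then
            count=$((count + 1))
            continue
        fi
        if [ "$count" -gt 0 ]; then
            printf '[previous line repeated %d times]\n' "$count"
        fi
        count=0 first=0 prev=$line
        printf '%s\n' "$line"
    done
    # Report a run that extends to the end of the input.
    if [ "$count" -gt 0 ]; then
        printf '[previous line repeated %d times]\n' "$count"
    fi
}

printf '%s\n' 'first line' 'repeated line' 'repeated line' | collapse_repeats
```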
--
Chris Elvidge
Information forwarded to bug-coreutils <at> gnu.org: bug#44704; Package coreutils. (Thu, 19 Nov 2020 00:24:02 GMT)
Message #27 received at 44704 <at> debbugs.gnu.org (full text, mbox):
On 11/18/20 12:25 PM, Chris Elvidge wrote:
> You could write your own function to do it. E.g.
>
> unique() {
>     [ "$1" ] || { echo "Needs a readable file to test" && return 1; }
>     [ -r "$1" ] || { echo "Needs a readable file to test" && return 1; }
>     R=""; N=0
>     while IFS=$'\n' read L; do
>         [ "$L" = "$R" ] && { ((N++)); continue; }
>         [ "$N" -gt 0 ] && { echo "[Previous line repeated $N times]"; N=0; }
>         R="$L"
>         echo "$L"
>     done <$1
> }
Nice.
The UNIX toolbox is diverse. ;-)
I'd use:
awk '
  function p(n) {
    if (n > 1) {
      printf("[previous line repeated %d times]\n", n-1);
    }
  }
  {
    if (line != $0) {
      p(n);
      n = 0;
    }
    line = $0;
    if (n == 0)
      print
    n++;
  }
  END { p(n); }
'
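
For anyone who wants something closer to the requested interface, the same awk approach can be wrapped in a shell function that takes the message format as an argument. This is only a sketch: uniq_msg is a hypothetical helper, not an existing tool, and the format is assumed to contain exactly one %d.

```shell
# Sketch: awk-based run collapsing with a caller-supplied message format.
# Usage: uniq_msg '[previous line repeated %d times]' < file
uniq_msg() {
    awk -v fmt="$1" '
        function report() {
            if (n > 1)
                printf(fmt "\n", n - 1)
        }
        # Appending "" forces a string comparison, so that numeric-looking
        # lines such as "1" and "1.0" are not treated as equal.
        ($0 "") != line { report(); n = 0 }
        { line = $0; if (n++ == 0) print }
        END { report() }
    '
}

printf '%s\n' 'first line' 'repeated line' 'repeated line' 'third line' |
    uniq_msg '[previous line repeated %d times]'
```

Note that awk processes escape sequences in values passed with -v, so a format containing backslashes may need extra quoting.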
Have a nice day,
Berny
This bug report was last modified 4 years and 272 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.