GNU bug report logs -
#32099
uniq: add hash-based implementation
Previous Next
Full log
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
Thank you very much for maintaining coreutils!
It seems to me that it would be hard to
underestimate how useful it is.
The main reason I'm writing is to humbly suggest
a performance improvement to the "uniq" command.
It currently requires its input to be sorted.
Sorting generally takes about O(N*log(N)) time.
However, it seems to me that sorting is
unnecessary.
You can see this is so in the following example
$ echo -e "b\na\nb" | awk '!seen[$0]++'
It basically avoids sorting by using hashed
indexes into an associative array to find
previously seen values in about O(N) time.
I expect it would be about O(log(N)) times faster
than the sorting the uniq command currently
requires.
For big values of N, the time saved could be
substantial.
Humble suggestion.
Add an option to the "uniq" command.
Let it work quickly with unsorted data.
Maybe the new option could be something like
"-usi" for "unsorted input".
Thanks,
Kingsley
--
Time is the fire in which we all burn.
This bug report was last modified 6 years and 312 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.