GNU bug report logs - #19533
comm does not detect common lines -- Mac OS X 10.9.5
Reported by: Ali Khanafer <ali.khanafer <at> gmail.com>
Date: Wed, 7 Jan 2015 21:36:02 UTC
Severity: normal
Tags: notabug
Done: Eric Blake <eblake <at> redhat.com>
Bug is archived. No further changes may be made.
Thanks Eric and Bob. I had sorted the files before calling comm, but I
think the problem is that I sorted them numerically:
sort -n test1 -o test1
When I removed the "-n", which is equivalent to what Bob has done, comm
worked like a charm.
Sorry for rushing to file this as a bug.
Cheers,
Ali
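
To illustrate the point above: comm compares lines as strings, so input sorted with "sort -n" can look unsorted to it. The following is only a minimal sketch; the file names and the sample numbers are made up for illustration and are not the files from this report.

```shell
# Hypothetical sample data (not the reporter's files).
tmp=$(mktemp -d)
printf '%s\n' 10 9 100 2 > "$tmp/a"
printf '%s\n' 2 100 9 10 > "$tmp/b"

# Numeric sort yields 2, 9, 10, 100. As strings, "9" sorts after "10",
# so this order is not what comm expects.
LC_ALL=C sort -n "$tmp/a" > "$tmp/a.num"

# "sort -c" only checks whether a file is already in (string) order.
if LC_ALL=C sort -c "$tmp/a.num" 2>/dev/null; then
  numeric_ok=yes
else
  numeric_ok=no
fi
printf 'numerically sorted file accepted as sorted: %s\n' "$numeric_ok"

# Plain (string) sort on both inputs, then list only the common lines.
LC_ALL=C sort "$tmp/a" > "$tmp/a.lex"
LC_ALL=C sort "$tmp/b" > "$tmp/b.lex"
common=$(LC_ALL=C comm -12 "$tmp/a.lex" "$tmp/b.lex")
printf '%s\n' "$common"

rm -rf "$tmp"
```

With the plain sort, all four sample lines are reported as common.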
On Wed, Jan 7, 2015 at 5:59 PM, Bob Proulx <bob <at> proulx.com> wrote:
> Eric Blake wrote:
> > Ali Khanafer wrote:
> > > I tried comm on test1.txt and test2.txt. The output I got is in
> > > comm-test.txt. Comm found 11 common lines and missed 6 other lines.
> > >
> > > Could you please explain why this is happening?
> >
> > Using a newer version of coreutils would tell you why:
> > ...
> > Proper use of comm requires that you pre-sort both input files. As
> > such, this is not a bug in comm, so I'm closing this bug. However, feel
> > free to add further comments or questions.
>
> If you are using bash, then a bash-specific feature is useful. You can
> sort the files on the fly.
>
> comm <(sort test1) <(sort test2)
>
> Or perhaps force a specific sort locale.
>
> env LC_ALL=C comm <(sort test1) <(sort test2)
>
> I included LC_ALL=C to force a specific sort order which may or may
> not be appropriate for all of your use cases.
>
> Bob
>
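
One caveat on the locale suggestion, offered as a hedged note: "env LC_ALL=C" placed in front of comm sets the locale for comm only. The sort commands inside <( ... ) are started by the shell itself and keep the shell's locale, so the two can disagree on ordering. A sketch of one way to keep all three commands in the same locale follows; it uses temporary files and an exported variable so it does not depend on bash, and the sample data is made up. In bash, the equivalent would be to put LC_ALL=C in front of each sort inside the process substitutions as well.

```shell
# Hypothetical sample data; test1/test2 stand in for the files in the thread.
tmp=$(mktemp -d)
printf '%s\n' b A a B > "$tmp/test1"
printf '%s\n' a c B   > "$tmp/test2"

# The command substitution runs in a subshell, so the export stays local.
# Both sort runs and comm inherit the same LC_ALL and agree on ordering.
common=$(
  export LC_ALL=C
  sort "$tmp/test1" > "$tmp/test1.sorted"
  sort "$tmp/test2" > "$tmp/test2.sorted"
  comm -12 "$tmp/test1.sorted" "$tmp/test2.sorted"
)
printf '%s\n' "$common"

rm -rf "$tmp"
```

As in Bob's note, the C locale is only one possible choice; what matters is that sort and comm use the same one.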
This bug report was last modified 10 years and 141 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.