GNU bug report logs - #19533
comm does not detect common lines -- Mac OS X 10.9.5
Reported by: Ali Khanafer <ali.khanafer <at> gmail.com>
Date: Wed, 7 Jan 2015 21:36:02 UTC
Severity: normal
Tags: notabug
Done: Eric Blake <eblake <at> redhat.com>
Bug is archived. No further changes may be made.
Report forwarded to bug-coreutils <at> gnu.org: bug#19533; Package coreutils. (Wed, 07 Jan 2015 21:36:02 GMT)
Acknowledgement sent to Ali Khanafer <ali.khanafer <at> gmail.com>: New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Wed, 07 Jan 2015 21:36:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
Hello,
Thanks for this amazing tool.
I tried comm on test1.txt and test2.txt. The output I got is in
comm-test.txt. Comm found 11 common lines and missed 6 other lines.
Could you please explain why this is happening?
Thank you in advance.
Best,
Ali
[comm-test (application/octet-stream, attachment)]
[test2 (application/octet-stream, attachment)]
[test1 (application/octet-stream, attachment)]
Added tag(s) notabug. Request was from Eric Blake <eblake <at> redhat.com> to control <at> debbugs.gnu.org. (Wed, 07 Jan 2015 22:13:01 GMT)
Reply sent to Eric Blake <eblake <at> redhat.com>: You have taken responsibility. (Wed, 07 Jan 2015 22:13:02 GMT)
Notification sent to Ali Khanafer <ali.khanafer <at> gmail.com>: bug acknowledged by developer. (Wed, 07 Jan 2015 22:13:03 GMT)
Message #12 received at 19533-done <at> debbugs.gnu.org (full text, mbox):
tag 19533 notabug
thanks
On 01/07/2015 02:23 PM, Ali Khanafer wrote:
> Hello,
>
> Thanks for this amazing tool.
>
> I tried comm on test1.txt and test2.txt. The output I got is in
> comm-test.txt. Comm found 11 common lines and missed 6 other lines.
>
> Could you please explain why this is happening?
Using a newer version of coreutils would tell you why:
$ comm test1 test2
1266281
11348282
15431856
16264803
17248121
17384099
18911432
20513956
21436960
21634600
24129206
33773592
37710752
44903491
comm: file 1 is not in sorted order
103652294
103865085
126302054
198494684
208442526
253536357
1002513128
46959037
51274038
comm: file 2 is not in sorted order
103652294
103865085
126302054
208442526
253536357
1002513128
Proper use of comm requires that you pre-sort both input files. As
such, this is not a bug in comm, so I'm closing this bug. However, feel
free to add further comments or questions.
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
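
Note: the pre-sorting requirement described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the reporter's actual workflow: the file names and contents are hypothetical sample data, and LC_ALL=C is chosen here only so that sort and comm agree on one ordering.

```shell
# Hypothetical sample data; these are not the reporter's files.
workdir=$(mktemp -d)
printf '%s\n' 20 3 100 > "$workdir/file1"
printf '%s\n' 100 7 20 > "$workdir/file2"

# Sort each file as plain text, in the same locale that comm will run in.
LC_ALL=C sort "$workdir/file1" > "$workdir/file1.sorted"
LC_ALL=C sort "$workdir/file2" > "$workdir/file2.sorted"

# -12 suppresses columns 1 and 2, leaving only the lines common to both files.
common=$(LC_ALL=C comm -12 "$workdir/file1.sorted" "$workdir/file2.sorted")
printf '%s\n' "$common"

rm -r "$workdir"
```

Writing the sorted data to separate files keeps the originals untouched and works in any POSIX shell.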
Information forwarded to bug-coreutils <at> gnu.org: bug#19533; Package coreutils. (Wed, 07 Jan 2015 23:00:03 GMT)
Message #15 received at 19533 <at> debbugs.gnu.org (full text, mbox):
Eric Blake wrote:
> Ali Khanafer wrote:
> > I tried comm on test1.txt and test2.txt. The output I got is in
> > comm-test.txt. Comm found 11 common lines and missed 6 other lines.
> >
> > Could you please explain why this is happening?
>
> Using a newer version of coreutils would tell you why:
> ...
> Proper use of comm requires that you pre-sort both input files. As
> such, this is not a bug in comm, so I'm closing this bug. However, feel
> free to add further comments or questions.
If you are using bash, then a bash-specific feature (process substitution)
is useful. You can sort the files on the fly.
comm <(sort test1) <(sort test2)
Or perhaps force a sort locale:
env LC_ALL=C comm <(sort test1) <(sort test2)
I included LC_ALL=C to force a specific sort order which may or may
not be appropriate for all of your use cases.
Bob
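
Note: one caveat with the env LC_ALL=C form above is that bash starts the <(sort ...) processes itself, before env runs, so the setting reaches comm but not the two sort commands. The sketch below is one possible way to give all three commands the same locale. It assumes bash (process substitution is not POSIX sh), and the file names and contents are hypothetical sample data.

```shell
# Requires bash: <(...) is process substitution.
# Hypothetical sample data; these are not the reporter's files.
workdir=$(mktemp -d)
printf '%s\n' b A a B > "$workdir/test1"
printf '%s\n' a C B > "$workdir/test2"

# The command substitution runs in a subshell, so the export stays local to it.
# Inside it, comm and both sort processes inherit LC_ALL=C.
common=$(
  export LC_ALL=C
  comm -12 <(sort "$workdir/test1") <(sort "$workdir/test2")
)
printf '%s\n' "$common"

rm -r "$workdir"
```

Whether the C locale is the right choice depends on the data. The point is only that sort and comm should use the same one.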
Information forwarded to bug-coreutils <at> gnu.org: bug#19533; Package coreutils. (Thu, 08 Jan 2015 16:58:02 GMT)
Message #18 received at 19533 <at> debbugs.gnu.org (full text, mbox):
Thanks Eric and Bob. I had sorted the files before calling comm, but I
think the problem is that I sorted them as numeric:
sort -n test1 -o test1
When I removed the "-n", which is equivalent to what Bob has done, comm
worked like a charm.
Sorry for rushing to file this as a bug.
Cheers,
Ali
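
Note: the difference between the two sort modes can be seen with three of the values that appear in the comm output quoted earlier in this report. This is an illustrative sketch. The file name is hypothetical, and LC_ALL=C is an assumption made to get a predictable byte-wise text order.

```shell
workdir=$(mktemp -d)
# Three values taken from the comm output quoted earlier in this report.
printf '%s\n' 103652294 44903491 1002513128 > "$workdir/ids"

# Numeric order compares the values as numbers.
numeric=$(LC_ALL=C sort -n "$workdir/ids")
# Plain text order compares character by character, so "1..." sorts before "4...".
textual=$(LC_ALL=C sort "$workdir/ids")

printf 'numeric order:\n%s\n' "$numeric"
printf 'text order:\n%s\n' "$textual"

rm -r "$workdir"
```

comm compares lines as text, so only the second ordering matches what it expects.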
Information forwarded to bug-coreutils <at> gnu.org: bug#19533; Package coreutils. (Thu, 08 Jan 2015 17:46:02 GMT)
Message #21 received at 19533 <at> debbugs.gnu.org (full text, mbox):
Ali Khanafer wrote:
> Thanks Eric and Bob. I had sorted the files before calling comm, but I
> think the problem is that I sorted them as numeric:
>
> sort -n test1 -o test1
>
> When I removed the "-n", which is equivalent to what Bob has done, comm
> worked like a charm.
Yes, that would cause the problem. comm is a simple program from years
and years ago and expects things to be sorted simply. Sort options in
the various programs have come up for discussion every so often. But
so far things have continued as they are. The biggest changes in this
area have been having the tools produce diagnostic information when
the input is not as they expect. Check out the sort --debug option
for more useful diagnostics about sorting.
Glad things have been /sorted/ out! :-)
Bob
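
Note: besides the diagnostics mentioned above, a file can be checked before it is handed to comm. The sketch below uses the POSIX sort -c option rather than sort --debug. The file name and its two lines (values taken from the comm output quoted earlier) are hypothetical sample data, and LC_ALL=C is an assumption.

```shell
workdir=$(mktemp -d)
# Two values in numeric order, as "sort -n" would leave them.
printf '%s\n' 44903491 103652294 > "$workdir/numeric_order"

# sort -c only checks: it exits non-zero if the input is not already sorted
# under the given options, and writes a diagnostic to stderr.
if LC_ALL=C sort -c "$workdir/numeric_order" 2>/dev/null; then
  text_sorted=yes
else
  text_sorted=no
fi

if LC_ALL=C sort -n -c "$workdir/numeric_order" 2>/dev/null; then
  numeric_sorted=yes
else
  numeric_sorted=no
fi

printf 'sorted as text: %s, sorted as numbers: %s\n' "$text_sorted" "$numeric_sorted"

rm -r "$workdir"
```

A file that passes the plain sort -c check in a given locale should be acceptable to comm run in that same locale.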
bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 06 Feb 2015 12:24:04 GMT)
This bug report was last modified 10 years and 141 days ago.