GNU bug report logs - #19533
comm does not detect common lines -- Mac OS X 10.9.5

Package: coreutils;

Reported by: Ali Khanafer <ali.khanafer <at> gmail.com>

Date: Wed, 7 Jan 2015 21:36:02 UTC

Severity: normal

Tags: notabug

Done: Eric Blake <eblake <at> redhat.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 19533 in the body.
You can then email your comments to 19533 AT debbugs.gnu.org in the normal way.

Report forwarded to bug-coreutils <at> gnu.org:
bug#19533; Package coreutils. (Wed, 07 Jan 2015 21:36:02 GMT)

Acknowledgement sent to Ali Khanafer <ali.khanafer <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Wed, 07 Jan 2015 21:36:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Ali Khanafer <ali.khanafer <at> gmail.com>
To: bug-coreutils <at> gnu.org
Subject: comm does not detect common lines -- Mac OS X 10.9.5
Date: Wed, 7 Jan 2015 16:23:25 -0500
[Message part 1 (text/plain, inline)]
Hello,

Thanks for this amazing tool.

I tried comm on test1.txt and test2.txt. The output I got is in
comm-test.txt. Comm found 11 common lines and missed 6 other lines.

Could you please explain why this is happening?

Thank you in advance.

Best,
Ali
[Message part 2 (text/html, inline)]
[comm-test (application/octet-stream, attachment)]
[test2 (application/octet-stream, attachment)]
[test1 (application/octet-stream, attachment)]

Added tag(s) notabug. Request was from Eric Blake <eblake <at> redhat.com> to control <at> debbugs.gnu.org. (Wed, 07 Jan 2015 22:13:01 GMT)

Reply sent to Eric Blake <eblake <at> redhat.com>:
You have taken responsibility. (Wed, 07 Jan 2015 22:13:02 GMT)

Notification sent to Ali Khanafer <ali.khanafer <at> gmail.com>:
bug acknowledged by developer. (Wed, 07 Jan 2015 22:13:03 GMT)

Message #12 received at 19533-done <at> debbugs.gnu.org:

From: Eric Blake <eblake <at> redhat.com>
To: Ali Khanafer <ali.khanafer <at> gmail.com>, 19533-done <at> debbugs.gnu.org
Subject: Re: bug#19533: comm does not detect common lines -- Mac OS X 10.9.5
Date: Wed, 07 Jan 2015 15:12:26 -0700
[Message part 1 (text/plain, inline)]
tag 19533 notabug
thanks

On 01/07/2015 02:23 PM, Ali Khanafer wrote:
> Hello,
> 
> Thanks for this amazing tool.
> 
> I tried comm on test1.txt and test2.txt. The output I got is in
> comm-test.txt. Comm found 11 common lines and missed 6 other lines.
> 
> Could you please explain why this is happening?

Using a newer version of coreutils would tell you why:

$ comm test1 test2
		1266281
		11348282
		15431856
16264803
		17248121
		17384099
18911432
	20513956
		21436960
		21634600
		24129206
		33773592
		37710752
		44903491
comm: file 1 is not in sorted order
103652294
103865085
126302054
198494684
208442526
253536357
1002513128

	46959037
	51274038
comm: file 2 is not in sorted order
	103652294
	103865085
	126302054
	208442526
	253536357
	1002513128

Proper use of comm requires that you pre-sort both input files.  As
such, this is not a bug in comm, so I'm closing this bug.  However, feel
free to add further comments or questions.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

[signature.asc (application/pgp-signature, attachment)]
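
Eric's reply above comes down to one rule: comm only gives correct results when both inputs are already sorted in the same order comm uses to compare lines. One way to follow that rule without bash's process substitution is to sort each file into a temporary file and then run comm on the results. The sketch below assumes a `mktemp` that accepts no arguments (GNU coreutils' does) and pins both sort and comm to `LC_ALL=C` (plain byte order). That locale choice is an assumption; any locale works as long as sort and comm share it. The helper name `common_lines` is hypothetical.

```bash
# common_lines FILE1 FILE2
# Hypothetical helper: print only the lines that appear in both files.
# comm needs sorted input, so each file is first sorted into a
# temporary file; sort and comm both run under LC_ALL=C so they
# agree on the collation order.
common_lines() {
  tmp1=$(mktemp) || return 1
  tmp2=$(mktemp) || { rm -f "$tmp1"; return 1; }

  LC_ALL=C sort "$1" > "$tmp1" &&
    LC_ALL=C sort "$2" > "$tmp2" &&
    LC_ALL=C comm -12 "$tmp1" "$tmp2"   # -12: hide lines unique to file 1 or file 2
  status=$?

  rm -f "$tmp1" "$tmp2"
  return "$status"
}
```

Usage: `common_lines test1 test2`. The variables `tmp1`, `tmp2` and `status` are global because plain POSIX sh has no `local`.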

Information forwarded to bug-coreutils <at> gnu.org:
bug#19533; Package coreutils. (Wed, 07 Jan 2015 23:00:03 GMT)

Message #15 received at 19533 <at> debbugs.gnu.org:

From: Bob Proulx <bob <at> proulx.com>
To: 19533 <at> debbugs.gnu.org, ali.khanafer <at> gmail.com
Subject: Re: bug#19533: comm does not detect common lines -- Mac OS X 10.9.5
Date: Wed, 7 Jan 2015 15:59:18 -0700
Eric Blake wrote:
> Ali Khanafer wrote:
> > I tried comm on test1.txt and test2.txt. The output I got is in
> > comm-test.txt. Comm found 11 common lines and missed 6 other lines.
> > 
> > Could you please explain why this is happening?
> 
> Using a newer version of coreutils would tell you why:
> ...
> Proper use of comm requires that you pre-sort both input files.  As
> such, this is not a bug in comm, so I'm closing this bug.  However, feel
> free to add further comments or questions.

If you are using bash, then a bash-specific feature is useful.  You can
sort them on the fly.

  comm <(sort test1) <(sort test2)

Or perhaps forcing a sort locale.

  env LC_ALL=C comm <(sort test1) <(sort test2)

I included LC_ALL=C to force a specific sort order which may or may
not be appropriate for all of your use cases.

Bob
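
One caveat about the second command: `env LC_ALL=C` sets the variable only for the program that env runs, which is comm. The shell starts the two `<(sort ...)` process substitutions itself, so those sort processes still run in the shell's current locale. If that locale collates differently from C, comm can again report unsorted input. A minimal sketch that gives sort and comm the same locale exports LC_ALL inside a subshell. It assumes bash (or another shell with process substitution), and the helper name `compare_sorted` is hypothetical.

```bash
# compare_sorted FILE1 FILE2  (hypothetical helper name; needs bash)
# Runs comm on sorted copies of both files, with every sort and
# comm process sharing one collation order.
compare_sorted() {
  (
    # Exported inside a subshell: the <(...) sort processes inherit it,
    # and the caller's locale is left unchanged afterwards.
    export LC_ALL=C
    comm <(sort "$1") <(sort "$2")
  )
}
```

Usage: `compare_sorted test1 test2`. This prints the usual three comm columns: lines only in the first file, lines only in the second file, and common lines.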




Information forwarded to bug-coreutils <at> gnu.org:
bug#19533; Package coreutils. (Thu, 08 Jan 2015 16:58:02 GMT)

Message #18 received at 19533 <at> debbugs.gnu.org:

From: Ali Khanafer <ali.khanafer <at> gmail.com>
To: Bob Proulx <bob <at> proulx.com>
Cc: 19533 <at> debbugs.gnu.org
Subject: Re: bug#19533: comm does not detect common lines -- Mac OS X 10.9.5
Date: Thu, 8 Jan 2015 11:56:51 -0500
[Message part 1 (text/plain, inline)]
Thanks Eric and Bob. I had sorted the files before calling comm, but I
think the problem is that I sorted them as numeric:

sort -n test1 -o test1

When I removed the "-n", which is equivalent to what Bob has done, comm
worked like a charm.

Sorry for rushing to file this as a bug.

Cheers,
Ali


Information forwarded to bug-coreutils <at> gnu.org:
bug#19533; Package coreutils. (Thu, 08 Jan 2015 17:46:02 GMT)

Message #21 received at 19533 <at> debbugs.gnu.org:

From: Bob Proulx <bob <at> proulx.com>
To: Ali Khanafer <ali.khanafer <at> gmail.com>
Cc: 19533 <at> debbugs.gnu.org
Subject: Re: bug#19533: comm does not detect common lines -- Mac OS X 10.9.5
Date: Thu, 8 Jan 2015 10:45:41 -0700
Ali Khanafer wrote:
> Thanks Eric and Bob. I had sorted the files before calling comm, but I
> think the problem is that I sorted them as numeric:
> 
> sort -n test1 -o test1
> 
> When I removed the "-n", which is equivalent to what Bob has done, comm
> worked like a charm.

Yes, that would cause the problem.  comm is a simple program from years
and years ago and expects its input to be sorted simply: a plain sort of
whole lines, with no -n or other ordering options.  Sort options in
the various programs have come up for discussion every so often.  But
so far things have continued as they are.  The biggest changes in this
area have been having the tools produce diagnostic information when
the input is not as they expect.  Check out the sort --debug option
for more useful diagnostics about sorting.

Glad things have been /sorted/ out! :-)

Bob
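
Ali's `sort -n` is the root cause. Numerically, 1266281 comes before 11348282, but comm compares whole lines as text, and as text "11348282" comes before "1266281". Before running comm, you can test whether a file is already in that plain order with `sort -c`, which only checks the file and never rewrites it. The sketch below assumes the C locale, as in Bob's example, and `in_comm_order` is a hypothetical helper name. On GNU systems, `sort --debug` (mentioned above) and `comm --check-order` offer further diagnostics.

```bash
# in_comm_order FILE  (hypothetical helper name)
# Succeeds if FILE is already sorted in plain, whole-line order
# (the order comm expects) and fails otherwise.  sort -c only
# checks; it never rewrites the file.
in_comm_order() {
  LC_ALL=C sort -c "$1" 2>/dev/null   # discard sort's "disorder" message
}

# Example with the numeric sort from this thread:
#   sort -n test1 -o test1
#   in_comm_order test1 || echo "test1: re-sort without -n before comm"
```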





bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 06 Feb 2015 12:24:04 GMT)

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.