GNU bug report logs - #29396
Comm bug verified


Package: coreutils

Reported by: Saint Michael <venefax <at> gmail.com>

Date: Wed, 22 Nov 2017 14:17:01 UTC

Severity: normal

Done: Assaf Gordon <assafgordon <at> gmail.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 29396 in the body.
You can then email your comments to 29396 AT debbugs.gnu.org in the normal way.



Report forwarded to bug-coreutils <at> gnu.org:
bug#29396; Package coreutils. (Wed, 22 Nov 2017 14:17:01 GMT) Full text and rfc822 format available.

Acknowledgement sent to Saint Michael <venefax <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Wed, 22 Nov 2017 14:17:01 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Saint Michael <venefax <at> gmail.com>
To: bug-coreutils <at> gnu.org
Subject: Comm bug verified
Date: Wed, 22 Nov 2017 09:15:56 -0500
Dear Maintainers
I guess the names are Richard M. Stallman and David MacKenzie.

I found a verifiable bug in the utility comm. This is very important
because hundreds, maybe thousands, of applications rely on this tool to make
crucial decisions, in power plants, banks, etc. We need to trust it.

I have two files with phone numbers, one column, sorted (they pass the test
sort -c). One is large and the other one is small. The command comm -12
--check-order file1.csv file2.csv fails to find matches, but another
utility, join file1.csv file2.csv, does find a lot of matches.
The box is CentOS 7 and comm --version reports:

    comm (GNU coreutils) 8.22

The only special thing about my box is
export LC_ALL=C
Please contact me so I can send you a zip file with the two files, or for
security, plain ASCII files, or maybe you can log in to our lab box and
execute the commands.

Information forwarded to bug-coreutils <at> gnu.org:
bug#29396; Package coreutils. (Wed, 22 Nov 2017 16:49:02 GMT) Full text and rfc822 format available.

Message #8 received at 29396 <at> debbugs.gnu.org (full text, mbox):

From: Assaf Gordon <assafgordon <at> gmail.com>
To: Saint Michael <venefax <at> gmail.com>, 29396 <at> debbugs.gnu.org
Subject: Re: bug#29396: Comm bug verified
Date: Wed, 22 Nov 2017 09:48:16 -0700
Hello,

On 2017-11-22 07:15 AM, Saint Michael wrote:
> I have two files with phone numbers, one column, sorted (they pass the test
> sort -c). One is large and the other one is small. The command comm -12
> --check-order file1.csv file2.csv fails to find matches, but another
> utility, join file1.csv file2.csv, does find a lot of matches.

To help us better diagnose the issue, please provide more details about
the output you are seeing:

1. The "sort" command used to sort the files
2. The "join" command used to join the files
3. A small excerpt of the input files with which we can reproduce the error
4. The output of the commands


regards,
 - assaf







Information forwarded to bug-coreutils <at> gnu.org:
bug#29396; Package coreutils. (Wed, 22 Nov 2017 18:40:02 GMT) Full text and rfc822 format available.

Message #11 received at 29396 <at> debbugs.gnu.org (full text, mbox):

From: Assaf Gordon <assafgordon <at> gmail.com>
To: Saint Michael <venefax <at> gmail.com>, 29396 <at> debbugs.gnu.org
Subject: Re: bug#29396: Comm bug verified
Date: Wed, 22 Nov 2017 11:39:37 -0700
tag 29396 notabug
close 29396
thanks

(based on reproducible example provided privately)

Hello,

On 2017-11-22 09:48 AM, Assaf Gordon wrote:
> On 2017-11-22 07:15 AM, Saint Michael wrote:
>> I have two files with phone numbers, one column, sorted (they pass the
>> test
>> sort -c). One is large and the other one is small. The command comm -12
>> --check-order file1.csv file2.csv fails to find matches, but another
>> utility, join file1.csv file2.csv, does find a lot of matches.

This is not a bug in comm, but simply incorrect usage.

The file "file2.csv" (provided privately) contained a space character
after each number.

"comm" compares entire lines, so trailing spaces do matter.
"join" compares fields, so trailing spaces after a field do not matter.

A simple reproducer:

    $ seq 5 > a
    $ echo "4 " > b

    $ join a b
    4

    $ comm -12 a b
    [ ... no output ... ]


To remove the trailing spaces on the file, try:

   $ sed 's/  *$//' file2.csv > file2-no-space.csv

   $ comm -12 file1.csv file2-no-space.csv  | wc -l
   864

   $ join file1.csv file2.csv | wc -l
   864
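
A minimal sketch for checking whether a file has trailing whitespace
before running "comm" on it. It reuses the "a" and "b" files from the
reproducer above, placed in a temporary directory; the variable names
(tmp, trailing, common_before, common_after) are illustrative
assumptions and not part of the original report.

```shell
# Use the same locale as the reporter so sort order is byte-wise.
export LC_ALL=C

tmp=$(mktemp -d)
seq 5 > "$tmp/a"
printf '4 \n' > "$tmp/b"      # same content as: echo "4 " > b

# Count the lines that end in a space or tab.
trailing=$(grep -c '[[:space:]]$' "$tmp/b")
echo "lines with trailing whitespace: $trailing"

# Print the file with line ends made visible (each line ends in '$').
sed -n l "$tmp/b"

# Common lines before and after stripping trailing whitespace.
# "-" tells comm to read the second file from standard input.
common_before=$(( $(comm -12 "$tmp/a" "$tmp/b" | wc -l) ))
common_after=$(( $(sed 's/[[:space:]]*$//' "$tmp/b" | comm -12 "$tmp/a" - | wc -l) ))
echo "common lines before: $common_before, after: $common_after"

rm -rf "$tmp"
```

A non-zero count from grep suggests the file should be cleaned before
it is passed to "comm".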

regards,
 - assaf





Reply sent to Assaf Gordon <assafgordon <at> gmail.com>:
You have taken responsibility. (Thu, 23 Nov 2017 17:51:01 GMT) Full text and rfc822 format available.

Notification sent to Saint Michael <venefax <at> gmail.com>:
bug acknowledged by developer. (Thu, 23 Nov 2017 17:51:02 GMT) Full text and rfc822 format available.

Message #16 received at 29396-done <at> debbugs.gnu.org (full text, mbox):

From: Assaf Gordon <assafgordon <at> gmail.com>
To: Saint Michael <venefax <at> gmail.com>, 29396-done <at> debbugs.gnu.org
Subject: Re: bug#29396: Comm bug verified
Date: Thu, 23 Nov 2017 10:50:05 -0700
(re-adding the mailing list)

Hello,

On 2017-11-22 12:39 PM, Saint Michael wrote:
> Thanks for the explanation
> Is there a place to download the source code?
> I want to compile a version that ignores spaces on the right, since 99%
> of the time I forget to check the line endings.
> Or maybe you could consider adding a switch to the tool that would make it
> ignore trailing spaces when comparing lines.

The source code for GNU coreutils is available here:
  https://git.savannah.gnu.org/cgit/coreutils.git

However, I would recommend the 'sed' method below as a much simpler
way to remove trailing spaces (instead of maintaining custom, modified
coreutils binaries).


>     To remove the trailing spaces on the file, try:
> 
>         $ sed 's/  *$//' file2.csv > file2-no-space.csv
> 
>         $ comm -12 file1.csv file2-no-space.csv  | wc -l
>         864
> 
>         $ join file1.csv file2.csv | wc -l
>         864


regards,
 - assaf





bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 22 Dec 2017 12:24:06 GMT) Full text and rfc822 format available.

This bug report was last modified 7 years and 235 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.