GNU bug report logs - #29396: Comm bug verified
Reported by: Saint Michael <venefax <at> gmail.com>
Date: Wed, 22 Nov 2017 14:17:01 UTC
Severity: normal
Done: Assaf Gordon <assafgordon <at> gmail.com>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 29396 in the body.
You can then email your comments to 29396 AT debbugs.gnu.org in the normal way.
Report forwarded to bug-coreutils <at> gnu.org: bug#29396; Package coreutils. (Wed, 22 Nov 2017 14:17:01 GMT)
Acknowledgement sent to Saint Michael <venefax <at> gmail.com>: New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Wed, 22 Nov 2017 14:17:01 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
Dear Maintainers
I guess the names are Richard M. Stallman and David MacKenzie.
I found a verifiable bug in the utility comm. This is very important
because hundreds, maybe thousands, of applications rely on this app to make
crucial decisions, in power plants, banks, etc. We need to trust it.
I have two files with phone numbers, one column, sorted (they pass the test
sort -c). One is large and the other one is small. The command comm -12
--check-order file1.csv file2.csv fails to find matches, but another
utility, join file1.csv file2.csv, does find a lot of matches.
The box is CentOS 7, and comm --version reports:
comm (GNU coreutils) 8.22
The only special thing about my box is
export LC_ALL=C
Please contact me so I can send you a zip file with the two files or, for
security, plain ASCII files; or maybe you can log in to our lab box and
execute the commands.
Information forwarded to bug-coreutils <at> gnu.org: bug#29396; Package coreutils. (Wed, 22 Nov 2017 16:49:02 GMT)
Message #8 received at 29396 <at> debbugs.gnu.org (full text, mbox):
Hello,
On 2017-11-22 07:15 AM, Saint Michael wrote:
> I have two files with phone numbers, one column, sorted (they pass the test
> sort -c). One is large and the other one is small. The command comm -12
> --check-order file1.csv file2.csv fails to find matches, but another
> utility, join file1.csv file2.csv, does find a lot of matches.
To help us better diagnose the issue, please provide more details about
the output you are seeing:
1. The "sort" command used to sort the files
2. The "join" command used to join the files
3. A small excerpt of the input files with which we can reproduce the error
4. The output of the commands
regards,
- assaf
Information forwarded to bug-coreutils <at> gnu.org: bug#29396; Package coreutils. (Wed, 22 Nov 2017 18:40:02 GMT)
Message #11 received at 29396 <at> debbugs.gnu.org (full text, mbox):
tag 29396 notabug
close 29396
thanks
(based on reproducible example provided privately)
Hello,
On 2017-11-22 09:48 AM, Assaf Gordon wrote:
> On 2017-11-22 07:15 AM, Saint Michael wrote:
>> I have two files with phone numbers, one column, sorted (they pass the test
>> sort -c). One is large and the other one is small. The command comm -12
>> --check-order file1.csv file2.csv fails to find matches, but another
>> utility, join file1.csv file2.csv, does find a lot of matches.
This is not a bug in comm, but simply incorrect usage.
The file "file2.csv" (provided privately) contained a space character
after each number.
"comm" compares entire lines, so trailing spaces matter.
"join" compares fields, so trailing spaces after a field do not matter.
A simple reproducer:
$ seq 5 > a
$ echo "4 " > b
$ join a b
4
$ comm -12 a b
[ ... no output ... ]
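Before comparing, it can help to check whether either file has trailing blanks at all. The following is a minimal sketch, assuming a POSIX shell with grep, seq and mktemp available; the scratch directory and the variable names are only for illustration.

```shell
# Recreate the two files from the reproducer in a scratch directory.
tmpdir=$(mktemp -d)
seq 5 > "$tmpdir/a"
echo "4 " > "$tmpdir/b"

# Count the lines that end in a space or a tab. A non-zero count means
# comm will see those lines as different from their trimmed counterparts.
# grep exits with status 1 when the count is 0, hence the "|| true".
trailing_a=$(grep -c '[[:blank:]]$' "$tmpdir/a" || true)
trailing_b=$(grep -c '[[:blank:]]$' "$tmpdir/b" || true)

echo "a: $trailing_a line(s) with trailing blanks"
echo "b: $trailing_b line(s) with trailing blanks"
```

Here only "b" should report a line with trailing blanks, which matches the empty comm output above.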
To remove the trailing spaces on the file, try:
$ sed 's/ *$//' file2.csv > file2-no-space.csv
$ comm -12 file1.csv file2-no-space.csv | wc -l
864
$ join file1.csv file2.csv | wc -l
864
regards,
- assaf
Reply sent to Assaf Gordon <assafgordon <at> gmail.com>: You have taken responsibility. (Thu, 23 Nov 2017 17:51:01 GMT)
Notification sent to Saint Michael <venefax <at> gmail.com>: bug acknowledged by developer. (Thu, 23 Nov 2017 17:51:02 GMT)
Message #16 received at 29396-done <at> debbugs.gnu.org (full text, mbox):
(re-adding the mailing list)
Hello,
On 2017-11-22 12:39 PM, Saint Michael wrote:
> Thanks for the explanation.
> Is there a place to download the source code?
> I want to compile a version that ignores spaces on the right, since 99%
> of the time I forget to check the line endings.
> Or maybe you could consider adding a switch to the tool that would make it
> ignore trailing spaces when comparing lines.
The source code for GNU coreutils is available here:
https://git.savannah.gnu.org/cgit/coreutils.git
However, I would recommend the 'sed' method below as a much simpler
way to remove trailing spaces than maintaining a custom, modified
coreutils binary.
> To remove the trailing spaces on the file, try:
>
> $ sed 's/ *$//' file2.csv > file2-no-space.csv
>
> $ comm -12 file1.csv file2-no-space.csv | wc -l
> 864
>
> $ join file1.csv file2.csv | wc -l
> 864
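If the trimming should happen on every comparison without keeping an intermediate file around, a small shell wrapper is one option. The following is a minimal sketch and not part of coreutils: the function name comm_trimmed is hypothetical, and it assumes a POSIX shell with sed, sort, comm, seq and mktemp available.

```shell
# Hypothetical helper (not part of coreutils): print the lines common to
# two files, ignoring trailing spaces and tabs on each line.
comm_trimmed() {
    tmp1=$(mktemp) || return 1
    tmp2=$(mktemp) || { rm -f "$tmp1"; return 1; }

    # Trimming can change the relative order of lines, so sort again
    # afterwards. LC_ALL=C keeps sort and comm on the same collation.
    sed 's/[[:blank:]]*$//' "$1" | LC_ALL=C sort > "$tmp1"
    sed 's/[[:blank:]]*$//' "$2" | LC_ALL=C sort > "$tmp2"

    LC_ALL=C comm -12 "$tmp1" "$tmp2"
    status=$?

    rm -f "$tmp1" "$tmp2"
    return "$status"
}

# Usage with the two files from the earlier reproducer:
workdir=$(mktemp -d)
seq 5 > "$workdir/a"
echo "4 " > "$workdir/b"
common=$(comm_trimmed "$workdir/a" "$workdir/b")
echo "$common"
```

The wrapper leaves the input files untouched and works on trimmed, re-sorted temporary copies, at the cost of sorting both inputs on every call.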
regards,
- assaf
bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 22 Dec 2017 12:24:06 GMT)
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.