GNU bug report logs - #6903
join: support numeric keys

Package: coreutils

Reported by: Bernhard Schiffner <bernhard <at> schiffner-limbach.de>

Date: Tue, 24 Aug 2010 19:57:01 UTC

Severity: wishlist

From: Bernhard Schiffner <bernhard <at> schiffner-limbach.de>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 6903 <at> debbugs.gnu.org
Subject: bug#6903: join: improve paralleles to sort?
Date: Wed, 25 Aug 2010 08:57:21 +0200

On Tuesday, 24 August 2010, 23:23:55, Paul Eggert wrote:
> On 08/24/2010 12:39 PM, Bernhard Schiffner wrote:
> > Because join uses strtoul() before doing the comparison, it is
> > understandable. ("unpairable" is the result.)
> 
> No, join doesn't use strtoul. 
I was wrong. (strtoul() is used to parse the number of the field to join on.)

> It compares the numbers as strings.
> So if you use plain "sort" on the numbers, join will work, unless the
> numbers are numerically equal but textually different (e.g., 0 versus -0).
Not a problem for me.
> You can then sort the output of join with "sort -n", if you wish.
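
The string comparison Paul describes can be illustrated with a minimal Python sketch. It assumes plain byte-order (C locale) comparison, which Python's `str` ordering matches for ASCII digits; the collation join actually uses depends on the locale. The keys are those of file b from the test case in this message.

```python
# Keys of file b, in the order in which the file is written.
keys = ["21460", "21468", "21469", "214618118", "215777777"]

# Numeric order: the order "sort -n" would produce.
numeric_order = sorted(keys, key=int)

# String order: characters are compared left to right, so
# "214618118" sorts before "21468" because '1' < '8' at the fifth character.
string_order = sorted(keys)

print(numeric_order)
print(string_order)
```

The two orders differ, so a file that is sorted numerically is not sorted in the order a string comparison expects.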

A small test case is included below.
Run
join a b
and try to work out why the lines with
214618118	/temp/marketing_ms/emails.dat
214618118	/temp/bs/marketing_ms/emails.dat
are not in the result.
Do you see any reason?

Perhaps I am misusing join a little here, but so far I don't
understand why it should be wrong.
Before I blame anyone else, I'll try to dig a little deeper myself.

TIA!

Bernhard

File a:
21460	/ElsevierDocuments/EWX0886A/09218181/00220001/99000417/main.raw
21460	/ElsevierDocuments/EWX0889A/00319201/01200001/00001461/main.raw
21464	/apache/xerces/dom/DeferredAttrNSImpl.html
21466	/spam/1206882672_000701c89267_03453ee8_21fcd5a0 <at> jlsvsf
21467	/MINING/MIN0002A/03605442/00230009/98000218/main.raw
21468	/___MRA/___sophos_autoupdate1.dir/1207625107/encloa-b.ide
21468	/___MRA/___sophos_autoupdate1.dir/1208238697/encloa-b.ide
21468	/___MRA/___sophos_autoupdate1.dir/1208834890/encloa-b.ide
21468	/___MRA/___sophos_autoupdate1.dir/1209153877/encloa-b.ide
21468	/___MRA/___sophos_autoupdate1.dir/1209404409/encloa-b.ide
21468	/___MRA/___sophos_autoupdate1.dir/1209710971/encloa-b.ide
21468	/___MRA/___sophos_autoupdate1.dir/1209737271/encloa-b.ide
21468	/___MRA/___sophos_autoupdate1.dir/1214978929/encloa-b.ide
21469	/ElsevierDocuments/EWX0886A/09218181/00370003/02001996/main.raw
21469	/ElsevierDocuments/EWX0890A/00335894/00660002/06000846/main.xml
21469	/ElsevierDocuments/MINING/MIN0001A/01968904/00420007/00000911/main.raw
214602	/ElsevierDocuments/EWX0876A/00370738/01710001/04002477/main.xml
214604	/ElsevierDocuments/EWX0881A/00128252/00700001/04001333/main.xml
214614	/ElsevierDocuments/EWX0887A/02773791/00240020/05000223/main.xml
214666	/ElsevierDocuments/EWX0886A/09218181/00600003/07000240/main.xml
214682	/ElsevierDocuments/EWX0879A/0012821X/02430003/06000367/main.xml
2146369	/marketing/diffferent_Berichtsband_Online_Crossmedia_Kampagnen.pdf
2146427	/LBAtoJM/ROOT/WEB-INF/lib/hibernate-3.2.0.cr3.jar
214618118	/temp/marketing_ms/emails.dat
214618118	/temp/bs/marketing_ms/emails.dat
214618120	/temp/marketing_js/emails.dat

File b:
21460
21468
21469
214618118
215777777
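
One way to see why the two `214618118` lines go missing is to model join as a single forward merge that compares keys as strings. The sketch below is a simplified model under that assumption, not GNU join's actual code: `merge_join` is a hypothetical helper, it assumes the keys in the second file are unique, and only the keys of files a and b are reproduced.

```python
def merge_join(left_keys, right_keys):
    """Return the left keys that pair with a right key in one forward pass.

    Simplified model: keys are compared as strings, and right_keys is
    assumed to contain no duplicates.
    """
    matched = []
    i = j = 0
    while i < len(left_keys) and j < len(right_keys):
        if left_keys[i] < right_keys[j]:
            i += 1  # left key compares smaller: skip it as unpairable
        elif left_keys[i] > right_keys[j]:
            j += 1  # right key compares smaller: advance in the right file
        else:
            matched.append(left_keys[i])
            i += 1  # keep j so that duplicate left keys still pair
    return matched


# Keys of file a and file b, in the order given in the test case.
a_keys = (
    ["21460"] * 2
    + ["21464", "21466", "21467"]
    + ["21468"] * 8
    + ["21469"] * 3
    + ["214602", "214604", "214614", "214666", "214682"]
    + ["2146369", "2146427"]
    + ["214618118"] * 2
    + ["214618120"]
)
b_keys = ["21460", "21468", "21469", "214618118", "215777777"]

as_given = merge_join(a_keys, b_keys)
after_sort = merge_join(sorted(a_keys), sorted(b_keys))

print("214618118" in as_given)    # False
print("214618118" in after_sort)  # True
```

In this model the pass over file a reaches `214618118` while it is still comparing against `21469`; as a string, `214618118` is the smaller of the two, so it is skipped. Sorting both inputs in string order first, as Paul suggests, lets the two lines pair.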





This bug report was last modified 6 years and 320 days ago.
