GNU bug report logs - #6903
join: support numeric keys
To reply to this bug, email your comments to 6903 AT debbugs.gnu.org.
Report forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org: bug#6903; Package coreutils. (Tue, 24 Aug 2010 19:57:01 GMT)
Full text and rfc822 format available.
Acknowledgement sent to Bernhard Schiffner <bernhard <at> schiffner-limbach.de>: New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Tue, 24 Aug 2010 19:57:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
Hi,
Having to work with lists containing large numbers (e.g. file sizes greater than 2 GB),
I have problems.
sort -n works, but
join doesn't do what a "newcomer" expects.
Because join uses strtoul() before doing the comparison, it is understandable.
("unpairable" is the result.)
Do you see a chance to extend join with a -n parameter for numeric
comparison, as sort already has?
In the source / manpage, join refers to being combined with sort. Why not
expand this? :-)
TIA!
Bernhard
Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org: bug#6903; Package coreutils. (Tue, 24 Aug 2010 21:23:01 GMT)
Message #8 received at 6903 <at> debbugs.gnu.org (full text, mbox):
On 08/24/2010 12:39 PM, Bernhard Schiffner wrote:
> Because join uses strtoul() before doing the comparison, it is understandable.
> ("unpairable" is the result.)
No, join doesn't use strtoul. It compares the numbers as strings.
So if you use plain "sort" on the numbers, join will work, unless the
numbers are numerically equal but textually different (e.g., 0 versus -0).
You can then sort the output of join with "sort -n", if you wish.
> Do you see a chance to extend join with a -n parameter for numeric
> comparison, as sort already has?
That would be a nice thing to add, if someone had the time to do it.
Generally speaking, any comparison that "sort" can do, "join" should
do too (except for random comparison I suppose).
The comparison code between sort and join should be shared, of course.
Can you write that?
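The point about string comparison can be illustrated with a short Python sketch. Python is used here only for illustration (join itself is written in C), and the sketch assumes plain byte-wise string order, as in the C locale. The sample keys are illustrative.

```python
# Sample keys, ordered once as strings and once as numbers.
keys = ["214618118", "21460", "2146427", "214602"]

# Roughly what plain "sort" produces (byte-wise order assumed).
string_order = sorted(keys)
# Roughly what "sort -n" produces.
numeric_order = sorted(keys, key=int)

print("string :", string_order)
print("numeric:", numeric_order)

# Keys that are numerically equal but textually different
# are never equal when compared as strings.
zero_equal_as_numbers = int("0") == int("-0")
zero_equal_as_strings = "0" == "-0"
print(zero_equal_as_numbers, zero_equal_as_strings)
```

The two orders differ as soon as keys have different lengths, which is why input prepared with "sort -n" is not, in general, in the order that a string-comparing join expects.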
Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org: bug#6903; Package coreutils. (Wed, 25 Aug 2010 06:56:02 GMT)
Message #11 received at 6903 <at> debbugs.gnu.org (full text, mbox):
On Tuesday, 24 August 2010, 23:23:55, Paul Eggert wrote:
> On 08/24/2010 12:39 PM, Bernhard Schiffner wrote:
> > Because join uses strtoul() before doing the comparison, it is
> > understandable. ("unpairable" is the result.)
>
> No, join doesn't use strtoul.
I was wrong. (strtoul() is used for the number of the field to join on.)
> It compares the numbers as strings.
> So if you use plain "sort" on the numbers, join will work, unless the
> numbers are numerically equal but textually different (e.g., 0 versus -0).
Not a problem for me.
> You can then sort the output of join with "sort -n", if you wish.
A small test case is included here.
Run
join a b
and try to understand why the lines with
214618118 /temp/marketing_ms/emails.dat
214618118 /temp/bs/marketing_ms/emails.dat
are not in the result.
Do you see any reason?
Perhaps I'm misusing join a little here, but so far I don't
understand why it should be wrong.
Before I blame someone else, I'll try to dig a little deeper
myself.
TIA!
Bernhard
File a:
21460 /ElsevierDocuments/EWX0886A/09218181/00220001/99000417/main.raw
21460 /ElsevierDocuments/EWX0889A/00319201/01200001/00001461/main.raw
21464 /apache/xerces/dom/DeferredAttrNSImpl.html
21466 /spam/1206882672_000701c89267_03453ee8_21fcd5a0 <at> jlsvsf
21467 /MINING/MIN0002A/03605442/00230009/98000218/main.raw
21468 /___MRA/___sophos_autoupdate1.dir/1207625107/encloa-b.ide
21468 /___MRA/___sophos_autoupdate1.dir/1208238697/encloa-b.ide
21468 /___MRA/___sophos_autoupdate1.dir/1208834890/encloa-b.ide
21468 /___MRA/___sophos_autoupdate1.dir/1209153877/encloa-b.ide
21468 /___MRA/___sophos_autoupdate1.dir/1209404409/encloa-b.ide
21468 /___MRA/___sophos_autoupdate1.dir/1209710971/encloa-b.ide
21468 /___MRA/___sophos_autoupdate1.dir/1209737271/encloa-b.ide
21468 /___MRA/___sophos_autoupdate1.dir/1214978929/encloa-b.ide
21469 /ElsevierDocuments/EWX0886A/09218181/00370003/02001996/main.raw
21469 /ElsevierDocuments/EWX0890A/00335894/00660002/06000846/main.xml
21469 /ElsevierDocuments/MINING/MIN0001A/01968904/00420007/00000911/main.raw
214602 /ElsevierDocuments/EWX0876A/00370738/01710001/04002477/main.xml
214604 /ElsevierDocuments/EWX0881A/00128252/00700001/04001333/main.xml
214614 /ElsevierDocuments/EWX0887A/02773791/00240020/05000223/main.xml
214666 /ElsevierDocuments/EWX0886A/09218181/00600003/07000240/main.xml
214682 /ElsevierDocuments/EWX0879A/0012821X/02430003/06000367/main.xml
2146369 /marketing/diffferent_Berichtsband_Online_Crossmedia_Kampagnen.pdf
2146427 /LBAtoJM/ROOT/WEB-INF/lib/hibernate-3.2.0.cr3.jar
214618118 /temp/marketing_ms/emails.dat
214618118 /temp/bs/marketing_ms/emails.dat
214618120 /temp/marketing_js/emails.dat
File b:
21460
21468
21469
214618118
215777777
Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org: bug#6903; Package coreutils. (Wed, 25 Aug 2010 16:22:02 GMT)
Message #14 received at 6903 <at> debbugs.gnu.org (full text, mbox):
On 08/24/2010 11:57 PM, Bernhard Schiffner wrote:
> 2146427 /LBAtoJM/ROOT/WEB-INF/lib/hibernate-3.2.0.cr3.jar
> 214618118 /temp/marketing_ms/emails.dat
That won't work, because the two lines are not sorted correctly.
Recall that join uses lexicographic comparison, not numeric.
Its input must be sorted lexicographically.
You can sort its output numerically later, if you prefer numeric
order.
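Why the numerically sorted test case loses lines can be shown with a simplified merge-style model in Python. This is a sketch, not GNU join's actual code: the function `merge_join` is hypothetical, it assumes the second input has unique keys, and it compares keys byte-wise as strings. The data is an excerpt of files a and b from the test case.

```python
def merge_join(left, right):
    """Simplified merge join on the key, comparing keys as strings.

    Hypothetical model, not GNU join's implementation. It assumes both
    inputs are sorted in string order and that `right` has unique keys.
    """
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lkey, rkey = left[i][0], right[j]
        if lkey < rkey:
            i += 1            # left line is skipped as unpairable
        elif lkey > rkey:
            j += 1            # right line is skipped as unpairable
        else:
            out.append(left[i])
            i += 1            # stay on rkey: left may repeat the key
    return out

# Excerpt of file a (numeric order, as in the report) and file b.
a = [
    ("21460", "/ElsevierDocuments/EWX0886A/09218181/00220001/99000417/main.raw"),
    ("21468", "/___MRA/___sophos_autoupdate1.dir/1207625107/encloa-b.ide"),
    ("21469", "/ElsevierDocuments/EWX0886A/09218181/00370003/02001996/main.raw"),
    ("2146427", "/LBAtoJM/ROOT/WEB-INF/lib/hibernate-3.2.0.cr3.jar"),
    ("214618118", "/temp/marketing_ms/emails.dat"),
    ("214618118", "/temp/bs/marketing_ms/emails.dat"),
]
b = ["21460", "21468", "21469", "214618118", "215777777"]

# Numerically sorted input: "214618118" compares less than "21469"
# as a string, so those lines are skipped.
broken = [key for key, _ in merge_join(a, b)]

# Workaround from the thread: sort both inputs as strings, join,
# then sort the joined output numerically.
joined = merge_join(sorted(a, key=lambda row: row[0]), sorted(b))
fixed = [key for key, _ in sorted(joined, key=lambda row: int(row[0]))]

print(broken)
print(fixed)
```

In this model the first run pairs only 21460, 21468 and 21469, while the second run also pairs both 214618118 lines.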
Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org: bug#6903; Package coreutils. (Thu, 26 Aug 2010 19:08:01 GMT)
Message #17 received at 6903 <at> debbugs.gnu.org (full text, mbox):
On Wednesday, 25 August 2010, 18:22:13, Paul Eggert wrote:
> On 08/24/2010 11:57 PM, Bernhard Schiffner wrote:
> > 2146427 /LBAtoJM/ROOT/WEB-INF/lib/hibernate-3.2.0.cr3.jar
> > 214618118 /temp/marketing_ms/emails.dat
>
> That won't work, because the two lines are not sorted correctly.
> Recall that join uses lexicographic comparison, not numeric.
> Its input must be sorted lexicographically.
Ok.
I solved my problem using the attached patch.
The patch shows that it is possible to use different sort orders for the keys
(join field) in join.
I integrated some / most of the code from sort.c verbatim in order to see
what is needed to compile it successfully in join.c.
I did no tests besides my special use case mentioned earlier.
It's clear that user-friendly key selection needs a lot more work. The same
goes for a unified version of join and sort.
Thanks to Paul and Christian Perle for their valuable help so far.
The FSF can make any use of the code here.
It was theirs already before ;-)
Bernhard
[join_proposal_2.diff (text/x-patch, attachment)]
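The attached patch is not reproduced in this log. As a rough illustration of the idea only, the following Python sketch shows a merge join whose key comparison is pluggable. The function `merge_join_by` and its `key` parameter are hypothetical and are not taken from the patch or from join.c; the sketch assumes the second input has unique keys and that both inputs are sorted in the order implied by `key`.

```python
def merge_join_by(left, right, key=str):
    """Merge join with a pluggable key conversion (hypothetical sketch).

    key=str models string comparison; key=int models the numeric
    comparison requested in this report.
    """
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lkey, rkey = key(left[i][0]), key(right[j])
        if lkey < rkey:
            i += 1            # left line has no partner
        elif lkey > rkey:
            j += 1            # right line has no partner
        else:
            out.append(left[i])
            i += 1            # stay on rkey: left may repeat the key
    return out

# Numerically sorted excerpt of the test case from this thread.
a = [
    ("21469", "/ElsevierDocuments/EWX0886A/09218181/00370003/02001996/main.raw"),
    ("2146427", "/LBAtoJM/ROOT/WEB-INF/lib/hibernate-3.2.0.cr3.jar"),
    ("214618118", "/temp/marketing_ms/emails.dat"),
    ("214618118", "/temp/bs/marketing_ms/emails.dat"),
    ("214618120", "/temp/marketing_js/emails.dat"),
]
b = ["21460", "21468", "21469", "214618118", "215777777"]

numeric_keys = [k for k, _ in merge_join_by(a, b, key=int)]
string_keys = [k for k, _ in merge_join_by(a, b, key=str)]

print(numeric_keys)
print(string_keys)
```

With `key=int` the numerically sorted input pairs as the reporter expected; with `key=str` the same input loses the 214618118 lines.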
Severity set to 'wishlist' from 'normal'. Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org. (Tue, 30 Oct 2018 05:04:01 GMT)
Changed bug title to 'join: support numeric keys' from 'join: improve paralleles to sort?'. Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org. (Tue, 30 Oct 2018 05:04:01 GMT)
This bug report was last modified 6 years and 319 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.