Package: coreutils
Reported by: CDR <venefax <at> gmail.com>
Date: Mon, 12 Aug 2013 16:15:02 UTC
Severity: normal
Tags: notabug
Done: Assaf Gordon <assafgordon <at> gmail.com>
Bug is archived. No further changes may be made.
Message #11 received at 15077 <at> debbugs.gnu.org:
From: Assaf Gordon <assafgordon <at> gmail.com>
To: CDR <venefax <at> gmail.com>, 15077 <at> debbugs.gnu.org
Subject: Re: bug#15077: Clarification
Date: Mon, 12 Aug 2013 18:04:34 -0600
Hello Federico Alves,

On 08/12/2013 12:31 PM, CDR wrote:
> I just found out that the "v" option does what I need. So in my opinion,
> the "a" option is useless, for it gives you no new information.

I'm glad to hear you found the combination of options that works for you.

I would humbly disagree that the "-a" option is useless - it simply does something different than what you need. Especially when combined with an output specifier ("-o"), the output of "join" is indeed not what you wanted.

When used without "-o", the "-a 1" and "-a 2" options allow you to see which keys are common to both files and which keys are in just one file.

Example:
The following "join" will show which lines are common (these have 9 fields) and which lines are only in the second file ("-a 2"; these have 5 fields):
---
$ join -t, -1 1 -2 1 -a 2 today.txt yesterday.txt
2012067075,2013106025,6214,0,201,2019269533,6664,0,201
2012087388,8623689800,6214,2,201,2012320000,6006,0,201
2012088887,8623689800,6214,0,201,8624520081,6529,0,201
2012140209,2013700000,6006,0,201,9733360000,392A,0,201
2012204272,2019269533,6664,0,201
2012226151,2018209998,954F,0,201
2012299682,2018209998,954F,0,201
2012324322,9733360000,392A,0,201
2012334444,2017809469,6664,0,201
2012389608,2012320000,6006,0,201
---

If you don't care about the other fields, and just want to see the keys, using "-o 0,1.1,2.1" will give:
---
$ join -t, -1 1 -2 1 -a 2 -o 0,1.1,2.1 today.txt yesterday.txt
2012067075,2012067075,2012067075
2012087388,2012087388,2012087388
2012088887,2012088887,2012088887
2012140209,2012140209,2012140209
2012204272,,2012204272
2012226151,,2012226151
2012299682,,2012299682
2012324322,,2012324322
2012334444,,2012334444
2012389608,,2012389608
---

Which again quickly shows that lines with an empty second field exist only in the second file.
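To make the difference between "-a" and "-v" concrete, here is a minimal sketch. It uses two tiny made-up files (first.txt and second.txt are hypothetical, not the files from this report) and assumes both are already sorted on the key:

```shell
#!/bin/sh
# Hypothetical sample data, sorted on the first field (the join key).
LC_ALL=C; export LC_ALL
tmp=$(mktemp -d)
printf '%s\n' 'a,1' 'b,2' 'c,3'    > "$tmp/first.txt"
printf '%s\n' 'b,20' 'c,30' 'd,40' > "$tmp/second.txt"

# -a 2: the joined lines PLUS the unpairable lines from the second file.
with_a=$(join -t, -a 2 "$tmp/first.txt" "$tmp/second.txt")

# -v 2: ONLY the unpairable lines from the second file.
with_v=$(join -t, -v 2 "$tmp/first.txt" "$tmp/second.txt")

printf 'with -a 2:\n%s\n\nwith -v 2:\n%s\n' "$with_a" "$with_v"
rm -rf "$tmp"
```

So "-v 2" answers "what is only in the second file?", while "-a 2" keeps the joined lines as well and lets you tell the two cases apart by field count.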
You can combine "-a 1" and "-a 2" to show all combinations of items in both files:
---
$ join -t, -1 1 -2 1 -a 1 -a 2 -o 0,1.1,2.1 today.txt yesterday.txt
2012054455,2012054455,
2012067075,2012067075,2012067075
2012087388,2012087388,2012087388
2012088887,2012088887,2012088887
2012120319,2012120319,
2012121177,2012121177,
2012122869,2012122869,
2012140209,2012140209,2012140209
2012143002,2012143002,
2012149116,2012149116,
2012204272,,2012204272
2012226151,,2012226151
2012299682,,2012299682
2012324322,,2012324322
2012334444,,2012334444
2012389608,,2012389608
---

In this example, all lines have three fields:
First field is the combined key, and is always non-empty.
Second field is non-empty if the key exists in the first file.
Third field is non-empty if the key exists in the second file.
(and thus, if both second and third fields are non-empty, the key is common to both files).

> In terms of new functionality, the "-o" option, format, should allow to add
> arbitrary data, like ",A", "4", etc., in addition to the list of fields
> (2.1 1.1 etc.)

I would suggest using a different program (perhaps awk or sed), downstream from the "join" program, to add any additional information you need.

Consider combining it with "-o auto" (new in join as of coreutils version 8.10), which will maintain the column ordering of the combined input files and will allow you to easily add information.
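As a rough sketch of the downstream approach with an explicit "-o" list, here sed appends two constant columns after "join" has selected the real fields. The sample files and the constants "A" and "4" are made up for illustration:

```shell
#!/bin/sh
# Hypothetical sample data, sorted on the first field (the join key).
LC_ALL=C; export LC_ALL
tmp=$(mktemp -d)
printf '%s\n' 'a,1' 'b,2' 'c,3'    > "$tmp/first.txt"
printf '%s\n' 'b,20' 'c,30' 'd,40' > "$tmp/second.txt"

# join picks the real fields (key, field 2 of each file);
# sed then appends the constant columns to the end of every line.
tagged=$(join -t, -o 0,1.2,2.2 "$tmp/first.txt" "$tmp/second.txt" |
         sed 's/$/,A,4/')

printf '%s\n' "$tagged"
rm -rf "$tmp"
```

This keeps "join" responsible only for pairing lines, and leaves any decoration of the output to a separate tool.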
Example with "-a 1 -a 2" and "-o auto":
---
$ join -t, -1 1 -2 1 -a 1 -a 2 -o auto today.txt yesterday.txt
2012054455,8624520081,6529,0,201,,,,
2012067075,2013106025,6214,0,201,2019269533,6664,0,201
2012087388,8623689800,6214,2,201,2012320000,6006,0,201
2012088887,8623689800,6214,0,201,8624520081,6529,0,201
2012120319,9739789996,392A,0,201,,,,
2012121177,9739789996,392A,0,201,,,,
2012122869,2013700000,6006,0,201,,,,
2012140209,2013700000,6006,0,201,9733360000,392A,0,201
2012143002,2012339982,6529,0,201,,,,
2012149116,2012339982,6529,0,201,,,,
2012204272,,,,,2019269533,6664,0,201
2012226151,,,,,2018209998,954F,0,201
2012299682,,,,,2018209998,954F,0,201
2012324322,,,,,9733360000,392A,0,201
2012334444,,,,,2017809469,6664,0,201
2012389608,,,,,2012320000,6006,0,201
---

In this example, all lines have nine fields, and are easy to parse:
1   - The common key
2-5 - The four fields from the first file (possibly empty)
6-9 - The four fields from the second file (possibly empty).

Running AWK on the output of "join" is now easy, because the fields are in a fixed order.
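Since later processing relies on that fixed layout, it can be worth checking it first. The following is only a sketch: it assumes GNU join with "-o auto" support, and the two five-field sample files are made up and merely reuse the names from this thread:

```shell
#!/bin/sh
# Hypothetical five-field sample data (key + four fields), sorted on the key.
LC_ALL=C; export LC_ALL
tmp=$(mktemp -d)
printf '%s\n' 'k1,a,b,c,d' 'k2,e,f,g,h' > "$tmp/today.txt"
printf '%s\n' 'k2,i,j,k,l' 'k3,m,n,o,p' > "$tmp/yesterday.txt"

# Full outer join; -o auto pads missing fields so every line has the same layout.
joined=$(join -t, -a 1 -a 2 -o auto "$tmp/today.txt" "$tmp/yesterday.txt")

# Count the lines that do NOT have exactly nine comma-separated fields.
bad=$(printf '%s\n' "$joined" | awk -F, 'NF != 9 { n++ } END { print n + 0 }')

printf '%s\n' "$joined"
echo "lines without exactly 9 fields: $bad"
rm -rf "$tmp"
```

A count of zero means every line follows the key, fields 2-5, fields 6-9 layout described above.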
For example, adding "AA" as the first field and "44" as the last field:
---
$ join -t, -1 1 -2 1 -a 1 -a 2 -o auto today.txt yesterday.txt |
    awk -F, -v OFS=, '{print "AA", $0, "44"}'
AA,2012054455,8624520081,6529,0,201,,,,,44
AA,2012067075,2013106025,6214,0,201,2019269533,6664,0,201,44
AA,2012087388,8623689800,6214,2,201,2012320000,6006,0,201,44
AA,2012088887,8623689800,6214,0,201,8624520081,6529,0,201,44
AA,2012120319,9739789996,392A,0,201,,,,,44
AA,2012121177,9739789996,392A,0,201,,,,,44
AA,2012122869,2013700000,6006,0,201,,,,,44
AA,2012140209,2013700000,6006,0,201,9733360000,392A,0,201,44
AA,2012143002,2012339982,6529,0,201,,,,,44
AA,2012149116,2012339982,6529,0,201,,,,,44
AA,2012204272,,,,,2019269533,6664,0,201,44
AA,2012226151,,,,,2018209998,954F,0,201,44
AA,2012299682,,,,,2018209998,954F,0,201,44
AA,2012324322,,,,,9733360000,392A,0,201,44
AA,2012334444,,,,,2017809469,6664,0,201,44
AA,2012389608,,,,,2012320000,6006,0,201,44
---

Or something a little more informative:
---
$ join -t, -1 1 -2 1 -a 1 -a 2 -o auto today.txt yesterday.txt |
    awk -F, -v OFS=, '$2=="" && $6!="" { print $1, "Yesterday" }
                      $2!="" && $6=="" { print $1, "Today" }
                      $2!="" && $6!="" { print $1, "Both" }'
2012054455,Today
2012067075,Both
2012087388,Both
2012088887,Both
2012120319,Today
2012121177,Today
2012122869,Today
2012140209,Both
2012143002,Today
2012149116,Today
2012204272,Yesterday
2012226151,Yesterday
2012299682,Yesterday
2012324322,Yesterday
2012334444,Yesterday
2012389608,Yesterday
---

Hope this helps,
 -gordon
GNU bug tracking system