GNU bug report logs - #6366
join can't join on numeric fields
Message #18 received at 6366 <at> debbugs.gnu.org:
Alex Shinn wrote:
> 2010/6/8 Pádraig Brady <P <at> draigbrady.com>:
>> On 07/06/10 06:19, Alex Shinn wrote:
>>>
>>> Ideally join should be able to handle files sorted in any order
>>> that sort provides, but as a bare minimum it should at least
>>> be able to join files sorted on numeric fields.
>>
>> Well if there were no aliases in the numbers, you could always
>> sort the output numerically after the join if it was important.
>
> By first sorting lexicographically, you mean?
> In the use case I had, the data was already sorted
> numerically. So whenever I want to join two files,
> currently I have to do:
>
> sort file1 > file1.tmp
> sort file2 > file2.tmp
> join file1.tmp file2.tmp | sort -n > out
> rm -f file1.tmp file2.tmp
>
> instead of just
>
> join -n file1 file2 > out
>
> In the small tools philosophy you want to avoid adding
> redundancy, but in this case join isn't doing the same
> thing as sort, it's just working with it better. Not to mention
> the fact that sort is an expensive operation to have to
> perform multiple times, not just an extra O(n) filter
> to throw in the middle of a pipeline.
>
>> However if you wanted to join "01" and "1" then your patch is required.
>> Are numeric aliases common enough to warrant this? I think so.
>
> Leading zeros may not be so common, but don't forget
> "1.0" and "1" or "1e2" and "100" and "100.0", etc.
>
>> I'd use -g, --general-numeric to correspond with `sort`.
>
> Yes, that's probably better.
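The proposal quoted above is a join that compares keys numerically, so that files already sorted with `sort -n` or `sort -g` can be joined directly and aliases such as "01", "1" and "1.0" match. The sketch below illustrates a merge join of that kind. It is not how GNU join is implemented. `numeric_join` is a hypothetical helper, keys are assumed to be the first whitespace-separated field of every (non-blank) line, and they are compared as Python floats.

```python
def numeric_join(left, right):
    """Hypothetical sketch: merge-join two lists of lines that are already
    sorted numerically on their first field, comparing keys as floats.
    Assumes every line is non-blank and has a parseable numeric key."""

    def key(line):
        return float(line.split(None, 1)[0])

    def rest(line):
        parts = line.split(None, 1)
        return parts[1] if len(parts) > 1 else ""

    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        ki, kj = key(left[i]), key(right[j])
        if ki < kj:
            i += 1
        elif ki > kj:
            j += 1
        else:
            # Find the run of equal keys on each side.
            i_end = i
            while i_end < len(left) and key(left[i_end]) == ki:
                i_end += 1
            j_end = j
            while j_end < len(right) and key(right[j_end]) == kj:
                j_end += 1
            # Emit every pairing in the two runs, printing the left-hand key.
            for a in left[i:i_end]:
                for b in right[j:j_end]:
                    fields = [a.split(None, 1)[0], rest(a), rest(b)]
                    out.append(" ".join(f for f in fields if f))
            i, j = i_end, j_end
    return out


left = ["01 apple", "2 banana", "10 cherry"]
right = ["1.0 red", "2 yellow", "1e1 dark"]
print(numeric_join(left, right))
# ['01 apple red', '2 banana yellow', '10 cherry dark']
```

A byte-wise comparison would need both inputs re-sorted first, because "10" sorts before "2" as a string, and it would still never pair "01" with "1.0".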
There may be a fly in the ointment.
When comparing floating-point numbers, how would join measure equality?
Should it consider 1.000000000000001e2 to be equal to 100.0 ?
What if the maximum precision available does not
allow us to distinguish those two values?
What about -0 and 0? (with IEEE 754, they'll compare equal)
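The questions above can be made concrete with a small experiment. The sketch assumes keys are parsed into IEEE 754 binary64 values (Python floats) and compared with `==`. `general_numeric_equal` is a hypothetical helper, and a real `join -g` might parse into a wider type such as `long double`, which would move the point where distinct inputs collapse to the same value.

```python
import math


def general_numeric_equal(a: str, b: str) -> bool:
    """Hypothetical equality test for a `join -g`-style key: parse both
    fields as binary64 doubles and compare them with ==."""
    return float(a) == float(b)


# Numeric aliases match.
print(general_numeric_equal("1e2", "100.0"))                  # True
print(general_numeric_equal("01", "1"))                       # True

# 1.000000000000001e2 is still a distinct double from 100.0 ...
print(general_numeric_equal("1.000000000000001e2", "100.0"))  # False
# ... but one more digit exceeds binary64 precision, and the inputs collapse.
print(general_numeric_equal("1.0000000000000001", "1"))       # True

# -0 and 0 compare equal, even though the sign bit still differs.
print(general_numeric_equal("-0", "0"))                       # True
print(math.copysign(1.0, float("-0")))                        # -1.0
```

Under this rule, whether two keys join depends on the precision of the parsed type rather than on the text of the fields.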
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.