GNU bug report logs - #6366
join can't join on numeric fields

Package: coreutils

Reported by: Alex Shinn <alexshinn <at> gmail.com>

Date: Mon, 7 Jun 2010 05:24:02 UTC

Severity: wishlist

Tags: patch

Merged with 10924, 12264


Message #15 received at 6366 <at> debbugs.gnu.org (full text, mbox):

From: Alex Shinn <alexshinn <at> gmail.com>
To: Pádraig Brady <P <at> draigbrady.com>
Cc: 6366 <at> debbugs.gnu.org
Subject: Re: bug#6366: join can't join on numeric fields
Date: Wed, 9 Jun 2010 10:47:05 +0900

2010/6/8 Pádraig Brady <P <at> draigbrady.com>:
> On 07/06/10 06:19, Alex Shinn wrote:
>>
>> Ideally join should be able to handle files sorted in any order
>> that sort provides, but as a bare minimum it should at least
>> be able to join files sorted on numeric fields.
>
> Well if there were no aliases in the numbers, you could always
> sort the output numerically after the join if it was important.

By first sorting lexicographically, you mean?
In the use case I had, the data was already sorted
numerically.  So whenever I want to join two files,
currently I have to do:

  sort file1 > file1.tmp
  sort file2 > file2.tmp
  join file1.tmp file2.tmp | sort -n > out
  rm -f file1.tmp file2.tmp

instead of just

  join -n file1 file2 > out

In the small tools philosophy you want to avoid adding
redundancy, but in this case join isn't doing the same
thing as sort; it's just working with it better.  Not to mention
that sort is an expensive operation to have to
perform multiple times, not just an extra O(n) filter
to throw in the middle of a pipeline.
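
For reference, the workaround described above can be written as a small
runnable sketch.  It uses only options that exist in current coreutils
(the "join -n" form above is the proposed option, not an existing one).
The helper name numeric_join and the sample data are hypothetical, and
the sketch assumes the join field is the first whitespace-separated
field of each file.

```shell
#!/bin/sh
# numeric_join FILE1 FILE2
# Hypothetical helper (not part of coreutils): joins two files whose
# first field is already in numeric order, using the sort/join/sort
# workaround from the message above.
numeric_join() {
    tmp=$(mktemp) || return 1
    # join needs both inputs in lexicographic order on the join field;
    # LC_ALL=C makes sort and join agree on collation.
    LC_ALL=C sort -k 1b,1 "$2" > "$tmp"
    # "-" tells join to read its first input from standard input,
    # which saves one temporary file.
    LC_ALL=C sort -k 1b,1 "$1" | LC_ALL=C join - "$tmp" | LC_ALL=C sort -n
    rm -f "$tmp"
}

# Demo with made-up data, numerically sorted on field 1.
dir=$(mktemp -d) || exit 1
printf '%s\n' '2 b' '10 j' '100 x' > "$dir/file1"
printf '%s\n' '2 B' '10 J' '100 X' > "$dir/file2"
numeric_join "$dir/file1" "$dir/file2"
rm -rf "$dir"
```

This still costs three sorts per join, which is the overhead being
objected to, and it only matches keys that are textually identical.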

> However if you wanted to join "01" and "1" then your patch is required.
> Are numeric aliases common enough to warrant this? I think so.

Leading zeros may not be so common, but don't forget
"1.0" and "1" or "1e2" and "100" and "100.0", etc.
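
As a rough illustration of the alias problem: a plain join compares
keys as text, so "1.0" never matches "1".  One possible workaround with
existing tools, which is not what the patch does, is to rewrite each
key to a canonical number with awk before joining.  The helper names
below are hypothetical, the join field is assumed to be the first
whitespace-separated field, and awk's number parsing stands in for
whatever conversion a numeric join option would use.

```shell
#!/bin/sh
# normalize_keys FILE
# Hypothetical helper: rewrites field 1 as a canonical number
# ("01" -> 1, "1.0" -> 1, "1e2" -> 100), then sorts lexicographically
# so the result is valid input for join.  Assigning to $1 makes awk
# rebuild the line with single spaces between fields.
normalize_keys() {
    awk '{ $1 = $1 + 0; print }' "$1" | LC_ALL=C sort -k 1b,1
}

# join_normalized FILE1 FILE2
# Hypothetical helper: joins on field 1 after normalizing the keys,
# then restores numeric order.
join_normalized() {
    tmp=$(mktemp) || return 1
    normalize_keys "$2" > "$tmp"
    normalize_keys "$1" | LC_ALL=C join - "$tmp" | LC_ALL=C sort -n
    rm -f "$tmp"
}

# Demo with made-up data: the keys differ as text but not as numbers.
dir=$(mktemp -d) || exit 1
printf '%s\n' '1 x' '100 y' > "$dir/file1"
printf '%s\n' '1.0 X' '1e2 Y' > "$dir/file2"
join_normalized "$dir/file1" "$dir/file2"
rm -rf "$dir"
```

Note that the original spelling of each key is lost in the output,
which a join option that compares numerically would not need to do.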

> I'd use -g, --general-numeric to correspond with `sort`.

Yes, that's probably better.

-- 
Alex




This bug report was last modified 6 years and 260 days ago.
