GNU bug report logs - #16468 join
Reported by: barry kesner <modockesner <at> gmail.com>
Date: Thu, 16 Jan 2014 17:07:01 UTC
Severity: normal
Tags: notabug
Done: Assaf Gordon <assafgordon <at> gmail.com>
Bug is archived. No further changes may be made.
On 01/16/2014 07:10 PM, Eric Blake wrote:
> On 01/16/2014 10:46 AM, barry kesner wrote:
>> How do you tell join this without resorting. The files are huge!
>
> Unfortunately, there isn't any really good way, short of re-processing
> the files to make the data appear sorted in the order join expects.
> That said, it certainly appears that for your given data, you can write
> a sed filter that can reprocess on a line-by-line basis, and feed that
> into join, without the penalty of having to re-sort the entire file and
> without having to have the processed file stored in your file system all
> at once. It also seems possible to write a post filter to get back to
> the style of the line in the original file. Here, extensions such as bash's
> join <(infilter file1) <(infilter file2) | outfilter
> make it easier to type (where the trick is to now write the correct sed
> scripts to serve as infilter and outfilter) than the alternative of
> having to use named fifos for limiting yourself to just POSIX semantics.
Hmm, isn't such number conversion filtering exactly what numfmt
was designed for? But wait ...
$ numfmt --field 1 --format='%020f' < f2
99980081 1
100002129 1
100002136 2
100002162 3
... it doesn't support leading zeros, unfortunately. ;-/
Wouldn't this be a nice enhancement?
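Until then, the infilter/outfilter idea from the quoted message can be approximated with standard tools. What follows is only a sketch, not something posted in this thread: the names pad_key and strip_key, the 20-character width, and the contents of f1 are assumptions made for illustration. It uses the named-fifo plumbing that plain POSIX sh requires; with bash, the fifo lines collapse into the `join <(...) <(...)` form.

```shell
# Sketch only: pad_key, strip_key, and the sample data in f1 are hypothetical.
dir=$(mktemp -d)
printf '%s\n' '99980081 a' '100002136 b' > "$dir/f1"
printf '%s\n' '99980081 1' '100002129 1' '100002136 2' '100002162 3' > "$dir/f2"

# infilter: left-pad field 1 with zeros to 20 characters, so that the
# lexical order join expects matches the numeric order of the input.
# Pure string operations, so long keys lose no numeric precision.
pad_key() {
  awk 'NF { while (length($1) < 20) $1 = "0" $1 } { print }' "$1"
}

# outfilter: strip the leading zeros again, keeping at least one digit.
strip_key() {
  sed 's/^0*\([0-9]\)/\1/'
}

# Named fifos let both filtered streams reach join without ever storing
# a re-processed copy of either file.
mkfifo "$dir/p1" "$dir/p2"
pad_key "$dir/f1" > "$dir/p1" &
pad_key "$dir/f2" > "$dir/p2" &
result=$(LC_ALL=C join "$dir/p1" "$dir/p2" | strip_key)
wait
rm -rf "$dir"

printf '%s\n' "$result"
# 99980081 a 1
# 100002136 b 2
```

Each file is read once, line by line, and nothing is re-sorted. The fixed width must be at least as large as the longest key, and awk rewrites the field separators as single spaces, which suits join's default settings.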
Have a nice day,
Berny
This bug report was last modified 6 years and 224 days ago.