GNU bug report logs - #16468
join


Package: coreutils

Reported by: barry kesner <modockesner <at> gmail.com>

Date: Thu, 16 Jan 2014 17:07:01 UTC

Severity: normal

Tags: notabug

Done: Assaf Gordon <assafgordon <at> gmail.com>

Bug is archived. No further changes may be made.



Message #20 received at 16468 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Bernhard Voelker <mail <at> bernhard-voelker.de>
Cc: barry kesner <modockesner <at> gmail.com>, Eric Blake <eblake <at> redhat.com>,
 16468 <at> debbugs.gnu.org
Subject: Re: bug#16468: join
Date: Fri, 17 Jan 2014 02:21:33 +0000

On 01/17/2014 12:00 AM, Bernhard Voelker wrote:
> On 01/16/2014 07:10 PM, Eric Blake wrote:
>> On 01/16/2014 10:46 AM, barry kesner wrote:
>>>   How do you tell join this without re-sorting?  The files are huge!
>>
>> Unfortunately, there isn't any really good way, short of re-processing
>> the files to make the data appear sorted in the order join expects.
>> That said, it certainly appears that for your given data, you can write
>> a sed filter that can reprocess on a line-by-line basis, and feed that
>> into join, without the penalty of having to re-sort the entire file and
>> without having to have the processed file stored in your file system all
>> at once.  It also seems possible to write a post filter to get back to
>> the style of the line in the original file.  Here, extensions such as bash's
>>   join <(infilter file1) <(infilter file2) | outfilter
>> make it easier to type (the trick is now to write the correct sed
>> scripts to serve as infilter and outfilter) than the alternative of
>> using named FIFOs when limiting yourself to just POSIX semantics.
> 
> Hum, isn't such number conversion filtering exactly what numfmt
> was designed for?  But wait ...
> 
>   $ numfmt --field 1 --format='%020f' < f2
>               99980081    1
>              100002129   1
>              100002136   2
>              100002162   3
> 
> ... it doesn't support leading zeros, unfortunately. ;-/
> Wouldn't this be a nice enhancement?

Yes, it really should support standard formatting directives:
leading zeros, precision in the format, etc.

thanks,
Pádraig.
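
The infilter/outfilter idea quoted above can be sketched in plain POSIX
shell with a named FIFO instead of bash's process substitution.  This is
only an illustration under stated assumptions, not a tested recipe from
the thread: the join field is assumed to be an unsigned integer of at
most 20 digits in column 1, the contents of f1 are hypothetical (f2 uses
the values shown above), and the function bodies are one possible way to
write the filters.  Padding the key with leading zeros makes byte-wise
order agree with numeric order, so join accepts the input without a
re-sort.

```shell
#!/bin/sh
# Sketch: join two numerically sorted files without re-sorting them.
# Assumption: field 1 is an unsigned integer of at most 20 digits.

# infilter: left-pad the leading integer with zeros to 20 digits so that
# byte-wise (LC_ALL=C) order matches numeric order.
infilter() {
  sed -e 's/^/00000000000000000000/' \
      -e 's/^0*\([0-9]\{20\}\)\([^0-9]\)/\1\2/' \
      -e 's/^0*\([0-9]\{20\}\)$/\1/'
}

# outfilter: strip the padding again, keeping at least one digit.
outfilter() {
  sed -e 's/^0*\([0-9]\)/\1/'
}

tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT

# Hypothetical sample data; both files are in numeric key order.
printf '%s\n' '99980081 a' '100002136 b' > "$tmp/f1"
printf '%s\n' '99980081    1' '100002129   1' \
              '100002136   2' '100002162   3' > "$tmp/f2"

# The FIFO streams the filtered second file to join, so no padded copy
# of it is ever stored on disk.
mkfifo "$tmp/fifo"
infilter < "$tmp/f2" > "$tmp/fifo" &

result=$(infilter < "$tmp/f1" | LC_ALL=C join - "$tmp/fifo" | outfilter)
wait

printf '%s\n' "$result"
```

With the sample data this prints the two matching keys, 99980081 and
100002136, each followed by the remaining fields from both files.
LC_ALL=C is set on join so that its order check compares bytes rather
than using the locale's collation rules.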




This bug report was last modified 6 years and 224 days ago.


