On 01/17/2014 12:00 AM, Bernhard Voelker wrote:
> On 01/16/2014 07:10 PM, Eric Blake wrote:
>> On 01/16/2014 10:46 AM, barry kesner wrote:
>>> How do you tell join this without re-sorting? The files are huge!
>>
>> Unfortunately, there isn't any really good way, short of re-processing
>> the files to make the data appear sorted in the order join expects.
>> That said, it certainly appears that for your given data, you can write
>> a sed filter that can reprocess on a line-by-line basis, and feed that
>> into join, without the penalty of having to re-sort the entire file and
>> without having to have the processed file stored in your file system all
>> at once. It also seems possible to write a post filter to get back to
>> the style of the line in the original file. Here, extensions such as bash's
>>
>> join <(infilter file1) <(infilter file2) | outfilter
>>
>> make it easier to type (where the trick is to now write the correct sed
>> scripts to serve as infilter and outfilter) than the alternative of
>> having to use named fifos if limiting yourself to just POSIX semantics.
>
> Hum, isn't such number conversion filtering exactly what numfmt
> was designed for? But wait ...
>
> $ numfmt --field 1 --format='%020f' < f2
> 99980081 1
> 100002129 1
> 100002136 2
> 100002162 3
>
> ... it doesn't support leading zeros, unfortunately. ;-/
> Wouldn't this be a nice enhancement?

I've needed this a few times so I added it in the attached.

thanks,
Pádraig.
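For readers without the patched numfmt, here is a minimal sketch of the infilter/outfilter pipeline Eric describes. It is not the thread's actual filter: the padding step uses awk rather than sed, and the names `infilter`, `outfilter` and `padded_join` plus the demo data are hypothetical. It assumes bash (for `<(...)`) and a first field that is a non-negative integer of at most 20 digits. Zero-padding every key to a fixed width makes lexical order agree with numeric order, so join accepts the stream without any re-sort, and the post filter strips the zeros again.

```shell
#!/bin/bash
# Hypothetical sketch: join two files whose first field is sorted
# numerically (not lexically) without re-sorting them.
# Assumes bash and non-negative integer keys of at most 20 digits.

# infilter: left-pad field 1 with zeros to width 20 so that lexical
# order matches numeric order. awk rebuilds each line with single
# spaces between fields.
infilter() {
  awk '{ k = sprintf("%20s", $1); gsub(/ /, "0", k); $1 = k; print }' "$1"
}

# outfilter: strip the padding again, keeping at least one digit
# (so an all-zero key becomes "0").
outfilter() {
  sed 's/^0*\([0-9]\)/\1/'
}

# padded_join FILE1 FILE2: LC_ALL=C makes join compare bytes, which
# matches the order produced by infilter.
padded_join() {
  LC_ALL=C join <(infilter "$1") <(infilter "$2") | outfilter
}

# Demo on made-up data; f2 reuses the keys from the numfmt example.
# Both files are numerically sorted, which plain join would reject.
dir=$(mktemp -d)
printf '%s\n' '99980081 a' '100002136 b' > "$dir/f1"
printf '%s\n' '99980081 1' '100002129 1' '100002136 2' '100002162 3' > "$dir/f2"
padded_join "$dir/f1" "$dir/f2"
rm -r "$dir"
```

The demo should print `99980081 a 1` and `100002136 b 2`. Because awk collapses runs of whitespace, the output uses single spaces even if the inputs did not; a sed-only infilter, as Eric suggests, could preserve the original spacing at the cost of a harder padding expression.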