[re-adding the list, with permission]

On 01/16/2014 10:46 AM, barry kesner wrote:
> Eric,
> Thanks for response.
> I now realize it wants sorted alpha input not numerical. 999 1000 1001 is
> how it is sorted.

I think there have been requests in the past to enhance 'join' so that
it has finer control over how its fields are selected. Maybe something
like sharing code so that 'join -1 k1,1n' would behave as if it were
using 'sort -k1,1n' sorting on file 1. But right now, that
functionality doesn't exist.

>
> How do you tell join this without resorting. The files are huge!

Unfortunately, there isn't any really good way, short of re-processing
the files to make the data appear sorted in the order join expects.

That said, it certainly appears that for your given data, you can write
a sed filter that reprocesses the input line by line and feeds the
result into join. That avoids the penalty of re-sorting the entire
file, and the processed file never has to be stored in your file system
all at once. It also seems possible to write a post filter to get back
to the style of the line in the original file. Here, extensions such as
bash's process substitution:

  join <(infilter file1) <(infilter file2) | outfilter

make it easier to type than the alternative of using named fifos if you
limit yourself to just POSIX semantics. The trick is then to write the
correct sed scripts to serve as infilter and outfilter.

>
> I can't find LC_COLLATE?

It's an environment variable, like LC_ALL, that affects your locale.
Running 'locale' will show you your current locale settings, including
LC_COLLATE. Setting LC_ALL in the environment is shorthand that forces
all other categories to behave the same, so it's easier to test whether
'LC_ALL=C command' has an effect than it is to figure out which locale
categories matter.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
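
A minimal sketch of the filter idea described above, not a tested recipe for the real files. It assumes the join key is field 1, is a non-negative integer of at most 10 digits, and is followed by a single space; the helper names infilter and outfilter follow the placeholders in the message, and the sample data and the width of 10 are made up for illustration. Zero-padding the key makes numerically sorted input also sorted in the byte order that 'LC_ALL=C join' expects, and the post filter strips the padding again. It needs bash for the <( ) syntax.

```shell
# Requires bash (process substitution). All names and data are illustrative.

# infilter: left-pad the numeric key in field 1 with zeros to 10 digits,
# so that numeric order and byte order agree.
infilter() {
  sed -e 's/^/0000000000/' -e 's/^0*\([0-9]\{10\}\) /\1 /' "$1"
}

# outfilter: strip the zero padding again, keeping at least one digit.
outfilter() {
  sed -e 's/^0*\([0-9]\)/\1/'
}

# Small sample inputs, already sorted numerically (999 1000 1001).
tmp=$(mktemp -d)
printf '%s\n' '999 apple' '1000 banana' '1001 cherry' > "$tmp/file1"
printf '%s\n' '999 red' '1001 dark' > "$tmp/file2"

# Join on the padded key in the C locale, then restore the original keys.
result=$(LC_ALL=C join <(infilter "$tmp/file1") <(infilter "$tmp/file2") | outfilter)
printf '%s\n' "$result"

rm -rf "$tmp"
```

Note that this outfilter also removes any leading zeros that were present in the original keys, so it only restores the original lines exactly when the keys had none.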