Hi,

> On 09.03.2017 at 18:20, Assaf Gordon wrote:
>
>> […]
>> Aha, I didn't check this. Then the "-j" option should be moved to a new section "Deprecated" in the man/info page of the coreutils version too. (And mention the special handling of -j1 and -j2, while -j3 … works as one expects.)
>
> I would humbly suggest other wording: I'm not sure '-j' is deprecated.
> It is useful, and does work as expected in most cases.

It's only mentioned in the addendum here:

http://pubs.opengroup.org/onlinepubs/9699919799//utilities/join.html

"Earlier versions of this standard allowed -j, -j1, -j2 options, and a form of the -o option that allowed the list option-argument to be multiple arguments. These forms are no longer specified by POSIX.1-2008 but may be present in some implementations. … The obsolescent -j options and the multi-argument -o option are removed in this version."

Therefore I still favor moving "-j" to a separate section at the end of the man page, also taking Q15 of http://www.opengroup.org/austin/papers/posix_faq.html into account.

>
> But, it should be better documented to warn against this edge-case.
>
> Reuti wrote:
>> -j FIELD equivalent to '-1 FIELD -2 FIELD'
>> does not work in all cases essentially.
>
> It 'just works' in most cases, but indeed we should improve the documentation about edge cases.
>
> First,
> this is the relevant section that handles the '-j' parameter:
> https://git.savannah.gnu.org/cgit/coreutils.git/tree/src/join.c#n1079

Yep, I checked this in the source too.
>
> Second,
> Let's ensure '-jN' works in the common cases,
> when it is *not* followed by a number:
>
> Two input files:
>
> $ cat a.txt
> 1 2 3 aaa
> 2 3 4 bbb
>
> $ cat b.txt
> 1 2 3 XXX
> 2 3 4 YYY
>
> '-j1' alone is equivalent to '-1 1 -2 1':
>
> $ join -1 1 -2 1 a.txt b.txt
> 1 2 3 aaa 2 3 XXX
> 2 3 4 bbb 3 4 YYY
>
> $ join -j1 a.txt b.txt
> 1 2 3 aaa 2 3 XXX
> 2 3 4 bbb 3 4 YYY
>
> '-j2' alone is equivalent to '-1 2 -2 2':
>
> $ join -1 2 -2 2 a.txt b.txt
> 2 1 3 aaa 1 3 XXX
> 3 2 4 bbb 2 4 YYY
>
> $ join -j2 a.txt b.txt
> 2 1 3 aaa 1 3 XXX
> 3 2 4 bbb 2 4 YYY
>
> '-j3' alone is equivalent to '-1 3 -2 3':
>
> $ join -1 3 -2 3 a.txt b.txt
> 3 1 2 aaa 1 2 XXX
> 4 2 3 bbb 2 3 YYY
>
> $ join -j3 a.txt b.txt
> 3 1 2 aaa 1 2 XXX
> 4 2 3 bbb 2 3 YYY
>
> So, in the most common cases, '-jN' works for all Ns
> (for "all" being 1,2,3 but really, who needs more than 3 numbers? :) ).
> This is perhaps not like BSD's join.
>
>
> Now comes the tricky part:
> If '-j1' or '-j2' is followed by another parameter,
> and that parameter turns out *not* to be a valid field number,
> it is treated like '-j 1' (or '-1 1 -2 1'), and join just "does the right thing":
>
> $ join -j2 -i a.txt b.txt
> 2 1 3 aaa 1 3 XXX
> 3 2 4 bbb 2 4 YYY
>
> This is implemented here:
> https://git.savannah.gnu.org/cgit/coreutils.git/tree/src/join.c#n1171

Aha, I didn't spot this. That's really tricky. I had only observed that the error message about the remaining arguments changed depending on whether I removed or added an additional field number. And if a file name is just a number, it gets even more convoluted, as the overall number of arguments then also comes into play:

$ join -j1 1 2

generates no error: although -j1 is followed by a 1, join infers that the 1 must be the name of a file, as otherwise an argument would be missing on the command line AFAICS.

> And the result is that most of the time, join "just works" (IMHO, but
> other opinions welcomed).
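The equivalences in the quoted session can be re-checked with a short script. This is a sketch that assumes GNU coreutils join, a POSIX sh and `mktemp -d`; the temporary directory and the messages it prints are illustrative only. On GNU join each comparison is expected to report a match.

```shell
#!/bin/sh
# Sketch: check that GNU join's attached '-jN' form gives the same result
# as the POSIX '-1 N -2 N' form when '-jN' is not followed by a number.
# Assumes GNU coreutils join; other join implementations may differ.
set -eu

# Work in a throwaway directory (removed again on exit).
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

# The two sample input files from the thread.
printf '1 2 3 aaa\n2 3 4 bbb\n' > a.txt
printf '1 2 3 XXX\n2 3 4 YYY\n' > b.txt

for n in 1 2 3; do
  posix=$(join -1 "$n" -2 "$n" a.txt b.txt)
  attached=$(join -j"$n" a.txt b.txt)    # digit glued to -j, e.g. -j1
  if [ "$posix" = "$attached" ]; then
    printf '%s\n' "-j$n matches -1 $n -2 $n"
  else
    printf '%s\n' "-j$n differs from -1 $n -2 $n"
  fi
done

# '-j2' followed by an option (not a number): both files still use field 2.
if [ "$(join -j2 -i a.txt b.txt)" = "$(join -1 2 -2 2 a.txt b.txt)" ]; then
  printf '%s\n' "-j2 -i matches -1 2 -2 2"
fi
```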
>
>
> If '-j1' or '-j2' is followed by a number, this is where the unexpected behaviour occurs, as it sets the key field for that file alone. E.g. '-j1 2' is equivalent to '-1 2' (and the key for the second
> file is not set, thus defaults to 1):
>
> $ join -j1 2 a.txt b.txt
> 2 1 3 aaa 3 4 YYY
>
> $ join -1 2 a.txt b.txt
> 2 1 3 aaa 3 4 YYY
>
>
> Is the above a satisfactory explanation?

Yes, absolutely.

> If so, it'll be more-or-less what I'll add to the manual.
>
> I see that this has been implemented back in 2005, here:
> https://git.savannah.gnu.org/cgit/coreutils.git/commit/src/join.c?id=f9118c1c2e35b
> with the comment:
> "Parse obsolete options -j1 and -j2
> so that it is a pure extension to POSIX 1003.1-2001."
>
> I can perhaps guesstimate that since this usage is never
> mentioned anywhere, it is considered undocumented and discouraged usage
> (and indeed, I don't think I've ever encountered it, or previously
> seen a bug report or question about it, so it's rather rare).
>
> We could add a warning to the man page - what do others think?

+1

-- 
Reuti
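The '-j1 2' pitfall described in the thread can be reproduced with the sketch below. It again assumes GNU coreutils join, a POSIX sh and `mktemp -d`; other implementations may treat '-j1 2' differently. It compares the attached form against both POSIX spellings a reader might have meant. Spelling the key fields out as '-1 N -2 N' avoids the ambiguity, since '-1' and '-2' always take their field number as an argument.

```shell
#!/bin/sh
# Sketch: reproduce the '-j1 N' pitfall discussed in the thread.
# Assumes GNU coreutils join; other implementations may behave differently.
set -eu

# Work in a throwaway directory (removed again on exit).
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

printf '1 2 3 aaa\n2 3 4 bbb\n' > a.txt
printf '1 2 3 XXX\n2 3 4 YYY\n' > b.txt

# '-j1' with its digit attached, then a number, then two files:
# the 2 ends up as the key field of file 1 only.
pitfall=$(join -j1 2 a.txt b.txt)

# The two POSIX spellings it could plausibly be read as.
file1_only=$(join -1 2 a.txt b.txt)       # file 2 keeps its default key, field 1
both_files=$(join -1 2 -2 2 a.txt b.txt)  # field 2 in both files

printf 'join -j1 2     -> %s\n' "$pitfall"
printf 'join -1 2      -> %s\n' "$file1_only"
printf 'join -1 2 -2 2 -> %s\n' "$both_files"

if [ "$pitfall" = "$file1_only" ]; then
  printf '%s\n' "'-j1 2' behaves like '-1 2', not like '-1 2 -2 2'"
fi
```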