GNU bug report logs - #26029
Problems with join

Package: coreutils;

Reported by: "Peter Kluge" <linux-projekt <at> techno.ms>

Date: Wed, 8 Mar 2017 17:43:01 UTC

Severity: normal

Tags: notabug

Done: Assaf Gordon <assafgordon <at> gmail.com>

Bug is archived. No further changes may be made.


Message #23 received at 26029 <at> debbugs.gnu.org (full text, mbox):

From: Reuti <reuti <at> staff.Uni-Marburg.DE>
To: Assaf Gordon <assafgordon <at> gmail.com>
Cc: 26029 <at> debbugs.gnu.org, Peter Kluge <linux-projekt <at> techno.ms>
Subject: Re: bug#26029: Problems with join
Date: Thu, 9 Mar 2017 19:24:40 +0100
Hi,

> On 09.03.2017 at 18:20, Assaf Gordon <assafgordon <at> gmail.com> wrote:
> 
>> […]
>> Aha, I didn't check this. Then the "-j" option should be moved to a new "Deprecated" section in the man/info page of the coreutils version too. (And it should mention the special handling of -j1 and -j2, while -j3 … works as one expects.)
> 
> I would humbly suggest other wording: I'm not sure '-j' is deprecated.
> It is useful, and does work as expected in most cases.

It's only mentioned in the addendum here:


http://pubs.opengroup.org/onlinepubs/9699919799//utilities/join.html

"Earlier versions of this standard allowed -j, -j1, -j2 options, and a form of the -o option that allowed the list option-argument to be multiple arguments. These forms are no longer specified by POSIX.1-2008 but may be present in some implementations.
…
The obsolescent -j options and the multi-argument -o option are removed in this version."


Therefore I still favor moving "-j" to a separate section at the end of the man page, also taking:

Q15: http://www.opengroup.org/austin/papers/posix_faq.html

into account.


> 
> But, it should be better documented to warn against this edge-case.
> 
> Reuti wrote:
>> -j FIELD equivalent to '-1 FIELD -2 FIELD'
>> does not work in all cases essentially.
> 
> It 'just works' in most cases, but indeed we should improve the documentation about edge cases.
> 
> First,
> this is the relevant section that handles the '-j' parameter:
> https://git.savannah.gnu.org/cgit/coreutils.git/tree/src/join.c#n1079

Yep, this I checked in the source too.


> 
> Second,
> Let's ensure '-jN' works in the common cases,
> when it is *not* followed by a number:
> 
> Two input files:
> 
>   $ cat a.txt
>   1 2 3 aaa
>   2 3 4 bbb
> 
>   $ cat b.txt
>   1 2 3 XXX
>   2 3 4 YYY
> 
> '-j1' alone is equivalent to '-1 1 -2 1':
> 
>   $ join -1 1 -2 1 a.txt b.txt
>   1 2 3 aaa 2 3 XXX
>   2 3 4 bbb 3 4 YYY
> 
>   $ join -j1 a.txt b.txt
>   1 2 3 aaa 2 3 XXX
>   2 3 4 bbb 3 4 YYY
> 
> '-j2' alone is equivalent to '-1 2 -2 2':
> 
>   $ join -1 2 -2 2 a.txt b.txt
>   2 1 3 aaa 1 3 XXX
>   3 2 4 bbb 2 4 YYY
> 
>   $ join -j2 a.txt b.txt
>   2 1 3 aaa 1 3 XXX
>   3 2 4 bbb 2 4 YYY
> 
> '-j3' alone is equivalent to '-1 3 -2 3':
> 
>   $ join -1 3 -2 3 a.txt b.txt
>   3 1 2 aaa 1 2 XXX
>   4 2 3 bbb 2 3 YYY
> 
>   $ join -j3 a.txt b.txt
>   3 1 2 aaa 1 2 XXX
>   4 2 3 bbb 2 3 YYY
> 
> So, in the most common cases, '-jN' works for all Ns
> (for "all" being 1,2,3 but really, who needs more than 3 numbers? :) ).
> This is perhaps not like BSD's join.
> 
> 
> Now comes the tricky part:
> If the '-j1' or '-j2' is followed by another parameter,
> and that parameter turns out *not* to be a valid field number,
> it is treated like '-j 1' or '-j 2' respectively (i.e. '-1 N -2 N'), and join just "does the right thing":
> 
>   $ join -j2 -i a.txt b.txt
>   2 1 3 aaa 1 3 XXX
>   3 2 4 bbb 2 4 YYY
> 
> This is implemented here:
> https://git.savannah.gnu.org/cgit/coreutils.git/tree/src/join.c#n1171

Aha, I didn't spot this. That's really tricky. I only observed that the error message complaining about the remaining arguments changes depending on whether an additional field number is added or removed. And if a filename is just a number, it gets even more convoluted, as the overall number of arguments also comes into play then.

$ join -j1 1 2

generates no error although -j1 is followed by a 1: join infers that the 1 must be a file name, as otherwise one argument would be missing on the command line, AFAICS.
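
For what it's worth, a minimal sketch of how to sidestep that guessing when the file names are numeric: spell out the key fields with -1 and -2, so nothing depends on how an argument after -j1 is interpreted. The scratch directory and the variable name are only for illustration, and I assume mktemp is available; the file contents are the a.txt/b.txt samples quoted further up, stored under the names 1 and 2.

```shell
# Use the C locale so the collation order matches the sample data.
LC_ALL=C; export LC_ALL

# Scratch directory holding two files whose names are plain numbers.
tmpdir=$(mktemp -d)
printf '1 2 3 aaa\n2 3 4 bbb\n' > "$tmpdir/1"
printf '1 2 3 XXX\n2 3 4 YYY\n' > "$tmpdir/2"

# Explicit key fields: '-1 1' and '-2 1' each consume exactly one number,
# so the trailing '1 2' can only be the two file operands.
numeric_names=$(cd "$tmpdir" && join -1 1 -2 1 1 2)
printf '%s\n' "$numeric_names"

rm -rf "$tmpdir"
```

This should print the same two lines as the 'join -1 1 -2 1 a.txt b.txt' run quoted above.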


> And the result is that most of the time, join "just works" (IMHO, but
> other opinions welcomed).
> 
> 
> If the '-j1' or '-j2' is followed by a number, this is where the unexpected behaviour occurs, as it sets the key field for that file alone. E.g. '-j1 2' is equivalent to '-1 2' (and the key for the second
> file is not set, thus defaults to 1):
> 
>   $ join -j1 2 a.txt b.txt
>   2 1 3 aaa 3 4 YYY
> 
>   $ join -1 2 a.txt b.txt
>   2 1 3 aaa 3 4 YYY
> 
> 
> Is the above a satisfactory explanation?

Yes, absolutely.
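
To make the difference easy to reproduce, here is a small sketch that contrasts the unambiguous '-1 2 -2 2' with '-1 2' alone, which is what '-j1 2' amounts to according to the explanation above. It assumes mktemp is available; the scratch directory and variable names are mine, and the data are the two sample files from the quoted text.

```shell
# Use the C locale so the collation order matches the sample data.
LC_ALL=C; export LC_ALL

# Recreate the two sample files in a scratch directory.
tmpdir=$(mktemp -d)
printf '1 2 3 aaa\n2 3 4 bbb\n' > "$tmpdir/a.txt"
printf '1 2 3 XXX\n2 3 4 YYY\n' > "$tmpdir/b.txt"

# Key on field 2 in both files (the unambiguous spelling).
both=$(join -1 2 -2 2 "$tmpdir/a.txt" "$tmpdir/b.txt")

# Key on field 2 in the first file only; the second file keeps
# its default key, field 1.
first_only=$(join -1 2 "$tmpdir/a.txt" "$tmpdir/b.txt")

printf 'both files keyed on field 2:\n%s\n' "$both"
printf 'only first file keyed on field 2:\n%s\n' "$first_only"

rm -rf "$tmpdir"
```

The first join pairs the lines the way one would expect from '-j 2'; the second yields the single mixed line shown for 'join -j1 2 a.txt b.txt'.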


> If so, it'll be more-or-less what I'll add to the manual.
> 
> I see that this has been implemented back in 2005, here:
> https://git.savannah.gnu.org/cgit/coreutils.git/commit/src/join.c?id=f9118c1c2e35b
> with the comment:
> "Parse obsolete options -j1 and -j2
>  so that it is a pure extension to POSIX 1003.1-2001."
> 
> I can perhaps guesstimate that since this usage is never
> mentioned anywhere, it is considered undocumented and discouraged usage
> (and indeed, I don't think I've ever encountered it, or previously
> saw a bug-report or question about it - so it's rather rare).
> 
> We could add a warning to the man page - what do others think?

+1

-- Reuti

This bug report was last modified 6 years and 203 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.