GNU bug report logs - #23665
spaces in keys: doc, --debug in LC_ALL=C

Package: coreutils

Reported by: Karl Berry <karl <at> freefriends.org>

Date: Tue, 31 May 2016 18:33:02 UTC

Severity: normal

Tags: fixed

Done: Assaf Gordon <assafgordon <at> gmail.com>

Bug is archived. No further changes may be made.


Message #11 received at 23665 <at> debbugs.gnu.org:

From: Pádraig Brady <P <at> draigBrady.com>
To: Assaf Gordon <assafgordon <at> gmail.com>, Karl Berry <karl <at> freefriends.org>,
 23665 <at> debbugs.gnu.org
Subject: Re: bug#23665: spaces in keys: doc, --debug in LC_ALL=C
Date: Tue, 31 May 2016 23:46:47 +0100

On 31/05/16 20:11, Assaf Gordon wrote:
> Hello Karl!
>
> On 05/31/2016 02:32 PM, Karl Berry wrote:
>> I run
>>     LC_ALL=en_US.UTF-8 sort --debug -k 2 /tmp/foo  # or -k 2,2 et al.
>> And get the nicely explanatory output for the "surprising" result:
> [...]
>
> Just to verify, the surprising result is in C locale?
>
> I'm seeing the following, for "en_US.UTF-8" it's the order I'd expect, but the "C" is surprising:
>
>       $ cat -A k.txt
>       M  Build/zfile$
>       M  Master/mfile$
>       MM Build/afile$
>
>       $ LC_ALL=en_US.UTF-8 sort -k2 k.txt
>       MM Build/afile
>       M  Build/zfile
>       M  Master/mfile
>
>       $ LC_ALL=C sort -k2 k.txt
>       M  Build/zfile
>       M  Master/mfile
>       MM Build/afile
>
>
>> But the information is just as valid in C as in UTF-8, so far as I can
>> see.  Thus it would be nice for it to be present.
>
> If I understand correctly, one could argue the warning is even more important in C locale than in UTF-8 locales,
> as collating rules for UTF-8 make leading spaces less significant.
>
> As in:
>
>       $ cat -A s.txt
>       M A$
>       M  B$
>       M   D$
>       M  C$
>
> UTF-8 makes leading spaces less important:
>
>       $ LC_ALL=en_US.UTF-8 sort -k2 s.txt
>       M A
>       M  B
>       M  C
>       M   D
>
> in C locale, spaces (as simple bytes) do matter:
>
>       $ LC_ALL=C sort -k2 s.txt
>       M   D
>       M  B
>       M  C
>       M A
>
> -b skips leading spaces:
>
>       $ LC_ALL=C sort -k2b s.txt
>       M A
>       M  B
>       M  C
>       M   D
>
>
>> More importantly, I urge that the documentation for sort give an example
>> of this.  The idea that following blanks after the first become part of
>> the next field is highly counter-intuitive.
>
> I agree,
> I can add the above example to the documentation (also possibly to the FAQ or Gotcha pages?).
> What do you think?
>
> The condition to print this message is here:
>    http://lingrok.org/xref/coreutils/src/sort.c#2435
> I can try to suggest a patch to print it in C locale as well (hopefully tonight).

The warning was suppressed in this case because one might be using
such a command to sort right-aligned indexes:
http://git.sv.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=v8.5-40-g63761c0
I was probably overthinking that a bit,
so I'd be happy to see maybe_space_aligned removed from the condition.
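
To illustrate the right-aligned case, here is a minimal sketch. The data and
the function names are made up for this example (they are not from the commit
above), and it is only one plausible reading of "right aligned indexes": in
the C locale a blank sorts before any digit, so a blank-padded column of equal
width happens to come out in numeric order when the leading blanks are kept.

```shell
# Hypothetical data: an index right-aligned in a fixed-width column
aligned_sample() {
    printf 'x  100\nx    9\nx   10\n'
}

# Leading blanks stay in key 2, so the padding keeps 9 < 10 < 100
sort_aligned() { aligned_sample | LC_ALL=C sort -k2; }

# With 'b' the padding is skipped and the keys compare as plain strings
sort_aligned_b() { aligned_sample | LC_ALL=C sort -k2b; }
```

Here sort_aligned should list 9, 10, 100, while sort_aligned_b gives 10, 100, 9,
which is why the blanks could be intentional in such a command.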

cheers,
Pádraig.
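
For reference, the s.txt comparison quoted above can be reproduced with the
following sketch. The function names are invented for this example; the data
and the sort invocations are the ones from Assaf's message.

```shell
# Recreate the s.txt sample quoted in this thread
sample() {
    printf 'M A\nM  B\nM   D\nM  C\n'
}

# Key 2 starts right after field 1, so its leading blanks are compared
sort_with_blanks() { sample | LC_ALL=C sort -k2; }

# The 'b' modifier skips leading blanks before the key is compared
sort_without_blanks() { sample | LC_ALL=C sort -k2b; }
```

Running sort_with_blanks puts "M   D" first, since the line with the most
blanks sorts lowest bytewise; sort_without_blanks yields A, B, C, D.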





This bug report was last modified 6 years and 245 days ago.

