GNU bug report logs - #8032
comm: add collation order to error messages

Package: coreutils;

Reported by: Paul E Condon <pecondon <at> mesanetworks.net>

Date: Sun, 13 Feb 2011 20:23:02 UTC

Severity: wishlist

To reply to this bug, email your comments to 8032 AT debbugs.gnu.org.

Report forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#8032; Package coreutils. (Sun, 13 Feb 2011 20:23:02 GMT)

Acknowledgement sent to Paul E Condon <pecondon <at> mesanetworks.net>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Sun, 13 Feb 2011 20:23:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Paul E Condon <pecondon <at> mesanetworks.net>
To: bug-coreutils <at> gnu.org
Subject: Suggestion in re. error reports from 'comm'
Date: Sun, 13 Feb 2011 13:27:08 -0700
'comm' uses LC_COLLATE or LC_ALL to establish the collation that it
uses in its check for proper sorting of input. (I think this is true.)

The man page and info make no mention of LC_ALL (at least not as
delivered in Debian squeeze) but LC_ALL seems to affect 'comm'
behavior.

When neither LC_COLLATE nor LC_ALL is defined, 'comm' reports that the
file is out of order. I think this is misleading. I think it should
instead report that no LC_* is defined. 

Alternatively, it might be OK to silently assume the definition,
LC_COLLATE=C

I discovered this situation while writing (and debugging) a shell
script that I wanted to work when invoked from /etc/cron.daily.  
The script leaves a sorted file on disk for use in the next day's run.
Naturally, during testing I seeded those files from manual runs of
the script. Always, when I set up what I thought was a working version,
the script, as run from cron, failed with a message that the file(s)
was/were out of order. Yes, I should have figured it out, and I 
finally have. But it would have been so much faster if ...

-- 
Paul E Condon           
pecondon <at> mesanetworks.net




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#8032; Package coreutils. (Mon, 14 Feb 2011 17:25:01 GMT)

Message #8 received at 8032 <at> debbugs.gnu.org:

From: Eric Blake <eblake <at> redhat.com>
To: Paul E Condon <pecondon <at> mesanetworks.net>
Cc: 8032 <at> debbugs.gnu.org
Subject: Re: bug#8032: Suggestion in re. error reports from 'comm'
Date: Mon, 14 Feb 2011 10:33:09 -0700
On 02/13/2011 01:27 PM, Paul E Condon wrote:
> 'comm' uses LC_COLLATE or LC_ALL to establish the collation that it
> uses in its check for proper sorting of input. (I think this is true.)

Thanks for the report.  Yes, per POSIX, the following list of rules
should apply to ALL utilities that perform sorting (first one that
applies wins):

If LC_ALL is non-empty, honor that
If LC_COLLATE is non-empty, honor that
If LANG is non-empty, honor that
Use an implementation-defined default.

Some implementations set the implementation-defined default to the C
locale, but that is not universally true.
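
The four-step precedence above can be sketched in Python. This is an illustration only, not coreutils code: the function name `collation_source` and the "(implementation default)" label are assumptions made for the example.

```python
import os

def collation_source(environ=None):
    """Return (variable, value) for the setting that decides collation.

    Follows the precedence listed above: LC_ALL, then LC_COLLATE, then
    LANG; an empty value counts as unset. The function name and the
    fallback label are hypothetical, chosen for this sketch.
    """
    if environ is None:
        environ = os.environ
    for name in ("LC_ALL", "LC_COLLATE", "LANG"):
        value = environ.get(name, "")
        if value:  # first non-empty variable wins
            return name, value
    # None of the three is set: the implementation picks the default.
    return None, "(implementation default)"
```

Calling `collation_source()` with no argument inspects the current process environment, which is a quick way to see what a cron job would actually inherit.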

> 
> The man page and info make no mention of LC_ALL (at least not as
> delivered in Debian squeeze) but LC_ALL seems to affect 'comm'
> behavior.

You are correct that none of the coreutils man pages mention the effect
of LC_* and LANG environment variables; our excuse is that they are so
universally applicable that it is assumed that you are aware of their
effect on all utilities.

But patches are welcome to correct the man pages, if you think it would
help.

Also, a patch to the info pages to add a detailed section within chapter 2,
Common Options, discussing the effect of all LC_*/LANG environment
variables on all coreutils, would be appreciated.


> When neither LC_COLLATE nor LC_ALL is defined, 'comm' reports that the
> file is out of order. I think this is misleading. I think it should
> instead report that no LC_* is defined. 

How is coreutils supposed to know the difference between an environment
variable not being defined being an error, vs. an environment variable
not being defined meaning that you explicitly wanted the
implementation-defined default?

> 
> Alternatively, it might be OK to silently assume the definition,
> LC_COLLATE=C

No, it is NOT silently okay to do that.  Coreutils uses
setlocale(LC_ALL,""), which is the POSIX-blessed means for determining
the four-step collation choice documented above.  Either you provide one
of the three variables that affects sorting, or you get the
implementation-defined default.
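
For readers who want to experiment, the same mechanism is reachable from Python's standard `locale` module. The sketch below is an assumption-laden illustration rather than what comm does internally: `is_sorted` is a hypothetical helper, and it only checks that each line collates at or after the one before it.

```python
import locale

def is_sorted(lines, locale_name=""):
    """Check that each line collates at or after the previous one.

    locale_name="" asks setlocale() to consult the environment, much like
    setlocale(LC_ALL, "") in C; pass "C" to force code-point order.
    """
    saved = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
        return all(locale.strcoll(a, b) <= 0
                   for a, b in zip(lines, lines[1:]))
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)  # restore for the caller
```

Under the "C" locale, `["B", "a"]` counts as sorted because uppercase ASCII letters precede lowercase ones. Many natural-language locales order such lines differently, so a file sorted in an interactive session may look unsorted to a job that runs with a different environment.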

> 
> I discovered this situation while writing (and debugging) a shell
> script that I wanted to work when invoked from /etc/cron.daily.  
> The script leaves a sorted file on disk for use in the next day's run.
> Naturally, during testing I seeded those files from manual runs of
> the script. Always, when I set up what I thought was a working version,
> the script, as run from cron, failed with a message that the file(s)
> was/were out of order. Yes, I should have figured it out, and I 
> finally have. But it would have been so much faster if ...

Any well-written script that depends on not being interrupted by
localization settings will explicitly set LC_ALL (or specific
categories) as needed, rather than relying on defaults.  That's a fact
of life for modern script-writing, and a lesson that unfortunately is
learned more often by experience than by documentation.  Changing
coreutils to fail when the variable is not set, rather than going with
the implementation-defined default, would unfortunately violate POSIX.
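
One way to follow that advice from a driver script is to pin LC_ALL for every child process. The following is a minimal sketch in Python; `pinned_env` and `run_comm` are hypothetical helper names, and the choice of "C" is an assumption that suits byte-order sorting.

```python
import os
import subprocess

def pinned_env(base=None, locale_name="C"):
    """Copy an environment and pin LC_ALL so collation never depends
    on whatever the caller (login shell, cron, ...) happened to set."""
    env = dict(os.environ if base is None else base)
    env["LC_ALL"] = locale_name
    return env

def run_comm(file1, file2):
    """Run comm on two files under the pinned locale.

    Both inputs must have been sorted under the same locale, for
    example by running sort with the same pinned environment.
    """
    result = subprocess.run(
        ["comm", file1, file2],
        env=pinned_env(), capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Because LC_ALL overrides LC_COLLATE and LANG, pinning it once is enough; the file written on one day and read on the next is then always judged by the same collation.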

-- 
Eric Blake   eblake <at> redhat.com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org

Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#8032; Package coreutils. (Tue, 15 Feb 2011 15:34:01 GMT)

Message #11 received at 8032 <at> debbugs.gnu.org:

From: Eric Blake <eblake <at> redhat.com>
To: Paul E Condon <pecondon <at> mesanetworks.net>, 8032 <at> debbugs.gnu.org
Subject: Re: bug#8032: Suggestion in re. error reports from 'comm'
Date: Tue, 15 Feb 2011 08:42:50 -0700
[let's keep the list in the loop]

On 02/14/2011 07:51 PM, Paul E Condon wrote:
>> Thanks for the report.  Yes, per POSIX, the following list of rules
>> should apply to ALL utilities that perform sorting (first one that
>> applies wins):
>>
>> If LC_ALL is non-empty, honor that
>> If LC_COLLATE is non-empty, honor that
>> If LANG is non-empty, honor that
>> Use an implementation-defined default.
> 
> My facilities for testing these issues are limited. Thanks for
> this information. 
> 
> My script is not yet well written. I had always supposed that the
> actual existence of scripts that are not well written is the
> justification for spending time writing useful error messages and
> even the justification for the existence of syserr
> 
> I'm not surprised that my second suggestion is not OK. My first
> suggestion was to modify the error message that is emitted by 'comm'.
> 
> Now that I better understand that there are four possible classes of
> collating sequences involved, I can refine my suggestion: Change the
> text of the error message about a file not being in sort order to
> append "according to collation rule: <and place here the rule>" For
> the fourth case (implementation dependent), it would be nice to invent
> a wording that is shorter than 'implementation dependent'.
> 
> This could be useful to people debugging scripts on many platforms.
> I can imagine a programmer getting major help from a message
> 
> "file 2 is not in sort order according to LC_COLLATE=C"

Ah, so your suggestion is to make the error message smarter, by adding a
clause which says which environment variable or system default
determined the current collation order, as well as which locale is
currently in use.  That actually sounds like a nice idea!  Would you
care to help write the code?
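
As a rough illustration of the idea (not a patch, and not the text comm prints today), the clause could be assembled as below. The wording follows the example in the quoted message; the function name `order_error` and the fallback phrase "the system default locale" are assumptions.

```python
def order_error(file_number, environ):
    """Build the proposed diagnostic, naming the variable that selected
    the collation order (first non-empty of LC_ALL, LC_COLLATE, LANG)."""
    prefix = f"file {file_number} is not in sort order according to "
    for name in ("LC_ALL", "LC_COLLATE", "LANG"):
        value = environ.get(name, "")
        if value:
            return f"{prefix}{name}={value}"
    # Hypothetical short wording for the implementation-defined case.
    return prefix + "the system default locale"
```

With `{"LC_COLLATE": "C"}` as the environment and file number 2, this yields exactly the message proposed in the quoted text.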

-- 
Eric Blake   eblake <at> redhat.com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org

Severity set to 'wishlist' from 'normal' Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org. (Tue, 30 Oct 2018 08:26:01 GMT)

Changed bug title to 'comm: add collation order to error messages' from 'Suggestion in re. error reports from 'comm'' Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org. (Tue, 30 Oct 2018 08:26:01 GMT)

This bug report was last modified 6 years and 234 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.