On 02/13/2011 01:27 PM, Paul E Condon wrote:
> 'comm' uses LC_COLLATE or LC_ALL to establish the collation that it
> uses in its check for proper sorting of input. (I think this is true.)

Thanks for the report.  Yes, per POSIX, the following list of rules
should apply to ALL utilities that perform sorting (first one that
applies wins):

1. If LC_ALL is non-empty, honor that
2. If LC_COLLATE is non-empty, honor that
3. If LANG is non-empty, honor that
4. Use an implementation-defined default.

Some implementations set the implementation-defined default to the C
locale, but that is not universally true.

>
> The man page and info make no mention of LC_ALL (at least not as
> delivered in Debian squeeze) but LC_ALL seems to affect 'comm'
> behavior.

You are correct that none of the coreutils man pages mentions the effect
of the LC_* and LANG environment variables; our excuse is that they
apply so universally that we assume you are aware of their effect on all
utilities.  But patches are welcome to correct the man pages, if you
think it would help.  Also, a patch to the info pages that adds a
detailed section within chapter 2, Common Options, discussing the effect
of all LC_*/LANG environment variables on all coreutils, would be
appreciated.

> When neither LC_COLLATE nor LC_ALL is defined, 'comm' reports that the
> file is out of order. I think this is misleading. I think it should
> instead report that no LC_* is defined.

How is coreutils supposed to tell an undefined environment variable that
is an error apart from an undefined environment variable that means you
explicitly wanted the implementation-defined default?

>
> Alternatively, it might be OK to silently assume the definition,
> LC_COLLATE=C

No, it is NOT okay to do that silently.  Coreutils uses
setlocale(LC_ALL,""), which is the POSIX-blessed means for determining
the four-step collation choice documented above.
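
To make the four-step lookup concrete, here is a rough shell sketch of
the same precedence.  It only illustrates the order of the checks; it
is not how coreutils does it (that is the setlocale call), and
collation_source is a made-up helper name, not an existing utility.

```shell
#!/bin/sh
# Sketch only: mimics the POSIX lookup order for the collation locale.
# collation_source is a hypothetical helper, not part of coreutils.
collation_source() {
    if [ -n "${LC_ALL:-}" ]; then
        # Step 1: a non-empty LC_ALL overrides everything else.
        printf 'LC_ALL=%s\n' "$LC_ALL"
    elif [ -n "${LC_COLLATE:-}" ]; then
        # Step 2: the category-specific variable.
        printf 'LC_COLLATE=%s\n' "$LC_COLLATE"
    elif [ -n "${LANG:-}" ]; then
        # Step 3: the general fallback.
        printf 'LANG=%s\n' "$LANG"
    else
        # Step 4: nothing usable is set.
        printf '%s\n' 'implementation-defined default'
    fi
}

# Print which variable, as set in this shell, wins the lookup.
collation_source
```

Note that a variable that is set but empty is skipped exactly like one
that is unset, which is why the checks use -n rather than testing
whether the variable exists.
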
Either you provide one of the three variables that affect sorting, or
you get the implementation-defined default.

>
> I discovered this situation while writing (and debugging) a shell
> script that I wanted to work when invoked from /etc/cron.daily.
> The script leaves a sorted file on disk for use in the next day's run.
> Naturally, during testing I seeded those files from manual runs of
> the script. Always, when I set up what I thought was a working version,
> the script, as run from cron, failed with a message that the file(s)
> was/were out of order. Yes, I should have figured it out, and I
> finally have. But it would have been so much faster if ...

Any well-written script whose results must not be affected by
localization settings will explicitly set LC_ALL (or specific
categories) as needed, rather than relying on defaults.  That's a fact
of life for modern script-writing, and a lesson that unfortunately is
learned more often by experience than by documentation.  Changing
coreutils to fail when the variable is not set, rather than going with
the implementation-defined default, would unfortunately violate POSIX.

-- 
Eric Blake   eblake@redhat.com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org