GNU bug report logs - #15450
SORT failing on some lines

Package: coreutils

Reported by: sam <at> netinetics.com

Date: Mon, 23 Sep 2013 21:57:05 UTC

Severity: normal

Tags: notabug

Done: Eric Blake <eblake <at> redhat.com>

Bug is archived. No further changes may be made.


Message #15 received at 15450-done <at> debbugs.gnu.org (full text, mbox):

From: Eric Blake <eblake <at> redhat.com>
To: sam <at> netinetics.com, 15450-done <at> debbugs.gnu.org
Subject: Re: bug#15450: SORT failing on some lines
Date: Wed, 25 Sep 2013 13:41:52 -0600
[Message part 1 (text/plain, inline)]
tag 15450 -moreinfo
tag 15450 +notabug
thanks

On 09/25/2013 12:28 PM, sam <at> netinetics.com wrote:
> 
> Hello Eric,
> Thank you kindly for your speedy reply.
> I should apologize for the lack of information included with my email.
> It was a hurried one.

Re-adding the list for closure, with permission.

> 
> In fact your suggestions and link and a bit of tinkering have cured the
> problem. SORT works fine it seems. I should have had more faith.
> The problem was purely with Locale, which I read up on in the FAQ link
> you sent. I had looked at Locale previously but didn't seem to have any
> success with it. I had also been trying various options for SORT,
> including -i, -d and even field separation (-t'#' -k1,1). I didn't
> have any luck, but I realized after reading through your reply that I
> had never got the combination of these things right.
> 
> I'd just like to add here, for anybody else who stumbles across this
> same problem, a more detailed description of the problem I was having
> (now solved).
> 
> The text file was a 605MB list of article titles extracted from
> Wikipedia, each followed by a #--# separator and the 'long long' integer
> offsets of where the article appeared in the (XML) dump file.
> Example lines:
> 
> Alps Electric#--#7701298893,12,24,364,394,420
> Alps Electric Co.#--#4280442890,12,28,339,3144,3170
> Alps Electric Corporation#--#9562165739,12,36,447,477,503
> 
> My machine was set to the en-GB locale, although I had switched this to
> en-US with the same (wrong) results.
> 
> It was necessary both to set LC_ALL=C and to instruct SORT to look only
> at the first field (up to the first #) using the -t'#' and -k1,1
> switches, as you mentioned.
> Obvious really, but needing the two together is what caused my confusion.
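
For anyone reproducing this, here is a minimal shell sketch of the combination described above, run against the three example lines from the report. The temporary file and the variable names are only illustrative, and it assumes a POSIX shell with mktemp available.

```shell
#!/bin/sh
# Hypothetical sample file built from the three example lines in the report.
titles=$(mktemp)
cat > "$titles" <<'EOF'
Alps Electric Corporation#--#9562165739,12,36,447,477,503
Alps Electric#--#7701298893,12,24,364,394,420
Alps Electric Co.#--#4280442890,12,28,339,3144,3170
EOF

# Byte-order comparison, with the key limited to the text before the first '#'.
by_title=$(LC_ALL=C sort -t'#' -k1,1 "$titles")

# Byte-order comparison of whole lines, for contrast: '#' (0x23) sorts
# after a space (0x20), so the shortest title ends up last.
whole_line=$(LC_ALL=C sort "$titles")

printf '%s\n\n%s\n' "$by_title" "$whole_line"
rm -f "$titles"
```

With the key restricted to the first field, the plain "Alps Electric" entry comes first. Without -k1,1 the separator takes part in the comparison and that entry lands after the two longer titles, even in the C locale.
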
> 
> It is really worth reading up on Locale for anybody using SORT and other
> utilities, as it can profoundly change the results of an operation.
> Even setting the locale to en-US doesn't help, as I read in the FAQ you
> linked, because en-US quite drastically reduces sort possibilities
> (case, punctuation etc. are ignored).
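
A quick way to see what the C locale does, as a rough sketch: print the settings sort will inherit, then sort a few mixed-case letters by byte value. The variable name is illustrative, and the locale command is guarded because it may not be installed everywhere.

```shell
#!/bin/sh
# Show the locale settings that sort inherits from the environment,
# if the locale utility is available on this system.
if command -v locale >/dev/null 2>&1; then
    locale
fi

# In the C locale, comparison is by byte value, so every uppercase
# ASCII letter sorts before every lowercase one.
c_order=$(printf '%s\n' b A a B | LC_ALL=C sort | paste -s -d ' ' -)
echo "$c_order"   # A B a b
```

Running the same pipeline without LC_ALL=C shows how the current locale orders the letters; that result depends on which locales are installed.
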
> 
> I'm sorry for the bother - but you put me on the right track.
> Many thanks for that.

Glad to hear it.  As such, I've closed the bug in the tracker.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

This bug report was last modified 11 years and 239 days ago.

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.