GNU bug report logs - #7878
"sort" bug--inconsistent single-column sorting influenced by other columns?


Package: coreutils;

Reported by: Randall Lewis <ralewis <at> yahoo-inc.com>

Date: Fri, 21 Jan 2011 02:36:02 UTC

Severity: normal

Done: Bob Proulx <bob <at> proulx.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 7878 in the body.
You can then email your comments to 7878 AT debbugs.gnu.org in the normal way.



Report forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#7878; Package coreutils. (Fri, 21 Jan 2011 02:36:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Randall Lewis <ralewis <at> yahoo-inc.com>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Fri, 21 Jan 2011 02:36:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Randall Lewis <ralewis <at> yahoo-inc.com>
To: "bug-coreutils <at> gnu.org" <bug-coreutils <at> gnu.org>
Subject: "sort" bug--inconsistent single-column sorting influenced by other
	columns?
Date: Thu, 20 Jan 2011 18:40:01 -0800
[Message part 1 (text/plain, inline)]
"sort" does inconsistent sorting.

I'm pretty sure it has NOTHING to do with the following warning, although I could be totally wrong.

" *** WARNING ***
The locale specified by the environment affects sort order.
Set LC_ALL=C to get the traditional sort order that uses
native byte values. "


See the attached shell script and text files.

bash-3.2$


cat test1.txt
323|1
36|2
406|3
40|4
587|5
cat test7.txt
323|B1
36|C2
406|B3
40|B4
587|C5
Note that the first column is the same for both files.

sort test1.txt
323|1
36|2
40|4
406|3
587|5
sort test7.txt
323|B1
36|C2
406|B3
40|B4
587|C5
The rows are in a different order depending on the dataset--and it is NOT a numeric sort. I'm not even sure it is ANY type of sort.

sort -k1 test1.txt
323|1
36|2
40|4
406|3
587|5
sort -k1 test7.txt
323|B1
36|C2
406|B3
40|B4
587|C5
Trying to fix the problem by focusing on the first column doesn't work.

sort -t "|" test1.txt
323|1
36|2
40|4
406|3
587|5
sort -t "|" test7.txt
323|B1
36|C2
406|B3
40|B4
587|C5
sort -t '|' test1.txt
323|1
36|2
40|4
406|3
587|5
sort -t '|' test7.txt
323|B1
36|C2
406|B3
40|B4
587|C5
sort -k1 -t "|" test1.txt
323|1
36|2
40|4
406|3
587|5
sort -k1 -t "|" test7.txt
323|B1
36|C2
406|B3
40|B4
587|C5
sort -k1 -t '|' test1.txt
323|1
36|2
40|4
406|3
587|5
sort -k1 -t '|' test7.txt
323|B1
36|C2
406|B3
40|B4
587|C5
Trying to fix the problem by including delimiter information doesn't work.
sort -k1d test1.txt
323|1
36|2
40|4
406|3
587|5
sort -k1d test7.txt
323|B1
36|C2
406|B3
40|B4
587|C5
sort -s test1.txt
323|1
36|2
40|4
406|3
587|5
sort -s test7.txt
323|B1
36|C2
406|B3
40|B4
587|C5
sort -s -k1 test1.txt
323|1
36|2
40|4
406|3
587|5
sort -s -k1 test7.txt
323|B1
36|C2
406|B3
40|B4
587|C5
Neither does dictionary order or stable matching.
sort -g test1.txt
36|2
40|4
323|1
406|3
587|5
sort -g test7.txt
36|C2
40|B4
323|B1
406|B3
587|C5
sort -n test1.txt
36|2
40|4
323|1
406|3
587|5
sort -n test7.txt
36|C2
40|B4
323|B1
406|B3
587|C5
Using numeric or general sorting appears to fix the problem on this numeric example. But why did it sort inconsistently in the first place based on the other contents of the file rather than just focusing on the first column--even when I told it to?
sort test1.txt | join -a1 -a2 -t "\|" - test7.txt
323|1|B1
36|2|C2
40|4
406|3|B3
40|B4
587|5|C5
Inconsistent sorting when combined with 'join' provides incorrect matches and duplication of records. This is a mess.
sort test1.txt | sort -c
sort test7.txt | sort -c
Yet, sort -c says that it is sorted correctly.
sort test1.txt
323|1
36|2
40|4
406|3
587|5
sort test7.txt
323|B1
36|C2
406|B3
40|B4
587|C5
sort test1.txt | join -a1 -a2 -j1 -t "\|" -e "0" -o "1.1,1.2,2.2" - test7.txt
See COMMENTED Cygwin output.

# $ sort test1.txt
# 323|1
# 36|2
# 406|3
# 40|4
# 587|5

# $ sort test7.txt
# 323|B1
# 36|C2
# 406|B3
# 40|B4
# 587|C5

# $ sort test1.txt | join -a1 -a2 -j1 -t "|" -e "0" -o "1.1,1.2,2.2" - test7.txt
# |B1|1
# |C22
# |B3|3
# |B44
# |C5|5


And finally, Cygwin does this sort consistently across all three examples (but it does mess up the 'join'). ????? Sucks to be me with a defective Cygwin and an unreliable sort and work to get done. Any advice?


randall lewis
research scientist

ralewis <at> yahoo-inc.com
mobile 617-671-8294

4401 great america parkway, santa clara, ca, 95054, us




[Message part 2 (text/html, inline)]
[SortBug.sh (application/octet-stream, attachment)]
[test7.txt (text/plain, attachment)]
[test1.txt (text/plain, attachment)]

Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#7878; Package coreutils. (Fri, 21 Jan 2011 05:55:02 GMT) Full text and rfc822 format available.

Message #8 received at 7878 <at> debbugs.gnu.org (full text, mbox):

From: Bob Proulx <bob <at> proulx.com>
To: Randall Lewis <ralewis <at> yahoo-inc.com>
Cc: 7878 <at> debbugs.gnu.org
Subject: Re: bug#7878: "sort" bug--inconsistent single-column sorting
	influenced by other columns?
Date: Thu, 20 Jan 2011 23:02:11 -0700
Randall Lewis wrote:
> "sort" does inconsistent sorting.

You are sure about that?  :-)

> I'm pretty sure it has NOTHING to do with the following warning,
> although I could be totally wrong.
> 
> " *** WARNING ***
> The locale specified by the environment affects sort order.
> Set LC_ALL=C to get the traditional sort order that uses
> native byte values. "

You read this, know that sort will base the sorting upon the locale
setting, but didn't tell us what locale you were using to sort?  Shame
on you.  Because you *know* I am going to ask you about it! :-)

What locale are you using?  C?  en_US.UTF-8?  Some other?  The locale
command will print this information.  Here is an example from my system.

  $ locale
  LANG=en_US.UTF-8
  LC_CTYPE="en_US.UTF-8"
  LC_NUMERIC="en_US.UTF-8"
  LC_TIME="en_US.UTF-8"
  LC_COLLATE=C
  LC_MONETARY="en_US.UTF-8"
  LC_MESSAGES="en_US.UTF-8"
  LC_PAPER="en_US.UTF-8"
  LC_NAME="en_US.UTF-8"
  LC_ADDRESS="en_US.UTF-8"
  LC_TELEPHONE="en_US.UTF-8"
  LC_MEASUREMENT="en_US.UTF-8"
  LC_IDENTIFICATION="en_US.UTF-8"
  LC_ALL=

> sort test1.txt
> 323|1
> 36|2
> 40|4
> 406|3
> 587|5

> sort test7.txt
> 323|B1
> 36|C2
> 406|B3
> 40|B4
> 587|C5

Looks okay to me for the en_US.UTF-8 locale.  But it will of course be
different in the C locale.

  $ LC_ALL=en_US.UTF-8 sort test1.txt 
  323|1
  36|2
  40|4
  406|3
  587|5

  $ LC_ALL=C sort test1.txt 
  323|1
  36|2
  406|3
  40|4
  587|5

What ordering did you expect there?  I assume you are expecting to see
these sorted as in the C locale?

> The rows are in a different order depending on the dataset--and it
> is NOT a numeric sort. I'm not even sure it is ANY type of sort.

It is a character sort.  A string sort.  It is comparing the line of
characters from start to finish.  But it uses the system's collation
tables based upon the locale.  In the en_US.UTF-8 locale punctuation
is ignored and case is folded.  I don't like it but the powers that be
have decreed it.
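
To see why the C locale puts 406|3 ahead of 40|4, it can help to look
at the byte values involved.  This is only a sketch, assuming a POSIX
shell; the scratch directory and the variable names are made up for
illustration, and the five lines are the test1.txt data from the report.

```shell
# Recreate test1.txt from the report in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' '323|1' '36|2' '406|3' '40|4' '587|5' > "$tmp/test1.txt"

# In the C locale, sort compares raw bytes.  A leading quote makes
# printf print the numeric value of the character that follows.
six=$(printf '%d' "'6")
pipe=$(printf '%d' "'|")
echo "6 is byte $six, | is byte $pipe"

# "406|3" and "40|4" first differ at the third byte: '6' versus '|'.
# '6' has the lower byte value, so 406|3 sorts first.
c_order=$(LC_ALL=C sort "$tmp/test1.txt")
printf '%s\n' "$c_order"

rm -rf "$tmp"
```

In a locale whose collation gives little or no weight to the '|', that
third-byte comparison no longer decides the order, which is why the
result changes.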

Please see the FAQ:

  http://www.gnu.org/software/coreutils/faq/#Sort-does-not-sort-in-normal-order_0021

The standards documentation:

  http://www.opengroup.org/onlinepubs/009695399/utilities/sort.html

Variables that control localization:

  http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap08.html#tag_08_02

> sort -k1 -t "|" test1.txt

Hint: If you ever think you need to use -k POS1 then you almost always
should be using -k POS1,POS2 to specify where you want the sort to
stop comparing.  Otherwise it compares all of the way to the end of
the line.
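
The difference between -k1 and -k1,1 can be seen on the same five
lines.  This is a sketch rather than a recipe: LC_ALL=C is forced only
so the output does not depend on which locales are installed, and the
scratch file stands in for test1.txt.

```shell
tmp=$(mktemp -d)
printf '%s\n' '323|1' '36|2' '406|3' '40|4' '587|5' > "$tmp/test1.txt"

# -k1 alone: the key runs from field 1 to the end of the line,
# so the '|' and the second field take part in the comparison.
whole_line=$(LC_ALL=C sort -t'|' -k1 "$tmp/test1.txt")

# -k1,1: the key stops at the end of field 1, so only
# "323", "36", "406", "40" and "587" are compared.
first_field=$(LC_ALL=C sort -t'|' -k1,1 "$tmp/test1.txt")

printf 'sort -k1:\n%s\n\nsort -k1,1:\n%s\n' "$whole_line" "$first_field"
rm -rf "$tmp"
```

With -k1 the line 406|3 stays ahead of 40|4, exactly as in a whole-line
sort.  With -k1,1 the shorter key 40 sorts before 406.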

> But why did it sort inconsistently in the first place based on the
> other contents of the file rather than just focusing on the first
> column--even when I told it to?

You never told it not to continue comparing all of the way to the end
of the line.  For example this way:

  $ sort -t'|' -k1,1n -k2,2n test1.txt 
  36|2
  40|4
  323|1
  406|3
  587|5

That won't help you with join since that expects a non-numeric sort
ordering.

> Inconsistent sorting when combined with 'join' provides incorrect
> matches and duplication of records. This is a mess.

Yes.  Recent versions of join detect and warn about this.  Recent
versions of sort have a --debug option that can help to identify
problem cases.
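
Another way to catch a mismatch before it reaches join is to check the
file against the same key that join will use.  A sketch, with an
illustrative file name; LC_ALL=C is used only to make the result
reproducible.

```shell
tmp=$(mktemp -d)
# Whole-line C-locale order of the test1.txt data.
printf '%s\n' '323|1' '36|2' '406|3' '40|4' '587|5' > "$tmp/whole.txt"

# sort -c exits 0 if the input is already in order for the given key
# and non-zero otherwise (it also reports the first out-of-order line).
if LC_ALL=C sort -c -t'|' -k1,1 "$tmp/whole.txt" 2>/dev/null; then
    check=sorted
else
    check=unsorted
fi
echo "first-field order: $check"
rm -rf "$tmp"
```

The file is in order as whole lines but not when only the first field
is compared, so the check reports it as unsorted.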

Bob




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#7878; Package coreutils. (Fri, 21 Jan 2011 07:23:01 GMT) Full text and rfc822 format available.

Message #11 received at 7878 <at> debbugs.gnu.org (full text, mbox):

From: Randall Lewis <ralewis <at> yahoo-inc.com>
To: Bob Proulx <bob <at> proulx.com>
Cc: "7878 <at> debbugs.gnu.org" <7878 <at> debbugs.gnu.org>
Subject: RE: bug#7878: "sort" bug--inconsistent single-column sorting
	influenced by other columns?
Date: Thu, 20 Jan 2011 23:29:42 -0800
Hi Bob--

Wow! So, a couple comments about how I seem to have figured out every wrong way to use "sort" when also using "join."

Who would've thought that 

sort -k1 test1.txt

would default to sort on the entire line? (I normally would've thought that [,POS2] means "optional if you want to have it keep going beyond the first field.")

Also, who would've thought that the default "sort" would be incompatible with "join" and that you would need to write the command like this every time you wanted to use "join"?

LC_ALL=C sort test1.txt

Or that you would need a special type of "pre-sort" on the column (which I was executing wrong)?

sort -k1,1 -t "|" test1.txt

Regardless, here is "locale" (for the record, I'm pretty new to the utilities--and love them. I'm not a computer scientist, but rather an economist trying to fit in at Yahoo! with the engineers and computer scientists). I'm sure there's a good reason why there are two, and it's pretty clear that I'm novice enough that I'll have to learn that later.

bash-3.2$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

Thanks, Bob, for sharing two separate ways that I could get the answer the way I need it--two ways I could not have come up with on my own.

Thanks!

--Randall

P.S. So, the reason why sorting on the column didn't work for me was because it was plucking out the delimiter and then doing a string sort? Then it was string sorting, putting numbers before letters (as you might expect it to)? 

bash-3.2$ sort test1.txt
323|1
36|2
406|3
40|7 <-- Changing 4 to 7 changed the sort order.
587|5

bash-3.2$ sort test1.txt
323|1
36|2
40|4
406|3
587|5









Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#7878; Package coreutils. (Fri, 21 Jan 2011 09:18:01 GMT) Full text and rfc822 format available.

Message #14 received at 7878 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Randall Lewis <ralewis <at> yahoo-inc.com>
Cc: "7878-done <at> debbugs.gnu.org" <7878 <at> debbugs.gnu.org>
Subject: Re: bug#7878: "sort" bug--inconsistent single-column sorting
	influenced by other columns?
Date: Fri, 21 Jan 2011 01:25:41 -0800
On 01/20/2011 11:29 PM, Randall Lewis wrote:
> Also, who would've thought that the default "sort" would be incompatible with "join" and that you would need to write the command like this every time you wanted to use "join"?
> 
> LC_ALL=C sort test1.txt

No, "sort" and "join" use the same collating sequence by default.

It sounds like you have a different problem: you
weren't sorting by the same field that you were
joining on.  For example, if you want to use plain
"join" then you need to sort via "sort -k 1b,1".
Or, if you want to use "join -t '|'" then you
also need to use "sort -k 1,1 -t '|'".

This is documented in the coreutils manual.

It may be that "LC_ALL=C sort" worked around your
problem on your particular test case, but it won't
work in general.
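
Put together, the matching sort and join might look like the sketch
below.  The scratch files recreate test1.txt and test7.txt from the
report; the names a.sorted and b.sorted are made up.  LC_ALL=C is set
only so the output is reproducible, the point being that sort and join
see the same key, the same delimiter and the same locale.

```shell
tmp=$(mktemp -d)
printf '%s\n' '323|1' '36|2' '406|3' '40|4' '587|5'      > "$tmp/test1.txt"
printf '%s\n' '323|B1' '36|C2' '406|B3' '40|B4' '587|C5' > "$tmp/test7.txt"

# Sort both inputs on field 1 only, with the delimiter join will use.
LC_ALL=C sort -t'|' -k1,1 "$tmp/test1.txt" > "$tmp/a.sorted"
LC_ALL=C sort -t'|' -k1,1 "$tmp/test7.txt" > "$tmp/b.sorted"

# Join on field 1 (the default), keeping unpaired lines from both sides.
joined=$(LC_ALL=C join -t'|' -a1 -a2 "$tmp/a.sorted" "$tmp/b.sorted")
printf '%s\n' "$joined"
rm -rf "$tmp"
```

Every key pairs up once, so the 40 rows are no longer split into two
unmatched lines.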




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#7878; Package coreutils. (Fri, 21 Jan 2011 09:27:02 GMT) Full text and rfc822 format available.

Message #17 received at 7878 <at> debbugs.gnu.org (full text, mbox):

From: Randall Lewis <ralewis <at> yahoo-inc.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: "7878-done <at> debbugs.gnu.org" <7878 <at> debbugs.gnu.org>
Subject: RE: bug#7878: "sort" bug--inconsistent single-column sorting
	influenced by other columns?
Date: Fri, 21 Jan 2011 01:34:22 -0800
Thanks Paul.

Yes, it would seem that the true solution to my problem is doing the following (as you suggested):

use "sort -k 1,1 -t '|'"

This ensures that I sort on the first field--whereas "sort -k1 -t '|'" does not, as much as I wanted it to. ;) Since I was joining on only the first field I should've only been sorting on the first field. So, perhaps the only logical conflict with my usage here is that "join" works on the first field by default (as far as I can tell from join --help) while "sort" does not. But I guess this makes sense since "sort" is used for much more (bizarre use cases) than just as a pre-step to "join."

I'll read up on the coreutils manual next time.

Thanks for being patient with me and for the great feedback. :)

--Randall






Reply sent to Bob Proulx <bob <at> proulx.com>:
You have taken responsibility. (Fri, 21 Jan 2011 09:38:01 GMT) Full text and rfc822 format available.

Notification sent to Randall Lewis <ralewis <at> yahoo-inc.com>:
bug acknowledged by developer. (Fri, 21 Jan 2011 09:38:02 GMT) Full text and rfc822 format available.

Message #22 received at 7878-done <at> debbugs.gnu.org (full text, mbox):

From: Bob Proulx <bob <at> proulx.com>
To: Randall Lewis <ralewis <at> yahoo-inc.com>
Cc: 7878-done <at> debbugs.gnu.org
Subject: Re: bug#7878: "sort" bug--inconsistent single-column sorting
	influenced by other columns?
Date: Fri, 21 Jan 2011 02:45:02 -0700
Hi Randall,

Randall Lewis wrote:
> Wow! So, a couple comments about how I seem to have figured out
> every wrong way to use "sort" when also using "join."

You did have an impressive number of cases examined!

> Who would've thought that 
>
> sort -k1 test1.txt
> 
> would default to sort on the entire line? (I normally would've
> thought that [,POS2] means "optional if you want to have it keep
> going beyond the first field.")

You are not the only one to have had that misconception.  But that is
the way that it has always worked.  Here is the GNU sort documentation.

  `-k POS1[,POS2]'
  `--key=POS1[,POS2]'
       Specify a sort field that consists of the part of the line between
       POS1 and POS2 (or the end of the line, if POS2 is omitted),
       _inclusive_.

This behavior goes back at least to Unix v7 days and very likely
well before that time.  If you were a programmer in the mid-1970s
writing a sorting program, making a simple decision about how to
control sorting using command line arguments, would you have had any
idea that we would still be using virtually the same program and
interface in 2011, nearly forty years later?  And you would have been
working on the problem for what amounts to the first time, on a new
operating system.
Having done interface design and having been less successful I can't
complain.  :-)  Some of the decisions were less than great.  Other
decisions were excellent and visionary.  On average they were better
than most of us can do on our best days.

> Also, who would've thought that the default "sort" would be
> incompatible with "join" and that you would need to write the
> command like this every time you wanted to use "join"?

When sort and join were written they were compatible.  Back then the
collation sequence was strictly byte ordering.  That is the standard C
locale ordering.

It wasn't until recently when locales were introduced with en_US and
similar that problems were introduced.  For reasons unfathomable to me
the powers that be made sort ordering dictionary ordering where case
is folded and punctuation is ignored.  They failed to see how this
would negatively impact almost everything.  Creeping features.
Because punctuation is ignored in the en_US locale it causes a lot of
problems.  You didn't have to say LC_ALL=C for the first thirty years.
Don't get me started.  I have been a rather outspoken critic of this
design decision.

Personally I have the following set in my shell environment.

  export LANG=en_US.UTF-8
  export LC_COLLATE=C

I want the traditional collation sequence and so set LC_COLLATE.  But
I also want the fancy new characters with umlauts and that requires
(along with a unicode charset) a UTF-8 capable locale.  The above is a
compromise but for me a good one.
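
A quick way to confirm that LC_COLLATE alone changes the sort order
while LANG stays put.  This is a sketch: it assumes LC_ALL is unset
(LC_ALL overrides LC_COLLATE, which in turn overrides LANG), and the
four sample letters are made up for illustration.

```shell
# Run in a subshell so the caller's environment is left alone.
collated=$(
    unset LC_ALL                # LC_ALL would override LC_COLLATE
    LANG=en_US.UTF-8            # character handling, messages, and so on
    LC_COLLATE=C                # collation only: plain byte order
    export LANG LC_COLLATE
    printf '%s\n' b A a B | sort
)
printf '%s\n' "$collated"
```

With byte-order collation the uppercase letters come out first (A, B,
a, b) instead of being interleaved with the lowercase ones.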

> LC_ALL=C sort test1.txt
> 
> Or that you would need a special type of "pre-sort" on the column
> (which I was executing wrong)?
> 
> sort -k1,1 -t "|" test1.txt

Since you had two fields you probably want to sort on the second field too.

  sort -k1,1 -k2,2 -t "|" test1.txt

That will sort on the first field and then the second field.
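
When first fields tie, the second key decides.  A sketch with made-up
rows (not from test1.txt).  As an assumption for the example, the
second key carries the n modifier to show that each key can have its
own ordering; LC_ALL=C only keeps the output reproducible.

```shell
rows=$(printf '%s\n' '40|10' '36|7' '40|9')

# Field 1 compared as text; ties broken by field 2 compared as a number.
by_two_keys=$(printf '%s\n' "$rows" | LC_ALL=C sort -t'|' -k1,1 -k2,2n)

# With only -k1,1, ties fall back to a whole-line byte comparison,
# so "40|10" lands before "40|9".
by_one_key=$(printf '%s\n' "$rows" | LC_ALL=C sort -t'|' -k1,1)

printf 'two keys:\n%s\n\none key:\n%s\n' "$by_two_keys" "$by_one_key"
```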

> Regardless, here is "locale" (for the record, I'm pretty new to the
> utilities--and love them. I'm not a computer scientist, but rather
> an economist trying to fit in at Yahoo! with the engineers and
> computer scientists). I'm sure there's a good reason why there are
> two, and it's pretty clear that I'm novice enough that I'll have to
> learn that later.

I didn't follow where the "two" was attached.  Two as in economists
and computer scientists?  Or two as in engineers and computer
scientists?  Full disclosure: I am an electrical engineer. :-)

> Thanks, Bob, for sharing two separate ways that I could get the
> answer the way I need it--two ways I could not have come up with on
> my own.

Just to nudge in a particular direction, there are two other mailing
lists that are good to know about.  The coreutils <at> gnu.org mailing list
is for general discussion of the coreutils.  Here on bug-coreutils is
where bug reports are collected; every message thread opens a bug
ticket in the bug tracking system.  That is great for bug reports but
not so good for general discussion, since it keeps opening bugs
that need to be triaged.  That is why we have the coreutils mailing
list, which is just a normal list for normal discussion.  Additionally
there is a general help list, help-gnu-utils <at> gnu.org, that is also
a good resource.

> P.S. So, the reason why sorting on the column didn't work for me was
> because it was plucking out the delimiter and then doing a string
> sort? 

Correct.

> Then it was string sorting, putting numbers before letters (as
> you might expect it to)?

It would look like this to sort:

  $ sed 's/[[:punct:]]//' test1.txt 
  3231
  362
  4063
  404
  5875

  $ sed 's/[[:punct:]]//' test1.txt | LC_ALL=C sort
  3231
  362
  404
  4063
  5875

> 323|1
> 36|2
> 406|3
> 40|7 <-- Changing 4 to 7 changed the sort order.
> 587|5

  $ sed 's/[[:punct:]]//' test1.txt | LC_ALL=C sort
  3231
  362
  4063
  407
  5875

And case is folded too.  But that didn't come into play here.  And
this affects everything that sorts everywhere on the system.
Including the shell.

  echo *
  for f in *; do ...
  ls
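
The effect on filename expansion can be checked directly.  A sketch,
assuming three made-up file names in a scratch directory; only the
C-locale order is shown, because the order in other locales depends on
the shell and on which locales are installed.

```shell
tmp=$(mktemp -d)
touch "$tmp/a" "$tmp/B" "$tmp/c"

# The shell sorts the result of * itself, using the current collation.
# In the C locale that is byte order, so uppercase comes first.
glob_c=$(cd "$tmp" && LC_ALL=C && export LC_ALL && echo *)
echo "$glob_c"
rm -rf "$tmp"
```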

Bob




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 18 Feb 2011 12:24:04 GMT) Full text and rfc822 format available.

This bug report was last modified 14 years and 125 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.