GNU bug report logs - #16944
Sort program (sort.c) I can't sort by ascii collating sequence over a first column of text.


Package: coreutils;

Reported by: Leslie Satenstein <lsatenstein <at> yahoo.com>

Date: Wed, 5 Mar 2014 22:51:01 UTC

Severity: normal

Tags: notabug

Done: Eric Blake <eblake <at> redhat.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 16944 in the body.
You can then email your comments to 16944 AT debbugs.gnu.org in the normal way.




Report forwarded to bug-coreutils <at> gnu.org:
bug#16944; Package coreutils. (Wed, 05 Mar 2014 22:51:01 GMT)

Acknowledgement sent to Leslie Satenstein <lsatenstein <at> yahoo.com>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Wed, 05 Mar 2014 22:51:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Leslie Satenstein <lsatenstein <at> yahoo.com>
To: bug-coreutils <at> gnu.org
Subject: Sort program (sort.c)  I can't sort by ascii collating sequence
 over a first column of text.
Date: Wed, 05 Mar 2014 17:48:51 -0500
I have a problem with the sort utility that I cannot seem to do with
sort.

I have a file x (below) and I wish to sort only the first column
according to the ascii table, in other words, a sort where the sort
follows the
A..Za..z  and of course the other characters as well. 

I created this file x to illustrate the problem.

The first line of file x is a space character, the backspace character,
and the text Host=fedora20-leslie.

RAW Unsorted input (27 lines) filename x

Host=fedora20-leslie             |	       |                       scan
from|/home/leslie/Development/scandir
scandir.ini                      |20140223 1245|
e2c713788f9492be9e61d1d0badcc8ca|/home/leslie/Development/scandir
sha.c                            |20140223 1245|
f20dc5f72f0235d84a07e8a6b80ab036|/home/leslie/Development/scandir
dirdepth                         |20140223 1245|
9f2ff1bd8b133ca0de8d124ad7d761d2|/home/leslie/Development/scandir
scandirmd5.c                     |20140223 1245|
c38735f1cdf0bbcf7e352876d7f28793|/home/leslie/Development/scandir
md5Good.tar                      |20140223 1245|
8190181f115e74742e1291b915950531|/home/leslie/Development/scandir
inih_r27.tar                     |20140223 1245|
a8da6db331c8fe638cbb8c6940ce303e|/home/leslie/Development/scandir
test.sh                          |20140223 1245|
503c5fe5bd4ee7f2ac53d7df0a371bb6|/home/leslie/Development/scandir
scandir32.c                      |20140223 1245|
86c005228b275b55249cde39c2e95d32|/home/leslie/Development/scandir
scandir32                        |20140223 1245|
5d26167e56b5e6efe203bdbfb4483c6f|/home/leslie/Development/scandir
md5.c                            |20140223 1245|
2095124ffca65c307a840082185f5be9|/home/leslie/Development/scandir
crc32.o                          |20140223 1245|
10a49aede5f82d00205c1f89a8931731|/home/leslie/Development/scandir
sha                              |20140223 1245|
07f74c7c98e3498ca11dba9a5c56edc9|/home/leslie/Development/scandir
md5.o                            |20140223 1245|
4bb7270967299fa7fbb5ae4826f9c4c0|/home/leslie/Development/scandir
mddriver.c                       |20140223 1245|
581b61b0fc14df5e4a78b0db6d0d7ca4|/home/leslie/Development/scandir
sha1.c                           |20140223 1245|
74832014b5b65a34d5eaf273c7393116|/home/leslie/Development/scandir
scandirmd5                       |20140223 1245|
864a8f6dfbeb16bef1f09d71759aeca4|/home/leslie/Development/scandir
scandir                          |20140223 1245|
864a8f6dfbeb16bef1f09d71759aeca4|/home/leslie/Development/scandir
gcc.txt                          |20140223 1245|
b8917c1a087abbf74f0294dad9cbf698|/home/leslie/Development/scandir
scandirsha1.c                    |20140223 1245|
6f8e62c3c10c09922f41c643ff0592f8|/home/leslie/Development/scandir
sha1.h                           |20140223 1245|
d2559d2af8a19ea6bc64b35f69c4eea6|/home/leslie/Development/scandir
dirdepth.c                       |20140223 1245|
a7c3f1c02245aec9a1b651e11018ff82|/home/leslie/Development/scandir
x                                |20140305 1506|
d41d8cd98f00b204e9800998ecf8427e|/home/leslie/Development/scandir
crc32.c                          |20140223 1245|
4d7a5dbb246898ff9d3ba19c0ded7f5b|/home/leslie/Development/scandir
DATE2                            |20140223 1245|
e606fe0237c786174d2087090f81644a|/home/leslie/Development/scandir
DATE1                            |20140223 1245|
e606fe0237c786174d2087090f81644a|/home/leslie/Development/scandir
md5                              |20140223 1245|
a0509bd4723729ad76ce341844b0db92|/home/leslie/Development/scandir



sort x places the first line, which collates lower than all the rest
of column 1, into row 8 of the output.
It also dropped the line with the character x that appeared in column
1 of the raw input.
Here is the output (more comments follow the list):

crc32.c                          |20140223 1245|
4d7a5dbb246898ff9d3ba19c0ded7f5b|/home/leslie/Development/scandir
crc32.o                          |20140223 1245|
10a49aede5f82d00205c1f89a8931731|/home/leslie/Development/scandir
DATE1                            |20140223 1245|
e606fe0237c786174d2087090f81644a|/home/leslie/Development/scandir
DATE2                            |20140223 1245|
e606fe0237c786174d2087090f81644a|/home/leslie/Development/scandir
dirdepth                         |20140223 1245|
9f2ff1bd8b133ca0de8d124ad7d761d2|/home/leslie/Development/scandir
dirdepth.c                       |20140223 1245|
a7c3f1c02245aec9a1b651e11018ff82|/home/leslie/Development/scandir
gcc.txt                          |20140223 1245|
b8917c1a087abbf74f0294dad9cbf698|/home/leslie/Development/scandir
Host=fedora20-leslie             |	       |                       scan
from|/home/leslie/Development/scandir
inih_r27.tar                     |20140223 1245|
a8da6db331c8fe638cbb8c6940ce303e|/home/leslie/Development/scandir
md5                              |20140223 1245|
a0509bd4723729ad76ce341844b0db92|/home/leslie/Development/scandir
md5.c                            |20140223 1245|
2095124ffca65c307a840082185f5be9|/home/leslie/Development/scandir
md5Good.tar                      |20140223 1245|
8190181f115e74742e1291b915950531|/home/leslie/Development/scandir
md5.o                            |20140223 1245|
4bb7270967299fa7fbb5ae4826f9c4c0|/home/leslie/Development/scandir
mddriver.c                       |20140223 1245|
581b61b0fc14df5e4a78b0db6d0d7ca4|/home/leslie/Development/scandir
scandir                          |20140223 1245|
864a8f6dfbeb16bef1f09d71759aeca4|/home/leslie/Development/scandir
scandir32                        |20140223 1245|
5d26167e56b5e6efe203bdbfb4483c6f|/home/leslie/Development/scandir
scandir32.c                      |20140223 1245|
86c005228b275b55249cde39c2e95d32|/home/leslie/Development/scandir
scandir.ini                      |20140223 1245|
e2c713788f9492be9e61d1d0badcc8ca|/home/leslie/Development/scandir
scandirmd5                       |20140223 1245|
864a8f6dfbeb16bef1f09d71759aeca4|/home/leslie/Development/scandir

I get partial results by using the -f option, as in
sort -f x (or sort -fb):
Host=fedora20-leslie             |	       |                       scan
from|/home/leslie/Development/scandir
crc32.c                          |20140223 1245|
4d7a5dbb246898ff9d3ba19c0ded7f5b|/home/leslie/Development/scandir
crc32.o                          |20140223 1245|
10a49aede5f82d00205c1f89a8931731|/home/leslie/Development/scandir
DATE1                            |20140223 1245|
e606fe0237c786174d2087090f81644a|/home/leslie/Development/scandir
DATE2                            |20140223 1245|
e606fe0237c786174d2087090f81644a|/home/leslie/Development/scandir
dirdepth                         |20140223 1245|
9f2ff1bd8b133ca0de8d124ad7d761d2|/home/leslie/Development/scandir
dirdepth.c                       |20140223 1245|
a7c3f1c02245aec9a1b651e11018ff82|/home/leslie/Development/scandir
gcc.txt                          |20140223 1245|
b8917c1a087abbf74f0294dad9cbf698|/home/leslie/Development/scandir
inih_r27.tar                     |20140223 1245|
a8da6db331c8fe638cbb8c6940ce303e|/home/leslie/Development/scandir
md5                              |20140223 1245|
a0509bd4723729ad76ce341844b0db92|/home/leslie/Development/scandir
md5.c                            |20140223 1245|
2095124ffca65c307a840082185f5be9|/home/leslie/Development/scandir
md5.o                            |20140223 1245|
4bb7270967299fa7fbb5ae4826f9c4c0|/home/leslie/Development/scandir
md5Good.tar                      |20140223 1245|
8190181f115e74742e1291b915950531|/home/leslie/Development/scandir
mddriver.c                       |20140223 1245|
581b61b0fc14df5e4a78b0db6d0d7ca4|/home/leslie/Development/scandir
scandir                          |20140223 1245|
864a8f6dfbeb16bef1f09d71759aeca4|/home/leslie/Development/scandir
scandir.ini                      |20140223 1245|
e2c713788f9492be9e61d1d0badcc8ca|/home/leslie/Development/scandir
scandir32                        |20140223 1245|
5d26167e56b5e6efe203bdbfb4483c6f|/home/leslie/Development/scandir
scandir32.c                      |20140223 1245|
86c005228b275b55249cde39c2e95d32|/home/leslie/Development/scandir
scandirmd5                       |20140223 1245|
864a8f6dfbeb16bef1f09d71759aeca4|/home/leslie/Development/scandir
scandirmd5.c                     |20140223 1245|
c38735f1cdf0bbcf7e352876d7f28793|/home/leslie/Development/scandir
scandirsha1.c                    |20140223 1245|
6f8e62c3c10c09922f41c643ff0592f8|/home/leslie/Development/scandir
sha                              |20140223 1245|
07f74c7c98e3498ca11dba9a5c56edc9|/home/leslie/Development/scandir
sha.c                            |20140223 1245|
f20dc5f72f0235d84a07e8a6b80ab036|/home/leslie/Development/scandir
sha1.c                           |20140223 1245|
74832014b5b65a34d5eaf273c7393116|/home/leslie/Development/scandir
sha1.h                           |20140223 1245|
d2559d2af8a19ea6bc64b35f69c4eea6|/home/leslie/Development/scandir
test.sh                          |20140223 1245|
503c5fe5bd4ee7f2ac53d7df0a371bb6|/home/leslie/Development/scandir
x                                |20140305 1506|
d41d8cd98f00b204e9800998ecf8427e|/home/leslie/Development/scandir

The sort order is not correct with folding: the missing line with the x
has returned and my header line remains in row 1,
BUT...
I am after an ASCII sequence sort, and the rows with DATE1 and DATE2 are
out of place. They should actually appear as lines 2 and 3.

How do I get the sort to respect the ascii sorting sequence?  I can do
so for later fields such as sorting any other column such as ...
sort -fb -t '|' -k2  x   to   sort -fb -t '|' k4   x

My observation is that there does not appear to be an option that allows
me to sort by column 1 without shifting to the left of the all the
leading whitespace characters.
-
If I have found a shortcoming, I would like to propose a new flag  so
that the sort would actually generate the first column in pure ascii
sequence.
If sort is not broken, can you propose a new flag to force the ASCII
collating sequence?

(A new flag would add new functionality while letting existing uses of
the sort program work as before.)

I would like to hear back from you.

Leslie Satenstein
lsatenstein <at> yahoo.com


Added tag(s) notabug. Request was from Eric Blake <eblake <at> redhat.com> to control <at> debbugs.gnu.org. (Wed, 05 Mar 2014 23:19:02 GMT)

Reply sent to Eric Blake <eblake <at> redhat.com>:
You have taken responsibility. (Wed, 05 Mar 2014 23:19:03 GMT)

Notification sent to Leslie Satenstein <lsatenstein <at> yahoo.com>:
bug acknowledged by developer. (Wed, 05 Mar 2014 23:19:03 GMT)

Message #12 received at 16944-done <at> debbugs.gnu.org:

From: Eric Blake <eblake <at> redhat.com>
To: Leslie Satenstein <lsatenstein <at> yahoo.com>, 16944-done <at> debbugs.gnu.org
Subject: Re: bug#16944: Sort program (sort.c) I can't sort by ascii collating
 sequence over a first column of text.
Date: Wed, 05 Mar 2014 16:18:27 -0700
tag 16944 notabug
thanks

On 03/05/2014 03:48 PM, Leslie Satenstein wrote:
> I have a problem with the sort utility that I cannot seem to do with
> sort.
> 
> I have a file x (below) and I wish to sort only the first column
> according to the ascii table, in other words, a sort where the sort
> follows the
> A..Za..z  and of course the other characters as well. 
> 
> I created this file x to illustrate the problem.
> 
> The first line of file x is a space character, the backspace character,
> and the text Host=fedora20-leslie.
> 
> RAW Unsorted input (27 lines) filename x
> 
> Host=fedora20-leslie             |	       |                       scan
> from|/home/leslie/Development/scandir
> scandir.ini                      |20140223 1245|
> e2c713788f9492be9e61d1d0badcc8ca|/home/leslie/Development/scandir
> sha.c                            |20140223 1245|

Umm, your example file got corrupted by your mailer.  So it's harder to
see what you are actually trying to sort, and what results you are
trying to get.  Maybe you should actually attach your file 'x' instead
of pasting it inline where it gets corrupted.

Also, when you say "column 1", did you really mean "field 1" (which
occupies multiple character columns) rather than just the literal first
character?

> The sort order is not correct with folding: the missing line with the x
> has returned and my header line remains in row 1,
> BUT...
> I am after an ASCII sequence sort, and the rows with DATE1 and DATE2 are
> out of place. They should actually appear as lines 2 and 3.

Are you setting locale environment variables correctly?  The only way to
guarantee ASCII collation is to use a locale that enforces it.  Many
distros these days default to an en_US.UTF-8 locale (or similar) which
intentionally does NOT do ascii collation; to override that, you
probably want to try 'LC_ALL=C sort ...'
https://www.gnu.org/software/coreutils/faq/coreutils-faq.html#Sort-does-not-sort-in-normal-order_0021
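A minimal sketch of the byte-order collation described above, using a few hypothetical first-column values modeled on file x. Under LC_ALL=C, sort compares raw bytes, so every uppercase ASCII letter (0x41 to 0x5A) sorts before every lowercase one (0x61 to 0x7A). Adding -f folds lowercase to uppercase before comparing, which places DATE1 and DATE2 after crc32.c again, so -f should be left off when strict ASCII order is the goal.

```shell
# Hypothetical sample of first-column values, modeled on file x.
sample='crc32.c
x
Host=fedora20-leslie
DATE2
DATE1'

# Byte-order (ASCII) collation: uppercase letters sort before lowercase.
printf '%s\n' "$sample" | LC_ALL=C sort
# DATE1
# DATE2
# Host=fedora20-leslie
# crc32.c
# x

# -f folds lowercase to uppercase before comparing, so "crc32.c"
# compares as "CRC32.C" and sorts ahead of DATE1: not ASCII order.
printf '%s\n' "$sample" | LC_ALL=C sort -f
# crc32.c
# DATE1
# DATE2
# Host=fedora20-leslie
# x
```

Setting LC_ALL as a prefix like this affects only that one sort invocation, not the rest of the shell session.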

> 
> How do I get the sort to respect the ascii sorting sequence?  I can do
> so for later fields such as sorting any other column such as ...
> sort -fb -t '|' -k2  x   to   sort -fb -t '|' k4   x

This looks very suspicious (not to mention a typo - you mention 'k4'
when you probably typed '-k4' - it's in your best interest to be more
accurate when reporting difficulties you are having).  You usually want
to use '-k2,2' and not the simpler '-k2' (the longer version sorts on
exactly field 2, while the shorthand treats field 2 and then on to the
end of the line all as one key).
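The -k2 versus -k2,2 difference can be seen with two hypothetical '|'-separated lines whose second fields are equal. With -k2 the key runs from field 2 to the end of the line, so the third field decides; with -k2,2 the keys tie and GNU sort falls back to its last-resort comparison of entire lines. LC_ALL=C is set so the result does not depend on the locale.

```shell
# Two hypothetical lines whose second fields are both "b".
lines='a|b|2
b|b|1'

# -k2: key is field 2 through end of line ("b|2" vs "b|1").
printf '%s\n' "$lines" | LC_ALL=C sort -t '|' -k2
# b|b|1
# a|b|2

# -k2,2: key is exactly field 2 ("b" vs "b"); the tie is broken by
# comparing whole lines, so "a|b|2" comes first.
printf '%s\n' "$lines" | LC_ALL=C sort -t '|' -k2,2
# a|b|2
# b|b|1
```

Adding -s (--stable) turns off that last-resort comparison, so tied lines keep their input order instead.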

You may want to try the 'sort --debug' flag to see exactly what sort is
using during its checks, to make sure it is choosing sort keys that line
up with what you think it should.
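A hedged illustration of the --debug suggestion (GNU sort; the two sample lines are hypothetical, modeled on the report). Each output line is followed by a line of underscores marking the characters used as the key and, unless -s or -u is given, another marking the last-resort whole-line comparison; diagnostics such as which collation rules are in effect go to standard error. With -t '|', field 1 includes the padding spaces before the '|', which the key marker should make visible. Because the annotation layout may differ between coreutils versions, no output is reproduced here.

```shell
# Show which characters sort treats as the key on each line.
# With -t '|' -k1,1 the key should be exactly the text before the first '|'.
printf '%s\n' 'scandir    |20140223 1245|' 'DATE1      |20140223 1245|' |
  LC_ALL=C sort --debug -t '|' -k1,1
```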

> 
> My observation is that there does not appear to be an option that allows
> me to sort by column 1 without shifting to the left of the all the
> leading whitespace characters.

I didn't parse that - if you are eliding leading whitespace, then you
are not sorting by column 1, but by the first non-whitespace character.
 Oh - maybe you meant sorting by "field 1", which is spelled '-k1,1' (or
-k1,1b if you want to ignore leading blanks), and optionally with -t in
effect to force field separation to match your expectations instead of
occurring on non-blank to blank transitions.
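A hedged sketch of field-1 sorting with -t '|', using hypothetical lines in which the header starts with a space, as the report says the first line of x does. In the C locale a space (0x20) sorts before every letter, so with -k1,1 the header stays on top. To skip leading blanks, note that in GNU sort the b modifier takes effect when attached to the start position (-k1b,1) or given globally (-b); per the GNU coreutils manual, a b attached only to an end position that has no .C offset, as in -k1,1b, does not change which characters form the key.

```shell
# Hypothetical sample: the header line begins with a space, like the
# first line of file x in the report.
lines=' Host=fedora20-leslie|
scandir|20140223 1245|
DATE1|20140223 1245|
crc32.c|20140223 1245|'

# Exactly field 1, leading blanks significant: a space (0x20) sorts
# before every letter, so the header line stays first.
printf '%s\n' "$lines" | LC_ALL=C sort -t '|' -k1,1
#  Host=fedora20-leslie|
# DATE1|20140223 1245|
# crc32.c|20140223 1245|
# scandir|20140223 1245|

# Exactly field 1, leading blanks skipped ('b' on the start position):
# the header now sorts by its 'H', after DATE1 and before lowercase names.
printf '%s\n' "$lines" | LC_ALL=C sort -t '|' -k1b,1
# DATE1|20140223 1245|
#  Host=fedora20-leslie|
# crc32.c|20140223 1245|
# scandir|20140223 1245|
```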

> -
> If I have found a shortcoming, I would like to propose a new flag  so
> that the sort would actually generate the first column in pure ascii
> sequence.

Sort already has a POSIX-mandated option to force pure ascii sorting:

LC_ALL=C sort ...

Therefore, I'm closing this as not a bug.  But feel free to ask further
questions or provide better details of what you are trying to do, in a
way that does not get munged by your mailer.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 03 Apr 2014 11:24:04 GMT)



