GNU bug report logs - #14988: sort enhancement request
Your bug report
#14988: sort enhancement request
which was filed against the coreutils package, has been closed.
The explanation is attached below, along with your original report.
If you require more details, please reply to 14988 <at> debbugs.gnu.org.
--
14988: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=14988
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
tag 14988 notabug
thanks
[re-adding the list; and please don't top-post on technical lists]
On 07/31/2013 07:19 AM, Danny Nicholas wrote:
> Thank you Eric. We have two sorts on our system. Our /usr/bin/sort does not support the -s option,
Makes sense - the '-s' option is a GNU extension, and your /usr/bin/sort
is probably not GNU sort. If you want stable sorting using only POSIX
features, then you have to supply enough sort keys so that no two lines
ever compare equal (since POSIX has no way to disable the full-line sort
of last resort). Depending on the input to be sorted, this may
indeed require a pre-filter run that adds line numbering (by the way,
sed's '=' command can do this much more efficiently than a python
script), then sorting, then a post-filter run that removes the line number.
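A minimal sketch of that pre-filter/sort/post-filter pipeline, using only
POSIX sed, sort and cut. The function name stable_sort36 is hypothetical,
the 36-byte key width is taken from the command quoted further down, and
the sketch assumes the first 36 bytes of each input line contain no tab
characters:

```shell
# Hypothetical helper: stable sort of stdin on the first 36 bytes of each
# line, without relying on the GNU -s extension.
#
# Steps of the pipeline:
#   1. "sed =" prints each line number on its own line before the line.
#   2. The second sed joins every number/line pair with a tab.
#   3. sort uses the first 36 bytes of the original line (field 2) as the
#      primary key and the line number (field 1, numeric) as the secondary
#      key, so no two lines ever compare equal on the keys.
#   4. cut drops the line-number field again.
stable_sort36() {
  tab=$(printf '\t')
  sed = |
    sed "N;s/\n/$tab/" |
    LC_ALL=C sort -t "$tab" -k 2.1,2.36 -k 1,1n |
    cut -f 2-
}
```

Usage would be along the lines of `stable_sort36 < file > file2`.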
> but our /usr/local/bin/sort does.
Indeed - life is simpler if you can write your script to ensure that it
always sets PATH to use the full power of the GNU tools.
> Unfortunately, that did not resolve the issue. Here is a portion of the file I'm trying to sort
Thank you - THIS makes much more sense for understanding your problem.
> 010_000001_0000731_00001_200000081610_<Customer>
> 010_000001_0000731_00002_200000081610_ <CCODEPAGE>4102 LANGUAGE EN</CCODEPAGE>
> 010_000001_0000731_00003_200000081610_ <FirstCopy>YES</FirstCopy>
> 010_000001_0000731_00003_200000081610_ <eapprovetype>010</eapprovetype>
> 010_000001_0000731_00003_200000081610_ <lastpaymentdate>06/12/2013</lastpaymentdate>
> 010_000001_0000731_00003_200000081610_ <lastpaymentamount> 277.59</lastpaymentamount>
> 010_000001_0000731_00003_200000081610_ <SuppressOutBadVariableCopies></SuppressOutBadVariableCopies>
> 010_000001_0000731_00003_200000081610_ <CPAGENAME>PAGE1</CPAGENAME>
> 010_000001_0000731_00004_200000081610_ <DG_BILL_LAYOUT>REGULAR</DG_BILL_LAYOUT>
> 010_000001_0000731_00005_200000081610_ <DC-DEVICE>PRINTER</DC-DEVICE>
> 010_000001_0000731_00006_200000081610_ <DC-RDI>S</DC-RDI>
> 010_000001_0000731_00007_200000081610_ <DC-SENDTYPE>PRINTER</DC-SENDTYPE>
> 010_000001_0000731_00008_200000081610_ <DSY-SYSID>R3P</DSY-SYSID>
>
> What I am executing is /usr/local/bin/sort -k 1,36 -s file -o file2
So, with "-k1,36" you asked sort to treat as its sort key the portion of
the line ranging from the first field to the 36th field. I only see 2
fields in most of the lines (a few have more, but none of them with 36
fields), so you are basically sorting by the entire line. You didn't
provide any other keys, but since your first key is already botched as
the ENTIRE line, there were no lines that compared equal for -s to make
any difference. Again, sort --debug makes this clear (using a subset of
just two lines of your input):
>> $ printf '010_000001_0000731_00003_200000081610_ <SuppressOutBadVariableCopies></SuppressOutBadVariableCopies>\n010_000001_0000731_00003_200000081610_ <CPAGENAME>PAGE1</CPAGENAME>\n' \
>> | LC_ALL=C sort --debug -k1,36 -s
>> sort: using simple byte comparison
>> 010_000001_0000731_00003_200000081610_ <CPAGENAME>PAGE1</CPAGENAME>
>> _______________________________________________________________________
>> 010_000001_0000731_00003_200000081610_ <SuppressOutBadVariableCopies></SuppressOutBadVariableCopies>
>> ________________________________________________________________________________________________________
But it appears that what you WANTED was to sort on just the first 36
bytes, with a stable sort of the results. If so, then ASK for that, by
using the correct -k option:
>> $ printf '010_000001_0000731_00003_200000081610_ <SuppressOutBadVariableCopies></SuppressOutBadVariableCopies>\n010_000001_0000731_00003_200000081610_ <CPAGENAME>PAGE1</CPAGENAME>\n' \
>> | LC_ALL=C sort --debug -k1,1.36 -s
>> sort: using simple byte comparison
>> 010_000001_0000731_00003_200000081610_ <SuppressOutBadVariableCopies></SuppressOutBadVariableCopies>
>> ____________________________________
>> 010_000001_0000731_00003_200000081610_ <CPAGENAME>PAGE1</CPAGENAME>
>> ____________________________________
Note how I asked for a sort key -k1,1.36, which says to start in the
first field, and end 36 bytes into the first field (hmm, it looks like
you actually want 38 bytes - but I'll leave that for you to decide).
Also note that -s now makes a difference: when the content of that first
sort key is identical, the last-resort full-line comparison reorders the
lines unless -s is used:
>> $ printf '010_000001_0000731_00003_200000081610_ <SuppressOutBadVariableCopies></SuppressOutBadVariableCopies>\n010_000001_0000731_00003_200000081610_ <CPAGENAME>PAGE1</CPAGENAME>\n' \
>> | LC_ALL=C sort --debug -k1,1.36
>> sort: using simple byte comparison
>> 010_000001_0000731_00003_200000081610_ <CPAGENAME>PAGE1</CPAGENAME>
>> ____________________________________
>> _______________________________________________________________________
>> 010_000001_0000731_00003_200000081610_ <SuppressOutBadVariableCopies></SuppressOutBadVariableCopies>
>> ____________________________________
>> ________________________________________________________________________________________________________
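Putting the pieces together, a corrected form of the original command
might look like the sketch below. The helper name sort_prefix_stable is
hypothetical, the width is left as a parameter since 36 versus 38 is your
call, and -s assumes a sort that supports that extension (such as the
/usr/local/bin/sort mentioned above):

```shell
# Hypothetical helper: stable sort on the first WIDTH bytes of field 1.
# Relies on the -s extension; any remaining arguments are passed through
# to sort (for example -o output and input file names).
sort_prefix_stable() {
  width=$1
  shift
  # -k 1,1.WIDTH  key starts at field 1 and ends WIDTH bytes into field 1
  # -s            disable the last-resort full-line comparison, so lines
  #               with equal keys stay in input order
  LC_ALL=C sort -s -k "1,1.$width" "$@"
}

# Possible usage, with the file names from the original command:
#   sort_prefix_stable 38 -o file2 file
```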
As this is a case of you not passing the correct command line arguments,
rather than a bug in sort, I am marking this bug as closed. However,
feel free to continue to comment on the topic (preferably on-list) if
you have more questions.
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
[Message part 5 (message/rfc822, inline)]
Hi guys,
I am presently using version 7.1 on a Solaris box. I downloaded 8.21 and really love the improvement in speed (almost 50% in some tests). I am looking to replace the commercial product NSORT and would like this feature in the source instead of a wrapper. If I have a file
XXXX300001XXXX
XXXX300002XXXX
XXXX300003XXXX
XXXX300003XXXX
XXXX300003XXXX
XXXX300003XXXX
XXXX300004XXXX
XXXX300005XXXX
XXXX300006XXXX
XXXX300007XXXX
NSORT keeps the 4 300003 records together in entry sequence. My present work-around is to use a Python script that reads in the whole file and creates a pseudo-key that is 30000X plus an 8 digit sequence number (I process millions of records). What I am thinking of is an -es (--entry-sequence) that would add a hidden -k to process on this internal sequence. If I figure out how to do this on my own, I will submit it to you.
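For reference, a minimal sketch of how the sample above could be kept in
entry sequence with a sort that supports the -s option. It assumes the key
occupies bytes 5 through 10 of each record (read off the sample data) and
that records contain no blanks in their first 10 bytes; the helper name
sort_entry_sequence is hypothetical:

```shell
# Hypothetical helper: sort records on the 30000X part only, keeping
# records with equal keys in their original entry sequence.
sort_entry_sequence() {
  # -k 1.5,1.10  compare only bytes 5..10 of each record
  # -s           stable sort: no last-resort full-line comparison
  LC_ALL=C sort -s -k 1.5,1.10 "$@"
}

# Possible usage: sort_entry_sequence -o sorted_file input_file
```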
Thanks,
Danny Nicholas
Applications Programmer
Pinnacle Data Systems L.L.C.
Office: (205) 307-6874
danny.nicholas <at> pinnacledatasystems.com
www.pinnacledatasystems.com
This bug report was last modified 12 years and 16 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.