tag 14988 needinfo
thanks

On 07/30/2013 02:51 PM, Danny Nicholas wrote:
> Hi guys,

[can you convince your mailer to wrap long lines?]

> I am presently using version 7.1 on a Solaris box. I downloaded 8.21
> and really love the improvement in speed (almost 50% in some tests).
> I am looking to replace the commercial product NSORT and would like
> this feature in the source instead of a wrapper. If I have a file
> XXXX300001XXXX
> XXXX300002XXXX
> XXXX300003XXXX
> XXXX300003XXXX
> XXXX300003XXXX
> XXXX300003XXXX
> XXXX300004XXXX
> XXXX300005XXXX
> XXXX300006XXXX
> XXXX300007XXXX

As written, your example is already in sorted order, and with no other
distinguishing features on the lines, you haven't shown that sort is
outputting lines in an order other than the one you want. I also can't
tell whether the XXXX represent the actual bytes you are sorting, or
whether you meant them as placeholders in a sanitized version of your
actual data set. You'll need to give us an actual example of lines that
nsort and GNU sort order differently, along with the command-line
options you tried with GNU sort, before we can tell you what to try
next.

>
> NSORT keeps the 4 300003 records together in entry sequence. My
> present work-around is to use a Python script that reads in the whole
> file and creates a pseudo-key that is 30000X plus an 8 digit sequence
> number (I process millions of records). What I am thinking of is an
> -es (--entry-sequence) that would add a hidden -k to process on this
> internal sequence. If I figure out how to do this on my own, I will
> submit it to you.

Short options must be one letter long; writing your proposed 'sort -es'
would be the same as 'sort -e -s'. Also, we are reluctant to burn short
options; these days, it's better to add only a long option until it
proves its popularity, so that we don't collide with any future
standardized short options.

It SOUNDS like you are merely asking for a stable sort. Have you tried
the -s/--stable option?
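One way to check is to give each line something that distinguishes it
and sort on the 6-digit field alone. A minimal sketch follows; the sample
lines and their trailing letter tags are made up for illustration and
are not taken from your data:

```shell
# Made-up sample: the trailing letter records arrival order, so 'd' is
# the first of the four 300003 lines and 'a' the last.
sample='XXXX300005XXXXz
XXXX300003XXXXd
XXXX300001XXXXy
XXXX300003XXXXc
XXXX300003XXXXb
XXXX300003XXXXa'

# -k1.5,1.10n compares only characters 5-10 of the line, numerically;
# -s turns off the whole-line last-resort comparison, so lines whose
# keys tie keep their input order.
printf '%s\n' "$sample" | LC_ALL=C sort -s -k1.5,1.10n
```

Without -s, the whole-line tie-breaker would reorder the four 300003
lines as a, b, c, d; with -s, they come out as d, c, b, a, in input
order.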
That effectively adds an invisible key of last resort that says: if two
lines otherwise compare equal, sort them so that the line occurring
first in the input also occurs first in the output.

At any rate, I'm marking this bug as 'needinfo' so that we can get more
feedback on whether --stable already meets your needs, or at least so
we can get a test case to play with and see what you are really asking
for.

Also, have you played with 'sort --debug'? It shows in much more detail
EXACTLY what sort is looking at. For example, I am able to do a numeric
sort on JUST the 6 digits between the XXXX fillers of the example you
listed:

$ printf 'XXXX300002XXXX\nXXXX300001XXXX\n' \
  | LC_ALL=C sort --debug -k1.5,1.10n -s
sort: using simple byte comparison
XXXX300001XXXX
    ______
XXXX300002XXXX
    ______

>
> CONFIDENTIALITY: This email (including any attachments) may contain confidential,

Sorry, but this disclaimer is unenforceable on publicly archived lists.
It is considered poor netiquette to use your employer's email if they
insist on adding this on your behalf, and you may be better off sending
the mail from a personal account.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org