tag 10985 notabug
thanks

On 03/09/2012 12:46 PM, Oleg Moskalenko wrote:
> Hi
>
> While testing different GNU coreutils sort versions on different platforms (Linux and FreeBSD) I found that some behavior is probably not what a utility user expects.

Thanks for the report.  However, you probably found behavior that is required by POSIX.

>
> Let's, say, we have to sort (numerically stable) just two lines:
>
> $ sort -t "|" -ns -k2.3,2.7 <<!
> 1|234
> 1|2|34
> !

Let's use 'sort --debug' to see what really happened:

$ LC_ALL=C sort --debug -t\| -ns -k2.3,2.7 <<a
> 1|234
> 1|2|34
> a
sort: using simple byte comparison
1|234
    _
1|2|34
    __

So this sorted by locating the start of the second field ("234" of one line, and "2|34" of the other line), then starting at the 3rd byte past that location (even if it is in the next field).  This behavior is required by POSIX:

http://pubs.opengroup.org/onlinepubs/9699919799/utilities/sort.html

>
> The correct output (from my point of view) must be:
>
> 1|2|34
> 1|234

Sorry, but that interpretation does not match POSIX.

>
> My reasoning is that applying the key specs "-k2.3,2.7" to string "1|234" we obtain the key "4", and applying the same key to the string "1|2|34" we must obtain "" (empty string),

That's where you are wrong.  POSIX states:

>> The notation:
>>
>> -k field_start[type][,field_end[type]]
>>
>> shall define a key field that begins at field_start and ends at field_end inclusive, unless field_start falls beyond the end of the line or after field_end, in which case the key field is empty. A missing field_end shall mean the last character of the line.
>>
>> A field comprises a maximal sequence of non-separating characters and, in the absence of option -t, any preceding field separator.
>>
>> The field_start portion of the keydef option-argument shall have the form:
>>
>> field_number[.first_character]
>>
>> Fields and characters within fields shall be numbered starting with 1.
>> The field_number and first_character pieces, interpreted as positive decimal integers, shall specify the first character to be used as part of a sort key. If .first_character is omitted, it shall refer to the first character of the field.

That is, the field_start 2.3 means to start at the third character counting from the start of the second field, regardless of whether any field separators are crossed along the way, and _only_ the end of a line (not another field separator) can result in an empty key field.

>
> I do not know whether this is an intended behavior or a bug,

Intended and mandated by the standards.

> but this is definitely non-intuitive and not what a reasonable user would expect.

Perhaps so, but if you want it changed, you need to file a bug report against POSIX.  As such, I'm going to close out this coreutils bug.

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
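
P.S. To make the key extraction rule concrete, here is a minimal Python sketch of how -k2.3,2.7 with -t '|' picks its key. It is an illustrative model only, not the coreutils implementation: the helper names (field_start_index, extract_key, numeric_value, sort_lines) are made up for this example, it only covers key specs where both character offsets are given, and the numeric comparison is simplified to leading integers.

```python
import re


def field_start_index(line: str, sep: str, field: int) -> int:
    """Index of the first character of the 1-based field, or len(line) if absent."""
    pos = 0
    for _ in range(field - 1):
        nxt = line.find(sep, pos)
        if nxt == -1:
            return len(line)
        pos = nxt + 1
    return pos


def extract_key(line, sep, start_field, start_char, end_field, end_char):
    """Model of -k start_field.start_char,end_field.end_char (hypothetical helper).

    Character offsets are counted from the start of the field and may run
    past a separator; only the end of the line bounds them.
    """
    begin = min(len(line), field_start_index(line, sep, start_field) + start_char - 1)
    end = min(len(line), field_start_index(line, sep, end_field) + end_char)
    return line[begin:end] if begin < end else ""


def numeric_value(key: str) -> int:
    """Simplified stand-in for -n: leading integer, or 0 if there is none."""
    m = re.match(r"\s*(-?\d+)", key)
    return int(m.group(1)) if m else 0


def sort_lines(lines, sep="|"):
    """Stable numeric sort on the key 2.3,2.7 (Python's sorted() is stable)."""
    return sorted(lines, key=lambda s: numeric_value(extract_key(s, sep, 2, 3, 2, 7)))


if __name__ == "__main__":
    for line in ("1|234", "1|2|34"):
        print(repr(line), "->", repr(extract_key(line, "|", 2, 3, 2, 7)))
    print(sort_lines(["1|2|34", "1|234"]))
```

Under this model the key for "1|234" is "4" and the key for "1|2|34" is "34", which matches the spans underlined by 'sort --debug' above, so "1|234" sorts first.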