From unknown Sun Jun 22 03:58:31 2025 X-Loop: help-debbugs@gnu.org Subject: bug#6176: [PATCH 1/2] sort: add a --debug option to highlight key extents Resent-From: =?UTF-8?Q?P=C3=A1draig?= Brady
Original-Sender: debbugs-submit-bounces@debbugs.gnu.org
Resent-To: owner@debbugs.gnu.org
Resent-CC: bug-coreutils@gnu.org
Resent-Date: Tue, 11 May 2010 22:59:02 +0000
Resent-Message-ID: ) id 1OByPT-0006oN-Mk
for submit@debbugs.gnu.org; Tue, 11 May 2010 18:58:45 -0400
Received: from lists.gnu.org ([199.232.76.165]:44928)
by monty-python.gnu.org with esmtps
(TLS-1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.60)
(envelope-from ) id 1OByPP-0000VJ-Li
for submit@debbugs.gnu.org; Tue, 11 May 2010 18:58:39 -0400
Received: from [140.186.70.92] (port=33634 helo=eggs.gnu.org)
by lists.gnu.org with esmtp (Exim 4.43) id 1OByPM-0004X8-T3
for bug-coreutils@gnu.org; Tue, 11 May 2010 18:58:39 -0400
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on eggs.gnu.org
X-Spam-Level:
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00
autolearn=unavailable version=3.3.1
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.69)
(envelope-from ) id 1OByPH-0001WU-NA
for bug-coreutils@gnu.org; Tue, 11 May 2010 18:58:36 -0400
Received: from mail1.slb.deg.dub.stisp.net ([84.203.253.98]:9793)
by eggs.gnu.org with smtp (Exim 4.69)
(envelope-from ) id 1OByPH-0001Vd-4t
for bug-coreutils@gnu.org; Tue, 11 May 2010 18:58:31 -0400
Received: (qmail 68351 invoked from network); 11 May 2010 22:58:28 -0000
Received: from unknown (HELO ?192.168.2.25?) (84.203.137.218)
by mail1.slb.deg.dub.stisp.net with SMTP; 11 May 2010 22:58:28 -0000
Message-ID: <4BE9E0B1.5030308@draigBrady.com>
Date: Tue, 11 May 2010 23:56:49 +0100
From: =?UTF-8?Q?P=C3=A1draig?= Brady
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US;
rv:1.9.1.8) Gecko/20100227 Thunderbird/3.0.3
MIME-Version: 1.0
X-Enigmail-Version: 1.0.1
Content-Type: multipart/mixed; boundary="------------010204060602020007040800"
X-detected-operating-system: by eggs.gnu.org: FreeBSD 4.6-4.9
X-detected-operating-system: by monty-python.gnu.org: GNU/Linux 2.6,
seldom 2.4 (older, 4)
X-Spam-Score: -3.6 (---)
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.11
Precedence: list
List-Id:
Date: Tue, 23 Feb 2010 08:43:04 +0000
Subject: [PATCH 1/2] sort: add a --debug option to highlight key extents
* src/sort (usage): Add description for --debug.
(write_bytes): Pass a line structure so it can subsequently
be passed to compare to highlight the keys when in debug mode.
Also transform TAB and NUL characters written to stdout so
that the highlighting in debug mode aligns correctly.
(human_numcompare): Pass an "endptr" so we can record the extent
of the number matched.
(general_numcompare): Likewise.
(find_unit_order): Likewise.
(getmonth): Likewise.
(numcompare): Likewise. Note we reuse find_unit_order() for this,
which is a good enough appropixmation, and means we don't need to
change the strnumcmp() interface.
(check_mixed_SI_IEC): Return whether iec_present, so that can be
used to set the "endptr" in find_unit_order. Also make the key
parameter optional, which will be the case from numcompare().
(count_tabs): A new function to determine how much to adjust
the mbswidth() values by (TABs don't have a width).
(mark_key): A new function to output the key highlighting to stdout.
(debug_key): A new function to determine the offset and width
of the key highlighting.
(key_compare): Pass the show_debug parameter so the key highlighting
is only displayed when explicitly called. For each key type, set
the length (lena) and whether leading blanks are auto skipped (skipb)
which are then used by debug_key() to highlight the portion of the
key used in the comparison.
(compare): Pass the show_debug parameter so the key highlighting
is only displayed when explicitly called. Call debug_key() to
highlight the last resort comparison.
(check): Output highlighting for disorder line to stdout.
(main): Process the --debug option and make it mutually exlusive
with the -o option as I don't see it useful there, even potentially
harmful if someone left a --debug in by mistake when updating a file.
Also restricting debug output to stdout, simplifies the logic
for dealing with temporary files.
* doc/coreutils.texi (sort invocation): Describe the --debug option,
and reference it from the --key description.
* tests/misc/sort-debug-keys: A new test for highlighting keys.
* tests/Makefile.am: Reference the new test.
* NEWS: Mention the new feature.
---
NEWS | 5 +
doc/coreutils.texi | 5 +
src/sort.c | 289 ++++++++++++++++++++++++++++++++---------
tests/Makefile.am | 1 +
tests/misc/sort-debug-keys | 317 ++++++++++++++++++++++++++++++++++++++++++++
5 files changed, 557 insertions(+), 60 deletions(-)
create mode 100755 tests/misc/sort-debug-keys
diff --git a/NEWS b/NEWS
index 070f338..4c2da67 100644
--- a/NEWS
+++ b/NEWS
@@ -2,6 +2,11 @@ GNU coreutils NEWS -*- outline -*-
* Noteworthy changes in release ?.? (????-??-??) [?]
+** New features
+
+ sort now accepts the --debug option, to highlight the part of the
+ line significant in the sort.
+
** Changes in behavior
sort -g now uses long doubles for greater range and precision.
diff --git a/doc/coreutils.texi b/doc/coreutils.texi
index c8ba53c..6714ada 100644
--- a/doc/coreutils.texi
+++ b/doc/coreutils.texi
@@ -3939,6 +3939,11 @@ multiple fields.
Example: To sort on the second field, use @option{--key=2,2}
(@option{-k 2,2}). See below for more notes on keys and more examples.
+See also the @option{--debug} option to help determine the part
+of the line being used in the sort.
+
+@item --debug
+Highlight the portion of each line used for sorting.
@item --batch-size=@var{nmerge}
@opindex --batch-size
diff --git a/src/sort.c b/src/sort.c
index 54b97e2..65866b9 100644
--- a/src/sort.c
+++ b/src/sort.c
@@ -34,6 +34,7 @@
#include "hash.h"
#include "ignore-value.h"
#include "md5.h"
+#include "mbswidth.h"
#include "physmem.h"
#include "posixver.h"
#include "quote.h"
@@ -291,6 +292,9 @@ static struct keyfield *keylist;
/* Program used to (de)compress temp files. Must accept -d. */
static char const *compress_program;
+/* Annotate the output with extra info to aid the user. */
+static bool debug;
+
/* Maximum number of files to merge in one go. If more than this
number are present, temp files will be used. */
static unsigned int nmerge = NMERGE_DEFAULT;
@@ -371,6 +375,7 @@ Other options:\n\
-C, --check=quiet, --check=silent like -c, but do not report first bad line\n\
--compress-program=PROG compress temporaries with PROG;\n\
decompress them with PROG -d\n\
+ --debug annotate the part of the line used to sort\n\
--files0-from=F read input from the files specified by\n\
NUL-terminated names in file F;\n\
If F is - then read names from standard input\n\
@@ -429,6 +434,7 @@ enum
{
CHECK_OPTION = CHAR_MAX + 1,
COMPRESS_PROGRAM_OPTION,
+ DEBUG_PROGRAM_OPTION,
FILES0_FROM_OPTION,
NMERGE_OPTION,
RANDOM_SOURCE_OPTION,
@@ -442,6 +448,7 @@ static struct option const long_options[] =
{"ignore-leading-blanks", no_argument, NULL, 'b'},
{"check", optional_argument, NULL, CHECK_OPTION},
{"compress-program", required_argument, NULL, COMPRESS_PROGRAM_OPTION},
+ {"debug", no_argument, NULL, DEBUG_PROGRAM_OPTION},
{"dictionary-order", no_argument, NULL, 'd'},
{"ignore-case", no_argument, NULL, 'f'},
{"files0-from", required_argument, NULL, FILES0_FROM_OPTION},
@@ -1111,13 +1118,6 @@ open_temp (const char *name, pid_t pid)
return fp;
}
-static void
-write_bytes (const char *buf, size_t n_bytes, FILE *fp, const char *output_file)
-{
- if (fwrite (buf, 1, n_bytes, fp) != n_bytes)
- die (_("write failed"), output_file);
-}
-
/* Append DIR to the array of temporary directory names. */
static void
add_temp_dir (char const *dir)
@@ -1734,30 +1734,19 @@ fillbuf (struct buffer *buf, FILE *fp, char const *file)
}
}
-/* Compare strings A and B as numbers without explicitly converting them to
- machine numbers. Comparatively slow for short strings, but asymptotically
- hideously fast. */
-
-static int
-numcompare (const char *a, const char *b)
-{
- while (blanks[to_uchar (*a)])
- a++;
- while (blanks[to_uchar (*b)])
- b++;
-
- return strnumcmp (a, b, decimal_point, thousands_sep);
-}
-
/* Exit with an error if a mixture of SI and IEC units detected. */
-static void
+static bool
check_mixed_SI_IEC (char prefix, struct keyfield *key)
{
int iec_present = prefix == 'i';
- if (key->iec_present != -1 && iec_present != key->iec_present)
- error (SORT_FAILURE, 0, _("both SI and IEC prefixes present on units"));
- key->iec_present = iec_present;
+ if (key)
+ {
+ if (key->iec_present != -1 && iec_present != key->iec_present)
+ error (SORT_FAILURE, 0, _("both SI and IEC prefixes present on units"));
+ key->iec_present = iec_present;
+ }
+ return iec_present;
}
/* Return an integer which represents the order of magnitude of
@@ -1766,7 +1755,7 @@ check_mixed_SI_IEC (char prefix, struct keyfield *key)
Negative numbers return a negative unit order. */
static int
-find_unit_order (const char *number, struct keyfield *key)
+find_unit_order (const char *number, struct keyfield *key, char const **endptr)
{
static const char orders [UCHAR_LIM] =
{
@@ -1822,7 +1811,13 @@ find_unit_order (const char *number, struct keyfield *key)
/* For valid units check for MiB vs MB etc. */
if (order)
- check_mixed_SI_IEC (*(p + 1), key);
+ {
+ p++;
+ p += check_mixed_SI_IEC (*p, key);
+ }
+
+ if (endptr)
+ *endptr = p;
return sign * order;
}
@@ -1833,25 +1828,50 @@ find_unit_order (const char *number, struct keyfield *key)
i.e. input will never have both 6000K and 5M. */
static int
-human_numcompare (const char *a, const char *b, struct keyfield *key)
+human_numcompare (const char *a, const char *b, struct keyfield *key,
+ char const **ea)
{
while (blanks[to_uchar (*a)])
a++;
while (blanks[to_uchar (*b)])
b++;
- int order_a = find_unit_order (a, key);
- int order_b = find_unit_order (b, key);
+ int order_a = find_unit_order (a, key, ea);
+ int order_b = find_unit_order (b, key, NULL);
return (order_a > order_b ? 1
: order_a < order_b ? -1
: strnumcmp (a, b, decimal_point, thousands_sep));
}
+/* Compare strings A and B as numbers without explicitly converting them to
+ machine numbers. Comparatively slow for short strings, but asymptotically
+ hideously fast. */
+
static int
-general_numcompare (const char *sa, const char *sb)
+numcompare (const char *a, const char *b, char const **ea)
+{
+ while (blanks[to_uchar (*a)])
+ a++;
+ while (blanks[to_uchar (*b)])
+ b++;
+
+ if (debug)
+ {
+ /* Approximate strnumcmp extents with find_unit_order. */
+ if (find_unit_order (a, NULL, ea))
+ {
+ *ea -= 1; /* ignore the order letter */
+ *ea -= (**ea == 'i'); /* and IEC prefix */
+ }
+ }
+
+ return strnumcmp (a, b, decimal_point, thousands_sep);
+}
+
+static int
+general_numcompare (const char *sa, const char *sb, char const **ea)
{
- /* FIXME: add option to warn about failed conversions. */
/* FIXME: maybe add option to try expensive FP conversion
only if A and B can't be compared more cheaply/accurately. */
@@ -1863,13 +1883,12 @@ general_numcompare (const char *sa, const char *sb)
# define strtold strtod
#endif
- char *ea;
char *eb;
- long_double a = strtold (sa, &ea);
+ long_double a = strtold (sa, (char **) ea);
long_double b = strtold (sb, &eb);
/* Put conversion errors at the start of the collating sequence. */
- if (sa == ea)
+ if (sa == *ea)
return sb == eb ? 0 : -1;
if (sb == eb)
return 1;
@@ -1889,7 +1908,7 @@ general_numcompare (const char *sa, const char *sb)
Return 0 if the name in S is not recognized. */
static int
-getmonth (char const *month, size_t len)
+getmonth (char const *month, size_t len, char const **ea)
{
size_t lo = 0;
size_t hi = MONTHS_PER_YEAR;
@@ -1913,7 +1932,11 @@ getmonth (char const *month, size_t len)
for (;; m++, n++)
{
if (!*n)
- return monthtab[ix].val;
+ {
+ if (ea)
+ *ea = m;
+ return monthtab[ix].val;
+ }
if (m == monthlim || fold_toupper[to_uchar (*m)] < to_uchar (*n))
{
hi = ix;
@@ -2070,11 +2093,76 @@ compare_version (char *restrict texta, size_t lena,
return diff;
}
+/* For debug mode, count tabs in the passed string
+ so we can adjust the widths returned by mbswidth.
+ FIXME: Should we generally be counting non printable chars? */
+
+static size_t
+count_tabs (char const *text, const size_t len)
+{
+ size_t tabs = 0;
+ size_t tlen = strnlen (text, len);
+
+ while (tlen--)
+ {
+ if (*text++ == '\t')
+ tabs++;
+ }
+
+ return tabs;
+}
+
+/* For debug mode, "underline" a key at the
+ specified offset and screen width. */
+
+static void
+mark_key (size_t offset, size_t width)
+{
+ printf ("%*s", offset, "");
+
+ if (!width)
+ printf (_("^ no match for key\n"));
+ else
+ {
+ while (width--)
+ putchar ('_');
+ putchar ('\n');
+ }
+}
+
+/* For debug mode, determine the screen offset and width
+ to highlight for a key, and then output the highlight. */
+
+static void
+debug_key (char const *sline, char const *sfield, char const *efield,
+ size_t flen, bool skipb)
+{
+ char const *sa = sfield;
+
+ if (skipb) /* This key type implicitly skips leading blanks. */
+ {
+ while (sa < efield && blanks[to_uchar (*sa)])
+ {
+ sa++;
+ if (flen)
+ flen--; /* This assumes TABs same width as SPACEs. */
+ }
+ }
+
+ size_t offset = mbsnwidth (sline, sfield - sline, 0) + (sa - sfield);
+ offset += count_tabs (sline, sfield - sline);
+
+ size_t width = mbsnwidth (sa, flen, 0);
+ width += count_tabs (sa, flen);
+
+ mark_key (offset, width);
+}
+
/* Compare two lines A and B trying every key in sequence until there
are no more keys or a difference is found. */
static int
-keycompare (const struct line *a, const struct line *b)
+keycompare (const struct line *a, const struct line *b, bool show_debug)
{
struct keyfield *key = keylist;
@@ -2091,6 +2179,7 @@ keycompare (const struct line *a, const struct line *b)
{
char const *translate = key->translate;
bool const *ignore = key->ignore;
+ bool skipb = false; /* Whether key type auto skips leading blanks. */
/* Treat field ends before field starts as empty fields. */
lima = MAX (texta, lima);
@@ -2107,21 +2196,42 @@ keycompare (const struct line *a, const struct line *b)
else if (key->numeric || key->general_numeric || key->human_numeric)
{
char savea = *lima, saveb = *limb;
+ char const* ea = lima;
*lima = *limb = '\0';
- diff = (key->numeric ? numcompare (texta, textb)
- : key->general_numeric ? general_numcompare (texta, textb)
- : human_numcompare (texta, textb, key));
+ diff = (key->numeric ? numcompare (texta, textb, &ea)
+ : key->general_numeric ? general_numcompare (texta, textb,
+ &ea)
+ : human_numcompare (texta, textb, key, &ea));
+ if (show_debug)
+ {
+ lena = ea - texta;
+ skipb = true;
+ }
*lima = savea, *limb = saveb;
}
else if (key->version)
diff = compare_version (texta, lena, textb, lenb);
else if (key->month)
- diff = getmonth (texta, lena) - getmonth (textb, lenb);
+ {
+ char const *ea = lima;
+
+ int amon = getmonth (texta, lena, &ea);
+ diff = amon - getmonth (textb, lenb, NULL);
+
+ if (show_debug)
+ {
+ lena = amon ? ea - texta : 0;
+ skipb = true;
+ }
+ }
/* Sorting like this may become slow, so in a simple locale the user
can select a faster sort that is similar to ascii sort. */
else if (hard_LC_COLLATE)
{
+ /* FIXME: for debug, account for skipped chars, while handling mb chars.
+ Generally perhaps xmemfrm could be used to determine chars that are
+ excluded from the collating order? */
if (ignore || translate)
{
char buf[4000];
@@ -2165,6 +2275,8 @@ keycompare (const struct line *a, const struct line *b)
}
else if (ignore)
{
+ char *savea = texta;
+
#define CMP_WITH_IGNORE(A, B) \
do \
{ \
@@ -2192,6 +2304,10 @@ keycompare (const struct line *a, const struct line *b)
translate[to_uchar (*textb)]);
else
CMP_WITH_IGNORE (*texta, *textb);
+
+ /* We only need to restore this for debug_key
+ in which case the keys being compared are equal. */
+ texta = savea;
}
else if (lena == 0)
diff = - NONZERO (lenb);
@@ -2201,6 +2317,8 @@ keycompare (const struct line *a, const struct line *b)
{
if (translate)
{
+ char *savea = texta;
+
while (texta < lima && textb < limb)
{
diff = (to_uchar (translate[to_uchar (*texta++)])
@@ -2208,6 +2326,10 @@ keycompare (const struct line *a, const struct line *b)
if (diff)
goto not_equal;
}
+
+ /* We only need to restore this for debug_key
+ in which case the keys being compared are equal. */
+ texta = savea;
}
else
{
@@ -2221,6 +2343,9 @@ keycompare (const struct line *a, const struct line *b)
if (diff)
goto not_equal;
+ if (show_debug)
+ debug_key (a->text, texta, lima, lena, skipb);
+
key = key->next;
if (! key)
break;
@@ -2258,7 +2383,7 @@ keycompare (const struct line *a, const struct line *b)
depending on whether A compares less than, equal to, or greater than B. */
static int
-compare (const struct line *a, const struct line *b)
+compare (const struct line *a, const struct line *b, bool show_debug)
{
int diff;
size_t alen, blen;
@@ -2268,7 +2393,7 @@ compare (const struct line *a, const struct line *b)
and unadorned sort -r. */
if (keylist)
{
- diff = keycompare (a, b);
+ diff = keycompare (a, b, show_debug);
if (diff || unique || stable)
return diff;
}
@@ -2277,6 +2402,9 @@ compare (const struct line *a, const struct line *b)
fall through to the default comparison. */
alen = a->length - 1, blen = b->length - 1;
+ if (show_debug)
+ debug_key (a->text, a->text, a->text + alen, alen, false);
+
if (alen == 0)
diff = - NONZERO (blen);
else if (blen == 0)
@@ -2289,6 +2417,38 @@ compare (const struct line *a, const struct line *b)
return reverse ? -diff : diff;
}
+static void
+write_bytes (const struct line *line, FILE *fp, char const *output_file)
+{
+ char const *buf = line->text;
+ size_t n_bytes = line->length;
+
+ /* Convert TABs to '>' and \0 to \n when -z specified. */
+ if (debug && fp == stdout)
+ {
+ char const *ebuf = buf + n_bytes;
+ char const *c = buf;
+
+ while (c < ebuf)
+ {
+ char wc = *c++;
+ if (wc == '\t')
+ wc = '>';
+ else if (wc == 0 && eolchar == 0)
+ wc = '\n';
+ if (fputc (wc, fp) == EOF)
+ die (_("write failed"), output_file);
+ }
+
+ compare (line, line, true);
+ }
+ else
+ {
+ if (fwrite (buf, 1, n_bytes, fp) != n_bytes)
+ die (_("write failed"), output_file);
+ }
+}
+
/* Check that the lines read from FILE_NAME come in order. Return
true if they are in order. If CHECKONLY == 'c', also print a
diagnostic (FILE_NAME, line number, contents of line) to stderr if
@@ -2317,7 +2477,7 @@ check (char const *file_name, char checkonly)
/* Make sure the line saved from the old buffer contents is
less than or equal to the first line of the new buffer. */
- if (alloc && nonunique <= compare (&temp, line - 1))
+ if (alloc && nonunique <= compare (&temp, line - 1, false))
{
found_disorder:
{
@@ -2330,8 +2490,10 @@ check (char const *file_name, char checkonly)
fprintf (stderr, _("%s: %s:%s: disorder: "),
program_name, file_name,
umaxtostr (disorder_line_number, hr_buf));
- write_bytes (disorder_line->text, disorder_line->length,
- stderr, _("standard error"));
+ if (debug)
+ fputc ('\n', stderr);
+ write_bytes (disorder_line, debug ? stdout : stderr,
+ debug ? _("standard out") : _("standard error"));
}
ordered = false;
@@ -2341,7 +2503,7 @@ check (char const *file_name, char checkonly)
/* Compare each line in the buffer with its successor. */
while (linebase < --line)
- if (nonunique <= compare (line, line - 1))
+ if (nonunique <= compare (line, line - 1, false))
goto found_disorder;
line_number += buf.nlines;
@@ -2470,7 +2632,7 @@ mergefps (struct sortfile *files, size_t ntemps, size_t nfiles,
for (i = 0; i < nfiles; ++i)
ord[i] = i;
for (i = 1; i < nfiles; ++i)
- if (0 < compare (cur[ord[i - 1]], cur[ord[i]]))
+ if (0 < compare (cur[ord[i - 1]], cur[ord[i]], false))
t = ord[i - 1], ord[i - 1] = ord[i], ord[i] = t, i = 0;
/* Repeatedly output the smallest line until no input remains. */
@@ -2482,10 +2644,10 @@ mergefps (struct sortfile *files, size_t ntemps, size_t nfiles,
an identical series of lines. */
if (unique)
{
- if (savedline && compare (savedline, smallest))
+ if (savedline && compare (savedline, smallest, false))
{
savedline = NULL;
- write_bytes (saved.text, saved.length, ofp, output_file);
+ write_bytes (&saved, ofp, output_file);
}
if (!savedline)
{
@@ -2514,7 +2676,7 @@ mergefps (struct sortfile *files, size_t ntemps, size_t nfiles,
}
}
else
- write_bytes (smallest->text, smallest->length, ofp, output_file);
+ write_bytes (smallest, ofp, output_file);
/* Check if we need to read more lines into core. */
if (base[ord[0]] < smallest)
@@ -2568,7 +2730,7 @@ mergefps (struct sortfile *files, size_t ntemps, size_t nfiles,
while (lo < hi)
{
- int cmp = compare (cur[ord0], cur[ord[probe]]);
+ int cmp = compare (cur[ord0], cur[ord[probe]], false);
if (cmp < 0 || (cmp == 0 && ord0 < ord[probe]))
hi = probe;
else
@@ -2589,7 +2751,7 @@ mergefps (struct sortfile *files, size_t ntemps, size_t nfiles,
if (unique && savedline)
{
- write_bytes (saved.text, saved.length, ofp, output_file);
+ write_bytes (&saved, ofp, output_file);
free (saved.text);
}
@@ -2634,7 +2796,7 @@ mergelines (struct line *t,
struct line const *hi, size_t nhi)
{
for (;;)
- if (compare (lo - 1, hi - 1) <= 0)
+ if (compare (lo - 1, hi - 1, false) <= 0)
{
*--t = *--lo;
if (! --nlo)
@@ -2676,7 +2838,7 @@ sortlines (struct line *lines, size_t nlines, struct line *temp)
{
if (nlines == 2)
{
- if (0 < compare (&lines[-1], &lines[-2]))
+ if (0 < compare (&lines[-1], &lines[-2], false))
{
struct line tmp = lines[-1];
lines[-1] = lines[-2];
@@ -2712,7 +2874,7 @@ sortlines_temp (struct line *lines, size_t nlines, struct line *temp)
/* Declare `swap' as int, not bool, to work around a bug
Original-Sender: debbugs-submit-bounces@debbugs.gnu.org
Resent-To: owner@debbugs.gnu.org
Resent-CC: bug-coreutils@gnu.org
Resent-Date: Tue, 11 May 2010 23:42:02 +0000
Resent-Message-ID: ) id 1OBz4h-00076R-TK
for 6176@debbugs.gnu.org; Tue, 11 May 2010 19:41:21 -0400
Received: (qmail 75885 invoked from network); 11 May 2010 23:41:15 -0000
Received: from unknown (HELO ?192.168.2.25?) (84.203.137.218)
by mail1.slb.deg.dub.stisp.net with SMTP; 11 May 2010 23:41:15 -0000
Message-ID: <4BE9EAB8.8010703@draigBrady.com>
Date: Wed, 12 May 2010 00:39:36 +0100
From: =?UTF-8?Q?P=C3=A1draig?= Brady
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US;
rv:1.9.1.8) Gecko/20100227 Thunderbird/3.0.3
MIME-Version: 1.0
References: <4BE9E0B1.5030308@draigBrady.com>
In-Reply-To: <4BE9E0B1.5030308@draigBrady.com>
X-Enigmail-Version: 1.0.1
Content-Type: multipart/mixed; boundary="------------090907000806040204010703"
X-Spam-Score: -2.0 (--)
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.11
Precedence: list
List-Id:
Date: Tue, 11 May 2010 18:46:21 +0100
Subject: [PATCH 2/2] sort: --debug: output data independent key warnings
* src/sort.c (usage): Mention --debug can output warnings to stderr.
(default_key_compare): A new function refactored from main(),
and now also called from the new key_warnings() function.
(key_to_opts): A new function refactored from incompatible_options(),
and now also called from the new key_warnings() function.
(key_warnings): A new function to output warnings to stderr,
about questionable use of various options. Currently it warns
about zero length keys and ineffective global options.
(incompatible_options): Refactor out key_to_opts()
(main): Use key_init() to initialize gkey. Refactor out
default_key_compare(). Call key_warnings() in debug mode.
* doc/coreutils.texi (sort invocation): Mention that warnings
are output by --debug.
* tests/misc/sort-debug-warn: A new test for debug warnings.
* tests/Makefile.am: Reference the new test.
* NEWS: Mention the new feature
---
NEWS | 2 +-
doc/coreutils.texi | 1 +
src/sort.c | 173 ++++++++++++++++++++++++++++++--------------
tests/Makefile.am | 1 +
tests/misc/sort-debug-warn | 53 ++++++++++++++
5 files changed, 174 insertions(+), 56 deletions(-)
create mode 100755 tests/misc/sort-debug-warn
diff --git a/NEWS b/NEWS
index 4c2da67..5d7b81f 100644
--- a/NEWS
+++ b/NEWS
@@ -5,7 +5,7 @@ GNU coreutils NEWS -*- outline -*-
** New features
sort now accepts the --debug option, to highlight the part of the
- line significant in the sort.
+ line significant in the sort, and warn about questionable options.
** Changes in behavior
diff --git a/doc/coreutils.texi b/doc/coreutils.texi
index 6714ada..cd99bd0 100644
--- a/doc/coreutils.texi
+++ b/doc/coreutils.texi
@@ -3944,6 +3944,7 @@ of the line being used in the sort.
@item --debug
Highlight the portion of each line used for sorting.
+Also issue warnings about questionable usage to stderr.
@item --batch-size=@var{nmerge}
@opindex --batch-size
diff --git a/src/sort.c b/src/sort.c
index 65866b9..66a00ef 100644
--- a/src/sort.c
+++ b/src/sort.c
@@ -375,7 +375,8 @@ Other options:\n\
-C, --check=quiet, --check=silent like -c, but do not report first bad line\n\
--compress-program=PROG compress temporaries with PROG;\n\
decompress them with PROG -d\n\
- --debug annotate the part of the line used to sort\n\
+ --debug annotate the part of the line used to sort,\n\
+ and warn about questionable usage to stderr\n\
--files0-from=F read input from the files specified by\n\
NUL-terminated names in file F;\n\
If F is - then read names from standard input\n\
@@ -2158,6 +2159,108 @@ debug_key (char const *sline, char const *sfield, char const *efield,
mark_key (offset, width);
}
+/* Return whether sorting options specified for key. */
+
+static bool
+default_key_compare (struct keyfield const *key)
+{
+ return ! (key->ignore
+ || key->translate
+ || key->skipsblanks
+ || key->skipeblanks
+ || key->month
+ || key->numeric
+ || key->version
+ || key->general_numeric
+ || key->human_numeric
+ || key->random
+ /* || key->reverse */
+ );
+}
+
+/* Convert a key to the short options used to specify it.
+ The returned string must be freed. */
+
+static char*
+key_to_opts (struct keyfield const *key)
+{
+ /* The following is too big, but guaranteed to be "big enough". */
+ char *opts = xstrdup (short_options);
+ char *p = opts;
+
+ if (key->skipsblanks || key->skipeblanks)
+ *p++ = 'b';/* either disables global -b */
+ if (key->ignore == nondictionary)
+ *p++ = 'd';
+ if (key->ignore == nonprinting)
+ *p++ = 'i';
+ if (key->translate)
+ *p++ = 'f';
+ if (key->general_numeric)
+ *p++ = 'g';
+ if (key->human_numeric)
+ *p++ = 'h';
+ if (key->month)
+ *p++ = 'M';
+ if (key->numeric)
+ *p++ = 'n';
+ if (key->version)
+ *p++ = 'V';
+ if (key->random)
+ *p++ = 'R';
+ if (key->reverse)
+ *p++ = 'r';
+ *p = '\0';
+
+ return opts;
+}
+
+/* Output data independent key warnings to stderr. */
+
+static void
+key_warnings (struct keyfield const *gkey)
+{
+ struct keyfield const *key;
+ struct keyfield ugkey = *gkey;
+
+ for (key = keylist; key; key = key->next)
+ {
+ /* Output a warning for field specs that will never match. */
+ if (key->sword != SIZE_MAX && key->eword < key->sword)
+ fprintf (stderr, _("! zero width field ignored\n"));
+
+ /* Flag global options not copied or specified in any key. */
+ if (ugkey.ignore && (ugkey.ignore == key->ignore))
+ ugkey.ignore = NULL;
+ if (ugkey.translate && (ugkey.translate == key->translate))
+ ugkey.translate = NULL;
+ ugkey.skipsblanks &= !key->skipsblanks;
+ ugkey.skipeblanks &= !key->skipeblanks;
+ ugkey.month &= !key->month;
+ ugkey.numeric &= !key->numeric;
+ ugkey.general_numeric &= !key->general_numeric;
+ ugkey.human_numeric &= !key->human_numeric;
+ ugkey.random &= !key->random;
+ ugkey.version &= !key->version;
+ ugkey.reverse &= !key->reverse;
+ }
+
+ /* Warn about ignored global options flagged above. */
+ if (!default_key_compare (&ugkey) || (stable && ugkey.reverse))
+ {
+ bool ugkey_reverse = ugkey.reverse;
+ if (!stable)
+ ugkey.reverse = false;
+ char *opts = key_to_opts (&ugkey);
+ fprintf (stderr, _("! options `-%s' are ignored\n"), opts);
+ free (opts);
+ ugkey.reverse = ugkey_reverse;
+ }
+ if (!stable && ugkey.reverse)
+ fprintf (stderr,
+ _("! option `-r' only applies to last-resort comparison\n"));
+}
+
/* Compare two lines A and B trying every key in sequence until there
are no more keys or a difference is found. */
@@ -2418,7 +2521,7 @@ compare (const struct line *a, const struct line *b, bool show_debug)
}
static void
-write_bytes (const struct line *line, FILE *fp, char const *output_file)
+write_bytes (struct line const *line, FILE *fp, char const *output_file)
{
char const *buf = line->text;
size_t n_bytes = line->length;
@@ -3235,36 +3338,18 @@ incompatible_options (char const *opts)
static void
check_ordering_compatibility (void)
{
- struct keyfield const *key;
+ struct keyfield *key;
for (key = keylist; key; key = key->next)
if ((1 < (key->random + key->numeric + key->general_numeric + key->month
+ key->version + !!key->ignore + key->human_numeric))
|| (key->random && key->translate))
{
- /* The following is too big, but guaranteed to be "big enough". */
- char opts[sizeof short_options];
- char *p = opts;
- if (key->ignore == nondictionary)
- *p++ = 'd';
- if (key->translate)
- *p++ = 'f';
- if (key->general_numeric)
- *p++ = 'g';
- if (key->human_numeric)
- *p++ = 'h';
- if (key->ignore == nonprinting)
- *p++ = 'i';
- if (key->month)
- *p++ = 'M';
- if (key->numeric)
- *p++ = 'n';
- if (key->version)
- *p++ = 'V';
- if (key->random)
- *p++ = 'R';
- *p = '\0';
+ /* Clear flags we're not interested in. */
+ key->skipsblanks = key->skipeblanks = key->reverse = false;
+ char *opts = key_to_opts (key);
incompatible_options (opts);
+ free (opts);
}
}
@@ -3494,14 +3579,8 @@ main (int argc, char **argv)
/* The signal mask is known, so it is safe to invoke exit_cleanup. */
atexit (exit_cleanup);
- gkey.sword = gkey.eword = SIZE_MAX;
- gkey.ignore = NULL;
- gkey.translate = NULL;
- gkey.numeric = gkey.general_numeric = gkey.human_numeric = false;
- gkey.iec_present = -1;
- gkey.random = gkey.version = false;
- gkey.month = gkey.reverse = false;
- gkey.skipsblanks = gkey.skipeblanks = false;
+ key_init (&gkey);
+ gkey.sword = SIZE_MAX;
files = xnmalloc (argc, sizeof *files);
@@ -3836,17 +3915,7 @@ main (int argc, char **argv)
/* Inheritance of global options to individual keys. */
for (key = keylist; key; key = key->next)
{
- if (! (key->ignore
- || key->translate
- || (key->skipsblanks
- || key->reverse
- || key->skipeblanks
- || key->month
- || key->numeric
- || key->version
- || key->general_numeric
- || key->human_numeric
- || key->random)))
+ if (default_key_compare (key) && !key->reverse)
{
key->ignore = gkey.ignore;
key->translate = gkey.translate;
@@ -3856,24 +3925,15 @@ main (int argc, char **argv)
key->numeric = gkey.numeric;
key->general_numeric = gkey.general_numeric;
key->human_numeric = gkey.human_numeric;
+ key->version = gkey.version;
key->random = gkey.random;
key->reverse = gkey.reverse;
- key->version = gkey.version;
}
need_random |= key->random;
}
- if (!keylist && (gkey.ignore
- || gkey.translate
- || (gkey.skipsblanks
- || gkey.skipeblanks
- || gkey.month
- || gkey.numeric
- || gkey.general_numeric
- || gkey.human_numeric
- || gkey.random
- || gkey.version)))
+ if (!keylist && !default_key_compare (&gkey))
{
insertkey (&gkey);
need_random |= gkey.random;
@@ -3884,6 +3944,9 @@ main (int argc, char **argv)
if (debug && outfile)
error (SORT_FAILURE, 0, _("options -o and --debug are incompatible"));
+ if (debug)
+ key_warnings (&gkey);
+
reverse = gkey.reverse;
if (need_random)
diff --git a/tests/Makefile.am b/tests/Makefile.am
index 46d388a..a732c63 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -225,6 +225,7 @@ TESTS = \
misc/sort-compress \
misc/sort-continue \
misc/sort-debug-keys \
+ misc/sort-debug-warn \
misc/sort-files0-from \
misc/sort-float \
misc/sort-merge \
diff --git a/tests/misc/sort-debug-warn b/tests/misc/sort-debug-warn
new file mode 100755
index 0000000..5295b4b
--- /dev/null
+++ b/tests/misc/sort-debug-warn
@@ -0,0 +1,53 @@
+#!/bin/sh
+# Test warnings for sort options
+
+# Copyright (C) 2010 Free Software Foundation, Inc.
+
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with this program. If not, see
Original-Sender: debbugs-submit-bounces@debbugs.gnu.org
Resent-To: owner@debbugs.gnu.org
Resent-CC: bug-coreutils@gnu.org
Resent-Date: Wed, 12 May 2010 09:24:01 +0000
Resent-Message-ID: ) id 1OC8A9-0002m6-Pk
for 6176@debbugs.gnu.org; Wed, 12 May 2010 05:23:34 -0400
Received: (qmail 72953 invoked from network); 12 May 2010 09:23:29 -0000
Received: from unknown (HELO ?192.168.2.25?) (84.203.137.218)
by mail1.slb.deg.dub.stisp.net with SMTP; 12 May 2010 09:23:29 -0000
Message-ID: <4BEA732C.8090608@draigBrady.com>
Date: Wed, 12 May 2010 10:21:48 +0100
From: =?UTF-8?Q?P=C3=A1draig?= Brady
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US;
rv:1.9.1.8) Gecko/20100227 Thunderbird/3.0.3
MIME-Version: 1.0
References: <4BE9E0B1.5030308@draigBrady.com> <4BE9EAB8.8010703@draigBrady.com>
In-Reply-To: <4BE9EAB8.8010703@draigBrady.com>
X-Enigmail-Version: 1.0.1
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-Spam-Score: -2.9 (--)
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.11
Precedence: list
List-Id:
Subject: bug#6176: closed (Re: bug#6176: [PATCH 1/2] sort: add a --debug
option to highlight key extents)
Message-ID:
Subject: Re: bug#6176: [PATCH 1/2] sort: add a --debug option to highlight key
extents
In-Reply-To: <4BE9E0B1.5030308@draigBrady.com> (=?utf-8?Q?=22P=C3=A1draig?=
Brady"'s message of "Tue, 11 May 2010 23:56:49 +0100")
References: <4BE9E0B1.5030308@draigBrady.com>
Date: Wed, 12 May 2010 13:08:35 +0200
Message-ID: <87vdatml70.fsf@meyering.net>
Lines: 18
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-Spam-Score: -3.1 (---)
X-Debbugs-Envelope-To: 6176-done
Cc: 6176-done@debbugs.gnu.org
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.11
Precedence: list
List-Id: ) id 1OByPT-0006oN-Mk
for submit@debbugs.gnu.org; Tue, 11 May 2010 18:58:45 -0400
Received: from lists.gnu.org ([199.232.76.165]:44928)
by monty-python.gnu.org with esmtps
(TLS-1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.60)
(envelope-from ) id 1OByPP-0000VJ-Li
for submit@debbugs.gnu.org; Tue, 11 May 2010 18:58:39 -0400
Received: from [140.186.70.92] (port=33634 helo=eggs.gnu.org)
by lists.gnu.org with esmtp (Exim 4.43) id 1OByPM-0004X8-T3
for bug-coreutils@gnu.org; Tue, 11 May 2010 18:58:39 -0400
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on eggs.gnu.org
X-Spam-Level:
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00
autolearn=unavailable version=3.3.1
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.69)
(envelope-from ) id 1OByPH-0001WU-NA
for bug-coreutils@gnu.org; Tue, 11 May 2010 18:58:36 -0400
Received: from mail1.slb.deg.dub.stisp.net ([84.203.253.98]:9793)
by eggs.gnu.org with smtp (Exim 4.69)
(envelope-from ) id 1OByPH-0001Vd-4t
for bug-coreutils@gnu.org; Tue, 11 May 2010 18:58:31 -0400
Received: (qmail 68351 invoked from network); 11 May 2010 22:58:28 -0000
Received: from unknown (HELO ?192.168.2.25?) (84.203.137.218)
by mail1.slb.deg.dub.stisp.net with SMTP; 11 May 2010 22:58:28 -0000
Message-ID: <4BE9E0B1.5030308@draigBrady.com>
Date: Tue, 11 May 2010 23:56:49 +0100
From: =?UTF-8?B?UMOhZHJhaWcgQnJhZHk=?=
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US;
rv:1.9.1.8) Gecko/20100227 Thunderbird/3.0.3
MIME-Version: 1.0
To: Report bugs to
Date: Tue, 23 Feb 2010 08:43:04 +0000
Subject: [PATCH 1/2] sort: add a --debug option to highlight key extents
* src/sort (usage): Add description for --debug.
(write_bytes): Pass a line structure so it can subsequently
be passed to compare to highlight the keys when in debug mode.
Also transform TAB and NUL characters written to stdout so
that the highlighting in debug mode aligns correctly.
(human_numcompare): Pass an "endptr" so we can record the extent
of the number matched.
(general_numcompare): Likewise.
(find_unit_order): Likewise.
(getmonth): Likewise.
(numcompare): Likewise. Note we reuse find_unit_order() for this,
which is a good enough appropixmation, and means we don't need to
change the strnumcmp() interface.
(check_mixed_SI_IEC): Return whether iec_present, so that can be
used to set the "endptr" in find_unit_order. Also make the key
parameter optional, which will be the case from numcompare().
(count_tabs): A new function to determine how much to adjust
the mbswidth() values by (TABs don't have a width).
(mark_key): A new function to output the key highlighting to stdout.
(debug_key): A new function to determine the offset and width
of the key highlighting.
(key_compare): Pass the show_debug parameter so the key highlighting
is only displayed when explicitly called. For each key type, set
the length (lena) and whether leading blanks are auto skipped (skipb)
which are then used by debug_key() to highlight the portion of the
key used in the comparison.
(compare): Pass the show_debug parameter so the key highlighting
is only displayed when explicitly called. Call debug_key() to
highlight the last resort comparison.
(check): Output highlighting for disorder line to stdout.
(main): Process the --debug option and make it mutually exlusive
with the -o option as I don't see it useful there, even potentially
harmful if someone left a --debug in by mistake when updating a file.
Also restricting debug output to stdout, simplifies the logic
for dealing with temporary files.
* doc/coreutils.texi (sort invocation): Describe the --debug option,
and reference it from the --key description.
* tests/misc/sort-debug-keys: A new test for highlighting keys.
* tests/Makefile.am: Reference the new test.
* NEWS: Mention the new feature.
---
NEWS | 5 +
doc/coreutils.texi | 5 +
src/sort.c | 289 ++++++++++++++++++++++++++++++++---------
tests/Makefile.am | 1 +
tests/misc/sort-debug-keys | 317 ++++++++++++++++++++++++++++++++++++++++++++
5 files changed, 557 insertions(+), 60 deletions(-)
create mode 100755 tests/misc/sort-debug-keys
diff --git a/NEWS b/NEWS
index 070f338..4c2da67 100644
--- a/NEWS
+++ b/NEWS
@@ -2,6 +2,11 @@ GNU coreutils NEWS -*- outline -*-
* Noteworthy changes in release ?.? (????-??-??) [?]
+** New features
+
+ sort now accepts the --debug option, to highlight the part of the
+ line significant in the sort.
+
** Changes in behavior
sort -g now uses long doubles for greater range and precision.
diff --git a/doc/coreutils.texi b/doc/coreutils.texi
index c8ba53c..6714ada 100644
--- a/doc/coreutils.texi
+++ b/doc/coreutils.texi
@@ -3939,6 +3939,11 @@ multiple fields.
Example: To sort on the second field, use @option{--key=2,2}
(@option{-k 2,2}). See below for more notes on keys and more examples.
+See also the @option{--debug} option to help determine the part
+of the line being used in the sort.
+
+@item --debug
+Highlight the portion of each line used for sorting.
@item --batch-size=@var{nmerge}
@opindex --batch-size
diff --git a/src/sort.c b/src/sort.c
index 54b97e2..65866b9 100644
--- a/src/sort.c
+++ b/src/sort.c
@@ -34,6 +34,7 @@
#include "hash.h"
#include "ignore-value.h"
#include "md5.h"
+#include "mbswidth.h"
#include "physmem.h"
#include "posixver.h"
#include "quote.h"
@@ -291,6 +292,9 @@ static struct keyfield *keylist;
/* Program used to (de)compress temp files. Must accept -d. */
static char const *compress_program;
+/* Annotate the output with extra info to aid the user. */
+static bool debug;
+
/* Maximum number of files to merge in one go. If more than this
number are present, temp files will be used. */
static unsigned int nmerge = NMERGE_DEFAULT;
@@ -371,6 +375,7 @@ Other options:\n\
-C, --check=quiet, --check=silent like -c, but do not report first bad line\n\
--compress-program=PROG compress temporaries with PROG;\n\
decompress them with PROG -d\n\
+ --debug annotate the part of the line used to sort\n\
--files0-from=F read input from the files specified by\n\
NUL-terminated names in file F;\n\
If F is - then read names from standard input\n\
@@ -429,6 +434,7 @@ enum
{
CHECK_OPTION = CHAR_MAX + 1,
COMPRESS_PROGRAM_OPTION,
+ DEBUG_PROGRAM_OPTION,
FILES0_FROM_OPTION,
NMERGE_OPTION,
RANDOM_SOURCE_OPTION,
@@ -442,6 +448,7 @@ static struct option const long_options[] =
{"ignore-leading-blanks", no_argument, NULL, 'b'},
{"check", optional_argument, NULL, CHECK_OPTION},
{"compress-program", required_argument, NULL, COMPRESS_PROGRAM_OPTION},
+ {"debug", no_argument, NULL, DEBUG_PROGRAM_OPTION},
{"dictionary-order", no_argument, NULL, 'd'},
{"ignore-case", no_argument, NULL, 'f'},
{"files0-from", required_argument, NULL, FILES0_FROM_OPTION},
@@ -1111,13 +1118,6 @@ open_temp (const char *name, pid_t pid)
return fp;
}
-static void
-write_bytes (const char *buf, size_t n_bytes, FILE *fp, const char *output_file)
-{
- if (fwrite (buf, 1, n_bytes, fp) != n_bytes)
- die (_("write failed"), output_file);
-}
-
/* Append DIR to the array of temporary directory names. */
static void
add_temp_dir (char const *dir)
@@ -1734,30 +1734,19 @@ fillbuf (struct buffer *buf, FILE *fp, char const *file)
}
}
-/* Compare strings A and B as numbers without explicitly converting them to
- machine numbers. Comparatively slow for short strings, but asymptotically
- hideously fast. */
-
-static int
-numcompare (const char *a, const char *b)
-{
- while (blanks[to_uchar (*a)])
- a++;
- while (blanks[to_uchar (*b)])
- b++;
-
- return strnumcmp (a, b, decimal_point, thousands_sep);
-}
-
/* Exit with an error if a mixture of SI and IEC units detected. */
-static void
+static bool
check_mixed_SI_IEC (char prefix, struct keyfield *key)
{
int iec_present = prefix == 'i';
- if (key->iec_present != -1 && iec_present != key->iec_present)
- error (SORT_FAILURE, 0, _("both SI and IEC prefixes present on units"));
- key->iec_present = iec_present;
+ if (key)
+ {
+ if (key->iec_present != -1 && iec_present != key->iec_present)
+ error (SORT_FAILURE, 0, _("both SI and IEC prefixes present on units"));
+ key->iec_present = iec_present;
+ }
+ return iec_present;
}
/* Return an integer which represents the order of magnitude of
@@ -1766,7 +1755,7 @@ check_mixed_SI_IEC (char prefix, struct keyfield *key)
Negative numbers return a negative unit order. */
static int
-find_unit_order (const char *number, struct keyfield *key)
+find_unit_order (const char *number, struct keyfield *key, char const **endptr)
{
static const char orders [UCHAR_LIM] =
{
@@ -1822,7 +1811,13 @@ find_unit_order (const char *number, struct keyfield *key)
/* For valid units check for MiB vs MB etc. */
if (order)
- check_mixed_SI_IEC (*(p + 1), key);
+ {
+ p++;
+ p += check_mixed_SI_IEC (*p, key);
+ }
+
+ if (endptr)
+ *endptr = p;
return sign * order;
}
@@ -1833,25 +1828,50 @@ find_unit_order (const char *number, struct keyfield *key)
i.e. input will never have both 6000K and 5M. */
static int
-human_numcompare (const char *a, const char *b, struct keyfield *key)
+human_numcompare (const char *a, const char *b, struct keyfield *key,
+ char const **ea)
{
while (blanks[to_uchar (*a)])
a++;
while (blanks[to_uchar (*b)])
b++;
- int order_a = find_unit_order (a, key);
- int order_b = find_unit_order (b, key);
+ int order_a = find_unit_order (a, key, ea);
+ int order_b = find_unit_order (b, key, NULL);
return (order_a > order_b ? 1
: order_a < order_b ? -1
: strnumcmp (a, b, decimal_point, thousands_sep));
}
+/* Compare strings A and B as numbers without explicitly converting them to
+ machine numbers. Comparatively slow for short strings, but asymptotically
+ hideously fast. */
+
static int
-general_numcompare (const char *sa, const char *sb)
+numcompare (const char *a, const char *b, char const **ea)
+{
+ while (blanks[to_uchar (*a)])
+ a++;
+ while (blanks[to_uchar (*b)])
+ b++;
+
+ if (debug)
+ {
+ /* Approximate strnumcmp extents with find_unit_order. */
+ if (find_unit_order (a, NULL, ea))
+ {
+ *ea -= 1; /* ignore the order letter */
+ *ea -= (**ea == 'i'); /* and IEC prefix */
+ }
+ }
+
+ return strnumcmp (a, b, decimal_point, thousands_sep);
+}
+
+static int
+general_numcompare (const char *sa, const char *sb, char const **ea)
{
- /* FIXME: add option to warn about failed conversions. */
/* FIXME: maybe add option to try expensive FP conversion
only if A and B can't be compared more cheaply/accurately. */
@@ -1863,13 +1883,12 @@ general_numcompare (const char *sa, const char *sb)
# define strtold strtod
#endif
- char *ea;
char *eb;
- long_double a = strtold (sa, &ea);
+ long_double a = strtold (sa, (char **) ea);
long_double b = strtold (sb, &eb);
/* Put conversion errors at the start of the collating sequence. */
- if (sa == ea)
+ if (sa == *ea)
return sb == eb ? 0 : -1;
if (sb == eb)
return 1;
@@ -1889,7 +1908,7 @@ general_numcompare (const char *sa, const char *sb)
Return 0 if the name in S is not recognized. */
static int
-getmonth (char const *month, size_t len)
+getmonth (char const *month, size_t len, char const **ea)
{
size_t lo = 0;
size_t hi = MONTHS_PER_YEAR;
@@ -1913,7 +1932,11 @@ getmonth (char const *month, size_t len)
for (;; m++, n++)
{
if (!*n)
- return monthtab[ix].val;
+ {
+ if (ea)
+ *ea = m;
+ return monthtab[ix].val;
+ }
if (m == monthlim || fold_toupper[to_uchar (*m)] < to_uchar (*n))
{
hi = ix;
@@ -2070,11 +2093,76 @@ compare_version (char *restrict texta, size_t lena,
return diff;
}
+/* For debug mode, count tabs in the passed string
+ so we can adjust the widths returned by mbswidth.
+ FIXME: Should we generally be counting non printable chars? */
+
+static size_t
+count_tabs (char const *text, const size_t len)
+{
+ size_t tabs = 0;
+ size_t tlen = strnlen (text, len);
+
+ while (tlen--)
+ {
+ if (*text++ == '\t')
+ tabs++;
+ }
+
+ return tabs;
+}
+
+/* For debug mode, "underline" a key at the
+ specified offset and screen width. */
+
+static void
+mark_key (size_t offset, size_t width)
+{
+ printf ("%*s", offset, "");
+
+ if (!width)
+ printf (_("^ no match for key\n"));
+ else
+ {
+ while (width--)
+ putchar ('_');
+ putchar ('\n');
+ }
+}
+
+/* For debug mode, determine the screen offset and width
+ to highlight for a key, and then output the highlight. */
+
+static void
+debug_key (char const *sline, char const *sfield, char const *efield,
+ size_t flen, bool skipb)
+{
+ char const *sa = sfield;
+
+ if (skipb) /* This key type implicitly skips leading blanks. */
+ {
+ while (sa < efield && blanks[to_uchar (*sa)])
+ {
+ sa++;
+ if (flen)
+ flen--; /* This assumes TABs same width as SPACEs. */
+ }
+ }
+
+ size_t offset = mbsnwidth (sline, sfield - sline, 0) + (sa - sfield);
+ offset += count_tabs (sline, sfield - sline);
+
+ size_t width = mbsnwidth (sa, flen, 0);
+ width += count_tabs (sa, flen);
+
+ mark_key (offset, width);
+}
+
/* Compare two lines A and B trying every key in sequence until there
are no more keys or a difference is found. */
static int
-keycompare (const struct line *a, const struct line *b)
+keycompare (const struct line *a, const struct line *b, bool show_debug)
{
struct keyfield *key = keylist;
@@ -2091,6 +2179,7 @@ keycompare (const struct line *a, const struct line *b)
{
char const *translate = key->translate;
bool const *ignore = key->ignore;
+ bool skipb = false; /* Whether key type auto skips leading blanks. */
/* Treat field ends before field starts as empty fields. */
lima = MAX (texta, lima);
@@ -2107,21 +2196,42 @@ keycompare (const struct line *a, const struct line *b)
else if (key->numeric || key->general_numeric || key->human_numeric)
{
char savea = *lima, saveb = *limb;
+ char const* ea = lima;
*lima = *limb = '\0';
- diff = (key->numeric ? numcompare (texta, textb)
- : key->general_numeric ? general_numcompare (texta, textb)
- : human_numcompare (texta, textb, key));
+ diff = (key->numeric ? numcompare (texta, textb, &ea)
+ : key->general_numeric ? general_numcompare (texta, textb,
+ &ea)
+ : human_numcompare (texta, textb, key, &ea));
+ if (show_debug)
+ {
+ lena = ea - texta;
+ skipb = true;
+ }
*lima = savea, *limb = saveb;
}
else if (key->version)
diff = compare_version (texta, lena, textb, lenb);
else if (key->month)
- diff = getmonth (texta, lena) - getmonth (textb, lenb);
+ {
+ char const *ea = lima;
+
+ int amon = getmonth (texta, lena, &ea);
+ diff = amon - getmonth (textb, lenb, NULL);
+
+ if (show_debug)
+ {
+ lena = amon ? ea - texta : 0;
+ skipb = true;
+ }
+ }
/* Sorting like this may become slow, so in a simple locale the user
can select a faster sort that is similar to ascii sort. */
else if (hard_LC_COLLATE)
{
+ /* FIXME: for debug, account for skipped chars, while handling mb chars.
+ Generally perhaps xmemfrm could be used to determine chars that are
+ excluded from the collating order? */
if (ignore || translate)
{
char buf[4000];
@@ -2165,6 +2275,8 @@ keycompare (const struct line *a, const struct line *b)
}
else if (ignore)
{
+ char *savea = texta;
+
#define CMP_WITH_IGNORE(A, B) \
do \
{ \
@@ -2192,6 +2304,10 @@ keycompare (const struct line *a, const struct line *b)
translate[to_uchar (*textb)]);
else
CMP_WITH_IGNORE (*texta, *textb);
+
+ /* We only need to restore this for debug_key
+ in which case the keys being compared are equal. */
+ texta = savea;
}
else if (lena == 0)
diff = - NONZERO (lenb);
@@ -2201,6 +2317,8 @@ keycompare (const struct line *a, const struct line *b)
{
if (translate)
{
+ char *savea = texta;
+
while (texta < lima && textb < limb)
{
diff = (to_uchar (translate[to_uchar (*texta++)])
@@ -2208,6 +2326,10 @@ keycompare (const struct line *a, const struct line *b)
if (diff)
goto not_equal;
}
+
+ /* We only need to restore this for debug_key
+ in which case the keys being compared are equal. */
+ texta = savea;
}
else
{
@@ -2221,6 +2343,9 @@ keycompare (const struct line *a, const struct line *b)
if (diff)
goto not_equal;
+ if (show_debug)
+ debug_key (a->text, texta, lima, lena, skipb);
+
key = key->next;
if (! key)
break;
@@ -2258,7 +2383,7 @@ keycompare (const struct line *a, const struct line *b)
depending on whether A compares less than, equal to, or greater than B. */
static int
-compare (const struct line *a, const struct line *b)
+compare (const struct line *a, const struct line *b, bool show_debug)
{
int diff;
size_t alen, blen;
@@ -2268,7 +2393,7 @@ compare (const struct line *a, const struct line *b)
and unadorned sort -r. */
if (keylist)
{
- diff = keycompare (a, b);
+ diff = keycompare (a, b, show_debug);
if (diff || unique || stable)
return diff;
}
@@ -2277,6 +2402,9 @@ compare (const struct line *a, const struct line *b)
fall through to the default comparison. */
alen = a->length - 1, blen = b->length - 1;
+ if (show_debug)
+ debug_key (a->text, a->text, a->text + alen, alen, false);
+
if (alen == 0)
diff = - NONZERO (blen);
else if (blen == 0)
@@ -2289,6 +2417,38 @@ compare (const struct line *a, const struct line *b)
return reverse ? -diff : diff;
}
+static void
+write_bytes (const struct line *line, FILE *fp, char const *output_file)
+{
+ char const *buf = line->text;
+ size_t n_bytes = line->length;
+
+ /* Convert TABs to '>' and \0 to \n when -z specified. */
+ if (debug && fp == stdout)
+ {
+ char const *ebuf = buf + n_bytes;
+ char const *c = buf;
+
+ while (c < ebuf)
+ {
+ char wc = *c++;
+ if (wc == '\t')
+ wc = '>';
+ else if (wc == 0 && eolchar == 0)
+ wc = '\n';
+ if (fputc (wc, fp) == EOF)
+ die (_("write failed"), output_file);
+ }
+
+ compare (line, line, true);
+ }
+ else
+ {
+ if (fwrite (buf, 1, n_bytes, fp) != n_bytes)
+ die (_("write failed"), output_file);
+ }
+}
+
/* Check that the lines read from FILE_NAME come in order. Return
true if they are in order. If CHECKONLY == 'c', also print a
diagnostic (FILE_NAME, line number, contents of line) to stderr if
@@ -2317,7 +2477,7 @@ check (char const *file_name, char checkonly)
/* Make sure the line saved from the old buffer contents is
less than or equal to the first line of the new buffer. */
- if (alloc && nonunique <= compare (&temp, line - 1))
+ if (alloc && nonunique <= compare (&temp, line - 1, false))
{
found_disorder:
{
@@ -2330,8 +2490,10 @@ check (char const *file_name, char checkonly)
fprintf (stderr, _("%s: %s:%s: disorder: "),
program_name, file_name,
umaxtostr (disorder_line_number, hr_buf));
- write_bytes (disorder_line->text, disorder_line->length,
- stderr, _("standard error"));
+ if (debug)
+ fputc ('\n', stderr);
+ write_bytes (disorder_line, debug ? stdout : stderr,
+ debug ? _("standard out") : _("standard error"));
}
ordered = false;
@@ -2341,7 +2503,7 @@ check (char const *file_name, char checkonly)
/* Compare each line in the buffer with its successor. */
while (linebase < --line)
- if (nonunique <= compare (line, line - 1))
+ if (nonunique <= compare (line, line - 1, false))
goto found_disorder;
line_number += buf.nlines;
@@ -2470,7 +2632,7 @@ mergefps (struct sortfile *files, size_t ntemps, size_t nfiles,
for (i = 0; i < nfiles; ++i)
ord[i] = i;
for (i = 1; i < nfiles; ++i)
- if (0 < compare (cur[ord[i - 1]], cur[ord[i]]))
+ if (0 < compare (cur[ord[i - 1]], cur[ord[i]], false))
t = ord[i - 1], ord[i - 1] = ord[i], ord[i] = t, i = 0;
/* Repeatedly output the smallest line until no input remains. */
@@ -2482,10 +2644,10 @@ mergefps (struct sortfile *files, size_t ntemps, size_t nfiles,
an identical series of lines. */
if (unique)
{
- if (savedline && compare (savedline, smallest))
+ if (savedline && compare (savedline, smallest, false))
{
savedline = NULL;
- write_bytes (saved.text, saved.length, ofp, output_file);
+ write_bytes (&saved, ofp, output_file);
}
if (!savedline)
{
@@ -2514,7 +2676,7 @@ mergefps (struct sortfile *files, size_t ntemps, size_t nfiles,
}
}
else
- write_bytes (smallest->text, smallest->length, ofp, output_file);
+ write_bytes (smallest, ofp, output_file);
/* Check if we need to read more lines into core. */
if (base[ord[0]] < smallest)
@@ -2568,7 +2730,7 @@ mergefps (struct sortfile *files, size_t ntemps, size_t nfiles,
while (lo < hi)
{
- int cmp = compare (cur[ord0], cur[ord[probe]]);
+ int cmp = compare (cur[ord0], cur[ord[probe]], false);
if (cmp < 0 || (cmp == 0 && ord0 < ord[probe]))
hi = probe;
else
@@ -2589,7 +2751,7 @@ mergefps (struct sortfile *files, size_t ntemps, size_t nfiles,
if (unique && savedline)
{
- write_bytes (saved.text, saved.length, ofp, output_file);
+ write_bytes (&saved, ofp, output_file);
free (saved.text);
}
@@ -2634,7 +2796,7 @@ mergelines (struct line *t,
struct line const *hi, size_t nhi)
{
for (;;)
- if (compare (lo - 1, hi - 1) <= 0)
+ if (compare (lo - 1, hi - 1, false) <= 0)
{
*--t = *--lo;
if (! --nlo)
@@ -2676,7 +2838,7 @@ sortlines (struct line *lines, size_t nlines, struct line *temp)
{
if (nlines == 2)
{
- if (0 < compare (&lines[-1], &lines[-2]))
+ if (0 < compare (&lines[-1], &lines[-2], false))
{
struct line tmp = lines[-1];
lines[-1] = lines[-2];
@@ -2712,7 +2874,7 @@ sortlines_temp (struct line *lines, size_t nlines, struct line *temp)
/* Declare `swap' as int, not bool, to work around a bug
Cc: 6176@debbugs.gnu.org
Received: via spool by 6176-submit@debbugs.gnu.org id=B6176.127367243020331
(code B ref 6176); Wed, 12 May 2010 13:54:02 +0000
Received: (at 6176) by debbugs.gnu.org; 12 May 2010 13:53:50 +0000
Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org)
by debbugs.gnu.org with esmtp (Exim 4.69)
(envelope-from
Received: via spool by 6176-submit@debbugs.gnu.org id=B6176.127367256420415
(code B ref 6176); Wed, 12 May 2010 13:57:02 +0000
Received: (at 6176) by debbugs.gnu.org; 12 May 2010 13:56:04 +0000
Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org)
by debbugs.gnu.org with esmtp (Exim 4.69)
(envelope-from
Original-Sender: debbugs-submit-bounces@debbugs.gnu.org
Resent-To: owner@debbugs.gnu.org
Resent-CC: bug-coreutils@gnu.org
Resent-Date: Wed, 12 May 2010 14:09:01 +0000
Resent-Message-ID: ) id 1OCCcH-0005Os-UN
for 6176@debbugs.gnu.org; Wed, 12 May 2010 10:08:54 -0400
Received: (qmail 47730 invoked from network); 12 May 2010 14:08:48 -0000
Received: from unknown (HELO ?192.168.2.25?) (84.203.137.218)
by mail1.slb.deg.dub.stisp.net with SMTP; 12 May 2010 14:08:48 -0000
Message-ID: <4BEAB609.6050201@draigBrady.com>
Date: Wed, 12 May 2010 15:07:05 +0100
From: =?UTF-8?Q?P=C3=A1draig?= Brady
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US;
rv:1.9.1.8) Gecko/20100227 Thunderbird/3.0.3
MIME-Version: 1.0
References: <4BE9E0B1.5030308@draigBrady.com>
<4BE9EAB8.8010703@draigBrady.com> <4BEAB2D3.3010801@redhat.com>
<4BEAB363.1070009@redhat.com>
In-Reply-To: <4BEAB363.1070009@redhat.com>
X-Enigmail-Version: 1.0.1
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-Spam-Score: -2.9 (--)
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.11
Precedence: list
List-Id:
Original-Sender: debbugs-submit-bounces@debbugs.gnu.org
Resent-To: owner@debbugs.gnu.org
Resent-CC: bug-coreutils@gnu.org
Resent-Date: Wed, 12 May 2010 14:16:02 +0000
Resent-Message-ID: ) id 1OCCiM-0005Ro-46
for 6176@debbugs.gnu.org; Wed, 12 May 2010 10:15:12 -0400
Received: (qmail 49439 invoked from network); 12 May 2010 14:15:04 -0000
Received: from unknown (HELO ?192.168.2.25?) (84.203.137.218)
by mail1.slb.deg.dub.stisp.net with SMTP; 12 May 2010 14:15:04 -0000
Message-ID: <4BEAB781.1070007@draigBrady.com>
Date: Wed, 12 May 2010 15:13:21 +0100
From: =?UTF-8?Q?P=C3=A1draig?= Brady
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US;
rv:1.9.1.8) Gecko/20100227 Thunderbird/3.0.3
MIME-Version: 1.0
References: <4BE9E0B1.5030308@draigBrady.com> <87vdatml70.fsf@meyering.net>
In-Reply-To: <87vdatml70.fsf@meyering.net>
X-Enigmail-Version: 1.0.1
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-Spam-Score: -2.9 (--)
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.11
Precedence: list
List-Id:
Original-Sender: debbugs-submit-bounces@debbugs.gnu.org
Resent-To: owner@debbugs.gnu.org
Resent-CC: bug-coreutils@gnu.org
Resent-Date: Fri, 14 May 2010 13:13:02 +0000
Resent-Message-ID: ) id 1OCugZ-0005EL-4K
for 6176@debbugs.gnu.org; Fri, 14 May 2010 09:12:16 -0400
Received: (qmail 62465 invoked from network); 14 May 2010 13:12:11 -0000
Received: from unknown (HELO ?192.168.2.25?) (84.203.137.218)
by mail1.slb.deg.dub.stisp.net with SMTP; 14 May 2010 13:12:11 -0000
Message-ID: <4BED4BBB.4000908@draigBrady.com>
Date: Fri, 14 May 2010 14:10:19 +0100
From: =?UTF-8?Q?P=C3=A1draig?= Brady
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US;
rv:1.9.1.8) Gecko/20100227 Thunderbird/3.0.3
MIME-Version: 1.0
References: <4BE9E0B1.5030308@draigBrady.com> <4BE9EAB8.8010703@draigBrady.com>
<4BEAB2D3.3010801@redhat.com> <4BEAB363.1070009@redhat.com>
<4BEAB609.6050201@draigBrady.com>
In-Reply-To: <4BEAB609.6050201@draigBrady.com>
X-Enigmail-Version: 1.0.1
Content-Type: multipart/mixed; boundary="------------040200000409030800080602"
X-Spam-Score: -1.6 (-)
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.11
Precedence: list
List-Id:
Date: Tue, 11 May 2010 18:46:21 +0100
Subject: [PATCH] sort: --debug: output data independent key warnings
* src/sort.c (usage): Mention --debug can output warnings to stderr.
(default_key_compare): A new function refactored from main(),
and now also called from the new key_warnings() function.
(key_to_opts): A new function refactored from incompatible_options(),
and now also called from the new key_warnings() function.
(key_numeric): A new function refactored to test if key is numeric.
(key_warnings): A new function to output warnings to stderr,
about questionable use of various options. Currently it warns
about zero length keys and ineffective global options.
(incompatible_options): Refactor out key_to_opts()
(main): Use key_init() to initialize gkey. Refactor out
default_key_compare(). Call key_warnings() in debug mode.
* doc/coreutils.texi (sort invocation): Mention that warnings
are output by --debug.
* tests/misc/sort-debug-warn: A new test for debug warnings.
* tests/Makefile.am: Reference the new test.
* NEWS: Mention the new feature
---
NEWS | 2 +-
doc/coreutils.texi | 1 +
src/sort.c | 227 +++++++++++++++++++++++++++++++++-----------
tests/Makefile.am | 1 +
tests/misc/sort-debug-warn | 91 ++++++++++++++++++
5 files changed, 264 insertions(+), 58 deletions(-)
create mode 100755 tests/misc/sort-debug-warn
diff --git a/NEWS b/NEWS
index 4c2da67..5d7b81f 100644
--- a/NEWS
+++ b/NEWS
@@ -5,7 +5,7 @@ GNU coreutils NEWS -*- outline -*-
** New features
sort now accepts the --debug option, to highlight the part of the
- line significant in the sort.
+ line significant in the sort, and warn about questionable options.
** Changes in behavior
diff --git a/doc/coreutils.texi b/doc/coreutils.texi
index 6714ada..cd99bd0 100644
--- a/doc/coreutils.texi
+++ b/doc/coreutils.texi
@@ -3944,6 +3944,7 @@ of the line being used in the sort.
@item --debug
Highlight the portion of each line used for sorting.
+Also issue warnings about questionable usage to stderr.
@item --batch-size=@var{nmerge}
@opindex --batch-size
diff --git a/src/sort.c b/src/sort.c
index 38c5ed2..f2b1124 100644
--- a/src/sort.c
+++ b/src/sort.c
@@ -131,6 +131,9 @@ static bool hard_LC_COLLATE;
static bool hard_LC_TIME;
#endif
+/* Nonzero if the obsolescent key option format is used. */
+static bool obsolete_used;
+
#define NONZERO(x) ((x) != 0)
/* The kind of blanks for '-b' to skip in various options. */
@@ -375,7 +378,8 @@ Other options:\n\
-C, --check=quiet, --check=silent like -c, but do not report first bad line\n\
--compress-program=PROG compress temporaries with PROG;\n\
decompress them with PROG -d\n\
- --debug annotate the part of the line used to sort\n\
+ --debug annotate the part of the line used to sort,\n\
+ and warn about questionable usage to stderr\n\
--files0-from=F read input from the files specified by\n\
NUL-terminated names in file F;\n\
If F is - then read names from standard input\n\
@@ -2158,6 +2162,143 @@ debug_key (char const *sline, char const *sfield, char const *efield,
mark_key (offset, width);
}
+/* Testing if a key is numeric is done in various places. */
+
+static inline bool
+key_numeric (struct keyfield const *key)
+{
+ return (key->numeric || key->general_numeric || key->human_numeric);
+}
+
+/* Return whether sorting options specified for key. */
+
+static bool
+default_key_compare (struct keyfield const *key)
+{
+ return ! (key->ignore
+ || key->translate
+ || key->skipsblanks
+ || key->skipeblanks
+ || key_numeric (key)
+ || key->month
+ || key->version
+ || key->random
+ /* || key->reverse */
+ );
+}
+
+/* Convert a key to the short options used to specify it.
+ The returned string must be freed. */
+
+static char*
+key_to_opts (struct keyfield const *key)
+{
+ /* The following is too big, but guaranteed to be "big enough". */
+ char *opts = xstrdup (short_options);
+ char *p = opts;
+
+ if (key->skipsblanks || key->skipeblanks)
+ *p++ = 'b';/* either disables global -b */
+ if (key->ignore == nondictionary)
+ *p++ = 'd';
+ if (key->ignore == nonprinting)
+ *p++ = 'i';
+ if (key->translate)
+ *p++ = 'f';
+ if (key->general_numeric)
+ *p++ = 'g';
+ if (key->human_numeric)
+ *p++ = 'h';
+ if (key->month)
+ *p++ = 'M';
+ if (key->numeric)
+ *p++ = 'n';
+ if (key->random)
+ *p++ = 'R';
+ if (key->reverse)
+ *p++ = 'r';
+ if (key->version)
+ *p++ = 'V';
+ *p = '\0';
+
+ return opts;
+}
+
+/* Output data independent key warnings to stderr. */
+
+static void
+key_warnings (struct keyfield const *gkey, bool gkey_only)
+{
+ struct keyfield const *key;
+ struct keyfield ugkey = *gkey;
+ size_t keynum = 1;
+
+ if (obsolete_used)
+ error (0, 0, _("obsolescent key formats used. Consider using `-k'"));
+
+ for (key = keylist; key; key = key->next, keynum++)
+ {
+ /* Warn about field specs that will never match. */
+ if (key->sword != SIZE_MAX && key->eword < key->sword)
+ error (0, 0, _("key %zu has zero width and will be ignored"), keynum);
+
+ /* Warn about significant leading blanks. */
+ if (!gkey_only && tab == TAB_DEFAULT && !key->skipsblanks
+ && !key_numeric (key) && !key->month)
+ error (0, 0, _("leading blanks are significant in key %zu. Consider "
+ "also specifying `b'"), keynum);
+
+
+ /* Warn about numeric comparisons spanning fields,
+ as field delimiters could be interpreted as part
+ of the number (maybe only in other locales). */
+ if (!gkey_only && key_numeric (key))
+ {
+ size_t sword = key->sword + 1;
+ size_t eword = key->eword + 1;
+ if (!sword)
+ sword++;
+ if (sword != eword)
+ error (0, 0, _("key %zu is numeric and spans multiple fields"),
+ keynum);
+ }
+
+ /* Flag global options not copied or specified in any key. */
+ if (ugkey.ignore && (ugkey.ignore == key->ignore))
+ ugkey.ignore = NULL;
+ if (ugkey.translate && (ugkey.translate == key->translate))
+ ugkey.translate = NULL;
+ ugkey.skipsblanks &= !key->skipsblanks;
+ ugkey.skipeblanks &= !key->skipeblanks;
+ ugkey.month &= !key->month;
+ ugkey.numeric &= !key->numeric;
+ ugkey.general_numeric &= !key->general_numeric;
+ ugkey.human_numeric &= !key->human_numeric;
+ ugkey.random &= !key->random;
+ ugkey.version &= !key->version;
+ ugkey.reverse &= !key->reverse;
+ }
+
+ /* Warn about ignored global options flagged above.
+ Note if gkey is the only one in the list, all flags are cleared. */
+ if (!default_key_compare (&ugkey)
+ || (ugkey.reverse && (stable || unique) && keylist))
+ {
+ bool ugkey_reverse = ugkey.reverse;
+ if (!(stable || unique))
+ ugkey.reverse = false;
+ char *opts = key_to_opts (&ugkey);
+ error (0, 0,
+ ngettext ("option `-%s' is ignored",
+ "options `-%s' are ignored",
+ select_plural (strlen (opts))), opts);
+ free (opts);
+ ugkey.reverse = ugkey_reverse;
+ }
+ if (ugkey.reverse && !(stable || unique) && keylist)
+ error (0, 0, _("option `-r' only applies to last-resort comparison"));
+}
+
/* Compare two lines A and B trying every key in sequence until there
are no more keys or a difference is found. */
@@ -2193,7 +2334,7 @@ keycompare (const struct line *a, const struct line *b, bool show_debug)
if (key->random)
diff = compare_random (texta, lena, textb, lenb);
- else if (key->numeric || key->general_numeric || key->human_numeric)
+ else if (key_numeric (key))
{
char savea = *lima, saveb = *limb;
char const* ea = lima;
@@ -3235,36 +3376,18 @@ incompatible_options (char const *opts)
static void
check_ordering_compatibility (void)
{
- struct keyfield const *key;
+ struct keyfield *key;
for (key = keylist; key; key = key->next)
- if ((1 < (key->random + key->numeric + key->general_numeric + key->month
- + key->version + !!key->ignore + key->human_numeric))
+ if ((1 < (key->random + key_numeric (key) + key->month + key->version
+ + !!key->ignore))
|| (key->random && key->translate))
{
- /* The following is too big, but guaranteed to be "big enough". */
- char opts[sizeof short_options];
- char *p = opts;
- if (key->ignore == nondictionary)
- *p++ = 'd';
- if (key->translate)
- *p++ = 'f';
- if (key->general_numeric)
- *p++ = 'g';
- if (key->human_numeric)
- *p++ = 'h';
- if (key->ignore == nonprinting)
- *p++ = 'i';
- if (key->month)
- *p++ = 'M';
- if (key->numeric)
- *p++ = 'n';
- if (key->version)
- *p++ = 'V';
- if (key->random)
- *p++ = 'R';
- *p = '\0';
+ /* Clear flags we're not interested in. */
+ key->skipsblanks = key->skipeblanks = key->reverse = false;
+ char *opts = key_to_opts (key);
incompatible_options (opts);
+ free (opts);
}
}
@@ -3391,6 +3514,7 @@ main (int argc, char **argv)
struct keyfield *key;
struct keyfield key_buf;
struct keyfield gkey;
+ bool gkey_only = false;
char const *s;
int c = 0;
char checkonly = 0;
@@ -3494,14 +3618,8 @@ main (int argc, char **argv)
/* The signal mask is known, so it is safe to invoke exit_cleanup. */
atexit (exit_cleanup);
- gkey.sword = gkey.eword = SIZE_MAX;
- gkey.ignore = NULL;
- gkey.translate = NULL;
- gkey.numeric = gkey.general_numeric = gkey.human_numeric = false;
- gkey.iec_present = -1;
- gkey.random = gkey.version = false;
- gkey.month = gkey.reverse = false;
- gkey.skipsblanks = gkey.skipeblanks = false;
+ key_init (&gkey);
+ gkey.sword = SIZE_MAX;
files = xnmalloc (argc, sizeof *files);
@@ -3574,6 +3692,7 @@ main (int argc, char **argv)
N_("stray character in field spec"));
}
insertkey (key);
+ obsolete_used = true;
}
}
}
@@ -3836,17 +3955,7 @@ main (int argc, char **argv)
/* Inheritance of global options to individual keys. */
for (key = keylist; key; key = key->next)
{
- if (! (key->ignore
- || key->translate
- || (key->skipsblanks
- || key->reverse
- || key->skipeblanks
- || key->month
- || key->numeric
- || key->version
- || key->general_numeric
- || key->human_numeric
- || key->random)))
+ if (default_key_compare (key) && !key->reverse)
{
key->ignore = gkey.ignore;
key->translate = gkey.translate;
@@ -3856,25 +3965,17 @@ main (int argc, char **argv)
key->numeric = gkey.numeric;
key->general_numeric = gkey.general_numeric;
key->human_numeric = gkey.human_numeric;
+ key->version = gkey.version;
key->random = gkey.random;
key->reverse = gkey.reverse;
- key->version = gkey.version;
}
need_random |= key->random;
}
- if (!keylist && (gkey.ignore
- || gkey.translate
- || (gkey.skipsblanks
- || gkey.skipeblanks
- || gkey.month
- || gkey.numeric
- || gkey.general_numeric
- || gkey.human_numeric
- || gkey.random
- || gkey.version)))
+ if (!keylist && !default_key_compare (&gkey))
{
+ gkey_only = true;
insertkey (&gkey);
need_random |= gkey.random;
}
@@ -3884,6 +3985,18 @@ main (int argc, char **argv)
if (debug && outfile)
error (SORT_FAILURE, 0, _("options -o and --debug are incompatible"));
+ if (debug)
+ {
+ /* Always output the locale in debug mode, since this
+ is such a common source of confusion. */
+ if (hard_LC_COLLATE)
+ error (0, 0, _("using %s sorting rules"),
+ quote (setlocale (LC_COLLATE, NULL)));
+ else
+ error (0, 0, _("using simple byte comparison"));
+ key_warnings (&gkey, gkey_only);
+ }
+
reverse = gkey.reverse;
if (need_random)
diff --git a/tests/Makefile.am b/tests/Makefile.am
index 46d388a..a732c63 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -225,6 +225,7 @@ TESTS = \
misc/sort-compress \
misc/sort-continue \
misc/sort-debug-keys \
+ misc/sort-debug-warn \
misc/sort-files0-from \
misc/sort-float \
misc/sort-merge \
diff --git a/tests/misc/sort-debug-warn b/tests/misc/sort-debug-warn
new file mode 100755
index 0000000..6da3d9b
--- /dev/null
+++ b/tests/misc/sort-debug-warn
@@ -0,0 +1,91 @@
+#!/bin/sh
+# Test warnings for sort options
+
+# Copyright (C) 2010 Free Software Foundation, Inc.
+
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with this program. If not, see
Cc: 6176@debbugs.gnu.org
Received: via spool by 6176-submit@debbugs.gnu.org id=B6176.127384980323273
(code B ref 6176); Fri, 14 May 2010 15:11:01 +0000
Received: (at 6176) by debbugs.gnu.org; 14 May 2010 15:10:03 +0000
Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org)
by debbugs.gnu.org with esmtp (Exim 4.69)
(envelope-from
Original-Sender: debbugs-submit-bounces@debbugs.gnu.org
Resent-To: owner@debbugs.gnu.org
Resent-CC: bug-coreutils@gnu.org
Resent-Date: Fri, 14 May 2010 15:55:01 +0000
Resent-Message-ID: ) id 1OCxDC-0006NK-5U
for 6176@debbugs.gnu.org; Fri, 14 May 2010 11:54:06 -0400
Received: (qmail 93816 invoked from network); 14 May 2010 15:54:03 -0000
Received: from unknown (HELO ?192.168.2.25?) (84.203.137.218)
by mail1.slb.deg.dub.stisp.net with SMTP; 14 May 2010 15:54:03 -0000
Message-ID: <4BED71AA.8010305@draigBrady.com>
Date: Fri, 14 May 2010 16:52:10 +0100
From: =?UTF-8?Q?P=C3=A1draig?= Brady
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US;
rv:1.9.1.8) Gecko/20100227 Thunderbird/3.0.3
MIME-Version: 1.0
References: <4BE9E0B1.5030308@draigBrady.com> <4BE9EAB8.8010703@draigBrady.com> <4BEAB2D3.3010801@redhat.com> <4BEAB363.1070009@redhat.com> <4BEAB609.6050201@draigBrady.com>
<4BED4BBB.4000908@draigBrady.com> <4BED67BA.3010605@redhat.com>
In-Reply-To: <4BED67BA.3010605@redhat.com>
X-Enigmail-Version: 1.0.1
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-Spam-Score: -2.8 (--)
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.11
Precedence: list
List-Id:
Cc: 6176@debbugs.gnu.org
Received: via spool by 6176-submit@debbugs.gnu.org id=B6176.12738721973937
(code B ref 6176); Fri, 14 May 2010 21:24:02 +0000
Received: (at 6176) by debbugs.gnu.org; 14 May 2010 21:23:17 +0000
Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org)
by debbugs.gnu.org with esmtp (Exim 4.69)
(envelope-from
Original-Sender: debbugs-submit-bounces@debbugs.gnu.org
Resent-To: owner@debbugs.gnu.org
Resent-CC: bug-coreutils@gnu.org
Resent-Date: Fri, 14 May 2010 21:50:02 +0000
Resent-Message-ID: ) id 1OD2lF-0001Bp-4L
for 6176@debbugs.gnu.org; Fri, 14 May 2010 17:49:37 -0400
Received: (qmail 46885 invoked from network); 14 May 2010 21:49:34 -0000
Received: from unknown (HELO ?192.168.2.25?) (84.203.137.218)
by mail1.slb.deg.dub.stisp.net with SMTP; 14 May 2010 21:49:34 -0000
Message-ID: <4BEDC4FD.8080707@draigBrady.com>
Date: Fri, 14 May 2010 22:47:41 +0100
From: =?UTF-8?Q?P=C3=A1draig?= Brady
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US;
rv:1.9.1.8) Gecko/20100227 Thunderbird/3.0.3
MIME-Version: 1.0
References: <4BE9E0B1.5030308@draigBrady.com> <4BE9EAB8.8010703@draigBrady.com> <4BEAB2D3.3010801@redhat.com> <4BEAB363.1070009@redhat.com> <4BEAB609.6050201@draigBrady.com>
<4BED4BBB.4000908@draigBrady.com> <4BEDBF40.60407@cs.ucla.edu>
In-Reply-To: <4BEDBF40.60407@cs.ucla.edu>
X-Enigmail-Version: 1.0.1
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-Spam-Score: -2.8 (--)
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.11
Precedence: list
List-Id:
Original-Sender: debbugs-submit-bounces@debbugs.gnu.org
Resent-To: owner@debbugs.gnu.org
Resent-CC: bug-coreutils@gnu.org
Resent-Date: Sat, 15 May 2010 00:29:02 +0000
Resent-Message-ID: ) id 1OD5En-0002Ch-4y
for 6176@debbugs.gnu.org; Fri, 14 May 2010 20:28:17 -0400
Received: (qmail 65031 invoked from network); 15 May 2010 00:28:15 -0000
Received: from unknown (HELO ?192.168.2.25?) (84.203.137.218)
by mail1.slb.deg.dub.stisp.net with SMTP; 15 May 2010 00:28:15 -0000
Message-ID: <4BEDEA2D.5040102@draigBrady.com>
Date: Sat, 15 May 2010 01:26:21 +0100
From: =?UTF-8?Q?P=C3=A1draig?= Brady
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US;
rv:1.9.1.8) Gecko/20100227 Thunderbird/3.0.3
MIME-Version: 1.0
References: <4BE9E0B1.5030308@draigBrady.com> <4BE9EAB8.8010703@draigBrady.com> <4BEAB2D3.3010801@redhat.com> <4BEAB363.1070009@redhat.com> <4BEAB609.6050201@draigBrady.com> <4BED4BBB.4000908@draigBrady.com>
<4BEDBF40.60407@cs.ucla.edu> <4BEDC4FD.8080707@draigBrady.com>
In-Reply-To: <4BEDC4FD.8080707@draigBrady.com>
X-Enigmail-Version: 1.0.1
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-Spam-Score: -1.9 (-)
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.11
Precedence: list
List-Id:
Original-Sender: debbugs-submit-bounces@debbugs.gnu.org
Resent-To: owner@debbugs.gnu.org
Resent-CC: bug-coreutils@gnu.org
Resent-Date: Sat, 22 May 2010 13:24:02 +0000
Resent-Message-ID: ) id 1OFogH-0003k9-39
for 6176@debbugs.gnu.org; Sat, 22 May 2010 09:23:57 -0400
Received: (qmail 37905 invoked from network); 22 May 2010 13:23:51 -0000
Received: from unknown (HELO ?192.168.2.25?) (84.203.137.218)
by mail1.slb.deg.dub.stisp.net with SMTP; 22 May 2010 13:23:51 -0000
Message-ID: <4BF7DA50.5020003@draigBrady.com>
Date: Sat, 22 May 2010 14:21:20 +0100
From: =?UTF-8?Q?P=C3=A1draig?= Brady
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US;
rv:1.9.1.8) Gecko/20100227 Thunderbird/3.0.3
MIME-Version: 1.0
References: <4BE9E0B1.5030308@draigBrady.com> <4BE9EAB8.8010703@draigBrady.com> <4BEAB2D3.3010801@redhat.com> <4BEAB363.1070009@redhat.com> <4BEAB609.6050201@draigBrady.com> <4BED4BBB.4000908@draigBrady.com> <4BEDBF40.60407@cs.ucla.edu>
<4BEDC4FD.8080707@draigBrady.com> <4BEDEA2D.5040102@draigBrady.com>
In-Reply-To: <4BEDEA2D.5040102@draigBrady.com>
X-Enigmail-Version: 1.0.1
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-Spam-Score: -1.6 (-)
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.11
Precedence: list
List-Id: