Package: coreutils;
Reported by: Jim Meyering <jim <at> meyering.net>
Date: Tue, 27 Sep 2011 14:50:02 UTC
Severity: normal
Done: Jim Meyering <jim <at> meyering.net>
Bug is archived. No further changes may be made.
View this message in rfc822 format
From: help-debbugs <at> gnu.org (GNU bug Tracking System) To: Jim Meyering <jim <at> meyering.net> Cc: tracker <at> debbugs.gnu.org Subject: bug#9612: closed (sort: avoid a NaN-induced infloop) Date: Thu, 06 Oct 2011 14:58:02 +0000
[Message part 1 (text/plain, inline)]
Your message dated Thu, 06 Oct 2011 16:56:57 +0200 with message-id <87k48iyxqe.fsf <at> rho.meyering.net> and subject line Re: bug#9612: sort: avoid a NaN-induced infloop has caused the debbugs.gnu.org bug report #9612, regarding sort: avoid a NaN-induced infloop to be marked as done. (If you believe you have received this mail in error, please contact help-debbugs <at> gnu.org.) -- 9612: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=9612 GNU Bug Tracking System Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
From: Jim Meyering <jim <at> meyering.net> To: bug-coreutils <at> gnu.org Subject: sort: avoid a NaN-induced infloop Date: Tue, 27 Sep 2011 16:48:45 +0200This was reported by Aaron Denney in http://bugs.debian.org/642557. Who would have thought that including a few NaNs in the input to sort would make it infloop. The original failure arose only when sort was reading from a pipe: yes -- -nan | head -156903 | sort -g > /dev/null But it's not always easy to reproduce. For example, on a uni-core Debian unstable amd64 system, the bug strikes every time with the distro-supplied binary, but not at all on a fedora rawhide-supplied binary in a multi-core VM. However, when I compile coreutils-8.13 myself on that same rawhide system, *that* sort fails every time. Setting up to debug was a little tricky. In order to invoke sort-reading-from-pipe from gdb, I had to use a fifo: mkfifo sort-test-pipe yes -- -nan | head -156903 | sort-test-pipe & Then, with gdb, you can do this: gdb sort run -g < sort-test-pipe Debugging it, I was surprised to see the infinite loop iterating through these "*"-marked lines, with nfiles == 2 and always resetting i = 0: (gdb) l 2893 Since this only reorders two items if one is strictly greater than 2894 the other, it is stable. */ 2895 for (i = 0; i < nfiles; ++i) 2896 ord[i] = i; * 2897 for (i = 1; i < nfiles; ++i) * 2898 if (0 < compare (cur[ord[i - 1]], cur[ord[i]])) * 2899 t = ord[i - 1], ord[i - 1] = ord[i], ord[i] = t, i = 0; 2900 2901 /* Repeatedly output the smallest line until no input remains. */ 2902 while (nfiles) (gdb) p ord[i-1] $6 = 0 (gdb) p ord[i] $7 = 1 (gdb) p compare (cur[0], cur[1]) $8 = 64 (gdb) p compare (cur[1], cur[0]) $9 = 64 That shows cur[0] > cur[1], and then, on the very next line a contradiction, cur[1] > cur[0] But it's not the classic NaN problem at all. That would be too easy ;-) That code already handles NaNs (via the general_numcompare function), detecting the unusual case and comparing NaNs with memcmp. Here's most of the general_numcompare function: static int general_numcompare (char const *sa, char const *sb) { char *ea; char *eb; long_double a = strtold (sa, &ea); long_double b = strtold (sb, &eb); ... /* Sort numbers in the usual way, where -0 == +0. Put NaNs after conversion errors but before numbers; sort them by internal bit-pattern, for lack of a more portable alternative. */ return (a < b ? -1 : a > b ? 1 : a == b ? 0 : b == b ? -1 : a == a ? 1 : memcmp (&a, &b, sizeof a)); } What could cause such bogosity? Here's the quick answer: The statement "long double x = NAN;" (inside glibc's strtold) leaves many bits of "x" uninitialized. For most uses of a NaN, that doesn't matter, but here, since we're actually trying to order the different NaNs, it makes all of the difference. [ I'm about to file the glibc bug -- or maybe it's a bug in the definition of gcc's __builtin_nan. Haven't looked yet. ] What happens when we infloop is that the surrounding code always happens to ensure that differing bits are 0x40 in the left operand and 0x00 in the right operand to memcmp. Then you will see precisely the above behavior. One "fix" is to return 0 rather than bothering with the memcmp call above, thus treating all NaNs as equal, rather than attempting to order them based on their internal bit patterns. But that is undesirable since it would render sort's output indeterminate whenever it contains different types of NaN values. E.g., NaN, NaN(1), NaN(2), etc. all result in different bit patterns. For example, currently, this works as you'd expect on a system with a NaN(...)-aware strtold, like anything glibc-based: $ for i in $(seq 9); do echo "NaN($i)";done|sort -gr NaN(9) NaN(8) NaN(7) NaN(6) NaN(5) NaN(4) NaN(3) NaN(2) NaN(1) If we were to remove the memcmp, those would not be sorted. ---------------------------------------- A kludgey fix that serves to demonstrate the problem would be to replace the two strtold invocations with these: long_double a = 0; a = strtold (sa, &ea); long_double b = 0; b = strtold (sb, &eb); That does the job at least when compiling with gcc -ggdb3, but minimal optimization could well eliminate the dead stores. ---------------------------------------- The real solution is to change glibc's strtold so that its "long double" result contains no uninitialized bit. In the shorter term, I'd like a gnulib strtold module that works around the problem. However, gnulib's strtod.c simply ignores the digits in NaN(DDDDD), while glibc's implementation actually uses them. That should be easy to fix, right? Just merge in the glibc changes? But no, they rely on ieee754.h's "union ieee854_long_double", and that is not portable. So I have resorted to an even shorter-term work-around in the patch below. =-================================= Consequences: whenever qsort's comparison function "lies" (i.e., returns 1 for a < b, -1 for a > b, or 0 for a != b, qsort's behavior is undefined (i.e., it may well segfault). GNU sort(1) uses qsort only incidentally, to sort month names, and nowhere else, but now, you've seen that sort(1) had a similar problem when its comparison function is inconsistent. -------------------------------------------------- BTW, once I found the problem code, I realized I could provoke the infloop with two one-line inputs using sort's --merge option: echo nan > 1; sort -m -g 1 1 -------------------------------- Here's a little program to demonstrate the problem. Note how the results differ with compilation options: #include <stdlib.h> #include <stdio.h> static char * fmt_nan (long double x) { unsigned int i; static char buf[33]; unsigned char const *p = (unsigned char const *) &x; for (i = 0; i < sizeof x; i++) sprintf (buf + 2*i, "%02x", *p++); return buf; } int main () { const char *q = "nan"; long double x = strtold (q, NULL); printf ("%s\n", fmt_nan (x)); x = 0; x = strtold (q, NULL); printf ("%s\n", fmt_nan (x)); return 0; } $ gcc -O0 -Wall -Wextra -W /t/strtold-bogosity.c && ./a.out 00000000000000c0ff7f400000000000 00000000000000c0ff7f000000000000 $ gcc -O1 -Wall -Wextra -W /t/strtold-bogosity.c && ./a.out 00000000000000c0ff7f000000000000 00000000000000c0ff7f000000000000 ------------------------------------------------------- And here's the patch I expect to apply for now: From 17f70721d10f02543ae9e2d4e2c4b2babe0206a7 Mon Sep 17 00:00:00 2001 From: Jim Meyering <meyering <at> redhat.com> Date: Tue, 27 Sep 2011 16:32:35 +0200 Subject: [PATCH] sort: avoid a NaN-induced infloop These commands would fail to terminate: yes -- -nan | head -156903 | sort -g > /dev/null echo nan > F; sort -m -g F F That can happen with any strtold implementation that includes uninitialized data in its return value. The problem arises in the mergefps function when bubble-sorting the two or more lines, each from one of the input streams being merged: compare(a,b) returns 64, yet compare(b,a) also returns a positive value. With a broken comparison function like that, the bubble sort never terminates. Why do the long-double bit strings corresponding to two identical "nan" strings not compare equal? Because some parts of the result are uninitialized and thus depend on the state of the stack. For more details, see http://bugs.gnu.org/FIXME. * src/sort.c (nan_compare): New function. (general_numcompare): Use it rather than bare memcmp. Reported by Aaron Denney in http://bugs.debian.org/642557. * NEWS (Bug fixes): Mention it. * tests/misc/sort-NaN-infloop: New file. * tests/Makefile.am (TESTS): Add it. --- NEWS | 4 ++++ src/sort.c | 20 +++++++++++++++++++- tests/Makefile.am | 1 + tests/misc/sort-NaN-infloop | 28 ++++++++++++++++++++++++++++ 4 files changed, 52 insertions(+), 1 deletions(-) create mode 100755 tests/misc/sort-NaN-infloop diff --git a/NEWS b/NEWS index 140e6fa..f05a088 100644 --- a/NEWS +++ b/NEWS @@ -2,6 +2,10 @@ GNU coreutils NEWS -*- outline -*- * Noteworthy changes in release ?.? (????-??-??) [?] +** Bug fixes + + sort -g no longer infloops for certain inputs containing NaNs + ** Improvements md5sum --check now supports the -r format from the corresponding BSD tool. diff --git a/src/sort.c b/src/sort.c index 3d3119d..3e94a6e 100644 --- a/src/sort.c +++ b/src/sort.c @@ -1910,6 +1910,24 @@ numcompare (char const *a, char const *b) return strnumcmp (a, b, decimal_point, thousands_sep); } +/* Work around a problem whereby the long double value returned by glibc's + strtold ("NaN", ...) contains uninitialized bits: clear all bytes of + A and B before calling strtold. FIXME: remove this function once + gnulib guarantees that strtold's result is always well defined. */ +static int +nan_compare (char const *sa, char const *sb) +{ + long_double a; + memset (&a, 0, sizeof a); + a = strtold (sa, NULL); + + long_double b; + memset (&b, 0, sizeof b); + b = strtold (sb, NULL); + + return memcmp (&a, &b, sizeof a); +} + static int general_numcompare (char const *sa, char const *sb) { @@ -1935,7 +1953,7 @@ general_numcompare (char const *sa, char const *sb) : a == b ? 0 : b == b ? -1 : a == a ? 1 - : memcmp (&a, &b, sizeof a)); + : nan_compare (sa, sb)); } /* Return an integer in 1..12 of the month name MONTH. diff --git a/tests/Makefile.am b/tests/Makefile.am index eeb4cab..2cf409a 100644 --- a/tests/Makefile.am +++ b/tests/Makefile.am @@ -250,6 +250,7 @@ TESTS = \ misc/sort-unique \ misc/sort-unique-segv \ misc/sort-version \ + misc/sort-NaN-infloop \ split/filter \ split/suffix-length \ split/b-chunk \ diff --git a/tests/misc/sort-NaN-infloop b/tests/misc/sort-NaN-infloop new file mode 100755 index 0000000..ead871e --- /dev/null +++ b/tests/misc/sort-NaN-infloop @@ -0,0 +1,28 @@ +#!/bin/sh +# exercise the NaN-infloop bug in coreutils-8.13 + +# Copyright (C) 2011 Free Software Foundation, Inc. + +# This program is free software: you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation, either version 3 of the License, or +# (at your option) any later version. + +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. + +# You should have received a copy of the GNU General Public License +# along with this program. If not, see <http://www.gnu.org/licenses/>. + +. "${srcdir=.}/init.sh"; path_prepend_ ../src +print_ver_ sort + +echo nan > F || fail=1 +printf 'nan\nnan\n' > exp || fail=1 +timeout 10 sort -g -m F F > out || fail=1 + +compare out exp || fail=1 + +Exit $fail -- 1.7.7.rc0.362.g5a14
[Message part 3 (message/rfc822, inline)]
From: Jim Meyering <jim <at> meyering.net> To: 9612-done <at> debbugs.gnu.org Subject: Re: bug#9612: sort: avoid a NaN-induced infloop Date: Thu, 06 Oct 2011 16:56:57 +0200Jim Meyering wrote: > Jim Meyering wrote: >> Who would have thought that including a few NaNs in the input to sort >> would make it infloop. >> >> The original failure arose only when sort was reading from a pipe: >> >> yes -- -nan | head -156903 | sort -g > /dev/null >> >> But it's not always easy to reproduce. > ... >> [ I'm about to file the glibc bug > ... > > I've done that: > > RFE: strtold: do not include uninitialized bytes when converting "NaN" > http://sourceware.org/bugzilla/show_bug.cgi?id=13246 In the above BZ comments, Jakub Jelinek proposed something more robust, but I think it's not necessary. If we find an optimizer that thinks the added memset is useless, we'll discover it via the failing test, so I'd prefer not to add the volatile/union hack. I'm closing this.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.