GNU bug report logs -
#6643
[PATCH] sort: fix a bug with sort -u and xmemcoll0, and tune keycompare
Previous Next
Reported by: Paul Eggert <eggert <at> CS.UCLA.EDU>
Date: Thu, 15 Jul 2010 23:34:02 UTC
Severity: normal
Tags: patch
Done: Jim Meyering <jim <at> meyering.net>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 6643 in the body.
You can then email your comments to 6643 AT debbugs.gnu.org in the normal way.
Toggle the display of automated, internal messages from the tracker.
Report forwarded
to
owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org
:
bug#6643
; Package
coreutils
.
(Thu, 15 Jul 2010 23:34:02 GMT)
Full text and
rfc822 format available.
Acknowledgement sent
to
Paul Eggert <eggert <at> CS.UCLA.EDU>
:
New bug report received and forwarded. Copy sent to
bug-coreutils <at> gnu.org
.
(Thu, 15 Jul 2010 23:34:02 GMT)
Full text and
rfc822 format available.
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
Well, after my brave words about xmemcoll0 unlikely to be misused in 'sort'
I found a misuse of it. With sort -u, write_bytes replaces the trailing NUL
with a newline but it doesn't change it back after writing. In some cases
"sort -u" will output a line and then later use it as an argument for
comparison, which will mess up if the trailing NUL has been replaced.
I pushed the following patch to work around the problem.
Chen, can you please verify that "sort -u" does not access the same
line from multiple threads, even saved lines? If it does, then even this patch
is not enough: we would need to alter write_bytes so that it does not
modify its argument at all.
This patch also tunes keycompare slightly. I should have
broken it up into a different patch, but it sort of snuck in.
By the way, any objection if I put 'const' after the types that it
qualifies, e.g., "char const *" rather than "const char *"? That
was the usual style here.
Thanks.
From 5a31febb1b363d9a3a59ed60d5f2051d520dd407 Mon Sep 17 00:00:00 2001
From: Paul R. Eggert <eggert <at> cs.ucla.edu>
Date: Thu, 15 Jul 2010 16:24:03 -0700
Subject: [PATCH] sort: fix a bug with sort -u and xmemcoll0, and tune keycompare
* src/sort.c (keycompare): Use xmemcoll0, as it avoids
a couple of stores.
(write_bytes): Leave the buffer the way we found it,
as it might be used again for a later comparison,
if -u is used.
---
src/sort.c | 12 +++++++-----
1 files changed, 7 insertions(+), 5 deletions(-)
diff --git a/src/sort.c b/src/sort.c
index 7d31878..960df74 100644
--- a/src/sort.c
+++ b/src/sort.c
@@ -2497,7 +2497,9 @@ keycompare (const struct line *a, const struct line *b, bool show_debug)
}
}
- diff = xmemcoll (copy_a, new_len_a, copy_b, new_len_b);
+ copy_a[new_len_a++] = '\0';
+ copy_b[new_len_b++] = '\0';
+ diff = xmemcoll0 (copy_a, new_len_a, copy_b, new_len_b);
if (sizeof buf < size)
free (copy_a);
@@ -2664,13 +2666,11 @@ write_bytes (struct line const *line, FILE *fp, char const *output_file)
{
char *buf = line->text;
size_t n_bytes = line->length;
-
- *(buf + n_bytes - 1) = eolchar;
+ char *ebuf = buf + n_bytes;
/* Convert TABs to '>' and \0 to \n when -z specified. */
if (debug && fp == stdout)
{
- char const *ebuf = buf + n_bytes;
char const *c = buf;
while (c < ebuf)
@@ -2678,7 +2678,7 @@ write_bytes (struct line const *line, FILE *fp, char const *output_file)
char wc = *c++;
if (wc == '\t')
wc = '>';
- else if (wc == 0 && eolchar == 0)
+ else if (c == ebuf)
wc = '\n';
if (fputc (wc, fp) == EOF)
die (_("write failed"), output_file);
@@ -2688,8 +2688,10 @@ write_bytes (struct line const *line, FILE *fp, char const *output_file)
}
else
{
+ ebuf[-1] = eolchar;
if (fwrite (buf, 1, n_bytes, fp) != n_bytes)
die (_("write failed"), output_file);
+ ebuf[-1] = '\0';
}
}
--
1.7.1
Information forwarded
to
owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org
:
bug#6643
; Package
coreutils
.
(Fri, 16 Jul 2010 00:06:01 GMT)
Full text and
rfc822 format available.
Message #8 received at 6643 <at> debbugs.gnu.org (full text, mbox):
Hi Professor Eggert,
> Chen, can you please verify that "sort -u" does not access the same
> line from multiple threads, even saved lines? If it does, then even this patch
> is not enough: we would need to alter write_bytes so that it does not
> modify its argument at all.
Off the top of my head, no line is ever accessed by multiple threads
at the same time.
I'll go through the code tonight when I get home to doubly make sure.
Information forwarded
to
owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org
:
bug#6643
; Package
coreutils
.
(Fri, 16 Jul 2010 00:48:01 GMT)
Full text and
rfc822 format available.
Message #11 received at 6643 <at> debbugs.gnu.org (full text, mbox):
On 16/07/10 00:33, Paul Eggert wrote:
> Well, after my brave words about xmemcoll0 unlikely to be misused in 'sort'
> I found a misuse of it. With sort -u, write_bytes replaces the trailing NUL
> with a newline but it doesn't change it back after writing. In some cases
> "sort -u" will output a line and then later use it as an argument for
> comparison, which will mess up if the trailing NUL has been replaced.
> I pushed the following patch to work around the problem.
That looks correct.
Sorry for not spotting it.
I had intended to look at the -u paths but didn't
when I considered that all the -u tests had passed.
But I now realize that they're run under LC_ALL=C
I'll add a few basic tests I think to run under $mb_locale
> Chen, can you please verify that "sort -u" does not access the same
> line from multiple threads, even saved lines? If it does, then even this patch
> is not enough: we would need to alter write_bytes so that it does not
> modify its argument at all.
>
> This patch also tunes keycompare slightly. I should have
> broken it up into a different patch, but it sort of snuck in.
I didn't do that because then you're just doing what xmemcoll()
is doing anyway. I.E. the other xmemcoll0() is more cache
efficient because it doesn't need to write anything in the
normal case for each comparison.
> By the way, any objection if I put 'const' after the types that it
> qualifies, e.g., "char const *" rather than "const char *"? That
> was the usual style here.
Yep that's fine.
cheers,
Pádraig.
Information forwarded
to
owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org
:
bug#6643
; Package
coreutils
.
(Fri, 16 Jul 2010 01:24:02 GMT)
Full text and
rfc822 format available.
Message #14 received at 6643 <at> debbugs.gnu.org (full text, mbox):
On 07/15/10 17:46, Pádraig Brady wrote:
> I'll add a few basic tests I think to run under $mb_locale
Thanks. Jim also reminded me that I should add at least one test for
this bug, so I just pushed the one enclosed below. The test is
locale-dependent and is not guaranteed to catch the bug on every host,
but it did catch the bug in my RHEL 5 build host and that's good
enough.
>> This patch also tunes keycompare slightly. I should have
>> broken it up into a different patch, but it sort of snuck in.
>
> I didn't do that because then you're just doing what xmemcoll()
> is doing anyway. I.E. the other xmemcoll0() is more cache
> efficient because it doesn't need to write anything in the
> normal case for each comparison.
It's still a win in terms of number of memory access, no? as it avoids
useless saves and restores of the junk byte after the lines. Also,
with my pedantic hat on, it avoids access to uninitialized storage
(that junk byte) and thus avoids undefined behavior (though this isn't
likely to be a problem on any practical non-debugging environment).
Anywhere, here's the further patch.
From 5ab0257f7df3658ee1ff01172b2b2ed307432155 Mon Sep 17 00:00:00 2001
From: Paul R. Eggert <eggert <at> cs.ucla.edu>
Date: Thu, 15 Jul 2010 18:13:08 -0700
Subject: [PATCH] sort: add a test case for the sort -u bug
* tests/Makefile.am (TESTS): Add misc/sort-unique.
* tests/misc/sort-unique: New file.
---
tests/Makefile.am | 1 +
tests/misc/sort-unique | 46 ++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 47 insertions(+), 0 deletions(-)
create mode 100755 tests/misc/sort-unique
diff --git a/tests/Makefile.am b/tests/Makefile.am
index fccd000..4aa93bf 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -234,6 +234,7 @@ TESTS = \
misc/sort-merge-fdlimit \
misc/sort-month \
misc/sort-rand \
+ misc/sort-unique \
misc/sort-version \
misc/split-a \
misc/split-fail \
diff --git a/tests/misc/sort-unique b/tests/misc/sort-unique
new file mode 100755
index 0000000..e186d34
--- /dev/null
+++ b/tests/misc/sort-unique
@@ -0,0 +1,46 @@
+#!/bin/sh
+# Test 'sort -u'.
+
+# Copyright (C) 2010 Free Software Foundation, Inc.
+
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with this program. If not, see <http://www.gnu.org/licenses/>.
+
+if test "$VERBOSE" = yes; then
+ set -x
+ sort --version
+fi
+
+. $srcdir/test-lib.sh
+
+cat > in <<\EOF
+1
+2
+1
+3
+EOF
+
+cat > exp <<\EOF
+1
+2
+3
+EOF
+
+for LOC in C "$LOCALE_FR" "$LOCALE_FR_UTF8"; do
+ test -z "$LOC" && continue
+
+ LC_ALL=$LOC sort -u in > out || { fail=1; break; }
+ compare exp out || { fail=1; break; }
+done
+
+Exit $fail
--
1.7.1
Information forwarded
to
owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org
:
bug#6643
; Package
coreutils
.
(Fri, 16 Jul 2010 10:07:02 GMT)
Full text and
rfc822 format available.
Message #17 received at 6643 <at> debbugs.gnu.org (full text, mbox):
> Chen, can you please verify that "sort -u" does not access the same
> line from multiple threads, even saved lines? If it does, then even this patch
> is not enough: we would need to alter write_bytes so that it does not
> modify its argument at all.
I just went over the code, pretty big nostalgia trip :-)
write_unique's saved line has already worked its way up to the root node
of the tree; that's to say whatever sorting is to be done on it has already
been done. Thus, none of the merging threads will ever call compare on
it, since they would only do work on remaining un-fully-sorted, yet-to-be-
output lines. Also, only one call to write_unique will ever occur at one
time on one buffer.
The external merge process also makes use of a saved line, but that part
is sequential I believe. Joey's external merge patch might play some games
with it, but I've only skimmed their code. We'll have to cross that bridge
when we get there.
Reply sent
to
Jim Meyering <jim <at> meyering.net>
:
You have taken responsibility.
(Sun, 17 Apr 2011 09:10:03 GMT)
Full text and
rfc822 format available.
Notification sent
to
Paul Eggert <eggert <at> CS.UCLA.EDU>
:
bug acknowledged by developer.
(Sun, 17 Apr 2011 09:10:03 GMT)
Full text and
rfc822 format available.
Message #22 received at 6643-done <at> debbugs.gnu.org (full text, mbox):
Paul Eggert wrote:
> On 07/15/10 17:46, Pádraig Brady wrote:
>> I'll add a few basic tests I think to run under $mb_locale
>
> Thanks. Jim also reminded me that I should add at least one test for
> this bug, so I just pushed the one enclosed below. The test is
> locale-dependent and is not guaranteed to catch the bug on every host,
> but it did catch the bug in my RHEL 5 build host and that's good
> enough.
Thanks. Closing.
bug archived.
Request was from
Debbugs Internal Request <help-debbugs <at> gnu.org>
to
internal_control <at> debbugs.gnu.org
.
(Sun, 15 May 2011 11:24:05 GMT)
Full text and
rfc822 format available.
This bug report was last modified 14 years and 40 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.