GNU bug report logs - #9780
sort -u throws out non-duplicates

Package: coreutils

Reported by: Bernhard Rosenkraenzer <bero <at> bero.eu>

Date: Tue, 18 Oct 2011 01:04:02 UTC

Severity: normal

Tags: moreinfo

Done: Jim Meyering <jim <at> meyering.net>

Bug is archived. No further changes may be made.


Message #85 received at 9780 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 9780 <at> debbugs.gnu.org, Rasmus Borup Hansen <rbh <at> intomics.com>
Subject: Re: bug#9780: sort -u throws out non-duplicates
Date: Fri, 17 Aug 2012 23:09:15 +0200

Paul Eggert wrote:
> OK, I scratched my head for a bit and came up with the following
> further patch, which addresses the issues that I mentioned.
...
> Subject: [PATCH] sort: simpler fix for sort -u data-loss bug
>
> * src/sort.c (overlap): Remove.
> (fillbuf): Do not try to copy saved lines, as that is too risky
> in the presence of parallelism, reallocated buffers, etc.
> (sort): Invalidate any saved line before sorting a new batch.
> ---
>  src/sort.c |   36 +-----------------------------------

Very nice!  That fixes not just the original bug, but also the FMR,
and eliminates my entire patch.  The only cost is in writing at most
one more line per buffer.

I hate to look such a nice gift horse in the mouth, but it's getting
late here...  Would you mind adjusting that to add NEWS and mention that
you've fixed the second, free-memory-read bug, too?

And even add the test?
If you don't find time, I'll get to that over the weekend.

===============
Regarding your patch...

For the record, at first I thought an input that used one (long) line per
buffer would make --unique a no-op, but then I realized that in that case,
each buffer's worth (one line each) would be written to its own temporary
file, and the merge phase would handle the --unique semantics.
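
To illustrate the reasoning above, here is a minimal Python sketch of a chunked "sort -u". It is a toy model, not the code in src/sort.c: the function name sort_unique_chunked and the chunk_size parameter are made up for illustration, and in-memory lists stand in for temporary files. It shows that when each chunk holds a single line, the per-chunk de-duplication does nothing, yet the merge phase still removes the duplicates.

```python
import heapq
from itertools import groupby


def sort_unique_chunked(lines, chunk_size):
    """Toy model of an external 'sort -u' (hypothetical, not GNU sort's code).

    Each chunk plays the role of one buffer's worth of input; each sorted
    run plays the role of one temporary file.
    """
    runs = []
    for start in range(0, len(lines), chunk_size):
        chunk = sorted(lines[start:start + chunk_size])
        # Per-chunk de-duplication: a no-op when the chunk holds one line.
        runs.append([key for key, _ in groupby(chunk)])
    # Merge phase: equal lines from different runs become adjacent here,
    # so dropping adjacent duplicates enforces uniqueness across runs.
    merged = heapq.merge(*runs)
    return [key for key, _ in groupby(merged)]


lines = ["b", "a", "b", "c", "a"]
print(sort_unique_chunked(lines, 1))  # one line per "buffer"
print(sort_unique_chunked(lines, 2))  # two lines per "buffer"
```

Both calls print ['a', 'b', 'c'], so the result does not depend on how the input happens to be split into buffers.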

Thanks again!




This bug report was last modified 12 years and 278 days ago.


