GNU bug report logs - #9780
sort -u throws out non-duplicates

Package: coreutils

Reported by: Bernhard Rosenkraenzer <bero <at> bero.eu>

Date: Tue, 18 Oct 2011 01:04:02 UTC

Severity: normal

Tags: moreinfo

Done: Jim Meyering <jim <at> meyering.net>

Bug is archived. No further changes may be made.


From: Jim Meyering <jim <at> meyering.net>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 9780 <at> debbugs.gnu.org, Rasmus Borup Hansen <rbh <at> intomics.com>
Subject: bug#9780: sort -u throws out non-duplicates
Date: Fri, 17 Aug 2012 21:53:06 +0200
Paul Eggert wrote:

> On 08/17/2012 12:36 PM, Jim Meyering wrote:
>> The first time the safe_text buffer is allocated
>> it will have to be disjoint from the line.text buffer
>> and from the buffer into which we're about to fread.
>> Thereafter, regardless of reallocation, overlap should
>> always be false.
>
> I haven't thought it through entirely, but I was
> worried about the case where there is a saved line
> but no saved_text, the buffer is reallocated, and

That is precisely what happens the first time this "if (unique && ..."
condition is true (presuming you mean s/saved_text/safe_text/):

          /* With --unique, when we're about to read into a buffer that
             overlaps the saved "preceding" line (saved_line), copy the line's
             .text member to a realloc'd-as-needed temporary buffer and adjust
             the line's key-defining members if they're set.  */
          if (unique && overlap (ptr, readsize, &saved_line))
            {
              /* Copy saved_line.text into a buffer where it won't be clobbered
                 and if KEY is non-NULL, adjust saved_line.key* to match.  */
              static char *safe_text;
              static size_t safe_text_n_alloc;
              if (safe_text_n_alloc < saved_line.length)
                {
                  safe_text_n_alloc = saved_line.length;
                  safe_text = x2nrealloc (safe_text, &safe_text_n_alloc, 1);
                }
              memcpy (safe_text, saved_line.text, saved_line.length);
              if (key)
                {
                  #define s saved_line
                  s.keybeg = safe_text + (s.keybeg - s.text);
                  s.keylim = safe_text + (s.keylim - s.text);
                  #undef s
                }
              saved_line.text = safe_text;
            }

safe_text is initially NULL and we enter that block
only when we're about to fread into a buffer that overlaps
the current saved_line.text buffer.

In that case, we allocate an initial safe_text buffer,
copy saved_line.text into it, and update saved_line.text
to point to the just-allocated/initialized buffer.
Any later test of overlap that compares that just-allocated
(or realloc'd) buffer with the about-to-be-fread-into
buffer will return false, since safe_text is a separate heap
allocation that cannot share bytes with the read buffer.
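
To make the argument concrete, here is a minimal standalone sketch of the
same idea. It is not the sort.c code: struct line is cut down to four
members, ranges_overlap and save_line are hypothetical names, and plain
realloc stands in for x2nrealloc. It only illustrates that once the line
has been copied out, an overlap test against the read buffer is false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for sort.c's struct line (an assumption).  */
struct line
{
  char *text;     /* start of the line's bytes */
  size_t length;  /* number of bytes in the line */
  char *keybeg;   /* start of the key within text, or NULL */
  char *keylim;   /* end of the key within text, or NULL */
};

/* Hypothetical helper: return true if [buf, buf + size) shares at
   least one byte with the text of line L.  */
static bool
ranges_overlap (char const *buf, size_t size, struct line const *l)
{
  if (!l->text)
    return false;
  uintptr_t b = (uintptr_t) buf;
  uintptr_t t = (uintptr_t) l->text;
  return b < t + l->length && t < b + size;
}

/* Hypothetical helper: copy L's text into a private, grown-as-needed
   buffer and, if HAVE_KEY, rebase the key pointers onto the copy.
   The caller must not pass a line that already lives in that buffer.  */
static void
save_line (struct line *l, bool have_key)
{
  static char *safe_text;
  static size_t safe_text_n_alloc;

  assert (l->text != NULL && l->text != safe_text);

  if (safe_text_n_alloc < l->length)
    {
      char *p = realloc (safe_text, l->length);
      if (!p)
        abort ();
      safe_text = p;
      safe_text_n_alloc = l->length;
    }
  memcpy (safe_text, l->text, l->length);
  if (have_key)
    {
      /* Compute offsets against the old text before replacing it.  */
      l->keybeg = safe_text + (l->keybeg - l->text);
      l->keylim = safe_text + (l->keylim - l->text);
    }
  l->text = safe_text;
}
```

A caller would guard the copy with the test, as in the excerpt above:
"if (ranges_overlap (ptr, readsize, &saved)) save_line (&saved, key != NULL);".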

> then we test for overlap.  If the reallocated buffer
> does not overlap the original buffer, the test for
> overlap will fail even though the saved line needs
> to be copied into a new saved_text buffer.
>
> I'll stare at the code some more....




This bug report was last modified 12 years and 278 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.