GNU bug report logs - #6789
propose renaming gnulib memxfrm to amemxfrm (naming collision with coreutils)

Package: coreutils

Reported by: Paul Eggert <eggert <at> CS.UCLA.EDU>

Date: Tue, 3 Aug 2010 19:47:01 UTC

Severity: normal

Done: Jim Meyering <jim <at> meyering.net>

Bug is archived. No further changes may be made.

Message #41 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> CS.UCLA.EDU>
To: Bruno Haible <bruno <at> clisp.org>
Cc: bug-coreutils <at> gnu.org
Subject: Re: propose renaming gnulib memxfrm to amemxfrm (naming collision
	with coreutils)
Date: Sun, 08 Aug 2010 23:21:29 -0700

On 08/08/10 05:24, Bruno Haible wrote:
> sort: reduce number of strxfrm calls

Thanks for that suggestion.  Amusingly enough, it made 'sort -R'
slower on the first benchmark I tried it on, which was 'sort -R *'.
But that's an unfair benchmark, since '*' expanded to executables and
other non-text files.  Overall, it's a good idea.  However, the code
need not be quite that long, since there's no need to do size_t
overflow checking.  I pushed this:

From 0061819c7e1bbc26586cc5977ea96da016f7cea2 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert <at> cs.ucla.edu>
Date: Sun, 8 Aug 2010 23:14:38 -0700
Subject: [PATCH] sort: speed up -R with long lines in hard locales

* src/sort.c (compare_random): Guess that the output will be
3X the input.  This avoids the overhead of calling strxfrm
twice on typical implementations.  Suggested by Bruno Haible.
---
 src/sort.c |   18 +++++++++++++-----
 1 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/src/sort.c b/src/sort.c
index dcfd24f..148ed3e 100644
--- a/src/sort.c
+++ b/src/sort.c
@@ -2024,6 +2024,7 @@ compare_random (char *restrict texta, size_t lena,
   char stackbuf[4000];
   char *buf = stackbuf;
   size_t bufsize = sizeof stackbuf;
+  void *allocated = NULL;
   uint32_t dig[2][MD5_DIGEST_SIZE / sizeof (uint32_t)];
   struct md5_ctx s[2];
   s[0] = s[1] = random_md5_state;
@@ -2047,6 +2048,16 @@ compare_random (char *restrict texta, size_t lena,
 
           /* Store the transformed data into a big-enough buffer.  */
 
+          /* A 3X size guess avoids the overhead of calling strxfrm
+             twice on typical implementations.  Don't worry about
+             size_t overflow, as the guess need not be correct.  */
+          size_t guess_bufsize = 3 * (lena + lenb) + 2;
+          if (bufsize < guess_bufsize)
+            {
+              bufsize = MAX (guess_bufsize, bufsize * 3 / 2);
+              buf = allocated = xrealloc (allocated, bufsize);
+            }
+
           size_t sizea =
             (texta < lima ? xstrxfrm (buf, texta, bufsize) + 1 : 0);
           bool a_fits = sizea <= bufsize;
@@ -2062,9 +2073,7 @@ compare_random (char *restrict texta, size_t lena,
               bufsize = sizea + sizeb;
               if (bufsize < SIZE_MAX / 3)
                 bufsize = bufsize * 3 / 2;
-              buf = (buf == stackbuf
-                     ? xmalloc (bufsize)
-                     : xrealloc (buf, bufsize));
+              buf = allocated = xrealloc (allocated, bufsize);
               if (texta < lima)
                 strxfrm (buf, texta, sizea);
               if (textb < limb)
@@ -2119,8 +2128,7 @@ compare_random (char *restrict texta, size_t lena,
       diff = xfrm_diff;
     }
 
-  if (buf != stackbuf)
-    free (buf);
+  free (allocated);
 
   return diff;
 }
-- 
1.7.2

This bug report was last modified 14 years and 6 days ago.


