Hi Paul,

> I pushed a patch to do that at
> .

The idea to allocate enough memory before calling strxfrm also gives a
speedup in this case. Done through the attached patch.

I called 'sort' like this:

  $ for i in `seq 10`; do time LC_ALL=de_DE.UTF-8 ./sort -R < input100 > output; done

where the input100 file contains 100 copies of the attached 2-lines file.

Timings before the patch:

  real 0m9.512s   user 0m18.401s   sys 0m0.468s
  real 0m8.871s   user 0m17.033s   sys 0m0.544s
  real 0m8.742s   user 0m16.777s   sys 0m0.472s
  real 0m8.784s   user 0m16.829s   sys 0m0.480s
  real 0m8.657s   user 0m16.665s   sys 0m0.452s
  real 0m8.700s   user 0m16.737s   sys 0m0.484s
  real 0m8.665s   user 0m16.569s   sys 0m0.500s
  real 0m8.826s   user 0m16.937s   sys 0m0.464s
  real 0m8.827s   user 0m16.985s   sys 0m0.428s
  real 0m8.680s   user 0m16.765s   sys 0m0.356s

Timings with the patch:

  real 0m5.886s   user 0m11.161s   sys 0m0.384s
  real 0m5.137s   user 0m9.705s    sys 0m0.408s
  real 0m5.150s   user 0m9.753s    sys 0m0.404s
  real 0m5.090s   user 0m9.697s    sys 0m0.348s
  real 0m5.158s   user 0m9.753s    sys 0m0.420s
  real 0m5.149s   user 0m9.825s    sys 0m0.360s
  real 0m5.134s   user 0m9.765s    sys 0m0.364s
  real 0m5.080s   user 0m9.669s    sys 0m0.332s
  real 0m5.052s   user 0m9.625s    sys 0m0.336s
  real 0m5.084s   user 0m9.713s    sys 0m0.288s

Total user time before:         169.698 sec
Total user time with the patch:  98.666 sec
Speedup: factor 1.72.


2010-08-08  Bruno Haible

        sort: reduce number of strxfrm calls
        * src/sort.c (compare_random): Allocate enough memory ahead of time,
        so that usually only one call to strxfrm is necessary for each string
        part.

*** src/sort.c.orig	Sun Aug  8 13:11:01 2010
--- src/sort.c	Sun Aug  8 13:10:45 2010
***************
*** 2047,2052 ****
--- 2047,2080 ----
            /* Store the transformed data into a big-enough buffer.  */
  
+           /* A call to strxfrm costs about 20 times more than a call to
+              strdup of the result.  Therefore it is worth to try to avoid
+              calling strxfrm more than once on a given string, by making
+              enough room before calling strxfrm.
+              The size of the strxfrm result of a string of length len is
+              likely to be between len and 3 * len.  */
+           if (lena + lenb >= lena && lena + lenb < SIZE_MAX / 3)
+             {
+               size_t new_bufsize = 3 * (lena + lenb) + 1; /* no overflow */
+               if (new_bufsize > bufsize)
+                 {
+                   if (bufsize < SIZE_MAX / 3 * 2)
+                     {
+                       /* Ensure proportional growth of bufsize.  */
+                       if (new_bufsize < bufsize + bufsize / 2)
+                         new_bufsize = bufsize + bufsize / 2;
+                     }
+                   char *new_buf = malloc (new_bufsize);
+                   if (new_buf != NULL)
+                     {
+                       if (buf != stackbuf)
+                         free (buf);
+                       buf = new_buf;
+                       bufsize = new_bufsize;
+                     }
+                 }
+             }
+ 
            size_t sizea = (texta < lima ? xstrxfrm (buf, texta, bufsize) + 1 : 0);
            bool a_fits = sizea <= bufsize;
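
For readers who want to try the sizing idea outside of sort.c, here is a minimal standalone sketch. It is not the code from the patch: the helper names guess_xfrm_bufsize and xfrm_prealloc are hypothetical, it handles a single string rather than the two string parts that compare_random works on, and it uses plain strxfrm with a heap buffer instead of xstrxfrm and stackbuf. It only illustrates the "guess 3 * len + 1, grow by at least 50%, retry if the guess was too small" logic.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of sort.c: choose a buffer size for the
   strxfrm result of a string of length LEN, given the current BUFSIZE.
   Returns 0 if no growth is needed or if 3 * LEN + 1 would overflow.  */
static size_t
guess_xfrm_bufsize (size_t len, size_t bufsize)
{
  if (len >= SIZE_MAX / 3)
    return 0;                       /* 3 * len + 1 could overflow */
  size_t new_bufsize = 3 * len + 1;
  if (new_bufsize <= bufsize)
    return 0;                       /* current buffer is big enough */
  /* Ensure proportional growth: at least 1.5 times the old size.  */
  if (bufsize < SIZE_MAX / 3 * 2 && new_bufsize < bufsize + bufsize / 2)
    new_bufsize = bufsize + bufsize / 2;
  return new_bufsize;
}

/* Hypothetical helper: return the strxfrm transformation of S in a
   malloc'ed buffer, or NULL on allocation failure.  *CALLS receives the
   number of strxfrm calls that were needed (1 if the guess was good).  */
static char *
xfrm_prealloc (const char *s, unsigned *calls)
{
  assert (s != NULL && calls != NULL);
  size_t len = strlen (s);
  size_t bufsize = guess_xfrm_bufsize (len, 0);
  if (bufsize == 0)
    bufsize = len + 1;              /* fall back to the minimum guess */

  char *buf = malloc (bufsize);
  if (buf == NULL)
    return NULL;

  *calls = 1;
  size_t need = strxfrm (buf, s, bufsize) + 1;
  if (need > bufsize)
    {
      /* The guess was too small; retry once with the exact size.  */
      char *bigger = realloc (buf, need);
      if (bigger == NULL)
        {
          free (buf);
          return NULL;
        }
      buf = bigger;
      ++*calls;
      strxfrm (buf, s, need);
    }
  return buf;
}
```

A caller would compare two results of xfrm_prealloc with strcmp and free both buffers afterwards. Whether the 3 * len guess avoids the second call depends on the locale; the factor is the estimate stated in the patch comment, not a guarantee.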