Package: coreutils;
Reported by: belhage <at> midibel.com
Date: Thu, 5 Aug 2010 20:09:01 UTC
Severity: normal
Done: Bob Proulx <bob <at> proulx.com>
Bug is archived. No further changes may be made.
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
From: Lars Ole Belhage <belhage <at> midibel.com> To: bug-coreutils <at> gnu.org Subject: Re: Bug-coreutils Digest, Vol 93, Issue 6 Date: Thu, 05 Aug 2010 19:56:21 +0200
On Thu, 2010-08-05 at 12:03 -0400, bug-coreutils-request <at> gnu.org wrote: > Send Bug-coreutils mailing list submissions to > bug-coreutils <at> gnu.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://lists.gnu.org/mailman/listinfo/bug-coreutils > or, via email, send a message with subject or body 'help' to > bug-coreutils-request <at> gnu.org > > You can reach the person managing the list at > bug-coreutils-owner <at> gnu.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of Bug-coreutils digest..." > > > Today's Topics: > > 1. bug#6789: propose renaming gnulib memxfrm to amemxfrm (naming > collision with coreutils) (Paul Eggert) > 2. bug#6789: propose renaming gnulib memxfrm to amemxfrm (naming > collision with coreutils) (Simon Josefsson) > 3. bug#6789: propose renaming gnulib memxfrm to amemxfrm (naming > collision with coreutils) (Paolo Bonzini) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Wed, 04 Aug 2010 16:21:28 -0700 > From: Paul Eggert <eggert <at> CS.UCLA.EDU> > Subject: bug#6789: propose renaming gnulib memxfrm to amemxfrm (naming > collision with coreutils) > To: Bruno Haible <bruno <at> clisp.org> > Cc: Bug-gnulib <bug-gnulib <at> gnu.org>, Bug-coreutils > <bug-coreutils <at> gnu.org> > Message-ID: <4C59F5F8.9080309 <at> cs.ucla.edu> > Content-Type: text/plain; charset=UTF-8 > > On 08/03/10 16:33, Bruno Haible wrote: > > But when the stack buffer is not sufficient, then the use of coreutils memxfrm > > is 30% to 70% slower than the use of gnulib memxfrm, with a difference of > > 700 sec at least. > > (Ooo! Ooo! Performance measurements! I love this stuff!) > > It depends on the data. In the typical case, "sort" is applied to > text data, which does not contain NUL bytes. The data in that > benchmark contained many NUL bytes. If you take the same benchmark > and uniformly replace "\0" with "\t" in compare.c, then the situation > is much different: coreutils memxfrm is about 3 times faster than > gnulib memxfrm on the larger test cases (this platform is Ubuntu > 10.04, gcc 4.5.0, 2.4 GHz Pentium 4): > > 503-penguin $ gcc -std=gnu99 -O2 -Wall coreutils-memxfrm.c gnulib-memxfrm.c compare1.c -I. > 504-penguin $ ./a.out > Time for gnulib_memxfrm: 0,032002 > Time for coreutils_memxfrm: 0,028001 > Time for gnulib_memxfrm: 0,024002 > Time for coreutils_memxfrm: 0,024001 > Time for gnulib_memxfrm: 0,036003 > Time for coreutils_memxfrm: 0,032002 > Time for gnulib_memxfrm: 18,2051 > Time for coreutils_memxfrm: 5,48834 > Time for gnulib_memxfrm: 16,045 > Time for coreutils_memxfrm: 5,50034 > Time for gnulib_memxfrm: 15,837 > Time for coreutils_memxfrm: 5,44834 > > I expect that this performance glitch in gnulib memxfrm could be > improved, as it shouldn't simply double buffer sizes when they're too > small, as at that point it already knows what the final buffer size > should be. Doing this should bring up gnulib memxfrm to be as fast as > coreutils xmemxfrm for this benchmark. Also, I agree that gnulib > memxfrm is faster in some important cases. Still, gnulib memxfrm is > problematic, because it insists on managing memory itself. > > Come to think of it, looking at gnulib memxfrm gave me an idea > to improve the performance of GNU sort by bypassing the need for an > memxfrm-like function entirely. I pushed a patch to do that at > <http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=2b49b140cc13cf36ec5ee5acaca5ac7bfeed6366>. > > This avoids any potential naming collision for now. The point > remains, though, that it's confusing that gnulib memxfrm's name begins > with "mem", as the mem* functions don't allocate memory. Would you > consider a patch that renames gnulib memxfrm to amemxfrm, or to some > other such name? > > > > > > > ------------------------------ > > Message: 2 > Date: Thu, 05 Aug 2010 01:44:30 +0200 > From: Simon Josefsson <simon <at> josefsson.org> > Subject: bug#6789: propose renaming gnulib memxfrm to amemxfrm (naming > collision with coreutils) > To: Paul Eggert <eggert <at> CS.UCLA.EDU> > Cc: Bug-gnulib <bug-gnulib <at> gnu.org>, Bug-coreutils > <bug-coreutils <at> gnu.org>, Bruno Haible <bruno <at> clisp.org> > Message-ID: <8739uurli9.fsf <at> mocca.josefsson.org> > Content-Type: text/plain; charset=us-ascii > > Paul Eggert <eggert <at> CS.UCLA.EDU> writes: > > > Come to think of it, looking at gnulib memxfrm gave me an idea > > to improve the performance of GNU sort by bypassing the need for an > > memxfrm-like function entirely. I pushed a patch to do that at > > <http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=2b49b140cc13cf36ec5ee5acaca5ac7bfeed6366>. > > I don't know this code at all, but would your approach lead to problems > if two different strings have the same MD5 hash? MD5 is broken, and > finding collisions takes just seconds on normal PC. See: > http://en.wikipedia.org/wiki/MD5#Security > > /Simon > > > > > > ------------------------------ > > Message: 3 > Date: Thu, 05 Aug 2010 04:58:26 +0200 > From: Paolo Bonzini <bonzini <at> gnu.org> > Subject: bug#6789: propose renaming gnulib memxfrm to amemxfrm (naming > collision with coreutils) > To: bug-gnulib <at> gnu.org, Bug-coreutils <bug-coreutils <at> gnu.org>, Paul > Eggert <eggert <at> cs.ucla.edu> > Message-ID: <4C5A28D2.3040102 <at> gnu.org> > Content-Type: text/plain; charset=ISO-8859-1; format=flowed > > On 08/05/2010 01:44 AM, Simon Josefsson wrote: > > Paul Eggert<eggert <at> CS.UCLA.EDU> writes: > > > >> Come to think of it, looking at gnulib memxfrm gave me an idea > >> to improve the performance of GNU sort by bypassing the need for an > >> memxfrm-like function entirely. I pushed a patch to do that at > >> <http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=2b49b140cc13cf36ec5ee5acaca5ac7bfeed6366>. > > > > I don't know this code at all, but would your approach lead to problems > > if two different strings have the same MD5 hash? MD5 is broken, and > > finding collisions takes just seconds on normal PC. See: > > http://en.wikipedia.org/wiki/MD5#Security > > MD5 is used simply as a kind of "random key generator", so it doesn't > matter. I wonder two things however: > > 1) why bother with memxfrm as a tie-breaker? isn't memcmp good enough? > > 2) maybe there's something cheaper than md5 that can be used? For > example you could compare a^x and b^x where x is the output of a fast > 32-bit random number generator? It doesn't need to be cryptographic, > I'd pick http://en.wikipedia.org/wiki/Xorshift. > > Paolo > > > > > > ------------------------------ > > _______________________________________________ > Bug-coreutils mailing list > Bug-coreutils <at> gnu.org > http://lists.gnu.org/mailman/listinfo/bug-coreutils > > > End of Bug-coreutils Digest, Vol 93, Issue 6 > ********************************************
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.