GNU bug report logs - #6804
Bug-coreutils Digest, Vol 93, Issue 6

From: Lars Ole Belhage <belhage <at> midibel.com>
To: bug-coreutils <at> gnu.org
Subject: Re: Bug-coreutils Digest, Vol 93, Issue 6
Date: Thu, 05 Aug 2010 19:56:21 +0200

On Thu, 2010-08-05 at 12:03 -0400, bug-coreutils-request <at> gnu.org wrote:
> Send Bug-coreutils mailing list submissions to
> 	bug-coreutils <at> gnu.org
> 
> To subscribe or unsubscribe via the World Wide Web, visit
> 	http://lists.gnu.org/mailman/listinfo/bug-coreutils
> or, via email, send a message with subject or body 'help' to
> 	bug-coreutils-request <at> gnu.org
> 
> You can reach the person managing the list at
> 	bug-coreutils-owner <at> gnu.org
> 
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Bug-coreutils digest..."
> 
> 
> Today's Topics:
> 
>    1. bug#6789: propose renaming gnulib memxfrm to amemxfrm (naming
>       collision with coreutils) (Paul Eggert)
>    2. bug#6789: propose renaming gnulib memxfrm to amemxfrm (naming
>       collision with coreutils) (Simon Josefsson)
>    3. bug#6789: propose renaming gnulib memxfrm to amemxfrm (naming
>       collision with coreutils) (Paolo Bonzini)
> 
> 
> ----------------------------------------------------------------------
> 
> Message: 1
> Date: Wed, 04 Aug 2010 16:21:28 -0700
> From: Paul Eggert <eggert <at> CS.UCLA.EDU>
> Subject: bug#6789: propose renaming gnulib memxfrm to amemxfrm (naming
> 	collision with coreutils)
> To: Bruno Haible <bruno <at> clisp.org>
> Cc: Bug-gnulib <bug-gnulib <at> gnu.org>, Bug-coreutils
> 	<bug-coreutils <at> gnu.org>
> Message-ID: <4C59F5F8.9080309 <at> cs.ucla.edu>
> Content-Type: text/plain; charset=UTF-8
> 
> On 08/03/10 16:33, Bruno Haible wrote:
> > But when the stack buffer is not sufficient, then the use of coreutils memxfrm
> > is 30% to 70% slower than the use of gnulib memxfrm, with a difference of
> > 700 sec at least.
> 
> (Ooo!  Ooo!  Performance measurements!  I love this stuff!)
> 
> It depends on the data.  In the typical case, "sort" is applied to
> text data, which does not contain NUL bytes.  The data in that
> benchmark contained many NUL bytes.  If you take the same benchmark
> and uniformly replace "\0" with "\t" in compare.c, then the situation
> is much different: coreutils memxfrm is about 3 times faster than
> gnulib memxfrm on the larger test cases (this platform is Ubuntu
> 10.04, gcc 4.5.0, 2.4 GHz Pentium 4):
> 
>   503-penguin $ gcc -std=gnu99 -O2 -Wall coreutils-memxfrm.c gnulib-memxfrm.c compare1.c -I.
>   504-penguin $ ./a.out
>   Time for gnulib_memxfrm: 0,032002
>   Time for coreutils_memxfrm: 0,028001
>   Time for gnulib_memxfrm: 0,024002
>   Time for coreutils_memxfrm: 0,024001
>   Time for gnulib_memxfrm: 0,036003
>   Time for coreutils_memxfrm: 0,032002
>   Time for gnulib_memxfrm: 18,2051
>   Time for coreutils_memxfrm: 5,48834
>   Time for gnulib_memxfrm: 16,045
>   Time for coreutils_memxfrm: 5,50034
>   Time for gnulib_memxfrm: 15,837
>   Time for coreutils_memxfrm: 5,44834
> 
> I expect that this performance glitch in gnulib memxfrm could be
> improved, as it shouldn't simply double buffer sizes when they're too
> small, as at that point it already knows what the final buffer size
> should be.  Doing this should bring up gnulib memxfrm to be as fast as
> coreutils xmemxfrm for this benchmark.  Also, I agree that gnulib
> memxfrm is faster in some important cases.  Still, gnulib memxfrm is
> problematic, because it insists on managing memory itself.
> 
> Come to think of it, looking at gnulib memxfrm gave me an idea
> to improve the performance of GNU sort by bypassing the need for an
> memxfrm-like function entirely.  I pushed a patch to do that at
> <http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=2b49b140cc13cf36ec5ee5acaca5ac7bfeed6366>.
> 
> This avoids any potential naming collision for now.  The point
> remains, though, that it's confusing that gnulib memxfrm's name begins
> with "mem", as the mem* functions don't allocate memory.  Would you
> consider a patch that renames gnulib memxfrm to amemxfrm, or to some
> other such name?
> 
> 
> 
> 
> 
> 
> ------------------------------
> 
> Message: 2
> Date: Thu, 05 Aug 2010 01:44:30 +0200
> From: Simon Josefsson <simon <at> josefsson.org>
> Subject: bug#6789: propose renaming gnulib memxfrm to amemxfrm (naming
> 	collision with coreutils)
> To: Paul Eggert <eggert <at> CS.UCLA.EDU>
> Cc: Bug-gnulib <bug-gnulib <at> gnu.org>, Bug-coreutils
> 	<bug-coreutils <at> gnu.org>,	Bruno Haible <bruno <at> clisp.org>
> Message-ID: <8739uurli9.fsf <at> mocca.josefsson.org>
> Content-Type: text/plain; charset=us-ascii
> 
> Paul Eggert <eggert <at> CS.UCLA.EDU> writes:
> 
> > Come to think of it, looking at gnulib memxfrm gave me an idea
> > to improve the performance of GNU sort by bypassing the need for an
> > memxfrm-like function entirely.  I pushed a patch to do that at
> > <http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=2b49b140cc13cf36ec5ee5acaca5ac7bfeed6366>.
> 
> I don't know this code at all, but would your approach lead to problems
> if two different strings have the same MD5 hash?  MD5 is broken, and
> finding collisions takes just seconds on normal PC.  See:
> http://en.wikipedia.org/wiki/MD5#Security
> 
> /Simon
> 
> 
> 
> 
> 
> ------------------------------
> 
> Message: 3
> Date: Thu, 05 Aug 2010 04:58:26 +0200
> From: Paolo Bonzini <bonzini <at> gnu.org>
> Subject: bug#6789: propose renaming gnulib memxfrm to amemxfrm (naming
> 	collision with coreutils)
> To: bug-gnulib <at> gnu.org, Bug-coreutils <bug-coreutils <at> gnu.org>,	Paul
> 	Eggert <eggert <at> cs.ucla.edu>
> Message-ID: <4C5A28D2.3040102 <at> gnu.org>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
> 
> On 08/05/2010 01:44 AM, Simon Josefsson wrote:
> > Paul Eggert<eggert <at> CS.UCLA.EDU>  writes:
> >
> >> Come to think of it, looking at gnulib memxfrm gave me an idea
> >> to improve the performance of GNU sort by bypassing the need for an
> >> memxfrm-like function entirely.  I pushed a patch to do that at
> >> <http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=2b49b140cc13cf36ec5ee5acaca5ac7bfeed6366>.
> >
> > I don't know this code at all, but would your approach lead to problems
> > if two different strings have the same MD5 hash?  MD5 is broken, and
> > finding collisions takes just seconds on normal PC.  See:
> > http://en.wikipedia.org/wiki/MD5#Security
> 
> MD5 is used simply as a kind of "random key generator", so it doesn't 
> matter.  I wonder two things however:
> 
> 1) why bother with memxfrm as a tie-breaker? isn't memcmp good enough?
> 
> 2) maybe there's something cheaper than md5 that can be used?  For 
> example you could compare a^x and b^x where x is the output of a fast 
> 32-bit random number generator?  It doesn't need to be cryptographic, 
> I'd pick http://en.wikipedia.org/wiki/Xorshift.
> 
> Paolo
> 
> 
> 
> 
> 
> ------------------------------
> 
> _______________________________________________
> Bug-coreutils mailing list
> Bug-coreutils <at> gnu.org
> http://lists.gnu.org/mailman/listinfo/bug-coreutils
> 
> 
> End of Bug-coreutils Digest, Vol 93, Issue 6
> ********************************************

GNU bug report logs - #6804 Bug-coreutils Digest, Vol 93, Issue 6

GNU bug report logs - #6804
Bug-coreutils Digest, Vol 93, Issue 6