GNU bug report logs - #6804
Bug-coreutils Digest, Vol 93, Issue 6

Previous Next

Package: coreutils;

Reported by: belhage <at> midibel.com

Date: Thu, 5 Aug 2010 20:09:01 UTC

Severity: normal

Done: Bob Proulx <bob <at> proulx.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 6804 in the body.
You can then email your comments to 6804 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#6804; Package coreutils. (Thu, 05 Aug 2010 20:09:01 GMT) Full text and rfc822 format available.

Acknowledgement sent to belhage <at> midibel.com:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Thu, 05 Aug 2010 20:09:01 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Lars Ole Belhage <belhage <at> midibel.com>
To: bug-coreutils <at> gnu.org
Subject: Re: Bug-coreutils Digest, Vol 93, Issue 6
Date: Thu, 05 Aug 2010 19:56:21 +0200
On Thu, 2010-08-05 at 12:03 -0400, bug-coreutils-request <at> gnu.org wrote:
> Send Bug-coreutils mailing list submissions to
> 	bug-coreutils <at> gnu.org
> 
> To subscribe or unsubscribe via the World Wide Web, visit
> 	http://lists.gnu.org/mailman/listinfo/bug-coreutils
> or, via email, send a message with subject or body 'help' to
> 	bug-coreutils-request <at> gnu.org
> 
> You can reach the person managing the list at
> 	bug-coreutils-owner <at> gnu.org
> 
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Bug-coreutils digest..."
> 
> 
> Today's Topics:
> 
>    1. bug#6789: propose renaming gnulib memxfrm to amemxfrm (naming
>       collision with coreutils) (Paul Eggert)
>    2. bug#6789: propose renaming gnulib memxfrm to amemxfrm (naming
>       collision with coreutils) (Simon Josefsson)
>    3. bug#6789: propose renaming gnulib memxfrm to amemxfrm (naming
>       collision with coreutils) (Paolo Bonzini)
> 
> 
> ----------------------------------------------------------------------
> 
> Message: 1
> Date: Wed, 04 Aug 2010 16:21:28 -0700
> From: Paul Eggert <eggert <at> CS.UCLA.EDU>
> Subject: bug#6789: propose renaming gnulib memxfrm to amemxfrm (naming
> 	collision with coreutils)
> To: Bruno Haible <bruno <at> clisp.org>
> Cc: Bug-gnulib <bug-gnulib <at> gnu.org>, Bug-coreutils
> 	<bug-coreutils <at> gnu.org>
> Message-ID: <4C59F5F8.9080309 <at> cs.ucla.edu>
> Content-Type: text/plain; charset=UTF-8
> 
> On 08/03/10 16:33, Bruno Haible wrote:
> > But when the stack buffer is not sufficient, then the use of coreutils memxfrm
> > is 30% to 70% slower than the use of gnulib memxfrm, with a difference of
> > 700 sec at least.
> 
> (Ooo!  Ooo!  Performance measurements!  I love this stuff!)
> 
> It depends on the data.  In the typical case, "sort" is applied to
> text data, which does not contain NUL bytes.  The data in that
> benchmark contained many NUL bytes.  If you take the same benchmark
> and uniformly replace "\0" with "\t" in compare.c, then the situation
> is much different: coreutils memxfrm is about 3 times faster than
> gnulib memxfrm on the larger test cases (this platform is Ubuntu
> 10.04, gcc 4.5.0, 2.4 GHz Pentium 4):
> 
>   503-penguin $ gcc -std=gnu99 -O2 -Wall coreutils-memxfrm.c gnulib-memxfrm.c compare1.c -I.
>   504-penguin $ ./a.out
>   Time for gnulib_memxfrm: 0,032002
>   Time for coreutils_memxfrm: 0,028001
>   Time for gnulib_memxfrm: 0,024002
>   Time for coreutils_memxfrm: 0,024001
>   Time for gnulib_memxfrm: 0,036003
>   Time for coreutils_memxfrm: 0,032002
>   Time for gnulib_memxfrm: 18,2051
>   Time for coreutils_memxfrm: 5,48834
>   Time for gnulib_memxfrm: 16,045
>   Time for coreutils_memxfrm: 5,50034
>   Time for gnulib_memxfrm: 15,837
>   Time for coreutils_memxfrm: 5,44834
> 
> I expect that this performance glitch in gnulib memxfrm could be
> improved, as it shouldn't simply double buffer sizes when they're too
> small, as at that point it already knows what the final buffer size
> should be.  Doing this should bring up gnulib memxfrm to be as fast as
> coreutils xmemxfrm for this benchmark.  Also, I agree that gnulib
> memxfrm is faster in some important cases.  Still, gnulib memxfrm is
> problematic, because it insists on managing memory itself.
> 
> Come to think of it, looking at gnulib memxfrm gave me an idea
> to improve the performance of GNU sort by bypassing the need for an
> memxfrm-like function entirely.  I pushed a patch to do that at
> <http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=2b49b140cc13cf36ec5ee5acaca5ac7bfeed6366>.
> 
> This avoids any potential naming collision for now.  The point
> remains, though, that it's confusing that gnulib memxfrm's name begins
> with "mem", as the mem* functions don't allocate memory.  Would you
> consider a patch that renames gnulib memxfrm to amemxfrm, or to some
> other such name?
> 
> 
> 
> 
> 
> 
> ------------------------------
> 
> Message: 2
> Date: Thu, 05 Aug 2010 01:44:30 +0200
> From: Simon Josefsson <simon <at> josefsson.org>
> Subject: bug#6789: propose renaming gnulib memxfrm to amemxfrm (naming
> 	collision with coreutils)
> To: Paul Eggert <eggert <at> CS.UCLA.EDU>
> Cc: Bug-gnulib <bug-gnulib <at> gnu.org>, Bug-coreutils
> 	<bug-coreutils <at> gnu.org>,	Bruno Haible <bruno <at> clisp.org>
> Message-ID: <8739uurli9.fsf <at> mocca.josefsson.org>
> Content-Type: text/plain; charset=us-ascii
> 
> Paul Eggert <eggert <at> CS.UCLA.EDU> writes:
> 
> > Come to think of it, looking at gnulib memxfrm gave me an idea
> > to improve the performance of GNU sort by bypassing the need for an
> > memxfrm-like function entirely.  I pushed a patch to do that at
> > <http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=2b49b140cc13cf36ec5ee5acaca5ac7bfeed6366>.
> 
> I don't know this code at all, but would your approach lead to problems
> if two different strings have the same MD5 hash?  MD5 is broken, and
> finding collisions takes just seconds on normal PC.  See:
> http://en.wikipedia.org/wiki/MD5#Security
> 
> /Simon
> 
> 
> 
> 
> 
> ------------------------------
> 
> Message: 3
> Date: Thu, 05 Aug 2010 04:58:26 +0200
> From: Paolo Bonzini <bonzini <at> gnu.org>
> Subject: bug#6789: propose renaming gnulib memxfrm to amemxfrm (naming
> 	collision with coreutils)
> To: bug-gnulib <at> gnu.org, Bug-coreutils <bug-coreutils <at> gnu.org>,	Paul
> 	Eggert <eggert <at> cs.ucla.edu>
> Message-ID: <4C5A28D2.3040102 <at> gnu.org>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
> 
> On 08/05/2010 01:44 AM, Simon Josefsson wrote:
> > Paul Eggert<eggert <at> CS.UCLA.EDU>  writes:
> >
> >> Come to think of it, looking at gnulib memxfrm gave me an idea
> >> to improve the performance of GNU sort by bypassing the need for an
> >> memxfrm-like function entirely.  I pushed a patch to do that at
> >> <http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=2b49b140cc13cf36ec5ee5acaca5ac7bfeed6366>.
> >
> > I don't know this code at all, but would your approach lead to problems
> > if two different strings have the same MD5 hash?  MD5 is broken, and
> > finding collisions takes just seconds on normal PC.  See:
> > http://en.wikipedia.org/wiki/MD5#Security
> 
> MD5 is used simply as a kind of "random key generator", so it doesn't 
> matter.  I wonder two things however:
> 
> 1) why bother with memxfrm as a tie-breaker? isn't memcmp good enough?
> 
> 2) maybe there's something cheaper than md5 that can be used?  For 
> example you could compare a^x and b^x where x is the output of a fast 
> 32-bit random number generator?  It doesn't need to be cryptographic, 
> I'd pick http://en.wikipedia.org/wiki/Xorshift.
> 
> Paolo
> 
> 
> 
> 
> 
> ------------------------------
> 
> _______________________________________________
> Bug-coreutils mailing list
> Bug-coreutils <at> gnu.org
> http://lists.gnu.org/mailman/listinfo/bug-coreutils
> 
> 
> End of Bug-coreutils Digest, Vol 93, Issue 6
> ********************************************





bug closed, send any further explanations to 6804 <at> debbugs.gnu.org and belhage <at> midibel.com Request was from Bob Proulx <bob <at> proulx.com> to control <at> debbugs.gnu.org. (Thu, 25 Aug 2011 05:45:02 GMT) Full text and rfc822 format available.

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 22 Sep 2011 11:24:03 GMT) Full text and rfc822 format available.

This bug report was last modified 13 years and 325 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.