GNU bug report logs - #42269
Remove non-GMP code from coreutils factor.c

Previous Next

Package: coreutils;

Reported by: Paul Eggert <eggert <at> cs.ucla.edu>

Date: Wed, 8 Jul 2020 16:27:02 UTC

Severity: normal

Tags: patch

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 42269 in the body.
You can then email your comments to 42269 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to bug-coreutils <at> gnu.org:
bug#42269; Package coreutils. (Wed, 08 Jul 2020 16:27:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Paul Eggert <eggert <at> cs.ucla.edu>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Wed, 08 Jul 2020 16:27:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Coreutils bugs <bug-coreutils <at> gnu.org>
Cc: James Youngman <jay <at> gnu.org>,
 Pádraig Brady <P <at> draigBrady.com>,
 Torbjörn Granlund <tg <at> gmplib.org>,
 Jim Meyering <jim <at> meyering.net>
Subject: Remove non-GMP code from coreutils factor.c
Date: Wed, 8 Jul 2020 09:25:57 -0700
[Message part 1 (text/plain, inline)]
I recently modified GNU coreutils so that it can assume GMP, possibly by 
compiling and linking mini-gmp.c. This helps simplify the coreutils source code 
and makes coreutils behavior more portable.

In doing so, I noticed that factor.c has special-purpose code to factor integers 
up to 127 bits. Although this code added functionality when coreutils could not 
assume GMP, it's no longer needed for that. And although it runs faster than the 
GMP code does, while doing the recent surgery on factor.c I began to wonder 
whether the hassle of maintaining the code outweighed its usefulness. So I wrote 
up the attached patch, which simply removes the non-GMP code and simplifies 
factor.c quite a bit.

I assume the attached patch will hurt performance significantly in some cases 
for 127-bit numbers, so I did not install it. Perhaps it would be better to keep 
the non-GMP algorithm and recode it with GMP. Or perhaps it would be better to 
leave the factor.c code alone.

Comments?
[0001-factor-simplify-by-assuming-libgmp.patch (text/x-patch, attachment)]

Information forwarded to bug-coreutils <at> gnu.org:
bug#42269; Package coreutils. (Wed, 08 Jul 2020 16:59:02 GMT) Full text and rfc822 format available.

Message #8 received at submit <at> debbugs.gnu.org (full text, mbox):

From: tg <at> gmplib.org (Torbjörn Granlund)
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: nisse <at> lysator.liu.se, Pádraig Brady <P <at> draigBrady.com>,
 James Youngman <jay <at> gnu.org>, Coreutils bugs <bug-coreutils <at> gnu.org>,
 Jim Meyering <jim <at> meyering.net>
Subject: Re: Remove non-GMP code from coreutils factor.c
Date: Wed, 08 Jul 2020 18:57:53 +0200
Paul Eggert <eggert <at> cs.ucla.edu> writes:

  I recently modified GNU coreutils so that it can assume GMP, possibly
  by compiling and linking mini-gmp.c. This helps simplify the coreutils
  source code and makes coreutils behavior more portable.

  In doing so, I noticed that factor.c has special-purpose code to
  factor integers up to 127 bits. Although this code added functionality
  when coreutils could not assume GMP, it's no longer needed for
  that. And although it runs faster than the GMP code does, while doing
  the recent surgery on factor.c I began to wonder whether the hassle of
  maintaining the code outweighed its usefulness. So I wrote up the
  attached patch, which simply removes the non-GMP code and simplifies
  factor.c quite a bit.

  I assume the attached patch will hurt performance significantly in
  some cases for 127-bit numbers, so I did not install it. Perhaps it
  would be better to keep the non-GMP algorithm and recode it with
  GMP. Or perhaps it would be better to leave the factor.c code alone.

  Comments?

The GMP code in coreutils factor.c was writen by me as a demo (see
gmp/demos) over 25 years ago.  It was put into coreutils' factor.c
without consulting me.  I would have disagreed if I had been asked.

The non-GMP code of coreutils was extremely well-tuned by me and Niels
Möller a couple of years ago.  It is so fast that it has created some
stir in the mathematical community, or so I have been told

I expect the GMP code of factor.c to be pretty much unused.  Why?
Because Pollard rho is suitable only in the range currently covered by
the non-GMP code.  Iff we were to write well-tuned low-level GMP code,
that we could expend the practical range ever so slightly.

By leaving just the GMP code, you would create a pretty useless factor
command.  Any naive old factor command would often beat it.  It would
make much more sense to remove the factor command altogether.

If any code is to be removed, then that would be the GMP code of
coreutils factor.


-- 
Torbjörn
Please encrypt, key id 0xC8601622




Added tag(s) patch. Request was from Paul Eggert <eggert <at> cs.ucla.edu> to control <at> debbugs.gnu.org. (Wed, 08 Jul 2020 17:59:01 GMT) Full text and rfc822 format available.

Information forwarded to bug-coreutils <at> gnu.org:
bug#42269; Package coreutils. (Wed, 08 Jul 2020 18:30:02 GMT) Full text and rfc822 format available.

Message #13 received at 42269 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Torbjörn Granlund <tg <at> gmplib.org>
Cc: nisse <at> lysator.liu.se, P <at> draigBrady.com, jay <at> gnu.org, 42269 <at> debbugs.gnu.org,
 jim <at> meyering.net
Subject: Re: bug#42269: Remove non-GMP code from coreutils factor.c
Date: Wed, 8 Jul 2020 11:29:47 -0700
On 7/8/20 9:57 AM, Torbjörn Granlund wrote:

> The non-GMP code of coreutils was extremely well-tuned by me and Niels
> Möller a couple of years ago.

How time flies! The code was merged in 2012.

> By leaving just the GMP code, you would create a pretty useless factor
> command.  Any naive old factor command would often beat it.  It would
> make much more sense to remove the factor command altogether.

OK, thanks. Then let's forget about the patch I just proposed.

Could you give an example of where the 128-bit code shines, compared to the GMP 
code on the same arguments? I could add the example as a comment in the factor.c 
code, to let me and future maintainers know why it's useful for performance.




bug closed, send any further explanations to 42269 <at> debbugs.gnu.org and Paul Eggert <eggert <at> cs.ucla.edu> Request was from Paul Eggert <eggert <at> cs.ucla.edu> to control <at> debbugs.gnu.org. (Wed, 08 Jul 2020 18:32:01 GMT) Full text and rfc822 format available.

Information forwarded to bug-coreutils <at> gnu.org:
bug#42269; Package coreutils. (Wed, 08 Jul 2020 19:35:02 GMT) Full text and rfc822 format available.

Message #18 received at 42269 <at> debbugs.gnu.org (full text, mbox):

From: tg <at> gmplib.org (Torbjörn Granlund)
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: nisse <at> lysator.liu.se, P <at> draigBrady.com, jay <at> gnu.org, 42269 <at> debbugs.gnu.org,
 jim <at> meyering.net
Subject: Re: bug#42269: Remove non-GMP code from coreutils factor.c
Date: Wed, 08 Jul 2020 21:34:44 +0200
Paul Eggert <eggert <at> cs.ucla.edu> writes:

  Could you give an example of where the 128-bit code shines, compared
  to the GMP code on the same arguments? I could add the example as a
  comment in the factor.c code, to let me and future maintainers know
  why it's useful for performance.

Any number which does not happen to be B-smooth for, say B < 2^30, will
show easily measurable performance difference of 5x to 40x IIRC.

A semantic difference which sometimes makes the speed difference less
pronounced is that the non-GMP code proves that the printed factors are
indeed prime.  We use a criterion which requires factoring of p-1 for
any assumed prime factor p.  In unlucky cases that recursive
factorisation is costlier than the main factorisation.

I have a patch which makes the non-GMP code some 2x - 3x faster.  It's
been maturing for several years now, so I suppose I should really finish
it.  (It got tangled with code which improves the GMP case by letting it
fall into the non-GMP code as numbers get smaller.  That sounds simple
but is quite messy for various reasons.  It is also not clear how much
complexity we could defend for this command of limited utility.)

-- 
Torbjörn
Please encrypt, key id 0xC8601622




Information forwarded to bug-coreutils <at> gnu.org:
bug#42269; Package coreutils. (Wed, 08 Jul 2020 20:52:02 GMT) Full text and rfc822 format available.

Message #21 received at submit <at> debbugs.gnu.org (full text, mbox):

From: nisse <at> lysator.liu.se (Niels Möller)
To: tg <at> gmplib.org (Torbjörn Granlund)
Cc: James Youngman <jay <at> gnu.org>,
 Pádraig Brady <P <at> draigBrady.com>,
 Paul Eggert <eggert <at> cs.ucla.edu>, Coreutils bugs <bug-coreutils <at> gnu.org>,
 Jim Meyering <jim <at> meyering.net>
Subject: Re: Remove non-GMP code from coreutils factor.c
Date: Wed, 08 Jul 2020 22:51:32 +0200
tg <at> gmplib.org (Torbjörn Granlund) writes:

> If any code is to be removed, then that would be the GMP code of
> coreutils factor.

I agree with Torbjörn. The GMP code in GNU factor might have made more
sense when most computers were 32-bit, and "bignums" were smaller.

But on 64-bit computers, the GMP code would be used only for numbers
above 127 bits. I'm really not that familiar with state of the art
factoring, but I'd guess pollard rho is a bad algorithm choice for that
range, and one ought to use, e.g., some variant of the quadratic sieve.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.




Information forwarded to bug-coreutils <at> gnu.org:
bug#42269; Package coreutils. (Wed, 08 Jul 2020 21:47:01 GMT) Full text and rfc822 format available.

Message #24 received at submit <at> debbugs.gnu.org (full text, mbox):

From: tg <at> gmplib.org (Torbjörn Granlund)
To: nisse <at> lysator.liu.se (Niels Möller)
Cc: James Youngman <jay <at> gnu.org>,
 Pádraig Brady <P <at> draigBrady.com>,
 Paul Eggert <eggert <at> cs.ucla.edu>, Coreutils bugs <bug-coreutils <at> gnu.org>,
 Jim Meyering <jim <at> meyering.net>
Subject: Re: Remove non-GMP code from coreutils factor.c
Date: Wed, 08 Jul 2020 23:46:38 +0200
nisse <at> lysator.liu.se (Niels Möller) writes:

  I'm really not that familiar with state of the art factoring, but I'd
  guess pollard rho is a bad algorithm choice for that range, and one
  ought to use, e.g., some variant of the quadratic sieve.

Let me modify that statement somewhat.

Pollars rho is suitable for finding small factors of any size
composites.  Small here might mean about < 2^64.

Any factoring effort should start with trial division, then some rounds
of Pollard rho, then perhaps some EC rounds.  Only after that and when a
remaining cofactor is non-prime, QS is to be rolled out.

That could sound like the GMP code of coreutils factor is great for
factoring really large numbers.  It is, but only for smooth huge
numbers.

If we ever get crazy enough to consider QS for coreutils factor, its
current GMP Pollard rho code would become part of a general factoring
engine for numbers > 2^127.

-- 
Torbjörn
Please encrypt, key id 0xC8601622




Information forwarded to bug-coreutils <at> gnu.org:
bug#42269; Package coreutils. (Thu, 09 Jul 2020 02:09:02 GMT) Full text and rfc822 format available.

Message #27 received at 42269 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Torbjörn Granlund <tg <at> gmplib.org>
Cc: nisse <at> lysator.liu.se, P <at> draigBrady.com, jay <at> gnu.org, 42269 <at> debbugs.gnu.org,
 jim <at> meyering.net
Subject: Re: bug#42269: Remove non-GMP code from coreutils factor.c
Date: Wed, 8 Jul 2020 19:08:36 -0700
[Message part 1 (text/plain, inline)]
On 7/8/20 12:34 PM, Torbjörn Granlund wrote:

> Any number which does not happen to be B-smooth for, say B < 2^30, will
> show easily measurable performance difference of 5x to 40x IIRC.

Ah, I had tried the example in the manual, (2^31 - 1) * (2^61 - 1). Even though 
it isn't B-smooth for B < 2^30, the performance difference was only 2x on my 
machine. I just now tried 2^127 - 1 and saw a similar performance difference, 
but 2^127 - 3 had a 15x difference so it's a better example.

I installed the attached to try to document this better.

> I have a patch which makes the non-GMP code some 2x - 3x faster.  It's
> been maturing for several years now, so I suppose I should really finish
> it.  (It got tangled with code which improves the GMP case by letting it
> fall into the non-GMP code as numbers get smaller.  That sounds simple
> but is quite messy for various reasons.  It is also not clear how much
> complexity we could defend for this command of limited utility.)

Yes, 'factor' is just a minor utility needed for POSIX compliance. Although it'd 
be nice to get that 2x-3x improvement whenever you have the time, it's not 
urgent. Thanks for your guidance on the GMP issue.

[0001-factor-explain-why-non-GMP-code-Bug-42269.patch (text/x-patch, attachment)]

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 06 Aug 2020 11:24:04 GMT) Full text and rfc822 format available.

This bug report was last modified 5 years and 11 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.