GNU bug report logs - #12350
Composites identified as primes in factor.c (when HAVE_GMP)

Reported by: Torbjorn Granlund <tg <at> gmplib.org>

Date: Tue, 4 Sep 2012 13:29:02 UTC

Severity: normal

Done: Jim Meyering <meyering <at> hx.meyering.net>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 12350 in the body.
You can then email your comments to 12350 AT debbugs.gnu.org in the normal way.

Report forwarded to bug-coreutils <at> gnu.org:
bug#12350; Package coreutils. (Tue, 04 Sep 2012 13:29:03 GMT) Full text and rfc822 format available.

Acknowledgement sent to Torbjorn Granlund <tg <at> gmplib.org>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Tue, 04 Sep 2012 13:29:04 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Torbjorn Granlund <tg <at> gmplib.org>
To: bug-coreutils <at> gnu.org
Cc: nisse <at> lysator.liu.se
Subject: Composites identified as primes in factor.c (when HAVE_GMP)
Date: Tue, 04 Sep 2012 15:09:53 +0200

[Message part 1 (text/plain, inline)]

The very old factoring code, cut from a now-obsolete version of GMP, does
not pass proper arguments to the mpz_probab_prime_p function.  It asks
for only 3 Miller-Rabin tests, which is not sufficient.

I am afraid the original poor code was written by me, where this
particular problem was introduced with 3.0's demos/factorize.c.

A Miller-Rabin test will detect a composite with a probability of at
least 3/4.  For a uniform random composite, the probability will actually
be much higher.

Or put another way, of the N-3 possible Miller-Rabin tests for checking
the composite N, there is no number N for which more than (N-3)/4 of the
tests will fail to detect the number as a composite.  For most numbers N
the number of "false witnesses" will be much, much lower.
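
To make the "false witnesses" bound concrete, here is a minimal C sketch
(not taken from factor.c) that runs a single-base Miller-Rabin round and
counts how many bases in 2..N-2 fail to expose a small composite.  The
function names are invented for illustration, and the code assumes a
compiler that provides unsigned __int128 (GCC or Clang).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* (a * b) mod n without overflow; relies on the GCC/Clang __int128 extension. */
static uint64_t mulmod(uint64_t a, uint64_t b, uint64_t n)
{
    return (uint64_t)((unsigned __int128)a * b % n);
}

/* a^e mod n by binary exponentiation. */
static uint64_t powmod(uint64_t a, uint64_t e, uint64_t n)
{
    uint64_t r = 1 % n;
    a %= n;
    while (e > 0) {
        if (e & 1)
            r = mulmod(r, a, n);
        a = mulmod(a, a, n);
        e >>= 1;
    }
    return r;
}

/* One Miller-Rabin round: true if odd n > 2 passes for base a,
   i.e. base a does NOT prove n composite. */
static bool mr_passes(uint64_t n, uint64_t a)
{
    assert(n > 2 && (n & 1));
    uint64_t d = n - 1;
    int s = 0;
    while ((d & 1) == 0) {      /* write n - 1 = 2^s * d with d odd */
        d >>= 1;
        s++;
    }
    uint64_t x = powmod(a, d, n);
    if (x == 1 || x == n - 1)
        return true;
    for (int i = 1; i < s; i++) {
        x = mulmod(x, x, n);
        if (x == n - 1)
            return true;
    }
    return false;
}

/* Number of bases a in [2, n-2] for which n passes one round. */
static uint64_t count_passing_bases(uint64_t n)
{
    uint64_t count = 0;
    for (uint64_t a = 2; a + 2 <= n; a++)
        if (mr_passes(n, a))
            count++;
    return count;
}
```

For N = 91 = 7 * 13, count_passing_bases(91) gives 16 of the 88 possible
bases, which is under the (N-3)/4 = 22 bound; for a prime, every base
passes.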

Problem numbers are of the form N=pq, p,q prime and (p-1)/(q-1) = s,
where s is a small integer.  (There are other problem forms too,
involving 3 or more prime factors.)  When s = 2, we get the 3/4 factor.

It is easy to find numbers of that form that cause coreutils factor to
fail:

465658903
2242724851
6635692801
17709149503
17754345703
20889169003
42743470771
54890944111
72047131003
85862644003
98275842811
114654168091
117225546301
...
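
The s = 2 shape is easy to reproduce.  The sketch below is an
illustration only, with invented function names and a deliberately naive
primality check: it builds N = pq from a prime q whenever p = 2q - 1 is
also prime.  Numbers of this shape are only candidates; whether one
actually slips through depends on the bases the Miller-Rabin code
happens to use, which this sketch does not model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Naive trial-division primality check; fine for the small factors used here. */
static bool is_prime_naive(uint64_t n)
{
    if (n < 2)
        return false;
    for (uint64_t d = 2; d <= n / d; d++)
        if (n % d == 0)
            return false;
    return true;
}

/* If q and p = 2q - 1 are both prime, return N = p * q, so that
   (p-1)/(q-1) = 2; otherwise return 0.
   Assumes q < 2^31 so that N fits in 64 bits. */
static uint64_t s2_composite(uint64_t q)
{
    assert(q < (UINT64_C(1) << 31));
    if (q < 2)
        return 0;
    uint64_t p = 2 * q - 1;
    if (!is_prime_naive(q) || !is_prime_naive(p))
        return 0;
    return p * q;
}
```

Calling s2_composite(15259) returns 465658903, the first number in the
list above.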

There are 9008992 composites of the form with s=2 below 2^64.  With 3
Miller-Rabin tests, one would expect about 9008992/4^3 = 9008992/64, or
roughly 140766, to be invalidly recognised as primes in that range.

Here is a simple patch:

[diff (application/octet-stream, attachment)]

[Message part 3 (text/plain, inline)]

Niels Möller and I have written a suggested replacement for coreutils
factor.c.  It fixes a number of issues with the current code:

(1) Much faster trial division code (> 10x) based on a small table of
    prime inverses.  Still, the new code doesn't perform lots of trial
    dividing.

(2) Pollard rho code using Montgomery representation for numbers < 2^64.
    (We consider extending this to 128 bits.)  Not dependent on GMP.

(3) Lucas prime proving code instead of probabilistic Miller-Rabin
    primality testing.

(4) SQUFOF code, which might be included depending on performance
    issues.

(5) Replacement GMP code (#if HAVE_GMP) that also includes Lucas proving
    code.
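
As an illustration of item (1), one common way to use a table of prime
inverses is to replace each division by a multiplication modulo 2^64.
The sketch below shows that general technique under assumed names; the
struct layout and function names are hypothetical and need not match
what the new factor.c does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Inverse of odd p modulo 2^64 by Newton iteration.  inv = p is correct
   to 3 bits (p * p == 1 mod 8); each step doubles the correct bits. */
static uint64_t inverse_mod_2_64(uint64_t p)
{
    assert(p & 1);
    uint64_t inv = p;
    for (int i = 0; i < 5; i++)
        inv *= 2 - p * inv;
    return inv;
}

/* Hypothetical table entry: an odd prime, its inverse, and the largest
   quotient a 64-bit multiple of p can have. */
struct prime_entry {
    uint64_t p;
    uint64_t pinv;   /* p * pinv == 1 (mod 2^64) */
    uint64_t limit;  /* floor((2^64 - 1) / p) */
};

static struct prime_entry make_entry(uint64_t p)
{
    struct prime_entry e = { p, inverse_mod_2_64(p), UINT64_MAX / p };
    return e;
}

/* n is a multiple of p exactly when n * pinv (mod 2^64) is at most
   limit; in that case the product is the exact quotient n / p. */
static bool divides(const struct prime_entry *e, uint64_t n)
{
    return n * e->pinv <= e->limit;
}
```

The test costs one multiplication and one comparison per table prime,
and when it succeeds the product n * pinv is already the quotient, so no
separate division is needed to continue factoring.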

The new code is faster than the current code:

Old:
  seq `pexpr 2^64-1000` `pexpr 2^64-1` | time factor >/dev/null
  524.57 user

New:
  seq `pexpr 2^64-1000` `pexpr 2^64-1` | time ./factor >/dev/null
  0.05 user

For smaller number ranges, the improvements are currently much more
modest, as little as 2x in some cases.

The code should be ready within a few weeks.

-- 
Torbjörn

Information forwarded to bug-coreutils <at> gnu.org:
bug#12350; Package coreutils. (Tue, 04 Sep 2012 14:47:02 GMT) Full text and rfc822 format available.

Message #8 received at 12350 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Torbjorn Granlund <tg <at> gmplib.org>
Cc: 12350 <at> debbugs.gnu.org, nisse <at> lysator.liu.se
Subject: Re: bug#12350: Composites identified as primes in factor.c (when
	HAVE_GMP)
Date: Tue, 04 Sep 2012 16:46:51 +0200

Torbjorn Granlund wrote:
> The very old factoring code cut from an now obsolete version GMP does
> not pass proper arguments to the mpz_probab_prime_p function.  It ask
> for 3 Miller-Rabin tests only, which is not sufficient.

Hi Torbjorn

Thank you for the patch and explanation.
I've converted that into the commit below in your name.
Please proofread it and let me know if you'd like to change anything.
I tweaked the patch to change MR_REPS from a #define to an enum
and to add the comment just preceding.

I'll add NEWS and tests separately.

From ea6dd126e6452504f9fa1d6708d25473e2c27e67 Mon Sep 17 00:00:00 2001
From: Torbjorn Granlund <tg <at> gmplib.org>
Date: Tue, 4 Sep 2012 16:22:47 +0200
Subject: [PATCH] factor: don't ever declare composites to be prime

The multiple-precision factoring code (with HAVE_GMP) was copied from
a now-obsolete version of GMP that did not pass proper arguments to
the mpz_probab_prime_p function.  It makes that code perform no more
than 3 Miller-Rabin tests only, which is not sufficient.

A Miller-Rabin test will detect composites with at least a probability
of 3/4.  For a uniform random composite, the probability will actually
by much higher.

Or put another way, of the N-3 possible Miller-Rabin tests for checking
the composite N, there is no number N for which more than (N-3)/4 of the
tests will fail to detect the number as a composite.  For most numbers N
the number of "false witnesses" will be much, much lower.

Problem numbers are of the for N=pq, p,q prime and (p-1)/(q-1) = s,
where s is a small integer.  (There are other problem forms too,
incvolving 3 or more prime factors.)  When s = 2, we get the 3/4 factor.

It is easy to find numbers of that form that cause coreutils' factor to
fail:

  465658903
  2242724851
  6635692801
  17709149503
  17754345703
  20889169003
  42743470771
  54890944111
  72047131003
  85862644003
  98275842811
  114654168091
  117225546301
  ...

There are 9008992 composites of the form with s=2 below 2^64.  With 3
Miller-Rabin test, one would expect about 9008992/4^64 = 140766 to be
invalidly recognised as primes in that range.

* src/factor.c (MR_REPS): Define to 25.
(factor_using_pollard_rho): Use MR_REPS, not 3.
(print_factors_multi): Likewise.
---
 src/factor.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/src/factor.c b/src/factor.c
index 1d55805..e63e0e0 100644
--- a/src/factor.c
+++ b/src/factor.c
@@ -153,6 +153,9 @@ factor_using_division (mpz_t t, unsigned int limit)
   mpz_clear (r);
 }

+/* The number of Miller-Rabin tests we require.  */
+enum { MR_REPS = 25 };
+
 static void
 factor_using_pollard_rho (mpz_t n, int a_int)
 {
@@ -222,7 +225,7 @@ S4:

       mpz_div (n, n, g);	/* divide by g, before g is overwritten */

-      if (!mpz_probab_prime_p (g, 3))
+      if (!mpz_probab_prime_p (g, MR_REPS))
         {
           do
             {
@@ -242,7 +245,7 @@ S4:
       mpz_mod (x, x, n);
       mpz_mod (x1, x1, n);
       mpz_mod (y, y, n);
-      if (mpz_probab_prime_p (n, 3))
+      if (mpz_probab_prime_p (n, MR_REPS))
         {
           emit_factor (n);
           break;
@@ -411,7 +414,7 @@ print_factors_multi (mpz_t t)
       if (mpz_cmp_ui (t, 1) != 0)
         {
           debug ("[is number prime?] ");
-          if (mpz_probab_prime_p (t, 3))
+          if (mpz_probab_prime_p (t, MR_REPS))
             emit_factor (t);
           else
             factor_using_pollard_rho (t, 1);
--
1.7.12.176.g3fc0e4c

Information forwarded to bug-coreutils <at> gnu.org:
bug#12350; Package coreutils. (Tue, 04 Sep 2012 14:58:01 GMT) Full text and rfc822 format available.

Message #11 received at 12350 <at> debbugs.gnu.org (full text, mbox):

From: Bernhard Voelker <mail <at> bernhard-voelker.de>
To: Jim Meyering <jim <at> meyering.net>
Cc: 12350 <at> debbugs.gnu.org, nisse <at> lysator.liu.se,
	Torbjorn Granlund <tg <at> gmplib.org>
Subject: Re: bug#12350: Composites identified as primes in factor.c (when
	HAVE_GMP)
Date: Tue, 04 Sep 2012 16:57:19 +0200


On 09/04/2012 04:46 PM, Jim Meyering wrote:
> incvolving 3 or more prime factors.)  When s = 2, we get the 3/4 factor.

s/incvolving/involving/

Have a nice day,
Berny

Information forwarded to bug-coreutils <at> gnu.org:
bug#12350; Package coreutils. (Tue, 04 Sep 2012 15:33:02 GMT) Full text and rfc822 format available.

Message #14 received at 12350 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Jim Meyering <jim <at> meyering.net>
Cc: 12350 <at> debbugs.gnu.org, nisse <at> lysator.liu.se,
	Torbjorn Granlund <tg <at> gmplib.org>
Subject: Re: bug#12350: Composites identified as primes in factor.c (when
	HAVE_GMP)
Date: Tue, 04 Sep 2012 16:31:54 +0100

On 09/04/2012 03:46 PM, Jim Meyering wrote:
> There are 9008992 composites of the form with s=2 below 2^64.  With 3
> Miller-Rabin test, one would expect about 9008992/4^64 = 140766 to be

s/4^64/64/ ?

For what it's worth I checked the million primes in
the range 452,930,477 to 472,882,027 and they're
now identified correctly (465658903 was included previously).

Note processing time has increased with the patch.
On my 2.1GHz i3-2310M, running over the above range
used to take 14m, but now takes 18m.

cheers,
Pádraig.

Information forwarded to bug-coreutils <at> gnu.org:
bug#12350; Package coreutils. (Tue, 04 Sep 2012 16:11:02 GMT) Full text and rfc822 format available.

Message #17 received at 12350 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Bernhard Voelker <mail <at> bernhard-voelker.de>
Cc: 12350 <at> debbugs.gnu.org, nisse <at> lysator.liu.se,
	Torbjorn Granlund <tg <at> gmplib.org>
Subject: Re: bug#12350: Composites identified as primes in factor.c (when
	HAVE_GMP)
Date: Tue, 04 Sep 2012 18:10:39 +0200

Bernhard Voelker wrote:
> On 09/04/2012 04:46 PM, Jim Meyering wrote:
>> incvolving 3 or more prime factors.)  When s = 2, we get the 3/4 factor.
>
> s/incvolving/involving/

Pádraig Brady wrote:
> Miller-Rabin test, one would expect about 9008992/4^64 = 140766 to be
s/4^64/64/ ?

Fixed both.  Thanks.
I've also changed s/s/z/ in recognised.

Information forwarded to bug-coreutils <at> gnu.org:
bug#12350; Package coreutils. (Tue, 04 Sep 2012 16:44:02 GMT) Full text and rfc822 format available.

Message #20 received at 12350 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Torbjorn Granlund <tg <at> gmplib.org>
Cc: 12350 <at> debbugs.gnu.org, nisse <at> lysator.liu.se
Subject: Re: bug#12350: Composites identified as primes in factor.c (when
	HAVE_GMP)
Date: Tue, 04 Sep 2012 18:42:54 +0200

Jim Meyering wrote:

> Torbjorn Granlund wrote:
>> The very old factoring code cut from an now obsolete version GMP does
>> not pass proper arguments to the mpz_probab_prime_p function.  It ask
>> for 3 Miller-Rabin tests only, which is not sufficient.
>
> Hi Torbjorn
>
> Thank you for the patch and explanation.
> I've converted that into the commit below in your name.
> Please proofread it and let me know if you'd like to change anything.
> I tweaked the patch to change MR_REPS from a #define to an enum
> and to add the comment just preceding.
>
> I'll add NEWS and tests separately.
...
> From: Torbjorn Granlund <tg <at> gmplib.org>
> Date: Tue, 4 Sep 2012 16:22:47 +0200
> Subject: [PATCH] factor: don't ever declare composites to be prime

Torbjörn, I've just noticed that I misspelled your name above.

Here's the NEWS/tests addition.
Following is an adjusted commit that spells your name properly.

From e561ff991b74dc19f6728aa1e6e61d1927055ac1 Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyering <at> redhat.com>
Date: Tue, 4 Sep 2012 18:26:25 +0200
Subject: [PATCH] factor: doc and tests
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* NEWS (Bug fixes): Mention it.
* tests/misc/factor.pl: Add five of Torbjörn's tests.
---
 NEWS                 | 3 +++
 tests/misc/factor.pl | 5 +++++
 2 files changed, 8 insertions(+)

diff --git a/NEWS b/NEWS
index f3874fd..ffa7939 100644
--- a/NEWS
+++ b/NEWS
@@ -15,6 +15,9 @@ GNU coreutils NEWS                                    -*- outline -*-
   it detects this precise type of cycle, diagnoses it as such and
   eventually exits nonzero.

+  factor (when using gmp) would mistakenly declare some composite numbers
+  to be prime, e.g., 465658903, 2242724851, 6635692801.
+
   rm -i -d now prompts the user then removes an empty directory, rather
   than ignoring the -d option and failing with an 'Is a directory' error.
   [bug introduced in coreutils-8.19, with the addition of --dir (-d)]
diff --git a/tests/misc/factor.pl b/tests/misc/factor.pl
index 47f9343..38a5037 100755
--- a/tests/misc/factor.pl
+++ b/tests/misc/factor.pl
@@ -67,6 +67,11 @@ my @Tests =
       {OUT => "4: 2 2\n"},
       {ERR => "$prog: 'a' is not a valid positive integer\n"},
       {EXIT => 1}],
+     ['bug-2012-a', '465658903', {OUT => '15259 30517'}],
+     ['bug-2012-b', '2242724851', {OUT => '33487 66973'}],
+     ['bug-2012-c', '6635692801', {OUT => '57601 115201'}],
+     ['bug-2012-d', '17709149503', {OUT => '94099 188197'}],
+     ['bug-2012-e', '17754345703', {OUT => '94219 188437'}],
     );

 # Prepend the command line argument and append a newline to end
--
1.7.12.176.g3fc0e4c


From 4c21a96443ee26eb0d4da31526ce4cf180ac7a4e Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Torbj=C3=B6rn=20Granlund?= <tg <at> gmplib.org>
Date: Tue, 4 Sep 2012 18:38:29 +0200
Subject: [PATCH] factor: don't ever declare composites to be prime

The multiple-precision factoring code (with HAVE_GMP) was copied from
a now-obsolete version of GMP that did not pass proper arguments to
the mpz_probab_prime_p function.  It makes that code perform no more
than 3 Miller-Rabin tests only, which is not sufficient.

A Miller-Rabin test will detect composites with at least a probability
of 3/4.  For a uniform random composite, the probability will actually
by much higher.

Or put another way, of the N-3 possible Miller-Rabin tests for checking
the composite N, there is no number N for which more than (N-3)/4 of the
tests will fail to detect the number as a composite.  For most numbers N
the number of "false witnesses" will be much, much lower.

Problem numbers are of the for N=pq, p,q prime and (p-1)/(q-1) = s,
where s is a small integer.  (There are other problem forms too,
involving 3 or more prime factors.)  When s = 2, we get the 3/4 factor.

It is easy to find numbers of that form that cause coreutils' factor to
fail:

  465658903
  2242724851
  6635692801
  17709149503
  17754345703
  20889169003
  42743470771
  54890944111
  72047131003
  85862644003
  98275842811
  114654168091
  117225546301
  ...

There are 9008992 composites of the form with s=2 below 2^64.  With 3
Miller-Rabin test, one would expect about 9008992/64 = 140766 to be
invalidly recognized as primes in that range.

* src/factor.c (MR_REPS): Define to 25.
(factor_using_pollard_rho): Use MR_REPS, not 3.
(print_factors_multi): Likewise.
* THANKS.in: Remove my name, now that it will be automatically
included in the generated THANKS file.
---
 THANKS.in    | 1 -
 src/factor.c | 9 ++++++---
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/THANKS.in b/THANKS.in
index 1580151..2c3f83c 100644
--- a/THANKS.in
+++ b/THANKS.in
@@ -608,7 +608,6 @@ Tony Leneis                         tony <at> plaza.ds.adp.com
 Tony Robinson                       ajr <at> eng.cam.ac.uk
 Toomas Soome                        Toomas.Soome <at> Elion.ee
 Toralf Förster                      toralf.foerster <at> gmx.de
-Torbjorn Granlund                   tege <at> nada.kth.se
 Torbjorn Lindgren                   tl <at> funcom.no
 Torsten Landschoff                  torsten <at> pclab.ifg.uni-kiel.de
 Travis Gummels                      tgummels <at> redhat.com
diff --git a/src/factor.c b/src/factor.c
index 1d55805..e63e0e0 100644
--- a/src/factor.c
+++ b/src/factor.c
@@ -153,6 +153,9 @@ factor_using_division (mpz_t t, unsigned int limit)
   mpz_clear (r);
 }

+/* The number of Miller-Rabin tests we require.  */
+enum { MR_REPS = 25 };
+
 static void
 factor_using_pollard_rho (mpz_t n, int a_int)
 {
@@ -222,7 +225,7 @@ S4:

       mpz_div (n, n, g);	/* divide by g, before g is overwritten */

-      if (!mpz_probab_prime_p (g, 3))
+      if (!mpz_probab_prime_p (g, MR_REPS))
         {
           do
             {
@@ -242,7 +245,7 @@ S4:
       mpz_mod (x, x, n);
       mpz_mod (x1, x1, n);
       mpz_mod (y, y, n);
-      if (mpz_probab_prime_p (n, 3))
+      if (mpz_probab_prime_p (n, MR_REPS))
         {
           emit_factor (n);
           break;
@@ -411,7 +414,7 @@ print_factors_multi (mpz_t t)
       if (mpz_cmp_ui (t, 1) != 0)
         {
           debug ("[is number prime?] ");
-          if (mpz_probab_prime_p (t, 3))
+          if (mpz_probab_prime_p (t, MR_REPS))
             emit_factor (t);
           else
             factor_using_pollard_rho (t, 1);
--
1.7.12.176.g3fc0e4c

Information forwarded to bug-coreutils <at> gnu.org:
bug#12350; Package coreutils. (Tue, 04 Sep 2012 17:25:01 GMT) Full text and rfc822 format available.

Message #23 received at 12350 <at> debbugs.gnu.org (full text, mbox):

From: Eric Blake <eblake <at> redhat.com>
To: Jim Meyering <jim <at> meyering.net>
Cc: 12350 <at> debbugs.gnu.org, nisse <at> lysator.liu.se,
	Torbjorn Granlund <tg <at> gmplib.org>
Subject: Re: bug#12350: Composites identified as primes in factor.c (when
	HAVE_GMP)
Date: Tue, 04 Sep 2012 11:24:35 -0600

[Message part 1 (text/plain, inline)]

On 09/04/2012 10:42 AM, Jim Meyering wrote:
> Jim Meyering wrote:
> 
>> > Torbjorn Granlund wrote:
> 
> Problem numbers are of the for N=pq, p,q prime and (p-1)/(q-1) = s,

s/for/form/

-- 
Eric Blake   eblake <at> redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

[signature.asc (application/pgp-signature, attachment)]

Information forwarded to bug-coreutils <at> gnu.org:
bug#12350; Package coreutils. (Tue, 04 Sep 2012 17:59:02 GMT) Full text and rfc822 format available.

Message #26 received at 12350 <at> debbugs.gnu.org (full text, mbox):

From: Torbjorn Granlund <tg <at> gmplib.org>
To: Jim Meyering <jim <at> meyering.net>
Cc: 12350 <at> debbugs.gnu.org, nisse <at> lysator.liu.se
Subject: Re: bug#12350: Composites identified as primes in factor.c (when
	HAVE_GMP)
Date: Tue, 04 Sep 2012 19:58:20 +0200

Jim Meyering <jim <at> meyering.net> writes:

  Jim Meyering wrote:
  
  > Torbjorn Granlund wrote:
  >> The very old factoring code cut from an now obsolete version GMP does
  >> not pass proper arguments to the mpz_probab_prime_p function.  It ask
  >> for 3 Miller-Rabin tests only, which is not sufficient.
  >
  > Hi Torbjorn
  >
  > Thank you for the patch and explanation.
  > I've converted that into the commit below in your name.
  > Please proofread it and let me know if you'd like to change anything.
  > I tweaked the patch to change MR_REPS from a #define to an enum
  > and to add the comment just preceding.
  >
  > I'll add NEWS and tests separately.
  ...
  > From: Torbjorn Granlund <tg <at> gmplib.org>
  > Date: Tue, 4 Sep 2012 16:22:47 +0200
  > Subject: [PATCH] factor: don't ever declare composites to be prime
  
  Torbjörn, I've just noticed that I misspelled your name above.
  
Did you?  Well, you misspell recognise too, but then again, most people
on the other side of the pond misspell lots of English words.  :-)

-- 
Torbjörn

Information forwarded to bug-coreutils <at> gnu.org:
bug#12350; Package coreutils. (Tue, 04 Sep 2012 18:12:02 GMT) Full text and rfc822 format available.

Message #29 received at 12350 <at> debbugs.gnu.org (full text, mbox):

From: Torbjorn Granlund <tg <at> gmplib.org>
To: Pádraig Brady <P <at> draigBrady.com>
Cc: 12350 <at> debbugs.gnu.org, nisse <at> lysator.liu.se,
	Jim Meyering <jim <at> meyering.net>
Subject: Re: bug#12350: Composites identified as primes in factor.c (when
	HAVE_GMP)
Date: Tue, 04 Sep 2012 20:10:59 +0200

Pádraig Brady <P <at> draigBrady.com> writes:

  On 09/04/2012 03:46 PM, Jim Meyering wrote:
  > There are 9008992 composites of the form with s=2 below 2^64.  With 3
  > Miller-Rabin test, one would expect about 9008992/4^64 = 140766 to be

  s/4^64/64/ ?

  For what it's worth I checked the million primes in
  the range 452,930,477 to 472,882,027 and they're
  now identified correctly (465658903 was included previously).

  Note processing time has increased with the patch.
  On my 2.1GHz i3-2310M, running over the above range
  used to take 14m, but now takes 18m.

It sometimes takes more time to do things correctly.

As I mentioned in the original post, we will replace the current code
with code that is many times faster.  Your example above will run in
less than a minute on your system.

-- 
Torbjörn

Information forwarded to bug-coreutils <at> gnu.org:
bug#12350; Package coreutils. (Tue, 04 Sep 2012 19:27:02 GMT) Full text and rfc822 format available.

Message #32 received at 12350 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Torbjorn Granlund <tg <at> gmplib.org>
Cc: 12350 <at> debbugs.gnu.org, nisse <at> lysator.liu.se,
	Jim Meyering <jim <at> meyering.net>
Subject: Re: bug#12350: Composites identified as primes in factor.c (when
	HAVE_GMP)
Date: Tue, 04 Sep 2012 20:26:45 +0100

On 09/04/2012 07:10 PM, Torbjorn Granlund wrote:
> Pádraig Brady<P <at> draigBrady.com>  writes:
>
>    On 09/04/2012 03:46 PM, Jim Meyering wrote:
>    >  There are 9008992 composites of the form with s=2 below 2^64.  With 3
>    >  Miller-Rabin test, one would expect about 9008992/4^64 = 140766 to be
>
>    s/4^64/64/ ?
>
>    For what it's worth I checked the million primes in
>    the range 452,930,477 to 472,882,027 and they're
>    now identified correctly (465658903 was included previously).
>
>    Note processing time has increased with the patch.
>    On my 2.1GHz i3-2310M, running over the above range
>    used to take 14m, but now takes 18m.
>
> It sometimes takes more time to do things correctly.

Sure. I was just quantifying the performance change,
for others who may be referencing or noticing patches.
(Actually, I'd add a note to the commit message that
this increases calculation time by about 25%.)

> As I mentioned in the original post, we will replace the current code
> with code that is many times faster.  Your example above will run at
> less than a minute on your system.

I'd left my test files in place in anticipation ;)

thanks,
Pádraig.

Information forwarded to bug-coreutils <at> gnu.org:
bug#12350; Package coreutils. (Tue, 04 Sep 2012 21:57:01 GMT) Full text and rfc822 format available.

Message #35 received at 12350 <at> debbugs.gnu.org (full text, mbox):

From: Torbjorn Granlund <tg <at> gmplib.org>
To: Pádraig Brady <P <at> draigBrady.com>
Cc: 12350 <at> debbugs.gnu.org, nisse <at> lysator.liu.se,
	Jim Meyering <jim <at> meyering.net>
Subject: Re: bug#12350: Composites identified as primes in factor.c (when
	HAVE_GMP)
Date: Tue, 04 Sep 2012 23:55:58 +0200

Pádraig Brady <P <at> draigBrady.com> writes:

  Sure. I was just quantifying the performance change,
  for others who may be referencing or noticing patches.
  (Actually, I'd add a note to the commit message that,
  this increases calculations by about 25%).

And surely more for certain cases.  We spend 25/3 or about 8 times more
effort in Miller-Rabin.

  > As I mentioned in the original post, we will replace the current code
  > with code that is many times faster.  Your example above will run at
  > less than a minute on your system.

  I'd left my test files in place in anticipation ;)

Please do, and let me and Niels know if it takes more than 45s.  Your
test case takes 28s on my 3.3 GHz Sandy bridge system with our current
code.  I'm a little disappointed the code doesn't beat the old code more
for small factorisations.

-- 
Torbjörn

Information forwarded to bug-coreutils <at> gnu.org:
bug#12350; Package coreutils. (Wed, 05 Sep 2012 06:53:02 GMT) Full text and rfc822 format available.

Message #38 received at 12350 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Eric Blake <eblake <at> redhat.com>
Cc: 12350 <at> debbugs.gnu.org, nisse <at> lysator.liu.se,
	Torbjorn Granlund <tg <at> gmplib.org>
Subject: Re: bug#12350: Composites identified as primes in factor.c (when
	HAVE_GMP)
Date: Wed, 05 Sep 2012 08:52:48 +0200

Eric Blake wrote:
> On 09/04/2012 10:42 AM, Jim Meyering wrote:
>> Jim Meyering wrote:
>>
>>> > Torbjorn Granlund wrote:
>>
>> Problem numbers are of the for N=pq, p,q prime and (p-1)/(q-1) = s,
>
> s/for/form/

Fixed.  Thanks.

Information forwarded to bug-coreutils <at> gnu.org:
bug#12350; Package coreutils. (Wed, 05 Sep 2012 06:58:02 GMT) Full text and rfc822 format available.

Message #41 received at 12350 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Torbjorn Granlund <tg <at> gmplib.org>
Cc: 12350 <at> debbugs.gnu.org, nisse <at> lysator.liu.se
Subject: Re: bug#12350: Composites identified as primes in factor.c (when
	HAVE_GMP)
Date: Wed, 05 Sep 2012 08:57:50 +0200

Torbjorn Granlund wrote:
> Jim Meyering <jim <at> meyering.net> writes:
>
>   Jim Meyering wrote:
>
>   > Torbjorn Granlund wrote:
>   >> The very old factoring code cut from an now obsolete version GMP does
>   >> not pass proper arguments to the mpz_probab_prime_p function.  It ask
>   >> for 3 Miller-Rabin tests only, which is not sufficient.
>   >
>   > Hi Torbjorn
>   >
>   > Thank you for the patch and explanation.
>   > I've converted that into the commit below in your name.
>   > Please proofread it and let me know if you'd like to change anything.
>   > I tweaked the patch to change MR_REPS from a #define to an enum
>   > and to add the comment just preceding.
>   >
>   > I'll add NEWS and tests separately.
>   ...
>   > From: Torbjorn Granlund <tg <at> gmplib.org>
>   > Date: Tue, 4 Sep 2012 16:22:47 +0200
>   > Subject: [PATCH] factor: don't ever declare composites to be prime
>
>   Torbjörn, I've just noticed that I misspelled your name above.
>
> Did you?

I meant that I used Torbjorn rather than Torbjörn.

> Well, you misspell recognise too, but then again, most people
> on the other side of the pond misspell lots of English words.  :-)

Yes, the dichotomy is unfortunate.
Over the years, it has even caused interface problems, i.e.,
with --colours vs --colors and $LS_COLOURS vs LS_COLORS.
I wanted to settle on one, and US spelling is more common
than British so I settled on that.

Information forwarded to bug-coreutils <at> gnu.org:
bug#12350; Package coreutils. (Thu, 06 Sep 2012 18:16:02 GMT) Full text and rfc822 format available.

Message #44 received at 12350 <at> debbugs.gnu.org (full text, mbox):

From: Torbjorn Granlund <tg <at> gmplib.org>
To: Jim Meyering <jim <at> meyering.net>
Cc: 12350 <at> debbugs.gnu.org, nisse <at> lysator.liu.se
Subject: Re: bug#12350: Composites identified as primes in factor.c (when
	HAVE_GMP)
Date: Thu, 06 Sep 2012 20:15:32 +0200

[Message part 1 (text/plain, inline)]

Niels and I would now appreciate feedback on the new factor code.

We've put the entire little project in a tar file, which is attached.
The code is also available at <http://gmplib.org:8000/factoring/>.

Here is the README file:

NT factor  (Niels' and Torbjörn's factor, or New Technology factor)

This is a project for producing a decent 'factor' command for GNU.
The code was written by Torbjörn Granlund and Niels Möller in Aug-Sept
2012, but parts of the code are based on preexisting GMP code.

The old factor program could handle numbers < 2^64, unless GMP was
available at build time.  Without GMP, only trial division was used;
with GMP an old version of GMP's demos/factorize.c code was used,
which relies on Pollard rho for much better performance.  The old
factor program used probabilistic Miller-Rabin primes testing.
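
For readers unfamiliar with Pollard rho, which both the old GMP-based
code and the new code rely on, here is a minimal 64-bit sketch using
Floyd cycle detection.  It is not the code from either program: the
function names are invented, it assumes unsigned __int128 (GCC or
Clang), and it expects a composite input.

```c
#include <assert.h>
#include <stdint.h>

static uint64_t gcd_u64(uint64_t a, uint64_t b)
{
    while (b != 0) {
        uint64_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* One step of the iteration x -> x^2 + c (mod n), computed in 128 bits. */
static uint64_t step(uint64_t x, uint64_t c, uint64_t n)
{
    return (uint64_t)(((unsigned __int128)x * x + c) % n);
}

/* Return a nontrivial factor of composite n > 3.
   A prime input would make this loop for a very long time. */
static uint64_t pollard_rho(uint64_t n)
{
    assert(n > 3);
    if ((n & 1) == 0)
        return 2;
    for (uint64_t c = 1; c < n; c++) {
        uint64_t x = 2, y = 2, g = 1;
        while (g == 1) {
            x = step(x, c, n);               /* tortoise: one step */
            y = step(step(y, c, n), c, n);   /* hare: two steps */
            g = gcd_u64(x > y ? x - y : y - x, n);
        }
        if (g != n)
            return g;    /* found a proper divisor */
        /* g == n: the cycle closed without splitting n; retry with next c */
    }
    return n;
}
```

A caller would check primality first and only pass composites, then
recurse on the returned factor and the cofactor.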

The new code can factor numbers < 2^127 and does not currently make
use of GMP, not even as an option.  It uses fast trial division code,
Pollard rho, and optionally SQUFOF.  It uses the Lucas primality test
instead of a probabilistic test.
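
The following sketch shows what a Lucas-style proof looks like, assuming
"Lucas primality test" here means the classical n-1 test: n is prime
exactly when some base a satisfies a^(n-1) = 1 (mod n) and
a^((n-1)/q) != 1 (mod n) for every prime q dividing n-1.  It is a small
demonstration with invented names, not the project's code; it factors
n-1 by plain trial division, so it is only practical for modest n, and
it assumes unsigned __int128 (GCC or Clang).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static uint64_t mulmod(uint64_t a, uint64_t b, uint64_t n)
{
    return (uint64_t)((unsigned __int128)a * b % n);
}

static uint64_t powmod(uint64_t a, uint64_t e, uint64_t n)
{
    uint64_t r = 1 % n;
    a %= n;
    while (e > 0) {
        if (e & 1)
            r = mulmod(r, a, n);
        a = mulmod(a, a, n);
        e >>= 1;
    }
    return r;
}

/* Lucas n-1 test.  Returns true only with a proof that n is prime. */
static bool lucas_is_prime(uint64_t n)
{
    if (n < 2)
        return false;
    if (n == 2)
        return true;
    if ((n & 1) == 0)
        return false;

    /* Distinct prime factors of n-1 by trial division (demo only).
       A value below 2^64 has at most 15 distinct prime factors. */
    uint64_t factors[16];
    int nf = 0;
    uint64_t m = n - 1;
    for (uint64_t q = 2; q <= m / q; q++) {
        if (m % q == 0) {
            factors[nf++] = q;
            while (m % q == 0)
                m /= q;
        }
    }
    if (m > 1)
        factors[nf++] = m;

    for (uint64_t a = 2; a < n; a++) {
        if (powmod(a, n - 1, n) != 1)
            return false;        /* Fermat test fails: n is composite */
        bool proves = true;
        for (int i = 0; i < nf; i++) {
            if (powmod(a, (n - 1) / factors[i], n) == 1) {
                proves = false;  /* order of a is too small; try another a */
                break;
            }
        }
        if (proves)
            return true;         /* a has order n-1, so n is prime */
    }
    return false;
}
```

Unlike a fixed number of Miller-Rabin rounds, a "true" result here is a
proof, since an element of order n-1 can only exist when n is prime.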

The new code is several times faster than the old code, in particular
on 32-bit hardware.  On current 64-bit machines, it is between 3 times
and 10000 times faster for ranges of numbers; for 32-bit machines we
have seen 150,000 times improvement for some number range.  The
advantage for the new code is greater for larger numbers, matching
mathematical theory of algorithm efficiency.  (These numbers compare
the new code to the old GMP-less code; GMP-enabled old code is only
between 3 and 10 times slower.)

For smaller numbers, more than half the time is spent in I/O,
buffering, and conversions.  We have not tried to optimise these
parts, but instead kept them clean.


* We don't have any --help or --version options currently.

* Our packaging with separate Makefile, outseq.c and ChangeLog was
  useful during our development.  We don't expect these to be useful
  in coreutils.  In particular, the slow testing of the 'check' target
  is probably quite unsuitable for coreutils (though similar, quicker
  tests would make sense).

* The files probably needed for coreutils are:

  o factor.c -- main factoring code
  o make-prime-list.c -- primes table generator program
  o longlong.h -- arithmetic support routines (from GMP)


Technical considerations:

* Should we handle numbers >= 2^127?  That would in effect mean
  merging a current version of GMP's demos/factorize.c into this
  factor.c, and putting that under HAVE_GMP (like in the old factor.c).
  It should be understood that factoring such large numbers with only
  Pollard rho is not very feasible.

* We think a --verbose option would be nice, although we don't have
  one in the present version.  It would output information on
  algorithm switches and bounds reached.  Opinions?


Portability caveats:

* We rely on POSIX.1 getchar_unlocked for a performance advantage.

* We have some hardwired W_TYPE_SIZE settings for the code interfacing
  to longlong.h.  It is now 64 bits.  It will break on systems where
  uintmax_t is not a 64-bit type.  Please see the beginning of
  factor.c.


Legal caveat:

* Both Niels and Torbjörn have long been GNU hackers.  We do not
  currently have paperwork in place for coreutils contributions.  This
  will certainly be addressed.

[nt-factor.tar.lz (application/octet-stream, attachment)]

[Message part 3 (text/plain, inline)]


-- 
Torbjörn

Information forwarded to bug-coreutils <at> gnu.org:
bug#12350; Package coreutils. (Thu, 06 Sep 2012 18:17:01 GMT) Full text and rfc822 format available.

Message #47 received at 12350 <at> debbugs.gnu.org (full text, mbox):

From: Torbjorn Granlund <tg <at> gmplib.org>
To: Jim Meyering <jim <at> meyering.net>
Cc: 12350 <at> debbugs.gnu.org, nisse <at> lysator.liu.se
Subject: Re: bug#12350: Composites identified as primes in factor.c (when
	HAVE_GMP)
Date: Thu, 06 Sep 2012 20:16:35 +0200

[Message part 1 (text/plain, inline)]

Niels and I would now appreciate feedback on the new factor code.

We've put the entire little project in a tar file, which is attached.
The code is also available at <http://gmplib.org:8000/factoring/>.

Here is the README file:

NT factor  (Niels' and Torbjörn's factor, or New Technology factor)

This is a project for producing a decent 'factor' command for GNU.
The code was written by Torbjörn Granlund and Niels Möller in Aug-Sept
2012, but parts of the code are based on preexisting GMP code.

The old factor program could handle numbers < 2^64, unless GMP was
available at build time.  Without GMP, only trial division was used;
with GMP an old version of GMP's demos/factorize.c code was used,
which relies on Pollard rho for much better performance.  The old
factor program used probabilistic Miller-Rabin primality testing.

The new code can factor numbers < 2^127 and does not currently make
use of GMP, not even as an option.  It uses fast trial division code,
Pollard rho, and optionally SQUFOF.  It uses the Lucas primality test
instead of a probabilistic test.

The new code is several times faster than the old code, in particular
on 32-bit hardware.  On current 64-bit machines, it is between 3 times
and 10000 times faster for ranges of numbers; for 32-bit machines we
have seen a 150,000 times improvement for some number ranges.  The
advantage for the new code is greater for larger numbers, matching
mathematical theory of algorithm efficiency.  (These numbers compare
the new code to the old GMP-less code; GMP-enabled old code is only
between 3 and 10 times slower.)

For smaller numbers, more than half the time is spent in I/O,
buffering, and conversions.  We have not tried to optimise these
parts, but instead kept them clean.


* We don't have any --help or --version options currently.

* Our packaging with separate Makefile, ourseq.c and ChangeLog was
  useful during our development.  We don't expect these to be useful
  in coreutils.  In particular, the slow testing of the 'check' target
  is probably quite unsuitable for coreutils (although similar but
  quicker tests would make sense).

* The files probably needed for coreutils are:

  o factor.c -- main factoring code
  o make-prime-list.c -- primes table generator program
  o longlong.h -- arithmetic support routines (from GMP)


Technical considerations:

* Should we handle numbers >= 2^127?  That would in effect mean
  merging a current version of GMP's demos/factorize.c into this
  factor.c, and put that under HAVE_GMP (like in the old factor.c).
  It should be understood that factoring such large numbers with only
  Pollard rho is not very feasible.

* We think a --verbose option would be nice, although we don't have
  one in the present version.  It would output information on
  algorithm switches and bounds reached.  Opinions?


Portability caveats:

* We rely on POSIX.1 getchar_unlocked for a performance advantage.

* We have some hardwired W_TYPE_SIZE settings for the code interfacing
  to longlong.h.  It is now 64 bits.  It will break on systems where
  uintmax_t is not a 64-bit type.  Please see the beginning of
  factor.c.


Legal caveat:

* Both Niels and Torbjörn have long been GNU hackers.  We do not
  currently have paperwork in place for coreutils contributions.  This
  will certainly be addressed.

[nt-factor.tar.lz (application/octet-stream, attachment)]

[Message part 3 (text/plain, inline)]


-- 
Torbjörn

Information forwarded to bug-coreutils <at> gnu.org:
bug#12350; Package coreutils. (Thu, 06 Sep 2012 20:24:01 GMT) Full text and rfc822 format available.

Message #50 received at 12350 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Torbjorn Granlund <tg <at> gmplib.org>
Cc: 12350 <at> debbugs.gnu.org, nisse <at> lysator.liu.se
Subject: Re: bug#12350: Composites identified as primes in factor.c (when
	HAVE_GMP)
Date: Thu, 06 Sep 2012 22:23:42 +0200

Torbjorn Granlund wrote:
> I and Niels now would appreciate feedback on the new factor code.
>
> We've put the entire little project in a tar file, which is attached.
> The code is also available at <http://gmplib.org:8000/factoring/>.

Thanks a lot!  I've started looking at the code.
I was surprised to see "make check" fail.

    $ ./ourseq 0 100000 > k
    $ ./factor < k
    0:
    1:
    2: 2
    3: 3
    4: 2 2
    5: 5
    6: 2 3
    7: 7
    8: 2 2 2
    9: 3 3
    zsh: abort (core dumped)  ./factor < k

That was due to unexpected input.
Poking around, I see that ourseq writes from uninitialized memory:

    $ ./ourseq 9 11
    9
    102
    112
    $ ./ourseq 9 11
    9
    10>
    11>
    $ ./ourseq 9 11
    9
    10"
    11"

The fix is to change the memmove to copy one more byte each time:
to copy the required trailing NUL.
With that, it looks like "make check" will pass.
It will definitely benefit from running the individual
tests in parallel ;-)

From 9e6db73344f43e828b8d716a0ea6a5842980d518 Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyering <at> redhat.com>
Date: Thu, 6 Sep 2012 22:12:41 +0200
Subject: [PATCH] incr: don't omit trailing NUL when incrementing

---
 ourseq.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ourseq.c b/ourseq.c
index d2472aa..cb71f13 100644
--- a/ourseq.c
+++ b/ourseq.c
@@ -48,7 +48,7 @@ incr (string *st)
 	}
       s[i] = '0';
     }
-  memmove (s + 1, s, len);
+  memmove (s + 1, s, len + 1);
   s[0] = '1';
   st->len = len + 1;
 }
--
1.7.12.176.g3fc0e4c
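
To illustrate why the extra byte matters, here is a minimal sketch of a
decimal-string increment in the spirit of incr.  The struct and function
names (struct dec, dec_incr) and the fixed 64-byte buffer are assumptions
made for this example; they are not taken from ourseq.c.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the "string" type used by ourseq.c.
   buf holds len decimal digits followed by a terminating NUL. */
struct dec
{
  char buf[64];
  size_t len;
};

/* Increment the decimal number in d.  Assumes len + 2 <= sizeof d->buf. */
static void
dec_incr (struct dec *d)
{
  /* Walk from the least significant digit, propagating the carry. */
  for (size_t i = d->len; i-- > 0; )
    {
      if (d->buf[i] != '9')
        {
          d->buf[i]++;
          return;
        }
      d->buf[i] = '0';
    }

  /* Every digit was '9': shift the digits right by one place.
     Copying len + 1 bytes moves the terminating NUL as well; copying
     only len bytes would leave whatever byte happened to follow. */
  memmove (d->buf + 1, d->buf, d->len + 1);
  d->buf[0] = '1';
  d->len++;
}
```

With only len bytes moved, incrementing "9" leaves "10" followed by
stale buffer contents, which matches the 102, 10> and 10" outputs
shown above.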

Information forwarded to bug-coreutils <at> gnu.org:
bug#12350; Package coreutils. (Thu, 06 Sep 2012 20:30:01 GMT) Full text and rfc822 format available.

Message #53 received at 12350 <at> debbugs.gnu.org (full text, mbox):

From: Torbjorn Granlund <tg <at> gmplib.org>
To: Jim Meyering <jim <at> meyering.net>
Cc: 12350 <at> debbugs.gnu.org, nisse <at> lysator.liu.se
Subject: Re: bug#12350: Composites identified as primes in factor.c (when
	HAVE_GMP)
Date: Thu, 06 Sep 2012 22:29:24 +0200

Jim Meyering <jim <at> meyering.net> writes:

      zsh: abort (core dumped)  ./factor < k
  
  That was due to unexpected input.

The parsing of the new factor is probably not too bad, but the error
reporting could be better.  :o)

  --- a/ourseq.c
  +++ b/ourseq.c
  @@ -48,7 +48,7 @@ incr (string *st)
   	}
         s[i] = '0';
       }
  -  memmove (s + 1, s, len);
  +  memmove (s + 1, s, len + 1);
     s[0] = '1';
     st->len = len + 1;
   }

Thanks.  Culpa mea, ourseq is not my finest work, just a hack for
testing factor.
  

-- 
Torbjörn

Information forwarded to bug-coreutils <at> gnu.org:
bug#12350; Package coreutils. (Thu, 06 Sep 2012 21:34:02 GMT) Full text and rfc822 format available.

Message #56 received at 12350 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Torbjorn Granlund <tg <at> gmplib.org>
Cc: 12350 <at> debbugs.gnu.org, nisse <at> lysator.liu.se
Subject: Re: bug#12350: Composites identified as primes in factor.c (when
	HAVE_GMP)
Date: Thu, 06 Sep 2012 23:33:15 +0200

Torbjorn Granlund wrote:
> I and Niels now would appreciate feedback on the new factor code.
...
> * Our packaging with separate Makefile, ourseq.c and ChangeLog was
>   useful during our development.  We don't expect these to be useful
>   in coreutils.  In particular, the slow testing of the 'check' target
>   is probably quite unsuitable for coreutils (but similar but quicker
>   tests would make sense).

I think the tests will be fine, as long as they're separate, and hence
can be parallelized by the default mechanism.  We might want to label
most of them as "expensive", so that they're run only by those who set
RUN_EXPENSIVE_TESTS=yes in their environment.

> * The files probably needed for coreutils are:
>
>   o factor.c -- main factoring code
>   o make-prime-list.c -- primes table generator program
>   o longlong.h -- arithmetic support routines (from GMP)
>
>
> Technical considerations:
>
> * Should we handle numbers >= 2^127?  That would in effect mean
>   merging a current version of GMP's demos/factorize.c into this
>   factor.c, and put that under HAVE_GMP (like in the old factor.c).
>   It should be understood that factoring such large numbers with only
>   Pollard rho is not very feasible.

The existing code can factor arbitrarily large numbers quickly, as long
as they have no large prime factors.  We should retain that capability.

> * We think a --verbose option would be nice, although we don't have
>   one in the present version.  It would output information on
>   algorithm switches and bounds reached.  Opinions?

I think it would be worthwhile, especially to give an idea of what progress
is being made when factoring very large numbers, but hardly something
that need be done now.

E.g., currently this doesn't print much:

    $ M8=$(echo 2^31-1|bc) M9=$(echo 2^61-1|bc) M10=$(echo 2^89-1|bc)
    $ factor --verbose $(echo "$M8 * $M9 * $M10" | bc)
    [using arbitrary-precision arithmetic][trial division (32761)] [is number prime?] [pollard-rho (1)]

Ideally it'd print something every second or two.

> Portability caveats:
>
> * We rely on POSIX.1 getchar_unlocked for a performance advantage.
>
> * We have some hardwired W_TYPE_SIZE settings for the code interfacing
>   to longlong.h.  It is now 64 bits.  It will break on systems where
>   uintmax_t is not a 64-bit type.  Please see the beginning of
>   factor.c.

I wonder how many types of systems would be affected.

> Legal caveat:
>
> * Both Niels and Torbjörn are GNU hackers since long.  We do not
>   currently have paperwork in place for coreutils contributions.  This
>   will certainly be addressed.

Thanks.

Information forwarded to bug-coreutils <at> gnu.org:
bug#12350; Package coreutils. (Thu, 06 Sep 2012 22:01:01 GMT) Full text and rfc822 format available.

Message #59 received at 12350 <at> debbugs.gnu.org (full text, mbox):

From: Torbjorn Granlund <tg <at> gmplib.org>
To: Jim Meyering <jim <at> meyering.net>
Cc: 12350 <at> debbugs.gnu.org, nisse <at> lysator.liu.se
Subject: Re: bug#12350: Composites identified as primes in factor.c (when
	HAVE_GMP)
Date: Fri, 07 Sep 2012 00:00:38 +0200

Jim Meyering <jim <at> meyering.net> writes:

  The existing code can factor arbitrarily large numbers quickly, as long
  as they have no large prime factors.  We should retain that capability.

OK, so we'll put the GMP demos program into this one.

This opens another technical concern:

We have moved towards proving primality, since for 128 bit numbers it
can be done quickly.  But if we allow arbitrarily large numbers, it is
expensive.

We might want an option for choosing probabilistic testing, perhaps
--prp (common abbreviation for PRobable Prime).  By default, we
should prove primality, I think.

My current devel version of GMP's demos/factorize has Lucas code.

  > * We think a --verbose option would be nice, although we don't have
  >   one in the present version.  It would output information on
  >   algorithm switches and bounds reached.  Opinions?

  I think it would be worthwhile, especially to give an idea of what progress
  is being made when factoring very large numbers, but hardly something
  that need be done now.

  E.g., currently this doesn't print much:

      $ M8=$(echo 2^31-1|bc) M9=$(echo 2^61-1|bc) M10=$(echo 2^89-1|bc)
      $ factor --verbose $(echo "$M8 * $M9 * $M10" | bc)
      [using arbitrary-precision arithmetic][trial division (32761)] [is number prime?] [pollard-rho (1)]

  Ideally it'd print something every second or two.

I'll let Niels worry about this, since he was the one to ask for it.

  > Portability caveats:
  >
  > * We rely on POSIX.1 getchar_unlocked for a performance advantage.
  >
  > * We have some hardwired W_TYPE_SIZE settings for the code interfacing
  >   to longlong.h.  It is now 64 bits.  It will break on systems where
  >   uintmax_t is not a 64-bit type.  Please see the beginning of
  >   factor.c.

  I wonder how many types of systems would be affected.

It is not used currently anywhere in coreutils?  Perhaps coreutils could
use autoconf for checking this?  (If we're really crazy, we could speed
the factor program by an additional 20% by using blocked input with
e.g. fread.)

Please take a look at the generated code for factor_using_division,
towards the end where 8 imulq should be found (on amd64).  The code uses
mov, imul, cmp, jbe for testing divisibility by a prime; the branch
is taken when the prime divides the number being factored, so it is
rarely taken.  (I suppose we could do a better job at describing the
maths, with some references.  This particular trick is from "Division by
invariant integers using multiplication".)
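
A minimal sketch of the multiply-and-compare divisibility test described
above, assuming a 64-bit word.  The helper names (binv64, divisible_by)
are made up for this illustration and are not the names used in factor.c;
the inverse and the limit would normally come from a precomputed table.

```c
#include <stdint.h>

/* Inverse of odd p modulo 2^64, by Newton iteration.  For odd p,
   p * p == 1 (mod 8), so p itself is an inverse correct to 3 bits;
   each step doubles the number of correct low bits
   (3 -> 6 -> 12 -> 24 -> 48 -> 96). */
static uint64_t
binv64 (uint64_t p)
{
  uint64_t inv = p;
  for (int i = 0; i < 5; i++)
    inv *= 2 - p * inv;
  return inv;
}

/* Nonzero iff odd p divides n, where pinv = binv64 (p) and
   lim = UINT64_MAX / p.  Multiplication by pinv permutes the residues
   modulo 2^64 and sends each multiple k*p to k, so exactly the
   multiples of p land in the range [0, lim]. */
static int
divisible_by (uint64_t n, uint64_t pinv, uint64_t lim)
{
  return n * pinv <= lim;
}
```

One multiplication and one comparison per prime is what would show up
as the imul/cmp/jbe sequence in the generated code.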

-- 
Torbjörn

Information forwarded to bug-coreutils <at> gnu.org:
bug#12350; Package coreutils. (Fri, 07 Sep 2012 00:23:02 GMT) Full text and rfc822 format available.

Message #62 received at 12350 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Jim Meyering <jim <at> meyering.net>
Cc: 12350 <at> debbugs.gnu.org, nisse <at> lysator.liu.se,
	Torbjorn Granlund <tg <at> gmplib.org>
Subject: Re: bug#12350: Composites identified as primes in factor.c (when
	HAVE_GMP)
Date: Thu, 06 Sep 2012 17:22:22 -0700

On 09/06/2012 02:33 PM, Jim Meyering wrote:
>> > * We have some hardwired W_TYPE_SIZE settings for the code interfacing
>> >   to longlong.h.  It is now 64 bits.  It will break on systems where
>> >   uintmax_t is not a 64-bit type.  Please see the beginning of
>> >   factor.c.
> I wonder how many types of systems would be affected.

It's only a matter of time.  GCC already supports 128-bit
integers on my everyday host (Fedora 17, x86-64, GCC 4.7.1).
Eventually uintmax_t will grow past 64 bits, if only for the
crypto guys.

If the code needs exactly-64-bit unsigned integers, shouldn't
it be using uint64_t?  That's the standard way of doing
that sort of thing.  Gnulib can supply the type on pre-C99
platforms.  Weird but standard-conforming platforms that
don't have uint64_t will be out of luck, but surely they're out
of luck anyway.

Information forwarded to bug-coreutils <at> gnu.org:
bug#12350; Package coreutils. (Fri, 07 Sep 2012 06:20:01 GMT) Full text and rfc822 format available.

Message #65 received at 12350 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Torbjorn Granlund <tg <at> gmplib.org>
Cc: 12350 <at> debbugs.gnu.org, nisse <at> lysator.liu.se,
	Pádraig Brady <P <at> draigBrady.com>
Subject: Re: bug#12350: Composites identified as primes in factor.c (when
	HAVE_GMP)
Date: Fri, 07 Sep 2012 08:19:14 +0200

Jim Meyering wrote:
> Jim Meyering wrote:
>
>> Torbjorn Granlund wrote:
>>> The very old factoring code cut from an now obsolete version GMP does
>>> not pass proper arguments to the mpz_probab_prime_p function.  It ask
>>> for 3 Miller-Rabin tests only, which is not sufficient.
>>
>> Hi Torbjorn
>>
>> Thank you for the patch and explanation.
>> I've converted that into the commit below in your name.
>> Please proofread it and let me know if you'd like to change anything.
>> I tweaked the patch to change MR_REPS from a #define to an enum
>> and to add the comment just preceding.
>>
>> I'll add NEWS and tests separately.
> ...
>> From: Torbjorn Granlund <tg <at> gmplib.org>
>> Date: Tue, 4 Sep 2012 16:22:47 +0200
>> Subject: [PATCH] factor: don't ever declare composites to be prime
>
> Torbjörn, I've just noticed that I misspelled your name above.
>
> Here's the NEWS/tests addition.
> Following is an adjusted commit that spells your name properly.
>
>>From e561ff991b74dc19f6728aa1e6e61d1927055ac1 Mon Sep 17 00:00:00 2001

There have been enough changes (mostly typo fixes) that I'm re-posting
these for review before I push.  Also, I added this sentence to NEWS
about the performance hit, too:

    The fix makes factor somewhat slower (~25%) for ranges of consecutive
    numbers, and up to 8 times slower for some worst-case individual numbers.


From 68cf62bb04ecd138c81b68539c2a065250ca4390 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Torbj=C3=B6rn=20Granlund?= <tg <at> gmplib.org>
Date: Tue, 4 Sep 2012 18:38:29 +0200
Subject: [PATCH 1/2] factor: don't ever declare composites to be prime

The multiple-precision factoring code (with HAVE_GMP) was copied from
a now-obsolete version of GMP that did not pass proper arguments to
the mpz_probab_prime_p function.  It makes that code perform no more
than 3 Miller-Rabin tests only, which is not sufficient.

A Miller-Rabin test will detect composites with at least a probability
of 3/4.  For a uniform random composite, the probability will actually
be much higher.

Or put another way, of the N-3 possible Miller-Rabin tests for checking
the composite N, there is no number N for which more than (N-3)/4 of the
tests will fail to detect the number as a composite.  For most numbers N
the number of "false witnesses" will be much, much lower.

Problem numbers are of the form N=pq, p,q prime and (p-1)/(q-1) = s,
where s is a small integer.  (There are other problem forms too,
involving 3 or more prime factors.)  When s = 2, we get the 3/4 factor.

It is easy to find numbers of that form that cause coreutils' factor to
fail:

  465658903
  2242724851
  6635692801
  17709149503
  17754345703
  20889169003
  42743470771
  54890944111
  72047131003
  85862644003
  98275842811
  114654168091
  117225546301
  ...

There are 9008992 composites of the form with s=2 below 2^64.  With 3
Miller-Rabin tests, one would expect about 9008992/64 = 140766 to be
invalidly recognized as primes in that range.

* src/factor.c (MR_REPS): Define to 25.
(factor_using_pollard_rho): Use MR_REPS, not 3.
(print_factors_multi): Likewise.
* THANKS.in: Remove my name, now that it will be automatically
included in the generated THANKS file.
---
 THANKS.in    | 1 -
 src/factor.c | 9 ++++++---
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/THANKS.in b/THANKS.in
index 1580151..2c3f83c 100644
--- a/THANKS.in
+++ b/THANKS.in
@@ -608,7 +608,6 @@ Tony Leneis                         tony <at> plaza.ds.adp.com
 Tony Robinson                       ajr <at> eng.cam.ac.uk
 Toomas Soome                        Toomas.Soome <at> Elion.ee
 Toralf Förster                      toralf.foerster <at> gmx.de
-Torbjorn Granlund                   tege <at> nada.kth.se
 Torbjorn Lindgren                   tl <at> funcom.no
 Torsten Landschoff                  torsten <at> pclab.ifg.uni-kiel.de
 Travis Gummels                      tgummels <at> redhat.com
diff --git a/src/factor.c b/src/factor.c
index 1d55805..e63e0e0 100644
--- a/src/factor.c
+++ b/src/factor.c
@@ -153,6 +153,9 @@ factor_using_division (mpz_t t, unsigned int limit)
   mpz_clear (r);
 }

+/* The number of Miller-Rabin tests we require.  */
+enum { MR_REPS = 25 };
+
 static void
 factor_using_pollard_rho (mpz_t n, int a_int)
 {
@@ -222,7 +225,7 @@ S4:

       mpz_div (n, n, g);	/* divide by g, before g is overwritten */

-      if (!mpz_probab_prime_p (g, 3))
+      if (!mpz_probab_prime_p (g, MR_REPS))
         {
           do
             {
@@ -242,7 +245,7 @@ S4:
       mpz_mod (x, x, n);
       mpz_mod (x1, x1, n);
       mpz_mod (y, y, n);
-      if (mpz_probab_prime_p (n, 3))
+      if (mpz_probab_prime_p (n, MR_REPS))
         {
           emit_factor (n);
           break;
@@ -411,7 +414,7 @@ print_factors_multi (mpz_t t)
       if (mpz_cmp_ui (t, 1) != 0)
         {
           debug ("[is number prime?] ");
-          if (mpz_probab_prime_p (t, 3))
+          if (mpz_probab_prime_p (t, MR_REPS))
             emit_factor (t);
           else
             factor_using_pollard_rho (t, 1);
--
1.7.12.176.g3fc0e4c


From 0c0bfdf5e150e430f1ec3edb4fef9170c13ea268 Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyering <at> redhat.com>
Date: Tue, 4 Sep 2012 18:26:25 +0200
Subject: [PATCH 2/2] factor: NEWS and tests
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* NEWS (Bug fixes): Mention it.
* tests/misc/factor.pl: Add five of Torbjörn's tests.
---
 NEWS                 | 6 ++++++
 tests/misc/factor.pl | 5 +++++
 2 files changed, 11 insertions(+)

diff --git a/NEWS b/NEWS
index 995fafb..8770a3b 100644
--- a/NEWS
+++ b/NEWS
@@ -15,6 +15,12 @@ GNU coreutils NEWS                                    -*- outline -*-
   it detects this precise type of cycle, diagnoses it as such and
   eventually exits nonzero.

+  factor (when using gmp) would mistakenly declare some composite numbers
+  to be prime, e.g., 465658903, 2242724851, 6635692801 and many more.
+  The fix makes factor somewhat slower (~25%) for ranges of consecutive
+  numbers, and up to 8 times slower for some worst-case individual numbers.
+  [bug introduced in coreutils-7.0, with GNU MP support]
+
   rm -i -d now prompts the user then removes an empty directory, rather
   than ignoring the -d option and failing with an 'Is a directory' error.
   [bug introduced in coreutils-8.19, with the addition of --dir (-d)]
diff --git a/tests/misc/factor.pl b/tests/misc/factor.pl
index 47f9343..38a5037 100755
--- a/tests/misc/factor.pl
+++ b/tests/misc/factor.pl
@@ -67,6 +67,11 @@ my @Tests =
       {OUT => "4: 2 2\n"},
       {ERR => "$prog: 'a' is not a valid positive integer\n"},
       {EXIT => 1}],
+     ['bug-2012-a', '465658903', {OUT => '15259 30517'}],
+     ['bug-2012-b', '2242724851', {OUT => '33487 66973'}],
+     ['bug-2012-c', '6635692801', {OUT => '57601 115201'}],
+     ['bug-2012-d', '17709149503', {OUT => '94099 188197'}],
+     ['bug-2012-e', '17754345703', {OUT => '94219 188437'}],
     );

 # Prepend the command line argument and append a newline to end
--
1.7.12.176.g3fc0e4c
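
For readers who want to experiment with the 3/4 bound from the commit
message of patch 1/2, here is a small single-word Miller-Rabin sketch.
It is not how mpz_probab_prime_p is implemented; the function names
(powmod, mr_passes, count_passing) are invented for this example, and
the code is restricted to odd n below 2^32 so that products fit in 64
bits.

```c
#include <stdint.h>

/* b^e mod m, valid for m < 2^32 so that products fit in 64 bits. */
static uint64_t
powmod (uint64_t b, uint64_t e, uint64_t m)
{
  uint64_t r = 1;
  b %= m;
  while (e)
    {
      if (e & 1)
        r = r * b % m;
      b = b * b % m;
      e >>= 1;
    }
  return r;
}

/* One Miller-Rabin round for odd n >= 3 and base 1 < a < n.
   Returns 1 if base a fails to expose n as composite. */
static int
mr_passes (uint64_t n, uint64_t a)
{
  uint64_t d = n - 1;
  int s = 0;
  while ((d & 1) == 0)
    {
      d >>= 1;
      s++;
    }
  uint64_t x = powmod (a, d, n);
  if (x == 1 || x == n - 1)
    return 1;
  for (int i = 1; i < s; i++)
    {
      x = x * x % n;
      if (x == n - 1)
        return 1;
    }
  return 0;
}

/* Count the bases in [2, limit] that n passes; for a composite n
   these are the "false witnesses". */
static uint64_t
count_passing (uint64_t n, uint64_t limit)
{
  uint64_t count = 0;
  for (uint64_t a = 2; a <= limit; a++)
    count += mr_passes (n, a);
  return count;
}
```

Calling count_passing (465658903, 100000) gives a sample of how often a
single test is fooled by the first number in the list.  If each test is
fooled with probability about 1/4, three independent tests are all
fooled with probability about 1/64, which is where the 9008992/64
estimate comes from.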

Information forwarded to bug-coreutils <at> gnu.org:
bug#12350; Package coreutils. (Fri, 07 Sep 2012 08:34:02 GMT) Full text and rfc822 format available.

Message #68 received at 12350 <at> debbugs.gnu.org (full text, mbox):

From: Torbjorn Granlund <tg <at> gmplib.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 12350 <at> debbugs.gnu.org, nisse <at> lysator.liu.se,
	Jim Meyering <jim <at> meyering.net>
Subject: Re: bug#12350: Composites identified as primes in factor.c (when
	HAVE_GMP)
Date: Fri, 07 Sep 2012 10:32:56 +0200

Paul Eggert <eggert <at> cs.ucla.edu> writes:

  On 09/06/2012 02:33 PM, Jim Meyering wrote:
  >> > * We have some hardwired W_TYPE_SIZE settings for the code interfacing
  >> >   to longlong.h.  It is now 64 bits.  It will break on systems where
  >> >   uintmax_t is not a 64-bit type.  Please see the beginning of
  >> >   factor.c.
  > I wonder how many types of systems would be affected.

  It's only a matter of time.  GCC already supports 128-bit
  integers on my everyday host (Fedora 17, x86-64, GCC 4.7.1).
  Eventually uintmax_t will grow past 64 bits, if only for the
  crypto guys.

It should however be noted that uintmax_t stays at 64 bits even with
GCC's 128-bit integers.  I think the latter are declared as not being
integers, or something along those lines, to avoid the ABI-breaking
change of redefining uintmax_t.

  If the code needs exactly-64-bit unsigned integers, shouldn't
  it be using uint64_t?  That's the standard way of doing
  that sort of thing.  Gnulib can supply the type on pre-C99
  platforms.  Weird but standard-conforming platforms that
  don't have uint64_t will be out of luck, but surely they're out
  of luck anyway.

The code does not need any particular size of uintmax_t, except that we
need a preprocessor-time size measurement of it.  The reason for this is
longlong.h's tests of which single-line asm code to include.

The new factor program works without longlong.h, but some parts of it
will become 3-4 times slower.  To disable longlong.h, please compile
with -DUSE_LONGLONG_H=0. (The worst affected parts would probably be the
single-word Lucas code and all double-word factoring.)

I suppose that an autoconf test of the type size will be needed at least
for theoretical portability, if longlong.h is to be retained.
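
A minimal sketch of a preprocessor-time width check that would need no
autoconf test, assuming <stdint.h> is available.  The SKETCH_* macro
names and sketch_word_bits are hypothetical, not names used in factor.c
or longlong.h.

```c
#include <limits.h>
#include <stdint.h>

/* #if arithmetic is done in (u)intmax_t, so UINTMAX_MAX can be compared
   directly.  SKETCH_W_TYPE_SIZE is a stand-in for W_TYPE_SIZE; a value
   of 0 means "width not recognized". */
#if UINTMAX_MAX == 0xFFFFFFFFFFFFFFFFU
# define SKETCH_W_TYPE_SIZE 64
#else
# define SKETCH_W_TYPE_SIZE 0
#endif

/* Only enable the assembly-backed path for a recognized width;
   otherwise fall back to plain C, as -DUSE_LONGLONG_H=0 does. */
#if SKETCH_W_TYPE_SIZE == 64
# define SKETCH_USE_LONGLONG_H 1
#else
# define SKETCH_USE_LONGLONG_H 0
#endif

/* Compile-time cross-check: the array size is negative, and the build
   fails, if the macro disagrees with the actual size of uintmax_t. */
typedef char sketch_width_check[(SKETCH_W_TYPE_SIZE == 0
                                 || SKETCH_W_TYPE_SIZE
                                    == sizeof (uintmax_t) * CHAR_BIT)
                                ? 1 : -1];

/* The width chosen at preprocessing time, for run-time inspection. */
static int
sketch_word_bits (void)
{
  return SKETCH_W_TYPE_SIZE;
}
```

On a platform where uintmax_t grows past 64 bits, this sketch would
select the slower portable path rather than break.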

There is one other place where some (hypothetical) portability problems
may exist, and that's make-prime-list.c.  It prints a list of uintmax_t
literals.

We let the coreutils maintainers worry about the allowable complexity of
the factor program; I and Niels are happy to sacrifice some speed for
lowering the code complexity.  But first we will increase it by
retrofitting GMP factoring code.  :o)

-- 
Torbjörn

Information forwarded to bug-coreutils <at> gnu.org:
bug#12350; Package coreutils. (Fri, 07 Sep 2012 08:44:01 GMT) Full text and rfc822 format available.

Message #71 received at 12350 <at> debbugs.gnu.org (full text, mbox):

From: nisse <at> lysator.liu.se (Niels Möller)
To: Jim Meyering <jim <at> meyering.net>
Cc: 12350 <at> debbugs.gnu.org, Torbjorn Granlund <tg <at> gmplib.org>
Subject: Re: bug#12350: Composites identified as primes in factor.c (when
	HAVE_GMP)
Date: Fri, 07 Sep 2012 10:43:20 +0200

Jim Meyering <jim <at> meyering.net> writes:

> The existing code can factor arbitrarily large numbers quickly, as long
> as they have no large prime factors.  We should retain that capability.

My understanding is that most gnu/linux distributions build coreutils
without linking to gmp. So lots of users don't get this capability.

If this is an important feature, maybe one should consider bundling
mini-gmp and using that as a fallback in case coreutils is configured
without gmp (see
http://gmplib.org:8000/gmp/file/7677276bdf92/mini-gmp/README). I would
expect it to be a constant factor (maybe 10) times slower than the real
gmp for numbers up to a few hundred bits (for larger numbers, it gets
much slower due to lack of sophisticated algorithms, but we probably
can't factor them in reasonable time anyway).

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.

Information forwarded to bug-coreutils <at> gnu.org:
bug#12350; Package coreutils. (Fri, 07 Sep 2012 08:58:01 GMT) Full text and rfc822 format available.

Message #74 received at 12350 <at> debbugs.gnu.org (full text, mbox):

From: nisse <at> lysator.liu.se (Niels Möller)
To: Torbjorn Granlund <tg <at> gmplib.org>
Cc: 12350 <at> debbugs.gnu.org, Paul Eggert <eggert <at> cs.ucla.edu>,
	Jim Meyering <jim <at> meyering.net>
Subject: Re: bug#12350: Composites identified as primes in factor.c (when
	HAVE_GMP)
Date: Fri, 07 Sep 2012 10:56:42 +0200

Torbjorn Granlund <tg <at> gmplib.org> writes:

> There is one other place where some (hypothetical) portability problems
> may exist, and that's make-prime-list.c.  It prints a list of uintmax_t
> literals.

I don't think the prime sieving is a problem, but for each (odd)
prime p, it also computes p^{-1} mod 2^{bits} and floor ( (2^{bits} - 1)
/ p), where "bits" is the size of a uintmax_t. This will break cross
compilation, if uintmax_t is of different size on build and host system,
or if different suffixes (U, UL, ULL) are needed in the generated
primes.h.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.

Information forwarded to bug-coreutils <at> gnu.org:
bug#12350; Package coreutils. (Fri, 07 Sep 2012 09:02:02 GMT) Full text and rfc822 format available.

Message #77 received at 12350 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Torbjorn Granlund <tg <at> gmplib.org>
Cc: 12350 <at> debbugs.gnu.org, nisse <at> lysator.liu.se
Subject: Re: bug#12350: Composites identified as primes in factor.c (when
	HAVE_GMP)
Date: Fri, 07 Sep 2012 11:01:03 +0200

Torbjorn Granlund wrote:
...
>   > * We have some hardwired W_TYPE_SIZE settings for the code interfacing
>   >   to longlong.h.  It is now 64 bits.  It will break on systems where
>   >   uintmax_t is not a 64-bit type.  Please see the beginning of
>   >   factor.c.
>
>   I wonder how many types of systems would be affected.
>
> It is not used currently anywhere in coreutils?  Perhaps coreutils could

uintmax_t is used throughout coreutils, but nowhere (that comes to mind)
does it fail when UINTMAX_MAX happens to be different than 2^64-1.
What I was wondering is how many systems have a uintmax_t that is
only 32 bits wide.  Now that I reread, I suppose this code would be
ok (albeit slower) with uintmax_t wider than 64.

> use autoconf for checking this?  (If we're really crazy, we could speed
> the factor program by an additional 20% by using blocked input with
> e.g. fread.)
>
> Please take a look at the generated code for factor_using_division,
> towards the end where 8 imulq should be found (on amd64).  The code uses
> mov, imul, cmp, jbe for testing the divisibility of a prime; the branch
> is taken when the prime divides the number being factored, thus highly
> non-taken.  (I suppose we could do a better job at describing the maths,
> with some references.  This particular trick is from "Division by
> invariant integers using multiplication".)

Any place you can add a reference would be most welcome.

Here's one where I'd appreciate a reference in a comment:

  #define MAGIC64 ((uint64_t) 0x0202021202030213ULL)
  #define MAGIC63 ((uint64_t) 0x0402483012450293ULL)
  #define MAGIC65 ((uint64_t) 0x218a019866014613ULL)
  #define MAGIC11 0x23b

  /* Returns the square root if the input is a square, otherwise 0. */
  static uintmax_t
  is_square (uintmax_t x)
  {
    /* Uses the tests suggested by Cohen. Excludes 99% of non-squares before
       computing the square root. */
    if (((MAGIC64 >> (x & 63)) & 1)
        && ((MAGIC63 >> (x % 63)) & 1)
        /* Both 0 and 64 are squares mod (65) */
        && ((MAGIC65 >> ((x % 65) & 63)) & 1)
        && ((MAGIC11 >> (x % 11) & 1)))
      {
        uintmax_t r = isqrt (x);
        if (r*r == x)
          return r;
      }
    return 0;
  }
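
Pending a proper reference, the four constants can at least be checked
mechanically.  The sketch below is one reading of them, not something
taken from Cohen: bit r of each constant appears to be set exactly when
r is a square modulo the corresponding modulus (with residue 64 folded
onto bit 0 for the modulus 65, as the code comment says).  The function
name square_mask is made up for this example.

```c
#include <stdint.h>

/* Bitmask of quadratic residues modulo m, for m <= 65.  Bit
   ((x * x % m) & 63) is set for every x in [0, m); the "& 63" folds
   residue 64 onto bit 0, which only matters for m = 65. */
static uint64_t
square_mask (unsigned int m)
{
  uint64_t mask = 0;
  for (unsigned int x = 0; x < m; x++)
    mask |= (uint64_t) 1 << ((x * x % m) & 63);
  return mask;
}
```

Under that reading, square_mask (64), square_mask (63), square_mask (65)
and square_mask (11) reproduce MAGIC64, MAGIC63, MAGIC65 and MAGIC11.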

Information forwarded to bug-coreutils <at> gnu.org:
bug#12350; Package coreutils. (Fri, 07 Sep 2012 09:29:02 GMT) Full text and rfc822 format available.

Message #80 received at 12350 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Niels Möller <nisse <at> lysator.liu.se>
Cc: 12350 <at> debbugs.gnu.org, Jim Meyering <jim <at> meyering.net>,
	Torbjorn Granlund <tg <at> gmplib.org>
Subject: Re: bug#12350: Composites identified as primes in factor.c (when
	HAVE_GMP)
Date: Fri, 07 Sep 2012 10:28:19 +0100

On 09/07/2012 09:43 AM, Niels Möller wrote:
> Jim Meyering<jim <at> meyering.net>  writes:
>
>> The existing code can factor arbitrarily large numbers quickly, as long
>> as they have no large prime factors.  We should retain that capability.
>
> My understanding is that most gnu/linux distributions build coreutils
> without linking to gmp. So lots of users don't get this capability.
>
> If this is an important feature, maybe one should consider bundling
> mini-gmp and use that as a fallback in case coreutils is configured
> without gmp (see
> http://gmplib.org:8000/gmp/file/7677276bdf92/mini-gmp/README). I would
> expect it to be a constant factor (maybe 10) times slower than the real
> gmp for numbers up to a few hundred bits (for larger numbers, it gets
> much slower due to lack of sophisticated algorithms, but we probably
> can't factor them in reasonable time anyway).

Bundling libraries is bad if one ever needs to update them.
The correct approach here is to file a bug against
your distro to enable gmp, which is a trivial matter
of adding the build and runtime dependency on gmp.

cheers,
Pádraig.

Information forwarded to bug-coreutils <at> gnu.org:
bug#12350; Package coreutils. (Fri, 07 Sep 2012 09:42:02 GMT) Full text and rfc822 format available.

Message #83 received at 12350 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Jim Meyering <jim <at> meyering.net>
Cc: 12350 <at> debbugs.gnu.org, nisse <at> lysator.liu.se,
	Torbjorn Granlund <tg <at> gmplib.org>
Subject: Re: bug#12350: Composites identified as primes in factor.c (when
	HAVE_GMP)
Date: Fri, 07 Sep 2012 10:41:01 +0100

On 09/07/2012 07:19 AM, Jim Meyering wrote:
> There have been enough changes (mostly typo fixes) that I'm re-posting
> these for review before I push.  Also, I added this sentence to NEWS
> about the performance hit, too
>
>      The fix makes factor somewhat slower (~25%) for ranges of consecutive
>      numbers, and up to 8 times slower for some worst-case individual numbers.

Thanks for collating all the tweaks.
+1

Pádraig.

Information forwarded to bug-coreutils <at> gnu.org:
bug#12350; Package coreutils. (Fri, 07 Sep 2012 10:36:02 GMT) Full text and rfc822 format available.

Message #86 received at 12350 <at> debbugs.gnu.org (full text, mbox):

From: nisse <at> lysator.liu.se (Niels Möller)
To: Pádraig Brady <P <at> draigBrady.com>
Cc: 12350 <at> debbugs.gnu.org, Jim Meyering <jim <at> meyering.net>,
	Torbjorn Granlund <tg <at> gmplib.org>
Subject: Re: bug#12350: Composites identified as primes in factor.c (when
	HAVE_GMP)
Date: Fri, 07 Sep 2012 12:35:32 +0200

Pádraig Brady <P <at> draigBrady.com> writes:

> On 09/07/2012 09:43 AM, Niels Möller wrote:

>> If this is an important feature, maybe one should consider bundling
>> mini-gmp

> Bundling libraries is bad when one needs to update them.

mini-gmp is not an ordinary library. It's a single portable C source
file (currently around 4000 lines) implementing a subset of the GMP API,
and with performance only a few times slower than the real thing, for
"small bignums". It's *intended* for bundling with applications, either
for unconditional use, or for use as a fallback if the real gmp library
is not available. It's never (I hope!) going to be installed in
/usr/lib. To me, coreutils' factor seems to be a close match for what it's
intended for.

That said, mini-gmp is pretty new (I wrote most of it around last
Christmas) and I'm not aware of any application or library using it yet.
I think the guile hackers are considering using it (for the benefit of
applications which use guile as an extension language, but don't need
high performance bignums).

So if you decide to use it in coreutils, you'll be pioneers.

It *is* used in the GMP build process, for precomputing various internal
tables.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.

Information forwarded to bug-coreutils <at> gnu.org:
bug#12350; Package coreutils. (Fri, 07 Sep 2012 11:00:02 GMT) Full text and rfc822 format available.

Message #89 received at 12350 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Niels Möller <nisse <at> lysator.liu.se>
Cc: 12350 <at> debbugs.gnu.org, 608832 <at> bugs.debian.org,
	Jim Meyering <jim <at> meyering.net>, Torbjorn Granlund <tg <at> gmplib.org>
Subject: Re: bug#12350: Composites identified as primes in factor.c (when
	HAVE_GMP)
Date: Fri, 07 Sep 2012 11:59:07 +0100

On 09/07/2012 11:35 AM, Niels Möller wrote:
> Pádraig Brady<P <at> draigBrady.com>  writes:
>
>> On 09/07/2012 09:43 AM, Niels Möller wrote:
>
>>> If this is an important feature, maybe one should consider bundling
>>> mini-gmp
>
>> Bundling libraries is bad when one needs to update them.
>
> mini-gmp is not an ordinary library. It's a single portable C source
> file (currently around 4000 lines) implementing a subset of the GMP API,
> and with performance only a few times slower than the real thing, for
> "small bignums". It's *intended* for bundling with applications, either
> for unconditional use, or for use as a fallback if the real gmp library
> is not available. It's never (I hope!) going to be installed in
> /usr/lib. To me, coreutils' factor seems to be a close match for what it's
> intended for.
>
> That said, mini-gmp is pretty new (I wrote most of it around last
> Christmas) and I'm not aware of any application or library using it yet.
> I think the guile hackers are considering using it (for the benefit of
> applications which use guile as an extension language, but don't need
> high performance bignums).
>
> So if you decide to use it in coreutils, you'll be pioneers.
>
> It *is* used in the GMP build process, for precomputing various internal
> tables.

I can see the need when bootstrapping,
but I'd prefer if coreutils just relied on regular GMP.

That said, I see there is some push back in debian on depending on GMP.
Note expr from coreutils also uses GMP, which may sway the decision.

thanks,
Pádraig.

Information forwarded to bug-coreutils <at> gnu.org:
bug#12350; Package coreutils. (Fri, 07 Sep 2012 14:51:01 GMT) Full text and rfc822 format available.

Message #92 received at 12350 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Niels Möller <nisse <at> lysator.liu.se>
Cc: 12350 <at> debbugs.gnu.org, Pádraig Brady <P <at> draigBrady.com>,
	Jim Meyering <jim <at> meyering.net>, Torbjorn Granlund <tg <at> gmplib.org>
Subject: Re: bug#12350: Composites identified as primes in factor.c (when
	HAVE_GMP)
Date: Fri, 07 Sep 2012 07:49:39 -0700

On 09/07/2012 03:35 AM, Niels Möller wrote:
> It's *intended* for bundling with applications, either
> for unconditional use, or for use as a fallback if the real gmp library
> is not available.

I've been looking for something like that for Emacs, since I want
Emacs to use bignums.  Do you think it'd be suitable?

One hassle I have with combining Emacs and GMP is that
Emacs wants to control how memory is allocated, and wants its
memory allocator to longjmp out if memory gets low, and GMP
is documented to not support that.  If the mini-gmp library
doesn't have this problem I'm thinking that Emacs might use
it *instead* of GMP.

Information forwarded to bug-coreutils <at> gnu.org:
bug#12350; Package coreutils. (Fri, 07 Sep 2012 18:10:01 GMT) Full text and rfc822 format available.

Message #95 received at 12350 <at> debbugs.gnu.org (full text, mbox):

From: Torbjorn Granlund <tg <at> gmplib.org>
To: Jim Meyering <jim <at> meyering.net>
Cc: 12350 <at> debbugs.gnu.org, nisse <at> lysator.liu.se
Subject: Re: bug#12350: Composites identified as primes in factor.c (when
	HAVE_GMP)
Date: Fri, 07 Sep 2012 20:09:32 +0200

[Message part 1 (text/plain, inline)]

Jim Meyering <jim <at> meyering.net> writes:

  uintmax_t is used throughout coreutils, but nowhere (that comes to mind)
  does it fail when UINTMAX_MAX happens to be different than 2^64-1.
  What I was wondering is how many systems have a uintmax_t that is
  only 32 bits wide.  Now that I reread, I suppose this code would be
  ok (albeit slower) with uintmax_t wider than 64.
  
The code will work with longlong.h iff W_TYPE_SIZE is defined to the
bitsize of uintmax_t.
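
To make the width question concrete, here is a minimal sketch (not taken from the attached code) of how the bit size of uintmax_t can be obtained. The names UINTMAX_BITS and uintmax_width are hypothetical, introduced only for this illustration. Note that C99 requires uintmax_t to hold at least 64 bits, so a 32-bit uintmax_t can only occur on pre-C99 systems.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical name: the storage size of uintmax_t in bits.  This is the
   value W_TYPE_SIZE would have to match, assuming no padding bits.  */
enum { UINTMAX_BITS = sizeof (uintmax_t) * CHAR_BIT };

/* Count the value bits at run time by shifting UINTMAX_MAX down to zero.
   On common platforms this equals UINTMAX_BITS.  */
static unsigned int
uintmax_width (void)
{
  unsigned int bits = 0;
  for (uintmax_t v = UINTMAX_MAX; v != 0; v >>= 1)
    bits++;
  return bits;
}
```

A build could compare the two values (or use a compile-time check on UINTMAX_BITS) before enabling any path that depends on W_TYPE_SIZE.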

  Any place you can add a reference would be most welcome.
  
I have added comments here and there.  More comments might be desirable.

  Here's one where I'd appreciate a reference in a comment:
  
    #define MAGIC64 ((uint64_t) 0x0202021202030213ULL)
    #define MAGIC63 ((uint64_t) 0x0402483012450293ULL)
    #define MAGIC65 ((uint64_t) 0x218a019866014613ULL)
    #define MAGIC11 0x23b
  
I added a comment explaining these constants.

Here is a new version of the code.  It now has GMP factoring code,
updated from the GMP demos code.

[nt-factor-002.tar.lz (application/octet-stream, attachment)]

[Message part 3 (text/plain, inline)]

-- 
Torbjörn

Information forwarded to bug-coreutils <at> gnu.org:
bug#12350; Package coreutils. (Sat, 08 Sep 2012 00:12:01 GMT) Full text and rfc822 format available.

Message #98 received at 12350 <at> debbugs.gnu.org (full text, mbox):

From: Torbjorn Granlund <tg <at> gmplib.org>
To: Jim Meyering <jim <at> meyering.net>,  12350 <at> debbugs.gnu.org,
	nisse <at> lysator.liu.se
Subject: Re: bug#12350: Composites identified as primes in factor.c (when
	HAVE_GMP)
Date: Sat, 08 Sep 2012 02:10:44 +0200

I found a problem with the GMP integration.

We have a 100 byte buffer in the stdin reading code, which was adequate
before we used GMP, but now one might want to attempt to factor much
larger numbers.

We'll fix that, but not tonight.
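
One common way to lift a fixed limit like this is to grow the buffer as characters arrive. The following is only a sketch under that assumption; read_token is a hypothetical helper written for illustration, not the fix that was applied to factor.c.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Read the next whitespace-delimited token from FP into a heap buffer
   that doubles in size whenever it fills up.  Returns NULL at end of
   input or on allocation failure; the caller frees the result.  */
static char *
read_token (FILE *fp)
{
  int c;

  /* Skip leading whitespace.  */
  do
    c = getc (fp);
  while (c != EOF && isspace (c));
  if (c == EOF)
    return NULL;

  size_t cap = 16, len = 0;
  char *buf = malloc (cap);
  if (buf == NULL)
    return NULL;

  while (c != EOF && !isspace (c))
    {
      /* Always keep one byte free for the terminating NUL.  */
      if (len + 1 == cap)
        {
          char *tmp = realloc (buf, cap * 2);
          if (tmp == NULL)
            {
              free (buf);
              return NULL;
            }
          buf = tmp;
          cap *= 2;
        }
      buf[len++] = (char) c;
      c = getc (fp);
    }
  buf[len] = '\0';
  return buf;
}
```

With a reader of this shape, the length of an input number is bounded only by available memory rather than by a 100-byte array.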

-- 
Torbjörn

Information forwarded to bug-coreutils <at> gnu.org:
bug#12350; Package coreutils. (Sun, 09 Sep 2012 15:23:01 GMT) Full text and rfc822 format available.

Message #101 received at 12350 <at> debbugs.gnu.org (full text, mbox):

From: Torbjorn Granlund <tg <at> gmplib.org>
To: Jim Meyering <jim <at> meyering.net>
Cc: 12350 <at> debbugs.gnu.org, nisse <at> lysator.liu.se
Subject: Re: bug#12350: Composites identified as primes in factor.c (when
	HAVE_GMP)
Date: Sun, 09 Sep 2012 17:21:29 +0200

We made a number of additional changes to the program.

* It now works (again) without longlong.h.  We provide this option for
  code simplicity, at the expense of performance.

* A new command line option `-w' enables weak primes testing.  This is
  actually often a slowdown since 25 Miller-Rabin tests are currently
  often slower than the default of a combination of Miller-Rabin tests
  and Lucas tests.  It will surely speed up some cases, in particular in
  the GMP range.

* Speedup for prime_p by computing redcify of used bases more
  efficiently.

* Cleanup and more comments.
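
For readers who want to see what a Miller-Rabin round looks like, here is a minimal single-word sketch. It is an illustration only, not the code in the tarball: it assumes the GCC/Clang unsigned __int128 extension instead of the Montgomery (redc) arithmetic the real code uses, all function names are hypothetical, and it relies on the first twelve primes being a sufficient set of bases for every n below 2^64 rather than on a Lucas proof.

```c
#include <stdint.h>

/* (a * b) mod n; the 128-bit intermediate product cannot overflow.  */
static uint64_t
mulmod_sketch (uint64_t a, uint64_t b, uint64_t n)
{
  return (uint64_t) ((unsigned __int128) a * b % n);
}

/* b^e mod n by binary exponentiation.  */
static uint64_t
powmod_sketch (uint64_t b, uint64_t e, uint64_t n)
{
  uint64_t r = 1 % n;
  b %= n;
  for (; e != 0; e >>= 1)
    {
      if (e & 1)
        r = mulmod_sketch (r, b, n);
      b = mulmod_sketch (b, b, n);
    }
  return r;
}

/* One Miller-Rabin round for odd n > 2, where n - 1 = q * 2^k, q odd.
   Returns 0 if base b proves n composite, 1 if n passes this round.  */
static int
millerrabin_sketch (uint64_t n, uint64_t b, uint64_t q, unsigned int k)
{
  uint64_t y = powmod_sketch (b, q, n);
  if (y == 1 || y == n - 1)
    return 1;
  for (unsigned int i = 1; i < k; i++)
    {
      y = mulmod_sketch (y, y, n);
      if (y == n - 1)
        return 1;
      if (y == 1)
        return 0;
    }
  return 0;
}

/* Trial-divide by the bases, then run one round per base.  */
static int
is_prime_sketch (uint64_t n)
{
  static const uint64_t bases[] =
    { 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37 };
  enum { NBASES = sizeof bases / sizeof bases[0] };

  if (n < 2)
    return 0;
  for (int i = 0; i < NBASES; i++)
    {
      if (n == bases[i])
        return 1;
      if (n % bases[i] == 0)
        return 0;
    }

  /* Here n is odd and larger than every base.  Split n - 1 = q * 2^k.  */
  uint64_t q = n - 1;
  unsigned int k = 0;
  while ((q & 1) == 0)
    {
      q >>= 1;
      k++;
    }

  for (int i = 0; i < NBASES; i++)
    if (!millerrabin_sketch (n, bases[i], q, k))
      return 0;
  return 1;
}
```

The composite 3215031751 passes the rounds for bases 2, 3, 5 and 7 and is only rejected at base 11, which shows why a small number of rounds is not enough.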

Pádraig's example should now run in about 40s on his machine.

I believe we could make things about 50% faster for numbers < 2^128,
mainly by improving powm, and by being more clever about how to compute
the powers in Lucas.  We could also speed Pollard rho by reducing the
gcd call frequency.  In the GMP range, there is more headroom.  We can
leave such improvements for the future.
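
The idea of reducing the gcd call frequency in Pollard rho can be sketched as follows. This is a hedged illustration, not the proposed factor.c change: the function names, the batch size of 32 and the use of unsigned __int128 (a GCC/Clang extension) are all assumptions made for the example. Differences are multiplied together modulo n and a single gcd is taken per batch; if the batch overshoots, the steps are replayed one at a time.

```c
#include <stdint.h>

static uint64_t
gcd_sketch (uint64_t a, uint64_t b)
{
  while (b != 0)
    {
      uint64_t t = a % b;
      a = b;
      b = t;
    }
  return a;
}

/* One step of the iteration x -> x^2 + c (mod n).  */
static uint64_t
rho_step (uint64_t x, uint64_t c, uint64_t n)
{
  return (uint64_t) (((unsigned __int128) x * x + c) % n);
}

/* Look for a nontrivial factor of an odd composite n > 3.  Returns the
   factor, or 0 if this choice of c fails (the caller may retry with
   another c).  */
static uint64_t
rho_batched (uint64_t n, uint64_t c)
{
  enum { BATCH = 32 };
  uint64_t x = 2, y = 2, x0 = x, y0 = y, g = 1;

  while (g == 1)
    {
      uint64_t prod = 1;
      x0 = x;                   /* remember where this batch started */
      y0 = y;
      for (int i = 0; i < BATCH; i++)
        {
          x = rho_step (x, c, n);
          y = rho_step (rho_step (y, c, n), c, n);
          uint64_t d = x > y ? x - y : y - x;
          prod = (uint64_t) ((unsigned __int128) prod * d % n);
        }
      g = gcd_sketch (prod, n); /* one gcd per BATCH steps */
    }

  if (g == n)
    {
      /* The batch collected all of n; replay it with a gcd per step.  */
      x = x0;
      y = y0;
      g = 1;
      while (g == 1)
        {
          x = rho_step (x, c, n);
          y = rho_step (rho_step (y, c, n), c, n);
          uint64_t d = x > y ? x - y : y - x;
          g = gcd_sketch (d, n);
        }
    }
  return g == n ? 0 : g;
}
```

The trade-off is one modular multiplication per step in exchange for roughly BATCH times fewer gcd computations, at the cost of an occasional replay.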

-- 
Torbjörn

Information forwarded to bug-coreutils <at> gnu.org:
bug#12350; Package coreutils. (Sun, 09 Sep 2012 20:01:01 GMT) Full text and rfc822 format available.

Message #104 received at 12350 <at> debbugs.gnu.org (full text, mbox):

From: Torbjorn Granlund <tg <at> gmplib.org>
To: Jim Meyering <jim <at> meyering.net>,  12350 <at> debbugs.gnu.org,
	nisse <at> lysator.liu.se
Subject: Re: bug#12350: Composites identified as primes in factor.c (when
	HAVE_GMP)
Date: Sun, 09 Sep 2012 21:59:58 +0200

[Message part 1 (text/plain, inline)]

We found a bug causing performance problems for numbers between 2^64 and
2^127.  It could also trigger asserts when factoring numbers close to
2^127.

The new version is attached.

[nt-factor-004.tar.lz (application/octet-stream, attachment)]

[Message part 3 (text/plain, inline)]

-- 
Torbjörn

Information forwarded to bug-coreutils <at> gnu.org:
bug#12350; Package coreutils. (Mon, 10 Sep 2012 11:52:02 GMT) Full text and rfc822 format available.

Message #107 received at 12350 <at> debbugs.gnu.org (full text, mbox):

From: Torbjorn Granlund <tg <at> gmplib.org>
To: Jim Meyering <jim <at> meyering.net>,  12350 <at> debbugs.gnu.org,
	nisse <at> lysator.liu.se
Subject: Re: bug#12350: Composites identified as primes in factor.c (when
	HAVE_GMP)
Date: Mon, 10 Sep 2012 13:51:09 +0200

[Message part 1 (text/plain, inline)]

Torbjorn Granlund <tg <at> gmplib.org> writes:

  We found a bug causing performance problems for numbers between 2^64 and
  2^127.  It could also trigger asserts with factoring numbers close to
  2^127.
  
  The new version is attached.

Another bug found, this time related to the GMP code.  It would clear
out an uninitialised structure with prime proving disabled.

Here is a fixed version:

[nt-factor-005.tar.lz (application/octet-stream, attachment)]

[Message part 3 (text/plain, inline)]

-- 
Torbjörn

Information forwarded to bug-coreutils <at> gnu.org:
bug#12350; Package coreutils. (Thu, 13 Sep 2012 09:53:01 GMT) Full text and rfc822 format available.

Message #110 received at 12350 <at> debbugs.gnu.org (full text, mbox):

From: Torbjorn Granlund <tg <at> gmplib.org>
To: Jim Meyering <jim <at> meyering.net>
Cc: 12350 <at> debbugs.gnu.org, nisse <at> lysator.liu.se
Subject: Re: bug#12350: Composites identified as primes in factor.c (when
	HAVE_GMP)
Date: Thu, 13 Sep 2012 11:51:48 +0200

We won't be sending any more code replacement blobs to this address; it
is most surely the wrong place.

Please get our suggested factor.c replacement from
<http://gmplib.org:8000/factoring/>.

I plan to spend no more time on this project now.  Should the
contribution be accepted, I will make the necessary amendments to the
GNU copyright paperwork.  I am certainly willing to answer questions
about the code, of course.

-- 
Torbjörn

Information forwarded to bug-coreutils <at> gnu.org:
bug#12350; Package coreutils. (Thu, 13 Sep 2012 10:22:02 GMT) Full text and rfc822 format available.

Message #113 received at 12350 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Torbjorn Granlund <tg <at> gmplib.org>
Cc: 12350 <at> debbugs.gnu.org, nisse <at> lysator.liu.se
Subject: Re: bug#12350: Composites identified as primes in factor.c (when
	HAVE_GMP)
Date: Thu, 13 Sep 2012 12:20:30 +0200

[Message part 1 (text/plain, inline)]

Torbjorn Granlund wrote:
> We won't be sending any more code replacement blobs to this address; it
> is most surely the wrong place.

Hi Torbjorn,

I guess you're saying that because there's been too little feedback?
IMHO, this is great work.
I've been reviewing the latest and had prepared several patches.
Just hadn't made time to send them.

> Please get our suggested factor.c replacement from
> <http://gmplib.org:8000/factoring/>.
>
> I plan to spend no more time on this project now.  Should the
> contribution be accepted, I will make the necessary amendments to the

You may consider it accepted.  That was clear in my mind from
the beginning.  Sorry if I didn't make that clear to you.
Now, it's just a matter of integrating it.

> GNU copyright paperwork.  I am certainly willing to answer questions
> about the code, of course.

Here are some suggested changes -- I made these against
a temporary local git repository using your -005 tarball.
That was before I learned (just now) that you have a mercurial
repository.

I made the Makefile parallelization changes mostly to avoid
waiting too long when I run "make check" -- in the very short run.
Obviously we cannot use its GNU make features in coreutils/tests.

In coreutils, we are pretty strict on warnings, so most of these
changes are to avoid the few that remained.  I've also moved
some declarations "down" to be nearer their point of first
initialization.  That's another style issue in coreutils.  We
are no longer required to declare all variables at the top of
each block.
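
As background on why signed/unsigned comparisons are worth a warning (GCC's -Wsign-compare, which -W turns on for C), here is a small sketch. The function names are hypothetical and the conversion is written as an explicit cast so that the example itself compiles without warnings; in a mixed comparison the same conversion happens implicitly.

```c
/* What a mixed comparison "a < b" does: a is converted to unsigned int
   first, so a negative a becomes a very large value.  */
static int
less_after_conversion (int a, unsigned int b)
{
  return (unsigned int) a < b;
}

/* A comparison that gives the mathematically expected answer.  */
static int
less_checked (int a, unsigned int b)
{
  return a < 0 || (unsigned int) a < b;
}
```

For a = -1 and b = 1 the first function reports that a is not less than b, while the second reports that it is. Giving the loop index the same signedness as its bound avoids the question entirely.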

[k (text/plain, inline)]

From bc5f37195abb5c0f9de7d65f5cc937ab902cd774 Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyering <at> redhat.com>
Date: Tue, 11 Sep 2012 11:59:17 +0200
Subject: [PATCH 01/12] CFLAGS: Add options.

---
 Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index 1378caf..9f84629 100644
--- a/Makefile
+++ b/Makefile
@@ -17,7 +17,7 @@


 CC = gcc
-CFLAGS = -O2 -g -Wall -Wno-unused-but-set-variable
+CFLAGS = -std=gnu99 -O2 -g -Werror -W -Wall -Wno-unused-but-set-variable

 all: factor make-prime-list

-- 
1.7.12.363.g53284de


From 6488446b1185498fed02984ac51a1068e0d3bc7c Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyering <at> redhat.com>
Date: Tue, 11 Sep 2012 11:55:30 +0200
Subject: [PATCH 02/12] s/const static/static const/

---
 factor.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/factor.c b/factor.c
index abca67c..98e298c 100644
--- a/factor.c
+++ b/factor.c
@@ -793,7 +793,7 @@ mp_factor_using_division (mpz_t t, struct mp_factors *factors)
 #endif

 /* Entry i contains (2i+1)^(-1) mod 2^8.  */
-const static unsigned char  binvert_table[128] =
+static const unsigned char  binvert_table[128] =
 {
   0x01, 0xAB, 0xCD, 0xB7, 0x39, 0xA3, 0xC5, 0xEF,
   0xF1, 0x1B, 0x3D, 0xA7, 0x29, 0x13, 0x35, 0xDF,
-- 
1.7.12.363.g53284de


From d2b9e92f883f7817592d2cdbe338866dc9e94d49 Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyering <at> redhat.com>
Date: Tue, 11 Sep 2012 12:00:00 +0200
Subject: [PATCH 03/12] prime2_p: make r and k unsigned, and

millerrabin: make k unsigned; move decl of i into for stmt.
---
 factor.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/factor.c b/factor.c
index 98e298c..0c2999b 100644
--- a/factor.c
+++ b/factor.c
@@ -1038,7 +1038,6 @@ powm2 (uintmax_t *r1m,
 int
 millerrabin (uintmax_t n, uintmax_t ni, uintmax_t b, uintmax_t q, unsigned int k, uintmax_t one)
 {
-  unsigned int i;
   uintmax_t y, nm1;

   y = powm (b, q, n, ni, one);
@@ -1048,7 +1047,7 @@ millerrabin (uintmax_t n, uintmax_t ni, uintmax_t b, uintmax_t q, unsigned int k
   if (y == one || y == nm1)
     return 1;

-  for (i = 1; i < k; i++)
+  for (unsigned int i = 1; i < k; i++)
     {
       y = mulredc (y, y, n, ni);

@@ -1063,9 +1062,8 @@ millerrabin (uintmax_t n, uintmax_t ni, uintmax_t b, uintmax_t q, unsigned int k
 int
 millerrabin2 (const uintmax_t *np, uintmax_t ni,
 	      const uintmax_t *bp, const uintmax_t *qp,
-	      int k, const uintmax_t *one)
+	      unsigned int k, const uintmax_t *one)
 {
-  unsigned int i;
   uintmax_t y1, y0, nm1_1, nm1_0, r1m;

   y0 = powm2 (&r1m, bp, qp, np, ni, one);
@@ -1079,7 +1077,7 @@ millerrabin2 (const uintmax_t *np, uintmax_t ni,
   if (y0 == nm1_0 && y1 == nm1_1)
     return 1;

-  for (i = 1; i < k; i++)
+  for (unsigned int i = 1; i < k; i++)
     {
       y0 = mulredc2 (&r1m, y1, y0, y1, y0, np[1], np[0], ni);
       y1 = r1m;
@@ -1206,7 +1204,7 @@ prime2_p (uintmax_t n1, uintmax_t n0)
   uintmax_t one[2];
   uintmax_t na[2];
   uintmax_t ni;
-  int k, r;
+  unsigned int k, r;
   struct factors factors;

   if (n1 == 0)
-- 
1.7.12.363.g53284de


From 97d8c18730bd0dbed254ea64f29416c662cf772d Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyering <at> redhat.com>
Date: Tue, 11 Sep 2012 12:25:51 +0200
Subject: [PATCH 04/12] move index decl into "for" stmt

---
 factor.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/factor.c b/factor.c
index 0c2999b..0ef3667 100644
--- a/factor.c
+++ b/factor.c
@@ -1904,12 +1904,11 @@ factor_using_squfof (uintmax_t n1, uintmax_t n0, struct factors *factors)
 	  else
 	    {
 	      struct factors f;
-	      unsigned i;

 	      f.nfactors = 0;
 	      factor_using_squfof (0, S, &f);
 	      /* Duplicate the new factors */
-	      for (i = 0; i < f.nfactors; i++)
+	      for (unsigned int i = 0; i < f.nfactors; i++)
 		factor_insert_multiplicity (factors, f.p[i], 2*f.e[i]);
 	    }
 	  return;
-- 
1.7.12.363.g53284de


From a0994e20ccbce4330fdd3a065e3db6fb43659b5e Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyering <at> redhat.com>
Date: Thu, 6 Sep 2012 22:40:46 +0200
Subject: [PATCH 05/12] strto2uintmax: avoid signed/unsigned mismatch warning

---
 factor.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/factor.c b/factor.c
index 0ef3667..e3e5ee7 100644
--- a/factor.c
+++ b/factor.c
@@ -2182,7 +2182,7 @@ strto2uintmax (uintmax_t *hip, uintmax_t *lop, const char *s)
 {
   int errcode;
   int c;
-  int lo_carry;
+  unsigned int lo_carry;
   uintmax_t hi, lo;

   hi = lo = 0;
-- 
1.7.12.363.g53284de


From 4ba89941c2eae13ab3418bf4dc839a41535260b1 Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyering <at> redhat.com>
Date: Tue, 11 Sep 2012 12:27:28 +0200
Subject: [PATCH 06/12] prime_p: avoid signed/unsigned comparison warning

---
 factor.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/factor.c b/factor.c
index e3e5ee7..ff56b34 100644
--- a/factor.c
+++ b/factor.c
@@ -1119,7 +1119,7 @@ mp_millerrabin (mpz_srcptr n, mpz_srcptr nm1, mpz_ptr x, mpz_ptr y,
 int
 prime_p (uintmax_t n)
 {
-  int k, r, is_prime;
+  int k, is_prime;
   uintmax_t q, a, a_prim, one, ni;
   struct factors factors;

@@ -1152,7 +1152,7 @@ prime_p (uintmax_t n)

   /* Loop until Lucas proves our number prime, or Miller-Rabin proves our
      number composite.  */
-  for (r = 0; r < PRIMES_PTAB_ENTRIES; r++)
+  for (unsigned int r = 0; r < PRIMES_PTAB_ENTRIES; r++)
     {
       int i;

-- 
1.7.12.363.g53284de


From d22cf07c4eb7e4da736d3230cfa29515742611b3 Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyering <at> redhat.com>
Date: Thu, 6 Sep 2012 22:42:49 +0200
Subject: [PATCH 07/12] factor_using_squfof: (MERGE w/prev) avoid
 signed/unsigned compare warning

---
 factor.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/factor.c b/factor.c
index ff56b34..c49ab0b 100644
--- a/factor.c
+++ b/factor.c
@@ -1908,7 +1908,7 @@ factor_using_squfof (uintmax_t n1, uintmax_t n0, struct factors *factors)
 	      f.nfactors = 0;
 	      factor_using_squfof (0, S, &f);
 	      /* Duplicate the new factors */
-	      for (unsigned int i = 0; i < f.nfactors; i++)
+	      for (int i = 0; i < f.nfactors; i++)
 		factor_insert_multiplicity (factors, f.p[i], 2*f.e[i]);
 	    }
 	  return;
-- 
1.7.12.363.g53284de


From d242b192b3b2d22f2040cd5260fa7de0785bd550 Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyering <at> redhat.com>
Date: Thu, 6 Sep 2012 22:45:45 +0200
Subject: [PATCH 08/12] strto2uintmax: avoid signed/unsigned compare warning

---
 factor.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/factor.c b/factor.c
index c49ab0b..b3a4ca9 100644
--- a/factor.c
+++ b/factor.c
@@ -2181,14 +2181,13 @@ int
 strto2uintmax (uintmax_t *hip, uintmax_t *lop, const char *s)
 {
   int errcode;
-  int c;
   unsigned int lo_carry;
   uintmax_t hi, lo;

   hi = lo = 0;
   for (;;)
     {
-      c = *s++;
+      unsigned char c = *s++;
       if (c == 0)
 	break;

-- 
1.7.12.363.g53284de


From 235f5a220f03d44f94222334bc832d738325c8a6 Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyering <at> redhat.com>
Date: Thu, 6 Sep 2012 23:00:10 +0200
Subject: [PATCH 09/12] HIGHBIT_TO_MASK: avoid signed/unsigned warning:

Cast 2nd operand of ternary operator to uintmax_t, to match
the type of the third operand.
---
 factor.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/factor.c b/factor.c
index b3a4ca9..7b9e50d 100644
--- a/factor.c
+++ b/factor.c
@@ -342,7 +342,7 @@ void factor (uintmax_t, uintmax_t, struct factors *);

 #define HIGHBIT_TO_MASK(x)						\
   (((intmax_t)-1 >> 1) < 0						\
-   ? ((intmax_t)(x) >> (W_TYPE_SIZE - 1))				\
+   ? (uintmax_t)((intmax_t)(x) >> (W_TYPE_SIZE - 1))			\
    : ((x) & ((uintmax_t) 1 << (W_TYPE_SIZE - 1))			\
       ? UINTMAX_MAX : (uintmax_t) 0))

-- 
1.7.12.363.g53284de


From 19fc1e4bf36efef4153e49be7703bb3ea425308f Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyering <at> redhat.com>
Date: Tue, 11 Sep 2012 13:50:10 +0200
Subject: [PATCH 10/12] run tests in parallel; put expected SHA in Makefile

---
 Makefile      | 84 +++++++++++++++++++++++++++--------------------------------
 tests/b0.sha  |  1 -
 tests/b1.sha  |  1 -
 tests/b10.sha |  1 -
 tests/b2.sha  |  1 -
 tests/b3.sha  |  1 -
 tests/b4.sha  |  1 -
 tests/b5.sha  |  1 -
 tests/b6.sha  |  1 -
 tests/b7.sha  |  1 -
 tests/b8.sha  |  1 -
 tests/b9.sha  |  1 -
 tests/h1.sha  |  1 -
 tests/q1.sha  |  1 -
 tests/q2.sha  |  1 -
 tests/q3.sha  |  1 -
 tests/q4.sha  |  1 -
 tests/q5.sha  |  1 -
 tests/s0.sha  |  1 -
 tests/s1.sha  |  1 -
 tests/s2.sha  |  1 -
 tests/s3.sha  |  1 -
 tests/s4.sha  |  1 -
 tests/s5.sha  |  1 -
 tests/s6.sha  |  1 -
 tests/s7.sha  |  1 -
 tests/s8.sha  |  1 -
 tests/s9.sha  |  1 -
 28 files changed, 39 insertions(+), 72 deletions(-)
 delete mode 100644 tests/b0.sha
 delete mode 100644 tests/b1.sha
 delete mode 100644 tests/b10.sha
 delete mode 100644 tests/b2.sha
 delete mode 100644 tests/b3.sha
 delete mode 100644 tests/b4.sha
 delete mode 100644 tests/b5.sha
 delete mode 100644 tests/b6.sha
 delete mode 100644 tests/b7.sha
 delete mode 100644 tests/b8.sha
 delete mode 100644 tests/b9.sha
 delete mode 100644 tests/h1.sha
 delete mode 100644 tests/q1.sha
 delete mode 100644 tests/q2.sha
 delete mode 100644 tests/q3.sha
 delete mode 100644 tests/q4.sha
 delete mode 100644 tests/q5.sha
 delete mode 100644 tests/s0.sha
 delete mode 100644 tests/s1.sha
 delete mode 100644 tests/s2.sha
 delete mode 100644 tests/s3.sha
 delete mode 100644 tests/s4.sha
 delete mode 100644 tests/s5.sha
 delete mode 100644 tests/s6.sha
 delete mode 100644 tests/s7.sha
 delete mode 100644 tests/s8.sha
 delete mode 100644 tests/s9.sha

diff --git a/Makefile b/Makefile
index 9f84629..107a53c 100644
--- a/Makefile
+++ b/Makefile
@@ -43,51 +43,45 @@ clean:
 # Use make check CHECK_FLAGS=-s to check squfof
 CHECK_FLAGS=

-check: factor ourseq
-	./ourseq 0 1000000         | ./factor $(CHECK_FLAGS) | shasum -c --status tests/s0.sha
-	./ourseq 18446744073709541616 18446744073709551615 \
-		| ./factor $(CHECK_FLAGS) | shasum -c --status tests/b0.sha
-	./ourseq        0 10000000 | ./factor $(CHECK_FLAGS) | shasum -c --status tests/s1.sha
-	./ourseq 10000000 20000000 | ./factor $(CHECK_FLAGS) | shasum -c --status tests/s2.sha
-	./ourseq 20000000 30000000 | ./factor $(CHECK_FLAGS) | shasum -c --status tests/s3.sha
-	./ourseq 30000000 40000000 | ./factor $(CHECK_FLAGS) | shasum -c --status tests/s4.sha
-	./ourseq 40000000 50000000 | ./factor $(CHECK_FLAGS) | shasum -c --status tests/s5.sha
-	./ourseq 50000000 60000000 | ./factor $(CHECK_FLAGS) | shasum -c --status tests/s6.sha
-	./ourseq 60000000 70000000 | ./factor $(CHECK_FLAGS) | shasum -c --status tests/s7.sha
-	./ourseq 70000000 80000000 | ./factor $(CHECK_FLAGS) | shasum -c --status tests/s8.sha
-	./ourseq 80000000 90000000 | ./factor $(CHECK_FLAGS) | shasum -c --status tests/s9.sha
-	./ourseq 18446744073708551616 18446744073708651615 \
-		| ./factor $(CHECK_FLAGS) | shasum -c --status tests/b1.sha
-	./ourseq 18446744073708651616 18446744073708751615 \
-		| ./factor $(CHECK_FLAGS) | shasum -c --status tests/b2.sha
-	./ourseq 18446744073708751616 18446744073708851615 \
-		| ./factor $(CHECK_FLAGS) | shasum -c --status tests/b3.sha
-	./ourseq 18446744073708851616 18446744073708951615 \
-		| ./factor $(CHECK_FLAGS) | shasum -c --status tests/b4.sha
-	./ourseq 18446744073708951616 18446744073709051615 \
-		| ./factor $(CHECK_FLAGS) | shasum -c --status tests/b5.sha
-	./ourseq 18446744073709051616 18446744073709151615 \
-		| ./factor $(CHECK_FLAGS) | shasum -c --status tests/b6.sha
-	./ourseq 18446744073709151616 18446744073709251615 \
-		| ./factor $(CHECK_FLAGS) | shasum -c --status tests/b7.sha
-	./ourseq 18446744073709251616 18446744073709351615 \
-		| ./factor $(CHECK_FLAGS) | shasum -c --status tests/b8.sha
-	./ourseq 18446744073709351616 18446744073709451615 \
-		| ./factor $(CHECK_FLAGS) | shasum -c --status tests/b9.sha
-	./ourseq 18446744073709451616 18446744073709551615 \
-		| ./factor $(CHECK_FLAGS) | shasum -c --status tests/b10.sha
-	./ourseq 18446744073709551616 18446744073709651615 \
-		| ./factor $(CHECK_FLAGS) | shasum -c --status tests/q1.sha
-	./ourseq 18446744073709651616 18446744073709751615 \
-		| ./factor $(CHECK_FLAGS) | shasum -c --status tests/q2.sha
-	./ourseq 18446744073709751616 18446744073709851615 \
-		| ./factor $(CHECK_FLAGS) | shasum -c --status tests/q3.sha
-	./ourseq 18446744073709851616 18446744073709951615 \
-		| ./factor $(CHECK_FLAGS) | shasum -c --status tests/q4.sha
-	./ourseq 18446744073709951616 18446744073710051615 \
-		| ./factor $(CHECK_FLAGS) | shasum -c --status tests/q5.sha
-	./ourseq 79228162514264337593543850336 79228162514264337593543860335 \
-		| ./factor $(CHECK_FLAGS) | shasum -c --status tests/h1.sha
+p = 1844674407370
+q = 792281625142643375935438
+
+args = $(word 2,$(subst -, ,$@)) $(word 3,$(subst -, ,$@))
+tests = \
+  t-0-10000000-a451244522b1b662c86cb3cbb55aee3e085a61a0 \
+  t-10000000-20000000-c792a2e02f1c8536b5121f624b04039d20187016 \
+  t-20000000-30000000-8115e8dff97d1674134ec054598d939a2a5f6113 \
+  t-30000000-40000000-fe7b832c8e0ed55035152c0f9ebd59de73224a60 \
+  t-40000000-50000000-b8786d66c432e48bc5b342ee3c6752b7f096f206 \
+  t-50000000-60000000-a74fe518c5f79873c2b9016745b88b42c8fd3ede \
+  t-60000000-70000000-689bc70d681791e5d1b8ac1316a05d0c4473d6db \
+  t-70000000-80000000-d370808f2ab8c865f64c2ff909c5722db5b7d58d \
+  t-80000000-90000000-7978aa66bf2bdb446398336ea6f02605e9a77581 \
+  t-$(p)8551616-$(p)8651615-66c57cd58f4fb572df7f088d17e4f4c1d4f01bb1 \
+  t-$(p)8551616-$(p)8651615-66c57cd58f4fb572df7f088d17e4f4c1d4f01bb1 \
+  t-$(p)8651616-$(p)8751615-729228e693b1a568ecc85b199927424c7d16d410 \
+  t-$(p)8751616-$(p)8851615-5a0c985017c2d285e4698f836f5a059e0b684563 \
+  t-$(p)8851616-$(p)8951615-0482295c514e371c98ce9fd335deed0c9c44a4f4 \
+  t-$(p)8951616-$(p)9051615-9c0e1105ac7c45e27e7bbeb5e213f530d2ad1a71 \
+  t-$(p)9051616-$(p)9151615-604366d2b1d75371d0679e6a68962d66336cd383 \
+  t-$(p)9151616-$(p)9251615-9192d2bdee930135b28d7160e6d395a7027871da \
+  t-$(p)9251616-$(p)9351615-bcf56ae55d20d700690cff4d3327b78f83fc01bf \
+  t-$(p)9351616-$(p)9451615-16b106398749e5f24d278ba7c58229ae43f650ac \
+  t-$(p)9451616-$(p)9551615-ad2c6ed63525f8e7c83c4c416e7715fa1bebc54c \
+  t-$(p)9551616-$(p)9651615-2b6f9c11742d9de045515a6627c27a042c49f8ba \
+  t-$(p)9651616-$(p)9751615-54851acd51c4819beb666e26bc0100dc9adbc310 \
+  t-$(p)9751616-$(p)9851615-6939c2a7afd2d81f45f818a159b7c5226f83a50b \
+  t-$(p)9851616-$(p)9951615-0f2c8bc011d2a45e2afa01459391e68873363c6c \
+  t-$(p)9951616-18446744073710051615-630dc2ad72f4c222bad1405e6c5bea590f92a98c \
+  t-$(q)50336-$(q)60335-51ccb201e35599d545cb942e2bb31aba5bce4fc5
+
+$(tests): factor ourseq
+	@echo '$(lastword $(subst -, ,$@))  -' > exp.$@
+	@echo $(args)
+	@./ourseq $(args) | ./factor $(CHECK_FLAGS) | shasum -c --status exp.$@
+	@rm exp.$@
+
+check: $(tests) factor ourseq

 ver = `cat ver`
 dist:
diff --git a/tests/b0.sha b/tests/b0.sha
deleted file mode 100644
index 2811aa7..0000000
--- a/tests/b0.sha
+++ /dev/null
@@ -1 +0,0 @@
-28cc641760150f6bd378c27c6f810c8a4a9792d8  -
diff --git a/tests/b1.sha b/tests/b1.sha
deleted file mode 100644
index 64a6025..0000000
--- a/tests/b1.sha
+++ /dev/null
@@ -1 +0,0 @@
-66c57cd58f4fb572df7f088d17e4f4c1d4f01bb1  -
diff --git a/tests/b10.sha b/tests/b10.sha
deleted file mode 100644
index a5bf947..0000000
--- a/tests/b10.sha
+++ /dev/null
@@ -1 +0,0 @@
-ad2c6ed63525f8e7c83c4c416e7715fa1bebc54c  -
diff --git a/tests/b2.sha b/tests/b2.sha
deleted file mode 100644
index 29c5e6a..0000000
--- a/tests/b2.sha
+++ /dev/null
@@ -1 +0,0 @@
-729228e693b1a568ecc85b199927424c7d16d410  -
diff --git a/tests/b3.sha b/tests/b3.sha
deleted file mode 100644
index b8b0a56..0000000
--- a/tests/b3.sha
+++ /dev/null
@@ -1 +0,0 @@
-5a0c985017c2d285e4698f836f5a059e0b684563  -
diff --git a/tests/b4.sha b/tests/b4.sha
deleted file mode 100644
index 9ff8dad..0000000
--- a/tests/b4.sha
+++ /dev/null
@@ -1 +0,0 @@
-0482295c514e371c98ce9fd335deed0c9c44a4f4  -
diff --git a/tests/b5.sha b/tests/b5.sha
deleted file mode 100644
index d577478..0000000
--- a/tests/b5.sha
+++ /dev/null
@@ -1 +0,0 @@
-9c0e1105ac7c45e27e7bbeb5e213f530d2ad1a71  -
diff --git a/tests/b6.sha b/tests/b6.sha
deleted file mode 100644
index f8d57e6..0000000
--- a/tests/b6.sha
+++ /dev/null
@@ -1 +0,0 @@
-604366d2b1d75371d0679e6a68962d66336cd383  -
diff --git a/tests/b7.sha b/tests/b7.sha
deleted file mode 100644
index 69997c4..0000000
--- a/tests/b7.sha
+++ /dev/null
@@ -1 +0,0 @@
-9192d2bdee930135b28d7160e6d395a7027871da  -
diff --git a/tests/b8.sha b/tests/b8.sha
deleted file mode 100644
index 10aac6b..0000000
--- a/tests/b8.sha
+++ /dev/null
@@ -1 +0,0 @@
-bcf56ae55d20d700690cff4d3327b78f83fc01bf  -
diff --git a/tests/b9.sha b/tests/b9.sha
deleted file mode 100644
index 8859f9f..0000000
--- a/tests/b9.sha
+++ /dev/null
@@ -1 +0,0 @@
-16b106398749e5f24d278ba7c58229ae43f650ac  -
diff --git a/tests/h1.sha b/tests/h1.sha
deleted file mode 100644
index 3ce3865..0000000
--- a/tests/h1.sha
+++ /dev/null
@@ -1 +0,0 @@
-51ccb201e35599d545cb942e2bb31aba5bce4fc5  -
diff --git a/tests/q1.sha b/tests/q1.sha
deleted file mode 100644
index 3bc8e2a..0000000
--- a/tests/q1.sha
+++ /dev/null
@@ -1 +0,0 @@
-2b6f9c11742d9de045515a6627c27a042c49f8ba  -
diff --git a/tests/q2.sha b/tests/q2.sha
deleted file mode 100644
index c1c4ae4..0000000
--- a/tests/q2.sha
+++ /dev/null
@@ -1 +0,0 @@
-54851acd51c4819beb666e26bc0100dc9adbc310  -
diff --git a/tests/q3.sha b/tests/q3.sha
deleted file mode 100644
index 49a29e9..0000000
--- a/tests/q3.sha
+++ /dev/null
@@ -1 +0,0 @@
-6939c2a7afd2d81f45f818a159b7c5226f83a50b  -
diff --git a/tests/q4.sha b/tests/q4.sha
deleted file mode 100644
index 7922917..0000000
--- a/tests/q4.sha
+++ /dev/null
@@ -1 +0,0 @@
-0f2c8bc011d2a45e2afa01459391e68873363c6c  -
diff --git a/tests/q5.sha b/tests/q5.sha
deleted file mode 100644
index 9d4f028..0000000
--- a/tests/q5.sha
+++ /dev/null
@@ -1 +0,0 @@
-630dc2ad72f4c222bad1405e6c5bea590f92a98c  -
diff --git a/tests/s0.sha b/tests/s0.sha
deleted file mode 100644
index 3a8136d..0000000
--- a/tests/s0.sha
+++ /dev/null
@@ -1 +0,0 @@
-fe06c03e8dcaeee1cb3d657e56047a190c766d46  -
diff --git a/tests/s1.sha b/tests/s1.sha
deleted file mode 100644
index 41cccdc..0000000
--- a/tests/s1.sha
+++ /dev/null
@@ -1 +0,0 @@
-a451244522b1b662c86cb3cbb55aee3e085a61a0  -
diff --git a/tests/s2.sha b/tests/s2.sha
deleted file mode 100644
index c829df7..0000000
--- a/tests/s2.sha
+++ /dev/null
@@ -1 +0,0 @@
-c792a2e02f1c8536b5121f624b04039d20187016  -
diff --git a/tests/s3.sha b/tests/s3.sha
deleted file mode 100644
index fb8effc..0000000
--- a/tests/s3.sha
+++ /dev/null
@@ -1 +0,0 @@
-8115e8dff97d1674134ec054598d939a2a5f6113  -
diff --git a/tests/s4.sha b/tests/s4.sha
deleted file mode 100644
index f744178..0000000
--- a/tests/s4.sha
+++ /dev/null
@@ -1 +0,0 @@
-fe7b832c8e0ed55035152c0f9ebd59de73224a60  -
diff --git a/tests/s5.sha b/tests/s5.sha
deleted file mode 100644
index 61ae911..0000000
--- a/tests/s5.sha
+++ /dev/null
@@ -1 +0,0 @@
-b8786d66c432e48bc5b342ee3c6752b7f096f206  -
diff --git a/tests/s6.sha b/tests/s6.sha
deleted file mode 100644
index 0f7f250..0000000
--- a/tests/s6.sha
+++ /dev/null
@@ -1 +0,0 @@
-a74fe518c5f79873c2b9016745b88b42c8fd3ede  -
diff --git a/tests/s7.sha b/tests/s7.sha
deleted file mode 100644
index 76f5159..0000000
--- a/tests/s7.sha
+++ /dev/null
@@ -1 +0,0 @@
-689bc70d681791e5d1b8ac1316a05d0c4473d6db  -
diff --git a/tests/s8.sha b/tests/s8.sha
deleted file mode 100644
index ad84d82..0000000
--- a/tests/s8.sha
+++ /dev/null
@@ -1 +0,0 @@
-d370808f2ab8c865f64c2ff909c5722db5b7d58d  -
diff --git a/tests/s9.sha b/tests/s9.sha
deleted file mode 100644
index 2cfe601..0000000
--- a/tests/s9.sha
+++ /dev/null
@@ -1 +0,0 @@
-7978aa66bf2bdb446398336ea6f02605e9a77581  -
-- 
1.7.12.363.g53284de


From d88087d9f4ac1b1e767be6f6547a28e5a01641ff Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyering <at> redhat.com>
Date: Tue, 11 Sep 2012 14:27:18 +0200
Subject: [PATCH 11/12] adjust comment wording

---
 factor.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/factor.c b/factor.c
index 7b9e50d..b7cfcbf 100644
--- a/factor.c
+++ b/factor.c
@@ -18,8 +18,8 @@ PARTICULAR PURPOSE.  See the GNU General Public License for more details.
 You should have received a copy of the GNU General Public License along with
 this program.  If not, see http://www.gnu.org/licenses/.  */

-/* Factor efficiently numbers that fit in one or two words (word = uintmax_t),
-   of with GMP numbers of any size.
+/* Efficiently factor numbers that fit in one or two words (word = uintmax_t),
+   or, with GMP, numbers of any size.

   Code organisation:

-- 
1.7.12.363.g53284de


From 509bc134f8a3c603ac338f67c1ad0127ae29fbe3 Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyering <at> redhat.com>
Date: Tue, 11 Sep 2012 14:27:53 +0200
Subject: [PATCH 12/12] factor_insert_refind: make "off" unsigned, not int

---
 factor.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/factor.c b/factor.c
index b7cfcbf..59a4e73 100644
--- a/factor.c
+++ b/factor.c
@@ -623,9 +623,10 @@ int flag_prove_primality = 1;
 #define UNLIKELY(cond)  __builtin_expect ((cond) != 0, 0)

 void
-factor_insert_refind (struct factors *factors, uintmax_t p, unsigned int i, int off)
+factor_insert_refind (struct factors *factors, uintmax_t p, unsigned int i,
+		      unsigned int off)
 {
-  int j;
+  unsigned int j;
   for (j = 0; j < off; j++)
     p += primes_diff[i + j];
   factor_insert (factors, p);
-- 
1.7.12.363.g53284de
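
A note on the last patch: both `off` and the loop counter `j` are counts that can never be negative, so unsigned types describe them better. The sketch below shows how such a loop rebuilds a prime from a base value and an offset into a table of gaps. The table contents and the helper name `refind_prime` are assumptions made for this illustration, not taken from factor.c.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative table (an assumption): gaps between consecutive odd primes,
   3 -> 5 -> 7 -> 11 -> 13 -> 17 -> 19 -> 23.  */
static const unsigned char primes_diff[] = { 2, 2, 4, 2, 4, 2, 4 };
#define N_DIFFS (sizeof primes_diff / sizeof primes_diff[0])

/* Return P advanced by OFF table entries, starting at index I.
   Hypothetical helper; it mirrors the loop in factor_insert_refind
   but returns the value instead of inserting it.  */
static uintmax_t
refind_prime (uintmax_t p, unsigned int i, unsigned int off)
{
  assert (i + off <= N_DIFFS);  /* stay inside the table */
  for (unsigned int j = 0; j < off; j++)
    p += primes_diff[i + j];
  return p;
}
```

With this table, refind_prime (3, 0, 3) adds the gaps 2, 2 and 4 to reach 11.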

Information forwarded to bug-coreutils <at> gnu.org:
bug#12350; Package coreutils. (Thu, 13 Sep 2012 11:32:02 GMT)

Message #116 received at 12350 <at> debbugs.gnu.org:

From: Jim Meyering <jim <at> meyering.net>
To: Torbjorn Granlund <tg <at> gmplib.org>
Cc: 12350 <at> debbugs.gnu.org, nisse <at> lysator.liu.se
Subject: Re: bug#12350: Composites identified as primes in factor.c (when
	HAVE_GMP)
Date: Thu, 13 Sep 2012 13:30:18 +0200

Jim Meyering wrote:
> Torbjorn Granlund wrote:
>> We won't be sending any more code replacement blobs to this address; it
>> is most surely the wrong place.
>
> Hi Torbjorn,
>
> I guess you're saying that because there's been too little feedback?
> IMHO, this is great work.
> I've been reviewing the latest and had prepared several patches.
> Just hadn't made time to send them.
>
>> Please get our suggested factor.c replacement from
>> <http://gmplib.org:8000/factoring/>.
>>
>> I plan to spend no more time on this project now.  Should the
>> contribution be accepted, I will make the necessary amendments to the
>
> You may consider it accepted.  That was clear in my mind from
> the beginning.  Sorry if I didn't make that clear to you.
> Now, it's just a matter of integrating it.
>
>> GNU copyright paperwork.  I am certainly willing to answer questions
>> about the code, of course.
>
> Here are some suggested changes -- I made these against
> a temporary local git repository using your -005 tarball.
> That was before I learned (just now) that you have a mercurial
> repository.

Here's a change without which ourseq misuse clobbers the heap:

    $ valgrind ./ourseq 99999999999999999999999 1
    ==7387== Memcheck, a memory error detector
    ==7387== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et al.
    ==7387== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
    ==7387== Command: ./ourseq 99999999999999999999999 1
    ==7387==
    ==7387== Invalid write of size 8
    ==7387==    at 0x4A0A0CB: memcpy@@GLIBC_2.14 (mc_replace_strmem.c:837)
    ==7387==    by 0x4006CC: main (ourseq.c:74)
    ==7387==  Address 0x4c35040 is 0 bytes inside a block of size 3 alloc'd
    ==7387==    at 0x4A0884D: malloc (vg_replace_malloc.c:263)
    ==7387==    by 0x4006AD: main (ourseq.c:71)
    ==7387==
    first string greater than second string
    ==7387==
    ==7387== HEAP SUMMARY:
    ==7387==     in use at exit: 5 bytes in 2 blocks
    ==7387==   total heap usage: 2 allocs, 0 frees, 5 bytes allocated
    ==7387==
    ==7387== LEAK SUMMARY:
    ==7387==    definitely lost: 0 bytes in 0 blocks
    ==7387==    indirectly lost: 0 bytes in 0 blocks
    ==7387==      possibly lost: 0 bytes in 0 blocks
    ==7387==    still reachable: 5 bytes in 2 blocks
    ==7387==         suppressed: 0 bytes in 0 blocks
    ==7387== Rerun with --leak-check=full to see details of leaked memory
    ==7387==
    ==7387== For counts of detected and suppressed errors, rerun with: -v
    ==7387== ERROR SUMMARY: 3 errors from 1 contexts (suppressed: 2 from 2)


From 2fe3143867a3a2f0b3f4d0ff71c8bfca9c676127 Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyering <at> redhat.com>
Date: Thu, 13 Sep 2012 13:27:48 +0200
Subject: [PATCH] bug fix: ourseq would clobber heap for some out-of-order
 args

* ourseq.c (main): Don't clobber heap for argv[1] longer than argv[2].
Also declare functions static and some parameters const.
---
 ChangeLog |  6 ++++++
 ourseq.c  | 13 +++++++++----
 2 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index d71a8c3..b009136 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,9 @@
+2012-09-13  Jim Meyering  <meyering <at> redhat.com>
+
+	bug fix: ourseq would clobber heap for some out-of-order args
+	* ourseq.c (main): Don't clobber heap for argv[1] longer than argv[2].
+	Also declare functions static and some parameters const.
+
 2012-09-10  Torbjorn Granlund  <tege <at> gmplib.org>

 	* factor.c (mp_prime_p): Clear out `factors' only after Lucas run.
diff --git a/ourseq.c b/ourseq.c
index cb71f13..a9899f9 100644
--- a/ourseq.c
+++ b/ourseq.c
@@ -1,5 +1,5 @@
 /* A simple seq program that operates directly on the numeric strings.
-   This works around strange limits/bugs in standards seq implementations.  */
+   This works around strange limits/bugs in standard seq implementations.  */

 #include <stdlib.h>
 #include <string.h>
@@ -13,8 +13,8 @@ struct string

 typedef struct string string;

-int
-cmp (string *s1, string *s2)
+static int
+cmp (string const *s1, string const *s2)
 {
   size_t len1, len2;

@@ -29,7 +29,7 @@ cmp (string *s1, string *s2)
   return strcmp (s1->str, s2->str);
 }

-void
+static void
 incr (string *st)
 {
   size_t len;
@@ -67,6 +67,11 @@ main (int argc, char **argv)

   len1 = strlen (argv[1]);
   len2 = strlen (argv[2]);
+  if (len2 < len1)
+    {
+      fprintf (stderr, "first string greater than second string\n");
+      exit (1);
+    }

   b.str = malloc (len2 + 2);	/* not a typo for len1 */
   e.str = malloc (len2 + 1);
--
1.7.12.363.g53284de
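
The valgrind log and the hunk above suggest that the start string is copied into a buffer whose size is derived from the end string, so a longer first argument overruns it. A minimal sketch of the guarded copy follows. The function name `copy_start` is hypothetical, and the reading of `len2 + 2` as "one extra digit plus the terminating NUL" is an assumption, since the rest of ourseq.c is not shown here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copy FIRST into a buffer sized from LAST.  Return NULL when FIRST is
   longer than LAST (or when malloc fails) instead of writing past the
   end of the buffer.  Hypothetical helper, not part of ourseq.c.  */
static char *
copy_start (const char *first, const char *last)
{
  assert (first != NULL && last != NULL);
  size_t len1 = strlen (first);
  size_t len2 = strlen (last);

  if (len2 < len1)
    return NULL;                  /* the check the patch adds */

  /* Assumed layout: room for one extra digit plus the NUL byte.  */
  char *buf = malloc (len2 + 2);
  if (buf == NULL)
    return NULL;
  memcpy (buf, first, len1 + 1);  /* len1 + 1 <= len2 + 2, so this fits */
  return buf;
}
```

With the arguments from the valgrind run, copy_start ("99999999999999999999999", "1") returns NULL rather than copying 24 bytes into a 3-byte block.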

Information forwarded to bug-coreutils <at> gnu.org:
bug#12350; Package coreutils. (Thu, 13 Sep 2012 19:17:01 GMT)

Message #119 received at 12350 <at> debbugs.gnu.org:

From: Torbjorn Granlund <tg <at> gmplib.org>
To: Jim Meyering <jim <at> meyering.net>
Cc: 12350 <at> debbugs.gnu.org, nisse <at> lysator.liu.se
Subject: Re: bug#12350: Composites identified as primes in factor.c (when
	HAVE_GMP)
Date: Thu, 13 Sep 2012 21:15:24 +0200

  > We won't be sending any more code replacement blobs to this address; it
  > is most surely the wrong place.
  
  I guess you're saying that because there's been too little feedback?

Not really.  I suppose I'm sending a code replacement so that it gets
appended to a bug report.  My initial post was in fact a bug report, but
that was 30 posts ago...

  IMHO, this is great work.

Thanks.

  You may consider it accepted.  That was clear in my mind from
  the beginning.  Sorry if I didn't make that clear to you.
  Now, it's just a matter of integrating it.
  
Good.  I would certainly accept rejection, or partial rejection.  I
suppose the code is a fair bit more complex than one might perhaps
expect, but we don't know how to make it neater without a large
performance penalty.  But we can make it much more complex if desirable.
:-)

I think the ugliest part is the interface to longlong.h.  I don't know
how to make that better, though.

  Here are some suggested changes -- I made these against
  a temporary local git repository using your -005 tarball.
  That was before I learned (just now) that you have a mercurial
  repository.
  
I'll take a look within a few days.

-- 
Torbjörn

Information forwarded to bug-coreutils <at> gnu.org:
bug#12350; Package coreutils. (Fri, 14 Sep 2012 00:26:01 GMT)

Message #122 received at 12350 <at> debbugs.gnu.org:

From: Torbjorn Granlund <tg <at> gmplib.org>
To: Jim Meyering <jim <at> meyering.net>
Cc: 12350 <at> debbugs.gnu.org, nisse <at> lysator.liu.se
Subject: Re: bug#12350: Composites identified as primes in factor.c (when
	HAVE_GMP)
Date: Fri, 14 Sep 2012 02:24:43 +0200

  * ourseq.c (main): Don't clobber heap for argv[1] longer than argv[2].
  Also declare functions static and some parameters const.

I put this into our repo, and also tacked on a GPL header to make sure
nobody would mistake this for non-free software.

-- 
Torbjörn

Information forwarded to bug-coreutils <at> gnu.org:
bug#12350; Package coreutils. (Fri, 14 Sep 2012 11:40:02 GMT)

Message #125 received at 12350 <at> debbugs.gnu.org:

From: Torbjorn Granlund <tg <at> gmplib.org>
To: Jim Meyering <jim <at> meyering.net>
Cc: 12350 <at> debbugs.gnu.org, nisse <at> lysator.liu.se
Subject: Re: bug#12350: Composites identified as primes in factor.c (when
	HAVE_GMP)
Date: Fri, 14 Sep 2012 13:38:13 +0200

I merged your changes, and made several analogous changes.
The code now passes strict compilation with and without HAVE_GMP.

I did not merge the tests changes yet.

Please grab our version from our repo.

If your repo is public, please let me know how to access it.  Else,
please either send a diff -c between your version and our repo version,
or your full file.

-- 
Torbjörn

Information forwarded to bug-coreutils <at> gnu.org:
bug#12350; Package coreutils. (Fri, 14 Sep 2012 12:50:01 GMT)

Message #128 received at 12350 <at> debbugs.gnu.org:

From: Jim Meyering <jim <at> meyering.net>
To: Torbjorn Granlund <tg <at> gmplib.org>
Cc: 12350 <at> debbugs.gnu.org, nisse <at> lysator.liu.se
Subject: Re: bug#12350: Composites identified as primes in factor.c (when
	HAVE_GMP)
Date: Fri, 14 Sep 2012 14:48:00 +0200

[Message part 1 (text/plain, inline)]

Torbjorn Granlund wrote:

> I merged your changes, and made several analogous changes.
> The code now passes strict compilation with and without HAVE_GMP.
>
> I did not merge the tests changes yet.
>
> Please grab our version from our repo.
>
> If your repo is public, please let me know how to access it.  Else,
> please either send a diff -c between your version and our repo version,
> or your full file.

Thanks for finishing the job!
Regarding the "full file", I presume you mean the Makefile,
since you've integrated all of my changes in factor.c.
Included below.

Though note that I've replaced the use of ./ourseq with simply "seq",
which depends on your having the very latest version of seq.c from
coreutils v8.19-129-g77f89d0 or newer built and first in your path.

With that Makefile, you can remove the entire tests/ directory.

[Makefile (text/plain, inline)]

# Development Makefile for the NT factor project

# Copyright 2012 Free Software Foundation, Inc.

# This program is free software; you can redistribute it and/or modify it under
# the terms of the GNU General Public License as published by the Free Software
# Foundation; either version 3 of the License, or (at your option) any later
# version.

# This program is distributed in the hope that it will be useful, but WITHOUT
# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
# FOR A PARTICULAR PURPOSE.  See the GNU General Public License for more
# details.

# You should have received a copy of the GNU General Public License along with
# this program.  If not, see http://www.gnu.org/licenses/.


CC = gcc
CFLAGS = -std=gnu99 -O2 -g -Werror -W -Wall -Wno-unused-but-set-variable

all: factor make-prime-list

factor: factor.o

factor.o: factor.c primes.h

primes.h: make-prime-list
	./make-prime-list 5000 >primes.h

%: %.c
.PRECIOUS: %.o

%.o: %.c
	$(CC) $(CFLAGS) -c $< -o $@

%: %.o
	$(CC) $(CFLAGS) $(LDFLAGS) $^ -o $@

clean:
	rm -f make-prime-list factor primes.h ourseq *.o

# Use make check CHECK_FLAGS=-s to check squfof
CHECK_FLAGS=

p = 1844674407370
q = 792281625142643375935438

args = $(word 2,$(subst -, ,$@)) $(word 3,$(subst -, ,$@))
tests = \
  t-0-10000000-a451244522b1b662c86cb3cbb55aee3e085a61a0 \
  t-10000000-20000000-c792a2e02f1c8536b5121f624b04039d20187016 \
  t-20000000-30000000-8115e8dff97d1674134ec054598d939a2a5f6113 \
  t-30000000-40000000-fe7b832c8e0ed55035152c0f9ebd59de73224a60 \
  t-40000000-50000000-b8786d66c432e48bc5b342ee3c6752b7f096f206 \
  t-50000000-60000000-a74fe518c5f79873c2b9016745b88b42c8fd3ede \
  t-60000000-70000000-689bc70d681791e5d1b8ac1316a05d0c4473d6db \
  t-70000000-80000000-d370808f2ab8c865f64c2ff909c5722db5b7d58d \
  t-80000000-90000000-7978aa66bf2bdb446398336ea6f02605e9a77581 \
  t-$(p)8551616-$(p)8651615-66c57cd58f4fb572df7f088d17e4f4c1d4f01bb1 \
  t-$(p)8651616-$(p)8751615-729228e693b1a568ecc85b199927424c7d16d410 \
  t-$(p)8751616-$(p)8851615-5a0c985017c2d285e4698f836f5a059e0b684563 \
  t-$(p)8851616-$(p)8951615-0482295c514e371c98ce9fd335deed0c9c44a4f4 \
  t-$(p)8951616-$(p)9051615-9c0e1105ac7c45e27e7bbeb5e213f530d2ad1a71 \
  t-$(p)9051616-$(p)9151615-604366d2b1d75371d0679e6a68962d66336cd383 \
  t-$(p)9151616-$(p)9251615-9192d2bdee930135b28d7160e6d395a7027871da \
  t-$(p)9251616-$(p)9351615-bcf56ae55d20d700690cff4d3327b78f83fc01bf \
  t-$(p)9351616-$(p)9451615-16b106398749e5f24d278ba7c58229ae43f650ac \
  t-$(p)9451616-$(p)9551615-ad2c6ed63525f8e7c83c4c416e7715fa1bebc54c \
  t-$(p)9551616-$(p)9651615-2b6f9c11742d9de045515a6627c27a042c49f8ba \
  t-$(p)9651616-$(p)9751615-54851acd51c4819beb666e26bc0100dc9adbc310 \
  t-$(p)9751616-$(p)9851615-6939c2a7afd2d81f45f818a159b7c5226f83a50b \
  t-$(p)9851616-$(p)9951615-0f2c8bc011d2a45e2afa01459391e68873363c6c \
  t-$(p)9951616-18446744073710051615-630dc2ad72f4c222bad1405e6c5bea590f92a98c \
  t-$(q)50336-$(q)60335-51ccb201e35599d545cb942e2bb31aba5bce4fc5

$(tests): factor ourseq
	@echo '$(lastword $(subst -, ,$@))  -' > exp.$@
	@echo $(args)
	@seq $(args) | ./factor $(CHECK_FLAGS) | shasum -c --status exp.$@
	@rm exp.$@

check: $(tests) factor ourseq

ver = `cat ver`
dist:
	mkdir nt-factor-$(ver)
	cp -pr factor.c ChangeLog README Makefile make-prime-list.c ourseq.c longlong.h tests nt-factor-$(ver)
	tar cf - nt-factor-$(ver) | lzip >nt-factor-$(ver).tar.lz
	rm -rf nt-factor-$(ver)
	./incr ver
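
Each test name in the Makefile above packs three fields into one target: the first and last arguments for seq, and the checksum expected from factor's output. The rule recovers them by replacing "-" with spaces and picking words 2, 3 and the last one. The same split, sketched in C purely for illustration (the struct and function names are made up, and the 40-hex-digit length assumes shasum's default SHA-1):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct test_spec
{
  char first[64];   /* first number passed to seq */
  char last[64];    /* last number passed to seq */
  char sha1[41];    /* expected checksum, 40 hex digits plus NUL */
};

/* Parse "t-FIRST-LAST-SHA1".  Return 1 on success, 0 on malformed input.
   Hypothetical helper; the Makefile does this with $(subst) and $(word).  */
static int
parse_test_name (const char *name, struct test_spec *t)
{
  assert (name != NULL && t != NULL);
  int n = sscanf (name, "t-%63[0-9]-%63[0-9]-%40[0-9a-f]",
                  t->first, t->last, t->sha1);
  return n == 3 && strlen (t->sha1) == 40;
}
```

For "t-0-10000000-a451244522b1b662c86cb3cbb55aee3e085a61a0" this yields first "0", last "10000000", and the checksum as the third field.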

Information forwarded to bug-coreutils <at> gnu.org:
bug#12350; Package coreutils. (Fri, 14 Sep 2012 12:53:02 GMT)

Message #131 received at 12350 <at> debbugs.gnu.org:

From: Jim Meyering <jim <at> meyering.net>
To: Torbjorn Granlund <tg <at> gmplib.org>
Cc: 12350 <at> debbugs.gnu.org, nisse <at> lysator.liu.se
Subject: Re: bug#12350: Composites identified as primes in factor.c (when
	HAVE_GMP)
Date: Fri, 14 Sep 2012 14:51:29 +0200

Torbjorn Granlund wrote:

> I merged your changes, and made several analogous changes.
> The code now passes strict compilation with and without HAVE_GMP.
>
> I did not merge the tests changes yet.
>
> Please grab our version from our repo.
>
> If your repo is public, please let me know how to access it.  Else,
> please either send a diff -c between your version and our repo version,
> or your full file.

Oh, if you don't mind making some simple, automated changes,
I'd appreciate if you were to expand all TABs in these files:

  factor.c
  longlong.h
  make-prime-list.c
  ourseq.c

And there remain a few longer-than-79 lines:

    $ wc -L *.[ch]
        83 factor.c
        88 longlong.h
        79 make-prime-list.c
        79 ourseq.c
        88 total

Can you split them?

Thanks,

Jim

Information forwarded to bug-coreutils <at> gnu.org:
bug#12350; Package coreutils. (Fri, 14 Sep 2012 12:55:02 GMT)

Message #134 received at 12350 <at> debbugs.gnu.org:

From: Torbjorn Granlund <tg <at> gmplib.org>
To: Jim Meyering <jim <at> meyering.net>
Cc: 12350 <at> debbugs.gnu.org, nisse <at> lysator.liu.se
Subject: Re: bug#12350: Composites identified as primes in factor.c (when
	HAVE_GMP)
Date: Fri, 14 Sep 2012 14:53:07 +0200

Jim Meyering <jim <at> meyering.net> writes:

  Regarding the "full file", I presume you mean the Makefile,
  since you've integrated all of my changes in factor.c.
  Included below.
  
No, I mean factor.c.  I integrated the changes manually, so might have
missed something.

  Though note that I've replaced the use of ./ourseq with simply "seq",
  which depends on your having the very latest version of seq.c from
  coreutils v8.19-129-g77f89d0 or newer built and first in your path.
  
  With that Makefile, you can remove the entire tests/ directory.
  
I think I'll stick to our current organisation for now.  The important file
to keep in sync is factor.c.

I suppose this is a good time to take care of paperwork.
Will you ping assign <at> xxx or should I?

-- 
Torbjörn

Information forwarded to bug-coreutils <at> gnu.org:
bug#12350; Package coreutils. (Fri, 14 Sep 2012 13:02:02 GMT)

Message #137 received at 12350 <at> debbugs.gnu.org:

From: Torbjorn Granlund <tg <at> gmplib.org>
To: Jim Meyering <jim <at> meyering.net>
Cc: 12350 <at> debbugs.gnu.org, nisse <at> lysator.liu.se
Subject: Re: bug#12350: Composites identified as primes in factor.c (when
	HAVE_GMP)
Date: Fri, 14 Sep 2012 15:00:40 +0200

Jim Meyering <jim <at> meyering.net> writes:

  Torbjorn Granlund wrote:

  > I merged your changes, and made several analogous changes.
  > The code now passes strict compilation with and without HAVE_GMP.
  >
  > I did not merge the tests changes yet.
  >
  > Please grab our version from our repo.
  >
  > If your repo is public, please let me know how to access it.  Else,
  > please either send a diff -c between your version and our repo version,
  > or your full file.

  Oh, if you don't mind making some simple, automated changes,
  I'd appreciate if you were to expand all TABs in these files:

    factor.c
    longlong.h
    make-prime-list.c
    ourseq.c

I would advise against making such changes to longlong.h, since it might
make sense to keep it equal to the GMP version.  We don't plan to stop
using TAB in GMP.

But feel free to make any changes to the coreutils versions of any of
these files; we then simply need to pass -b to diff.

  And there remain a few longer-than-79 lines:

      $ wc -L *.[ch]
          83 factor.c
          88 longlong.h
          79 make-prime-list.c
          79 ourseq.c
          88 total

  Can you split them?

The same goes for this type of change.

I don't see any lines that make the program hard to read.

-- 
Torbjörn

Information forwarded to bug-coreutils <at> gnu.org:
bug#12350; Package coreutils. (Sat, 15 Sep 2012 20:49:02 GMT)

Message #140 received at 12350 <at> debbugs.gnu.org:

From: Jim Meyering <jim <at> meyering.net>
To: Torbjorn Granlund <tg <at> gmplib.org>
Cc: 12350 <at> debbugs.gnu.org, nisse <at> lysator.liu.se
Subject: Re: bug#12350: Composites identified as primes in factor.c (when
	HAVE_GMP)
Date: Sat, 15 Sep 2012 22:47:02 +0200

Torbjorn Granlund wrote:

>   > We won't be sending any more code replacement blobs to this address; it
>   > is most surely the wrong place.
>
>   I guess you're saying that because there's been too little feedback?
>
> Not really.  I suppose I'm sending a code replacement so that it gets
> appended to a bug report.  My initial post was in fact a bug report, but
> that was 30 posts ago...
>
>   IMHO, this is great work.
>
> Thanks.
>
>   You may consider it accepted.  That was clear in my mind from
>   the beginning.  Sorry if I didn't make that clear to you.
>   Now, it's just a matter of integrating it.
>
> Good.  I would certainly accept rejection, or partial rejection.  I
> suppose the code is a fair bit more complex than one might perhaps
> expect, but we don't know how to make it neater without a large
> performance penalty.  But we can make it much more complex if desirable.
> :-)
>
> I think the ugliest part is the interface to longlong.h.  I don't know
> how to make that better, though.
>
>   Here are some suggested changes -- I made these against
>   a temporary local git repository using your -005 tarball.
>   That was before I learned (just now) that you have a mercurial
>   repository.

Here are some more suggested changes.
Sorry about the terse commit logs.
The changes are mostly stylistic.

changeset:   121:80954440c618
tag:         tip
user:        Jim Meyering <meyering <at> redhat.com>
date:        Sat Sep 15 22:37:19 2012 +0200
files:       factor.c
description:
use "unsigned long int" consistently, not "unsigned long"


 factor.c |  40 ++++++++++++++++++++--------------------
 1 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/factor.c b/factor.c
--- a/factor.c
+++ b/factor.c
@@ -119,7 +119,7 @@
 #else
 typedef unsigned char UQItype;
 typedef		 long SItype;
-typedef unsigned long USItype;
+typedef unsigned long int USItype;
 #if HAVE_LONG_LONG
 typedef	long long int DItype;
 typedef unsigned long long int UDItype;
@@ -178,8 +178,8 @@
 struct mp_factors
 {
   mpz_t         *p;
-  unsigned long *e;
-  unsigned long nfactors;
+  unsigned long int *e;
+  unsigned long int nfactors;
 };
 #endif

@@ -189,7 +189,7 @@
 #define umul_ppmm(w1, w0, u, v)						\
   do {									\
     uintmax_t __x0, __x1, __x2, __x3;					\
-    unsigned long __ul, __vl, __uh, __vh;				\
+    unsigned long int __ul, __vl, __uh, __vh;				\
     uintmax_t __u = (u), __v = (v);					\
 									\
     __ul = __ll_lowpart (__u);						\
@@ -530,9 +530,9 @@
 static void
 mp_factor_insert (struct mp_factors *factors, mpz_t prime)
 {
-  unsigned long nfactors = factors->nfactors;
+  unsigned long int nfactors = factors->nfactors;
   mpz_t         *p  = factors->p;
-  unsigned long *e  = factors->e;
+  unsigned long int *e  = factors->e;
   long i;

   /* Locate position for insert new or increment e.  */
@@ -567,7 +567,7 @@
 }

 static void
-mp_factor_insert_ui (struct mp_factors *factors, unsigned long prime)
+mp_factor_insert_ui (struct mp_factors *factors, unsigned long int prime)
 {
   mpz_t pz;

@@ -1367,13 +1367,13 @@
 #endif

 static void
-factor_using_pollard_rho (uintmax_t n, unsigned long a, struct factors *factors)
+factor_using_pollard_rho (uintmax_t n, unsigned long int a,
+			  struct factors *factors)
 {
   uintmax_t x, z, y, P, t, ni, g;
-  unsigned long k, l;

-  k = 1;
-  l = 1;
+  unsigned long int k = 1;
+  unsigned long int l = 1;

   redcify (P, 1, n);
   addmod (x, P, P, n);		/* i.e., redcify(2) */
@@ -1407,7 +1407,7 @@
 	  z = x;
 	  k = l;
 	  l = 2 * l;
-	  for (unsigned long i = 0; i < k; i++)
+	  for (unsigned long int i = 0; i < k; i++)
 	    {
 	      x = mulredc (x, x, n, ni);
 	      addmod (x, x, a, n);
@@ -1446,14 +1446,13 @@
 }

 static void
-factor_using_pollard_rho2 (uintmax_t n1, uintmax_t n0, unsigned long a,
+factor_using_pollard_rho2 (uintmax_t n1, uintmax_t n0, unsigned long int a,
 			   struct factors *factors)
 {
   uintmax_t x1, x0, z1, z0, y1, y0, P1, P0, t1, t0, ni, g1, g0, r1m;
-  unsigned long k, l;

-  k = 1;
-  l = 1;
+  unsigned long int k = 1;
+  unsigned long int l = 1;

   redcify2 (P1, P0, 1, n1, n0);
   addmod2 (x1, x0, P1, P0, P1, P0, n1, n0); /* i.e., redcify(2) */
@@ -1489,7 +1488,7 @@
 	  z1 = x1; z0 = x0;
 	  k = l;
 	  l = 2 * l;
-	  for (unsigned long i = 0; i < k; i++)
+	  for (unsigned long int i = 0; i < k; i++)
 	    {
 	      x0 = mulredc2 (&r1m, x1, x0, x1, x0, n1, n0, ni);
 	      x1 = r1m;
@@ -1562,11 +1561,12 @@

 #if HAVE_GMP
 static void
-mp_factor_using_pollard_rho (mpz_t n, unsigned long a, struct mp_factors *factors)
+mp_factor_using_pollard_rho (mpz_t n, unsigned long int a,
+			     struct mp_factors *factors)
 {
   mpz_t x, z, y, P;
   mpz_t t, t2;
-  unsigned long long k, l;
+  unsigned long long int k, l;

   if (flag_verbose > 0)
     {
@@ -1608,7 +1608,7 @@
 	  mpz_set (z, x);
 	  k = l;
 	  l = 2 * l;
-	  for (unsigned long long i = 0; i < k; i++)
+	  for (unsigned long long int i = 0; i < k; i++)
 	    {
 	      mpz_mul (t, x, x);
 	      mpz_mod (x, t, n);
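
The k and l counters in the hunks above double the stride between saved points, which looks like Brent's variant of Pollard's rho. As a rough illustration of the underlying idea only, here is a minimal sketch that uses the simpler Floyd (tortoise and hare) cycle detection on plain 64-bit integers, with no Montgomery (redc) form and no batching of gcds. The names `gcd_u64` and `pollard_rho_try` are made up for this example and do not appear in factor.c.

```c
#include <assert.h>
#include <stdint.h>

static uint64_t
gcd_u64 (uint64_t a, uint64_t b)
{
  while (b != 0)
    {
      uint64_t t = a % b;
      a = b;
      b = t;
    }
  return a;
}

/* One Pollard rho attempt on N with the polynomial x*x + A.
   Requires 1 < N < 2**32 so that x*x + A cannot overflow 64 bits.
   Returns a divisor of N greater than 1.  The result equals N when the
   attempt fails; the caller should then retry with another A.  */
static uint64_t
pollard_rho_try (uint64_t n, uint64_t a)
{
  assert (n > 1 && n < (UINT64_C (1) << 32) && a < n);
  uint64_t x = 2 % n, y = x, d = 1;
  while (d == 1)
    {
      x = (x * x + a) % n;      /* tortoise: one step */
      y = (y * y + a) % n;      /* hare: two steps */
      y = (y * y + a) % n;
      d = gcd_u64 (x > y ? x - y : y - x, n);
    }
  return d;
}
```

For example, pollard_rho_try (8051, 1) returns 97, and 8051 = 83 * 97.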

changeset:   120:92ea4621a776
user:        Jim Meyering <meyering <at> redhat.com>
date:        Sat Sep 15 22:33:36 2012 +0200
files:       factor.c
description:
mp_prime_p: widen type of index to match the type of upper bound

In case we operate on a number with more than INT_MAX factors ;-)


 factor.c |  2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/factor.c b/factor.c
--- a/factor.c
+++ b/factor.c
@@ -1328,7 +1328,7 @@
       if (flag_prove_primality)
 	{
 	  is_prime = true;
-	  for (unsigned int i = 0; i < factors.nfactors && is_prime; i++)
+	  for (unsigned long int i = 0; i < factors.nfactors && is_prime; i++)
 	    {
 	      mpz_divexact (tmp, nm1, factors.p[i]);
 	      mpz_powm (tmp, a, tmp, n);

changeset:   119:c1b592ac3449
user:        Jim Meyering <meyering <at> redhat.com>
date:        Sat Sep 15 22:22:40 2012 +0200
files:       factor.c
description:
mp_prime_p: use "unsigned long int" not "int" for result of mpz_scan1


 factor.c |  2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/factor.c b/factor.c
--- a/factor.c
+++ b/factor.c
@@ -1302,7 +1302,7 @@
   mpz_sub_ui (nm1, n, 1);

   /* Find q and k, where q is odd and n = 1 + 2**k * q.  */
-  int k = mpz_scan1 (nm1, 0);
+  unsigned long int k = mpz_scan1 (nm1, 0);
   mpz_tdiv_q_2exp (q, nm1, k);

   mpz_set_ui (a, 2);

changeset:   118:12940edfecc6
user:        Jim Meyering <meyering <at> redhat.com>
date:        Sat Sep 15 22:21:02 2012 +0200
files:       factor.c
description:
reduce scope of functions and variables, favor unsigned types, use "bool"


 factor.c |  210 ++++++++++++++++++++++++++++++---------------------------------
 1 files changed, 99 insertions(+), 111 deletions(-)

diff --git a/factor.c b/factor.c
--- a/factor.c
+++ b/factor.c
@@ -89,6 +89,7 @@
 #include <ctype.h>
 #include <string.h>		/* for memmove */
 #include <unistd.h>		/* for getopt */
+#include <stdbool.h>

 #if HAVE_GMP
 #include <gmp.h>
@@ -160,7 +161,7 @@

 enum alg_type { ALG_POLLARD_RHO = 1, ALG_SQUFOF = 2 };

-enum alg_type alg;
+static enum alg_type alg;

 /* 2*3*5*7*11...*101 is 128 bits, and has 26 prime factors */
 #define MAX_NFACTS 26
@@ -182,7 +183,7 @@
 };
 #endif

-void factor (uintmax_t, uintmax_t, struct factors *);
+static void factor (uintmax_t, uintmax_t, struct factors *);

 #ifndef umul_ppmm
 #define umul_ppmm(w1, w0, u, v)						\
@@ -309,7 +310,7 @@
     uintmax_t _rh, _rl, _nh, _nl;					\
     umul_ppmm (_rh, _rl, (a), (b));					\
     _nh = n; _nl = 0;							\
-    for (int _i = W_TYPE_SIZE; _i != 0; _i--)				\
+    for (unsigned int _i = W_TYPE_SIZE; _i != 0; _i--)			\
       {									\
 	rsh2 (_nh, _nl, _nh, _nl, 1);					\
 	if (ge2 (_rh, _rl, _nh, _nl))					\
@@ -352,10 +353,10 @@

 /* Compute r = a mod d, where r = <*t1,retval>, a = <a1,a0>, d = <d1,d0>.
    Requires that d1 != 0.  */
-uintmax_t
+static uintmax_t
 mod2 (uintmax_t *r1, uintmax_t a1, uintmax_t a0, uintmax_t d1, uintmax_t d0)
 {
-  int cntd, cnta, cnt;
+  int cntd, cnta;

   assert (d1 != 0);

@@ -367,7 +368,7 @@

   count_leading_zeros (cntd, d1);
   count_leading_zeros (cnta, a1);
-  cnt = cntd - cnta;
+  int cnt = cntd - cnta;
   lsh2 (d1, d0, d1, d0, cnt);
   for (int i = 0; i < cnt; i++)
     {
@@ -380,7 +381,7 @@
   return a0;
 }

-uintmax_t
+static uintmax_t
 gcd_odd (uintmax_t a, uintmax_t b)
 {
   if ( (b & 1) == 0)
@@ -418,7 +419,7 @@
     }
 }

-uintmax_t
+static uintmax_t
 gcd2_odd (uintmax_t *r1, uintmax_t a1, uintmax_t a0, uintmax_t b1, uintmax_t b0)
 {
   while ((a0 & 1) == 0)
@@ -456,16 +457,16 @@
   return a0;
 }

-void
+static void
 factor_insert_multiplicity (struct factors *factors,
 			    uintmax_t prime, unsigned int m)
 {
-  int nfactors = factors->nfactors;
+  unsigned int nfactors = factors->nfactors;
   uintmax_t *p = factors->p;
   unsigned char *e = factors->e;
-  int i;

   /* Locate position for insert new or increment e.  */
+  int i;
   for (i = nfactors - 1; i >= 0; i--)
     {
       if (p[i] <= prime)
@@ -491,7 +492,7 @@

 #define factor_insert(f, p) factor_insert_multiplicity(f, p, 1)

-void
+static void
 factor_insert_large (struct factors *factors,
 		     uintmax_t p1, uintmax_t p0)
 {
@@ -506,9 +507,9 @@
 }

 #if HAVE_GMP
-void mp_factor (mpz_t, struct mp_factors *);
+static void mp_factor (mpz_t, struct mp_factors *);

-void
+static void
 mp_factor_init (struct mp_factors *factors)
 {
   factors->p = malloc (1);
@@ -516,7 +517,7 @@
   factors->nfactors = 0;
 }

-void
+static void
 mp_factor_clear (struct mp_factors *factors)
 {
   for (unsigned int i = 0; i < factors->nfactors; i++)
@@ -526,10 +527,10 @@
   free (factors->e);
 }

-void
+static void
 mp_factor_insert (struct mp_factors *factors, mpz_t prime)
 {
-  long    nfactors  = factors->nfactors;
+  unsigned long nfactors = factors->nfactors;
   mpz_t         *p  = factors->p;
   unsigned long *e  = factors->e;
   long i;
@@ -565,7 +566,7 @@
     }
 }

-void
+static void
 mp_factor_insert_ui (struct mp_factors *factors, unsigned long prime)
 {
   mpz_t pz;
@@ -606,10 +607,10 @@
 #undef P

 /* This flag is honoured just in the GMP code. */
-int flag_verbose = 0;
+static int flag_verbose = 0;

 /* Prove primality or run probabilistic tests.  */
-int flag_prove_primality = 1;
+static int flag_prove_primality = 1;

 /* Number of Miller-Rabin tests to run when not proving primality. */
 #define MR_REPS 25
@@ -617,7 +618,7 @@
 #define LIKELY(cond)    __builtin_expect ((cond) != 0, 1)
 #define UNLIKELY(cond)  __builtin_expect ((cond) != 0, 0)

-void
+static void
 factor_insert_refind (struct factors *factors, uintmax_t p, unsigned int i,
 		      unsigned int off)
 {
@@ -659,16 +660,13 @@
    order, and the non-multiples of p onto the range lim < q < B.
  */

-uintmax_t
+static uintmax_t
 factor_using_division (uintmax_t *t1p, uintmax_t t1, uintmax_t t0,
 		       struct factors *factors)
 {
-  unsigned int i;
-  uintmax_t p;
-
   if (t0 % 2 == 0)
     {
-      int cnt;
+      unsigned int cnt;

       if (t0 == 0)
 	{
@@ -686,7 +684,8 @@
       factor_insert_multiplicity (factors, 2, cnt);
     }

-  p = 3;
+  uintmax_t p = 3;
+  unsigned int i;
   for (i = 0; t1 > 0 && i < PRIMES_PTAB_ENTRIES; i++)
     {
       for (;;)
@@ -724,9 +723,7 @@
   for (; i < PRIMES_PTAB_ENTRIES; i += 8)
     {
       uintmax_t q;
-      const struct primes_dtab *pd;
-
-      pd = &primes_dtab[i];
+      const struct primes_dtab *pd = &primes_dtab[i];
       DIVBLOCK(0);
       DIVBLOCK(1);
       DIVBLOCK(2);
@@ -745,7 +742,7 @@
 }

 #if HAVE_GMP
-void
+static void
 mp_factor_using_division (mpz_t t, struct mp_factors *factors)
 {
   mpz_t q;
@@ -882,7 +879,7 @@
   } while (0)

 /* Modular two-word multiplication, r = a * b mod m, with mi = m^(-1) mod B.
-   Both a and b has to be in redc form, the result will be in redc form too. */
+   Both a and b must be in redc form, the result will be in redc form too. */
 inline uintmax_t
 mulredc (uintmax_t a, uintmax_t b, uintmax_t m, uintmax_t mi)
 {
@@ -899,9 +896,9 @@
 }

 /* Modular two-word multiplication, r = a * b mod m, with mi = m^(-1) mod B.
-   Both a and b has to be in redc form, the result will be in redc form too.
+   Both a and b must be in redc form, the result will be in redc form too.
    For performance reasons, the most significant bit of m must be clear. */
-uintmax_t
+static uintmax_t
 mulredc2 (uintmax_t *r1p,
 	  uintmax_t a1, uintmax_t a0, uintmax_t b1, uintmax_t b0,
 	  uintmax_t m1, uintmax_t m0, uintmax_t mi)
@@ -969,12 +966,10 @@
   return r0;
 }

-uintmax_t
+static uintmax_t
 powm (uintmax_t b, uintmax_t e, uintmax_t n, uintmax_t ni, uintmax_t one)
 {
-  uintmax_t y;
-
-  y = one;
+  uintmax_t y = one;

   if (e & 1)
     y = b;
@@ -991,7 +986,7 @@
   return y;
 }

-uintmax_t
+static uintmax_t
 powm2 (uintmax_t *r1m,
        const uintmax_t *bp, const uintmax_t *ep, const uintmax_t *np,
        uintmax_t ni, const uintmax_t *one)
@@ -1032,32 +1027,30 @@
   return r0;
 }

-int
+static bool
 millerrabin (uintmax_t n, uintmax_t ni, uintmax_t b, uintmax_t q,
 	     unsigned int k, uintmax_t one)
 {
-  uintmax_t y, nm1;
+  uintmax_t y = powm (b, q, n, ni, one);

-  y = powm (b, q, n, ni, one);
-
-  nm1 = n - one;	/* -1, but in redc representation. */
+  uintmax_t nm1 = n - one;	/* -1, but in redc representation. */

   if (y == one || y == nm1)
-    return 1;
+    return true;

   for (unsigned int i = 1; i < k; i++)
     {
       y = mulredc (y, y, n, ni);

       if (y == nm1)
-	return 1;
+	return true;
       if (y == one)
-	return 0;
+	return false;
     }
-  return 0;
+  return false;
 }

-int
+static bool
 millerrabin2 (const uintmax_t *np, uintmax_t ni, const uintmax_t *bp,
 	      const uintmax_t *qp, unsigned int k, const uintmax_t *one)
 {
@@ -1067,12 +1060,12 @@
   y1 = r1m;

   if (y0 == one[0] && y1 == one[1])
-    return 1;
+    return true;

   sub_ddmmss (nm1_1, nm1_0, np[1], np[0], one[1], one[0]);

   if (y0 == nm1_0 && y1 == nm1_1)
-    return 1;
+    return true;

   for (unsigned int i = 1; i < k; i++)
     {
@@ -1080,66 +1073,65 @@
       y1 = r1m;

       if (y0 == nm1_0 && y1 == nm1_1)
-	return 1;
+	return true;
       if (y0 == one[0] && y1 == one[1])
-	return 0;
+	return false;
     }
-  return 0;
+  return false;
 }

 #if HAVE_GMP
-static int
+static bool
 mp_millerrabin (mpz_srcptr n, mpz_srcptr nm1, mpz_ptr x, mpz_ptr y,
 		mpz_srcptr q, unsigned long int k)
 {
-  unsigned long int i;
-
   mpz_powm (y, x, q, n);

   if (mpz_cmp_ui (y, 1) == 0 || mpz_cmp (y, nm1) == 0)
-    return 1;
+    return true;

-  for (i = 1; i < k; i++)
+  for (unsigned long int i = 1; i < k; i++)
     {
       mpz_powm_ui (y, y, 2, n);
       if (mpz_cmp (y, nm1) == 0)
-	return 1;
+	return true;
       if (mpz_cmp_ui (y, 1) == 0)
-	return 0;
+	return false;
     }
-  return 0;
+  return false;
 }
 #endif

 /* Lucas' prime test.  The number of iterations vary greatly, up to a few dozen
    have been observed.  The average seem to be about 2.  */
-int
+static bool
 prime_p (uintmax_t n)
 {
-  int k, is_prime;
-  uintmax_t q, a, a_prim, one, ni;
+  int k;
+  bool is_prime;
+  uintmax_t a_prim, one, ni;
   struct factors factors;

   if (n <= 1)
-    return 0;
+    return false;

   /* We have already casted out small primes. */
   if (n < (uintmax_t) FIRST_OMITTED_PRIME * FIRST_OMITTED_PRIME)
-    return 1;
+    return true;

   /* Precomputation for Miller-Rabin.  */
-  q = n - 1;
+  uintmax_t q = n - 1;
   for (k = 0; (q & 1) == 0; k++)
     q >>= 1;

-  a = 2;
+  uintmax_t a = 2;
   binv (ni, n);			/* ni <- 1/n mod B */
   redcify (one, 1, n);
   addmod (a_prim, one, one, n);	/* i.e., redcify a = 2 */

   /* Perform a Miller-Rabin test, finds most composites quickly.  */
   if (!millerrabin (n, ni, a_prim, q, k, one))
-    return 0;
+    return false;

   if (flag_prove_primality)
     {
@@ -1151,12 +1143,10 @@
      number composite.  */
   for (unsigned int r = 0; r < PRIMES_PTAB_ENTRIES; r++)
     {
-      int i;
-
       if (flag_prove_primality)
 	{
-	  is_prime = 1;
-	  for (i = 0; i < factors.nfactors && is_prime; i++)
+	  is_prime = true;
+	  for (unsigned int i = 0; i < factors.nfactors && is_prime; i++)
 	    {
 	      is_prime = powm (a_prim, (n - 1) / factors.p[i], n, ni, one) != one;
 	    }
@@ -1168,7 +1158,7 @@
 	}

       if (is_prime)
-	return 1;
+	return true;

       a += primes_diff[r];	/* Establish new base.  */

@@ -1185,18 +1175,17 @@
       }

       if (!millerrabin (n, ni, a_prim, q, k, one))
-	return 0;
+	return false;
     }

   fprintf (stderr, "Lucas prime test failure.  This should not happen\n");
   abort ();
 }

-int
+static bool
 prime2_p (uintmax_t n1, uintmax_t n0)
 {
   uintmax_t q[2], nm1[2];
-  uintmax_t a;
   uintmax_t a_prim[2];
   uintmax_t one[2];
   uintmax_t na[2];
@@ -1223,7 +1212,7 @@
       rsh2 (q[1], q[0], nm1[1], nm1[0], k);
     }

-  a = 2;
+  uintmax_t a = 2;
   binv (ni, n0);
   redcify2 (one[1], one[0], 1, n1, n0);
   addmod2 (a_prim[1], a_prim[0], one[1], one[0], one[1], one[0], n1, n0);
@@ -1233,7 +1222,7 @@
   na[1] = n1;

   if (!millerrabin2 (na, ni, a_prim, q, k, one))
-    return 0;
+    return false;

   if (flag_prove_primality)
     {
@@ -1245,12 +1234,12 @@
      number composite.  */
   for (unsigned int r = 0; r < PRIMES_PTAB_ENTRIES; r++)
     {
-      int i, is_prime;
+      bool is_prime;
       uintmax_t e[2], y[2];

       if (flag_prove_primality)
 	{
-	  is_prime = 1;
+	  is_prime = true;
 	  if (factors.plarge[1])
 	    {
 	      uintmax_t pi;
@@ -1260,7 +1249,7 @@
 	      y[0] = powm2 (&y[1], a_prim, e, na, ni, one);
 	      is_prime = (y[0] != one[0] || y[1] != one[1]);
 	    }
-	  for (i = 0; i < factors.nfactors && is_prime; i++)
+	  for (unsigned int i = 0; i < factors.nfactors && is_prime; i++)
 	    {
 	      /* FIXME: We always have the factor 2. Do we really need to handle it
 		 here? We have done the same powering as part of millerrabin. */
@@ -1279,13 +1268,13 @@
 	}

       if (is_prime)
-	return 1;
+	return true;

       a += primes_diff[r];	/* Establish new base.  */
       redcify2 (a_prim[1], a_prim[0], a, n1, n0);

       if (!millerrabin2 (na, ni, a_prim, q, k, one))
-	return 0;
+	return false;
     }

   fprintf (stderr, "Lucas prime test failure.  This should not happen\n");
@@ -1293,19 +1282,19 @@
 }

 #if HAVE_GMP
-int
+static bool
 mp_prime_p (mpz_t n)
 {
-  int k, is_prime;
+  bool is_prime;
   mpz_t q, a, nm1, tmp;
   struct mp_factors factors;

   if (mpz_cmp_ui (n, 1) <= 0)
-    return 0;
+    return false;

   /* We have already casted out small primes. */
   if (mpz_cmp_ui (n, (long) FIRST_OMITTED_PRIME * FIRST_OMITTED_PRIME) < 0)
-    return 1;
+    return true;

   mpz_inits (q, a, nm1, tmp, NULL);

@@ -1313,7 +1302,7 @@
   mpz_sub_ui (nm1, n, 1);

   /* Find q and k, where q is odd and n = 1 + 2**k * q.  */
-  k = mpz_scan1 (nm1, 0);
+  int k = mpz_scan1 (nm1, 0);
   mpz_tdiv_q_2exp (q, nm1, k);

   mpz_set_ui (a, 2);
@@ -1321,7 +1310,7 @@
   /* Perform a Miller-Rabin test, finds most composites quickly.  */
   if (!mp_millerrabin (n, nm1, a, tmp, q, k))
     {
-      is_prime = 0;
+      is_prime = false;
       goto ret2;
     }

@@ -1338,7 +1327,7 @@
     {
       if (flag_prove_primality)
 	{
-	  is_prime = 1;
+	  is_prime = true;
 	  for (unsigned int i = 0; i < factors.nfactors && is_prime; i++)
 	    {
 	      mpz_divexact (tmp, nm1, factors.p[i]);
@@ -1359,7 +1348,7 @@

       if (!mp_millerrabin (n, nm1, a, tmp, q, k))
 	{
-	  is_prime = 0;
+	  is_prime = false;
 	  goto ret1;
 	}
     }
@@ -1377,11 +1366,11 @@
 }
 #endif

-void
+static void
 factor_using_pollard_rho (uintmax_t n, unsigned long a, struct factors *factors)
 {
   uintmax_t x, z, y, P, t, ni, g;
-  unsigned long k, l, i;
+  unsigned long k, l;

   k = 1;
   l = 1;
@@ -1418,7 +1407,7 @@
 	  z = x;
 	  k = l;
 	  l = 2 * l;
-	  for (i = 0; i < k; i++)
+	  for (unsigned long i = 0; i < k; i++)
 	    {
 	      x = mulredc (x, x, n, ni);
 	      addmod (x, x, a, n);
@@ -1456,12 +1445,12 @@
     }
 }

-void
+static void
 factor_using_pollard_rho2 (uintmax_t n1, uintmax_t n0, unsigned long a,
 			   struct factors *factors)
 {
   uintmax_t x1, x0, z1, z0, y1, y0, P1, P0, t1, t0, ni, g1, g0, r1m;
-  unsigned long k, l, i;
+  unsigned long k, l;

   k = 1;
   l = 1;
@@ -1500,7 +1489,7 @@
 	  z1 = x1; z0 = x0;
 	  k = l;
 	  l = 2 * l;
-	  for (i = 0; i < k; i++)
+	  for (unsigned long i = 0; i < k; i++)
 	    {
 	      x0 = mulredc2 (&r1m, x1, x0, x1, x0, n1, n0, ni);
 	      x1 = r1m;
@@ -1572,12 +1561,12 @@
 }

 #if HAVE_GMP
-void
+static void
 mp_factor_using_pollard_rho (mpz_t n, unsigned long a, struct mp_factors *factors)
 {
   mpz_t x, z, y, P;
   mpz_t t, t2;
-  unsigned long long k, l, i;
+  unsigned long long k, l;

   if (flag_verbose > 0)
     {
@@ -1619,7 +1608,7 @@
 	  mpz_set (z, x);
 	  k = l;
 	  l = 2 * l;
-	  for (i = 0; i < k; i++)
+	  for (unsigned long long i = 0; i < k; i++)
 	    {
 	      mpz_mul (t, x, x);
 	      mpz_mod (x, t, n);
@@ -1859,7 +1848,7 @@
 #endif


-void
+static void
 factor_using_squfof (uintmax_t n1, uintmax_t n0, struct factors *factors)
 {
   /* Uses algorithm and notation from
@@ -2114,7 +2103,7 @@
   exit (EXIT_FAILURE);
 }

-void
+static void
 factor (uintmax_t t1, uintmax_t t0, struct factors *factors)
 {
   factors->nfactors = 0;
@@ -2149,7 +2138,7 @@
 }

 #if HAVE_GMP
-void
+static void
 mp_factor (mpz_t t, struct mp_factors *factors)
 {
   mp_factor_init (factors);
@@ -2173,7 +2162,7 @@
 }
 #endif

-int
+static int
 strto2uintmax (uintmax_t *hip, uintmax_t *lop, const char *s)
 {
   int errcode;
@@ -2226,7 +2215,7 @@
   return errcode;
 }

-void
+static void
 print_uintmaxes (uintmax_t t1, uintmax_t t0)
 {
   uintmax_t q, r;
@@ -2245,7 +2234,7 @@
     }
 }

-void
+static void
 factor_one (const char *input)
 {
   uintmax_t t1, t0;
@@ -2353,7 +2342,6 @@
 int
 main (int argc, char *argv[])
 {
-  int i;
   int c;

   alg = ALG_POLLARD_RHO;	/* Default to Pollard rho */
@@ -2380,7 +2368,7 @@
 #endif

   if (optind < argc)
-    for (i = optind; i < argc; i++)
+    for (int i = optind; i < argc; i++)
       factor_one (argv[i]);
   else
     {
@@ -2396,7 +2384,7 @@
     {
       double acc_f;
       printf ("q  freq.  cum. freq.(total: %d)\n", q_freq[0]);
-      for (i = 1, acc_f = 0.0; i <= Q_FREQ_SIZE; i++)
+      for (unsigned int i = 1, acc_f = 0.0; i <= Q_FREQ_SIZE; i++)
 	{
 	  double f = (double) q_freq[i] / q_freq[0];
 	  acc_f += f;

Information forwarded to bug-coreutils <at> gnu.org:
bug#12350; Package coreutils. (Sun, 16 Sep 2012 20:45:01 GMT)

Message #143 received at 12350 <at> debbugs.gnu.org:

From: Jim Meyering <jim <at> meyering.net>
To: Torbjorn Granlund <tg <at> gmplib.org>
Cc: 12350 <at> debbugs.gnu.org, nisse <at> lysator.liu.se
Subject: Re: bug#12350: Composites identified as primes in factor.c (when
	HAVE_GMP)
Date: Sun, 16 Sep 2012 22:43:35 +0200

Jim Meyering wrote:
...
> Here are some more suggested changes.
> Sorry about the terse commit logs.
> The changes are mostly stylistic.
>
> changeset:   121:80954440c618

Hi Torbjorn,

I've begun inserting your factor.c into coreutils.
That enables a lot more warnings, and I've made a few
changes, beginning to accommodate them.  Unfortunately,
I've also converted TABs to spaces, so I'll let you
compute your own diffs (presumably with -b).  Included below:

Would you mind changing the names of a few variables
or adjusting declarations to avoid some -Wshadow warnings?

I changed the innermost "r" to "rem" locally, but there are
others.  Also, "S".

  make  all-recursive
  make[1]: Entering directory `/h/j/w/co/cu'
  Making all in po
  make[2]: Entering directory `/h/j/w/co/cu/po'
  make[2]: Leaving directory `/h/j/w/co/cu/po'
  Making all in .
  make[2]: Entering directory `/h/j/w/co/cu'
    CC       src/factor.o
  src/factor.c: In function 'factor_using_squfof':
  src/factor.c:1896:17: error: declaration of 'S' shadows a previous local [-Werror=shadow]
         uintmax_t S, Dh, Dl, Q1, Q, P, L, L1, B;
                   ^
  src/factor.c:1860:13: error: shadowed declaration is here [-Werror=shadow]
     uintmax_t S;
               ^
  src/factor.c:1987:25: error: declaration of 'r' shadows a previous local [-Werror=shadow]
                 uintmax_t r = is_square (Q);
                           ^
  src/factor.c:1945:31: error: shadowed declaration is here [-Werror=shadow]
             uintmax_t q, P1, t, r;
                                 ^
  src/factor.c:2037:33: error: declaration of 'r' shadows a previous local [-Werror=shadow]
                         uintmax_t r;
                                   ^
  src/factor.c:1987:25: error: shadowed declaration is here [-Werror=shadow]
                 uintmax_t r = is_square (Q);
                           ^
  src/factor.c: At top level:
  src/factor.c:2291:1: error: no previous prototype for 'read_item' [-Werror=missing-prototypes]
   read_item (struct inbuf *bufstruct)
   ^
  cc1: all warnings being treated as errors
  make[2]: *** [src/factor.o] Error 1
  make[2]: Leaving directory `/h/j/w/co/cu'
  make[1]: *** [all-recursive] Error 1
  make[1]: Leaving directory `/h/j/w/co/cu'
  make: *** [all] Error 2
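
For reference, a minimal sketch of the kind of shadowing that -Wshadow
reports above, and of the rename that silences it.  The function names
(sum_remainders_shadowed, sum_remainders, check_shadow_demo) and the loop
are made up for illustration; they are not taken from factor.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical example, not from factor.c.  With -Wshadow, gcc reports
   that the inner 'r' shadows the outer one.  The outer 'r' is never
   updated, so this function always returns 0.  */
static uintmax_t
sum_remainders_shadowed (uintmax_t n)
{
  uintmax_t r = 0;
  for (uintmax_t d = 2; d <= 4; d++)
    {
      uintmax_t r = n % d;      /* shadows the outer r */
      (void) r;
    }
  return r;
}

/* Same loop with the innermost variable renamed to 'rem', so that the
   accumulation into the outer 'r' is unambiguous.  */
static uintmax_t
sum_remainders (uintmax_t n)
{
  uintmax_t r = 0;
  for (uintmax_t d = 2; d <= 4; d++)
    {
      uintmax_t rem = n % d;
      r += rem;
    }
  return r;
}

/* Small self-check: 11 % 2 + 11 % 3 + 11 % 4 = 1 + 2 + 3 = 6.  */
static void
check_shadow_demo (void)
{
  assert (sum_remainders_shadowed (11) == 0);
  assert (sum_remainders (11) == 6);
}
```

Renaming only the inner variable keeps the diff small, which is why
"rem" was the choice here rather than restructuring the declarations.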

Information forwarded to bug-coreutils <at> gnu.org:
bug#12350; Package coreutils. (Sun, 16 Sep 2012 20:49:01 GMT)

Message #146 received at 12350 <at> debbugs.gnu.org:

From: Jim Meyering <jim <at> meyering.net>
To: Torbjorn Granlund <tg <at> gmplib.org>
Cc: 12350 <at> debbugs.gnu.org, nisse <at> lysator.liu.se
Subject: Re: bug#12350: Composites identified as primes in factor.c (when
	HAVE_GMP)
Date: Sun, 16 Sep 2012 22:47:05 +0200

[Message part 1 (text/plain, inline)]

Here is factor.c, as I've begun to adapt it.
Note that some of the changes, namely including <config.h> and adding
the _GL_ATTRIBUTE_CONST attributes, are a must in coreutils but are
not useful to you in the stand-alone package.

At least one other change may interest you.  It is another
scope-reduction one:

    - inline uintmax_t
    + static inline uintmax_t
      mulredc (uintmax_t a, uintmax_t b, uintmax_t m, uintmax_t mi)
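
As background on why the added "static" matters: in C99, a plain "inline"
definition with no matching "extern" declaration does not provide an
external definition, so a call that the compiler decides not to inline
can fail at link time.  "static inline" gives the function internal
linkage and avoids that.  A minimal sketch follows; mulmod_small and
powmod_small are hypothetical stand-ins using plain "%" reduction, not
the redc-based mulredc and powm from factor.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not from factor.c: (a * b) mod m.  Only valid
   when a * b does not overflow uintmax_t, e.g. for a, b < 2^32 with a
   64-bit uintmax_t.  */
static inline uintmax_t
mulmod_small (uintmax_t a, uintmax_t b, uintmax_t m)
{
  assert (m != 0);
  return (a * b) % m;
}

/* Binary (square-and-multiply) exponentiation built on the helper
   above.  Returns b^e mod m under the same size restriction.  */
static uintmax_t
powmod_small (uintmax_t b, uintmax_t e, uintmax_t m)
{
  uintmax_t y = 1 % m;
  for (b %= m; e > 0; e >>= 1)
    {
      if (e & 1)
        y = mulmod_small (y, b, m);
      b = mulmod_small (b, b, m);
    }
  return y;
}
```

Since both functions are file-local, they also need no separate
prototypes under -Wmissing-prototypes.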

[factor.c (text/plain, inline)]

/* Factoring of uintmax_t numbers.

   Contributed to the GNU project by Torbjörn Granlund and Niels Möller
   Contains code from GNU MP.

Copyright 1995, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2005, 2009, 2012
Free Software Foundation, Inc.

This program is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
Foundation; either version 3 of the License, or (at your option) any later
version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.  See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with
this program.  If not, see http://www.gnu.org/licenses/.  */

/* Efficiently factor numbers that fit in one or two words (word = uintmax_t),
   or, with GMP, numbers of any size.

  Code organisation:

    There are several variants of many functions, for handling one word, two
    words, and GMP's mpz_t type.  If the one-word variant is called foo, the
    two-word variant will be foo2, and the one for mpz_t will be mp_foo.  In
    some cases, the plain function variants will handle both one-word and
    two-word numbers, evidenced by function arguments.

    The factoring code for two words will fall into the code for one word when
    progress allows that.

    Using GMP is optional.  Define HAVE_GMP to make this code include GMP
    factoring code.  The GMP factoring code is based on GMP's demos/factorize.c
    (last synched 2012-09-07).  The GMP-based factoring code will stay in GMP
    factoring code even if numbers get small enough for using the two-word
    code.

  Algorithm:

    (1) Perform trial division using a small primes table, but without hardware
        division since the primes table stores inverses modulo the word base.
        (The GMP variant of this code doesn't make use of the precomputed
        inverses, but instead relies on GMP for fast divisibility testing.)
    (2) Check the nature of any non-factored part using Miller-Rabin for
        detecting composites, and Lucas for detecting primes.
    (3) Factor any remaining composite part using the Pollard-Brent rho
        algorithm or the SQUFOF algorithm, checking status of found factors
        again using Miller-Rabin and Lucas.

    We prefer using Hensel norm in the divisions, not the more familiar
    Euclidean norm, since the former leads to much faster code.  In the
    Pollard-Brent rho code and the prime testing code, we use Montgomery's
    trick of multiplying all n-residues by the word base, allowing cheap Hensel
    reductions mod n.

  Improvements:

    * Use modular inverses also for exact division in the Lucas code, and
      elsewhere.  A problem is to locate the inverses not from an index, but
      from a prime.  We might instead compute the inverse on-the-fly.

    * Tune trial division table size (not forgetting that this is a standalone
      program where the table will be read from disk for each invocation).

    * Implement less naive powm, using k-ary exponentiation for k = 3 or
      perhaps k = 4.

    * Try to speed trial division code for single uintmax_t numbers, i.e., the
      code using DIVBLOCK.  It currently runs at 2 cycles per prime (Intel SBR,
      IBR), 3 cycles per prime (AMD Stars) and 5 cycles per prime (AMD BD) when
      using gcc 4.6 and 4.7.  Some software pipelining should help; 1, 2, and 4
      cycles respectively ought to be possible.

    * The redcify function could be vastly improved by using (plain Euclidean)
      pre-inversion (such as GMP's invert_limb) and udiv_qrnnd_preinv (from
      GMP's gmp-impl.h).  The redcify2 function could be vastly improved using
      similar methods.  These functions currently dominate run time when using
      the -w option.
*/

#include <config.h>

#include <stdlib.h>
#include <stdio.h>
#include <inttypes.h>
#include <errno.h>
#include <assert.h>
#include <ctype.h>
#include <string.h>             /* for memmove */
#include <unistd.h>             /* for getopt */
#include <stdbool.h>

#include "system.h"

#if HAVE_GMP
# include <gmp.h>
#endif

#ifndef STAT_SQUFOF
# define STAT_SQUFOF 0
#endif

#ifndef USE_LONGLONG_H
# define USE_LONGLONG_H 1
#endif

#if USE_LONGLONG_H

/* Make definitions for longlong.h to make it do what it can do for us */
# define W_TYPE_SIZE 64          /* bitcount for uintmax_t */
# define UWtype  uintmax_t
# define UHWtype unsigned long int
# undef UDWtype
# if HAVE_ATTRIBUTE_MODE
typedef unsigned int UQItype    __attribute__ ((mode (QI)));
typedef          int SItype     __attribute__ ((mode (SI)));
typedef unsigned int USItype    __attribute__ ((mode (SI)));
typedef          int DItype     __attribute__ ((mode (DI)));
typedef unsigned int UDItype    __attribute__ ((mode (DI)));
# else
typedef unsigned char UQItype;
typedef          long SItype;
typedef unsigned long int USItype;
#  if HAVE_LONG_LONG
typedef long long int DItype;
typedef unsigned long long int UDItype;
#  else /* Assume `long' gives us a wide enough type.  Needed for hppa2.0w.  */
typedef long int DItype;
typedef unsigned long int UDItype;
#  endif
# endif
# define LONGLONG_STANDALONE     /* Don't require GMP's longlong.h mdep files */
# define ASSERT(x)               /* FIXME make longlong.h really standalone */
# define __clz_tab factor_clz_tab /* Rename to avoid glibc collision */
# ifndef __GMP_GNUC_PREREQ
#  define __GMP_GNUC_PREREQ(a,b) 1
# endif
# if _ARCH_PPC
#  define HAVE_HOST_CPU_FAMILY_powerpc 1
# endif
# include "longlong.h"
# ifdef COUNT_LEADING_ZEROS_NEED_CLZ_TAB
const unsigned char factor_clz_tab[129] =
{
  1,2,3,3,4,4,4,4,5,5,5,5,5,5,5,5,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,
  7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
  8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,
  8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,
  9
};
# endif

#else /* not USE_LONGLONG_H */

# define W_TYPE_SIZE (8 * sizeof (uintmax_t))
# define __ll_B ((uintmax_t) 1 << (W_TYPE_SIZE / 2))
# define __ll_lowpart(t)  ((uintmax_t) (t) & (__ll_B - 1))
# define __ll_highpart(t) ((uintmax_t) (t) >> (W_TYPE_SIZE / 2))

#endif

enum alg_type { ALG_POLLARD_RHO = 1, ALG_SQUFOF = 2 };

static enum alg_type alg;

/* 2*3*5*7*11...*101 is 128 bits, and has 26 prime factors */
#define MAX_NFACTS 26

struct factors
{
  uintmax_t     plarge[2]; /* Can have a single large factor */
  uintmax_t     p[MAX_NFACTS];
  unsigned char e[MAX_NFACTS];
  unsigned char nfactors;
};

#if HAVE_GMP
struct mp_factors
{
  mpz_t         *p;
  unsigned long int *e;
  unsigned long int nfactors;
};
#endif

static void factor (uintmax_t, uintmax_t, struct factors *);

#ifndef umul_ppmm
# define umul_ppmm(w1, w0, u, v)                                         \
  do {                                                                  \
    uintmax_t __x0, __x1, __x2, __x3;                                   \
    unsigned long int __ul, __vl, __uh, __vh;                           \
    uintmax_t __u = (u), __v = (v);                                     \
                                                                        \
    __ul = __ll_lowpart (__u);                                          \
    __uh = __ll_highpart (__u);                                         \
    __vl = __ll_lowpart (__v);                                          \
    __vh = __ll_highpart (__v);                                         \
                                                                        \
    __x0 = (uintmax_t) __ul * __vl;                                     \
    __x1 = (uintmax_t) __ul * __vh;                                     \
    __x2 = (uintmax_t) __uh * __vl;                                     \
    __x3 = (uintmax_t) __uh * __vh;                                     \
                                                                        \
    __x1 += __ll_highpart (__x0);/* this can't give carry */            \
    __x1 += __x2;               /* but this indeed can */               \
    if (__x1 < __x2)            /* did we get it? */                    \
      __x3 += __ll_B;           /* yes, add it in the proper pos. */    \
                                                                        \
    (w1) = __x3 + __ll_highpart (__x1);                                 \
    (w0) = (__x1 << W_TYPE_SIZE / 2) + __ll_lowpart (__x0);             \
  } while (0)
#endif

#if !defined(udiv_qrnnd) || defined(UDIV_NEEDS_NORMALIZATION)
/* Define our own, not needing normalization. This function is
   currently not performance critical, so keep it simple. Similar to
   the mod macro below. */
# undef udiv_qrnnd
# define udiv_qrnnd(q, r, n1, n0, d)                                     \
  do {                                                                  \
    uintmax_t __d1, __d0, __q, __r1, __r0;                              \
                                                                        \
    assert ((n1) < (d));                                                \
    __d1 = (d); __d0 = 0;                                               \
    __r1 = (n1); __r0 = (n0);                                           \
    __q = 0;                                                            \
    for (unsigned int __i = W_TYPE_SIZE; __i > 0; __i--)                \
      {                                                                 \
        rsh2 (__d1, __d0, __d1, __d0, 1);                               \
        __q <<= 1;                                                      \
        if (ge2 (__r1, __r0, __d1, __d0))                               \
          {                                                             \
            __q++;                                                      \
            sub_ddmmss (__r1, __r0, __r1, __r0, __d1, __d0);            \
          }                                                             \
      }                                                                 \
    (r) = __r0;                                                         \
    (q) = __q;                                                          \
  } while (0)
#endif

#if !defined (add_ssaaaa)
# define add_ssaaaa(sh, sl, ah, al, bh, bl)                              \
  do {                                                                  \
    uintmax_t _add_x;                                                   \
    _add_x = (al) + (bl);                                               \
    (sh) = (ah) + (bh) + (_add_x < (al));                               \
    (sl) = _add_x;                                                      \
  } while (0)
#endif

#define rsh2(rh, rl, ah, al, cnt)                                       \
  do {                                                                  \
    (rl) = ((ah) << (W_TYPE_SIZE - (cnt))) | ((al) >> (cnt));           \
    (rh) = (ah) >> (cnt);                                               \
  } while (0)

#define lsh2(rh, rl, ah, al, cnt)                                       \
  do {                                                                  \
    (rh) = ((ah) << (cnt)) | ((al) >> (W_TYPE_SIZE - (cnt)));           \
    (rl) = (al) << (cnt);                                               \
  } while (0)

#define ge2(ah, al, bh, bl)                                             \
  ((ah) > (bh) || ((ah) == (bh) && (al) >= (bl)))

#define gt2(ah, al, bh, bl)                                             \
  ((ah) > (bh) || ((ah) == (bh) && (al) > (bl)))

#ifndef sub_ddmmss
# define sub_ddmmss(rh, rl, ah, al, bh, bl)                              \
  do {                                                                  \
    uintmax_t _cy;                                                      \
    _cy = (al) < (bl);                                                  \
    (rl) = (al) - (bl);                                                 \
    (rh) = (ah) - (bh) - _cy;                                           \
  } while (0)
#endif

#ifndef count_leading_zeros
# define count_leading_zeros(count, x) do {                              \
    uintmax_t __clz_x = (x);                                            \
    unsigned int __clz_c;                                               \
    for (__clz_c = 0;                                                   \
         (__clz_x & ((uintmax_t) 0xff << (W_TYPE_SIZE - 8))) == 0;      \
         __clz_c += 8)                                                  \
      __clz_x <<= 8;                                                    \
    for (; (intmax_t)__clz_x >= 0; __clz_c++)                           \
      __clz_x <<= 1;                                                    \
    (count) = __clz_c;                                                  \
  } while (0)
#endif

#ifndef count_trailing_zeros
# define count_trailing_zeros(count, x) do {                             \
    uintmax_t __ctz_x = (x);                                            \
    unsigned int __ctz_c = 0;                                           \
    while ((__ctz_x & 1) == 0)                                          \
      {                                                                 \
        __ctz_x >>= 1;                                                  \
        __ctz_c++;                                                      \
      }                                                                 \
    (count) = __ctz_c;                                                  \
  } while (0)
#endif

/* Requires that a < n and b <= n */
#define submod(r,a,b,n)                                                 \
  do {                                                                  \
    uintmax_t _t = - (uintmax_t) ((a) < (b));                           \
    (r) = ((n) & _t) + (a) - (b);                                       \
  } while (0)

#define addmod(r,a,b,n)                                                 \
  submod((r), (a), ((n) - (b)), (n))

/* Modular two-word addition and subtraction.  For performance reasons, the
   most significant bit of n1 must be clear.  The destination variables must be
   distinct from the mod operand.  */
#define addmod2(r1, r0, a1, a0, b1, b0, n1, n0)                         \
  do {                                                                  \
    add_ssaaaa ((r1), (r0), (a1), (a0), (b1), (b0));                    \
    if (ge2 ((r1), (r0), (n1), (n0)))                                   \
      sub_ddmmss ((r1), (r0), (r1), (r0), (n1), (n0));                  \
  } while (0)
#define submod2(r1, r0, a1, a0, b1, b0, n1, n0)                         \
  do {                                                                  \
    sub_ddmmss ((r1), (r0), (a1), (a0), (b1), (b0));                    \
    if ((intmax_t) (r1) < 0)                                            \
      add_ssaaaa ((r1), (r0), (r1), (r0), (n1), (n0));                  \
  } while (0)

#define HIGHBIT_TO_MASK(x)                                              \
  (((intmax_t)-1 >> 1) < 0                                              \
   ? (uintmax_t)((intmax_t)(x) >> (W_TYPE_SIZE - 1))                    \
   : ((x) & ((uintmax_t) 1 << (W_TYPE_SIZE - 1))                        \
      ? UINTMAX_MAX : (uintmax_t) 0))

/* Compute r = a mod d, where r = <*r1,retval>, a = <a1,a0>, d = <d1,d0>.
   Requires that d1 != 0.  */
static uintmax_t
mod2 (uintmax_t *r1, uintmax_t a1, uintmax_t a0, uintmax_t d1, uintmax_t d0)
{
  int cntd, cnta;

  assert (d1 != 0);

  if (a1 == 0)
    {
      *r1 = 0;
      return a0;
    }

  count_leading_zeros (cntd, d1);
  count_leading_zeros (cnta, a1);
  int cnt = cntd - cnta;
  lsh2 (d1, d0, d1, d0, cnt);
  for (int i = 0; i < cnt; i++)
    {
      if (ge2 (a1, a0, d1, d0))
        sub_ddmmss (a1, a0, a1, a0, d1, d0);
      rsh2 (d1, d0, d1, d0, 1);
    }

  *r1 = a1;
  return a0;
}

static uintmax_t _GL_ATTRIBUTE_CONST
gcd_odd (uintmax_t a, uintmax_t b)
{
  if ( (b & 1) == 0)
    {
      uintmax_t t = b;
      b = a;
      a = t;
    }
  if (a == 0)
    return b;

  /* Shift out the least significant bit of b (known to be 1), to make room
     for a sign bit.  */
  b >>= 1;

  for (;;)
    {
      uintmax_t t;
      uintmax_t bgta;

      while ((a & 1) == 0)
        a >>= 1;
      a >>= 1;

      t = a - b;
      if (t == 0)
        return (a << 1) + 1;

      bgta = HIGHBIT_TO_MASK (t);

      /* b <-- min (a, b) */
      b += (bgta & t);

      /* a <-- |a - b| */
      a = (t ^ bgta) - bgta;
    }
}

static uintmax_t
gcd2_odd (uintmax_t *r1, uintmax_t a1, uintmax_t a0, uintmax_t b1, uintmax_t b0)
{
  while ((a0 & 1) == 0)
    rsh2 (a1, a0, a1, a0, 1);
  while ((b0 & 1) == 0)
    rsh2 (b1, b0, b1, b0, 1);

  for (;;)
    {
      if ((b1 | a1) == 0)
        {
          *r1 = 0;
          return gcd_odd (b0, a0);
        }

      if (gt2 (a1, a0, b1, b0))
        {
          sub_ddmmss (a1, a0, a1, a0, b1, b0);
          do
            rsh2 (a1, a0, a1, a0, 1);
          while ((a0 & 1) == 0);
        }
      else if (gt2 (b1, b0, a1, a0))
        {
          sub_ddmmss (b1, b0, b1, b0, a1, a0);
          do
            rsh2 (b1, b0, b1, b0, 1);
          while ((b0 & 1) == 0);
        }
      else
        break;
    }

  *r1 = a1;
  return a0;
}

static void
factor_insert_multiplicity (struct factors *factors,
                            uintmax_t prime, unsigned int m)
{
  unsigned int nfactors = factors->nfactors;
  uintmax_t *p = factors->p;
  unsigned char *e = factors->e;

  /* Locate the position at which to insert a new prime, or the existing
     entry whose exponent e should be incremented.  */
  int i;
  for (i = nfactors - 1; i >= 0; i--)
    {
      if (p[i] <= prime)
        break;
    }

  if (i < 0 || p[i] != prime)
    {
      for (int j = nfactors - 1; j > i; j--)
        {
          p[j + 1] = p[j];
          e[j + 1] = e[j];
        }
      p[i + 1] = prime;
      e[i + 1] = m;
      factors->nfactors = nfactors + 1;
    }
  else
    {
      e[i] += m;
    }
}

#define factor_insert(f, p) factor_insert_multiplicity(f, p, 1)

static void
factor_insert_large (struct factors *factors,
                     uintmax_t p1, uintmax_t p0)
{
  if (p1 > 0)
    {
      assert (factors->plarge[1] == 0);
      factors->plarge[0] = p0;
      factors->plarge[1] = p1;
    }
  else
    factor_insert (factors, p0);
}

#if HAVE_GMP
static void mp_factor (mpz_t, struct mp_factors *);

static void
mp_factor_init (struct mp_factors *factors)
{
  factors->p = malloc (1);
  factors->e = malloc (1);
  factors->nfactors = 0;
}

static void
mp_factor_clear (struct mp_factors *factors)
{
  for (unsigned int i = 0; i < factors->nfactors; i++)
    mpz_clear (factors->p[i]);

  free (factors->p);
  free (factors->e);
}

static void
mp_factor_insert (struct mp_factors *factors, mpz_t prime)
{
  unsigned long int nfactors = factors->nfactors;
  mpz_t         *p  = factors->p;
  unsigned long int *e  = factors->e;
  long i;

  /* Locate the position at which to insert a new prime, or the existing
     entry whose exponent e should be incremented.  */
  for (i = nfactors - 1; i >= 0; i--)
    {
      if (mpz_cmp (p[i], prime) <= 0)
        break;
    }

  if (i < 0 || mpz_cmp (p[i], prime) != 0)
    {
      p = realloc (p, (nfactors + 1) * sizeof p[0]);
      e = realloc (e, (nfactors + 1) * sizeof e[0]);

      mpz_init (p[nfactors]);
      for (long j = nfactors - 1; j > i; j--)
        {
          mpz_set (p[j + 1], p[j]);
          e[j + 1] = e[j];
        }
      mpz_set (p[i + 1], prime);
      e[i + 1] = 1;

      factors->p = p;
      factors->e = e;
      factors->nfactors = nfactors + 1;
    }
  else
    {
      e[i] += 1;
    }
}

static void
mp_factor_insert_ui (struct mp_factors *factors, unsigned long int prime)
{
  mpz_t pz;

  mpz_init_set_ui (pz, prime);
  mp_factor_insert (factors, pz);
  mpz_clear (pz);
}
#endif /* HAVE_GMP */


#define P(a,b,c,d) a,
static const unsigned char primes_diff[] = {
#include "primes.h"
0,0,0,0,0,0,0                           /* 7 sentinels for 8-way loop */
};
#undef P

#define PRIMES_PTAB_ENTRIES (sizeof(primes_diff) / sizeof(primes_diff[0]) - 8 + 1)

#define P(a,b,c,d) b,
static const unsigned char primes_diff8[] = {
#include "primes.h"
0,0,0,0,0,0,0                           /* 7 sentinels for 8-way loop */
};
#undef P

struct primes_dtab
{
  uintmax_t binv, lim;
};

#define P(a,b,c,d) {c,d},
static const struct primes_dtab primes_dtab[] = {
#include "primes.h"
{1,0},{1,0},{1,0},{1,0},{1,0},{1,0},{1,0} /* 7 sentinels for 8-way loop */
};
#undef P

/* This flag is honoured just in the GMP code. */
static int flag_verbose = 0;

/* Prove primality or run probabilistic tests.  */
static int flag_prove_primality = 1;

/* Number of Miller-Rabin tests to run when not proving primality. */
#define MR_REPS 25

#define LIKELY(cond)    __builtin_expect ((cond) != 0, 1)
#define UNLIKELY(cond)  __builtin_expect ((cond) != 0, 0)

static void
factor_insert_refind (struct factors *factors, uintmax_t p, unsigned int i,
                      unsigned int off)
{
  for (unsigned int j = 0; j < off; j++)
    p += primes_diff[i + j];
  factor_insert (factors, p);
}

/* Trial division with odd primes uses the following trick.

   Let p be an odd prime, and B = 2^{W_TYPE_SIZE}. For simplicity,
   consider the case t < B (this is the second loop below).

   From our tables we get

     binv = p^{-1} (mod B)
     lim = floor ( (B-1) / p ).

   First assume that t is a multiple of p, t = q * p. Then 0 <= q <=
   lim (and all quotients in this range occur for some t).

   Since t = q * p also holds (mod B), and p is invertible (mod B), we get

     q = t * binv (mod B).

   Next, assume that t is *not* divisible by p. Since multiplication
   by binv (mod B) is a one-to-one mapping,

     t * binv (mod B) > lim,

   because all the smaller values are already taken.

   This can be summed up by saying that the function

     q(t) = binv * t (mod B)

   is a permutation of the range 0 <= t < B, with the curious property
   that it maps the multiples of p onto the range 0 <= q <= lim, in
   order, and the non-multiples of p onto the range lim < q < B.
 */
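
/* Worked example of the trick above (illustrative only; it assumes a
   hypothetical 8-bit word, W_TYPE_SIZE = 8, so that B = 256).

   For p = 3 the table entries would be

     binv = 171, since 3 * 171 = 513 = 2 * 256 + 1,
     lim  = floor (255 / 3) = 85.

   Multiples of p land in 0 <= q <= lim, and q is the exact quotient:

     t =   9:    9 * 171 =  1539 =   6 * 256 +  3,  q =  3 <= 85,  9 / 3 = 3
     t = 255:  255 * 171 = 43605 = 170 * 256 + 85,  q = 85 <= 85,  255 / 3 = 85

   A non-multiple lands above lim:

     t =  10:   10 * 171 =  1710 =   6 * 256 + 174, q = 174 > 85

   so one multiplication and one comparison decide divisibility.
 */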

static uintmax_t
factor_using_division (uintmax_t *t1p, uintmax_t t1, uintmax_t t0,
                       struct factors *factors)
{
  if (t0 % 2 == 0)
    {
      unsigned int cnt;

      if (t0 == 0)
        {
          count_trailing_zeros (cnt, t1);
          t0 = t1 >> cnt;
          t1 = 0;
          cnt += W_TYPE_SIZE;
        }
      else
        {
          count_trailing_zeros (cnt, t0);
          rsh2 (t1, t0, t1, t0, cnt);
        }

      factor_insert_multiplicity (factors, 2, cnt);
    }

  uintmax_t p = 3;
  unsigned int i;
  for (i = 0; t1 > 0 && i < PRIMES_PTAB_ENTRIES; i++)
    {
      for (;;)
        {
          uintmax_t q1, q0, hi, lo;

          q0 = t0 * primes_dtab[i].binv;
          umul_ppmm (hi, lo, q0, p);
          if (hi > t1)
            break;
          hi = t1 - hi;
          q1 = hi * primes_dtab[i].binv;
          if (LIKELY (q1 > primes_dtab[i].lim))
            break;
          t1 = q1; t0 = q0;
          factor_insert (factors, p);
        }
      p += primes_diff[i + 1];
    }
  if (t1p)
    *t1p = t1;

#define DIVBLOCK(I)                                                     \
  do {                                                                  \
    for (;;)                                                            \
      {                                                                 \
        q = t0 * pd[I].binv;                                            \
        if (LIKELY (q > pd[I].lim))                                     \
          break;                                                        \
        t0 = q;                                                         \
        factor_insert_refind (factors, p, i + 1, I);                    \
      }                                                                 \
  } while (0)

  for (; i < PRIMES_PTAB_ENTRIES; i += 8)
    {
      uintmax_t q;
      const struct primes_dtab *pd = &primes_dtab[i];
      DIVBLOCK (0);
      DIVBLOCK (1);
      DIVBLOCK (2);
      DIVBLOCK (3);
      DIVBLOCK (4);
      DIVBLOCK (5);
      DIVBLOCK (6);
      DIVBLOCK (7);

      p += primes_diff8[i];
      if (p * p > t0)
        break;
    }

  return t0;
}

#if HAVE_GMP
static void
mp_factor_using_division (mpz_t t, struct mp_factors *factors)
{
  mpz_t q;
  unsigned long int p;

  if (flag_verbose > 0)
    {
      printf ("[trial division] ");
    }

  mpz_init (q);

  p = mpz_scan1 (t, 0);
  mpz_div_2exp (t, t, p);
  while (p)
    {
      mp_factor_insert_ui (factors, 2);
      --p;
    }

  p = 3;
  for (unsigned int i = 1; i <= PRIMES_PTAB_ENTRIES;)
    {
      if (! mpz_divisible_ui_p (t, p))
        {
          p += primes_diff[i++];
          if (mpz_cmp_ui (t, p * p) < 0)
            break;
        }
      else
        {
          mpz_tdiv_q_ui (t, t, p);
          mp_factor_insert_ui (factors, p);
        }
    }

  mpz_clear (q);
}
#endif

/* Entry i contains (2i+1)^(-1) mod 2^8.  */
static const unsigned char  binvert_table[128] =
{
  0x01, 0xAB, 0xCD, 0xB7, 0x39, 0xA3, 0xC5, 0xEF,
  0xF1, 0x1B, 0x3D, 0xA7, 0x29, 0x13, 0x35, 0xDF,
  0xE1, 0x8B, 0xAD, 0x97, 0x19, 0x83, 0xA5, 0xCF,
  0xD1, 0xFB, 0x1D, 0x87, 0x09, 0xF3, 0x15, 0xBF,
  0xC1, 0x6B, 0x8D, 0x77, 0xF9, 0x63, 0x85, 0xAF,
  0xB1, 0xDB, 0xFD, 0x67, 0xE9, 0xD3, 0xF5, 0x9F,
  0xA1, 0x4B, 0x6D, 0x57, 0xD9, 0x43, 0x65, 0x8F,
  0x91, 0xBB, 0xDD, 0x47, 0xC9, 0xB3, 0xD5, 0x7F,
  0x81, 0x2B, 0x4D, 0x37, 0xB9, 0x23, 0x45, 0x6F,
  0x71, 0x9B, 0xBD, 0x27, 0xA9, 0x93, 0xB5, 0x5F,
  0x61, 0x0B, 0x2D, 0x17, 0x99, 0x03, 0x25, 0x4F,
  0x51, 0x7B, 0x9D, 0x07, 0x89, 0x73, 0x95, 0x3F,
  0x41, 0xEB, 0x0D, 0xF7, 0x79, 0xE3, 0x05, 0x2F,
  0x31, 0x5B, 0x7D, 0xE7, 0x69, 0x53, 0x75, 0x1F,
  0x21, 0xCB, 0xED, 0xD7, 0x59, 0xC3, 0xE5, 0x0F,
  0x11, 0x3B, 0x5D, 0xC7, 0x49, 0x33, 0x55, 0xFF
};

/* Compute n^(-1) mod B, using a Newton iteration.  */
#define binv(inv,n)                                                     \
  do {                                                                  \
    uintmax_t  __n = (n);                                               \
    uintmax_t  __inv;                                                   \
                                                                        \
    __inv = binvert_table[(__n / 2) & 0x7F]; /*  8 */                   \
    if (W_TYPE_SIZE > 8)   __inv = 2 * __inv - __inv * __inv * __n;     \
    if (W_TYPE_SIZE > 16)  __inv = 2 * __inv - __inv * __inv * __n;     \
    if (W_TYPE_SIZE > 32)  __inv = 2 * __inv - __inv * __inv * __n;     \
                                                                        \
    if (W_TYPE_SIZE > 64)                                               \
      {                                                                 \
        int  __invbits = 64;                                            \
        do {                                                            \
          __inv = 2 * __inv - __inv * __inv * __n;                      \
          __invbits *= 2;                                               \
        } while (__invbits < W_TYPE_SIZE);                              \
      }                                                                 \
                                                                        \
    (inv) = __inv;                                                      \
  } while (0)
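
/* Why the Newton step in binv works (a sketch of the standard argument, added
   for illustration).  Suppose inv * n = 1 + e * 2^k for some integer e, i.e.
   inv is correct in its k low bits.  Then

     (2 * inv - inv * inv * n) * n = (inv * n) * (2 - inv * n)
                                   = (1 + e * 2^k) * (1 - e * 2^k)
                                   = 1 - e^2 * 2^(2k),

   which is 1 (mod 2^(2k)).  Each step therefore doubles the number of correct
   low bits: 8 -> 16 -> 32 -> 64.

   Example with n = 3: the table gives inv = 0xAB = 171, and
   3 * 171 = 513 = 1 (mod 2^8).  One step, reduced mod 2^16:

     2 * 171 - 171 * 171 * 3 = 342 - 87723 = -87381
                             = 43691 = 0xAAAB (mod 2^16),

   and indeed 3 * 43691 = 131073 = 2 * 65536 + 1.  */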

/* q = u / d, assuming d|u.  */
#define divexact_21(q1, q0, u1, u0, d)                                  \
  do {                                                                  \
    uintmax_t _di, _q0;                                                 \
    binv (_di, (d));                                                    \
    _q0 = (u0) * _di;                                                   \
    if ((u1) >= (d))                                                    \
      {                                                                 \
        uintmax_t _p1, _p0;                                             \
        umul_ppmm (_p1, _p0, _q0, d);                                   \
        (q1) = ((u1) - _p1) * _di;                                      \
        (q0) = _q0;                                                     \
      }                                                                 \
    else                                                                \
      {                                                                 \
        (q0) = _q0;                                                     \
        (q1) = 0;                                                       \
      }                                                                 \
  } while(0)

/* x B (mod n). */
#define redcify(r_prim, r, n)                                           \
  do {                                                                  \
    uintmax_t _redcify_q;                                               \
    udiv_qrnnd (_redcify_q, r_prim, r, 0, n);                           \
  } while (0)

/* x B^2 (mod n). Requires x > 0, n1 < B/2 */
#define redcify2(r1, r0, x, n1, n0)                                     \
  do {                                                                  \
    uintmax_t _r1, _r0, _i;                                             \
    if ((x) < (n1))                                                     \
      {                                                                 \
        _r1 = (x); _r0 = 0;                                             \
        _i = W_TYPE_SIZE;                                               \
      }                                                                 \
    else                                                                \
      {                                                                 \
        _r1 = 0; _r0 = (x);                                             \
        _i = 2*W_TYPE_SIZE;                                             \
      }                                                                 \
    while (_i-- > 0)                                                    \
      {                                                                 \
        lsh2 (_r1, _r0, _r1, _r0, 1);                                   \
        if (ge2 (_r1, _r0, (n1), (n0)))                                 \
          sub_ddmmss (_r1, _r0, _r1, _r0, (n1), (n0));                  \
      }                                                                 \
    (r1) = _r1;                                                         \
    (r0) = _r0;                                                         \
  } while (0)

/* Modular one-word multiplication, r = a * b mod m, with mi = m^(-1) mod B.
   Both a and b must be in redc form, the result will be in redc form too. */
static inline uintmax_t
mulredc (uintmax_t a, uintmax_t b, uintmax_t m, uintmax_t mi)
{
  uintmax_t rh, rl, q, th, tl, xh;

  umul_ppmm (rh, rl, a, b);
  q = rl * mi;
  umul_ppmm (th, tl, q, m);
  xh = rh - th;
  if (rh < th)
    xh += m;

  return xh;
}

/* Modular two-word multiplication, r = a * b mod m, with mi = m^(-1) mod B.
   Both a and b must be in redc form, the result will be in redc form too.
   For performance reasons, the most significant bit of m must be clear. */
static uintmax_t
mulredc2 (uintmax_t *r1p,
          uintmax_t a1, uintmax_t a0, uintmax_t b1, uintmax_t b0,
          uintmax_t m1, uintmax_t m0, uintmax_t mi)
{
  uintmax_t r1, r0, q, p1, p0, t1, t0, s1, s0;
  mi = -mi;
  assert ( (a1 >> (W_TYPE_SIZE - 1)) == 0);
  assert ( (b1 >> (W_TYPE_SIZE - 1)) == 0);
  assert ( (m1 >> (W_TYPE_SIZE - 1)) == 0);

  /* First compute a0 * <b1, b0> B^{-1}
        +-----+
        |a0 b0|
     +--+--+--+
     |a0 b1|
     +--+--+--+
        |q0 m0|
     +--+--+--+
     |q0 m1|
    -+--+--+--+
     |r1|r0| 0|
     +--+--+--+
  */
  umul_ppmm (t1, t0, a0, b0);
  umul_ppmm (r1, r0, a0, b1);
  q = mi * t0;
  umul_ppmm (p1, p0, q, m0);
  umul_ppmm (s1, s0, q, m1);
  r0 += (t0 != 0); /* Carry */
  add_ssaaaa (r1, r0, r1, r0, 0, p1);
  add_ssaaaa (r1, r0, r1, r0, 0, t1);
  add_ssaaaa (r1, r0, r1, r0, s1, s0);

  /* Next, (a1 * <b1, b0> + <r1, r0>) B^{-1}
        +-----+
        |a1 b0|
        +--+--+
        |r1|r0|
     +--+--+--+
     |a1 b1|
     +--+--+--+
        |q1 m0|
     +--+--+--+
     |q1 m1|
    -+--+--+--+
     |r1|r0| 0|
     +--+--+--+
  */
  umul_ppmm (t1, t0, a1, b0);
  umul_ppmm (s1, s0, a1, b1);
  add_ssaaaa (t1, t0, t1, t0, 0, r0);
  q = mi * t0;
  add_ssaaaa (r1, r0, s1, s0, 0, r1);
  umul_ppmm (p1, p0, q, m0);
  umul_ppmm (s1, s0, q, m1);
  r0 += (t0 != 0); /* Carry */
  add_ssaaaa (r1, r0, r1, r0, 0, p1);
  add_ssaaaa (r1, r0, r1, r0, 0, t1);
  add_ssaaaa (r1, r0, r1, r0, s1, s0);

  if (ge2 (r1, r0, m1, m0))
    sub_ddmmss (r1, r0, r1, r0, m1, m0);

  *r1p = r1;
  return r0;
}

static uintmax_t _GL_ATTRIBUTE_CONST
powm (uintmax_t b, uintmax_t e, uintmax_t n, uintmax_t ni, uintmax_t one)
{
  uintmax_t y = one;

  if (e & 1)
    y = b;

  while (e != 0)
    {
      b = mulredc (b, b, n, ni);
      e >>= 1;

      if (e & 1)
        y = mulredc (y, b, n, ni);
    }

  return y;
}

static uintmax_t
powm2 (uintmax_t *r1m,
       const uintmax_t *bp, const uintmax_t *ep, const uintmax_t *np,
       uintmax_t ni, const uintmax_t *one)
{
  uintmax_t r1, r0, b1, b0, n1, n0;
  unsigned int i;
  uintmax_t e;

  b0 = bp[0];
  b1 = bp[1];
  n0 = np[0];
  n1 = np[1];

  r0 = one[0];
  r1 = one[1];

  for (e = ep[0], i = W_TYPE_SIZE; i > 0; i--, e >>= 1)
    {
      if (e & 1)
        {
          r0 = mulredc2 (r1m, r1, r0, b1, b0, n1, n0, ni);
          r1 = *r1m;
        }
      b0 = mulredc2 (r1m, b1, b0, b1, b0, n1, n0, ni);
      b1 = *r1m;
    }
  for (e = ep[1]; e > 0; e >>= 1)
    {
      if (e & 1)
        {
          r0 = mulredc2 (r1m, r1, r0, b1, b0, n1, n0, ni);
          r1 = *r1m;
        }
      b0 = mulredc2 (r1m, b1, b0, b1, b0, n1, n0, ni);
      b1 = *r1m;
    }
  *r1m = r1;
  return r0;
}

static bool _GL_ATTRIBUTE_CONST
millerrabin (uintmax_t n, uintmax_t ni, uintmax_t b, uintmax_t q,
             unsigned int k, uintmax_t one)
{
  uintmax_t y = powm (b, q, n, ni, one);

  uintmax_t nm1 = n - one;      /* -1, but in redc representation. */

  if (y == one || y == nm1)
    return true;

  for (unsigned int i = 1; i < k; i++)
    {
      y = mulredc (y, y, n, ni);

      if (y == nm1)
        return true;
      if (y == one)
        return false;
    }
  return false;
}
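
/* Worked example of the test above (illustrative only, written with plain
   residues rather than the redc representation the code uses).  Take the
   Carmichael number n = 561 = 3 * 11 * 17 and base b = 2.  Here
   n - 1 = 560 = 2^4 * 35, so q = 35 and k = 4.

     y = 2^35  mod 561 = 263    neither 1 nor n - 1 = 560
     y = 263^2 mod 561 = 166    i = 1
     y = 166^2 mod 561 =  67    i = 2
     y =  67^2 mod 561 =   1    i = 3: reached 1 without passing n - 1

   so the function returns false and 561 is reported composite, even though
   2^560 = 1 (mod 561) would fool a plain Fermat test.  (Within factor itself
   such a small n would not normally get this far, since trial division
   removes its small prime factors first.)  */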

static bool
millerrabin2 (const uintmax_t *np, uintmax_t ni, const uintmax_t *bp,
              const uintmax_t *qp, unsigned int k, const uintmax_t *one)
{
  uintmax_t y1, y0, nm1_1, nm1_0, r1m;

  y0 = powm2 (&r1m, bp, qp, np, ni, one);
  y1 = r1m;

  if (y0 == one[0] && y1 == one[1])
    return true;

  sub_ddmmss (nm1_1, nm1_0, np[1], np[0], one[1], one[0]);

  if (y0 == nm1_0 && y1 == nm1_1)
    return true;

  for (unsigned int i = 1; i < k; i++)
    {
      y0 = mulredc2 (&r1m, y1, y0, y1, y0, np[1], np[0], ni);
      y1 = r1m;

      if (y0 == nm1_0 && y1 == nm1_1)
        return true;
      if (y0 == one[0] && y1 == one[1])
        return false;
    }
  return false;
}

#if HAVE_GMP
static bool
mp_millerrabin (mpz_srcptr n, mpz_srcptr nm1, mpz_ptr x, mpz_ptr y,
                mpz_srcptr q, unsigned long int k)
{
  mpz_powm (y, x, q, n);

  if (mpz_cmp_ui (y, 1) == 0 || mpz_cmp (y, nm1) == 0)
    return true;

  for (unsigned long int i = 1; i < k; i++)
    {
      mpz_powm_ui (y, y, 2, n);
      if (mpz_cmp (y, nm1) == 0)
        return true;
      if (mpz_cmp_ui (y, 1) == 0)
        return false;
    }
  return false;
}
#endif

/* Lucas' prime test.  The number of iterations varies greatly; up to a few
   dozen have been observed.  The average seems to be about 2.  */
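/* Sketch of the criterion used below (the standard Lucas test; the small
   numbers are illustrative only).  n is prime if some base a satisfies
   a^(n-1) = 1 (mod n) and a^((n-1)/p) != 1 (mod n) for every prime p
   dividing n - 1.  The first condition already follows from a passed
   Miller-Rabin test with base a.  For n = 7, n - 1 = 2 * 3:

     a = 2:  2^(6/2) = 8 = 1 (mod 7)        fails, so try another base
     a = 3:  3^(6/2) = 27 = 6 (mod 7)       != 1
             3^(6/3) =  9 = 2 (mod 7)       != 1

   so a = 3 proves 7 prime (and 3^6 = 729 = 1 (mod 7)).  */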
static bool
prime_p (uintmax_t n)
{
  int k;
  bool is_prime;
  uintmax_t a_prim, one, ni;
  struct factors factors;

  if (n <= 1)
    return false;

  /* We have already cast out small primes. */
  if (n < (uintmax_t) FIRST_OMITTED_PRIME * FIRST_OMITTED_PRIME)
    return true;

  /* Precomputation for Miller-Rabin.  */
  uintmax_t q = n - 1;
  for (k = 0; (q & 1) == 0; k++)
    q >>= 1;

  uintmax_t a = 2;
  binv (ni, n);                 /* ni <- 1/n mod B */
  redcify (one, 1, n);
  addmod (a_prim, one, one, n); /* i.e., redcify a = 2 */

  /* Perform a Miller-Rabin test, which finds most composites quickly.  */
  if (!millerrabin (n, ni, a_prim, q, k, one))
    return false;

  if (flag_prove_primality)
    {
      /* Factor n-1 for Lucas.  */
      factor (0, n - 1, &factors);
    }

  /* Loop until Lucas proves our number prime, or Miller-Rabin proves our
     number composite.  */
  for (unsigned int r = 0; r < PRIMES_PTAB_ENTRIES; r++)
    {
      if (flag_prove_primality)
        {
          is_prime = true;
          for (unsigned int i = 0; i < factors.nfactors && is_prime; i++)
            {
              is_prime = powm (a_prim, (n - 1) / factors.p[i], n, ni, one) != one;
            }
        }
      else
        {
          /* After enough Miller-Rabin runs, be content. */
          is_prime = (r == MR_REPS - 1);
        }

      if (is_prime)
        return true;

      a += primes_diff[r];      /* Establish new base.  */

      /* The following is equivalent to redcify (a_prim, a, n).  It runs faster
         on most processors, since it avoids udiv_qrnnd.  If we go down the
         udiv_qrnnd_preinv path, this code should be replaced.  */
      {
        uintmax_t dummy, s1, s0;
        umul_ppmm (s1, s0, one, a);
        if (LIKELY (s1 == 0))
          a_prim = s0 % n;
        else
          udiv_qrnnd (dummy, a_prim, s1, s0, n);
      }

      if (!millerrabin (n, ni, a_prim, q, k, one))
        return false;
    }

  fprintf (stderr, "Lucas prime test failure.  This should not happen\n");
  abort ();
}

static bool
prime2_p (uintmax_t n1, uintmax_t n0)
{
  uintmax_t q[2], nm1[2];
  uintmax_t a_prim[2];
  uintmax_t one[2];
  uintmax_t na[2];
  uintmax_t ni;
  unsigned int k;
  struct factors factors;

  if (n1 == 0)
    return prime_p (n0);

  nm1[1] = n1 - (n0 == 0);
  nm1[0] = n0 - 1;
  if (nm1[0] == 0)
    {
      count_trailing_zeros (k, nm1[1]);

      q[0] = nm1[1] >> k;
      q[1] = 0;
      k += W_TYPE_SIZE;
    }
  else
    {
      count_trailing_zeros (k, nm1[0]);
      rsh2 (q[1], q[0], nm1[1], nm1[0], k);
    }

  uintmax_t a = 2;
  binv (ni, n0);
  redcify2 (one[1], one[0], 1, n1, n0);
  addmod2 (a_prim[1], a_prim[0], one[1], one[0], one[1], one[0], n1, n0);

  /* FIXME: Use scalars or pointers in arguments? Some consistency needed. */
  na[0] = n0;
  na[1] = n1;

  if (!millerrabin2 (na, ni, a_prim, q, k, one))
    return false;

  if (flag_prove_primality)
    {
      /* Factor n-1 for Lucas.  */
      factor (nm1[1], nm1[0], &factors);
    }

  /* Loop until Lucas proves our number prime, or Miller-Rabin proves our
     number composite.  */
  for (unsigned int r = 0; r < PRIMES_PTAB_ENTRIES; r++)
    {
      bool is_prime;
      uintmax_t e[2], y[2];

      if (flag_prove_primality)
        {
          is_prime = true;
          if (factors.plarge[1])
            {
              uintmax_t pi;
              binv (pi, factors.plarge[0]);
              e[0] = pi * nm1[0];
              e[1] = 0;
              y[0] = powm2 (&y[1], a_prim, e, na, ni, one);
              is_prime = (y[0] != one[0] || y[1] != one[1]);
            }
          for (unsigned int i = 0; i < factors.nfactors && is_prime; i++)
            {
              /* FIXME: We always have the factor 2. Do we really need to handle it
                 here? We have done the same powering as part of millerrabin. */
              if (factors.p[i] == 2)
                rsh2 (e[1], e[0], nm1[1], nm1[0], 1);
              else
                divexact_21 (e[1], e[0], nm1[1], nm1[0], factors.p[i]);
              y[0] = powm2 (&y[1], a_prim, e, na, ni, one);
              is_prime = (y[0] != one[0] || y[1] != one[1]);
            }
        }
      else
        {
          /* After enough Miller-Rabin runs, be content. */
          is_prime = (r == MR_REPS - 1);
        }

      if (is_prime)
        return true;

      a += primes_diff[r];      /* Establish new base.  */
      redcify2 (a_prim[1], a_prim[0], a, n1, n0);

      if (!millerrabin2 (na, ni, a_prim, q, k, one))
        return false;
    }

  fprintf (stderr, "Lucas prime test failure.  This should not happen\n");
  abort ();
}

#if HAVE_GMP
static bool
mp_prime_p (mpz_t n)
{
  bool is_prime;
  mpz_t q, a, nm1, tmp;
  struct mp_factors factors;

  if (mpz_cmp_ui (n, 1) <= 0)
    return false;

  /* We have already cast out small primes. */
  if (mpz_cmp_ui (n, (long) FIRST_OMITTED_PRIME * FIRST_OMITTED_PRIME) < 0)
    return true;

  mpz_inits (q, a, nm1, tmp, NULL);

  /* Precomputation for Miller-Rabin.  */
  mpz_sub_ui (nm1, n, 1);

  /* Find q and k, where q is odd and n = 1 + 2**k * q.  */
  unsigned long int k = mpz_scan1 (nm1, 0);
  mpz_tdiv_q_2exp (q, nm1, k);

  mpz_set_ui (a, 2);

  /* Perform a Miller-Rabin test, which finds most composites quickly.  */
  if (!mp_millerrabin (n, nm1, a, tmp, q, k))
    {
      is_prime = false;
      goto ret2;
    }

  if (flag_prove_primality)
    {
      /* Factor n-1 for Lucas.  */
      mpz_set (tmp, nm1);
      mp_factor (tmp, &factors);
    }

  /* Loop until Lucas proves our number prime, or Miller-Rabin proves our
     number composite.  */
  for (unsigned int r = 0; r < PRIMES_PTAB_ENTRIES; r++)
    {
      if (flag_prove_primality)
        {
          is_prime = true;
          for (unsigned long int i = 0; i < factors.nfactors && is_prime; i++)
            {
              mpz_divexact (tmp, nm1, factors.p[i]);
              mpz_powm (tmp, a, tmp, n);
              is_prime = mpz_cmp_ui (tmp, 1) != 0;
            }
        }
      else
        {
          /* After enough Miller-Rabin runs, be content. */
          is_prime = (r == MR_REPS - 1);
        }

      if (is_prime)
        goto ret1;

      mpz_add_ui (a, a, primes_diff[r]);        /* Establish new base.  */

      if (!mp_millerrabin (n, nm1, a, tmp, q, k))
        {
          is_prime = false;
          goto ret1;
        }
    }

  fprintf (stderr, "Lucas prime test failure.  This should not happen\n");
  abort ();

 ret1:
  if (flag_prove_primality)
    mp_factor_clear (&factors);
 ret2:
  mpz_clears (q, a, nm1, tmp, NULL);

  return is_prime;
}
#endif

static void
factor_using_pollard_rho (uintmax_t n, unsigned long int a,
                          struct factors *factors)
{
  uintmax_t x, z, y, P, t, ni, g;

  unsigned long int k = 1;
  unsigned long int l = 1;

  redcify (P, 1, n);
  addmod (x, P, P, n);          /* i.e., redcify(2) */
  y = z = x;

  while (n != 1)
    {
      assert (a < n);

      binv (ni, n);             /* FIXME: when could we use old 'ni' value? */

      for (;;)
        {
          do
            {
              x = mulredc (x, x, n, ni);
              addmod (x, x, a, n);

              submod (t, z, x, n);
              P = mulredc (P, t, n, ni);

              if (k % 32 == 1)
                {
                  if (gcd_odd (P, n) != 1)
                    goto factor_found;
                  y = x;
                }
            }
          while (--k != 0);

          z = x;
          k = l;
          l = 2 * l;
          for (unsigned long int i = 0; i < k; i++)
            {
              x = mulredc (x, x, n, ni);
              addmod (x, x, a, n);
            }
          y = x;
        }

    factor_found:
      do
        {
          y = mulredc (y, y, n, ni);
          addmod (y, y, a, n);

          submod (t, z, y, n);
          g = gcd_odd (t, n);
        }
      while (g == 1);

      n = n / g;

      if (!prime_p (g))
        factor_using_pollard_rho (g, a + 1, factors);
      else
        factor_insert (factors, g);

      if (prime_p (n))
        {
          factor_insert (factors, n);
          break;
        }

      x = x % n;
      z = z % n;
      y = y % n;
    }
}

static void
factor_using_pollard_rho2 (uintmax_t n1, uintmax_t n0, unsigned long int a,
                           struct factors *factors)
{
  uintmax_t x1, x0, z1, z0, y1, y0, P1, P0, t1, t0, ni, g1, g0, r1m;

  unsigned long int k = 1;
  unsigned long int l = 1;

  redcify2 (P1, P0, 1, n1, n0);
  addmod2 (x1, x0, P1, P0, P1, P0, n1, n0); /* i.e., redcify(2) */
  y1 = z1 = x1;
  y0 = z0 = x0;

  while (n1 != 0 || n0 != 1)
    {
      binv (ni, n0);

      for (;;)
        {
          do
            {
              x0 = mulredc2 (&r1m, x1, x0, x1, x0, n1, n0, ni);
              x1 = r1m;
              addmod2 (x1, x0, x1, x0, 0, a, n1, n0);

              submod2 (t1, t0, z1, z0, x1, x0, n1, n0);
              P0 = mulredc2 (&r1m, P1, P0, t1, t0, n1, n0, ni);
              P1 = r1m;

              if (k % 32 == 1)
                {
                  g0 = gcd2_odd (&g1, P1, P0, n1, n0);
                  if (g1 != 0 || g0 != 1)
                    goto factor_found;
                  y1 = x1; y0 = x0;
                }
            }
          while (--k != 0);

          z1 = x1; z0 = x0;
          k = l;
          l = 2 * l;
          for (unsigned long int i = 0; i < k; i++)
            {
              x0 = mulredc2 (&r1m, x1, x0, x1, x0, n1, n0, ni);
              x1 = r1m;
              addmod2 (x1, x0, x1, x0, 0, a, n1, n0);
            }
          y1 = x1; y0 = x0;
        }

    factor_found:
      do
        {
          y0 = mulredc2 (&r1m, y1, y0, y1, y0, n1, n0, ni);
          y1 = r1m;
          addmod2 (y1, y0, y1, y0, 0, a, n1, n0);

          submod2 (t1, t0, z1, z0, y1, y0, n1, n0);
          g0 = gcd2_odd (&g1, t1, t0, n1, n0);
        }
      while (g1 == 0 && g0 == 1);

      if (g1 == 0)
        {
          /* The found factor is one word. */
          divexact_21 (n1, n0, n1, n0, g0);     /* n = n / g */

          if (!prime_p (g0))
            factor_using_pollard_rho (g0, a + 1, factors);
          else
            factor_insert (factors, g0);
        }
      else
        {
          /* The found factor is two words.  This is highly unlikely, thus hard
             to trigger.  Please be careful before you change this code!  */
          uintmax_t ginv;

          binv (ginv, g0);      /* Compute n = n / g.  Since the result will */
          n0 = ginv * n0;       /* fit one word, we can compute the quotient */
          n1 = 0;               /* modulo B, ignoring the high divisor word. */

          if (!prime2_p (g1, g0))
            factor_using_pollard_rho2 (g1, g0, a + 1, factors);
          else
            factor_insert_large (factors, g1, g0);
        }

      if (n1 == 0)
        {
          if (prime_p (n0))
            {
              factor_insert (factors, n0);
              break;
            }

          factor_using_pollard_rho (n0, a, factors);
          return;
        }

      if (prime2_p (n1, n0))
        {
          factor_insert_large (factors, n1, n0);
          break;
        }

      x0 = mod2 (&x1, x1, x0, n1, n0);
      z0 = mod2 (&z1, z1, z0, n1, n0);
      y0 = mod2 (&y1, y1, y0, n1, n0);
    }
}

#if HAVE_GMP
static void
mp_factor_using_pollard_rho (mpz_t n, unsigned long int a,
                             struct mp_factors *factors)
{
  mpz_t x, z, y, P;
  mpz_t t, t2;
  unsigned long long int k, l;

  if (flag_verbose > 0)
    {
      printf ("[pollard-rho (%lu)] ", a);
    }

  mpz_inits (t, t2, NULL);
  mpz_init_set_si (y, 2);
  mpz_init_set_si (x, 2);
  mpz_init_set_si (z, 2);
  mpz_init_set_ui (P, 1);
  k = 1;
  l = 1;

  while (mpz_cmp_ui (n, 1) != 0)
    {
      for (;;)
        {
          do
            {
              mpz_mul (t, x, x);
              mpz_mod (x, t, n);
              mpz_add_ui (x, x, a);

              mpz_sub (t, z, x);
              mpz_mul (t2, P, t);
              mpz_mod (P, t2, n);

              if (k % 32 == 1)
                {
                  mpz_gcd (t, P, n);
                  if (mpz_cmp_ui (t, 1) != 0)
                    goto factor_found;
                  mpz_set (y, x);
                }
            }
          while (--k != 0);

          mpz_set (z, x);
          k = l;
          l = 2 * l;
          for (unsigned long long int i = 0; i < k; i++)
            {
              mpz_mul (t, x, x);
              mpz_mod (x, t, n);
              mpz_add_ui (x, x, a);
            }
          mpz_set (y, x);
        }

    factor_found:
      do
        {
          mpz_mul (t, y, y);
          mpz_mod (y, t, n);
          mpz_add_ui (y, y, a);

          mpz_sub (t, z, y);
          mpz_gcd (t, t, n);
        }
      while (mpz_cmp_ui (t, 1) == 0);

      mpz_divexact (n, n, t);   /* divide by t, before t is overwritten */

      if (!mp_prime_p (t))
        {
          if (flag_verbose > 0)
            {
              printf ("[composite factor--restarting pollard-rho] ");
            }
          mp_factor_using_pollard_rho (t, a + 1, factors);
        }
      else
        {
          mp_factor_insert (factors, t);
        }

      if (mp_prime_p (n))
        {
          mp_factor_insert (factors, n);
          break;
        }

      mpz_mod (x, x, n);
      mpz_mod (z, z, n);
      mpz_mod (y, y, n);
    }

  mpz_clears (P, t2, t, z, x, y, NULL);
}
#endif

/* FIXME: Maybe better to use an iteration converging to 1/sqrt(n)?  If
   algorithm is replaced, consider also returning the remainder. */
static uintmax_t _GL_ATTRIBUTE_CONST
isqrt (uintmax_t n)
{
  uintmax_t x;
  unsigned c;
  if (n == 0)
    return 0;

  count_leading_zeros (c, n);

  /* Make x > sqrt(n). This will be invariant through the loop. */
  x = (uintmax_t) 1 << ((W_TYPE_SIZE + 1 - c) / 2);

  for (;;)
    {
      uintmax_t y = (x + n/x) / 2;
      if (y >= x)
        return x;

      x = y;
    }
}

static uintmax_t _GL_ATTRIBUTE_CONST
isqrt2 (uintmax_t nh, uintmax_t nl)
{
  unsigned int shift;
  uintmax_t x;

  /* Ensures the remainder fits in a uintmax_t. */
  assert (nh < ((uintmax_t) 1 << (W_TYPE_SIZE - 2)));

  if (nh == 0)
    return isqrt (nl);

  count_leading_zeros (shift, nh);
  shift &= ~1;

  /* Make x > sqrt(n) */
  x = isqrt ( (nh << shift) + (nl >> (W_TYPE_SIZE - shift))) + 1;
  x <<= (W_TYPE_SIZE - shift) / 2;

  /* Do we need more than one iteration? */
  for (;;)
    {
      uintmax_t q, r, y;
      udiv_qrnnd (q, r, nh, nl, x);
      y = (x + q) / 2;

      if (y >= x)
        {
          uintmax_t hi, lo;
          umul_ppmm (hi, lo, x + 1, x + 1);
          assert (gt2 (hi, lo, nh, nl));

          umul_ppmm (hi, lo, x, x);
          assert (ge2 (nh, nl, hi, lo));
          sub_ddmmss (hi, lo, nh, nl, hi, lo);
          assert (hi == 0);

          return x;
        }

      x = y;
    }
}

/* MAGIC[N] has a bit i set iff i is a quadratic residue mod N. */
#define MAGIC64 ((uint64_t) 0x0202021202030213ULL)
#define MAGIC63 ((uint64_t) 0x0402483012450293ULL)
#define MAGIC65 ((uint64_t) 0x218a019866014613ULL)
#define MAGIC11 0x23b

/* Returns the square root if the input is a square, otherwise 0. */
static uintmax_t _GL_ATTRIBUTE_CONST
is_square (uintmax_t x)
{
  /* Uses the tests suggested by Cohen. Excludes 99% of the non-squares before
     computing the square root. */
  if (((MAGIC64 >> (x & 63)) & 1)
      && ((MAGIC63 >> (x % 63)) & 1)
      /* Both 0 and 64 are squares mod (65) */
      && ((MAGIC65 >> ((x % 65) & 63)) & 1)
      && ((MAGIC11 >> (x % 11) & 1)))
    {
      uintmax_t r = isqrt (x);
      if (r*r == x)
        return r;
    }
  return 0;
}

static const unsigned short invtab[] =
  {
    0x1fc, 0x1f8, 0x1f4, 0x1f0, 0x1ec, 0x1e9, 0x1e5, 0x1e1,
    0x1de, 0x1da, 0x1d7, 0x1d4, 0x1d0, 0x1cd, 0x1ca, 0x1c7,
    0x1c3, 0x1c0, 0x1bd, 0x1ba, 0x1b7, 0x1b4, 0x1b2, 0x1af,
    0x1ac, 0x1a9, 0x1a6, 0x1a4, 0x1a1, 0x19e, 0x19c, 0x199,
    0x197, 0x194, 0x192, 0x18f, 0x18d, 0x18a, 0x188, 0x186,
    0x183, 0x181, 0x17f, 0x17d, 0x17a, 0x178, 0x176, 0x174,
    0x172, 0x170, 0x16e, 0x16c, 0x16a, 0x168, 0x166, 0x164,
    0x162, 0x160, 0x15e, 0x15c, 0x15a, 0x158, 0x157, 0x155,
    0x153, 0x151, 0x150, 0x14e, 0x14c, 0x14a, 0x149, 0x147,
    0x146, 0x144, 0x142, 0x141, 0x13f, 0x13e, 0x13c, 0x13b,
    0x139, 0x138, 0x136, 0x135, 0x133, 0x132, 0x130, 0x12f,
    0x12e, 0x12c, 0x12b, 0x129, 0x128, 0x127, 0x125, 0x124,
    0x123, 0x121, 0x120, 0x11f, 0x11e, 0x11c, 0x11b, 0x11a,
    0x119, 0x118, 0x116, 0x115, 0x114, 0x113, 0x112, 0x111,
    0x10f, 0x10e, 0x10d, 0x10c, 0x10b, 0x10a, 0x109, 0x108,
    0x107, 0x106, 0x105, 0x104, 0x103, 0x102, 0x101, 0x100,
  };

/* Compute q = [u/d], r = u mod d.  Avoids slow hardware division for the case
   that q < 0x40; here it instead uses a table of (Euclidean) inverses.  */
#define div_smallq(q, r, u, d)                                          \
  do {                                                                  \
    if (0 && (u) / 0x40 < (d))                                          \
      {                                                                 \
        int _cnt;                                                       \
        uintmax_t _dinv, _mask, _q, _r;                                 \
        count_leading_zeros (_cnt, (d));                                \
                                                                        \
        _dinv = invtab[((d) >> (W_TYPE_SIZE - 8 - _cnt))                \
                       - (1 << (8 - 1))];                               \
                                                                        \
        _r = (u);                                                       \
        _q = _r * _dinv >> (W_TYPE_SIZE + 8 - _cnt);                    \
        _r -= _q*(d);                                                   \
                                                                        \
        _mask = -(uintmax_t) (_r >= (d));                               \
        (r) = _r - (_mask & (d));                                       \
        (q) = _q - _mask;                                               \
        assert ( (q) * (d) + (r) == (u));                               \
      }                                                                 \
    else                                                                \
      {                                                                 \
        uintmax_t _q = (u) / (d);                                       \
        (r) = (u) - _q * (d);                                           \
        (q) = _q;                                                       \
      }                                                                 \
  } while (0)

/* Notes: Example N = 22117019. After first phase we find Q1 = 6314, Q
   = 3025, P = 1737, representing F_{18} = (-6314, 2* 1737, 3025),
   with 3025 = 55^2.

   Constructing the square root, we get Q1 = 55, Q = 8653, P = 4652,
   representing G_0 = (-55, 2*4652, 8653).

   In the notation of the paper:

   S_{-1} = 55, S_0 = 8653, R_0 = 4652

   Put

     t_0 = floor([q_0 + R_0] / S_0) = 1
     R_1 = t_0 * S_0 - R_0 = 4001
     S_1 = S_{-1} + t_0 (R_0 - R_1) = 706
*/

/* Multipliers, in order of efficiency:
   0.7268  3*5*7*11 = 1155 = 3 (mod 4)
   0.7317  3*5*7    =  105 = 1
   0.7820  3*5*11   =  165 = 1
   0.7872  3*5      =   15 = 3
   0.8101  3*7*11   =  231 = 3
   0.8155  3*7      =   21 = 1
   0.8284  5*7*11   =  385 = 1
   0.8339  5*7      =   35 = 3
   0.8716  3*11     =   33 = 1
   0.8774  3        =    3 = 3
   0.8913  5*11     =   55 = 3
   0.8972  5        =    5 = 1
   0.9233  7*11     =   77 = 1
   0.9295  7        =    7 = 3
   0.9934  11       =   11 = 3
*/
#define QUEUE_SIZE 50

#if STAT_SQUFOF
# define Q_FREQ_SIZE 50
/* Element 0 keeps the total */
static unsigned int q_freq[Q_FREQ_SIZE + 1];
# define MIN(a,b) ((a) < (b) ? (a) : (b))
#endif


static void
factor_using_squfof (uintmax_t n1, uintmax_t n0, struct factors *factors)
{
  /* Uses algorithm and notation from

     SQUARE FORM FACTORIZATION
     JASON E. GOWER AND SAMUEL S. WAGSTAFF, JR.

     http://homes.cerias.purdue.edu/~ssw/squfof.pdf
   */

  static const unsigned int multipliers_1[] =
    { /* = 1 (mod 4) */
      105, 165, 21, 385, 33, 5, 77, 1, 0
    };
  static const unsigned int multipliers_3[] =
    { /* = 3 (mod 4) */
      1155, 15, 231, 35, 3, 55, 7, 11, 0
    };

  uintmax_t S;
  const unsigned int *m;

  struct { uintmax_t Q; uintmax_t P; } queue[QUEUE_SIZE];

  S = isqrt2 (n1, n0);

  if (n0 == S * S)
    {
      uintmax_t p1, p0;

      umul_ppmm (p1, p0, S, S);
      assert (p0 == n0);

      if (n1 == p1)
        {
          if (prime_p (S))
            factor_insert_multiplicity (factors, S, 2);
          else
            {
              struct factors f;

              f.nfactors = 0;
              factor_using_squfof (0, S, &f);
              /* Duplicate the new factors */
              for (unsigned int i = 0; i < f.nfactors; i++)
                factor_insert_multiplicity (factors, f.p[i], 2*f.e[i]);
            }
          return;
        }
    }

  /* Select multipliers so we always get n * mu = 3 (mod 4) */
  for (m = (n0 % 4 == 1) ? multipliers_3 : multipliers_1;
       *m; m++)
    {
      uintmax_t S, Dh, Dl, Q1, Q, P, L, L1, B;
      unsigned int i;
      unsigned int mu = *m;
      unsigned int qpos = 0;

      assert (mu * n0 % 4 == 3);

      /* In the notation of the paper, with mu * n == 3 (mod 4), we
         get \Delta = 4 mu * n, and the paper's \mu is 2 mu. As far as
         I understand it, the necessary bound is 4 \mu^3 < n, or 32
         mu^3 < n.

         However, this seems insufficient: With n = 37243139 and mu =
         105, we get a trivial factor, from the square 38809 = 197^2,
         without any corresponding Q earlier in the iteration.

         Requiring 64 mu^3 < n seems sufficient. */
      if (n1 == 0)
        {
          if ((uintmax_t) mu*mu*mu >= n0 / 64)
            continue;
        }
      else
        {
          if (n1 > ((uintmax_t) 1 << (W_TYPE_SIZE - 2)) / mu)
            continue;
        }
      umul_ppmm (Dh, Dl, n0, mu);
      Dh += n1 * mu;

      assert (Dl % 4 != 1);

      S = isqrt2 (Dh, Dl);

      Q1 = 1;
      P = S;

      /* Square root remainder fits in one word, so ignore high part. */
      Q = Dl - P*P;
      /* FIXME: When can this differ from floor(sqrt(2 sqrt(D)))? */
      L = isqrt (2*S);
      B = 2*L;
      L1 = mu * 2 * L;

      /* The form is (+/- Q1, 2P, -/+ Q), of discriminant 4 (P^2 + Q Q1) =
         4 D. */

      for (i = 0; i <= B; i++)
        {
          uintmax_t q, P1, t, r;

          div_smallq (q, r, S+P, Q);
          P1 = S - r;   /* P1 = q*Q - P */

#if STAT_SQUFOF
          assert (q > 0);
          q_freq[0]++;
          q_freq[MIN(q, Q_FREQ_SIZE)]++;
#endif

          if (Q <= L1)
            {
              uintmax_t g = Q;

              if ( (Q & 1) == 0)
                g /= 2;

              g /= gcd_odd (g, mu);

              if (g <= L)
                {
                  if (qpos >= QUEUE_SIZE)
                    {
                      fprintf (stderr, "squfof queue overflow.\n");
                      exit (EXIT_FAILURE);
                    }
                  queue[qpos].Q = g;
                  queue[qpos].P = P % g;
                  qpos++;
                }
            }

          /* I think the difference can be either sign, but mod
             2^W_TYPE_SIZE arithmetic should be fine. */
          t = Q1 + q * (P - P1);
          Q1 = Q;
          Q = t;
          P = P1;

          if ( (i & 1) == 0)
            {
              uintmax_t r = is_square (Q);
              if (r)
                {
                  unsigned int j;

                  for (j = 0; j < qpos; j++)
                    {
                      if (queue[j].Q == r)
                        {
                          if (r == 1)
                            /* Traversed entire cycle. */
                            goto next_multiplier;

                          /* Need the absolute value for divisibility test. */
                          if (P >= queue[j].P)
                            t = P - queue[j].P;
                          else
                            t = queue[j].P - P;
                          if (t % r == 0)
                            {
                              /* Delete entries up to and including entry
                                 j, which matched. */
                              memmove (queue, queue + j + 1,
                                       (qpos - j - 1) * sizeof (queue[0]));
                              qpos -= (j + 1);
                            }
                          goto next_i;
                        }
                    }

                  /* We have found a square form, which should give a
                     factor. */
                  Q1 = r;
                  assert (S >= P); /* What signs are possible? */
                  P += r * ((S - P) / r);

                  /* Note: Paper says (N - P*P) / Q1, that seems incorrect
                     for the case D = 2N. */
                  /* Compute Q = (D - P*P) / Q1, but we need double
                     precision. */
                  {
                    uintmax_t hi, lo, rem;
                    umul_ppmm (hi, lo, P, P);
                    sub_ddmmss (hi, lo, Dh, Dl, hi, lo);
                    udiv_qrnnd (Q, rem, hi, lo, Q1);
                    assert (rem == 0);
                  }

                  for (;;)
                    {
                      uintmax_t r;

                      /* Note: There appears to be a typo in the paper,
                         Step 4a in the algorithm description says q <--
                         floor([S+P]/\hat Q), but looking at the equations
                         in Sec. 3.1, it should be q <-- floor([S+P] / Q).
                         (In this code, \hat Q is Q1). */
                      div_smallq (q, r, S+P, Q);
                      P1 = S - r;       /* P1 = q*Q - P */

#if STAT_SQUFOF
                      q_freq[0]++;
                      q_freq[MIN(q, Q_FREQ_SIZE)]++;
#endif
                      if (P == P1)
                        break;
                      t = Q1 + q * (P - P1);
                      Q1 = Q;
                      Q = t;
                      P = P1;
                    }

                  if ( (Q & 1) == 0)
                    Q /= 2;
                  Q /= gcd_odd (Q, mu);

                  assert (Q > 1 && (n1 || Q < n0));

                  if (prime_p (Q))
                    factor_insert (factors, Q);
                  else
                    factor_using_squfof (0, Q, factors);

                  divexact_21 (n1, n0, n1, n0, Q);

                  if (prime2_p (n1, n0))
                    factor_insert_large (factors, n1, n0);
                  else
                    {
                      if (n1 == 0)
                        factor_using_pollard_rho (n0, 1, factors);
                      else
                        factor_using_squfof (n1, n0, factors);
                    }

                  return;
                }
            }
        next_i:
          ;
        }
    next_multiplier:
      ;
    }
  fprintf (stderr, "squfof failed.\n");
  exit (EXIT_FAILURE);
}

static void
factor (uintmax_t t1, uintmax_t t0, struct factors *factors)
{
  factors->nfactors = 0;
  factors->plarge[1] = 0;

  if (t1 == 0 && t0 < 2)
    return;

  t0 = factor_using_division (&t1, t1, t0, factors);

  if (t1 == 0)
    {
      if (t0 != 1)
        {
          if (prime_p (t0))
            factor_insert (factors, t0);
          else if (alg == ALG_POLLARD_RHO)
            factor_using_pollard_rho (t0, 1, factors);
          else
            factor_using_squfof (0, t0, factors);
        }
    }
  else
    {
      if (prime2_p (t1, t0))
        factor_insert_large (factors, t1, t0);
      else if (alg == ALG_POLLARD_RHO)
        factor_using_pollard_rho2 (t1, t0, 1, factors);
      else
        factor_using_squfof (t1, t0, factors);
    }
}

#if HAVE_GMP
static void
mp_factor (mpz_t t, struct mp_factors *factors)
{
  mp_factor_init (factors);

  if (mpz_sgn (t) != 0)
    {
      mp_factor_using_division (t, factors);

      if (mpz_cmp_ui (t, 1) != 0)
        {
          if (flag_verbose > 0)
            {
              printf ("[is number prime?] ");
            }
          if (mp_prime_p (t))
            mp_factor_insert (factors, t);
          else
            mp_factor_using_pollard_rho (t, 1, factors);
        }
    }
}
#endif

static int
strto2uintmax (uintmax_t *hip, uintmax_t *lop, const char *s)
{
  int errcode;
  unsigned int lo_carry;
  uintmax_t hi, lo;

  errcode = -1;

  hi = lo = 0;
  for (;;)
    {
      unsigned int c = *s++;
      if (c == 0)
        break;

      if (UNLIKELY (c < '0' || c > '9'))
        {
          errcode = -1;
          break;
        }
      c -= '0';

      errcode = 0;              /* we've seen at least one valid digit */

      if (UNLIKELY (hi > ~(uintmax_t)0 / 10))
        {
          errcode = -1; /* overflow */
          break;
        }
      hi = 10 * hi;

      lo_carry = (lo >> (W_TYPE_SIZE - 3)) + (lo >> (W_TYPE_SIZE - 1));
      lo_carry += 10 * lo < 2 * lo;

      lo = 10 * lo;
      lo += c;

      lo_carry += lo < c;
      hi += lo_carry;
      if (UNLIKELY (hi < lo_carry))
        {
          errcode = -1; /* overflow */
          break;
        }
    }

  *hip = hi;
  *lop = lo;

  return errcode;
}

static void
print_uintmaxes (uintmax_t t1, uintmax_t t0)
{
  uintmax_t q, r;

  if (t1 == 0)
    printf ("%ju", t0);
  else
    {
      /* Use very plain code here since it seems hard to write fast code
         without assuming a specific word size.  */
      q = t1 / 1000000000;
      r = t1 % 1000000000;
      udiv_qrnnd (t0, r, r, t0, 1000000000);
      print_uintmaxes (q, t0);
      printf ("%09u", (unsigned int) r);
    }
}

static void
factor_one (const char *input)
{
  uintmax_t t1, t0;
  int errcode;

  /* Try converting the number to one or two words.  If it fails, use GMP or
     print an error message.  The 2nd condition checks that the most
     significant bit of the two-word number is clear, in a typesize neutral
     way.  */
  errcode = strto2uintmax (&t1, &t0, input);
  if (errcode == 0 && ((t1 << 1) >> 1) == t1)
    {
      struct factors factors;

      print_uintmaxes (t1, t0);
      printf (":");

      factor (t1, t0, &factors);

      for (unsigned int j = 0; j < factors.nfactors; j++)
        for (unsigned int k = 0; k < factors.e[j]; k++)
          printf (" %ju", factors.p[j]);

      if (factors.plarge[1])
        {
          printf (" ");
          print_uintmaxes (factors.plarge[1], factors.plarge[0]);
        }
      puts ("");
    }
  else
    {
#if HAVE_GMP
      mpz_t t;
      struct mp_factors factors;

      mpz_init_set_str (t, input, 10);

      gmp_printf ("%Zd:", t);
      mp_factor (t, &factors);

      for (unsigned int j = 0; j < factors.nfactors; j++)
        for (unsigned int k = 0; k < factors.e[j]; k++)
          gmp_printf (" %Zd", factors.p[j]);

      mp_factor_clear (&factors);
      mpz_clear (t);
      puts ("");
#else
      fprintf (stderr, "error: number %s not parsable or too large\n", input);
      exit (EXIT_FAILURE);
#endif
    }
}

struct inbuf
{
  char *buf;
  size_t alloc;
};

/* Read white-space delimited items. Return 1 on success, 0 on EOF.
   Exit on I/O errors. */
int
read_item (struct inbuf *bufstruct)
{
  int c;
  size_t i;
  char *buf = bufstruct->buf;

  do
    c = getchar_unlocked ();
  while (isspace (c));

  for (i = 0; !isspace(c); i++)
    {
      if (c < 0)
        {
          if (ferror (stdin))
            {
              fprintf (stderr, "read error on stdin: %s\n", strerror(errno));
              exit (EXIT_FAILURE);
            }
          if (i == 0)
            return 0;
          else
            break;
        }

      if (UNLIKELY (bufstruct->alloc <= i + 1)) /* +1 byte for terminating NUL */
        {
          bufstruct->alloc = bufstruct->alloc * 5 / 4 + 1;
          bufstruct->buf = realloc (bufstruct->buf, bufstruct->alloc);
          if (bufstruct->buf == NULL)
            {
              fprintf (stderr, "out of memory\n");
              exit (EXIT_FAILURE);
            }
          buf = bufstruct->buf;
        }

      buf[i] = c;
      c = getchar_unlocked ();
    }

  buf[i] = '\0';
  return 1;
}

int
main (int argc, char *argv[])
{
  int c;

  alg = ALG_POLLARD_RHO;        /* Default to Pollard rho */

  while ( (c = getopt(argc, argv, "svw")) != -1)
    switch (c)
      {
      case 's':
        alg = ALG_SQUFOF;
        break;
      case 'v':
        flag_verbose = 1;
        break;
      case 'w':
        flag_prove_primality = 0;
        break;
      case '?':
        printf ("Usage: %s [-s] [-v] [-w] [number ...]\n", argv[0]);
        return EXIT_FAILURE;
      }
#if STAT_SQUFOF
  if (alg == ALG_SQUFOF)
    memset (q_freq, 0, sizeof (q_freq));
#endif

  if (optind < argc)
    for (int i = optind; i < argc; i++)
      factor_one (argv[i]);
  else
    {
      struct inbuf bufstruct;
      bufstruct.alloc = 50;     /* enough unless HAVE_GMP */
      bufstruct.buf = malloc (bufstruct.alloc);
      if (bufstruct.buf == NULL)
        {
          fprintf (stderr, "out of memory\n");
          exit (EXIT_FAILURE);
        }
      while (read_item (&bufstruct))
        factor_one (bufstruct.buf);
    }

#if STAT_SQUFOF
  if (alg == ALG_SQUFOF && q_freq[0] > 0)
    {
      /* Keep the accumulator a double; declaring it in the for-init would
         create a second, unsigned acc_f that shadows this one. */
      double acc_f = 0.0;
      printf ("q  freq.  cum. freq.(total: %u)\n", q_freq[0]);
      for (unsigned int i = 1; i <= Q_FREQ_SIZE; i++)
        {
          double f = (double) q_freq[i] / q_freq[0];
          acc_f += f;
          printf ("%s%u %.2f%% %.2f%%\n", i == Q_FREQ_SIZE ? ">=" : "", i,
                  100.0 * f, 100.0 * acc_f);
        }
    }
#endif

  exit (EXIT_SUCCESS);
}

Information forwarded to bug-coreutils <at> gnu.org:
bug#12350; Package coreutils. (Mon, 17 Sep 2012 09:27:01 GMT) Full text and rfc822 format available.

Message #149 received at 12350 <at> debbugs.gnu.org (full text, mbox):

From: Torbjorn Granlund <tg <at> gmplib.org>
To: Jim Meyering <jim <at> meyering.net>
Cc: 12350 <at> debbugs.gnu.org, nisse <at> lysator.liu.se,
	Torbjorn Granlund <tg <at> gmplib.org>
Subject: Re: bug#12350: Composites identified as primes in factor.c (when
	HAVE_GMP)
Date: Mon, 17 Sep 2012 11:25:16 +0200

Jim Meyering <jim <at> meyering.net> writes:

  At least one other change may be interesting to you:
  another scope-reduction one:
  
      - inline uintmax_t
      + static inline uintmax_t
        mulredc (uintmax_t a, uintmax_t b, uintmax_t m, uintmax_t mi)
  
How about going all the way, making everything (but main) static?

-- 
Torbjörn

Information forwarded to bug-coreutils <at> gnu.org:
bug#12350; Package coreutils. (Mon, 17 Sep 2012 09:33:01 GMT) Full text and rfc822 format available.

Message #152 received at 12350 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Torbjorn Granlund <tg <at> gmplib.org>
Cc: 12350 <at> debbugs.gnu.org, nisse <at> lysator.liu.se
Subject: Re: bug#12350: Composites identified as primes in factor.c (when
	HAVE_GMP)
Date: Mon, 17 Sep 2012 11:31:19 +0200

Torbjorn Granlund wrote:
> Jim Meyering <jim <at> meyering.net> writes:
>
>   At least one other change may be interesting to you:
>   another scope-reduction one:
>
>       - inline uintmax_t
>       + static inline uintmax_t
>         mulredc (uintmax_t a, uintmax_t b, uintmax_t m, uintmax_t mi)
>
> How about going all the way, making everything (but main) static?

That was the intent.
One of the earlier patches I sent converted all of the others.
I had missed mulredc initially because I'd searched manually for
names annotated with a " T " in the output of nm.

Information forwarded to bug-coreutils <at> gnu.org:
bug#12350; Package coreutils. (Mon, 17 Sep 2012 16:06:02 GMT) Full text and rfc822 format available.

Message #155 received at 12350 <at> debbugs.gnu.org (full text, mbox):

From: Torbjorn Granlund <tg <at> gmplib.org>
To: Jim Meyering <jim <at> meyering.net>
Cc: 12350 <at> debbugs.gnu.org, nisse <at> lysator.liu.se
Subject: Re: bug#12350: Composites identified as primes in factor.c (when
	HAVE_GMP)
Date: Mon, 17 Sep 2012 18:04:35 +0200

Jim Meyering <jim <at> meyering.net> writes:

  Torbjorn Granlund wrote:
  > Jim Meyering <jim <at> meyering.net> writes:
  >
  >   At least one other change may be interesting to you:
  >   another scope-reduction one:
  >
  >       - inline uintmax_t
  >       + static inline uintmax_t
  >         mulredc (uintmax_t a, uintmax_t b, uintmax_t m, uintmax_t mi)
  >
  > How about going all the way, making everything (but main) static?
  
Please consider these additional changes:

*** .~/cu-factor.c.~1~	Mon Sep 17 11:26:31 2012
--- cu-factor.c	Mon Sep 17 15:45:05 2012
***************
*** 134,139 ****
--- 134,140 ----
  #endif
  #define LONGLONG_STANDALONE     /* Don't require GMP's longlong.h mdep files */
  #define ASSERT(x)               /* FIXME make longlong.h really standalone */
+ #define __GMP_DECLSPEC          /* FIXME make longlong.h really standalone */
  #define __clz_tab factor_clz_tab /* Rename to avoid glibc collision */
  #ifndef __GMP_GNUC_PREREQ
  #define __GMP_GNUC_PREREQ(a,b) 1
***************
*** 143,149 ****
  # endif
  # include "longlong.h"
  # ifdef COUNT_LEADING_ZEROS_NEED_CLZ_TAB
! const unsigned char factor_clz_tab[129] =
  {
    1,2,3,3,4,4,4,4,5,5,5,5,5,5,5,5,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,
    7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
--- 144,150 ----
  #endif
  #include "longlong.h"
  #ifdef COUNT_LEADING_ZEROS_NEED_CLZ_TAB
! static const unsigned char factor_clz_tab[129] =
  {
    1,2,3,3,4,4,4,4,5,5,5,5,5,5,5,5,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,
    7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
***************
*** 1555,1561 ****
  {
    mpz_t x, z, y, P;
    mpz_t t, t2;
-   unsigned long long int k, l;
  
    if (flag_verbose > 0)
      {
--- 1556,1561 ----
***************
*** 1567,1574 ****
    mpz_init_set_si (x, 2);
    mpz_init_set_si (z, 2);
    mpz_init_set_ui (P, 1);
!   k = 1;
!   l = 1;
  
    while (mpz_cmp_ui (n, 1) != 0)
      {
--- 1567,1575 ----
    mpz_init_set_si (x, 2);
    mpz_init_set_si (z, 2);
    mpz_init_set_ui (P, 1);
! 
!   unsigned long long int k = 1;
!   unsigned long long int l = 1;
  
    while (mpz_cmp_ui (n, 1) != 0)
      {
***************
*** 2287,2293 ****
  
  /* Read white-space delimited items. Return 1 on success, 0 on EOF.
     Exit on I/O errors. */
! int
  read_item (struct inbuf *bufstruct)
  {
    int c;
--- 2288,2294 ----
  
  /* Read white-space delimited items. Return 1 on success, 0 on EOF.
     Exit on I/O errors. */
! static int
  read_item (struct inbuf *bufstruct)
  {
    int c;
  

-- 
Torbjörn

Information forwarded to bug-coreutils <at> gnu.org:
bug#12350; Package coreutils. (Mon, 17 Sep 2012 17:38:01 GMT) Full text and rfc822 format available.

Message #158 received at 12350 <at> debbugs.gnu.org (full text, mbox):

From: Torbjorn Granlund <tg <at> gmplib.org>
To: Jim Meyering <jim <at> meyering.net>
Cc: 12350 <at> debbugs.gnu.org, nisse <at> lysator.liu.se
Subject: Re: bug#12350: Composites identified as primes in factor.c (when
	HAVE_GMP)
Date: Mon, 17 Sep 2012 19:36:18 +0200

Jim Meyering <jim <at> meyering.net> writes:

  Would you mind changing the names of a few variables
  or adjusting declarations to avoid some -Wshadow warnings?
  
  I changed the innermost "r" to "rem" locally, but there are
  others.  Also, "S".
  
The two remaining instances are in squfof.

Niels, please address those.  You'll find them by passing -Wshadow to
gcc.

-- 
Torbjörn

Information forwarded to bug-coreutils <at> gnu.org:
bug#12350; Package coreutils. (Mon, 17 Sep 2012 20:02:01 GMT) Full text and rfc822 format available.

Message #161 received at 12350 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Torbjorn Granlund <tg <at> gmplib.org>
Cc: 12350 <at> debbugs.gnu.org, nisse <at> lysator.liu.se
Subject: Re: bug#12350: Composites identified as primes in factor.c (when
	HAVE_GMP)
Date: Mon, 17 Sep 2012 21:59:33 +0200

Torbjorn Granlund wrote:
...
> Please consider these additional changes:
>
> *** .~/cu-factor.c.~1~	Mon Sep 17 11:26:31 2012
> --- cu-factor.c	Mon Sep 17 15:45:05 2012
> ***************
> *** 134,139 ****
> --- 134,140 ----
>   #endif
>   #define LONGLONG_STANDALONE     /* Don't require GMP's longlong.h mdep files */
>   #define ASSERT(x)               /* FIXME make longlong.h really standalone */
> + #define __GMP_DECLSPEC          /* FIXME make longlong.h really standalone */

Oh, that must be used in an #else branch that I'm not exercising.
It triggers a "symbol defined but never used" warning.
Thanks.

>   #define __clz_tab factor_clz_tab /* Rename to avoid glibc collision */

BTW, why does __clz_tab use the "__" prefix in the first place?

>   #ifndef __GMP_GNUC_PREREQ
>   #define __GMP_GNUC_PREREQ(a,b) 1
> ***************
> *** 143,149 ****
>   # endif
>   # include "longlong.h"
>   # ifdef COUNT_LEADING_ZEROS_NEED_CLZ_TAB
> ! const unsigned char factor_clz_tab[129] =
>   {
>     1,2,3,3,4,4,4,4,5,5,5,5,5,5,5,5,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,
>     7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
> --- 144,150 ----
>   #endif
>   #include "longlong.h"
>   #ifdef COUNT_LEADING_ZEROS_NEED_CLZ_TAB
> ! static const unsigned char factor_clz_tab[129] =

Thanks.  Done.

>   {
>     1,2,3,3,4,4,4,4,5,5,5,5,5,5,5,5,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,
>     7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
> ***************
> *** 1555,1561 ****
>   {
>     mpz_t x, z, y, P;
>     mpz_t t, t2;
> -   unsigned long long int k, l;
>
>     if (flag_verbose > 0)
>       {
> --- 1556,1561 ----
> ***************
> *** 1567,1574 ****
>     mpz_init_set_si (x, 2);
>     mpz_init_set_si (z, 2);
>     mpz_init_set_ui (P, 1);
> !   k = 1;
> !   l = 1;
>
>     while (mpz_cmp_ui (n, 1) != 0)
>       {
> --- 1567,1575 ----
>     mpz_init_set_si (x, 2);
>     mpz_init_set_si (z, 2);
>     mpz_init_set_ui (P, 1);
> !
> !   unsigned long long int k = 1;
> !   unsigned long long int l = 1;

Likewise.

>     while (mpz_cmp_ui (n, 1) != 0)
>       {
> ***************
> *** 2287,2293 ****
>
>   /* Read white-space delimited items. Return 1 on success, 0 on EOF.
>      Exit on I/O errors. */
> ! int
>   read_item (struct inbuf *bufstruct)
>   {
>     int c;
> --- 2288,2294 ----
>
>   /* Read white-space delimited items. Return 1 on success, 0 on EOF.
>      Exit on I/O errors. */
> ! static int

I didn't bother with this one, because in coreutils
I've just switched back to using readtokens to read input.

Information forwarded to bug-coreutils <at> gnu.org:
bug#12350; Package coreutils. (Mon, 17 Sep 2012 20:28:02 GMT) Full text and rfc822 format available.

Message #164 received at 12350 <at> debbugs.gnu.org (full text, mbox):

From: Torbjorn Granlund <tg <at> gmplib.org>
To: Jim Meyering <jim <at> meyering.net>
Cc: 12350 <at> debbugs.gnu.org, nisse <at> lysator.liu.se
Subject: Re: bug#12350: Composites identified as primes in factor.c (when
	HAVE_GMP)
Date: Mon, 17 Sep 2012 22:26:27 +0200

  >   #endif
  >   #define LONGLONG_STANDALONE     /* Don't require GMP's longlong.h mdep files */
  >   #define ASSERT(x)               /* FIXME make longlong.h really standalone */
  > + #define __GMP_DECLSPEC          /* FIXME make longlong.h really standalone */
  
  Oh, that must be used in an #else branch that I'm not exercising.
  It triggers a "symbol defined but never used" warning.
  Thanks.
  
__GMP_DECLSPEC is used from longlong.h when count_leading_zeros asks for
__clz_tab.

  >   #define __clz_tab factor_clz_tab /* Rename to avoid glibc collision */
  
  BTW, why does __clz_tab use the "__" prefix in the first place?
  
It makes sense in glibc.  In GMP we prepend something like __gmp_.  Here
in a non-library we prepend factor_.  I'd like to keep the __ for src
compatibility.

  >   # endif
  >   # include "longlong.h"
  >   # ifdef COUNT_LEADING_ZEROS_NEED_CLZ_TAB
  > ! const unsigned char factor_clz_tab[129] =
  >   {
  >     1,2,3,3,4,4,4,4,5,5,5,5,5,5,5,5,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,
  >     7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
  > --- 144,150 ----
  >   #endif
  >   #include "longlong.h"
  >   #ifdef COUNT_LEADING_ZEROS_NEED_CLZ_TAB
  > ! static const unsigned char factor_clz_tab[129] =
  
  Thanks.  Done.
  
Actually, that patch is wrong.  :-(

In longlong.h, we declare it without static.  This causes a type clash.

I suggest that we keep clz_tab non-static.  Do you have any other
suggestions?
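
The clash can be sketched as follows.  This is a minimal illustration,
not the actual header text: the extern declaration is an assumption
about the shape of what longlong.h declares, and the table is truncated
to its first few entries.

```c
/* Assumed shape of the declaration that longlong.h provides
   (external linkage; the real header's spelling may differ).  */
extern const unsigned char factor_clz_tab[129];

/* Writing "static const unsigned char factor_clz_tab[129] = ..." here
   would give the name internal linkage after an external declaration;
   GCC rejects that ("static declaration follows non-static
   declaration").  */

/* Non-static definition, compatible with the declaration above.
   Only the first entries are spelled out; the rest default to 0.  */
const unsigned char factor_clz_tab[129] = { 1, 2, 3, 3, 4, 4, 4, 4 };
```

Keeping the definition non-static, as suggested, leaves declaration and
definition with the same linkage.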

  >     while (mpz_cmp_ui (n, 1) != 0)
  >       {
  > ***************
  > *** 2287,2293 ****
  >
  >   /* Read white-space delimited items. Return 1 on success, 0 on EOF.
  >      Exit on I/O errors. */
  > ! int
  >   read_item (struct inbuf *bufstruct)
  >   {
  >     int c;
  > --- 2288,2294 ----
  >
  >   /* Read white-space delimited items. Return 1 on success, 0 on EOF.
  >      Exit on I/O errors. */
  > ! static int
  
  I didn't bother with this one, because in coreutils
  I've just switched back to using readtokens to read input.
  
Ok.  Hopefully, that does not cause too much slowdown for large ranges.
With this fast factoring code, input is a significant part of the time!

I have a variant of the input code, which replaces all the input code
with a single function which freads 4 KiB blocks and plays with
sentinels.  Here is its non-GMP part.  I left this code out, at the
cost of a 20% performance penalty.

void
read_stdin_and_factor ()
{
  uintmax_t n1, n0;
  unsigned int carry;
  unsigned int c, tc;
  char *bp, *be;
  int valid_char_seen;
  size_t nread;
  enum { BUFSIZE = 4096 };
  char buf[BUFSIZE + 1];

  buf[0] = '\377';			/* sentinel */
  bp = buf;
  be = buf;

 restart:
  n0 = 0;
  valid_char_seen = 0;

#ifndef OPTIMISE_SINGLE_WORD
#define OPTIMISE_SINGLE_WORD 1
#endif

#if OPTIMISE_SINGLE_WORD
  /* Loop while we have one word */
  while (LIKELY (n0 < (~(uintmax_t) 0 - 9) / 10))
    {
      c = (unsigned char) *bp++;
      tc = c - '0';

      if (UNLIKELY (tc >= 10))
	{
	  if (bp > be)		/* did we hit the sentinel? */
	    {
	      nread = fread (buf, 1, BUFSIZE, stdin);
	      if (nread == 0)
		{
		  if (valid_char_seen)
		    {
		      factor_and_report (0, n0);
		    }
		  return;
		}

	      buf[nread] = '\377';	/* sentinel */
	      bp = buf;
	      be = buf + nread;
	      continue;
	    }
	  else
	    {
	      if (valid_char_seen)
		{
		  factor_and_report (0, n0);
		  valid_char_seen = 0;
		}
	      if (! isspace (c))
		{
		  fprintf (stderr, "`%c' is not a valid positive integer\n", c);
		}
	    }
	  goto restart;
	}

      /* got valid digit */
      n0 = 10 * n0 + tc;
      valid_char_seen = 1;
    }
#endif

  n1 = 0;

  /* Loop while we have two words */
  for (;;)
    {
      c = (unsigned char) *bp++;
      tc = c - '0';

      if (UNLIKELY (tc >= 10))
	{
	  if (bp > be)		/* did we hit the sentinel? */
	    {
	      nread = fread (buf, 1, BUFSIZE, stdin);
	      if (nread == 0)
		{
		  if (valid_char_seen)
		    {
		      factor_and_report (n1, n0);
		    }
		  return;
		}

	      buf[nread] = '\377';	/* sentinel */
	      bp = buf;
	      be = buf + nread;
	      continue;
	    }
	  else
	    {
	      if (valid_char_seen)
		{
		  factor_and_report (n1, n0);
		  valid_char_seen = 0;
		}
	      if (! isspace (c))
		{
		  fprintf (stderr, "`%c' is not a valid positive integer\n", c);
		}
	    }
	  goto restart;
	}

      /* got valid digit */
      carry = (n0 >> (W_TYPE_SIZE - 3)) + (n0 >> (W_TYPE_SIZE - 1));
      carry += 10 * n0 < 2 * n0;

      if (UNLIKELY (n1 > ~(uintmax_t)0 / 2 / 10))
	break;		/* overflow, error unless we have GMP */

      n1 = 10 * n1;
      n0 = 10 * n0 + tc;
      carry += n0 < tc;
      n1 += carry;

      if (UNLIKELY (((n1 << 1) >> 1) != n1))
	break;		/* overflow, error unless we have GMP */

      valid_char_seen = 1;
    }
#if HAVE_GMP
  // FIXME: keep going, non-trivial, since any buffer might be insufficient
  // * Could balk and do O(n^2) algo instead of calling GMP's inp function
  // * Or we could let the buffer size determine, grab chunks from it, feed
  //   such chunks to GMP functions, multiply by suitable powers.  Good, tricky.
  // * Or skip this nifty function altogether when HAVE_GMP?  Bad!
  abort ();
#else
  fprintf (stderr, "number is too large\n");
  // FIXME: perhaps skip over rest of number
#endif
}
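
The carry handling in the two-word loop above can be checked in
isolation.  The sketch below is an illustration under assumptions, not
part of the posted code: it uses 32-bit words (WORD_BITS stands in for
W_TYPE_SIZE) so that a 64-bit integer can serve as a reference, and
append_digit is a hypothetical helper name.

```c
#include <stdint.h>

enum { WORD_BITS = 32 };	/* stand-in for W_TYPE_SIZE */

/* Append decimal digit TC (0..9) to the two-word number (*N1, *N0),
   i.e. compute 10 * n + tc.  Overflow of *N1 is not checked here.  */
static void
append_digit (uint32_t *n1, uint32_t *n0, uint32_t tc)
{
  uint32_t lo = *n0;

  /* 10*lo = 8*lo + 2*lo: collect the bits shifted out of each term...  */
  uint32_t carry = (lo >> (WORD_BITS - 3)) + (lo >> (WORD_BITS - 1));
  /* ...plus the carry from adding the two truncated terms.  */
  carry += (uint32_t) (10 * lo) < (uint32_t) (2 * lo);

  lo = 10 * lo + tc;
  carry += lo < tc;		/* carry from adding the digit */

  *n1 = 10 * *n1 + carry;
  *n0 = lo;
}
```

Feeding the digits of 18446744073709551615 (2^64 - 1) through
append_digit one at a time should leave both words all-ones.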


-- 
Torbjörn

Information forwarded to bug-coreutils <at> gnu.org:
bug#12350; Package coreutils. (Tue, 18 Sep 2012 14:36:01 GMT) Full text and rfc822 format available.

Message #167 received at 12350 <at> debbugs.gnu.org (full text, mbox):

From: Torbjorn Granlund <tg <at> gmplib.org>
To: Jim Meyering <jim <at> meyering.net>
Cc: 12350 <at> debbugs.gnu.org, nisse <at> lysator.liu.se
Subject: Re: bug#12350: Composites identified as primes in factor.c (when
	HAVE_GMP)
Date: Tue, 18 Sep 2012 16:33:53 +0200

Niels and I made more changes, including some needed to silence
-Wshadow.

We also re-enabled the div_smallq code after a bug fix, allowed squfof
to fail with recovery, and simplified the function factor.

I suppose you'll need to apply these manually, because of the TAB/SPC
problem.

I don't think we anticipate more changes to this code anytime soon.

diff -r 3024e91e4b82 factor.c
--- a/factor.c	Mon Sep 17 19:43:40 2012 +0200
+++ b/factor.c	Tue Sep 18 16:28:09 2012 +0200
@@ -140,7 +140,7 @@
 #endif
 #include "longlong.h"
 #ifdef COUNT_LEADING_ZEROS_NEED_CLZ_TAB
-static const unsigned char factor_clz_tab[129] =
+const unsigned char factor_clz_tab[129] =
 {
   1,2,3,3,4,4,4,4,5,5,5,5,5,5,5,5,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,
   7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,
@@ -479,8 +479,7 @@
 #define factor_insert(f, p) factor_insert_multiplicity(f, p, 1)
 
 static void
-factor_insert_large (struct factors *factors,
-		     uintmax_t p1, uintmax_t p0)
+factor_insert_large (struct factors *factors, uintmax_t p1, uintmax_t p0)
 {
   if (p1 > 0)
     {
@@ -1739,8 +1738,10 @@
   return 0;
 }
 
-static const unsigned short invtab[] =
+/* invtab[i] = floor(0x10000 / (0x80 + i)) */
+static const unsigned short invtab[0x81] =
   {
+    0x200,
     0x1fc, 0x1f8, 0x1f4, 0x1f0, 0x1ec, 0x1e9, 0x1e5, 0x1e1,
     0x1de, 0x1da, 0x1d7, 0x1d4, 0x1d0, 0x1cd, 0x1ca, 0x1c7,
     0x1c3, 0x1c0, 0x1bd, 0x1ba, 0x1b7, 0x1b4, 0x1b2, 0x1af,
@@ -1763,17 +1764,22 @@
    that q < 0x40; here it instead uses a table of (Euclidian) inverses.  */
 #define div_smallq(q, r, u, d)						\
   do {									\
-    if (0 && (u) / 0x40 < (d))						\
+    if ((u) / 0x40 < (d))						\
       {									\
 	int _cnt;							\
 	uintmax_t _dinv, _mask, _q, _r;					\
 	count_leading_zeros (_cnt, (d));				\
-									\
-	_dinv = invtab[((d) >> (W_TYPE_SIZE - 8 - _cnt))		\
-		       - (1 << (8 - 1))];				\
-									\
 	_r = (u);							\
-	_q = _r * _dinv >> (W_TYPE_SIZE + 8 - _cnt);			\
+	if (UNLIKELY (_cnt > (W_TYPE_SIZE - 8)))			\
+	  {								\
+	    _dinv = invtab[((d) << (_cnt + 8 - W_TYPE_SIZE)) - 0x80];	\
+	    _q = _dinv * _r >> (8 + W_TYPE_SIZE - _cnt);		\
+	  }								\
+	else								\
+	  {								\
+	    _dinv = invtab[((d) >> (W_TYPE_SIZE - 8 - _cnt)) - 0x7f];	\
+	    _q = _dinv * (_r >> (W_TYPE_SIZE - 3 - _cnt)) >> 11;	\
+	  }								\
 	_r -= _q*(d);							\
 									\
 	_mask = -(uintmax_t) (_r >= (d));				\
@@ -1833,8 +1839,9 @@
 #define MIN(a,b) ((a) < (b) ? (a) : (b))
 #endif
 
-
-static void
+/* Return true on success. Expected to fail only for numbers >=
+   2^{2*W_TYPE_SIZE - 2}, or close to that limit. */
+static bool
 factor_using_squfof (uintmax_t n1, uintmax_t n0, struct factors *factors)
 {
   /* Uses algorithm and notation from
@@ -1854,35 +1861,41 @@
       1155, 15, 231, 35, 3, 55, 7, 11, 0
     };
 
-  uintmax_t S;
   const unsigned int *m;
 
   struct { uintmax_t Q; uintmax_t P; } queue[QUEUE_SIZE];
 
-  S = isqrt2 (n1, n0);
+  if (n1 >= ((uintmax_t) 1 << (W_TYPE_SIZE - 2)))
+    return false;
 
-  if (n0 == S * S)
+  uintmax_t sqrt_n = isqrt2 (n1, n0);
+
+  if (n0 == sqrt_n * sqrt_n)
     {
       uintmax_t p1, p0;
 
-      umul_ppmm (p1, p0, S, S);
+      umul_ppmm (p1, p0, sqrt_n, sqrt_n);
       assert (p0 == n0);
 
       if (n1 == p1)
 	{
-	  if (prime_p (S))
-	    factor_insert_multiplicity (factors, S, 2);
+	  if (prime_p (sqrt_n))
+	    factor_insert_multiplicity (factors, sqrt_n, 2);
 	  else
 	    {
 	      struct factors f;
 
 	      f.nfactors = 0;
-	      factor_using_squfof (0, S, &f);
+	      if (!factor_using_squfof (0, sqrt_n, &f))
+		{
+		  /* Try pollard rho instead */
+		  factor_using_pollard_rho (sqrt_n, 1, &f);
+		}
 	      /* Duplicate the new factors */
 	      for (unsigned int i = 0; i < f.nfactors; i++)
 		factor_insert_multiplicity (factors, f.p[i], 2*f.e[i]);
 	    }
-	  return;
+	  return true;
 	}
     }
 
@@ -1921,6 +1934,7 @@
       Dh += n1 * mu;
 
       assert (Dl % 4 != 1);
+      assert (Dh < (uintmax_t) 1 << (W_TYPE_SIZE - 2));
 
       S = isqrt2 (Dh, Dl);
 
@@ -1939,10 +1953,10 @@
 
       for (i = 0; i <= B; i++)
 	{
-	  uintmax_t q, P1, t, r;
+	  uintmax_t q, P1, t, rem;
 
-	  div_smallq (q, r, S+P, Q);
-	  P1 = S - r;	/* P1 = q*Q - P */
+	  div_smallq (q, rem, S+P, Q);
+	  P1 = S - rem;	/* P1 = q*Q - P */
 
 #if STAT_SQUFOF
 	  assert (q > 0);
@@ -2021,25 +2035,21 @@
 		     for the case D = 2N. */
 		  /* Compute Q = (D - P*P) / Q1, but we need double
 		     precision. */
-		  {
-		    uintmax_t hi, lo, rem;
+		  uintmax_t hi, lo;
 		    umul_ppmm (hi, lo, P, P);
 		    sub_ddmmss (hi, lo, Dh, Dl, hi, lo);
 		    udiv_qrnnd (Q, rem, hi, lo, Q1);
 		    assert (rem == 0);
-		  }
 
 		  for (;;)
 		    {
-		      uintmax_t r;
-
 		      /* Note: There appears to by a typo in the paper,
 			 Step 4a in the algorithm description says q <--
 			 floor([S+P]/\hat Q), but looking at the equations
 			 in Sec. 3.1, it should be q <-- floor([S+P] / Q).
 			 (In this code, \hat Q is Q1). */
-		      div_smallq (q, r, S+P, Q);
-		      P1 = S - r;	/* P1 = q*Q - P */
+		      div_smallq (q, rem, S+P, Q);
+		      P1 = S - rem;	/* P1 = q*Q - P */
 
 #if STAT_SQUFOF
 		      q_freq[0]++;
@@ -2061,8 +2071,8 @@
 
 		  if (prime_p (Q))
 		    factor_insert (factors, Q);
-		  else
-		    factor_using_squfof (0, Q, factors);
+		  else if (!factor_using_squfof (0, Q, factors))
+		    factor_using_pollard_rho (Q, 2, factors);
 
 		  divexact_21 (n1, n0, n1, n0, Q);
 
@@ -2070,13 +2080,16 @@
 		    factor_insert_large (factors, n1, n0);
 		  else
 		    {
+		      if (!factor_using_squfof (n1, n0, factors))
+			{
 		      if (n1 == 0)
 			factor_using_pollard_rho (n0, 1, factors);
 		      else
-			factor_using_squfof (n1, n0, factors);
+			    factor_using_pollard_rho2 (n1, n0, 1, factors);
+			}
 		    }
 
-		  return;
+		  return true;
 		}
 	    }
 	next_i:
@@ -2085,8 +2098,7 @@
     next_multiplier:
       ;
     }
-  fprintf (stderr, "squfof failed.\n");
-  exit (EXIT_FAILURE);
+  return false;
 }
 
 static void
@@ -2100,26 +2112,21 @@
 
   t0 = factor_using_division (&t1, t1, t0, factors);
 
+  if (t1 == 0 && t0 < 2)
+    return;
+
+  if (prime2_p (t1, t0))
+    factor_insert_large (factors, t1, t0);
+  else
+    {
+      if (alg == ALG_SQUFOF)
+	if (factor_using_squfof (t1, t0, factors))
+	  return;
+
   if (t1 == 0)
-    {
-      if (t0 != 1)
-	{
-	  if (prime_p (t0))
-	    factor_insert (factors, t0);
-	  else if (alg == ALG_POLLARD_RHO)
 	    factor_using_pollard_rho (t0, 1, factors);
 	  else
-	    factor_using_squfof (0, t0, factors);
-	}
-    }
-  else
-    {
-      if (prime2_p (t1, t0))
-	factor_insert_large (factors, t1, t0);
-      else if (alg == ALG_POLLARD_RHO)
 	factor_using_pollard_rho2 (t1, t0, 1, factors);
-      else
-	factor_using_squfof (t1, t0, factors);
     }
 }
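
As a cross-check on the invtab hunk: the leading table values shown
there (0x200, 0x1fc, 0x1f8, ...) are consistent with
floor (0x10000 / (0x80 + i)).  The sketch below recomputes entries from
that formula; the formula is inferred from the visible values, and
invtab_entry is a hypothetical helper, not a function in factor.c.

```c
/* Entry I of a table of truncated inverses: floor (0x10000 / (0x80 + I)).
   Inferred from the table's leading values, not quoted from the patch.  */
static unsigned short
invtab_entry (unsigned int i)
{
  return (unsigned short) (0x10000 / (0x80 + i));
}
```

With 0x81 entries, the index runs from 0 to 0x80, so the last entry
under this formula would be 0x10000 / 0x100 = 0x100.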
 


-- 
Torbjörn

Information forwarded to bug-coreutils <at> gnu.org:
bug#12350; Package coreutils. (Tue, 18 Sep 2012 17:03:01 GMT) Full text and rfc822 format available.

Message #170 received at 12350 <at> debbugs.gnu.org (full text, mbox):

From: Torbjorn Granlund <tg <at> gmplib.org>
To: Jim Meyering <jim <at> meyering.net>,  12350 <at> debbugs.gnu.org,
	nisse <at> lysator.liu.se
Subject: Re: bug#12350: Composites identified as primes in factor.c (when
	HAVE_GMP)
Date: Tue, 18 Sep 2012 19:01:09 +0200

Torbjorn Granlund <tg <at> gmplib.org> writes:

  Niels and I made more changes, including some needed to silence
  -Wshadow.

Change log entries can be found in:

http://gmplib.org:8000/factoring/file/61b4eac24ea4/ChangeLog

-- 
Torbjörn

Information forwarded to bug-coreutils <at> gnu.org:
bug#12350; Package coreutils. (Tue, 18 Sep 2012 18:23:01 GMT) Full text and rfc822 format available.

Message #173 received at 12350 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Torbjorn Granlund <tg <at> gmplib.org>
Cc: 12350 <at> debbugs.gnu.org, nisse <at> lysator.liu.se
Subject: Re: bug#12350: Composites identified as primes in factor.c (when
	HAVE_GMP)
Date: Tue, 18 Sep 2012 20:21:07 +0200

Torbjorn Granlund wrote:

> Torbjorn Granlund <tg <at> gmplib.org> writes:
>
>   Niels and I made more changes, including some needed to silence
>   -Wshadow.
>
> Change log entries can be found in:
>
> http://gmplib.org:8000/factoring/file/61b4eac24ea4/ChangeLog

Thanks.
I've nearly finished integrating it, but have not yet
added the options that enable new functionality.
I will post soon.

Information forwarded to bug-coreutils <at> gnu.org:
bug#12350; Package coreutils. (Sun, 23 Sep 2012 20:59:02 GMT) Full text and rfc822 format available.

Message #176 received at 12350 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Torbjorn Granlund <tg <at> gmplib.org>
Cc: 12350 <at> debbugs.gnu.org, nisse <at> lysator.liu.se
Subject: Re: bug#12350: Composites identified as primes in factor.c (when
	HAVE_GMP)
Date: Sun, 23 Sep 2012 22:56:26 +0200

Torbjorn Granlund wrote:

> Torbjorn Granlund <tg <at> gmplib.org> writes:
>
>   Niels and I made more changes, including some needed to silence
>   -Wshadow.
>
> Change log entries can be found in:
>
> http://gmplib.org:8000/factoring/file/61b4eac24ea4/ChangeLog

Thanks.  Here's how I've integrated it so far.
This is not ready to push, but rather just to give you an idea
of what's coming.  I'm sure I'll have to adjust things before pushing.

[k (application/octet-stream, attachment)]

Information forwarded to bug-coreutils <at> gnu.org:
bug#12350; Package coreutils. (Sun, 30 Sep 2012 14:52:01 GMT) Full text and rfc822 format available.

Message #179 received at 12350 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Torbjorn Granlund <tg <at> gmplib.org>
Cc: 12350 <at> debbugs.gnu.org, nisse <at> lysator.liu.se
Subject: Re: bug#12350: Composites identified as primes in factor.c (when
	HAVE_GMP)
Date: Sun, 30 Sep 2012 16:51:20 +0200

Jim Meyering wrote:
> Torbjorn Granlund wrote:
>
>> Torbjorn Granlund <tg <at> gmplib.org> writes:
>>
>>   Niels and I made more changes, including some needed to silence
>>   -Wshadow.
>>
>> Change log entries can be found in:
>>
>> http://gmplib.org:8000/factoring/file/61b4eac24ea4/ChangeLog
>
> Thanks.  Here's how I've integrated it so far.
> This is not ready to push, but rather just to give you an idea
> of what's coming.  I'm sure I'll have to adjust things before pushing.

There have been a few corrections, and I've fleshed out some log entries.

The following series is ready:

  [PATCH 01/13] build: remove redundant dependency: $(PROGRAMS):
  [PATCH 02/13] factor: prepare for the new factor program
  [PATCH 03/13] factor: new implementation; not yet integrated
  [PATCH 04/13] build: add rules to build the new factor program
  [PATCH 05/13] factor: improvements from
  [PATCH 06/13] factor: merge with preexisting factor, integrate
  [PATCH 07/13] maint: use __builtin_expect only if __GNUC__
  [PATCH 08/13] maint: syntax-check: add make-prime-list exemptions
  [PATCH 09/13] factor: 25% speed-up, on output
  [PATCH 10/13] build: avoid warning about unused macro
  [PATCH 11/13] maint: mark set-but-not-used variables with
  [PATCH 12/13] maint: avoid -Wsuggest-attribute=const warning
  [PATCH 13/13] maint: make-prime-list: do not ignore write failure

Torbjorn and Niels,

I'd be happy to include more fine-grained changes to factor.c
if you can convert the http://gmplib.org:8000/factoring commits
and ChangeLog deltas to git commits where the ChangeLog delta
appears in the commit log and passes coreutils' commit-log-checking hook.
But that may be more trouble than it's worth.

The only missing piece is a NEWS entry.
Would one of you please write that?

[factor-ng.xz (application/octet-stream, attachment)]

Information forwarded to bug-coreutils <at> gnu.org:
bug#12350; Package coreutils. (Sun, 30 Sep 2012 15:14:02 GMT) Full text and rfc822 format available.

Message #182 received at 12350 <at> debbugs.gnu.org (full text, mbox):

From: Torbjorn Granlund <tg <at> gmplib.org>
To: Jim Meyering <jim <at> meyering.net>
Cc: 12350 <at> debbugs.gnu.org, nisse <at> lysator.liu.se
Subject: Re: bug#12350: Composites identified as primes in factor.c (when
	HAVE_GMP)
Date: Sun, 30 Sep 2012 17:13:21 +0200

Jim Meyering <jim <at> meyering.net> writes:

  > Thanks.  Here's how I've integrated it so far.
  > This is not ready to push, but rather just to give you an idea
  > of what's coming.  I'm sure I'll have to adjust things before pushing.

  There have been a few corrections, and I've fleshed out some log entries.

  The following series is ready:

    [PATCH 01/13] build: remove redundant dependency: $(PROGRAMS):
    [PATCH 02/13] factor: prepare for the new factor program
    [PATCH 03/13] factor: new implementation; not yet integrated
    [PATCH 04/13] build: add rules to build the new factor program
    [PATCH 05/13] factor: improvements from
    [PATCH 06/13] factor: merge with preexisting factor, integrate
    [PATCH 07/13] maint: use __builtin_expect only if __GNUC__
    [PATCH 08/13] maint: syntax-check: add make-prime-list exemptions
    [PATCH 09/13] factor: 25% speed-up, on output
    [PATCH 10/13] build: avoid warning about unused macro
    [PATCH 11/13] maint: mark set-but-not-used variables with
    [PATCH 12/13] maint: avoid -Wsuggest-attribute=const warning
    [PATCH 13/13] maint: make-prime-list: do not ignore write failure

  Torbjorn and Niels,

  I'd be happy to include more fine-grained changes to factor.c
  if you can convert the http://gmplib.org:8000/factoring commits
  and ChangeLog deltas to git commits where the ChangeLog delta
  appears in the commit log and passes coreutils' commit-log-checking hook.
  But that may be more trouble than it's worth.

I think those change logs are not super relevant for the coreutils
ChangeLog.  "factor.c: Complete rewrite" seems sufficient to me...

Both Niels and I mailed the paperwork to the FSF a week or two ago.
Have you heard from them?  Snail mail tends to disappear.

  The only missing piece is a NEWS entry.
  Would one of you please write that?

Sure.  Do you have an example of an old one?  Here is a start:

  The 'factor' program has been completely rewritten for speed and to
  add range.  It can now always factor numbers up to 2^128, even without
  GMP support.  Its speed is from a few times better (for small numbers)
  to over 10,000 times better (just below 2^64).  The new program also
  runs a proper prime criterion on found factors, not a probabilistic
  test.

As you might have spotted from our repo, I and Nisse Möller are working
on a small Quadratic Sieve ("QS") factorer, for which we have two goals:

(1) offer it as a HAVE_GMP dependent addition to GNU factor
(2) make a more complex variant intended to be state-of-the-art

QS is one of the most powerful factoring algorithms yet discovered.
With our implementation, we will be able to factor even the most
stubborn 128-bit composites within seconds, and with enough patience
numbers of up to 300 bits are within reach.

The code is not very large; it will make 'factor' about 30% larger.

It should factor any 128-bit number in around 1 second.  Each extra
30 bits takes about 10 times longer.
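
Taken at face value, those two rough figures imply the following
extrapolation.  This is only an illustration; no such formula is stated
in this thread, and b denotes the bit size of the number:

```latex
T(b) \approx 1\,\mathrm{s} \times 10^{(b - 128)/30},
\qquad
T(300) \approx 10^{172/30}\,\mathrm{s}
       \approx 5.4 \times 10^{5}\,\mathrm{s}
       \approx 6\ \text{days}.
```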

I don't think these new developments should hold up a commit of our old
factor.c developments.

-- 
Torbjörn

Information forwarded to bug-coreutils <at> gnu.org:
bug#12350; Package coreutils. (Sun, 30 Sep 2012 15:34:01 GMT) Full text and rfc822 format available.

Message #185 received at 12350 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Torbjorn Granlund <tg <at> gmplib.org>
Cc: 12350 <at> debbugs.gnu.org, nisse <at> lysator.liu.se
Subject: Re: bug#12350: Composites identified as primes in factor.c (when
	HAVE_GMP)
Date: Sun, 30 Sep 2012 17:33:33 +0200

Torbjorn Granlund wrote:

> Jim Meyering <jim <at> meyering.net> writes:
>
>   > Thanks.  Here's how I've integrated it so far.
>   > This is not ready to push, but rather just to give you an idea
>   > of what's coming.  I'm sure I'll have to adjust things before pushing.
>
>   There have been a few corrections, and I've fleshed out some log entries.
>
>   The following series is ready:
>
>     [PATCH 01/13] build: remove redundant dependency: $(PROGRAMS):
>     [PATCH 02/13] factor: prepare for the new factor program
>     [PATCH 03/13] factor: new implementation; not yet integrated
>     [PATCH 04/13] build: add rules to build the new factor program
>     [PATCH 05/13] factor: improvements from
>     [PATCH 06/13] factor: merge with preexisting factor, integrate
>     [PATCH 07/13] maint: use __builtin_expect only if __GNUC__
>     [PATCH 08/13] maint: syntax-check: add make-prime-list exemptions
>     [PATCH 09/13] factor: 25% speed-up, on output
>     [PATCH 10/13] build: avoid warning about unused macro
>     [PATCH 11/13] maint: mark set-but-not-used variables with
>     [PATCH 12/13] maint: avoid -Wsuggest-attribute=const warning
>     [PATCH 13/13] maint: make-prime-list: do not ignore write failure
>
>   Torbjorn and Niels,
>
>   I'd be happy to include more fine-grained changes to factor.c
>   if you can convert the http://gmplib.org:8000/factoring commits
>   and ChangeLog deltas to git commits where the ChangeLog delta
>   appears in the commit log and passes coreutils' commit-log-checking hook.
>   But that may be more trouble than it's worth.
>
> I think those change logs are not super relevant for the coreutils
> ChangeLog.  "factor.c: Complete rewrite" seems sufficient to me...
>
> Both Niels and I mailed the paperwork to the FSF a week or two ago.
> Have you heard from them?  Snail mail tends to disappear.

Not yet.  I've just asked them.

>   The only missing piece is a NEWS entry.
>   Would one of you please write that?
>
> Sure.  Do you have an example of an old one?  Here is a start:

Here are a few years worth of NEWS entries:

  http://git.sv.gnu.org/cgit/coreutils.git/tree/NEWS

That looks fine.  Thanks.

>   The 'factor' program has been completely rewritten for speed and to
>   add range.  It can now always factor numbers up to 2^128, even without
>   GMP support.  Its speed is from a few times better (for small numbers)
>   to over 10,000 times better (just below 2^64).  The new program also
>   runs a proper prime criterion on found factors, not a probabilistic
>   test.

How about this in place of the final sentence?

                                                    The new program also
    runs a deterministic primality test for each prime factor, not just
    a probabilistic test.

> As you might have spotted from our repo, I and Nisse Möller are working
> on a small Quadratic Sieve ("QS") factorer, for which we have two goals:
>
> (1) offer it as a HAVE_GMP dependent addition to GNU factor
> (2) make a more complex variant intended to be state-of-the-art
>
> QS is one of the most powerful factoring algorithms yet discovered.
> With our implementation, we will be able to factor even the most
> stubborn 128-bit composites within seconds, and with enough patience
> numbers of up to 300 bits are within reach.
>
> The code is not very large; it will make 'factor' about 30% larger.

That sort of code size increase sounds perfectly
reasonable considering the algorithmic improvements.

> It should factor any 128-bit number in around 1 second.  Each extra
> 30 bits takes about 10 times longer.

That sounds great.

> I don't think these new developments should hold up a commit of our old
> factor.c developments.

I agree.

Information forwarded to bug-coreutils <at> gnu.org:
bug#12350; Package coreutils. (Sun, 30 Sep 2012 15:40:02 GMT) Full text and rfc822 format available.

Message #188 received at 12350 <at> debbugs.gnu.org (full text, mbox):

From: Torbjorn Granlund <tg <at> gmplib.org>
To: Jim Meyering <jim <at> meyering.net>
Cc: 12350 <at> debbugs.gnu.org, nisse <at> lysator.liu.se
Subject: Re: bug#12350: Composites identified as primes in factor.c (when
	HAVE_GMP)
Date: Sun, 30 Sep 2012 17:39:31 +0200

Jim Meyering <jim <at> meyering.net> writes:

  How about this in place of the final sentence?
  
                                                      The new program also
      runs a deterministic primality test for each prime factor, not just
      a probabilistic test.
  
That's better, thanks.

-- 
Torbjörn

Reply sent to Jim Meyering <meyering <at> hx.meyering.net>:
You have taken responsibility. (Sun, 07 Oct 2012 09:02:01 GMT) Full text and rfc822 format available.

Notification sent to Torbjorn Granlund <tg <at> gmplib.org>:
bug acknowledged by developer. (Sun, 07 Oct 2012 09:02:02 GMT) Full text and rfc822 format available.

Message #193 received at 12350-done <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <meyering <at> hx.meyering.net>
To: Torbjorn Granlund <tg <at> gmplib.org>
Cc: 12350-done <at> debbugs.gnu.org, nisse <at> lysator.liu.se
Subject: Re: bug#12350: Composites identified as primes in factor.c (when
	HAVE_GMP)
Date: Sun, 07 Oct 2012 11:00:55 +0200

Torbjorn Granlund wrote:
> Jim Meyering <jim <at> meyering.net> writes:
>
>   How about this in place of the final sentence?
>   
>                                                       The new program also
>       runs a deterministic primality test for each prime factor, not just
>       a probabilistic test.
>   
> That's better, thanks.

I pushed the actual bug fix (for the issue mentioned in the Subject)
long ago, so I'm closing this "issue".

Regarding your upcoming improvements, please start a new thread
when you're ready to discuss them, so that your comments are not
lost in the volume of this now-"done" bug report.

Thanks again,
Jim

Information forwarded to bug-coreutils <at> gnu.org:
bug#12350; Package coreutils. (Mon, 08 Oct 2012 10:36:02 GMT) Full text and rfc822 format available.

Message #196 received at 12350 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Torbjorn Granlund <tg <at> gmplib.org>
Cc: 12350 <at> debbugs.gnu.org, nisse <at> lysator.liu.se,
	Jim Meyering <jim <at> meyering.net>
Subject: Re: bug#12350: Composites identified as primes in factor.c (when
	HAVE_GMP)
Date: Mon, 08 Oct 2012 11:34:56 +0100

On 09/04/2012 10:55 PM, Torbjorn Granlund wrote:
> Pádraig Brady <P <at> draigBrady.com> writes:
>
>    Sure. I was just quantifying the performance change,
>    for others who may be referencing or noticing patches.
>    (Actually, I'd add a note to the commit message that,
>    this increases calculations by about 25%).
>
> And surely more for certain cases. We spend 25/3 or about 8 times more
> effort in Miller Rabin.
>
>    > As I mentioned in the original post, we will replace the current code
>    > with code that is many times faster.  Your example above will run at
>    > less than a minute on your system.
>
>    I'd left my test files in place in anticipation ;)
>
> Please do, and let me and Niels know if it takes more than 45s.  Your
> test case takes 28s on my 3.3 GHz Sandy bridge system with our current
> code.  I'm a little disappointed the code doesn't beat the old code more
> for small factorisations.

So on my 2.1GHz i3-2310M, running over the range 452,930,477 to 472,882,027:

old broken code = 14m
old fixed code  = 18m
new code        = 63s

cheers,
Pádraig.

Information forwarded to bug-coreutils <at> gnu.org:
bug#12350; Package coreutils. (Mon, 08 Oct 2012 10:39:01 GMT) Full text and rfc822 format available.

Message #199 received at 12350 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: 12350 <at> debbugs.gnu.org, meyering <at> meyering.net
Subject: Re: bug#12350: Composites identified as primes in factor.c (when
	HAVE_GMP)
Date: Mon, 08 Oct 2012 11:37:45 +0100

On 10/07/2012 10:00 AM, Jim Meyering wrote:
> Torbjorn Granlund wrote:
>> Jim Meyering <jim <at> meyering.net> writes:
>>
>>    How about this in place of the final sentence?
>>
>>                                                        The new program also
>>        runs a deterministic primality test for each prime factor, not just
>>        a probabilistic test.
>>
>> That's better, thanks.
>
> I pushed the actual bug fix (for the issue mentioned in the Subject)
> long ago, so I'm closing this "issue".
>
> Regarding your upcoming improvements, please start a new thread
> when you're ready to discuss them, so that your comments are not
> lost in the volume of this now-"done" bug report.

A small amendment I'm going to push is not to rely on GMP5.
GMP4 on my fedora 15 system doesn't have mpz_inits().
i.e. support for initializing multiple variables at once.

Patch is simple enough...

diff --git a/src/factor.c b/src/factor.c
index 5bfbfdc..1857297 100644
--- a/src/factor.c
+++ b/src/factor.c
@@ -1335,7 +1335,10 @@ mp_prime_p (mpz_t n)
   if (mpz_cmp_ui (n, (long) FIRST_OMITTED_PRIME * FIRST_OMITTED_PRIME) < 0)
     return true;

-  mpz_inits (q, a, nm1, tmp, NULL);
+  mpz_init (q);
+  mpz_init (a);
+  mpz_init (nm1);
+  mpz_init (tmp);

   /* Precomputation for Miller-Rabin.  */
   mpz_sub_ui (nm1, n, 1);
@@ -1399,7 +1402,10 @@ mp_prime_p (mpz_t n)
   if (flag_prove_primality)
     mp_factor_clear (&factors);
  ret2:
-  mpz_clears (q, a, nm1, tmp, NULL);
+  mpz_clear (q);
+  mpz_clear (a);
+  mpz_clear (nm1);
+  mpz_clear (tmp);

   return is_prime;
 }
@@ -1608,7 +1614,8 @@ mp_factor_using_pollard_rho (mpz_t n, unsigned long int a,

   debug ("[pollard-rho (%lu)] ", a);

-  mpz_inits (t, t2, NULL);
+  mpz_init (t);
+  mpz_init (t2);
   mpz_init_set_si (y, 2);
   mpz_init_set_si (x, 2);
   mpz_init_set_si (z, 2);
@@ -1688,7 +1695,12 @@ mp_factor_using_pollard_rho (mpz_t n, unsigned long int a,
       mpz_mod (y, y, n);
     }

-  mpz_clears (P, t2, t, z, x, y, NULL);
+  mpz_clear (P);
+  mpz_clear (t2);
+  mpz_clear (t);
+  mpz_clear (z);
+  mpz_clear (x);
+  mpz_clear (y);
 }
 #endif

Information forwarded to bug-coreutils <at> gnu.org:
bug#12350; Package coreutils. (Mon, 08 Oct 2012 10:53:02 GMT) Full text and rfc822 format available.

Message #202 received at 12350 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Pádraig Brady <P <at> draigBrady.com>
Cc: 12350 <at> debbugs.gnu.org
Subject: Re: bug#12350: Composites identified as primes in factor.c (when
	HAVE_GMP)
Date: Mon, 08 Oct 2012 12:52:08 +0200

Pádraig Brady wrote:
> On 10/07/2012 10:00 AM, Jim Meyering wrote:
>> Torbjorn Granlund wrote:
>>> Jim Meyering <jim <at> meyering.net> writes:
>>>
>>>    How about this in place of the final sentence?
>>>
>>>                                                        The new program also
>>>        runs a deterministic primality test for each prime factor, not just
>>>        a probabilistic test.
>>>
>>> That's better, thanks.
>>
>> I pushed the actual bug fix (for the issue mentioned in the Subject)
>> long ago, so I'm closing this "issue".
>>
>> Regarding your upcoming improvements, please start a new thread
>> when you're ready to discuss them, so that your comments are not
>> lost in the volume of this now-"done" bug report.
>
> A small amendment I'm going to push is not to rely on GMP5.
> GMP4 on my fedora 15 system doesn't have mpz_inits().
> i.e. support for initializing multiple variables at once.
>
> Patch is simple enough...

Hi Pádraig,

Thanks, but wouldn't that be a slight "pessimization"?
What do you think about providing the missing function instead?
Maybe not worth the hassle, but still, it would avoid adding those 12
in-function lines.  factor.c is already large and complex enough that
every little bit helps.

BTW, Fedora 15 passed "end of life" back in June.

> diff --git a/src/factor.c b/src/factor.c
> index 5bfbfdc..1857297 100644
> --- a/src/factor.c
> +++ b/src/factor.c
> @@ -1335,7 +1335,10 @@ mp_prime_p (mpz_t n)
>    if (mpz_cmp_ui (n, (long) FIRST_OMITTED_PRIME * FIRST_OMITTED_PRIME) < 0)
>      return true;
>
> -  mpz_inits (q, a, nm1, tmp, NULL);
> +  mpz_init (q);
> +  mpz_init (a);
> +  mpz_init (nm1);
> +  mpz_init (tmp);
...

Information forwarded to bug-coreutils <at> gnu.org:
bug#12350; Package coreutils. (Mon, 08 Oct 2012 12:14:01 GMT) Full text and rfc822 format available.

Message #205 received at 12350 <at> debbugs.gnu.org (full text, mbox):

From: Torbjorn Granlund <tg <at> gmplib.org>
To: Pádraig Brady <P <at> draigBrady.com>
Cc: 12350 <at> debbugs.gnu.org, nisse <at> lysator.liu.se, jim <at> meyering.net
Subject: Re: bug#12350: Composites identified as primes in factor.c (when
	HAVE_GMP)
Date: Mon, 08 Oct 2012 14:12:46 +0200

  > Please do, and let me and Niels know if it takes more than 45s.  Your
  > test case takes 28s on my 3.3 GHz Sandy bridge system with our current
  > code.  I'm a little disappointed the code doesn't beat the old code more
  > for small factorisations.
  
  So on my 2.1GHz i3-2310M, running over the range 452,930,477 to 472,882,027.
  
  old broken code = 14m
  old fixed code  = 18m
  new code        = 63s
  
OK, this is about 60% slower than I would have expected.  Our code at
http://gmplib.org:8000/factoring/ should run at about 39s on your
system.  (I am using gcc 4.7.1.)

I haven't looked at the final version that went into coreutils, so I
have no idea why it runs slower.  A wild guess is that its actual input
or tokenisation has been slowed down.  For smallish numbers, such things
can dominate over actually factoring the numbers.

I think the current coreutils factor performance for smallish numbers
might be sufficient.  (Larger numbers than 2^100 need a bit too much
time, but we are working on a fix.)

-- 
Torbjörn

Information forwarded to bug-coreutils <at> gnu.org:
bug#12350; Package coreutils. (Mon, 08 Oct 2012 12:36:02 GMT) Full text and rfc822 format available.

Message #208 received at 12350 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Torbjorn Granlund <tg <at> gmplib.org>
Cc: 12350 <at> debbugs.gnu.org, nisse <at> lysator.liu.se,
	Pádraig Brady <P <at> draigBrady.com>
Subject: Re: bug#12350: Composites identified as primes in factor.c (when
	HAVE_GMP)
Date: Mon, 08 Oct 2012 14:34:39 +0200

Torbjorn Granlund wrote:

>   > Please do, and let me and Niels know if it takes more than 45s.  Your
>   > test case takes 28s on my 3.3 GHz Sandy bridge system with our current
>   > code.  I'm a little disappointed the code doesn't beat the old code more
>   > for small factorisations.
>
>   So on my 2.1GHz i3-2310M, running over the range 452,930,477 to 472,882,027.
>
>   old broken code = 14m
>   old fixed code  = 18m
>   new code        = 63s
>
> OK, this is about 60% slower than I would have expected.  Our code at
> http://gmplib.org:8000/factoring/ should run at about 39s on your
> system.  (I am using gcc 4.7.1.)
>
> I haven't looked at the final version that went into coreutils, so I
> have no idea why it runs slower.  A wild guess is that its actual input
> or tokenisation has been slowed down.  For smallish numbers, such things
> can dominate over actually factoring the numbers.
>
> I think the current coreutils factor performance for smallish numbers
> might be sufficient.  (Larger numbers than 2^100 need a bit too much
> time, but we are working on a fix.)

When I compare the factor built from http://gmplib.org:8000/factoring/, and
the one in coreutils.git I see these single-trial times: (on a 3.2GHz i7-970)

  $ seq 452930477 472882027|env time ./factor > k
  29.28user 0.44system 0:30.85elapsed 96%CPU (0avgtext+0avgdata 548maxresident)k
  0inputs+1011088outputs (0major+162minor)pagefaults 0swaps
  ...
  $ seq 452930477 472882027|env time /cu/src/factor > k
  26.47user 0.48system 0:28.07elapsed 96%CPU (0avgtext+0avgdata 692maxresident)k
  0inputs+1011088outputs (0major+205minor)pagefaults 0swaps

I.e., a 9% improvement.  Compiled with bleeding edge gcc 4.8.0 20121007.
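
The percentages quoted in this thread can be re-derived from the timings above. The sketch below is only arithmetic on numbers already posted (63s measured against 39s expected, and the two `time` runs); the helper names are hypothetical.

```c
#include <assert.h>

/* Percentage by which NEW_SECS improves on OLD_SECS.  */
static double
percent_faster (double old_secs, double new_secs)
{
  assert (old_secs > 0.0);
  return 100.0 * (old_secs - new_secs) / old_secs;
}

/* Percentage by which MEASURED_SECS exceeds EXPECTED_SECS.  */
static double
percent_slower (double expected_secs, double measured_secs)
{
  assert (expected_secs > 0.0);
  return 100.0 * (measured_secs - expected_secs) / expected_secs;
}
```

With the elapsed times 30.85 and 28.07, percent_faster gives roughly 9%; with the user times 29.28 and 26.47 it gives slightly more. percent_slower (39, 63) comes out a little above 60%, in line with the "about 60% slower" estimate.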

Information forwarded to bug-coreutils <at> gnu.org:
bug#12350; Package coreutils. (Mon, 08 Oct 2012 12:51:01 GMT) Full text and rfc822 format available.

Message #211 received at 12350 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Jim Meyering <jim <at> meyering.net>
Cc: 12350 <at> debbugs.gnu.org, nisse <at> lysator.liu.se,
	Torbjorn Granlund <tg <at> gmplib.org>
Subject: Re: bug#12350: Composites identified as primes in factor.c (when
	HAVE_GMP)
Date: Mon, 08 Oct 2012 13:49:47 +0100

On 10/08/2012 01:34 PM, Jim Meyering wrote:
> Torbjorn Granlund wrote:
>
>>    > Please do, and let me and Niels know if it takes more than 45s.  Your
>>    > test case takes 28s on my 3.3 GHz Sandy bridge system with our current
>>    > code.  I'm a little disappointed the code doesn't beat the old code more
>>    > for small factorisations.
>>
>>    So on my 2.1GHz i3-2310M, running over the range 452,930,477 to 472,882,027.
>>
>>    old broken code = 14m
>>    old fixed code  = 18m
>>    new code        = 63s
>>
>> OK, this is about 60% slower than I would have expected.  Our code at
>> http://gmplib.org:8000/factoring/ should run at about 39s on your
>> system.  (I am using gcc 4.7.1.)
>>
>> I haven't looked at the final version that went into coreutils, so I
>> have no idea why it runs slower.  A wild guess is that its actual input
>> or tokenisation has been slowed down.  For smallish numbers, such things
>> can dominate over actually factoring the numbers.
>>
>> I think the current coreutils factor performance for smallish numbers
>> might be sufficient.  (Larger numbers than 2^100 need a bit too much
>> time, but we are working on a fix.)
>
> When I compare the factor built from http://gmplib.org:8000/factoring/, and
> the one in coreutils.git I see these single-trial times: (on a 3.2GHz i7-970)
>
>    $ seq 452930477 472882027|env time ./factor > k
>    29.28user 0.44system 0:30.85elapsed 96%CPU (0avgtext+0avgdata 548maxresident)k
>    0inputs+1011088outputs (0major+162minor)pagefaults 0swaps
>    ...
>    $ seq 452930477 472882027|env time /cu/src/factor > k
>    26.47user 0.48system 0:28.07elapsed 96%CPU (0avgtext+0avgdata 692maxresident)k
>    0inputs+1011088outputs (0major+205minor)pagefaults 0swaps
>
> I.e., a 9% improvement.  Compiled with bleeding edge gcc 4.8.0 20121007.

Thanks for checking that.

Torbjorn you seem to be interpolating from your
3.3 GHz Sandy bridge to my 2.1GHz i3-2310M Sandy Bridge,
but it might not be linear due to cache, turbo boost,
mem bandwidth, ...

cheers,
Pádraig.

Information forwarded to bug-coreutils <at> gnu.org:
bug#12350; Package coreutils. (Mon, 08 Oct 2012 13:00:02 GMT) Full text and rfc822 format available.

Message #214 received at 12350 <at> debbugs.gnu.org (full text, mbox):

From: Torbjorn Granlund <tg <at> gmplib.org>
To: Pádraig Brady <P <at> draigBrady.com>
Cc: 12350 <at> debbugs.gnu.org, nisse <at> lysator.liu.se, jim <at> meyering.net
Subject: Re: bug#12350: Composites identified as primes in factor.c (when
	HAVE_GMP)
Date: Mon, 08 Oct 2012 14:59:11 +0200

Pádraig Brady <P <at> draigBrady.com> writes:

  Thanks for checking that.
  
  Torbjorn you seem to be interpolating from your
  3.3 GHz Sandy bridge to my 2.1GHz i3-2310M Sandy Bridge,
  but it might not be linear due to cache, turbo boost,
  mem bandwidth, ...
  
It should be linear in clock frequency, yes.

The factor program's working set (code and data) is tiny.  Things should
fit in L1 caches.  We have a prime table, but it is smaller than L1D (at
about 10 kB).

All Sandy bridges in the world have the same L1 cache sizes.

Mem bandwidth is therefore irrelevant, and so are higher-level caches.
"Turbo boost" is relevant, but I have that switched off since I am quite
fond of reproducible benchmarking results.

-- 
Torbjörn

Information forwarded to bug-coreutils <at> gnu.org:
bug#12350; Package coreutils. (Mon, 08 Oct 2012 13:05:02 GMT) Full text and rfc822 format available.

Message #217 received at 12350 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Torbjorn Granlund <tg <at> gmplib.org>
Cc: 12350 <at> debbugs.gnu.org, nisse <at> lysator.liu.se,
	Pádraig Brady <P <at> draigBrady.com>
Subject: Re: bug#12350: Composites identified as primes in factor.c (when
	HAVE_GMP)
Date: Mon, 08 Oct 2012 15:04:22 +0200

Torbjorn Granlund wrote:
> Pádraig Brady <P <at> draigBrady.com> writes:
>
>   Thanks for checking that.
>
>   Torbjorn you seem to be interpolating from your
>   3.3 GHz Sandy bridge to my 2.1GHz i3-2310M Sandy Bridge,
>   but it might not be linear due to cache, turbo boost,
>   mem bandwidth, ...
>
> It should be linear in clock frequency, yes.

"It" meaning the core computation.

However, this little command does a lot of I/O, too:
191M input, 77M output.

Information forwarded to bug-coreutils <at> gnu.org:
bug#12350; Package coreutils. (Mon, 08 Oct 2012 13:23:01 GMT) Full text and rfc822 format available.

Message #220 received at 12350 <at> debbugs.gnu.org (full text, mbox):

From: Torbjorn Granlund <tg <at> gmplib.org>
To: Jim Meyering <jim <at> meyering.net>
Cc: 12350 <at> debbugs.gnu.org, nisse <at> lysator.liu.se, P <at> draigBrady.com
Subject: Re: bug#12350: Composites identified as primes in factor.c (when
	HAVE_GMP)
Date: Mon, 08 Oct 2012 15:21:29 +0200

Jim Meyering <jim <at> meyering.net> writes:

  However, this little command does a lot of I/O, too:
  191M input, 77M output.
  
Sure.  I've never seen significant variance for such stuff, measurable
as CPU time.

I tested with a Sandybridge i3-2120T now.  The range takes 32 s.

In both cases, the systems run GNU/Linux.  The kernel version is 3.2.

The factoring speed varies very much with GCC version.  In particular the
trial division code has a very tight loop, and such loops have more
compiler reliance.  GCC 4.6 and later generate code that executes 4 insns
per (unsuccessful) division.

It is also important to use a 32bit binary.  We should perhaps have
provided better 32-bit code paths to be used for numbers < 2^32 on
32-bit hardware.  Now, Pádraig's example needs about 3x more time for a
32-bit binary on the same hardware.

-- 
Torbjörn

Information forwarded to bug-coreutils <at> gnu.org:
bug#12350; Package coreutils. (Mon, 08 Oct 2012 14:45:02 GMT) Full text and rfc822 format available.

Message #223 received at 12350 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Torbjorn Granlund <tg <at> gmplib.org>
Cc: 12350 <at> debbugs.gnu.org, nisse <at> lysator.liu.se,
	Jim Meyering <jim <at> meyering.net>
Subject: Re: bug#12350: Composites identified as primes in factor.c (when
	HAVE_GMP)
Date: Mon, 08 Oct 2012 15:43:41 +0100

On 10/08/2012 02:21 PM, Torbjorn Granlund wrote:
> Jim Meyering <jim <at> meyering.net> writes:
>
>    However, this little command does a lot of I/O, too:
>    191M input, 77M output.
>
> Sure.  I've never seen significant variance for such stuff, measurable
> as CPU time.
>
> I tested with a Sandybridge i3-2120T now.  The range takes 32 s.
>
> In both cases, the systems run GNU/Linux.  The kernel version is 3.2.
>
> The factoring speed varies very much with GCC version.  In particular the
> trial division code has a very tight loop, and such loops have more
> compiler reliance.  GCC 4.6 and later generate code that executes 4 insns
> per (unsuccessful) division.

gcc (GCC) 4.6.0 20110603 (Red Hat 4.6.0-10)
With -march=native -O2

> It is also important to use a 32bit binary.  We should perhaps have

s/to use/to not use/ I presume. Yep I'm on 64 bit.

> provided better 32-bit code paths to be used for numbers < 2^32 on
> 32-bit hardware.  Now, Pádraig's example needs about 3x more time for a
> 32-bit binary on the same hardware.

thanks,
Pádraig.

Information forwarded to bug-coreutils <at> gnu.org:
bug#12350; Package coreutils. (Mon, 08 Oct 2012 14:49:01 GMT) Full text and rfc822 format available.

Message #226 received at 12350 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Jim Meyering <jim <at> meyering.net>
Cc: 12350 <at> debbugs.gnu.org
Subject: Re: bug#12350: Composites identified as primes in factor.c (when
	HAVE_GMP)
Date: Mon, 08 Oct 2012 15:47:44 +0100

[Message part 1 (text/plain, inline)]

On 10/08/2012 11:52 AM, Jim Meyering wrote:
> Pádraig Brady wrote:
>> On 10/07/2012 10:00 AM, Jim Meyering wrote:
>>> Torbjorn Granlund wrote:
>>>> Jim Meyering <jim <at> meyering.net> writes:
>>>>
>>>>     How about this in place of the final sentence?
>>>>
>>>>                                                         The new program also
>>>>         runs a deterministic primality test for each prime factor, not just
>>>>         a probabilistic test.
>>>>
>>>> That's better, thanks.
>>>
>>> I pushed the actual bug fix (for the issue mentioned in the Subject)
>>> long ago, so I'm closing this "issue".
>>>
>>> Regarding your upcoming improvements, please start a new thread
>>> when you're ready to discuss them, so that your comments are not
>>> lost in the volume of this now-"done" bug report.
>>
>> A small amendment I'm going to push is not to rely on GMP5.
>> GMP4 on my fedora 15 system doesn't have mpz_inits().
>> i.e. support for initializing multiple variables at once.
>>
>> Patch is simple enough...
>
> Hi Pádraig,
>
> Thanks, but wouldn't that be a slight "pessimization"?
> What do you think about providing the missing function instead?
> Maybe not worth the hassle, but still, it would avoid adding those 12
> in-function lines.  factor.c is already large and complex enough that
> every little bit helps.

Borderline, but to align code with newer libs,
I've done as you suggest in the attached.

> BTW, Fedora 15 passed "end of life" back in June.

Right, so it's not too old.
The same issue will affect debian stable, RHEL, ...

cheers,
Pádraig.

[factor-inits.diff (text/x-patch, attachment)]

Information forwarded to bug-coreutils <at> gnu.org:
bug#12350; Package coreutils. (Mon, 08 Oct 2012 14:54:02 GMT) Full text and rfc822 format available.

Message #229 received at 12350 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Pádraig Brady <P <at> draigBrady.com>
Cc: 12350 <at> debbugs.gnu.org
Subject: Re: bug#12350: Composites identified as primes in factor.c (when
	HAVE_GMP)
Date: Mon, 08 Oct 2012 16:52:44 +0200

Pádraig Brady wrote:
...
>> Thanks, but wouldn't that be a slight "pessimization"?
>> What do you think about providing the missing function instead?
>> Maybe not worth the hassle, but still, it would avoid adding those 12
>> in-function lines.  factor.c is already large and complex enough that
>> every little bit helps.
>
> Borderline, but to align code with newer libs,
> I've done as you suggest in the attached.
...

Thank you!
That looks great.

> Subject: [PATCH] build: support older GMP versions
>
> The new factor code introduced usage of mpz_inits() and
> mpz_clears(), which are only available since GMP >= 5,
> and will result in a compile error when missing.
>
> * m4/gmp.m4 (cu_GMP): Define HAVE_DECL_MPZ_INITS appropriately.
> * src/factor.c (mpz_inits): New function, defined where missing.
> (mpz_clears): Likewise.
> ---
>  m4/gmp.m4    |    2 ++
>  src/factor.c |   23 +++++++++++++++++++++++
>  2 files changed, 25 insertions(+), 0 deletions(-)
>
> diff --git a/m4/gmp.m4 b/m4/gmp.m4
> index e337e16..59a664f 100644
> --- a/m4/gmp.m4
> +++ b/m4/gmp.m4
> @@ -30,6 +30,8 @@ AC_DEFUN([cu_GMP],
>          LIB_GMP=$ac_cv_search___gmpz_init
>          AC_DEFINE([HAVE_GMP], [1],
>            [Define if you have GNU libgmp (or replacement)])
> +        # This only available in GMP >= 5

           # This is available only in GMP >= 5

> +        AC_CHECK_DECLS([mpz_inits], [], [], [[#include <gmp.h>]])
>         }],
>        [AC_MSG_WARN([libgmp development library was not found or not usable.])
>         AC_MSG_WARN([AC_PACKAGE_NAME will be built without GMP support.])])

Information forwarded to bug-coreutils <at> gnu.org:
bug#12350; Package coreutils. (Mon, 08 Oct 2012 22:41:02 GMT) Full text and rfc822 format available.

Message #232 received at 12350 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Jim Meyering <jim <at> meyering.net>
Cc: 12350 <at> debbugs.gnu.org
Subject: Re: bug#12350: Composites identified as primes in factor.c (when
	HAVE_GMP)
Date: Mon, 08 Oct 2012 23:39:25 +0100

On 10/08/2012 03:47 PM, Pádraig Brady wrote:
> diff --git a/src/factor.c b/src/factor.c
> index 5bfbfdc..843542b 100644
> --- a/src/factor.c
> +++ b/src/factor.c
> @@ -526,6 +526,29 @@ factor_insert_large (struct factors *factors,
>   }
>
>   #if HAVE_GMP
> +
> +# if !HAVE_DECL_MPZ_INITS
> +
> +#  define mpz_inits(...) mpz_va_init (mpz_init, __VA_ARGS__)
> +#  define mpz_clears(...) mpz_va_init (mpz_clear, __VA_ARGS__)
> +
> +static void
> +mpz_va_init (void (*mpz_single_init)(mpz_t), mpz_ptr mpz, ...)
> +{
> +  va_list ap;
> +
> +  va_start (ap, mpz);
> +
> +  while (mpz != NULL)
> +    {
> +      mpz_single_init (mpz);
> +      mpz = va_arg (ap, mpz_ptr);
> +    }
> +
> +  va_end (ap);
> +}
> +# endif
> +
>   static void mp_factor (mpz_t, struct mp_factors *);

Actually the above doesn't order the va_arg() call correctly.
Also it uses mpz_ptr which is not kosher it seems:
  http://gmplib.org/list-archives/gmp-discuss/2009-May/003769.html
So I've adjusted to:

#define mpz_inits(...) mpz_va_init (mpz_init, __VA_ARGS__)
#define mpz_clears(...) mpz_va_init (mpz_clear, __VA_ARGS__)

static void
mpz_va_init (void (*mpz_single_init)(mpz_t), ...)
{
  va_list ap;

  va_start (ap, mpz_single_init);

  mpz_t *mpz;
  while ((mpz = va_arg (ap, mpz_t *)))
    mpz_single_init (*mpz);

  va_end (ap);
}
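
The NULL-terminated variadic pattern used above can be tried without GMP installed. The following is a minimal sketch with a hypothetical stand-in type (`struct num`) and hypothetical names (`num_va_apply`, `NUM_END`); none of them come from GMP or factor.c. Casting the sentinel to the pointer type read by va_arg is a choice made for this sketch, so that the terminator has the same type as the other arguments.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical stand-in for a GMP integer; not a GMP type.  */
struct num
{
  int live;                     /* 1 after num_init, 0 after num_clear */
};

static void num_init (struct num *n)  { n->live = 1; }
static void num_clear (struct num *n) { n->live = 0; }

/* Call FN on each 'struct num *' argument, stopping at the first
   null pointer.  The caller must always pass the terminator.  */
static void
num_va_apply (void (*fn) (struct num *), ...)
{
  va_list ap;
  struct num *n;

  assert (fn != NULL);
  va_start (ap, fn);
  while ((n = va_arg (ap, struct num *)) != NULL)
    fn (n);
  va_end (ap);
}

/* Typed terminator, so the sentinel matches what va_arg reads.  */
#define NUM_END ((struct num *) NULL)
#define num_inits(...)  num_va_apply (num_init, __VA_ARGS__)
#define num_clears(...) num_va_apply (num_clear, __VA_ARGS__)
```

A call such as num_inits (&a, &b, NUM_END) initializes both objects in one statement, mirroring how mpz_inits (q, a, nm1, tmp, NULL) is used in the patch.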

cheers,
Pádraig.

Information forwarded to bug-coreutils <at> gnu.org:
bug#12350; Package coreutils. (Tue, 09 Oct 2012 12:28:01 GMT) Full text and rfc822 format available.

Message #235 received at 12350 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Pádraig Brady <P <at> draigBrady.com>
Cc: 12350 <at> debbugs.gnu.org
Subject: Re: bug#12350: Composites identified as primes in factor.c (when
	HAVE_GMP)
Date: Tue, 09 Oct 2012 14:26:55 +0200

Pádraig Brady wrote:
> On 10/08/2012 03:47 PM, Pádraig Brady wrote:
>> diff --git a/src/factor.c b/src/factor.c
>> index 5bfbfdc..843542b 100644
>> --- a/src/factor.c
>> +++ b/src/factor.c
>> @@ -526,6 +526,29 @@ factor_insert_large (struct factors *factors,
>>   }
>>
>>   #if HAVE_GMP
>> +
>> +# if !HAVE_DECL_MPZ_INITS
>> +
>> +#  define mpz_inits(...) mpz_va_init (mpz_init, __VA_ARGS__)
>> +#  define mpz_clears(...) mpz_va_init (mpz_clear, __VA_ARGS__)
>> +
>> +static void
>> +mpz_va_init (void (*mpz_single_init)(mpz_t), mpz_ptr mpz, ...)
>> +{
>> +  va_list ap;
>> +
>> +  va_start (ap, mpz);
>> +
>> +  while (mpz != NULL)
>> +    {
>> +      mpz_single_init (mpz);
>> +      mpz = va_arg (ap, mpz_ptr);
>> +    }
>> +
>> +  va_end (ap);
>> +}
>> +# endif
>> +
>>   static void mp_factor (mpz_t, struct mp_factors *);
>
> Actually the above doesn't order the va_arg() call correctly.
> Also it uses mpz_ptr which is not kosher it seems:
>   http://gmplib.org/list-archives/gmp-discuss/2009-May/003769.html
> So I've adjusted to:
>
> #define mpz_inits(...) mpz_va_init (mpz_init, __VA_ARGS__)
> #define mpz_clears(...) mpz_va_init (mpz_clear, __VA_ARGS__)
>
> static void
> mpz_va_init (void (*mpz_single_init)(mpz_t), ...)
> {
>   va_list ap;
>
>   va_start (ap, mpz_single_init);
>
>   mpz_t *mpz;
>   while ((mpz = va_arg (ap, mpz_t *)))
>     mpz_single_init (*mpz);
>
>   va_end (ap);
> }

Oh!  Were there symptoms?

Ship it :-)

Information forwarded to bug-coreutils <at> gnu.org:
bug#12350; Package coreutils. (Wed, 17 Oct 2012 10:43:02 GMT) Full text and rfc822 format available.

Message #238 received at 12350 <at> debbugs.gnu.org (full text, mbox):

From: Philipp Thomas <pth <at> suse.de>
To: 12350 <at> debbugs.gnu.org
Subject: Re: bug#12350: Composites identified as primes in factor.c (when
	HAVE_GMP)
Date: Wed, 17 Oct 2012 12:40:49 +0200

* Niels Möller (nisse <at> lysator.liu.se) [20120907 11:10]:

> My understanding is that most gnu/linux distributions build coreutils
> without linking to gmp. So lots of users don't get this capability.

At least openSUSE has been building coreutils with gmp for quite some time.

Philipp

Information forwarded to bug-coreutils <at> gnu.org:
bug#12350; Package coreutils. (Wed, 17 Oct 2012 16:07:02 GMT) Full text and rfc822 format available.

Message #241 received at 12350 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Philipp Thomas <pth <at> suse.de>
Cc: 12350 <at> debbugs.gnu.org
Subject: Re: bug#12350: Composites identified as primes in factor.c (when
	HAVE_GMP)
Date: Wed, 17 Oct 2012 18:05:28 +0200

Philipp Thomas wrote:
> * Niels Möller (nisse <at> lysator.liu.se) [20120907 11:10]:
>
>> My understanding is that most gnu/linux distributions build coreutils
>> without linking to gmp. So lots of users don't get this capability.
>
> At least openSUSE has been building coreutils with gmp for quite some time.

So does Fedora.

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 15 Nov 2012 12:24:02 GMT) Full text and rfc822 format available.

This bug report was last modified 12 years and 276 days ago.

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.