From unknown Mon Aug 18 11:26:01 2025
From: Paul Eggert
Organization: UCLA Computer Science Department
Subject: bug#42269: Remove non-GMP code from coreutils factor.c
To: 42269@debbugs.gnu.org
Cc: James Youngman, Pádraig Brady, Torbjörn Granlund, Jim Meyering
X-Debbugs-Original-To: Coreutils bugs
X-GNU-PR-Package: coreutils
Date: Wed, 8 Jul 2020 09:25:57 -0700
Message-ID: <7c08ef70-bb82-2b7b-0d39-18bbae70afdd@cs.ucla.edu>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="------------FAA07B69FA7D344621580E10"

This is a multi-part message in MIME format.
--------------FAA07B69FA7D344621580E10
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit

I recently modified GNU coreutils so that it can assume GMP, possibly by compiling and linking mini-gmp.c. This helps simplify the coreutils source code and makes coreutils behavior more portable.

In doing so, I noticed that factor.c has special-purpose code to factor integers up to 127 bits. Although this code added functionality when coreutils could not assume GMP, it's no longer needed for that. And although it runs faster than the GMP code does, while doing the recent surgery on factor.c I began to wonder whether the hassle of maintaining the code outweighed its usefulness. So I wrote up the attached patch, which simply removes the non-GMP code and simplifies factor.c quite a bit.

I assume the attached patch will hurt performance significantly in some cases for 127-bit numbers, so I did not install it. Perhaps it would be better to keep the non-GMP algorithm and recode it with GMP. Or perhaps it would be better to leave the factor.c code alone. Comments?

--------------FAA07B69FA7D344621580E10
Content-Type: text/x-patch; charset=UTF-8;
 name="0001-factor-simplify-by-assuming-libgmp.patch"
Content-Disposition: attachment;
 filename="0001-factor-simplify-by-assuming-libgmp.patch"
Content-Transfer-Encoding: quoted-printable

>From 51def5ff599a25dea16bbb1a76ca061b5e41c5a6 Mon Sep 17 00:00:00 2001
From: Paul Eggert
Date: Tue, 7 Jul 2020 14:41:56 -0700
Subject: [PATCH] factor: simplify by assuming libgmp

Since coreutils now always has libgmp available, if only via
compiling mini-gmp.c, there's not a big need for special-purpose
code for factoring 128-bit integers. On platforms that care about
performance libgmp will be linked in, and on other platforms
mini-gmp.c is good enough.
Although there is a performance advantage on the special-purpose
code for factoring 128-bit integers, it's not big enough to justify
the software engineering hassle of maintaining it.

* cfg.mk (_ll): Remove.
(exclude_file_name_regexp--sc_useless_cpp_parens)
(exclude_file_name_regexp--sc_space_before_open_paren)
(exclude_file_name_regexp--sc_preprocessor_indentation)
(exclude_file_name_regexp--sc_ensure_comma_after_id_est)
(exclude_file_name_regexp--sc_long_lines): Omit mention of $(_ll).
* src/factor.c (USE_SQUFOF, STAT_SQUFOF, USE_LONGLONG_H)
(W_TYPE_SIZE, UWtype, UHWtype, UQItype, SItype, USItype, DItype)
(UDItype, UQItype, SItype, USItype, DItype, UDItype)
(LONGLONG_STANDALONE, ASSERT, __GMP_DECLSPEC, __clz_tab)
(__GMP_GNUC_PREREQ, HAVE_HOST_CPU_FAMILY_powerpc)
(factor_clz_tab, __ll_B, __ll_lowpart, __ll_highpart)
(MAX_NFACTS, struct factors, umul_ppmm, udiv_qrnnd, add_ssaaaa)
(rsh2, lsh2, ge2, gt2, sub_ddmmss, count_leading_zeros)
(count_trailing_zeros, submod, addmod, addmod2, submod2)
(HIGHBIT_TO_MASK, mod2, gcd_odd, gcd2_odd)
(factor_insert_multiplicity, factor_insert, factor_insert_large)
(primes_diff8, struct primes_dtab, primes_dtab)
(factor_insert_refind, factor_using_division, DIVBLOCK)
(binvert_table, binv, divexact_21, redcify, redcify2, mulredc)
(mulredc2, powm, powm2, millerrabin, millerrabin2, prime_p)
(prime2_p, factor_using_pollard_rho, factor_using_pollard_rho2)
(isqrt, MAGIC64, MAGIC63, MAGIC65, MAGIC11, is_square, invtab)
(div_smallq, QUEUE_SIZE, Q_FREQ_SIZE, q_freq, MIN)
(factor_using_squfof, factor, strto2uintmax, struct lbuf_, lbuf)
(lbuf_alloc, lbuf_flush, lbuf_putc, lbuf_putint)
(print_uintmaxes, print_factors_single): Remove; no longer used.
(print_factors, main): Simplify by assuming libgmp.
* src/local.mk (noinst_HEADERS): Remove src/longlong.h.
* src/longlong.h: Remove.
---
 cfg.mk         |   13 +-
 src/factor.c   | 2070 +------------------------------------------
 src/local.mk   |    1 -
 src/longlong.h | 2267 ------------------------------------------------
 4 files changed, 45 insertions(+), 4306 deletions(-)
 delete mode 100644 src/longlong.h

diff --git a/cfg.mk b/cfg.mk
index d352aac94..b89f2e9cd 100644
--- a/cfg.mk
+++ b/cfg.mk
@@ -855,19 +855,14 @@ exclude_file_name_regexp--sc_prohibit_fail_0 =3D \
 exclude_file_name_regexp--sc_prohibit_test_minus_ao =3D *\.texi$$
 exclude_file_name_regexp--sc_prohibit_atoi_atof =3D ^lib/euidaccess-stat=
\.c$$
=20
-# longlong.h is maintained elsewhere.
-_ll =3D ^src/longlong\.h$$
-exclude_file_name_regexp--sc_useless_cpp_parens =3D $(_ll)
-exclude_file_name_regexp--sc_space_before_open_paren =3D $(_ll)
-
 tbi_1 =3D ^tests/pr/|(\.mk|^man/help2man)$$
 tbi_2 =3D ^scripts/git-hooks/(pre-commit|pre-applypatch|applypatch-msg)$=
$
-tbi_3 =3D (GNU)?[Mm]akefile(\.am)?$$|$(_ll)
+tbi_3 =3D (GNU)?[Mm]akefile(\.am)?$$
 exclude_file_name_regexp--sc_prohibit_tab_based_indentation =3D \
   $(tbi_1)|$(tbi_2)|$(tbi_3)
=20
 exclude_file_name_regexp--sc_preprocessor_indentation =3D \
-  ^(gl/lib/rand-isaac\.[ch]|gl/tests/test-rand-isaac\.c)$$|$(_ll)
+  ^(gl/lib/rand-isaac\.[ch]|gl/tests/test-rand-isaac\.c)$$
 exclude_file_name_regexp--sc_prohibit_stat_st_blocks =3D \
   ^(src/system\.h|tests/du/2g\.sh)$$
=20
@@ -889,8 +884,8 @@ exclude_file_name_regexp--sc_prohibit-gl-attributes =3D=
 ^src/libstdbuf\.c$$
=20
 exclude_file_name_regexp--sc_prohibit_uppercase_id_est =3D \.diff$$
 exclude_file_name_regexp--sc_ensure_dblspace_after_dot_before_id_est =3D=
 \.diff$$
-exclude_file_name_regexp--sc_ensure_comma_after_id_est =3D \.diff|$(_ll)=
$$
-exclude_file_name_regexp--sc_long_lines =3D \.diff$$|$(_ll)
+exclude_file_name_regexp--sc_ensure_comma_after_id_est =3D \.diff$$
+exclude_file_name_regexp--sc_long_lines =3D \.diff$$
=20
 # Augment AM_CFLAGS to include our per-directory options:
 AM_CFLAGS +=3D $($(@D)_CFLAGS)
diff --git a/src/factor.c b/src/factor.c
index c1c35a562..2b4b419d7 100644 --- a/src/factor.c +++ b/src/factor.c @@ -15,36 +15,20 @@ along with this program. If not, see = . */ =20 /* Originally written by Paul Rubin . - Adapted for GNU, fixed to factor UINT_MAX by Jim Meyering. + Adapted for GNU by Jim Meyering. Arbitrary-precision code adapted by James Youngman from Torbj=C3=B6rn Granlund's factorize.c, from GNU MP version 4.2.2. In 2012, the core was rewritten by Torbj=C3=B6rn Granlund and Niels M= =C3=B6ller. Contains code from GNU MP. */ =20 -/* Efficiently factor numbers that fit in one or two words (word =3D uin= tmax_t), - or, with GMP, numbers of any size. - - Code organisation: - - There are several variants of many functions, for handling one word,= two - words, and GMP's mpz_t type. If the one-word variant is called foo,= the - two-word variant will be foo2, and the one for mpz_t will be mp_foo.= In - some cases, the plain function variants will handle both one-word an= d - two-word numbers, evidenced by function arguments. - - The factoring code for two words will fall into the code for one wor= d when - progress allows that. +/* Efficiently factor numbers of any size. =20 Algorithm: =20 - (1) Perform trial division using a small primes table, but without h= ardware - division since the primes table store inverses modulo the word b= ase. - (The GMP variant of this code doesn't make use of the precompute= d - inverses, but instead relies on GMP for fast divisibility testin= g.) + (1) Perform trial division. (2) Check the nature of any non-factored part using Miller-Rabin for detecting composites, and Lucas for detecting primes. - (3) Factor any remaining composite part using the Pollard-Brent rho - algorithm or if USE_SQUFOF is defined to 1, try that first. + (3) Factor any remaining composite part using Pollard-Brent rho. Status of found factors are checked again using Miller-Rabin and= Lucas. 
=20 We prefer using Hensel norm in the divisions, not the more familiar @@ -59,23 +43,9 @@ elsewhere. A problem is to locate the inverses not from an index,= but from a prime. We might instead compute the inverse on-the-fly. =20 - * Tune trial division table size (not forgetting that this is a stan= dalone - program where the table will be read from disk for each invocation= ). - * Implement less naive powm, using k-ary exponentiation for k =3D 3 = or perhaps k =3D 4. =20 - * Try to speed trial division code for single uintmax_t numbers, i.e= ., the - code using DIVBLOCK. It currently runs at 2 cycles per prime (Int= el SBR, - IBR), 3 cycles per prime (AMD Stars) and 5 cycles per prime (AMD B= D) when - using gcc 4.6 and 4.7. Some software pipelining should help; 1, 2= , and 4 - respectively cycles ought to be possible. - - * The redcify function could be vastly improved by using (plain Eucl= idian) - pre-inversion (such as GMP's invert_limb) and udiv_qrnnd_preinv (f= rom - GMP's gmp-impl.h). The redcify2 function could be vastly improved= using - similar methoods. These functions currently dominate run time whe= n using - the -w option. */ =20 /* Whether to recursively factor to prove primality, @@ -84,17 +54,6 @@ # define PROVE_PRIMALITY 1 #endif =20 -/* Faster for certain ranges but less general. */ -#ifndef USE_SQUFOF -# define USE_SQUFOF 0 -#endif - -/* Output SQUFOF statistics. */ -#ifndef STAT_SQUFOF -# define STAT_SQUFOF 0 -#endif - - #include #include #include @@ -120,99 +79,6 @@ /* Token delimiters when reading from a file. */ #define DELIM "\n\t " =20 -#ifndef USE_LONGLONG_H -/* With the way we use longlong.h, it's only safe to use - when UWtype =3D UHWtype, as there were various cases - (as can be seen in the history for longlong.h) where - for example, _LP64 was required to enable W_TYPE_SIZE=3D=3D64 code, - to avoid compile time or run time issues. 
*/ -# if LONG_MAX =3D=3D INTMAX_MAX -# define USE_LONGLONG_H 1 -# endif -#endif - -#if USE_LONGLONG_H - -/* Make definitions for longlong.h to make it do what it can do for us *= / - -/* bitcount for uintmax_t */ -# if UINTMAX_MAX =3D=3D UINT32_MAX -# define W_TYPE_SIZE 32 -# elif UINTMAX_MAX =3D=3D UINT64_MAX -# define W_TYPE_SIZE 64 -# elif UINTMAX_MAX =3D=3D UINT128_MAX -# define W_TYPE_SIZE 128 -# endif - -# define UWtype uintmax_t -# define UHWtype unsigned long int -# undef UDWtype -# if HAVE_ATTRIBUTE_MODE -typedef unsigned int UQItype __attribute__ ((mode (QI))); -typedef int SItype __attribute__ ((mode (SI))); -typedef unsigned int USItype __attribute__ ((mode (SI))); -typedef int DItype __attribute__ ((mode (DI))); -typedef unsigned int UDItype __attribute__ ((mode (DI))); -# else -typedef unsigned char UQItype; -typedef long SItype; -typedef unsigned long int USItype; -# if HAVE_LONG_LONG_INT -typedef long long int DItype; -typedef unsigned long long int UDItype; -# else /* Assume `long' gives us a wide enough type. Needed for hppa2.= 0w. */ -typedef long int DItype; -typedef unsigned long int UDItype; -# endif -# endif -# define LONGLONG_STANDALONE /* Don't require GMP's longlong.h mdep = files */ -# define ASSERT(x) /* FIXME make longlong.h really standal= one */ -# define __GMP_DECLSPEC /* FIXME make longlong.h really standal= one */ -# define __clz_tab factor_clz_tab /* Rename to avoid glibc collision */ -# ifndef __GMP_GNUC_PREREQ -# define __GMP_GNUC_PREREQ(a,b) 1 -# endif - -/* These stub macros are only used in longlong.h in certain system compi= ler - combinations, so ensure usage to avoid -Wunused-macros warnings. 
*/ -# if __GMP_GNUC_PREREQ (1,1) && defined __clz_tab -ASSERT (1) -__GMP_DECLSPEC -# endif - -# if _ARCH_PPC -# define HAVE_HOST_CPU_FAMILY_powerpc 1 -# endif -# include "longlong.h" -# ifdef COUNT_LEADING_ZEROS_NEED_CLZ_TAB -const unsigned char factor_clz_tab[129] =3D -{ - 1,2,3,3,4,4,4,4,5,5,5,5,5,5,5,5,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6, - 7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7, - 8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8, - 8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8, - 9 -}; -# endif - -#else /* not USE_LONGLONG_H */ - -# define W_TYPE_SIZE (8 * sizeof (uintmax_t)) -# define __ll_B ((uintmax_t) 1 << (W_TYPE_SIZE / 2)) -# define __ll_lowpart(t) ((uintmax_t) (t) & (__ll_B - 1)) -# define __ll_highpart(t) ((uintmax_t) (t) >> (W_TYPE_SIZE / 2)) - -#endif - -#if !defined __clz_tab && !defined UHWtype -/* Without this seemingly useless conditional, gcc -Wunused-macros - warns that each of the two tested macros is unused on Fedora 18. - FIXME: this is just an ugly band-aid. Fix it properly. 
*/ -#endif - -/* 2*3*5*7*11...*101 is 128 bits, and has 26 prime factors */ -#define MAX_NFACTS 26 - enum { DEV_DEBUG_OPTION =3D CHAR_MAX + 1 @@ -226,14 +92,6 @@ static struct option const long_options[] =3D {NULL, 0, NULL, 0} }; =20 -struct factors -{ - uintmax_t plarge[2]; /* Can have a single large factor */ - uintmax_t p[MAX_NFACTS]; - unsigned char e[MAX_NFACTS]; - unsigned char nfactors; -}; - struct mp_factors { mpz_t *p; @@ -241,321 +99,6 @@ struct mp_factors unsigned long int nfactors; }; =20 -static void factor (uintmax_t, uintmax_t, struct factors *); - -#ifndef umul_ppmm -# define umul_ppmm(w1, w0, u, v) = \ - do { = \ - uintmax_t __x0, __x1, __x2, __x3; = \ - unsigned long int __ul, __vl, __uh, __vh; = \ - uintmax_t __u =3D (u), __v =3D (v); = \ - = \ - __ul =3D __ll_lowpart (__u); = \ - __uh =3D __ll_highpart (__u); = \ - __vl =3D __ll_lowpart (__v); = \ - __vh =3D __ll_highpart (__v); = \ - = \ - __x0 =3D (uintmax_t) __ul * __vl; = \ - __x1 =3D (uintmax_t) __ul * __vh; = \ - __x2 =3D (uintmax_t) __uh * __vl; = \ - __x3 =3D (uintmax_t) __uh * __vh; = \ - = \ - __x1 +=3D __ll_highpart (__x0);/* this can't give carry */ = \ - __x1 +=3D __x2; /* but this indeed can */ = \ - if (__x1 < __x2) /* did we get it? */ = \ - __x3 +=3D __ll_B; /* yes, add it in the proper pos. */ = \ - = \ - (w1) =3D __x3 + __ll_highpart (__x1); = \ - (w0) =3D (__x1 << W_TYPE_SIZE / 2) + __ll_lowpart (__x0); = \ - } while (0) -#endif - -#if !defined udiv_qrnnd || defined UDIV_NEEDS_NORMALIZATION -/* Define our own, not needing normalization. This function is - currently not performance critical, so keep it simple. Similar to - the mod macro below. 
*/ -# undef udiv_qrnnd -# define udiv_qrnnd(q, r, n1, n0, d) = \ - do { = \ - uintmax_t __d1, __d0, __q, __r1, __r0; = \ - = \ - assert ((n1) < (d)); = \ - __d1 =3D (d); __d0 =3D 0; = \ - __r1 =3D (n1); __r0 =3D (n0); = \ - __q =3D 0; = \ - for (unsigned int __i =3D W_TYPE_SIZE; __i > 0; __i--) = \ - { = \ - rsh2 (__d1, __d0, __d1, __d0, 1); = \ - __q <<=3D 1; = \ - if (ge2 (__r1, __r0, __d1, __d0)) = \ - { = \ - __q++; = \ - sub_ddmmss (__r1, __r0, __r1, __r0, __d1, __d0); = \ - } = \ - } = \ - (r) =3D __r0; = \ - (q) =3D __q; = \ - } while (0) -#endif - -#if !defined add_ssaaaa -# define add_ssaaaa(sh, sl, ah, al, bh, bl) = \ - do { = \ - uintmax_t _add_x; = \ - _add_x =3D (al) + (bl); = \ - (sh) =3D (ah) + (bh) + (_add_x < (al)); = \ - (sl) =3D _add_x; = \ - } while (0) -#endif - -#define rsh2(rh, rl, ah, al, cnt) = \ - do { = \ - (rl) =3D ((ah) << (W_TYPE_SIZE - (cnt))) | ((al) >> (cnt)); = \ - (rh) =3D (ah) >> (cnt); = \ - } while (0) - -#define lsh2(rh, rl, ah, al, cnt) = \ - do { = \ - (rh) =3D ((ah) << cnt) | ((al) >> (W_TYPE_SIZE - (cnt))); = \ - (rl) =3D (al) << (cnt); = \ - } while (0) - -#define ge2(ah, al, bh, bl) = \ - ((ah) > (bh) || ((ah) =3D=3D (bh) && (al) >=3D (bl))) - -#define gt2(ah, al, bh, bl) = \ - ((ah) > (bh) || ((ah) =3D=3D (bh) && (al) > (bl))) - -#ifndef sub_ddmmss -# define sub_ddmmss(rh, rl, ah, al, bh, bl) = \ - do { = \ - uintmax_t _cy; = \ - _cy =3D (al) < (bl); = \ - (rl) =3D (al) - (bl); = \ - (rh) =3D (ah) - (bh) - _cy; = \ - } while (0) -#endif - -#ifndef count_leading_zeros -# define count_leading_zeros(count, x) do { = \ - uintmax_t __clz_x =3D (x); = \ - unsigned int __clz_c; = \ - for (__clz_c =3D 0; = \ - (__clz_x & ((uintmax_t) 0xff << (W_TYPE_SIZE - 8))) =3D=3D 0; = \ - __clz_c +=3D 8) = \ - __clz_x <<=3D 8; = \ - for (; (intmax_t)__clz_x >=3D 0; __clz_c++) = \ - __clz_x <<=3D 1; = \ - (count) =3D __clz_c; = \ - } while (0) -#endif - -#ifndef count_trailing_zeros -# define count_trailing_zeros(count, x) do { = \ - 
uintmax_t __ctz_x =3D (x); = \ - unsigned int __ctz_c =3D 0; = \ - while ((__ctz_x & 1) =3D=3D 0) = \ - { = \ - __ctz_x >>=3D 1; = \ - __ctz_c++; = \ - } = \ - (count) =3D __ctz_c; = \ - } while (0) -#endif - -/* Requires that a < n and b <=3D n */ -#define submod(r,a,b,n) = \ - do { = \ - uintmax_t _t =3D - (uintmax_t) (a < b); = \ - (r) =3D ((n) & _t) + (a) - (b); = \ - } while (0) - -#define addmod(r,a,b,n) = \ - submod ((r), (a), ((n) - (b)), (n)) - -/* Modular two-word addition and subtraction. For performance reasons, = the - most significant bit of n1 must be clear. The destination variables = must be - distinct from the mod operand. */ -#define addmod2(r1, r0, a1, a0, b1, b0, n1, n0) = \ - do { = \ - add_ssaaaa ((r1), (r0), (a1), (a0), (b1), (b0)); = \ - if (ge2 ((r1), (r0), (n1), (n0))) = \ - sub_ddmmss ((r1), (r0), (r1), (r0), (n1), (n0)); = \ - } while (0) -#define submod2(r1, r0, a1, a0, b1, b0, n1, n0) = \ - do { = \ - sub_ddmmss ((r1), (r0), (a1), (a0), (b1), (b0)); = \ - if ((intmax_t) (r1) < 0) = \ - add_ssaaaa ((r1), (r0), (r1), (r0), (n1), (n0)); = \ - } while (0) - -#define HIGHBIT_TO_MASK(x) = \ - (((intmax_t)-1 >> 1) < 0 = \ - ? (uintmax_t)((intmax_t)(x) >> (W_TYPE_SIZE - 1)) = \ - : ((x) & ((uintmax_t) 1 << (W_TYPE_SIZE - 1)) = \ - ? UINTMAX_MAX : (uintmax_t) 0)) - -/* Compute r =3D a mod d, where r =3D <*t1,retval>, a =3D , d =3D= . - Requires that d1 !=3D 0. 
*/ -static uintmax_t -mod2 (uintmax_t *r1, uintmax_t a1, uintmax_t a0, uintmax_t d1, uintmax_t= d0) -{ - int cntd, cnta; - - assert (d1 !=3D 0); - - if (a1 =3D=3D 0) - { - *r1 =3D 0; - return a0; - } - - count_leading_zeros (cntd, d1); - count_leading_zeros (cnta, a1); - int cnt =3D cntd - cnta; - lsh2 (d1, d0, d1, d0, cnt); - for (int i =3D 0; i < cnt; i++) - { - if (ge2 (a1, a0, d1, d0)) - sub_ddmmss (a1, a0, a1, a0, d1, d0); - rsh2 (d1, d0, d1, d0, 1); - } - - *r1 =3D a1; - return a0; -} - -static uintmax_t _GL_ATTRIBUTE_CONST -gcd_odd (uintmax_t a, uintmax_t b) -{ - if ( (b & 1) =3D=3D 0) - { - uintmax_t t =3D b; - b =3D a; - a =3D t; - } - if (a =3D=3D 0) - return b; - - /* Take out least significant one bit, to make room for sign */ - b >>=3D 1; - - for (;;) - { - uintmax_t t; - uintmax_t bgta; - - while ((a & 1) =3D=3D 0) - a >>=3D 1; - a >>=3D 1; - - t =3D a - b; - if (t =3D=3D 0) - return (a << 1) + 1; - - bgta =3D HIGHBIT_TO_MASK (t); - - /* b <-- min (a, b) */ - b +=3D (bgta & t); - - /* a <-- |a - b| */ - a =3D (t ^ bgta) - bgta; - } -} - -static uintmax_t -gcd2_odd (uintmax_t *r1, uintmax_t a1, uintmax_t a0, uintmax_t b1, uintm= ax_t b0) -{ - assert (b0 & 1); - - if ( (a0 | a1) =3D=3D 0) - { - *r1 =3D b1; - return b0; - } - - while ((a0 & 1) =3D=3D 0) - rsh2 (a1, a0, a1, a0, 1); - - for (;;) - { - if ((b1 | a1) =3D=3D 0) - { - *r1 =3D 0; - return gcd_odd (b0, a0); - } - - if (gt2 (a1, a0, b1, b0)) - { - sub_ddmmss (a1, a0, a1, a0, b1, b0); - do - rsh2 (a1, a0, a1, a0, 1); - while ((a0 & 1) =3D=3D 0); - } - else if (gt2 (b1, b0, a1, a0)) - { - sub_ddmmss (b1, b0, b1, b0, a1, a0); - do - rsh2 (b1, b0, b1, b0, 1); - while ((b0 & 1) =3D=3D 0); - } - else - break; - } - - *r1 =3D a1; - return a0; -} - -static void -factor_insert_multiplicity (struct factors *factors, - uintmax_t prime, unsigned int m) -{ - unsigned int nfactors =3D factors->nfactors; - uintmax_t *p =3D factors->p; - unsigned char *e =3D factors->e; - - /* Locate position for insert new or 
increment e. */ - int i; - for (i =3D nfactors - 1; i >=3D 0; i--) - { - if (p[i] <=3D prime) - break; - } - - if (i < 0 || p[i] !=3D prime) - { - for (int j =3D nfactors - 1; j > i; j--) - { - p[j + 1] =3D p[j]; - e[j + 1] =3D e[j]; - } - p[i + 1] =3D prime; - e[i + 1] =3D m; - factors->nfactors =3D nfactors + 1; - } - else - { - e[i] +=3D m; - } -} - -#define factor_insert(f, p) factor_insert_multiplicity (f, p, 1) - -static void -factor_insert_large (struct factors *factors, - uintmax_t p1, uintmax_t p0) -{ - if (p1 > 0) - { - assert (factors->plarge[1] =3D=3D 0); - factors->plarge[0] =3D p0; - factors->plarge[1] =3D p1; - } - else - factor_insert (factors, p0); -} - #ifndef mpz_inits =20 # include @@ -664,31 +207,11 @@ static const unsigned char primes_diff[] =3D { #define PRIMES_PTAB_ENTRIES \ (sizeof (primes_diff) / sizeof (primes_diff[0]) - 8 + 1) =20 -#define P(a,b,c,d) b, -static const unsigned char primes_diff8[] =3D { -#include "primes.h" -0,0,0,0,0,0,0 /* 7 sentinels for 8-way loop */ -}; -#undef P - -struct primes_dtab -{ - uintmax_t binv, lim; -}; - -#define P(a,b,c,d) {c,d}, -static const struct primes_dtab primes_dtab[] =3D { -#include "primes.h" -{1,0},{1,0},{1,0},{1,0},{1,0},{1,0},{1,0} /* 7 sentinels for 8-way loop = */ -}; -#undef P - /* Verify that uintmax_t is not wider than the integers used to generate primes.h. */ verify (W <=3D WIDE_UINT_BITS); =20 -/* debugging for developers. Enables devmsg(). - This flag is used only in the GMP code. */ +/* debugging for developers. Enables devmsg(). */ static bool dev_debug =3D false; =20 /* Prove primality or run probabilistic tests. */ @@ -697,15 +220,6 @@ static bool flag_prove_primality =3D PROVE_PRIMALITY= ; /* Number of Miller-Rabin tests to run when not proving primality. 
*/ #define MR_REPS 25 =20 -static void -factor_insert_refind (struct factors *factors, uintmax_t p, unsigned int= i, - unsigned int off) -{ - for (unsigned int j =3D 0; j < off; j++) - p +=3D primes_diff[i + j]; - factor_insert (factors, p); -} - /* Trial division with odd primes uses the following trick. =20 Let p be an odd prime, and B =3D 2^{W_TYPE_SIZE}. For simplicity, @@ -739,87 +253,6 @@ factor_insert_refind (struct factors *factors, uintm= ax_t p, unsigned int i, order, and the non-multiples of p onto the range lim < q < B. */ =20 -static uintmax_t -factor_using_division (uintmax_t *t1p, uintmax_t t1, uintmax_t t0, - struct factors *factors) -{ - if (t0 % 2 =3D=3D 0) - { - unsigned int cnt; - - if (t0 =3D=3D 0) - { - count_trailing_zeros (cnt, t1); - t0 =3D t1 >> cnt; - t1 =3D 0; - cnt +=3D W_TYPE_SIZE; - } - else - { - count_trailing_zeros (cnt, t0); - rsh2 (t1, t0, t1, t0, cnt); - } - - factor_insert_multiplicity (factors, 2, cnt); - } - - uintmax_t p =3D 3; - unsigned int i; - for (i =3D 0; t1 > 0 && i < PRIMES_PTAB_ENTRIES; i++) - { - for (;;) - { - uintmax_t q1, q0, hi, lo _GL_UNUSED; - - q0 =3D t0 * primes_dtab[i].binv; - umul_ppmm (hi, lo, q0, p); - if (hi > t1) - break; - hi =3D t1 - hi; - q1 =3D hi * primes_dtab[i].binv; - if (LIKELY (q1 > primes_dtab[i].lim)) - break; - t1 =3D q1; t0 =3D q0; - factor_insert (factors, p); - } - p +=3D primes_diff[i + 1]; - } - if (t1p) - *t1p =3D t1; - -#define DIVBLOCK(I) = \ - do { = \ - for (;;) = \ - { = \ - q =3D t0 * pd[I].binv; = \ - if (LIKELY (q > pd[I].lim)) = \ - break; = \ - t0 =3D q; = \ - factor_insert_refind (factors, p, i + 1, I); = \ - } = \ - } while (0) - - for (; i < PRIMES_PTAB_ENTRIES; i +=3D 8) - { - uintmax_t q; - const struct primes_dtab *pd =3D &primes_dtab[i]; - DIVBLOCK (0); - DIVBLOCK (1); - DIVBLOCK (2); - DIVBLOCK (3); - DIVBLOCK (4); - DIVBLOCK (5); - DIVBLOCK (6); - DIVBLOCK (7); - - p +=3D primes_diff8[i]; - if (p * p > t0) - break; - } - - return t0; -} - static void 
mp_factor_using_division (mpz_t t, struct mp_factors *factors) { @@ -857,303 +290,6 @@ mp_factor_using_division (mpz_t t, struct mp_factor= s *factors) mpz_clear (q); } =20 -/* Entry i contains (2i+1)^(-1) mod 2^8. */ -static const unsigned char binvert_table[128] =3D -{ - 0x01, 0xAB, 0xCD, 0xB7, 0x39, 0xA3, 0xC5, 0xEF, - 0xF1, 0x1B, 0x3D, 0xA7, 0x29, 0x13, 0x35, 0xDF, - 0xE1, 0x8B, 0xAD, 0x97, 0x19, 0x83, 0xA5, 0xCF, - 0xD1, 0xFB, 0x1D, 0x87, 0x09, 0xF3, 0x15, 0xBF, - 0xC1, 0x6B, 0x8D, 0x77, 0xF9, 0x63, 0x85, 0xAF, - 0xB1, 0xDB, 0xFD, 0x67, 0xE9, 0xD3, 0xF5, 0x9F, - 0xA1, 0x4B, 0x6D, 0x57, 0xD9, 0x43, 0x65, 0x8F, - 0x91, 0xBB, 0xDD, 0x47, 0xC9, 0xB3, 0xD5, 0x7F, - 0x81, 0x2B, 0x4D, 0x37, 0xB9, 0x23, 0x45, 0x6F, - 0x71, 0x9B, 0xBD, 0x27, 0xA9, 0x93, 0xB5, 0x5F, - 0x61, 0x0B, 0x2D, 0x17, 0x99, 0x03, 0x25, 0x4F, - 0x51, 0x7B, 0x9D, 0x07, 0x89, 0x73, 0x95, 0x3F, - 0x41, 0xEB, 0x0D, 0xF7, 0x79, 0xE3, 0x05, 0x2F, - 0x31, 0x5B, 0x7D, 0xE7, 0x69, 0x53, 0x75, 0x1F, - 0x21, 0xCB, 0xED, 0xD7, 0x59, 0xC3, 0xE5, 0x0F, - 0x11, 0x3B, 0x5D, 0xC7, 0x49, 0x33, 0x55, 0xFF -}; - -/* Compute n^(-1) mod B, using a Newton iteration. */ -#define binv(inv,n) = \ - do { = \ - uintmax_t __n =3D (n); = \ - uintmax_t __inv; = \ - = \ - __inv =3D binvert_table[(__n / 2) & 0x7F]; /* 8 */ = \ - if (W_TYPE_SIZE > 8) __inv =3D 2 * __inv - __inv * __inv * __n; = \ - if (W_TYPE_SIZE > 16) __inv =3D 2 * __inv - __inv * __inv * __n; = \ - if (W_TYPE_SIZE > 32) __inv =3D 2 * __inv - __inv * __inv * __n; = \ - = \ - if (W_TYPE_SIZE > 64) = \ - { = \ - int __invbits =3D 64; = \ - do { = \ - __inv =3D 2 * __inv - __inv * __inv * __n; = \ - __invbits *=3D 2; = \ - } while (__invbits < W_TYPE_SIZE); = \ - } = \ - = \ - (inv) =3D __inv; = \ - } while (0) - -/* q =3D u / d, assuming d|u. 
*/ -#define divexact_21(q1, q0, u1, u0, d) = \ - do { = \ - uintmax_t _di, _q0; = \ - binv (_di, (d)); = \ - _q0 =3D (u0) * _di; = \ - if ((u1) >=3D (d)) = \ - { = \ - uintmax_t _p1, _p0 _GL_UNUSED; \ - umul_ppmm (_p1, _p0, _q0, d); = \ - (q1) =3D ((u1) - _p1) * _di; = \ - (q0) =3D _q0; = \ - } = \ - else = \ - { = \ - (q0) =3D _q0; = \ - (q1) =3D 0; = \ - } = \ - } while (0) - -/* x B (mod n). */ -#define redcify(r_prim, r, n) = \ - do { = \ - uintmax_t _redcify_q _GL_UNUSED; \ - udiv_qrnnd (_redcify_q, r_prim, r, 0, n); = \ - } while (0) - -/* x B^2 (mod n). Requires x > 0, n1 < B/2 */ -#define redcify2(r1, r0, x, n1, n0) = \ - do { = \ - uintmax_t _r1, _r0, _i; = \ - if ((x) < (n1)) = \ - { = \ - _r1 =3D (x); _r0 =3D 0; = \ - _i =3D W_TYPE_SIZE; = \ - } = \ - else = \ - { = \ - _r1 =3D 0; _r0 =3D (x); = \ - _i =3D 2*W_TYPE_SIZE; = \ - } = \ - while (_i-- > 0) = \ - { = \ - lsh2 (_r1, _r0, _r1, _r0, 1); = \ - if (ge2 (_r1, _r0, (n1), (n0))) = \ - sub_ddmmss (_r1, _r0, _r1, _r0, (n1), (n0)); = \ - } = \ - (r1) =3D _r1; = \ - (r0) =3D _r0; = \ - } while (0) - -/* Modular two-word multiplication, r =3D a * b mod m, with mi =3D m^(-1= ) mod B. - Both a and b must be in redc form, the result will be in redc form to= o. */ -static inline uintmax_t -mulredc (uintmax_t a, uintmax_t b, uintmax_t m, uintmax_t mi) -{ - uintmax_t rh, rl, q, th, tl _GL_UNUSED, xh; - - umul_ppmm (rh, rl, a, b); - q =3D rl * mi; - umul_ppmm (th, tl, q, m); - xh =3D rh - th; - if (rh < th) - xh +=3D m; - - return xh; -} - -/* Modular two-word multiplication, r =3D a * b mod m, with mi =3D m^(-1= ) mod B. - Both a and b must be in redc form, the result will be in redc form to= o. 
- For performance reasons, the most significant bit of m must be clear.= */ -static uintmax_t -mulredc2 (uintmax_t *r1p, - uintmax_t a1, uintmax_t a0, uintmax_t b1, uintmax_t b0, - uintmax_t m1, uintmax_t m0, uintmax_t mi) -{ - uintmax_t r1, r0, q, p1, p0 _GL_UNUSED, t1, t0, s1, s0; - mi =3D -mi; - assert ( (a1 >> (W_TYPE_SIZE - 1)) =3D=3D 0); - assert ( (b1 >> (W_TYPE_SIZE - 1)) =3D=3D 0); - assert ( (m1 >> (W_TYPE_SIZE - 1)) =3D=3D 0); - - /* First compute a0 * B^{-1} - +-----+ - |a0 b0| - +--+--+--+ - |a0 b1| - +--+--+--+ - |q0 m0| - +--+--+--+ - |q0 m1| - -+--+--+--+ - |r1|r0| 0| - +--+--+--+ - */ - umul_ppmm (t1, t0, a0, b0); - umul_ppmm (r1, r0, a0, b1); - q =3D mi * t0; - umul_ppmm (p1, p0, q, m0); - umul_ppmm (s1, s0, q, m1); - r0 +=3D (t0 !=3D 0); /* Carry */ - add_ssaaaa (r1, r0, r1, r0, 0, p1); - add_ssaaaa (r1, r0, r1, r0, 0, t1); - add_ssaaaa (r1, r0, r1, r0, s1, s0); - - /* Next, (a1 * + B^{-1} - +-----+ - |a1 b0| - +--+--+ - |r1|r0| - +--+--+--+ - |a1 b1| - +--+--+--+ - |q1 m0| - +--+--+--+ - |q1 m1| - -+--+--+--+ - |r1|r0| 0| - +--+--+--+ - */ - umul_ppmm (t1, t0, a1, b0); - umul_ppmm (s1, s0, a1, b1); - add_ssaaaa (t1, t0, t1, t0, 0, r0); - q =3D mi * t0; - add_ssaaaa (r1, r0, s1, s0, 0, r1); - umul_ppmm (p1, p0, q, m0); - umul_ppmm (s1, s0, q, m1); - r0 +=3D (t0 !=3D 0); /* Carry */ - add_ssaaaa (r1, r0, r1, r0, 0, p1); - add_ssaaaa (r1, r0, r1, r0, 0, t1); - add_ssaaaa (r1, r0, r1, r0, s1, s0); - - if (ge2 (r1, r0, m1, m0)) - sub_ddmmss (r1, r0, r1, r0, m1, m0); - - *r1p =3D r1; - return r0; -} - -static uintmax_t _GL_ATTRIBUTE_CONST -powm (uintmax_t b, uintmax_t e, uintmax_t n, uintmax_t ni, uintmax_t one= ) -{ - uintmax_t y =3D one; - - if (e & 1) - y =3D b; - - while (e !=3D 0) - { - b =3D mulredc (b, b, n, ni); - e >>=3D 1; - - if (e & 1) - y =3D mulredc (y, b, n, ni); - } - - return y; -} - -static uintmax_t -powm2 (uintmax_t *r1m, - const uintmax_t *bp, const uintmax_t *ep, const uintmax_t *np, - uintmax_t ni, const uintmax_t *one) -{ - 
uintmax_t r1, r0, b1, b0, n1, n0; - unsigned int i; - uintmax_t e; - - b0 =3D bp[0]; - b1 =3D bp[1]; - n0 =3D np[0]; - n1 =3D np[1]; - - r0 =3D one[0]; - r1 =3D one[1]; - - for (e =3D ep[0], i =3D W_TYPE_SIZE; i > 0; i--, e >>=3D 1) - { - if (e & 1) - { - r0 =3D mulredc2 (r1m, r1, r0, b1, b0, n1, n0, ni); - r1 =3D *r1m; - } - b0 =3D mulredc2 (r1m, b1, b0, b1, b0, n1, n0, ni); - b1 =3D *r1m; - } - for (e =3D ep[1]; e > 0; e >>=3D 1) - { - if (e & 1) - { - r0 =3D mulredc2 (r1m, r1, r0, b1, b0, n1, n0, ni); - r1 =3D *r1m; - } - b0 =3D mulredc2 (r1m, b1, b0, b1, b0, n1, n0, ni); - b1 =3D *r1m; - } - *r1m =3D r1; - return r0; -} - -static bool _GL_ATTRIBUTE_CONST -millerrabin (uintmax_t n, uintmax_t ni, uintmax_t b, uintmax_t q, - unsigned int k, uintmax_t one) -{ - uintmax_t y =3D powm (b, q, n, ni, one); - - uintmax_t nm1 =3D n - one; /* -1, but in redc representation. */ - - if (y =3D=3D one || y =3D=3D nm1) - return true; - - for (unsigned int i =3D 1; i < k; i++) - { - y =3D mulredc (y, y, n, ni); - - if (y =3D=3D nm1) - return true; - if (y =3D=3D one) - return false; - } - return false; -} - -static bool -millerrabin2 (const uintmax_t *np, uintmax_t ni, const uintmax_t *bp, - const uintmax_t *qp, unsigned int k, const uintmax_t *one) -{ - uintmax_t y1, y0, nm1_1, nm1_0, r1m; - - y0 =3D powm2 (&r1m, bp, qp, np, ni, one); - y1 =3D r1m; - - if (y0 =3D=3D one[0] && y1 =3D=3D one[1]) - return true; - - sub_ddmmss (nm1_1, nm1_0, np[1], np[0], one[1], one[0]); - - if (y0 =3D=3D nm1_0 && y1 =3D=3D nm1_1) - return true; - - for (unsigned int i =3D 1; i < k; i++) - { - y0 =3D mulredc2 (&r1m, y1, y0, y1, y0, np[1], np[0], ni); - y1 =3D r1m; - - if (y0 =3D=3D nm1_0 && y1 =3D=3D nm1_1) - return true; - if (y0 =3D=3D one[0] && y1 =3D=3D one[1]) - return false; - } - return false; -} - static bool mp_millerrabin (mpz_srcptr n, mpz_srcptr nm1, mpz_ptr x, mpz_ptr y, mpz_srcptr q, unsigned long int k) @@ -1174,41 +310,43 @@ mp_millerrabin (mpz_srcptr n, mpz_srcptr nm1, mpz_= ptr 
x, mpz_ptr y, return false; } =20 -/* Lucas' prime test. The number of iterations vary greatly, up to a fe= w dozen - have been observed. The average seem to be about 2. */ static bool -prime_p (uintmax_t n) +mp_prime_p (mpz_t n) { - int k; bool is_prime; - uintmax_t a_prim, one, ni; - struct factors factors; + mpz_t q, a, nm1, tmp; + struct mp_factors factors; =20 - if (n <=3D 1) + if (mpz_cmp_ui (n, 1) <=3D 0) return false; =20 /* We have already casted out small primes. */ - if (n < (uintmax_t) FIRST_OMITTED_PRIME * FIRST_OMITTED_PRIME) + if (mpz_cmp_ui (n, (long) FIRST_OMITTED_PRIME * FIRST_OMITTED_PRIME) <= 0) return true; =20 + mpz_inits (q, a, nm1, tmp, NULL); + /* Precomputation for Miller-Rabin. */ - uintmax_t q =3D n - 1; - for (k =3D 0; (q & 1) =3D=3D 0; k++) - q >>=3D 1; + mpz_sub_ui (nm1, n, 1); =20 - uintmax_t a =3D 2; - binv (ni, n); /* ni <- 1/n mod B */ - redcify (one, 1, n); - addmod (a_prim, one, one, n); /* i.e., redcify a =3D 2 */ + /* Find q and k, where q is odd and n =3D 1 + 2**k * q. */ + unsigned long int k =3D mpz_scan1 (nm1, 0); + mpz_tdiv_q_2exp (q, nm1, k); + + mpz_set_ui (a, 2); =20 /* Perform a Miller-Rabin test, finds most composites quickly. */ - if (!millerrabin (n, ni, a_prim, q, k, one)) - return false; + if (!mp_millerrabin (n, nm1, a, tmp, q, k)) + { + is_prime =3D false; + goto ret2; + } =20 if (flag_prove_primality) { /* Factor n-1 for Lucas. 
*/ - factor (0, n - 1, &factors); + mpz_set (tmp, nm1); + mp_factor (tmp, &factors); } =20 /* Loop until Lucas proves our number prime, or Miller-Rabin proves ou= r @@ -1218,10 +356,11 @@ prime_p (uintmax_t n) if (flag_prove_primality) { is_prime =3D true; - for (unsigned int i =3D 0; i < factors.nfactors && is_prime; i= ++) + for (unsigned long int i =3D 0; i < factors.nfactors && is_pri= me; i++) { - is_prime - =3D powm (a_prim, (n - 1) / factors.p[i], n, ni, one) !=3D= one; + mpz_divexact (tmp, nm1, factors.p[i]); + mpz_powm (tmp, a, tmp, n); + is_prime =3D mpz_cmp_ui (tmp, 1) !=3D 0; } } else @@ -1231,202 +370,15 @@ prime_p (uintmax_t n) } =20 if (is_prime) - return true; + goto ret1; =20 - a +=3D primes_diff[r]; /* Establish new base. */ - - /* The following is equivalent to redcify (a_prim, a, n). It runs= faster - on most processors, since it avoids udiv_qrnnd. If we go down = the - udiv_qrnnd_preinv path, this code should be replaced. */ - { - uintmax_t s1, s0; - umul_ppmm (s1, s0, one, a); - if (LIKELY (s1 =3D=3D 0)) - a_prim =3D s0 % n; - else - { - uintmax_t dummy _GL_UNUSED; - udiv_qrnnd (dummy, a_prim, s1, s0, n); - } - } - - if (!millerrabin (n, ni, a_prim, q, k, one)) - return false; - } - - error (0, 0, _("Lucas prime test failure. 
This should not happen")); - abort (); -} - -static bool -prime2_p (uintmax_t n1, uintmax_t n0) -{ - uintmax_t q[2], nm1[2]; - uintmax_t a_prim[2]; - uintmax_t one[2]; - uintmax_t na[2]; - uintmax_t ni; - unsigned int k; - struct factors factors; - - if (n1 =3D=3D 0) - return prime_p (n0); - - nm1[1] =3D n1 - (n0 =3D=3D 0); - nm1[0] =3D n0 - 1; - if (nm1[0] =3D=3D 0) - { - count_trailing_zeros (k, nm1[1]); - - q[0] =3D nm1[1] >> k; - q[1] =3D 0; - k +=3D W_TYPE_SIZE; - } - else - { - count_trailing_zeros (k, nm1[0]); - rsh2 (q[1], q[0], nm1[1], nm1[0], k); - } - - uintmax_t a =3D 2; - binv (ni, n0); - redcify2 (one[1], one[0], 1, n1, n0); - addmod2 (a_prim[1], a_prim[0], one[1], one[0], one[1], one[0], n1, n0)= ; - - /* FIXME: Use scalars or pointers in arguments? Some consistency neede= d. */ - na[0] =3D n0; - na[1] =3D n1; - - if (!millerrabin2 (na, ni, a_prim, q, k, one)) - return false; - - if (flag_prove_primality) - { - /* Factor n-1 for Lucas. */ - factor (nm1[1], nm1[0], &factors); - } - - /* Loop until Lucas proves our number prime, or Miller-Rabin proves ou= r - number composite. */ - for (unsigned int r =3D 0; r < PRIMES_PTAB_ENTRIES; r++) - { - bool is_prime; - uintmax_t e[2], y[2]; - - if (flag_prove_primality) - { - is_prime =3D true; - if (factors.plarge[1]) - { - uintmax_t pi; - binv (pi, factors.plarge[0]); - e[0] =3D pi * nm1[0]; - e[1] =3D 0; - y[0] =3D powm2 (&y[1], a_prim, e, na, ni, one); - is_prime =3D (y[0] !=3D one[0] || y[1] !=3D one[1]); - } - for (unsigned int i =3D 0; i < factors.nfactors && is_prime; i= ++) - { - /* FIXME: We always have the factor 2. Do we really need t= o - handle it here? We have done the same powering as part - of millerrabin. 
*/ - if (factors.p[i] =3D=3D 2) - rsh2 (e[1], e[0], nm1[1], nm1[0], 1); - else - divexact_21 (e[1], e[0], nm1[1], nm1[0], factors.p[i]); - y[0] =3D powm2 (&y[1], a_prim, e, na, ni, one); - is_prime =3D (y[0] !=3D one[0] || y[1] !=3D one[1]); - } - } - else - { - /* After enough Miller-Rabin runs, be content. */ - is_prime =3D (r =3D=3D MR_REPS - 1); - } - - if (is_prime) - return true; - - a +=3D primes_diff[r]; /* Establish new base. */ - redcify2 (a_prim[1], a_prim[0], a, n1, n0); - - if (!millerrabin2 (na, ni, a_prim, q, k, one)) - return false; - } - - error (0, 0, _("Lucas prime test failure. This should not happen")); - abort (); -} - -static bool -mp_prime_p (mpz_t n) -{ - bool is_prime; - mpz_t q, a, nm1, tmp; - struct mp_factors factors; - - if (mpz_cmp_ui (n, 1) <=3D 0) - return false; - - /* We have already casted out small primes. */ - if (mpz_cmp_ui (n, (long) FIRST_OMITTED_PRIME * FIRST_OMITTED_PRIME) <= 0) - return true; - - mpz_inits (q, a, nm1, tmp, NULL); - - /* Precomputation for Miller-Rabin. */ - mpz_sub_ui (nm1, n, 1); - - /* Find q and k, where q is odd and n =3D 1 + 2**k * q. */ - unsigned long int k =3D mpz_scan1 (nm1, 0); - mpz_tdiv_q_2exp (q, nm1, k); - - mpz_set_ui (a, 2); - - /* Perform a Miller-Rabin test, finds most composites quickly. */ - if (!mp_millerrabin (n, nm1, a, tmp, q, k)) - { - is_prime =3D false; - goto ret2; - } - - if (flag_prove_primality) - { - /* Factor n-1 for Lucas. */ - mpz_set (tmp, nm1); - mp_factor (tmp, &factors); - } - - /* Loop until Lucas proves our number prime, or Miller-Rabin proves ou= r - number composite. */ - for (unsigned int r =3D 0; r < PRIMES_PTAB_ENTRIES; r++) - { - if (flag_prove_primality) - { - is_prime =3D true; - for (unsigned long int i =3D 0; i < factors.nfactors && is_pri= me; i++) - { - mpz_divexact (tmp, nm1, factors.p[i]); - mpz_powm (tmp, a, tmp, n); - is_prime =3D mpz_cmp_ui (tmp, 1) !=3D 0; - } - } - else - { - /* After enough Miller-Rabin runs, be content. 
*/ - is_prime =3D (r =3D=3D MR_REPS - 1); - } - - if (is_prime) - goto ret1; - - mpz_add_ui (a, a, primes_diff[r]); /* Establish new base. = */ - - if (!mp_millerrabin (n, nm1, a, tmp, q, k)) - { - is_prime =3D false; - goto ret1; - } + mpz_add_ui (a, a, primes_diff[r]); /* Establish new base. = */ + + if (!mp_millerrabin (n, nm1, a, tmp, q, k)) + { + is_prime =3D false; + goto ret1; + } } =20 error (0, 0, _("Lucas prime test failure. This should not happen")); @@ -1441,213 +393,6 @@ mp_prime_p (mpz_t n) return is_prime; } =20 -static void -factor_using_pollard_rho (uintmax_t n, unsigned long int a, - struct factors *factors) -{ - uintmax_t x, z, y, P, t, ni, g; - - unsigned long int k =3D 1; - unsigned long int l =3D 1; - - redcify (P, 1, n); - addmod (x, P, P, n); /* i.e., redcify(2) */ - y =3D z =3D x; - - while (n !=3D 1) - { - assert (a < n); - - binv (ni, n); /* FIXME: when could we use old 'ni' val= ue? */ - - for (;;) - { - do - { - x =3D mulredc (x, x, n, ni); - addmod (x, x, a, n); - - submod (t, z, x, n); - P =3D mulredc (P, t, n, ni); - - if (k % 32 =3D=3D 1) - { - if (gcd_odd (P, n) !=3D 1) - goto factor_found; - y =3D x; - } - } - while (--k !=3D 0); - - z =3D x; - k =3D l; - l =3D 2 * l; - for (unsigned long int i =3D 0; i < k; i++) - { - x =3D mulredc (x, x, n, ni); - addmod (x, x, a, n); - } - y =3D x; - } - - factor_found: - do - { - y =3D mulredc (y, y, n, ni); - addmod (y, y, a, n); - - submod (t, z, y, n); - g =3D gcd_odd (t, n); - } - while (g =3D=3D 1); - - if (n =3D=3D g) - { - /* Found n itself as factor. Restart with different params. 
= */ - factor_using_pollard_rho (n, a + 1, factors); - return; - } - - n =3D n / g; - - if (!prime_p (g)) - factor_using_pollard_rho (g, a + 1, factors); - else - factor_insert (factors, g); - - if (prime_p (n)) - { - factor_insert (factors, n); - break; - } - - x =3D x % n; - z =3D z % n; - y =3D y % n; - } -} - -static void -factor_using_pollard_rho2 (uintmax_t n1, uintmax_t n0, unsigned long int= a, - struct factors *factors) -{ - uintmax_t x1, x0, z1, z0, y1, y0, P1, P0, t1, t0, ni, g1, g0, r1m; - - unsigned long int k =3D 1; - unsigned long int l =3D 1; - - redcify2 (P1, P0, 1, n1, n0); - addmod2 (x1, x0, P1, P0, P1, P0, n1, n0); /* i.e., redcify(2) */ - y1 =3D z1 =3D x1; - y0 =3D z0 =3D x0; - - while (n1 !=3D 0 || n0 !=3D 1) - { - binv (ni, n0); - - for (;;) - { - do - { - x0 =3D mulredc2 (&r1m, x1, x0, x1, x0, n1, n0, ni); - x1 =3D r1m; - addmod2 (x1, x0, x1, x0, 0, (uintmax_t) a, n1, n0); - - submod2 (t1, t0, z1, z0, x1, x0, n1, n0); - P0 =3D mulredc2 (&r1m, P1, P0, t1, t0, n1, n0, ni); - P1 =3D r1m; - - if (k % 32 =3D=3D 1) - { - g0 =3D gcd2_odd (&g1, P1, P0, n1, n0); - if (g1 !=3D 0 || g0 !=3D 1) - goto factor_found; - y1 =3D x1; y0 =3D x0; - } - } - while (--k !=3D 0); - - z1 =3D x1; z0 =3D x0; - k =3D l; - l =3D 2 * l; - for (unsigned long int i =3D 0; i < k; i++) - { - x0 =3D mulredc2 (&r1m, x1, x0, x1, x0, n1, n0, ni); - x1 =3D r1m; - addmod2 (x1, x0, x1, x0, 0, (uintmax_t) a, n1, n0); - } - y1 =3D x1; y0 =3D x0; - } - - factor_found: - do - { - y0 =3D mulredc2 (&r1m, y1, y0, y1, y0, n1, n0, ni); - y1 =3D r1m; - addmod2 (y1, y0, y1, y0, 0, (uintmax_t) a, n1, n0); - - submod2 (t1, t0, z1, z0, y1, y0, n1, n0); - g0 =3D gcd2_odd (&g1, t1, t0, n1, n0); - } - while (g1 =3D=3D 0 && g0 =3D=3D 1); - - if (g1 =3D=3D 0) - { - /* The found factor is one word, and > 1. 
*/ - divexact_21 (n1, n0, n1, n0, g0); /* n =3D n / g */ - - if (!prime_p (g0)) - factor_using_pollard_rho (g0, a + 1, factors); - else - factor_insert (factors, g0); - } - else - { - /* The found factor is two words. This is highly unlikely, th= us hard - to trigger. Please be careful before you change this code!= */ - uintmax_t ginv; - - if (n1 =3D=3D g1 && n0 =3D=3D g0) - { - /* Found n itself as factor. Restart with different param= s. */ - factor_using_pollard_rho2 (n1, n0, a + 1, factors); - return; - } - - binv (ginv, g0); /* Compute n =3D n / g. Since the resul= t will */ - n0 =3D ginv * n0; /* fit one word, we can compute the qu= otient */ - n1 =3D 0; /* modulo B, ignoring the high divisor= word. */ - - if (!prime2_p (g1, g0)) - factor_using_pollard_rho2 (g1, g0, a + 1, factors); - else - factor_insert_large (factors, g1, g0); - } - - if (n1 =3D=3D 0) - { - if (prime_p (n0)) - { - factor_insert (factors, n0); - break; - } - - factor_using_pollard_rho (n0, a, factors); - return; - } - - if (prime2_p (n1, n0)) - { - factor_insert_large (factors, n1, n0); - break; - } - - x0 =3D mod2 (&x1, x1, x0, n1, n0); - z0 =3D mod2 (&z1, z1, z0, n1, n0); - y0 =3D mod2 (&y1, y1, y0, n1, n0); - } -} - static void mp_factor_using_pollard_rho (mpz_t n, unsigned long int a, struct mp_factors *factors) @@ -1740,494 +485,6 @@ mp_factor_using_pollard_rho (mpz_t n, unsigned lon= g int a, mpz_clears (P, t2, t, z, x, y, NULL); } =20 -#if USE_SQUFOF -/* FIXME: Maybe better to use an iteration converging to 1/sqrt(n)? If - algorithm is replaced, consider also returning the remainder. */ -static uintmax_t _GL_ATTRIBUTE_CONST -isqrt (uintmax_t n) -{ - uintmax_t x; - unsigned c; - if (n =3D=3D 0) - return 0; - - count_leading_zeros (c, n); - - /* Make x > sqrt(n). This will be invariant through the loop. 
*/ - x =3D (uintmax_t) 1 << ((W_TYPE_SIZE + 1 - c) / 2); - - for (;;) - { - uintmax_t y =3D (x + n/x) / 2; - if (y >=3D x) - return x; - - x =3D y; - } -} - -static uintmax_t _GL_ATTRIBUTE_CONST -isqrt2 (uintmax_t nh, uintmax_t nl) -{ - unsigned int shift; - uintmax_t x; - - /* Ensures the remainder fits in an uintmax_t. */ - assert (nh < ((uintmax_t) 1 << (W_TYPE_SIZE - 2))); - - if (nh =3D=3D 0) - return isqrt (nl); - - count_leading_zeros (shift, nh); - shift &=3D ~1; - - /* Make x > sqrt(n) */ - x =3D isqrt ( (nh << shift) + (nl >> (W_TYPE_SIZE - shift))) + 1; - x <<=3D (W_TYPE_SIZE - shift) / 2; - - /* Do we need more than one iteration? */ - for (;;) - { - uintmax_t r _GL_UNUSED; - uintmax_t q, y; - udiv_qrnnd (q, r, nh, nl, x); - y =3D (x + q) / 2; - - if (y >=3D x) - { - uintmax_t hi, lo; - umul_ppmm (hi, lo, x + 1, x + 1); - assert (gt2 (hi, lo, nh, nl)); - - umul_ppmm (hi, lo, x, x); - assert (ge2 (nh, nl, hi, lo)); - sub_ddmmss (hi, lo, nh, nl, hi, lo); - assert (hi =3D=3D 0); - - return x; - } - - x =3D y; - } -} - -/* MAGIC[N] has a bit i set iff i is a quadratic residue mod N. */ -# define MAGIC64 0x0202021202030213ULL -# define MAGIC63 0x0402483012450293ULL -# define MAGIC65 0x218a019866014613ULL -# define MAGIC11 0x23b - -/* Return the square root if the input is a square, otherwise 0. */ -static uintmax_t _GL_ATTRIBUTE_CONST -is_square (uintmax_t x) -{ - /* Uses the tests suggested by Cohen. Excludes 99% of the non-squares = before - computing the square root. 
*/ - if (((MAGIC64 >> (x & 63)) & 1) - && ((MAGIC63 >> (x % 63)) & 1) - /* Both 0 and 64 are squares mod (65) */ - && ((MAGIC65 >> ((x % 65) & 63)) & 1) - && ((MAGIC11 >> (x % 11) & 1))) - { - uintmax_t r =3D isqrt (x); - if (r*r =3D=3D x) - return r; - } - return 0; -} - -/* invtab[i] =3D floor(0x10000 / (0x100 + i) */ -static const unsigned short invtab[0x81] =3D - { - 0x200, - 0x1fc, 0x1f8, 0x1f4, 0x1f0, 0x1ec, 0x1e9, 0x1e5, 0x1e1, - 0x1de, 0x1da, 0x1d7, 0x1d4, 0x1d0, 0x1cd, 0x1ca, 0x1c7, - 0x1c3, 0x1c0, 0x1bd, 0x1ba, 0x1b7, 0x1b4, 0x1b2, 0x1af, - 0x1ac, 0x1a9, 0x1a6, 0x1a4, 0x1a1, 0x19e, 0x19c, 0x199, - 0x197, 0x194, 0x192, 0x18f, 0x18d, 0x18a, 0x188, 0x186, - 0x183, 0x181, 0x17f, 0x17d, 0x17a, 0x178, 0x176, 0x174, - 0x172, 0x170, 0x16e, 0x16c, 0x16a, 0x168, 0x166, 0x164, - 0x162, 0x160, 0x15e, 0x15c, 0x15a, 0x158, 0x157, 0x155, - 0x153, 0x151, 0x150, 0x14e, 0x14c, 0x14a, 0x149, 0x147, - 0x146, 0x144, 0x142, 0x141, 0x13f, 0x13e, 0x13c, 0x13b, - 0x139, 0x138, 0x136, 0x135, 0x133, 0x132, 0x130, 0x12f, - 0x12e, 0x12c, 0x12b, 0x129, 0x128, 0x127, 0x125, 0x124, - 0x123, 0x121, 0x120, 0x11f, 0x11e, 0x11c, 0x11b, 0x11a, - 0x119, 0x118, 0x116, 0x115, 0x114, 0x113, 0x112, 0x111, - 0x10f, 0x10e, 0x10d, 0x10c, 0x10b, 0x10a, 0x109, 0x108, - 0x107, 0x106, 0x105, 0x104, 0x103, 0x102, 0x101, 0x100, - }; - -/* Compute q =3D [u/d], r =3D u mod d. Avoids slow hardware division fo= r the case - that q < 0x40; here it instead uses a table of (Euclidian) inverses. 
= */ -# define div_smallq(q, r, u, d) = \ - do { = \ - if ((u) / 0x40 < (d)) = \ - { = \ - int _cnt; = \ - uintmax_t _dinv, _mask, _q, _r; = \ - count_leading_zeros (_cnt, (d)); = \ - _r =3D (u); = \ - if (UNLIKELY (_cnt > (W_TYPE_SIZE - 8))) = \ - { = \ - _dinv =3D invtab[((d) << (_cnt + 8 - W_TYPE_SIZE)) - 0x80]; = \ - _q =3D _dinv * _r >> (8 + W_TYPE_SIZE - _cnt); = \ - } = \ - else = \ - { = \ - _dinv =3D invtab[((d) >> (W_TYPE_SIZE - 8 - _cnt)) - 0x7f]; = \ - _q =3D _dinv * (_r >> (W_TYPE_SIZE - 3 - _cnt)) >> 11; = \ - } = \ - _r -=3D _q*(d); = \ - = \ - _mask =3D -(uintmax_t) (_r >=3D (d)); = \ - (r) =3D _r - (_mask & (d)); = \ - (q) =3D _q - _mask; = \ - assert ( (q) * (d) + (r) =3D=3D u); = \ - } = \ - else = \ - { = \ - uintmax_t _q =3D (u) / (d); = \ - (r) =3D (u) - _q * (d); = \ - (q) =3D _q; = \ - } = \ - } while (0) - -/* Notes: Example N =3D 22117019. After first phase we find Q1 =3D 6314,= Q - =3D 3025, P =3D 1737, representing F_{18} =3D (-6314, 2* 1737, 3025), - with 3025 =3D 55^2. - - Constructing the square root, we get Q1 =3D 55, Q =3D 8653, P =3D 465= 2, - representing G_0 =3D (-55, 2*4652, 8653). - - In the notation of the paper: - - S_{-1} =3D 55, S_0 =3D 8653, R_0 =3D 4652 - - Put - - t_0 =3D floor([q_0 + R_0] / S0) =3D 1 - R_1 =3D t_0 * S_0 - R_0 =3D 4001 - S_1 =3D S_{-1} +t_0 (R_0 - R_1) =3D 706 -*/ - -/* Multipliers, in order of efficiency: - 0.7268 3*5*7*11 =3D 1155 =3D 3 (mod 4) - 0.7317 3*5*7 =3D 105 =3D 1 - 0.7820 3*5*11 =3D 165 =3D 1 - 0.7872 3*5 =3D 15 =3D 3 - 0.8101 3*7*11 =3D 231 =3D 3 - 0.8155 3*7 =3D 21 =3D 1 - 0.8284 5*7*11 =3D 385 =3D 1 - 0.8339 5*7 =3D 35 =3D 3 - 0.8716 3*11 =3D 33 =3D 1 - 0.8774 3 =3D 3 =3D 3 - 0.8913 5*11 =3D 55 =3D 3 - 0.8972 5 =3D 5 =3D 1 - 0.9233 7*11 =3D 77 =3D 1 - 0.9295 7 =3D 7 =3D 3 - 0.9934 11 =3D 11 =3D 3 -*/ -# define QUEUE_SIZE 50 -#endif - -#if STAT_SQUFOF -# define Q_FREQ_SIZE 50 -/* Element 0 keeps the total */ -static unsigned int q_freq[Q_FREQ_SIZE + 1]; -# define MIN(a,b) ((a) < (b) ? 
(a) : (b)) -#endif - -#if USE_SQUFOF -/* Return true on success. Expected to fail only for numbers - >=3D 2^{2*W_TYPE_SIZE - 2}, or close to that limit. */ -static bool -factor_using_squfof (uintmax_t n1, uintmax_t n0, struct factors *factors= ) -{ - /* Uses algorithm and notation from - - SQUARE FORM FACTORIZATION - JASON E. GOWER AND SAMUEL S. WAGSTAFF, JR. - - https://homes.cerias.purdue.edu/~ssw/squfof.pdf - */ - - static const unsigned int multipliers_1[] =3D - { /* =3D 1 (mod 4) */ - 105, 165, 21, 385, 33, 5, 77, 1, 0 - }; - static const unsigned int multipliers_3[] =3D - { /* =3D 3 (mod 4) */ - 1155, 15, 231, 35, 3, 55, 7, 11, 0 - }; - - const unsigned int *m; - - struct { uintmax_t Q; uintmax_t P; } queue[QUEUE_SIZE]; - - if (n1 >=3D ((uintmax_t) 1 << (W_TYPE_SIZE - 2))) - return false; - - uintmax_t sqrt_n =3D isqrt2 (n1, n0); - - if (n0 =3D=3D sqrt_n * sqrt_n) - { - uintmax_t p1, p0; - - umul_ppmm (p1, p0, sqrt_n, sqrt_n); - assert (p0 =3D=3D n0); - - if (n1 =3D=3D p1) - { - if (prime_p (sqrt_n)) - factor_insert_multiplicity (factors, sqrt_n, 2); - else - { - struct factors f; - - f.nfactors =3D 0; - if (!factor_using_squfof (0, sqrt_n, &f)) - { - /* Try pollard rho instead */ - factor_using_pollard_rho (sqrt_n, 1, &f); - } - /* Duplicate the new factors */ - for (unsigned int i =3D 0; i < f.nfactors; i++) - factor_insert_multiplicity (factors, f.p[i], 2*f.e[i]); - } - return true; - } - } - - /* Select multipliers so we always get n * mu =3D 3 (mod 4) */ - for (m =3D (n0 % 4 =3D=3D 1) ? multipliers_3 : multipliers_1; - *m; m++) - { - uintmax_t S, Dh, Dl, Q1, Q, P, L, L1, B; - unsigned int i; - unsigned int mu =3D *m; - unsigned int qpos =3D 0; - - assert (mu * n0 % 4 =3D=3D 3); - - /* In the notation of the paper, with mu * n =3D=3D 3 (mod 4), we - get \Delta =3D 4 mu * n, and the paper's \mu is 2 mu. As far as - I understand it, the necessary bound is 4 \mu^3 < n, or 32 - mu^3 < n. 
- - However, this seems insufficient: With n =3D 37243139 and mu =3D - 105, we get a trivial factor, from the square 38809 =3D 197^2, - without any corresponding Q earlier in the iteration. - - Requiring 64 mu^3 < n seems sufficient. */ - if (n1 =3D=3D 0) - { - if ((uintmax_t) mu*mu*mu >=3D n0 / 64) - continue; - } - else - { - if (n1 > ((uintmax_t) 1 << (W_TYPE_SIZE - 2)) / mu) - continue; - } - umul_ppmm (Dh, Dl, n0, mu); - Dh +=3D n1 * mu; - - assert (Dl % 4 !=3D 1); - assert (Dh < (uintmax_t) 1 << (W_TYPE_SIZE - 2)); - - S =3D isqrt2 (Dh, Dl); - - Q1 =3D 1; - P =3D S; - - /* Square root remainder fits in one word, so ignore high part. */ - Q =3D Dl - P*P; - /* FIXME: When can this differ from floor(sqrt(2 sqrt(D)))? */ - L =3D isqrt (2*S); - B =3D 2*L; - L1 =3D mu * 2 * L; - - /* The form is (+/- Q1, 2P, -/+ Q), of discriminant 4 (P^2 + Q Q1)= =3D - 4 D. */ - - for (i =3D 0; i <=3D B; i++) - { - uintmax_t q, P1, t, rem; - - div_smallq (q, rem, S+P, Q); - P1 =3D S - rem; /* P1 =3D q*Q - P */ - - IF_LINT (assert (q > 0 && Q > 0)); - -# if STAT_SQUFOF - q_freq[0]++; - q_freq[MIN (q, Q_FREQ_SIZE)]++; -# endif - - if (Q <=3D L1) - { - uintmax_t g =3D Q; - - if ( (Q & 1) =3D=3D 0) - g /=3D 2; - - g /=3D gcd_odd (g, mu); - - if (g <=3D L) - { - if (qpos >=3D QUEUE_SIZE) - die (EXIT_FAILURE, 0, _("squfof queue overflow")); - queue[qpos].Q =3D g; - queue[qpos].P =3D P % g; - qpos++; - } - } - - /* I think the difference can be either sign, but mod - 2^W_TYPE_SIZE arithmetic should be fine. */ - t =3D Q1 + q * (P - P1); - Q1 =3D Q; - Q =3D t; - P =3D P1; - - if ( (i & 1) =3D=3D 0) - { - uintmax_t r =3D is_square (Q); - if (r) - { - for (unsigned int j =3D 0; j < qpos; j++) - { - if (queue[j].Q =3D=3D r) - { - if (r =3D=3D 1) - /* Traversed entire cycle. */ - goto next_multiplier; - - /* Need the absolute value for divisibility te= st. 
*/ - if (P >=3D queue[j].P) - t =3D P - queue[j].P; - else - t =3D queue[j].P - P; - if (t % r =3D=3D 0) - { - /* Delete entries up to and including entr= y - j, which matched. */ - memmove (queue, queue + j + 1, - (qpos - j - 1) * sizeof (queue[0]= )); - qpos -=3D (j + 1); - } - goto next_i; - } - } - - /* We have found a square form, which should give a - factor. */ - Q1 =3D r; - assert (S >=3D P); /* What signs are possible? */ - P +=3D r * ((S - P) / r); - - /* Note: Paper says (N - P*P) / Q1, that seems incorre= ct - for the case D =3D 2N. */ - /* Compute Q =3D (D - P*P) / Q1, but we need double - precision. */ - uintmax_t hi, lo; - umul_ppmm (hi, lo, P, P); - sub_ddmmss (hi, lo, Dh, Dl, hi, lo); - udiv_qrnnd (Q, rem, hi, lo, Q1); - assert (rem =3D=3D 0); - - for (;;) - { - /* Note: There appears to by a typo in the paper, - Step 4a in the algorithm description says q <-- - floor([S+P]/\hat Q), but looking at the equatio= ns - in Sec. 3.1, it should be q <-- floor([S+P] / Q= ). - (In this code, \hat Q is Q1). 
*/ - div_smallq (q, rem, S+P, Q); - P1 =3D S - rem; /* P1 =3D q*Q - P */ - -# if STAT_SQUFOF - q_freq[0]++; - q_freq[MIN (q, Q_FREQ_SIZE)]++; -# endif - if (P =3D=3D P1) - break; - t =3D Q1 + q * (P - P1); - Q1 =3D Q; - Q =3D t; - P =3D P1; - } - - if ( (Q & 1) =3D=3D 0) - Q /=3D 2; - Q /=3D gcd_odd (Q, mu); - - assert (Q > 1 && (n1 || Q < n0)); - - if (prime_p (Q)) - factor_insert (factors, Q); - else if (!factor_using_squfof (0, Q, factors)) - factor_using_pollard_rho (Q, 2, factors); - - divexact_21 (n1, n0, n1, n0, Q); - - if (prime2_p (n1, n0)) - factor_insert_large (factors, n1, n0); - else - { - if (!factor_using_squfof (n1, n0, factors)) - { - if (n1 =3D=3D 0) - factor_using_pollard_rho (n0, 1, factors); - else - factor_using_pollard_rho2 (n1, n0, 1, factor= s); - } - } - - return true; - } - } - next_i:; - } - next_multiplier:; - } - return false; -} -#endif - -/* Compute the prime factors of the 128-bit number (T1,T0), and put the - results in FACTORS. */ -static void -factor (uintmax_t t1, uintmax_t t0, struct factors *factors) -{ - factors->nfactors =3D 0; - factors->plarge[1] =3D 0; - - if (t1 =3D=3D 0 && t0 < 2) - return; - - t0 =3D factor_using_division (&t1, t1, t0, factors); - - if (t1 =3D=3D 0 && t0 < 2) - return; - - if (prime2_p (t1, t0)) - factor_insert_large (factors, t1, t0); - else - { -#if USE_SQUFOF - if (factor_using_squfof (t1, t0, factors)) - return; -#endif - - if (t1 =3D=3D 0) - factor_using_pollard_rho (t0, 1, factors); - else - factor_using_pollard_rho2 (t1, t0, 1, factors); - } -} - /* Use Pollard-rho to compute the prime factors of arbitrary-precision T, and put the results in FACTORS. */ static void @@ -2250,206 +507,6 @@ mp_factor (mpz_t t, struct mp_factors *factors) } } =20 -static strtol_error -strto2uintmax (uintmax_t *hip, uintmax_t *lop, const char *s) -{ - unsigned int lo_carry; - uintmax_t hi =3D 0, lo =3D 0; - - strtol_error err =3D LONGINT_INVALID; - - /* Initial scan for invalid digits. 
*/ - const char *p =3D s; - for (;;) - { - unsigned int c =3D *p++; - if (c =3D=3D 0) - break; - - if (UNLIKELY (!ISDIGIT (c))) - { - err =3D LONGINT_INVALID; - break; - } - - err =3D LONGINT_OK; /* we've seen at least one valid dig= it */ - } - - while (err =3D=3D LONGINT_OK) - { - unsigned int c =3D *s++; - if (c =3D=3D 0) - break; - - c -=3D '0'; - - if (UNLIKELY (hi > ~(uintmax_t)0 / 10)) - { - err =3D LONGINT_OVERFLOW; - break; - } - hi =3D 10 * hi; - - lo_carry =3D (lo >> (W_TYPE_SIZE - 3)) + (lo >> (W_TYPE_SIZE - 1))= ; - lo_carry +=3D 10 * lo < 2 * lo; - - lo =3D 10 * lo; - lo +=3D c; - - lo_carry +=3D lo < c; - hi +=3D lo_carry; - if (UNLIKELY (hi < lo_carry)) - { - err =3D LONGINT_OVERFLOW; - break; - } - } - - *hip =3D hi; - *lop =3D lo; - - return err; -} - -/* Structure and routines for buffering and outputting full lines, - to support parallel operation efficiently. */ -static struct lbuf_ -{ - char *buf; - char *end; -} lbuf; - -/* 512 is chosen to give good performance, - and also is the max guaranteed size that - consumers can read atomically through pipes. - Also it's big enough to cater for max line length - even with 128 bit uintmax_t. */ -#define FACTOR_PIPE_BUF 512 - -static void -lbuf_alloc (void) -{ - if (lbuf.buf) - return; - - /* Double to ensure enough space for - previous numbers + next number. */ - lbuf.buf =3D xmalloc (FACTOR_PIPE_BUF * 2); - lbuf.end =3D lbuf.buf; -} - -/* Write complete LBUF to standard output. */ -static void -lbuf_flush (void) -{ - size_t size =3D lbuf.end - lbuf.buf; - if (full_write (STDOUT_FILENO, lbuf.buf, size) !=3D size) - die (EXIT_FAILURE, errno, "%s", _("write error")); - lbuf.end =3D lbuf.buf; -} - -/* Add a character C to LBUF and if it's a newline - and enough bytes are already buffered, - then write atomically to standard output. 
*/ -static void -lbuf_putc (char c) -{ - *lbuf.end++ =3D c; - - if (c =3D=3D '\n') - { - size_t buffered =3D lbuf.end - lbuf.buf; - - /* Provide immediate output for interactive use. */ - static int line_buffered =3D -1; - if (line_buffered =3D=3D -1) - line_buffered =3D isatty (STDIN_FILENO) || isatty (STDOUT_FILENO= ); - if (line_buffered) - lbuf_flush (); - else if (buffered >=3D FACTOR_PIPE_BUF) - { - /* Write output in <=3D PIPE_BUF chunks - so consumers can read atomically. */ - char const *tend =3D lbuf.end; - - /* Since a umaxint_t's factors must fit in 512 - we're guaranteed to find a newline here. */ - char *tlend =3D lbuf.buf + FACTOR_PIPE_BUF; - while (*--tlend !=3D '\n'); - tlend++; - - lbuf.end =3D tlend; - lbuf_flush (); - - /* Buffer the remainder. */ - memcpy (lbuf.buf, tlend, tend - tlend); - lbuf.end =3D lbuf.buf + (tend - tlend); - } - } -} - -/* Buffer an int to the internal LBUF. */ -static void -lbuf_putint (uintmax_t i, size_t min_width) -{ - char buf[INT_BUFSIZE_BOUND (uintmax_t)]; - char const *umaxstr =3D umaxtostr (i, buf); - size_t width =3D sizeof (buf) - (umaxstr - buf) - 1; - size_t z =3D width; - - for (; z < min_width; z++) - *lbuf.end++ =3D '0'; - - memcpy (lbuf.end, umaxstr, width); - lbuf.end +=3D width; -} - -static void -print_uintmaxes (uintmax_t t1, uintmax_t t0) -{ - uintmax_t q, r; - - if (t1 =3D=3D 0) - lbuf_putint (t0, 0); - else - { - /* Use very plain code here since it seems hard to write fast code - without assuming a specific word size. 
*/ - q =3D t1 / 1000000000; - r =3D t1 % 1000000000; - udiv_qrnnd (t0, r, r, t0, 1000000000); - print_uintmaxes (q, t0); - lbuf_putint (r, 9); - } -} - -/* Single-precision factoring */ -static void -print_factors_single (uintmax_t t1, uintmax_t t0) -{ - struct factors factors; - - print_uintmaxes (t1, t0); - lbuf_putc (':'); - - factor (t1, t0, &factors); - - for (unsigned int j =3D 0; j < factors.nfactors; j++) - for (unsigned int k =3D 0; k < factors.e[j]; k++) - { - lbuf_putc (' '); - print_uintmaxes (0, factors.p[j]); - } - - if (factors.plarge[1]) - { - lbuf_putc (' '); - print_uintmaxes (factors.plarge[1], factors.plarge[0]); - } - - lbuf_putc ('\n'); -} - /* Emit the factors of the indicated number. If we have the option of u= sing either algorithm, we select on the basis of the length of the number. For longer numbers, we prefer the MP algorithm even if the native alg= orithm @@ -2464,40 +521,16 @@ print_factors (const char *input) str++; str +=3D *str =3D=3D '+'; =20 - uintmax_t t1, t0; - - /* Try converting the number to one or two words. If it fails, use GM= P or - print an error message. The 2nd condition checks that the most - significant bit of the two-word number is clear, in a typesize neut= ral - way. */ - strtol_error err =3D strto2uintmax (&t1, &t0, str); + devmsg ("[using arbitrary-precision arithmetic] "); + mpz_t t; + struct mp_factors factors; =20 - switch (err) + if (*str =3D=3D '-' || mpz_init_set_str (t, str, 10) !=3D 0) { - case LONGINT_OK: - if (((t1 << 1) >> 1) =3D=3D t1) - { - devmsg ("[using single-precision arithmetic] "); - print_factors_single (t1, t0); - return true; - } - break; - - case LONGINT_OVERFLOW: - /* Try GMP. 
*/ - break; - - default: error (0, 0, _("%s is not a valid positive integer"), quote (input= )); return false; } =20 - devmsg ("[using arbitrary-precision arithmetic] "); - mpz_t t; - struct mp_factors factors; - - mpz_init_set_str (t, str, 10); - gmp_printf ("%Zd:", t); mp_factor (t, &factors); =20 @@ -2566,9 +599,7 @@ main (int argc, char **argv) bindtextdomain (PACKAGE, LOCALEDIR); textdomain (PACKAGE); =20 - lbuf_alloc (); atexit (close_stdout); - atexit (lbuf_flush); =20 int c; while ((c =3D getopt_long (argc, argv, "", long_options, NULL)) !=3D -= 1) @@ -2588,10 +619,6 @@ main (int argc, char **argv) } } =20 -#if STAT_SQUFOF - memset (q_freq, 0, sizeof (q_freq)); -#endif - bool ok; if (argc <=3D optind) ok =3D do_stdin (); @@ -2603,20 +630,5 @@ main (int argc, char **argv) ok =3D false; } =20 -#if STAT_SQUFOF - if (q_freq[0] > 0) - { - double acc_f; - printf ("q freq. cum. freq.(total: %d)\n", q_freq[0]); - for (unsigned int i =3D 1, acc_f =3D 0.0; i <=3D Q_FREQ_SIZE; i++) - { - double f =3D (double) q_freq[i] / q_freq[0]; - acc_f +=3D f; - printf ("%s%d %.2f%% %.2f%%\n", i =3D=3D Q_FREQ_SIZE ? ">=3D" = : "", i, - 100.0 * f, 100.0 * acc_f); - } - } -#endif - return ok ? EXIT_SUCCESS : EXIT_FAILURE; } diff --git a/src/local.mk b/src/local.mk index 72db9c704..b4cd8e5b3 100644 --- a/src/local.mk +++ b/src/local.mk @@ -52,7 +52,6 @@ noinst_HEADERS =3D \ src/fs-is-local.h \ src/group-list.h \ src/ioblksize.h \ - src/longlong.h \ src/ls.h \ src/operand2sig.h \ src/prog-fprintf.h \ diff --git a/src/longlong.h b/src/longlong.h deleted file mode 100644 index e57ba7821..000000000 --- a/src/longlong.h +++ /dev/null @@ -1,2267 +0,0 @@ -/* longlong.h -- definitions for mixed size 32/64 bit arithmetic. - -Copyright 1991-2020 Free Software Foundation, Inc. 
- -This file is free software; you can redistribute it and/or modify it und= er the -terms of the GNU Lesser General Public License as published by the Free -Software Foundation; either version 3 of the License, or (at your option= ) any -later version. - -This file is distributed in the hope that it will be useful, but WITHOUT= ANY -WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNES= S FOR A -PARTICULAR PURPOSE. See the GNU Lesser General Public License for more -details. - -You should have received a copy of the GNU Lesser General Public License -along with this file. If not, see https://www.gnu.org/licenses/. */ - -/* You have to define the following before including this file: - - UWtype -- An unsigned type, default type for operations (typically a = "word") - UHWtype -- An unsigned type, at least half the size of UWtype - UDWtype -- An unsigned type, at least twice as large a UWtype - W_TYPE_SIZE -- size in bits of UWtype - - SItype, USItype -- Signed and unsigned 32 bit types - DItype, UDItype -- Signed and unsigned 64 bit types - - On a 32 bit machine UWtype should typically be USItype; - on a 64 bit machine, UWtype should typically be UDItype. - - Optionally, define: - - LONGLONG_STANDALONE -- Avoid code that needs machine-dependent suppor= t files - NO_ASM -- Disable inline asm - - - CAUTION! Using this version of longlong.h outside of GMP is not safe= . You - need to include gmp.h and gmp-impl.h, or certain things might not wor= k as - expected. -*/ - -#define __BITS4 (W_TYPE_SIZE / 4) -#define __ll_B ((UWtype) 1 << (W_TYPE_SIZE / 2)) -#define __ll_lowpart(t) ((UWtype) (t) & (__ll_B - 1)) -#define __ll_highpart(t) ((UWtype) (t) >> (W_TYPE_SIZE / 2)) - -/* This is used to make sure no undesirable sharing between different li= braries - that use this file takes place. */ -#ifndef __MPN -#define __MPN(x) __##x -#endif - -/* Define auxiliary asm macros. 
- - 1) umul_ppmm(high_prod, low_prod, multiplier, multiplicand) multiplie= s two - UWtype integers MULTIPLIER and MULTIPLICAND, and generates a two UWty= pe - word product in HIGH_PROD and LOW_PROD. - - 2) __umulsidi3(a,b) multiplies two UWtype integers A and B, and retur= ns a - UDWtype product. This is just a variant of umul_ppmm. - - 3) udiv_qrnnd(quotient, remainder, high_numerator, low_numerator, - denominator) divides a UDWtype, composed by the UWtype integers - HIGH_NUMERATOR and LOW_NUMERATOR, by DENOMINATOR and places the quoti= ent - in QUOTIENT and the remainder in REMAINDER. HIGH_NUMERATOR must be l= ess - than DENOMINATOR for correct operation. If, in addition, the most - significant bit of DENOMINATOR must be 1, then the pre-processor symb= ol - UDIV_NEEDS_NORMALIZATION is defined to 1. - - 4) sdiv_qrnnd(quotient, remainder, high_numerator, low_numerator, - denominator). Like udiv_qrnnd but the numbers are signed. The quoti= ent - is rounded towards 0. - - 5) count_leading_zeros(count, x) counts the number of zero-bits from = the - msb to the first non-zero bit in the UWtype X. This is the number of - steps X needs to be shifted left to set the msb. Undefined for X =3D= =3D 0, - unless the symbol COUNT_LEADING_ZEROS_0 is defined to some value. - - 6) count_trailing_zeros(count, x) like count_leading_zeros, but count= s - from the least significant end. - - 7) add_ssaaaa(high_sum, low_sum, high_addend_1, low_addend_1, - high_addend_2, low_addend_2) adds two UWtype integers, composed by - HIGH_ADDEND_1 and LOW_ADDEND_1, and HIGH_ADDEND_2 and LOW_ADDEND_2 - respectively. The result is placed in HIGH_SUM and LOW_SUM. Overflo= w - (i.e. carry out) is not stored anywhere, and is lost. - - 8) sub_ddmmss(high_difference, low_difference, high_minuend, low_minu= end, - high_subtrahend, low_subtrahend) subtracts two two-word UWtype intege= rs, - composed by HIGH_MINUEND_1 and LOW_MINUEND_1, and HIGH_SUBTRAHEND_2 a= nd - LOW_SUBTRAHEND_2 respectively. 
The result is placed in HIGH_DIFFEREN= CE - and LOW_DIFFERENCE. Overflow (i.e. carry out) is not stored anywhere= , - and is lost. - - If any of these macros are left undefined for a particular CPU, - C macros are used. - - - Notes: - - For add_ssaaaa the two high and two low addends can both commute, but - unfortunately gcc only supports one "%" commutative in each asm block= . - This has always been so but is only documented in recent versions - (eg. pre-release 3.3). Having two or more "%"s can cause an internal - compiler error in certain rare circumstances. - - Apparently it was only the last "%" that was ever actually respected,= so - the code has been updated to leave just that. Clearly there's a free - choice whether high or low should get it, if there's a reason to favo= ur - one over the other. Also obviously when the constraints on the two - operands are identical there's no benefit to the reloader in any "%" = at - all. - - */ - -/* The CPUs come in alphabetical order below. - - Please add support for more CPUs here, or improve the current support - for the CPUs below! */ - - -/* count_leading_zeros_gcc_clz is count_leading_zeros implemented with g= cc - 3.4 __builtin_clzl or __builtin_clzll, according to our limb size. - Similarly count_trailing_zeros_gcc_ctz using __builtin_ctzl or - __builtin_ctzll. - - These builtins are only used when we check what code comes out, on so= me - chips they're merely libgcc calls, where we will instead want an inli= ne - in that case (either asm or generic C). - - These builtins are better than an asm block of the same insn, since a= n - asm block doesn't give gcc any information about scheduling or resour= ce - usage. We keep an asm block for use on prior versions of gcc though. - - For reference, __builtin_ffs existed in gcc prior to __builtin_clz, b= ut - it's not used (for count_leading_zeros) because it generally gives ex= tra - code to ensure the result is 0 when the input is 0, which we don't ne= ed - or want. 
*/ - -#ifdef _LONG_LONG_LIMB -#define count_leading_zeros_gcc_clz(count,x) \ - do { \ - ASSERT ((x) !=3D 0); \ - (count) =3D __builtin_clzll (x); \ - } while (0) -#else -#define count_leading_zeros_gcc_clz(count,x) \ - do { \ - ASSERT ((x) !=3D 0); \ - (count) =3D __builtin_clzl (x); \ - } while (0) -#endif - -#ifdef _LONG_LONG_LIMB -#define count_trailing_zeros_gcc_ctz(count,x) \ - do { \ - ASSERT ((x) !=3D 0); \ - (count) =3D __builtin_ctzll (x); \ - } while (0) -#else -#define count_trailing_zeros_gcc_ctz(count,x) \ - do { \ - ASSERT ((x) !=3D 0); \ - (count) =3D __builtin_ctzl (x); \ - } while (0) -#endif - - -/* FIXME: The macros using external routines like __MPN(count_leading_ze= ros) - don't need to be under !NO_ASM */ -#if ! defined (NO_ASM) - -#if defined (__alpha) && W_TYPE_SIZE =3D=3D 64 -/* Most alpha-based machines, except Cray systems. */ -#if defined (__GNUC__) -#if __GMP_GNUC_PREREQ (3,3) -#define umul_ppmm(ph, pl, m0, m1) \ - do { \ - UDItype __m0 =3D (m0), __m1 =3D (m1); \ - (ph) =3D __builtin_alpha_umulh (__m0, __m1); \ - (pl) =3D __m0 * __m1; \ - } while (0) -#else -#define umul_ppmm(ph, pl, m0, m1) \ - do { \ - UDItype __m0 =3D (m0), __m1 =3D (m1); \ - __asm__ ("umulh %r1,%2,%0" \ - : "=3Dr" (ph) \ - : "%rJ" (__m0), "rI" (__m1)); \ - (pl) =3D __m0 * __m1; \ - } while (0) -#endif -#else /* ! __GNUC__ */ -#include -#define umul_ppmm(ph, pl, m0, m1) \ - do { \ - UDItype __m0 =3D (m0), __m1 =3D (m1); \ - (ph) =3D __UMULH (__m0, __m1); \ - (pl) =3D __m0 * __m1; \ - } while (0) -#endif -#ifndef LONGLONG_STANDALONE -#define udiv_qrnnd(q, r, n1, n0, d) \ - do { UWtype __di; \ - __di =3D __MPN(invert_limb) (d); \ - udiv_qrnnd_preinv (q, r, n1, n0, d, __di); \ - } while (0) -#define UDIV_PREINV_ALWAYS 1 -#define UDIV_NEEDS_NORMALIZATION 1 -#endif /* LONGLONG_STANDALONE */ - -/* clz_tab is required in all configurations, since mpn/alpha/cntlz.asm - always goes into libgmp.so, even when not actually used. 
*/ -#define COUNT_LEADING_ZEROS_NEED_CLZ_TAB - -#if defined (__GNUC__) && HAVE_HOST_CPU_alpha_CIX -#define count_leading_zeros(COUNT,X) \ - __asm__("ctlz %1,%0" : "=3Dr"(COUNT) : "r"(X)) -#define count_trailing_zeros(COUNT,X) \ - __asm__("cttz %1,%0" : "=3Dr"(COUNT) : "r"(X)) -#endif /* clz/ctz using cix */ - -#if ! defined (count_leading_zeros) \ - && defined (__GNUC__) && ! defined (LONGLONG_STANDALONE) -/* ALPHA_CMPBGE_0 gives "cmpbge $31,src,dst", ie. test src bytes =3D=3D = 0. - "$31" is written explicitly in the asm, since an "r" constraint won't - select reg 31. There seems no need to worry about "r31" syntax for c= ray, - since gcc itself (pre-release 3.4) emits just $31 in various places. = */ -#define ALPHA_CMPBGE_0(dst, src) \ - do { asm ("cmpbge $31, %1, %0" : "=3Dr" (dst) : "r" (src)); } while (0= ) -/* Zero bytes are turned into bits with cmpbge, a __clz_tab lookup count= s - them, locating the highest non-zero byte. A second __clz_tab lookup - counts the leading zero bits in that byte, giving the result. */ -#define count_leading_zeros(count, x) \ - do { \ - UWtype __clz__b, __clz__c, __clz__x =3D (x); \ - ALPHA_CMPBGE_0 (__clz__b, __clz__x); /* zero bytes */ \ - __clz__b =3D __clz_tab [(__clz__b >> 1) ^ 0x7F]; /* 8 to 1 byte */ = \ - __clz__b =3D __clz__b * 8 - 7; /* 57 to 1 shift */ \ - __clz__x >>=3D __clz__b; \ - __clz__c =3D __clz_tab [__clz__x]; /* 8 to 1 bit */ \ - __clz__b =3D 65 - __clz__b; \ - (count) =3D __clz__b - __clz__c; \ - } while (0) -#define COUNT_LEADING_ZEROS_NEED_CLZ_TAB -#endif /* clz using cmpbge */ - -#if ! defined (count_leading_zeros) && ! 
defined (LONGLONG_STANDALONE) -#if HAVE_ATTRIBUTE_CONST -long __MPN(count_leading_zeros) (UDItype) __attribute__ ((const)); -#else -long __MPN(count_leading_zeros) (UDItype); -#endif -#define count_leading_zeros(count, x) \ - ((count) =3D __MPN(count_leading_zeros) (x)) -#endif /* clz using mpn */ -#endif /* __alpha */ - -#if defined (__AVR) && W_TYPE_SIZE =3D=3D 8 -#define umul_ppmm(ph, pl, m0, m1) \ - do { \ - unsigned short __p =3D (unsigned short) (m0) * (m1); \ - (ph) =3D __p >> 8; \ - (pl) =3D __p; \ - } while (0) -#endif /* AVR */ - -#if defined (_CRAY) && W_TYPE_SIZE =3D=3D 64 -#include -#define UDIV_PREINV_ALWAYS 1 -#define UDIV_NEEDS_NORMALIZATION 1 -long __MPN(count_leading_zeros) (UDItype); -#define count_leading_zeros(count, x) \ - ((count) =3D _leadz ((UWtype) (x))) -#if defined (_CRAYIEEE) /* I.e., Cray T90/ieee, T3D, and T3E */ -#define umul_ppmm(ph, pl, m0, m1) \ - do { \ - UDItype __m0 =3D (m0), __m1 =3D (m1); \ - (ph) =3D _int_mult_upper (__m0, __m1); \ - (pl) =3D __m0 * __m1; \ - } while (0) -#ifndef LONGLONG_STANDALONE -#define udiv_qrnnd(q, r, n1, n0, d) \ - do { UWtype __di; \ - __di =3D __MPN(invert_limb) (d); \ - udiv_qrnnd_preinv (q, r, n1, n0, d, __di); \ - } while (0) -#endif /* LONGLONG_STANDALONE */ -#endif /* _CRAYIEEE */ -#endif /* _CRAY */ - -#if defined (__ia64) && W_TYPE_SIZE =3D=3D 64 -/* This form encourages gcc (pre-release 3.4 at least) to emit predicate= d - "sub r=3Dr,r" and "sub r=3Dr,r,1", giving a 2 cycle latency. 
The gen= eric - code using "al>=3D _c; \ - if (_x >=3D 1 << 4) \ - _x >>=3D 4, _c +=3D 4; \ - if (_x >=3D 1 << 2) \ - _x >>=3D 2, _c +=3D 2; \ - _c +=3D _x >> 1; \ - (count) =3D W_TYPE_SIZE - 1 - _c; \ - } while (0) -/* similar to what gcc does for __builtin_ffs, but 0 based rather than 1 - based, and we don't need a special case for x=3D=3D0 here */ -#define count_trailing_zeros(count, x) \ - do { \ - UWtype __ctz_x =3D (x); \ - __asm__ ("popcnt %0 =3D %1" \ - : "=3Dr" (count) \ - : "r" ((__ctz_x-1) & ~__ctz_x)); \ - } while (0) -#endif -#if defined (__INTEL_COMPILER) -#include -#define umul_ppmm(ph, pl, m0, m1) \ - do { \ - UWtype __m0 =3D (m0), __m1 =3D (m1); \ - ph =3D _m64_xmahu (__m0, __m1, 0); \ - pl =3D __m0 * __m1; \ - } while (0) -#endif -#ifndef LONGLONG_STANDALONE -#define udiv_qrnnd(q, r, n1, n0, d) \ - do { UWtype __di; \ - __di =3D __MPN(invert_limb) (d); \ - udiv_qrnnd_preinv (q, r, n1, n0, d, __di); \ - } while (0) -#define UDIV_PREINV_ALWAYS 1 -#define UDIV_NEEDS_NORMALIZATION 1 -#endif -#endif - - -#if defined (__GNUC__) - -/* We sometimes need to clobber "cc" with gcc2, but that would not be - understood by gcc1. Use cpp to avoid major code duplication. 
*/ -#if __GNUC__ < 2 -#define __CLOBBER_CC -#define __AND_CLOBBER_CC -#else /* __GNUC__ >=3D 2 */ -#define __CLOBBER_CC : "cc" -#define __AND_CLOBBER_CC , "cc" -#endif /* __GNUC__ < 2 */ - -#if (defined (__a29k__) || defined (_AM29K)) && W_TYPE_SIZE =3D=3D 32 -#define add_ssaaaa(sh, sl, ah, al, bh, bl) \ - __asm__ ("add %1,%4,%5\n\taddc %0,%2,%3" \ - : "=3Dr" (sh), "=3D&r" (sl) \ - : "r" (ah), "rI" (bh), "%r" (al), "rI" (bl)) -#define sub_ddmmss(sh, sl, ah, al, bh, bl) \ - __asm__ ("sub %1,%4,%5\n\tsubc %0,%2,%3" \ - : "=3Dr" (sh), "=3D&r" (sl) \ - : "r" (ah), "rI" (bh), "r" (al), "rI" (bl)) -#define umul_ppmm(xh, xl, m0, m1) \ - do { \ - USItype __m0 =3D (m0), __m1 =3D (m1); \ - __asm__ ("multiplu %0,%1,%2" \ - : "=3Dr" (xl) \ - : "r" (__m0), "r" (__m1)); \ - __asm__ ("multmu %0,%1,%2" \ - : "=3Dr" (xh) \ - : "r" (__m0), "r" (__m1)); \ - } while (0) -#define udiv_qrnnd(q, r, n1, n0, d) \ - __asm__ ("dividu %0,%3,%4" \ - : "=3Dr" (q), "=3Dq" (r) \ - : "1" (n1), "r" (n0), "r" (d)) -#define count_leading_zeros(count, x) \ - __asm__ ("clz %0,%1" \ - : "=3Dr" (count) \ - : "r" (x)) -#define COUNT_LEADING_ZEROS_0 32 -#endif /* __a29k__ */ - -#if defined (__arc__) -#define add_ssaaaa(sh, sl, ah, al, bh, bl) \ - __asm__ ("add.f\t%1, %4, %5\n\tadc\t%0, %2, %3" \ - : "=3Dr" (sh), \ - "=3D&r" (sl) \ - : "r" ((USItype) (ah)), \ - "rICal" ((USItype) (bh)), \ - "%r" ((USItype) (al)), \ - "rICal" ((USItype) (bl))) -#define sub_ddmmss(sh, sl, ah, al, bh, bl) \ - __asm__ ("sub.f\t%1, %4, %5\n\tsbc\t%0, %2, %3" \ - : "=3Dr" (sh), \ - "=3D&r" (sl) \ - : "r" ((USItype) (ah)), \ - "rICal" ((USItype) (bh)), \ - "r" ((USItype) (al)), \ - "rICal" ((USItype) (bl))) -#endif - -#if defined (__arm__) && (defined (__thumb2__) || !defined (__thumb__)) = \ - && W_TYPE_SIZE =3D=3D 32 -#define add_ssaaaa(sh, sl, ah, al, bh, bl) \ - do { \ - if (__builtin_constant_p (bl) && -(USItype)(bl) < 0x100) \ - __asm__ ("subs\t%1, %4, %5\n\tadc\t%0, %2, %3" \ - : "=3Dr" (sh), "=3D&r" (sl) \ - : "r" (ah), 
"rI" (bh), \ - "%r" (al), "rI" (-(USItype)(bl)) __CLOBBER_CC); \ - else \ - __asm__ ("adds\t%1, %4, %5\n\tadc\t%0, %2, %3" \ - : "=3Dr" (sh), "=3D&r" (sl) \ - : "r" (ah), "rI" (bh), "%r" (al), "rI" (bl) __CLOBBER_CC); \ - } while (0) -/* FIXME: Extend the immediate range for the low word by using both ADDS= and - SUBS, since they set carry in the same way. Note: We need separate - definitions for thumb and non-thumb due to the absence of RSC on thum= b. */ -#if defined (__thumb__) -#define sub_ddmmss(sh, sl, ah, al, bh, bl) \ - do { \ - if (__builtin_constant_p (ah) && __builtin_constant_p (bh) \ - && (ah) =3D=3D (bh)) \ - __asm__ ("subs\t%1, %2, %3\n\tsbc\t%0, %0, %0" \ - : "=3Dr" (sh), "=3Dr" (sl) \ - : "r" (al), "rI" (bl) __CLOBBER_CC); \ - else if (__builtin_constant_p (al)) \ - __asm__ ("rsbs\t%1, %5, %4\n\tsbc\t%0, %2, %3" \ - : "=3Dr" (sh), "=3D&r" (sl) \ - : "r" (ah), "rI" (bh), "rI" (al), "r" (bl) __CLOBBER_CC); \ - else if (__builtin_constant_p (bl)) \ - __asm__ ("subs\t%1, %4, %5\n\tsbc\t%0, %2, %3" \ - : "=3Dr" (sh), "=3D&r" (sl) \ - : "r" (ah), "rI" (bh), "r" (al), "rI" (bl) __CLOBBER_CC); \ - else \ - __asm__ ("subs\t%1, %4, %5\n\tsbc\t%0, %2, %3" \ - : "=3Dr" (sh), "=3D&r" (sl) \ - : "r" (ah), "rI" (bh), "r" (al), "rI" (bl) __CLOBBER_CC); \ - } while (0) -#else -#define sub_ddmmss(sh, sl, ah, al, bh, bl) \ - do { \ - if (__builtin_constant_p (ah) && __builtin_constant_p (bh) \ - && (ah) =3D=3D (bh)) \ - __asm__ ("subs\t%1, %2, %3\n\tsbc\t%0, %0, %0" \ - : "=3Dr" (sh), "=3Dr" (sl) \ - : "r" (al), "rI" (bl) __CLOBBER_CC); \ - else if (__builtin_constant_p (al)) \ - { \ - if (__builtin_constant_p (ah)) \ - __asm__ ("rsbs\t%1, %5, %4\n\trsc\t%0, %3, %2" \ - : "=3Dr" (sh), "=3D&r" (sl) \ - : "rI" (ah), "r" (bh), "rI" (al), "r" (bl) __CLOBBER_CC); \ - else \ - __asm__ ("rsbs\t%1, %5, %4\n\tsbc\t%0, %2, %3" \ - : "=3Dr" (sh), "=3D&r" (sl) \ - : "r" (ah), "rI" (bh), "rI" (al), "r" (bl) __CLOBBER_CC); \ - } \ - else if (__builtin_constant_p (ah)) \ - { \ - if 
(__builtin_constant_p (bl)) \ - __asm__ ("subs\t%1, %4, %5\n\trsc\t%0, %3, %2" \ - : "=3Dr" (sh), "=3D&r" (sl) \ - : "rI" (ah), "r" (bh), "r" (al), "rI" (bl) __CLOBBER_CC); \ - else \ - __asm__ ("rsbs\t%1, %5, %4\n\trsc\t%0, %3, %2" \ - : "=3Dr" (sh), "=3D&r" (sl) \ - : "rI" (ah), "r" (bh), "rI" (al), "r" (bl) __CLOBBER_CC); \ - } \ - else if (__builtin_constant_p (bl)) \ - __asm__ ("subs\t%1, %4, %5\n\tsbc\t%0, %2, %3" \ - : "=3Dr" (sh), "=3D&r" (sl) \ - : "r" (ah), "rI" (bh), "r" (al), "rI" (bl) __CLOBBER_CC); \ - else /* only bh might be a constant */ \ - __asm__ ("subs\t%1, %4, %5\n\tsbc\t%0, %2, %3" \ - : "=3Dr" (sh), "=3D&r" (sl) \ - : "r" (ah), "rI" (bh), "r" (al), "rI" (bl) __CLOBBER_CC); \ - } while (0) -#endif -#if defined (__ARM_ARCH_2__) || defined (__ARM_ARCH_2A__) \ - || defined (__ARM_ARCH_3__) -#define umul_ppmm(xh, xl, a, b) \ - do { \ - register USItype __t0, __t1, __t2; \ - __asm__ ("%@ Inlined umul_ppmm\n" \ - " mov %2, %5, lsr #16\n" \ - " mov %0, %6, lsr #16\n" \ - " bic %3, %5, %2, lsl #16\n" \ - " bic %4, %6, %0, lsl #16\n" \ - " mul %1, %3, %4\n" \ - " mul %4, %2, %4\n" \ - " mul %3, %0, %3\n" \ - " mul %0, %2, %0\n" \ - " adds %3, %4, %3\n" \ - " addcs %0, %0, #65536\n" \ - " adds %1, %1, %3, lsl #16\n" \ - " adc %0, %0, %3, lsr #16" \ - : "=3D&r" ((USItype) (xh)), "=3Dr" ((USItype) (xl)), \ - "=3D&r" (__t0), "=3D&r" (__t1), "=3Dr" (__t2) \ - : "r" ((USItype) (a)), "r" ((USItype) (b)) __CLOBBER_CC); \ - } while (0) -#ifndef LONGLONG_STANDALONE -#define udiv_qrnnd(q, r, n1, n0, d) \ - do { UWtype __r; \ - (q) =3D __MPN(udiv_qrnnd) (&__r, (n1), (n0), (d)); \ - (r) =3D __r; \ - } while (0) -extern UWtype __MPN(udiv_qrnnd) (UWtype *, UWtype, UWtype, UWtype); -#endif /* LONGLONG_STANDALONE */ -#else /* ARMv4 or newer */ -#define umul_ppmm(xh, xl, a, b) \ - __asm__ ("umull %0,%1,%2,%3" : "=3D&r" (xl), "=3D&r" (xh) : "r" (a), "= r" (b)) -#define smul_ppmm(xh, xl, a, b) \ - __asm__ ("smull %0,%1,%2,%3" : "=3D&r" (xl), "=3D&r" (xh) : "r" (a), "= r" 
(b)) -#ifndef LONGLONG_STANDALONE -#define udiv_qrnnd(q, r, n1, n0, d) \ - do { UWtype __di; \ - __di =3D __MPN(invert_limb) (d); \ - udiv_qrnnd_preinv (q, r, n1, n0, d, __di); \ - } while (0) -#define UDIV_PREINV_ALWAYS 1 -#define UDIV_NEEDS_NORMALIZATION 1 -#endif /* LONGLONG_STANDALONE */ -#endif /* defined(__ARM_ARCH_2__) ... */ -#define count_leading_zeros(count, x) count_leading_zeros_gcc_clz(count= , x) -#define count_trailing_zeros(count, x) count_trailing_zeros_gcc_ctz(cou= nt, x) -#endif /* __arm__ */ - -#if defined (__aarch64__) && W_TYPE_SIZE =3D=3D 64 -/* FIXME: Extend the immediate range for the low word by using both - ADDS and SUBS, since they set carry in the same way. */ -#define add_ssaaaa(sh, sl, ah, al, bh, bl) \ - do { \ - if (__builtin_constant_p (bl) && -(UDItype)(bl) < 0x1000) \ - __asm__ ("subs\t%1, %x4, %5\n\tadc\t%0, %x2, %x3" \ - : "=3Dr" (sh), "=3D&r" (sl) \ - : "rZ" ((UDItype)(ah)), "rZ" ((UDItype)(bh)), \ - "%r" ((UDItype)(al)), "rI" (-(UDItype)(bl)) __CLOBBER_CC);\ - else \ - __asm__ ("adds\t%1, %x4, %5\n\tadc\t%0, %x2, %x3" \ - : "=3Dr" (sh), "=3D&r" (sl) \ - : "rZ" ((UDItype)(ah)), "rZ" ((UDItype)(bh)), \ - "%r" ((UDItype)(al)), "rI" ((UDItype)(bl)) __CLOBBER_CC);\ - } while (0) -#define sub_ddmmss(sh, sl, ah, al, bh, bl) \ - do { \ - if (__builtin_constant_p (bl) && -(UDItype)(bl) < 0x1000) \ - __asm__ ("adds\t%1, %x4, %5\n\tsbc\t%0, %x2, %x3" \ - : "=3Dr,r" (sh), "=3D&r,&r" (sl) \ - : "rZ,rZ" ((UDItype)(ah)), "rZ,rZ" ((UDItype)(bh)), \ - "r,Z" ((UDItype)(al)), "rI,r" (-(UDItype)(bl)) __CLOBBER_CC);\ - else \ - __asm__ ("subs\t%1, %x4, %5\n\tsbc\t%0, %x2, %x3" \ - : "=3Dr,r" (sh), "=3D&r,&r" (sl) \ - : "rZ,rZ" ((UDItype)(ah)), "rZ,rZ" ((UDItype)(bh)), \ - "r,Z" ((UDItype)(al)), "rI,r" ((UDItype)(bl)) __CLOBBER_CC);\ - } while(0); -#if __GMP_GNUC_PREREQ (4,9) -#define umul_ppmm(w1, w0, u, v) \ - do { \ - typedef unsigned int __ll_UTItype __attribute__((mode(TI))); \ - __ll_UTItype __ll =3D (__ll_UTItype)(u) * (v); \ - w1 =3D __ll 
>> 64; \ - w0 =3D __ll; \ - } while (0) -#endif -#if !defined (umul_ppmm) -#define umul_ppmm(ph, pl, m0, m1) \ - do { \ - UDItype __m0 =3D (m0), __m1 =3D (m1); \ - __asm__ ("umulh\t%0, %1, %2" : "=3Dr" (ph) : "r" (__m0), "r" (__m1))= ; \ - (pl) =3D __m0 * __m1; \ - } while (0) -#endif -#define count_leading_zeros(count, x) count_leading_zeros_gcc_clz(count= , x) -#define count_trailing_zeros(count, x) count_trailing_zeros_gcc_ctz(cou= nt, x) -#endif /* __aarch64__ */ - -#if defined (__clipper__) && W_TYPE_SIZE =3D=3D 32 -#define umul_ppmm(w1, w0, u, v) \ - ({union {UDItype __ll; \ - struct {USItype __l, __h;} __i; \ - } __x; \ - __asm__ ("mulwux %2,%0" \ - : "=3Dr" (__x.__ll) \ - : "%0" ((USItype)(u)), "r" ((USItype)(v))); \ - (w1) =3D __x.__i.__h; (w0) =3D __x.__i.__l;}) -#define smul_ppmm(w1, w0, u, v) \ - ({union {DItype __ll; \ - struct {SItype __l, __h;} __i; \ - } __x; \ - __asm__ ("mulwx %2,%0" \ - : "=3Dr" (__x.__ll) \ - : "%0" ((SItype)(u)), "r" ((SItype)(v))); \ - (w1) =3D __x.__i.__h; (w0) =3D __x.__i.__l;}) -#define __umulsidi3(u, v) \ - ({UDItype __w; \ - __asm__ ("mulwux %2,%0" \ - : "=3Dr" (__w) : "%0" ((USItype)(u)), "r" ((USItype)(v))); \ - __w; }) -#endif /* __clipper__ */ - -/* Fujitsu vector computers. 
*/ -#if defined (__uxp__) && W_TYPE_SIZE =3D=3D 32 -#define umul_ppmm(ph, pl, u, v) \ - do { \ - union {UDItype __ll; \ - struct {USItype __h, __l;} __i; \ - } __x; \ - __asm__ ("mult.lu %1,%2,%0" : "=3Dr" (__x.__ll) : "%r" (u), "rK" (v)= );\ - (ph) =3D __x.__i.__h; \ - (pl) =3D __x.__i.__l; \ - } while (0) -#define smul_ppmm(ph, pl, u, v) \ - do { \ - union {UDItype __ll; \ - struct {USItype __h, __l;} __i; \ - } __x; \ - __asm__ ("mult.l %1,%2,%0" : "=3Dr" (__x.__ll) : "%r" (u), "rK" (v))= ; \ - (ph) =3D __x.__i.__h; \ - (pl) =3D __x.__i.__l; \ - } while (0) -#endif - -#if defined (__gmicro__) && W_TYPE_SIZE =3D=3D 32 -#define add_ssaaaa(sh, sl, ah, al, bh, bl) \ - __asm__ ("add.w %5,%1\n\taddx %3,%0" \ - : "=3Dg" (sh), "=3D&g" (sl) \ - : "0" ((USItype)(ah)), "g" ((USItype)(bh)), \ - "%1" ((USItype)(al)), "g" ((USItype)(bl))) -#define sub_ddmmss(sh, sl, ah, al, bh, bl) \ - __asm__ ("sub.w %5,%1\n\tsubx %3,%0" \ - : "=3Dg" (sh), "=3D&g" (sl) \ - : "0" ((USItype)(ah)), "g" ((USItype)(bh)), \ - "1" ((USItype)(al)), "g" ((USItype)(bl))) -#define umul_ppmm(ph, pl, m0, m1) \ - __asm__ ("mulx %3,%0,%1" \ - : "=3Dg" (ph), "=3Dr" (pl) \ - : "%0" ((USItype)(m0)), "g" ((USItype)(m1))) -#define udiv_qrnnd(q, r, nh, nl, d) \ - __asm__ ("divx %4,%0,%1" \ - : "=3Dg" (q), "=3Dr" (r) \ - : "1" ((USItype)(nh)), "0" ((USItype)(nl)), "g" ((USItype)(d))) -#define count_leading_zeros(count, x) \ - __asm__ ("bsch/1 %1,%0" \ - : "=3Dg" (count) : "g" ((USItype)(x)), "0" ((USItype)0)) -#endif - -#if defined (__hppa) && W_TYPE_SIZE =3D=3D 32 -#define add_ssaaaa(sh, sl, ah, al, bh, bl) \ - __asm__ ("add%I5 %5,%r4,%1\n\taddc %r2,%r3,%0" \ - : "=3Dr" (sh), "=3D&r" (sl) \ - : "rM" (ah), "rM" (bh), "%rM" (al), "rI" (bl)) -#define sub_ddmmss(sh, sl, ah, al, bh, bl) \ - __asm__ ("sub%I4 %4,%r5,%1\n\tsubb %r2,%r3,%0" \ - : "=3Dr" (sh), "=3D&r" (sl) \ - : "rM" (ah), "rM" (bh), "rI" (al), "rM" (bl)) -#if defined (_PA_RISC1_1) -#define umul_ppmm(wh, wl, u, v) \ - do { \ - union {UDItype __ll; \ - 
struct {USItype __h, __l;} __i; \ - } __x; \ - __asm__ ("xmpyu %1,%2,%0" : "=3D*f" (__x.__ll) : "*f" (u), "*f" (v))= ; \ - (wh) =3D __x.__i.__h; \ - (wl) =3D __x.__i.__l; \ - } while (0) -#endif -#define count_leading_zeros(count, x) \ - do { \ - USItype __tmp; \ - __asm__ ( \ - "ldi 1,%0\n" \ -" extru,=3D %1,15,16,%%r0 ; Bits 31..16 zero?\n" \ -" extru,tr %1,15,16,%1 ; No. Shift down, skip add.\n" \ -" ldo 16(%0),%0 ; Yes. Perform add.\n" \ -" extru,=3D %1,23,8,%%r0 ; Bits 15..8 zero?\n" \ -" extru,tr %1,23,8,%1 ; No. Shift down, skip add.\n" \ -" ldo 8(%0),%0 ; Yes. Perform add.\n" \ -" extru,=3D %1,27,4,%%r0 ; Bits 7..4 zero?\n" \ -" extru,tr %1,27,4,%1 ; No. Shift down, skip add.\n" \ -" ldo 4(%0),%0 ; Yes. Perform add.\n" \ -" extru,=3D %1,29,2,%%r0 ; Bits 3..2 zero?\n" \ -" extru,tr %1,29,2,%1 ; No. Shift down, skip add.\n" \ -" ldo 2(%0),%0 ; Yes. Perform add.\n" \ -" extru %1,30,1,%1 ; Extract bit 1.\n" \ -" sub %0,%1,%0 ; Subtract it.\n" \ - : "=3Dr" (count), "=3Dr" (__tmp) : "1" (x)); \ - } while (0) -#endif /* hppa */ - -/* These macros are for ABI=3D2.0w. In ABI=3D2.0n they can't be used, s= ince GCC - (3.2) puts longlong into two adjacent 32-bit registers. Presumably t= his - is just a case of no direct support for 2.0n but treating it like 1.0= . */ -#if defined (__hppa) && W_TYPE_SIZE =3D=3D 64 && ! 
defined (_LONG_LONG_L= IMB) -#define add_ssaaaa(sh, sl, ah, al, bh, bl) \ - __asm__ ("add%I5 %5,%r4,%1\n\tadd,dc %r2,%r3,%0" \ - : "=3Dr" (sh), "=3D&r" (sl) \ - : "rM" (ah), "rM" (bh), "%rM" (al), "rI" (bl)) -#define sub_ddmmss(sh, sl, ah, al, bh, bl) \ - __asm__ ("sub%I4 %4,%r5,%1\n\tsub,db %r2,%r3,%0" \ - : "=3Dr" (sh), "=3D&r" (sl) \ - : "rM" (ah), "rM" (bh), "rI" (al), "rM" (bl)) -#endif /* hppa */ - -#if (defined (__i370__) || defined (__s390__) || defined (__mvs__)) && W= _TYPE_SIZE =3D=3D 32 -#if defined (__zarch__) || defined (HAVE_HOST_CPU_s390_zarch) -#define add_ssaaaa(sh, sl, ah, al, bh, bl) \ - do { \ -/* if (__builtin_constant_p (bl)) \ - __asm__ ("alfi\t%1,%o5\n\talcr\t%0,%3" \ - : "=3Dr" (sh), "=3D&r" (sl) \ - : "0" (ah), "r" (bh), "%1" (al), "n" (bl) __CLOBBER_CC);\ - else \ -*/ __asm__ ("alr\t%1,%5\n\talcr\t%0,%3" \ - : "=3Dr" (sh), "=3D&r" (sl) \ - : "0" (ah), "r" (bh), "%1" (al), "r" (bl)__CLOBBER_CC); \ - } while (0) -#define sub_ddmmss(sh, sl, ah, al, bh, bl) \ - do { \ -/* if (__builtin_constant_p (bl)) \ - __asm__ ("slfi\t%1,%o5\n\tslbr\t%0,%3" \ - : "=3Dr" (sh), "=3D&r" (sl) \ - : "0" (ah), "r" (bh), "1" (al), "n" (bl) __CLOBBER_CC); \ - else \ -*/ __asm__ ("slr\t%1,%5\n\tslbr\t%0,%3" \ - : "=3Dr" (sh), "=3D&r" (sl) \ - : "0" (ah), "r" (bh), "1" (al), "r" (bl) __CLOBBER_CC); \ - } while (0) -#if __GMP_GNUC_PREREQ (4,5) -#define umul_ppmm(xh, xl, m0, m1) \ - do { \ - union {UDItype __ll; \ - struct {USItype __h, __l;} __i; \ - } __x; \ - __x.__ll =3D (UDItype) (m0) * (UDItype) (m1); \ - (xh) =3D __x.__i.__h; (xl) =3D __x.__i.__l; \ - } while (0) -#else -#if 0 -/* FIXME: this fails if gcc knows about the 64-bit registers. Use only - with a new enough processor pretending we have 32-bit registers. 
*/ -#define umul_ppmm(xh, xl, m0, m1) \ - do { \ - union {UDItype __ll; \ - struct {USItype __h, __l;} __i; \ - } __x; \ - __asm__ ("mlr\t%0,%2" \ - : "=3Dr" (__x.__ll) \ - : "%0" (m0), "r" (m1)); \ - (xh) =3D __x.__i.__h; (xl) =3D __x.__i.__l; \ - } while (0) -#else -#define umul_ppmm(xh, xl, m0, m1) \ - do { \ - /* When we have 64-bit regs and gcc is aware of that, we cannot simply= use - DImode for the product, since that would be allocated to a single 6= 4-bit - register, whereas mlr uses the low 32-bits of an even-odd register = pair. - */ \ - register USItype __r0 __asm__ ("0"); \ - register USItype __r1 __asm__ ("1") =3D (m0); \ - __asm__ ("mlr\t%0,%3" \ - : "=3Dr" (__r0), "=3Dr" (__r1) \ - : "r" (__r1), "r" (m1)); \ - (xh) =3D __r0; (xl) =3D __r1; \ - } while (0) -#endif /* if 0 */ -#endif -#if 0 -/* FIXME: this fails if gcc knows about the 64-bit registers. Use only - with a new enough processor pretending we have 32-bit registers. */ -#define udiv_qrnnd(q, r, n1, n0, d) \ - do { \ - union {UDItype __ll; \ - struct {USItype __h, __l;} __i; \ - } __x; \ - __x.__i.__h =3D n1; __x.__i.__l =3D n0; \ - __asm__ ("dlr\t%0,%2" \ - : "=3Dr" (__x.__ll) \ - : "0" (__x.__ll), "r" (d)); \ - (q) =3D __x.__i.__l; (r) =3D __x.__i.__h; \ - } while (0) -#else -#define udiv_qrnnd(q, r, n1, n0, d) \ - do { \ - register USItype __r0 __asm__ ("0") =3D (n1); \ - register USItype __r1 __asm__ ("1") =3D (n0); \ - __asm__ ("dlr\t%0,%4" \ - : "=3Dr" (__r0), "=3Dr" (__r1) \ - : "r" (__r0), "r" (__r1), "r" (d)); \ - (q) =3D __r1; (r) =3D __r0; \ - } while (0) -#endif /* if 0 */ -#else /* if __zarch__ */ -/* FIXME: this fails if gcc knows about the 64-bit registers. */ -#define smul_ppmm(xh, xl, m0, m1) \ - do { \ - union {DItype __ll; \ - struct {USItype __h, __l;} __i; \ - } __x; \ - __asm__ ("mr\t%0,%2" \ - : "=3Dr" (__x.__ll) \ - : "%0" (m0), "r" (m1)); \ - (xh) =3D __x.__i.__h; (xl) =3D __x.__i.__l; \ - } while (0) -/* FIXME: this fails if gcc knows about the 64-bit registers. 
*/ -#define sdiv_qrnnd(q, r, n1, n0, d) \ - do { \ - union {DItype __ll; \ - struct {USItype __h, __l;} __i; \ - } __x; \ - __x.__i.__h =3D n1; __x.__i.__l =3D n0; \ - __asm__ ("dr\t%0,%2" \ - : "=3Dr" (__x.__ll) \ - : "0" (__x.__ll), "r" (d)); \ - (q) =3D __x.__i.__l; (r) =3D __x.__i.__h; \ - } while (0) -#endif /* if __zarch__ */ -#endif - -#if defined (__s390x__) && W_TYPE_SIZE =3D=3D 64 -/* We need to cast operands with register constraints, otherwise their t= ypes - will be assumed to be SImode by gcc. For these machines, such operat= ions - will insert a value into the low 32 bits, and leave the high 32 bits = with - garbage. */ -#define add_ssaaaa(sh, sl, ah, al, bh, bl) \ - do { \ - __asm__ ("algr\t%1,%5\n\talcgr\t%0,%3" \ - : "=3Dr" (sh), "=3D&r" (sl) \ - : "0" ((UDItype)(ah)), "r" ((UDItype)(bh)), \ - "%1" ((UDItype)(al)), "r" ((UDItype)(bl)) __CLOBBER_CC); \ - } while (0) -#define sub_ddmmss(sh, sl, ah, al, bh, bl) \ - do { \ - __asm__ ("slgr\t%1,%5\n\tslbgr\t%0,%3" \ - : "=3Dr" (sh), "=3D&r" (sl) \ - : "0" ((UDItype)(ah)), "r" ((UDItype)(bh)), \ - "1" ((UDItype)(al)), "r" ((UDItype)(bl)) __CLOBBER_CC); \ - } while (0) -#define umul_ppmm(xh, xl, m0, m1) \ - do { \ - union {unsigned int __attribute__ ((mode(TI))) __ll; \ - struct {UDItype __h, __l;} __i; \ - } __x; \ - __asm__ ("mlgr\t%0,%2" \ - : "=3Dr" (__x.__ll) \ - : "%0" ((UDItype)(m0)), "r" ((UDItype)(m1))); \ - (xh) =3D __x.__i.__h; (xl) =3D __x.__i.__l; \ - } while (0) -#define udiv_qrnnd(q, r, n1, n0, d) \ - do { \ - union {unsigned int __attribute__ ((mode(TI))) __ll; \ - struct {UDItype __h, __l;} __i; \ - } __x; \ - __x.__i.__h =3D n1; __x.__i.__l =3D n0; \ - __asm__ ("dlgr\t%0,%2" \ - : "=3Dr" (__x.__ll) \ - : "0" (__x.__ll), "r" ((UDItype)(d))); \ - (q) =3D __x.__i.__l; (r) =3D __x.__i.__h; \ - } while (0) -#if 0 /* FIXME: Enable for z10 (?) 
*/ -#define count_leading_zeros(cnt, x) \ - do { \ - union {unsigned int __attribute__ ((mode(TI))) __ll; \ - struct {UDItype __h, __l;} __i; \ - } __clr_cnt; \ - __asm__ ("flogr\t%0,%1" \ - : "=3Dr" (__clr_cnt.__ll) \ - : "r" (x) __CLOBBER_CC); \ - (cnt) =3D __clr_cnt.__i.__h; \ - } while (0) -#endif -#endif - -/* On x86 and x86_64, every asm implicitly clobbers "flags" and "fpsr", - so we don't need __CLOBBER_CC. */ -#if (defined (__i386__) || defined (__i486__)) && W_TYPE_SIZE =3D=3D 32 -#define add_ssaaaa(sh, sl, ah, al, bh, bl) \ - __asm__ ("addl %5,%k1\n\tadcl %3,%k0" \ - : "=3Dr" (sh), "=3D&r" (sl) \ - : "0" ((USItype)(ah)), "g" ((USItype)(bh)), \ - "%1" ((USItype)(al)), "g" ((USItype)(bl))) -#define sub_ddmmss(sh, sl, ah, al, bh, bl) \ - __asm__ ("subl %5,%k1\n\tsbbl %3,%k0" \ - : "=3Dr" (sh), "=3D&r" (sl) \ - : "0" ((USItype)(ah)), "g" ((USItype)(bh)), \ - "1" ((USItype)(al)), "g" ((USItype)(bl))) -#define umul_ppmm(w1, w0, u, v) \ - __asm__ ("mull %3" \ - : "=3Da" (w0), "=3Dd" (w1) \ - : "%0" ((USItype)(u)), "rm" ((USItype)(v))) -#define udiv_qrnnd(q, r, n1, n0, dx) /* d renamed to dx avoiding "=3Dd" = */\ - __asm__ ("divl %4" /* stringification in K&R C */ \ - : "=3Da" (q), "=3Dd" (r) \ - : "0" ((USItype)(n0)), "1" ((USItype)(n1)), "rm" ((USItype)(dx))) - -#if HAVE_HOST_CPU_i586 || HAVE_HOST_CPU_pentium || HAVE_HOST_CPU_pentium= mmx -/* Pentium bsrl takes between 10 and 72 cycles depending where the most - significant 1 bit is, hence the use of the following alternatives. b= sfl - is slow too, between 18 and 42 depending where the least significant = 1 - bit is, so let the generic count_trailing_zeros below make use of the - count_leading_zeros here too. */ - -#if HAVE_HOST_CPU_pentiummmx && ! defined (LONGLONG_STANDALONE) -/* The following should be a fixed 14 or 15 cycles, but possibly plus an= L1 - cache miss reading from __clz_tab. 
For P55 it's favoured over the fl= oat - below so as to avoid mixing MMX and x87, since the penalty for switch= ing - between the two is about 100 cycles. - - The asm block sets __shift to -3 if the high 24 bits are clear, -2 fo= r - 16, -1 for 8, or 0 otherwise. This could be written equivalently as - follows, but as of gcc 2.95.2 it results in conditional jumps. - - __shift =3D -(__n < 0x1000000); - __shift -=3D (__n < 0x10000); - __shift -=3D (__n < 0x100); - - The middle two sbbl and cmpl's pair, and with luck something gcc - generates might pair with the first cmpl and the last sbbl. The "32+= 1" - constant could be folded into __clz_tab[], but it doesn't seem worth - making a different table just for that. */ - -#define count_leading_zeros(c,n) \ - do { \ - USItype __n =3D (n); \ - USItype __shift; \ - __asm__ ("cmpl $0x1000000, %1\n" \ - "sbbl %0, %0\n" \ - "cmpl $0x10000, %1\n" \ - "sbbl $0, %0\n" \ - "cmpl $0x100, %1\n" \ - "sbbl $0, %0\n" \ - : "=3D&r" (__shift) : "r" (__n)); \ - __shift =3D __shift*8 + 24 + 1; \ - (c) =3D 32 + 1 - __shift - __clz_tab[__n >> __shift]; \ - } while (0) -#define COUNT_LEADING_ZEROS_NEED_CLZ_TAB -#define COUNT_LEADING_ZEROS_0 31 /* n=3D=3D0 indistinguishable from = n=3D=3D1 */ - -#else /* ! pentiummmx || LONGLONG_STANDALONE */ -/* The following should be a fixed 14 cycles or so. Some scheduling - opportunities should be available between the float load/store too. = This - sort of code is used in gcc 3 for __builtin_ffs (with "n&-n") and is - apparently suggested by the Intel optimizing manual (don't know exact= ly - where). gcc 2.95 or up will be best for this, so the "double" is - correctly aligned on the stack. */ -#define count_leading_zeros(c,n) \ - do { \ - union { \ - double d; \ - unsigned a[2]; \ - } __u; \ - __u.d =3D (UWtype) (n); \ - (c) =3D 0x3FF + 31 - (__u.a[1] >> 20); \ - } while (0) -#define COUNT_LEADING_ZEROS_0 (0x3FF + 31) -#endif /* pentiummx */ - -#else /* ! 
pentium */ - -#if __GMP_GNUC_PREREQ (3,4) /* using bsrl */ -#define count_leading_zeros(count,x) count_leading_zeros_gcc_clz(count,= x) -#endif /* gcc clz */ - -/* On P6, gcc prior to 3.0 generates a partial register stall for - __cbtmp^31, due to using "xorb $31" instead of "xorl $31", the former - being 1 code byte smaller. "31-__cbtmp" is a workaround, probably at= the - cost of one extra instruction. Do this for "i386" too, since that me= ans - generic x86. */ -#if ! defined (count_leading_zeros) && __GNUC__ < 3 \ - && (HAVE_HOST_CPU_i386 \ - || HAVE_HOST_CPU_i686 \ - || HAVE_HOST_CPU_pentiumpro \ - || HAVE_HOST_CPU_pentium2 \ - || HAVE_HOST_CPU_pentium3) -#define count_leading_zeros(count, x) \ - do { \ - USItype __cbtmp; \ - ASSERT ((x) !=3D 0); \ - __asm__ ("bsrl %1,%0" : "=3Dr" (__cbtmp) : "rm" ((USItype)(x))); \ - (count) =3D 31 - __cbtmp; \ - } while (0) -#endif /* gcc<3 asm bsrl */ - -#ifndef count_leading_zeros -#define count_leading_zeros(count, x) \ - do { \ - USItype __cbtmp; \ - ASSERT ((x) !=3D 0); \ - __asm__ ("bsrl %1,%0" : "=3Dr" (__cbtmp) : "rm" ((USItype)(x))); \ - (count) =3D __cbtmp ^ 31; \ - } while (0) -#endif /* asm bsrl */ - -#if __GMP_GNUC_PREREQ (3,4) /* using bsfl */ -#define count_trailing_zeros(count,x) count_trailing_zeros_gcc_ctz(coun= t,x) -#endif /* gcc ctz */ - -#ifndef count_trailing_zeros -#define count_trailing_zeros(count, x) \ - do { \ - ASSERT ((x) !=3D 0); \ - __asm__ ("bsfl %1,%k0" : "=3Dr" (count) : "rm" ((USItype)(x))); \ - } while (0) -#endif /* asm bsfl */ - -#endif /* ! 
pentium */ - -#endif /* 80x86 */ - -#if defined (__amd64__) && W_TYPE_SIZE =3D=3D 64 -#define add_ssaaaa(sh, sl, ah, al, bh, bl) \ - __asm__ ("addq %5,%q1\n\tadcq %3,%q0" \ - : "=3Dr" (sh), "=3D&r" (sl) \ - : "0" ((UDItype)(ah)), "rme" ((UDItype)(bh)), \ - "%1" ((UDItype)(al)), "rme" ((UDItype)(bl))) -#define sub_ddmmss(sh, sl, ah, al, bh, bl) \ - __asm__ ("subq %5,%q1\n\tsbbq %3,%q0" \ - : "=3Dr" (sh), "=3D&r" (sl) \ - : "0" ((UDItype)(ah)), "rme" ((UDItype)(bh)), \ - "1" ((UDItype)(al)), "rme" ((UDItype)(bl))) -#if X86_ASM_MULX \ - && (HAVE_HOST_CPU_haswell || HAVE_HOST_CPU_broadwell \ - || HAVE_HOST_CPU_skylake || HAVE_HOST_CPU_bd4 || HAVE_HOST_CPU_ze= n) -#define umul_ppmm(w1, w0, u, v) \ - __asm__ ("mulx\t%3, %0, %1" \ - : "=3Dr" (w0), "=3Dr" (w1) \ - : "%d" ((UDItype)(u)), "rm" ((UDItype)(v))) -#else -#define umul_ppmm(w1, w0, u, v) \ - __asm__ ("mulq\t%3" \ - : "=3Da" (w0), "=3Dd" (w1) \ - : "%0" ((UDItype)(u)), "rm" ((UDItype)(v))) -#endif -#define udiv_qrnnd(q, r, n1, n0, dx) /* d renamed to dx avoiding "=3Dd" = */\ - __asm__ ("divq %4" /* stringification in K&R C */ \ - : "=3Da" (q), "=3Dd" (r) \ - : "0" ((UDItype)(n0)), "1" ((UDItype)(n1)), "rm" ((UDItype)(dx))) - -#if HAVE_HOST_CPU_haswell || HAVE_HOST_CPU_broadwell || HAVE_HOST_CPU_sk= ylake \ - || HAVE_HOST_CPU_k10 || HAVE_HOST_CPU_bd1 || HAVE_HOST_CPU_bd2 \ - || HAVE_HOST_CPU_bd3 || HAVE_HOST_CPU_bd4 || HAVE_HOST_CPU_zen \ - || HAVE_HOST_CPU_bobcat || HAVE_HOST_CPU_jaguar -#define count_leading_zeros(count, x) \ - do { \ - /* This is lzcnt, spelled for older assemblers. Destination and */ = \ - /* source must be a 64-bit registers, hence cast and %q. 
*/ = \ - __asm__ ("rep;bsr\t%1, %q0" : "=3Dr" (count) : "rm" ((UDItype)(x)));= \ - } while (0) -#define COUNT_LEADING_ZEROS_0 64 -#else -#define count_leading_zeros(count, x) \ - do { \ - UDItype __cbtmp; \ - ASSERT ((x) !=3D 0); \ - __asm__ ("bsr\t%1,%0" : "=3Dr" (__cbtmp) : "rm" ((UDItype)(x))); \ - (count) =3D __cbtmp ^ 63; \ - } while (0) -#endif - -#if HAVE_HOST_CPU_bd2 || HAVE_HOST_CPU_bd3 || HAVE_HOST_CPU_bd4 \ - || HAVE_HOST_CPU_zen || HAVE_HOST_CPU_jaguar -#define count_trailing_zeros(count, x) \ - do { \ - /* This is tzcnt, spelled for older assemblers. Destination and */ = \ - /* source must be a 64-bit registers, hence cast and %q. */ = \ - __asm__ ("rep;bsf\t%1, %q0" : "=3Dr" (count) : "rm" ((UDItype)(x)));= \ - } while (0) -#define COUNT_TRAILING_ZEROS_0 64 -#else -#define count_trailing_zeros(count, x) \ - do { \ - ASSERT ((x) !=3D 0); \ - __asm__ ("bsf\t%1, %q0" : "=3Dr" (count) : "rm" ((UDItype)(x))); \ - } while (0) -#endif -#endif /* __amd64__ */ - -#if defined (__i860__) && W_TYPE_SIZE =3D=3D 32 -#define rshift_rhlc(r,h,l,c) \ - __asm__ ("shr %3,r0,r0\;shrd %1,%2,%0" \ - "=3Dr" (r) : "r" (h), "r" (l), "rn" (c)) -#endif /* i860 */ - -#if defined (__i960__) && W_TYPE_SIZE =3D=3D 32 -#define add_ssaaaa(sh, sl, ah, al, bh, bl) \ - __asm__ ("cmpo 1,0\;addc %5,%4,%1\;addc %3,%2,%0" \ - : "=3Dr" (sh), "=3D&r" (sl) \ - : "dI" (ah), "dI" (bh), "%dI" (al), "dI" (bl)) -#define sub_ddmmss(sh, sl, ah, al, bh, bl) \ - __asm__ ("cmpo 0,0\;subc %5,%4,%1\;subc %3,%2,%0" \ - : "=3Dr" (sh), "=3D&r" (sl) \ - : "dI" (ah), "dI" (bh), "dI" (al), "dI" (bl)) -#define umul_ppmm(w1, w0, u, v) \ - ({union {UDItype __ll; \ - struct {USItype __l, __h;} __i; \ - } __x; \ - __asm__ ("emul %2,%1,%0" \ - : "=3Dd" (__x.__ll) : "%dI" (u), "dI" (v)); \ - (w1) =3D __x.__i.__h; (w0) =3D __x.__i.__l;}) -#define __umulsidi3(u, v) \ - ({UDItype __w; \ - __asm__ ("emul %2,%1,%0" : "=3Dd" (__w) : "%dI" (u), "dI" (v)); \ - __w; }) -#define udiv_qrnnd(q, r, nh, nl, d) \ - do { \ - union 
{UDItype __ll; \ - struct {USItype __l, __h;} __i; \ - } __nn; \ - __nn.__i.__h =3D (nh); __nn.__i.__l =3D (nl); \ - __asm__ ("ediv %d,%n,%0" \ - : "=3Dd" (__rq.__ll) : "dI" (__nn.__ll), "dI" (d)); \ - (r) =3D __rq.__i.__l; (q) =3D __rq.__i.__h; \ - } while (0) -#define count_leading_zeros(count, x) \ - do { \ - USItype __cbtmp; \ - __asm__ ("scanbit %1,%0" : "=3Dr" (__cbtmp) : "r" (x)); \ - (count) =3D __cbtmp ^ 31; \ - } while (0) -#define COUNT_LEADING_ZEROS_0 (-32) /* sic */ -#if defined (__i960mx) /* what is the proper symbol to test??? */ -#define rshift_rhlc(r,h,l,c) \ - do { \ - union {UDItype __ll; \ - struct {USItype __l, __h;} __i; \ - } __nn; \ - __nn.__i.__h =3D (h); __nn.__i.__l =3D (l); \ - __asm__ ("shre %2,%1,%0" : "=3Dd" (r) : "dI" (__nn.__ll), "dI" (c));= \ - } -#endif /* i960mx */ -#endif /* i960 */ - -#if (defined (__mc68000__) || defined (__mc68020__) || defined(mc68020) = \ - || defined (__m68k__) || defined (__mc5200__) || defined (__mc5206e= __) \ - || defined (__mc5307__)) && W_TYPE_SIZE =3D=3D 32 -#define add_ssaaaa(sh, sl, ah, al, bh, bl) \ - __asm__ ("add%.l %5,%1\n\taddx%.l %3,%0" \ - : "=3Dd" (sh), "=3D&d" (sl) \ - : "0" ((USItype)(ah)), "d" ((USItype)(bh)), \ - "%1" ((USItype)(al)), "g" ((USItype)(bl))) -#define sub_ddmmss(sh, sl, ah, al, bh, bl) \ - __asm__ ("sub%.l %5,%1\n\tsubx%.l %3,%0" \ - : "=3Dd" (sh), "=3D&d" (sl) \ - : "0" ((USItype)(ah)), "d" ((USItype)(bh)), \ - "1" ((USItype)(al)), "g" ((USItype)(bl))) -/* The '020, '030, '040 and CPU32 have 32x32->64 and 64/32->32q-32r. 
*/ -#if defined (__mc68020__) || defined(mc68020) \ - || defined (__mc68030__) || defined (mc68030) \ - || defined (__mc68040__) || defined (mc68040) \ - || defined (__mcpu32__) || defined (mcpu32) \ - || defined (__NeXT__) -#define umul_ppmm(w1, w0, u, v) \ - __asm__ ("mulu%.l %3,%1:%0" \ - : "=3Dd" (w0), "=3Dd" (w1) \ - : "%0" ((USItype)(u)), "dmi" ((USItype)(v))) -#define udiv_qrnnd(q, r, n1, n0, d) \ - __asm__ ("divu%.l %4,%1:%0" \ - : "=3Dd" (q), "=3Dd" (r) \ - : "0" ((USItype)(n0)), "1" ((USItype)(n1)), "dmi" ((USItype)(d))) -#define sdiv_qrnnd(q, r, n1, n0, d) \ - __asm__ ("divs%.l %4,%1:%0" \ - : "=3Dd" (q), "=3Dd" (r) \ - : "0" ((USItype)(n0)), "1" ((USItype)(n1)), "dmi" ((USItype)(d))) -#else /* for other 68k family members use 16x16->32 multiplication */ -#define umul_ppmm(xh, xl, a, b) \ - do { USItype __umul_tmp1, __umul_tmp2; \ - __asm__ ("| Inlined umul_ppmm\n" \ -" move%.l %5,%3\n" \ -" move%.l %2,%0\n" \ -" move%.w %3,%1\n" \ -" swap %3\n" \ -" swap %0\n" \ -" mulu%.w %2,%1\n" \ -" mulu%.w %3,%0\n" \ -" mulu%.w %2,%3\n" \ -" swap %2\n" \ -" mulu%.w %5,%2\n" \ -" add%.l %3,%2\n" \ -" jcc 1f\n" \ -" add%.l %#0x10000,%0\n" \ -"1: move%.l %2,%3\n" \ -" clr%.w %2\n" \ -" swap %2\n" \ -" swap %3\n" \ -" clr%.w %3\n" \ -" add%.l %3,%1\n" \ -" addx%.l %2,%0\n" \ -" | End inlined umul_ppmm" \ - : "=3D&d" (xh), "=3D&d" (xl), \ - "=3Dd" (__umul_tmp1), "=3D&d" (__umul_tmp2) \ - : "%2" ((USItype)(a)), "d" ((USItype)(b))); \ - } while (0) -#endif /* not mc68020 */ -/* The '020, '030, '040 and '060 have bitfield insns. - GCC 3.4 defines __mc68020__ when in CPU32 mode, check for __mcpu32__ = to - exclude bfffo on that chip (bitfield insns not available). */ -#if (defined (__mc68020__) || defined (mc68020) \ - || defined (__mc68030__) || defined (mc68030) \ - || defined (__mc68040__) || defined (mc68040) \ - || defined (__mc68060__) || defined (mc68060) \ - || defined (__NeXT__)) \ - && ! 
defined (__mcpu32__) -#define count_leading_zeros(count, x) \ - __asm__ ("bfffo %1{%b2:%b2},%0" \ - : "=3Dd" (count) \ - : "od" ((USItype) (x)), "n" (0)) -#define COUNT_LEADING_ZEROS_0 32 -#endif -#endif /* mc68000 */ - -#if defined (__m88000__) && W_TYPE_SIZE =3D=3D 32 -#define add_ssaaaa(sh, sl, ah, al, bh, bl) \ - __asm__ ("addu.co %1,%r4,%r5\n\taddu.ci %0,%r2,%r3" \ - : "=3Dr" (sh), "=3D&r" (sl) \ - : "rJ" (ah), "rJ" (bh), "%rJ" (al), "rJ" (bl)) -#define sub_ddmmss(sh, sl, ah, al, bh, bl) \ - __asm__ ("subu.co %1,%r4,%r5\n\tsubu.ci %0,%r2,%r3" \ - : "=3Dr" (sh), "=3D&r" (sl) \ - : "rJ" (ah), "rJ" (bh), "rJ" (al), "rJ" (bl)) -#define count_leading_zeros(count, x) \ - do { \ - USItype __cbtmp; \ - __asm__ ("ff1 %0,%1" : "=3Dr" (__cbtmp) : "r" (x)); \ - (count) =3D __cbtmp ^ 31; \ - } while (0) -#define COUNT_LEADING_ZEROS_0 63 /* sic */ -#if defined (__m88110__) -#define umul_ppmm(wh, wl, u, v) \ - do { \ - union {UDItype __ll; \ - struct {USItype __h, __l;} __i; \ - } __x; \ - __asm__ ("mulu.d %0,%1,%2" : "=3Dr" (__x.__ll) : "r" (u), "r" (v)); = \ - (wh) =3D __x.__i.__h; \ - (wl) =3D __x.__i.__l; \ - } while (0) -#define udiv_qrnnd(q, r, n1, n0, d) \ - ({union {UDItype __ll; \ - struct {USItype __h, __l;} __i; \ - } __x, __q; \ - __x.__i.__h =3D (n1); __x.__i.__l =3D (n0); \ - __asm__ ("divu.d %0,%1,%2" \ - : "=3Dr" (__q.__ll) : "r" (__x.__ll), "r" (d)); \ - (r) =3D (n0) - __q.__l * (d); (q) =3D __q.__l; }) -#endif /* __m88110__ */ -#endif /* __m88000__ */ - -#if defined (__mips) && W_TYPE_SIZE =3D=3D 32 -#if __GMP_GNUC_PREREQ (4,4) -#define umul_ppmm(w1, w0, u, v) \ - do { \ - UDItype __ll =3D (UDItype)(u) * (v); \ - w1 =3D __ll >> 32; \ - w0 =3D __ll; \ - } while (0) -#endif -#if !defined (umul_ppmm) && __GMP_GNUC_PREREQ (2,7) && !defined (__clang= __) -#define umul_ppmm(w1, w0, u, v) \ - __asm__ ("multu %2,%3" : "=3Dl" (w0), "=3Dh" (w1) : "d" (u), "d" (v)) -#endif -#if !defined (umul_ppmm) -#define umul_ppmm(w1, w0, u, v) \ - __asm__ ("multu %2,%3\n\tmflo 
%0\n\tmfhi %1" \ - : "=3Dd" (w0), "=3Dd" (w1) : "d" (u), "d" (v)) -#endif -#endif /* __mips */ - -#if (defined (__mips) && __mips >=3D 3) && W_TYPE_SIZE =3D=3D 64 -#if defined (_MIPS_ARCH_MIPS64R6) -#define umul_ppmm(w1, w0, u, v) \ - do { \ - UDItype __m0 =3D (u), __m1 =3D (v); \ - (w0) =3D __m0 * __m1; \ - __asm__ ("dmuhu\t%0, %1, %2" : "=3Dd" (w1) : "d" (__m0), "d" (__m1))= ; \ - } while (0) -#endif -#if !defined (umul_ppmm) && __GMP_GNUC_PREREQ (4,4) -#define umul_ppmm(w1, w0, u, v) \ - do { \ - typedef unsigned int __ll_UTItype __attribute__((mode(TI))); \ - __ll_UTItype __ll =3D (__ll_UTItype)(u) * (v); \ - w1 =3D __ll >> 64; \ - w0 =3D __ll; \ - } while (0) -#endif -#if !defined (umul_ppmm) && __GMP_GNUC_PREREQ (2,7) && !defined (__clang= __) -#define umul_ppmm(w1, w0, u, v) \ - __asm__ ("dmultu %2,%3" \ - : "=3Dl" (w0), "=3Dh" (w1) \ - : "d" ((UDItype)(u)), "d" ((UDItype)(v))) -#endif -#if !defined (umul_ppmm) -#define umul_ppmm(w1, w0, u, v) \ - __asm__ ("dmultu %2,%3\n\tmflo %0\n\tmfhi %1" \ - : "=3Dd" (w0), "=3Dd" (w1) \ - : "d" ((UDItype)(u)), "d" ((UDItype)(v))) -#endif -#endif /* __mips */ - -#if defined (__mmix__) && W_TYPE_SIZE =3D=3D 64 -#define umul_ppmm(w1, w0, u, v) \ - __asm__ ("MULU %0,%2,%3" : "=3Dr" (w0), "=3Dz" (w1) : "r" (u), "r" (v)= ) -#endif - -#if defined (__ns32000__) && W_TYPE_SIZE =3D=3D 32 -#define umul_ppmm(w1, w0, u, v) \ - ({union {UDItype __ll; \ - struct {USItype __l, __h;} __i; \ - } __x; \ - __asm__ ("meid %2,%0" \ - : "=3Dg" (__x.__ll) \ - : "%0" ((USItype)(u)), "g" ((USItype)(v))); \ - (w1) =3D __x.__i.__h; (w0) =3D __x.__i.__l;}) -#define __umulsidi3(u, v) \ - ({UDItype __w; \ - __asm__ ("meid %2,%0" \ - : "=3Dg" (__w) \ - : "%0" ((USItype)(u)), "g" ((USItype)(v))); \ - __w; }) -#define udiv_qrnnd(q, r, n1, n0, d) \ - ({union {UDItype __ll; \ - struct {USItype __l, __h;} __i; \ - } __x; \ - __x.__i.__h =3D (n1); __x.__i.__l =3D (n0); \ - __asm__ ("deid %2,%0" \ - : "=3Dg" (__x.__ll) \ - : "0" (__x.__ll), "g" 
((USItype)(d))); \ - (r) =3D __x.__i.__l; (q) =3D __x.__i.__h; }) -#define count_trailing_zeros(count,x) \ - do { \ - __asm__ ("ffsd %2,%0" \ - : "=3Dr" (count) \ - : "0" ((USItype) 0), "r" ((USItype) (x))); \ - } while (0) -#endif /* __ns32000__ */ - -/* In the past we had a block of various #defines tested - _ARCH_PPC - AIX - _ARCH_PWR - AIX - __powerpc__ - gcc - __POWERPC__ - BEOS - __ppc__ - Darwin - PPC - old gcc, GNU/Linux, SysV - The plain PPC test was not good for vxWorks, since PPC is defined on = all - CPUs there (eg. m68k too), as a constant one is expected to compare - CPU_FAMILY against. - - At any rate, this was pretty unattractive and a bit fragile. The use= of - HAVE_HOST_CPU_FAMILY is designed to cut through it all and be sure of - getting the desired effect. - - ENHANCE-ME: We should test _IBMR2 here when we add assembly support f= or - the system vendor compilers. (Is that vendor compilers with inline a= sm, - or what?) */ - -#if (HAVE_HOST_CPU_FAMILY_power || HAVE_HOST_CPU_FAMILY_powerpc) \ - && W_TYPE_SIZE =3D=3D 32 -#define add_ssaaaa(sh, sl, ah, al, bh, bl) \ - do { \ - if (__builtin_constant_p (bh) && (bh) =3D=3D 0) \ - __asm__ ("add%I4c %1,%3,%4\n\taddze %0,%2" \ - : "=3Dr" (sh), "=3D&r" (sl) : "r" (ah), "%r" (al), "rI" (bl) \ - __CLOBBER_CC); \ - else if (__builtin_constant_p (bh) && (bh) =3D=3D ~(USItype) 0) \ - __asm__ ("add%I4c %1,%3,%4\n\taddme %0,%2" \ - : "=3Dr" (sh), "=3D&r" (sl) : "r" (ah), "%r" (al), "rI" (bl) \ - __CLOBBER_CC); \ - else \ - __asm__ ("add%I5c %1,%4,%5\n\tadde %0,%2,%3" \ - : "=3Dr" (sh), "=3D&r" (sl) \ - : "r" (ah), "r" (bh), "%r" (al), "rI" (bl) \ - __CLOBBER_CC); \ - } while (0) -#define sub_ddmmss(sh, sl, ah, al, bh, bl) \ - do { \ - if (__builtin_constant_p (ah) && (ah) =3D=3D 0) \ - __asm__ ("subf%I3c %1,%4,%3\n\tsubfze %0,%2" \ - : "=3Dr" (sh), "=3D&r" (sl) : "r" (bh), "rI" (al), "r" (bl) \ - __CLOBBER_CC); \ - else if (__builtin_constant_p (ah) && (ah) =3D=3D ~(USItype) 0) \ - __asm__ ("subf%I3c 
%1,%4,%3\n\tsubfme %0,%2" \ - : "=3Dr" (sh), "=3D&r" (sl) : "r" (bh), "rI" (al), "r" (bl) \ - __CLOBBER_CC); \ - else if (__builtin_constant_p (bh) && (bh) =3D=3D 0) \ - __asm__ ("subf%I3c %1,%4,%3\n\taddme %0,%2" \ - : "=3Dr" (sh), "=3D&r" (sl) : "r" (ah), "rI" (al), "r" (bl) \ - __CLOBBER_CC); \ - else if (__builtin_constant_p (bh) && (bh) =3D=3D ~(USItype) 0) \ - __asm__ ("subf%I3c %1,%4,%3\n\taddze %0,%2" \ - : "=3Dr" (sh), "=3D&r" (sl) : "r" (ah), "rI" (al), "r" (bl) \ - __CLOBBER_CC); \ - else \ - __asm__ ("subf%I4c %1,%5,%4\n\tsubfe %0,%3,%2" \ - : "=3Dr" (sh), "=3D&r" (sl) \ - : "r" (ah), "r" (bh), "rI" (al), "r" (bl) \ - __CLOBBER_CC); \ - } while (0) -#define count_leading_zeros(count, x) \ - __asm__ ("cntlzw %0,%1" : "=3Dr" (count) : "r" (x)) -#define COUNT_LEADING_ZEROS_0 32 -#if HAVE_HOST_CPU_FAMILY_powerpc -#if __GMP_GNUC_PREREQ (4,4) -#define umul_ppmm(w1, w0, u, v) \ - do { \ - UDItype __ll =3D (UDItype)(u) * (v); \ - w1 =3D __ll >> 32; \ - w0 =3D __ll; \ - } while (0) -#endif -#if !defined (umul_ppmm) -#define umul_ppmm(ph, pl, m0, m1) \ - do { \ - USItype __m0 =3D (m0), __m1 =3D (m1); \ - __asm__ ("mulhwu %0,%1,%2" : "=3Dr" (ph) : "%r" (m0), "r" (m1)); \ - (pl) =3D __m0 * __m1; \ - } while (0) -#endif -#define smul_ppmm(ph, pl, m0, m1) \ - do { \ - SItype __m0 =3D (m0), __m1 =3D (m1); \ - __asm__ ("mulhw %0,%1,%2" : "=3Dr" (ph) : "%r" (m0), "r" (m1)); \ - (pl) =3D __m0 * __m1; \ - } while (0) -#else -#define smul_ppmm(xh, xl, m0, m1) \ - __asm__ ("mul %0,%2,%3" : "=3Dr" (xh), "=3Dq" (xl) : "r" (m0), "r" (m1= )) -#define sdiv_qrnnd(q, r, nh, nl, d) \ - __asm__ ("div %0,%2,%4" : "=3Dr" (q), "=3Dq" (r) : "r" (nh), "1" (nl),= "r" (d)) -#endif -#endif /* 32-bit POWER architecture variants. */ - -/* We should test _IBMR2 here when we add assembly support for the syste= m - vendor compilers. 
*/ -#if HAVE_HOST_CPU_FAMILY_powerpc && W_TYPE_SIZE =3D=3D 64 -#if !defined (_LONG_LONG_LIMB) -/* _LONG_LONG_LIMB is ABI=3Dmode32 where adde operates on 32-bit values.= So - use adde etc only when not _LONG_LONG_LIMB. */ -#define add_ssaaaa(sh, sl, ah, al, bh, bl) \ - do { \ - if (__builtin_constant_p (bh) && (bh) =3D=3D 0) \ - __asm__ ("add%I4c %1,%3,%4\n\taddze %0,%2" \ - : "=3Dr" (sh), "=3D&r" (sl) \ - : "r" ((UDItype)(ah)), \ - "%r" ((UDItype)(al)), "rI" ((UDItype)(bl)) \ - __CLOBBER_CC); \ - else if (__builtin_constant_p (bh) && (bh) =3D=3D ~(UDItype) 0) \ - __asm__ ("add%I4c %1,%3,%4\n\taddme %0,%2" \ - : "=3Dr" (sh), "=3D&r" (sl) \ - : "r" ((UDItype)(ah)), \ - "%r" ((UDItype)(al)), "rI" ((UDItype)(bl)) \ - __CLOBBER_CC); \ - else \ - __asm__ ("add%I5c %1,%4,%5\n\tadde %0,%2,%3" \ - : "=3Dr" (sh), "=3D&r" (sl) \ - : "r" ((UDItype)(ah)), "r" ((UDItype)(bh)), \ - "%r" ((UDItype)(al)), "rI" ((UDItype)(bl)) \ - __CLOBBER_CC); \ - } while (0) -/* We use "*rI" for the constant operand here, since with just "I", gcc = barfs. - This might seem strange, but gcc folds away the dead code late. 
*/ -#define sub_ddmmss(sh, sl, ah, al, bh, bl) \ - do { \ - if (__builtin_constant_p (bl) && bl > -0x8000 && bl <=3D 0x8000) { \ - if (__builtin_constant_p (ah) && (ah) =3D=3D 0) \ - __asm__ ("addic %1,%3,%4\n\tsubfze %0,%2" \ - : "=3Dr" (sh), "=3D&r" (sl) \ - : "r" ((UDItype)(bh)), \ - "rI" ((UDItype)(al)), "*rI" (-((UDItype)(bl))) \ - __CLOBBER_CC); \ - else if (__builtin_constant_p (ah) && (ah) =3D=3D ~(UDItype) 0) \ - __asm__ ("addic %1,%3,%4\n\tsubfme %0,%2" \ - : "=3Dr" (sh), "=3D&r" (sl) \ - : "r" ((UDItype)(bh)), \ - "rI" ((UDItype)(al)), "*rI" (-((UDItype)(bl))) \ - __CLOBBER_CC); \ - else if (__builtin_constant_p (bh) && (bh) =3D=3D 0) \ - __asm__ ("addic %1,%3,%4\n\taddme %0,%2" \ - : "=3Dr" (sh), "=3D&r" (sl) \ - : "r" ((UDItype)(ah)), \ - "rI" ((UDItype)(al)), "*rI" (-((UDItype)(bl))) \ - __CLOBBER_CC); \ - else if (__builtin_constant_p (bh) && (bh) =3D=3D ~(UDItype) 0) \ - __asm__ ("addic %1,%3,%4\n\taddze %0,%2" \ - : "=3Dr" (sh), "=3D&r" (sl) \ - : "r" ((UDItype)(ah)), \ - "rI" ((UDItype)(al)), "*rI" (-((UDItype)(bl))) \ - __CLOBBER_CC); \ - else \ - __asm__ ("addic %1,%4,%5\n\tsubfe %0,%3,%2" \ - : "=3Dr" (sh), "=3D&r" (sl) \ - : "r" ((UDItype)(ah)), "r" ((UDItype)(bh)), \ - "rI" ((UDItype)(al)), "*rI" (-((UDItype)(bl))) \ - __CLOBBER_CC); \ - } else { \ - if (__builtin_constant_p (ah) && (ah) =3D=3D 0) \ - __asm__ ("subf%I3c %1,%4,%3\n\tsubfze %0,%2" \ - : "=3Dr" (sh), "=3D&r" (sl) \ - : "r" ((UDItype)(bh)), \ - "rI" ((UDItype)(al)), "r" ((UDItype)(bl)) \ - __CLOBBER_CC); \ - else if (__builtin_constant_p (ah) && (ah) =3D=3D ~(UDItype) 0) \ - __asm__ ("subf%I3c %1,%4,%3\n\tsubfme %0,%2" \ - : "=3Dr" (sh), "=3D&r" (sl) \ - : "r" ((UDItype)(bh)), \ - "rI" ((UDItype)(al)), "r" ((UDItype)(bl)) \ - __CLOBBER_CC); \ - else if (__builtin_constant_p (bh) && (bh) =3D=3D 0) \ - __asm__ ("subf%I3c %1,%4,%3\n\taddme %0,%2" \ - : "=3Dr" (sh), "=3D&r" (sl) \ - : "r" ((UDItype)(ah)), \ - "rI" ((UDItype)(al)), "r" ((UDItype)(bl)) \ - __CLOBBER_CC); \ - else if 
(__builtin_constant_p (bh) && (bh) =3D=3D ~(UDItype) 0) \ - __asm__ ("subf%I3c %1,%4,%3\n\taddze %0,%2" \ - : "=3Dr" (sh), "=3D&r" (sl) \ - : "r" ((UDItype)(ah)), \ - "rI" ((UDItype)(al)), "r" ((UDItype)(bl)) \ - __CLOBBER_CC); \ - else \ - __asm__ ("subf%I4c %1,%5,%4\n\tsubfe %0,%3,%2" \ - : "=3Dr" (sh), "=3D&r" (sl) \ - : "r" ((UDItype)(ah)), "r" ((UDItype)(bh)), \ - "rI" ((UDItype)(al)), "r" ((UDItype)(bl)) \ - __CLOBBER_CC); \ - } \ - } while (0) -#endif /* ! _LONG_LONG_LIMB */ -#define count_leading_zeros(count, x) \ - __asm__ ("cntlzd %0,%1" : "=3Dr" (count) : "r" (x)) -#define COUNT_LEADING_ZEROS_0 64 -#if __GMP_GNUC_PREREQ (4,8) -#define umul_ppmm(w1, w0, u, v) \ - do { \ - typedef unsigned int __ll_UTItype __attribute__((mode(TI))); \ - __ll_UTItype __ll =3D (__ll_UTItype)(u) * (v); \ - w1 =3D __ll >> 64; \ - w0 =3D __ll; \ - } while (0) -#endif -#if !defined (umul_ppmm) -#define umul_ppmm(ph, pl, m0, m1) \ - do { \ - UDItype __m0 =3D (m0), __m1 =3D (m1); \ - __asm__ ("mulhdu %0,%1,%2" : "=3Dr" (ph) : "%r" (__m0), "r" (__m1));= \ - (pl) =3D __m0 * __m1; \ - } while (0) -#endif -#define smul_ppmm(ph, pl, m0, m1) \ - do { \ - DItype __m0 =3D (m0), __m1 =3D (m1); \ - __asm__ ("mulhd %0,%1,%2" : "=3Dr" (ph) : "%r" (__m0), "r" (__m1)); = \ - (pl) =3D __m0 * __m1; \ - } while (0) -#endif /* 64-bit PowerPC. 
*/ - -#if defined (__pyr__) && W_TYPE_SIZE =3D=3D 32 -#define add_ssaaaa(sh, sl, ah, al, bh, bl) \ - __asm__ ("addw %5,%1\n\taddwc %3,%0" \ - : "=3Dr" (sh), "=3D&r" (sl) \ - : "0" ((USItype)(ah)), "g" ((USItype)(bh)), \ - "%1" ((USItype)(al)), "g" ((USItype)(bl))) -#define sub_ddmmss(sh, sl, ah, al, bh, bl) \ - __asm__ ("subw %5,%1\n\tsubwb %3,%0" \ - : "=3Dr" (sh), "=3D&r" (sl) \ - : "0" ((USItype)(ah)), "g" ((USItype)(bh)), \ - "1" ((USItype)(al)), "g" ((USItype)(bl))) -/* This insn works on Pyramids with AP, XP, or MI CPUs, but not with SP.= */ -#define umul_ppmm(w1, w0, u, v) \ - ({union {UDItype __ll; \ - struct {USItype __h, __l;} __i; \ - } __x; \ - __asm__ ("movw %1,%R0\n\tuemul %2,%0" \ - : "=3D&r" (__x.__ll) \ - : "g" ((USItype) (u)), "g" ((USItype)(v))); \ - (w1) =3D __x.__i.__h; (w0) =3D __x.__i.__l;}) -#endif /* __pyr__ */ - -#if defined (__ibm032__) /* RT/ROMP */ && W_TYPE_SIZE =3D=3D 32 -#define add_ssaaaa(sh, sl, ah, al, bh, bl) \ - __asm__ ("a %1,%5\n\tae %0,%3" \ - : "=3Dr" (sh), "=3D&r" (sl) \ - : "0" ((USItype)(ah)), "r" ((USItype)(bh)), \ - "%1" ((USItype)(al)), "r" ((USItype)(bl))) -#define sub_ddmmss(sh, sl, ah, al, bh, bl) \ - __asm__ ("s %1,%5\n\tse %0,%3" \ - : "=3Dr" (sh), "=3D&r" (sl) \ - : "0" ((USItype)(ah)), "r" ((USItype)(bh)), \ - "1" ((USItype)(al)), "r" ((USItype)(bl))) -#define smul_ppmm(ph, pl, m0, m1) \ - __asm__ ( \ - "s r2,r2\n" \ -" mts r10,%2\n" \ -" m r2,%3\n" \ -" m r2,%3\n" \ -" m r2,%3\n" \ -" m r2,%3\n" \ -" m r2,%3\n" \ -" m r2,%3\n" \ -" m r2,%3\n" \ -" m r2,%3\n" \ -" m r2,%3\n" \ -" m r2,%3\n" \ -" m r2,%3\n" \ -" m r2,%3\n" \ -" m r2,%3\n" \ -" m r2,%3\n" \ -" m r2,%3\n" \ -" m r2,%3\n" \ -" cas %0,r2,r0\n" \ -" mfs r10,%1" \ - : "=3Dr" (ph), "=3Dr" (pl) \ - : "%r" ((USItype)(m0)), "r" ((USItype)(m1)) \ - : "r2") -#define count_leading_zeros(count, x) \ - do { \ - if ((x) >=3D 0x10000) \ - __asm__ ("clz %0,%1" \ - : "=3Dr" (count) : "r" ((USItype)(x) >> 16)); \ - else \ - { \ - __asm__ ("clz %0,%1" \ - : "=3Dr" 
(count) : "r" ((USItype)(x))); \ - (count) +=3D 16; \ - } \ - } while (0) -#endif /* RT/ROMP */ - -#if defined (__riscv64) && W_TYPE_SIZE =3D=3D 64 -#define umul_ppmm(ph, pl, u, v) \ - do { \ - UDItype __u =3D (u), __v =3D (v); \ - (pl) =3D __u * __v; \ - __asm__ ("mulhu\t%2, %1, %0" : "=3Dr" (ph) : "%r" (__u), "r" (__v));= \ - } while (0) -#endif - -#if (defined (__SH2__) || defined (__SH3__) || defined (__SH4__)) && W_T= YPE_SIZE =3D=3D 32 -#define umul_ppmm(w1, w0, u, v) \ - __asm__ ("dmulu.l %2,%3\n\tsts macl,%1\n\tsts mach,%0" \ - : "=3Dr" (w1), "=3Dr" (w0) : "r" (u), "r" (v) : "macl", "mach") -#endif - -#if defined (__sparc__) && W_TYPE_SIZE =3D=3D 32 -#define add_ssaaaa(sh, sl, ah, al, bh, bl) \ - __asm__ ("addcc %r4,%5,%1\n\taddx %r2,%3,%0" \ - : "=3Dr" (sh), "=3D&r" (sl) \ - : "rJ" (ah), "rI" (bh),"%rJ" (al), "rI" (bl) \ - __CLOBBER_CC) -#define sub_ddmmss(sh, sl, ah, al, bh, bl) \ - __asm__ ("subcc %r4,%5,%1\n\tsubx %r2,%3,%0" \ - : "=3Dr" (sh), "=3D&r" (sl) \ - : "rJ" (ah), "rI" (bh), "rJ" (al), "rI" (bl) \ - __CLOBBER_CC) -/* FIXME: When gcc -mcpu=3Dv9 is used on solaris, gcc/config/sol2-sld-64= .h - doesn't define anything to indicate that to us, it only sets __sparcv= 8. */ -#if defined (__sparc_v9__) || defined (__sparcv9) -/* Perhaps we should use floating-point operations here? */ -#if 0 -/* Triggers a bug making mpz/tests/t-gcd.c fail. - Perhaps we simply need explicitly zero-extend the inputs? */ -#define umul_ppmm(w1, w0, u, v) \ - __asm__ ("mulx %2,%3,%%g1; srl %%g1,0,%1; srlx %%g1,32,%0" : \ - "=3Dr" (w1), "=3Dr" (w0) : "r" (u), "r" (v) : "g1") -#else -/* Use v8 umul until above bug is fixed. */ -#define umul_ppmm(w1, w0, u, v) \ - __asm__ ("umul %2,%3,%1;rd %%y,%0" : "=3Dr" (w1), "=3Dr" (w0) : "r" (u= ), "r" (v)) -#endif -/* Use a plain v8 divide for v9. 
*/ -#define udiv_qrnnd(q, r, n1, n0, d) \ - do { \ - USItype __q; \ - __asm__ ("mov %1,%%y;nop;nop;nop;udiv %2,%3,%0" \ - : "=3Dr" (__q) : "r" (n1), "r" (n0), "r" (d)); \ - (r) =3D (n0) - __q * (d); \ - (q) =3D __q; \ - } while (0) -#else -#if defined (__sparc_v8__) /* gcc normal */ \ - || defined (__sparcv8) /* gcc solaris */ \ - || HAVE_HOST_CPU_supersparc -/* Don't match immediate range because, 1) it is not often useful, - 2) the 'I' flag thinks of the range as a 13 bit signed interval, - while we want to match a 13 bit interval, sign extended to 32 bits, - but INTERPRETED AS UNSIGNED. */ -#define umul_ppmm(w1, w0, u, v) \ - __asm__ ("umul %2,%3,%1;rd %%y,%0" : "=3Dr" (w1), "=3Dr" (w0) : "r" (u= ), "r" (v)) - -#if HAVE_HOST_CPU_supersparc -#else -/* Don't use this on SuperSPARC because its udiv only handles 53 bit - dividends and will trap to the kernel for the rest. */ -#define udiv_qrnnd(q, r, n1, n0, d) \ - do { \ - USItype __q; \ - __asm__ ("mov %1,%%y;nop;nop;nop;udiv %2,%3,%0" \ - : "=3Dr" (__q) : "r" (n1), "r" (n0), "r" (d)); \ - (r) =3D (n0) - __q * (d); \ - (q) =3D __q; \ - } while (0) -#endif /* HAVE_HOST_CPU_supersparc */ - -#else /* ! __sparc_v8__ */ -#if defined (__sparclite__) -/* This has hardware multiply but not divide. It also has two additiona= l - instructions scan (ffs from high bit) and divscc. */ -#define umul_ppmm(w1, w0, u, v) \ - __asm__ ("umul %2,%3,%1;rd %%y,%0" : "=3Dr" (w1), "=3Dr" (w0) : "r" (u= ), "r" (v)) -#define udiv_qrnnd(q, r, n1, n0, d) \ - __asm__ ("! Inlined udiv_qrnnd\n" \ -" wr %%g0,%2,%%y ! 
Not a delayed write for sparclite\n" \ -" tst %%g0\n" \ -" divscc %3,%4,%%g1\n" \ -" divscc %%g1,%4,%%g1\n" \ -" divscc %%g1,%4,%%g1\n" \ -" divscc %%g1,%4,%%g1\n" \ -" divscc %%g1,%4,%%g1\n" \ -" divscc %%g1,%4,%%g1\n" \ -" divscc %%g1,%4,%%g1\n" \ -" divscc %%g1,%4,%%g1\n" \ -" divscc %%g1,%4,%%g1\n" \ -" divscc %%g1,%4,%%g1\n" \ -" divscc %%g1,%4,%%g1\n" \ -" divscc %%g1,%4,%%g1\n" \ -" divscc %%g1,%4,%%g1\n" \ -" divscc %%g1,%4,%%g1\n" \ -" divscc %%g1,%4,%%g1\n" \ -" divscc %%g1,%4,%%g1\n" \ -" divscc %%g1,%4,%%g1\n" \ -" divscc %%g1,%4,%%g1\n" \ -" divscc %%g1,%4,%%g1\n" \ -" divscc %%g1,%4,%%g1\n" \ -" divscc %%g1,%4,%%g1\n" \ -" divscc %%g1,%4,%%g1\n" \ -" divscc %%g1,%4,%%g1\n" \ -" divscc %%g1,%4,%%g1\n" \ -" divscc %%g1,%4,%%g1\n" \ -" divscc %%g1,%4,%%g1\n" \ -" divscc %%g1,%4,%%g1\n" \ -" divscc %%g1,%4,%%g1\n" \ -" divscc %%g1,%4,%%g1\n" \ -" divscc %%g1,%4,%%g1\n" \ -" divscc %%g1,%4,%%g1\n" \ -" divscc %%g1,%4,%0\n" \ -" rd %%y,%1\n" \ -" bl,a 1f\n" \ -" add %1,%4,%1\n" \ -"1: ! End of inline udiv_qrnnd" \ - : "=3Dr" (q), "=3Dr" (r) : "r" (n1), "r" (n0), "rI" (d) \ - : "%g1" __AND_CLOBBER_CC) -#define count_leading_zeros(count, x) \ - __asm__ ("scan %1,1,%0" : "=3Dr" (count) : "r" (x)) -/* Early sparclites return 63 for an argument of 0, but they warn that f= uture - implementations might change this. Therefore, leave COUNT_LEADING_ZE= ROS_0 - undefined. */ -#endif /* __sparclite__ */ -#endif /* __sparc_v8__ */ -#endif /* __sparc_v9__ */ -/* Default to sparc v7 versions of umul_ppmm and udiv_qrnnd. */ -#ifndef umul_ppmm -#define umul_ppmm(w1, w0, u, v) \ - __asm__ ("! Inlined umul_ppmm\n" \ -" wr %%g0,%2,%%y ! SPARC has 0-3 delay insn after a wr\n" \ -" sra %3,31,%%g2 ! Don't move this insn\n" \ -" and %2,%%g2,%%g2 ! Don't move this insn\n" \ -" andcc %%g0,0,%%g1 ! 
Don't move this insn\n" \ -" mulscc %%g1,%3,%%g1\n" \ -" mulscc %%g1,%3,%%g1\n" \ -" mulscc %%g1,%3,%%g1\n" \ -" mulscc %%g1,%3,%%g1\n" \ -" mulscc %%g1,%3,%%g1\n" \ -" mulscc %%g1,%3,%%g1\n" \ -" mulscc %%g1,%3,%%g1\n" \ -" mulscc %%g1,%3,%%g1\n" \ -" mulscc %%g1,%3,%%g1\n" \ -" mulscc %%g1,%3,%%g1\n" \ -" mulscc %%g1,%3,%%g1\n" \ -" mulscc %%g1,%3,%%g1\n" \ -" mulscc %%g1,%3,%%g1\n" \ -" mulscc %%g1,%3,%%g1\n" \ -" mulscc %%g1,%3,%%g1\n" \ -" mulscc %%g1,%3,%%g1\n" \ -" mulscc %%g1,%3,%%g1\n" \ -" mulscc %%g1,%3,%%g1\n" \ -" mulscc %%g1,%3,%%g1\n" \ -" mulscc %%g1,%3,%%g1\n" \ -" mulscc %%g1,%3,%%g1\n" \ -" mulscc %%g1,%3,%%g1\n" \ -" mulscc %%g1,%3,%%g1\n" \ -" mulscc %%g1,%3,%%g1\n" \ -" mulscc %%g1,%3,%%g1\n" \ -" mulscc %%g1,%3,%%g1\n" \ -" mulscc %%g1,%3,%%g1\n" \ -" mulscc %%g1,%3,%%g1\n" \ -" mulscc %%g1,%3,%%g1\n" \ -" mulscc %%g1,%3,%%g1\n" \ -" mulscc %%g1,%3,%%g1\n" \ -" mulscc %%g1,%3,%%g1\n" \ -" mulscc %%g1,0,%%g1\n" \ -" add %%g1,%%g2,%0\n" \ -" rd %%y,%1" \ - : "=3Dr" (w1), "=3Dr" (w0) : "%rI" (u), "r" (v) \ - : "%g1", "%g2" __AND_CLOBBER_CC) -#endif -#ifndef udiv_qrnnd -#ifndef LONGLONG_STANDALONE -#define udiv_qrnnd(q, r, n1, n0, d) \ - do { UWtype __r; \ - (q) =3D __MPN(udiv_qrnnd) (&__r, (n1), (n0), (d)); \ - (r) =3D __r; \ - } while (0) -extern UWtype __MPN(udiv_qrnnd) (UWtype *, UWtype, UWtype, UWtype); -#endif /* LONGLONG_STANDALONE */ -#endif /* udiv_qrnnd */ -#endif /* __sparc__ */ - -#if defined (__sparc__) && W_TYPE_SIZE =3D=3D 64 -#define add_ssaaaa(sh, sl, ah, al, bh, bl) \ - __asm__ ( \ - "addcc %r4,%5,%1\n" \ - " addccc %r6,%7,%%g0\n" \ - " addc %r2,%3,%0" \ - : "=3Dr" (sh), "=3D&r" (sl) \ - : "rJ" ((UDItype)(ah)), "rI" ((UDItype)(bh)), \ - "%rJ" ((UDItype)(al)), "rI" ((UDItype)(bl)), \ - "%rJ" ((UDItype)(al) >> 32), "rI" ((UDItype)(bl) >> 32) \ - __CLOBBER_CC) -#define sub_ddmmss(sh, sl, ah, al, bh, bl) \ - __asm__ ( \ - "subcc %r4,%5,%1\n" \ - " subccc %r6,%7,%%g0\n" \ - " subc %r2,%3,%0" \ - : "=3Dr" (sh), "=3D&r" (sl) \ - : "rJ" 
((UDItype)(ah)), "rI" ((UDItype)(bh)), \ - "rJ" ((UDItype)(al)), "rI" ((UDItype)(bl)), \ - "rJ" ((UDItype)(al) >> 32), "rI" ((UDItype)(bl) >> 32) \ - __CLOBBER_CC) -#if __VIS__ >=3D 0x300 -#undef add_ssaaaa -#define add_ssaaaa(sh, sl, ah, al, bh, bl) \ - __asm__ ( \ - "addcc %r4, %5, %1\n" \ - " addxc %r2, %r3, %0" \ - : "=3Dr" (sh), "=3D&r" (sl) \ - : "rJ" ((UDItype)(ah)), "rJ" ((UDItype)(bh)), \ - "%rJ" ((UDItype)(al)), "rI" ((UDItype)(bl)) __CLOBBER_CC) -#define umul_ppmm(ph, pl, m0, m1) \ - do { \ - UDItype __m0 =3D (m0), __m1 =3D (m1); \ - (pl) =3D __m0 * __m1; \ - __asm__ ("umulxhi\t%2, %1, %0" \ - : "=3Dr" (ph) \ - : "%r" (__m0), "r" (__m1)); \ - } while (0) -#define count_leading_zeros(count, x) \ - __asm__ ("lzd\t%1,%0" : "=3Dr" (count) : "r" (x)) -/* Needed by count_leading_zeros_32 in sparc64.h. */ -#define COUNT_LEADING_ZEROS_NEED_CLZ_TAB -#endif -#endif - -#if (defined (__vax) || defined (__vax__)) && W_TYPE_SIZE =3D=3D 32 -#define add_ssaaaa(sh, sl, ah, al, bh, bl) \ - __asm__ ("addl2 %5,%1\n\tadwc %3,%0" \ - : "=3Dg" (sh), "=3D&g" (sl) \ - : "0" ((USItype)(ah)), "g" ((USItype)(bh)), \ - "%1" ((USItype)(al)), "g" ((USItype)(bl))) -#define sub_ddmmss(sh, sl, ah, al, bh, bl) \ - __asm__ ("subl2 %5,%1\n\tsbwc %3,%0" \ - : "=3Dg" (sh), "=3D&g" (sl) \ - : "0" ((USItype)(ah)), "g" ((USItype)(bh)), \ - "1" ((USItype)(al)), "g" ((USItype)(bl))) -#define smul_ppmm(xh, xl, m0, m1) \ - do { \ - union {UDItype __ll; \ - struct {USItype __l, __h;} __i; \ - } __x; \ - USItype __m0 =3D (m0), __m1 =3D (m1); \ - __asm__ ("emul %1,%2,$0,%0" \ - : "=3Dg" (__x.__ll) : "g" (__m0), "g" (__m1)); \ - (xh) =3D __x.__i.__h; (xl) =3D __x.__i.__l; \ - } while (0) -#define sdiv_qrnnd(q, r, n1, n0, d) \ - do { \ - union {DItype __ll; \ - struct {SItype __l, __h;} __i; \ - } __x; \ - __x.__i.__h =3D n1; __x.__i.__l =3D n0; \ - __asm__ ("ediv %3,%2,%0,%1" \ - : "=3Dg" (q), "=3Dg" (r) : "g" (__x.__ll), "g" (d)); \ - } while (0) -#if 0 -/* FIXME: This instruction appears to be 
unimplemented on some systems (= vax - 8800 maybe). */ -#define count_trailing_zeros(count,x) \ - do { \ - __asm__ ("ffs 0, 31, %1, %0" \ - : "=3Dg" (count) \ - : "g" ((USItype) (x))); \ - } while (0) -#endif -#endif /* vax */ - -#if defined (__z8000__) && W_TYPE_SIZE =3D=3D 16 -#define add_ssaaaa(sh, sl, ah, al, bh, bl) \ - __asm__ ("add %H1,%H5\n\tadc %H0,%H3" \ - : "=3Dr" (sh), "=3D&r" (sl) \ - : "0" ((unsigned int)(ah)), "r" ((unsigned int)(bh)), \ - "%1" ((unsigned int)(al)), "rQR" ((unsigned int)(bl))) -#define sub_ddmmss(sh, sl, ah, al, bh, bl) \ - __asm__ ("sub %H1,%H5\n\tsbc %H0,%H3" \ - : "=3Dr" (sh), "=3D&r" (sl) \ - : "0" ((unsigned int)(ah)), "r" ((unsigned int)(bh)), \ - "1" ((unsigned int)(al)), "rQR" ((unsigned int)(bl))) -#define umul_ppmm(xh, xl, m0, m1) \ - do { \ - union {long int __ll; \ - struct {unsigned int __h, __l;} __i; \ - } __x; \ - unsigned int __m0 =3D (m0), __m1 =3D (m1); \ - __asm__ ("mult %S0,%H3" \ - : "=3Dr" (__x.__i.__h), "=3Dr" (__x.__i.__l) \ - : "%1" (m0), "rQR" (m1)); \ - (xh) =3D __x.__i.__h; (xl) =3D __x.__i.__l; \ - (xh) +=3D ((((signed int) __m0 >> 15) & __m1) \ - + (((signed int) __m1 >> 15) & __m0)); \ - } while (0) -#endif /* __z8000__ */ - -#endif /* __GNUC__ */ - -#endif /* NO_ASM */ - - -/* FIXME: "sidi" here is highly doubtful, should sometimes be "diti". *= / -#if !defined (umul_ppmm) && defined (__umulsidi3) -#define umul_ppmm(ph, pl, m0, m1) \ - do { \ - UDWtype __ll =3D __umulsidi3 (m0, m1); \ - ph =3D (UWtype) (__ll >> W_TYPE_SIZE); \ - pl =3D (UWtype) __ll; \ - } while (0) -#endif - -#if !defined (__umulsidi3) -#define __umulsidi3(u, v) \ - ({UWtype __hi, __lo; \ - umul_ppmm (__hi, __lo, u, v); \ - ((UDWtype) __hi << W_TYPE_SIZE) | __lo; }) -#endif - - -#if defined (__cplusplus) -#define __longlong_h_C "C" -#else -#define __longlong_h_C -#endif - -/* Use mpn_umul_ppmm or mpn_udiv_qrnnd functions, if they exist. 
The "_= r" - forms have "reversed" arguments, meaning the pointer is last, which - sometimes allows better parameter passing, in particular on 64-bit - hppa. */ - -#define mpn_umul_ppmm __MPN(umul_ppmm) -extern __longlong_h_C UWtype mpn_umul_ppmm (UWtype *, UWtype, UWtype); - -#if ! defined (umul_ppmm) && HAVE_NATIVE_mpn_umul_ppmm \ - && ! defined (LONGLONG_STANDALONE) -#define umul_ppmm(wh, wl, u, v) \ - do { \ - UWtype __umul_ppmm__p0; \ - (wh) =3D mpn_umul_ppmm (&__umul_ppmm__p0, (UWtype) (u), (UWtype) (v)= );\ - (wl) =3D __umul_ppmm__p0; \ - } while (0) -#endif - -#define mpn_umul_ppmm_r __MPN(umul_ppmm_r) -extern __longlong_h_C UWtype mpn_umul_ppmm_r (UWtype, UWtype, UWtype *); - -#if ! defined (umul_ppmm) && HAVE_NATIVE_mpn_umul_ppmm_r \ - && ! defined (LONGLONG_STANDALONE) -#define umul_ppmm(wh, wl, u, v) \ - do { \ - UWtype __umul_p0; \ - (wh) =3D mpn_umul_ppmm_r ((UWtype) (u), (UWtype) (v), &__umul_p0); \ - (wl) =3D __umul_p0; \ - } while (0) -#endif - -#define mpn_udiv_qrnnd __MPN(udiv_qrnnd) -extern __longlong_h_C UWtype mpn_udiv_qrnnd (UWtype *, UWtype, UWtype, U= Wtype); - -#if ! defined (udiv_qrnnd) && HAVE_NATIVE_mpn_udiv_qrnnd \ - && ! defined (LONGLONG_STANDALONE) -#define udiv_qrnnd(q, r, n1, n0, d) \ - do { \ - UWtype __udiv_qrnnd_r; \ - (q) =3D mpn_udiv_qrnnd (&__udiv_qrnnd_r, \ - (UWtype) (n1), (UWtype) (n0), (UWtype) d); \ - (r) =3D __udiv_qrnnd_r; \ - } while (0) -#endif - -#define mpn_udiv_qrnnd_r __MPN(udiv_qrnnd_r) -extern __longlong_h_C UWtype mpn_udiv_qrnnd_r (UWtype, UWtype, UWtype, U= Wtype *); - -#if ! defined (udiv_qrnnd) && HAVE_NATIVE_mpn_udiv_qrnnd_r \ - && ! defined (LONGLONG_STANDALONE) -#define udiv_qrnnd(q, r, n1, n0, d) \ - do { \ - UWtype __udiv_qrnnd_r; \ - (q) =3D mpn_udiv_qrnnd_r ((UWtype) (n1), (UWtype) (n0), (UWtype) d, = \ - &__udiv_qrnnd_r); \ - (r) =3D __udiv_qrnnd_r; \ - } while (0) -#endif - - -/* If this machine has no inline assembler, use C macros. 
*/
-
-#if !defined (add_ssaaaa)
-#define add_ssaaaa(sh, sl, ah, al, bh, bl) \
-  do { \
-    UWtype __x; \
-    __x = (al) + (bl); \
-    (sh) = (ah) + (bh) + (__x < (al)); \
-    (sl) = __x; \
-  } while (0)
-#endif
-
-#if !defined (sub_ddmmss)
-#define sub_ddmmss(sh, sl, ah, al, bh, bl) \
-  do { \
-    UWtype __x; \
-    __x = (al) - (bl); \
-    (sh) = (ah) - (bh) - ((al) < (bl)); \
-    (sl) = __x; \
-  } while (0)
-#endif
-
-/* If we lack umul_ppmm but have smul_ppmm, define umul_ppmm in terms of
-   smul_ppmm. */
-#if !defined (umul_ppmm) && defined (smul_ppmm)
-#define umul_ppmm(w1, w0, u, v) \
-  do { \
-    UWtype __w1; \
-    UWtype __xm0 = (u), __xm1 = (v); \
-    smul_ppmm (__w1, w0, __xm0, __xm1); \
-    (w1) = __w1 + (-(__xm0 >> (W_TYPE_SIZE - 1)) & __xm1) \
-                + (-(__xm1 >> (W_TYPE_SIZE - 1)) & __xm0); \
-  } while (0)
-#endif
-
-/* If we still don't have umul_ppmm, define it using plain C.
-
-   For reference, when this code is used for squaring (ie. u and v identical
-   expressions), gcc recognises __x1 and __x2 are the same and generates 3
-   multiplies, not 4. The subsequent additions could be optimized a bit,
-   but the only place GMP currently uses such a square is mpn_sqr_basecase,
-   and chips obliged to use this generic C umul will have plenty of worse
-   performance problems than a couple of extra instructions on the diagonal
-   of sqr_basecase. */
-
-#if !defined (umul_ppmm)
-#define umul_ppmm(w1, w0, u, v) \
-  do { \
-    UWtype __x0, __x1, __x2, __x3; \
-    UHWtype __ul, __vl, __uh, __vh; \
-    UWtype __u = (u), __v = (v); \
-    \
-    __ul = __ll_lowpart (__u); \
-    __uh = __ll_highpart (__u); \
-    __vl = __ll_lowpart (__v); \
-    __vh = __ll_highpart (__v); \
-    \
-    __x0 = (UWtype) __ul * __vl; \
-    __x1 = (UWtype) __ul * __vh; \
-    __x2 = (UWtype) __uh * __vl; \
-    __x3 = (UWtype) __uh * __vh; \
-    \
-    __x1 += __ll_highpart (__x0);/* this can't give carry */ \
-    __x1 += __x2; /* but this indeed can */ \
-    if (__x1 < __x2) /* did we get it? */ \
-      __x3 += __ll_B; /* yes, add it in the proper pos. */ \
-    \
-    (w1) = __x3 + __ll_highpart (__x1); \
-    (w0) = (__x1 << W_TYPE_SIZE/2) + __ll_lowpart (__x0); \
-  } while (0)
-#endif
-
-/* If we don't have smul_ppmm, define it using umul_ppmm (which surely will
-   exist in one form or another. */
-#if !defined (smul_ppmm)
-#define smul_ppmm(w1, w0, u, v) \
-  do { \
-    UWtype __w1; \
-    UWtype __xm0 = (u), __xm1 = (v); \
-    umul_ppmm (__w1, w0, __xm0, __xm1); \
-    (w1) = __w1 - (-(__xm0 >> (W_TYPE_SIZE - 1)) & __xm1) \
-                - (-(__xm1 >> (W_TYPE_SIZE - 1)) & __xm0); \
-  } while (0)
-#endif
-
-/* Define this unconditionally, so it can be used for debugging. */
-#define __udiv_qrnnd_c(q, r, n1, n0, d) \
-  do { \
-    UWtype __d1, __d0, __q1, __q0, __r1, __r0, __m; \
-    \
-    ASSERT ((d) != 0); \
-    ASSERT ((n1) < (d)); \
-    \
-    __d1 = __ll_highpart (d); \
-    __d0 = __ll_lowpart (d); \
-    \
-    __q1 = (n1) / __d1; \
-    __r1 = (n1) - __q1 * __d1; \
-    __m = __q1 * __d0; \
-    __r1 = __r1 * __ll_B | __ll_highpart (n0); \
-    if (__r1 < __m) \
-      { \
-        __q1--, __r1 += (d); \
-        if (__r1 >= (d)) /* i.e. we didn't get carry when adding to __r1 */\
-          if (__r1 < __m) \
-            __q1--, __r1 += (d); \
-      } \
-    __r1 -= __m; \
-    \
-    __q0 = __r1 / __d1; \
-    __r0 = __r1 - __q0 * __d1; \
-    __m = __q0 * __d0; \
-    __r0 = __r0 * __ll_B | __ll_lowpart (n0); \
-    if (__r0 < __m) \
-      { \
-        __q0--, __r0 += (d); \
-        if (__r0 >= (d)) \
-          if (__r0 < __m) \
-            __q0--, __r0 += (d); \
-      } \
-    __r0 -= __m; \
-    \
-    (q) = __q1 * __ll_B | __q0; \
-    (r) = __r0; \
-  } while (0)
-
-/* If the processor has no udiv_qrnnd but sdiv_qrnnd, go through
-   __udiv_w_sdiv (defined in libgcc or elsewhere). */
-#if !defined (udiv_qrnnd) && defined (sdiv_qrnnd) \
-  && ! defined (LONGLONG_STANDALONE)
-#define udiv_qrnnd(q, r, nh, nl, d) \
-  do { \
-    UWtype __r; \
-    (q) = __MPN(udiv_w_sdiv) (&__r, nh, nl, d); \
-    (r) = __r; \
-  } while (0)
-__GMP_DECLSPEC UWtype __MPN(udiv_w_sdiv) (UWtype *, UWtype, UWtype, UWtype);
-#endif
-
-/* If udiv_qrnnd was not defined for this processor, use __udiv_qrnnd_c. */
-#if !defined (udiv_qrnnd)
-#define UDIV_NEEDS_NORMALIZATION 1
-#define udiv_qrnnd __udiv_qrnnd_c
-#endif
-
-#if !defined (count_leading_zeros)
-#define count_leading_zeros(count, x) \
-  do { \
-    UWtype __xr = (x); \
-    UWtype __a; \
-    \
-    if (W_TYPE_SIZE == 32) \
-      { \
-        __a = __xr < ((UWtype) 1 << 2*__BITS4) \
-          ? (__xr < ((UWtype) 1 << __BITS4) ? 1 : __BITS4 + 1) \
-          : (__xr < ((UWtype) 1 << 3*__BITS4) ? 2*__BITS4 + 1 \
-          : 3*__BITS4 + 1); \
-      } \
-    else \
-      { \
-        for (__a = W_TYPE_SIZE - 8; __a > 0; __a -= 8) \
-          if (((__xr >> __a) & 0xff) != 0) \
-            break; \
-        ++__a; \
-      } \
-    \
-    (count) = W_TYPE_SIZE + 1 - __a - __clz_tab[__xr >> __a]; \
-  } while (0)
-/* This version gives a well-defined value for zero. */
-#define COUNT_LEADING_ZEROS_0 (W_TYPE_SIZE - 1)
-#define COUNT_LEADING_ZEROS_NEED_CLZ_TAB
-#define COUNT_LEADING_ZEROS_SLOW
-#endif
-
-/* clz_tab needed by mpn/x86/pentium/mod_1.asm in a fat binary */
-#if HAVE_HOST_CPU_FAMILY_x86 && WANT_FAT_BINARY
-#define COUNT_LEADING_ZEROS_NEED_CLZ_TAB
-#endif
-
-#ifdef COUNT_LEADING_ZEROS_NEED_CLZ_TAB
-extern const unsigned char __GMP_DECLSPEC __clz_tab[129];
-#endif
-
-#if !defined (count_trailing_zeros)
-#if !defined (COUNT_LEADING_ZEROS_SLOW)
-/* Define count_trailing_zeros using an asm count_leading_zeros. */
-#define count_trailing_zeros(count, x) \
-  do { \
-    UWtype __ctz_x = (x); \
-    UWtype __ctz_c; \
-    ASSERT (__ctz_x != 0); \
-    count_leading_zeros (__ctz_c, __ctz_x & -__ctz_x); \
-    (count) = W_TYPE_SIZE - 1 - __ctz_c; \
-  } while (0)
-#else
-/* Define count_trailing_zeros in plain C, assuming small counts are common.
- We use clz_tab without ado, since the C count_leading_zeros above wil= l have - pulled it in. */ -#define count_trailing_zeros(count, x) \ - do { \ - UWtype __ctz_x =3D (x); \ - int __ctz_c; \ - \ - if (LIKELY ((__ctz_x & 0xff) !=3D 0)) \ - (count) =3D __clz_tab[__ctz_x & -__ctz_x] - 2; \ - else \ - { \ - for (__ctz_c =3D 8 - 2; __ctz_c < W_TYPE_SIZE - 2; __ctz_c +=3D 8) \ - { \ - __ctz_x >>=3D 8; \ - if (LIKELY ((__ctz_x & 0xff) !=3D 0)) \ - break; \ - } \ - \ - (count) =3D __ctz_c + __clz_tab[__ctz_x & -__ctz_x]; \ - } \ - } while (0) -#endif -#endif - -#ifndef UDIV_NEEDS_NORMALIZATION -#define UDIV_NEEDS_NORMALIZATION 0 -#endif - -/* Whether udiv_qrnnd is actually implemented with udiv_qrnnd_preinv, an= d - that hence the latter should always be used. */ -#ifndef UDIV_PREINV_ALWAYS -#define UDIV_PREINV_ALWAYS 0 -#endif --=20 2.26.1 --------------FAA07B69FA7D344621580E10-- From unknown Mon Aug 18 11:26:01 2025 X-Loop: help-debbugs@gnu.org Subject: bug#42269: Remove non-GMP code from coreutils factor.c Resent-From: tg@gmplib.org (=?UTF-8?Q?Torbj=C3=B6rn?= Granlund) Original-Sender: "Debbugs-submit" Resent-CC: bug-coreutils@gnu.org Resent-Date: Wed, 08 Jul 2020 16:59:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 42269 X-GNU-PR-Package: coreutils X-GNU-PR-Keywords: To: Paul Eggert Cc: nisse@lysator.liu.se, P@draigBrady.com, jay@gnu.org, 42269@debbugs.gnu.org, jim@meyering.net X-Debbugs-Original-Cc: nisse@lysator.liu.se, =?UTF-8?Q?P=C3=A1draig?= Brady , James Youngman , Coreutils bugs , Jim Meyering Received: via spool by submit@debbugs.gnu.org id=B.15942274871950 (code B ref -1); Wed, 08 Jul 2020 16:59:02 +0000 Received: (at submit) by debbugs.gnu.org; 8 Jul 2020 16:58:07 +0000 Received: from localhost ([127.0.0.1]:38986 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1jtDOY-0000VO-MM for submit@debbugs.gnu.org; Wed, 08 Jul 2020 12:58:06 -0400 Received: from lists.gnu.org 
([209.51.188.17]:37396) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1jtDOW-0000VD-J6 for submit@debbugs.gnu.org; Wed, 08 Jul 2020 12:58:05 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:54146) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1jtDOW-0001pW-B9 for bug-coreutils@gnu.org; Wed, 08 Jul 2020 12:58:04 -0400 Received: from martin.gmplib.org ([130.242.124.102]:59734 helo=shell.gmplib.org) by eggs.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1jtDOU-0001ji-8j; Wed, 08 Jul 2020 12:58:03 -0400 Received: by shell.gmplib.org (Postfix, from userid 1001) id 6DA2051340; Wed, 8 Jul 2020 18:57:53 +0200 (CEST) From: tg@gmplib.org (=?UTF-8?Q?Torbj=C3=B6rn?= Granlund) References: <7c08ef70-bb82-2b7b-0d39-18bbae70afdd@cs.ucla.edu> Date: Wed, 08 Jul 2020 18:57:53 +0200 In-Reply-To: <7c08ef70-bb82-2b7b-0d39-18bbae70afdd@cs.ucla.edu> (Paul Eggert's message of "Wed, 8 Jul 2020 09:25:57 -0700") Message-ID: <865zayosfi.fsf@shell.gmplib.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (berkeley-unix) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Received-SPF: none client-ip=130.242.124.102; envelope-from=tg@gmplib.org; helo=shell.gmplib.org X-detected-operating-system: by eggs.gnu.org: First seen = 2020/07/08 12:57:58 X-ACL-Warn: Detected OS = FreeBSD 9.x or newer [fuzzy] X-Spam_score_int: -8 X-Spam_score: -0.9 X-Spam_bar: / X-Spam_report: (-0.9 / 5.0 requ) BAYES_00=-1.9, KHOP_HELO_FCRDNS=1, SPF_HELO_NONE=0.001, SPF_NONE=0.001 autolearn=no autolearn_force=no X-Spam_action: no action X-Spam-Score: -2.3 (--) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) Paul Eggert writes: I recently modified GNU coreutils so that 
it can assume GMP, possibly by compiling and linking mini-gmp.c. This helps simplify the coreutils source code and makes coreutils behavior more portable.

In doing so, I noticed that factor.c has special-purpose code to factor integers up to 127 bits. Although this code added functionality when coreutils could not assume GMP, it's no longer needed for that. And although it runs faster than the GMP code does, while doing the recent surgery on factor.c I began to wonder whether the hassle of maintaining the code outweighed its usefulness. So I wrote up the attached patch, which simply removes the non-GMP code and simplifies factor.c quite a bit.

I assume the attached patch will hurt performance significantly in some cases for 127-bit numbers, so I did not install it. Perhaps it would be better to keep the non-GMP algorithm and recode it with GMP. Or perhaps it would be better to leave the factor.c code alone. Comments?

The GMP code in coreutils factor.c was written by me as a demo (see gmp/demos) over 25 years ago. It was put into coreutils' factor.c without consulting me. I would have disagreed if I had been asked.

The non-GMP code of coreutils was extremely well-tuned by me and Niels Möller a couple of years ago. It is so fast that it has created some stir in the mathematical community, or so I have been told.

I expect the GMP code of factor.c to be pretty much unused. Why? Because Pollard rho is suitable only in the range currently covered by the non-GMP code. If we were to write well-tuned low-level GMP code, then we could expand the practical range ever so slightly.

By leaving just the GMP code, you would create a pretty useless factor command. Any naive old factor command would often beat it. It would make much more sense to remove the factor command altogether.

If any code is to be removed, then that would be the GMP code of coreutils factor.
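For readers unfamiliar with the algorithm named above, the following is a minimal sketch of Pollard rho on single-word integers. It is an illustration only and is not the code in coreutils factor.c or in gmp/demos. The names gcd_u64, rho_step and pollard_rho are hypothetical, and the sketch assumes a compiler that provides the unsigned __int128 extension (GCC or Clang), whereas the tuned code relies on the longlong.h primitives instead.

```c
#include <assert.h>
#include <stdint.h>

/* Euclid's algorithm; gcd_u64(0, n) is n. */
static uint64_t gcd_u64(uint64_t a, uint64_t b)
{
    while (b != 0) {
        uint64_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* One iteration of x -> x*x + c (mod n).  Requires x < n.  The 128-bit
   intermediate (a GCC/Clang extension, assumed here) cannot overflow. */
static uint64_t rho_step(uint64_t x, uint64_t c, uint64_t n)
{
    return (uint64_t)(((unsigned __int128)x * x + c) % n);
}

/* Hypothetical helper: look for a nontrivial factor of n using Floyd
   cycle detection.  Returns 0 when this choice of c finds none, which
   always happens when n is prime. */
uint64_t pollard_rho(uint64_t n, uint64_t c)
{
    assert(n > 1);
    uint64_t x = 2 % n, y = x, d = 1;
    while (d == 1) {
        x = rho_step(x, c, n);                  /* tortoise: one step */
        y = rho_step(rho_step(y, c, n), c, n);  /* hare: two steps */
        d = gcd_u64(x > y ? x - y : y - x, n);
    }
    return d == n ? 0 : d;
}
```

With these definitions, pollard_rho(8051, 1) returns 97 (8051 = 83 * 97). The expected number of iterations grows roughly with the square root of the smallest prime factor, which is consistent with the remark above that the method is practical only for fairly small inputs.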
-- 
Torbjörn
Please encrypt, key id 0xC8601622

From debbugs-submit-bounces@debbugs.gnu.org Wed Jul 08 13:58:10 2020 Received: (at control) by debbugs.gnu.org; 8 Jul 2020 17:58:10 +0000 Received: from localhost ([127.0.0.1]:39014 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1jtEKg-0001xw-GZ for submit@debbugs.gnu.org; Wed, 08 Jul 2020 13:58:10 -0400 Received: from zimbra.cs.ucla.edu ([131.179.128.68]:33744) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1jtEKe-0001xg-Bh for control@debbugs.gnu.org; Wed, 08 Jul 2020 13:58:09 -0400 Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id E9A611600C4 for ; Wed, 8 Jul 2020 10:58:01 -0700 (PDT) Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id eNTXNSTybW6J for ; Wed, 8 Jul 2020 10:58:01 -0700 (PDT) Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 5228A1600CD for ; Wed, 8 Jul 2020 10:58:01 -0700 (PDT) X-Virus-Scanned: amavisd-new at zimbra.cs.ucla.edu Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id 0azPEIlNuvUu for ; Wed, 8 Jul 2020 10:58:01 -0700 (PDT) Received: from [192.168.1.9] (cpe-75-82-69-226.socal.res.rr.com [75.82.69.226]) by zimbra.cs.ucla.edu (Postfix) with ESMTPSA id 2F52F1600C4 for ; Wed, 8 Jul 2020 10:58:01 -0700 (PDT) To: control@debbugs.gnu.org From: Paul Eggert Subject: 42269 has a patch Organization: UCLA Computer Science Department Message-ID:
<4b678d42-6923-7bdf-f6a1-300b0a15da04@cs.ucla.edu> Date: Wed, 8 Jul 2020 10:58:00 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.8.0 MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: control X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) tags 42269 patch From unknown Mon Aug 18 11:26:01 2025 X-Loop: help-debbugs@gnu.org Subject: bug#42269: Remove non-GMP code from coreutils factor.c Resent-From: Paul Eggert Original-Sender: "Debbugs-submit" Resent-CC: bug-coreutils@gnu.org Resent-Date: Wed, 08 Jul 2020 18:30:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 42269 X-GNU-PR-Package: coreutils X-GNU-PR-Keywords: patch To: =?UTF-8?Q?Torbj=C3=B6rn?= Granlund Cc: nisse@lysator.liu.se, P@draigBrady.com, jay@gnu.org, 42269@debbugs.gnu.org, jim@meyering.net Received: via spool by 42269-submit@debbugs.gnu.org id=B42269.159423299618728 (code B ref 42269); Wed, 08 Jul 2020 18:30:02 +0000 Received: (at 42269) by debbugs.gnu.org; 8 Jul 2020 18:29:56 +0000 Received: from localhost ([127.0.0.1]:39051 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1jtEpQ-0004rz-72 for submit@debbugs.gnu.org; Wed, 08 Jul 2020 14:29:56 -0400 Received: from zimbra.cs.ucla.edu ([131.179.128.68]:40460) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1jtEpO-0004rj-Nw for 42269@debbugs.gnu.org; Wed, 08 Jul 2020 14:29:55 -0400 Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 0784C1600C4; Wed, 8 Jul 2020 11:29:49 -0700 (PDT) Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost 
(zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id korGCrkCR5eQ; Wed, 8 Jul 2020 11:29:48 -0700 (PDT) Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 18BAD1600CD; Wed, 8 Jul 2020 11:29:48 -0700 (PDT) X-Virus-Scanned: amavisd-new at zimbra.cs.ucla.edu Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id uNanaNv-moKL; Wed, 8 Jul 2020 11:29:48 -0700 (PDT) Received: from [192.168.1.9] (cpe-75-82-69-226.socal.res.rr.com [75.82.69.226]) by zimbra.cs.ucla.edu (Postfix) with ESMTPSA id DADE71600C4; Wed, 8 Jul 2020 11:29:47 -0700 (PDT) References: <7c08ef70-bb82-2b7b-0d39-18bbae70afdd@cs.ucla.edu> <865zayosfi.fsf@shell.gmplib.org> From: Paul Eggert Organization: UCLA Computer Science Department Message-ID: Date: Wed, 8 Jul 2020 11:29:47 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.8.0 MIME-Version: 1.0 In-Reply-To: <865zayosfi.fsf@shell.gmplib.org> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: quoted-printable X-Spam-Score: -2.3 (--) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---)

On 7/8/20 9:57 AM, Torbjörn Granlund wrote:
> The non-GMP code of coreutils was extremely well-tuned by me and Niels
> Möller a couple of years ago.

How time flies! The code was merged in 2012.

> By leaving just the GMP code, you would create a pretty useless factor
> command. Any naive old factor command would often beat it. It would
> make much more sense to remove the factor command altogether.

OK, thanks. Then let's forget about the patch I just proposed.

Could you give an example of where the 128-bit code shines, compared to the GMP code on the same arguments? I could add the example as a comment in the factor.c code, to let me and future maintainers know why it's useful for performance.

From debbugs-submit-bounces@debbugs.gnu.org Wed Jul 08 14:31:19 2020 Received: (at control) by debbugs.gnu.org; 8 Jul 2020 18:31:19 +0000 Received: from localhost ([127.0.0.1]:39055 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1jtEql-0004vu-H4 for submit@debbugs.gnu.org; Wed, 08 Jul 2020 14:31:19 -0400 Received: from zimbra.cs.ucla.edu ([131.179.128.68]:40678) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1jtEqk-0004vh-1s for control@debbugs.gnu.org; Wed, 08 Jul 2020 14:31:18 -0400 Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id BBFE31600C4 for ; Wed, 8 Jul 2020 11:31:12 -0700 (PDT) Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id 7JMZUdZ7bbD7 for ; Wed, 8 Jul 2020 11:31:12 -0700 (PDT) Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 267371600CD for ; Wed, 8 Jul 2020 11:31:12 -0700 (PDT) X-Virus-Scanned: amavisd-new at zimbra.cs.ucla.edu Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id T4Y3sMx0F-ZC for ; Wed, 8 Jul 2020 11:31:12 -0700 (PDT) Received: from [192.168.1.9] (cpe-75-82-69-226.socal.res.rr.com [75.82.69.226]) by zimbra.cs.ucla.edu (Postfix) with ESMTPSA id 06A571600C4 for ; Wed, 8 Jul 2020 11:31:12 -0700 (PDT) To:
control@debbugs.gnu.org From: Paul Eggert Subject: close 42269 Organization: UCLA Computer Science Department Message-ID: Date: Wed, 8 Jul 2020 11:31:11 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.8.0 MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: control X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) close 42269 From unknown Mon Aug 18 11:26:01 2025 X-Loop: help-debbugs@gnu.org Subject: bug#42269: Remove non-GMP code from coreutils factor.c Resent-From: tg@gmplib.org (=?UTF-8?Q?Torbj=C3=B6rn?= Granlund) Original-Sender: "Debbugs-submit" Resent-CC: bug-coreutils@gnu.org Resent-Date: Wed, 08 Jul 2020 19:35:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 42269 X-GNU-PR-Package: coreutils X-GNU-PR-Keywords: patch To: Paul Eggert Cc: nisse@lysator.liu.se, P@draigBrady.com, jay@gnu.org, 42269@debbugs.gnu.org, jim@meyering.net Received: via spool by 42269-submit@debbugs.gnu.org id=B42269.1594236887862 (code B ref 42269); Wed, 08 Jul 2020 19:35:02 +0000 Received: (at 42269) by debbugs.gnu.org; 8 Jul 2020 19:34:47 +0000 Received: from localhost ([127.0.0.1]:39128 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1jtFqB-0000Dq-Io for submit@debbugs.gnu.org; Wed, 08 Jul 2020 15:34:47 -0400 Received: 
from martin.gmplib.org ([130.242.124.102]:64998 helo=shell.gmplib.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1jtFq9-0000Dh-N5 for 42269@debbugs.gnu.org; Wed, 08 Jul 2020 15:34:46 -0400 Received: by shell.gmplib.org (Postfix, from userid 1001) id 19BDC51359; Wed, 8 Jul 2020 21:34:44 +0200 (CEST) From: tg@gmplib.org (=?UTF-8?Q?Torbj=C3=B6rn?= Granlund) References: <7c08ef70-bb82-2b7b-0d39-18bbae70afdd@cs.ucla.edu> <865zayosfi.fsf@shell.gmplib.org> Date: Wed, 08 Jul 2020 21:34:44 +0200 In-Reply-To: (Paul Eggert's message of "Wed, 8 Jul 2020 11:29:47 -0700") Message-ID: <86tuyhajhn.fsf@shell.gmplib.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (berkeley-unix) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spam-Score: 0.3 (/) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-) Paul Eggert writes: Could you give an example of where the 128-bit code shines, compared to the GMP code on the same arguments? I could add the example as a comment in the factor.c code, to let me and future maintainers know why it's useful for performance. Any number which does not happen to be B-smooth for, say B < 2^30, will show easily measurable performance difference of 5x to 40x IIRC. A semantic difference which sometimes makes the speed difference less pronounced is that the non-GMP code proves that the printed factors are indeed prime. We use a criterion which requires factoring of p-1 for any assumed prime factor p. In unlucky cases that recursive factorisation is costlier than the main factorisation. I have a patch which makes the non-GMP code some 2x - 3x faster. It's been maturing for several years now, so I suppose I should really finish it. 
(It got tangled with code which improves the GMP case by letting it fall into the non-GMP code as numbers get smaller. That sounds simple but is quite messy for various reasons. It is also not clear how much complexity we could defend for this command of limited utility.) -- Torbjörn Please encrypt, key id 0xC8601622 From unknown Mon Aug 18 11:26:01 2025 X-Loop: help-debbugs@gnu.org Subject: bug#42269: Remove non-GMP code from coreutils factor.c Resent-From: nisse@lysator.liu.se (Niels =?UTF-8?Q?M=C3=B6ller?=) Original-Sender: "Debbugs-submit" Resent-CC: bug-coreutils@gnu.org Resent-Date: Wed, 08 Jul 2020 20:52:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 42269 X-GNU-PR-Package: coreutils X-GNU-PR-Keywords: patch To: tg@gmplib.org (=?UTF-8?Q?Torbj=C3=B6rn?= Granlund) Cc: jay@gnu.org, P@draigBrady.com, eggert@cs.ucla.edu, 42269@debbugs.gnu.org, jim@meyering.net X-Debbugs-Original-Cc: James Youngman , =?UTF-8?Q?P=C3=A1draig?= Brady , Paul Eggert , Coreutils bugs , Jim Meyering Received: via spool by submit@debbugs.gnu.org id=B.15942415088064 (code B ref -1); Wed, 08 Jul 2020 20:52:02 +0000 Received: (at submit) by debbugs.gnu.org; 8 Jul 2020 20:51:48 +0000 Received: from localhost ([127.0.0.1]:39182 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1jtH2h-000260-Qq for submit@debbugs.gnu.org; Wed, 08 Jul 2020 16:51:48 -0400 Received: from lists.gnu.org ([209.51.188.17]:36086) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1jtH2g-00025r-9A for submit@debbugs.gnu.org; Wed, 08 Jul 2020 16:51:46 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:53760) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1jtH2g-0006UP-0d for bug-coreutils@gnu.org; Wed, 08 Jul 2020 16:51:46 -0400 Received: from mail.lysator.liu.se ([130.236.254.3]:54077) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) 
(Exim 4.90_1) (envelope-from ) id 1jtH2a-0005nz-2X; Wed, 08 Jul 2020 16:51:45 -0400 Received: from mail.lysator.liu.se (localhost [127.0.0.1]) by mail.lysator.liu.se (Postfix) with ESMTP id 8EC3540004; Wed, 8 Jul 2020 22:51:34 +0200 (CEST) Received: from armitage.lysator.liu.se (armitage.lysator.liu.se [IPv6:2001:6b0:17:f0a0::83]) by mail.lysator.liu.se (Postfix) with SMTP id DC2B040002; Wed, 8 Jul 2020 22:51:32 +0200 (CEST) Received: by armitage.lysator.liu.se (sSMTP sendmail emulation); Wed, 08 Jul 2020 22:51:32 +0200 From: nisse@lysator.liu.se (Niels =?UTF-8?Q?M=C3=B6ller?=) References: <7c08ef70-bb82-2b7b-0d39-18bbae70afdd@cs.ucla.edu> <865zayosfi.fsf@shell.gmplib.org> Date: Wed, 08 Jul 2020 22:51:32 +0200 In-Reply-To: <865zayosfi.fsf@shell.gmplib.org> ("=?UTF-8?Q?Torbj=C3=B6rn?= Granlund"'s message of "Wed, 08 Jul 2020 18:57:53 +0200") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (berkeley-unix) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Virus-Scanned: ClamAV using ClamSMTP Received-SPF: pass client-ip=130.236.254.3; envelope-from=nisse@lysator.liu.se; helo=mail.lysator.liu.se X-detected-operating-system: by eggs.gnu.org: First seen = 2020/07/08 16:51:35 X-ACL-Warn: Detected OS = Linux 3.11 and newer X-Spam_score_int: -41 X-Spam_score: -4.2 X-Spam_bar: ---- X-Spam_report: (-4.2 / 5.0 requ) BAYES_00=-1.9, RCVD_IN_DNSWL_MED=-2.3, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-Spam-Score: -1.6 (-) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -2.6 (--) tg@gmplib.org (Torbjörn Granlund) writes: > If any code is to be removed, then that would be the GMP code of > coreutils factor. I agree with Torbjörn.
The GMP code in GNU factor might have made more sense when most computers were 32-bit, and "bignums" were smaller. But on 64-bit computers, the GMP code would be used only for numbers above 127 bits. I'm really not that familiar with state of the art factoring, but I'd guess pollard rho is a bad algorithm choice for that range, and one ought to use, e.g., some variant of the quadratic sieve. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. From unknown Mon Aug 18 11:26:01 2025 X-Loop: help-debbugs@gnu.org Subject: bug#42269: Remove non-GMP code from coreutils factor.c Resent-From: tg@gmplib.org (=?UTF-8?Q?Torbj=C3=B6rn?= Granlund) Original-Sender: "Debbugs-submit" Resent-CC: bug-coreutils@gnu.org Resent-Date: Wed, 08 Jul 2020 21:47:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 42269 X-GNU-PR-Package: coreutils X-GNU-PR-Keywords: patch To: nisse@lysator.liu.se (Niels =?UTF-8?Q?M=C3=B6ller?=) Cc: jay@gnu.org, P@draigBrady.com, eggert@cs.ucla.edu, 42269@debbugs.gnu.org, jim@meyering.net X-Debbugs-Original-Cc: James Youngman , =?UTF-8?Q?P=C3=A1draig?= Brady , Paul Eggert , Coreutils bugs , Jim Meyering Received: via spool by submit@debbugs.gnu.org id=B.159424480713333 (code B ref -1); Wed, 08 Jul 2020 21:47:01 +0000 Received: (at submit) by debbugs.gnu.org; 8 Jul 2020 21:46:47 +0000 Received: from localhost ([127.0.0.1]:39278 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1jtHtu-0003Sz-UQ for submit@debbugs.gnu.org; Wed, 08 Jul 2020 17:46:47 -0400 Received: from lists.gnu.org ([209.51.188.17]:37558) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1jtHtt-0003Sr-Cr for submit@debbugs.gnu.org; Wed, 08 Jul 2020 17:46:46 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:39304) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) 
(envelope-from ) id 1jtHtr-0003sI-GF for bug-coreutils@gnu.org; Wed, 08 Jul 2020 17:46:45 -0400 Received: from martin.gmplib.org ([130.242.124.102]:50864 helo=shell.gmplib.org) by eggs.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1jtHtp-0004X9-7y; Wed, 08 Jul 2020 17:46:42 -0400 Received: by shell.gmplib.org (Postfix, from userid 1001) id 69E345136B; Wed, 8 Jul 2020 23:46:38 +0200 (CEST) From: tg@gmplib.org (=?UTF-8?Q?Torbj=C3=B6rn?= Granlund) References: <7c08ef70-bb82-2b7b-0d39-18bbae70afdd@cs.ucla.edu> <865zayosfi.fsf@shell.gmplib.org> Date: Wed, 08 Jul 2020 23:46:38 +0200 In-Reply-To: ("Niels =?UTF-8?Q?M=C3=B6ller?="'s message of "Wed, 08 Jul 2020 22:51:32 +0200") Message-ID: <86lfjtaddt.fsf@shell.gmplib.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (berkeley-unix) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Received-SPF: none client-ip=130.242.124.102; envelope-from=tg@gmplib.org; helo=shell.gmplib.org X-detected-operating-system: by eggs.gnu.org: First seen = 2020/07/08 17:46:38 X-ACL-Warn: Detected OS = FreeBSD 9.x or newer [fuzzy] X-Spam_score_int: -8 X-Spam_score: -0.9 X-Spam_bar: / X-Spam_report: (-0.9 / 5.0 requ) BAYES_00=-1.9, KHOP_HELO_FCRDNS=1, SPF_HELO_NONE=0.001, SPF_NONE=0.001 autolearn=no autolearn_force=no X-Spam_action: no action X-Spam-Score: -2.3 (--) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) nisse@lysator.liu.se (Niels Möller) writes: I'm really not that familiar with state of the art factoring, but I'd guess pollard rho is a bad algorithm choice for that range, and one ought to use, e.g., some variant of the quadratic sieve. Let me modify that statement somewhat. Pollard rho is suitable for finding small factors of composites of any size.
Small here might mean about < 2^64. Any factoring effort should start with trial division, then some rounds of Pollard rho, then perhaps some EC rounds. Only after that and when a remaining cofactor is non-prime, QS is to be rolled out. That could sound like the GMP code of coreutils factor is great for factoring really large numbers. It is, but only for smooth huge numbers. If we ever get crazy enough to consider QS for coreutils factor, its current GMP Pollard rho code would become part of a general factoring engine for numbers > 2^127. -- Torbjörn Please encrypt, key id 0xC8601622 From unknown Mon Aug 18 11:26:01 2025 X-Loop: help-debbugs@gnu.org Subject: bug#42269: Remove non-GMP code from coreutils factor.c Resent-From: Paul Eggert Original-Sender: "Debbugs-submit" Resent-CC: bug-coreutils@gnu.org Resent-Date: Thu, 09 Jul 2020 02:09:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 42269 X-GNU-PR-Package: coreutils X-GNU-PR-Keywords: patch To: =?UTF-8?Q?Torbj=C3=B6rn?= Granlund Cc: nisse@lysator.liu.se, P@draigBrady.com, jay@gnu.org, 42269@debbugs.gnu.org, jim@meyering.net Received: via spool by 42269-submit@debbugs.gnu.org id=B42269.15942605294945 (code B ref 42269); Thu, 09 Jul 2020 02:09:02 +0000 Received: (at 42269) by debbugs.gnu.org; 9 Jul 2020 02:08:49 +0000 Received: from localhost ([127.0.0.1]:39479 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1jtLzV-0001Hg-FM for submit@debbugs.gnu.org; Wed, 08 Jul 2020 22:08:49 -0400 Received: from zimbra.cs.ucla.edu ([131.179.128.68]:56104) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1jtLzS-0001HT-OC for 42269@debbugs.gnu.org; Wed, 08 Jul 2020 22:08:47 -0400 Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id E27FE16008C; Wed, 8 Jul 2020 19:08:40 -0700 (PDT) Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, 
port 10032) with ESMTP id l6JZMOVjJCz3; Wed, 8 Jul 2020 19:08:39 -0700 (PDT) Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id D1EB216009A; Wed, 8 Jul 2020 19:08:39 -0700 (PDT) X-Virus-Scanned: amavisd-new at zimbra.cs.ucla.edu Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id SFt4eCX6VbIj; Wed, 8 Jul 2020 19:08:39 -0700 (PDT) Received: from [192.168.1.9] (cpe-75-82-69-226.socal.res.rr.com [75.82.69.226]) by zimbra.cs.ucla.edu (Postfix) with ESMTPSA id 8506116008C; Wed, 8 Jul 2020 19:08:39 -0700 (PDT) References: <7c08ef70-bb82-2b7b-0d39-18bbae70afdd@cs.ucla.edu> <865zayosfi.fsf@shell.gmplib.org> <86tuyhajhn.fsf@shell.gmplib.org> From: Paul Eggert Organization: UCLA Computer Science Department Message-ID: <1d33c4d2-4e0c-a7eb-ec6b-90a0952a9da6@cs.ucla.edu> Date: Wed, 8 Jul 2020 19:08:36 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.8.0 MIME-Version: 1.0 In-Reply-To: <86tuyhajhn.fsf@shell.gmplib.org> Content-Type: multipart/mixed; boundary="------------883BFBA9DA037193E833E116" Content-Language: en-US X-Spam-Score: -2.3 (--) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) This is a multi-part message in MIME format.
--------------883BFBA9DA037193E833E116 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: quoted-printable On 7/8/20 12:34 PM, Torbjörn Granlund wrote: > Any number which does not happen to be B-smooth for, say B < 2^30, will > show easily measurable performance difference of 5x to 40x IIRC. Ah, I had tried the example in the manual, (2^31 - 1) * (2^61 - 1). Even though it isn't B-smooth for B < 2^30, the performance difference was only 2x on my machine. I just now tried 2^127 - 1 and saw a similar performance difference, but 2^127 - 3 had a 15x difference so it's a better example. I installed the attached to try to document this better. > I have a patch which makes the non-GMP code some 2x - 3x faster. It's > been maturing for several years now, so I suppose I should really finish > it. (It got tangled with code which improves the GMP case by letting it > fall into the non-GMP code as numbers get smaller. That sounds simple > but is quite messy for various reasons. It is also not clear how much > complexity we could defend for this command of limited utility.) Yes, 'factor' is just a minor utility needed for POSIX compliance. Although it'd be nice to get that 2x-3x improvement whenever you have the time, it's not urgent. Thanks for your guidance on the GMP issue. --------------883BFBA9DA037193E833E116 Content-Type: text/x-patch; charset=UTF-8; name="0001-factor-explain-why-non-GMP-code-Bug-42269.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="0001-factor-explain-why-non-GMP-code-Bug-42269.patch" >From ba1489d763b66dd1fcec08ecb4cba5917745f6bf Mon Sep 17 00:00:00 2001 From: Paul Eggert Date: Wed, 8 Jul 2020 18:58:18 -0700 Subject: [PATCH] factor: explain why non-GMP code (Bug#42269) * doc/coreutils.texi (factor invocation): * src/factor.c: Explain why the two-word algorithm is useful. 
--- doc/coreutils.texi | 24 ++++++++++++++---------- src/factor.c | 5 +++++ 2 files changed, 19 insertions(+), 10 deletions(-) diff --git a/doc/coreutils.texi b/doc/coreutils.texi index 6ec1e6c31..656b8bc79 100644 --- a/doc/coreutils.texi +++ b/doc/coreutils.texi @@ -18368,14 +18368,17 @@ Print the program version on standard output, then exit without further processing. @end table -Factoring the product of the eighth and ninth Mersenne primes -takes about 4 milliseconds of CPU time on an Intel Xeon Silver 4116. +If the number to be factored is small (less than @math{2^{127}} on +typical machines), @command{factor} uses a faster algorithm. +For example, on a circa-2017 Intel Xeon Silver 4116, factoring the +product of the eighth and ninth Mersenne primes (approximately +@math{2^{92}}) takes about 4 ms of CPU time: @example -M8=$(echo 2^31-1|bc) -M9=$(echo 2^61-1|bc) -n=$(echo "$M8 * $M9" | bc) -bash -c "time factor $n" +$ M8=$(echo 2^31-1 | bc) +$ M9=$(echo 2^61-1 | bc) +$ n=$(echo "$M8 * $M9" | bc) +$ bash -c "time factor $n" 4951760154835678088235319297: 2147483647 2305843009213693951 real 0m0.004s @@ -18383,11 +18386,12 @@ user 0m0.004s sys 0m0.000s @end example -Similarly, factoring the eighth Fermat number @math{2^{256}+1} takes -about 14 seconds on the same machine. +For larger numbers, @command{factor} uses a slower algorithm. On the +same platform, factoring the eighth Fermat number @math{2^{256} + 1} +takes about 14 seconds, and the slower algorithm would have taken +about 750 ms to factor @math{2^{127} - 3} instead of the 50 ms needed by +the faster algorithm. -The single-precision code uses an algorithm -designed for factoring smaller numbers. Factoring large numbers is, in general, hard. The Pollard-Brent rho algorithm used by @command{factor} is particularly effective for numbers with relatively small factors. 
If you wish to factor large diff --git a/src/factor.c b/src/factor.c index c1c35a562..1b1607f16 100644 --- a/src/factor.c +++ b/src/factor.c @@ -53,6 +53,11 @@ trick of multiplying all n-residues by the word base, allowing cheap Hensel reductions mod n. + The GMP code uses an algorithm that can be considerably slower; + for example, on a circa-2017 Intel Xeon Silver 4116, factoring + 2^{127}-3 takes about 50 ms with the two-word algorithm but would + take about 750 ms with the GMP code. + Improvements: * Use modular inverses also for exact division in the Lucas code, and -- 2.17.1 --------------883BFBA9DA037193E833E116--