From debbugs-submit-bounces@debbugs.gnu.org Sat Feb 29 05:09:53 2020 Received: (at submit) by debbugs.gnu.org; 29 Feb 2020 10:09:53 +0000 Received: from localhost ([127.0.0.1]:34270 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1j7z4C-0001zu-HR for submit@debbugs.gnu.org; Sat, 29 Feb 2020 05:09:52 -0500 Received: from lists.gnu.org ([209.51.188.17]:44544) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1j7yXC-0007FK-FD for submit@debbugs.gnu.org; Sat, 29 Feb 2020 04:35:46 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]:43760) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1j7yXA-0000u5-TF for bug-gzip@gnu.org; Sat, 29 Feb 2020 04:35:46 -0500 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=0.8 required=5.0 tests=BAYES_50,FREEMAIL_FROM, HTML_MESSAGE autolearn=disabled version=3.3.2 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1j7yX9-00068q-Bu for bug-gzip@gnu.org; Sat, 29 Feb 2020 04:35:44 -0500 Received: from mail-lj1-x243.google.com ([2a00:1450:4864:20::243]:41886) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1j7yX9-000680-0z for bug-gzip@gnu.org; Sat, 29 Feb 2020 04:35:43 -0500 Received: by mail-lj1-x243.google.com with SMTP id u26so5945578ljd.8 for ; Sat, 29 Feb 2020 01:35:42 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:from:date:message-id:subject:to; bh=2SlkeNp4G+YegXwm/5dJum7n72AI5zFob0Ix6Py/Eno=; b=bM/NNGfaDjGzoXv3VXWt4W7ABRyntfvS508TlYP5j6tZYDEb+e3daXq6TqR1zTbWVv f2pMIj2/vScAH+EYeNVW+f8/maDT5bye/JfO84GGMSMsrcGIBCcXixVE2y/ZSx2rHw6N NVoAOTm09AFIBNfdswLOJLRtFprOdYiYiKH9ax2801D11DhHRXd0NJ/3KAN2kNY0ZKbt GpHPeszGiQR1w07cMTwcHa1k3srrCl9iKg2jrEJhwuk06OqQcUEs/w4PHCnj1mGKESlY BTBg8NdHbceA6zxOL6/4GU61e8ms0Axnfviw4pCzsvIKr0qUyb524242rSECYo1j3Vfo Xqig== X-Google-DKIM-Signature: 
v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:from:date:message-id:subject:to; bh=2SlkeNp4G+YegXwm/5dJum7n72AI5zFob0Ix6Py/Eno=; b=XJsb2bqy62YZ42tl3kW38nyytshELZrezxQ3Wxkq42R9wxZo1tcjaTKI/KAZazwXPB bOY6u+AJs20JDmqK+AQ2+iobJMUl27lfGF9De/KY2eeqiKjOrPwuw35tkDxFYv3ruwQZ G2zc78vRlBmLUmWDymObxvq3eze/Y1dPC1t04NU8SPY9pG+3Ft8xo+NusCZuy0VZSl+E d6/iDhBxJnBfTE99T+TEsaaNVfNd8CE2zBx2C249j5C4pXbnhpd7eEnvpISzOA/ipQtF L8r4P8JfuphnWLs9s2Vz4KhgB6JWgOuUx9vIP2bJlNXcReTBKdW6XAAxqAH2TDnnxxof zOpw== X-Gm-Message-State: ANhLgQ2jyADvrq5babaAx5WMKvwdovz66U5hJ2cYiNjqNmGGDfhqcDLf Hagfv8VMrvTv38jfBn0n8bcBINaE9Dk27Jk+UunKpVCLqwU= X-Google-Smtp-Source: ADFU+vtfvJ91NIxqSadRiLscY8IeMsaJSlanmfia2WgrD7H5u+g5TRBhdbBKDkmKbqHw/HyNVcZGaHl7MysQFhuZB20= X-Received: by 2002:a2e:9816:: with SMTP id a22mr296799ljj.24.1582968940325; Sat, 29 Feb 2020 01:35:40 -0800 (PST) MIME-Version: 1.0 From: Yikun Jiang Date: Sat, 29 Feb 2020 17:35:29 +0800 Message-ID: Subject: [PATCH] Using crc instructions instead of crc_32_tab in aarch64. To: bug-gzip@gnu.org Content-Type: multipart/alternative; boundary="00000000000060ee50059fb3ad77" X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2a00:1450:4864:20::243 X-Spam-Score: 0.3 (/) X-Debbugs-Envelope-To: submit X-Mailman-Approved-At: Sat, 29 Feb 2020 05:09:52 -0500 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.7 (/) --00000000000060ee50059fb3ad77 Content-Type: text/plain; charset="UTF-8" From: Yikun Jiang Implement CRC function using inline assembly instructions instead of crc_32_tab to improve the performance in aarch64. 
--00000000000060ee50059fb3ad77 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
From: Yikun Jiang <yikunkero@gmail.com>

Implement CRC function using inline assembly instructions
instead of crc_32_tab to improve the performance in aarch64.
---
 util.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 50 insertions(+)
diff --git a/util.c b/util.c
index 79fe505..d978c61 100644
--- a/util.c
+++ b/util.c
@@ -32,6 +32,17 @@
 #include <dirname.h>
 #include <xalloc.h>

+/* ========================================================================
+ * Implement CRC function using inline assembly instructions instead of
+ * crc_32_tab in aarch64.
+ */
+#ifdef __aarch64__
+#  define CRC32D(crc, value) __asm__("crc32x %w[c], %w[c], %x[v]":[c]"+r"(crc):[v]"r"(value))
+#  define CRC32W(crc, value) __asm__("crc32w %w[c], %w[c], %w[v]":[c]"+r"(crc):[v]"r"(value))
+#  define CRC32H(crc, value) __asm__("crc32h %w[c], %w[c], %w[v]":[c]"+r"(crc):[v]"r"(value))
+#  define CRC32B(crc, value) __asm__("crc32b %w[c], %w[c], %w[v]":[c]"+r"(crc):[v]"r"(value))
+#endif
+
 #ifndef CHAR_BIT
 #  define CHAR_BIT 8
 #endif
@@ -41,6 +52,7 @@ static int write_buffer (int, voidp, unsigned int);
 /* ========================================================================
  * Table of CRC-32's of all single-byte values (made by makecrc.c)
  */
+#ifndef __aarch64__
 static const ulg crc_32_tab[] = {
   0x00000000L, 0x77073096L, 0xee0e612cL, 0x990951baL, 0x076dc419L,
   0x706af48fL, 0xe963a535L, 0x9e6495a3L, 0x0edb8832L, 0x79dcb8a4L,
@@ -95,6 +107,7 @@ static const ulg crc_32_tab[] = {
   0x5d681b02L, 0x2a6f2b94L, 0xb40bbe37L, 0xc30c8ea1L, 0x5a05df1bL,
   0x2d02ef8dL
 };
+#endif

 /* Shift register contents.  */
 static ulg crc = 0xffffffffL;
@@ -134,6 +147,42 @@ ulg updcrc(s, n)
 {
     register ulg c;         /* temporary variable */

+#ifdef __aarch64__
+    register const uint8_t  *buf1;
+    register const uint16_t *buf2;
+    register const uint32_t *buf4;
+    register const uint64_t *buf8;
+    int64_t length = (int64_t)n;
+    buf8 = (const  uint64_t *)(const void *)s;
+
+    if (s == NULL) {
+        c = 0xffffffffL;
+    } else {
+        c = crc;
+        while(length >= sizeof(uint64_t)) {
+            CRC32D(c, *buf8++);
+            length -= sizeof(uint64_t);
+        }
+
+        buf4 = (const uint32_t *)(const void *)buf8;
+        if (length >= sizeof(uint32_t)) {
+            CRC32W(c, *buf4++);
+            length -= sizeof(uint32_t);
+        }
+
+        buf2 = (const uint16_t *)(const void *)buf4;
+        if(length >= sizeof(uint16_t)) {
+            CRC32H(c, *buf2++);
+            length -= sizeof(uint16_t);
+        }
+
+        buf1 = (const uint8_t *)(const void *)buf2;
+        if (length >= sizeof(uint8_t)) {
+            CRC32B(c, *buf1);
+            length -= sizeof(uint8_t);
+        }
+    }
+#else
     if (s == NULL) {
         c = 0xffffffffL;
     } else {
@@ -142,6 +191,7 @@ ulg updcrc(s, n)
             c = crc_32_tab[((int)c ^ (*s++)) & 0xff] ^ (c >> 8);
         } while (--n);
     }
+#endif
     crc = c;
     return c ^ 0xffffffffL;       /* (instead of ~c for 64-bit machines) */
 }
--
2.17.1
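
For readers without aarch64 hardware, the 8/4/2/1 chunking used in the
patch above can be checked with a portable sketch. This is not the patch
itself: crc32_step is a hypothetical software stand-in for the
crc32b/crc32h/crc32w/crc32x instructions, and load_le, crc32_chunked,
crc32_bytewise and crc32_self_check are illustrative names that do not
exist in gzip. The sketch assumes the instructions fold the operand in
least-significant-byte-first, which matches memory order only on a
little-endian target, and it loads bytes individually instead of
dereferencing casted pointers, so it does not depend on buffer alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for crc32b/h/w/x: fold the low `nbytes` bytes of
   `value` into `crc`, least significant byte first, using the reflected
   CRC-32 polynomial 0xEDB88320 (the one gzip's table is built from). */
static uint32_t crc32_step(uint32_t crc, uint64_t value, unsigned nbytes)
{
    for (unsigned i = 0; i < nbytes; i++) {
        crc ^= (uint32_t)((value >> (8 * i)) & 0xff);
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc;
}

/* Load n (at most 8) bytes in little-endian order, one byte at a time,
   so no alignment or host byte order is assumed. */
static uint64_t load_le(const unsigned char *p, unsigned n)
{
    uint64_t v = 0;
    for (unsigned i = 0; i < n; i++)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}

/* Same chunking as the patch: 8-byte loop, then at most one 4-, 2- and
   1-byte step for the tail. */
static uint32_t crc32_chunked(const unsigned char *s, size_t n)
{
    uint32_t c = 0xffffffffu;
    while (n >= 8) { c = crc32_step(c, load_le(s, 8), 8); s += 8; n -= 8; }
    if (n >= 4)    { c = crc32_step(c, load_le(s, 4), 4); s += 4; n -= 4; }
    if (n >= 2)    { c = crc32_step(c, load_le(s, 2), 2); s += 2; n -= 2; }
    if (n >= 1)    { c = crc32_step(c, load_le(s, 1), 1); }
    return c ^ 0xffffffffu;
}

/* Byte-at-a-time reference, equivalent to the table-driven loop. */
static uint32_t crc32_bytewise(const unsigned char *s, size_t n)
{
    uint32_t c = 0xffffffffu;
    while (n--)
        c = crc32_step(c, *s++, 1);
    return c ^ 0xffffffffu;
}

/* Usage example: the standard CRC-32 check value, plus a 23-byte input
   (8 + 8 + 4 + 2 + 1) that exercises every tail step. */
static void crc32_self_check(void)
{
    const unsigned char *check = (const unsigned char *)"123456789";
    const unsigned char *tail = (const unsigned char *)"abcdefghijklmnopqrstuvw";
    assert(crc32_chunked(check, 9) == 0xCBF43926u);
    assert(crc32_chunked(tail, 23) == crc32_bytewise(tail, 23));
}
```

Calling crc32_self_check() from a small main is enough to run it. The
chunked and bytewise results agree here by construction; on real hardware
the same comparison would have to be made against the instruction-based
code path.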
--00000000000060ee50059fb3ad77-- From debbugs-submit-bounces@debbugs.gnu.org Mon Apr 04 21:36:35 2022 Received: (at control) by debbugs.gnu.org; 5 Apr 2022 01:36:35 +0000 Received: from localhost ([127.0.0.1]:53428 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1nbY7X-0004FO-0I for submit@debbugs.gnu.org; Mon, 04 Apr 2022 21:36:35 -0400 Received: from zimbra.cs.ucla.edu ([131.179.128.68]:46330) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1nbY7U-0004F4-VE for control@debbugs.gnu.org; Mon, 04 Apr 2022 21:36:33 -0400 Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id CB30716009A for ; Mon, 4 Apr 2022 18:36:26 -0700 (PDT) Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id 5oYvPCcQLN8w for ; Mon, 4 Apr 2022 18:36:26 -0700 (PDT) Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 3B52F160130 for ; Mon, 4 Apr 2022 18:36:26 -0700 (PDT) X-Virus-Scanned: amavisd-new at zimbra.cs.ucla.edu Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id ZEoBnrxCVF8D for ; Mon, 4 Apr 2022 18:36:26 -0700 (PDT) Received: from [131.179.64.200] (Penguin.CS.UCLA.EDU [131.179.64.200]) by zimbra.cs.ucla.edu (Postfix) with ESMTPSA id 1C87816009A for ; Mon, 4 Apr 2022 18:36:26 -0700 (PDT) Message-ID: Date: Mon, 4 Apr 2022 18:36:25 -0700 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.7.0 Content-Language: en-US To: GNU bug control From: Paul Eggert Subject: gzip bug report maintenance Organization: UCLA Computer Science Department Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: control X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 
Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---)

tags 41535 wontfix
tags 39832 wontfix
tags 39831 wontfix