From unknown Sun Jun 15 08:52:17 2025 X-Loop: help-debbugs@gnu.org Subject: bug#39832: [PATCH] Optimized the deflate in aarch64 Resent-From: Yikun Jiang Original-Sender: "Debbugs-submit" Resent-CC: bug-gzip@gnu.org Resent-Date: Sat, 29 Feb 2020 10:10:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: report 39832 X-GNU-PR-Package: gzip X-GNU-PR-Keywords: patch To: 39832@debbugs.gnu.org X-Debbugs-Original-To: bug-gzip@gnu.org Received: via spool by submit@debbugs.gnu.org id=B.15829709937694 (code B ref -1); Sat, 29 Feb 2020 10:10:02 +0000 Received: (at submit) by debbugs.gnu.org; 29 Feb 2020 10:09:53 +0000 Received: from localhost ([127.0.0.1]:34272 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1j7z4C-0001zx-W8 for submit@debbugs.gnu.org; Sat, 29 Feb 2020 05:09:53 -0500 Received: from lists.gnu.org ([209.51.188.17]:53632) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1j7yYG-0007Hi-G1 for submit@debbugs.gnu.org; Sat, 29 Feb 2020 04:36:52 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]:43883) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1j7yYF-0001S4-5h for bug-gzip@gnu.org; Sat, 29 Feb 2020 04:36:52 -0500 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=0.8 required=5.0 tests=BAYES_50,FREEMAIL_FROM, HTML_MESSAGE autolearn=disabled version=3.3.2 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1j7yYD-0006cA-Ti for bug-gzip@gnu.org; Sat, 29 Feb 2020 04:36:51 -0500 Received: from mail-lj1-x243.google.com ([2a00:1450:4864:20::243]:40041) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1j7yYD-0006Yd-LK for bug-gzip@gnu.org; Sat, 29 Feb 2020 04:36:49 -0500 Received: by mail-lj1-x243.google.com with SMTP id 143so6101467ljj.7 for ; Sat, 29 Feb 2020 01:36:49 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:from:date:message-id:subject:to; bh=RjTYvL6S15GHdhO2UzO4B1a4/fbB7aVRofsxUa5IiXY=; b=l39wt1krDAfRRHcGE/cyNtVQlMwwlmcuIAOhyXWGko+3Ro9FUKTomXxau9PQLchHPk HXneW5mAUjhrHM/yPC94ygK9MBPddaVKzChjCYFJeuarNaOIlhle56SttdUG2lyLfNsK IJLpPHbIZf9QT2uHc6raA85SiD8ovTwIJ/cgeAC7zosNaG9l3A6n1J00KH5EvjnO/qnO 3lEphIN4OKbHK+nb5ZOXiv+NHUbdugqrq5E5NkZIRX2lYmmBvXXig0AC0I05IXow8Ifv rb003NWZqB0dGk+h+dJwSBXOYOn++4VvJ2tXiU54S9RYfN3cX+ZII9Y9UPO1/fk43iNc 3Hgg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:from:date:message-id:subject:to; bh=RjTYvL6S15GHdhO2UzO4B1a4/fbB7aVRofsxUa5IiXY=; b=YlYPnZpQsuVs0XhcaMuU4qAfi5k63s1bqfWM0Tysd0H9VSBCwkJH4lbDpTeFAbTURx tkz/3jySiE21bt4JEMaUspzFLLyRPUJsxPDWOXufU9h0xPzEMbAQGwAqUJJoBDa7i+nR mUWWPJX9zEJ/LNlIjK+TaY/QssyOR0dETvqaJP7NQt/ReDMITsEvqptauTBxo0d9xzOp weuBm0aijo9LtxQ+gdxzRX9mCClOYZc846rVi0S5s33A6gToGv9KBb9pUeMFy8y3W/Km T6yYJCy4KwFxDAWaVqCwNFJ8cSMBwQDIwr4Fc9NmSLwyVM/fXwZu2eOEfUKDXg531Ij3 eTMg== X-Gm-Message-State: ANhLgQ3B/zRiM09omoSgNOvx6iqYTn19v78R+eIJ8OcQZZowlU0g+TgW N3NHpBg+/sHtJDOhjuKgpn3uFqCNLuR217CxHilKadD3ulU= X-Google-Smtp-Source: ADFU+vvx0Cn/UwxOAXk9ponjemoGfq1y9N7+eB8dm/jq+cjrzNhHdbgSpUi+JLVIQqmwZUsZx+lVawm9C5IXf1vQaEo= X-Received: by 2002:a2e:80cc:: with SMTP id r12mr5241620ljg.154.1582969008369; Sat, 29 Feb 2020 01:36:48 -0800 (PST) MIME-Version: 1.0 From: Yikun Jiang Date: Sat, 29 Feb 2020 17:36:37 +0800 Message-ID: Content-Type: multipart/alternative; boundary="0000000000006f347c059fb3b106" X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2a00:1450:4864:20::243 X-Spam-Score: 0.3 (/) X-Mailman-Approved-At: Sat, 29 Feb 2020 05:09:52 -0500 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.7 (/) --0000000000006f347c059fb3b106 Content-Type: text/plain; charset="UTF-8" From: Yikun Jiang This patch uses the prefetch instruction to pre-load the next_match into cache to improve the performance, also makes an unrolling change to decrease the number of if branch usage. --- deflate.c | 30 ++++++++++++++++++++++++++++-- 1 file changed, 28 insertions(+), 2 deletions(-) diff --git a/deflate.c b/deflate.c index 5ed2a9b..008c032 100644 --- a/deflate.c +++ b/deflate.c @@ -378,6 +378,9 @@ longest_match(IPos cur_match) register int len; /* length of current match */ int best_len = prev_length; /* best match length so far */ IPos limit = strstart > (IPos)MAX_DIST ? strstart - (IPos)MAX_DIST : NIL; +#ifdef __aarch64__ + IPos next_match; +#endif /* Stop when cur_match becomes <= limit. To simplify the code, * we prevent matches with the string of window index 0. */ @@ -411,6 +414,10 @@ longest_match(IPos cur_match) do { Assert(cur_match < strstart, "no future"); match = window + cur_match; +#ifdef __aarch64__ + next_match = prev[cur_match & WMASK]; + __asm__("PRFM PLDL1STRM, [%0]"::"r"(&(prev[next_match & WMASK]))); +#endif /* Skip to next match if the match length cannot increase * or if the match length is less than 2: @@ -488,8 +495,14 @@ longest_match(IPos cur_match) scan_end = scan[best_len]; #endif } - } while ((cur_match = prev[cur_match & WMASK]) > limit - && --chain_length != 0); + } +#ifdef __aarch64__ + while ((cur_match = next_match) > limit + && --chain_length != 0); +#else + while ((cur_match = prev[cur_match & WMASK]) > limit + && --chain_length != 0); +#endif return best_len; } @@ -777,7 +790,20 @@ deflate (int pack_level) lookahead -= prev_length-1; prev_length -= 2; RSYNC_ROLL(strstart, prev_length+1); + + while (prev_length >= 4) { + prev_length -= 4; + strstart++; + INSERT_STRING(strstart, hash_head); + strstart++; + INSERT_STRING(strstart, hash_head); + strstart++; + INSERT_STRING(strstart, hash_head); + strstart++; + INSERT_STRING(strstart, hash_head); + } do { + if (prev_length == 0) break; strstart++; INSERT_STRING(strstart, hash_head); /* strstart never exceeds WSIZE-MAX_MATCH, so there are -- 2.17.1 --0000000000006f347c059fb3b106 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
From: Yikun Jiang <yikunkero@gmail.com>

This patch uses the= prefetch instruction to pre-load the
next_match into cache to improve t= he performance, also makes
an unrolling change to decrease the number of= if branch usage.
---
=C2=A0deflate.c | 30 ++++++++++++++++++++++++++= ++--
=C2=A01 file changed, 28 insertions(+), 2 deletions(-)

diff = --git a/deflate.c b/deflate.c
index 5ed2a9b..008c032 100644
--- a/def= late.c
+++ b/deflate.c
@@ -378,6 +378,9 @@ longest_match(IPos cur_mat= ch)
=C2=A0 =C2=A0 =C2=A0register int len;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0/* length= of current match */
=C2=A0 =C2=A0 =C2=A0int best_len =3D prev_length;= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0/* best match= length so far */
=C2=A0 =C2=A0 =C2=A0IPos limit =3D strstart > (IPos= )MAX_DIST ? strstart - (IPos)MAX_DIST : NIL;
+#ifdef __aarch64__
+=C2= =A0 =C2=A0 IPos next_match;
+#endif
=C2=A0 =C2=A0 =C2=A0/* Stop when = cur_match becomes <=3D limit. To simplify the code,
=C2=A0 =C2=A0 =C2= =A0 * we prevent matches with the string of window index 0.
=C2=A0 =C2= =A0 =C2=A0 */
@@ -411,6 +414,10 @@ longest_match(IPos cur_match)
=C2= =A0 =C2=A0 =C2=A0do {
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0Assert(cur_match= < strstart, "no future");
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0match =3D window + cur_match;
+#ifdef __aarch64__
+=C2=A0 =C2=A0 = =C2=A0 =C2=A0 next_match =3D prev[cur_match & WMASK];
+=C2=A0 =C2=A0= =C2=A0 =C2=A0 __asm__("PRFM=C2=A0 =C2=A0PLDL1STRM, [%0]"::"= r"(&(prev[next_match & WMASK])));
+#endif

=C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0/* Skip to next match if the match length cannot in= crease
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 * or if the match length is le= ss than 2:
@@ -488,8 +495,14 @@ longest_match(IPos cur_match)
=C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0scan_end=C2=A0 =C2=A0=3D scan[best= _len];
=C2=A0#endif
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0}
-=C2=A0 = =C2=A0 } while ((cur_match =3D prev[cur_match & WMASK]) > limit
-= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0&& --chain_length != =3D 0);
+=C2=A0 =C2=A0 }
+#ifdef __aarch64__
+=C2=A0 =C2=A0 while = ((cur_match =3D next_match) > limit
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 && --chain_length !=3D 0);
+#else
+=C2=A0 =C2=A0 w= hile ((cur_match =3D prev[cur_match & WMASK]) > limit
+=C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 && --chain_length !=3D 0);
+#end= if

=C2=A0 =C2=A0 =C2=A0return best_len;
=C2=A0}
@@ -777,7 +790= ,20 @@ deflate (int pack_level)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0lookahead -=3D prev_length-1;
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0prev_length -=3D 2;
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0RSYNC_ROLL(strstart, prev_length+1);
+
+=C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 while (prev_length >=3D 4) {
+=C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 prev_length -=3D 4;
+=C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 strstart++;
+=C2=A0= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 INSERT_STRING(strstart, h= ash_head);
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 strs= tart++;
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 INSERT_= STRING(strstart, hash_head);
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 strstart++;
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 INSERT_STRING(strstart, hash_head);
+=C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 strstart++;
+=C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0 =C2=A0 =C2=A0 INSERT_STRING(strstart, hash_head);
+=C2=A0= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 }
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0do {
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 if (prev_length =3D=3D 0) break;
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0strstart++;
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0INSERT_STRING(strstart, hash_head);
=C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0/* strstart neve= r exceeds WSIZE-MAX_MATCH, so there are
--
2.= 17.1
=C2=A0=C2=A0
--0000000000006f347c059fb3b106-- From debbugs-submit-bounces@debbugs.gnu.org Mon Apr 04 21:36:35 2022 Received: (at control) by debbugs.gnu.org; 5 Apr 2022 01:36:35 +0000 Received: from localhost ([127.0.0.1]:53428 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1nbY7X-0004FO-0I for submit@debbugs.gnu.org; Mon, 04 Apr 2022 21:36:35 -0400 Received: from zimbra.cs.ucla.edu ([131.179.128.68]:46330) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1nbY7U-0004F4-VE for control@debbugs.gnu.org; Mon, 04 Apr 2022 21:36:33 -0400 Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id CB30716009A for ; Mon, 4 Apr 2022 18:36:26 -0700 (PDT) Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id 5oYvPCcQLN8w for ; Mon, 4 Apr 2022 18:36:26 -0700 (PDT) Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 3B52F160130 for ; Mon, 4 Apr 2022 18:36:26 -0700 (PDT) X-Virus-Scanned: amavisd-new at zimbra.cs.ucla.edu Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id ZEoBnrxCVF8D for ; Mon, 4 Apr 2022 18:36:26 -0700 (PDT) Received: from [131.179.64.200] (Penguin.CS.UCLA.EDU [131.179.64.200]) by zimbra.cs.ucla.edu (Postfix) with ESMTPSA id 1C87816009A for ; Mon, 4 Apr 2022 18:36:26 -0700 (PDT) Message-ID: Date: Mon, 4 Apr 2022 18:36:25 -0700 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.7.0 Content-Language: en-US To: GNU bug control From: Paul Eggert Subject: gzip bug report maintenance Organization: UCLA Computer Science Department Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: control X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) tags 41535 wontfix tags 39832 wontfix tags 39831 wontfix