From unknown Sat Jun 21 10:44:43 2025 X-Loop: help-debbugs@gnu.org Subject: bug#56413: [PATCH 1/1] scm_i_utf8_string_hash: compute u8 chars not bytes Resent-From: Rob Browning Original-Sender: "Debbugs-submit" Resent-CC: bug-guile@gnu.org Resent-Date: Wed, 06 Jul 2022 01:25:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: report 56413 X-GNU-PR-Package: guile X-GNU-PR-Keywords: patch To: 56413@debbugs.gnu.org X-Debbugs-Original-To: bug-guile@gnu.org Received: via spool by submit@debbugs.gnu.org id=B.16570706522466 (code B ref -1); Wed, 06 Jul 2022 01:25:02 +0000 Received: (at submit) by debbugs.gnu.org; 6 Jul 2022 01:24:12 +0000 Received: from localhost ([127.0.0.1]:51950 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1o8tln-0000dD-8L for submit@debbugs.gnu.org; Tue, 05 Jul 2022 21:24:12 -0400 Received: from lists.gnu.org ([209.51.188.17]:41222) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1o8tlm-0000d6-2M for submit@debbugs.gnu.org; Tue, 05 Jul 2022 21:23:58 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:51740) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1o8tll-0006Aj-TV for bug-guile@gnu.org; Tue, 05 Jul 2022 21:23:57 -0400 Received: from defaultvalue.org ([45.33.119.55]:37416) by eggs.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1o8tlj-0000za-WF for bug-guile@gnu.org; Tue, 05 Jul 2022 21:23:57 -0400 Received: from trouble.defaultvalue.org (localhost [127.0.0.1]) (Authenticated sender: rlb@defaultvalue.org) by defaultvalue.org (Postfix) with ESMTPSA id DB0FB200C7 for ; Tue, 5 Jul 2022 20:23:23 -0500 (CDT) Received: by trouble.defaultvalue.org (Postfix, from userid 1000) id 77DBF14E494; Tue, 5 Jul 2022 20:23:23 -0500 (CDT) From: Rob Browning Date: Tue, 5 Jul 2022 20:23:23 -0500 Message-Id: <20220706012323.1024763-1-rlb@defaultvalue.org> X-Mailer: git-send-email 2.30.2 MIME-Version: 1.0 Content-Type: 
text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Received-SPF: pass client-ip=45.33.119.55; envelope-from=rlb@defaultvalue.org; helo=defaultvalue.org X-Spam_score_int: -18 X-Spam_score: -1.9 X-Spam_bar: - X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-Spam-Score: -1.4 (-) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.1 (/)

Noticed while investigating a migration to utf-8 strings.  After making
changes that routed non-ascii symbol hashing through this function,
encoding-iso88597.test began intermittently failing because it would
traverse trailing garbage when u8_strnlen reported 8 chars instead of 4.

Change the scm_i_str2symbol internal hash type to unsigned long to
explicitly match the hashing result type.
---

 Proposed for at least main.

 libguile/hash.c                      |  2 +-
 libguile/symbols.c                   |  2 +-
 test-suite/standalone/Makefile.am    |  7 ++++
 test-suite/standalone/test-hashing.c | 61 ++++++++++++++++++++++++++++
 4 files changed, 70 insertions(+), 2 deletions(-)
 create mode 100644 test-suite/standalone/test-hashing.c

diff --git a/libguile/hash.c b/libguile/hash.c
index 93431102f..0740b2645 100644
--- a/libguile/hash.c
+++ b/libguile/hash.c
@@ -188,7 +188,7 @@ scm_i_utf8_string_hash (const char *str, size_t len)
     /* Invalid UTF-8; punt.  */
     return scm_i_string_hash (scm_from_utf8_stringn (str, len));
 
-  length = u8_strnlen (ustr, len);
+  length = u8_mbsnlen (ustr, len);
 
   /* Set up the internal state.  */
   a = b = c = 0xdeadbeef + ((uint32_t)(length<<2)) + 47;
diff --git a/libguile/symbols.c b/libguile/symbols.c
index ad5f22f57..cd9cda3de 100644
--- a/libguile/symbols.c
+++ b/libguile/symbols.c
@@ -239,7 +239,7 @@ static SCM
 scm_i_str2symbol (SCM str)
 {
   SCM symbol;
-  size_t raw_hash = scm_i_string_hash (str);
+  unsigned long raw_hash = scm_i_string_hash (str);
 
   symbol = lookup_interned_symbol (str, raw_hash);
   if (scm_is_true (symbol))
diff --git a/test-suite/standalone/Makefile.am b/test-suite/standalone/Makefile.am
index e87100c96..ca1b3131b 100644
--- a/test-suite/standalone/Makefile.am
+++ b/test-suite/standalone/Makefile.am
@@ -167,6 +167,13 @@ test_conversion_LDADD = $(LIBGUILE_LDADD) $(top_builddir)/lib/libgnu.la
 check_PROGRAMS += test-conversion
 TESTS += test-conversion
 
+# test-hashing
+test_hashing_SOURCES = test-hashing.c
+test_hashing_CFLAGS = ${test_cflags}
+test_hashing_LDADD = $(LIBGUILE_LDADD) $(top_builddir)/lib/libgnu.la
+check_PROGRAMS += test-hashing
+TESTS += test-hashing
+
 # test-loose-ends
 test_loose_ends_SOURCES = test-loose-ends.c
 test_loose_ends_CFLAGS = ${test_cflags}
diff --git a/test-suite/standalone/test-hashing.c b/test-suite/standalone/test-hashing.c
new file mode 100644
index 000000000..476181fe2
--- /dev/null
+++ b/test-suite/standalone/test-hashing.c
@@ -0,0 +1,61 @@
+/* Copyright 2022
+     Free Software Foundation, Inc.
+
+   This file is part of Guile.
+
+   Guile is free software: you can redistribute it and/or modify it
+   under the terms of the GNU Lesser General Public License as published
+   by the Free Software Foundation, either version 3 of the License, or
+   (at your option) any later version.
+
+   Guile is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+   FITNESS FOR A PARTICULAR PURPOSE.  See the GNU Lesser General Public
+   License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with Guile.  If not, see
+   .  */
+
+#if HAVE_CONFIG_H
+# include
+#endif
+
+#include
+
+#include
+
+static void
+test_hashing ()
+{
+  // Make sure a utf-8 symbol has the expected hash.  In addition to
+  // catching algorithmic regressions, this would have caught a
+  // long-standing buffer overflow.
+
+  // περί
+  char about_u8[] = {0xce, 0xa0, 0xce, 0xb5, 0xcf, 0x81, 0xce, 0xaf, 0};
+  SCM sym = scm_from_utf8_symbol (about_u8);
+
+  const unsigned long expect = 4029223418961680680;
+  const unsigned long actual = scm_to_ulong (scm_symbol_hash (sym));
+
+  if (actual != expect)
+    {
+      fprintf (stderr, "fail: unexpected utf-8 symbol hash (%lu != %lu)\n",
+               actual, expect);
+      exit (EXIT_FAILURE);
+    }
+}
+
+static void
+tests (void *data, int argc, char **argv)
+{
+  test_hashing ();
+}
+
+int
+main (int argc, char *argv[])
+{
+  scm_boot_guile (argc, argv, tests, NULL);
+  return 0;
+}
-- 
2.30.2

From unknown Sat Jun 21 10:44:43 2025 X-Loop: help-debbugs@gnu.org Subject: bug#56413: [PATCH 1/1] scm_i_utf8_string_hash: compute u8 chars not bytes Resent-From: Rob Browning Original-Sender: "Debbugs-submit" Resent-CC: bug-guile@gnu.org Resent-Date: Wed, 06 Jul 2022 03:05:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 56413 X-GNU-PR-Package: guile X-GNU-PR-Keywords: patch To: 56413@debbugs.gnu.org Received: via spool by 56413-submit@debbugs.gnu.org id=B56413.165707665211997 (code B ref 56413); Wed, 06 Jul 2022 03:05:02 +0000 Received: (at 56413) by debbugs.gnu.org; 6 Jul 2022 03:04:12 +0000 Received: from localhost ([127.0.0.1]:51988 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1o8vKm-00037R-J2 for submit@debbugs.gnu.org; Tue, 05 Jul 2022 23:04:12 -0400 Received: from defaultvalue.org ([45.33.119.55]:59668 ident=postfix) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1o8vKk-00037H-7I for 56413@debbugs.gnu.org; Tue, 05 Jul 2022 23:04:11 -0400 Received: from trouble.defaultvalue.org (localhost
[127.0.0.1]) (Authenticated sender: rlb@defaultvalue.org) by defaultvalue.org (Postfix) with ESMTPSA id 81BA220347 for <56413@debbugs.gnu.org>; Tue, 5 Jul 2022 22:04:08 -0500 (CDT) Received: by trouble.defaultvalue.org (Postfix, from userid 1000) id 1E66A14E494; Tue, 5 Jul 2022 22:04:08 -0500 (CDT) From: Rob Browning In-Reply-To: <20220706012323.1024763-1-rlb@defaultvalue.org> References: <20220706012323.1024763-1-rlb@defaultvalue.org> Date: Tue, 05 Jul 2022 22:04:08 -0500 Message-ID: <878rp7q6vr.fsf@trouble.defaultvalue.org> MIME-Version: 1.0 Content-Type: text/plain X-Spam-Score: -0.0 (/) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-) Rob Browning writes: > Noticed while investigating a migration to utf-8 strings. After making > changes that routed non-ascii symbol hashing through this function, > encoding-iso88597.test began intermittently failing because it would > traverse trailing garbage when u8_strnlen reported 8 chars instead of 4. > > Change the scm_i_str2symbol internal hash type to unsigned long to > explicitly match the hashing result type. Hmm. I suppose the current test could be handled on the scheme side instead. (I'd started off attempting some more direct, elaborate tests that didn't pan out.) Happy to rework that if desired. 
-- Rob Browning rlb @defaultvalue.org and @debian.org GPG as of 2011-07-10 E6A9 DA3C C9FD 1FF8 C676 D2C4 C0F0 39E9 ED1B 597A GPG as of 2002-11-03 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4 From unknown Sat Jun 21 10:44:43 2025 X-Loop: help-debbugs@gnu.org Subject: bug#56413: [PATCH 1/1] scm_i_utf8_string_hash: compute u8 chars not bytes Resent-From: Ludovic =?UTF-8?Q?Court=C3=A8s?= Original-Sender: "Debbugs-submit" Resent-CC: bug-guile@gnu.org Resent-Date: Sat, 05 Nov 2022 22:19:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 56413 X-GNU-PR-Package: guile X-GNU-PR-Keywords: patch To: Rob Browning Cc: 56413@debbugs.gnu.org Received: via spool by 56413-submit@debbugs.gnu.org id=B56413.166768671914876 (code B ref 56413); Sat, 05 Nov 2022 22:19:02 +0000 Received: (at 56413) by debbugs.gnu.org; 5 Nov 2022 22:18:39 +0000 Received: from localhost ([127.0.0.1]:58247 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1orRUt-0003rs-8A for submit@debbugs.gnu.org; Sat, 05 Nov 2022 18:18:39 -0400 Received: from eggs.gnu.org ([209.51.188.92]:48592) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1orRUo-0003rY-LM for 56413@debbugs.gnu.org; Sat, 05 Nov 2022 18:18:37 -0400 Received: from fencepost.gnu.org ([2001:470:142:3::e]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1orRUj-0006ak-1x; Sat, 05 Nov 2022 18:18:29 -0400 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org; s=fencepost-gnu-org; h=MIME-Version:In-Reply-To:Date:References:Subject:To: From; bh=rbRFX+XMkLfyic9sqEAVuMw3z9YWksqTUg9TNG17KOU=; b=jqKMFUZfJgXuKoJJsF9C gWF9/7HizcOpEuZDvhAjO987EnofITtiBHWexuKiyUMk6juQCMxdeF3pf/mIVQywwL46Fq6RMqutB jvOLG1vAK9FF+XRBlYC5+v/zqLSNhzBS8O1+NL5Ah8oNmQkigdhEooQnKk0D58dRZ8VVirJnC1JOi h7EB1IDHsLdn5i1f7ZlvD7RVCYpaXRrSul8CkuWhwP3bTE8prHSn5tL+zCcoVB6553VmyYqEV7Q2s 
Smx4pZSaWFVOe0vVG57XOT/1C4fQiF4NDguAwV8QZwIluQjKs5k6/RvUqmw2h92brzWleQfG7ii6Q HnSq8swUFQw49A==; Received: from 91-160-117-201.subs.proxad.net ([91.160.117.201] helo=ribbon) by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1orRUi-0005K2-Ga; Sat, 05 Nov 2022 18:18:28 -0400 From: Ludovic =?UTF-8?Q?Court=C3=A8s?= References: <20220706012323.1024763-1-rlb@defaultvalue.org> Date: Sat, 05 Nov 2022 23:18:26 +0100 In-Reply-To: <20220706012323.1024763-1-rlb@defaultvalue.org> (Rob Browning's message of "Tue, 5 Jul 2022 20:23:23 -0500") Message-ID: <87zgd5gi4t.fsf@gnu.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.1 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spam-Score: -2.3 (--) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) Hi, Rob Browning skribis: > Noticed while investigating a migration to utf-8 strings. After making > changes that routed non-ascii symbol hashing through this function, > encoding-iso88597.test began intermittently failing because it would > traverse trailing garbage when u8_strnlen reported 8 chars instead of 4. > > Change the scm_i_str2symbol internal hash type to unsigned long to > explicitly match the hashing result type. Oh, good catch. For the final patch please add a ChangeLog-style entry. > + // Make sure a utf-8 symbol has the expected hash. In addition to > + // catching algorithmic regressions, this would have caught a > + // long-standing buffer overflow. 
> +
> +  // περί
> +  char about_u8[] = {0xce, 0xa0, 0xce, 0xb5, 0xcf, 0x81, 0xce, 0xaf, 0};
> +  SCM sym = scm_from_utf8_symbol (about_u8);
> +
> +  const unsigned long expect = 4029223418961680680;
> +  const unsigned long actual = scm_to_ulong (scm_symbol_hash (sym));

Is this a documented example of Jenkins?  Or did you use a reference
implementation?

> Hmm.  I suppose the current test could be handled on the scheme side
> instead.  (I'd started off attempting some more direct, elaborate tests
> that didn't pan out.)  Happy to rework that if desired.

Yes, it may be nicer to have it in ‘test-suite/tests/hash.test’.

AFAICS this will only change the hash of UTF-8 symbols and won’t have
any effect on the output of ‘string-hash’, right?  If not that would be
an incompatibility.

Thanks and sorry for the delay!

Ludo’.

From unknown Sat Jun 21 10:44:43 2025 X-Loop: help-debbugs@gnu.org Subject: bug#56413: [PATCH 1/1] scm_i_utf8_string_hash: compute u8 chars not bytes Resent-From: Rob Browning Original-Sender: "Debbugs-submit" Resent-CC: bug-guile@gnu.org Resent-Date: Sun, 06 Nov 2022 16:45:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 56413 X-GNU-PR-Package: guile X-GNU-PR-Keywords: patch To: Ludovic Courtès Cc: 56413@debbugs.gnu.org Received: via spool by 56413-submit@debbugs.gnu.org id=B56413.16677530945100 (code B ref 56413); Sun, 06 Nov 2022 16:45:01 +0000 Received: (at 56413) by debbugs.gnu.org; 6 Nov 2022 16:44:54 +0000 Received: from localhost ([127.0.0.1]:60503 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1orilS-0001KC-8m for submit@debbugs.gnu.org; Sun, 06 Nov 2022 11:44:54 -0500 Received: from defaultvalue.org ([45.33.119.55]:59688 ident=postfix) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1orilQ-0001K3-0N for 56413@debbugs.gnu.org; Sun, 06 Nov 2022 11:44:53 -0500
Received: from trouble.defaultvalue.org (localhost [127.0.0.1]) (Authenticated sender: rlb@defaultvalue.org) by defaultvalue.org (Postfix) with ESMTPSA id 68BE22017E; Sun, 6 Nov 2022 10:44:51 -0600 (CST) Received: by trouble.defaultvalue.org (Postfix, from userid 1000) id A599D14E553; Sun, 6 Nov 2022 10:44:50 -0600 (CST) From: Rob Browning In-Reply-To: <87zgd5gi4t.fsf@gnu.org> References: <20220706012323.1024763-1-rlb@defaultvalue.org> <87zgd5gi4t.fsf@gnu.org> Date: Sun, 06 Nov 2022 10:44:50 -0600 Message-ID: <87o7tkvxq5.fsf@trouble.defaultvalue.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spam-Score: -0.0 (/) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-)

Ludovic Courtès writes:

> For the final patch please add a ChangeLog-style entry.

Will do.

> Is this a documented example of Jenkins?  Or did you use a reference
> implementation?

Jenkins?

> Yes, it may be nicer to have it in ‘test-suite/tests/hash.test’.
>
> AFAICS this will only change the hash of UTF-8 symbols and won’t have
> any effect on the output of ‘string-hash’, right?  If not that would be
> an incompatibility.

I think that's right, but I'll have to refresh my memory regarding the
changes.  (Haven't gotten back to the utf-8 work for a bit so it's not
top of mind, though I hope to soon.)
Thanks
-- 
Rob Browning
rlb @defaultvalue.org and @debian.org
GPG as of 2011-07-10 E6A9 DA3C C9FD 1FF8 C676 D2C4 C0F0 39E9 ED1B 597A
GPG as of 2002-11-03 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4

From unknown Sat Jun 21 10:44:43 2025 X-Loop: help-debbugs@gnu.org Subject: bug#56413: [PATCH 1/1] scm_i_utf8_string_hash: compute u8 chars not bytes Resent-From: Rob Browning Original-Sender: "Debbugs-submit" Resent-CC: bug-guile@gnu.org Resent-Date: Sun, 06 Nov 2022 17:46:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 56413 X-GNU-PR-Package: guile X-GNU-PR-Keywords: patch To: Ludovic Courtès Cc: 56413@debbugs.gnu.org Received: via spool by 56413-submit@debbugs.gnu.org id=B56413.166775672415584 (code B ref 56413); Sun, 06 Nov 2022 17:46:01 +0000 Received: (at 56413) by debbugs.gnu.org; 6 Nov 2022 17:45:24 +0000 Received: from localhost ([127.0.0.1]:60549 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1orji0-00042b-L4 for submit@debbugs.gnu.org; Sun, 06 Nov 2022 12:45:24 -0500 Received: from defaultvalue.org ([45.33.119.55]:59690 ident=postfix) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1orjhw-0003q7-KS for 56413@debbugs.gnu.org; Sun, 06 Nov 2022 12:45:23 -0500 Received: from trouble.defaultvalue.org (localhost [127.0.0.1]) (Authenticated sender: rlb@defaultvalue.org) by defaultvalue.org (Postfix) with ESMTPSA id ED7DD200BA; Sun, 6 Nov 2022 11:45:19 -0600 (CST) Received: by trouble.defaultvalue.org (Postfix, from userid 1000) id 97F6314E553; Sun, 6 Nov 2022 11:45:19 -0600 (CST) From: Rob Browning In-Reply-To: <87o7tkvxq5.fsf@trouble.defaultvalue.org> References: <20220706012323.1024763-1-rlb@defaultvalue.org> <87zgd5gi4t.fsf@gnu.org> <87o7tkvxq5.fsf@trouble.defaultvalue.org> Date: Sun, 06 Nov 2022 11:45:19 -0600 Message-ID: <87bkpkvuxc.fsf@trouble.defaultvalue.org> MIME-Version: 1.0 Content-Type: text/plain X-Spam-Score: -0.0 (/) X-BeenThere: 
debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-) Rob Browning writes: > Jenkins? Oh, right (after looking back at the code). I'll get back to you regarding this and the other questions after I finish reviewing/remembering. Thanks -- Rob Browning rlb @defaultvalue.org and @debian.org GPG as of 2011-07-10 E6A9 DA3C C9FD 1FF8 C676 D2C4 C0F0 39E9 ED1B 597A GPG as of 2002-11-03 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4 From unknown Sat Jun 21 10:44:43 2025 X-Loop: help-debbugs@gnu.org Subject: bug#56413: [PATCH 1/1] scm_i_utf8_string_hash: compute u8 chars not bytes Resent-From: Rob Browning Original-Sender: "Debbugs-submit" Resent-CC: bug-guile@gnu.org Resent-Date: Sun, 06 Nov 2022 19:47:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 56413 X-GNU-PR-Package: guile X-GNU-PR-Keywords: patch To: Ludovic =?UTF-8?Q?Court=C3=A8s?= Cc: 56413@debbugs.gnu.org Received: via spool by 56413-submit@debbugs.gnu.org id=B56413.166776399931256 (code B ref 56413); Sun, 06 Nov 2022 19:47:01 +0000 Received: (at 56413) by debbugs.gnu.org; 6 Nov 2022 19:46:39 +0000 Received: from localhost ([127.0.0.1]:60714 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1orlbL-000883-8k for submit@debbugs.gnu.org; Sun, 06 Nov 2022 14:46:39 -0500 Received: from defaultvalue.org ([45.33.119.55]:59692 ident=postfix) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1orlbJ-00087v-G3 for 56413@debbugs.gnu.org; Sun, 06 Nov 2022 14:46:38 -0500 Received: from trouble.defaultvalue.org (localhost [127.0.0.1]) (Authenticated sender: rlb@defaultvalue.org) by defaultvalue.org (Postfix) with ESMTPSA id B49F12017E; Sun, 6 Nov 2022 13:46:36 -0600 (CST) Received: by trouble.defaultvalue.org (Postfix, from userid 1000) 
id 3F2E114E553; Sun, 6 Nov 2022 13:46:36 -0600 (CST) From: Rob Browning In-Reply-To: <87zgd5gi4t.fsf@gnu.org> References: <20220706012323.1024763-1-rlb@defaultvalue.org> <87zgd5gi4t.fsf@gnu.org> Date: Sun, 06 Nov 2022 13:46:36 -0600 Message-ID: <87zgd3vpb7.fsf@trouble.defaultvalue.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spam-Score: -0.0 (/) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-)

Ludovic Courtès writes:

> Rob Browning skribis:
>
>> +  // Make sure a utf-8 symbol has the expected hash.  In addition to
>> +  // catching algorithmic regressions, this would have caught a
>> +  // long-standing buffer overflow.
>> +
>> +  // περί
>> +  char about_u8[] = {0xce, 0xa0, 0xce, 0xb5, 0xcf, 0x81, 0xce, 0xaf, 0};
>> +  SCM sym = scm_from_utf8_symbol (about_u8);
>> +
>> +  const unsigned long expect = 4029223418961680680;
>> +  const unsigned long actual = scm_to_ulong (scm_symbol_hash (sym));
>
> Is this a documented example of Jenkins?  Or did you use a reference
> implementation?

OK, so unfortunately I don't actually recall how I came up with that
number, but I can start over with some canonical approach to compute the
value if we like.

...if I didn't get it from somewhere more authoritative, I might also
have just been trying to at least prevent undetected regressions.

> AFAICS this will only change the hash of UTF-8 symbols and won’t have
> any effect on the output of ‘string-hash’, right?  If not that would be
> an incompatibility.

The u8_mbsnlen() change should strictly fix bugs I think?  i.e. if the
length is supposed to be in characters, which it looks like from all the
other uses in the function (and from the comment), then the old code was
returning the wrong values (which prompted the original crashes).

So this change *could* alter results, but only for non-ASCII strings,
and those results would have been wrong (i.e. relying on uninitialized
memory).  Of course if that memory was *always* the same for a given
symbol somehow (everywhere in memory), then the result would be stable,
if incorrect.

That leaves the size_t -> long change in scm_i_str2symbol(), and I don't
think that has anything to do with UTF-8, but it could cause mangling of
the value on any platform where the data types differ sufficiently, and
then of course if we're not using the same type consistently, then we
could give different answers for the same symbol in different contexts
(for different code paths).

And indeed, looks like I missed another case; just below in
scm_i_str2uninterned_symbol() we also use size_t.  For now, I suspect we
should change both or neither, and definitely change them all to match
"eventually".
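
To make the byte-versus-character distinction concrete, here is a minimal sketch, assuming valid UTF-8 input. It does not use libunistring: `count_utf8_chars` and `check_example` are hypothetical helpers written only for illustration, and the byte array is the one from the test in the patch. It shows why a unit-counting length reports 8 for this symbol while the hash loop needs 4.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not a libunistring function: count the code
   points in the first LEN bytes of S by counting every byte that is
   not a UTF-8 continuation byte (10xxxxxx).  Assumes S is valid
   UTF-8.  */
static size_t
count_utf8_chars (const unsigned char *s, size_t len)
{
  size_t n = 0;
  for (size_t i = 0; i < len; i++)
    if ((s[i] & 0xC0) != 0x80)
      n++;
  return n;
}

/* The bytes used by the test in the patch.  */
static const unsigned char about_u8[] =
  { 0xce, 0xa0, 0xce, 0xb5, 0xcf, 0x81, 0xce, 0xaf, 0 };

/* Hypothetical self-check for the example above.  */
static void
check_example (void)
{
  size_t bytes = strlen ((const char *) about_u8);
  size_t chars = count_utf8_chars (about_u8, bytes);

  assert (bytes == 8);  /* length in 8-bit units */
  assert (chars == 4);  /* length in characters */
}
```

A loop that expects a character count but is handed the unit count would try to consume four characters that are not there, which matches the "trailing garbage" described in the commit message.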
Thanks
-- 
Rob Browning
rlb @defaultvalue.org and @debian.org
GPG as of 2011-07-10 E6A9 DA3C C9FD 1FF8 C676 D2C4 C0F0 39E9 ED1B 597A
GPG as of 2002-11-03 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4

From unknown Sat Jun 21 10:44:43 2025 X-Loop: help-debbugs@gnu.org Subject: bug#56413: [PATCH 1/1] scm_i_utf8_string_hash: compute u8 chars not bytes Resent-From: Ludovic Courtès Original-Sender: "Debbugs-submit" Resent-CC: bug-guile@gnu.org Resent-Date: Mon, 07 Nov 2022 13:07:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 56413 X-GNU-PR-Package: guile X-GNU-PR-Keywords: patch To: Rob Browning Cc: 56413@debbugs.gnu.org Received: via spool by 56413-submit@debbugs.gnu.org id=B56413.16678263876617 (code B ref 56413); Mon, 07 Nov 2022 13:07:01 +0000 Received: (at 56413) by debbugs.gnu.org; 7 Nov 2022 13:06:27 +0000 Received: from localhost ([127.0.0.1]:33596 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1os1pb-0001if-Gy for submit@debbugs.gnu.org; Mon, 07 Nov 2022 08:06:27 -0500 Received: from eggs.gnu.org ([209.51.188.92]:44624) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1os1pX-0001iP-0c for 56413@debbugs.gnu.org; Mon, 07 Nov 2022 08:06:26 -0500 Received: from fencepost.gnu.org ([2001:470:142:3::e]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1os1pR-0002Fi-H3; Mon, 07 Nov 2022 08:06:17 -0500 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org; s=fencepost-gnu-org; h=MIME-Version:In-Reply-To:Date:References:Subject:To: From; bh=XpghGQdKsZRBc+S6LdG7qrQBKJgdFYYgO6ipWAj9GIM=; b=fkVsYdNzBCADpLSYC8Fc Uwf04/zQvzGK7lwb6pgW7A/WfOnTXQkarmudt6hE6iL1ksR5iHcDk9HI9Xyo9VQdqHKdq/wy4eJHL EnFrSFBx7e5suCPoRMhKnic6NHjBfGT4o/v6H9jpHOZUdKMtzQNpY/8gtVZ0orNq3axUSlpXvRicj yzazKlpKT4zevknTUw8/PKPDlnJrf4FbstSRWjs9JkUA8IW6lIcW5HAldfrgFp3E6ALM5IvV7xjEF
7xn3G+7VKHA6k6m7L1Y89X8Fexi4UaTd6YSkx7oi/0ArhH0rkJA5vN4O4EapRaRnTjRnm+Oly2qKB MaIN/ZU+XykQ4Q==; Received: from [2001:660:6102:320:e120:2c8f:8909:cdfe] (helo=ribbon) by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1os1pR-0004Zp-3X; Mon, 07 Nov 2022 08:06:17 -0500 From: Ludovic Courtès References: <20220706012323.1024763-1-rlb@defaultvalue.org> <87zgd5gi4t.fsf@gnu.org> <87o7tkvxq5.fsf@trouble.defaultvalue.org> X-URL: http://www.fdn.fr/~lcourtes/ X-Revolutionary-Date: Septidi 17 Brumaire an 231 de la Révolution, jour du Cresson X-PGP-Key-ID: 0x090B11993D9AEBB5 X-PGP-Key: http://www.fdn.fr/~lcourtes/ludovic.asc X-PGP-Fingerprint: 3CE4 6455 8A84 FDC6 9DB4 0CFB 090B 1199 3D9A EBB5 X-OS: x86_64-pc-linux-gnu Date: Mon, 07 Nov 2022 14:06:15 +0100 In-Reply-To: <87o7tkvxq5.fsf@trouble.defaultvalue.org> (Rob Browning's message of "Sun, 06 Nov 2022 10:44:50 -0600") Message-ID: <87y1smud6g.fsf@gnu.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spam-Score: -2.3 (--) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---)

Rob Browning skribis:

>> Is this a documented example of Jenkins?  Or did you use a reference
>> implementation?
>
> Jenkins?

That’s the name of the hash function in question.  If not, where did you
get that example from?  :-)

Thanks,
Ludo’.
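
For readers following the hash question: the context line in the hash.c hunk of the patch seeds all three state words from `length`, so a wrong length already changes the starting state before any character is mixed in. The sketch below only evaluates that one expression for the two lengths mentioned in the report. `initial_state` and `check_seed` are hypothetical names chosen for this illustration; the rest of the hash function is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wrapper around the expression from the patch context:
     a = b = c = 0xdeadbeef + ((uint32_t)(length<<2)) + 47;
   The addition wraps modulo 2^32.  */
static uint32_t
initial_state (size_t length)
{
  return 0xdeadbeef + ((uint32_t) (length << 2)) + 47;
}

/* Hypothetical self-check: the seed for the correct character count
   (4) differs from the seed for the unit count (8).  */
static void
check_seed (void)
{
  assert (initial_state (4) == 0xdeadbf2eu);
  assert (initial_state (8) == 0xdeadbf3eu);
  assert (initial_state (4) != initial_state (8));
}
```

This only covers the seeding step. The expected value in the test, 4029223418961680680, still has to come from running the full hash.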
From unknown Sat Jun 21 10:44:43 2025 X-Loop: help-debbugs@gnu.org Subject: bug#56413: [PATCH 1/1] scm_i_utf8_string_hash: compute u8 chars not bytes Resent-From: Ludovic =?UTF-8?Q?Court=C3=A8s?= Original-Sender: "Debbugs-submit" Resent-CC: bug-guile@gnu.org Resent-Date: Mon, 07 Nov 2022 13:09:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 56413 X-GNU-PR-Package: guile X-GNU-PR-Keywords: patch To: Rob Browning Cc: 56413@debbugs.gnu.org Received: via spool by 56413-submit@debbugs.gnu.org id=B56413.16678264846798 (code B ref 56413); Mon, 07 Nov 2022 13:09:02 +0000 Received: (at 56413) by debbugs.gnu.org; 7 Nov 2022 13:08:04 +0000 Received: from localhost ([127.0.0.1]:33606 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1os1rA-0001la-8D for submit@debbugs.gnu.org; Mon, 07 Nov 2022 08:08:04 -0500 Received: from eggs.gnu.org ([209.51.188.92]:57744) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1os1r7-0001kw-Fw for 56413@debbugs.gnu.org; Mon, 07 Nov 2022 08:08:02 -0500 Received: from fencepost.gnu.org ([2001:470:142:3::e]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1os1r2-0005Ih-9S; Mon, 07 Nov 2022 08:07:56 -0500 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org; s=fencepost-gnu-org; h=MIME-Version:In-Reply-To:Date:References:Subject:To: From; bh=2gPzJ4o4IkBdtX4Du1L3qocYbS7ht7NtXONdO6+kJt4=; b=FxYZEuYpfqntDXCtGdBr gv44vsbJsQauhZwWAoPnzXszvt7uQAsml162iwh7wQbMlGzYU+JvZaTmyEwm3HVnA0NZ8PlyW0aw/ IljIRygSX3koDpuw/D8FqaGqvoMCn9iHCR4nKPNazLfWw0RvsaZzyNilaCaV6LidTj05EsqsvwAYx AJI2bKbOgkgKTz3y9gRZQhdVovbHdSI7WJ9X4L24LzaKAQ5jhS0dRbHIZjdEiWJvl4abK+60EMbfV IjxPClXry9+2GAg/DBEsNGWJ49Q8uN++P9NRlaMLW468E+KTq65eQ2YtHC2QXQt/lTVw8I0RLiS3I 3tPB5pxBdmdU0Q==; Received: from [2001:660:6102:320:e120:2c8f:8909:cdfe] (helo=ribbon) by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 
4.90_1) (envelope-from ) id 1os1r1-0004dU-FL; Mon, 07 Nov 2022 08:07:55 -0500 From: Ludovic =?UTF-8?Q?Court=C3=A8s?= References: <20220706012323.1024763-1-rlb@defaultvalue.org> <87zgd5gi4t.fsf@gnu.org> <87zgd3vpb7.fsf@trouble.defaultvalue.org> X-URL: http://www.fdn.fr/~lcourtes/ X-Revolutionary-Date: Septidi 17 Brumaire an 231 de la =?UTF-8?Q?R=C3=A9volution,?= jour du Cresson X-PGP-Key-ID: 0x090B11993D9AEBB5 X-PGP-Key: http://www.fdn.fr/~lcourtes/ludovic.asc X-PGP-Fingerprint: 3CE4 6455 8A84 FDC6 9DB4 0CFB 090B 1199 3D9A EBB5 X-OS: x86_64-pc-linux-gnu Date: Mon, 07 Nov 2022 14:07:54 +0100 In-Reply-To: <87zgd3vpb7.fsf@trouble.defaultvalue.org> (Rob Browning's message of "Sun, 06 Nov 2022 13:46:36 -0600") Message-ID: <87tu3aud3p.fsf@gnu.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spam-Score: -2.3 (--) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) Rob Browning skribis: > So this change *could* alter results, but only for non-ASCII strings, > and those results would have been wrong (i.e. relying on uninitialized > memory). OK, that was my understanding too. > That leaves the size_t -> long change in scm_i_str2symbol(), and I don't > think that has anything to do with UTF-8, but it could cause mangling of > the value on any platform where the data types differ sufficiently, and > then of course if we're not using the same type consistently, then we > could give different answers for the same symbol in different contexts > (for different code paths). Right. This one looks safe to me. > And indeed, looks like I missed another case; just below in > scm_i_str2uninterned_symbol() we also use size_t. 
For now, I suspect we
> should change both or neither, and definitely change them all to match
> "eventually".

Sure.

Thanks!

Ludo’.

From unknown Sat Jun 21 10:44:43 2025 X-Loop: help-debbugs@gnu.org Subject: bug#56413: [PATCH 1/1] scm_i_utf8_string_hash: compute u8 chars not bytes Resent-From: Rob Browning Original-Sender: "Debbugs-submit" Resent-CC: bug-guile@gnu.org Resent-Date: Tue, 08 Nov 2022 05:06:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 56413 X-GNU-PR-Package: guile X-GNU-PR-Keywords: patch To: Ludovic Courtès Cc: 56413@debbugs.gnu.org Received: via spool by 56413-submit@debbugs.gnu.org id=B56413.166788393914224 (code B ref 56413); Tue, 08 Nov 2022 05:06:01 +0000 Received: (at 56413) by debbugs.gnu.org; 8 Nov 2022 05:05:39 +0000 Received: from localhost ([127.0.0.1]:35960 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1osGnq-0003hL-Io for submit@debbugs.gnu.org; Tue, 08 Nov 2022 00:05:38 -0500 Received: from defaultvalue.org ([45.33.119.55]:59694 ident=postfix) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1osGno-0003hD-Dt for 56413@debbugs.gnu.org; Tue, 08 Nov 2022 00:05:36 -0500 Received: from trouble.defaultvalue.org (localhost [127.0.0.1]) (Authenticated sender: rlb@defaultvalue.org) by defaultvalue.org (Postfix) with ESMTPSA id AC96B2043C; Mon, 7 Nov 2022 23:05:35 -0600 (CST) Received: by trouble.defaultvalue.org (Postfix, from userid 1000) id 08A3714E552; Mon, 7 Nov 2022 23:05:35 -0600 (CST) From: Rob Browning In-Reply-To: <87zgd3vpb7.fsf@trouble.defaultvalue.org> References: <20220706012323.1024763-1-rlb@defaultvalue.org> <87zgd5gi4t.fsf@gnu.org> <87zgd3vpb7.fsf@trouble.defaultvalue.org> Date: Mon, 07 Nov 2022 23:05:34 -0600 Message-ID: <87mt92ujc1.fsf@trouble.defaultvalue.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spam-Score: -0.0 (/) X-BeenThere: 
debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit"
X-Spam-Score: -1.0 (-)

Rob Browning writes:

> OK, so unfortunately I don't actually recall how I came up with that
> number, but I can start over with some canonical approach to compute the
> value if we like.

I hacked up hash.c to let me call wide_string_hash() directly and
printed the hash for wchar_t {0x3A0, 0x3B5, 0x3C1, 0x3AF}, which should
be what the optimized utf-8 code is consuming. I saw
4029223418961680680. I double-checked via (symbol-hash 'Περί) from the
terminal, and that returned the same value.

Oh, and unless I'm missing something, I remembered why we may need to
keep the standalone C test program -- there's no straightforward way to
call scm_from_utf8_symbol() from scheme?

Thanks
-- 
Rob Browning
rlb @defaultvalue.org and @debian.org
GPG as of 2011-07-10 E6A9 DA3C C9FD 1FF8 C676 D2C4 C0F0 39E9 ED1B 597A
GPG as of 2002-11-03 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4

From unknown Sat Jun 21 10:44:43 2025
X-Loop: help-debbugs@gnu.org
Subject: bug#56413: [PATCH 1/1] scm_i_utf8_string_hash: compute u8 chars not bytes
Resent-From: Ludovic =?UTF-8?Q?Court=C3=A8s?=
Original-Sender: "Debbugs-submit"
Resent-CC: bug-guile@gnu.org
Resent-Date: Tue, 08 Nov 2022 10:10:02 +0000
Resent-Message-ID:
Resent-Sender: help-debbugs@gnu.org
X-GNU-PR-Message: followup 56413
X-GNU-PR-Package: guile
X-GNU-PR-Keywords: patch
To: Rob Browning
Cc: 56413@debbugs.gnu.org
Received: via spool by 56413-submit@debbugs.gnu.org id=B56413.166790218412502 (code B ref 56413); Tue, 08 Nov 2022 10:10:02 +0000
Received: (at 56413) by debbugs.gnu.org; 8 Nov 2022 10:09:44 +0000
Received: from localhost ([127.0.0.1]:36363 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1osLY8-0003Fa-G6 for
submit@debbugs.gnu.org; Tue, 08 Nov 2022 05:09:44 -0500 Received: from eggs.gnu.org ([209.51.188.92]:48770) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1osLY6-0003FN-5a for 56413@debbugs.gnu.org; Tue, 08 Nov 2022 05:09:43 -0500 Received: from fencepost.gnu.org ([2001:470:142:3::e]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1osLY0-00030v-N1; Tue, 08 Nov 2022 05:09:36 -0500 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org; s=fencepost-gnu-org; h=MIME-Version:In-Reply-To:Date:References:Subject:To: From; bh=v5q8sKgGVhU2nQbKK0vT/VVQpi7yBx20iK9Dl+7wJBk=; b=fNPOQJS1vuF2NgpyXiLD 0m2++DeQvp5GLaeB7Hjayt+RJ2vxW9uHdffR/QPGCi9sintxjpr1r6FsuOtZgpHXKqx+I3dKigQVN ouXV0vrrpWiNJIi1iwXTXFxodCvuv+HahJSPWKx92jFsAEWaGj5FX6YTlNTzaMaViQEEpD5EIQ/jm zcD5AZCOPm5Vrzx2HJuNPlWjL0r1CCOitJujFCDuRHVsITbCPxHC7Eqnc4bfhdCEi93xz+oBuUkw5 4c0m6jZsIzHtdxvSDpg/u1bl5exYleeSPf5ZC7ac1JMsWCMNVUk7xNWuKllKE5MJephGM5hNz60ei u9hfjgajF10Ckg==; Received: from [193.50.110.147] (helo=ribbon) by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1osLY0-00040g-8J; Tue, 08 Nov 2022 05:09:36 -0500 From: Ludovic =?UTF-8?Q?Court=C3=A8s?= References: <20220706012323.1024763-1-rlb@defaultvalue.org> <87zgd5gi4t.fsf@gnu.org> <87zgd3vpb7.fsf@trouble.defaultvalue.org> <87mt92ujc1.fsf@trouble.defaultvalue.org> X-URL: http://www.fdn.fr/~lcourtes/ X-Revolutionary-Date: Octidi 18 Brumaire an 231 de la =?UTF-8?Q?R=C3=A9volution,?= jour de la Dentelaire X-PGP-Key-ID: 0x090B11993D9AEBB5 X-PGP-Key: http://www.fdn.fr/~lcourtes/ludovic.asc X-PGP-Fingerprint: 3CE4 6455 8A84 FDC6 9DB4 0CFB 090B 1199 3D9A EBB5 X-OS: x86_64-pc-linux-gnu Date: Tue, 08 Nov 2022 11:09:33 +0100 In-Reply-To: <87mt92ujc1.fsf@trouble.defaultvalue.org> (Rob Browning's message of "Mon, 07 Nov 2022 23:05:34 -0600") Message-ID: <87tu39rc4i.fsf@gnu.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 
(gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-Spam-Score: -2.3 (--)
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit"
X-Spam-Score: -3.3 (---)

Hi,

Rob Browning skribis:

> Oh, and unless I'm missing something, I remembered why we may need to
> keep the standalone C test program -- there's no straightforward way to
> call scm_from_utf8_symbol() from scheme?

Ah yes, you’re probably right!

Ludo’.

From unknown Sat Jun 21 10:44:43 2025
X-Loop: help-debbugs@gnu.org
Subject: bug#56413: [PATCH v2 1/1] scm_i_utf8_string_hash: compute u8 chars not bytes
Resent-From: Rob Browning
Original-Sender: "Debbugs-submit"
Resent-CC: bug-guile@gnu.org
Resent-Date: Sun, 05 Mar 2023 22:22:02 +0000
Resent-Message-ID:
Resent-Sender: help-debbugs@gnu.org
X-GNU-PR-Message: followup 56413
X-GNU-PR-Package: guile
X-GNU-PR-Keywords: patch
To: 56413@debbugs.gnu.org
Cc: Ludovic =?UTF-8?Q?Court=C3=A8s?=
Received: via spool by 56413-submit@debbugs.gnu.org id=B56413.1678054869313 (code B ref 56413); Sun, 05 Mar 2023 22:22:02 +0000
Received: (at 56413) by debbugs.gnu.org; 5 Mar 2023 22:21:09 +0000
Received: from localhost ([127.0.0.1]:40912 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1pYwj4-00004v-GX for submit@debbugs.gnu.org; Sun, 05 Mar 2023 17:21:09 -0500
Received: from defaultvalue.org ([45.33.119.55]:44570 ident=postfix) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1pYwiz-0008VM-NY for 56413@debbugs.gnu.org; Sun, 05 Mar 2023 17:21:05 -0500
Received: from trouble.defaultvalue.org (localhost [127.0.0.1]) (Authenticated sender: rlb@defaultvalue.org) by defaultvalue.org (Postfix) with ESMTPSA id EA3A520141; Sun, 5 Mar 2023 16:21:00 -0600 (CST)
Received: by
trouble.defaultvalue.org (Postfix, from userid 1000) id 9513C14E09D; Sun, 5 Mar 2023 16:21:00 -0600 (CST)
From: Rob Browning
Date: Sun, 5 Mar 2023 16:21:00 -0600
Message-Id: <20230305222100.1062507-1-rlb@defaultvalue.org>
X-Mailer: git-send-email 2.39.2
In-Reply-To: <87tu39rc4i.fsf@gnu.org>
References: <87tu39rc4i.fsf@gnu.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-Spam-Score: -0.0 (/)
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit"
X-Spam-Score: -1.0 (-)

Noticed while investigating a migration to utf-8 strings. After making
changes that routed non-ascii symbol hashing through this function,
encoding-iso88597.test began intermittently failing because it would
traverse trailing garbage when u8_strnlen reported 8 chars instead of 4.

Change the scm_i_str2symbol and scm_i_str2uninterned_symbol internal
hash type to unsigned long to explicitly match the scm_i_string_hash
result type.

* libguile/hash.c (scm_i_utf8_string_hash): Call u8_mbsnlen not u8_strnlen.
* libguile/symbols.c (scm_i_str2symbol, scm_i_str2uninterned_symbol):
Use unsigned long for scm_i_string_hash result.
* test-suite/standalone/.gitignore: Add test-hashing.
* test-suite/standalone/Makefile.am: Add test-hashing.
* test-suite/standalone/test-hashing.c: Add.
---
Fixed up a few things: the test, gitignore, and added the changelog entries
to the message.
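
To make the "8 chars instead of 4" discrepancy concrete, here is a minimal standalone sketch that does not use libunistring at all. The helper count_utf8_chars is hypothetical and is not how u8_mbsnlen is implemented: it assumes its input is already valid UTF-8 and simply counts every byte that is not a continuation byte, which is enough to show the difference between a unit count and a character count for the test string.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of libunistring or Guile: count the
   characters in the first LEN bytes of S, assuming S is valid UTF-8.
   Continuation bytes have the form 10xxxxxx; every other byte starts
   a new character.  */
static size_t
count_utf8_chars (const uint8_t *s, size_t len)
{
  size_t n = 0;
  for (size_t i = 0; i < len; i++)
    if ((s[i] & 0xC0) != 0x80)
      n++;
  return n;
}

/* "Περί" encoded as UTF-8: four characters stored in eight bytes.  */
static const uint8_t about_u8[] =
  { 0xce, 0xa0, 0xce, 0xb5, 0xcf, 0x81, 0xce, 0xaf };
```

For about_u8, sizeof about_u8 is 8 while count_utf8_chars (about_u8, sizeof about_u8) is 4. A loop that is told to consume 8 characters from this buffer would run past the 4 that are actually there, which matches the trailing-garbage symptom described in the commit message.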
 libguile/hash.c                      |  2 +-
 libguile/symbols.c                   |  4 +-
 test-suite/standalone/.gitignore     |  1 +
 test-suite/standalone/Makefile.am    |  7 ++++
 test-suite/standalone/test-hashing.c | 63 ++++++++++++++++++++++++++++
 5 files changed, 74 insertions(+), 3 deletions(-)
 create mode 100644 test-suite/standalone/test-hashing.c

diff --git a/libguile/hash.c b/libguile/hash.c
index c192ac2e5..5abdfe397 100644
--- a/libguile/hash.c
+++ b/libguile/hash.c
@@ -185,7 +185,7 @@ scm_i_utf8_string_hash (const char *str, size_t len)
     /* Invalid UTF-8; punt. */
     return scm_i_string_hash (scm_from_utf8_stringn (str, len));
 
-  length = u8_strnlen (ustr, len);
+  length = u8_mbsnlen (ustr, len);
 
   /* Set up the internal state. */
   a = b = c = 0xdeadbeef + ((uint32_t)(length<<2)) + 47;
diff --git a/libguile/symbols.c b/libguile/symbols.c
index 02be7c1c4..086abf585 100644
--- a/libguile/symbols.c
+++ b/libguile/symbols.c
@@ -239,7 +239,7 @@ static SCM
 scm_i_str2symbol (SCM str)
 {
   SCM symbol;
-  size_t raw_hash = scm_i_string_hash (str);
+  unsigned long raw_hash = scm_i_string_hash (str);
 
   symbol = lookup_interned_symbol (str, raw_hash);
   if (scm_is_true (symbol))
@@ -261,7 +261,7 @@ scm_i_str2symbol (SCM str)
 static SCM
 scm_i_str2uninterned_symbol (SCM str)
 {
-  size_t raw_hash = scm_i_string_hash (str);
+  unsigned long raw_hash = scm_i_string_hash (str);
 
   return scm_i_make_symbol (str, SCM_I_F_SYMBOL_UNINTERNED, raw_hash);
 }
diff --git a/test-suite/standalone/.gitignore b/test-suite/standalone/.gitignore
index 794146e60..f38f7fbe2 100644
--- a/test-suite/standalone/.gitignore
+++ b/test-suite/standalone/.gitignore
@@ -1,5 +1,6 @@
 /test-conversion
 /test-gh
+/test-hashing
 /test-list
 /test-num2integral
 /test-round
diff --git a/test-suite/standalone/Makefile.am b/test-suite/standalone/Makefile.am
index 547241afa..17bb59a18 100644
--- a/test-suite/standalone/Makefile.am
+++ b/test-suite/standalone/Makefile.am
@@ -167,6 +167,13 @@ test_conversion_LDADD = $(LIBGUILE_LDADD) $(top_builddir)/lib/libgnu.la
 check_PROGRAMS += test-conversion
 TESTS += test-conversion
 
+# test-hashing
+test_hashing_SOURCES = test-hashing.c
+test_hashing_CFLAGS = ${test_cflags}
+test_hashing_LDADD = $(LIBGUILE_LDADD) $(top_builddir)/lib/libgnu.la
+check_PROGRAMS += test-hashing
+TESTS += test-hashing
+
 # test-loose-ends
 test_loose_ends_SOURCES = test-loose-ends.c
 test_loose_ends_CFLAGS = ${test_cflags}
diff --git a/test-suite/standalone/test-hashing.c b/test-suite/standalone/test-hashing.c
new file mode 100644
index 000000000..5982a0fdb
--- /dev/null
+++ b/test-suite/standalone/test-hashing.c
@@ -0,0 +1,63 @@
+/* Copyright 2022-2023
+   Free Software Foundation, Inc.
+
+   This file is part of Guile.
+
+   Guile is free software: you can redistribute it and/or modify it
+   under the terms of the GNU Lesser General Public License as published
+   by the Free Software Foundation, either version 3 of the License, or
+   (at your option) any later version.
+
+   Guile is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+   FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public
+   License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with Guile. If not, see
+   <https://www.gnu.org/licenses/>. */
+
+#if HAVE_CONFIG_H
+# include <config.h>
+#endif
+
+#include <libguile.h>
+
+#include <stdio.h>
+
+static void
+test_hashing ()
+{
+  // Make sure a utf-8 symbol has the expected hash. In addition to
+  // catching algorithmic regressions, this would have caught a
+  // long-standing buffer overflow.
+
+  // Περί
+  char about_u8[] = {0xce, 0xa0, 0xce, 0xb5, 0xcf, 0x81, 0xce, 0xaf, 0};
+  SCM sym = scm_from_utf8_symbol (about_u8);
+
+  // Value determined by calling wide_string_hash on {0x3A0, 0x3B5,
+  // 0x3C1, 0x3AF} via a temporary test program.
+  const unsigned long expect = 4029223418961680680;
+  const unsigned long actual = scm_to_ulong (scm_symbol_hash (sym));
+
+  if (actual != expect)
+    {
+      fprintf (stderr, "fail: unexpected utf-8 symbol hash (%lu != %lu)\n",
+               actual, expect);
+      exit (EXIT_FAILURE);
+    }
+}
+
+static void
+tests (void *data, int argc, char **argv)
+{
+  test_hashing ();
+}
+
+int
+main (int argc, char *argv[])
+{
+  scm_boot_guile (argc, argv, tests, NULL);
+  return 0;
+}
-- 
2.39.2

From unknown Sat Jun 21 10:44:43 2025
X-Loop: help-debbugs@gnu.org
Subject: bug#56413: [PATCH v2 1/1] scm_i_utf8_string_hash: compute u8 chars not bytes
Resent-From: Ludovic =?UTF-8?Q?Court=C3=A8s?=
Original-Sender: "Debbugs-submit"
Resent-CC: bug-guile@gnu.org
Resent-Date: Mon, 06 Mar 2023 16:40:01 +0000
Resent-Message-ID:
Resent-Sender: help-debbugs@gnu.org
X-GNU-PR-Message: followup 56413
X-GNU-PR-Package: guile
X-GNU-PR-Keywords: patch
To: Rob Browning
Cc: 56413@debbugs.gnu.org
Received: via spool by 56413-submit@debbugs.gnu.org id=B56413.167812075920810 (code B ref 56413); Mon, 06 Mar 2023 16:40:01 +0000
Received: (at 56413) by debbugs.gnu.org; 6 Mar 2023 16:39:19 +0000
Received: from localhost ([127.0.0.1]:43446 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1pZDrr-0005Pa-Ik for submit@debbugs.gnu.org; Mon, 06 Mar 2023 11:39:19 -0500
Received: from eggs.gnu.org ([209.51.188.92]:35088) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1pZDrq-0005PL-8r for 56413@debbugs.gnu.org; Mon, 06 Mar 2023 11:39:18 -0500
Received: from fencepost.gnu.org ([2001:470:142:3::e]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1pZDrk-0005hc-Pn; Mon, 06 Mar 2023 11:39:12 -0500
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org; s=fencepost-gnu-org; h=MIME-Version:In-Reply-To:Date:References:Subject:To: From; bh=WenfYAjBfUFUHwHYgnE58EY6ZmyrDEb6G/luyYMxBRk=; b=i2JE71gpkuNcsTCDHocI
twcLqBELZNNjYolipg5z7xPza4hb2/57b8vsM5Em3bKcxmKVBeXq+I6M4sFxtKns6TEIvYOS6qKT9 JH9jc1ZK0Z+2c7HUMDefTEMzFfpXxCfLBpvwo03IqY/S2OhxQhsRat66efuLs6uDRT9CtRS+Sg1fL 3XACM4ix3+WBZELxK2qNiSUCDIYMHhBZbs4B2Na3mJwTCvvFJiOrvXp4uAXkWCGuUbPTIR/5m9aAc F0JMOgcqm+EtI2TUR2nUfRc4o5GMvBWfGaKhOQQQA+11yuUR1vHCEvs9XK4j1LngWUBTsumXcsJXd Oca9ZyhSVcPIOw==; Received: from [193.50.110.138] (helo=ribbon) by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1pZDrk-00073p-DT; Mon, 06 Mar 2023 11:39:12 -0500 From: Ludovic =?UTF-8?Q?Court=C3=A8s?= References: <87tu39rc4i.fsf@gnu.org> <20230305222100.1062507-1-rlb@defaultvalue.org> X-URL: http://www.fdn.fr/~lcourtes/ X-Revolutionary-Date: Sextidi 16 =?UTF-8?Q?Vent=C3=B4se?= an 231 de la =?UTF-8?Q?R=C3=A9volution,?= jour de =?UTF-8?Q?l'=C3=89pinard?= X-PGP-Key-ID: 0x090B11993D9AEBB5 X-PGP-Key: http://www.fdn.fr/~lcourtes/ludovic.asc X-PGP-Fingerprint: 3CE4 6455 8A84 FDC6 9DB4 0CFB 090B 1199 3D9A EBB5 X-OS: x86_64-pc-linux-gnu Date: Mon, 06 Mar 2023 17:39:10 +0100 In-Reply-To: <20230305222100.1062507-1-rlb@defaultvalue.org> (Rob Browning's message of "Sun, 5 Mar 2023 16:21:00 -0600") Message-ID: <87ilfd96mp.fsf@gnu.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spam-Score: -2.3 (--) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) Hi, Rob Browning skribis: > Noticed while investigating a migration to utf-8 strings. After making > changes that routed non-ascii symbol hashing through this function, > encoding-iso88597.test began intermittently failing because it would > traverse trailing garbage when u8_strnlen reported 8 chars instead of 4. 
>
> Change the scm_i_str2symbol and scm_i_str2uninterned_symbol internal
> hash type to unsigned long to explicitly match the scm_i_string_hash
> result type.
>
> * libguile/hash.c (scm_i_utf8_string_hash): Call u8_mbsnlen not u8_strnlen.
> * libguile/symbols.c (scm_i_str2symbol, scm_i_str2uninterned_symbol):
> Use unsigned long for scm_i_string_hash result.
> * test-suite/standalone/.gitignore: Add test-hashing.
> * test-suite/standalone/Makefile.am: Add test-hashing.
> * test-suite/standalone/test-hashing.c: Add.

LGTM, thanks!

Please update ‘NEWS’ too, under a new “Bug fixes” heading.

Ludo’.

From unknown Sat Jun 21 10:44:43 2025
X-Loop: help-debbugs@gnu.org
Subject: bug#56413: [PATCH v3 1/1] scm_i_utf8_string_hash: compute u8 chars not bytes
Resent-From: Rob Browning
Original-Sender: "Debbugs-submit"
Resent-CC: bug-guile@gnu.org
Resent-Date: Sun, 12 Mar 2023 19:31:02 +0000
Resent-Message-ID:
Resent-Sender: help-debbugs@gnu.org
X-GNU-PR-Message: followup 56413
X-GNU-PR-Package: guile
X-GNU-PR-Keywords: patch
To: 56413@debbugs.gnu.org
Cc: Ludovic =?UTF-8?Q?Court=C3=A8s?=
Received: via spool by 56413-submit@debbugs.gnu.org id=B56413.167864944929232 (code B ref 56413); Sun, 12 Mar 2023 19:31:02 +0000
Received: (at 56413) by debbugs.gnu.org; 12 Mar 2023 19:30:49 +0000
Received: from localhost ([127.0.0.1]:33118 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1pbRP6-0007bP-Qd for submit@debbugs.gnu.org; Sun, 12 Mar 2023 15:30:49 -0400
Received: from defaultvalue.org ([45.33.119.55]:45104 ident=postfix) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1pbRP4-0007bG-Ng for 56413@debbugs.gnu.org; Sun, 12 Mar 2023 15:30:47 -0400
Received: from trouble.defaultvalue.org (localhost [127.0.0.1]) (Authenticated sender: rlb@defaultvalue.org) by defaultvalue.org (Postfix) with ESMTPSA id 33C4D2013A; Sun, 12 Mar 2023 14:30:45 -0500 (CDT)
Received: by trouble.defaultvalue.org (Postfix,
from userid 1000) id C1FAB14E089; Sun, 12 Mar 2023 14:30:44 -0500 (CDT)
From: Rob Browning
Date: Sun, 12 Mar 2023 14:30:44 -0500
Message-Id: <20230312193044.2013038-1-rlb@defaultvalue.org>
X-Mailer: git-send-email 2.39.2
In-Reply-To: <87ilfd96mp.fsf@gnu.org>
References: <87ilfd96mp.fsf@gnu.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-Spam-Score: -0.0 (/)
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit"
X-Spam-Score: -1.0 (-)

Noticed while investigating a migration to utf-8 strings. After making
changes that routed non-ascii symbol hashing through this function,
encoding-iso88597.test began intermittently failing because it would
traverse trailing garbage when u8_strnlen reported 8 chars instead of 4.

Change the scm_i_str2symbol and scm_i_str2uninterned_symbol internal
hash type to unsigned long to explicitly match the scm_i_string_hash
result type.

* libguile/hash.c (scm_i_utf8_string_hash): Call u8_mbsnlen not u8_strnlen.
* libguile/symbols.c (scm_i_str2symbol, scm_i_str2uninterned_symbol):
Use unsigned long for scm_i_string_hash result.
* test-suite/standalone/.gitignore: Add test-hashing.
* test-suite/standalone/Makefile.am: Add test-hashing.
* test-suite/standalone/test-hashing.c: Add.
---
Added NEWS.
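
The reason for holding the hash in the same type that scm_i_string_hash returns can be sketched with fixed-width types. This is only an illustration under stated assumptions: narrow_roundtrip is a hypothetical helper, and uint64_t and uint32_t stand in for a platform where the variable holding the hash is narrower than the function's result type. Whether size_t and unsigned long actually differ in width depends on the platform ABI, and nothing here claims they do on any particular one.

```c
#include <stdint.h>

/* Hypothetical illustration, not Guile code: HASH stands for a hash
   computed as a 64-bit value.  Storing it in a 32-bit variable and
   reading it back keeps only the low 32 bits.  */
static uint64_t
narrow_roundtrip (uint64_t hash)
{
  uint32_t stored = (uint32_t) hash;
  return (uint64_t) stored;
}
```

A small value such as 42 survives the round trip unchanged, but the expected hash from the test, 4029223418961680680, does not, because it needs more than 32 bits. Two code paths that store the same hash in variables of different widths could then disagree about its value, which is the kind of inconsistency the type change is meant to rule out.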
 NEWS                                 | 12 ++++++
 libguile/hash.c                      |  2 +-
 libguile/symbols.c                   |  4 +-
 test-suite/standalone/.gitignore     |  1 +
 test-suite/standalone/Makefile.am    |  7 ++++
 test-suite/standalone/test-hashing.c | 63 ++++++++++++++++++++++++++++
 6 files changed, 86 insertions(+), 3 deletions(-)
 create mode 100644 test-suite/standalone/test-hashing.c

diff --git a/NEWS b/NEWS
index a0009406f..a55cb583b 100644
--- a/NEWS
+++ b/NEWS
@@ -21,6 +21,18 @@ definitely unused---this is notably the case for modules that are
 only used at macro-expansion time, such as (srfi srfi-26). In those
 cases, the compiler reports it as "possibly unused".
 
+* Bug fixes
+
+* Hashing of UTF-8 symbols with non-ASCII characters avoids corruption
+
+This issue could cause `scm_from_utf8_symbol' and
+`scm_from_utf8_symboln` to incorrectly conclude that the symbol hadn't
+already been interned, and then create a new one, which of course
+wouldn't be `eq?' to the other(s). The incorrect hash was the result of
+a buffer overrun, and so might vary. This problem affected a number of
+other operations, given the internal use of those functions.
+()
+
 
 Changes in 3.0.9 (since 3.0.8)
 
diff --git a/libguile/hash.c b/libguile/hash.c
index c192ac2e5..5abdfe397 100644
--- a/libguile/hash.c
+++ b/libguile/hash.c
@@ -185,7 +185,7 @@ scm_i_utf8_string_hash (const char *str, size_t len)
     /* Invalid UTF-8; punt. */
     return scm_i_string_hash (scm_from_utf8_stringn (str, len));
 
-  length = u8_strnlen (ustr, len);
+  length = u8_mbsnlen (ustr, len);
 
   /* Set up the internal state. */
   a = b = c = 0xdeadbeef + ((uint32_t)(length<<2)) + 47;
diff --git a/libguile/symbols.c b/libguile/symbols.c
index 02be7c1c4..086abf585 100644
--- a/libguile/symbols.c
+++ b/libguile/symbols.c
@@ -239,7 +239,7 @@ static SCM
 scm_i_str2symbol (SCM str)
 {
   SCM symbol;
-  size_t raw_hash = scm_i_string_hash (str);
+  unsigned long raw_hash = scm_i_string_hash (str);
 
   symbol = lookup_interned_symbol (str, raw_hash);
   if (scm_is_true (symbol))
@@ -261,7 +261,7 @@ scm_i_str2symbol (SCM str)
 static SCM
 scm_i_str2uninterned_symbol (SCM str)
 {
-  size_t raw_hash = scm_i_string_hash (str);
+  unsigned long raw_hash = scm_i_string_hash (str);
 
   return scm_i_make_symbol (str, SCM_I_F_SYMBOL_UNINTERNED, raw_hash);
 }
diff --git a/test-suite/standalone/.gitignore b/test-suite/standalone/.gitignore
index 794146e60..f38f7fbe2 100644
--- a/test-suite/standalone/.gitignore
+++ b/test-suite/standalone/.gitignore
@@ -1,5 +1,6 @@
 /test-conversion
 /test-gh
+/test-hashing
 /test-list
 /test-num2integral
 /test-round
diff --git a/test-suite/standalone/Makefile.am b/test-suite/standalone/Makefile.am
index 547241afa..17bb59a18 100644
--- a/test-suite/standalone/Makefile.am
+++ b/test-suite/standalone/Makefile.am
@@ -167,6 +167,13 @@ test_conversion_LDADD = $(LIBGUILE_LDADD) $(top_builddir)/lib/libgnu.la
 check_PROGRAMS += test-conversion
 TESTS += test-conversion
 
+# test-hashing
+test_hashing_SOURCES = test-hashing.c
+test_hashing_CFLAGS = ${test_cflags}
+test_hashing_LDADD = $(LIBGUILE_LDADD) $(top_builddir)/lib/libgnu.la
+check_PROGRAMS += test-hashing
+TESTS += test-hashing
+
 # test-loose-ends
 test_loose_ends_SOURCES = test-loose-ends.c
 test_loose_ends_CFLAGS = ${test_cflags}
diff --git a/test-suite/standalone/test-hashing.c b/test-suite/standalone/test-hashing.c
new file mode 100644
index 000000000..5982a0fdb
--- /dev/null
+++ b/test-suite/standalone/test-hashing.c
@@ -0,0 +1,63 @@
+/* Copyright 2022-2023
+   Free Software Foundation, Inc.
+
+   This file is part of Guile.
+
+   Guile is free software: you can redistribute it and/or modify it
+   under the terms of the GNU Lesser General Public License as published
+   by the Free Software Foundation, either version 3 of the License, or
+   (at your option) any later version.
+
+   Guile is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+   FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public
+   License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with Guile. If not, see
+   <https://www.gnu.org/licenses/>. */
+
+#if HAVE_CONFIG_H
+# include <config.h>
+#endif
+
+#include <libguile.h>
+
+#include <stdio.h>
+
+static void
+test_hashing ()
+{
+  // Make sure a utf-8 symbol has the expected hash. In addition to
+  // catching algorithmic regressions, this would have caught a
+  // long-standing buffer overflow.
+
+  // Περί
+  char about_u8[] = {0xce, 0xa0, 0xce, 0xb5, 0xcf, 0x81, 0xce, 0xaf, 0};
+  SCM sym = scm_from_utf8_symbol (about_u8);
+
+  // Value determined by calling wide_string_hash on {0x3A0, 0x3B5,
+  // 0x3C1, 0x3AF} via a temporary test program.
+  const unsigned long expect = 4029223418961680680;
+  const unsigned long actual = scm_to_ulong (scm_symbol_hash (sym));
+
+  if (actual != expect)
+    {
+      fprintf (stderr, "fail: unexpected utf-8 symbol hash (%lu != %lu)\n",
+               actual, expect);
+      exit (EXIT_FAILURE);
+    }
+}
+
+static void
+tests (void *data, int argc, char **argv)
+{
+  test_hashing ();
+}
+
+int
+main (int argc, char *argv[])
+{
+  scm_boot_guile (argc, argv, tests, NULL);
+  return 0;
+}
-- 
2.39.2

From unknown Sat Jun 21 10:44:43 2025
X-Loop: help-debbugs@gnu.org
Subject: bug#56413: [PATCH v3 1/1] scm_i_utf8_string_hash: compute u8 chars not bytes
Resent-From: Ludovic =?UTF-8?Q?Court=C3=A8s?=
Original-Sender: "Debbugs-submit"
Resent-CC: bug-guile@gnu.org
Resent-Date: Mon, 13 Mar 2023 11:30:02 +0000
Resent-Message-ID:
Resent-Sender: help-debbugs@gnu.org
X-GNU-PR-Message: followup 56413
X-GNU-PR-Package: guile
X-GNU-PR-Keywords: patch
To: Rob Browning
Cc: 56413@debbugs.gnu.org
Received: via spool by 56413-submit@debbugs.gnu.org id=B56413.167870695310100 (code B ref 56413); Mon, 13 Mar 2023 11:30:02 +0000
Received: (at 56413) by debbugs.gnu.org; 13 Mar 2023 11:29:13 +0000
Received: from localhost ([127.0.0.1]:33731 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1pbgMb-0002cq-BB for submit@debbugs.gnu.org; Mon, 13 Mar 2023 07:29:13 -0400
Received: from eggs.gnu.org ([209.51.188.92]:36910) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1pbgMa-0002cd-2x for 56413@debbugs.gnu.org; Mon, 13 Mar 2023 07:29:12 -0400
Received: from fencepost.gnu.org ([2001:470:142:3::e]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1pbgMU-0000uu-F2; Mon, 13 Mar 2023 07:29:06 -0400
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org; s=fencepost-gnu-org; h=MIME-Version:In-Reply-To:Date:References:Subject:To: From; bh=fMsa9r2pSNvxrOLR/WQA2X7QbcOgOkB3MQwuIN/q49Y=; b=frsQmSML51DJtv+1nByd
AWLRD4DMjgqhu8UxEXyI9wbNKHkGaPBUzxrpRTOeCW/qSgHFpMM9FwGF4HKU/LEU21peG1tmpEmMx LzBmzAUaPPQl2B4LISicgHGh2bvmkR7N9DOhoAkG0gQzB8IuhdtFsqjisRe17rf53rkgCRGY/6wQU 3XQ7I48K+1ApYAZaedUczK+wsKfUh/z3wr7gwV+T2GVgKLQLF5DNce72kFsLOb9XJAc7EFI75DFuk 31BbK0RNBExePf+JpcPTohjgkelzihmLHX8t/NWs+ciAQXlTADVZbpQoTJsLXdx8otkhYOoKREM+s 96Ejzq/tHbWThQ==; Received: from [2001:660:6102:320:e120:2c8f:8909:cdfe] (helo=ribbon) by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1pbgMT-0004O9-Sr; Mon, 13 Mar 2023 07:29:06 -0400 From: Ludovic =?UTF-8?Q?Court=C3=A8s?= References: <87ilfd96mp.fsf@gnu.org> <20230312193044.2013038-1-rlb@defaultvalue.org> X-URL: http://www.fdn.fr/~lcourtes/ X-Revolutionary-Date: Tridi 23 =?UTF-8?Q?Vent=C3=B4se?= an 231 de la =?UTF-8?Q?R=C3=A9volution,?= jour du =?UTF-8?Q?Cochl=C3=A9aria?= X-PGP-Key-ID: 0x090B11993D9AEBB5 X-PGP-Key: http://www.fdn.fr/~lcourtes/ludovic.asc X-PGP-Fingerprint: 3CE4 6455 8A84 FDC6 9DB4 0CFB 090B 1199 3D9A EBB5 X-OS: x86_64-pc-linux-gnu Date: Mon, 13 Mar 2023 12:29:04 +0100 In-Reply-To: <20230312193044.2013038-1-rlb@defaultvalue.org> (Rob Browning's message of "Sun, 12 Mar 2023 14:30:44 -0500") Message-ID: <87edps51q7.fsf@gnu.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spam-Score: -2.3 (--) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) Hi Rob, Rob Browning skribis: > Noticed while investigating a migration to utf-8 strings. 
After making
> changes that routed non-ascii symbol hashing through this function,
> encoding-iso88597.test began intermittently failing because it would
> traverse trailing garbage when u8_strnlen reported 8 chars instead of 4.
>
> Change the scm_i_str2symbol and scm_i_str2uninterned_symbol internal
> hash type to unsigned long to explicitly match the scm_i_string_hash
> result type.
>
> * libguile/hash.c (scm_i_utf8_string_hash): Call u8_mbsnlen not u8_strnlen.
> * libguile/symbols.c (scm_i_str2symbol, scm_i_str2uninterned_symbol):
> Use unsigned long for scm_i_string_hash result.
> * test-suite/standalone/.gitignore: Add test-hashing.
> * test-suite/standalone/Makefile.am: Add test-hashing.
> * test-suite/standalone/test-hashing.c: Add.

Still LGTM, please push! :-)

Ludo’.

From debbugs-submit-bounces@debbugs.gnu.org Mon Mar 20 18:24:23 2023
Received: (at control) by debbugs.gnu.org; 20 Mar 2023 22:24:23 +0000
Received: from localhost ([127.0.0.1]:57079 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1peNvT-0007mk-0l for submit@debbugs.gnu.org; Mon, 20 Mar 2023 18:24:23 -0400
Received: from hera.aquilenet.fr ([185.233.100.1]:42780) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1peNvR-0007mW-LL for control@debbugs.gnu.org; Mon, 20 Mar 2023 18:24:22 -0400
Received: from localhost (localhost [127.0.0.1]) by hera.aquilenet.fr (Postfix) with ESMTP id 26438640 for ; Mon, 20 Mar 2023 23:24:15 +0100 (CET)
X-Virus-Scanned: Debian amavisd-new at hera.aquilenet.fr
Received: from hera.aquilenet.fr ([127.0.0.1]) by localhost (hera.aquilenet.fr [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id h6MaBGYsxNby for ; Mon, 20 Mar 2023 23:24:14 +0100 (CET)
Received: from ribbon (91-160-117-201.subs.proxad.net [91.160.117.201]) by hera.aquilenet.fr (Postfix) with ESMTPSA id 8CD7161D for ; Mon, 20 Mar 2023 23:24:14 +0100 (CET)
Date: Mon, 20 Mar 2023 23:24:13 +0100
Message-Id: <87ttyf2h9u.fsf@gnu.org>
To:
control@debbugs.gnu.org
From: =?utf-8?Q?Ludovic_Court=C3=A8s?=
Subject: control message for bug #56413
MIME-version: 1.0
Content-type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
X-Spam-Score: 1.0 (+)
X-Debbugs-Envelope-To: control
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit"
X-Spam-Score: -0.0 (/)

close 56413
quit