GNU bug report logs -
#56413
[PATCH 1/1] scm_i_utf8_string_hash: compute u8 chars not bytes
Previous Next
Reported by: Rob Browning <rlb <at> defaultvalue.org>
Date: Wed, 6 Jul 2022 01:25:02 UTC
Severity: normal
Tags: patch
Done: Ludovic Courtès <ludo <at> gnu.org>
Bug is archived. No further changes may be made.
Full log
Message #11 received at 56413 <at> debbugs.gnu.org (full text, mbox):
Hi,
Rob Browning <rlb <at> defaultvalue.org> skribis:
> Noticed while investigating a migration to utf-8 strings. After making
> changes that routed non-ascii symbol hashing through this function,
> encoding-iso88597.test began intermittently failing because it would
> traverse trailing garbage when u8_strnlen reported 8 chars instead of 4.
>
> Change the scm_i_str2symbol internal hash type to unsigned long to
> explicitly match the hashing result type.
Oh, good catch.
For the final patch please add a ChangeLog-style entry.
> + // Make sure a utf-8 symbol has the expected hash. In addition to
> + // catching algorithmic regressions, this would have caught a
> + // long-standing buffer overflow.
> +
> + // περί
> + char about_u8[] = {0xce, 0xa0, 0xce, 0xb5, 0xcf, 0x81, 0xce, 0xaf, 0};
> + SCM sym = scm_from_utf8_symbol (about_u8);
> +
> + const unsigned long expect = 4029223418961680680;
> + const unsigned long actual = scm_to_ulong (scm_symbol_hash (sym));
Is this a documented example of Jenkins? Or did you use a reference
implementation?
> Hmm. I suppose the current test could be handled on the scheme side
> instead. (I'd started off attempting some more direct, elaborate tests
> that didn't pan out.) Happy to rework that if desired.
Yes, it may be nicer to have it in ‘test-suite/tests/hash.test’.
AFAICS this will only change the hash of UTF-8 symbols and won’t have
any effect on the output of ‘string-hash’, right? If not that would be
an incompatibility.
Thanks and sorry for the delay!
Ludo’.
This bug report was last modified 2 years and 65 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.