GNU bug report logs -
#50093
djb2 correction
Previous Next
Reported by: Jim Meyering <jim <at> meyering.net>
Date: Tue, 17 Aug 2021 10:33:02 UTC
Severity: normal
Done: Paul Eggert <eggert <at> cs.ucla.edu>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 50093 in the body.
You can then email your comments to 50093 AT debbugs.gnu.org in the normal way.
Toggle the display of automated, internal messages from the tracker.
Report forwarded
to
bug-grep <at> gnu.org
:
bug#50093
; Package
grep
.
(Tue, 17 Aug 2021 10:33:02 GMT)
Full text and
rfc822 format available.
Acknowledgement sent
to
Jim Meyering <jim <at> meyering.net>
:
New bug report received and forwarded. Copy sent to
bug-grep <at> gnu.org
.
(Tue, 17 Aug 2021 10:33:03 GMT)
Full text and
rfc822 format available.
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
Alex Murray noticed that my djb2 implementation mistakenly initialized
to 0, rather than to 5381. Corrected with this:
>From 54590ca833dba62041af045e7bc7c09b90b54b71 Mon Sep 17 00:00:00 2001
From: Alex Murray <alex.murray <at> canonical.com>
Date: Tue, 17 Aug 2021 03:24:37 -0700
Subject: [PATCH] grep: correct DJB2 initialization
* src/grep.c (hash_pattern): DJB2 starts with 5381, not 0.
---
src/grep.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/grep.c b/src/grep.c
index 271b6b9..8fed550 100644
--- a/src/grep.c
+++ b/src/grep.c
@@ -126,7 +126,7 @@ static Hash_table *pattern_table;
static size_t _GL_ATTRIBUTE_PURE
hash_pattern (void const *pat, size_t n_buckets)
{
- size_t h = 0;
+ size_t h = 5381;
intptr_t pat_offset = (intptr_t) pat - 1;
unsigned char const *s = (unsigned char const *) pattern_array + pat_offset;
for ( ; *s != '\n'; s++)
--
Information forwarded
to
bug-grep <at> gnu.org
:
bug#50093
; Package
grep
.
(Tue, 17 Aug 2021 21:05:01 GMT)
Full text and
rfc822 format available.
Message #8 received at 50093 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
On 8/17/21 3:32 AM, Jim Meyering wrote:
> - size_t h = 0;
> + size_t h = 5381;
I expect DJB chose that number because of the primeth recurrence
sequence <https://oeis.org/A007097>:
2 is 1st prime.
3 is 2nd prime.
5 is 3rd prime.
11 is 5th prime.
31 is 11th prime.
127 is 31st prime.
709 is 127th prime.
5381 is 709th prime.
52711 is 5381st prime.
...
Although 5381 is the largest number in this sequence that can fit into
'int' in a portable C program, and that's probably why DJB chose 5381,
we're not limited to such small values here.
How about the attached patch instead?
[0001-grep-djb2-correction.patch (text/x-patch, attachment)]
Information forwarded
to
bug-grep <at> gnu.org
:
bug#50093
; Package
grep
.
(Wed, 18 Aug 2021 02:53:01 GMT)
Full text and
rfc822 format available.
Message #11 received at 50093 <at> debbugs.gnu.org (full text, mbox):
On Tue, 2021-08-17 at 14:04:26 -0700, Paul Eggert wrote:
> On 8/17/21 3:32 AM, Jim Meyering wrote:
>> - size_t h = 0;
>> + size_t h = 5381;
>
> I expect DJB chose that number because of the primeth recurrence sequence
> <https://oeis.org/A007097>:
>
> 2 is 1st prime.
> 3 is 2nd prime.
> 5 is 3rd prime.
> 11 is 5th prime.
> 31 is 11th prime.
> 127 is 31st prime.
> 709 is 127th prime.
> 5381 is 709th prime.
> 52711 is 5381st prime.
> ...
>
> Although 5381 is the largest number in this sequence that can fit into
> 'int' in a portable C program, and that's probably why DJB chose 5381,
> we're not limited to such small values here.
>
> How about the attached patch instead?
>
How about letting the compiler choose which prime to use? - something like the following:
#define H15_PRIME (unsigned long long)5381
#define H32_PRIME (unsigned long long)3657500101
#define H64_PRIME (unsigned long long)4123221751654370051
size_t h = H64_PRIME <= SIZE_MAX ? H64_PRIME : H32_PRIME <= SIZE_MAX ? H32_PRIME : H15_PRIME;
Information forwarded
to
bug-grep <at> gnu.org
:
bug#50093
; Package
grep
.
(Wed, 18 Aug 2021 14:19:02 GMT)
Full text and
rfc822 format available.
Message #14 received at 50093 <at> debbugs.gnu.org (full text, mbox):
On Tue, Aug 17, 2021 at 11:04 PM Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> On 8/17/21 3:32 AM, Jim Meyering wrote:
> > - size_t h = 0;
> > + size_t h = 5381;
>
> I expect DJB chose that number because of the primeth recurrence
> sequence <https://oeis.org/A007097>:
>
> 2 is 1st prime.
> 3 is 2nd prime.
> 5 is 3rd prime.
> 11 is 5th prime.
> 31 is 11th prime.
> 127 is 31st prime.
> 709 is 127th prime.
> 5381 is 709th prime.
> 52711 is 5381st prime.
> ...
>
> Although 5381 is the largest number in this sequence that can fit into
> 'int' in a portable C program, and that's probably why DJB chose 5381,
> we're not limited to such small values here.
>
> How about the attached patch instead?
I prefer that, indeed. Thanks.
Reply sent
to
Paul Eggert <eggert <at> cs.ucla.edu>
:
You have taken responsibility.
(Wed, 18 Aug 2021 14:49:02 GMT)
Full text and
rfc822 format available.
Notification sent
to
Jim Meyering <jim <at> meyering.net>
:
bug acknowledged by developer.
(Wed, 18 Aug 2021 14:49:02 GMT)
Full text and
rfc822 format available.
Message #19 received at 50093-done <at> debbugs.gnu.org (full text, mbox):
On 8/17/21 7:50 PM, Alex Murray wrote:
> How about letting the compiler choose which prime to use? - something like the following:
>
> #define H15_PRIME (unsigned long long)5381
> #define H32_PRIME (unsigned long long)3657500101
> #define H64_PRIME (unsigned long long)4123221751654370051
> size_t h = H64_PRIME <= SIZE_MAX ? H64_PRIME : H32_PRIME <= SIZE_MAX ? H32_PRIME : H15_PRIME;
That's equivalent to the patch I suggested; on my platform it even
generates the same machine code. It's better to avoid macros when that's
easy, as is the case here.
I installed the patch, except with uint_fast64_t instead of unsigned
long long int as that's a better type for the 64-bit constants in
question. (In case you're wondering what that "_fast" is doing there,
POSIX and the C standard guarantee the existence of uint_fast64_t but
not of uint64_t, as a concession to Unisys mainframes where long long is
72 bits.)
bug archived.
Request was from
Debbugs Internal Request <help-debbugs <at> gnu.org>
to
internal_control <at> debbugs.gnu.org
.
(Thu, 16 Sep 2021 11:24:07 GMT)
Full text and
rfc822 format available.
This bug report was last modified 3 years and 281 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.