GNU bug report logs - #50093
djb2 correction

Previous Next

Package: grep;

Reported by: Jim Meyering <jim <at> meyering.net>

Date: Tue, 17 Aug 2021 10:33:02 UTC

Severity: normal

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 50093 in the body.
You can then email your comments to 50093 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to bug-grep <at> gnu.org:
bug#50093; Package grep. (Tue, 17 Aug 2021 10:33:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Jim Meyering <jim <at> meyering.net>:
New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Tue, 17 Aug 2021 10:33:03 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: bug-grep <at> gnu.org, Alex Murray <alex.murray <at> canonical.com>
Subject: djb2 correction
Date: Tue, 17 Aug 2021 12:32:39 +0200
Alex Murray noticed that my djb2 implementation mistakenly initialized
to 0, rather than to 5381. Corrected with this:

>From 54590ca833dba62041af045e7bc7c09b90b54b71 Mon Sep 17 00:00:00 2001
From: Alex Murray <alex.murray <at> canonical.com>
Date: Tue, 17 Aug 2021 03:24:37 -0700
Subject: [PATCH] grep: correct DJB2 initialization

* src/grep.c (hash_pattern): DJB2 starts with 5381, not 0.
---
 src/grep.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/grep.c b/src/grep.c
index 271b6b9..8fed550 100644
--- a/src/grep.c
+++ b/src/grep.c
@@ -126,7 +126,7 @@ static Hash_table *pattern_table;
 static size_t _GL_ATTRIBUTE_PURE
 hash_pattern (void const *pat, size_t n_buckets)
 {
-  size_t h = 0;
+  size_t h = 5381;
   intptr_t pat_offset = (intptr_t) pat - 1;
   unsigned char const *s = (unsigned char const *) pattern_array + pat_offset;
   for ( ; *s != '\n'; s++)
--




Information forwarded to bug-grep <at> gnu.org:
bug#50093; Package grep. (Tue, 17 Aug 2021 21:05:01 GMT) Full text and rfc822 format available.

Message #8 received at 50093 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Jim Meyering <jim <at> meyering.net>
Cc: alex.murray <at> canonical.com, 50093 <at> debbugs.gnu.org
Subject: Re: bug#50093: djb2 correction
Date: Tue, 17 Aug 2021 14:04:26 -0700
[Message part 1 (text/plain, inline)]
On 8/17/21 3:32 AM, Jim Meyering wrote:
> -  size_t h = 0;
> +  size_t h = 5381;

I expect DJB chose that number because of the primeth recurrence 
sequence <https://oeis.org/A007097>:

2 is 1st prime.
3 is 2nd prime.
5 is 3rd prime.
11 is 5th prime.
31 is 11th prime.
127 is 31st prime.
709 is 127th prime.
5381 is 709th prime.
52711 is 5381st prime.
...

Although 5381 is the largest number in this sequence that can fit into 
'int' in a portable C program, and that's probably why DJB chose 5381, 
we're not limited to such small values here.

How about the attached patch instead?
[0001-grep-djb2-correction.patch (text/x-patch, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#50093; Package grep. (Wed, 18 Aug 2021 02:53:01 GMT) Full text and rfc822 format available.

Message #11 received at 50093 <at> debbugs.gnu.org (full text, mbox):

From: Alex Murray <alex.murray <at> canonical.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: Jim Meyering <jim <at> meyering.net>, 50093 <at> debbugs.gnu.org
Subject: Re: bug#50093: djb2 correction
Date: Wed, 18 Aug 2021 12:20:26 +0930
On Tue, 2021-08-17 at 14:04:26 -0700, Paul Eggert wrote:

> On 8/17/21 3:32 AM, Jim Meyering wrote:
>> -  size_t h = 0;
>> +  size_t h = 5381;
>
> I expect DJB chose that number because of the primeth recurrence sequence
> <https://oeis.org/A007097>:
>
> 2 is 1st prime.
> 3 is 2nd prime.
> 5 is 3rd prime.
> 11 is 5th prime.
> 31 is 11th prime.
> 127 is 31st prime.
> 709 is 127th prime.
> 5381 is 709th prime.
> 52711 is 5381st prime.
> ...
>
> Although 5381 is the largest number in this sequence that can fit into
> 'int' in a portable C program, and that's probably why DJB chose 5381,
> we're not limited to such small values here.
>
> How about the attached patch instead?
>

How about letting the compiler choose which prime to use? - something like the following:

#define H15_PRIME (unsigned long long)5381
#define H32_PRIME (unsigned long long)3657500101
#define H64_PRIME (unsigned long long)4123221751654370051
size_t h = H64_PRIME <= SIZE_MAX ? H64_PRIME : H32_PRIME <= SIZE_MAX ? H32_PRIME : H15_PRIME;





Information forwarded to bug-grep <at> gnu.org:
bug#50093; Package grep. (Wed, 18 Aug 2021 14:19:02 GMT) Full text and rfc822 format available.

Message #14 received at 50093 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: Alex Murray <alex.murray <at> canonical.com>, 50093 <at> debbugs.gnu.org
Subject: Re: bug#50093: djb2 correction
Date: Wed, 18 Aug 2021 16:18:33 +0200
On Tue, Aug 17, 2021 at 11:04 PM Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> On 8/17/21 3:32 AM, Jim Meyering wrote:
> > -  size_t h = 0;
> > +  size_t h = 5381;
>
> I expect DJB chose that number because of the primeth recurrence
> sequence <https://oeis.org/A007097>:
>
> 2 is 1st prime.
> 3 is 2nd prime.
> 5 is 3rd prime.
> 11 is 5th prime.
> 31 is 11th prime.
> 127 is 31st prime.
> 709 is 127th prime.
> 5381 is 709th prime.
> 52711 is 5381st prime.
> ...
>
> Although 5381 is the largest number in this sequence that can fit into
> 'int' in a portable C program, and that's probably why DJB chose 5381,
> we're not limited to such small values here.
>
> How about the attached patch instead?

I prefer that, indeed. Thanks.




Reply sent to Paul Eggert <eggert <at> cs.ucla.edu>:
You have taken responsibility. (Wed, 18 Aug 2021 14:49:02 GMT) Full text and rfc822 format available.

Notification sent to Jim Meyering <jim <at> meyering.net>:
bug acknowledged by developer. (Wed, 18 Aug 2021 14:49:02 GMT) Full text and rfc822 format available.

Message #19 received at 50093-done <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: alex.murray <at> canonical.com
Cc: Jim Meyering <jim <at> meyering.net>, 50093-done <at> debbugs.gnu.org
Subject: Re: bug#50093: djb2 correction
Date: Wed, 18 Aug 2021 07:48:09 -0700
On 8/17/21 7:50 PM, Alex Murray wrote:
> How about letting the compiler choose which prime to use? - something like the following:
> 
> #define H15_PRIME (unsigned long long)5381
> #define H32_PRIME (unsigned long long)3657500101
> #define H64_PRIME (unsigned long long)4123221751654370051
> size_t h = H64_PRIME <= SIZE_MAX ? H64_PRIME : H32_PRIME <= SIZE_MAX ? H32_PRIME : H15_PRIME;

That's equivalent to the patch I suggested; on my platform it even 
generates the same machine code. It's better to avoid macros when that's 
easy, as is the case here.

I installed the patch, except with uint_fast64_t instead of unsigned 
long long int as that's a better type for the 64-bit constants in 
question. (In case you're wondering what that "_fast" is doing there, 
POSIX and the C standard guarantee the existence of uint_fast64_t but 
not of uint64_t, as a concession to Unisys mainframes where long long is 
72 bits.)




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 16 Sep 2021 11:24:07 GMT) Full text and rfc822 format available.

This bug report was last modified 3 years and 281 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.