GNU bug report logs -
#68244
hash-table improvements
Previous Next
Full log
View this message in rfc822 format
9 jan. 2024 kl. 01.33 skrev Stefan Monnier <monnier <at> iro.umontreal.ca>:
> BTW, what is the "per-entry byte-size" of your new code?
> The old code had about 6 words per entry, IIRC.
With tables filled to capacity, the old code was about 5.25 (assuming an index vector size 1.25 times the table size). Now it's 1.625 words less, thus 3.625 words per entry. I haven't done the maths for what the average per-entry size would be if we take growth space into account.
The index vector can be shrunk further if we use a narrower index for smaller tables. This is a fairly common optimisation and usually the lower memory usage is worth an extra branch or two.
The hash-table object size is also down from 16 words to 10. 8 is actually quite achievable: consolidate the key_and_value, hash and next vectors into a single allocated block. It's just a matter of benchmarking to see what memory arrangement is the most beneficial.
>> We could try switching to a high-quality hash function (or family thereof),
>> like Murmur or Jenkins. Then range reduction is just a matter of masking off
>> the required number of bits.
>
> I don't see a strong need for it.
Maybe not, but I wouldn't discount it out of hand. A few cheap ALU ops could easily pay for themselves if they lead to fewer collisions.
> BTW, I see in the Knuth reduction you extract the bits 32..32+N of
> the multiplication.
It's supposed to be bits [32-N,32) actually (hope I got that right).
> Any reason not to use the top N bits instead (so
> we're not limited to 32 bits, for example)?
I thought about writing a clever expression that would work for other widths as well but it seemed like a waste of time given the current data structures.
This bug report was last modified 1 year and 146 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.