GNU bug report logs - #68244
hash-table improvements

Previous Next

Package: emacs;

Reported by: Mattias Engdegård <mattias.engdegard <at> gmail.com>

Date: Thu, 4 Jan 2024 16:29:02 UTC

Severity: wishlist

Full log


View this message in rfc822 format

From: Dmitry Gutov <dmitry <at> gutov.dev>
To: Mattias Engdegård <mattias.engdegard <at> gmail.com>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 68244 <at> debbugs.gnu.org, Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: bug#68244: hash-table improvements
Date: Fri, 5 Jan 2024 17:41:47 +0200
On 05/01/2024 12:33, Mattias Engdegård wrote:
> [ replies to questions asked elsewhere ]
> 
> 4 jan. 2024 kl. 18.32 skrev Dmitry Gutov <dmitry <at> gutov.dev>:
> 
>>> +hash_table_alloc_bytes (ptrdiff_t nbytes)
>>> +{
>>> +  if (nbytes == 0)
>>> +    return NULL;
>>> +  tally_consing (nbytes);
>>> +  return xmalloc (nbytes);
>>> +}
>>
>> Sorry if it's a stupid question, but if the operation doesn't add any Lisp "garbage", why increase the consing counter? That is likely triggers more GCs earlier which otherwise might not run, and if there are no slots to GC, it seems like they would run in vain.
> 
> That's a good question and it all comes down to how we interpret `consing_until_gc`. Here we take the view that it should encompass all parts of an allocation and this seems to be consistent with existing code.

But the existing code used objects that would need to be collected by 
GC, right? And the new one, seemingly, does not.

So I don't quite see the advantage of increasing consing_until_gc then. 
It's like the difference between creating new strings and inserting 
strings into a buffer: new memory is used either way, but the latter 
doesn't increase consing.

> These ancillary allocations are parts of the hash-table object: they are allocated and freed at the same time as the mother object, and subject to the same `consing_until_gc` accounting.
> 
> The new code is radically more efficient when growing tables, because we can free the old vectors immediately (and adjust consing_until_gc accordingly) instead of leaving it as garbage waiting to be collected. This provides several benefits in itself (GCs are made less often, we can re-use hot memory). Not having to traverse them in either the mark or sweep phases is another big gain (the key_and_value parts still have to be marked, of course).

It's great that the new hash tables are garbage-collected more easily 
and produce less garbage overall, but in a real program any GC cycle 
will have to traverse the other data structures anyway. So we might be 
leaving free performance gains on the table when we induce GC cycles 
while no managed allocations are done. I could be missing something, of 
course.




This bug report was last modified 1 year and 146 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.