Package: emacs;
Reported by: Helmut Eller <eller.helmut <at> gmail.com>
Date: Tue, 3 Jun 2025 19:45:08 UTC
Severity: normal
Tags: patch
View this message in rfc822 format
From: Pip Cet <pipcet <at> protonmail.com> To: Helmut Eller <eller.helmut <at> gmail.com> Cc: 78686 <at> debbugs.gnu.org Subject: bug#78686: feature/igc: [PATCH] Make charset_table an exact root Date: Wed, 04 Jun 2025 20:15:46 +0000
"Helmut Eller" <eller.helmut <at> gmail.com> writes: > On Wed, Jun 04 2025, Pip Cet wrote: > >> "Helmut Eller" <eller.helmut <at> gmail.com> writes: >> >>> Here is a proposal to move charset_table out of the GC heap and make it >>> an exact root area instead. >> >>> Previously, the charset_table was located in various places: >> >>> 1. in charset_table_init during bootstrap (ambiguous root) >>> 2. on the GC heap (loaded from the dump; pinned) >>> 3. on the malloc heap when it needed to grow (ambiguous root) >>> >>> Now, the charset table is copied back to charset_table_init after >>> loading the dump and it is scanned as an exact root area. >> >> Thanks. Seems like the wrong approach to me: >> >> This removes the case we want, which is case 2, by restoring the case we >> don't want, by adding new and larger unprotected roots. Shouldn't we >> simply make it so if igc is in use, we only ever use case 2? > > Not necessarily. My ideal solution would be to replace the > charset.attributes field with something that doesn't need to be traced, > e.g. an index into a Lisp_Vector of attributes. I agree the charset design in general could do with some attention. In particular, we're still leaking the old charset tables when growing for the second or subsequent times on the master branch, and I really think if there were a good reason to do so someone would have updated the comment by now. Plus, there is no way to undefine charsets. However, such changes should not be specific to feature/igc. What problem are we trying to solve here? As far as I can tell, there's just one significant problem: we have a root covering charset_table_init even though that array is unused in pdumped Emacs sessions. It's not a huge problem, just 180 zero words fixed as Lisp objects, but it is unsatisfying and we should fix it. So let's get rid of that unnecessary root. That leaves two very minor problems: 1. undumped Emacs sessions. I don't think these are worth optimizing for. Do you? 2. Emacs sessions which grow charset_table beyond the 180 entries in the dump. Does this happen? In any case, we do handle the case correctly, leaking memory but giving the user access to as many charsets as they desire. There is also the long-term goal of making it so all objects in the pdump section can be relocated elsewhere, allowing us to free the section. We should do so by tracing objects in the dump precisely so they can be moved out. We should not do so by explicitly memcpying objects from the dump to xmalloc'd memory. >> So why not just turn IGC_OBJ_DUMPED_CHARSET_TABLE into >> IGC_OBJ_CHARSET_TABLE, remove some igc_asserts, and avoid roots >> altogether, ensuring that root_create_charset_table is called only in >> temacs, never in pdumped emacs? > > First, The IGC_OBJ_DUMPED_FOO types only exist because changing the GC > heap format was easier than wrestling with the dumper. There was the > expectation that the dumper will eventually be extended to support what > MPS actually needs: sections that correspond to MPS pools. I'm not agreeing or disagreeing with that statement. I think the MPS tradeoff of not having snapshots and requiring client code to handle dump/restore scenarios on its own was probably a good choice, and I see no way of changing it now. > Second, I don't like the practice of using the GC header to create > "shadow types". Noted. That's a good argument for not introducing new ones, but your patch doesn't introduce or delete any; it just copies things back from the dump heap (which is never freed) to an unprotected root in the BSS section or to xmalloc'd memory. > I'd prefer to make them pvec types or keep 'em out of > the GC heap. I agree with the sentiment, but turning struct charset into a pseudovector would be a significant change which should be done on the master branch, too, IMHO, and would need to be discussed first. I assume there were good reasons for making it a flat table rather than a vector of pseudovectors (it does save one level of indirection), but I'm not sure those are still relevant. >> Are you looking at kbd_buffer next? That's a very large ambiguous root, >> but a little tricky, not least due to the GCC bug. > > Maybe, depending on how this charset table business goes. I don't see the dependency. As far as I can tell, we need this patch. It removes the extraneous root. From a13ef4ad9b97455c58682b4f616272a1c09c4b8f Mon Sep 17 00:00:00 2001 From: Pip Cet <pipcet <at> protonmail.com> Subject: [PATCH] Destroy the charset_table_init root when pdump is loaded (bug#78686) At this point, pdumper has changed charset_table to point into the dump, not to charset_table_init, which remains unused for the rest of the Emacs session. No need for a root. * src/igc.c (igc_on_pdump_loaded): Destroy unnecessary root. --- src/igc.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/src/igc.c b/src/igc.c index f63b0fe8c88..ab61fa7d3d6 100644 --- a/src/igc.c +++ b/src/igc.c @@ -5204,6 +5204,8 @@ igc_on_pdump_loaded (void *dump_base, void *hot_start, void *hot_end, (uint8_t *) pinned_objects_in_dump + sizeof pinned_objects_in_dump, "dump-pins"); + + igc_destroy_root_with_start (charset_table_init); } void * -- 2.48.1 We should also add a comment explaining that pinned_objects_in_dump[1] is necessary to keep the code space masks around: From 48e21dc0e27e9d0a19579c5cd13cbad43533ada8 Mon Sep 17 00:00:00 2001 From: Pip Cet <pipcet <at> protonmail.com> Subject: [PATCH] ; * src/igc.c (fix_charset_table): Add comment about untraced ref. --- src/igc.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/src/igc.c b/src/igc.c index ab61fa7d3d6..7ff263255f9 100644 --- a/src/igc.c +++ b/src/igc.c @@ -1950,6 +1950,9 @@ fix_charset_table (mps_ss_t ss, struct charset *table, size_t nbytes) { for (size_t i = 0, len = nbytes / sizeof (struct charset); i < len; i++) IGC_FIX12_OBJ (ss, &table[i].attributes); + /* FIXME/igc: fix table[i].code_space_mask, once that's no longer an + interior pointer. Currently, it's protected explicitly by + pinned_objects_in_dump[1] == code_space_masks. */ } MPS_SCAN_END (ss); return MPS_RES_OK; -- 2.48.1 Finally, once we're convinced that it's not needed for anything else, we can do this: From 7111cb4562bbd89f12d6a7bf284ec95f93ccf69d Mon Sep 17 00:00:00 2001 From: Pip Cet <pipcet <at> protonmail.com> Subject: [PATCH] Make charset_table an exact root (bug#78686) * src/igc.c (fix_charset_table): Adjust comment. (igc_on_pdump_loaded): Turn charset_table into an exact rather than an ambiguous (pinned) root. --- src/igc.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/src/igc.c b/src/igc.c index 7ff263255f9..70e8eac49e9 100644 --- a/src/igc.c +++ b/src/igc.c @@ -1952,7 +1952,7 @@ fix_charset_table (mps_ss_t ss, struct charset *table, size_t nbytes) IGC_FIX12_OBJ (ss, &table[i].attributes); /* FIXME/igc: fix table[i].code_space_mask, once that's no longer an interior pointer. Currently, it's protected explicitly by - pinned_objects_in_dump[1] == code_space_masks. */ + pinned_objects_in_dump[0] == code_space_masks. */ } MPS_SCAN_END (ss); return MPS_RES_OK; @@ -5153,7 +5153,7 @@ check_dump (mps_addr_t start, mps_addr_t end) return true; } -static mps_addr_t pinned_objects_in_dump[3]; +static mps_addr_t pinned_objects_in_dump[2]; /* Called from pdumper_load. [HOT_START, HOT_END) is the hot section of the dump. [COL_START, COLD_END) is the cold section of the @@ -5195,9 +5195,11 @@ igc_on_pdump_loaded (void *dump_base, void *hot_start, void *hot_end, set_header (heap_end, IGC_OBJ_PAD, relocs_size, 0); eassert (check_dump (h, cold_end)); + + igc_root_create_exact_ptr (&charset_table); + /* Pin some stuff in the dump */ mps_addr_t pinned_roots[] = { - charset_table, cold_start, /* code_space_masks */ cold_user_data_start, }; -- 2.48.1 WDYT? Pip
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.