Package: emacs;
Reported by: Helmut Eller <eller.helmut <at> gmail.com>
Date: Tue, 3 Jun 2025 19:45:08 UTC
Severity: normal
Tags: patch
To reply to this bug, email your comments to 78686 AT debbugs.gnu.org.
Toggle the display of automated, internal messages from the tracker.
View this report as an mbox folder, status mbox, maintainer mbox
bug-gnu-emacs <at> gnu.org
:bug#78686
; Package emacs
.
(Tue, 03 Jun 2025 19:45:10 GMT) Full text and rfc822 format available.Helmut Eller <eller.helmut <at> gmail.com>
:bug-gnu-emacs <at> gnu.org
.
(Tue, 03 Jun 2025 19:45:10 GMT) Full text and rfc822 format available.Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
From: Helmut Eller <eller.helmut <at> gmail.com> To: bug-gnu-emacs <at> gnu.org Subject: feature/igc: [PATCH] Make charset_table an exact root Date: Tue, 03 Jun 2025 21:43:52 +0200
[Message part 1 (text/plain, inline)]
Here is a proposal to move charset_table out of the GC heap and make it an exact root area instead.
[0001-Make-charset_table-an-exact-root.patch (text/x-diff, attachment)]
bug-gnu-emacs <at> gnu.org
:bug#78686
; Package emacs
.
(Wed, 04 Jun 2025 13:11:02 GMT) Full text and rfc822 format available.Message #8 received at 78686 <at> debbugs.gnu.org (full text, mbox):
From: Pip Cet <pipcet <at> protonmail.com> To: Helmut Eller <eller.helmut <at> gmail.com> Cc: 78686 <at> debbugs.gnu.org Subject: Re: bug#78686: feature/igc: [PATCH] Make charset_table an exact root Date: Wed, 04 Jun 2025 13:10:12 +0000
"Helmut Eller" <eller.helmut <at> gmail.com> writes: > Here is a proposal to move charset_table out of the GC heap and make it > an exact root area instead. > Previously, the charset_table was located in various places: > 1. in charset_table_init during bootstrap (ambiguous root) > 2. on the GC heap (loaded from the dump; pinned) > 3. on the malloc heap when it needed to grow (ambiguous root) > > Now, the charset table is copied back to charset_table_init after > loading the dump and it is scanned as an exact root area. Thanks. Seems like the wrong approach to me: This removes the case we want, which is case 2, by restoring the case we don't want, by adding new and larger unprotected roots. Shouldn't we simply make it so if igc is in use, we only ever use case 2? This code doesn't handle charset_table growing above the initial size, then dumping. We should attempt to support unusual dumps like that. There is a warning about making charset_table_init[] large enough so it doesn't grow during bootstrapping, but I think that comment is obsolete and refers to unexec problems. It should be fine now, I think, but there really is no reason for an optimization that applies only to temacs, anyway. At the very least, this deserves an emacs_abort (not an igc_assert which might be disabled) and a comment explaining why it's not usually hit: a silent failure and crash is a bad way of indicating to a user they need to bump the initial charset_table_size. Scanning only to charset_used, rather than to the end of the array, is a questionable optimization: scanning a small number of Qnils is better than relying on what's potentially another thread making memory modifications so they become visible in the right order. charset_table should move to the cold section of the dump if it's no longer expected to be used after restoring the dump. I don't think "clone_charset_table" is a good name for the only thing this function does, which is to restore a dumped charset table, making the dumped copy unsafe to use in the process. I don't understand why it needs to assert the arena is parked. That simply makes future changes more difficult. So why not just turn IGC_OBJ_DUMPED_CHARSET_TABLE into IGC_OBJ_CHARSET_TABLE, remove some igc_asserts, and avoid roots altogether, ensuring that root_create_charset_table is called only in temacs, never in pdumped emacs? Also, I think removing pinned_objects_in_dump is a separate change. Are you looking at kbd_buffer next? That's a very large ambiguous root, but a little tricky, not least due to the GCC bug. Pip
bug-gnu-emacs <at> gnu.org
:bug#78686
; Package emacs
.
(Wed, 04 Jun 2025 18:57:02 GMT) Full text and rfc822 format available.Message #11 received at 78686 <at> debbugs.gnu.org (full text, mbox):
From: Helmut Eller <eller.helmut <at> gmail.com> To: Pip Cet <pipcet <at> protonmail.com> Cc: 78686 <at> debbugs.gnu.org Subject: Re: bug#78686: feature/igc: [PATCH] Make charset_table an exact root Date: Wed, 04 Jun 2025 20:55:56 +0200
On Wed, Jun 04 2025, Pip Cet wrote: > "Helmut Eller" <eller.helmut <at> gmail.com> writes: > >> Here is a proposal to move charset_table out of the GC heap and make it >> an exact root area instead. > >> Previously, the charset_table was located in various places: > >> 1. in charset_table_init during bootstrap (ambiguous root) >> 2. on the GC heap (loaded from the dump; pinned) >> 3. on the malloc heap when it needed to grow (ambiguous root) >> >> Now, the charset table is copied back to charset_table_init after >> loading the dump and it is scanned as an exact root area. > > Thanks. Seems like the wrong approach to me: > > This removes the case we want, which is case 2, by restoring the case we > don't want, by adding new and larger unprotected roots. Shouldn't we > simply make it so if igc is in use, we only ever use case 2? Not necessarily. My ideal solution would be to replace the charset.attributes field with something that doesn't need to be traced, e.g. an index into a Lisp_Vector of attributes. [...] > So why not just turn IGC_OBJ_DUMPED_CHARSET_TABLE into > IGC_OBJ_CHARSET_TABLE, remove some igc_asserts, and avoid roots > altogether, ensuring that root_create_charset_table is called only in > temacs, never in pdumped emacs? First, The IGC_OBJ_DUMPED_FOO types only exist because changing the GC heap format was easier than wrestling with the dumper. There was the expectation that the dumper will eventually be extended to support what MPS actually needs: sections that correspond to MPS pools. Second, I don't like the practice of using the GC header to create "shadow types". I'd prefer to make them pvec types or keep 'em out of the GC heap. [...] > Are you looking at kbd_buffer next? That's a very large ambiguous root, > but a little tricky, not least due to the GCC bug. Maybe, depending on how this charset table business goes. Helmut
bug-gnu-emacs <at> gnu.org
:bug#78686
; Package emacs
.
(Wed, 04 Jun 2025 20:17:05 GMT) Full text and rfc822 format available.Message #14 received at 78686 <at> debbugs.gnu.org (full text, mbox):
From: Pip Cet <pipcet <at> protonmail.com> To: Helmut Eller <eller.helmut <at> gmail.com> Cc: 78686 <at> debbugs.gnu.org Subject: Re: bug#78686: feature/igc: [PATCH] Make charset_table an exact root Date: Wed, 04 Jun 2025 20:15:46 +0000
"Helmut Eller" <eller.helmut <at> gmail.com> writes: > On Wed, Jun 04 2025, Pip Cet wrote: > >> "Helmut Eller" <eller.helmut <at> gmail.com> writes: >> >>> Here is a proposal to move charset_table out of the GC heap and make it >>> an exact root area instead. >> >>> Previously, the charset_table was located in various places: >> >>> 1. in charset_table_init during bootstrap (ambiguous root) >>> 2. on the GC heap (loaded from the dump; pinned) >>> 3. on the malloc heap when it needed to grow (ambiguous root) >>> >>> Now, the charset table is copied back to charset_table_init after >>> loading the dump and it is scanned as an exact root area. >> >> Thanks. Seems like the wrong approach to me: >> >> This removes the case we want, which is case 2, by restoring the case we >> don't want, by adding new and larger unprotected roots. Shouldn't we >> simply make it so if igc is in use, we only ever use case 2? > > Not necessarily. My ideal solution would be to replace the > charset.attributes field with something that doesn't need to be traced, > e.g. an index into a Lisp_Vector of attributes. I agree the charset design in general could do with some attention. In particular, we're still leaking the old charset tables when growing for the second or subsequent times on the master branch, and I really think if there were a good reason to do so someone would have updated the comment by now. Plus, there is no way to undefine charsets. However, such changes should not be specific to feature/igc. What problem are we trying to solve here? As far as I can tell, there's just one significant problem: we have a root covering charset_table_init even though that array is unused in pdumped Emacs sessions. It's not a huge problem, just 180 zero words fixed as Lisp objects, but it is unsatisfying and we should fix it. So let's get rid of that unnecessary root. That leaves two very minor problems: 1. undumped Emacs sessions. I don't think these are worth optimizing for. Do you? 2. Emacs sessions which grow charset_table beyond the 180 entries in the dump. Does this happen? In any case, we do handle the case correctly, leaking memory but giving the user access to as many charsets as they desire. There is also the long-term goal of making it so all objects in the pdump section can be relocated elsewhere, allowing us to free the section. We should do so by tracing objects in the dump precisely so they can be moved out. We should not do so by explicitly memcpying objects from the dump to xmalloc'd memory. >> So why not just turn IGC_OBJ_DUMPED_CHARSET_TABLE into >> IGC_OBJ_CHARSET_TABLE, remove some igc_asserts, and avoid roots >> altogether, ensuring that root_create_charset_table is called only in >> temacs, never in pdumped emacs? > > First, The IGC_OBJ_DUMPED_FOO types only exist because changing the GC > heap format was easier than wrestling with the dumper. There was the > expectation that the dumper will eventually be extended to support what > MPS actually needs: sections that correspond to MPS pools. I'm not agreeing or disagreeing with that statement. I think the MPS tradeoff of not having snapshots and requiring client code to handle dump/restore scenarios on its own was probably a good choice, and I see no way of changing it now. > Second, I don't like the practice of using the GC header to create > "shadow types". Noted. That's a good argument for not introducing new ones, but your patch doesn't introduce or delete any; it just copies things back from the dump heap (which is never freed) to an unprotected root in the BSS section or to xmalloc'd memory. > I'd prefer to make them pvec types or keep 'em out of > the GC heap. I agree with the sentiment, but turning struct charset into a pseudovector would be a significant change which should be done on the master branch, too, IMHO, and would need to be discussed first. I assume there were good reasons for making it a flat table rather than a vector of pseudovectors (it does save one level of indirection), but I'm not sure those are still relevant. >> Are you looking at kbd_buffer next? That's a very large ambiguous root, >> but a little tricky, not least due to the GCC bug. > > Maybe, depending on how this charset table business goes. I don't see the dependency. As far as I can tell, we need this patch. It removes the extraneous root. From a13ef4ad9b97455c58682b4f616272a1c09c4b8f Mon Sep 17 00:00:00 2001 From: Pip Cet <pipcet <at> protonmail.com> Subject: [PATCH] Destroy the charset_table_init root when pdump is loaded (bug#78686) At this point, pdumper has changed charset_table to point into the dump, not to charset_table_init, which remains unused for the rest of the Emacs session. No need for a root. * src/igc.c (igc_on_pdump_loaded): Destroy unnecessary root. --- src/igc.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/src/igc.c b/src/igc.c index f63b0fe8c88..ab61fa7d3d6 100644 --- a/src/igc.c +++ b/src/igc.c @@ -5204,6 +5204,8 @@ igc_on_pdump_loaded (void *dump_base, void *hot_start, void *hot_end, (uint8_t *) pinned_objects_in_dump + sizeof pinned_objects_in_dump, "dump-pins"); + + igc_destroy_root_with_start (charset_table_init); } void * -- 2.48.1 We should also add a comment explaining that pinned_objects_in_dump[1] is necessary to keep the code space masks around: From 48e21dc0e27e9d0a19579c5cd13cbad43533ada8 Mon Sep 17 00:00:00 2001 From: Pip Cet <pipcet <at> protonmail.com> Subject: [PATCH] ; * src/igc.c (fix_charset_table): Add comment about untraced ref. --- src/igc.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/src/igc.c b/src/igc.c index ab61fa7d3d6..7ff263255f9 100644 --- a/src/igc.c +++ b/src/igc.c @@ -1950,6 +1950,9 @@ fix_charset_table (mps_ss_t ss, struct charset *table, size_t nbytes) { for (size_t i = 0, len = nbytes / sizeof (struct charset); i < len; i++) IGC_FIX12_OBJ (ss, &table[i].attributes); + /* FIXME/igc: fix table[i].code_space_mask, once that's no longer an + interior pointer. Currently, it's protected explicitly by + pinned_objects_in_dump[1] == code_space_masks. */ } MPS_SCAN_END (ss); return MPS_RES_OK; -- 2.48.1 Finally, once we're convinced that it's not needed for anything else, we can do this: From 7111cb4562bbd89f12d6a7bf284ec95f93ccf69d Mon Sep 17 00:00:00 2001 From: Pip Cet <pipcet <at> protonmail.com> Subject: [PATCH] Make charset_table an exact root (bug#78686) * src/igc.c (fix_charset_table): Adjust comment. (igc_on_pdump_loaded): Turn charset_table into an exact rather than an ambiguous (pinned) root. --- src/igc.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/src/igc.c b/src/igc.c index 7ff263255f9..70e8eac49e9 100644 --- a/src/igc.c +++ b/src/igc.c @@ -1952,7 +1952,7 @@ fix_charset_table (mps_ss_t ss, struct charset *table, size_t nbytes) IGC_FIX12_OBJ (ss, &table[i].attributes); /* FIXME/igc: fix table[i].code_space_mask, once that's no longer an interior pointer. Currently, it's protected explicitly by - pinned_objects_in_dump[1] == code_space_masks. */ + pinned_objects_in_dump[0] == code_space_masks. */ } MPS_SCAN_END (ss); return MPS_RES_OK; @@ -5153,7 +5153,7 @@ check_dump (mps_addr_t start, mps_addr_t end) return true; } -static mps_addr_t pinned_objects_in_dump[3]; +static mps_addr_t pinned_objects_in_dump[2]; /* Called from pdumper_load. [HOT_START, HOT_END) is the hot section of the dump. [COL_START, COLD_END) is the cold section of the @@ -5195,9 +5195,11 @@ igc_on_pdump_loaded (void *dump_base, void *hot_start, void *hot_end, set_header (heap_end, IGC_OBJ_PAD, relocs_size, 0); eassert (check_dump (h, cold_end)); + + igc_root_create_exact_ptr (&charset_table); + /* Pin some stuff in the dump */ mps_addr_t pinned_roots[] = { - charset_table, cold_start, /* code_space_masks */ cold_user_data_start, }; -- 2.48.1 WDYT? Pip
bug-gnu-emacs <at> gnu.org
:bug#78686
; Package emacs
.
(Thu, 05 Jun 2025 08:10:02 GMT) Full text and rfc822 format available.Message #17 received at 78686 <at> debbugs.gnu.org (full text, mbox):
From: Helmut Eller <eller.helmut <at> gmail.com> To: Pip Cet <pipcet <at> protonmail.com> Cc: 78686 <at> debbugs.gnu.org Subject: Re: bug#78686: feature/igc: [PATCH] Make charset_table an exact root Date: Thu, 05 Jun 2025 10:09:12 +0200
On Wed, Jun 04 2025, Pip Cet wrote: > As far as I can tell, there's just one significant problem: we have a > root covering charset_table_init even though that array is unused in > pdumped Emacs sessions. It's not a huge problem, just 180 zero words > fixed as Lisp objects, but it is unsatisfying and we should fix it. It's more like 60kb of zeroes. > So let's get rid of that unnecessary root. > > That leaves two very minor problems: > > 1. undumped Emacs sessions. I don't think these are worth optimizing > for. Do you? No. But there is also no need to make them gratuitously different from normal sessions. > 2. Emacs sessions which grow charset_table beyond the 180 entries in the > dump. Does this happen? In any case, we do handle the case correctly, > leaking memory but giving the user access to as many charsets as they > desire. Defining new charsets is presumably an exotic thing to do. > There is also the long-term goal of making it so all objects in the > pdump section can be relocated elsewhere, allowing us to free the > section. We should do so by tracing objects in the dump precisely so > they can be moved out. As my patch does, in contrast to your idea. [...] >> Second, I don't like the practice of using the GC header to create >> "shadow types". > > Noted. That's a good argument for not introducing new ones, but your > patch doesn't introduce or delete any; it just copies things back from > the dump heap (which is never freed) to an unprotected root in the BSS > section or to xmalloc'd memory. Exactly: my patch copies the charset table to the place where it belongs, the space in the BSS section reserved for it. Just like builtin symbols and subrs are copied to their space in the BSS section. [...] >> Maybe, depending on how this charset table business goes. > > I don't see the dependency. My motivation may depend on it. > As far as I can tell, we need this patch. It removes the extraneous > root. I like my patch better. Helmut
bug-gnu-emacs <at> gnu.org
:bug#78686
; Package emacs
.
(Thu, 05 Jun 2025 10:51:02 GMT) Full text and rfc822 format available.Message #20 received at 78686 <at> debbugs.gnu.org (full text, mbox):
From: Pip Cet <pipcet <at> protonmail.com> To: Helmut Eller <eller.helmut <at> gmail.com> Cc: 78686 <at> debbugs.gnu.org, Gerd Möllmann <gerd <at> gnu.org> Subject: Re: bug#78686: feature/igc: [PATCH] Make charset_table an exact root Date: Thu, 05 Jun 2025 10:49:50 +0000
"Helmut Eller" <eller.helmut <at> gmail.com> writes: > On Wed, Jun 04 2025, Pip Cet wrote: > >> As far as I can tell, there's just one significant problem: we have a >> root covering charset_table_init even though that array is unused in >> pdumped Emacs sessions. It's not a huge problem, just 180 zero words >> fixed as Lisp objects, but it is unsatisfying and we should fix it. > > It's more like 60kb of zeroes. Oops, you're right. All the more reason to remove that root right away. >> So let's get rid of that unnecessary root. I still think we should do that. Do you think getting rid of the root would actually make things worse? If you don't, I'll push that change first. >> That leaves two very minor problems: >> >> 1. undumped Emacs sessions. I don't think these are worth optimizing >> for. Do you? > > No. But there is also no need to make them gratuitously different from > normal sessions. I agree, but the decision has been made on the master branch to do just that (static storage pre-pdump; pdump storage post-pdump; xmalloc if grown), and if we want to revisit it we need to fix it on the master branch. >> 2. Emacs sessions which grow charset_table beyond the 180 entries in the >> dump. Does this happen? In any case, we do handle the case correctly, >> leaking memory but giving the user access to as many charsets as they >> desire. > > Defining new charsets is presumably an exotic thing to do. Still, I see no reason it wouldn't have worked before, and it can't possibly work after your patch. >> There is also the long-term goal of making it so all objects in the >> pdump section can be relocated elsewhere, allowing us to free the >> section. We should do so by tracing objects in the dump precisely so >> they can be moved out. > > As my patch does, in contrast to your idea. Your patch does not make the charset table movable or protectable. >>> Second, I don't like the practice of using the GC header to create >>> "shadow types". >> >> Noted. That's a good argument for not introducing new ones, but your >> patch doesn't introduce or delete any; it just copies things back from >> the dump heap (which is never freed) to an unprotected root in the BSS >> section or to xmalloc'd memory. > > Exactly: my patch copies the charset table to the place where it > belongs, the space in the BSS section reserved for it. Just like > builtin symbols and subrs are copied to their space in the BSS section. I don't think those two special cases (lispsym has very good reasons for being special; subrs aren't in the BSS section) warrant introducing a third special case. >>> Maybe, depending on how this charset table business goes. >> >> I don't see the dependency. > > My motivation may depend on it. Let me just state that removing roots, particularly ambiguous ones, is a very good thing, and you've identified one (the charset table one) where it is particularly easy to do so. I just don't understand why you don't want to remove it, but make it necessary instead. But the situation with kbd_buffer is different: it's currently a large ambiguous root, and reducing the amount of time we spend scanning it is a good thing, even if it stays a root. And, again, the charset table situation in general should be improved, on the master branch, and such improvements would trickle down to feature/igc. >> As far as I can tell, we need this patch. It removes the extraneous >> root. > > I like my patch better. Okay. Still, the first single-line patch would improve things, right? If so, there's no reason not to push the patch we presumably agree improves things (even though we both think it's not ideal) while discussing the one we disagree about. Obviously, just a suggestion. I've CC'd Gerd in case he thinks I'm being silly and we should just apply your original patch (in that case, I'd try to verify it doesn't introduce new crashes if charset_table is grown pre-dump), and fix the bug which I think would introduce such crashes. Pip
bug-gnu-emacs <at> gnu.org
:bug#78686
; Package emacs
.
(Thu, 05 Jun 2025 11:47:01 GMT) Full text and rfc822 format available.Message #23 received at 78686 <at> debbugs.gnu.org (full text, mbox):
From: Gerd Möllmann <gerd.moellmann <at> gmail.com> To: Pip Cet <pipcet <at> protonmail.com> Cc: 78686 <at> debbugs.gnu.org, Gerd Möllmann <gerd <at> gnu.org>, Helmut Eller <eller.helmut <at> gmail.com> Subject: Re: bug#78686: feature/igc: [PATCH] Make charset_table an exact root Date: Thu, 05 Jun 2025 13:46:33 +0200
Pip Cet <pipcet <at> protonmail.com> writes: > "Helmut Eller" <eller.helmut <at> gmail.com> writes: > >> On Wed, Jun 04 2025, Pip Cet wrote: >> >>> As far as I can tell, there's just one significant problem: we have a >>> root covering charset_table_init even though that array is unused in >>> pdumped Emacs sessions. It's not a huge problem, just 180 zero words >>> fixed as Lisp objects, but it is unsatisfying and we should fix it. >> >> It's more like 60kb of zeroes. > > Oops, you're right. All the more reason to remove that root right away. > >>> So let's get rid of that unnecessary root. > > I still think we should do that. Do you think getting rid of the root > would actually make things worse? If you don't, I'll push that change > first. > >>> That leaves two very minor problems: >>> >>> 1. undumped Emacs sessions. I don't think these are worth optimizing >>> for. Do you? >> >> No. But there is also no need to make them gratuitously different from >> normal sessions. > > I agree, but the decision has been made on the master branch to do just > that (static storage pre-pdump; pdump storage post-pdump; xmalloc if > grown), and if we want to revisit it we need to fix it on the master > branch. > >>> 2. Emacs sessions which grow charset_table beyond the 180 entries in the >>> dump. Does this happen? In any case, we do handle the case correctly, >>> leaking memory but giving the user access to as many charsets as they >>> desire. >> >> Defining new charsets is presumably an exotic thing to do. > > Still, I see no reason it wouldn't have worked before, and it can't > possibly work after your patch. > >>> There is also the long-term goal of making it so all objects in the >>> pdump section can be relocated elsewhere, allowing us to free the >>> section. We should do so by tracing objects in the dump precisely so >>> they can be moved out. >> >> As my patch does, in contrast to your idea. > > Your patch does not make the charset table movable or protectable. > >>>> Second, I don't like the practice of using the GC header to create >>>> "shadow types". >>> >>> Noted. That's a good argument for not introducing new ones, but your >>> patch doesn't introduce or delete any; it just copies things back from >>> the dump heap (which is never freed) to an unprotected root in the BSS >>> section or to xmalloc'd memory. >> >> Exactly: my patch copies the charset table to the place where it >> belongs, the space in the BSS section reserved for it. Just like >> builtin symbols and subrs are copied to their space in the BSS section. > > I don't think those two special cases (lispsym has very good reasons for > being special; subrs aren't in the BSS section) warrant introducing a > third special case. > >>>> Maybe, depending on how this charset table business goes. >>> >>> I don't see the dependency. >> >> My motivation may depend on it. > > Let me just state that removing roots, particularly ambiguous ones, is a > very good thing, and you've identified one (the charset table one) where > it is particularly easy to do so. I just don't understand why you don't > want to remove it, but make it necessary instead. But the situation > with kbd_buffer is different: it's currently a large ambiguous root, and > reducing the amount of time we spend scanning it is a good thing, even > if it stays a root. > > And, again, the charset table situation in general should be improved, > on the master branch, and such improvements would trickle down to > feature/igc. > >>> As far as I can tell, we need this patch. It removes the extraneous >>> root. >> >> I like my patch better. > > Okay. Still, the first single-line patch would improve things, right? > If so, there's no reason not to push the patch we presumably agree > improves things (even though we both think it's not ideal) while > discussing the one we disagree about. > > Obviously, just a suggestion. I've CC'd Gerd in case he thinks I'm > being silly and we should just apply your original patch (in that case, > I'd try to verify it doesn't introduce new crashes if charset_table is > grown pre-dump), and fix the bug which I think would introduce such > crashes. > I've read Helmut's patch, and I think I understand the idea, but let me check: The idea is to get rid of the special handling of charset_table in the MPS heap after loading a dump. It is currently a pinned object by having an ambiguous root point to the charset_table object in the MPS heap. Helmut rather copies it back to where it came from, and the root that is used for pinning can go. The charset_table is instead an exact root. Hope that's more or less what happens. Makes sense to me 🤷. I find it a bit simpler, because the pinning root is gone, which I find a good thing. One thing less to understand. One pinned object less. I did not understand everything in the following mails. I'm probably missing things that came up/were discussed somewhere else. I'd rather not sprinkle comments when I don't really know what's going on.
bug-gnu-emacs <at> gnu.org
:bug#78686
; Package emacs
.
(Thu, 05 Jun 2025 16:42:01 GMT) Full text and rfc822 format available.Message #26 received at 78686 <at> debbugs.gnu.org (full text, mbox):
From: Helmut Eller <eller.helmut <at> gmail.com> To: Pip Cet <pipcet <at> protonmail.com> Cc: 78686 <at> debbugs.gnu.org, Gerd Möllmann <gerd <at> gnu.org> Subject: Re: bug#78686: feature/igc: [PATCH] Make charset_table an exact root Date: Thu, 05 Jun 2025 18:41:25 +0200
On Thu, Jun 05 2025, Pip Cet wrote: [...] >>> So let's get rid of that unnecessary root. > > I still think we should do that. Do you think getting rid of the root > would actually make things worse? Probably not. [...] >>> 1. undumped Emacs sessions. I don't think these are worth optimizing >>> for. Do you? >> >> No. But there is also no need to make them gratuitously different from >> normal sessions. > > I agree, but the decision has been made on the master branch to do just > that (static storage pre-pdump; pdump storage post-pdump; xmalloc if > grown), and if we want to revisit it we need to fix it on the master > branch. Sounds like the inmates are running the asylum :-) [...] >> Exactly: my patch copies the charset table to the place where it >> belongs, the space in the BSS section reserved for it. Just like >> builtin symbols and subrs are copied to their space in the BSS section. > > I don't think those two special cases (lispsym has very good reasons for > being special; subrs aren't in the BSS section) warrant introducing a > third special case. So subrs end up in the .subrs section. Sill, subrs are copied back from the dump. [...] >> I like my patch better. > > Okay. Still, the first single-line patch would improve things, right? Yes. Still not as good as my approach. Helmut
bug-gnu-emacs <at> gnu.org
:bug#78686
; Package emacs
.
(Fri, 06 Jun 2025 07:59:02 GMT) Full text and rfc822 format available.Message #29 received at 78686 <at> debbugs.gnu.org (full text, mbox):
From: Pip Cet <pipcet <at> protonmail.com> To: Helmut Eller <eller.helmut <at> gmail.com> Cc: 78686 <at> debbugs.gnu.org, Gerd Möllmann <gerd <at> gnu.org> Subject: Re: bug#78686: feature/igc: [PATCH] Make charset_table an exact root Date: Fri, 06 Jun 2025 07:58:45 +0000
"Helmut Eller" <eller.helmut <at> gmail.com> writes: > On Thu, Jun 05 2025, Pip Cet wrote: > > [...] >>>> So let's get rid of that unnecessary root. >> >> I still think we should do that. Do you think getting rid of the root >> would actually make things worse? > > Probably not. Okay. I'll push that change, then. > [...] >>>> 1. undumped Emacs sessions. I don't think these are worth optimizing >>>> for. Do you? >>> >>> No. But there is also no need to make them gratuitously different from >>> normal sessions. >> >> I agree, but the decision has been made on the master branch to do just >> that (static storage pre-pdump; pdump storage post-pdump; xmalloc if >> grown), and if we want to revisit it we need to fix it on the master >> branch. > > Sounds like the inmates are running the asylum :-) I don't think mental-health analogies are useful here, to be honest. I'd like to propose the following patch for *master*: From 7f2e728623aed6ac04bd129f769b455c27f548dd Mon Sep 17 00:00:00 2001 From: Pip Cet <pipcet <at> protonmail.com> Subject: [PATCH 1/2] Remove charset_table_init (bug#78686) In usual post-pdump Emacs sessions, charset_table is set by pdumper to point to a table of sufficient size contained in the pdump. Stop allocating ~59KB of BSS space for such sessions. * src/charset.c (Fdefine_charset_internal): Grow 'charset_table' to 180 entries right away if this code is ever reached, which it isn't in ordinary post-dump Emacs sessions. (charset_table_init): Remove variable. (syms_of_charset): Initialize 'charset_table' to NULL. --- src/charset.c | 20 +++++--------------- 1 file changed, 5 insertions(+), 15 deletions(-) diff --git a/src/charset.c b/src/charset.c index 3264d75f92e..df669b71e46 100644 --- a/src/charset.c +++ b/src/charset.c @@ -1132,7 +1132,7 @@ DEFUN ("define-charset-internal", Fdefine_charset_internal, int old_size = charset_table_size; ptrdiff_t new_size = old_size; struct charset *new_table = - xpalloc (0, &new_size, 1, + xpalloc (0, &new_size, (new_size == 0) ? 180 : 1, min (INT_MAX, MOST_POSITIVE_FIXNUM), sizeof *charset_table); memcpy (new_table, charset_table, old_size * sizeof *new_table); @@ -2340,18 +2340,6 @@ init_charset_once (void) PDUMPER_REMEMBER_SCALAR (charset_ksc5601); } -/* Allocate an initial charset table that is large enough to handle - Emacs while it is bootstrapping. As of September 2011, the size - needs to be at least 166; make it a bit bigger to allow for future - expansion. - - Don't make the value so small that the table is reallocated during - bootstrapping, as glibc malloc calls larger than just under 64 KiB - during an initial bootstrap wreak havoc after dumping; see the - M_MMAP_THRESHOLD value in alloc.c, plus there is an extra overhead - internal to glibc malloc and perhaps to Emacs malloc debugging. */ -static struct charset charset_table_init[180]; - void syms_of_charset (void) { @@ -2377,8 +2365,10 @@ syms_of_charset (void) staticpro (&Vcharset_hash_table); Vcharset_hash_table = CALLN (Fmake_hash_table, QCtest, Qeq); - charset_table = charset_table_init; - charset_table_size = ARRAYELTS (charset_table_init); + /* charset_table is special-cased in pdumper.c, to make it point into + the dump when it is loaded. */ + charset_table = NULL; + charset_table_size = 0; PDUMPER_REMEMBER_SCALAR (charset_table_size); charset_table_used = 0; PDUMPER_REMEMBER_SCALAR (charset_table_used); -- 2.48.1 This fixes the problem of allocating BSS space. This is the one problem that I've understood your patch to fix. I'm not saying that there aren't other problems it fixes, and I'd like to understand what I'm missing here, but I cannot think of any. >>> Exactly: my patch copies the charset table to the place where it >>> belongs, the space in the BSS section reserved for it. Just like >>> builtin symbols and subrs are copied to their space in the BSS section. >> >> I don't think those two special cases (lispsym has very good reasons for >> being special; subrs aren't in the BSS section) warrant introducing a >> third special case. > > So subrs end up in the .subrs section. Sill, subrs are copied back from > the dump. The only reason for that, that I'm aware of, is precisely that they are initialized, and creating dynamic objects from static ones was considered less efficient than using the static ones directly (this is probably not true). So that reason doesn't apply to charset_table, which is not initialized. The reason for lispsym to be static is that we need the address for Lisp_Object <-> struct Lisp_Symbol * conversion, and we also want iQxxx to be constant because it's used in initializers. Note that lispsym and subrs caused a lot of trouble, some of it very recent, because it's hard to enforce the proper alignment on some systems (DJGPP). Allegedly, it's impossible to do so at all on some other systems, but I don't believe that to be correct; still, we have the code and pay the maintenance price for it. In any case, none of these arguments apply to charset_table. I think it was static because it made unexec easier. Unexec's gone, so let's get rid of it. >>> I like my patch better. >> >> Okay. Still, the first single-line patch would improve things, right? > > Yes. Still not as good as my approach. I'm sorry, but can you try once more to point out what the advantage of your approach is? I simply do not see it, and I'd like to understand. Pip
bug-gnu-emacs <at> gnu.org
:bug#78686
; Package emacs
.
(Fri, 06 Jun 2025 16:36:02 GMT) Full text and rfc822 format available.Message #32 received at 78686 <at> debbugs.gnu.org (full text, mbox):
From: Helmut Eller <eller.helmut <at> gmail.com> To: Pip Cet <pipcet <at> protonmail.com> Cc: 78686 <at> debbugs.gnu.org, Gerd Möllmann <gerd <at> gnu.org> Subject: Re: bug#78686: feature/igc: [PATCH] Make charset_table an exact root Date: Fri, 06 Jun 2025 18:35:05 +0200
On Fri, Jun 06 2025, Pip Cet wrote: [...] >This fixes the problem of allocating BSS space. This is the one problem >that I've understood your patch to fix. I'm not saying that there >aren't other problems it fixes, and I'd like to understand what I'm >missing here, but I cannot think of any. There was nothing wrong with placing the charset table in the BSS section, because the table hardly ever grows. [...] >> Yes. Still not as good as my approach. > >I'm sorry, but can you try once more to point out what the advantage of >your approach is? I simply do not see it, and I'd like to understand. - fewer pinned objects - temacs and normal emacs are more similar Helmut
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.