Package: emacs;
Reported by: Helmut Eller <eller.helmut <at> gmail.com>
Date: Tue, 3 Jun 2025 19:45:08 UTC
Severity: normal
Tags: patch
View this message in rfc822 format
From: Gerd Möllmann <gerd.moellmann <at> gmail.com> To: Pip Cet <pipcet <at> protonmail.com> Cc: 78686 <at> debbugs.gnu.org, Gerd Möllmann <gerd <at> gnu.org>, Helmut Eller <eller.helmut <at> gmail.com> Subject: bug#78686: feature/igc: [PATCH] Make charset_table an exact root Date: Thu, 05 Jun 2025 13:46:33 +0200
Pip Cet <pipcet <at> protonmail.com> writes: > "Helmut Eller" <eller.helmut <at> gmail.com> writes: > >> On Wed, Jun 04 2025, Pip Cet wrote: >> >>> As far as I can tell, there's just one significant problem: we have a >>> root covering charset_table_init even though that array is unused in >>> pdumped Emacs sessions. It's not a huge problem, just 180 zero words >>> fixed as Lisp objects, but it is unsatisfying and we should fix it. >> >> It's more like 60kb of zeroes. > > Oops, you're right. All the more reason to remove that root right away. > >>> So let's get rid of that unnecessary root. > > I still think we should do that. Do you think getting rid of the root > would actually make things worse? If you don't, I'll push that change > first. > >>> That leaves two very minor problems: >>> >>> 1. undumped Emacs sessions. I don't think these are worth optimizing >>> for. Do you? >> >> No. But there is also no need to make them gratuitously different from >> normal sessions. > > I agree, but the decision has been made on the master branch to do just > that (static storage pre-pdump; pdump storage post-pdump; xmalloc if > grown), and if we want to revisit it we need to fix it on the master > branch. > >>> 2. Emacs sessions which grow charset_table beyond the 180 entries in the >>> dump. Does this happen? In any case, we do handle the case correctly, >>> leaking memory but giving the user access to as many charsets as they >>> desire. >> >> Defining new charsets is presumably an exotic thing to do. > > Still, I see no reason it wouldn't have worked before, and it can't > possibly work after your patch. > >>> There is also the long-term goal of making it so all objects in the >>> pdump section can be relocated elsewhere, allowing us to free the >>> section. We should do so by tracing objects in the dump precisely so >>> they can be moved out. >> >> As my patch does, in contrast to your idea. > > Your patch does not make the charset table movable or protectable. > >>>> Second, I don't like the practice of using the GC header to create >>>> "shadow types". >>> >>> Noted. That's a good argument for not introducing new ones, but your >>> patch doesn't introduce or delete any; it just copies things back from >>> the dump heap (which is never freed) to an unprotected root in the BSS >>> section or to xmalloc'd memory. >> >> Exactly: my patch copies the charset table to the place where it >> belongs, the space in the BSS section reserved for it. Just like >> builtin symbols and subrs are copied to their space in the BSS section. > > I don't think those two special cases (lispsym has very good reasons for > being special; subrs aren't in the BSS section) warrant introducing a > third special case. > >>>> Maybe, depending on how this charset table business goes. >>> >>> I don't see the dependency. >> >> My motivation may depend on it. > > Let me just state that removing roots, particularly ambiguous ones, is a > very good thing, and you've identified one (the charset table one) where > it is particularly easy to do so. I just don't understand why you don't > want to remove it, but make it necessary instead. But the situation > with kbd_buffer is different: it's currently a large ambiguous root, and > reducing the amount of time we spend scanning it is a good thing, even > if it stays a root. > > And, again, the charset table situation in general should be improved, > on the master branch, and such improvements would trickle down to > feature/igc. > >>> As far as I can tell, we need this patch. It removes the extraneous >>> root. >> >> I like my patch better. > > Okay. Still, the first single-line patch would improve things, right? > If so, there's no reason not to push the patch we presumably agree > improves things (even though we both think it's not ideal) while > discussing the one we disagree about. > > Obviously, just a suggestion. I've CC'd Gerd in case he thinks I'm > being silly and we should just apply your original patch (in that case, > I'd try to verify it doesn't introduce new crashes if charset_table is > grown pre-dump), and fix the bug which I think would introduce such > crashes. > I've read Helmut's patch, and I think I understand the idea, but let me check: The idea is to get rid of the special handling of charset_table in the MPS heap after loading a dump. It is currently a pinned object by having an ambiguous root point to the charset_table object in the MPS heap. Helmut rather copies it back to where it came from, and the root that is used for pinning can go. The charset_table is instead an exact root. Hope that's more or less what happens. Makes sense to me 🤷. I find it a bit simpler, because the pinning root is gone, which I find a good thing. One thing less to understand. One pinned object less. I did not understand everything in the following mails. I'm probably missing things that came up/were discussed somewhere else. I'd rather not sprinkle comments when I don't really know what's going on.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.