GNU bug report logs - #75318
Lisp_Subr header and layout

Previous Next

Package: emacs;

Reported by: Pip Cet <pipcet <at> protonmail.com>

Date: Fri, 3 Jan 2025 14:31:02 UTC

Severity: normal

To reply to this bug, email your comments to 75318 AT debbugs.gnu.org.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#75318; Package emacs. (Fri, 03 Jan 2025 14:31:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Pip Cet <pipcet <at> protonmail.com>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Fri, 03 Jan 2025 14:31:03 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> protonmail.com>
To: bug-gnu-emacs <at> gnu.org, Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Lisp_Subr header and layout
Date: Fri, 03 Jan 2025 14:29:58 +0000
Lisp_Subr::command_modes aren't marked by GC; the vectorlike_header of
Lisp_Subr is often incorrect; Lisp_Subr fields don't follow the usual
order

This is not an acute bug, but it's problematic code which might cause a
bug if something changes.

struct Lisp_Subr contains a "Lisp_Object command_modes" field, whether
or not Emacs is built with native-comp support.  It's only marked by GC
if Emacs is built with native-comp support.  Putting arbitrary values in
there would cause crashes.

Right now, command_modes is always Qnil in non-native-comp builds, so
this isn't a problem as nil doesn't need to be marked explicitly by GC;
it does, however, waste some space (5 KB on 32-bit machines).

I think making the existence of the command_modes field depend on
nativecomp would be the best solution; if we want to use a command_modes
field for non-nativecomp builds, in the future, we have to change the GC
marking code and the pdumper code, but I really don't see the point of
allocating and GC-marking a field that we know is always nil on the off
chance it might be useful in the future.

After this change, struct Lisp_Subr contains no Lisp_Object fields, and
if we want to add one (most likely because ::doc should be a Lisp_Object
rather than an EMACS_INT) again, we should consider making it an
ordinary PVEC, with the Lisp_Object fields first and the POD C data at
the end.  This would eventually allow us to remove the special-casing of
subrs in the GC code, but this would also require us to use the expected
pvec header word value for built-in subrs; right now, we don't.

I'll prepare patches for those two changes (remove command_modes;
correct pvec headers for subrs); not the third (reorder Lisp_Subr
fields) because it's a nativecomp change, and there may be a small
performance impact.

Pip





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#75318; Package emacs. (Tue, 21 Jan 2025 13:06:01 GMT) Full text and rfc822 format available.

Message #8 received at 75318 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> protonmail.com>
To: 75318 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca
Subject: Re: bug#75318: Lisp_Subr header and layout
Date: Tue, 21 Jan 2025 13:04:48 +0000
"Pip Cet via \"Bug reports for GNU Emacs, the Swiss army knife of text editors\"" <bug-gnu-emacs <at> gnu.org> writes:

> I'll prepare patches for those two changes (remove command_modes;
> correct pvec headers for subrs); not the third (reorder Lisp_Subr
> fields) because it's a nativecomp change, and there may be a small
> performance impact.

The first patch is a simple bug fix. Is this okay?

PVECHEADERSIZE uses "lispsize" and "restsize", but allocate_pseudovector
uses "memlen" and "lisplen".  That's confusing.  Someone was confused,
the header value was wrong but it never caused any trouble.  Still, it's
good to fix this, and it caused trouble for feature/igc.

Ideally, we would fix the confusing situation.  My suggestion is to
always pass all three sizes.  Please let me know if that's desirable.

From 08fd00deb5d4d7ea73d2959577d2b4ab9855f1c2 Mon Sep 17 00:00:00 2001
From: Pip Cet <pipcet <at> protonmail.com>
Subject: [PATCH 1/2] Correct PVEC header for main_thread (bug#75318)

* src/thread.c (main_thread): Pass correct restsize to PVECHEADERSIZE.
---
 src/thread.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/thread.c b/src/thread.c
index 5610f8be0dd..5539fae15ab 100644
--- a/src/thread.c
+++ b/src/thread.c
@@ -50,7 +50,9 @@ static_assert (GCALIGNED (union aligned_thread_state));
       .header.size = PVECHEADERSIZE (PVEC_THREAD,
 				     PSEUDOVECSIZE (struct thread_state,
 						    event_object),
-				     VECSIZE (struct thread_state)),
+				     VECSIZE (struct thread_state) -
+				     PSEUDOVECSIZE (struct thread_state,
+						    event_object)),
       .m_last_thing_searched = LISPSYM_INITIALLY (Qnil),
       .m_saved_last_thing_searched = LISPSYM_INITIALLY (Qnil),
       .name = LISPSYM_INITIALLY (Qnil),
-- 
2.47.1






Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#75318; Package emacs. (Tue, 21 Jan 2025 23:12:02 GMT) Full text and rfc822 format available.

Message #11 received at 75318 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> protonmail.com>
To: 75318 <at> debbugs.gnu.org
Cc: Po Lu <luangruo <at> yahoo.com>, monnier <at> iro.umontreal.ca
Subject: Re: bug#75318: Lisp_Subr header and layout
Date: Tue, 21 Jan 2025 23:11:13 +0000
"Pip Cet via \"Bug reports for GNU Emacs, the Swiss army knife of text editors\"" <bug-gnu-emacs <at> gnu.org> writes:

> I'll prepare patches for those two changes (remove command_modes;
> correct pvec headers for subrs); not the third (reorder Lisp_Subr
> fields) because it's a nativecomp change, and there may be a small
> performance impact.

The second patch fixes Lisp_Subr initialization: the old code set the
header to an incorrect value first (PVEC_SUBR was in the type field, but
the PSEUDOVECTOR_FLAG was not set, making it look like a huge ordinary
vector).  Then defsubr wrote to the Lisp_Subr structure to fix that, by
calling XSETPVECTYPE: this didn't affect the type field, but did set the
PSEUDOVECTOR_FLAG.  The size fields always indicated a size of zero.

But three static Lisp_Subr structures weren't defined via defsubr, and
did set the PSEUDOVECTOR_FLAG initially, but otherwise contained
verbatim copies of the DEFUN macro's initialization code.

Ideally, we'd turn these three into plain DEFUNs, with docstrings
indicating they're internal functions.  I'd also like to turn the
underscores into ordinary dashes in the Lisp function names if we do
that.  If we don't, we might at least want to move them to the .subrs
section, or move them out of function scope (function-scope static
variables aren't easy to debug with gdb).

This patch splits up the macros so they're reusable, initializes using
LISPSYM_INITIALLY rather than lisp_h_Qnil, ensures that all Lisp_Object
fields are initialized, uses explicit field-name initializers, and uses
c-mode default indentation (this is incorrect in one place, but it's
hardly noticeable).

This change alone is not sufficient to make the .subrs section read-only.
See store_function_docstring and pdumper.

Like the first one, this second patch fixes things that appear to be
unintended, but that never caused any bugs on the master branch.  The
oddities did cause problems when playing with feature/igc and other GC
stuff, though.

From a67f72e44f5cc9b4af48f0ac3503d3286b0c91a0 Mon Sep 17 00:00:00 2001
From: Pip Cet <pipcet <at> protonmail.com>
Subject: [PATCH 2/2] Fix Lisp_Subr initialization peculiarities (bug#75318)

* src/lisp.h (SUBR_HEADER, SUBR_INITIALIZER)
(SUBR_NATIVECOMP_INITIALIZER): New macros.
(DEFUN): Use them.  Initialize the header directly to a correct value.
* src/alloc.c (Swatch_gc_cons_threshold, Swatch_gc_cons_percentage):
* src/sfntfont.c (Scompare_font_entities): Use SUBR_INITIALIZER.
* src/lread.c (defsubr): Don't call XSETPVECTYPE to initialize
pseudovector flag.
---
 src/alloc.c    | 10 ++++------
 src/lisp.h     | 29 +++++++++++++++++++++++++----
 src/lread.c    |  1 -
 src/sfntfont.c | 10 +++-------
 4 files changed, 32 insertions(+), 18 deletions(-)

diff --git a/src/alloc.c b/src/alloc.c
index c4e2ff52015..3425f913362 100644
--- a/src/alloc.c
+++ b/src/alloc.c
@@ -8208,16 +8208,14 @@ syms_of_alloc (void)
   Lisp_Object watcher;
 
   static union Aligned_Lisp_Subr Swatch_gc_cons_threshold =
-     {{{ PSEUDOVECTOR_FLAG | (PVEC_SUBR << PSEUDOVECTOR_AREA_BITS) },
-       { .a4 = watch_gc_cons_threshold },
-       4, 4, "watch_gc_cons_threshold", {0}, lisp_h_Qnil}};
+    SUBR_INITIALIZER ("watch_gc_cons_threshold", watch_gc_cons_threshold,
+		      4, 4, NULL);
   XSETSUBR (watcher, &Swatch_gc_cons_threshold.s);
   Fadd_variable_watcher (Qgc_cons_threshold, watcher);
 
   static union Aligned_Lisp_Subr Swatch_gc_cons_percentage =
-     {{{ PSEUDOVECTOR_FLAG | (PVEC_SUBR << PSEUDOVECTOR_AREA_BITS) },
-       { .a4 = watch_gc_cons_percentage },
-       4, 4, "watch_gc_cons_percentage", {0}, lisp_h_Qnil}};
+    SUBR_INITIALIZER ("watch_gc_cons_percentage", watch_gc_cons_percentage,
+		      4, 4, NULL);
   XSETSUBR (watcher, &Swatch_gc_cons_percentage.s);
   Fadd_variable_watcher (Qgc_cons_percentage, watcher);
   DEFSYM (Qalloc, "alloc");
diff --git a/src/lisp.h b/src/lisp.h
index 8b870119315..e9c975b81e8 100644
--- a/src/lisp.h
+++ b/src/lisp.h
@@ -3481,15 +3481,36 @@ CHECK_SUBR (Lisp_Object x)
     A null string means call interactively with no arguments.
  `doc' is documentation for the user.  */
 
+#define SUBR_HEADER { .size =						\
+      PVECHEADERSIZE (PVEC_SUBR, 0, VECSIZE (union Aligned_Lisp_Subr)) }
+
+#ifdef HAVE_NATIVE_COMP
+#define SUBR_NATIVECOMP_INITIALIZER			\
+    .native_comp_u = LISPSYM_INITIALLY (Qnil),		\
+    .lambda_list = LISPSYM_INITIALLY (Qnil),		\
+    .type = LISPSYM_INITIALLY (Qnil),
+#else
+#define SUBR_NATIVECOMP_INITIALIZER
+#endif
+
+#define SUBR_INITIALIZER(lname, fnname, minargs, maxargs, intspec_value) \
+  {{ .header = SUBR_HEADER,						\
+    .function = { .a ## maxargs = fnname },				\
+    .min_args = minargs,						\
+    .max_args = maxargs,						\
+    .symbol_name = lname,						\
+    .intspec = { .string = intspec_value },				\
+    .command_modes = LISPSYM_INITIALLY (Qnil),				\
+    SUBR_NATIVECOMP_INITIALIZER						\
+    }}
+
 /* This version of DEFUN declares a function prototype with the right
    arguments, so we can catch errors with maxargs at compile-time.  */
 #define DEFUN(lname, fnname, sname, minargs, maxargs, intspec, doc) \
   SUBR_SECTION_ATTRIBUTE                                            \
   static union Aligned_Lisp_Subr sname =                            \
-     {{{ PVEC_SUBR << PSEUDOVECTOR_AREA_BITS },			    \
-       { .a ## maxargs = fnname },				    \
-       minargs, maxargs, lname, {intspec}, lisp_h_Qnil}};	    \
-   Lisp_Object fnname
+    SUBR_INITIALIZER(lname, fnname, minargs, maxargs, intspec); \
+    Lisp_Object fnname
 
 /* defsubr (Sname);
    is how we define the symbol for function `name' at start-up time.  */
diff --git a/src/lread.c b/src/lread.c
index e5d1ba0e6bb..279d384eed6 100644
--- a/src/lread.c
+++ b/src/lread.c
@@ -5475,7 +5475,6 @@ defsubr (union Aligned_Lisp_Subr *aname)
   struct Lisp_Subr *sname = &aname->s;
   Lisp_Object sym, tem;
   sym = intern_c_string (sname->symbol_name);
-  XSETPVECTYPE (sname, PVEC_SUBR);
   XSETSUBR (tem, sname);
   set_symbol_function (sym, tem);
 #ifdef HAVE_NATIVE_COMP
diff --git a/src/sfntfont.c b/src/sfntfont.c
index 5752ff81c1c..b7a8b7c2671 100644
--- a/src/sfntfont.c
+++ b/src/sfntfont.c
@@ -1985,13 +1985,9 @@ sfntfont_compare_font_entities (Lisp_Object a, Lisp_Object b)
    fields are set within the first than in the second.  */
 
 static union Aligned_Lisp_Subr Scompare_font_entities =
-  {
-    {
-      { PSEUDOVECTOR_FLAG | (PVEC_SUBR << PSEUDOVECTOR_AREA_BITS), },
-      { .a2 = sfntfont_compare_font_entities, },
-      2, 2, "sfntfont_compare_font_entities", {0}, lisp_h_Qnil,
-    },
-  };
+  SUBR_INITIALIZER ("sfntfont_compare_font_entities",
+		    sfntfont_compare_font_entities,
+		    2, 2, NULL);
 
 /* Return a list of font-entities matching the specified
    FONT_SPEC.  */
-- 
2.47.1





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#75318; Package emacs. (Tue, 21 Jan 2025 23:48:02 GMT) Full text and rfc822 format available.

Message #14 received at 75318 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> protonmail.com>
To: 75318 <at> debbugs.gnu.org
Cc: monnier <at> iro.umontreal.ca
Subject: Re: bug#75318: Lisp_Subr header and layout
Date: Tue, 21 Jan 2025 23:46:45 +0000
Pip Cet <pipcet <at> protonmail.com> writes:

> "Pip Cet via \"Bug reports for GNU Emacs, the Swiss army knife of text editors\"" <bug-gnu-emacs <at> gnu.org> writes:
>
>> I'll prepare patches for those two changes (remove command_modes;
>> correct pvec headers for subrs); not the third (reorder Lisp_Subr
>> fields) because it's a nativecomp change, and there may be a small
>> performance impact.
>
> The second patch fixes Lisp_Subr initialization: the old code set the

This third, small patch moves the command_modes field into the
HAVE_NATIVE_COMP #ifdef.  It adjusts Fcommand_modes to return Qnil for
builtin subrs in the !HAVE_NATIVE_COMP case, and adjusts pdumper, fixing
a latent bug: it used dump_field_emacs_ptr instead of dump_field_lv.  As
that might have been tricky to debug, I briefly confirmed that no
similar bugs remain by compiling with this and --enable-check-lisp-object-type:

#define CHECK_POINTER(p) ((0) ? (void)(*(p)) : (void) (0))
#define dump_field_emacs_ptr(ctx, out, in_start, in_field) (CHECK_POINTER(*(in_field)), dump_field_emacs_ptr(ctx, out, in_start, in_field))

(similarly for other functions using a void * which itself must point to
a pointer, not a Lisp_Object).

This change isn't about saving memory; on my system, it doesn't, since
GCC aligns all Lisp Subrs to 32-byte boundaries on my system.

From 7b1384a7e71f783f3946e7aaad092e8095386387 Mon Sep 17 00:00:00 2001
From: Pip Cet <pipcet <at> protonmail.com>
Subject: [PATCH] Remove unused command_modes from Lisp_Subr structure
 (bug#75318)

This field isn't marked by the garbage collector in !HAVE_NATIVE_COMP
builds, so using it in those would have led to very tricky bugs.

* src/data.c (Fcommand_modes) [!HAVE_NATIVE_COMP]: Return Qnil for
built-in subrs.
* src/lisp.h (struct Lisp_Subr) [!HAVE_NATIVE_COMP]:
(SUBR_NATIVECOMP_INITIALIZER):
(SUBR_INITIALIZER): Remove command_modes field.
* src/pdumper.c (dump_subr): Don't dump command_modes field unless
necessary.  When it is necessary, use dump_field_lv, not
dump_field_emacs_ptr.
---
 src/data.c    | 2 ++
 src/lisp.h    | 4 ++--
 src/pdumper.c | 3 +--
 3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/src/data.c b/src/data.c
index dcaa5756ebe..bb7238065f5 100644
--- a/src/data.c
+++ b/src/data.c
@@ -1234,7 +1234,9 @@ DEFUN ("command-modes", Fcommand_modes, Scommand_modes, 1, 1, 0,
 
   if (SUBRP (fun))
     {
+#ifdef HAVE_NATIVE_COMP
       return XSUBR (fun)->command_modes;
+#endif
     }
   else if (CLOSUREP (fun))
     {
diff --git a/src/lisp.h b/src/lisp.h
index 9c31630cc1a..46e940dad09 100644
--- a/src/lisp.h
+++ b/src/lisp.h
@@ -2226,12 +2226,12 @@ CHAR_TABLE_SET (Lisp_Object ct, int idx, Lisp_Object val)
       const char *string;
       Lisp_Object native;
     } intspec;
-    Lisp_Object command_modes;
     /* Positive values: offset into etc/DOC.  Negative values: one's
        complement of index into the native comp unit's vector of
        documentation strings.  */
     EMACS_INT doc;
 #ifdef HAVE_NATIVE_COMP
+    Lisp_Object command_modes;
     Lisp_Object native_comp_u;
     char *native_c_name;
     Lisp_Object lambda_list;
@@ -3486,6 +3486,7 @@ #define SUBR_HEADER { .size =						\
 
 #ifdef HAVE_NATIVE_COMP
 #define SUBR_NATIVECOMP_INITIALIZER			\
+    .command_modes = LISPSYM_INITIALLY (Qnil),		\
     .native_comp_u = LISPSYM_INITIALLY (Qnil),		\
     .lambda_list = LISPSYM_INITIALLY (Qnil),		\
     .type = LISPSYM_INITIALLY (Qnil),
@@ -3500,7 +3501,6 @@ #define SUBR_INITIALIZER(lname, fnname, minargs, maxargs, intspec_value) \
     .max_args = maxargs,						\
     .symbol_name = lname,						\
     .intspec = { .string = intspec_value },				\
-    .command_modes = LISPSYM_INITIALLY (Qnil),				\
     SUBR_NATIVECOMP_INITIALIZER						\
     }}
 
diff --git a/src/pdumper.c b/src/pdumper.c
index e83c7bcf9a1..64afbdd3065 100644
--- a/src/pdumper.c
+++ b/src/pdumper.c
@@ -2992,16 +2992,15 @@ dump_subr (struct dump_context *ctx, const struct Lisp_Subr *subr)
                              COLD_OP_NATIVE_SUBR,
 			     make_lisp_ptr ((void *) subr, Lisp_Vectorlike));
       dump_field_lv (ctx, &out, subr, &subr->intspec.native, WEIGHT_NORMAL);
-      dump_field_lv (ctx, &out, subr, &subr->command_modes, WEIGHT_NORMAL);
     }
   else
     {
       dump_field_emacs_ptr (ctx, &out, subr, &subr->symbol_name);
       dump_field_emacs_ptr (ctx, &out, subr, &subr->intspec.string);
-      dump_field_emacs_ptr (ctx, &out, subr, &subr->command_modes);
     }
   DUMP_FIELD_COPY (&out, subr, doc);
 #ifdef HAVE_NATIVE_COMP
+  dump_field_lv (ctx, &out, subr, &subr->command_modes, WEIGHT_NORMAL);
   dump_field_lv (ctx, &out, subr, &subr->native_comp_u, WEIGHT_NORMAL);
   if (!NILP (subr->native_comp_u))
     dump_field_fixup_later (ctx, &out, subr, &subr->native_c_name);
-- 
2.47.1





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#75318; Package emacs. (Wed, 22 Jan 2025 12:54:01 GMT) Full text and rfc822 format available.

Message #17 received at 75318 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Pip Cet <pipcet <at> protonmail.com>
Cc: Po Lu <luangruo <at> yahoo.com>, 75318 <at> debbugs.gnu.org
Subject: Re: bug#75318: Lisp_Subr header and layout
Date: Wed, 22 Jan 2025 07:53:31 -0500
> Ideally, we'd turn these three into plain DEFUNs, with docstrings
> indicating they're internal functions.  I'd also like to turn the
> underscores into ordinary dashes in the Lisp function names if we
> do that.

+1
Does anyone know why they're not plain DEFUNs?


        Stefan





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#75318; Package emacs. (Wed, 22 Jan 2025 13:19:01 GMT) Full text and rfc822 format available.

Message #20 received at 75318 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Pip Cet <pipcet <at> protonmail.com>
Cc: luangruo <at> yahoo.com, monnier <at> iro.umontreal.ca, 75318 <at> debbugs.gnu.org
Subject: Re: bug#75318: Lisp_Subr header and layout
Date: Wed, 22 Jan 2025 15:17:54 +0200
> Cc: Po Lu <luangruo <at> yahoo.com>, monnier <at> iro.umontreal.ca
> Date: Tue, 21 Jan 2025 23:11:13 +0000
> From:  Pip Cet via "Bug reports for GNU Emacs,
>  the Swiss army knife of text editors" <bug-gnu-emacs <at> gnu.org>
> 
> This patch splits up the macros so they're reusable, initializes using
> LISPSYM_INITIALLY rather than lisp_h_Qnil, ensures that all Lisp_Object
> fields are initialized, uses explicit field-name initializers, and uses
> c-mode default indentation (this is incorrect in one place, but it's
> hardly noticeable).

While I realize that this is a battle lost long ago, I cannot resist
lamenting the once-simple way many lisp.h macros and inline functions
were defined, which is all but gone now, replaced by the never-ending
rabbit hole of macros that call macros that call macros that call...
Add to this the fact that each macro/inline function has several
different definitions, whose conditions are sometimes so complicated
that the only reasonable way to quickly understand what definition a
particular build uses is to fire up good-old GDB and evaluate the
macros there, and you get the picture.

For example, in the below:

>    static union Aligned_Lisp_Subr Swatch_gc_cons_percentage =
> -     {{{ PSEUDOVECTOR_FLAG | (PVEC_SUBR << PSEUDOVECTOR_AREA_BITS) },
> -       { .a4 = watch_gc_cons_percentage },
> -       4, 4, "watch_gc_cons_percentage", {0}, lisp_h_Qnil}};

what is lisp_h_Qnil is relatively easy to understand, whereas the
replacement, viz.:

> +#define SUBR_INITIALIZER(lname, fnname, minargs, maxargs, intspec_value) \
> +  {{ .header = SUBR_HEADER,						\
> +    .function = { .a ## maxargs = fnname },				\
> +    .min_args = minargs,						\
> +    .max_args = maxargs,						\
> +    .symbol_name = lname,						\
> +    .intspec = { .string = intspec_value },				\
> +    .command_modes = LISPSYM_INITIALLY (Qnil),			\

requires one to look up LISPSYM_INITIALLY, which is

  #define LISPSYM_INITIALLY(name) \
    TAG_PTR_INITIALLY (Lisp_Symbol, (intptr_t) ((i##name) * sizeof *lispsym))

and then you need to go down the rabbit hole:

  TAG_PTR_INITIALLY
   => LISP_INITIALLY
       => LISP_WORD_TAG
           => VALBITS

Wonderful, right?

(No, I don't expect you to change anything in your patch, just realize
the plight.)

> +#ifdef HAVE_NATIVE_COMP
> +#define SUBR_NATIVECOMP_INITIALIZER			\
> +    .native_comp_u = LISPSYM_INITIALLY (Qnil),		\
> +    .lambda_list = LISPSYM_INITIALLY (Qnil),		\
> +    .type = LISPSYM_INITIALLY (Qnil),
> +#else
> +#define SUBR_NATIVECOMP_INITIALIZER
> +#endif
> +
> +#define SUBR_INITIALIZER(lname, fnname, minargs, maxargs, intspec_value) \
> +  {{ .header = SUBR_HEADER,						\
> +    .function = { .a ## maxargs = fnname },				\
> +    .min_args = minargs,						\
> +    .max_args = maxargs,						\
> +    .symbol_name = lname,						\
> +    .intspec = { .string = intspec_value },				\
> +    .command_modes = LISPSYM_INITIALLY (Qnil),				\
> +    SUBR_NATIVECOMP_INITIALIZER						\
> +    }}
> +

To make the depth of the proverbial rabbit hole at least to some
extent smaller here, could we please refrain from defining
SUBR_NATIVECOMP_INITIALIZER, and instead include the necessary stuff
directly in SUBR_INITIALIZER, with an #ifdef?  Or do you envision
enough other users of SUBR_NATIVECOMP_INITIALIZER to justify yet
another macro?

Thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#75318; Package emacs. (Wed, 22 Jan 2025 13:32:02 GMT) Full text and rfc822 format available.

Message #23 received at 75318 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Pip Cet <pipcet <at> protonmail.com>,
 Andrea Corallo <acorallo <at> gnu.org>
Cc: monnier <at> iro.umontreal.ca, 75318 <at> debbugs.gnu.org
Subject: Re: bug#75318: Lisp_Subr header and layout
Date: Wed, 22 Jan 2025 15:31:00 +0200
> Cc: monnier <at> iro.umontreal.ca
> Date: Tue, 21 Jan 2025 23:46:45 +0000
> From:  Pip Cet via "Bug reports for GNU Emacs,
>  the Swiss army knife of text editors" <bug-gnu-emacs <at> gnu.org>
> 
> diff --git a/src/lisp.h b/src/lisp.h
> index 9c31630cc1a..46e940dad09 100644
> --- a/src/lisp.h
> +++ b/src/lisp.h
> @@ -2226,12 +2226,12 @@ CHAR_TABLE_SET (Lisp_Object ct, int idx, Lisp_Object val)
>        const char *string;
>        Lisp_Object native;
>      } intspec;
> -    Lisp_Object command_modes;
>      /* Positive values: offset into etc/DOC.  Negative values: one's
>         complement of index into the native comp unit's vector of
>         documentation strings.  */
>      EMACS_INT doc;
>  #ifdef HAVE_NATIVE_COMP
> +    Lisp_Object command_modes;
>      Lisp_Object native_comp_u;
>      char *native_c_name;
>      Lisp_Object lambda_list;

IME, having different definitions of data structures under different
build conditions is a problematic solution: it makes it harder to make
safe changes that don't break some build.  Many/most people don't try
to build all of the configurations Emacs supports when they make
changes, so it is easy to write code which references a struct member
that doesn't exist in some build configuration, and thus break it, and
miss the fact that build is broken by not trying it.  (It is
unrealistic to expect everyone who has write access to the repository
to test-build all the possible configurations Emacs supports as a
prerequisite for installing a change.)

So: do we really need to remove command_modes from the struct in a
build without native compilation?  What bad things will happen if we
leave it there?  If we would like to be a bit more efficient by
avoiding a few function calls, like here:

> --- a/src/pdumper.c
> +++ b/src/pdumper.c
> @@ -2992,16 +2992,15 @@ dump_subr (struct dump_context *ctx, const struct Lisp_Subr *subr)
>                               COLD_OP_NATIVE_SUBR,
>  			     make_lisp_ptr ((void *) subr, Lisp_Vectorlike));
>        dump_field_lv (ctx, &out, subr, &subr->intspec.native, WEIGHT_NORMAL);
> -      dump_field_lv (ctx, &out, subr, &subr->command_modes, WEIGHT_NORMAL);
>      }
>    else
>      {
>        dump_field_emacs_ptr (ctx, &out, subr, &subr->symbol_name);
>        dump_field_emacs_ptr (ctx, &out, subr, &subr->intspec.string);
> -      dump_field_emacs_ptr (ctx, &out, subr, &subr->command_modes);
>      }

then it should be okay to #ifdef away those calls when
!HAVE_NATIVE_COMP, as that doesn't run the risks mentioned above.

I also would like to hear from Andrea about this patch.

Thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#75318; Package emacs. (Wed, 22 Jan 2025 14:26:02 GMT) Full text and rfc822 format available.

Message #26 received at 75318 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> protonmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: luangruo <at> yahoo.com, monnier <at> iro.umontreal.ca, 75318 <at> debbugs.gnu.org
Subject: Re: bug#75318: Lisp_Subr header and layout
Date: Wed, 22 Jan 2025 14:25:22 +0000
"Eli Zaretskii" <eliz <at> gnu.org> writes:

>> Cc: Po Lu <luangruo <at> yahoo.com>, monnier <at> iro.umontreal.ca
>> Date: Tue, 21 Jan 2025 23:11:13 +0000
>> From:  Pip Cet via "Bug reports for GNU Emacs,
>>  the Swiss army knife of text editors" <bug-gnu-emacs <at> gnu.org>
>>
>> This patch splits up the macros so they're reusable, initializes using
>> LISPSYM_INITIALLY rather than lisp_h_Qnil, ensures that all Lisp_Object
>> fields are initialized, uses explicit field-name initializers, and uses
>> c-mode default indentation (this is incorrect in one place, but it's
>> hardly noticeable).
>
> While I realize that this is a battle lost long ago, I cannot resist
> lamenting the once-simple way many lisp.h macros and inline functions
> were defined, which is all but gone now, replaced by the never-ending
> rabbit hole of macros that call macros that call macros that call...
> Add to this the fact that each macro/inline function has several
> different definitions, whose conditions are sometimes so complicated
> that the only reasonable way to quickly understand what definition a
> particular build uses is to fire up good-old GDB and evaluate the
> macros there, and you get the picture.

I do.  I agree completely.  I don't know how to fix it.

XIL is necessary for CHECK_LISP_OBJECT_TYPE (and there seems to be a
consensus we can't unconditionally enable that one yet), but I don't see
why we can't make an exception for Qnil and make the simple definition
of lisp_h_Qnil apply to this specific symbol in all circumstances:

#define Qnil XIL(0)

If we used that instead, it would make LISPSYM_INITIALLY slightly harder
to find because fewer places would use it.

> For example, in the below:
>
>>    static union Aligned_Lisp_Subr Swatch_gc_cons_percentage =
>> -     {{{ PSEUDOVECTOR_FLAG | (PVEC_SUBR << PSEUDOVECTOR_AREA_BITS) },
>> -       { .a4 = watch_gc_cons_percentage },
>> -       4, 4, "watch_gc_cons_percentage", {0}, lisp_h_Qnil}};
>
> what is lisp_h_Qnil is relatively easy to understand, whereas the
> replacement, viz.:

>> +#define SUBR_INITIALIZER(lname, fnname, minargs, maxargs, intspec_value) \
>> +  {{ .header = SUBR_HEADER,						\
>> +    .function = { .a ## maxargs = fnname },				\
>> +    .min_args = minargs,						\
>> +    .max_args = maxargs,						\
>> +    .symbol_name = lname,						\
>> +    .intspec = { .string = intspec_value },				\
>> +    .command_modes = LISPSYM_INITIALLY (Qnil),			\
>
> requires one to look up LISPSYM_INITIALLY, which is

If you're saying that LISPSYM_INITIALLY is bad, I agree completely.

If you're saying a single LISPSYM_INITIALLY in DEFUN, which is
well-documented to the point that users do not have to read the
definition, is worse than three of them in C files, I don't.

>   #define LISPSYM_INITIALLY(name) \
>     TAG_PTR_INITIALLY (Lisp_Symbol, (intptr_t) ((i##name) * sizeof *lispsym))
>
> and then you need to go down the rabbit hole:
>
>   TAG_PTR_INITIALLY
>    => LISP_INITIALLY
>        => LISP_WORD_TAG
>            => VALBITS
>
> Wonderful, right?

I had forgotten how hard it was to get used to this.

I think we should definitely avoid making the situation worse, even if
that means we can't do fun stuff.

> (No, I don't expect you to change anything in your patch, just realize
> the plight.)

Thank you, in any case.

>> +#ifdef HAVE_NATIVE_COMP
>> +#define SUBR_NATIVECOMP_INITIALIZER			\
>> +    .native_comp_u = LISPSYM_INITIALLY (Qnil),		\
>> +    .lambda_list = LISPSYM_INITIALLY (Qnil),		\
>> +    .type = LISPSYM_INITIALLY (Qnil),
>> +#else
>> +#define SUBR_NATIVECOMP_INITIALIZER
>> +#endif
>> +
>> +#define SUBR_INITIALIZER(lname, fnname, minargs, maxargs, intspec_value) \
>> +  {{ .header = SUBR_HEADER,						\
>> +    .function = { .a ## maxargs = fnname },				\
>> +    .min_args = minargs,						\
>> +    .max_args = maxargs,						\
>> +    .symbol_name = lname,						\
>> +    .intspec = { .string = intspec_value },				\
>> +    .command_modes = LISPSYM_INITIALLY (Qnil),				\
>> +    SUBR_NATIVECOMP_INITIALIZER						\
>> +    }}
>> +
>
> To make the depth of the proverbial rabbit hole at least to some
> extent smaller here, could we please refrain from defining
> SUBR_NATIVECOMP_INITIALIZER, and instead include the necessary stuff
> directly in SUBR_INITIALIZER, with an #ifdef?  Or do you envision
> enough other users of SUBR_NATIVECOMP_INITIALIZER to justify yet
> another macro?

Do you mean

#ifdef HAVE_NATIVE_COMP
#define SUBR_INITIALIZER(lname, fnname, minargs, maxargs, intspec_value) \
  {{ .header = SUBR_HEADER,						\
    .function = { .a ## maxargs = fnname },				\
    .min_args = minargs,						\
    .max_args = maxargs,						\
    .symbol_name = lname,						\
    .intspec = { .string = intspec_value },				\
    .command_modes = LISPSYM_INITIALLY (Qnil),				\
    .native_comp_u = LISPSYM_INITIALLY (Qnil),		\
    .lambda_list = LISPSYM_INITIALLY (Qnil),		\
    .type = LISPSYM_INITIALLY (Qnil),
    }}
#else
#define SUBR_INITIALIZER(lname, fnname, minargs, maxargs, intspec_value) \
  {{ .header = SUBR_HEADER,						\
    .function = { .a ## maxargs = fnname },				\
    .min_args = minargs,						\
    .max_args = maxargs,						\
    .symbol_name = lname,						\
    .intspec = { .string = intspec_value },				\
    }}
#endif

(and, I assume, the same thing for SUBR_HEADER?)

Avoiding the repetition appears to require a second macro no matter what
I do.

While *I* realize *that* battle was lost long ago, I still think
nativecomp subrs should have had their own PVEC type.

It's too late for that now; let's not revisit a discussion that I
remember as very fair, and which definitely concluded that we'd share
the PVEC type.

I'm thinking (very vaguely; no patch expected in 2025) about proposing
to reduce all Lisp_Objects in the subr object to a simple plist, though,
or a vector if we have to.  I think 32 bytes is a very good size for a
subr to have, and that's header, plist (Lisp_Object so first),
minargs+maxargs+4-byte gap, function pointer union.

It's easier to GC-trace an ordinary PVEC with lisplen=1 and restlen=2
without worrying about whether it is a subr, I tihnk.

Pip





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#75318; Package emacs. (Wed, 22 Jan 2025 15:29:02 GMT) Full text and rfc822 format available.

Message #29 received at 75318 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Pip Cet <pipcet <at> protonmail.com>
Cc: luangruo <at> yahoo.com, monnier <at> iro.umontreal.ca, 75318 <at> debbugs.gnu.org
Subject: Re: bug#75318: Lisp_Subr header and layout
Date: Wed, 22 Jan 2025 17:27:57 +0200
> Date: Wed, 22 Jan 2025 14:25:22 +0000
> From: Pip Cet <pipcet <at> protonmail.com>
> Cc: 75318 <at> debbugs.gnu.org, luangruo <at> yahoo.com, monnier <at> iro.umontreal.ca
> 
> > For example, in the below:
> >
> >>    static union Aligned_Lisp_Subr Swatch_gc_cons_percentage =
> >> -     {{{ PSEUDOVECTOR_FLAG | (PVEC_SUBR << PSEUDOVECTOR_AREA_BITS) },
> >> -       { .a4 = watch_gc_cons_percentage },
> >> -       4, 4, "watch_gc_cons_percentage", {0}, lisp_h_Qnil}};
> >
> > what is lisp_h_Qnil is relatively easy to understand, whereas the
> > replacement, viz.:
> 
> >> +#define SUBR_INITIALIZER(lname, fnname, minargs, maxargs, intspec_value) \
> >> +  {{ .header = SUBR_HEADER,						\
> >> +    .function = { .a ## maxargs = fnname },				\
> >> +    .min_args = minargs,						\
> >> +    .max_args = maxargs,						\
> >> +    .symbol_name = lname,						\
> >> +    .intspec = { .string = intspec_value },				\
> >> +    .command_modes = LISPSYM_INITIALLY (Qnil),			\
> >
> > requires one to look up LISPSYM_INITIALLY, which is
> 
> If you're saying that LISPSYM_INITIALLY is bad, I agree completely.

It isn't bad, it's that replacing lisp_h_Qnil with it makes the code
harder to understand (without going crazy with M-.).

As for LISPSYM_INITIALLY, its name is the main source of confusion:
the INITIALLY part all but prevents any attempt to get a clear idea
what the macro does, without actually looking up all the way to the
basics.  Unless on remembers that by heart, of course (I don't).

> > To make the depth of the proverbial rabbit hole at least to some
> > extent smaller here, could we please refrain from defining
> > SUBR_NATIVECOMP_INITIALIZER, and instead include the necessary stuff
> > directly in SUBR_INITIALIZER, with an #ifdef?  Or do you envision
> > enough other users of SUBR_NATIVECOMP_INITIALIZER to justify yet
> > another macro?
> 
> Do you mean
> 
> #ifdef HAVE_NATIVE_COMP
> #define SUBR_INITIALIZER(lname, fnname, minargs, maxargs, intspec_value) \
>   {{ .header = SUBR_HEADER,						\
>     .function = { .a ## maxargs = fnname },				\
>     .min_args = minargs,						\
>     .max_args = maxargs,						\
>     .symbol_name = lname,						\
>     .intspec = { .string = intspec_value },				\
>     .command_modes = LISPSYM_INITIALLY (Qnil),				\
>     .native_comp_u = LISPSYM_INITIALLY (Qnil),		\
>     .lambda_list = LISPSYM_INITIALLY (Qnil),		\
>     .type = LISPSYM_INITIALLY (Qnil),
>     }}
> #else
> #define SUBR_INITIALIZER(lname, fnname, minargs, maxargs, intspec_value) \
>   {{ .header = SUBR_HEADER,						\
>     .function = { .a ## maxargs = fnname },				\
>     .min_args = minargs,						\
>     .max_args = maxargs,						\
>     .symbol_name = lname,						\
>     .intspec = { .string = intspec_value },				\
>     }}
> #endif
> 
> (and, I assume, the same thing for SUBR_HEADER?)

No.  I meant to use #ifdef inside the macro.  But if that is
impossible in a C macro, I'd even support making this an inline
function.

Anyway, this is a minor comment, so if it's hard to do better, let's
not go overboard.  I'll survive.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#75318; Package emacs. (Wed, 22 Jan 2025 15:31:02 GMT) Full text and rfc822 format available.

Message #32 received at 75318 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> protonmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: Andrea Corallo <acorallo <at> gnu.org>, monnier <at> iro.umontreal.ca,
 75318 <at> debbugs.gnu.org
Subject: Re: bug#75318: Lisp_Subr header and layout
Date: Wed, 22 Jan 2025 15:30:24 +0000
"Eli Zaretskii" <eliz <at> gnu.org> writes:

>> Cc: monnier <at> iro.umontreal.ca
>> Date: Tue, 21 Jan 2025 23:46:45 +0000
>> From:  Pip Cet via "Bug reports for GNU Emacs,
>>  the Swiss army knife of text editors" <bug-gnu-emacs <at> gnu.org>
>>
>> diff --git a/src/lisp.h b/src/lisp.h
>> index 9c31630cc1a..46e940dad09 100644
>> --- a/src/lisp.h
>> +++ b/src/lisp.h
>> @@ -2226,12 +2226,12 @@ CHAR_TABLE_SET (Lisp_Object ct, int idx, Lisp_Object val)
>>        const char *string;
>>        Lisp_Object native;
>>      } intspec;
>> -    Lisp_Object command_modes;
>>      /* Positive values: offset into etc/DOC.  Negative values: one's
>>         complement of index into the native comp unit's vector of
>>         documentation strings.  */
>>      EMACS_INT doc;
>>  #ifdef HAVE_NATIVE_COMP
>> +    Lisp_Object command_modes;
>>      Lisp_Object native_comp_u;
>>      char *native_c_name;
>>      Lisp_Object lambda_list;
>
> IME, having different definitions of data structures under different
> build conditions is a problematic solution: it makes it harder to make

That statement applies to the status quo before the patch just as much
as it does to the new status, though, right?  Lisp_Subr had three
conditional fields before, plus one that's unconditionally compiled in
but only usable if one of the other three is non-nil.  Now it has four
conditional fields, which are unusable if they don't exist because the
compiler won't let you.

> safe changes that don't break some build.  Many/most people don't try

This doesn't; it's impossible ta safely use the command_modes field for
the reasons I explained.

And making it safe without using it at all seems both wasteful and all
but impossible: we'd have to test the code somehow, making up usages of
a field that hasn't been in real use for a long time.

> to build all of the configurations Emacs supports when they make
> changes, so it is easy to write code which references a struct member
> that doesn't exist in some build configuration, and thus break it, and
> miss the fact that build is broken by not trying it.  (It is
> unrealistic to expect everyone who has write access to the repository
> to test-build all the possible configurations Emacs supports as a
> prerequisite for installing a change.)

I agree, but I don't see how that applies here.  Before the patch, if
someone had started using nativecomp-only command_modes, we would have
had very tricky GC/pdumper bugs which would have taken much longer to
resolve than simply removing an unused field.

The new patch does still make it possible for someone using only
nativecomp builds to accidentally use the field for non-nativecomp
subrs.  If someone tried to build that tree without nativecomp, they
would get a compiler error, which is the best we can do to warn them of
this situation, given the constraint that we use the same structure for
nativecomp subrs and primitive ones.

Of course it would be better if we statically asserted that we're
looking at a nativecomp subr before accessing the field, but we've
discussed that and concluded sharing a C structure is a constraint which
I obeyed in this patch.

If you think this field is significant enough to warrant it (it's used
once in comp.c, once in data.c, and three times by pdumper.c; two of the
pdumper usages can be merged because the buggy one would have to be
replaced by the non-buggy one anyway.  The third is an assertion), we
could add accessors, but it's simply not.

> So: do we really need to remove command_modes from the struct in a
> build without native compilation?  What bad things will happen if we

Yes.

> leave it there?  If we would like to be a bit more efficient by
> avoiding a few function calls, like here:

People might use it, and run into the various bugs that have
accumulated.  It's better to remove the unusable field (and there's all
that talk about not wasting two words of staticvec, but wasting 1,000
words is suddenly okay?).

>> --- a/src/pdumper.c
>> +++ b/src/pdumper.c
>> @@ -2992,16 +2992,15 @@ dump_subr (struct dump_context *ctx, const struct Lisp_Subr *subr)
>>                               COLD_OP_NATIVE_SUBR,
>>  			     make_lisp_ptr ((void *) subr, Lisp_Vectorlike));
>>        dump_field_lv (ctx, &out, subr, &subr->intspec.native, WEIGHT_NORMAL);
>> -      dump_field_lv (ctx, &out, subr, &subr->command_modes, WEIGHT_NORMAL);
>>      }
>>    else
>>      {
>>        dump_field_emacs_ptr (ctx, &out, subr, &subr->symbol_name);
>>        dump_field_emacs_ptr (ctx, &out, subr, &subr->intspec.string);
>> -      dump_field_emacs_ptr (ctx, &out, subr, &subr->command_modes);
>>      }
>
> then it should be okay to #ifdef away those calls when
> !HAVE_NATIVE_COMP, as that doesn't run the risks mentioned above.

That is precisely what I did in the code you cite: the two - lines you
show turned into a single + line behind an #ifdef, which is *precisely*
what we do for the other three fields which are usable in equivalent
circumstances.  Adding a separate #ifdef for two lines that should have
been identical but weren't, a bug that never became relevant only
because the field was unused.

(The bug would have been avoided by some macro-based
check-this-is-a-non-void-pointer code, see the discussion before).

> I also would like to hear from Andrea about this patch.

Absolutely, let's.

Thanks for your comments.

You have convinced me that the .intspec.native union member also
shouldn't exist if we're building without nativecomp.  As we've seen
with Lisp_Subr->doc, using the same field for different purposes is
fragile and leads to bugs.

I won't revise the patch for now as this minor change doesn't really
change things significantly: intspec.native is only set in one place and
read in another one, plus GC/pdumper bookkeeping.  A comment would
suffice, too.

Pip





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#75318; Package emacs. (Wed, 22 Jan 2025 16:09:02 GMT) Full text and rfc822 format available.

Message #35 received at 75318 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Pip Cet <pipcet <at> protonmail.com>
Cc: acorallo <at> gnu.org, monnier <at> iro.umontreal.ca, 75318 <at> debbugs.gnu.org
Subject: Re: bug#75318: Lisp_Subr header and layout
Date: Wed, 22 Jan 2025 18:07:43 +0200
> Date: Wed, 22 Jan 2025 15:30:24 +0000
> From: Pip Cet <pipcet <at> protonmail.com>
> Cc: Andrea Corallo <acorallo <at> gnu.org>, 75318 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca
> 
> "Eli Zaretskii" <eliz <at> gnu.org> writes:
> 
> >> Cc: monnier <at> iro.umontreal.ca
> >> Date: Tue, 21 Jan 2025 23:46:45 +0000
> >> From:  Pip Cet via "Bug reports for GNU Emacs,
> >>  the Swiss army knife of text editors" <bug-gnu-emacs <at> gnu.org>
> >>
> >> diff --git a/src/lisp.h b/src/lisp.h
> >> index 9c31630cc1a..46e940dad09 100644
> >> --- a/src/lisp.h
> >> +++ b/src/lisp.h
> >> @@ -2226,12 +2226,12 @@ CHAR_TABLE_SET (Lisp_Object ct, int idx, Lisp_Object val)
> >>        const char *string;
> >>        Lisp_Object native;
> >>      } intspec;
> >> -    Lisp_Object command_modes;
> >>      /* Positive values: offset into etc/DOC.  Negative values: one's
> >>         complement of index into the native comp unit's vector of
> >>         documentation strings.  */
> >>      EMACS_INT doc;
> >>  #ifdef HAVE_NATIVE_COMP
> >> +    Lisp_Object command_modes;
> >>      Lisp_Object native_comp_u;
> >>      char *native_c_name;
> >>      Lisp_Object lambda_list;
> >
> > IME, having different definitions of data structures under different
> > build conditions is a problematic solution: it makes it harder to make
> 
> That statement applies to the status quo before the patch just as much
> as it does to the new status, though, right?

Could be; I didn't check.  It's a general statement.

> > safe changes that don't break some build.  Many/most people don't try
> 
> This doesn't; it's impossible ta safely use the command_modes field for
> the reasons I explained.

Can you point me to the explanation, or explain again?

> And making it safe without using it at all seems both wasteful and all
> but impossible: we'd have to test the code somehow, making up usages of
> a field that hasn't been in real use for a long time.

I just wanted to avoid breaking a build by referencing the struct
member in a build that doesn't have it.  There's no need to ensure the
member is always used safely.  Doesn't initializing it to nil solve
this?

> 
> > to build all of the configurations Emacs supports when they make
> > changes, so it is easy to write code which references a struct member
> > that doesn't exist in some build configuration, and thus break it, and
> > miss the fact that build is broken by not trying it.  (It is
> > unrealistic to expect everyone who has write access to the repository
> > to test-build all the possible configurations Emacs supports as a
> > prerequisite for installing a change.)
> 
> I agree, but I don't see how that applies here.  Before the patch, if
> someone had started using nativecomp-only command_modes, we would have
> had very tricky GC/pdumper bugs which would have taken much longer to
> resolve than simply removing an unused field.

Well, both problems exist, they just push us in opposite directions.

> The new patch does still make it possible for someone using only
> nativecomp builds to accidentally use the field for non-nativecomp
> subrs.  If someone tried to build that tree without nativecomp, they
> would get a compiler error, which is the best we can do to warn them of
> this situation, given the constraint that we use the same structure for
> nativecomp subrs and primitive ones.

OTOH, adding code that references the field without #ifdef, and
installing it without trying a non-nativecomp build, will break
everybody who builds --without-native-compilation.

> People might use it, and run into the various bugs that have
> accumulated.  It's better to remove the unusable field (and there's all
> that talk about not wasting two words of staticvec, but wasting 1,000
> words is suddenly okay?).

If this leads to such a significant memory waste, then indeed what I
proposed is a non-starter, unfortunately.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#75318; Package emacs. (Wed, 22 Jan 2025 16:09:03 GMT) Full text and rfc822 format available.

Message #38 received at 75318 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> protonmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: luangruo <at> yahoo.com, monnier <at> iro.umontreal.ca, 75318 <at> debbugs.gnu.org
Subject: Re: bug#75318: Lisp_Subr header and layout
Date: Wed, 22 Jan 2025 16:07:52 +0000
"Eli Zaretskii" <eliz <at> gnu.org> writes:

>> Date: Wed, 22 Jan 2025 14:25:22 +0000
>> From: Pip Cet <pipcet <at> protonmail.com>
>> Cc: 75318 <at> debbugs.gnu.org, luangruo <at> yahoo.com, monnier <at> iro.umontreal.ca
>>
>> > For example, in the below:
>> >
>> >>    static union Aligned_Lisp_Subr Swatch_gc_cons_percentage =
>> >> -     {{{ PSEUDOVECTOR_FLAG | (PVEC_SUBR << PSEUDOVECTOR_AREA_BITS) },
>> >> -       { .a4 = watch_gc_cons_percentage },
>> >> -       4, 4, "watch_gc_cons_percentage", {0}, lisp_h_Qnil}};
>> >
>> > what is lisp_h_Qnil is relatively easy to understand, whereas the
>> > replacement, viz.:
>>
>> >> +#define SUBR_INITIALIZER(lname, fnname, minargs, maxargs, intspec_value) \
>> >> +  {{ .header = SUBR_HEADER,						\
>> >> +    .function = { .a ## maxargs = fnname },				\
>> >> +    .min_args = minargs,						\
>> >> +    .max_args = maxargs,						\
>> >> +    .symbol_name = lname,						\
>> >> +    .intspec = { .string = intspec_value },				\
>> >> +    .command_modes = LISPSYM_INITIALLY (Qnil),			\
>> >
>> > requires one to look up LISPSYM_INITIALLY, which is
>>
>> If you're saying that LISPSYM_INITIALLY is bad, I agree completely.
>
> It isn't bad, it's that replacing lisp_h_Qnil with it makes the code
> harder to understand (without going crazy with M-.).

> As for LISPSYM_INITIALLY, its name is the main source of confusion:
> the INITIALLY part all but prevents any attempt to get a clear idea
> what the macro does, without actually looking up all the way to the
> basics.  Unless on remembers that by heart, of course (I don't).

I think that second paragraph means that LISPSYM_INITIALLY is, at best,
an unfortunate workaround which may one day be retired.  So, IMHO, yes,
it's bad in an absolute way, but it may be the best option anyway.

(In GDB, with .gdbinit loaded, "p LISPSYM_INITIALLY (Qnil)" prints an
unhelpful "{0x0}"; "p lisp_h_Qnil" prints "{0}".  ptype thinks the first
is a Lisp_Object (but expands the type to its obscure definition), the
second claims to be a singleton array of int.  While all of these are
unhelpful, the last one is dangerous, and we can avoid it by using the
unfortunately-named LISPSYM_INITIALLY.

However, I think Qnil should be usable in static context.  If someone
working on a new GC can't do that (I speak from experience)), they most
likely cannot implement LISPSYM_INITIALLY at all, and have to resort to
some kind of constructor extension; GNU C has one, C++ supports them
natively.

None of that means we should be forced to keep the name, which is
confusing and misleading.

>> > To make the depth of the proverbial rabbit hole at least to some
>> > extent smaller here, could we please refrain from defining
>> > SUBR_NATIVECOMP_INITIALIZER, and instead include the necessary stuff
>> > directly in SUBR_INITIALIZER, with an #ifdef?  Or do you envision
>> > enough other users of SUBR_NATIVECOMP_INITIALIZER to justify yet
>> > another macro?
>>
>> Do you mean
>>
>> #ifdef HAVE_NATIVE_COMP
>> #define SUBR_INITIALIZER(lname, fnname, minargs, maxargs, intspec_value) \
>>   {{ .header = SUBR_HEADER,						\
>>     .function = { .a ## maxargs = fnname },				\
>>     .min_args = minargs,						\
>>     .max_args = maxargs,						\
>>     .symbol_name = lname,						\
>>     .intspec = { .string = intspec_value },				\
>>     .command_modes = LISPSYM_INITIALLY (Qnil),				\
>>     .native_comp_u = LISPSYM_INITIALLY (Qnil),		\
>>     .lambda_list = LISPSYM_INITIALLY (Qnil),		\
>>     .type = LISPSYM_INITIALLY (Qnil),
>>     }}
>> #else
>> #define SUBR_INITIALIZER(lname, fnname, minargs, maxargs, intspec_value) \
>>   {{ .header = SUBR_HEADER,						\
>>     .function = { .a ## maxargs = fnname },				\
>>     .min_args = minargs,						\
>>     .max_args = maxargs,						\
>>     .symbol_name = lname,						\
>>     .intspec = { .string = intspec_value },				\
>>     }}
>> #endif
>>
>> (and, I assume, the same thing for SUBR_HEADER?)
>
> No.  I meant to use #ifdef inside the macro.  But if that is
> impossible in a C macro, I'd even support making this an inline
> function.

My understanding is you cannot use #ifdef inside a macro in GNU C; there
might be a way to use _Pragma in other C versions, but it's horrible.

Similarly, as the standard C we're attempting to use doesn't have
constexpr and inlining functions is always optional, I'm pretty sure
it's still impossible to statically initialize a variable with the
result of calling a function, inline or otherwise.

Pip





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#75318; Package emacs. (Wed, 22 Jan 2025 16:39:02 GMT) Full text and rfc822 format available.

Message #41 received at 75318 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> protonmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: acorallo <at> gnu.org, monnier <at> iro.umontreal.ca, 75318 <at> debbugs.gnu.org
Subject: Re: bug#75318: Lisp_Subr header and layout
Date: Wed, 22 Jan 2025 16:38:26 +0000
"Eli Zaretskii" <eliz <at> gnu.org> writes:

>> Date: Wed, 22 Jan 2025 15:30:24 +0000
>> From: Pip Cet <pipcet <at> protonmail.com>
>> Cc: Andrea Corallo <acorallo <at> gnu.org>, 75318 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca
>>
>> "Eli Zaretskii" <eliz <at> gnu.org> writes:
>>
>> >> Cc: monnier <at> iro.umontreal.ca
>> >> Date: Tue, 21 Jan 2025 23:46:45 +0000
>> >> From:  Pip Cet via "Bug reports for GNU Emacs,
>> >>  the Swiss army knife of text editors" <bug-gnu-emacs <at> gnu.org>
>> >>
>> >> diff --git a/src/lisp.h b/src/lisp.h
>> >> index 9c31630cc1a..46e940dad09 100644
>> >> --- a/src/lisp.h
>> >> +++ b/src/lisp.h
>> >> @@ -2226,12 +2226,12 @@ CHAR_TABLE_SET (Lisp_Object ct, int idx, Lisp_Object val)
>> >>        const char *string;
>> >>        Lisp_Object native;
>> >>      } intspec;
>> >> -    Lisp_Object command_modes;
>> >>      /* Positive values: offset into etc/DOC.  Negative values: one's
>> >>         complement of index into the native comp unit's vector of
>> >>         documentation strings.  */
>> >>      EMACS_INT doc;
>> >>  #ifdef HAVE_NATIVE_COMP
>> >> +    Lisp_Object command_modes;
>> >>      Lisp_Object native_comp_u;
>> >>      char *native_c_name;
>> >>      Lisp_Object lambda_list;
>> >
>> > IME, having different definitions of data structures under different
>> > build conditions is a problematic solution: it makes it harder to make
>>
>> That statement applies to the status quo before the patch just as much
>> as it does to the new status, though, right?
>
> Could be; I didn't check.  It's a general statement.
>
>> > safe changes that don't break some build.  Many/most people don't try
>>
>> This doesn't; it's impossible ta safely use the command_modes field for
>> the reasons I explained.
>
> Can you point me to the explanation, or explain again?

GC doesn't trace the field, pdumper doesn't save it; the sole remaining
user expects (and insists) on a Qnil value.  Iff we wanted to fix the
first we'd have to add new users for testing, if we want to fix and test
the second we'd have to add pre-dump users, which is harder.  Pdumper
also asserts that the field isn't used, so it's possible other things
will break, because I'm sure that assertion was put there for a reason.

>> And making it safe without using it at all seems both wasteful and all
>> but impossible: we'd have to test the code somehow, making up usages of
>> a field that hasn't been in real use for a long time.
>
> I just wanted to avoid breaking a build by referencing the struct
> member in a build that doesn't have it.  There's no need to ensure the
> member is always used safely.  Doesn't initializing it to nil solve
> this?

I don't understand that at all, sorry.  If anyone uses the field for
anything but checking its nil-ness in a non-nativecomp build, they
either get a compile-time error or a very frustrating inexplicable
crash.  Changing from the latter to the former is the best we can do.

>> The new patch does still make it possible for someone using only
>> nativecomp builds to accidentally use the field for non-nativecomp
>> subrs.  If someone tried to build that tree without nativecomp, they
>> would get a compiler error, which is the best we can do to warn them of
>> this situation, given the constraint that we use the same structure for
>> nativecomp subrs and primitive ones.
>
> OTOH, adding code that references the field without #ifdef, and
> installing it without trying a non-nativecomp build, will break
> everybody who builds --without-native-compilation.

Indeed. It'd be nice to statically check more things before we push to
master (and, generally, compiling a C program and checking it for errors
are two separate things; the same is true for texinfo, it seems).  I
honestly think we should add a separate "lint" pass (by this or any
other name) sooner rather than later, but right now, we don't.  The best
we can do is CI and breaking the build rather than crashing
inexplicably.

(Random example: print.c exposes the string "unreadeable-function"
(meaning "unreadable-function") to Lisp and expects Lisp to provide it
in some contexts.  Obviously not something GCC can check for during
every build, but if we'd run a weekly typo lint, we could have caught it
in time.  I've let the resident typo expert know, but this probably
requires a bug report and discussion now).

But, again, pie-in-the-sky dreams of a better build system aside, this
is strictly about compile-time errors vs run-time bugs.

>> People might use it, and run into the various bugs that have
>> accumulated.  It's better to remove the unusable field (and there's all
>> that talk about not wasting two words of staticvec, but wasting 1,000
>> words is suddenly okay?).
>
> If this leads to such a significant memory waste, then indeed what I
> proposed is a non-starter, unfortunately.

Insignificant for my current system, unfortunately, which aligns
static Lisp_Subr to 32-byte boundaries (for good reason).  However, as
there are many systems with different alignments and sizes, saving a
32-byte line every fourth time we remove a word has the same expected
benefit as regaining 8 bytes the second we do so.





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#75318; Package emacs. (Wed, 22 Jan 2025 16:40:02 GMT) Full text and rfc822 format available.

Message #44 received at 75318 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Pip Cet <pipcet <at> protonmail.com>
Cc: luangruo <at> yahoo.com, monnier <at> iro.umontreal.ca, 75318 <at> debbugs.gnu.org
Subject: Re: bug#75318: Lisp_Subr header and layout
Date: Wed, 22 Jan 2025 18:39:12 +0200
> Date: Wed, 22 Jan 2025 16:07:52 +0000
> From: Pip Cet <pipcet <at> protonmail.com>
> Cc: 75318 <at> debbugs.gnu.org, luangruo <at> yahoo.com, monnier <at> iro.umontreal.ca
> 
> (In GDB, with .gdbinit loaded, "p LISPSYM_INITIALLY (Qnil)" prints an
> unhelpful "{0x0}"; "p lisp_h_Qnil" prints "{0}".  ptype thinks the first
> is a Lisp_Object (but expands the type to its obscure definition), the
> second claims to be a singleton array of int.

I guess your build is with check-lisp-objects?  Because here I get

  (gdb) p LISPSYM_INITIALLY (Qnil)
  $1 = 0
  (gdb) p lisp_h_Qnil
  $3 = 0
  (gdb) ptype LISPSYM_INITIALLY(Qnil)
  type = long long
  (gdb) ptype lisp_h_Qnil
  type = int





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#75318; Package emacs. (Wed, 22 Jan 2025 17:59:01 GMT) Full text and rfc822 format available.

Message #47 received at 75318 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> protonmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: luangruo <at> yahoo.com, monnier <at> iro.umontreal.ca, 75318 <at> debbugs.gnu.org
Subject: Re: bug#75318: Lisp_Subr header and layout
Date: Wed, 22 Jan 2025 17:57:48 +0000
"Eli Zaretskii" <eliz <at> gnu.org> writes:

>> Date: Wed, 22 Jan 2025 16:07:52 +0000
>> From: Pip Cet <pipcet <at> protonmail.com>
>> Cc: 75318 <at> debbugs.gnu.org, luangruo <at> yahoo.com, monnier <at> iro.umontreal.ca
>>
>> (In GDB, with .gdbinit loaded, "p LISPSYM_INITIALLY (Qnil)" prints an
>> unhelpful "{0x0}"; "p lisp_h_Qnil" prints "{0}".  ptype thinks the first
>> is a Lisp_Object (but expands the type to its obscure definition), the
>> second claims to be a singleton array of int.
>
> I guess your build is with check-lisp-objects?  Because here I get

Sorry, meant to mention that and forgot.  It is.

>   (gdb) p LISPSYM_INITIALLY (Qnil)
>   $1 = 0
>   (gdb) p lisp_h_Qnil
>   $3 = 0
>   (gdb) ptype LISPSYM_INITIALLY(Qnil)
>   type = long long
>   (gdb) ptype lisp_h_Qnil
>   type = int

I guess your build is with wide Emacs ints, as usual :-)

I'd also forgotten about "whatis", which splits the difference and tells
me the first is a Lisp_Word, not a Lisp_Object.

Pip





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#75318; Package emacs. (Thu, 23 Jan 2025 12:00:03 GMT) Full text and rfc822 format available.

Message #50 received at 75318 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> protonmail.com>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: Po Lu <luangruo <at> yahoo.com>, 75318 <at> debbugs.gnu.org
Subject: Re: bug#75318: Lisp_Subr header and layout
Date: Thu, 23 Jan 2025 11:59:04 +0000
"Stefan Monnier" <monnier <at> iro.umontreal.ca> writes:

>> Ideally, we'd turn these three into plain DEFUNs, with docstrings
>> indicating they're internal functions.  I'd also like to turn the
>> underscores into ordinary dashes in the Lisp function names if we
>> do that.
>
> +1
> Does anyone know why they're not plain DEFUNs?

The sfntfont.c isn't safe to call from Lisp (it assumes it's passed two
vectors of the right size).

Po Lu, I assume it was originally a normal DEFUN and then modified for
performance reasons?  Are those reasons significant enough to keep the
code as it is, in your opinion, until a better solution than turning it
into a safe DEFUN is found?

(I must confess I don't understand the code at all: the first results
are the ones with the fewest number of fields set? Is the list reversed
somewhere else? I'm not saying this code needs more comments, of
course.  If others can understand it easily, I may just have a blind
spot here.)

About the alloc.c non-DEFUNs:

No idea, couldn't find anything, it looks weird in C-h v
gc-cons-threshold.

Nothing in the commit (97ffa339b6d67cebcbefbdfaa2880214adab639c) or bug
discussion (https://debbugs.gnu.org/cgi/bugreport.cgi?bug=37006).

The comment in lisp.h, above Lisp_Subr says:

/* This structure describes a built-in function.
   It is generated by the DEFUN macro only.
   defsubr makes it into a Lisp object.  */

The least we could do is adjust that by expanding the second line.  (In
the nativecomp case, all three lines need fixing.)

The watcher is exposed to Lisp and can be called directly or removed, so
that's not it.

The underscored name is also printed in C-h v gc-cons-threshold, so
hiding it isn't the reason either.

(My drive-by proposal in bug#75632 is to improve display of variable
watchers, for example by starting with:

diff --git a/lisp/help-fns.el b/lisp/help-fns.el
index 9324cf85454..f4f595aedbe 100644
--- a/lisp/help-fns.el
+++ b/lisp/help-fns.el
@@ -1712,7 +1712,9 @@ help-fns--var-watchpoints
     (when watchpoints
       (princ "  Calls these functions when changed: ")
       ;; FIXME: Turn function names into hyperlinks.
-      (princ watchpoints)
+      (dolist (watchpoint watchpoints)
+        (princ (help-fns-function-name watchpoint))
+        (princ "\n  "))
       (terpri))))

And then fixing help-fns-function-name so we can remove the FIXME.)

No matter whether we change help-fns.el, the name is likely to remain in
the output, so the underscores are confusing; and giving it a proper
symbol seems like a good idea to me, not just because we can then add
the symbol rather than the subr to the watchers list, and that prints
nicely.

If there aren't any good reasons, I'll send a patch turning all three
into DEFUNs and we can discuss that further.  Definitely going to push
this particular change until there's consensus, though.

Pip





This bug report was last modified 143 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.