Package: emacs;
Reported by: Pip Cet <pipcet <at> protonmail.com>
Date: Wed, 22 Jan 2025 10:20:01 UTC
Severity: normal
Done: Pip Cet <pipcet <at> protonmail.com>
Bug is archived. No further changes may be made.
View this message in rfc822 format
From: Pip Cet <pipcet <at> protonmail.com> To: Eli Zaretskii <eliz <at> gnu.org> Cc: 75754 <at> debbugs.gnu.org, eggert <at> cs.ucla.edu Subject: bug#75754: styled_format stack usage/GC protection Date: Wed, 22 Jan 2025 17:17:47 +0000
"Eli Zaretskii" <eliz <at> gnu.org> writes: >> Date: Wed, 22 Jan 2025 10:18:32 +0000 >> From: Pip Cet via "Bug reports for GNU Emacs, >> the Swiss army knife of text editors" <bug-gnu-emacs <at> gnu.org> >> >> While the alloca_size value is small, sa_avail is negative when we enter >> SAFE_ALLOCA, so SAFE_ALLOCA uses xmalloc to allocate the memory on the >> heap. >> >> The structure contains a Lisp_Object. This Lisp_Object must be >> protected from GC by being present on the C stack if GC can ever happen >> in this function. SAFE_ALLOCA doesn't protect it. >> >> I'm not entirely sure this is a problem, but >> >> (let ((print-unreadable-function (lambda (&rest args) (garbage-collect)))) >> (format "%S" (symbol-function '+))) >> >> produces this backtrace: >> >> #0 garbage_collect () at alloc.c:6450 >> #1 0x00005555557f7700 in Fgarbage_collect () at alloc.c:6697 >> #2 0x000055555582e1ee in eval_sub (form=XIL(0x7ffff4dab073)) at eval.c:2584 >> #3 0x0000555555828d9f in Fprogn (body=XIL(0)) at eval.c:439 >> #4 0x0000555555830604 in funcall_lambda (fun=XIL(0x555556045c9d), nargs=2, arg_vector=0x7fffffff6b78) at eval.c:3339 >> #5 0x000055555582f62d in funcall_general (fun=XIL(0x555556045c9d), numargs=2, args=0x7fffffff6b78) at eval.c:3033 >> #6 0x000055555582f89a in Ffuncall (nargs=3, args=0x7fffffff6b70) at eval.c:3082 >> #7 0x0000555555863e8a in print_vectorlike_unreadable >> (obj=XIL(0x555555ee7765), printcharfun=XIL(0), escapeflag=true, buf=0x7fffffff6ce0 "\220m\377\377\377\177") at print.c:1683 >> #8 0x0000555555866e5f in print_object (obj=XIL(0x555555ee7765), printcharfun=XIL(0), escapeflag=true) at print.c:2647 >> #9 0x0000555555862ec2 in print (obj=XIL(0x555555ee7765), printcharfun=XIL(0), escapeflag=true) at print.c:1296 >> #10 0x0000555555861cef in Fprin1_to_string (object=XIL(0x555555ee7765), noescape=XIL(0), overrides=XIL(0)) at print.c:814 >> #11 0x000055555581fd14 in styled_format (nargs=2, args=0x7fffffffca20, message=false) at editfns.c:3633 >> #12 0x000055555581f444 in Fformat (nargs=2, args=0x7fffffffca20) at editfns.c:3370 >> >> indicating that GC can happen. >> >> The code attempts to protect the current argument by keeping it in a ^^^^^^^ This part is important: if there are several arguments, only one of them (at most) is protected by the automatic variable. >> redundant automatic variable: >> >> Lisp_Object arg = spec->argument; >> ... >> spec->argument = arg = Fprin1_to_string (arg, noescape, Qnil); > > I think indeed the protection here is by having the Lisp object in an > automatic variable. The important point is that protection appears to me to be insufficient, because there's a single automatic variable but many args. The very minor potential issue is that it is often an important optimization for code to know it is the sole owner of a malloc'd region. That's part of the reason ATTRIBUTE_MALLOC was added. The GCC analyzer certainly does very smart things with this attribute. The technical details of conservative GC are that an automatic variable has to live on the stack or in a register because it's the only way for the code to reproduce its value (if it's not reused, there's no harm in freeing it early). In this case, a very smart compiler might realize that it is the sole owner of the malloc'd memory (that's what ATTRIBUTE_MALLOC is about) and doesn't have to redundantly keep a stack variable representing a value it can read back from such memory. I wouldn't usually worry about this, because SAFE_ALLOC uses alloca or xmalloc and alloca isn't optimized. In this case, the alloca case is unreachable and presumably optimized out, so it's merely about modifying a variable which lives in two places in only one of them; with -Os, it'd be natural to modify the heap location directly and optimize out the register location. >> In any case, this protects only the current argument; if we detect a >> multibyte situation too late, we may restart the loop and, I think, >> reuse info->argument values which were unprotected during GC. > > Can you elaborate how this could happen by walking through the > relevant code? It's about this label: /* If we start out planning a unibyte result, then discover it has to be multibyte, we jump back to retry. */ retry: When we reach the label, we reestablish ispec (the current index into the info array), but not nspec (the last valid index into that array). That means that this code: if ((conversion == 'S' || (conversion == 's' && ! STRINGP (arg) && ! SYMBOLP (arg)))) { if (EQ (arg, args[n])) { Lisp_Object noescape = conversion == 'S' ? Qnil : Qt; spec->argument = arg = Fprin1_to_string (arg, noescape, Qnil); if (STRING_MULTIBYTE (arg) && ! multibyte) { multibyte = true; goto retry; } } conversion = 's'; } only has a real effect during the first attempt. I'll describe a scenario, but it might be best to test this and provide a gdb log: The first argument is converted to a unibyte string S. S is stored in info[0]->argument. This is one reference to it, the second one is "arg". We advance to the next argument, making "arg" refer to it and leaving S unprotected. Fine, as long as we don't call out to Lisp. But Vprint_unreadable_function is bound and we do call out to Lisp while trying to convert the second argument. GC happens and collects the first string. Then, the Lisp call returns and provides us with an unexpected multibyte string. So we set multibyte = true and goto retry, set "arg" to spec->argument, which is the now-invalid pointer to S. We then print it, but the string data of the collected string has been compacted and now points to another string's data. We call copy_text on SDATA (S), which copies bogus data into buf, which is then converted to a string and returned. However, as far as I can tell, with the old GC, print.c calls out to Lisp very rarely, so this is unlikely to happen in practice. With IGC, the situation is much worse: GC can strike at any time, fail to see the reference to the string in the xmalloc'd buffer, move the string or its data or both, and usually overwrites the old string right away (we could poison the old memory in this case, but doing so will destroy the old data which might have provided the only hint as to what was moved and where). My plan is to test this now, and if it does happen, unconditionally allocate a second SAFE_ALLOCA_LISP area, which will protect the arguments; my first attempt put a pointer to this area in the info struct, but I'm pretty sure that's unnecessary: we simply have to check the right index in both areas. If it doesn't happen with alloc.c, we should probably still do that; it's not obviously less efficient and the slight impact on readability will be outweighed by readers having to wonder about the redundantly-synched arg variable. Once the massive stack allocation has been fixed (in the simplest case, we simply accept that we exceed MAX_ALLOCA and remove the subtraction from sa_avail which accounts for the extra stack usage), performance should return to normal, and while we will have fixed the bug we can sleep soundly knowing that the fix only applies to rare cases anyway (in the simplest case, one would have to provide a 818-character format string to even run out of alloca space with the default settings). I'd like to reiterate my objection to the fact that SAFE_ALLOCA doesn't unconditionally scan all memory it has allocated for all GC implementations. That change in semantics would make both branches safer and we could take our time investigating potential bugs such as this one: until we decide the extra protection is no longer needed, bugs such as this one couldn't happen. Gerd did change SAFE_NALLOCA to scan its buffer on feature/igc, but not SAFE_ALLOCA. Gerd, can we simply: #define SAFE_ALLOCA(size) ({ void *buf; SAFE_NALLOCA(buf, size, 1); buf; }) for the time being? It might reduce the rush to fix this bug. >> On the feature/igc branch, where protection is definitely required, I'm >> thinking about a fix. A quick fix would be to replace all elements of >> struct info by Lisp_Object values and use SAFE_ALLOCA_LISP. > > Are automatic variables not protected on the igc branch as they are > with the old GC? Apart from the SDATA difference (in which case igc protects against GC but the old GC requires a no-GC assumption while the pointer is in use), the protection should be equivalent. When this failed to happen with -fomit-frame-pointer builds, we saw difficult bugs. The problem is that conservative stack marking isn't guaranteed to work by any C standard or ABI that I'm aware of. It used to be the case that an automatic variable would reside in one place so that was good enough; now, it's common for the only live reference to be an internal pointer. If we give the compiler extra space to play with, it might decide to use it, and that's what ATTRIBUTE_MALLOC does. > Can we use AUTO_STRING to solve the problem? No. It's strictly optional and cannot solve a problem that affects all builds. Pip
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.