GNU bug report logs - #68272
[PATCH] Fix -1 leaking from C to lisp in 'read-event' etc.

Previous Next

Package: emacs;

Reported by: Tim Ruffing <crypto <at> timruffing.de>

Date: Fri, 5 Jan 2024 21:20:01 UTC

Severity: normal

Tags: patch

Done: Stefan Monnier <monnier <at> iro.umontreal.ca>

Bug is archived. No further changes may be made.

Full log


View this message in rfc822 format

From: Tim Ruffing <crypto <at> timruffing.de>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 68272 <at> debbugs.gnu.org
Subject: bug#68272: [PATCH] Fix -1 leaking from C to lisp in 'read-event' etc.
Date: Sat, 06 Jan 2024 15:32:23 +0100
Hey, thank you very much for the long email. I was somewhat prepared
for hesitant reply, and I totally understand that "there be dragons".
And I'm also aware that it doesn't spark confidence when someone with
(almost) zero code contributions submits a patch of this kind. I feel
the same when new people send complex patches to the C library I
maintain... And sure, it took a me quite a while to navigate this maze
of keyboard.c and to come up with this patch and make the tests pass,
but I feel rather confident now that this is the right approach.

I agree, we can never be fully sure that this introduces regressions,
but let me still try to convince you that this approach and these
patches are carefully crafted and thought through. Here's my analysis
of the situation:

First of all, I think the bug is real. read-event (and read-char and
read-char-exclusive, which I won't mention from now on for brevity) can
return -1, but the docs don't mention -1, and it's arguably a strange
"magic" value for a lisp function. If anything, one would expect nil,
or some special symbol. Of course, we could simply document the -1
return value. 

But the problem is not just that this is not elegant. It's worse
because it's also pretty unclear what the caller should do in this
case. In particular, the caller can't simply skip the -1 and try again:
The caller will see an infinite stream of -1 values, until the keyboard
macro has been resolved properly, i.e., as long as executing_kbd_macro
is non-nil. One thing the caller can simply do is to error out. But if
the caller wants an event, the only thing it can do is to touch
executing_kbd_macro and set it nil explicitly (this is what subr.el
does currently). We could also document this, but I feel something like
"this function can return -1 and if it does, set executing_kbd_macro to
nil and try again" is really a bit too unelegant (and it has more
downsides, see next paragraph).

However, this now hints at a different approach: Could we handle this
in read-event locally and just perform the retrying for the caller?
First of all, we'd probably still do it one level below in the call
stack in order not to mess up with the timeouts (if we simply try
again, we could double the timeout), so we'd want to do it
inread_filtered_event. But I think that approach will then be *not*
localized and thus dangerous: it's not clear if setting
executing_kbd_macro to nil has unexpected side effects have. Resolving
the macro properly is ultimately the responsibility of
Fexecute_kbd_macro and pop_kbd_macro in macros.c, and we shouldn't just
declare at some other point in the code that we wish the macro handling
to be done.

-------------

Let me try to convince you that these commits are more harmless than
they look at the first glance. I'll go through them one by one. I'm
also happy to add a revised version of this to the commit messages.

1. "Extract check for end of macro to function":
This is pure refactoring. A brief code review shows that this does not
change semantics at all.

2. "src/keyboard.c (requeued_events_pending_p): Revive"

2a.

On Sat, 2024-01-06 at 09:42 +0200, Eli Zaretskii wrote:
>   For example, your patches modify
> requeued_events_pending_p, but that function is called in several
> places, including outside of keyboard.c -- how can we be sure we are
> not introducing regressions this way? 

Oh, good catch. In fact, it's (without the patches here) only called
outside of keyboard.c. I was misled by the comment that said "This
isn't used yet. The hope is to make wait_reading_process_output call
it". Indeed, this comment is outdated and the (only) caller is
wait_reading_process_output. wait_reading_process_output really wants
only keyboard output, so it should really just check
Vunread_command_events and not unread_post_input_method_events and
Vunread_input_method_events; the latter two are for non-keyboard input.
I can rework this. I think the way to go is to split this into two
functions requeued_events_pending_p and
requeued_command_events_pending_p or similar, and fix the outdated
comment.

2b.
This changes two !NILP to CONSP to be consistent with the checks within
read_char (see 3. below). This is fine under the assumption that the
variables are always lists (as they should be). I was about to say that
I can take this back if you feel it's too dangerous, but if this
function needs to be split into two anyway, we won't need the change
from !NILP to CONSP.

3. "Fix -1 leaking from C to lisp in 'read-event' etc."

3a.

On Sat, 2024-01-06 at 09:42 +0200, Eli Zaretskii wrote:
> And read_char is called in
> several places, including by lread.c and read_char itself -- did you
> figure out how will this change affect those?
> 

Yes, I carefully checked all the callers. (I know, it's not trivial to
review this commit, and I think any careful reviewer would need to do
the same in the end...) The only caller which acts specially on -1 is
read_key_sequence, so it makes sense to handle this case in
read_key_sequence. This is done in this commit. The other callers are
not prepared to get -1 (which is the root cause of the bug!), and
exactly what we would like to avoid.

3b.

On Sat, 2024-01-06 at 09:42 +0200, Eli Zaretskii wrote:
>   Moreover, the changes in read_char modify code more than is
> necessary to handle the "at end of macro" situation, so we are
> risking
> changes that will break something else.  Here's an example:
> 
> > --- a/src/keyboard.c
> > +++ b/src/keyboard.c
> > @@ -2610,7 +2610,8 @@ read_char (int commandflag, Lisp_Object map,
> >        goto reread_for_input_method;
> >      }
> >  
> > -  if (!NILP (Vexecuting_kbd_macro))
> > +  /* If we're executing a macro, process it unless we are at its
> > end. */
> > +  if (!NILP (Vexecuting_kbd_macro) && !at_end_of_macro_p ())
> >      {
> >        /* We set this to Qmacro; since that's not a frame, nobody
> > will
> >   try to switch frames on us, and the selected window will
> > @@ -2624,15 +2625,6 @@ read_char (int commandflag, Lisp_Object map,
> >   selected.  */
> >        Vlast_event_frame = internal_last_event_frame = Qmacro;
> >  
> > -      /* Exit the macro if we are at the end.
> > - Also, some things replace the macro with t
> > - to force an early exit.  */
> > -      if (at_end_of_macro_p ())
> > - {
> > -   XSETINT (c, -1);
> > -   goto exit;
> > - }
> > -
> 
> This hunk moves the at_end_of_macro_p test to a higher level, which
> has a side effect of not executing the code before the original
> at_end_of_macro_p test -- how do we know this won't break something
> that happens to depend on that code which will now not execute in the
> at-end-of-macro case?

The commit takes care of this, and it's not too hard to verify: If you
look at read_char, except for setting local variables, this is what it
does above the original at_end_of_macro_p test:  
 * if (CONSP (Vunread_post_input_method_events)) ...
 * Vlast_event_device = Qnil;
 * if (CONSP (Vunread_command_events)) ...
 * if (CONSP (Vunread_input_method_events))
 * Vlast_event_frame = internal_last_event_frame = Qmacro; (if in a kbd
   macro)        

The "if" blocks don't matter. That's exactly why the caller checks for
patch check for !requeued_events_pending_p () in the patch: if those
blocks were to become active, we'd *not* skip read_char.

Consider "Vlast_event_frame = internal_last_event_frame = Qmacro". The
purpose this is to indicate that the last event came from a kbd macro.
Since we now don't return the "event" -1 any longer, it also doesn't
make sense to set this in this case. (And when we read a new event,
this will be correctly updated.)

What remains is setting "Vlast_event_device = Qnil". The same logic
applies here. I don't think we should set to this nil at the end of a
keyboard macro, because no new event was processed, i.e., the "last
event" didn't change. That's why I choose to skip this line.

But it's certainly more conservative to keep the two assignments to
avoid any unexpected consequences, I'm happy to add this to the caller
with an appropriate comment, if this is desired

4. "Remove workarounds for solved 'read-event' bug"

On Sat, 2024-01-06 at 09:42 +0200, Eli Zaretskii wrote:
> Also note that at least in subr.el we don't only test the -1 value,
> we
> also test that a keyboard macro is being executed, and in this and
> other callers that handle -1, each application behaves in
> application-specific way when it receives -1.  It is not clear to me
> how your changes modify the behavior in those cases, and your
> explanations and the log message doesn't seem to answer this
> question.
> For example, this code in calc-prog:
> 
> > --- a/lisp/calc/calc-prog.el
> > +++ b/lisp/calc/calc-prog.el
> > @@ -1230,8 +1230,6 @@ calc-kbd-skip-to-else-if
> >   ch)
> >      (while (>= count 0)
> >        (setq ch (read-char))
> > -      (if (= ch -1)
> > -   (error "Unterminated Z[ in keyboard macro"))
> >        (if (= ch ?Z)
> >     (progn
> >       (setq ch (read-char))
> 
> now signals a specific error in the case of an unterminated keyboard
> macro: what will the behavior in that case be after the changes?

The behavior is that it will wait for interactive user input. Say you
define a keyboard macro that performs only part of a calculation. Then
instead of signaling an error, it will simply wait for the user the
provide the remaining key strokes.

On Sat, 2024-01-06 at 10:15 +0100, Andreas Schwab wrote:
> 
> Seems like calc actually wants to know when the kbd macro ends
> prematurely, and removing the option to detect it is a regression

To be honest, I really don't think that signaling an error here was a
crucial feature. I'm pretty sure what happened is that -1 appeared as
an unexpected return case, and then the error was added. Yes, this is a
semantic change, but I think it's an acceptable one. This doesn't
remove any real functionality, it simply avoids an error condition.

But one thing I wasn't sure about is whether the change that -1 can't
occur anymore would warrant a NEWS item or similar. If yes, I'll add
one. 

-------------

On Sat, 2024-01-06 at 09:42 +0200, Eli Zaretskii wrote:
> In this case, given
> that we want a more elegant solution for a situation that we already
> handle, I tend to avoid any such changes to begin with, for the
> reasons I explained above, and instead perhaps document the special
> meaning of -1 in these cases.  And if we want to consider such
> changes
> these dangers notwithstanding, I would like to see them affecting as
> little other code as possible.  Can you suggest a safer changeset?

As I explained above, documenting this is not trivial either. While we
can, of course, simply document that -1 can be returned, it's hard to
give meaningful advice except "this function can return -1 and if it
does and you really want to wait for input, set executing_kbd_macro to
nil and try again", which is not only unelegant, but could have further
consequences. 

I'm happy to make the changes based on your review and on other
feedback, and expand the commit message a lot, and I think this will
make the patches more conservative and safer. 

It's still not trivial to review and verify my analysis. But this is
just how it is. The code change wasn't easy to come up with, so it's
necessarily not very easy to review.

With the modifications I hinted at based on your review, I think what
we end up with is:
1. commit: no semantic change
2. commit: will be reworked to have no semantic change at all
3. commit: only change is that -1 is not propagated to lisp, other
callers of read_char are unaffected as I explained above. (modulo
Vlast_event_device and Vlast_event_frame -- which I can also still set
if you want me to to).
4. commit: this only removes code paths which are unreachable now, so
again no semantic chance.

But then it will be good upfront if it's at least realistic that these
patches get in in some form or another. If I can get conceptual
approval or dismissal of this approach, I know what to do next.

Best,
Tim





This bug report was last modified 1 year and 68 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.