GNU bug report logs - #66363
gdb-control-commands-regexp issues

Previous Next

Package: emacs;

Reported by: Mattias Engdegård <mattias.engdegard <at> gmail.com>

Date: Thu, 5 Oct 2023 15:08:01 UTC

Severity: normal

Tags: patch

Done: Mattias Engdegård <mattias.engdegard <at> gmail.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 66363 in the body.
You can then email your comments to 66363 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#66363; Package emacs. (Thu, 05 Oct 2023 15:08:01 GMT) Full text and rfc822 format available.

Acknowledgement sent to Mattias Engdegård <mattias.engdegard <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Thu, 05 Oct 2023 15:08:01 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Mattias Engdegård <mattias.engdegard <at> gmail.com>
To: Emacs Bug Report <bug-gnu-emacs <at> gnu.org>
Subject: gdb-control-commands-regexp issues
Date: Thu, 5 Oct 2023 17:07:22 +0200
[Message part 1 (text/plain, inline)]
The variable `gdb-control-commands-regexp` in gdb-mi.el cannot work as currently defined and used. Only group 3 is of  interest, but that group hasn't referred to anything useful for several years.

Group 3 probably refers to a part of the regex's tail where the command argument is matched:

   "\\([[:blank:]]+\\([^[:blank:]]*\\)\\)*$"

However, this seems to be broken as well, because all groups here are inside repetitions. This part of the regexp is also exponential in form if not in practice but we'd better simplify it anyway.

Attached is a suggested patch which makes explicit the command abbreviations matched, and leaves only a single submatch. It also changes the tail to assuming that the command argument doesn't contain non-newlines (or the final eol anchor wouldn't make sense) but that it can contain spaces (which seems reasonable). However, right now the argument is only checked for being non-empty or not.

I don't have a working gdb setup at the moment so if someone would be kind to test it, I would be very grateful for it.

[gdb-control-commands-regexp.diff (application/octet-stream, attachment)]

Added tag(s) patch. Request was from Stefan Kangas <stefankangas <at> gmail.com> to control <at> debbugs.gnu.org. (Thu, 05 Oct 2023 15:14:02 GMT) Full text and rfc822 format available.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#66363; Package emacs. (Thu, 05 Oct 2023 16:30:02 GMT) Full text and rfc822 format available.

Message #10 received at 66363 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Mattias Engdegård <mattias.engdegard <at> gmail.com>
Cc: 66363 <at> debbugs.gnu.org
Subject: Re: bug#66363: gdb-control-commands-regexp issues
Date: Thu, 05 Oct 2023 19:29:06 +0300
> From: Mattias Engdegård <mattias.engdegard <at> gmail.com>
> Date: Thu, 5 Oct 2023 17:07:22 +0200
> 
> The variable `gdb-control-commands-regexp` in gdb-mi.el cannot work as currently defined and used. Only group 3 is of  interest, but that group hasn't referred to anything useful for several years.
> 
> Group 3 probably refers to a part of the regex's tail where the command argument is matched:
> 
>    "\\([[:blank:]]+\\([^[:blank:]]*\\)\\)*$"
> 
> However, this seems to be broken as well, because all groups here are inside repetitions. This part of the regexp is also exponential in form if not in practice but we'd better simplify it anyway.
> 
> Attached is a suggested patch which makes explicit the command abbreviations matched, and leaves only a single submatch. It also changes the tail to assuming that the command argument doesn't contain non-newlines (or the final eol anchor wouldn't make sense) but that it can contain spaces (which seems reasonable). However, right now the argument is only checked for being non-empty or not.
> 
> I don't have a working gdb setup at the moment so if someone would be kind to test it, I would be very grateful for it.

I'd appreciate a few examples of using this that don't work correctly
(or not at all), as it is otherwise not easy to understand the
problem, and much less the proposed solution and how to test it.

TIA




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#66363; Package emacs. (Thu, 05 Oct 2023 17:17:02 GMT) Full text and rfc822 format available.

Message #13 received at 66363 <at> debbugs.gnu.org (full text, mbox):

From: Mattias Engdegård <mattias.engdegard <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 66363 <at> debbugs.gnu.org
Subject: Re: bug#66363: gdb-control-commands-regexp issues
Date: Thu, 5 Oct 2023 19:16:32 +0200
> I'd appreciate a few examples of using this that don't work correctly
> (or not at all), as it is otherwise not easy to understand the
> problem, and much less the proposed solution and how to test it.

I don't have any examples but the problems are clear from reading the code.
The regexp is defined by:

> (defvar gdb-python-guile-commands-regexp
>   "python\\|python-interactive\\|pi\\|guile\\|guile-repl\\|gr"
>   "Regexp that matches Python and Guile commands supported by GDB.")
> 
> (defvar gdb-control-commands-regexp
>   (concat
>    "^\\("
>    "comm\\(a\\(n\\(ds?\\)?\\)?\\)?\\|if\\|while"
>    "\\|def\\(i\\(ne?\\)?\\)?\\|doc\\(u\\(m\\(e\\(nt?\\)?\\)?\\)?\\)?\\|"
>    gdb-python-guile-commands-regexp
>    "\\|while-stepping\\|stepp\\(i\\(ng?\\)?\\)?\\|ws\\|actions"
>    "\\|expl\\(o\\(re?\\)?\\)?"
>    "\\)\\([[:blank:]]+\\([^[:blank:]]*\\)\\)*$")
>   "Regexp matching GDB commands that enter a recursive reading loop.
> As long as GDB is in the recursive reading loop, it does not expect
> commands to be prefixed by \"-interpreter-exec console\".")

which results in the string regexp

> "^\\(comm\\(a\\(n\\(ds?\\)?\\)?\\)?\\|if\\|while\\|def\\(i\\(ne?\\)?\\)?\\|doc\\(u\\(m\\(e\\(nt?\\)?\\)?\\)?\\)?\\|python\\|python-interactive\\|pi\\|guile\\|guile-repl\\|gr\\|while-stepping\\|stepp\\(i\\(ng?\\)?\\)?\\|ws\\|actions\\|expl\\(o\\(re?\\)?\\)?\\)\\([[:blank:]]+\\([^[:blank:]]*\\)\\)*$"

It is used in one place only:

>   (let* ((control-command-p (string-match gdb-control-commands-regexp string))
>          (command-arg (and control-command-p (match-string 3 string)))

As you can see it refers to group 3, which is matched by "\\(n\\(ds?\\)?\\)" -- clearly not what anybody intended.

As for the tail, even if the match group is corrected it's a rather roundabout way of checking whether there is a non-blank character after the command (and is written using nested repetitions which is how this came to my attention).

For the examples you asked about, perhaps commit 30c0f81f9f which introduced the now broken mechanism can be of help.






Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#66363; Package emacs. (Thu, 05 Oct 2023 17:44:01 GMT) Full text and rfc822 format available.

Message #16 received at 66363 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Mattias Engdegård <mattias.engdegard <at> gmail.com>
Cc: 66363 <at> debbugs.gnu.org
Subject: Re: bug#66363: gdb-control-commands-regexp issues
Date: Thu, 05 Oct 2023 20:43:40 +0300
> From: Mattias Engdegård <mattias.engdegard <at> gmail.com>
> Date: Thu, 5 Oct 2023 19:16:32 +0200
> Cc: 66363 <at> debbugs.gnu.org
> 
> > I'd appreciate a few examples of using this that don't work correctly
> > (or not at all), as it is otherwise not easy to understand the
> > problem, and much less the proposed solution and how to test it.
> 
> I don't have any examples but the problems are clear from reading the code.
> The regexp is defined by:
> 
> > (defvar gdb-python-guile-commands-regexp
> >   "python\\|python-interactive\\|pi\\|guile\\|guile-repl\\|gr"
> >   "Regexp that matches Python and Guile commands supported by GDB.")
> > 
> > (defvar gdb-control-commands-regexp
> >   (concat
> >    "^\\("
> >    "comm\\(a\\(n\\(ds?\\)?\\)?\\)?\\|if\\|while"
> >    "\\|def\\(i\\(ne?\\)?\\)?\\|doc\\(u\\(m\\(e\\(nt?\\)?\\)?\\)?\\)?\\|"
> >    gdb-python-guile-commands-regexp
> >    "\\|while-stepping\\|stepp\\(i\\(ng?\\)?\\)?\\|ws\\|actions"
> >    "\\|expl\\(o\\(re?\\)?\\)?"
> >    "\\)\\([[:blank:]]+\\([^[:blank:]]*\\)\\)*$")
> >   "Regexp matching GDB commands that enter a recursive reading loop.
> > As long as GDB is in the recursive reading loop, it does not expect
> > commands to be prefixed by \"-interpreter-exec console\".")
> 
> which results in the string regexp
> 
> > "^\\(comm\\(a\\(n\\(ds?\\)?\\)?\\)?\\|if\\|while\\|def\\(i\\(ne?\\)?\\)?\\|doc\\(u\\(m\\(e\\(nt?\\)?\\)?\\)?\\)?\\|python\\|python-interactive\\|pi\\|guile\\|guile-repl\\|gr\\|while-stepping\\|stepp\\(i\\(ng?\\)?\\)?\\|ws\\|actions\\|expl\\(o\\(re?\\)?\\)?\\)\\([[:blank:]]+\\([^[:blank:]]*\\)\\)*$"
> 
> It is used in one place only:
> 
> >   (let* ((control-command-p (string-match gdb-control-commands-regexp string))
> >          (command-arg (and control-command-p (match-string 3 string)))
> 
> As you can see it refers to group 3, which is matched by "\\(n\\(ds?\\)?\\)" -- clearly not what anybody intended.

So the problem is only that 3 should be changed to the correct group
number?

> As for the tail, even if the match group is corrected it's a rather roundabout way of checking whether there is a non-blank character after the command (and is written using nested repetitions which is how this came to my attention).

Here you are talking about some optimization of the regexp, or is it
another bug?  If the latter, what is the bug here?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#66363; Package emacs. (Thu, 05 Oct 2023 18:12:02 GMT) Full text and rfc822 format available.

Message #19 received at 66363 <at> debbugs.gnu.org (full text, mbox):

From: Mattias Engdegård <mattias.engdegard <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 66363 <at> debbugs.gnu.org
Subject: Re: bug#66363: gdb-control-commands-regexp issues
Date: Thu, 5 Oct 2023 20:11:24 +0200
5 okt. 2023 kl. 19.43 skrev Eli Zaretskii <eliz <at> gnu.org>:

> So the problem is only that 3 should be changed to the correct group
> number?

That would perhaps work, but it wouldn't be a very robust solution. There are many groups preceding that we don't care about, and the bug arose precisely because they were added without considering that a group was being used.

Better then to get rid of all groups save the one in use. (The proposed patch does that.)
That would make it much more difficult for the same bug to arise again.

> Here you are talking about some optimization of the regexp, or is it
> another bug?  If the latter, what is the bug here?

It's related. When the reference to subgroup 3 was added (30c0f81f9f), the tail looked like this:

  "\\([[:blank:]]+\\([^[:blank:]]*\\)\\)?$"
                  ^^^^^^^^^^^^^^^^^^^
and subgroup 3 was the second group in this substring (underlined above). Later (f71afd600a) the last `?` was changed into a `*`, but that made the contents of that group somewhat hazy because of the repetition:

  "\\([[:blank:]]+\\([^[:blank:]]*\\)\\)*$"
                                        ^
which doesn't leave us with a useful group for the purpose of detecting command arguments at all.
So the tail of the regexp has to be rewritten anyway, and we might as well do it in a more straightforward way. (The proposed patch does that as well.)

The reason I found this is that it contains a sub-pattern on the form (A+B*)* which is super-linear in general but usually easy to fix so that the result is actually more readable, yet faster.





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#66363; Package emacs. (Thu, 05 Oct 2023 18:53:02 GMT) Full text and rfc822 format available.

Message #22 received at 66363 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Mattias Engdegård <mattias.engdegard <at> gmail.com>
Cc: 66363 <at> debbugs.gnu.org
Subject: Re: bug#66363: gdb-control-commands-regexp issues
Date: Thu, 05 Oct 2023 21:52:21 +0300
> From: Mattias Engdegård <mattias.engdegard <at> gmail.com>
> Date: Thu, 5 Oct 2023 20:11:24 +0200
> Cc: 66363 <at> debbugs.gnu.org
> 
> 5 okt. 2023 kl. 19.43 skrev Eli Zaretskii <eliz <at> gnu.org>:
> 
> > So the problem is only that 3 should be changed to the correct group
> > number?
> 
> That would perhaps work, but it wouldn't be a very robust solution. There are many groups preceding that we don't care about, and the bug arose precisely because they were added without considering that a group was being used.
> 
> Better then to get rid of all groups save the one in use. (The proposed patch does that.)
> That would make it much more difficult for the same bug to arise again.
> 
> > Here you are talking about some optimization of the regexp, or is it
> > another bug?  If the latter, what is the bug here?
> 
> It's related. When the reference to subgroup 3 was added (30c0f81f9f), the tail looked like this:
> 
>   "\\([[:blank:]]+\\([^[:blank:]]*\\)\\)?$"
>                   ^^^^^^^^^^^^^^^^^^^
> and subgroup 3 was the second group in this substring (underlined above). Later (f71afd600a) the last `?` was changed into a `*`, but that made the contents of that group somewhat hazy because of the repetition:
> 
>   "\\([[:blank:]]+\\([^[:blank:]]*\\)\\)*$"
>                                         ^
> which doesn't leave us with a useful group for the purpose of detecting command arguments at all.
> So the tail of the regexp has to be rewritten anyway, and we might as well do it in a more straightforward way. (The proposed patch does that as well.)
> 
> The reason I found this is that it contains a sub-pattern on the form (A+B*)* which is super-linear in general but usually easy to fix so that the result is actually more readable, yet faster.

Thanks.  I'm okay with the changes in principle, but someone will have
to test them by running all of the control commands and verifying they
work after the fix, before this can be installed.  I myself won't have
the time for doing that any time soon, sorry.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#66363; Package emacs. (Fri, 06 Oct 2023 12:10:02 GMT) Full text and rfc822 format available.

Message #25 received at 66363 <at> debbugs.gnu.org (full text, mbox):

From: Mattias Engdegård <mattias.engdegard <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 66363 <at> debbugs.gnu.org
Subject: Re: bug#66363: gdb-control-commands-regexp issues
Date: Fri, 6 Oct 2023 14:09:01 +0200
[Message part 1 (text/plain, inline)]
5 okt. 2023 kl. 20.52 skrev Eli Zaretskii <eliz <at> gnu.org>:

> I'm okay with the changes in principle, but someone will have
> to test them by running all of the control commands and verifying they
> work after the fix, before this can be installed.  I myself won't have
> the time for doing that any time soon, sorry.

That's fine, I can wait. The bug only affects GDB users so I'm not personally inconvenienced.

Here's an improved version of the patch that also fixes another bug in the original code: the extra match against gdb-python-guile-commands-regexp is both incorrect (not anchored) and superfluous as that information is available in the match just made.

For instance, consider what happens if the command string is "stepping ...". It will match gdb-control-command-regexp and also gdb-python-guile-commands-regexp and set python-or-guile-p, despite not being a Python or Guile command, and  prevent gdb-control-level from being incremented.

[gdb-control-commands-regexp-2.diff (application/octet-stream, attachment)]

Reply sent to Mattias Engdegård <mattias.engdegard <at> gmail.com>:
You have taken responsibility. (Sun, 29 Oct 2023 16:36:01 GMT) Full text and rfc822 format available.

Notification sent to Mattias Engdegård <mattias.engdegard <at> gmail.com>:
bug acknowledged by developer. (Sun, 29 Oct 2023 16:36:01 GMT) Full text and rfc822 format available.

Message #30 received at 66363-done <at> debbugs.gnu.org (full text, mbox):

From: Mattias Engdegård <mattias.engdegard <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 66363-done <at> debbugs.gnu.org
Subject: Re: bug#66363: gdb-control-commands-regexp issues
Date: Sun, 29 Oct 2023 17:34:48 +0100
Pushed to master after finally finding a machine with gdb on it.





bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Mon, 27 Nov 2023 12:24:05 GMT) Full text and rfc822 format available.

This bug report was last modified 1 year and 263 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.