GNU bug report logs - #44173
28.0.50; gdb-mi mangles strings with octal escapes


Package: emacs;

Reported by: Mattias Engdegård <mattiase <at> acm.org>

Date: Fri, 23 Oct 2020 11:51:02 UTC

Severity: normal

Found in version 28.0.50

Done: Mattias Engdegård <mattiase <at> acm.org>

Bug is archived. No further changes may be made.



Message #23 received at 44173 <at> debbugs.gnu.org (full text, mbox):

From: Mattias Engdegård <mattiase <at> acm.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 44173 <at> debbugs.gnu.org
Subject: Re: bug#44173: 28.0.50; gdb-mi mangles strings with octal escapes
Date: Fri, 23 Oct 2020 19:31:47 +0200
On 23 Oct 2020, at 16:44, Eli Zaretskii <eliz <at> gnu.org> wrote:

> There's nothing special in the text "\303" that says it must be an
> octal escape.  They are just 4 ASCII characters.

The grammar uses the name 'c-string', so it is reasonable to assume that most of the lexical conventions of C strings are obeyed.

Perhaps you mean that \ooo can occur outside c-string productions? If so, please say where you have seen it, or have evidence of it being produced.

The possibilities are limited. For example, stream records may use an unquoted (newline-terminated) string in place of a c-string, but I haven't seen any evidence of this in practice and it appears that gdb-mi.el does not handle that case either.

The only other possibility in the grammar would be inside 'variable' productions (field keys) which are unquoted, but those only come from a small set of fixed names.

To be clear: the exact encoding of non-ASCII bytes (whether present literally or as octal escapes in c-string tokens) is unclear, and I do not attempt to solve that problem here and now. This is about more fundamental parsing and lexing problems.

Namely: gdb-gdbmi-marker-filter handles octal escapes at the wrong level. Moreover, the substitution, when performed, is incorrect because it ignores context: the 4-byte (excluding quotes) string "\\377" then appears to the user as the 2-byte string "\\377", where the '\377' sequence is painted in a distinct colour and really is the raw byte 0xff, and thus not a valid C character escape sequence.
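
To make the "\\377" case concrete, here is a minimal Python sketch (not the gdb-mi.el code; both function names are made up for illustration). The first function substitutes \ooo everywhere in the text without looking at context; the second decodes the body of a single c-string token from left to right, so that an escaped backslash is consumed before the digits following it are considered. For simplicity, chr(n) stands in for the raw byte n, and only a few simple escapes are handled.

```python
import re

def naive_unescape(text: str) -> str:
    """Replace every \\ooo in the text, ignoring context (hypothetical helper)."""
    return re.sub(r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), text)

# A small subset of C's simple escapes; an assumption made for this sketch.
_SIMPLE = {"n": "\n", "t": "\t", "\\": "\\", '"': '"'}

def decode_c_string(body: str) -> str:
    """Decode the body of one c-string token (quotes already stripped)."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(body):          # stray trailing backslash: keep it
            out.append(ch)
            break
        m = re.match(r"[0-7]{1,3}", body[i + 1:])
        if m:                           # octal escape: one byte value
            out.append(chr(int(m.group(), 8) & 0xFF))
            i += 1 + m.end()
        else:                           # simple escape such as \\ or \"
            nxt = body[i + 1]
            out.append(_SIMPLE.get(nxt, nxt))
            i += 2
    return "".join(out)

wire = r"\\377"                         # what arrives between the quotes
print(len(naive_unescape(wire)))        # backslash + byte 0xff
print(len(decode_c_string(wire)))       # backslash, '3', '7', '7'
```

With the context-free substitution the result has two characters; with token-level decoding it has the four the program actually contains.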

Finally, the JSON mess is evidently the wrong way to go since it does not take care of strings properly -- and heaven knows what else, since gdb-jsonify-buffer works at the wrong level (a pattern here) by doing regexp replacement on the whole text prior to parsing.

> That doesn't really answer my question, though, about the use case
> that causes such a string to be in the program.  Without a use case, I
> could tell you to set gdb-mi-decode-strings non-nil and be done with
> it.

Why a string is in a program is irrelevant; the user does not necessarily know that. If Emacs says that a string contains six decimal digits when it really just contains two (nonzero) bytes, then that is a lie no matter what.

> Well, I know that several possible ways exist, but each one of them
> loses in some situations.  You say "the code receiving the parse tree
> could decide", but will that code have information to make that
> decision correctly?  And if you must decide in the parser, how would
> you suggest to make the decision to avoid making incorrect decisions?

Again, that problem is outside the scope of this bug, but I think we can agree that it is easier, or at least no more difficult, to make a correct decision knowing the context of the string than not.

Let's see what a value/result parser can do and work from there.
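
As a rough idea of what such a parser might look like, here is a small recursive-descent sketch in Python. It is only an illustration under assumptions: the class and function names are invented, the grammar is a simplified reading of the GDB/MI result syntax (result = variable "=" value; value = c-string, tuple or list), and it is not the code proposed for gdb-mi.el. Each c-string body is returned with its escapes untouched, so that the code receiving the parse tree can decide how to decode it with the field name at hand.

```python
class MiParser:
    """Hypothetical recursive-descent parser for a GDB/MI result list."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def peek(self):
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def expect(self, ch):
        if self.peek() != ch:
            raise ValueError(f"expected {ch!r} at offset {self.pos}")
        self.pos += 1

    def c_string(self):
        self.expect('"')
        start = self.pos
        while self.peek() != '"':
            if self.peek() == "":
                raise ValueError("unterminated c-string")
            # A backslash always consumes the following character too.
            self.pos += 2 if self.peek() == "\\" else 1
        raw = self.text[start:self.pos]
        self.pos += 1                   # closing quote
        return raw                      # escapes deliberately left undecoded

    def result(self):
        start = self.pos
        while self.peek() not in ("=", ""):
            self.pos += 1
        name = self.text[start:self.pos]
        self.expect("=")
        return name, self.value()

    def sequence(self, close, item):
        self.pos += 1                   # skip the opening bracket
        items = []
        if self.peek() == close:
            self.pos += 1
            return items
        while True:
            items.append(item())
            if self.peek() == ",":
                self.pos += 1
            else:
                self.expect(close)
                return items

    def list_item(self):
        # Lists may hold bare values or name=value results.
        return self.value() if self.peek() in ('"', "{", "[") else self.result()

    def value(self):
        ch = self.peek()
        if ch == '"':
            return self.c_string()
        if ch == "{":
            return dict(self.sequence("}", self.result))
        if ch == "[":
            return self.sequence("]", self.list_item)
        raise ValueError(f"unexpected {ch!r} at offset {self.pos}")


def parse_results(text):
    """Parse 'name=value,name=value,...' into a dict (hypothetical helper)."""
    p = MiParser(text)
    out = {}
    while True:
        name, val = p.result()
        out[name] = val
        if p.peek() == "":
            return out
        p.expect(",")


tree = parse_results(r'name="a\\377",frame={level="0",args=[]},ids=["1","2"]')
print(tree["name"])
```

Because the lexer steps over each backslash together with the character after it, an escaped quote or backslash inside a c-string can never be mistaken for the end of the token or for the start of an octal escape.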





This bug report was last modified 4 years and 251 days ago.


