GNU bug report logs - #44173
28.0.50; gdb-mi mangles strings with octal escapes

Package: emacs;

Reported by: Mattias Engdegård <mattiase <at> acm.org>

Date: Fri, 23 Oct 2020 11:51:02 UTC

Severity: normal

Found in version 28.0.50

Done: Mattias Engdegård <mattiase <at> acm.org>

Bug is archived. No further changes may be made.


From: Mattias Engdegård <mattiase <at> acm.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 44173 <at> debbugs.gnu.org
Subject: bug#44173: 28.0.50; gdb-mi mangles strings with octal escapes
Date: Sat, 24 Oct 2020 20:27:13 +0200
On 24 Oct 2020, at 19:23, Eli Zaretskii <eliz <at> gnu.org> wrote:

>> If gdb-mi-decode-strings is non-nil, then file names, string contents etc are properly decoded as UTF-8 as expected
> 
> Not UTF-8, but the value of gdb-mi-decode-strings, if it's a
> coding-system, right?

Right.

> I hoped/thought you intended to solve this issue as well, but if the
> situation is no worse than it was before, it's fine to leave it at
> that.  However, please retain at least part of the comment regarding
> gdb-mi-decode-strings and the ambiguity related to its use, I think
> it's important that people know that.

Yes, the valid parts of the comment will be kept.
I'm not sure what a solution to the remaining problems would look like, but it would probably involve splitting gdb-mi-decode-strings into separate variables for file names and program values. On the other hand, given that the world is converging on UTF-8, it may be a disappearing problem.

In any case, should we want to decode strings differently depending on their structural position in the answer, I believe that it would be better done in the field accessors instead of the parser. For example,

  (bindat-get-field breakpoint 'fullname)

might become something like

  (gdb-mi--get-string-field breakpoint 'fullname 'filename)

which would tell the accessor how to decode the field.
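
A minimal sketch of what such an accessor could look like. The function name is the hypothetical one used above and does not exist in gdb-mi.el; the KIND symbols and the choice of coding systems are assumptions made only for illustration, and the field is assumed to hold a still-undecoded unibyte string.

```elisp
;; Hypothetical accessor sketch; not part of gdb-mi.el.
(require 'bindat)

(defun gdb-mi--get-string-field (struct field kind)
  "Return FIELD of STRUCT, decoded according to KIND.
KIND is the symbol `filename' for file names; any other value is
treated as a program value.  Both the KIND names and the coding
systems chosen below are assumptions for this sketch."
  (let ((raw (bindat-get-field struct field)))
    (if (not (stringp raw))
        ;; Leave non-string fields (numbers, nested lists) untouched.
        raw
      (decode-coding-string
       raw
       (if (eq kind 'filename)
           ;; Assumed: file names follow the file-name coding system.
           (or file-name-coding-system
               default-file-name-coding-system
               'utf-8)
         ;; Assumed: program values are UTF-8.
         'utf-8)))))
```

With this shape the parser would hand back raw bytes, and each call site would state what kind of string it expects.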

In the short term I suggest changing the default value of gdb-mi-decode-strings to 't' as this gives the behaviour most commonly expected by the user. However, it is not critical, and in any case orthogonal to the issue at hand. What do you think?

> And I hope you've verified that this does still fix the problem in
> bug#21572, which this variable and the related code tries to fix?

Yes -- I tried debugging programs whose source file names contain Unicode chars and they were shown correctly (with gdb-mi-decode-strings = t).

>> +       (t
>> +        (error "Unrecognised escape char: %c" (following-char))))
> 
> How about leaving the text unchanged instead of signaling an error
> (and thus preventing the entire data from getting to the higher
> levels)?

Maybe, but I really dislike hiding bugs by being overly tolerant. It is precisely this tolerant nature of 'json-read' that caused this bug in the first place. (I'm not sure whether that leniency is compliant with RFC 8259, by the way.)
I think it's fine to signal errors if the syntax isn't what we expect; after all, that is what the JSON parser does in other cases.
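
To illustrate the strict behaviour, here is a rough sketch of an escape decoder that converts octal escapes to raw bytes and signals an error on anything it does not recognise. The function name `my-gdb-unescape-c-string' is hypothetical, the set of accepted escapes is an assumption, and the input is assumed to be ASCII text containing backslash escapes; this is not the patch itself.

```elisp
;; Hypothetical sketch; not the code proposed for gdb-mi.el.
(defun my-gdb-unescape-c-string (str)
  "Decode C-style escapes in STR and return a unibyte string.
Octal escapes become raw bytes.  Unrecognised escapes signal an
error instead of being passed through unchanged."
  (with-temp-buffer
    (set-buffer-multibyte nil)
    (insert str)
    (goto-char (point-min))
    (while (search-forward "\\" nil t)
      (let ((start (match-beginning 0)))
        (cond
         ;; One to three octal digits: replace with a single byte.
         ((looking-at "[0-7]\\{1,3\\}")
          (let ((byte (logand (string-to-number (match-string 0) 8) 255)))
            (delete-region start (match-end 0))
            (insert byte)))
         ;; Escaped backslash or double quote: drop the backslash.
         ((memq (following-char) '(?\\ ?\"))
          (delete-region start (1+ start))
          (forward-char 1))
         ;; Newline escape.
         ((eq (following-char) ?n)
          (delete-region start (+ start 2))
          (insert ?\n))
         (t
          (error "Unrecognised escape char: %c" (following-char))))))
    (buffer-string)))
```

The resulting bytes can then be decoded separately, for example with (decode-coding-string (my-gdb-unescape-c-string "caf\\303\\251") 'utf-8), which keeps the parsing step independent of the choice of coding system.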

Thanks for the helpful comments. I'll prepare a proper patch.





