GNU bug report logs - #27270
display-raw-bytes-as-hex generates ambiguous output for Emacs strings

Package: emacs;

Reported by: Paul Eggert <eggert <at> cs.ucla.edu>

Date: Wed, 7 Jun 2017 03:59:01 UTC

Severity: wishlist

Tags: moreinfo

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

Message #63 received at 27270 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: v.schneidermann <at> gmail.com, 27270 <at> debbugs.gnu.org,
 npostavs <at> users.sourceforge.net
Subject: Re: bug#27270: display-raw-bytes-as-hex generates ambiguous output
 for Emacs strings
Date: Sun, 11 Jun 2017 17:48:04 +0300

> Cc: npostavs <at> users.sourceforge.net, 27270 <at> debbugs.gnu.org,
>  v.schneidermann <at> gmail.com
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> Date: Sat, 10 Jun 2017 17:04:40 -0700
> 
> On 06/10/2017 12:24 AM, Eli Zaretskii wrote:
> > So your proposal would mean a change to the Lisp reader to support
> > such escapes, right?  If so, isn't such a change
> > backward-incompatible?
> 
> Yes, but only in the sense that undocumented escapes evaluate to 
> themselves, e.g., "\F" is currently the same as "F" in Emacs Lisp 
> because there is no escape sequence \F currently defined for character 
> constants. But there's nothing new here, e.g., when we added "\N{...}" 
> last year we changed the interpretation of the formerly-undocumented \N 
> escape.

Then maybe the new hex display should use the \N{U+nnn} format?

> >> Also, display-raw-bytes-as-hex would cause raw bytes to be displayed with this
> >> new X escape, rather than with the x escape.
> > It could only do that for codepoints below 256 decimal, so that
> > limitation should be taken into account when deciding on the proposal.
> 
> Ouch, I hadn't thought of that.
> 
> Wait -- doesn't that mean that "display-raw-bytes-as-hex" is a 
> misleading name, because it affects the display not only of raw bytes, 
> but of other undisplayable characters?

That's true, but since the chances of a _user_ changing the
printable-chars char-table are pretty slim, I didn't think it was
justified to obfuscate the name.

> Shouldn't we change its name to 
> something more generic and more accurate, like "display-characters-as-hex"?

Codepoints whose printable-chars entry is nil cannot in good faith be
called "characters", IMO.  "Codepoints", maybe?  But again, that makes
the discoverability harder, so I'm not sure it's worth the hassle.

> Anyway, to address the point you raised: how about a different idea? We 
> extend the existing \x syntax in strings so that \x{dddd} has the same 
> meaning as "\xdddd", except that the "}" terminates the escape. This 
> syntax is used by Perl and so is in the same family as \N{...}. We also 
> change display-raw-bytes-as-hex to use this new syntax when a character 
> is immediately followed by a hexadecimal digit. That way, most 
> characters are displayed as before, but my problematic example is 
> displayed as "x\x{90}5y", which is a good visual cue of the unusual 
> situation.

See above: why not \N{U+...}?  The only downside is that it's much
longer than \xNN.  Could be another option, perhaps.
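
The three display formats discussed in this message can be compared with a
small model. This is only a sketch in Python, not Emacs code: the function
`render`, its `style` argument, and the rule "anything outside printable
ASCII is undisplayable" are assumptions made for illustration, standing in
for the real printable-chars lookup.

```python
HEX_DIGITS = "0123456789abcdefABCDEF"


def is_undisplayable(code):
    # Assumption for this sketch: only printable ASCII is displayable.
    # Emacs consults the printable-chars char-table instead.
    return code < 0x20 or code > 0x7E


def render(codes, style="x"):
    """Render a sequence of integer codepoints as display text.

    style="x"      : always \\xNN (the ambiguous form from the bug report)
    style="braces" : \\x{NN} only when the next character is a hex digit
    style="named"  : always \\N{U+NN}, which terminates itself
    """
    out = []
    for i, code in enumerate(codes):
        if not is_undisplayable(code):
            out.append(chr(code))
            continue
        nxt = codes[i + 1] if i + 1 < len(codes) else None
        next_is_hex = (
            nxt is not None
            and not is_undisplayable(nxt)
            and chr(nxt) in HEX_DIGITS
        )
        if style == "named":
            out.append("\\N{U+%X}" % code)
        elif style == "braces" and next_is_hex:
            out.append("\\x{%x}" % code)
        else:
            out.append("\\x%x" % code)
    return "".join(out)


# The problematic example: "x", codepoint 0x90, "5", "y".
sample = [ord("x"), 0x90, ord("5"), ord("y")]
print(render(sample))                  # x\x905y
print(render(sample, style="braces"))  # x\x{90}5y
print(render(sample, style="named"))   # x\N{U+90}5y
```

In the first output the escape runs into the following "5", so the text can
be read as a single escape \x905. The other two forms mark where the escape
ends, at the cost of extra length, which is largest for the \N{U+...} form.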




This bug report was last modified 3 years and 109 days ago.
