GNU bug report logs - #27270
display-raw-bytes-as-hex generates ambiguous output for Emacs strings

Package: emacs;

Reported by: Paul Eggert <eggert <at> cs.ucla.edu>

Date: Wed, 7 Jun 2017 03:59:01 UTC

Severity: wishlist

Tags: moreinfo

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

From: Vasilij Schneidermann <v.schneidermann <at> gmail.com>
To: Andreas Schwab <schwab <at> linux-m68k.org>
Cc: Lars Ingebrigtsen <larsi <at> gnus.org>, Paul Eggert <eggert <at> cs.ucla.edu>, 27270 <at> debbugs.gnu.org, npostavs <at> users.sourceforge.net
Subject: bug#27270: display-raw-bytes-as-hex generates ambiguous output for Emacs strings
Date: Sun, 24 Apr 2022 12:51:58 +0200

> You need to use a wide string:
>
>       wcslen(L"\x1234")
>
> >     std::string("\x1234").length() // C++: compilation error
>
> Likewise:
>
>       std::wstring(L"\x1234").length()

Thank you for pointing this out. This gives us three camps:

- Languages where "\x1234" is always one character (Emacs Lisp)
- Languages where "\x1234" is an error in an ordinary string literal,
but is one character if you opt in with a wide literal (C, C++)
- Languages where "\x1234" is always multiple characters (everything
else under the sun)
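
The practical difference between camps 1 and 3 is how many hex digits a
\x escape consumes. The following is a minimal sketch in Python (itself a
camp 3 language) that models both rules; decode_x_escapes is a
hypothetical helper, not an Emacs or Python API, and it understands only
\x escapes and escaped backslashes. It works on the source text between
the quotes, so the raw string r"\x1234" stands for what a programmer
would type.

```python
from typing import List, Optional

HEX_DIGITS = set("0123456789abcdefABCDEF")


def decode_x_escapes(body: str, max_digits: Optional[int]) -> List[int]:
    """Decode the text between the quotes of a string literal into code points.

    Hypothetical helper: only \\x escapes and escaped backslashes are handled;
    every other character is taken literally.
    max_digits=None models camp 1 (\\x consumes every following hex digit);
    max_digits=2 models camp 3 (\\x consumes at most two hex digits).
    """
    out: List[int] = []
    i = 0
    while i < len(body):
        if body.startswith("\\\\", i):        # escaped backslash
            out.append(ord("\\"))
            i += 2
        elif body.startswith("\\x", i):       # hex escape
            j = i + 2
            stop = len(body) if max_digits is None else min(len(body), j + max_digits)
            while j < stop and body[j] in HEX_DIGITS:
                j += 1
            if j == i + 2:
                raise ValueError(f"\\x without hex digits at offset {i}")
            out.append(int(body[i + 2:j], 16))
            i = j
        else:                                 # ordinary character
            out.append(ord(body[i]))
            i += 1
    return out


if __name__ == "__main__":
    body = r"\x1234"                       # the literal's source text, not its value
    print(decode_x_escapes(body, None))    # camp 1: [4660], one character
    print(decode_x_escapes(body, 2))       # camp 3: [18, 51, 52], three characters
    # Python's own parser agrees with camp 3:
    print([ord(c) for c in "\x1234"])      # [18, 51, 52]
```

In camp 1 the escape swallows every hex digit that follows, so a
trailing letter such as the b in "\x1234b" would be absorbed as well;
under camp 3 rules a \x escape can never exceed #xFF.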

I propose that Emacs Lisp move into camp 3 (there is little point in
moving to camp 2, as it would require new syntax for a hardly used
feature). As this bug report shows, the current behavior is a footgun
waiting to happen. We already have syntax for specifying a character
greater than #xFF by its Unicode name or code point (\N{NAME}, \N{U+X},
\uNNNN and \U00NNNNNN). The change would require amending point 3 of
`(info "(elisp) General Escape Syntax")`. As with old-style backquotes,
a warning could be emitted when a string contains a hex escape with a
value greater than #xFF.
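
A warning like the one proposed here would need to find the affected
escapes in source text. A minimal sketch in Python, assuming the check
runs over the text between a string's quotes; long_hex_escapes is a
hypothetical name. It reports every \x escape followed by more than two
hex digits. That covers values above #xFF, and also zero-padded escapes
such as \x00a0, whose meaning would change under camp 3 rules even
though their value is small.

```python
from typing import List, Tuple

HEX_DIGITS = set("0123456789abcdefABCDEF")


def long_hex_escapes(body: str) -> List[Tuple[int, str]]:
    """Return (offset, escape_text) for each \\x escape in a string-literal
    body that is followed by more than two hex digits.

    Hypothetical lint check: these are the escapes whose meaning would
    change if \\x stopped after two digits.  Escaped backslashes are
    skipped, so the source text \\\\x1234 is not reported.
    """
    hits: List[Tuple[int, str]] = []
    i = 0
    while i < len(body):
        if body.startswith("\\\\", i):      # escaped backslash: not an escape start
            i += 2
        elif body.startswith("\\x", i):     # hex escape: count its digits
            j = i + 2
            while j < len(body) and body[j] in HEX_DIGITS:
                j += 1
            if j - (i + 2) > 2:
                hits.append((i, body[i:j]))
            i = j
        else:
            i += 1
    return hits


if __name__ == "__main__":
    print(long_hex_escapes(r"a\x1234z"))       # [(1, '\\x1234')]
    print(long_hex_escapes(r"\x41 \\x1234"))   # []: \x41 has two digits, \\ is escaped
```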

I've checked the Emacs sources for such hex escapes and found only one
use: org-entities.el, which represents the non-breaking space (nbsp)
this way. Breakage should therefore be limited.
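
For the few places that do use such escapes, a mechanical rewrite into
the existing Unicode syntax is one option. A minimal sketch in Python;
to_unicode_escapes is a hypothetical helper, and it assumes that a hex
escape with three or more digits denotes the Unicode character with that
code point, so that \N{U+X} with the same value preserves the meaning.
Values above #x10FFFF are not Unicode code points and are left unchanged
for manual review.

```python
import re

# Either an escaped backslash (group 1, kept as-is) or \x followed by three
# or more hex digits (group 2).  Matching "\\" first stops "\\x1234" from
# being mistaken for a hex escape.
_LONG_HEX = re.compile(r"(\\\\)|\\x([0-9A-Fa-f]{3,})")


def to_unicode_escapes(body: str) -> str:
    """Rewrite long \\x escapes in a string-literal body as \\N{U+X} escapes."""
    def replace(m: "re.Match[str]") -> str:
        if m.group(1):                 # escaped backslash: leave it alone
            return m.group(0)
        code = int(m.group(2), 16)
        if code > 0x10FFFF:            # not a Unicode code point: leave for review
            return m.group(0)
        return "\\N{U+%04X}" % code
    return _LONG_HEX.sub(replace, body)


if __name__ == "__main__":
    print(to_unicode_escapes(r"x\x00a0y"))    # x\N{U+00A0}y
```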

If there is interest, I could extend the survey to more languages and
check whether character syntax (such as ?\x1234) is, or should be,
affected the same way.

This bug report was last modified 3 years and 109 days ago.

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.