GNU bug report logs - #27270
display-raw-bytes-as-hex generates ambiguous output for Emacs strings
Reported by: Paul Eggert <eggert <at> cs.ucla.edu>
Date: Wed, 7 Jun 2017 03:59:01 UTC
Severity: wishlist
Tags: moreinfo
Done: Paul Eggert <eggert <at> cs.ucla.edu>
Bug is archived. No further changes may be made.
> You need to use a wide string:
>
> wcslen(L"\x1234")
>
> > std::string("\x1234").length() // C++: compilation error
>
> Likewise:
>
> std::wstring(L"\x1234").length()
Thank you for pointing this out. This gives us three camps:
1. Languages where "\x1234" is always one character (Emacs Lisp)
2. Languages where "\x1234" is an error, but may become one character
   when opting into this with wide literals (C, C++)
3. Languages where "\x1234" is always multiple characters (everything
   else under the sun)
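
To illustrate the third camp, here is a minimal sketch in Python. Python is an assumption on my part, picked only as one representative of "everything else"; it is not named in this thread. In Python string literals, `\x` consumes exactly two hex digits, so the remaining digits are ordinary characters:

```python
# In Python, \x takes exactly two hex digits, so "\x1234" is the
# character U+0012 followed by the literal characters "3" and "4".
s = "\x1234"
parts = [hex(ord(c)) for c in s]

print(len(s))   # number of characters in the string
print(parts)    # code point of each character, as hex

# A code point above 0xFF has to be written with \u instead.
one_char = "\u1234"
print(len(one_char))
```

The same literal therefore means something different in Emacs Lisp, where the reader currently treats all four hex digits as part of one escape.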
I propose that Emacs Lisp move into camp 3 (there is little point in
moving to camp 2, as that would require new syntax for a rarely used
feature). As evidenced by this bug report, the current behavior is a
footgun waiting to go off. We already have syntax for the case where
one truly wants to specify a value greater than #xFF: Unicode names or
values. The change would require an amendment to point 3 of
`(info "(elisp) General Escape Syntax")`. As with old-style
backquotes, a warning could be emitted if a hex value greater than
#xFF is used in a string.
I've checked the Emacs sources for uses of such hex escapes and found
only org-entities.el, which represents the non-breaking space (nbsp)
this way, so breakage should be limited.
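
A check of this kind could be sketched roughly as follows. This is not the method actually used for the survey above; the pattern, the function name `find_long_hex_escapes`, and the sample input are all hypothetical. It is a plain text heuristic that flags `\x` escapes with more than two hex digits, and it does not verify that a match sits inside a string literal rather than in character syntax or a comment:

```python
import re

# Hypothetical heuristic: a \x escape followed by three or more hex digits.
# The lookbehind plus the optional pairs of backslashes skip an escaped
# backslash such as "\\x1234", which is not a hex escape at all.
LONG_HEX_ESCAPE = re.compile(r"(?<!\\)(?:\\\\)*(\\x[0-9A-Fa-f]{3,})")


def find_long_hex_escapes(source):
    """Return (line_number, escape) pairs for each suspicious escape."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in LONG_HEX_ESCAPE.finditer(line):
            hits.append((lineno, match.group(1)))
    return hits


# Made-up Emacs Lisp input, for illustration only.
sample = r'''(setq a "\x1234")
(setq b "\x09")
(setq c "\\x1234")'''

for lineno, escape in find_long_hex_escapes(sample):
    print(f"line {lineno}: {escape}")
```

Only the first sample line is reported: the second uses a two-digit escape, and the third contains an escaped backslash followed by plain text.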
If there is interest, I could extend the survey to include whether
character syntax is/should be affected the same way and/or include
more languages.
This bug report was last modified 3 years and 109 days ago.