GNU bug report logs - #68751
29.1; "\x0e0" is a multibyte string

Package: emacs;

Reported by: Christopher Yeleighton <giecrilj <at> stegny.2a.pl>

Date: Sat, 27 Jan 2024 06:31:01 UTC

Severity: normal

Found in version 29.1

Fixed in version 30.1

To reply to this bug, email your comments to 68751 AT debbugs.gnu.org.



Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#68751; Package emacs. (Sat, 27 Jan 2024 06:31:01 GMT) Full text and rfc822 format available.

Acknowledgement sent to Christopher Yeleighton <giecrilj <at> stegny.2a.pl>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Sat, 27 Jan 2024 06:31:01 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Christopher Yeleighton <giecrilj <at> stegny.2a.pl>
To: bug-gnu-emacs <at> gnu.org
Subject: 29.1; "\x0e0" is a multibyte string
Date: Sat, 27 Jan 2024 06:23:45 +0000

M-: (multibyte-string-p "\x0e0") RET

> t


In GNU Emacs 29.1 (build 1, x86_64-pc-linux-gnu, GTK+ Version 3.24.38,
cairo version 1.17.8)
Windowing system distributor 'The X.Org Foundation', version 11.0.12101010
System Description: Arch Linux

Configured using:
'configure --sysconfdir=/etc --prefix=/usr --libexecdir=/usr/lib
--with-tree-sitter --localstatedir=/var --with-cairo
--disable-build-details --with-harfbuzz --with-libsystemd
--with-modules --with-x-toolkit=gtk3 'CFLAGS=-march=x86-64
-mtune=generic -O2 -pipe -fno-plt -fexceptions -Wp,-D_FORTIFY_SOURCE=2
-Wformat -Werror=format-security -fstack-clash-protection
-fcf-protection -g
-ffile-prefix-map=/build/emacs/src=/usr/src/debug/emacs -flto=auto'
'LDFLAGS=-Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now -flto=auto''

Configured features:
ACL CAIRO DBUS FREETYPE GIF GLIB GMP GNUTLS GPM GSETTINGS HARFBUZZ JPEG
JSON LCMS2 LIBOTF LIBSYSTEMD LIBXML2 M17N_FLT MODULES NOTIFY INOTIFY
PDUMPER PNG RSVG SECCOMP SOUND SQLITE3 THREADS TIFF TOOLKIT_SCROLL_BARS
TREE_SITTER WEBP X11 XDBE XIM XINPUT2 XPM GTK3 ZLIB

Important settings:
value of $LANG: pl_PL.UTF-8
locale-coding-system: utf-8-unix

Major mode: Info

Minor modes in effect:
shell-dirtrack-mode: t
tooltip-mode: t
global-eldoc-mode: t
show-paren-mode: t
electric-indent-mode: t
mouse-wheel-mode: t
tool-bar-mode: t
menu-bar-mode: t
file-name-shadow-mode: t
isearch-fold-quotes-mode: t
global-font-lock-mode: t
font-lock-mode: t
blink-cursor-mode: t
buffer-read-only: t
line-number-mode: t
indent-tabs-mode: t
transient-mark-mode: t
auto-composition-mode: t
auto-encryption-mode: t
auto-compression-mode: t

Load-path shadows:
None found.

Features:
(xref project lpr thai-util thai-word repeat mailalias mailclient
textsec uni-scripts idna-mapping ucs-normalize uni-confusable
textsec-check facemenu shadow sort mail-extr emacsbug message yank-media
puny rfc822 mml mml-sec epa derived epg rfc6068 epg-config gnus-util
mm-decode mm-bodies mm-encode mail-parse rfc2231 mailabbrev gmm-utils
mailheader sendmail rfc2047 rfc2045 ietf-drums mm-util mail-prsvr
mail-utils browse-url url url-proxy url-privacy url-expand url-methods
url-history url-cookie generate-lisp-file url-domsuf url-util url-parse
url-vars mailcap mule-util info tar-mode arc-mode archive-mode sh-script
rx smie treesit executable files-x conf-mode shell pcomplete comint
ansi-osc ansi-color ring dired-aux dired dired-loaddefs noutline outline
icons two-column kmacro debug backtrace find-func face-remap shortdoc
text-property-search cl-extra cl-print erc-lang erc-goodies erc iso8601
auth-source cl-seq eieio eieio-core cl-macs password-cache json map pp
format-spec erc-backend erc-networks byte-opt gv bytecomp byte-compile
erc-common erc-compat erc-loaddefs thingatpt help-fns radix-tree
jka-compr misearch multi-isearch time-date subr-x rfc1345 quail
help-mode cl-loaddefs cl-lib rmc iso-transl tooltip cconv eldoc paren
electric uniquify ediff-hook vc-hooks lisp-float-type elisp-mode mwheel
term/x-win x-win term/common-win x-dnd tool-bar dnd fontset image
regexp-opt fringe tabulated-list replace newcomment text-mode lisp-mode
prog-mode register page tab-bar menu-bar rfn-eshadow isearch easymenu
timer select scroll-bar mouse jit-lock font-lock syntax font-core
term/tty-colors frame minibuffer nadvice seq simple cl-generic
indonesian philippine cham georgian utf-8-lang misc-lang vietnamese
tibetan thai tai-viet lao korean japanese eucjp-ms cp51932 hebrew greek
romanian slovak czech european ethiopic indian cyrillic chinese
composite emoji-zwj charscript charprop case-table epa-hook
jka-cmpr-hook help abbrev obarray oclosure cl-preloaded button loaddefs
theme-loaddefs faces cus-face macroexp files window text-properties
overlay sha1 md5 base64 format env code-pages mule custom widget keymap
hashtable-print-readable backquote threads dbusbind inotify lcms2
dynamic-setting system-font-setting font-render-setting cairo
move-toolbar gtk x-toolkit xinput2 x multi-tty make-network-process
emacs)

Memory information:
((conses 16 514130 81836)
(symbols 48 26162 44)
(strings 32 129312 5724)
(string-bytes 1 2808230)
(vectors 16 59943)
(vector-slots 8 1889204 140248)
(floats 8 953 203)
(intervals 56 27501 657)
(buffers 984 37))





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#68751; Package emacs. (Sat, 27 Jan 2024 06:47:01 GMT) Full text and rfc822 format available.

Message #8 received at 68751 <at> debbugs.gnu.org (full text, mbox):

From: Christopher Yeleighton <giecrilj <at> stegny.2a.pl>
To: 68751 <at> debbugs.gnu.org
Subject: Re: bug#68751: Acknowledgement (29.1; "\x0e0" is a multibyte string)
Date: Sat, 27 Jan 2024 06:46:36 +0000

Info (elisp) Non-ASCII in Strings says:

> If a string constant contains hexadecimal or octal escape sequences, and these
> escape sequences all specify unibyte characters (i.e., less than 256),
> and there are no other literal non-ASCII characters or Unicode-style
> escape sequences in the string, then Emacs automatically assumes that it
> is a unibyte string.

I believe it should say:

| (i.e., less than 256 and octal or written with 2 hexadecimal digits),

and additionally

| Unibyte characters embedded in multibyte string constants evaluate to
| private character codes,
| e.g. "\x0a0\xa0" equals "\x0a0\x3fffa0".
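To make the proposed wording concrete, here is a toy model in Python of the rule as described in this thread. It makes these assumptions: a hex escape with one or two digits, or an octal escape, whose value is between 0x80 and 0xFF yields a raw byte; any other non-ASCII character makes the string multibyte; and a raw byte B inside a multibyte string becomes character 0x3FFF00 + B. This is a sketch, not Emacs's reader (the real logic is C code in src/lread.c). The names read_literal, RAW_BYTE_BASE and ESCAPE are hypothetical. Only \xN..., octal and \uNNNN escapes are recognized.

```python
import re

# Toy model of the rule discussed in this bug report; NOT Emacs's reader
# (that is C code in src/lread.c).  Only \x hex escapes, 1-3 digit octal
# escapes and \uNNNN are recognized; other escapes (including \\) are not
# handled, so the model gives wrong results for literals that use them.

# Assumed mapping: a raw byte B (0x80..0xFF) inside a multibyte string
# becomes character RAW_BYTE_BASE + B, e.g. 0xA0 -> 0x3FFFA0.
RAW_BYTE_BASE = 0x3FFF00

ESCAPE = re.compile(r"\\(x[0-9a-fA-F]+|[0-7]{1,3}|u[0-9a-fA-F]{4})")


def read_literal(src):
    """Return (character codes, multibyte?) for the body of a string literal."""
    items = []  # (kind, value); kind is "byte" (raw byte) or "char"
    pos = 0
    for m in ESCAPE.finditer(src):
        items.extend(("char", ord(c)) for c in src[pos:m.start()])
        esc = m.group(1)
        if esc.startswith("x"):
            digits = esc[1:]
            value = int(digits, 16)
            # One or two hex digits with a value >= 0x80 denote a raw byte;
            # three or more digits always denote a character code.
            kind = "byte" if len(digits) <= 2 and value >= 0x80 else "char"
        elif esc.startswith("u"):
            value, kind = int(esc[1:], 16), "char"
        else:
            value = int(esc, 8)
            # Octal escapes in 0x80..0xFF denote raw bytes.
            kind = "byte" if 0x80 <= value < 0x100 else "char"
        items.append((kind, value))
        pos = m.end()
    items.extend(("char", ord(c)) for c in src[pos:])

    # Any non-ASCII character (as opposed to a raw byte) forces multibyte.
    multibyte = any(kind == "char" and value >= 0x80 for kind, value in items)
    if multibyte:
        codes = [RAW_BYTE_BASE + v if kind == "byte" else v for kind, v in items]
    else:
        codes = [v for _, v in items]
    return codes, multibyte


print(read_literal(r"\x0e0"))  # ([224], True)
print(read_literal(r"\xe0"))   # ([224], False)
print(read_literal(r"\x0a0\xa0") == read_literal(r"\x0a0\x3fffa0"))  # True
```

The last line mirrors the example above: under these assumptions, both literals come out as the codes 0xA0 and 0x3FFFA0 in a multibyte string.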

On 27.01.2024 06:31, GNU bug Tracking System wrote:
> Thank you for filing a new bug report with debbugs.gnu.org.
>
> This is an automatically generated reply to let you know your message
> has been received.
>
> Your message is being forwarded to the package maintainers and other
> interested parties for their attention; they will reply in due course.
>
> Your message has been sent to the package maintainer(s):
>   bug-gnu-emacs <at> gnu.org
>
> If you wish to submit further information on this problem, please
> send it to 68751 <at> debbugs.gnu.org.
>
> Please do not send mail to help-debbugs <at> gnu.org unless you wish
> to report a problem with the Bug-tracking system.
>




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#68751; Package emacs. (Sat, 27 Jan 2024 07:40:01 GMT) Full text and rfc822 format available.

Message #11 received at 68751 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Christopher Yeleighton <giecrilj <at> stegny.2a.pl>
Cc: 68751 <at> debbugs.gnu.org
Subject: Re: bug#68751: 29.1; "\x0e0" is a multibyte string
Date: Sat, 27 Jan 2024 09:38:46 +0200

> Date: Sat, 27 Jan 2024 06:23:45 +0000
> From: Christopher Yeleighton <giecrilj <at> stegny.2a.pl>
> 
> M-: (multibyte-string-p "\x0e0") RET
> 
>  > t

Why do you think this is a problem?  U+00E0 is à, a non-ASCII
character, so it has a multibyte representation.
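The arithmetic behind this point can be checked quickly in Python. The character à has code point U+00E0: below 256, but not ASCII. So an encoding that covers all of Unicode needs more than one byte for it. The sketch uses UTF-8 on the assumption that Emacs's internal multibyte representation agrees with UTF-8 for Unicode characters. Python does not model Emacs internals beyond that.

```python
# The character à: its code point is 0xE0 (U+00E0), i.e. below 256,
# but it is not ASCII, so UTF-8 (assumed here to match Emacs's internal
# multibyte form for Unicode characters) needs two bytes for it.
ch = "\u00e0"
code_point = ord(ch)
utf8 = ch.encode("utf-8")

print(f"U+{code_point:04X}")  # U+00E0
print(utf8)                   # b'\xc3\xa0'
print(len(utf8))              # 2
```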




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#68751; Package emacs. (Sat, 27 Jan 2024 08:19:02 GMT) Full text and rfc822 format available.

Message #14 received at 68751 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Christopher Yeleighton <giecrilj <at> stegny.2a.pl>
Cc: 68751 <at> debbugs.gnu.org
Subject: Re: bug#68751: 29.1; "\x0e0" is a multibyte string
Date: Sat, 27 Jan 2024 10:18:26 +0200

> Date: Sat, 27 Jan 2024 06:46:36 +0000
> From: Christopher Yeleighton <giecrilj <at> stegny.2a.pl>
> 
> Info (elisp) Non-ASCII in Strings says:
> 
>  > If a string constant contains hexadecimal or octal escape sequences, and these
>  > escape sequences all specify unibyte characters (i.e., less than 256),
>  > and there are no other literal non-ASCII characters or Unicode-style
>  > escape sequences in the string, then Emacs automatically assumes that it
>  > is a unibyte string.
> 
> I believe it should say:
> 
> | (i.e., less than 256 and octal or written with 2 hexadecimal digits),

Right.  I modified the text to that effect.

> and additionally
> 
> | Unibyte characters embedded in multibyte string constants evaluate to
> | private character codes,
> | e.g. "\x0a0\xa0" equals "\x0a0\x3fffa0".

I didn't make this change because I don't see how it is useful.
First, "evaluate" is confusing here.  Also, "private character codes"
is confusing/incorrect, as it could be interpreted to mean Emacs
somehow uses the PUA of the Unicode codespace, which it doesn't.  Finally,
the question of when Emacs converts a raw byte from its single-byte
representation to its multibyte representation is an obscure matter,
largely defined by ad-hoc compatibility considerations, and doesn't
belong in the ELisp manual.

I think this bug can be closed now.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#68751; Package emacs. (Sat, 27 Jan 2024 08:31:02 GMT) Full text and rfc822 format available.

Message #17 received at 68751 <at> debbugs.gnu.org (full text, mbox):

From: Krzysztof Żelechowski <giecrilj <at> stegny.2a.pl>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 68751 <at> debbugs.gnu.org
Subject: Re: bug#68751: 29.1; "\x0e0" is a multibyte string
Date: Sat, 27 Jan 2024 08:53:32 +0100
[Message part 1 (text/html, inline)]

bug Marked as fixed in versions 30.1. Request was from Stefan Kangas <stefankangas <at> gmail.com> to control <at> debbugs.gnu.org. (Fri, 02 Feb 2024 08:05:02 GMT) Full text and rfc822 format available.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.