From unknown Fri Aug 15 12:43:50 2025
From: bug#16448 <16448@debbugs.gnu.org>
To: bug#16448 <16448@debbugs.gnu.org>
Subject: Status: 24.3; Messages from (error "...") with UTF-8 chars are printed wrongly in Emacs Lisp scripts
Reply-To: bug#16448 <16448@debbugs.gnu.org>
Date: Fri, 15 Aug 2025 19:43:50 +0000

retitle 16448 24.3; Messages from (error "...") with UTF-8 chars are printed wrongly in Emacs Lisp scripts
reassign 16448 emacs
submitter 16448 Sergey Tselikh
severity 16448 normal
thanks

From debbugs-submit-bounces@debbugs.gnu.org Tue Jan 14 19:18:25 2014
Date: Wed, 15 Jan 2014 11:10:09 +1100
From: Sergey Tselikh
To: bug-gnu-emacs@gnu.org
Subject: 24.3; Messages from (error "...") with UTF-8 chars are printed wrongly in Emacs Lisp scripts
Message-Id: <20140115111009.dc0d435fa9991c3e15816f84@gmail.com>

Hello.

In a script, when (error "...") instruction is executed with some UTF-8
characters in its text, the message is not printed correctly.

LANG environment variable is set to en_US.UTF-8 for all programs, my
terminal is x11-terms/rxvt-unicode with adequate UTF-8 support, Emacs
version is GNU Emacs 24.3.1.

Examples (all of them are with LANG=en_US.UTF-8 in environment):

$ cat error.el
(message "hello привет")
(message "привет hello")
(error "hello привет")

$ emacs --script error.el
hello привет
привет hello
hello ?@825B

But:

$ emacs -nw --eval '(error "hello привет")'
^^^ successfully prints "hello привет" in minibuffer.

This ?@825B is not some trash. Created a small table showing its
origins (It is ``echo hello привет | print-bits | cat -t'' vs.
``echo hello привет | high-bits-01 | print-bits | cat -t''):

h    01101000 | h 01101000 |
e    01100101 | e 01100101 |
l    01101100 | l 01101100 |
l    01101100 | l 01101100 |
o    01101111 | o 01101111 |
     00100000 |   00100000 |
M-P  11010000 | P 01010000 |
M-?  10111111 | ? 00111111 | ?
M-Q  11010001 | Q 01010001 |
M-^@ 10000000 | @ 01000000 | @
M-P  11010000 | P 01010000 |
M-8  10111000 | 8 00111000 | 8
M-P  11010000 | P 01010000 |
M-2  10110010 | 2 00110010 | 2
M-P  11010000 | P 01010000 |
M-5  10110101 | 5 00110101 | 5
M-Q  11010001 | Q 01010001 |
M-^B 10000010 | B 01000010 | B

More examples:

$ cat any-other.el
(error "cons:%s list:%s string:%s"
       (cons 'на 'речке)
       '(на речке на том бере)
       "be Быть beat Бить become Становиться begin Начинать bleed Кровоточить stung Жалить sweep Выметать swell Разбухать swim Плавать swing Качать take Брать, взять")

$ emacs --script any-other.el
cons:(=0 . @5G:5) list:(=0 @5G:5 =0 B>< 15@5) string:be KBL beat 8BL become !B0=>28BLAO begin 0G8=0BL bleed @>2>B>G8BL stung 0;8BL sweep K<5B0BL swell 071CE0BL swim ;020BL swing 0G0BL take @0BL, 27OBL

$ cat ja.el
(setq jstr "案ずるより産むが易し。 Anzuru yori umu ga yasushi. 出る杭は打たれる。 Deru kui wa utareru.")
(message "%s" jstr)
(error "%s" jstr)

$ emacs --script ja.el
案ずるより産むが易し。 Anzuru yori umu ga yasushi. 出る杭は打たれる。 Deru kui wa utareru.
HZ???#?LW Anzuru yori umu ga yasushi. ?moS_?? Deru kui wa utareru.
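
The garbled output in these examples is consistent with each character's
code point being reduced to its low 8 bits before it reaches stderr, which
is what the `& 0xFF` mask identified later in this thread would do. The
following is a minimal sketch in C, not Emacs source: the function name
mask_to_low_bytes and the idea of passing the code points as a plain array
are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not Emacs code): keep only the low 8 bits of each
   code point, mimicking `putc (XINT (character) & 0xFF, stderr)`.
   OUT must have room for N bytes plus a terminating NUL.  */
void
mask_to_low_bytes (const uint32_t *codepoints, size_t n, char *out)
{
  assert (codepoints != NULL && out != NULL);
  for (size_t i = 0; i < n; i++)
    out[i] = (char) (codepoints[i] & 0xFF);
  out[n] = '\0';
}
```

Feeding it the code points of "привет" (U+043F, U+0440, U+0438, U+0432,
U+0435, U+0442) yields the bytes 0x3F 0x40 0x38 0x32 0x35 0x42, that is,
the ASCII text ?@825B seen in the report.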

In GNU Emacs 24.3.1 (x86_64-pc-linux-gnu, GTK+ Version 2.24.17)
 of 2013-10-10 on laptop
Windowing system distributor `The X.Org Foundation', version 11.0.11403000
Configured using:
 `configure '--prefix=/usr' '--build=x86_64-pc-linux-gnu'
 '--host=x86_64-pc-linux-gnu' '--mandir=/usr/share/man'
 '--infodir=/usr/share/info' '--datadir=/usr/share'
 '--sysconfdir=/etc' '--localstatedir=/var/lib'
 '--libdir=/usr/lib64' '--disable-silent-rules'
 '--disable-dependency-tracking' '--program-suffix=-emacs-24'
 '--infodir=/usr/share/info/emacs-24'
 '--enable-locallisppath=/etc/emacs:/usr/share/emacs/site-lisp'
 '--with-crt-dir=/usr/lib/gcc/x86_64-pc-linux-gnu/4.7.3/../../../../lib64'
 '--with-gameuser=games' '--without-compress-info'
 '--without-hesiod' '--without-kerberos' '--without-kerberos5'
 '--with-gpm' '--with-dbus' '--with-gnutls' '--with-xml2'
 '--without-selinux' '--without-wide-int' '--with-sound'
 '--with-x' '--without-ns' '--with-gconf' '--without-gsettings'
 '--with-toolkit-scroll-bars' '--with-gif' '--with-jpeg'
 '--with-png' '--with-rsvg' '--with-tiff' '--with-xpm'
 '--with-imagemagick' '--with-xft' '--with-libotf'
 '--with-m17n-flt' '--with-x-toolkit=gtk2'
 'GENTOO_PACKAGE=app-editors/emacs-24.3-r2'
 'build_alias=x86_64-pc-linux-gnu' 'host_alias=x86_64-pc-linux-gnu'
 'CFLAGS=-pipe -march=corei7-avx -mno-aes -O2'
 'LDFLAGS=-Wl,-O1 -Wl,--as-needed' 'CPPFLAGS=''

Important settings:
  value of $LC_COLLATE: C
  value of $LANG: en_US.UTF-8
  locale-coding-system: utf-8-unix
  default enable-multibyte-characters: t

--
Sergey Tselikh

From debbugs-submit-bounces@debbugs.gnu.org Tue Jan 14 23:02:56 2014
Message-ID: <52D60869.1000206@yandex.ru>
Date: Wed, 15 Jan 2014 08:02:49 +0400
From: Dmitry Antipov
To: Sergey Tselikh
Cc: 16448@debbugs.gnu.org
Subject: Re: bug#16448: 24.3; Messages from (error "...") with UTF-8 chars are printed wrongly in Emacs Lisp scripts
References: <20140115111009.dc0d435fa9991c3e15816f84@gmail.com>
In-Reply-To: <20140115111009.dc0d435fa9991c3e15816f84@gmail.com>

On 01/15/2014 04:10 AM, Sergey Tselikh wrote:

> In a script, when (error "...") instruction is executed with some UTF-8
> characters in its text, the message is not printed correctly.

In batch mode, (error ...) is handled by external-debugging-output, and the
latter just does:

putc (XINT (character) & 0xFF, stderr);
                       ^^^^^^
To allow multibyte sequences here, we should use something like:

=== modified file 'src/print.c'
--- src/print.c 2014-01-01 07:43:34 +0000
+++ src/print.c 2014-01-15 03:55:39 +0000
@@ -709,8 +709,14 @@
 to make it write to the debugging output. */)
   (Lisp_Object character)
 {
+  unsigned char str[MAX_MULTIBYTE_LENGTH];
+  unsigned int ch;
+  ptrdiff_t len;
+
   CHECK_NUMBER (character);
-  putc (XINT (character) & 0xFF, stderr);
+  ch = XINT (character);
+  len = CHAR_STRING (ch, str);
+  fwrite (str, len, 1, stderr);

 #ifdef WINDOWSNT
   /* Send the output to a debugger (nothing happens if there isn't one). */

Dmitry

From debbugs-submit-bounces@debbugs.gnu.org Wed Jan 15 10:35:54 2014
Date: Wed, 15 Jan 2014 17:35:43 +0200
From: Eli Zaretskii
Subject: Re: bug#16448: 24.3; Messages from (error "...") with UTF-8 chars are printed wrongly in Emacs Lisp scripts
In-reply-to: <52D60869.1000206@yandex.ru>
To: Dmitry Antipov
Cc: 16448@debbugs.gnu.org, stselikh@gmail.com
Message-id: <83vbxl45rk.fsf@gnu.org>
References: <20140115111009.dc0d435fa9991c3e15816f84@gmail.com> <52D60869.1000206@yandex.ru>
Reply-To: Eli Zaretskii

> Date: Wed, 15 Jan 2014 08:02:49 +0400
> From: Dmitry Antipov
> Cc: 16448@debbugs.gnu.org
>
> On 01/15/2014 04:10 AM, Sergey Tselikh wrote:
>
> > In a script, when (error "...") instruction is executed with some UTF-8
> > characters in its text, the message is not printed correctly.
>
> In batch mode, (error ...) is handled by external-debugging-output, and the
> latter just does:
>
> putc (XINT (character) & 0xFF, stderr);
>                        ^^^^^^
> To allow multibyte sequences here, we should use something like:
>
> === modified file 'src/print.c'
> --- src/print.c 2014-01-01 07:43:34 +0000
> +++ src/print.c 2014-01-15 03:55:39 +0000
> @@ -709,8 +709,14 @@
>  to make it write to the debugging output. */)
>    (Lisp_Object character)
>  {
> +  unsigned char str[MAX_MULTIBYTE_LENGTH];
> +  unsigned int ch;
> +  ptrdiff_t len;
> +
>    CHECK_NUMBER (character);
> -  putc (XINT (character) & 0xFF, stderr);
> +  ch = XINT (character);
> +  len = CHAR_STRING (ch, str);
> +  fwrite (str, len, 1, stderr);

This will only work correctly in a UTF-8 locale. In the general case,
we need to run the resulting multibyte sequence through ENCODE_SYSTEM,
before writing it to stderr.

Btw, the way we output text in this case cries for refactoring: we
first assemble individual characters from their multibyte sequences,
then pass those characters one by one to external-debugging-output,
which will now have to unroll each character back into its multibyte
sequence, and encode each character individually. Something for after
the branch, I guess.
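
The idea behind the proposed patch can be sketched outside Emacs as
follows: expand the character into its multibyte form and write all of
those bytes, rather than one masked byte. This is only an illustration
under stated assumptions. The names utf8_encode and write_char_utf8 are
hypothetical; utf8_encode is a plain UTF-8 encoder standing in for
CHAR_STRING (Emacs's internal representation is a superset of UTF-8 that
this sketch does not cover); and, as pointed out in the reply above, raw
UTF-8 output is only right for a UTF-8 locale, so the locale encoding
step is deliberately not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for CHAR_STRING: encode code point CH as UTF-8
   into BUF (at least 4 bytes) and return the byte count, or 0 if CH is
   above U+10FFFF.  Surrogate code points are not rejected here.  */
size_t
utf8_encode (uint32_t ch, unsigned char *buf)
{
  assert (buf != NULL);
  if (ch < 0x80)
    {
      buf[0] = (unsigned char) ch;
      return 1;
    }
  if (ch < 0x800)
    {
      buf[0] = (unsigned char) (0xC0 | (ch >> 6));
      buf[1] = (unsigned char) (0x80 | (ch & 0x3F));
      return 2;
    }
  if (ch < 0x10000)
    {
      buf[0] = (unsigned char) (0xE0 | (ch >> 12));
      buf[1] = (unsigned char) (0x80 | ((ch >> 6) & 0x3F));
      buf[2] = (unsigned char) (0x80 | (ch & 0x3F));
      return 3;
    }
  if (ch <= 0x10FFFF)
    {
      buf[0] = (unsigned char) (0xF0 | (ch >> 18));
      buf[1] = (unsigned char) (0x80 | ((ch >> 12) & 0x3F));
      buf[2] = (unsigned char) (0x80 | ((ch >> 6) & 0x3F));
      buf[3] = (unsigned char) (0x80 | (ch & 0x3F));
      return 4;
    }
  return 0;
}

/* Write CH to STREAM as a whole UTF-8 sequence instead of a single
   masked byte.  Returns the number of bytes written.  */
size_t
write_char_utf8 (uint32_t ch, FILE *stream)
{
  unsigned char buf[4];
  size_t len = utf8_encode (ch, buf);
  return fwrite (buf, 1, len, stream);
}
```

With this sketch, U+043F (the first letter of "привет") becomes the two
bytes 0xD0 0xBF, which are the first two non-ASCII bytes in the left
column of the reporter's bit table, instead of the lone byte 0x3F ("?").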

From debbugs-submit-bounces@debbugs.gnu.org Sat Feb 01 07:00:29 2014
Date: Sat, 01 Feb 2014 14:00:04 +0200
From: Eli Zaretskii
Subject: Re: bug#16448: 24.3; Messages from (error "...") with UTF-8 chars are printed wrongly in Emacs Lisp scripts
In-reply-to: <83vbxl45rk.fsf@gnu.org>
To: stselikh@gmail.com
Cc: 16448-done@debbugs.gnu.org, dmantipov@yandex.ru
Message-id: <831tzn59h7.fsf@gnu.org>
References: <20140115111009.dc0d435fa9991c3e15816f84@gmail.com> <52D60869.1000206@yandex.ru> <83vbxl45rk.fsf@gnu.org>
Reply-To: Eli Zaretskii

> Date: Wed, 15 Jan 2014 17:35:43 +0200
> From: Eli Zaretskii
> Cc: 16448@debbugs.gnu.org, stselikh@gmail.com
>
> > Date: Wed, 15 Jan 2014 08:02:49 +0400
> > From: Dmitry Antipov
> > Cc: 16448@debbugs.gnu.org
> >
> > On 01/15/2014 04:10 AM, Sergey Tselikh wrote:
> >
> > > In a script, when (error "...") instruction is executed with some UTF-8
> > > characters in its text, the message is not printed correctly.
> >
> > In batch mode, (error ...) is handled by external-debugging-output, and the
> > latter just does:
> >
> > putc (XINT (character) & 0xFF, stderr);
> >                        ^^^^^^
> > To allow multibyte sequences here, we should use something like:
> >
> > === modified file 'src/print.c'
> > --- src/print.c 2014-01-01 07:43:34 +0000
> > +++ src/print.c 2014-01-15 03:55:39 +0000
> > @@ -709,8 +709,14 @@
> >  to make it write to the debugging output. */)
> >    (Lisp_Object character)
> >  {
> > +  unsigned char str[MAX_MULTIBYTE_LENGTH];
> > +  unsigned int ch;
> > +  ptrdiff_t len;
> > +
> >    CHECK_NUMBER (character);
> > -  putc (XINT (character) & 0xFF, stderr);
> > +  ch = XINT (character);
> > +  len = CHAR_STRING (ch, str);
> > +  fwrite (str, len, 1, stderr);
>
> This will only work correctly in a UTF-8 locale. In the general case,
> we need to run the resulting multibyte sequence through ENCODE_SYSTEM,
> before writing it to stderr.

Done in trunk revision 116232.

From unknown Fri Aug 15 12:43:50 2025
To: internal_control@debbugs.gnu.org
From: Debbugs Internal Request
Subject: Internal Control
Message-Id: bug archived.
Date: Sat, 01 Mar 2014 12:24:06 +0000

# This is a fake control message.
#
# The action:
# bug archived.
thanks
# This fakemail brought to you by your local debbugs
# administrator