From unknown Fri Aug 15 15:34:17 2025
X-Loop: help-debbugs@gnu.org
Subject: bug#16448: 24.3; Messages from (error "...") with UTF-8 chars are printed wrongly in Emacs Lisp scripts
Resent-From: Sergey Tselikh
Resent-CC: bug-gnu-emacs@gnu.org
Resent-Date: Wed, 15 Jan 2014 00:19:01 +0000
X-GNU-PR-Message: report 16448
X-GNU-PR-Package: emacs
To: 16448@debbugs.gnu.org
X-Debbugs-Original-To: bug-gnu-emacs@gnu.org
Date: Wed, 15 Jan 2014 11:10:09 +1100
From: Sergey Tselikh
Message-Id: <20140115111009.dc0d435fa9991c3e15816f84@gmail.com>
X-Mailer: Sylpheed 3.3.0 (GTK+ 2.24.17; x86_64-pc-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Hello.

In a script, when (error "...") instruction is executed with some UTF-8
characters in its text, the message is not printed correctly.

LANG environment variable is set to en_US.UTF-8 for all programs, my
terminal is x11-terms/rxvt-unicode with adequate UTF-8 support, Emacs
version is GNU Emacs 24.3.1.

Examples (all of them are with LANG=en_US.UTF-8 in environment):

$ cat error.el
(message "hello привет")
(message "привет hello")
(error "hello привет")
$ emacs --script error.el
hello привет
привет hello
hello ?@825B

But:

$ emacs -nw --eval '(error "hello привет")'
^^^ successfully prints "hello привет" in minibuffer.

This ?@825B is not some trash. Created a small table showing its
origins (It is ``echo hello привет | print-bits | cat -t'' vs.
``echo hello привет | high-bits-01 | print-bits | cat -t''):

h    01101000 | h 01101000 |
e    01100101 | e 01100101 |
l    01101100 | l 01101100 |
l    01101100 | l 01101100 |
o    01101111 | o 01101111 |
     00100000 |   00100000 |
M-P  11010000 | P 01010000 |
M-?  10111111 | ? 00111111 | ?
M-Q  11010001 | Q 01010001 |
M-^@ 10000000 | @ 01000000 | @
M-P  11010000 | P 01010000 |
M-8  10111000 | 8 00111000 | 8
M-P  11010000 | P 01010000 |
M-2  10110010 | 2 00110010 | 2
M-P  11010000 | P 01010000 |
M-5  10110101 | 5 00110101 | 5
M-Q  11010001 | Q 01010001 |
M-^B 10000010 | B 01000010 | B

More examples:

$ cat any-other.el
(error "cons:%s list:%s string:%s"
       (cons 'на 'речке)
       '(на речке на том бере)
       "be Быть beat Бить become Становиться begin Начинать bleed Кровоточить stung Жалить sweep Выметать swell Разбухать swim Плавать swing Качать take Брать, взять")
$ emacs --script any-other.el
cons:(=0 . @5G:5) list:(=0 @5G:5 =0 B>< 15@5) string:be KBL beat 8BL become !B0=>28BLAO begin 0G8=0BL bleed @>2>B>G8BL stung 0;8BL sweep K<5B0BL swell 071CE0BL swim ;020BL swing 0G0BL take @0BL, 27OBL

$ cat ja.el
(setq jstr "案ずるより産むが易し。 Anzuru yori umu ga yasushi. 出る杭は打たれる。 Deru kui wa utareru.")
(message "%s" jstr)
(error "%s" jstr)
$ emacs --script ja.el
案ずるより産むが易し。 Anzuru yori umu ga yasushi. 出る杭は打たれる。 Deru kui wa utareru.
HZ???#?LW Anzuru yori umu ga yasushi. ?moS_?? Deru kui wa utareru.
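
The same garbled text can be reproduced by keeping only the low 8 bits of each character's code point. A minimal C sketch of that truncation, for illustration only (`truncate_to_low_bytes` and `check_privet` are hypothetical names, not Emacs functions):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper for illustration, not Emacs code: keep only the
   low 8 bits of each code point and store the resulting bytes in OUT,
   which must have room for N + 1 bytes.  */
void
truncate_to_low_bytes (const uint32_t *codepoints, size_t n, char *out)
{
  for (size_t i = 0; i < n; i++)
    out[i] = (char) (codepoints[i] & 0xFF);
  out[n] = '\0';
}

/* Feed the six code points of "привет" (U+043F U+0440 U+0438 U+0432
   U+0435 U+0442) through the helper and check that the result is the
   text seen in the error output.  */
void
check_privet (void)
{
  static const uint32_t privet[] =
    { 0x043F, 0x0440, 0x0438, 0x0432, 0x0435, 0x0442 };
  char out[7];

  truncate_to_low_bytes (privet, 6, out);
  assert (strcmp (out, "?@825B") == 0);
}
```

The same mapping turns the symbol на (U+043D U+0430) into =0, as in the cons and list output above.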

In GNU Emacs 24.3.1 (x86_64-pc-linux-gnu, GTK+ Version 2.24.17)
 of 2013-10-10 on laptop
Windowing system distributor `The X.Org Foundation', version 11.0.11403000
Configured using:
 `configure '--prefix=/usr' '--build=x86_64-pc-linux-gnu'
 '--host=x86_64-pc-linux-gnu' '--mandir=/usr/share/man'
 '--infodir=/usr/share/info' '--datadir=/usr/share' '--sysconfdir=/etc'
 '--localstatedir=/var/lib' '--libdir=/usr/lib64'
 '--disable-silent-rules' '--disable-dependency-tracking'
 '--program-suffix=-emacs-24' '--infodir=/usr/share/info/emacs-24'
 '--enable-locallisppath=/etc/emacs:/usr/share/emacs/site-lisp'
 '--with-crt-dir=/usr/lib/gcc/x86_64-pc-linux-gnu/4.7.3/../../../../lib64'
 '--with-gameuser=games' '--without-compress-info' '--without-hesiod'
 '--without-kerberos' '--without-kerberos5' '--with-gpm' '--with-dbus'
 '--with-gnutls' '--with-xml2' '--without-selinux' '--without-wide-int'
 '--with-sound' '--with-x' '--without-ns' '--with-gconf'
 '--without-gsettings' '--with-toolkit-scroll-bars' '--with-gif'
 '--with-jpeg' '--with-png' '--with-rsvg' '--with-tiff' '--with-xpm'
 '--with-imagemagick' '--with-xft' '--with-libotf' '--with-m17n-flt'
 '--with-x-toolkit=gtk2' 'GENTOO_PACKAGE=app-editors/emacs-24.3-r2'
 'build_alias=x86_64-pc-linux-gnu' 'host_alias=x86_64-pc-linux-gnu'
 'CFLAGS=-pipe -march=corei7-avx -mno-aes -O2'
 'LDFLAGS=-Wl,-O1 -Wl,--as-needed' 'CPPFLAGS=''

Important settings:
  value of $LC_COLLATE: C
  value of $LANG: en_US.UTF-8
  locale-coding-system: utf-8-unix
  default enable-multibyte-characters: t

-- 
Sergey Tselikh

From unknown Fri Aug 15 15:34:17 2025
X-Loop: help-debbugs@gnu.org
Subject: bug#16448: 24.3; Messages from (error "...") with UTF-8 chars are printed wrongly in Emacs Lisp scripts
Resent-From: Dmitry Antipov
Resent-CC: bug-gnu-emacs@gnu.org
Resent-Date: Wed, 15 Jan 2014 04:03:01 +0000
X-GNU-PR-Message: followup 16448
X-GNU-PR-Package: emacs
To: Sergey Tselikh
Cc: 16448@debbugs.gnu.org
Message-ID: <52D60869.1000206@yandex.ru>
Date: Wed, 15 Jan 2014 08:02:49 +0400
From: Dmitry Antipov
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.2.0
MIME-Version: 1.0
References: <20140115111009.dc0d435fa9991c3e15816f84@gmail.com>
In-Reply-To: <20140115111009.dc0d435fa9991c3e15816f84@gmail.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit

On 01/15/2014 04:10 AM, Sergey Tselikh wrote:

> In a script, when (error "...") instruction is executed with some UTF-8
> characters in its text, the message is not printed correctly.

In batch mode, (error ...) is handled by external-debugging-output, and the
latter just does:

   putc (XINT (character) & 0xFF, stderr);
                          ^^^^^^
To allow multibyte sequences here, we should use something like:

=== modified file 'src/print.c'
--- src/print.c 2014-01-01 07:43:34 +0000
+++ src/print.c 2014-01-15 03:55:39 +0000
@@ -709,8 +709,14 @@
 to make it write to the debugging output.  */)
   (Lisp_Object character)
 {
+  unsigned char str[MAX_MULTIBYTE_LENGTH];
+  unsigned int ch;
+  ptrdiff_t len;
+
   CHECK_NUMBER (character);
-  putc (XINT (character) & 0xFF, stderr);
+  ch = XINT (character);
+  len = CHAR_STRING (ch, str);
+  fwrite (str, len, 1, stderr);
 #ifdef WINDOWSNT
   /* Send the output to a debugger (nothing happens if there isn't one).
   */

Dmitry

From unknown Fri Aug 15 15:34:17 2025
X-Loop: help-debbugs@gnu.org
Subject: bug#16448: 24.3; Messages from (error "...") with UTF-8 chars are printed wrongly in Emacs Lisp scripts
Resent-From: Eli Zaretskii
Resent-CC: bug-gnu-emacs@gnu.org
Resent-Date: Wed, 15 Jan 2014 15:36:01 +0000
X-GNU-PR-Message: followup 16448
X-GNU-PR-Package: emacs
To: Dmitry Antipov
Cc: 16448@debbugs.gnu.org, stselikh@gmail.com
Reply-To: Eli Zaretskii
Date: Wed, 15 Jan 2014 17:35:43 +0200
From: Eli Zaretskii
In-reply-to: <52D60869.1000206@yandex.ru>
Message-id: <83vbxl45rk.fsf@gnu.org>
References: <20140115111009.dc0d435fa9991c3e15816f84@gmail.com> <52D60869.1000206@yandex.ru>

> Date: Wed, 15 Jan 2014 08:02:49 +0400
> From: Dmitry Antipov
> Cc: 16448@debbugs.gnu.org
>
> On 01/15/2014 04:10 AM, Sergey Tselikh wrote:
>
> > In a script, when (error "...") instruction is executed with some UTF-8
> > characters in its text, the message is not printed correctly.
>
> In batch mode, (error ...) is handled by external-debugging-output, and the
> latter just does:
>
>    putc (XINT (character) & 0xFF, stderr);
>                           ^^^^^^
> To allow multibyte sequences here, we should use something like:
>
> === modified file 'src/print.c'
> --- src/print.c 2014-01-01 07:43:34 +0000
> +++ src/print.c 2014-01-15 03:55:39 +0000
> @@ -709,8 +709,14 @@
>  to make it write to the debugging output.  */)
>    (Lisp_Object character)
>  {
> +  unsigned char str[MAX_MULTIBYTE_LENGTH];
> +  unsigned int ch;
> +  ptrdiff_t len;
> +
>    CHECK_NUMBER (character);
> -  putc (XINT (character) & 0xFF, stderr);
> +  ch = XINT (character);
> +  len = CHAR_STRING (ch, str);
> +  fwrite (str, len, 1, stderr);

This will only work correctly in a UTF-8 locale.  In the general case,
we need to run the resulting multibyte sequence through ENCODE_SYSTEM,
before writing it to stderr.

Btw, the way we output text in this case cries for refactoring: we
first assemble individual characters from their multibyte sequences,
then pass those characters one by one to external-debugging-output,
which will now have to unroll each character back into its multibyte
sequence, and encode each character individually.  Something for after
the branch, I guess.
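
To make the step under discussion concrete, here is a minimal C sketch, not the committed fix, of writing a character as a whole UTF-8 sequence instead of a single truncated byte. The names `utf8_encode` and `write_codepoint` are hypothetical; `utf8_encode` only stands in for the role CHAR_STRING plays in the patch, on the assumption that the character is a Unicode scalar value. In a non-UTF-8 locale the bytes would still need converting to the locale's encoding, as noted above, and this sketch does not attempt that.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the encoding step in the patch: store the
   UTF-8 form of code point C in BUF (at least 4 bytes) and return the
   number of bytes stored.  C must be at most U+10FFFF.  */
size_t
utf8_encode (uint32_t c, unsigned char *buf)
{
  assert (c <= 0x10FFFF);
  if (c < 0x80)
    {
      buf[0] = (unsigned char) c;
      return 1;
    }
  if (c < 0x800)
    {
      buf[0] = (unsigned char) (0xC0 | (c >> 6));
      buf[1] = (unsigned char) (0x80 | (c & 0x3F));
      return 2;
    }
  if (c < 0x10000)
    {
      buf[0] = (unsigned char) (0xE0 | (c >> 12));
      buf[1] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
      buf[2] = (unsigned char) (0x80 | (c & 0x3F));
      return 3;
    }
  buf[0] = (unsigned char) (0xF0 | (c >> 18));
  buf[1] = (unsigned char) (0x80 | ((c >> 12) & 0x3F));
  buf[2] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
  buf[3] = (unsigned char) (0x80 | (c & 0x3F));
  return 4;
}

/* Write one code point to STREAM as a complete UTF-8 sequence and
   return the number of bytes written.  */
size_t
write_codepoint (uint32_t c, FILE *stream)
{
  unsigned char buf[4];
  size_t len = utf8_encode (c, buf);

  return fwrite (buf, 1, len, stream);
}
```

For U+043F this stores the bytes 0xD0 0xBF, the same pair shown in the left column of the reporter's table. Encoding and writing one character per call is also the per-character overhead that the refactoring remark above is about.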

From unknown Fri Aug 15 15:34:17 2025
MIME-Version: 1.0
X-Mailer: MIME-tools 5.503 (Entity 5.503)
X-Loop: help-debbugs@gnu.org
From: help-debbugs@gnu.org (GNU bug Tracking System)
To: Sergey Tselikh
Subject: bug#16448: closed (Re: bug#16448: 24.3; Messages from (error "...") with UTF-8 chars are printed wrongly in Emacs Lisp scripts)
References: <831tzn59h7.fsf@gnu.org> <20140115111009.dc0d435fa9991c3e15816f84@gmail.com>
X-Gnu-PR-Message: they-closed 16448
X-Gnu-PR-Package: emacs
Reply-To: 16448@debbugs.gnu.org
Date: Sat, 01 Feb 2014 12:01:02 +0000
Content-Type: multipart/mixed; boundary="----------=_1391256062-1037-1"

This is a multi-part message in MIME format...

------------=_1391256062-1037-1
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Your bug report

#16448: 24.3; Messages from (error "...") with UTF-8 chars are printed wron=
gly in Emacs Lisp scripts

which was filed against the emacs package, has been closed.

The explanation is attached below, along with your original report.
If you require more details, please reply to 16448@debbugs.gnu.org.

--=20
16448: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=3D16448
GNU Bug Tracking System
Contact help-debbugs@gnu.org with problems

------------=_1391256062-1037-1
Content-Type: message/rfc822
Content-Disposition: inline
Content-Transfer-Encoding: 7bit

Date: Sat, 01 Feb 2014 14:00:04 +0200
From: Eli Zaretskii
Subject: Re: bug#16448: 24.3; Messages from (error "...") with UTF-8 chars are printed wrongly in Emacs Lisp scripts
In-reply-to: <83vbxl45rk.fsf@gnu.org>
To: stselikh@gmail.com
Message-id: <831tzn59h7.fsf@gnu.org>
References: <20140115111009.dc0d435fa9991c3e15816f84@gmail.com> <52D60869.1000206@yandex.ru> <83vbxl45rk.fsf@gnu.org>
Cc: 16448-done@debbugs.gnu.org, dmantipov@yandex.ru
Reply-To: Eli Zaretskii

> Date: Wed, 15 Jan 2014 17:35:43 +0200
> From: Eli Zaretskii
> Cc: 16448@debbugs.gnu.org, stselikh@gmail.com
>
> > Date: Wed, 15 Jan 2014 08:02:49 +0400
> > From: Dmitry Antipov
> > Cc: 16448@debbugs.gnu.org
> >
> > On 01/15/2014 04:10 AM, Sergey Tselikh wrote:
> >
> > > In a script, when (error "...") instruction is executed with some UTF-8
> > > characters in its text, the message is not printed correctly.
> >
> > In batch mode, (error ...) is handled by external-debugging-output, and the
> > latter just does:
> >
> >    putc (XINT (character) & 0xFF, stderr);
> >                           ^^^^^^
> > To allow multibyte sequences here, we should use something like:
> >
> > === modified file 'src/print.c'
> > --- src/print.c 2014-01-01 07:43:34 +0000
> > +++ src/print.c 2014-01-15 03:55:39 +0000
> > @@ -709,8 +709,14 @@
> >  to make it write to the debugging output.  */)
> >    (Lisp_Object character)
> >  {
> > +  unsigned char str[MAX_MULTIBYTE_LENGTH];
> > +  unsigned int ch;
> > +  ptrdiff_t len;
> > +
> >    CHECK_NUMBER (character);
> > -  putc (XINT (character) & 0xFF, stderr);
> > +  ch = XINT (character);
> > +  len = CHAR_STRING (ch, str);
> > +  fwrite (str, len, 1, stderr);
>
> This will only work correctly in a UTF-8 locale.  In the general case,
> we need to run the resulting multibyte sequence through ENCODE_SYSTEM,
> before writing it to stderr.

Done in trunk revision 116232.

------------=_1391256062-1037-1--