From: bug#1215 <1215@debbugs.gnu.org>
To: bug#1215 <1215@debbugs.gnu.org>
Subject: Status: 23.0.60; unibyte->multibyte conversion problem (in search-forward and friends)
Date: Tue, 17 Jun 2025 06:28:59 +0000

retitle 1215 23.0.60; unibyte->multibyte conversion problem (in search-forward and friends)
reassign 1215 emacs
submitter 1215 "Eduardo Ochs"
severity 1215 normal
thanks

From: "Eduardo Ochs" <eduardoochs@gmail.com>
To: emacs-pretest-bug@gnu.org
Subject: 23.0.60; unibyte->multibyte conversion problem (in search-forward and friends)
Date: Tue, 21 Oct 2008 12:00:58 -0400

Hello,

this may not be exactly a bug; I'm just struggling with an obscure
part of Emacs... Anyway, I did my best to make this look like a nice
bug report, and to make the tests clear enough to help other people
who also find unibyte<->multibyte conversions obscure...


The short story
===============

Let me refer to strings like "<>" - where the "<<" and ">>" stand for
guillemets, i.e., the characters that we type with `C-x 8 <' and
`C-x 8 >' - as "anchors". So: if I produce an anchor string in a
unibyte buffer and then I search for an occurrence of that string in a
multibyte buffer, the search fails. The two small blocks below
illustrate this.

Instructions: save the first one to "/tmp/1.txt", the second one to
"/tmp/2.txt", and then run:

  (load-file "/tmp/1.txt")

It will show "uni" in the "*Messages*" buffer, and the search will
fail. The detailed message about the failure of the search will be
like this:

  progn: Search failed: "\302\253foo\302\273"

meaning the anchor string has been incorrectly converted.

;;--------snip,snip--------
;; -*- coding: raw-text-unix -*-
;; (save-this-block-as "/tmp/1.txt")
(progn
  (find-file "/tmp/2.txt")
  (goto-char (point-min))
  (setq anchorstr "«foo»")
  (message (if (multibyte-string-p anchorstr) "multi" "uni"))
  (search-forward anchorstr))
;;--------snip,snip--------

;;--------snip,snip--------
;; -*- coding: latin-1 -*-
;; (save-this-block-as "/tmp/2.txt")
(search-forward "«foo»")
;; «foo»
;;--------snip,snip--------


The long story
==============

Save the block below as "/tmp/3.txt" and follow the instructions in
it. Note that it doesn't have any non-ascii characters - the anchors
are produced by running the "(insert ...)" sexps.

;;--------snip,snip--------
;; -*- coding: latin-1 -*-
;; (save-this-block-as "/tmp/3.txt")

;; Run the "progn" below with C-x C-e.
;; It will create a line like this:
;;   <<anchor>>\253anchor\273\253anchor\273\253anchor\273
;; (but the "<<", ">>", "\253", "\273" are single characters).
;; Don't delete that line, it will be used later.
;;
(progn
  (defun mmb (str) (string-make-multibyte str))
  (defun mub (str) (string-make-unibyte str))
  (insert 171 "anchor" 187)
  (insert "\253anchor\273")
  (insert (mub "\253anchor\273"))
  (insert (mmb (mub "\253anchor\273")))
  )

;; Now try to save this file.
;; Emacs will complain about the "\253"s and "\273"s - it will
;; say that iso-latin-1-unix and utf-8-unix cannot encode them.
;; The "<<" and ">>" are ok, though...
;;
;; So: leave the "<<anchor>>" above, delete the "\253anchor\273"s,
;; save this file, and reload it.
;; DON'T SKIP THIS STEP - the
;; charset properties mentioned below behave differently before
;; and after reloads, and I don't know exactly the mechanics of
;; this... 8-\
;;
;; If we inspect the "<<", ">>", "\253", "\273" with `C-x ='
;; we see this:
;;   Char: <<   (171, #o253, #xab, file #xAB)
;;   Char: >>   (187, #o273, #xbb, file #xBB)
;;   Char: \253 (4194219, #o17777653, #x3fffab, raw-byte)
;;   Char: \273 (4194235, #o17777673, #x3fffbb, raw-byte)
;;
;; Now mark the "<<anchor>>" above and copy it to the top of
;; the kill ring with `M-w'. Let's examine the results of
;; several obvious ways to (re)create the "<<anchor>>"
;; above as a string...
;; Here are some of the results:
;;
;;   "\253anchor\273"               ==> "<<anchor>>"
;;   (mub "\253anchor\273")         ==> "<<anchor>>"
;;   (mmb (mub "\253anchor\273"))   ==> "\253anchor\273"
;;   (car kill-ring)                ==>
;;     #("<<anchor>>" 0 8 (charset iso-8859-1))
;;   (mub (car kill-ring))          ==> "<<anchor>>"
;;   (mmb (mub (car kill-ring)))    ==> "\253anchor\273"

"\253anchor\273"
(mub "\253anchor\273")
(mmb (mub "\253anchor\273"))
(mub (mmb (mub "\253anchor\273")))
(mapcar 'identity "\253anchor\273")
(mapcar 'identity (mub "\253anchor\273"))
(mapcar 'identity (mmb (mub "\253anchor\273")))

(car kill-ring)
(mub (car kill-ring))
(mmb (mub (car kill-ring)))
(mapcar 'identity (car kill-ring))
(mapcar 'identity (mub (car kill-ring)))
(mapcar 'identity (mmb (mub (car kill-ring))))

;; This is the weird part.
;; Let's insert another "<<anchor>>"/"\253anchor\273" pair, and
;; let's try to jump to its "anchors" with `search-backward'.

(insert 171 "anchor" 187 "\n\253anchor\273")
(search-backward "\253anchor\273")
(search-backward (mub "\253anchor\273"))
(search-backward (mmb (mub "\253anchor\273")))
(search-backward (car kill-ring))
(search-backward (mub (car kill-ring)))
(search-backward (mmb (mub (car kill-ring))))

;; Only "(search-backward (car kill-ring))" jumps to
;; "<<anchor>>" - all the others jump to "\253anchor\273".
;; The trick - aha!
;; - is that "(car kill-ring)" holds this string,
;;
;;   (car kill-ring) ==>
;;     #("<<anchor>>" 0 8 (charset iso-8859-1))
;;
;; and the "(charset iso-8859-1)" property is essential...
;;--------snip,snip--------

What is the standard way to convert unibyte strings (for example,
anchor strings generated from code in raw-text-unix ".el" files) to
strings with the right charset property (if needed) and the right
encoding? I couldn't find the functions for that...

Cheers, thanks in advance,
  Eduardo Ochs
  eduardoochs at gmail.com
  http://angg.twu.net/

P.S.: (emacs-version) ==> "GNU Emacs 23.0.60.1 (i686-pc-linux-gnu, GTK+ Version 2.8.20) of 2008-10-11 on dekooning"


From: Stefan Monnier <monnier@iro.umontreal.ca>
To: Eduardo Ochs
Cc: 1215@debbugs.gnu.org, emacs-pretest-bug@gnu.org
Subject: Re: bug#1215: 23.0.60; unibyte->multibyte conversion problem (in search-forward and friends)
Date: Wed, 22 Oct 2008 10:51:29 -0400

> Let me refer to strings like "<>" - where the "<<" and ">>" stand
> for guillemets, i.e., the characters that we type with `C-x 8 <' and
> `C-x 8 >' - as "anchors". So: if I produce an anchor string in a
> unibyte buffer and then I search for an occurrence of that string in
> multibyte buffer, the search fails.

There are no guillemets in unibyte buffers.

> ;;--------snip,snip--------
> ;; -*- coding: raw-text-unix -*-
> ;; (save-this-block-as "/tmp/1.txt")
> (progn
>   (find-file "/tmp/2.txt")
>   (goto-char (point-min))
>   (setq anchorstr "«foo»")
>   (message (if (multibyte-string-p anchorstr) "multi" "uni"))
>   (search-forward anchorstr))

There's a bug here, indeed: Emacs should refuse to save such a file,
because raw-text-unix (to which I prefer to refer as `binary') cannot
encode « and ».
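Stefan's remark that a unibyte buffer holds bytes rather than guillemets also suggests an answer to Eduardo's closing question. Convert the bytes to characters explicitly, by decoding them with the coding system they were written in. Below is a minimal sketch. It assumes a recent Emacs rather than the 23.0.60 pretest, and it assumes the bytes are Latin-1. `my-search-anchor-bytes` is a hypothetical helper. `decode-coding-string`, `multibyte-string-p` and `search-forward` are standard Emacs Lisp functions.

```elisp
;; Sketch, assuming a recent Emacs rather than the 23.0.60 pretest.
;; `my-search-anchor-bytes' is a hypothetical helper, not part of Emacs.
(defun my-search-anchor-bytes (bytes coding)
  "Decode the unibyte string BYTES using CODING, then search forward for it.
Return the end of the match, or nil if there is none."
  (let ((anchor (decode-coding-string bytes coding))) ; bytes -> characters
    (search-forward anchor nil t)))

;; "\253foo\273" is unibyte: five bytes, not the characters «foo».
(with-temp-buffer                      ; multibyte, like an ordinary buffer
  (insert "x " 171 "foo" 187)          ; the characters U+00AB and U+00BB
  (goto-char (point-min))
  (list (multibyte-string-p "\253foo\273")                 ; nil
        (my-search-anchor-bytes "\253foo\273" 'latin-1)))  ; 8
```

The failure message above shows "\302\253foo\302\273". Octal 302 253 is #xC2 #xAB, the UTF-8 encoding of U+00AB, which suggests those bytes are UTF-8. In that case, passing 'utf-8 instead of 'latin-1 should decode them to the same «foo».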

Stefan


From: Juanma Barranquero <lekktu@gmail.com>
To: Stefan Monnier
Subject: Re: bug#1215: 23.0.60; unibyte->multibyte conversion problem (in search-forward and friends)
Date: Fri, 16 Jan 2009 01:19:50 +0100
Cc: Eduardo Ochs,
    1215@debbugs.gnu.org

On Wed, Oct 22, 2008 at 15:51, Stefan Monnier wrote:
> There's a bug here, indeed: Emacs should refuse to save such a file,
> because raw-text-unix (to which I prefer to refer as `binary') cannot
> encode « and ».

Why not? « is U+00AB and » is U+00BB.

(with-temp-file "/temp/guillemets.txt"
  (set-buffer-multibyte nil)
  (setq buffer-file-coding-system 'raw-text-unix)
  (insert ?« "Test" ?» ?\n))

=>

0000 0000  ab 54 65 73 74 bb 0a    ½Test╗.

Juanma


From: Stefan Monnier <monnier@iro.umontreal.ca>
To: Juanma Barranquero
Cc: Eduardo Ochs, 1215@debbugs.gnu.org
Subject: Re: bug#1215: 23.0.60; unibyte->multibyte conversion problem (in
Date: Thu, 15 Jan 2009 21:47:04 -0500

>> There's a bug here, indeed: Emacs should refuse to save such a file,
>> because raw-text-unix (to which I prefer to refer as `binary') cannot
>> encode « and ».
> Why not? « is U+00AB and » is U+00BB.

Neither of which is a byte. The byte 0xAB is the Emacs character
#x3fffab, as shown by (unibyte-char-to-multibyte #xab).

If you save that file and read it back in, you'll see that its content
has changed. `save-buffer' should not silently save if it will lose
information.

Stefan


From: Juanma Barranquero <lekktu@gmail.com>
To: Stefan Monnier
Cc: Eduardo Ochs, 1215@debbugs.gnu.org
Subject: Re: bug#1215: 23.0.60; unibyte->multibyte conversion problem (in
Date: Fri, 16 Jan 2009 03:59:07 +0100

> If you save that file and read it back in, you'll see that its content
> has changed.

Sorry, but I don't see that.

emacs -Q

then I evaluate this:

(with-temp-file "/temp/guillemets.txt"
  (set-buffer-multibyte nil)
  (setq buffer-file-coding-system 'raw-text-unix)
  (insert ?« "Test" ?» ?\n))

then

C-x C-f /temp/guillemets.txt

I get a buffer guillemets.txt with «Test» as a multibyte file in
iso-latin-1-unix. I can modify it and save it, and still the guillemets
are bytes 0xab and 0xbb in the resulting file.

Juanma


From: Stefan Monnier <monnier@iro.umontreal.ca>
To: Juanma Barranquero
Cc: Eduardo Ochs, 1215@debbugs.gnu.org
Subject: Re: bug#1215: 23.0.60; unibyte->multibyte conversion problem (in
Date: Thu, 15 Jan 2009 22:37:54 -0500

> Sorry, but I don't see that.
> emacs -Q
> then I evaluate this:
> (with-temp-file "/temp/guillemets.txt"
>   (set-buffer-multibyte nil)
>   (setq buffer-file-coding-system 'raw-text-unix)
>   (insert ?« "Test" ?» ?\n))

You're cheating: remove the (set-buffer-multibyte nil).
Otherwise you're not actually inserting the ?« char but the #xAB
byte instead.

Stefan


From: Juanma Barranquero <lekktu@gmail.com>
To: Stefan Monnier
Cc: Eduardo Ochs, 1215@debbugs.gnu.org
Subject: Re: bug#1215: 23.0.60; unibyte->multibyte conversion problem (in
Date: Fri, 16 Jan 2009 12:08:58 +0100

On Fri, Jan 16, 2009 at 04:37, Stefan Monnier wrote:
> You're cheating: remove the (set-buffer-multibyte nil).
> Otherwise you're not actually inserting the ?« char but the #xAB
> byte instead.

OK, I see.

You said:

"There's a bug here, indeed: Emacs should refuse to save such a file,
because raw-text-unix (to which I prefer to refer as `binary') cannot
encode « and »."

but according to raw-text-unix's description:

t -- raw-text-unix

Raw text, which means text contains random 8-bit codes.
Encoding text with this coding system produces the actual byte
sequence of the text in buffers and strings. An exception is made for
eight-bit-control characters. Each of them is encoded into a single
byte.

you can save (almost) anything with it. What is the bug?

Juanma


From: Stefan Monnier <monnier@iro.umontreal.ca>
To: Juanma Barranquero
Cc: Eduardo Ochs, 1215@debbugs.gnu.org
Subject: Re: bug#1215: 23.0.60; unibyte->multibyte conversion problem (in
Date: Fri, 16 Jan 2009 15:56:44 -0500

> but according to raw-text-unix's description:
>
> t -- raw-text-unix
>
> Raw text, which means text contains random 8-bit codes.
> Encoding text with this coding system produces the actual byte
> sequence of the text in buffers and strings. An exception is made for
> eight-bit-control characters. Each of them is encoded into a single
> byte.
>
> you can save (almost) anything with it. What is the bug?

The bug is that you can currently save (almost) anything with it. This is
due to historical reasons, where different notions of "no encoding" were
mixed up. So on save, raw-text-unix behaves pretty much like
utf-8-mule under Emacs-23 and emacs-mule under Emacs-22. On load, it
behaves pretty much like `binary'.
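The save-time behavior Stefan describes can be observed with `encode-coding-string`. This sketch assumes a recent Emacs. In that setting, raw-text encoding of multibyte text converts only eight-bit (raw-byte) characters to single bytes. Every other character is emitted in Emacs's internal representation, which for U+00AB is the same two bytes as UTF-8. `my-raw-text-encodings` is a hypothetical name used only for this illustration.

```elisp
;; Sketch, assuming a recent Emacs rather than the 23.0.60 pretest.
;; `my-raw-text-encodings' is a hypothetical name, not part of Emacs.
(defun my-raw-text-encodings ()
  "Compare how raw-text-unix encodes a character and a raw byte."
  (let ((guillemet (string #xab))                ; the character U+00AB
        (raw-byte (string-to-multibyte "\253"))) ; the byte #xAB as a character
    (list (aref raw-byte 0)                                 ; #x3fffab
          (encode-coding-string guillemet 'raw-text-unix)   ; "\302\253"
          (encode-coding-string raw-byte 'raw-text-unix)))) ; "\253"
```

Under these assumptions, (my-raw-text-encodings) should return a list of three elements: #x3fffab, the two-byte string "\302\253", and the one-byte string "\253". The raw byte survives as the single byte #xAB, but the character U+00AB becomes two bytes. Reading the saved file back as raw bytes would therefore not restore the original character.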

Stefan


From: Eli Zaretskii <eliz@gnu.org>
To: Stefan Monnier, 1215@debbugs.gnu.org
Cc: lekktu@gmail.com
Subject: Re: bug#1215: 23.0.60; unibyte->multibyte conversion problem (in
Date: Sat, 17 Jan 2009 12:10:17 +0200

> From: Stefan Monnier
> Date: Fri, 16 Jan 2009 15:56:44 -0500
> Cc: 1215@emacsbugs.donarmstrong.com
>
> > but according to raw-text-unix's description:
>
> > t -- raw-text-unix
>
> > Raw text, which means text contains random 8-bit codes.
> > Encoding text with this coding system produces the actual byte
> > sequence of the text in buffers and strings. An exception is made for
> > eight-bit-control characters. Each of them is encoded into a single
> > byte.
> > you can save (almost) anything with it. What is the bug?
>
> The bug is that you can currently save (almost) anything with it.
> This is due to historical reasons, where different notions of "no
> encoding" were mixed up. So on save, raw-text-unix behaves pretty much
> like utf-8-mule under Emacs-23 and emacs-mule under Emacs-22. On load,
> it behaves pretty much like `binary'.

I documented this in the ELisp manual.


From: Chong Yidong <cyd@stupidchicken.com>
To: control@debbugs.gnu.org
Subject: close 1215
Date: Wed, 08 Jul 2009 10:04:22 -0400

close 1215
thanks


From: $requester
To: internal_control@debbugs.gnu.org
Subject: Internal Control
Message-Id: bug archived.
Date: Sun, 09 Aug 2009 14:24:12 +0000

# A New Hope
# A log time ago, in a galaxy far, far away
# something happened.
#
# Magically this resulted in the following
# action being taken, but this fake control
# message doesn't tell you why it happened
#
# The action:
# bug archived.
thanks
# This fakemail brought to you by your local debbugs
# administrator