From unknown Wed Jun 18 23:11:44 2025
From: bug#13505 <13505@debbugs.gnu.org>
To: bug#13505 <13505@debbugs.gnu.org>
Subject: Status: Bug#696026: emacs24: file corruption on saving
Reply-To: bug#13505 <13505@debbugs.gnu.org>
Date: Thu, 19 Jun 2025 06:11:44 +0000

retitle 13505 Bug#696026: emacs24: file corruption on saving
reassign 13505 emacs
submitter 13505 Rob Browning
severity 13505 normal
thanks

From debbugs-submit-bounces@debbugs.gnu.org Sat Jan 19 23:10:34 2013
From: Rob Browning
To: bug-gnu-emacs@gnu.org
Subject: Re: Bug#696026: emacs24: file corruption on saving
References: <20121215223809.GA7549@xvii.vinc17.org>
Date: Sat, 19 Jan 2013 22:09:28 -0600
In-Reply-To: <20121215223809.GA7549@xvii.vinc17.org> (Vincent Lefevre's message of "Sat, 15 Dec 2012 23:38:09 +0100")
Message-ID: <877gn8ijgn.fsf@trouble.defaultvalue.org>
Cc: 696026-forwarded@bugs.debian.org, Vincent Lefevre, 696026@bugs.debian.org

(If possible, please preserve the *-forwarded address in any replies.)

The following bug was reported to Debian. I've tested both the Debian
emacs24 package, and current upstream emacs-24, as of:

  Author: Leo Liu
  Date:   Sat Jan 19 02:35:44 2013 +0800

      Prune erroneous values in dired-get-marked-files

In both cases, I was able to reproduce the reported issue.
Please let me know if I can provide further information.

Vincent Lefevre writes:

> Package: emacs24
> Version: 24.2+1-1
> Severity: grave
> Justification: causes non-serious data loss
>
> The file "file1" (attached) has the following contents:
>
> 00000000  6c e2 80 99 c3 a9 0a 74 65 73 74 e9 0a  |l......test..|
>
> 1. Open "file1" with "emacs -Q". It is regarded as
>    an in-is13194-devanagari-unix file.
>
> 2. Type M-: (set-buffer-modified-p t) to mark the buffer as modified
>    (so that one can save it).
>
> 3. Save the file with C-x C-s. It is proposed:
>
>    [...]
>    Select one of the safe coding systems listed below,
>    or cancel the writing with C-g and edit the buffer
>    to remove or modify the problematic characters,
>    or specify any other coding system (and risk losing
>    the problematic characters).
>
>      raw-text emacs-mule no-conversion
>
> 4. Choose raw-text (the default) or no-conversion. One can assume
>    that the file will not be modified. But it gets corrupted: one
>    obtains a file "file2" (attached) with the following contents:
>
> 00000000  6c e0 a5 88 80 99 e0 a4 a5 e0 a4 8a 0a 74 65 73  |l............tes|
> 00000010  74 e0 a4 bc 0a                                   |t....|
>
> Note: Actually "file1" has mixed UTF-8 and ISO-8859-1 contents due to
> a user error. But due to this bug, an attempt to fix the problem with
> Emacs makes things even worse! BTW, I had the same problem in the past
> when attempting to edit an mbox file with Emacs (in this case, having
> mixed UTF-8 and ISO-8859-1 contents is normal). How Emacs interprets
> such contents doesn't matter, but by default, it mustn't corrupt the
> file on saving.
>
> There is no such problem with GNU Emacs 23.4.1 (Debian package
> emacs23 23.4+1-4).
>
> -- System Information:
> Debian Release: 7.0
>   APT prefers unstable
>   APT policy: (500, 'unstable'), (500, 'testing'), (500, 'stable'), (1, 'experimental')
> Architecture: amd64 (x86_64)
>
> Kernel: Linux 3.5-trunk-amd64 (SMP w/2 CPU cores)
> Locale: LANG=POSIX, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
> Shell: /bin/sh linked to /bin/dash
>
> Versions of packages emacs24 depends on:
> ii  emacs24-bin-common  24.2+1-1
> ii  gconf-service       3.2.5-1+build1
> ii  libasound2          1.0.25-4
> ii  libatk1.0-0         2.4.0-2
> ii  libc6               2.13-37
> ii  libcairo2           1.12.2-2
> ii  libdbus-1-3         1.6.8-1
> ii  libfontconfig1      2.9.0-7.1
> ii  libfreetype6        2.4.9-1
> ii  libgconf-2-4        3.2.5-1+build1
> ii  libgdk-pixbuf2.0-0  2.26.1-1
> ii  libgif4             4.1.6-10
> ii  libglib2.0-0        2.33.12+really2.32.4-3
> ii  libgnutls26         2.12.20-2
> ii  libgomp1            4.7.2-4
> ii  libgpm2             1.20.4-6
> ii  libgtk2.0-0         2.24.10-2
> ii  libice6             2:1.0.8-2
> ii  libjpeg8            8d-1
> ii  libm17n-0           1.6.3-2
> ii  libmagickcore5      8:6.7.7.10-5
> ii  libmagickwand5      8:6.7.7.10-5
> ii  libncurses5         5.9-10
> ii  libotf0             0.9.12-2
> ii  libpango1.0-0       1.30.0-1
> ii  libpng12-0          1.2.49-3
> ii  librsvg2-2          2.36.1-1
> ii  libselinux1         2.1.9-5
> ii  libsm6              2:1.2.1-2
> ii  libtiff4            3.9.6-9
> ii  libtinfo5           5.9-10
> ii  libx11-6            2:1.5.0-1
> ii  libxft2             2.3.1-1
> ii  libxml2             2.8.0+dfsg1-7
> ii  libxpm4             1:3.5.10-1
> ii  libxrender1         1:0.9.7-1
> ii  zlib1g              1:1.2.7.dfsg-13
>
> emacs24 recommends no packages.
>
> Versions of packages emacs24 suggests:
> ii  emacs24-common-non-dfsg  24.2+1-1
>
> -- no debconf information

-- 
Rob Browning
rlb @defaultvalue.org and @debian.org
GPG as of 2011-07-10 E6A9 DA3C C9FD 1FF8 C676 D2C4 C0F0 39E9 ED1B 597A
GPG as of 2002-11-03 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4

From debbugs-submit-bounces@debbugs.gnu.org Sun Jan 20 11:50:53 2013
Date: Sun, 20 Jan 2013 18:49:38 +0200
From: Eli Zaretskii
Subject: Re: bug#13505: Bug#696026: emacs24: file corruption on saving
In-reply-to: <877gn8ijgn.fsf@trouble.defaultvalue.org>
To: Rob Browning, Kenichi Handa
Message-id: <83obgjpzod.fsf@gnu.org>
References: <20121215223809.GA7549@xvii.vinc17.org> <877gn8ijgn.fsf@trouble.defaultvalue.org>
Cc: 696026-forwarded@bugs.debian.org, vincent@vinc17.net, 696026@bugs.debian.org, 13505@debbugs.gnu.org

> From: Rob Browning
> Date: Sat, 19 Jan 2013 22:09:28 -0600
> Cc: 696026-forwarded@bugs.debian.org, Vincent Lefevre,
>     696026@bugs.debian.org
>
> Vincent Lefevre writes:
>
> > Package: emacs24
> > Version: 24.2+1-1
> > Severity: grave
> > Justification: causes non-serious data loss
> >
> > The file "file1" (attached) has the following contents:
> >
> > 00000000  6c e2 80 99 c3 a9 0a 74 65 73 74 e9 0a  |l......test..|
> >
> > 1. Open "file1" with "emacs -Q". It is regarded as
> >    an in-is13194-devanagari-unix file.
> >
> > 2. Type M-: (set-buffer-modified-p t) to mark the buffer as modified
> >    (so that one can save it).
> >
> > 3. Save the file with C-x C-s. It is proposed:
> >
> >    [...]
> >    Select one of the safe coding systems listed below,
> >    or cancel the writing with C-g and edit the buffer
> >    to remove or modify the problematic characters,
> >    or specify any other coding system (and risk losing
> >    the problematic characters).
> >
> >      raw-text emacs-mule no-conversion
> >
> > 4. Choose raw-text (the default) or no-conversion. One can assume
> >    that the file will not be modified. But it gets corrupted: one
> >    obtains a file "file2" (attached) with the following contents:
> >
> > 00000000  6c e0 a5 88 80 99 e0 a4 a5 e0 a4 8a 0a 74 65 73  |l............tes|
> > 00000010  74 e0 a4 bc 0a                                   |t....|
> >
> > Note: Actually "file1" has mixed UTF-8 and ISO-8859-1 contents due to
> > a user error. But due to this bug, an attempt to fix the problem with
> > Emacs makes things even worse! BTW, I had the same problem in the past
> > when attempting to edit an mbox file with Emacs (in this case, having
> > mixed UTF-8 and ISO-8859-1 contents is normal). How Emacs interprets
> > such contents doesn't matter, but by default, it mustn't corrupt the
> > file on saving.
> >
> > There is no such problem with GNU Emacs 23.4.1 (Debian package
> > emacs23 23.4+1-4).

First, this isn't really a regression: Emacs 23 has the same
"problem". It's just that Emacs 23 doesn't autodetect
in-is13194-devanagari in this file, while Emacs 24 does. If you say
"C-x RET c raw-text RET C-x C-f" to visit this file in Emacs 24, the
problem will be gone, which is exactly what happens in Emacs 23,
because it visits the file in raw-text to begin with. Conversely, if
you use "C-x RET c in-is13194-devanagari RET C-x C-f" to visit the
file in Emacs 23, you will get the same "problem" saving it.

I didn't research the reason why Emacs 24 autodetects this encoding,
and whether this is on purpose. Perhaps Handa-san could tell.

More to the point: there seems to be a fundamental misunderstanding
here regarding the effect of selecting an encoding at save time. It
sounds like the OP thought that selecting a "literal" encoding, such
as raw-text, which is supposed to leave the binary stream unaltered
(apart from the EOL format), will ensure that a buffer will be saved
exactly as it was originally found on disk. But this is false. What
raw-text and no-conversion do is to write out the _internal_
representation of each character without any conversions. The
original encoded form of the characters as found on disk at visit time
_cannot_ be recovered by saving with raw-text, because that encoded
form is lost without a trace when the file is _visited_ and decoded
into the internal representation. The only information that's left is
the coding-system used to decode the characters. But since the file's
encoding in this case is inconsistent, that coding-system cannot be
used to save it back (Emacs will not let you do so, as demonstrated in
the report), and therefore the original form cannot be recovered this
way.

What the user should do to avoid this data loss is prevent the
incorrect decoding of the file's contents when the file is visited.
To this end, the file should be visited with no-conversion or
raw-text, using "C-x RET c raw-text RET C-x C-f".
Then it will be possible to repair the file and write it back using
the same raw-text encoding.

If the fact that the file's encoding is inconsistent is not realized
until some time after the file is visited, the user should use
"C-x RET r raw-text RET" to re-visit the file using raw-text.

IOW, only selecting the appropriate encoding _at_visit_time_ can
prevent data loss in these cases. The expectation that "Emacs mustn't
corrupt the file on saving" when the file has inconsistent encoding
and was decoded with anything but raw-text or no-conversion is
unjustified.

Personally, I don't think there's a bug here. It's a cockpit error.

From debbugs-submit-bounces@debbugs.gnu.org Sun Jan 20 12:32:15 2013
From: Rob Browning
To: Eli Zaretskii
Subject: Re: bug#13505: Bug#696026: emacs24: file corruption on saving
References: <20121215223809.GA7549@xvii.vinc17.org> <877gn8ijgn.fsf@trouble.defaultvalue.org> <83obgjpzod.fsf@gnu.org>
Date: Sun, 20 Jan 2013 11:31:12 -0600
In-Reply-To: <83obgjpzod.fsf@gnu.org> (Eli Zaretskii's message of "Sun, 20 Jan 2013 18:49:38 +0200")
Message-ID: <87ehhf69sv.fsf@trouble.defaultvalue.org>
Cc: 696026-forwarded@bugs.debian.org, Kenichi Handa, vincent@vinc17.net, 696026@bugs.debian.org, 13505@debbugs.gnu.org

Eli Zaretskii writes:

> More to the point: there seems to be a fundamental misunderstanding
> here regarding the effect of selecting an encoding at save time. It
> sounds like the OP thought that selecting a "literal" encoding, such
> as raw-text, which is supposed to leave the binary stream unaltered
> (apart from the EOL format), will ensure that a buffer will be saved
> exactly as it was originally found on disk. But this is false. What
> raw-text and no-conversion do is to write out the _internal_
> representation of each character without any conversions. The
> original encoded form of the characters as found on disk at visit time
> _cannot_ be recovered by saving with raw-text, because that encoded
> form is lost without a trace when the file is _visited_ and decoded
> into the internal representation. The only information that's left is
> the coding-system used to decode the characters. But since the file's
> encoding in this case is inconsistent, that coding-system cannot be
> used to save it back (Emacs will not let you do so, as demonstrated in
> the report), and therefore the original form cannot be recovered this
> way.

Ahh, right; that makes sense to me.

-- 
Rob Browning
rlb @defaultvalue.org and @debian.org
GPG as of 2011-07-10 E6A9 DA3C C9FD 1FF8 C676 D2C4 C0F0 39E9 ED1B 597A
GPG as of 2002-11-03 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4

From debbugs-submit-bounces@debbugs.gnu.org Sun Jan 20 15:25:16 2013
From: Glenn Morris
To: Eli Zaretskii
Subject: Re: bug#13505: Bug#696026: emacs24: file corruption on saving
References: <20121215223809.GA7549@xvii.vinc17.org> <877gn8ijgn.fsf@trouble.defaultvalue.org> <83obgjpzod.fsf@gnu.org>
Date: Sun, 20 Jan 2013 15:24:13 -0500
In-Reply-To: <83obgjpzod.fsf@gnu.org> (Eli Zaretskii's message of "Sun, 20 Jan 2013 18:49:38 +0200")
Cc: 13505@debbugs.gnu.org

Eli Zaretskii wrote:

> Personally, I don't think there's a bug here. It's a cockpit error.

Does this also apply to

http://debbugs.gnu.org/cgi/bugreport.cgi?bug=13377

I already marked that important, forwarded it, and suggested it was
related to the original Debian report in this issue, but have received
no response. Please follow up to 13377 if you have any info.

From debbugs-submit-bounces@debbugs.gnu.org Sun Jan 20 16:26:16 2013
Date: Sun, 20 Jan 2013 22:25:08 +0100
From: Vincent Lefevre
To: Eli Zaretskii
Subject: Re: bug#13505: Bug#696026: emacs24: file corruption on saving
Message-ID: <20130120212508.GF2695@xvii.vinc17.org>
References: <20121215223809.GA7549@xvii.vinc17.org> <877gn8ijgn.fsf@trouble.defaultvalue.org> <83obgjpzod.fsf@gnu.org>
In-Reply-To: <83obgjpzod.fsf@gnu.org>
Cc: 696026-forwarded@bugs.debian.org, Kenichi Handa, 696026@bugs.debian.org, Rob Browning, 13505@debbugs.gnu.org

On 2013-01-20 18:49:38 +0200, Eli Zaretskii wrote:
> Personally, I don't think there's a bug here. It's a cockpit error.

Perhaps it isn't a bug at save time. But then, selecting a lossy
encoding by default when visiting the file is the bug (and really
a regression), particularly if this isn't clearly told to the user.

Actually this is related, since the lossy encoding becomes a real
problem only at save time (and for copy-paste I assume, though the
file doesn't get overwritten by that).

-- 
Vincent Lefèvre
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)

From debbugs-submit-bounces@debbugs.gnu.org Sun Jan 20 16:41:33 2013
Date: Sun, 20 Jan 2013 23:40:14 +0200
From: Eli Zaretskii
Subject: Re: bug#13505: Bug#696026: emacs24: file corruption on saving
In-reply-to: <20130120212508.GF2695@xvii.vinc17.org>
To: Vincent Lefevre
Message-id: <83bocjpm81.fsf@gnu.org>
References: <20121215223809.GA7549@xvii.vinc17.org> <877gn8ijgn.fsf@trouble.defaultvalue.org> <83obgjpzod.fsf@gnu.org> <20130120212508.GF2695@xvii.vinc17.org>
Cc: 696026-forwarded@bugs.debian.org, handa@gnu.org, 696026@bugs.debian.org, rlb@defaultvalue.org, 13505@debbugs.gnu.org

> Date: Sun, 20 Jan 2013 22:25:08 +0100
> From: Vincent Lefevre
> Cc: Rob Browning, Kenichi Handa,
>     13505@debbugs.gnu.org, 696026-forwarded@bugs.debian.org,
>     696026@bugs.debian.org
>
> On 2013-01-20 18:49:38 +0200, Eli Zaretskii wrote:
> > Personally, I don't think there's a bug here. It's a cockpit error.
>
> Perhaps it isn't a bug at save time. But then, selecting a lossy
> encoding by default when visiting the file is the bug (and really
> a regression), particularly if this isn't clearly told to the user.

The encoding isn't lossy.

In any case, I don't really understand your proposal. Suppose the
file was indeed encoded in in-is13194-devanagari, would you argue then
that selecting it would be incorrect or undesirable behavior?

> Actually this is related, since the lossy encoding becomes a real
> problem only at save time (and for copy-paste I assume, though the
> file doesn't get overwritten by that).

It is only a problem when you try to save or otherwise output it
(e.g., send in an email).

But what you should do then is "C-x RET r raw-text RET", and recover.
That is the only way to avoid corruption in files that use
inconsistent encoding.

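The disagreement above over whether a decoding is "lossy" can be made concrete with a minimal sketch. It is written in Python rather than Emacs Lisp and is only an analogy: it does not model Emacs coding systems. The `file1` bytes come from the hex dump in the report. Python's `surrogateescape` and `replace` error handlers stand in, as an assumption made for illustration, for a decoder that keeps undecodable bytes and one that discards them. The helper names `round_trip` and `is_valid_utf8` are hypothetical.

```python
# Bytes of "file1" from the hex dump in the report: valid UTF-8 for
# "l", U+2019, U+00E9, newline, then "test", a lone ISO-8859-1 byte 0xe9,
# and a final newline.
FILE1 = bytes.fromhex("6c e2 80 99 c3 a9 0a 74 65 73 74 e9 0a")


def round_trip(data, codec, errors):
    """Decode bytes to text, then encode the text back with the same settings."""
    return data.decode(codec, errors).encode(codec, errors)


def is_valid_utf8(data):
    """Return True if the whole byte string decodes as strict UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Keeping undecodable bytes as escape characters preserves the file exactly.
kept = round_trip(FILE1, "utf-8", "surrogateescape")

# Replacing undecodable bytes with U+FFFD discards the original 0xe9.
replaced = round_trip(FILE1, "utf-8", "replace")

# A one-byte-per-character codec (similar in spirit to reading raw bytes)
# also round-trips, whatever the bytes were meant to be.
raw = round_trip(FILE1, "latin-1", "strict")

if __name__ == "__main__":
    print("valid UTF-8:       ", is_valid_utf8(FILE1))
    print("surrogateescape ok:", kept == FILE1)
    print("replace ok:        ", replaced == FILE1)
    print("latin-1 ok:        ", raw == FILE1)
```

In this analogy, whether the original bytes survive is settled when the data is decoded, not when it is written back, which is the point made in the thread about choosing the encoding at visit time.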
From debbugs-submit-bounces@debbugs.gnu.org Sun Jan 20 17:11:14 2013
Date: Sun, 20 Jan 2013 23:10:08 +0100
From: Vincent Lefevre
To: Eli Zaretskii
Subject: Re: bug#13505: Bug#696026: emacs24: file corruption on saving
Message-ID: <20130120221007.GG2695@xvii.vinc17.org>
References: <20121215223809.GA7549@xvii.vinc17.org> <877gn8ijgn.fsf@trouble.defaultvalue.org> <83obgjpzod.fsf@gnu.org> <20130120212508.GF2695@xvii.vinc17.org> <83bocjpm81.fsf@gnu.org>
In-Reply-To: <83bocjpm81.fsf@gnu.org>
Cc: 696026-forwarded@bugs.debian.org, handa@gnu.org, 696026@bugs.debian.org, rlb@defaultvalue.org, 13505@debbugs.gnu.org

On 2013-01-20 23:40:14 +0200, Eli Zaretskii wrote:
> > Date: Sun, 20 Jan 2013 22:25:08 +0100
> > From: Vincent Lefevre
> > Cc: Rob Browning, Kenichi Handa,
> >     13505@debbugs.gnu.org, 696026-forwarded@bugs.debian.org,
> >     696026@bugs.debian.org
> >
> > On 2013-01-20 18:49:38 +0200, Eli Zaretskii wrote:
> > > Personally, I don't think there's a bug here. It's a cockpit error.
> >
> > Perhaps it isn't a bug at save time. But then, selecting a lossy
> > encoding by default when visiting the file is the bug (and really
> > a regression), particularly if this isn't clearly told to the user.
>
> The encoding isn't lossy.

You said:

| The original encoded form of the characters as found on disk at
| visit time _cannot_ be recovered by saving with raw-text, because
| that encoded form is lost without a trace when the file is _visited_
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| and decoded into the internal representation.

This is what lossy is. By contrast, the utf-8 encoding doesn't seem
to be lossy: Emacs seems to handle files with invalid UTF-8 sequences
without any loss. So, this encoding is safe, even if Emacs wrongly
guesses the encoding.

> In any case, I don't really understand your proposal. Suppose the
> file was indeed encoded in in-is13194-devanagari, would you argue then
> that selecting it would be incorrect or undesirable behavior?

If Emacs modifies the contents when saving the file, it would be
incorrect.

> > Actually this is related, since the lossy encoding becomes a real
> > problem only at save time (and for copy-paste I assume, though the
> > file doesn't get overwritten by that).
>
> It is only a problem when you try to save or otherwise output it
> (e.g., send in an email).
>
> But what you should do then is "C-x RET r raw-text RET", and recover.
> That is the only way to avoid corruption in files that use
> inconsistent encoding.

But Emacs should clearly tell the user what to do after C-x C-s and
clearly say when there can be data loss. Currently it says:

"These default coding systems were tried to encode text
in the buffer `file1':
  (in-is13194-devanagari-unix (2 . 2376) (3 . 4194176) (4 . 4194201)
  (5 . 2341) (6 . 2314) (12 . 2364))
  (utf-8-unix (3 . 4194176) (4 . 4194201))
However, each of them encountered characters it couldn't encode:
  in-is13194-devanagari-unix cannot encode these: [...]
  utf-8-unix cannot encode these: [...]"

This shouldn't be regarded as a problem by the user, because if Emacs
could read and interpret the file (and such characters have not been
added by the user), it should be able to save it.

Then Emacs says: "Select one of the safe coding systems listed below
[...]", but doesn't say that something has already been lost. So, the
words "safe coding systems" are really misleading.

-- 
Vincent Lefèvre
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)

From debbugs-submit-bounces@debbugs.gnu.org Sun Jan 20 17:23:15 2013
Date: Sun, 20 Jan 2013 23:22:11 +0100
From: Vincent Lefevre
To: Eli Zaretskii
Subject: Re: bug#13505: Bug#696026: emacs24: file corruption on saving
Message-ID: <20130120222211.GH2695@xvii.vinc17.org>
References: <20121215223809.GA7549@xvii.vinc17.org> <877gn8ijgn.fsf@trouble.defaultvalue.org> <83obgjpzod.fsf@gnu.org> <20130120212508.GF2695@xvii.vinc17.org> <83bocjpm81.fsf@gnu.org> <20130120221007.GG2695@xvii.vinc17.org>
In-Reply-To: <20130120221007.GG2695@xvii.vinc17.org>
Cc: 696026-forwarded@bugs.debian.org, handa@gnu.org, 696026@bugs.debian.org, rlb@defaultvalue.org, 13505@debbugs.gnu.org

On 2013-01-20 23:10:08 +0100, Vincent Lefevre wrote:
> But Emacs should clearly tell the user what to do after C-x C-s and
> clearly say when there can be data loss. Currently it says:
[...]

In fact, I fear that this may not be sufficient, because some data
loss silently occurs when visiting the file. If after the decoding,
it appears that there are no problematic characters (is this
possible?), the user would be able to save the file without any
message from Emacs.

-- 
Vincent Lefèvre
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)

From debbugs-submit-bounces@debbugs.gnu.org Sun Jan 20 18:02:58 2013
From: Andreas Schwab
To: Eli Zaretskii
Subject: Re: bug#13505: Bug#696026: emacs24: file corruption on saving
References: <20121215223809.GA7549@xvii.vinc17.org> <877gn8ijgn.fsf@trouble.defaultvalue.org> <83obgjpzod.fsf@gnu.org>
Date: Mon, 21 Jan 2013 00:01:49 +0100
In-Reply-To: <83obgjpzod.fsf@gnu.org> (Eli Zaretskii's message of "Sun, 20 Jan 2013 18:49:38 +0200")
Cc: 696026@bugs.debian.org, 696026-forwarded@bugs.debian.org, 13505@debbugs.gnu.org, Kenichi Handa, vincent@vinc17.net, Rob Browning

Eli Zaretskii writes:

> I didn't research the reason why Emacs 24 autodetects this encoding,
> and whether this is on purpose.

It's a bug, fixed now.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

From debbugs-submit-bounces@debbugs.gnu.org Sun Jan 20 18:28:46 2013 Received: (at 13505) by debbugs.gnu.org; 20 Jan 2013 23:28:47 +0000 Received: from localhost ([127.0.0.1]:41922 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1Tx4Ji-0006b7-KX for submit@debbugs.gnu.org; Sun, 20 Jan 2013 18:28:46 -0500 Received: from defaultvalue.org ([70.85.129.156]:56057) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1Tx4Jf-0006ay-GV for 13505@debbugs.gnu.org; Sun, 20 Jan 2013 18:28:44 -0500 Received: from trouble.defaultvalue.org (localhost [127.0.0.1]) (Authenticated sender: rlb@defaultvalue.org) by defaultvalue.org (Postfix) with ESMTPSA id 1659190D24; Sun, 20 Jan 2013 17:32:24 -0600 (CST) Received: by trouble.defaultvalue.org (Postfix, from userid 1000) id 1F68B14E078; Sun, 20 Jan 2013 17:27:41 -0600 (CST) From: Rob Browning To: Andreas Schwab Subject: Re: Bug#696026: bug#13505: Bug#696026: emacs24: file corruption on saving References: <20121215223809.GA7549@xvii.vinc17.org> <877gn8ijgn.fsf@trouble.defaultvalue.org> <83obgjpzod.fsf@gnu.org> Date: Sun, 20 Jan 2013 17:27:40 -0600 In-Reply-To: (Andreas Schwab's message of "Mon, 21 Jan 2013 00:01:49 +0100") Message-ID: <87wqv7juz7.fsf@trouble.defaultvalue.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.2 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-Spam-Score: -1.9 (-) X-Debbugs-Envelope-To: 13505 Cc: 696026@bugs.debian.org, control@bugs.debian.org, 696026-forwarded@bugs.debian.org, 13505@debbugs.gnu.org, Kenichi Handa , vincent@vinc17.net, Eli Zaretskii X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -1.9 (-) Andreas Schwab writes: > Eli Zaretskii writes: > >> I didn't research the reason why Emacs 24 autodetects this encoding, >> 
and whether this is on purpose. > > It's a bug, fixed now. Great, and thanks. -- Rob Browning rlb @defaultvalue.org and @debian.org GPG as of 2011-07-10 E6A9 DA3C C9FD 1FF8 C676 D2C4 C0F0 39E9 ED1B 597A GPG as of 2002-11-03 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4 From debbugs-submit-bounces@debbugs.gnu.org Sun Jan 20 22:49:07 2013 Received: (at 13505) by debbugs.gnu.org; 21 Jan 2013 03:49:07 +0000 Received: from localhost ([127.0.0.1]:42160 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1Tx8Nf-0005Yr-2U for submit@debbugs.gnu.org; Sun, 20 Jan 2013 22:49:07 -0500 Received: from mtaout23.012.net.il ([80.179.55.175]:38922) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1Tx8Nb-0005Yg-LV for 13505@debbugs.gnu.org; Sun, 20 Jan 2013 22:49:05 -0500 Received: from conversion-daemon.a-mtaout23.012.net.il by a-mtaout23.012.net.il (HyperSendmail v2007.08) id <0MGY00B00IETMV00@a-mtaout23.012.net.il> for 13505@debbugs.gnu.org; Mon, 21 Jan 2013 05:48:00 +0200 (IST) Received: from HOME-C4E4A596F7 ([87.69.4.28]) by a-mtaout23.012.net.il (HyperSendmail v2007.08) with ESMTPA id <0MGY00BU8IJZHO20@a-mtaout23.012.net.il>; Mon, 21 Jan 2013 05:48:00 +0200 (IST) Date: Mon, 21 Jan 2013 05:48:14 +0200 From: Eli Zaretskii Subject: Re: bug#13505: Bug#696026: emacs24: file corruption on saving In-reply-to: <20130120221007.GG2695@xvii.vinc17.org> X-012-Sender: halo1@inter.net.il To: Vincent Lefevre Message-id: <83a9s3p56p.fsf@gnu.org> References: <20121215223809.GA7549@xvii.vinc17.org> <877gn8ijgn.fsf@trouble.defaultvalue.org> <83obgjpzod.fsf@gnu.org> <20130120212508.GF2695@xvii.vinc17.org> <83bocjpm81.fsf@gnu.org> <20130120221007.GG2695@xvii.vinc17.org> X-Spam-Score: 0.2 (/) X-Debbugs-Envelope-To: 13505 Cc: 696026-forwarded@bugs.debian.org, handa@gnu.org, 696026@bugs.debian.org, rlb@defaultvalue.org, 13505@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list Reply-To: Eli Zaretskii 
List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -1.2 (-) > Date: Sun, 20 Jan 2013 23:10:08 +0100 > From: Vincent Lefevre > Cc: rlb@defaultvalue.org, handa@gnu.org, 13505@debbugs.gnu.org, > 696026-forwarded@bugs.debian.org, 696026@bugs.debian.org > > On 2013-01-20 23:40:14 +0200, Eli Zaretskii wrote: > > > Date: Sun, 20 Jan 2013 22:25:08 +0100 > > > From: Vincent Lefevre > > > Cc: Rob Browning , Kenichi Handa , > > > 13505@debbugs.gnu.org, 696026-forwarded@bugs.debian.org, > > > 696026@bugs.debian.org > > > > > > On 2013-01-20 18:49:38 +0200, Eli Zaretskii wrote: > > > > Personally, I don't think there's a bug here. It's a cockpit error. > > > > > > Perhaps it isn't a bug at save time. But then, selecting a lossy > > > encoding by default when visiting the file is the bug (and really > > > a regression), particularly if this isn't clearly told to the user. > > > > The encoding isn't lossy. > > You said: > > | The original encoded form of the characters as found on disk at > | visit time _cannot_ be recovered by saving with raw-text, because > | that encoded form is lost without a trace when the file is _visited_ > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > | and decoded into the internal representation. > > This is what lossy is. In that sense, every encoding except no-conversion is lossy. > On the opposite, the utf-8 encoding doesn't seem to be lossy: Emacs > seems to handle files with invalid UTF-8 sequences without any loss. > So, this encoding is safe, even if Emacs wrongly guess the encoding. No, it isn't, although you could get away with it most of the time. > But Emacs should clearly tell the user what to do after C-x C-s and > clearly say when there can be data loss. At save time, "data loss" is wrt what's in the buffer. In that sense, the encodings Emacs suggested don't lose any data. 
> Then Emacs says: "Select one of the safe coding systems listed below > [...]", but doesn't say that something has already been lost. So, the > words "safe coding systems" are really misleading. It's misleading because you misunderstand what is "safe" at buffer save time. From debbugs-submit-bounces@debbugs.gnu.org Sun Jan 20 22:50:06 2013 Received: (at 13505) by debbugs.gnu.org; 21 Jan 2013 03:50:06 +0000 Received: from localhost ([127.0.0.1]:42164 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1Tx8Oc-0005aV-K9 for submit@debbugs.gnu.org; Sun, 20 Jan 2013 22:50:06 -0500 Received: from mtaout20.012.net.il ([80.179.55.166]:55190) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1Tx8OZ-0005aH-Un for 13505@debbugs.gnu.org; Sun, 20 Jan 2013 22:50:04 -0500 Received: from conversion-daemon.a-mtaout20.012.net.il by a-mtaout20.012.net.il (HyperSendmail v2007.08) id <0MGY00F00ICT4Q00@a-mtaout20.012.net.il> for 13505@debbugs.gnu.org; Mon, 21 Jan 2013 05:49:01 +0200 (IST) Received: from HOME-C4E4A596F7 ([87.69.4.28]) by a-mtaout20.012.net.il (HyperSendmail v2007.08) with ESMTPA id <0MGY00EVZILOQF50@a-mtaout20.012.net.il>; Mon, 21 Jan 2013 05:49:01 +0200 (IST) Date: Mon, 21 Jan 2013 05:49:15 +0200 From: Eli Zaretskii Subject: Re: bug#13505: Bug#696026: emacs24: file corruption on saving In-reply-to: <20130120222211.GH2695@xvii.vinc17.org> X-012-Sender: halo1@inter.net.il To: Vincent Lefevre Message-id: <838v7np550.fsf@gnu.org> References: <20121215223809.GA7549@xvii.vinc17.org> <877gn8ijgn.fsf@trouble.defaultvalue.org> <83obgjpzod.fsf@gnu.org> <20130120212508.GF2695@xvii.vinc17.org> <83bocjpm81.fsf@gnu.org> <20130120221007.GG2695@xvii.vinc17.org> <20130120222211.GH2695@xvii.vinc17.org> X-Spam-Score: 0.7 (/) X-Debbugs-Envelope-To: 13505 Cc: 696026-forwarded@bugs.debian.org, handa@gnu.org, 696026@bugs.debian.org, rlb@defaultvalue.org, 13505@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 
Precedence: list Reply-To: Eli Zaretskii List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -1.2 (-) > Date: Sun, 20 Jan 2013 23:22:11 +0100 > From: Vincent Lefevre > Cc: rlb@defaultvalue.org, handa@gnu.org, 13505@debbugs.gnu.org, > 696026-forwarded@bugs.debian.org, 696026@bugs.debian.org > > On 2013-01-20 23:10:08 +0100, Vincent Lefevre wrote: > > But Emacs should clearly tell the user what to do after C-x C-s and > > clearly say when there can be data loss. Currently it says: > [...] > > In fact, I fear that this may not be sufficient, because some data > loss silently occurs when visiting the file. Exactly! > If after the decoding, it appears that there are no problematic > characters (is this possible?), the user would be able to save the > file without any message from Emacs. I don't know how to do that within the framework of Emacs handling of non-ASCII text. 
From debbugs-submit-bounces@debbugs.gnu.org Sun Jan 20 23:15:17 2013 Received: (at 13505) by debbugs.gnu.org; 21 Jan 2013 04:15:17 +0000 Received: from localhost ([127.0.0.1]:42183 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1Tx8my-0006B3-Jd for submit@debbugs.gnu.org; Sun, 20 Jan 2013 23:15:17 -0500 Received: from vinc17.pck.nerim.net ([213.41.242.187]:57804 helo=smtp-xvii.vinc17.net) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1Tx8mw-0006Au-Cu for 13505@debbugs.gnu.org; Sun, 20 Jan 2013 23:15:15 -0500 Received: by xvii.vinc17.org (Postfix, from userid 1000) id B2C1C31001B; Mon, 21 Jan 2013 05:14:10 +0100 (CET) Date: Mon, 21 Jan 2013 05:14:10 +0100 From: Vincent Lefevre To: Eli Zaretskii Subject: Re: bug#13505: Bug#696026: emacs24: file corruption on saving Message-ID: <20130121041410.GJ2695@xvii.vinc17.org> References: <20121215223809.GA7549@xvii.vinc17.org> <877gn8ijgn.fsf@trouble.defaultvalue.org> <83obgjpzod.fsf@gnu.org> <20130120212508.GF2695@xvii.vinc17.org> <83bocjpm81.fsf@gnu.org> <20130120221007.GG2695@xvii.vinc17.org> <83a9s3p56p.fsf@gnu.org> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline In-Reply-To: <83a9s3p56p.fsf@gnu.org> X-Mailer-Info: http://www.vinc17.net/mutt/ User-Agent: Mutt/1.5.21-6290-vl-r57386 (2013-01-17) Content-Transfer-Encoding: quoted-printable X-Spam-Score: -0.0 (/) X-Debbugs-Envelope-To: 13505 Cc: 696026-forwarded@bugs.debian.org, handa@gnu.org, 696026@bugs.debian.org, rlb@defaultvalue.org, 13505@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -1.9 (-) On 2013-01-21 05:48:14 +0200, Eli Zaretskii wrote: > > You said: > > > > | The original encoded form of the characters as found on disk
at > > | visit time _cannot_ be recovered by saving with raw-text, because > > | that encoded form is lost without a trace when the file is _visited_ > > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > | and decoded into the internal representation. > > > > This is what lossy is. > > In that sense, every encoding except no-conversion is lossy. Even 8-bit encodings such as latin-1? > > On the opposite, the utf-8 encoding doesn't seem to be lossy: Emacs > > seems to handle files with invalid UTF-8 sequences without any loss. > > So, this encoding is safe, even if Emacs wrongly guess the encoding. > > No, it isn't, although you could get away with it most of the time. Could you give an example where one loses data with the utf-8 encoding? > > But Emacs should clearly tell the user what to do after C-x C-s and > > clearly say when there can be data loss. > > At save time, "data loss" is wrt what's in the buffer. In that sense, > the encodings Emacs suggested don't lose any data. "data loss" is the difference between the original file and the saved file. > > Then Emacs says: "Select one of the safe coding systems listed below > > [...]", but doesn't say that something has already been lost. So, the > > words "safe coding systems" are really misleading. > > It's misleading because you misunderstand what is "safe" at buffer > save time. No, it's misleading because Emacs didn't say that data were lost when visiting the file.
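The file-level definition of data loss used in this message (the saved bytes differ from the original bytes) can be checked mechanically. The following is only a minimal sketch in Python, not Emacs Lisp, and it uses Python's codecs as a stand-in for Emacs coding systems, which is an assumption: the two are different machinery. The helper name `survives_round_trip` is hypothetical.

```python
def survives_round_trip(data: bytes, encoding: str) -> bool:
    """Return True if decoding and re-encoding yields the same bytes.

    This mirrors the file-level notion of "no data loss": what would be
    written back equals what was read, whatever happens internally.
    """
    try:
        return data.decode(encoding).encode(encoding) == data
    except UnicodeError:
        # A strict codec refuses the input outright, so no faithful
        # round trip is possible with it.
        return False


# Every possible byte value, as an 8-bit "file".
all_bytes = bytes(range(256))

# latin-1 maps each byte to exactly one character, so it round-trips,
# even though the decoded text is no longer a sequence of single bytes.
latin1_ok = survives_round_trip(all_bytes, "latin-1")

# A lone 0x80 is not valid UTF-8; Python's strict decoder rejects it.
utf8_ok = survives_round_trip(b"\x80", "utf-8")

print(latin1_ok, utf8_ok)
```

In this sense a decoder can change the in-memory representation completely and still be lossless at the file level; the two properties are independent, which is the distinction the two sides of this exchange are arguing over.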
-- Vincent Lefèvre - Web: 100% accessible validated (X)HTML - Blog: Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon) From debbugs-submit-bounces@debbugs.gnu.org Mon Jan 21 12:56:53 2013 Received: (at 13505) by debbugs.gnu.org; 21 Jan 2013 17:56:53 +0000 Received: from localhost ([127.0.0.1]:43388 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1TxLc3-0002Fg-So for submit@debbugs.gnu.org; Mon, 21 Jan 2013 12:56:53 -0500 Received: from mtaout22.012.net.il ([80.179.55.172]:64521) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1TxLc0-0002FU-S2 for 13505@debbugs.gnu.org; Mon, 21 Jan 2013 12:56:50 -0500 Received: from conversion-daemon.a-mtaout22.012.net.il by a-mtaout22.012.net.il (HyperSendmail v2007.08) id <0MGZ00300LQ3W100@a-mtaout22.012.net.il> for 13505@debbugs.gnu.org; Mon, 21 Jan 2013 19:55:05 +0200 (IST) Received: from HOME-C4E4A596F7 ([87.69.4.28]) by a-mtaout22.012.net.il (HyperSendmail v2007.08) with ESMTPA id <0MGZ003GJLRSL9A0@a-mtaout22.012.net.il>; Mon, 21 Jan 2013 19:55:05 +0200 (IST) Date: Mon, 21 Jan 2013 19:55:20 +0200 From: Eli Zaretskii Subject: Re: bug#13505: Bug#696026: emacs24: file corruption on saving In-reply-to: <20130121041410.GJ2695@xvii.vinc17.org> X-012-Sender: halo1@inter.net.il To: Vincent Lefevre Message-id: <83vcaqo1yv.fsf@gnu.org> References: <20121215223809.GA7549@xvii.vinc17.org> <877gn8ijgn.fsf@trouble.defaultvalue.org> <83obgjpzod.fsf@gnu.org> <20130120212508.GF2695@xvii.vinc17.org> <83bocjpm81.fsf@gnu.org> <20130120221007.GG2695@xvii.vinc17.org> <83a9s3p56p.fsf@gnu.org> <20130121041410.GJ2695@xvii.vinc17.org> X-Spam-Score: -1.2 (-) X-Debbugs-Envelope-To: 13505 Cc: 696026-forwarded@bugs.debian.org, handa@gnu.org, 696026@bugs.debian.org, rlb@defaultvalue.org, 13505@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list Reply-To: Eli Zaretskii List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help:
List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -1.2 (-) > Date: Mon, 21 Jan 2013 05:14:10 +0100 > From: Vincent Lefevre > Cc: rlb@defaultvalue.org, handa@gnu.org, 13505@debbugs.gnu.org, > 696026-forwarded@bugs.debian.org, 696026@bugs.debian.org > > On 2013-01-21 05:48:14 +0200, Eli Zaretskii wrote: > > > You said: > > > > > > | The original encoded form of the characters as found on disk at > > > | visit time _cannot_ be recovered by saving with raw-text, because > > > | that encoded form is lost without a trace when the file is _visited_ > > > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > > | and decoded into the internal representation. > > > > > > This is what lossy is. > > > > In that sense, every encoding except no-conversion is lossy. > > Even 8-bit encodings such as latin-1? Yes. When latin-1 characters are decoded (as part of visiting a file), they are converted to the internal representation, and cease to be single 8-bit bytes. > > > On the opposite, the utf-8 encoding doesn't seem to be lossy: Emacs > > > seems to handle files with invalid UTF-8 sequences without any loss. > > > So, this encoding is safe, even if Emacs wrongly guess the encoding. > > > > No, it isn't, although you could get away with it most of the time. > > Could you give an example where one loses data with the utf-8 encoding? E.g., in your test file, the byte whose value is 0x80 is converted to 0x3fff80 when the file is read into a buffer. Perhaps by "lossless" you mean "reversible", in the sense that saving the same buffer will perform the reverse conversion. In that case, even the in-is13194-devanagari-unix is reversible: if you type this encoding when Emacs prompts you to select one of the coding systems, then you get the same file on disk with no corruption whatsoever. > > > But Emacs should clearly tell the user what to do after C-x C-s and > > > clearly say when there can be data loss. 
> > > > At save time, "data loss" is wrt what's in the buffer. In that sense, > > the encodings Emacs suggested don't lose any data. > > "data loss" is the difference between the original file and the saved > file. But what do you want Emacs to do with this? When you save the buffer, the original file might be different or no longer be available (or not accessible even in principle, e.g. if the data came from a subprocess). These issues should be detected at file visit time, if at all, not at buffer save time. > > > Then Emacs says: "Select one of the safe coding systems listed below > > > [...]", but doesn't say that something has already been lost. So, the > > > words "safe coding systems" are really misleading. > > > > It's misleading because you misunderstand what is "safe" at buffer > > save time. > > No, it's misleading because Emacs didn't say that data were lost > when visiting the file. Let's be constructive here. Please suggest some practical way for Emacs to handle this situation better. For the record, here are the various alternative ways Emacs supports the use case you described, when a file with inconsistent encoding needs to be repaired manually: . Visit the file with "M-x find-file-literally RET". This yields a unibyte buffer, where each byte stands for itself, and which you can edit without risking en-/decoding issues. . Visit the file normally, then type "M-x hexl-mode RET" (or use "M-x hexl-find-file RET" to visit it in the first place). This revisits (or visits) the file in a unibyte buffer, and in addition lets you edit the binary stuff regardless of its graphic representation. . After visiting the file normally and noticing that it contains weird characters, or after being prompted to select a coding system when saving the buffer, type "C-x RET r raw-text RET" to revisit the file in raw-text encoding. Then edit the bytes and save the file. These alternatives are listed in the descending order of priority (IMO). 
There are more ways to deal with this, but the rest are more complicated and dangerous, so I don't mention them here. (It is also possible that you will find the second alternative more convenient than the 1st one.) From debbugs-submit-bounces@debbugs.gnu.org Mon Jan 21 21:37:11 2013 Received: (at 13505) by debbugs.gnu.org; 22 Jan 2013 02:37:11 +0000 Received: from localhost ([127.0.0.1]:43819 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1TxTja-0006xc-S9 for submit@debbugs.gnu.org; Mon, 21 Jan 2013 21:37:11 -0500 Received: from vinc17.pck.nerim.net ([213.41.242.187]:57685 helo=smtp-xvii.vinc17.net) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1TxTjX-0006xT-1v for 13505@debbugs.gnu.org; Mon, 21 Jan 2013 21:37:09 -0500 Received: by xvii.vinc17.org (Postfix, from userid 1000) id C919B31001E; Tue, 22 Jan 2013 03:35:57 +0100 (CET) Date: Tue, 22 Jan 2013 03:35:57 +0100 From: Vincent Lefevre To: Eli Zaretskii Subject: Re: bug#13505: Bug#696026: emacs24: file corruption on saving Message-ID: <20130122023557.GA25002@xvii.vinc17.org> References: <20121215223809.GA7549@xvii.vinc17.org> <877gn8ijgn.fsf@trouble.defaultvalue.org> <83obgjpzod.fsf@gnu.org> <20130120212508.GF2695@xvii.vinc17.org> <83bocjpm81.fsf@gnu.org> <20130120221007.GG2695@xvii.vinc17.org> <83a9s3p56p.fsf@gnu.org> <20130121041410.GJ2695@xvii.vinc17.org> <83vcaqo1yv.fsf@gnu.org> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline In-Reply-To: <83vcaqo1yv.fsf@gnu.org> X-Mailer-Info: http://www.vinc17.net/mutt/ User-Agent: Mutt/1.5.21-6291-vl-r57386 (2013-01-20) Content-Transfer-Encoding: quoted-printable X-Spam-Score: -1.9 (-) X-Debbugs-Envelope-To: 13505 Cc: 696026-forwarded@bugs.debian.org, handa@gnu.org, 696026@bugs.debian.org, rlb@defaultvalue.org, 13505@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: 
List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -1.9 (-) On 2013-01-21 19:55:20 +0200, Eli Zaretskii wrote: > > Date: Mon, 21 Jan 2013 05:14:10 +0100 > > From: Vincent Lefevre > > Cc: rlb@defaultvalue.org, handa@gnu.org, 13505@debbugs.gnu.org, > > 696026-forwarded@bugs.debian.org, 696026@bugs.debian.org > > > > On 2013-01-21 05:48:14 +0200, Eli Zaretskii wrote: > > > > You said: > > > > > > > > | The original encoded form of the characters as found on disk at > > > > | visit time _cannot_ be recovered by saving with raw-text, because > > > > | that encoded form is lost without a trace when the file is _visited_ > > > > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > > > | and decoded into the internal representation. > > > > > > > > This is what lossy is. > > > > > > In that sense, every encoding except no-conversion is lossy. > > > > Even 8-bit encodings such as latin-1? > > Yes. When latin-1 characters are decoded (as part of visiting a > file), they are converted to the internal representation, and cease to > be single 8-bit bytes. Any example where saving the file without modifying it (see below) would modify the data (as a sequence of bytes on the disk)? > > > > On the opposite, the utf-8 encoding doesn't seem to be lossy: Emacs > > > > seems to handle files with invalid UTF-8 sequences without any loss. > > > > So, this encoding is safe, even if Emacs wrongly guess the encoding. > > > > > > No, it isn't, although you could get away with it most of the time. > > > > Could you give an example where one loses data with the utf-8 encoding? > > E.g., in your test file, the byte whose value is 0x80 is converted to > 0x3fff80 when the file is read into a buffer. No, there are no problems with this example: $ printf "\x80" > file $ hd file 00000000 80 |.| 00000001 $ emacs -q file Here the encoding by Emacs is utf-8-unix.
Then I do M-: (set-buffer-modified-p t) to mark the buffer as modified (as in the bug report), then C-x C-s. Emacs proposes raw-text, which I choose. Then C-x C-c to quit. $ hd file 00000000 80 |.| 00000001 So, the file has *not* been corrupted. > Perhaps by "lossless" you mean "reversible", in the sense that saving > the same buffer will perform the reverse conversion. Actually I don't mind what occurs internally. What I mean is things like: saved file = initial file if it hasn't been modified (as above) and with the default encoding(s) proposed by Emacs (when visiting and when saving). > In that case, even the in-is13194-devanagari-unix is reversible: if > you type this encoding when Emacs prompts you to select one of the > coding systems, then you get the same file on disk with no > corruption whatsoever. Then this is what Emacs should propose by default on this example! I suppose that Emacs is able to remember the encoding used to visit the file, so that this should be possible... > > > > But Emacs should clearly tell the user what to do after C-x C-s and > > > > clearly say when there can be data loss. > > > > > > At save time, "data loss" is wrt what's in the buffer. In that sense, > > > the encodings Emacs suggested don't lose any data. > > > > "data loss" is the difference between the original file and the saved > > file. > > But what do you want Emacs to do with this? When you save the buffer, > the original file might be different or no longer be available (or not > accessible even in principle, e.g. if the data came from a > subprocess). The file may be different, but in general, the encoding should remain the same. This is particularly true when Emacs is used as the editor by some application: if the encoding of the file has been changed by Emacs, the application will be confused. > These issues should be detected at file visit time, if at all, not > at buffer save time.
Possibly (this is something that the end user doesn't have to know if the goal is to modify a file). > > > > Then Emacs says: "Select one of the safe coding systems listed below > > > > [...]", but doesn't say that something has already been lost. So, the > > > > words "safe coding systems" are really misleading. > > > > > > It's misleading because you misunderstand what is "safe" at buffer > > > save time. > > > > No, it's misleading because Emacs didn't say that data were lost > > when visiting the file. > > Let's be constructive here. Please suggest some practical way for > Emacs to handle this situation better. > > For the record, here are the various alternative ways Emacs supports > the use case you described, when a file with inconsistent encoding > needs to be repaired manually: > > . Visit the file with "M-x find-file-literally RET". This yields a > unibyte buffer, where each byte stands for itself, and which you > can edit without risking en-/decoding issues. Though the above is possible, the user often opens files with "emacs ". > . Visit the file normally, then type "M-x hexl-mode RET" (or use > "M-x hexl-find-file RET" to visit it in the first place). This > revisits (or visits) the file in a unibyte buffer, and in addition > lets you edit the binary stuff regardless of its graphic > representation. If Emacs notices a potential problem when visiting the file, this method can be proposed by Emacs, but it shouldn't be the only way, because the file may contain mostly ASCII characters and hex-editing is not the best choice in such a case. > . After visiting the file normally and noticing that it contains > weird characters, or after being prompted to select a coding system > when saving the buffer, type "C-x RET r raw-text RET" to revisit > the file in raw-text encoding. Then edit the bytes and save the > file.
But that could be proposed by Emacs directly: instead of decoding the file directly in the buffer, Emacs could ask the user which coding system he wants to use. One drawback of raw-text is that 8-bit characters are completely unreadable. I think that there should be, for instance, a utf-8 degraded coding system: correct UTF-8 sequences are decoded using UTF-8, and invalid sequences are left intact. Emacs can already do such kind of things, but there should be 2 differences from the current behavior: * When visiting the file, ask the user what to do in case Emacs cannot select a clean coding system without any problem. For instance, a "Select coding system" prompt. (BTW, couldn't hexl be regarded as a special coding system at this point? Perhaps "coding system" isn't the right term here, "editing mode" might be better.) Other settings in .emacs could override that, of course, i.e. this would just be the default. * In case of UTF-8 degraded coding system, Emacs should save the file in the same UTF-8 degraded coding system. This is a way for the user to say: "I know that there are invalid sequences, just keep them." UTF-8 is just an example above. There could be the same kind of things with other encodings. 
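The "UTF-8 degraded" behaviour proposed above (valid UTF-8 sequences are decoded, invalid bytes are kept intact and written back unchanged) can be sketched outside Emacs. The example below is an illustration in Python, not a description of how Emacs implements raw bytes; it relies on Python's standard `surrogateescape` error handler, and the function names `decode_degraded` and `encode_degraded` are hypothetical.

```python
def decode_degraded(data: bytes) -> str:
    """Decode valid UTF-8 normally; keep each invalid byte as a
    lone surrogate (U+DC80..U+DCFF) so that it can be restored later."""
    return data.decode("utf-8", errors="surrogateescape")


def encode_degraded(text: str) -> bytes:
    """Encode back to bytes, turning escaped surrogates into the
    original invalid bytes."""
    return text.encode("utf-8", errors="surrogateescape")


# A "file" that is mostly valid UTF-8 but contains one stray 0x80 byte,
# similar in spirit to the test file discussed in this thread.
original = b"caf\xc3\xa9 \x80 end"

text = decode_degraded(original)   # readable text plus one escaped byte
saved = encode_degraded(text)      # what would be written back to disk

print(saved == original)
```

The valid part of the text stays readable and editable, while the invalid byte survives an unmodified save, which is the property requested in the proposal: the user states that the invalid sequences are known and should be kept.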
-- Vincent Lefèvre - Web: 100% accessible validated (X)HTML - Blog: Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon) From debbugs-submit-bounces@debbugs.gnu.org Tue Jan 22 02:58:12 2013 Received: (at 13505) by debbugs.gnu.org; 22 Jan 2013 07:58:13 +0000 Received: from localhost ([127.0.0.1]:43994 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1TxYkF-0006Nv-8l for submit@debbugs.gnu.org; Tue, 22 Jan 2013 02:58:12 -0500 Received: from mtaout21.012.net.il ([80.179.55.169]:55721) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1TxYkB-0006Nk-NG for 13505@debbugs.gnu.org; Tue, 22 Jan 2013 02:58:09 -0500 Received: from conversion-daemon.a-mtaout21.012.net.il by a-mtaout21.012.net.il (HyperSendmail v2007.08) id <0MH000M00OLEO100@a-mtaout21.012.net.il> for 13505@debbugs.gnu.org; Tue, 22 Jan 2013 09:56:27 +0200 (IST) Received: from HOME-C4E4A596F7 ([87.69.4.28]) by a-mtaout21.012.net.il (HyperSendmail v2007.08) with ESMTPA id <0MH000MNZOQ2HWA0@a-mtaout21.012.net.il>; Tue, 22 Jan 2013 09:56:27 +0200 (IST) Date: Tue, 22 Jan 2013 09:56:44 +0200 From: Eli Zaretskii Subject: Re: bug#13505: Bug#696026: emacs24: file corruption on saving In-reply-to: <20130122023557.GA25002@xvii.vinc17.org> X-012-Sender: halo1@inter.net.il To: Vincent Lefevre Message-id: <83k3r5odkz.fsf@gnu.org> References: <20121215223809.GA7549@xvii.vinc17.org> <877gn8ijgn.fsf@trouble.defaultvalue.org> <83obgjpzod.fsf@gnu.org> <20130120212508.GF2695@xvii.vinc17.org> <83bocjpm81.fsf@gnu.org> <20130120221007.GG2695@xvii.vinc17.org> <83a9s3p56p.fsf@gnu.org> <20130121041410.GJ2695@xvii.vinc17.org> <83vcaqo1yv.fsf@gnu.org> <20130122023557.GA25002@xvii.vinc17.org> X-Spam-Score: -1.2 (-) X-Debbugs-Envelope-To: 13505 Cc: 696026-forwarded@bugs.debian.org, handa@gnu.org, 696026@bugs.debian.org, rlb@defaultvalue.org, 13505@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list Reply-To: Eli Zaretskii
List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -1.2 (-) > Date: Tue, 22 Jan 2013 03:35:57 +0100 > From: Vincent Lefevre > Cc: rlb@defaultvalue.org, handa@gnu.org, 13505@debbugs.gnu.org, > 696026-forwarded@bugs.debian.org, 696026@bugs.debian.org > > > > > > | The original encoded form of the characters as found on disk at > > > > > | visit time _cannot_ be recovered by saving with raw-text, because > > > > > | that encoded form is lost without a trace when the file is _visited_ > > > > > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > > > > | and decoded into the internal representation. > > > > > > > > > > This is what lossy is. > > > > > > > > In that sense, every encoding except no-conversion is lossy. > > > > > > Even 8-bit encodings such as latin-1? > > > > Yes. When latin-1 characters are decoded (as part of visiting a > > file), they are converted to the internal representation, and cease to > > be single 8-bit bytes. > > Any example where saving the file without modifying it (see below) > would modify the data (as a sequence of bytes on the disk)? See above: I was talking about changes at file-visit time. > > > > > On the opposite, the utf-8 encoding doesn't seem to be lossy: Emacs > > > > > seems to handle files with invalid UTF-8 sequences without any loss. > > > > > So, this encoding is safe, even if Emacs wrongly guess the encoding. > > > > > > > > No, it isn't, although you could get away with it most of the time. > > > > > > Could you give an example where one loses data with the utf-8 encoding? > > > > E.g., in your test file, the byte whose value is 0x80 is converted to > > 0x3fff80 when the file is read into a buffer. > > No, there are no problems with this example: Again, because we are talking about two different things. 
> > Perhaps by "lossless" you mean "reversible", in the sense that saving
> > the same buffer will perform the reverse conversion.
>
> Actually I don't mind what occurs internally. What I mean is things
> like: saved file = initial file if it hasn't been modified (as above)
> and with the default encoding(s) proposed by Emacs (when visiting and
> when saving).

That's reversibility.

> > In that case, even the in-is13194-devanagari-unix is reversible: if
> > you type this encoding when Emacs prompts you to select one of the
> > coding systems, then you get the same file on disk with no
> > corruption whatsoever.
>
> Then this is what Emacs should propose by default on this example!

It can't easily do that. There are 2 different use cases here:

  1) A file was visited and its encoding was found to be inconsistent.
     Then it is being saved. This is your use case.

  2) A file was modified by adding to it characters that cannot be
     encoded by the original encoding. For example, you visit a
     Latin-1 encoded file, then add to it characters that are outside
     the coverage of Latin-1. Then you save the file.

What Emacs proposes is biased for the second use case, because it is
by far the most frequent one. The other use case is supposed to be
treated by other means, those which I mentioned in my previous mail.

Giving instructions to both use cases is not a good idea, IMO, because
it will confuse users who do not necessarily understand what is going
on and in particular don't realize which of the two situations they
are in.

> I suppose that Emacs is able to remember the encoding used to visit
> the file, so that this should be possible...

It does remember. It actually shows it in the "select safe coding
system" prompt. The problem is that its use can do the wrong thing in
the second use case above.

> > > > > But Emacs should clearly tell the user what to do after C-x C-s and
> > > > > clearly say when there can be data loss.
> > > >
> > > > At save time, "data loss" is wrt what's in the buffer. In that sense,
> > > > the encodings Emacs suggested don't lose any data.
> > >
> > > "data loss" is the difference between the original file and the saved
> > > file.
> >
> > But what do you want Emacs to do with this? When you save the buffer,
> > the original file might be different or no longer be available (or not
> > accessible even in principle, e.g. if the data came from a
> > subprocess).
>
> The file may be different, but in general, the encoding should remain
> the same.

That's what Emacs does, as long as it can. But in this case, that
encoding might produce inconsistently encoded file, so Emacs doesn't
want to do that silently. It has no idea that the file was
inconsistently encoded in the first place, nor that you _want_ it to
continue being inconsistently encoded.

> This is particularly true when Emacs is used as the editor by some
> application: if the encoding of the file has been changed by Emacs,
> the application will be confused.

Again, that's what Emacs does normally, if that encoding can do the
job. Producing inconsistent encoding will certainly confuse those
other programs.

> > These issues should be detected at file visit time, if at all, not
> > at buffer save time.
>
> Possibly (this is something that the end user doesn't have to know if
> the goal is to modify a file).

This use case proves otherwise.

> >  . Visit the file with "M-x find-file-literally RET". This yields a
> >    unibyte buffer, where each byte stands for itself, and which you
> >    can edit without risking en-/decoding issues.
>
> Though the above is possible, the user often opens files with
> "emacs ".

Many users have Emacs up and running for the entire session.

> >  . Visit the file normally, then type "M-x hexl-mode RET" (or use
> >    "M-x hexl-find-file RET" to visit it in the first place).
> >    This revisits (or visits) the file in a unibyte buffer, and in
> >    addition lets you edit the binary stuff regardless of its graphic
> >    representation.
>
> If Emacs notices a potential problem when visiting the file, this
> method can be proposed by Emacs, but it shouldn't be the only way,
> because the file may contain mostly ASCII characters and hex-editing
> is not the best choice in such a case.

??? Hexl Mode shows the printable characters (at the right side of the
display) in addition to the codes. What exactly is the problem here?

> >  . After visiting the file normally and noticing that it contains
> >    weird characters, or after being prompted to select a coding system
> >    when saving the buffer, type "C-x RET r raw-text RET" to revisit
> >    the file in raw-text encoding. Then edit the bytes and save the
> >    file.
>
> But that could be proposed by Emacs directly: instead of decoding the
> file directly in the buffer, Emacs could ask the user which coding
> system he wants to use.

That'd be a nuisance, I think, because more often than not, keeping
the original inconsistent encoding is not what the user wants.

> One drawback of raw-text is that 8-bit characters are completely
> unreadable.

That's why I listed it the last.
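
The second use case described earlier in this message (a Latin-1 file to which characters outside Latin-1 have been added) can be sketched as a simple encodability check. This is a rough Python analogy, not Emacs code; the function name can_save_with and the sample strings are hypothetical.

```python
def can_save_with(text: str, coding: str) -> bool:
    """Return True if every character of text can be encoded with coding."""
    try:
        text.encode(coding)
    except UnicodeEncodeError:
        return False
    return True


# A Latin-1 encoded file is visited, then a character outside the
# coverage of Latin-1 (here the euro sign) is added before saving.
visited = b"caf\xe9\n".decode("latin-1")
edited = visited + "\u20ac\n"

print(can_save_with(visited, "latin-1"))  # the unmodified text
print(can_save_with(edited, "latin-1"))   # after adding the euro sign
print(can_save_with(edited, "utf-8"))     # same text, wider coding system
```

A check of this kind only reports that the original coding system no longer suffices. By itself it does not say whether the cause is an edit or a file that was inconsistently encoded to begin with, which is the ambiguity behind the two use cases.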

From debbugs-submit-bounces@debbugs.gnu.org Mon Jan 28 16:48:31 2013
Received: (at control) by debbugs.gnu.org; 28 Jan 2013 21:48:31 +0000
Received: from localhost ([127.0.0.1]:53019 helo=debbugs.gnu.org)
    by debbugs.gnu.org with esmtp (Exim 4.72)
    (envelope-from )
    id 1TzwZ5-0004tU-Ce
    for submit@debbugs.gnu.org; Mon, 28 Jan 2013 16:48:31 -0500
Received: from fencepost.gnu.org ([208.118.235.10]:41811)
    by debbugs.gnu.org with esmtp (Exim 4.72)
    (envelope-from )
    id 1TzwZ3-0004tN-Ex
    for control@debbugs.gnu.org; Mon, 28 Jan 2013 16:48:30 -0500
Received: from rgm by fencepost.gnu.org with local (Exim 4.71)
    (envelope-from )
    id 1TzwYb-00073p-Jm
    for control@debbugs.gnu.org; Mon, 28 Jan 2013 16:48:01 -0500
Date: Mon, 28 Jan 2013 16:48:01 -0500
Message-Id:
Subject: control message for bug 13377
To:
X-Mailer: mail (GNU Mailutils 2.1)
From: Glenn Morris
X-Spam-Score: -4.7 (----)
X-Debbugs-Envelope-To: control
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.13
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Sender: debbugs-submit-bounces@debbugs.gnu.org
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
X-Spam-Score: -4.7 (----)

forcemerge 13505 13377

From debbugs-submit-bounces@debbugs.gnu.org Sun Oct 06 21:44:40 2013
Received: (at control) by debbugs.gnu.org; 7 Oct 2013 01:44:40 +0000
Received: from localhost ([127.0.0.1]:58663 helo=debbugs.gnu.org)
    by debbugs.gnu.org with esmtp (Exim 4.80)
    (envelope-from )
    id 1VSzsG-0002w2-Ls
    for submit@debbugs.gnu.org; Sun, 06 Oct 2013 21:44:40 -0400
Received: from fencepost.gnu.org ([208.118.235.10]:55687)
    by debbugs.gnu.org with esmtp (Exim 4.80)
    (envelope-from )
    id 1VSzsE-0002vu-Qx
    for control@debbugs.gnu.org; Sun, 06 Oct 2013 21:44:39 -0400
Received: from rgm by fencepost.gnu.org with local (Exim 4.71)
    (envelope-from )
    id 1VSzsE-0000AJ-6p
    for control@debbugs.gnu.org; Sun, 06 Oct 2013 21:44:38 -0400
Date: Sun, 06 Oct 2013 21:44:38 -0400
Message-Id:
Subject: control message for bug 13505
To:
X-Mailer: mail (GNU Mailutils 2.1)
From: Glenn Morris
X-Spam-Score: -5.0 (-----)
X-Debbugs-Envelope-To: control
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit"
X-Spam-Score: -5.0 (-----)

# Closing as per http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=696026
close 13505

From unknown Wed Jun 18 23:11:44 2025
Received: (at fakecontrol) by fakecontrolmessage;
To: internal_control@debbugs.gnu.org
From: Debbugs Internal Request
Subject: Internal Control
Message-Id: bug archived.
Date: Mon, 04 Nov 2013 12:24:03 +0000
User-Agent: Fakemail v42.6.9

# This is a fake control message.
#
# The action:
# bug archived.
thanks
# This fakemail brought to you by your local debbugs
# administrator