From unknown Mon Jun 16 23:40:41 2025 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-Mailer: MIME-tools 5.509 (Entity 5.509) Content-Type: text/plain; charset=utf-8 From: bug#16286 <16286@debbugs.gnu.org> To: bug#16286 <16286@debbugs.gnu.org> Subject: Status: 24.3.50; insert-file-contents may bring invisible garbage Reply-To: bug#16286 <16286@debbugs.gnu.org> Date: Tue, 17 Jun 2025 06:40:41 +0000 retitle 16286 24.3.50; insert-file-contents may bring invisible garbage reassign 16286 emacs submitter 16286 Andrey Kotlarski severity 16286 important thanks From debbugs-submit-bounces@debbugs.gnu.org Sun Dec 29 09:05:21 2013 Received: (at submit) by debbugs.gnu.org; 29 Dec 2013 14:05:21 +0000 Received: from localhost ([127.0.0.1]:50082 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1VxGzY-0001HD-8U for submit@debbugs.gnu.org; Sun, 29 Dec 2013 09:05:21 -0500 Received: from eggs.gnu.org ([208.118.235.92]:37096) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1VxGzV-0001H3-Pi for submit@debbugs.gnu.org; Sun, 29 Dec 2013 09:05:18 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1VxGzQ-0007OE-L3 for submit@debbugs.gnu.org; Sun, 29 Dec 2013 09:05:17 -0500 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=0.8 required=5.0 tests=BAYES_50,FREEMAIL_FROM, T_DKIM_INVALID autolearn=disabled version=3.3.2 Received: from lists.gnu.org ([2001:4830:134:3::11]:51526) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1VxGzQ-0007OA-IP for submit@debbugs.gnu.org; Sun, 29 Dec 2013 09:05:12 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:38795) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1VxGzL-0001Pq-Ql for bug-gnu-emacs@gnu.org; Sun, 29 Dec 2013 09:05:12 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 
1VxGzE-00075P-56 for bug-gnu-emacs@gnu.org; Sun, 29 Dec 2013 09:05:07 -0500 Received: from mail-ee0-x230.google.com ([2a00:1450:4013:c00::230]:64897) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1VxGzD-000752-UX for bug-gnu-emacs@gnu.org; Sun, 29 Dec 2013 09:05:00 -0500 Received: by mail-ee0-f48.google.com with SMTP id e49so4657472eek.35 for ; Sun, 29 Dec 2013 06:04:58 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=from:to:subject:date:message-id:mime-version:content-type :content-transfer-encoding; bh=1aOB84HqoSo10CvMYA2Wkv6npUkiPlFAf6/IfpMdZGE=; b=xOwzzpvQDJM7UD7ycvUU8/q3tSDd4rnzCm0OMiIkod/AEyFuM+/Dfy6Uo/EmDCFrMH 8YRfoAf2/BQMZ5ft83xJIIyygH8qI3BAceVnXAxoI0YD+0VZ5Zur/KL05l5cevuCmcCD rCO6gXP4zx6wfQBZTGaL0MbWRkdIQFiY1hvbYl9YNixfDakWrp5guK1Ig5cHCifJaicb eEIraemD5HL51WPXxVbY9Rd5pyFo7nEDEKv4+vODErJ6IhDigEbjyTe8mWb8tRMriTO1 YnXZD5UExBVYT6WkNdFTyPLC000n7W2ruzJUyjL8vLTQo/3Jt0WTtyD0CWlgfc9W3/m1 /60g== X-Received: by 10.14.199.197 with SMTP id x45mr51215695een.8.1388325898497; Sun, 29 Dec 2013 06:04:58 -0800 (PST) Received: from andrexhe ([89.106.113.60]) by mx.google.com with ESMTPSA id b41sm99709885eef.16.2013.12.29.06.04.57 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Sun, 29 Dec 2013 06:04:57 -0800 (PST) From: Andrey Kotlarski To: bug-gnu-emacs@gnu.org Subject: 24.3.50; insert-file-contents may bring invisible garbage Date: Sun, 29 Dec 2013 16:05:22 +0200 Message-ID: <87sitb4usd.fsf@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-detected-operating-system: by eggs.gnu.org: Error: Malformed IPv6 address (bad octet value). X-detected-operating-system: by eggs.gnu.org: Error: Malformed IPv6 address (bad octet value). 
X-Received-From: 2001:4830:134:3::11 X-Spam-Score: -4.0 (----) X-Debbugs-Envelope-To: submit X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -4.0 (----)

In trunk inserting few bytes from file may sometimes result in nothing
visible in the buffer while invisible artifacts are present and may
affect subsequent operations.  Moreover, there doesn't seem to be way to
recover from this.  Here's example session with emacs -Q:

(let ((file "test.txt"))
  (unless (file-exists-p file)
    (find-file file)
    (insert "абв")                      ;Cyrillic letters
    (save-buffer)
    (kill-buffer))

  (let ((buf (generate-new-buffer "test")))
    (switch-to-buffer buf)
    (insert-file-contents file nil 0 2) ;inserts а
    (goto-char (point-max))
    (insert-file-contents file nil 2 3) ;returns 0 bytes inserted, nothing visible in the buffer
                                        ;but actually there is
    (erase-buffer)                      ;and still is
    (insert-file-contents file nil 2 4) ;should insert б, instead let: Wrong type argument: inserted-chars, 1
    (message "%S" (buffer-string))      ;"бЀ" while buffer is visibly empty
    ))

Trying to insert multibyte characters now brings content length issues,
garbage inserted and at some point Emacs crashes.

In release 24.3 and earlier insert-file-contents seems to always insert
something, be it wrongly decoded or raw eight-bit characters.  But it is
visible and easy to deal with.  The above example works fine there.
This is useful for the vlf package (https://github.com/m00natic/vlfi) as
a way to detect insufficient amount of bytes requested and allows
further adjustment.

In GNU Emacs 24.3.50.1 (x86_64-pc-linux-gnu)
 of 2013-12-29 on andrexhe
Bzr revision: 115803 eggert@cs.ucla.edu-20131229075253-hmeofd1oihd5n3rk
Windowing system distributor `The X.Org Foundation', version 11.0.11405000
Configured using:
 `configure --build=x86_64-pc-linux-gnu
 --enable-link-time-optimization --with-x-toolkit=no --with-wide-int
 --without-toolkit-scroll-bars --without-xaw3d --without-gpm
 --without-gconf --without-gsettings build_alias=x86_64-pc-linux-gnu
 'CFLAGS=-march=native -mtune=native -O2 -pipe''

Important settings:
  value of $LC_COLLATE: C
  value of $LANG: bg_BG.UTF-8
  locale-coding-system: utf-8-unix

Major mode: Fundamental

Minor modes in effect:
  tooltip-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  line-number-mode: t
  transient-mark-mode: t

Recent input:
M-x r e p o - e m - b u

Recent messages:
For information about GNU Emacs and the GNU system, type C-h C-a.
let: Wrong type argument: inserted-chars, 1

Load-path shadows:
None found.

Features: (shadow sort gnus-util mail-extr emacsbug message format-spec rfc822 mml easymenu mml-sec mm-decode mm-bodies mm-encode mail-parse rfc2231 mailabbrev gmm-utils mailheader sendmail rfc2047 rfc2045 ietf-drums mm-util help-fns mail-prsvr mail-utils time-date tooltip electric uniquify ediff-hook vc-hooks lisp-float-type mwheel x-win x-dnd tool-bar dnd fontset image regexp-opt fringe tabulated-list newcomment lisp-mode prog-mode register page menu-bar rfn-eshadow timer select scroll-bar mouse jit-lock font-lock syntax facemenu font-core frame cham georgian utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao korean japanese hebrew greek romanian slovak czech european ethiopic indian cyrillic chinese case-table epa-hook jka-cmpr-hook help simple abbrev minibuffer nadvice loaddefs button faces cus-face macroexp files text-properties overlay sha1 md5 base64 format env code-pages mule custom widget hashtable-print-readable backquote make-network-process dbusbind gfilenotify dynamic-setting font-render-setting x multi-tty emacs) From debbugs-submit-bounces@debbugs.gnu.org Thu Jan 02 11:30:46 2014 Received: (at 16286) by debbugs.gnu.org; 2 Jan 2014 16:30:46 +0000 Received: from localhost ([127.0.0.1]:58167 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1VylAS-0008BG-9s for submit@debbugs.gnu.org; Thu, 02 Jan 2014 11:30:45 -0500 Received: from mtaout20.012.net.il ([80.179.55.166]:61145) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1VylAL-0008Ay-OI for 16286@debbugs.gnu.org; Thu, 02 Jan 2014 11:30:40 -0500 Received: from conversion-daemon.a-mtaout20.012.net.il by a-mtaout20.012.net.il (HyperSendmail v2007.08) id <0MYS00I008E7L400@a-mtaout20.012.net.il> for 16286@debbugs.gnu.org; Thu, 02 Jan 2014 18:30:32 +0200 (IST) Received: from HOME-C4E4A596F7 ([87.69.4.28]) by a-mtaout20.012.net.il (HyperSendmail v2007.08) with ESMTPA id <0MYS00ICX8IVKY10@a-mtaout20.012.net.il>; Thu, 02 Jan 2014 18:30:31 +0200 (IST) Date: 
Thu, 02 Jan 2014 18:30:30 +0200 From: Eli Zaretskii Subject: Re: bug#16286: 24.3.50; insert-file-contents may bring invisible garbage In-reply-to: <87sitb4usd.fsf@gmail.com> X-012-Sender: halo1@inter.net.il To: Andrey Kotlarski , Kenichi Handa Message-id: <83y52yxs61.fsf@gnu.org> MIME-version: 1.0 Content-type: text/plain; charset=utf-8 Content-transfer-encoding: 8BIT References: <87sitb4usd.fsf@gmail.com> X-Spam-Score: 1.0 (+) X-Debbugs-Envelope-To: 16286 Cc: 16286@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list Reply-To: Eli Zaretskii List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 1.0 (+) > From: Andrey Kotlarski > Date: Sun, 29 Dec 2013 16:05:22 +0200 > > In trunk inserting few bytes from file may sometimes result in nothing > visible in the buffer while invisible artifacts are present and may > affect subsequent operations. Moreover, there doesn't seem to be way to > recover from this. Here's example session with emacs -Q: > > (let ((file "test.txt")) > (unless (file-exists-p file) > (find-file file) > (insert "абв") ;Cyrillic letters > (save-buffer) > (kill-buffer)) > > (let ((buf (generate-new-buffer "test"))) > (switch-to-buffer buf) > (insert-file-contents file nil 0 2) ;inserts а > (goto-char (point-max)) > (insert-file-contents file nil 2 3) ;returns 0 bytes inserted, nothing visible in the buffer > ;but actually there is > (erase-buffer) ;and still is > (insert-file-contents file nil 2 4) ;should insert б, instead let: Wrong type argument: inserted-chars, 1 > (message "%S" (buffer-string)) ;"бЀ" while buffer is visibly empty > )) > > Trying to insert multibyte characters now brings content length issues, > garbage inserted and at some point Emacs crashes. 

Your Emacs is built without --enable-checking; if that configure-time
switch is used, Emacs hits an assertion violation as soon as this sexp
is evaluated:

  (insert-file-contents file nil 2 3)

Also, you are wrong about there being some invisible stuff in the
buffer.  The problem is elsewhere: Emacs gets confused about the
number of characters and the number of bytes in the buffer.  These two
counts should be in sync at all times; once they become
unsynchronized, Emacs will generally crash very soon.

I'm CC'ing Handa-san in the hope that he will be able to suggest a
solution.

The problem happens in decode_coding_gap (called from
insert-file-contents), in this code fragment (note the call to
detect_coding):

  if (CODING_REQUIRE_DETECTION (coding))
    detect_coding (coding);
  attrs = CODING_ID_ATTRS (coding->id);
  if (! disable_ascii_optimization
      && ! coding->src_multibyte
      && ! NILP (CODING_ATTR_ASCII_COMPAT (attrs))
      && NILP (CODING_ATTR_POST_READ (attrs))
      && NILP (get_translation_table (attrs, 0, NULL)))
    {
      chars = coding->head_ascii;
      if (chars < 0)
        chars = check_ascii (coding);
      if (chars != bytes)
        {
          /* There exists a non-ASCII byte.  */
          if (EQ (CODING_ATTR_TYPE (attrs), Qutf_8))
            {
              if (coding->detected_utf8_chars >= 0)
                chars = coding->detected_utf8_chars;  <<<<<<<<<<<<<<
              else
                chars = check_utf_8 (coding);

This reuses the number of characters that are valid UTF-8 sequences in
the byte stream to be decoded, stored in coding->detected_utf8_chars,
which were found by detect_coding_utf_8, which was called by
detect_coding.  In the case in point, detect_coding_utf_8 finds zero
valid UTF-8 sequences, and so 'chars' becomes zero.  But the number of
decoded bytes is not adjusted to fit that, so it stays at its original
value of 1.

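To make that mismatch concrete, here is a minimal standalone C sketch.
It is not Emacs code: count_utf8_prefix is a hypothetical helper that
only loosely mirrors what a UTF-8 detection pass counts, and for
brevity it does not reject overlong forms or surrogates.  Assuming
test.txt holds the UTF-8 bytes D0 B0 D0 B1 D0 B2 for "абв", the single
byte at offset 2 is 0xD0, a lead byte whose continuation byte was not
read.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of Emacs: count the complete,
   well-formed UTF-8 sequences at the start of BUF[0..LEN).  Returns
   the number of characters and stores in *USED the number of bytes
   those characters occupy.  Scanning stops at the first malformed or
   truncated sequence.  */
size_t
count_utf8_prefix (const unsigned char *buf, size_t len, size_t *used)
{
  size_t i = 0, nchars = 0, need, k;

  assert (used != NULL);
  while (i < len)
    {
      unsigned char c = buf[i];

      /* NEED is the number of continuation bytes the lead byte asks for.  */
      if (c < 0x80)
        need = 0;               /* ASCII */
      else if ((c & 0xE0) == 0xC0)
        need = 1;
      else if ((c & 0xF0) == 0xE0)
        need = 2;
      else if ((c & 0xF8) == 0xF0)
        need = 3;
      else
        break;                  /* stray continuation or invalid byte */

      if (len - i - 1 < need)
        break;                  /* sequence truncated by end of input */

      for (k = 1; k <= need; k++)
        if ((buf[i + k] & 0xC0) != 0x80)
          break;
      if (k <= need)
        break;                  /* bad continuation byte */

      i += need + 1;
      nchars++;
    }
  *used = i;
  return nchars;
}
```

For the one-byte input { 0xD0 } this reports zero characters covering
zero bytes, while the caller still holds one byte.  Recording a
character count of 0 next to a byte count of 1 is the kind of
inconsistency described above; a caller that wants to stay consistent
has to account for the len - *used leftover bytes in some other way.
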
Then, decode_coding_gap does this:

  coding->produced = bytes;
  coding->produced_char = chars;
  insert_from_gap (chars, bytes, 1);

Since 'chars' is zero, but 'bytes' is 1, this causes a mismatch
between buffer's Z and Z_BYTE values, and from there it's a slippery
slope all the way to an assertion violation during redisplay.

Similar problems happen when insert-file-contents is called to read
some number of bytes that doesn't end at a UTF-8 sequence boundary.

I think I see a potential reason for this in detect_coding_utf_8, near
its end:

  if (nchars < src_end - coding->source)
    /* The found characters are less than source bytes, which
       means that we found a valid non-ASCII characters.  */
    detect_info->found |= CATEGORY_MASK_UTF_8_AUTO | CATEGORY_MASK_UTF_8_NOSIG;

This misses the use case such as this one, where the detection loop
consumed one byte, found it not to be the head byte of a UTF-8
sequence, and then hit the end of the source bytes.  It looks like the
function incorrectly returns a success indication in this case, which
might be part of the problem.

> In release 24.3 and earlier insert-file-contents seems to always insert
> something, be it wrongly decoded or raw eight-bit characters.  But it is
> visible and easy to deal with.  The above example works fine there.
> This is useful for the vlf package (https://github.com/m00natic/vlfi) as
> a way to detect insufficient amount of bytes requested and allows
> further adjustment.

What vlf does is strange and IMO not the best possible solution to
this issue:

  (cond ((vlf-partial-decode-shown-p) ;remove raw bytes from end
         (goto-char (point-max))
         (while (eq (char-charset (preceding-char)) 'eight-bit)
           (setq shift-end (1- shift-end))
           (delete-char -1)))
        ((< end vlf-file-size) ;add bytes until new character is displayed
         (let ((position (or position (point-min)))
               (expected-size (buffer-size)))
           (while (and (progn
                         (setq shift-end (1+ shift-end)
                               end (1+ end))
                         (delete-region position (point-max))
                         (goto-char position)
                         (insert-file-contents buffer-file-name
                                               nil start end)
                         (< end vlf-file-size))
                       (= expected-size (buffer-size))))))))

This seems to have a subtle misfeature of not supporting files with
inconsistent encoding, or files with binary data, because there _all_
characters will belong to the eight-bit charset.

Also, I don't understand why the removal of raw bytes is conditioned
on Emacs version: why not just remove them unconditionally: if there
are none, nothing will be removed.

More to the point, I'm not sure whether inserting raw bytes in
insert-file-contents when a portion of a multibyte sequence was read
(i.e. go back to what Emacs 24.3 did) will be good for vlf.  It sounds
to me much better if Emacs would only return complete characters read
from the file, so that applications will not need to remove those
stray bytes.

Finally, it would seem a better design for vlf to always read a few
more bytes than was requested into some scratch buffer, and then
decode them manually to determine just how many to copy to the main
buffer.

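The boundary computation behind that last suggestion can be sketched
as follows.  This is only an illustration in C, chosen to match the
coding.c fragments in this thread; vlf itself is Emacs Lisp, and
utf8_safe_length is a hypothetical name, not a vlf or Emacs function.
It assumes the data is UTF-8 and looks only at the last few bytes of
the scratch buffer, so other encodings would need different logic.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not vlf or Emacs code: given LEN bytes read
   from a UTF-8 file, return how many leading bytes can be handed to
   the decoder without cutting the final character in half.  Only the
   tail is inspected; everything before it is assumed to be valid.  */
size_t
utf8_safe_length (const unsigned char *buf, size_t len)
{
  size_t back, need;

  assert (buf != NULL || len == 0);
  /* Look at the last 4 bytes at most, starting from the end.  */
  for (back = 1; back <= 4 && back <= len; back++)
    {
      unsigned char c = buf[len - back];

      if ((c & 0xC0) == 0x80)
        continue;               /* continuation byte, keep looking */

      /* C is ASCII or a lead byte: how long should its sequence be?  */
      if (c < 0x80)
        need = 1;
      else if ((c & 0xE0) == 0xC0)
        need = 2;
      else if ((c & 0xF0) == 0xE0)
        need = 3;
      else if ((c & 0xF8) == 0xF0)
        need = 4;
      else
        return len;             /* invalid byte: leave it to the decoder */

      /* BACK bytes are present from C to the end of the buffer.  If
         that is fewer than the sequence needs, drop the partial
         sequence; otherwise keep everything.  */
      return back < need ? len - back : len;
    }
  return len;                   /* empty input or no lead byte in the tail */
}
```

With the bytes D0 B0 D0 (a complete "а" followed by a dangling lead
byte) the function returns 2, so only the first two bytes would be
copied to the main buffer and the third would be read again as part
of the next chunk.
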
From debbugs-submit-bounces@debbugs.gnu.org Sat Jan 04 17:42:30 2014 Received: (at 16286) by debbugs.gnu.org; 4 Jan 2014 22:42:30 +0000 Received: from localhost ([127.0.0.1]:34577 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1VzZvK-0007Ey-B7 for submit@debbugs.gnu.org; Sat, 04 Jan 2014 17:42:30 -0500 Received: from mail-ee0-f42.google.com ([74.125.83.42]:54710) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1VzZvI-0007Ep-8o for 16286@debbugs.gnu.org; Sat, 04 Jan 2014 17:42:28 -0500 Received: by mail-ee0-f42.google.com with SMTP id e53so7320313eek.1 for <16286@debbugs.gnu.org>; Sat, 04 Jan 2014 14:42:27 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=from:to:cc:subject:references:date:in-reply-to:message-id :user-agent:mime-version:content-type:content-transfer-encoding; bh=+3mZDZn1myQqixZd2TyEbgtGbAYxwhOQ+CAFGMWVnOw=; b=rvGZHrHNuFX0Oja6ayhosKEw7ZFuV7/FtqrpWNKk9FScLjTD4YI6HaJlbOr5IzVNsC /TNwXl7U0Qks4em2gRMFbX6A6g0o8h+WHTzVlZct/G9D9yfTQ7mQh3ud1sqUnWzFtBtN VvVD0VPoKfYLPmrP9hAGGe+TtM2ugeYZ23ArraGjNhC9n1dy0P17i+bbmvPoqL0AHtfg 47cmNobftOkuPZ6Q9HVKeXHX86Ifr7tI5OPnxIPxpGOe11luHNeYB0ok3jspLt9Au4lJ Hx6s6tpyH+Z7t+THo5wfcLK80V7/qQWzozK5c5227WRHZNQuugXyhYvg/ymoP3tV0Uha 85GQ== X-Received: by 10.14.3.1 with SMTP id 1mr8244288eeg.94.1388875347346; Sat, 04 Jan 2014 14:42:27 -0800 (PST) Received: from andrexhe (77-85-30-63.btc-net.bg. 
[77.85.30.63]) by mx.google.com with ESMTPSA id e43sm157139755eep.7.2014.01.04.14.42.23 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Sat, 04 Jan 2014 14:42:26 -0800 (PST) From: Andrey Kotlarski To: Eli Zaretskii Subject: Re: bug#16286: 24.3.50; insert-file-contents may bring invisible garbage References: <87sitb4usd.fsf@gmail.com> <83y52yxs61.fsf@gnu.org> Date: Sun, 05 Jan 2014 00:42:39 +0200 In-Reply-To: <83y52yxs61.fsf@gnu.org> (Eli Zaretskii's message of "Thu, 02 Jan 2014 18:30:30 +0200") Message-ID: <87ob3rqsgw.fsf@gmail.com> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3.50 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spam-Score: -0.7 (/) X-Debbugs-Envelope-To: 16286 Cc: Kenichi Handa , 16286@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.7 (/) Thanks a lot for the hints and pointers. [ 2 January 2014, 18:30 +0200, Thursday ] Eli Zaretskii: > What vlf does is strange and IMO not the best possible solution to > this issue: > ... > This seems to have a subtle misfeature of not supporting files with > inconsistent encoding, or files with binary data, because there _all_ > characters will belong to the eight-bit charset. There had been changes meanwhile which hopefully address these (no special treatment of eight-bit) along with other issues (vlf-base.el). > More to the point, I'm not sure whether inserting raw bytes in > insert-file-contents when a portion of a multibyte sequence was read > (i.e. go back to what Emacs 24.3 did) will be good for vlf. 
It sounds > to me much better if Emacs would only return complete characters read > from the file, so that applications will not need to remove those > stray bytes. I agree. It would be ideal for vlf if insert-file-contents would also report the number of stray bytes at the end that haven't been utilized. > Finally, it would seem a better design for vlf to always read a few > more bytes than was requested into some scratch buffer, and then > decode them manually to determine just how many to copy to the main > buffer. I see that vlf somehow works only by some accident with current trunk (and --enable-checking disabled), so I'm on it. My initial attempt at naively combining insert-file-contents-literally with decode-coding-inserted-region though often produces wrong decoding where insert-file-contents would be good. From debbugs-submit-bounces@debbugs.gnu.org Sat Jan 25 19:36:34 2014 Received: (at 16286-done) by debbugs.gnu.org; 26 Jan 2014 00:36:34 +0000 Received: from localhost ([127.0.0.1]:36002 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1W7DiE-0000c1-1j for submit@debbugs.gnu.org; Sat, 25 Jan 2014 19:36:34 -0500 Received: from smtp.cs.ucla.edu ([131.179.128.62]:48076) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1W7DiA-0000bq-VO for 16286-done@debbugs.gnu.org; Sat, 25 Jan 2014 19:36:31 -0500 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id CA3DA39E8012; Sat, 25 Jan 2014 16:36:29 -0800 (PST) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 0N1BT1gb2JSG; Sat, 25 Jan 2014 16:36:29 -0800 (PST) Received: from [192.168.1.9] (pool-108-0-233-62.lsanca.fios.verizon.net [108.0.233.62]) by smtp.cs.ucla.edu (Postfix) with ESMTPSA id 7AF0E39E8008; Sat, 25 Jan 2014 16:36:29 -0800 (PST) Message-ID: <52E4588D.70004@cs.ucla.edu> Date: Sat, 
25 Jan 2014 16:36:29 -0800 From: Paul Eggert Organization: UCLA Computer Science Department User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.2.0 MIME-Version: 1.0 To: 16286-done@debbugs.gnu.org Subject: Re: 24.3.50; insert-file-contents may bring invisible garbage Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: -2.9 (--) X-Debbugs-Envelope-To: 16286-done Cc: Kenichi Handa X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -2.9 (--) I installed a patch as trunk bzr 116158, which (at least for me) fixes the reported bug, and am taking the liberty of marking this as done. There may well be a better fix, but at least Emacs shouldn't crash or report nonsense now. From debbugs-submit-bounces@debbugs.gnu.org Mon Jan 27 10:01:10 2014 Received: (at 16286-done) by debbugs.gnu.org; 27 Jan 2014 15:01:10 +0000 Received: from localhost ([127.0.0.1]:38246 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1W7ngT-0002BS-GE for submit@debbugs.gnu.org; Mon, 27 Jan 2014 10:01:09 -0500 Received: from fencepost.gnu.org ([208.118.235.10]:51449) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1W7ngQ-0002BE-5K for 16286-done@debbugs.gnu.org; Mon, 27 Jan 2014 10:01:07 -0500 Received: from fl1-119-240-87-91.iba.mesh.ad.jp ([119.240.87.91]:33094 helo=wanchai) by fencepost.gnu.org with esmtpsa (TLS1.0:DHE_RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1W7ngP-0000CN-5M; Mon, 27 Jan 2014 10:01:05 -0500 Received: from handa by wanchai with local (Exim 4.80) (envelope-from ) id 1W7ngK-00022p-OO; Tue, 28 Jan 2014 00:01:00 +0900 From: handa@gnu.org (K. 
Handa) To: Paul Eggert Subject: Re: 24.3.50; insert-file-contents may bring invisible garbage In-Reply-To: <52E4588D.70004@cs.ucla.edu> (message from Paul Eggert on Sat, 25 Jan 2014 16:36:29 -0800) Date: Tue, 28 Jan 2014 00:01:00 +0900 Message-ID: <87ppndfp03.fsf@gnu.org> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: quoted-printable X-Spam-Score: -5.5 (-----) X-Debbugs-Envelope-To: 16286-done Cc: 16286-done@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -5.5 (-----)

In article <52E4588D.70004@cs.ucla.edu>, Paul Eggert writes:

> I installed a patch as trunk bzr 116158, which (at least for me) fixes
> the reported bug, and am taking the liberty of marking this as done.
> There may well be a better fix, but at least Emacs shouldn't crash or
> report nonsense now.

Thank you for working on this bug which I introduced when I
made decode_coding_gap optimized for ASCII and UTF-8 only
files.

Your change is to set CODING_MODE_LAST_BLOCK in coding->mode
before calling decode_coding_gap so that detect_coding
doesn't detect a file as utf-8 if it has incomplete utf-8
sequence at the tail (as the reported testcase).

But, I think it is better that detect_coding detects such a
file as utf-8 and treats the trailing garbage as raw bytes.
24.3 does it, and that is why decode_coding_gap sets
CODING_MODE_LAST_BLOCK after calling detect_coding.

So, I suggest the attached fix instead of yours.  What do
you think?

---
Kenichi Handa
handa@gnu.org

=== modified file 'src/ChangeLog'
--- src/ChangeLog 2014-01-26 12:17:55 +0000
+++ src/ChangeLog 2014-01-27 14:53:58 +0000
@@ -1,3 +1,16 @@
+2014-01-27  K. Handa
+
+ These change are to fix bug#16286 in the different way than what
+ done by revno:116158.
+
+ * coding.h (struct coding_system): New member detected_utf8_bytes.
+
+ * coding.c (detect_coding_utf_8): Set coding->detected_utf8_bytes.
+ (decode_coding_gap): Use short cut for UTF-8 file reading only
+ when coding->detected_utf8_bytes equals to coding->src_bytes.
+
+ * fileio.c (Finsert_file_contents): Cancel the previous change.
+
 2014-01-26  Jan Djärv
 
  * xterm.c (x_focus_changed): Check for non-X terminal-frame (Bug#16540)

=== modified file 'src/coding.c'
--- src/coding.c 2014-01-26 01:20:24 +0000
+++ src/coding.c 2014-01-27 14:47:43 +0000
@@ -1300,6 +1300,7 @@
     means that we found a valid non-ASCII characters.  */
  detect_info->found |= CATEGORY_MASK_UTF_8_AUTO | CATEGORY_MASK_UTF_8_NOSIG;
     }
+  coding->detected_utf8_bytes = src_base - coding->source;
   coding->detected_utf8_chars = nchars;
   return 1;
 }
@@ -7890,7 +7891,7 @@
   coding->dst_multibyte = ! NILP (BVAR (current_buffer, enable_multibyte_characters));
 
   coding->head_ascii = -1;
-  coding->detected_utf8_chars = -1;
+  coding->detected_utf8_bytes = coding->detected_utf8_chars = -1;
   coding->eol_seen = EOL_SEEN_NONE;
   if (CODING_REQUIRE_DETECTION (coding))
     detect_coding (coding);
@@ -7907,7 +7908,8 @@
       if (chars != bytes)
         {
           /* There exists a non-ASCII byte.  */
-          if (EQ (CODING_ATTR_TYPE (attrs), Qutf_8))
+          if (EQ (CODING_ATTR_TYPE (attrs), Qutf_8)
+              && coding->detected_utf8_bytes == coding->src_bytes)
             {
               if (coding->detected_utf8_chars >= 0)
                 chars = coding->detected_utf8_chars;

=== modified file 'src/coding.h'
--- src/coding.h 2014-01-26 01:20:24 +0000
+++ src/coding.h 2014-01-27 14:47:43 +0000
@@ -468,7 +468,9 @@
      the eol format.  */
   ptrdiff_t head_ascii;
 
-  ptrdiff_t detected_utf8_chars;
+  /* How many bytes/chars at the source are detected as valid utf-8
+     sequence.  Set by detect_coding_utf_8.  */
+  ptrdiff_t detected_utf8_bytes, detected_utf8_chars;
 
   /* Used internally in coding.c.  See the comment of detect_ascii.  */
   int eol_seen;

=== modified file 'src/fileio.c'
--- src/fileio.c 2014-01-26 00:32:30 +0000
+++ src/fileio.c 2014-01-27 14:47:59 +0000
@@ -4298,7 +4298,6 @@
       Z_BYTE -= inserted;
       ZV -= inserted;
       Z -= inserted;
-      coding.mode |= CODING_MODE_LAST_BLOCK;
       decode_coding_gap (&coding, inserted, inserted);
       inserted = coding.produced_char;
       coding_system = CODING_ID_NAME (coding.id);

From debbugs-submit-bounces@debbugs.gnu.org Mon Jan 27 12:01:25 2014 Received: (at 16286-done) by debbugs.gnu.org; 27 Jan 2014 17:01:25 +0000 Received: from localhost ([127.0.0.1]:38308 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1W7pYr-0006Oy-Bb for submit@debbugs.gnu.org; Mon, 27 Jan 2014 12:01:25 -0500 Received: from smtp.cs.ucla.edu ([131.179.128.62]:36094) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1W7pYo-0006Om-UT for 16286-done@debbugs.gnu.org; Mon, 27 Jan 2014 12:01:23 -0500 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id E9BB639E801A; Mon, 27 Jan 2014 09:01:21 -0800 (PST) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id RILvOEgaYSJ9; Mon, 27 Jan 2014 09:01:21 -0800 (PST) Received: from penguin.cs.ucla.edu (Penguin.CS.UCLA.EDU [131.179.64.200]) by smtp.cs.ucla.edu (Postfix) with ESMTPSA id AAE0139E8019; Mon, 27 Jan 2014 09:01:21 -0800 (PST) Message-ID: <52E690DE.6070101@cs.ucla.edu> Date: Mon, 27 Jan 2014 09:01:18 -0800 From: Paul Eggert Organization: UCLA Computer Science Department User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.2.0 MIME-Version: 1.0 To: "K. 
Handa" Subject: Re: 24.3.50; insert-file-contents may bring invisible garbage References: <87ppndfp03.fsf@gnu.org> In-Reply-To: <87ppndfp03.fsf@gnu.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: -2.8 (--) X-Debbugs-Envelope-To: 16286-done Cc: 16286-done@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -2.8 (--) Yes, thanks, that looks like a better patch. From debbugs-submit-bounces@debbugs.gnu.org Wed Jan 29 08:41:03 2014 Received: (at 16286-done) by debbugs.gnu.org; 29 Jan 2014 13:41:03 +0000 Received: from localhost ([127.0.0.1]:39981 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1W8VO3-0006v9-Du for submit@debbugs.gnu.org; Wed, 29 Jan 2014 08:41:03 -0500 Received: from fencepost.gnu.org ([208.118.235.10]:38650) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1W8VNz-0006uj-La for 16286-done@debbugs.gnu.org; Wed, 29 Jan 2014 08:41:00 -0500 Received: from fl1-119-240-87-91.iba.mesh.ad.jp ([119.240.87.91]:50678 helo=wanchai) by fencepost.gnu.org with esmtpsa (TLS1.0:DHE_RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1W8VNz-0006ui-2L; Wed, 29 Jan 2014 08:40:59 -0500 Received: from handa by wanchai with local (Exim 4.80) (envelope-from ) id 1W8VNr-0001GA-Oa; Wed, 29 Jan 2014 22:40:51 +0900 From: handa@gnu.org (K. 
Handa) To: Paul Eggert Subject: Re: 24.3.50; insert-file-contents may bring invisible garbage In-Reply-To: <52E690DE.6070101@cs.ucla.edu> (message from Paul Eggert on Mon, 27 Jan 2014 09:01:18 -0800) Date: Wed, 29 Jan 2014 22:40:51 +0900 Message-ID: <87iot2ucrg.fsf@gnu.org> MIME-Version: 1.0 Content-Type: text/plain X-Spam-Score: -5.5 (-----) X-Debbugs-Envelope-To: 16286-done Cc: 16286-done@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -5.5 (-----) In article <52E690DE.6070101@cs.ucla.edu>, Paul Eggert writes: > Yes, thanks, that looks like a better patch. Ok, I've just committed it. --- Kenichi Handa handa@gnu.org From unknown Mon Jun 16 23:40:41 2025 Received: (at fakecontrol) by fakecontrolmessage; To: internal_control@debbugs.gnu.org From: Debbugs Internal Request Subject: Internal Control Message-Id: bug archived. Date: Thu, 27 Feb 2014 12:24:04 +0000 User-Agent: Fakemail v42.6.9 # This is a fake control message. # # The action: # bug archived. thanks # This fakemail brought to you by your local debbugs # administrator