GNU bug report logs - #16286
24.3.50; insert-file-contents may bring invisible garbage


Package: emacs;

Reported by: Andrey Kotlarski <m00naticus <at> gmail.com>

Date: Sun, 29 Dec 2013 14:06:02 UTC

Severity: important

Found in version 24.3.50

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.



Message #19 received at 16286-done <at> debbugs.gnu.org (full text, mbox):

From: handa <at> gnu.org (K. Handa)
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 16286-done <at> debbugs.gnu.org
Subject: Re: 24.3.50; insert-file-contents may bring invisible garbage
Date: Tue, 28 Jan 2014 00:01:00 +0900

In article <52E4588D.70004 <at> cs.ucla.edu>, Paul Eggert <eggert <at> cs.ucla.edu> writes:

> I installed a patch as trunk bzr 116158, which (at least for me) fixes 
> the reported bug, and am taking the liberty of marking this as done. 
> There may well be a better fix, but at least Emacs shouldn't crash or 
> report nonsense now.

Thank you for working on this bug, which I introduced when I
optimized decode_coding_gap for ASCII-only and UTF-8-only
files.

Your change sets CODING_MODE_LAST_BLOCK in coding->mode
before calling decode_coding_gap, so that detect_coding
doesn't detect a file as utf-8 if it has an incomplete utf-8
sequence at the tail (as in the reported testcase).

But I think it is better that such a file be detected as
utf-8 and the trailing garbage be treated as raw bytes.
Emacs 24.3 does that, and that is why decode_coding_gap sets
CODING_MODE_LAST_BLOCK after calling detect_coding.

So, I suggest the attached fix instead of yours.  What do
you think?
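
The idea behind the fix can be illustrated outside Emacs with a
minimal sketch.  Everything in it is an assumption made for
illustration: struct utf8_scan, utf8_seq_len, scan_utf8_prefix and
can_use_utf8_shortcut are hypothetical names, not Emacs code, and
the validity check is deliberately simplified (it does not reject
overlong or surrogate encodings, and it is not what
detect_coding_utf_8 actually does).  It only shows the rule "take
the fast path when the valid UTF-8 prefix covers the whole source".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two counters kept in struct
   coding_system; this is not the real Emacs structure.  */
struct utf8_scan
{
  ptrdiff_t valid_bytes;	/* length of the valid UTF-8 prefix */
  ptrdiff_t valid_chars;	/* characters in that prefix */
};

/* Return the sequence length announced by lead byte C, or 0 if C
   cannot start a sequence.  Simplified: standard UTF-8 only.  */
int
utf8_seq_len (unsigned char c)
{
  if (c < 0x80)
    return 1;
  if (c >= 0xC2 && c <= 0xDF)
    return 2;
  if (c >= 0xE0 && c <= 0xEF)
    return 3;
  if (c >= 0xF0 && c <= 0xF4)
    return 4;
  return 0;
}

/* Scan SRC and report how many leading bytes and characters form
   complete, well-formed sequences.  Scanning stops at the first
   invalid or truncated sequence.  */
struct utf8_scan
scan_utf8_prefix (const unsigned char *src, ptrdiff_t src_bytes)
{
  struct utf8_scan r = { 0, 0 };
  ptrdiff_t i = 0;

  assert (src != NULL || src_bytes == 0);
  while (i < src_bytes)
    {
      int len = utf8_seq_len (src[i]);
      bool ok = len != 0 && i + len <= src_bytes;

      /* Every trailing byte must look like 10xxxxxx.  */
      for (int k = 1; ok && k < len; k++)
	ok = (src[i + k] & 0xC0) == 0x80;
      if (!ok)
	break;
      i += len;
      r.valid_chars++;
    }
  r.valid_bytes = i;
  return r;
}

/* The fast path is safe only when the valid prefix is the whole
   source, i.e. there is no trailing garbage to decode.  */
bool
can_use_utf8_shortcut (const unsigned char *src, ptrdiff_t src_bytes)
{
  return scan_utf8_prefix (src, src_bytes).valid_bytes == src_bytes;
}
```

For an input that ends in the middle of a multibyte sequence,
valid_bytes is smaller than src_bytes, so the shortcut is skipped
and a general decoder is left to handle the tail as raw bytes.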

---
Kenichi Handa
handa <at> gnu.org

=== modified file 'src/ChangeLog'
--- src/ChangeLog	2014-01-26 12:17:55 +0000
+++ src/ChangeLog	2014-01-27 14:53:58 +0000
@@ -1,3 +1,16 @@
+2014-01-27  K. Handa  <handa <at> gnu.org>
+
+	These changes fix bug#16286 in a different way than
+	revno:116158 did.
+
+	* coding.h (struct coding_system): New member detected_utf8_bytes.
+
+	* coding.c (detect_coding_utf_8): Set coding->detected_utf8_bytes.
+	(decode_coding_gap): Use the shortcut for UTF-8 file reading only
+	when coding->detected_utf8_bytes equals coding->src_bytes.
+
+	* fileio.c (Finsert_file_contents): Cancel the previous change.
+
 2014-01-26  Jan Djärv  <jan.h.d <at> swipnet.se>
 
 	* xterm.c (x_focus_changed): Check for non-X terminal-frame (Bug#16540)

=== modified file 'src/coding.c'
--- src/coding.c	2014-01-26 01:20:24 +0000
+++ src/coding.c	2014-01-27 14:47:43 +0000
@@ -1300,6 +1300,7 @@
 	   means that we found a valid non-ASCII characters.  */
 	detect_info->found |= CATEGORY_MASK_UTF_8_AUTO | CATEGORY_MASK_UTF_8_NOSIG;
     }
+  coding->detected_utf8_bytes = src_base - coding->source;
   coding->detected_utf8_chars = nchars;
   return 1;
 }
@@ -7890,7 +7891,7 @@
   coding->dst_multibyte = ! NILP (BVAR (current_buffer, enable_multibyte_characters));
 
   coding->head_ascii = -1;
-  coding->detected_utf8_chars = -1;
+  coding->detected_utf8_bytes = coding->detected_utf8_chars = -1;
   coding->eol_seen = EOL_SEEN_NONE;
   if (CODING_REQUIRE_DETECTION (coding))
     detect_coding (coding);
@@ -7907,7 +7908,8 @@
       if (chars != bytes)
 	{
 	  /* There exists a non-ASCII byte.  */
-	  if (EQ (CODING_ATTR_TYPE (attrs), Qutf_8))
+	  if (EQ (CODING_ATTR_TYPE (attrs), Qutf_8)
+	      && coding->detected_utf8_bytes == coding->src_bytes)
 	    {
 	      if (coding->detected_utf8_chars >= 0)
 		chars = coding->detected_utf8_chars;

=== modified file 'src/coding.h'
--- src/coding.h	2014-01-26 01:20:24 +0000
+++ src/coding.h	2014-01-27 14:47:43 +0000
@@ -468,7 +468,9 @@
      the eol format.  */
   ptrdiff_t head_ascii;
 
-  ptrdiff_t detected_utf8_chars;
+  /* How many bytes/chars at the source are detected as valid utf-8
+     sequence.  Set by detect_coding_utf_8.  */
+  ptrdiff_t detected_utf8_bytes, detected_utf8_chars;
 
   /* Used internally in coding.c.  See the comment of detect_ascii.  */
   int eol_seen;

=== modified file 'src/fileio.c'
--- src/fileio.c	2014-01-26 00:32:30 +0000
+++ src/fileio.c	2014-01-27 14:47:59 +0000
@@ -4298,7 +4298,6 @@
       Z_BYTE -= inserted;
       ZV -= inserted;
       Z -= inserted;
-      coding.mode |= CODING_MODE_LAST_BLOCK;
       decode_coding_gap (&coding, inserted, inserted);
       inserted = coding.produced_char;
       coding_system = CODING_ID_NAME (coding.id);





This bug report was last modified 11 years and 111 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.