GNU bug report logs - #23595
25.1.50; file with chinese/japanse chars, vc-diff fails (HG, Git, RCS)


Package: emacs;

Reported by: Uwe Brauer <oub <at> mat.ucm.es>

Date: Sat, 21 May 2016 13:03:01 UTC

Severity: normal

Found in version 25.1.50

Done: Dmitry Gutov <dgutov <at> yandex.ru>

Bug is archived. No further changes may be made.


From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Dmitry Gutov <dgutov <at> yandex.ru>, Eli Zaretskii <eliz <at> gnu.org>
Cc: oub <at> mat.ucm.es, 23595 <at> debbugs.gnu.org
Subject: bug#23595: 25.1.50; file with chinese/japanse chars, vc-diff fails (HG, Git, RCS)
Date: Tue, 24 May 2016 23:19:05 -0700

Dmitry Gutov wrote:
> Should `utf-8' altogether replace `undecided' in vc-coding-system-for-diff? Then
> the use of buffer-file-coding-system could be predicated on its being compatible
> with ascii.

That might be going too far.

We want buffer-file-coding-system to be compatible enough with ASCII for the
case where diff output contains ASCII metadata, non-ASCII file contents, or
both. If buffer-file-coding-system is greatly incompatible with ASCII, then
decoding as ASCII (or UTF-8) will often be wrong, because the file data in the
diff output will be mostly UTF-16, say; and buffer-file-coding-system will
often be wrong too, because the non-file data will be mostly ASCII. So when
buffer-file-coding-system is greatly incompatible with ASCII, we can't use
either buffer-file-coding-system or UTF-8; they're both wrong too often.
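
To make the mixed-encoding problem concrete, here is a rough Python sketch
(not Emacs code). The file name and the diff text are made up for
illustration, and the metadata happens to have an even byte length so that the
UTF-16 data stays aligned; real diff output need not be so lucky.

```python
# Hypothetical diff output: ASCII metadata followed by UTF-16 file content.
metadata = "--- a/notes.txt\n+++ b/notes.txt\n@@ -1 +1 @@\n".encode("ascii")
content = "+日本語\n".encode("utf-16-le")  # file data as stored on disk
diff_output = metadata + content

# Decoding everything as UTF-8 keeps the metadata but mangles the file data.
as_utf8 = diff_output.decode("utf-8", errors="replace")

# Decoding everything as UTF-16 keeps the file data but mangles the metadata:
# each pair of ASCII bytes is read as one unrelated 16-bit character.
as_utf16 = diff_output.decode("utf-16-le", errors="replace")

print("metadata readable as UTF-8: ", "--- a/notes.txt" in as_utf8)
print("content readable as UTF-8:  ", "日本語" in as_utf8)
print("metadata readable as UTF-16:", "--- a/notes.txt" in as_utf16)
print("content readable as UTF-16: ", "日本語" in as_utf16)
```

Neither single decoding recovers both parts, which is the sense in which both
choices are "wrong too often".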

The way it's *supposed* to work on a POSIX system is that diff is applied to a
file that is valid text in the current locale's encoding, and diff generates
both metadata and data in that same encoding. I expect that we should fall
back on this approach
when buffer-file-coding-system is greatly incompatible with ASCII. This will 
better handle unusual cases such as a system operating in an EBCDIC locale 
(which can happen on IBM mainframes, though admittedly Emacs is not likely to 
work well on such platforms). And this argues for sticking with 'undecided' 
instead of 'utf-8' here.
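
The fallback described above could be sketched roughly as follows, again in
Python rather than Emacs Lisp. The names is_ascii_compatible and
coding_for_diff are hypothetical, the ASCII-compatibility test is only one
possible approximation, and using the locale's preferred encoding stands in
for what 'undecided' would let Emacs work out on its own.

```python
import locale
import string

def is_ascii_compatible(encoding: str) -> bool:
    """Return True if common ASCII text encodes to the same bytes as in ASCII.

    This is an approximation: it checks letters, digits and a few characters
    that show up in diff metadata, not the full ASCII range.
    """
    sample = string.ascii_letters + string.digits + " \n+-@/."
    try:
        return sample.encode(encoding) == sample.encode("ascii")
    except (LookupError, UnicodeError):
        return False

def coding_for_diff(file_encoding: str) -> str:
    """Pick an encoding for decoding diff output (hypothetical helper)."""
    if is_ascii_compatible(file_encoding):
        # ASCII metadata and file data can share one decoding.
        return file_encoding
    # Otherwise assume diff followed the POSIX model and used the locale.
    return locale.getpreferredencoding(False)

for name in ("utf-8", "latin-1", "utf-16", "cp037"):
    print(name, "->", coding_for_diff(name))
```

Here cp037 is one of the EBCDIC code pages Python ships, so it is treated like
UTF-16: not ASCII-compatible, hence the locale fallback.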

(In theory it's possible for a GNU/Linux system to establish a locale with 
UTF-16 encoding, so that diff's metadata and data are consistently UTF-16 for 
this example. However, I've never heard of such a thing, and couldn't find any 
evidence of one just now when I searched for it. So I don't think we need to 
worry about this now.)





This bug report was last modified 9 years and 24 days ago.


