GNU bug report logs - #23595
25.1.50; file with chinese/japanese chars, vc-diff fails (HG, Git, RCS)


Package: emacs;

Reported by: Uwe Brauer <oub <at> mat.ucm.es>

Date: Sat, 21 May 2016 13:03:01 UTC

Severity: normal

Found in version 25.1.50

Done: Dmitry Gutov <dgutov <at> yandex.ru>

Bug is archived. No further changes may be made.



Message #26 received at 23595 <at> debbugs.gnu.org (full text, mbox):

From: Uwe Brauer <oub <at> mat.ucm.es>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: oub <at> mat.ucm.es, 23595 <at> debbugs.gnu.org, Dmitry Gutov <dgutov <at> yandex.ru>
Subject: Re: bug#23595: 25.1.50;
 file with chinese/japanese chars, vc-diff fails (HG, Git, RCS)
Date: Mon, 23 May 2016 17:00:53 +0000
>>> "Eli" == Eli Zaretskii <eliz <at> gnu.org> writes:

   >> From: Dmitry Gutov <dgutov <at> yandex.ru>
   >> Date: Mon, 23 May 2016 14:52:03 +0300
   >> 
   >> > The resulting diff contains either rubbish or fails to run.
   >> > Files attached.

   > I don't see any rubbish in the Git output.  With RCS, the command
   > signals an error, so more digging is needed to find out what's wrong
   > (although it could be that rcsdiff exits with non-zero status when it
   > sees what looks like binary files).

   >> It seems, to an extent, to be caused by our setting
   >> coding-system-for-read inside vc-diff-internal (to
   >> utf-16be-with-signature-unix, which is also the value of
   >> buffer-file-coding-system).
   >> 
   >> Without that, the result of vc-diff (at least with Git) is "Binary
   >> files a/test-chin-jap.tex and b/test-chin-jap.tex differ". Emacs
   >> 24.5 does the same.

   > Setting coding-system-for-read is correct, because the important use
   > case is when the diffs are actually output.  The problem is that
   > UTF-16 is not ASCII-compatible, and so text output by Git itself will
   > be mishandled.  Another problem is that Git doesn't show the diffs at
   > all.
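
A minimal Python sketch of the ASCII-compatibility point above; it is an
illustration only, not what Emacs does internally. The header line is a
made-up example of the plain-ASCII text Git prints around the file data.
Decoding it as UTF-16 pairs up the bytes and yields unrelated characters,
which matches the "rubbish" described in the report.

```python
# Illustration only: why ASCII text written by the VCS tool itself is
# mangled when the whole diff output is decoded as UTF-16.

# Hypothetical header line, as Git would print it in plain ASCII.
header = b"diff --git a/test-chin-jap.tex b/test-chin-jap.tex\n"

# Correct reading of the header: one byte per character.
as_ascii = header.decode("ascii")

# Reading forced by a UTF-16 coding system: two bytes per character,
# so 'd' (0x64) and 'i' (0x69) fuse into the single character U+6469.
as_utf16 = header.decode("utf-16-be", errors="replace")

# An encoding is ASCII-compatible when ASCII characters keep their
# single-byte form. UTF-8 passes this check, UTF-16 does not.
utf8_ok = "A".encode("utf-8") == b"A"
utf16_ok = "A".encode("utf-16-be") == b"A"

print(repr(as_ascii))
print(repr(as_utf16))
print("utf-8 ASCII-compatible:", utf8_ok)
print("utf-16 ASCII-compatible:", utf16_ok)
```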

   >> Which is weird, considering that vc-diff-internal and
   >> vc-coding-system-for-diff have both been virtually untouched for the
   >> last couple of years.

   > Not sure what you see as weird.

   >> But even if we figure out why this happens, you (Uwe) probably want
   >> Git, Hg, etc., to treat this file as text, and not binary. Only then
   >> will you be able to get meaningful diffs. I don't have any specific
   >> advice on that.

   > Why can't we invoke "git diff --text"?  That should fix the second
   > problem, I think.

I thought the problem was caused by the fact that I did not enter those
characters myself, but rather copied them from a tex.stackexchange site,
but I see that this was not the reason.


What about Mercurial?[1]


   > As for the first problem, we should probably refrain from binding
   > coding-system-for-read to a CODING-SYSTEM for which

   >    (coding-system-get CODING-SYSTEM :ascii-compatible-p)

   > returns nil.  We should instead bind it to no-conversion and decode
   > the file data parts by hand, skipping the parts that Git itself
   > outputs (yes, this is messy).  Patches to that effect are welcome.
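
A rough Python sketch of the "decode by hand" idea above, under loud
assumptions: it is not a patch for vc-diff-internal, the function name
decode_diff and the header prefixes are hypothetical, and it only handles
big-endian UTF-16 content whose characters contain no 0x0A byte. The raw
output is kept as bytes (the analogue of no-conversion), the tool's own
lines are decoded as ASCII, and only the file data after the +, - or
space marker is decoded with the file's coding system.

```python
# Sketch only: split decoding of diff output into tool text and file data.
# HEADER_PREFIXES and decode_diff are hypothetical names for illustration.

HEADER_PREFIXES = (b"diff ", b"index ", b"--- ", b"+++ ", b"@@")


def decode_diff(raw: bytes, content_encoding: str = "utf-16-be") -> str:
    """Decode tool-generated lines as ASCII and file data separately."""
    # Split on the newline byte and put it back, so that the 0x00 0x0A
    # pair ending each UTF-16BE content line stays complete.
    pieces = raw.split(b"\n")
    lines = [piece + b"\n" for piece in pieces[:-1]]
    if pieces[-1]:
        lines.append(pieces[-1])

    out = []
    for line in lines:
        marker = line[:1]
        if line.startswith(HEADER_PREFIXES) or marker not in (b"+", b"-", b" "):
            # Text written by the VCS tool itself: plain ASCII.
            out.append(line.decode("ascii", errors="replace"))
        else:
            # One ASCII marker byte, then file data in the file's encoding.
            body = line[1:].decode(content_encoding, errors="replace")
            out.append(marker.decode("ascii") + body)
    return "".join(out)


# Hand-built sample mimicking the byte layout of text-mode diff output.
sample = (
    b"--- a/test.tex\n"
    b"+++ b/test.tex\n"
    b"@@ -1 +1 @@\n"
    + b"-" + "日本\n".encode("utf-16-be")
    + b"+" + "中文\n".encode("utf-16-be")
)

decoded = decode_diff(sample)
print(decoded)
```

The prefix test is deliberately naive: a content line whose bytes happen
to begin with one of the header prefixes would be misclassified, which is
part of why the approach is described as messy.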

   > Bottom line: users who put UTF-16 encoded files into VCS are playing
   > with fire, and are best advised not to do that!

Right, I see; that was just two characters in a document that otherwise
contained Latin-1 or UTF-8. So Chinese and Japanese programmers are at a
disadvantage, no?




Footnotes: 
[1]   I don't care so much about RCS in that context.





This bug report was last modified 9 years and 23 days ago.


