GNU bug report logs - #23595
25.1.50; file with chinese/japanse chars, vc-diff fails (HG, Git, RCS)


Package: emacs;

Reported by: Uwe Brauer <oub <at> mat.ucm.es>

Date: Sat, 21 May 2016 13:03:01 UTC

Severity: normal

Found in version 25.1.50

Done: Dmitry Gutov <dgutov <at> yandex.ru>

Bug is archived. No further changes may be made.



Message #17 received at 23595 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Dmitry Gutov <dgutov <at> yandex.ru>
Cc: oub <at> mat.ucm.es, 23595 <at> debbugs.gnu.org
Subject: Re: bug#23595: 25.1.50;
 file with chinese/japanse chars, vc-diff fails (HG, Git, RCS)
Date: Mon, 23 May 2016 19:48:50 +0300
> From: Dmitry Gutov <dgutov <at> yandex.ru>
> Date: Mon, 23 May 2016 14:52:03 +0300
> 
> > The resulting diff contains either rubbish or fails to run.
> > Files attached.

I don't see any rubbish in the Git output.  With RCS, the command
signals an error, so more digging is needed to find out what's wrong
(although it could be that rcsdiff exits with non-zero status when it
sees what looks like binary files).

> It seems to be caused, to an extent, by our setting coding-system-for-read inside vc-diff-internal (to utf-16be-with-signature-unix, which is also the value of buffer-file-coding-system).
> 
> Without that, the result of vc-diff (at least with Git) is "Binary files a/test-chin-jap.tex and b/test-chin-jap.tex differ". Emacs 24.5 does the same.

Setting coding-system-for-read is correct, because the important use
case is when the diffs are actually output.  The problem is that
UTF-16 is not ASCII-compatible, and so text output by Git itself will
be mishandled.  Another problem is that Git treats the file as binary
and so doesn't show the diffs at all.
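
A minimal Emacs Lisp sketch of why an encoding that is not
ASCII-compatible garbles Git's own output.  It uses utf-16be (without
the signature) purely for illustration; the report itself involves
utf-16be-with-signature-unix.

```elisp
;; Encoding ASCII text as UTF-16BE puts a NUL byte before each letter,
;; so the bytes no longer match what an ASCII reader expects.
(encode-coding-string "diff" 'utf-16be)

;; Conversely, decoding plain ASCII bytes (such as a header line that
;; Git prints) as UTF-16BE pairs the bytes up into unrelated characters.
(decode-coding-string "diff" 'utf-16be)
```

Evaluating both forms in *scratch* should show that the round trip is
not the identity for ASCII input, which is what happens to Git's header
lines when coding-system-for-read is bound to a UTF-16 coding system.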

> Which is weird, considering that vc-diff-internal and vc-coding-system-for-diff have both been virtually untouched for the last couple of years.

Not sure what you see as weird.

> But even if we figure out why this happens, you (Uwe) probably want Git, Hg, etc., to treat this file as text, and not binary. Only then will you be able to get meaningful diffs. I don't have any specific advice on that.

Why can't we invoke "git diff --text"?  That should fix the second
problem, I think.
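
Until VC passes that option itself, something along these lines might
serve as a per-user workaround.  It assumes the Git backend consults
the vc-git-diff-switches user option for the diff command it runs and
that the command accepts --text; treat it as a sketch, not a tested fix
for this bug.

```elisp
;; Ask VC to add --text to the diff commands it runs for Git, so that
;; Git emits diff hunks instead of "Binary files ... differ".
(setq vc-git-diff-switches '("--text"))
```

Note that this applies to every file under Git, so genuinely binary
files would also be diffed as text.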

As for the first problem, we should probably refrain from binding
coding-system-for-read to a CODING-SYSTEM for which

   (coding-system-get CODING-SYSTEM :ascii-compatible-p)

returns nil.  We should instead bind it to no-conversion and decode
the file data parts by hand, skipping the parts that Git itself
outputs (yes, this is messy).  Patches to that effect are welcome.
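
The first half of that suggestion could be sketched roughly as
follows.  The helper name is hypothetical (nothing like it exists in
VC), and the hand-decoding of the file data parts is not shown.

```elisp
;; Hypothetical helper; this name is an assumption, not a VC function.
(defun my-vc-coding-system-for-diff-output (coding-system)
  "Return CODING-SYSTEM if it is ASCII-compatible, else `no-conversion'."
  (if (coding-system-get coding-system :ascii-compatible-p)
      coding-system
    'no-conversion))

;; Possible use around the code that runs the diff command.
;; FILE-CODING stands in for the value VC would otherwise bind.
(let* ((file-coding 'utf-16be-with-signature-unix)
       (coding-system-for-read
        (my-vc-coding-system-for-diff-output file-coding)))
  coding-system-for-read)
```

With a UTF-16 coding system the process output would then arrive as
raw bytes, and the parts that come from the file would still need an
explicit call such as decode-coding-region or decode-coding-string.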

Bottom line: users who put UTF-16 encoded files into VCS are playing
with fire, and are best advised not to do that!




This bug report was last modified 9 years and 24 days ago.
