GNU bug report logs - #2741
Decoding of vc-annotate output affected by language environment

Package: emacs

Reported by: Juanma Barranquero <lekktu <at> gmail.com>

Date: Sat, 21 Mar 2009 23:30:03 UTC

Severity: normal

Done: Juanma Barranquero <lekktu <at> gmail.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 2741 in the body.
You can then email your comments to 2741 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-submit-list <at> lists.donarmstrong.com, Emacs Bugs <bug-gnu-emacs <at> gnu.org>:
bug#2741; Package emacs. (Sat, 21 Mar 2009 23:30:03 GMT)

Acknowledgement sent to Juanma Barranquero <lekktu <at> gmail.com>:
New bug report received and forwarded. Copy sent to Emacs Bugs <bug-gnu-emacs <at> gnu.org>. (Sat, 21 Mar 2009 23:30:03 GMT)

Message #5 received at submit <at> emacsbugs.donarmstrong.com:

From: Juanma Barranquero <lekktu <at> gmail.com>
To: Emacs Bug Tracker <submit <at> debbugs.gnu.org>
Subject: Mixed UTF-8 and raw bytes in output of vc-annotate after 
	(set-language-environment "UTF-8")
Date: Sun, 22 Mar 2009 00:23:32 +0100

1) Create a Git repository and add a Latin-1 file with some non-ASCII
characters. In my example, the file test.txt contains the following
text:

    A few Spanish characters: áéíóúüñ

2) Execute "emacs -Q test.txt -f vc-annotate". The resulting
*Annotate test.txt* buffer has buffer-file-coding-system
`iso-latin-1-dos' and shows:

    ^7fb00c1 (Juanma Barranquero 2009-03-22 00:01:39 +0100 1) A few Spanish characters: áéíóúüñ

3) Set LANG to UTF-8 (for example, "set LANG=en_US.UTF-8"), and repeat
"emacs -Q test.txt -f vc-annotate". Now the *Annotate* buffer is in
`utf-8-dos', and shows:

    ^7fb00c1 (Juanma Barranquero 2009-03-22 00:01:39 +0100 1) A few Spanish characters: áéíóúüñ

4) Finally, after unsetting LANG or not (it is irrelevant) do

    emacs -Q --eval "(set-language-environment \"UTF-8\")" test.txt -f vc-annotate

  Now the *Annotate* buffer is in `utf-8-dos', but contains a mixture
of utf-8 and raw bytes:

    ^7fb00c1 (Juanma Barranquero 2009-03-22 00:01:39 +0100 1) A few Spanish characters: \341\351\355\363\372\374\361

    Juanma




Information forwarded to bug-submit-list <at> lists.donarmstrong.com, Emacs Bugs <bug-gnu-emacs <at> gnu.org>:
bug#2741; Package emacs. (Sun, 22 Mar 2009 01:30:03 GMT)

Acknowledgement sent to Stefan Monnier <monnier <at> iro.umontreal.ca>:
Extra info received and forwarded to list. Copy sent to Emacs Bugs <bug-gnu-emacs <at> gnu.org>. (Sun, 22 Mar 2009 01:30:03 GMT)

Message #10 received at submit <at> emacsbugs.donarmstrong.com:

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Juanma Barranquero <lekktu <at> gmail.com>
Cc: 2741 <at> debbugs.gnu.org,
        Emacs Bug Tracker <submit <at> debbugs.gnu.org>
Subject: Re: bug#2741: Mixed UTF-8 and raw bytes in output of vc-annotate after (set-language-environment "UTF-8")
Date: Sat, 21 Mar 2009 21:23:16 -0400

> 4) Finally, after unsetting LANG or not (it is irrelevant) do

>     emacs -Q --eval "(set-language-environment \"UTF-8\")" test.txt -f vc-annotate

>   Now the *Annotate* buffer is in `utf-8-dos', but contains a mixture
> of utf-8 and raw bytes:

>     ^7fb00c1 (Juanma Barranquero 2009-03-22 00:01:39 +0100 1) A few Spanish characters: \341\351\355\363\372\374\361

I don't see a mixture of anything, I just see latin-1 encoded chars
decoded incorrectly because Emacs somehow decided to try and decode the
stream using the utf-8 coding-system.
But yes that's a bug.  `vc-annotate' should use the main file's
coding-system to decode the annotated text, regardless of
language environment.


        Stefan
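
Illustration (not part of the original message): a minimal Python sketch of the diagnosis above, using the sample text from the report. It checks that the octal escapes shown in the *Annotate* buffer are the Latin-1 bytes of the accented characters, and that those bytes are not well-formed UTF-8. Python only stands in for Emacs here, and the variable names are illustrative.

```python
# Sample text from the report and its Latin-1 encoding (one byte per character).
text = "áéíóúüñ"
latin1_bytes = text.encode("latin-1")

# Render each byte as an octal escape, the notation Emacs uses for raw bytes.
octal = "".join(f"\\{b:03o}" for b in latin1_bytes)
print(octal)

# The same bytes are not well-formed UTF-8, so a strict UTF-8 decode fails.
try:
    latin1_bytes.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False
print(valid_utf8)

# Decoding with the file's real encoding recovers the original text.
roundtrip = latin1_bytes.decode("latin-1")
print(roundtrip == text)
```

The first line printed is \341\351\355\363\372\374\361, matching the escapes in step 4 of the report, which is consistent with Latin-1 data being read through a UTF-8 decoder.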



Information forwarded to bug-submit-list <at> lists.donarmstrong.com, Emacs Bugs <bug-gnu-emacs <at> gnu.org>:
bug#2741; Package emacs. (Sun, 22 Mar 2009 01:30:04 GMT)

Acknowledgement sent to Stefan Monnier <monnier <at> iro.umontreal.ca>:
Extra info received and forwarded to list. Copy sent to Emacs Bugs <bug-gnu-emacs <at> gnu.org>. (Sun, 22 Mar 2009 01:30:04 GMT)

Information forwarded to bug-submit-list <at> lists.donarmstrong.com, Emacs Bugs <bug-gnu-emacs <at> gnu.org>:
bug#2741; Package emacs. (Sun, 22 Mar 2009 01:40:04 GMT)

Acknowledgement sent to Juanma Barranquero <lekktu <at> gmail.com>:
Extra info received and forwarded to list. Copy sent to Emacs Bugs <bug-gnu-emacs <at> gnu.org>. (Sun, 22 Mar 2009 01:40:04 GMT)

Message #20 received at 2741 <at> emacsbugs.donarmstrong.com:

From: Juanma Barranquero <lekktu <at> gmail.com>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: 2741 <at> debbugs.gnu.org
Subject: Re: bug#2741: Mixed UTF-8 and raw bytes in output of vc-annotate 
	after (set-language-environment "UTF-8")
Date: Sun, 22 Mar 2009 02:31:26 +0100

On Sun, Mar 22, 2009 at 02:23, Stefan Monnier <monnier <at> iro.umontreal.ca> wrote:

> I don't see a mixture of anything, I just see latin-1 encoded chars
> decoded incorrectly because Emacs somehow decided to try and decode the
> stream using the utf-8 coding-system.

Whatever. What I meant is that the buffer is nominally utf-8, but
contains raw bytes.

> But yes that's a bug.  `vc-annotate' should use the main file's
> coding-system to decode the annotated text, regardless of
> language environment.

It also seems to be a bug that the behavior is different between

   emacs -Q --eval "(set-language-environment \"UTF-8\")"

and

  set LANG=utf8.UTF-8
  emacs -Q

when, in both cases, `current-language-environment' is "UTF-8".

    Juanma




Changed bug title to `Decoding of vc-annotate output affected by language environment' from `Mixed UTF-8 and raw bytes in output of vc-annotate after (set-language-environment "UTF-8")'. Request was from Juanma Barranquero <lekktu <at> gmail.com> to control <at> emacsbugs.donarmstrong.com. (Mon, 23 Mar 2009 10:15:04 GMT)

Information forwarded to bug-submit-list <at> lists.donarmstrong.com, Emacs Bugs <bug-gnu-emacs <at> gnu.org>:
bug#2741; Package emacs. (Wed, 09 Sep 2009 23:25:07 GMT)

Acknowledgement sent to Juanma Barranquero <lekktu <at> gmail.com>:
Extra info received and forwarded to list. Copy sent to Emacs Bugs <bug-gnu-emacs <at> gnu.org>. (Wed, 09 Sep 2009 23:25:07 GMT)

Message #27 received at 2741 <at> emacsbugs.donarmstrong.com:

From: Juanma Barranquero <lekktu <at> gmail.com>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: 2741 <at> debbugs.gnu.org
Subject: Re: bug#2741: Mixed UTF-8 and raw bytes in output of vc-annotate 
	after (set-language-environment "UTF-8")
Date: Thu, 10 Sep 2009 01:18:20 +0200

On Sun, Mar 22, 2009 at 03:23, Stefan Monnier <monnier <at> iro.umontreal.ca> wrote:

> I don't see a mixture of anything, I just see latin-1 encoded chars
> decoded incorrectly because Emacs somehow decided to try and decode the
> stream using the utf-8 coding-system.
> But yes that's a bug.  `vc-annotate' should use the main file's
> coding-system to decode the annotated text, regardless of
> language environment.

The following patch fixes it.

The change is in `vc-annotate' and not `vc-git-annotate-command'
because the bug is not git-specific. I can easily reproduce it with
bzr, for example.

    Juanma


2009-09-09  Juanma Barranquero  <lekktu <at> gmail.com>

	* vc-annotate.el (vc-annotate): Use the main file's coding-system to
	decode annotated text, regardless of language environment.  (Bug#2741)


Index: vc-annotate.el
===================================================================
RCS file: /cvsroot/emacs/emacs/lisp/vc-annotate.el,v
retrieving revision 1.8
diff -u -2 -r1.8 vc-annotate.el
--- vc-annotate.el	10 Mar 2009 00:59:09 -0000	1.8
+++ vc-annotate.el	9 Sep 2009 23:11:24 -0000
@@ -376,5 +376,6 @@
 		(setq temp-buffer-name (buffer-name))))
     (with-output-to-temp-buffer temp-buffer-name
-      (let ((backend (vc-backend file)))
+      (let ((backend (vc-backend file))
+	    (coding-system-for-read buffer-file-coding-system))
         (vc-call-backend backend 'annotate-command file
                          (get-buffer temp-buffer-name) rev)
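
Illustration (not part of the original message): the idea behind the patch is to decode the backend's output with the visited file's coding system instead of leaving the choice to the environment default. The Python sketch below is an analogy only, not the Emacs implementation. FILE_ENCODING stands in for buffer-file-coding-system, and the child process is a hypothetical stand-in for a backend annotate command that emits the file's bytes unchanged.

```python
import subprocess
import sys

# Assumption: the visited file is Latin-1, as in the report.
FILE_ENCODING = "latin-1"

# Hypothetical stand-in for an annotate command: it writes the sample line
# to stdout as raw Latin-1 bytes, exactly as stored in the file.
CHILD = (
    "import sys; "
    "sys.stdout.buffer.write("
    "b'A few Spanish characters: \\xe1\\xe9\\xed\\xf3\\xfa\\xfc\\xf1')"
)


def run_annotate(encoding: str) -> str:
    """Run the stand-in command and decode its output with `encoding`."""
    raw = subprocess.run(
        [sys.executable, "-c", CHILD], capture_output=True, check=True
    ).stdout
    # backslashreplace keeps undecodable bytes visible as escapes,
    # loosely similar to how Emacs displays raw bytes.
    return raw.decode(encoding, errors="backslashreplace")


# Decoding with the file's own encoding, as the patch arranges.
fixed = run_annotate(FILE_ENCODING)
# Decoding with an environment-derived default, as before the patch.
broken = run_annotate("utf-8")

print(fixed)
print(broken)
```

With the file's encoding the accented characters come back intact; with UTF-8 they surface as escaped bytes, which mirrors step 4 of the original report.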



Reply sent to Juanma Barranquero <lekktu <at> gmail.com>:
You have taken responsibility. (Fri, 11 Sep 2009 11:10:08 GMT)

Notification sent to Juanma Barranquero <lekktu <at> gmail.com>:
bug acknowledged by developer. (Fri, 11 Sep 2009 11:10:09 GMT)

Message #32 received at 2741-done <at> emacsbugs.donarmstrong.com:

From: Juanma Barranquero <lekktu <at> gmail.com>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: 2741-done <at> debbugs.gnu.org
Subject: Re: bug#2741: Mixed UTF-8 and raw bytes in output of vc-annotate 
	after (set-language-environment "UTF-8")
Date: Fri, 11 Sep 2009 13:02:51 +0200

On Thu, Sep 10, 2009 at 01:18, Juanma Barranquero <lekktu <at> gmail.com> wrote:

>        * vc-annotate.el (vc-annotate): Use the main file's coding-system to
>        decode annotated text, regardless of language environment.  (Bug#2741)

I've installed this change.

    Juanma



bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> emacsbugs.donarmstrong.com. (Fri, 09 Oct 2009 14:24:11 GMT)


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.