GNU bug report logs - #28180
[w32] Unicode characters in subprocess (git) arguments changed to space

Package: emacs

Reported by: npostavs <at> users.sourceforge.net

Date: Tue, 22 Aug 2017 02:35:02 UTC

Severity: normal

To reply to this bug, email your comments to 28180 AT debbugs.gnu.org.


Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#28180; Package emacs. (Tue, 22 Aug 2017 02:35:02 GMT)

Acknowledgement sent to npostavs <at> users.sourceforge.net:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Tue, 22 Aug 2017 02:35:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: npostavs <at> users.sourceforge.net
To: bug-gnu-emacs <at> gnu.org
Subject: [w32] Unicode characters in subprocess (git) arguments changed to
 space
Date: Mon, 21 Aug 2017 22:35:24 -0400
In w32.c there is a comment saying

   . Running subprocesses in non-ASCII directories and with non-ASCII
     file arguments is limited to the current codepage [...]
     This should be fixed, but will also require changes in cmdproxy.
     The current limitation is not terribly bad anyway, since very
     few, if any, Windows console programs that are likely to be
     invoked by Emacs support UTF-16 encoded command lines.

I believe we're running into this limitation with git: staging a file
named 好.txt fails from magit[1] (I tried also with vc, same problem).
A quick way to see the problem is evaluating the call-process form
below, the output shows that the Chinese character has been transformed
into a space.  This happens even if I execute 'chcp 65001' before
starting Emacs (a possible workaround I saw suggested in a few places).
The short file name trick doesn't help either.

(call-process "git" nil '(t t) nil
              "-c" "alias.x=!x() { printf '%s' \"$1\" | od -tx1; }; x" "x" "(好)")
0000000 28 20 29
0000003

As far as I can tell, git does support UTF-16 encoded command lines, as
demonstrated by the attached git-args.c, which produces the utf8
encoding of the character (this is also what the call-process form
produces when I run it on GNU/Linux):

C:\Users\npostavs\src\win32-args>.\git-args.exe
0000000 28 e5 a5 bd 29
0000005
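
The byte values in the two dumps can be cross-checked outside Emacs and git. The following is a minimal Python sketch, not part of the original report: the helper name `od_tx1` is hypothetical and only mimics the byte column of one `od -tx1` row. It suggests that the git-args.exe output is the UTF-8 encoding of "(好)", while the bytes seen through call-process are just "( )".

```python
def od_tx1(data: bytes) -> str:
    """Return the bytes as space-separated two-digit hex, like one od -tx1 row."""
    return " ".join(f"{b:02x}" for b in data)


# The argument as git should see it after converting to UTF-8.
expected = "(好)".encode("utf-8")

# The bytes reported from the call-process form on w32.
observed = bytes([0x28, 0x20, 0x29])

print(od_tx1(expected))           # 28 e5 a5 bd 29
print(od_tx1(observed))           # 28 20 29
print(observed.decode("ascii"))   # ( )
```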

[git-args.c (text/plain, attachment)]

Am I correct that this problem is related to the w32.c comment?  It's not
clear to me what changes are needed in cmdproxy (and other places?) to
address it.

[1]: https://github.com/magit/magit/issues/3111

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#28180; Package emacs. (Tue, 22 Aug 2017 14:56:02 GMT)

Message #8 received at 28180 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: npostavs <at> users.sourceforge.net
Cc: 28180 <at> debbugs.gnu.org
Subject: Re: bug#28180: [w32] Unicode characters in subprocess (git) arguments
 changed to space
Date: Tue, 22 Aug 2017 17:54:59 +0300
> From: npostavs <at> users.sourceforge.net
> Date: Mon, 21 Aug 2017 22:35:24 -0400
> 
> In w32.c there is a comment saying
> 
>    . Running subprocesses in non-ASCII directories and with non-ASCII
>      file arguments is limited to the current codepage [...]
>      This should be fixed, but will also require changes in cmdproxy.
>      The current limitation is not terribly bad anyway, since very
>      few, if any, Windows console programs that are likely to be
>      invoked by Emacs support UTF-16 encoded command lines.
> 
> I believe we're running into this limitation with git: staging a file
> named 好.txt fails from magit[1] (I tried also with vc, same problem).
> A quick way to see the problem is evaluating the call-process form
> below, the output shows that the Chinese character has been transformed
> into a space.

I'd expect that in a non-Chinese locale (which I believe was what you
did), but the OP of the Magit issue has Windows set up for a Chinese
locale, so there has to be some other explanation, because passing
Chinese characters on the command line ought to work in that case.

> Am I correct that this problem is related to the w32.c comment?

The comment is accurate, but it can only explain why command-line
arguments with characters outside of the current Windows locale cannot
be safely passed to sub-processes.  Which AFAIU is not the case with
the OP of that Magit issue.
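
To illustrate the locale dependence, here is a small Python sketch. It rests on assumptions: cp936 (Simplified Chinese) and cp1252 (Western European) are picked only as example codepages, since the thread does not say which ones were in use, and the helper `representable` is hypothetical. Python's codecs also only approximate the Windows conversion, which substitutes a replacement character instead of raising an error.

```python
def representable(text: str, codepage: str) -> bool:
    """Return True if every character of text can be encoded in codepage."""
    try:
        text.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False


arg = "(好)"

# In a Chinese codepage the argument survives as a double-byte sequence.
print(representable(arg, "cp936"))                        # True
print(" ".join(f"{b:02x}" for b in arg.encode("cp936")))  # 28 ba c3 29

# In a Western codepage there is no byte sequence for the character at all.
print(representable(arg, "cp1252"))                       # False
```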

> It's not clear to me what changes are needed in cmdproxy (and other
> places?) to address it.

cmdproxy is not involved in call-process, but it is involved in
shell-command and its ilk.  As it makes no sense to support Unicode in
the former, but not in the latter, if we want to lift this limitation,
we must teach cmdproxy to use "wide" APIs both for receiving
command-line arguments from Emacs and for passing them to programs it
invokes.

As to the "other places", the only problem I'm aware of is that the
encoding of the command-line arguments, when they arrive at w32proc.c,
is not known in advance, so this must be somehow fixed/changed,
otherwise we will be unable to re-encode them in UTF-16.  I believe
the comment in w32.c does mention that.
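
The difficulty with an unknown source encoding can be shown with a short Python sketch. This is an illustration only, not Emacs code: the helper `to_utf16le` is hypothetical, and cp936 and cp1252 are assumed example encodings. The same two bytes produce different UTF-16 output depending on which encoding they are assumed to be in, so a correct re-encoding needs that information.

```python
def to_utf16le(raw: bytes, assumed_encoding: str) -> bytes:
    """Decode raw under assumed_encoding, then re-encode it as UTF-16LE."""
    return raw.decode(assumed_encoding).encode("utf-16-le")


# Two bytes as they might arrive in an already-encoded argument string.
raw = b"\xba\xc3"

# Assumed to be cp936, the bytes are the single character U+597D.
print(raw.decode("cp936"))                 # 好
print(to_utf16le(raw, "cp936").hex())      # 7d59

# Assumed to be cp1252, the same bytes are two unrelated characters.
print(raw.decode("cp1252"))                # ºÃ
print(to_utf16le(raw, "cp1252").hex())     # ba00c300
```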




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#28180; Package emacs. (Mon, 28 Aug 2017 14:43:01 GMT)

Message #11 received at 28180 <at> debbugs.gnu.org:

From: Noam Postavsky <npostavs <at> users.sourceforge.net>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 28180 <at> debbugs.gnu.org
Subject: Re: bug#28180: [w32] Unicode characters in subprocess (git) arguments
 changed to space
Date: Mon, 28 Aug 2017 10:42:14 -0400
On Tue, Aug 22, 2017 at 10:54 AM, Eli Zaretskii <eliz <at> gnu.org> wrote:

> As to the "other places", the only problem I'm aware of is that the
> encoding of the command-line arguments, when they arrive at w32proc.c,
> is not known in advance, so this must be somehow fixed/changed,
> otherwise we will be unable to re-encode them in UTF-16.  I believe
> the comment in w32.c does mention that.

Just to understand the issue better, I applied the attached diff to
use CreateProcessW. It seemed to work, but only when I start emacs
from mingw's msys shell. When running from cmd.exe it still translates
to space.

Furthermore, when I run an unpatched Emacs from the msys shell, the
output of the test I posted above is different:

(call-process "git" nil '(t t) nil
              "-c" "alias.x=!x() { printf '%s' \"$1\" | od -tx1; }; x" "x" "(好)")
0000000 28 c3 a5 c2 a5 c2 bd 29
0000010

Do you have any idea what setting could cause this?
[CreateProcessW.diff (text/plain, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#28180; Package emacs. (Mon, 28 Aug 2017 17:17:02 GMT)

Message #14 received at 28180 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Noam Postavsky <npostavs <at> users.sourceforge.net>
Cc: 28180 <at> debbugs.gnu.org
Subject: Re: bug#28180: [w32] Unicode characters in subprocess (git) arguments
 changed to space
Date: Mon, 28 Aug 2017 20:15:46 +0300
> From: Noam Postavsky <npostavs <at> users.sourceforge.net>
> Date: Mon, 28 Aug 2017 10:42:14 -0400
> Cc: 28180 <at> debbugs.gnu.org
> 
> Just to understand the issue better, I applied the attached diff to
> use CreateProcessW.

I hope you realize that this is just a quick hack, which cannot work
in general, yes?  For starters, the command line is not a file name,
in general, so using filename_to_utf16 is inappropriate.  Also, I
think the environment variables need to be converted to UTF-16.
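
For reference, the layout that CreateProcessW expects for a Unicode environment block (when the CREATE_UNICODE_ENVIRONMENT flag is passed) is a run of NUL-terminated UTF-16 "name=value" strings followed by one extra NUL. The Python sketch below only builds such a block to show the layout. It is not how Emacs does or would do it, and the helper name `build_unicode_env_block` and the sample variables are made up for illustration.

```python
def build_unicode_env_block(env: dict) -> bytes:
    """Build a UTF-16LE environment block in the layout CreateProcessW expects."""
    # Windows documents that entries should be sorted by name, ignoring case;
    # str.upper() is used here as a simple approximation of that ordering.
    entries = sorted(env.items(), key=lambda item: item[0].upper())
    # Each "name=value" entry ends with NUL; one more NUL ends the block.
    block = "".join(f"{name}={value}\0" for name, value in entries) + "\0"
    return block.encode("utf-16-le")


block = build_unicode_env_block({"LANG": "en_US.UTF-8", "FILE": "好.txt"})

# Every character, including the terminators, occupies two bytes here.
print(block[-4:])   # b'\x00\x00\x00\x00'
```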

> It seemed to work, but only when I start emacs from mingw's msys
> shell. When running from cmd.exe it still translates to space.

What exactly did you run from cmd.exe?  What command?

> Furthermore, when I run an unpatched Emacs from the msys shell, the
> output of the test I posted above is different:
> 
> (call-process "git" nil '(t t) nil
>               "-c" "alias.x=!x() { printf '%s' \"$1\" | od -tx1; }; x" "x" "(好)")
> 0000000 28 c3 a5 c2 a5 c2 bd 29
> 0000010
> 
> Do you have any idea what setting could cause this?

Windows tries to interpret UTF-8 as something else?
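
The bytes in that dump are consistent with a double encoding: the UTF-8 bytes of the character being read as single-byte Latin-1 (cp1252 maps these three bytes identically) and the result being converted to UTF-8 again. The Python sketch below reproduces the byte sequence under that assumption. It does not show which component performed the misinterpretation, and the helper name `double_encode` is hypothetical.

```python
def double_encode(text: str, misread_as: str = "latin-1") -> bytes:
    """Encode text as UTF-8, misread the bytes as misread_as, re-encode as UTF-8."""
    utf8_bytes = text.encode("utf-8")
    misread_text = utf8_bytes.decode(misread_as)
    return misread_text.encode("utf-8")


result = double_encode("(好)")
print(" ".join(f"{b:02x}" for b in result))   # 28 c3 a5 c2 a5 c2 bd 29
```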




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#28180; Package emacs. (Tue, 29 Aug 2017 22:07:02 GMT)

Message #17 received at 28180 <at> debbugs.gnu.org:

From: Noam Postavsky <npostavs <at> users.sourceforge.net>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 28180 <at> debbugs.gnu.org
Subject: Re: bug#28180: [w32] Unicode characters in subprocess (git) arguments
 changed to space
Date: Tue, 29 Aug 2017 18:06:01 -0400
On Mon, Aug 28, 2017 at 10:42 AM, Noam Postavsky
<npostavs <at> users.sourceforge.net> wrote:

> Just to understand the issue better, I applied the attached diff to
> use CreateProcessW. It seemed to work, but only when I start emacs
> from mingw's msys shell. When running from cmd.exe it still translates
> to space.

Ugh, it was just a stupid mistake on my part: I changed the argument
encoding to utf8 in make-process instead of call-process. It
*appeared* to work because msys sets LANG to en_US.UTF-8.




This bug report was last modified 7 years and 290 days ago.
