Package: emacs;
From: Eli Zaretskii <eliz <at> gnu.org>
To: Ke Wu <ellpih <at> zohomail.jp>
Cc: 71472 <at> debbugs.gnu.org
Subject: bug#71472: [PATCH] Add pty support by using ConPTY on Windows
Date: Tue, 11 Jun 2024 10:27:04 +0300

[Please use Reply All to reply, to keep the bug tracker CC'ed.]

> Date: Tue, 11 Jun 2024 12:34:48 +0900
> From: Ke Wu <ellpih <at> zohomail.jp>
>
> > If we must use UTF-8 as the only encoding to talk to sub-processes via
> > ConPTY, that makes the number of applications that can be used this
> > way very small, since most programs we are used to run as
> > subprocesses, in particularly ports of GNU software like GCC, GDB,
> > Grep, Find, and many others, cannot reliably talk to Emacs in UTF-8
> > encoding on MS-Windows.
>
> The statement is not so accurate. On Emacs side, UTF-8 is assumed due
> to the limitation of ConPTY (it would communicate with the console only in
> UTF-8). However, on the subprocesses side, ConPTY would respect its
> codepage and translate it into UTF-8 when sending to the console. So
> we can make these subprocesses run in the codepage other than
> 65001(UTF-8).

This is inaccurate: ConPTY always assumes the process running on the
other side of the connection uses the system codepage.  If the
subprocess expects some other encoding, ConPTY will not know that, and
Emacs has no way of telling ConPTY to use a different encoding.  This
is the essence of the issue I filed with them, and they basically told
me that what ConPTY does is "by design".

This is not an academic issue: some very important programs we invoke
from Emacs need us to talk to them in encoding different from the
system codepage.  A notable example is Git, which wants UTF-8 (it can
support other encodings, but that is not recommended, and Emacs
doesn't really support that well on Windows).

> I am not very familiar with these GNU software ports :(
> Please let me know if there will be problems with ConPTY translating from
> UTF-8 to other codepages.

See above.  There's no way for Emacs to set that up, except when the
"other codepage" is the system codepage.
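
To make the disagreement above concrete, here is a minimal sketch of
an "interpret under an assumed codepage, then convert to UTF-8" step.
It is not ConPTY code and uses no Windows API: the two "codepages" are
toy tables that model only the byte 0xE0, and every name in it
(toy_codepage, toy_translate_to_utf8, etc.) is hypothetical.  It only
shows that the same byte yields different UTF-8 output depending on
which codepage the converter assumes, so a wrong assumption garbles
the text.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, drastically reduced "codepages": only byte 0xE0 is
   modelled.  In codepage 1255 it is HEBREW LETTER ALEF (U+05D0), in
   codepage 1252 it is LATIN SMALL LETTER A WITH GRAVE (U+00E0).  */
enum toy_codepage { TOY_CP1252, TOY_CP1255 };

/* Interpret one byte as a character of the assumed codepage.  */
static unsigned int
toy_decode_byte (enum toy_codepage assumed, unsigned char b)
{
  if (b < 0x80)
    return b;                   /* ASCII is identical in both */
  if (b == 0xE0)
    return assumed == TOY_CP1255 ? 0x05D0 : 0x00E0;
  return 0xFFFD;                /* anything else: not modelled here */
}

/* Encode a BMP code point as UTF-8; return the number of bytes.  */
static size_t
toy_utf8_encode (unsigned int c, unsigned char *out)
{
  if (c < 0x80)
    {
      out[0] = (unsigned char) c;
      return 1;
    }
  if (c < 0x800)
    {
      out[0] = (unsigned char) (0xC0 | (c >> 6));
      out[1] = (unsigned char) (0x80 | (c & 0x3F));
      return 2;
    }
  out[0] = (unsigned char) (0xE0 | (c >> 12));
  out[1] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
  out[2] = (unsigned char) (0x80 | (c & 0x3F));
  return 3;
}

/* Decode LEN bytes of SRC under the ASSUMED codepage and re-encode
   them as UTF-8 into OUT, which must have room for 3 * LEN bytes.
   Returns the number of bytes written.  */
size_t
toy_translate_to_utf8 (enum toy_codepage assumed,
                       const unsigned char *src, size_t len,
                       unsigned char *out)
{
  size_t n = 0;

  assert (src != NULL && out != NULL);
  for (size_t i = 0; i < len; i++)
    n += toy_utf8_encode (toy_decode_byte (assumed, src[i]), out + n);
  return n;
}
```

In this toy model, the single byte 0xE0 becomes the UTF-8 sequence
D7 90 when codepage 1255 is assumed and C3 A0 when codepage 1252 is
assumed; the sending program has no say in which one is picked.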

> > https://github.com/microsoft/terminal/issues/9174
>
> I think a possible solution to this issue is to use a wrapper program to
> set the codepage for the applications that do not call `SetConsoleOutputCP`.
> As a proof of concept, the following code snippet uses cmdproxy.exe to
> change the codepage to 1255. Please replace the cmdproxy.exe path in the
> snippet.
>
> (progn
>   (set-buffer
>    (apply #'make-term
>           "terminal"
>           "C:/Users/oracl/Documents/Programs/emacs-master/nt/cmdproxy.exe"
>           nil
>           '("-c" "chcp 1255 && call cmd")))
>   (term-char-mode)
>   (pop-to-buffer-same-window "*terminal*"))
>
> The codepage can be verified by either using `chcp` in the newly created cmd process.
> Also, the following hack can be applied to make the created conhost.exe visible.
> Therefore, the codepage can be directly verified by viewing the properties of the
> conhost.exe window.
>
> --- a/src/w32.c
> +++ b/src/w32.c
> @@ -11208,7 +11208,7 @@ make_console_with_pipe (ptrdiff_t nargs, Lisp_Object * args, const int * fds)
>
>    command_new = CALLN (Flist,
>                         build_string ("conhost.exe"),
> -                       build_string ("--headless"),
> +                       /* build_string ("--headless"), */
>                         build_string ("--feature"),
>                         build_string ("pty"));
>    if (!NILP (width))
>
> Therefore, we can have subprocesses run in codepage other than 65001 or the OEM default
> codepage. And as a console program, Emacs talks in UTF-8. It may be feasible if we add a
> `:coding` to function `term`, which builds up a wrapper to change the code page before the
> real program starts.

cmdproxy is only used when invoking programs via the shell.  But Emacs
also invokes programs directly (call-process etc.), in which case
cmdproxy (or any other kind of wrapper) will be very problematic at
best, if not impossible.  See below about the complications this
causes wrt quoting of command-line arguments, for example.

Please keep in mind how Emacs arranges to use correct encoding when
invoking other programs: we have data structures
(process-coding-system-alist etc.) which define the correct encoding
by program name, and we also have variables (coding-system-for-read
etc.) that can be bound to override those defaults temporarily.  The
encoding is applied separately to the program's command-line arguments
and to the stuff we write and read to and from the process.

How can all this work reliably with ConPTY, even if the wrapper trick
could sometimes work?  Specifically:

  . how do we control encoding of command-line arguments? most
    programs running on Windows cannot handle UTF-8 encoded command
    lines
  . what if the encoding we need doesn't have a corresponding Windows
    codepage (which means chcp will not work)?
  . how can we handle the eol-conversion part of the encoding (some
    programs _must_ be fed with Unix EOLs)?

Also please note that using a wrapper adds another layer of
interpreting command-line arguments, which might break some
complicated cases that use fancy quoting of special characters.  Any
wrapper we provide will be compiled with MinGW, so it will use the
MinGW startup code to process quoting.  But the program the wrapper
runs might not be a MinGW program, so it could use different ways of
processing quotes.  The simplest example of such a combination is
cmd.exe itself: its quoting rules are very different from what MinGW
uses.  This will definitely break some cases.  For example, Git uses
the '^' character for special purposes, and some Windows styles of
quoting interpret '^' as a quote character -- this could easily break
Emacs commands that invoke Git.

If someone can figure out how to do all this stuff with ConPTY, then
okay, we could use it.  But it is not a trivial problem, not at all.
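
The '^' hazard described above can be sketched in a few lines.  What
follows is a simplified model of one cmd.exe rule only, namely that an
unquoted '^' is dropped and the character after it is taken literally,
while a '^' inside double quotes is an ordinary character.  It is an
assumption-laden illustration, not cmd.exe's or MinGW's actual parser
(percent expansion, delayed expansion and line continuation are
ignored), and the function name toy_cmd_unescape_carets is
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Copy IN to OUT (at most OUTSIZE - 1 characters plus a terminating
   NUL), applying a simplified version of cmd.exe's caret rule:
   outside double quotes '^' is removed and the next character is
   copied literally; inside double quotes '^' is kept as-is.
   Returns the length of the result.  */
size_t
toy_cmd_unescape_carets (const char *in, char *out, size_t outsize)
{
  int in_quotes = 0;
  size_t n = 0;

  assert (in != NULL && out != NULL && outsize > 0);
  for (const char *p = in; *p != '\0' && n + 1 < outsize; p++)
    {
      if (*p == '"')
        in_quotes = !in_quotes;   /* the quote itself is still copied */
      else if (*p == '^' && !in_quotes)
        {
          p++;                    /* drop the caret ...  */
          if (*p == '\0')
            break;                /* ... a trailing caret just vanishes */
        }
      out[n++] = *p;              /* ... and keep what follows literally */
    }
  out[n] = '\0';
  return n;
}
```

Under this model, the command line `git log HEAD^` comes out as
`git log HEAD`, i.e. a different revision than the caller asked for,
whereas `git log "HEAD^"` passes through unchanged.  A caller that
quotes for one layer and is then run through the other gets exactly
this kind of silent change.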

The way ConPTY was designed is the way Windows works everywhere else:
it doesn't allow applications to communicate with raw bytestreams
without interpreting; instead, Windows _interprets_ the bytestreams as
characters encoded in the encoding it assumes for the source, and then
converts those characters to the encoding of the destination.  This
basic design principle is built into every part of Windows APIs.  For
example, a program whose 'main' function is declared as accepting
wchar_t (i.e. UTF-16) command-line arguments will magically have the
command-line arguments converted to UTF-16, even if the calling
process uses plain ASCII.  ConPTY uses the same design principles, so
it is inherently unable to pass through raw bytes without interpreting
them.  And without that, we cannot easily implement the way Emacs
expects this stuff to work, because Emacs assumes the encoding to be a
private contract between Emacs and the program it calls, with nothing
in-between interfering.

I hope I explained some of the issues with ConPTY, and why we cannot
install its support without some reasonably reliable solutions for
those problematic aspects.

Thanks.