From unknown Fri Jun 20 07:15:58 2025
From: bug#52816 <52816@debbugs.gnu.org>
To: bug#52816 <52816@debbugs.gnu.org>
Subject: Status: 28.0.90; misspelling of windows-nt system-type in reset-language-environment, enquiring about default-process-coding-systems on MS-Windows
Reply-To: bug#52816 <52816@debbugs.gnu.org>
Date: Fri, 20 Jun 2025 14:15:58 +0000

retitle 52816 28.0.90; misspelling of windows-nt system-type in reset-language-environment, enquiring about default-process-coding-systems on MS-Windows
reassign 52816 emacs
submitter 52816 Ioannis Kappas
severity 52816 normal
thanks

From debbugs-submit-bounces@debbugs.gnu.org Mon Dec 27 05:52:32 2021
From: Ioannis Kappas
Date: Mon, 27 Dec 2021 10:52:20 +0000
Subject: 28.0.90; misspelling of windows-nt system-type in reset-language-environment, enquiring about default-process-coding-systems on MS-Windows
To: bug-gnu-emacs@gnu.org
Cc: rgm@gnu.org

Hi,

there appears to have been a symbol spelling mistake in a recent
commit a4bfb0bc5c14e002c0926fc320aeb4a3fc261447 to "Default Emacs to
UTF-8 instead of Latin-1". The `window-nt' symbol is used instead of
`windows-nt' when checking for membership in `system-type`.

It does not appear to be of much consequence though, since both
`default-file-name-coding-system' and `default-process-coding-system`
affected by it appear to be overwritten later on anyway at runtime.

A fix could be

diff --git a/lisp/international/mule-cmds.el b/lisp/international/mule-cmds.el
index a0a6557c95..2b52d4bf86 100644
--- a/lisp/international/mule-cmds.el
+++ b/lisp/international/mule-cmds.el
@@ -1873,7 +1873,7 @@ reset-language-environment
   (set-default-coding-systems nil)
   (setq default-sendmail-coding-system 'utf-8)
   (setq default-file-name-coding-system (if (memq system-type
-                                                  '(window-nt ms-dos))
+                                                  '(windows-nt ms-dos))
                                             'iso-latin-1-unix 'utf-8-unix))
   ;; Preserve eol-type from existing default-process-coding-systems.
@@ -1892,9 +1892,9 @@ reset-language-environment
           (condition-case nil
               (coding-system-change-text-conversion
                (cdr default-process-coding-system)
-               (if (memq system-type '(window-nt ms-dos)) 'iso-latin-1 'utf-8))
+               (if (memq system-type '(windows-nt ms-dos)) 'iso-latin-1 'utf-8))
             (coding-system-error
-             (if (memq system-type '(window-nt ms-dos)) 'iso-latin-1 'utf-8)))))
+             (if (memq system-type '(windows-nt ms-dos)) 'iso-latin-1 'utf-8)))))
     (setq default-process-coding-system
           (cons output-coding input-coding)))

I happened to notice this while looking at the default data encoding
behaviour when sending data to a sub-process using
`call-process-region' on MS-Windows, which I found to differ out of
the box from other OSes.

When I was sending a UTF-8 region to a sub-process, I was expecting
the data reaching the sub-process to have that encoding. That is not
the default behaviour on MS-Windows, which I found confusing: the data
are most likely to arrive encoded as iso-8859-1. On GNU/Linux, though,
they are most likely to arrive encoded as UTF-8.

I was expecting at first that the encoding of the data sent to the
sub-process would be determined by the region's codepage, but it
rather seems to be determined by the `default-process-coding-system'
(or by a particular sub-process' `process-coding-system-alist', when
set). This is all fine, and is documented as such.

I think the expectation at this time and age is that communication
between processes should be in Unicode by default, so as to allow
multilingual sets to be passed on between them.

`flycheck' is an example of a utility which is using `call-process' to
marshal buffers to/from checker sub-processes. Sending multilingual
data to the checkers on MS-Windows is likely to cause failure due to
the default proc encodings being `undecided-unix', and thus encoded as
iso-8859-1 dropping the unicode chars.
On GNU/Linux the same operation
is most likely to succeed, because the default encoding is most likely
to be set to `utf-8-unix', courtesy of the LANG env variable being most
likely set to a UTF-8 codepage such as `C.UTF-8', and picked up by the
locale logic in Emacs.

The default process coding system is forced in
lisp/w32-fns.el:w32-set-default-process-coding-system:

  ;; Most programs on Windows will accept Unix line endings on input
  ;; (and some programs ported from Unix require it) but most will
  ;; produce DOS line endings on output.
  (setq default-process-coding-system
        '(undecided-dos . undecided-unix))

Is now perhaps a good time, now that UTF-8 adoption is almost
universal, to change the default from undecided to utf-8 and thus
align it (more or less) with the most likely out of the box encoding
behaviour on GNU/Linux? Of course, a user can set the LANG env
variable on MS-Windows to a similar codepage as in Linux, but it is
rather unlikely that a user would ever set this on Windows.

Also, should the eol type be set to -dos on the input encoding? The
comment suggests that this was done because most programs back then
were requiring unix eols, but I don't believe that this is the case
any more.

A final note, the documentation under `Default Coding Systems` gives a
warning that `undecided' coding systems do not work reliably with
asynchronous sub-process output, perhaps this is an additional
argument why we should move away from the undecided default above?

https://www.gnu.org/software/emacs/manual/html_node/elisp/Default-Coding-Systems.html

"""
Warning: Coding systems such as undecided, which determine the coding
system from the data, do not work entirely reliably with asynchronous
subprocess output. This is because Emacs handles asynchronous
subprocess output in batches, as it arrives.
If the coding system leaves
the character code conversion unspecified, or leaves the end-of-line
conversion unspecified, Emacs must try to detect the proper conversion
from one batch at a time, and this does not always work.
"""

Thanks!

In GNU Emacs 28.0.90 (build 1, x86_64-w64-mingw32)
 of 2021-12-26
Repository revision: 89a82182cbca0caa19f5b9463629918b7131ef0c
Repository branch: emacs-28
Windowing system distributor 'Microsoft Corp.', version 10
System Description: Microsoft Windows 10

From debbugs-submit-bounces@debbugs.gnu.org Mon Dec 27 12:26:28 2021
Date: Mon, 27 Dec 2021 19:26:17 +0200
Message-Id: <83k0fqm01y.fsf@gnu.org>
From: Eli Zaretskii
To: Ioannis Kappas
In-Reply-To: (message from Ioannis Kappas on Mon, 27 Dec 2021 10:52:20 +0000)
Subject: Re: bug#52816: 28.0.90; misspelling of windows-nt system-type in reset-language-environment, enquiring about default-process-coding-systems on MS-Windows
Cc: rgm@gnu.org, 52816-done@debbugs.gnu.org

> From: Ioannis Kappas
> Date: Mon, 27 Dec 2021 10:52:20 +0000
> Cc: rgm@gnu.org
>
> there appears to have been a symbol spelling mistake in a recent
> commit a4bfb0bc5c14e002c0926fc320aeb4a3fc261447 to "Default Emacs to
> UTF-8 instead of Latin-1". The `window-nt' symbol is used instead of
> `windows-nt' when checking for membership in `system-type`.

Thanks, fixed.

> It does not appear to be of much consequence though, since both
> `default-file-name-coding-system' and `default-process-coding-system`
> affected by it appear to be overwritten later on anyway at runtime.

Yes, because fortunately those typos are in
reset-language-environment, which is immediately followed by the likes
of set-language-environment. IOW, those are defaults that are never
seen in real usage.

> I think the expectation at this time and age is that communication
> between processes should be in Unicode by default, so as to allow
> multilingual sets to be passed on between them.

We still cannot use UTF-8 by default for process I/O on MS-Windows, as
UTF-8 is still not a first-class citizen there.
Latest versions of Windows
support it better, but not as well as other (fixed-length) encodings,
and AFAIK even that incomplete support requires the user to turn on an
optional feature that is meant for developers.

> `flycheck' is an example of a utility which is using `call-process' to
> marshal buffers to/from checker sub-processes. Sending multilingual
> data to the checkers on MS-Windows is likely to cause failure due to
> the default proc encodings being `undecided-unix', and thus encoded as
> iso-8859-1 dropping the unicode chars. On GNU/Linux the same operation
> is most likely to succeed, because the default encoding is most likely
> to be set to `utf-8-unix', courtesy of the LANG env variable being most
> likely set to a UTF-8 codepage such as `C.UTF-8', and picked up by the
> locale logic in Emacs.

If the checker sub-processes used with flycheck indeed support UTF-8
I/O (I sincerely doubt that, unless you are using Cygwin or MSYS
programs, not native MS-Windows programs), then your customizations of
flycheck should ensure it uses UTF-8 for communicating with those
programs. We have process-coding-system-alist for that purpose.

> Also, should the eol type be set to -dos on the input encoding? The
> comment suggests that this was done because most programs back then
> were requiring unix eols, but I don't believe that this is the case
> any more.

What comes _from_ a subprocess on Windows can have DOS-style CRLF
EOLs, so using -dos in that case makes sure we decode the EOLs
correctly, and don't leave ^M characters in the text that ends up in
Emacs buffers. What goes _to_ a subprocess can have Unix EOLs because
MS-Windows programs don't mind if they get LF without a CR.

> A final note, the documentation under `Default Coding Systems` gives a
> warning that `undecided' coding systems do not work reliably with
> asynchronous sub-process output, perhaps this is an additional
> argument why we should move away from the undecided default above?

No, because the problems with UTF-8 on Windows are worse.

I'm closing this bug, as the problem you reported is now fixed on the
emacs-28 branch.

From unknown Fri Jun 20 07:15:58 2025
To: internal_control@debbugs.gnu.org
From: Debbugs Internal Request
Subject: Internal Control
Message-Id: bug archived.
Date: Tue, 25 Jan 2022 12:24:04 +0000

# This is a fake control message.
#
# The action:
# bug archived.
thanks
# This fakemail brought to you by your local debbugs
# administrator
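
The reply above points to process-coding-system-alist as the way to
make a particular checker use UTF-8. A minimal sketch of such a
customization follows. It assumes a hypothetical checker program named
"mychecker" that is known to read and write UTF-8; the program name is
a placeholder and does not come from this report. The EOL variants
follow the reasoning given in the reply (CRLF may come from the
subprocess, LF is sent to it).

```elisp
;; Hypothetical init-file fragment; "mychecker" is a placeholder name.
;; Each entry is (PATTERN . VAL), where PATTERN is a regexp matched
;; against the program name.  When VAL is a cons, its car decodes what
;; Emacs reads from the process and its cdr encodes what Emacs sends.
(add-to-list 'process-coding-system-alist
             '("mychecker" . (utf-8-dos . utf-8-unix)))
```

For a one-off call, let-binding `coding-system-for-read' and
`coding-system-for-write' around the call overrides both this alist
and `default-process-coding-system'.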