GNU bug report logs - #75207
29.4; Path conversion from native codepage to UTF-8 fails when Windows is set by default to UTF-8

Package: emacs;

Reported by: michal <at> 0lock.xyz

Date: Mon, 30 Dec 2024 18:30:02 UTC

Severity: wishlist

Found in version 29.4

Done: Eli Zaretskii <eliz <at> gnu.org>

Bug is archived. No further changes may be made.


Message #16 received at 75207 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Michał Lach <michal <at> 0lock.xyz>
Cc: 75207 <at> debbugs.gnu.org
Subject: Re: Fwd: bug#75207: 29.4;
 Path conversion from native codepage to UTF-8 fails when Windows is
 set by default to UTF-8
Date: Fri, 03 Jan 2025 15:23:48 +0200
> Date: Fri, 03 Jan 2025 11:49:34 +0000
> From: Michał Lach <michal <at> 0lock.xyz>
> Cc: Eli Zaretskii <eliz <at> gnu.org>
> 
> Forgot to CC the bug report mail.
> 
> > Begin forwarded message:
> > 
> > From: <michal <at> 0lock.xyz>
> > Subject: RE: bug#75207: 29.4; Path conversion from native codepage to UTF-8 fails when Windows is set by default to UTF-8
> > Date: 3 January 2025 at 02:48:53 CET
> > To: "'Eli Zaretskii'" <eliz <at> gnu.org>
> > Reply-To: <michal <at> 0lock.xyz>
> > 
> > M-: (getenv "ENU") -> nil
> > M-: current-locale-environment -> "ENG"
> > M-: w32-ansi-code-page -> 65001
> > M-: (default-value 'buffer-file-coding-system) -> iso-latin-1-dos

OK.  I think I see the problem (and it is not specific to UTF-8
codepage), but just to be sure, please show some more values:

  M-: w32-multibyte-code-page RET
  M-: locale-coding-system RET
  M-: file-name-coding-system RET
  M-: default-file-name-coding-system RET

> > Here is the repro.
> > 1. Add a directory whose path contains a diacritic character to your
> > "PATH" environment variable (ł in my case; other characters may not
> > trigger it)
> > 2. M-: exec-path returns gibberish
> > 
> > Here, "Michał" becomes "MichaÅ‚". You get a similar result if you call
> > MultiByteToWideChar with the Windows-1252 codepage on a UTF-8 path.

Emacs evidently assumes that PATH is encoded in the Windows-1252
codepage, and the question is why and where we err.  The additional
values I asked for above might help answer that question.
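
For illustration only, the mangling described in the report can be
reproduced outside Emacs by taking the UTF-8 bytes of the name and
decoding them as Windows-1252.  The sketch below uses Python rather
than the Windows API; the function name misdecode is hypothetical and
nothing here is taken from src/w32.c.

```python
# Illustrative sketch, not Emacs code: shows what happens when UTF-8
# bytes are interpreted as if they were Windows-1252.

def misdecode(text, wrong_codepage="cp1252"):
    """Encode TEXT as UTF-8, then decode the bytes with the wrong codepage."""
    return text.encode("utf-8").decode(wrong_codepage)

name = "Michał"
utf8_bytes = name.encode("utf-8")   # ł (U+0142) becomes the two bytes C5 82
mangled = misdecode(name)           # C5 -> Å, 82 -> ‚ in Windows-1252

# The damage is reversible as long as every byte maps to a character
# in the wrong codepage, which is the case for this name.
restored = mangled.encode("cp1252").decode("utf-8")

print(mangled)
print(restored)
```

The two bytes of ł are each read as a separate Windows-1252 character,
which matches the gibberish seen in exec-path.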

> > I've dug around and it looks like codepage_for_filenames (src/w32.c) at 
> > some point returns the Windows-1252 codepage.
> > This is then passed to MultiByteToWideChar() and the scenario that I 
> > described above happens.
> > I've checked this hypothesis with API Monitor and this is what actually 
> > happens; I can attach a trace if you find it useful.

Not necessary for now, thanks.

If I send you a C-level patch, are you able to build Emacs after
patching it, preferably the master branch of our Git repository?




This bug report was last modified 192 days ago.
