GNU bug report logs - #12807
24.2; Emacs cannot edit file with funny Unicode characters in the file name on Windows


Packages: emacs, w32;

Reported by: Nils Gösche <cartan <at> cartan.de>

Date: Mon, 5 Nov 2012 21:02:01 UTC

Severity: wishlist

Merged with 7100, 15236

Found in versions 24.0.50, 24.2, 24.3.50

Done: Eli Zaretskii <eliz <at> gnu.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 12807 in the body.
You can then email your comments to 12807 AT debbugs.gnu.org in the normal way.



Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#12807; Package emacs. (Mon, 05 Nov 2012 21:02:01 GMT)

Acknowledgement sent to Nils Gösche <cartan <at> cartan.de>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Mon, 05 Nov 2012 21:02:01 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Nils Gösche <cartan <at> cartan.de>
To: bug-gnu-emacs <at> gnu.org
Subject: 24.2; Emacs cannot edit file with funny Unicode characters in the file name on Windows
Date: Mon, 05 Nov 2012 21:52:05 +0100

Dear Sirs,

I keep a bunch of text files on my Windows 7 desktop containing my thoughts
about the solutions of chess problems I am trying to solve. Now, one of these
problems was composed by a Russian. So, I named the file Кузовков_Lösung.txt:
First the name of the Russian composer, then the German word for »solution«.
However, when I tried to edit that file in Emacs, I only got error messages,
probably because of the funny Unicode characters in the file name. (See below
for the exact wording of the messages.)

Another file with only English/German characters in the name,
Thorton_Lösung.txt, does not cause any trouble at all (oh, but it seems I
misspelled the name, actually).

(BTW, Notepad does not have any problems editing the same file. So, it is
not some weird, OS-related problem, either).

Regards,
Nils Gösche

======== End of bug report======


In GNU Emacs 24.2.1 (i386-mingw-nt6.1.7601)
 of 2012-08-29 on MARVIN
Windowing system distributor `Microsoft Corp.', version 6.1.7601
Configured using:
 `configure --with-gcc (4.6) --cflags
 -ID:/devel/emacs/libs/libXpm-3.5.8/include
 -ID:/devel/emacs/libs/libXpm-3.5.8/src
 -ID:/devel/emacs/libs/libpng-dev_1.4.3-1/include
 -ID:/devel/emacs/libs/zlib-dev_1.2.5-2/include
 -ID:/devel/emacs/libs/giflib-4.1.4-1/include
 -ID:/devel/emacs/libs/jpeg-6b-4/include
 -ID:/devel/emacs/libs/tiff-3.8.2-1/include
 -ID:/devel/emacs/libs/gnutls-3.0.9/include'

Important settings:
  value of $LC_ALL: nil
  value of $LC_COLLATE: nil
  value of $LC_CTYPE: nil
  value of $LC_MESSAGES: en_US
  value of $LC_MONETARY: nil
  value of $LC_NUMERIC: nil
  value of $LC_TIME: nil
  value of $LANG: DEU
  value of $XMODIFIERS: nil
  locale-coding-system: cp1252
  default enable-multibyte-characters: t

Major mode: Lisp Interaction

Minor modes in effect:
  display-time-mode: t
  tooltip-mode: t
  mouse-wheel-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  line-number-mode: t

Recent input:
<down-mouse-1> <mouse-1> <down-mouse-1> <mouse-1> C-x 
C-f d e s k <tab> K u s o w k o w _ L ö s u n g . t 
x t C-g <help-echo> <down-mouse-1> <mouse-1> C-x C-f 
K u <backspace> <backspace> d e s k <tab> K u s o w 
k o s <backspace> w _ L ö s u n g . t x t <return> 
b l a r k <return> C-x C-s C-x k <return> C-z <down-mouse-1> 
<mouse-1> <down-mouse-1> <mouse-1> d x f o i p g d 
o j g <return> C-x C-s <down-mouse-1> <mouse-1> <down-mouse-1> 
<mouse-1> C-x k <return> y e s <return> C-z <down-mouse-1> 
<mouse-1> <return> C-x C-s <backspace> C-x C-s C-x 
k <return> C-z C-x k <return> C-z <down-mouse-1> <mouse-1> 
C-x k <return> C-z C-x C-f d e s k <tab> k <tab> <backspace> 
<tab> <tab> <down-mouse-1> <mouse-2> <end> F a r k 
. <return> C-x C-s C-x k <return> y e s <return> C-z 
M-x M-x C-g M-x r e p o r <tab> <return>

Recent messages:
Wrote c:/Users/cartan/Desktop/Thorton_Lösung.txt
Saving file c:/Users/cartan/Desktop/Thorton_Lösung.txt...
Wrote c:/Users/cartan/Desktop/Thorton_Lösung.txt
(New file) [2 times]
Making completion list...
Mark set
Saving file c:/Users/cartan/Desktop/????????_Lösung.txt...
basic-save-buffer-2: Opening output file: invalid argument, c:/Users/cartan/Desktop/????????_Lösung.txt
completing-read-default: Command attempted to use minibuffer while in minibuffer
Quit
Quit

Load-path shadows:
None found.

Features:
(shadow sort gnus-util mail-extr emacsbug message format-spec rfc822 mml
mml-sec mm-decode mm-bodies mm-encode mail-parse rfc2231 mailabbrev gmm-utils
mailheader sendmail regexp-opt rfc2047 rfc2045 ietf-drums mm-util mail-prsvr
mail-utils help-mode easymenu view eliserv doctor server time cl time-date
tooltip ediff-hook vc-hooks lisp-float-type mwheel dos-w32 disp-table ls-lisp
w32-win w32-vars tool-bar dnd fontset image fringe lisp-mode register page
menu-bar rfn-eshadow timer select scroll-bar mouse jit-lock font-lock syntax
facemenu font-core frame cham georgian utf-8-lang misc-lang vietnamese tibetan
thai tai-viet lao korean japanese hebrew greek romanian slovak czech european
ethiopic indian cyrillic chinese case-table epa-hook jka-cmpr-hook help simple
abbrev minibuffer loaddefs button faces cus-face files text-properties overlay
sha1 md5 base64 format env code-pages mule custom widget
hashtable-print-readable backquote make-network-process multi-tty emacs)




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#12807; Package emacs. (Mon, 05 Nov 2012 21:52:01 GMT)

Message #8 received at 12807 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Nils Gösche <cartan <at> cartan.de>
Cc: 12807 <at> debbugs.gnu.org
Subject: Re: bug#12807: 24.2; Emacs cannot edit file with funny Unicode characters in the file name on Windows
Date: Mon, 05 Nov 2012 23:47:45 +0200

> From: Nils Gösche <cartan <at> cartan.de>
> Date: Mon, 05 Nov 2012 21:52:05 +0100
> 
> I keep a bunch of text files on my Windows 7 desktop containing my thoughts
> about the solutions of chess problems I am trying to solve. Now, one of these
> problems was composed by a Russian. So, I named the file Кузовков_Lösung.txt:
> First the name of the Russian composer, then the German word for »solution«.
> However, when I tried to edit that file in Emacs, I only got error messages,
> probably because of the funny Unicode characters in the file name. (See below
> for the exact wording of the messages.)
> 
> Another file with only English/German characters in the name,
> Thorton_Lösung.txt, does not cause any trouble at all (oh, but it seems I
> misspelled the name, actually).

Emacs on Windows currently supports only file names that can be
expressed in the system codepage.  So unless someone writes the code
to support the Unicode APIs throughout, this limitation will remain
for some time to come.  Volunteers are welcome.
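
As a rough illustration of this limitation, the sketch below encodes both file names from the report into a single-byte codepage. It assumes the system codepage is cp1252, which is what locale-coding-system shows in the report; the helper names are hypothetical and this is not Emacs code. The Cyrillic letters have no cp1252 representation, so each one degrades to `?`, which matches the name in the "Opening output file: invalid argument" message.

```python
# Assumption: the system ("ANSI") codepage is cp1252, as reported by
# locale-coding-system in the bug report. Helper names are hypothetical.
SYSTEM_CODEPAGE = "cp1252"


def is_representable(file_name: str) -> bool:
    """Return True if every character of file_name exists in the codepage."""
    try:
        file_name.encode(SYSTEM_CODEPAGE)
        return True
    except UnicodeEncodeError:
        return False


def to_system_codepage(file_name: str) -> bytes:
    """Encode file_name for a codepage-based API, replacing each
    unrepresentable character with '?'."""
    return file_name.encode(SYSTEM_CODEPAGE, errors="replace")


good = "Thorton_Lösung.txt"
bad = "Кузовков_Lösung.txt"

print(is_representable(good))  # ö is part of cp1252
print(is_representable(bad))   # Cyrillic letters are not
print(to_system_codepage(bad).decode(SYSTEM_CODEPAGE))
```

Since `?` is not a valid character in a Windows file name, a lossy name like this cannot refer to the original file.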

> (BTW, Notepad does not have any problems editing the same file. So, it is
> not some weird, OS-related problem, either).

Yes, but the Explorer and the Notepad are about the only programs that
do.  Many others don't.  Emacs is one of them.

Sorry.





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#12807; Package emacs,w32. (Mon, 05 Nov 2012 22:10:01 GMT)

Message #11 received at 12807 <at> debbugs.gnu.org:

From: Nils Gösche <cartan <at> cartan.de>
To: "'Eli Zaretskii'" <eliz <at> gnu.org>
Cc: 12807 <at> debbugs.gnu.org
Subject: AW: bug#12807: 24.2; Emacs cannot edit file with funny Unicode characters in the file name on Windows
Date: Mon, 5 Nov 2012 23:05:57 +0100

You wrote:

> Emacs on Windows currently supports only file names that can be
> expressed in the system codepage.  So unless someone writes the code to
> support the Unicode APIs throughout, this limitation will remain for
> some time to come.  Volunteers are welcome.

Subtle hint noted. Ok ok, I'll look into it.

> > (BTW, Notepad does not have any problems editing the same file. So, it
> > is not some weird, OS-related problem, either).
> 
> Yes, but the Explorer and the Notepad are about the only programs that
> do.  Many others don't.  Emacs is one of them.

»About the only« is a bit of an exaggeration ;-)  Anything that is written
in C# or Java shouldn't have that problem; or Common Lisp, come to think of
it. But yeah, back in the old days, pretty much nobody felt like using
wchar_t instead of char everywhere in C. I didn't, either, back then. (Not
to mention that in the really old days, wchar_t didn't even exist ;-)

I'll see what I can do.

Regards,
-- 
Nils Gösche
Don't ask for whom the <Ctrl-G> tolls.






Merged 7100 12807. Request was from Glenn Morris <rgm <at> gnu.org> to control <at> debbugs.gnu.org. (Mon, 05 Nov 2012 22:23:02 GMT)

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#12807; Package emacs,w32. (Tue, 06 Nov 2012 04:02:02 GMT)

Message #16 received at 12807 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Nils Gösche <cartan <at> cartan.de>
Cc: 12807 <at> debbugs.gnu.org
Subject: Re: AW: bug#12807: 24.2; Emacs cannot edit file with funny Unicode characters in the file name on Windows
Date: Tue, 06 Nov 2012 05:57:45 +0200

> From: Nils Gösche <cartan <at> cartan.de>
> Cc: <12807 <at> debbugs.gnu.org>
> Date: Mon, 5 Nov 2012 23:05:57 +0100
> 
> > Yes, but the Explorer and the Notepad are about the only programs that
> > do.  Many others don't.  Emacs is one of them.
> 
> »About the only« is a bit of an exaggeration ;-)  Anything that is written
> in C# or Java shouldn't have that problem; or Common Lisp, come to think of
> it. But yeah, back in the old days, pretty much nobody felt like using
> wchar_t instead of char everywhere in C. I didn't, either, back then. (Not
> to mention that in the really old days, wchar_t didn't even exist ;-)

Using wchar_t is not going to solve the whole problem, unfortunately.
The problem is that the mainline Emacs code uses APIs that don't
accept wide characters.  Examples include 'stat', 'access', 'open',
'fopen', etc.  To fix the problem, we'd need to provide our own
implementation of these APIs that would accept a UTF-8 encoded file
name, then re-encode the file name in UTF-16, and call the Unicode
APIs as part of the implementation.  This is a large job.
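
The shape of such a wrapper can be sketched roughly as follows. This is only an illustration in Python, not the C code Emacs would need: `utf8_to_utf16`, `open_utf8`, and `fake_wide_open` are hypothetical names, and `fake_wide_open` merely stands in for a real Unicode API so that the sketch runs on any platform.

```python
from typing import Callable


def utf8_to_utf16(name_utf8: bytes) -> bytes:
    """Re-encode a UTF-8 file name as NUL-terminated UTF-16 (little-endian),
    the form that wide-character Windows APIs take."""
    return name_utf8.decode("utf-8").encode("utf-16-le") + b"\x00\x00"


def open_utf8(name_utf8: bytes, wide_open: Callable[[bytes], int]) -> int:
    """Hypothetical replacement for a narrow 'open': accept a UTF-8 name,
    convert it, and delegate to the wide-character variant."""
    return wide_open(utf8_to_utf16(name_utf8))


def fake_wide_open(name_utf16: bytes) -> int:
    """Stand-in for a Unicode API (an assumption for this sketch): returns
    the number of UTF-16 code units in the name, without the terminator."""
    return len(name_utf16) // 2 - 1


name = "Кузовков_Lösung.txt".encode("utf-8")
print(len(name))                        # bytes in the UTF-8 form
print(open_utf8(name, fake_wide_open))  # UTF-16 code units passed on
```

Every narrow call of this kind ('stat', 'access', 'open', 'fopen', and so on) would need its own wrapper along these lines, which is why the job is large.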





Merged 7100 12807 15236. Request was from Eli Zaretskii <eliz <at> gnu.org> to control <at> debbugs.gnu.org. (Sun, 01 Sep 2013 19:17:02 GMT)

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 10 Jan 2014 12:24:03 GMT)

This bug report was last modified 11 years and 214 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.