GNU bug report logs - #2940
[macOS] C-s in dired fails to find umlauts in filenames (due to wrong file-name-coding-system)

Package: emacs;

Reported by: Markus Triska <markus.triska <at> gmx.at>

Date: Thu, 9 Apr 2009 16:35:04 UTC

Severity: normal

To reply to this bug, email your comments to 2940 AT debbugs.gnu.org.


Report forwarded to bug-submit-list <at> lists.donarmstrong.com, Emacs Bugs <bug-gnu-emacs <at> gnu.org>:
bug#2940; Package emacs. (Thu, 09 Apr 2009 16:35:04 GMT) Full text and rfc822 format available.

Acknowledgement sent to Markus Triska <markus.triska <at> gmx.at>:
New bug report received and forwarded. Copy sent to Emacs Bugs <bug-gnu-emacs <at> gnu.org>. (Thu, 09 Apr 2009 16:35:04 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> emacsbugs.donarmstrong.com (full text, mbox):

From: Markus Triska <markus.triska <at> gmx.at>
To: emacs-pretest-bug <at> gnu.org
Subject: 23.0.92; C-s in dired fails to find files with umlauts
Date: Thu, 09 Apr 2009 18:28:48 +0200

With ~/töst.txt existing, when I do:

   $ emacs -Q ~/

and press:

   C-\ german-postfix RET C-s oe RET

to search for "ö" in the dired buffer (the input method correctly
converts the entered "oe" to "ö" in the minibuffer), I get:

   Failing wrapped I-search [DE<]: ö

C-u C-x = on the "ö" in the dired buffer yields:

             character: o (111, #o157, #x6f)
     preferred charset: ascii (ASCII (ISO646 IRV))
            code point: 0x6F
                syntax: w 	which means: word
              category: .:Base, a:ASCII, l:Latin, r:Roman
           buffer code: #x6F
             file code: #x6F (encoded by coding system utf-8-unix)
               display: composed to form "ö" (see below)

     Composed with the following character(s) "̈" using this font:
       xft:-unknown-Cochin-normal-normal-normal-*-20-*-*-*-*-0-iso10646-1
     by these glyphs:
       [0 1 111 82 11 1 10 8 0 nil]
       [0 1 776 235 6 0 6 12 -9 [-9 -1 0]]

     Character code properties: customize what to show
       name: LATIN SMALL LETTER O
       general-category: Ll (Letter, Lowercase)

     There are text properties here:
       dired-filename       t
       fontified            t
       help-echo            "mouse-2: visit this file in other window"
       mouse-face           highlight

C-u C-x = on the first "t" in "töst.txt" yields:

             character: t (116, #o164, #x74)
     preferred charset: ascii (ASCII (ISO646 IRV))
            code point: 0x74
                syntax: w 	which means: word
              category: .:Base, a:ASCII, l:Latin, r:Roman
           buffer code: #x74
             file code: #x74 (encoded by coding system utf-8-unix)
               display: by this font (glyph code)
         xft:-bitstream-Bitstream Vera Sans Mono-normal-normal-normal-*-20-*-*-*-m-0-iso10646-1 (#x57)

     Character code properties: customize what to show
       name: LATIN SMALL LETTER T
       general-category: Ll (Letter, Lowercase)

     There are text properties here:
       dired-filename       t
       fontified            t
       help-echo            "mouse-2: visit this file in other window"
       mouse-face           highlight


In GNU Emacs 23.0.92.3 (i386-apple-darwin9.6.1, GTK+ Version 2.14.7)
 of 2009-04-09 on mt-imac.local
Windowing system distributor `The X.Org Foundation', version 11.0.10402000
configured using `configure  '--with-tiff=no''

Important settings:
  value of $LC_ALL: nil
  value of $LC_COLLATE: nil
  value of $LC_CTYPE: nil
  value of $LC_MESSAGES: nil
  value of $LC_MONETARY: nil
  value of $LC_NUMERIC: nil
  value of $LC_TIME: nil
  value of $LANG: en.UTF-8
  value of $XMODIFIERS: nil
  locale-coding-system: utf-8-unix
  default-enable-multibyte-characters: t





Information forwarded to bug-submit-list <at> lists.donarmstrong.com, Emacs Bugs <bug-gnu-emacs <at> gnu.org>:
bug#2940; Package emacs. (Thu, 09 Apr 2009 17:30:05 GMT) Full text and rfc822 format available.

Acknowledgement sent to Eli Zaretskii <eliz <at> gnu.org>:
Extra info received and forwarded to list. Copy sent to Emacs Bugs <bug-gnu-emacs <at> gnu.org>. (Thu, 09 Apr 2009 17:30:06 GMT) Full text and rfc822 format available.

Message #10 received at 2940 <at> emacsbugs.donarmstrong.com (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Markus Triska <markus.triska <at> gmx.at>, 2940 <at> debbugs.gnu.org
Subject: Re: bug#2940: 23.0.92; C-s in dired fails to find files with umlauts
Date: Thu, 09 Apr 2009 20:22:06 +0300

> From: Markus Triska <markus.triska <at> gmx.at>
> Date: Thu, 09 Apr 2009 18:28:48 +0200
> Cc: 
> 
> 
> With ~/töst.txt existing, when I do:
> 
>    $ emacs -Q ~/
> 
> and press:
> 
>    C-\ german-postfix RET C-s oe RET
> 
> to search for "ö" in the dired buffer (the input method correctly
> converts the entered "oe" to "ö" in the minibuffer), I get:
> 
>    Failing wrapped I-search [DE<]: ö

What's your value of file-name-coding-system?  Does it help to say

  C-x RET c utf-8 RET C-x d

instead of just "C-x d"?




Information forwarded to bug-submit-list <at> lists.donarmstrong.com, Emacs Bugs <bug-gnu-emacs <at> gnu.org>:
bug#2940; Package emacs. (Thu, 09 Apr 2009 17:40:07 GMT) Full text and rfc822 format available.

Acknowledgement sent to Markus Triska <markus.triska <at> gmx.at>:
Extra info received and forwarded to list. Copy sent to Emacs Bugs <bug-gnu-emacs <at> gnu.org>. (Thu, 09 Apr 2009 17:40:07 GMT) Full text and rfc822 format available.

Message #15 received at 2940 <at> emacsbugs.donarmstrong.com (full text, mbox):

From: Markus Triska <markus.triska <at> gmx.at>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 2940 <at> debbugs.gnu.org
Subject: Re: bug#2940: 23.0.92; C-s in dired fails to find files with umlauts
Date: Thu, 09 Apr 2009 19:33:37 +0200

Eli Zaretskii <eliz <at> gnu.org> writes:

> What's your value of file-name-coding-system?

It is nil, and default-file-name-coding-system is 'utf-8.

> Does it help to say
>
>   C-x RET c utf-8 RET C-x d
>
> instead of just "C-x d"?

No, unfortunately not. Also for C-s it does not seem to make a
difference. When I enter an "ö" in *scratch*, C-u C-x = on it says:

              character: ö (246, #o366, #xf6)
      preferred charset: unicode (Unicode (ISO10646))
             code point: 0xF6
                 syntax: w 	which means: word
               category: .:Base, j:Japanese, l:Latin
               to input: type "oe" with german-postfix
            buffer code: #xC3 #xB6
              file code: #xC3 #xB6 (encoded by coding system utf-8-unix)
                display: by this font (glyph code)
          xft:-bitstream-Bitstream Vera Sans Mono-normal-normal-normal-*-20-*-*-*-m-0-iso10646-1 (#x7C)

      Character code properties: customize what to show
        name: LATIN SMALL LETTER O WITH DIAERESIS
        old-name: LATIN SMALL LETTER O DIAERESIS
        general-category: Ll (Letter, Lowercase)
        decomposition: (111 776) ('o' '̈')

      There are text properties here:
        fontified            t

This "ö" is thus also rendered with the expected font, in contrast to
the one in dired.
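
The two C-u C-x = dumps above can be compared directly in Lisp. What
follows is only a minimal sketch, assuming an Emacs in which the
ucs-normalize library is available; the variable names are purely
illustrative. It shows why a literal search for the typed character
cannot match the character sequence in the dired buffer, and that
normalizing either side makes them agree.

```elisp
(require 'ucs-normalize)

(let ((typed  "\u00f6")    ; precomposed U+00F6 (246), as entered in *scratch*
      (listed "o\u0308"))  ; "o" + U+0308 (776), as found in the dired buffer
  (list
   ;; Different character sequences, so a literal search cannot match.
   (string= typed listed)
   ;; Composing the dired form (NFC) gives the typed form.
   (string= typed (ucs-normalize-NFC-string listed))
   ;; Decomposing the typed form (NFD) gives the dired form.
   (string= (ucs-normalize-NFD-string typed) listed)))
;; expected result: (nil t t)
```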




Information forwarded to bug-submit-list <at> lists.donarmstrong.com, Emacs Bugs <bug-gnu-emacs <at> gnu.org>:
bug#2940; Package emacs. (Fri, 10 Apr 2009 02:15:03 GMT) Full text and rfc822 format available.

Acknowledgement sent to Miles Bader <miles <at> gnu.org>:
Extra info received and forwarded to list. Copy sent to Emacs Bugs <bug-gnu-emacs <at> gnu.org>. (Fri, 10 Apr 2009 02:15:03 GMT) Full text and rfc822 format available.

Message #20 received at submit <at> emacsbugs.donarmstrong.com (full text, mbox):

From: Miles Bader <miles <at> gnu.org>
To: Markus Triska <markus.triska <at> gmx.at>
Cc: 2940 <at> debbugs.gnu.org, emacs-pretest-bug <at> gnu.org
Subject: Re: bug#2940: 23.0.92; C-s in dired fails to find files with umlauts
Date: Fri, 10 Apr 2009 11:08:40 +0900

It looks to me like the problem is that you're on a mac, and [some?] mac
filesystems silently convert accented characters in filenames to
"composed form", which is different than the pre-composed characters
people tend to use.

Perhaps the new "ucs-normalize" code (which should be added soon I
think) would help:

> The attached is a Unicode normalization tool contributed by
> Kawabata-san.  It performs all the Unicode normalization
> NFC/NFD/NFKD/NFKC, and provides a coding system utf-8-hfs
> that is suitable to be used for Mac OS 8.1's file names.

[Search for "normalize.el" on recent emacs-devel messages]

-Miles

-- 
A zen-buddhist walked into a pizza shop and
said, "Make me one with everything."




Information forwarded to bug-submit-list <at> lists.donarmstrong.com, Emacs Bugs <bug-gnu-emacs <at> gnu.org>:
bug#2940; Package emacs. (Fri, 10 Apr 2009 06:00:03 GMT) Full text and rfc822 format available.

Acknowledgement sent to Stefan Monnier <monnier <at> iro.umontreal.ca>:
Extra info received and forwarded to list. Copy sent to Emacs Bugs <bug-gnu-emacs <at> gnu.org>. (Fri, 10 Apr 2009 06:00:03 GMT) Full text and rfc822 format available.

Message #30 received at submit <at> emacsbugs.donarmstrong.com (full text, mbox):

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Miles Bader <miles <at> gnu.org>
Cc: 2940 <at> debbugs.gnu.org, Markus Triska <markus.triska <at> gmx.at>,
        emacs-pretest-bug <at> gnu.org
Subject: Re: bug#2940: 23.0.92; C-s in dired fails to find files with umlauts
Date: Fri, 10 Apr 2009 01:53:19 -0400

> It looks to me like the problem is that you're on a mac, and [some?]
> mac filesystems silently convert accented characters in filenames to
> "composed form", which is different than the pre-composed characters
> people tend to use.

Indeed, that looks like the culprit (IIUC it's not done by the
filesystem, but by the OS itself before it passes the file names to the
filesystem, so it applies to all filesystems).

> Perhaps the new "ucs-normalize" code (which should be added soon I
> think) would help:

Rather than "perhaps", it should say "supposedly".   Please try out this
new ucs-normalize package and tell us if it solves your problem and/or
suffers from other problems.  It likely won't make it for Emacs-23.1 but
should be included in Emacs-23.2.


        Stefan




Information forwarded to bug-submit-list <at> lists.donarmstrong.com, Emacs Bugs <bug-gnu-emacs <at> gnu.org>:
bug#2940; Package emacs. (Fri, 10 Apr 2009 10:55:04 GMT) Full text and rfc822 format available.

Acknowledgement sent to Markus Triska <markus.triska <at> gmx.at>:
Extra info received and forwarded to list. Copy sent to Emacs Bugs <bug-gnu-emacs <at> gnu.org>. (Fri, 10 Apr 2009 10:55:04 GMT) Full text and rfc822 format available.

Message #40 received at submit <at> emacsbugs.donarmstrong.com (full text, mbox):

From: Markus Triska <markus.triska <at> gmx.at>
To: gnu-emacs-bug <at> moderators.isc.org
Subject: Re: bug#2940: 23.0.92; C-s in dired fails to find files with umlauts
Date: Fri, 10 Apr 2009 12:48:27 +0200

Stefan Monnier <monnier <at> iro.umontreal.ca> writes:

> Please try out this new ucs-normalize package and tell us if it solves
> your problem and/or suffers from other problems.

M-x eval-buffer on the (ucs-)normalize.el file posted at:

   http://lists.gnu.org/archive/html/emacs-devel/2009-04/msg00185.html

yields:

   End of file during parsing

After I insert an additional closing parenthesis on line 128 (after the
(defconst ucs-normalize-composition-exclusions ...), it yields:

   Symbol's value as variable is void: in






Information forwarded to bug-submit-list <at> lists.donarmstrong.com, Emacs Bugs <bug-gnu-emacs <at> gnu.org>:
bug#2940; Package emacs. (Fri, 10 Apr 2009 11:40:05 GMT) Full text and rfc822 format available.

Acknowledgement sent to Kenichi Handa <handa <at> m17n.org>:
Extra info received and forwarded to list. Copy sent to Emacs Bugs <bug-gnu-emacs <at> gnu.org>. (Fri, 10 Apr 2009 11:40:05 GMT) Full text and rfc822 format available.

Message #45 received at submit <at> emacsbugs.donarmstrong.com (full text, mbox):

From: Kenichi Handa <handa <at> m17n.org>
To: Markus Triska <markus.triska <at> gmx.at>, 2940 <at> debbugs.gnu.org
Cc: gnu-emacs-bug <at> moderators.isc.org
Subject: Re: bug#2940: 23.0.92; C-s in dired fails to find files with umlauts
Date: Fri, 10 Apr 2009 20:34:01 +0900

In article <m27i1syebo.fsf <at> gmx.at>, Markus Triska <markus.triska <at> gmx.at> writes:

> Stefan Monnier <monnier <at> iro.umontreal.ca> writes:
> > Please try out this new ucs-normalize package and tell us if it solves
> > your problem and/or suffers from other problems.

> M-x eval-buffer on the (ucs-)normalize.el file posted at:

>    http://lists.gnu.org/archive/html/emacs-devel/2009-04/msg00185.html

> yields:

>    End of file during parsing

The above page puts an extra ";" at line 127.  Why does that happen?

Anyway, it seems that the posted ucs-normalize.el (and the
original normalize.el) has a bug.  I'm now asking
Kawabata-san to fix it.

---
Kenichi Handa
handa <at> m17n.org





Information forwarded to bug-submit-list <at> lists.donarmstrong.com, Emacs Bugs <bug-gnu-emacs <at> gnu.org>:
bug#2940; Package emacs. (Thu, 13 Aug 2009 12:35:04 GMT) Full text and rfc822 format available.

Acknowledgement sent to Kenichi Handa <handa <at> m17n.org>:
Extra info received and forwarded to list. Copy sent to Emacs Bugs <bug-gnu-emacs <at> gnu.org>. (Thu, 13 Aug 2009 12:35:04 GMT) Full text and rfc822 format available.

Message #55 received at submit <at> emacsbugs.donarmstrong.com (full text, mbox):

From: Kenichi Handa <handa <at> m17n.org>
To: Markus Triska <markus.triska <at> gmx.at>, 2940 <at> debbugs.gnu.org
Cc: gnu-emacs-bug <at> moderators.isc.org
Subject: Re: bug#2940: 23.0.92; C-s in dired fails to find files with umlauts
Date: Thu, 13 Aug 2009 21:25:06 +0900

In article <m27i1syebo.fsf <at> gmx.at>, Markus Triska <markus.triska <at> gmx.at> writes:

> Stefan Monnier <monnier <at> iro.umontreal.ca> writes:
> > Please try out this new ucs-normalize package and tell us if it solves
> > your problem and/or suffers from other problems.

> M-x eval-buffer on the (ucs-)normalize.el file posted at:

>    http://lists.gnu.org/archive/html/emacs-devel/2009-04/msg00185.html

> yields:

>    End of file during parsing

I've just committed a new version of ucs-normalize.el.
Could you please try it?

By the way, it contains several autoload cookies.  Should I
re-generate loaddefs.el, copy it to ldefs-boot.el, and
commit it?

---
Kenichi Handa
handa <at> m17n.org




Added tag(s) moreinfo. Request was from Glenn Morris <rgm <at> gnu.org> to control <at> debbugs.gnu.org. (Tue, 26 Jan 2010 01:11:02 GMT) Full text and rfc822 format available.

Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#2940; Package emacs. (Sun, 10 Jul 2011 18:27:01 GMT) Full text and rfc822 format available.

Message #65 received at 2940 <at> debbugs.gnu.org (full text, mbox):

From: Glenn Morris <rgm <at> gnu.org>
To: Markus Triska <markus.triska <at> gmx.at>
Cc: 2940 <at> debbugs.gnu.org
Subject: Re: bug#2940: 23.0.92; C-s in dired fails to find files with umlauts
Date: Sun, 10 Jul 2011 14:26:27 -0400

It was suggested that the ucs-normalize library, which has been part of
Emacs for some time, should fix this. Does it? Please reply and let us
know if you still see this problem in 23.3.




Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#2940; Package emacs. (Sun, 10 Jul 2011 21:14:02 GMT) Full text and rfc822 format available.

Message #68 received at 2940 <at> debbugs.gnu.org (full text, mbox):

From: Markus Triska <markus.triska <at> gmx.at>
To: Glenn Morris <rgm <at> gnu.org>
Cc: 2940 <at> debbugs.gnu.org
Subject: Re: bug#2940: 23.0.92; C-s in dired fails to find files with umlauts
Date: Sun, 10 Jul 2011 23:11:51 +0200

Glenn Morris <rgm <at> gnu.org> writes:

> It was suggested that the ucs-normalize library, which has been part of
> Emacs for some time, should fix this. Does it?

Thank you, it does if I add this to ~/.emacs:

   (require 'ucs-normalize)
   (setq file-name-coding-system 'utf-8-hfs)




Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#2940; Package emacs. (Mon, 11 Jul 2011 02:02:02 GMT) Full text and rfc822 format available.

Message #71 received at 2940 <at> debbugs.gnu.org (full text, mbox):

From: Glenn Morris <rgm <at> gnu.org>
To: Markus Triska <markus.triska <at> gmx.at>
Cc: 2940 <at> debbugs.gnu.org
Subject: Re: bug#2940: 23.0.92; C-s in dired fails to find files with umlauts
Date: Sun, 10 Jul 2011 22:01:44 -0400

Markus Triska wrote:

> Thank you, it does if I add this to ~/.emacs:
>
>    (require 'ucs-normalize)
>    (setq file-name-coding-system 'utf-8-hfs)

I know nothing about this area: Is this an acceptable solution (in which
case I will close this report); or should it work out-of-the-box with no
configuration?




Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#2940; Package emacs. (Mon, 11 Jul 2011 02:55:01 GMT) Full text and rfc822 format available.

Message #74 received at 2940 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Glenn Morris <rgm <at> gnu.org>
Cc: 2940 <at> debbugs.gnu.org, markus.triska <at> gmx.at
Subject: Re: bug#2940: 23.0.92; C-s in dired fails to find files with umlauts
Date: Mon, 11 Jul 2011 05:56:18 +0300

> From: Glenn Morris <rgm <at> gnu.org>
> Date: Sun, 10 Jul 2011 22:01:44 -0400
> Cc: 2940 <at> debbugs.gnu.org
> 
> Markus Triska wrote:
> 
> > Thank you, it does if I add this to ~/.emacs:
> >
> >    (require 'ucs-normalize)
> >    (setq file-name-coding-system 'utf-8-hfs)
> 
> I know nothing about this area: Is this an acceptable solution (in which
> case I will close this report); or should it work out-of-the-box with no
> configuration?

It could be that Emacs should do this on that platform automatically,
yes.  But some Darwin expert should look into this and provide
feedback, before we decide.




Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#2940; Package emacs. (Mon, 11 Jul 2011 16:23:02 GMT) Full text and rfc822 format available.

Message #77 received at 2940 <at> debbugs.gnu.org (full text, mbox):

From: Markus Triska <markus.triska <at> gmx.at>
To: Glenn Morris <rgm <at> gnu.org>
Cc: 2940 <at> debbugs.gnu.org
Subject: Re: bug#2940: 23.0.92; C-s in dired fails to find files with umlauts
Date: Mon, 11 Jul 2011 18:21:30 +0200

Glenn Morris <rgm <at> gnu.org> writes:

>>    (require 'ucs-normalize)
>>    (setq file-name-coding-system 'utf-8-hfs)
>
> I know nothing about this area: Is this an acceptable solution (in
> which case I will close this report); or should it work out-of-the-box
> with no configuration?

In my personal use, I expected it to work out of the box. It was also
initially unclear to me that you need to explicitly require
ucs-normalize in order to use utf-8-hfs. When you evaluate only this in
"emacs -Q", without the require:

   (setq file-name-coding-system 'utf-8-hfs)

you cannot do much anymore with that Emacs instance, since you get
"Invalid coding system: utf-8-hfs" on almost all key presses. This is a
general issue though that also happens when you mistype a coding system.
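
One way to avoid ending up with an invalid value is to check that the
coding system exists before installing it. This is a hedged sketch of
such an init-file fragment, not something proposed in this thread; it
assumes only the standard `require' and `coding-system-p' functions.

```elisp
;; Load the library that defines utf-8-hfs; the third argument makes
;; `require' return nil instead of signaling an error if it is missing.
(require 'ucs-normalize nil t)

;; Only install the coding system if it is actually defined, so a
;; missing library or a mistyped name cannot leave
;; `file-name-coding-system' pointing at an invalid coding system.
(when (coding-system-p 'utf-8-hfs)
  (setq file-name-coding-system 'utf-8-hfs))
```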




Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#2940; Package emacs. (Mon, 11 Jul 2011 22:03:02 GMT) Full text and rfc822 format available.

Message #80 received at 2940 <at> debbugs.gnu.org (full text, mbox):

From: Alp Aker <aker <at> pitt.edu>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: Glenn Morris <rgm <at> gnu.org>, 2940 <at> debbugs.gnu.org, markus.triska <at> gmx.at
Subject: Re: bug#2940: 23.0.92; C-s in dired fails to find files with umlauts
Date: Mon, 11 Jul 2011 18:02:16 -0400 (EDT)

Eli Zaretskii wrote:

>>>    (require 'ucs-normalize)
>>>    (setq file-name-coding-system 'utf-8-hfs)
>
> It could be that Emacs should do this on that platform automatically, 
> yes.  But some Darwin expert should look into this and provide feedback, 
> before we decide.

I'm no expert, but it doesn't look as if this is necessary. 
/lisp/term/ns-win.el already defines a coding system utf-8-nfd that 
performs normalization and it sets that as the value of 
file-name-coding-system.  This takes care of the fact that the HFS+ 
filesystem uses decomposed file names, and indeed I can't reproduce (in 
either 24.0.50 or 23.3) the behavior described in the original bug report.

OTOH, the code in question has been present in ns-win.el since the NS code 
was first merged into the main branch (rev 89434), so I'm not sure how the 
OP's problem arose in the first place.





Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#2940; Package emacs. (Fri, 15 Jul 2011 20:39:02 GMT) Full text and rfc822 format available.

Message #83 received at 2940 <at> debbugs.gnu.org (full text, mbox):

From: Glenn Morris <rgm <at> gnu.org>
To: Alp Aker <aker <at> pitt.edu>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 2940 <at> debbugs.gnu.org, markus.triska <at> gmx.at
Subject: Re: bug#2940: 23.0.92; C-s in dired fails to find files with umlauts
Date: Fri, 15 Jul 2011 16:38:08 -0400

Alp Aker wrote:

> OTOH, the code in question has been present in ns-win.el since the NS
> code was first merged into the main branch (rev 89434), so I'm not
> sure how the OP's problem arose in the first place.

IIUC, he's not using a --with-ns build. It's a "normal", gtk build that
happens to be running on a Mac. So ns-win.el isn't in use.




Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#2940; Package emacs. (Sat, 16 Jul 2011 17:39:01 GMT) Full text and rfc822 format available.

Message #86 received at 2940 <at> debbugs.gnu.org (full text, mbox):

From: Alp Aker <aker <at> pitt.edu>
To: Glenn Morris <rgm <at> gnu.org>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 2940 <at> debbugs.gnu.org, markus.triska <at> gmx.at
Subject: Re: bug#2940: 23.0.92; C-s in dired fails to find files with umlauts
Date: Sat, 16 Jul 2011 13:38:19 -0400 (EDT)

Glenn Morris wrote:

> IIUC, he's not using a --with-ns build. It's a "normal", gtk build that
> happens to be running on a Mac. So ns-win.el isn't in use.

My mistake; since it was running on Darwin I just assumed an NS build, and 
didn't look at the build info in the original bug report.

Making this the default behavior for non-NS builds running on a Mac is 
probably TRT.  It was once possible to use Darwin with UFS, but that 
hasn't been true for the last three major versions, so going forward it 
will be a vanishingly rare case where (eq system-type 'darwin) doesn't 
imply that the file system is a variant of HFS+.  And it's reasonable for 
users to expect that Emacs will, out of the box, properly handle file 
names on the system it was built on.

OTOH, just adding something like:

  (when (eq system-type 'darwin)
    (require 'ucs-normalize)
    (setq file-name-coding-system 'utf-8-hfs))

to x-win.el might not be the best solution.  The utf-8-hfs coding system 
does both post-read conversion (normalizing to precomposed utf-8) and 
pre-write conversion (normalizing to Apple's variant of decomposed utf-8). 
The latter is unnecessary:  the OS itself will do normalization on any 
filename handed to it.  (Observe that the coding system defined in 
ns-win.el only does post-read conversion.)

For local operations, the redundant pre-write conversion is harmless. 
But using decomposed utf-8 might cause trouble when dealing with remote 
files.  So it's probably more robust to follow ns-win.el's lead and define 
a coding system that only does post-read conversion.  Thus:

  (when (eq system-type 'darwin)
    (require 'ucs-normalize)
    (define-coding-system 'utf-8-hfs-for-read
      "UTF-8 based coding system for HFS+ file names."
      :coding-type 'utf-8
      :mnemonic ?U
      :charset-list '(unicode)
      :post-read-conversion 'ucs-normalize-hfs-nfd-post-read-conversion)
    (setq file-name-coding-system 'utf-8-hfs-for-read))

would be the addition to make to x-win.el.
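
The two directions of conversion described above can be observed on
plain strings. This is only an illustrative sketch, assuming that
ucs-normalize is loaded and that utf-8-hfs behaves as described (the
variable names are hypothetical). Both comparisons should come out
non-nil if the coding system composes on read and decomposes on write.

```elisp
(require 'ucs-normalize)

(let* ((precomposed "\u00f6")
       (decomposed  "o\u0308")
       ;; Bytes as a file name listing would deliver them (decomposed UTF-8).
       (on-disk (encode-coding-string decomposed 'utf-8))
       ;; Post-read conversion: what Emacs shows after decoding.
       (shown   (decode-coding-string on-disk 'utf-8-hfs))
       ;; Pre-write conversion: what Emacs hands to the OS when encoding.
       (handed  (encode-coding-string precomposed 'utf-8-hfs)))
  (list (string= shown precomposed)
        (string= handed on-disk)))
```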





bug reassigned from package 'emacs' to 'emacs,ns'. Request was from Lars Magne Ingebrigtsen <larsi <at> gnus.org> to control <at> debbugs.gnu.org. (Sun, 18 Sep 2011 08:28:01 GMT) Full text and rfc822 format available.

Removed tag(s) moreinfo. Request was from Lars Ingebrigtsen <larsi <at> gnus.org> to control <at> debbugs.gnu.org. (Fri, 25 Dec 2015 23:02:02 GMT) Full text and rfc822 format available.

Changed bug title to '[macOS] C-s in dired fails to find umlauts in filenames (due to wrong file-name-coding-system)' from '23.0.92; C-s in dired fails to find files with umlauts' Request was from Noam Postavsky <npostavs <at> gmail.com> to control <at> debbugs.gnu.org. (Wed, 17 Jul 2019 17:12:02 GMT) Full text and rfc822 format available.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#2940; Package emacs. (Sat, 02 Nov 2019 06:13:02 GMT) Full text and rfc822 format available.

Message #95 received at 2940 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Kangas <stefan <at> marxist.se>
To: Markus Triska <markus.triska <at> gmx.at>
Cc: 2940 <at> debbugs.gnu.org
Subject: Re: 23.0.92; C-s in dired fails to find files with umlauts
Date: Sat, 02 Nov 2019 07:12:35 +0100

Markus Triska <markus.triska <at> gmx.at> writes:

> With ~/töst.txt existing, when I do:
>
>    $ emacs -Q ~/
>
> and press:
>
>    C-\ german-postfix RET C-s oe RET
>
> to search for "ö" in the dired buffer (the input method correctly
> converts the entered "oe" to "ö" in the minibuffer), I get:
>
>    Failing wrapped I-search [DE<]: ö

I can't reproduce this on current master.  Are you still seeing this
on a modern version of Emacs?

If I don't hear back from you within a couple of weeks, I'll just
close this bug as unreproducible.

Best regards,
Stefan Kangas




Added tag(s) moreinfo. Request was from Stefan Kangas <stefan <at> marxist.se> to control <at> debbugs.gnu.org. (Sat, 02 Nov 2019 06:13:02 GMT) Full text and rfc822 format available.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#2940; Package emacs. (Sat, 02 Nov 2019 09:18:02 GMT) Full text and rfc822 format available.

Message #100 received at 2940 <at> debbugs.gnu.org (full text, mbox):

From: "Markus Triska" <markus.triska <at> gmx.at>
To: "Stefan Kangas" <stefan <at> marxist.se>
Cc: 2940 <at> debbugs.gnu.org
Subject: Aw: Re: 23.0.92; C-s in dired fails to find files with umlauts
Date: Sat, 2 Nov 2019 10:17:24 +0100

> I can't reproduce this on current master. Are you still seeing this
> on a modern version of Emacs?

Yes, I can reproduce this exact same issue with Emacs 26.1 on OSX,
and also with a recent version of Debian.

All the best,
Markus




Removed tag(s) moreinfo. Request was from Stefan Kangas <stefan <at> marxist.se> to control <at> debbugs.gnu.org. (Thu, 13 Aug 2020 01:42:03 GMT) Full text and rfc822 format available.

This bug report was last modified 4 years and 304 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.