GNU bug report logs - #15803
default-file-name-coding-system: utf-8 better than latin-1 these days?

Previous Next

Package: emacs;

Reported by: Glenn Morris <rgm <at> gnu.org>

Date: Mon, 4 Nov 2013 18:46:01 UTC

Severity: normal

Tags: fixed

Found in version 24.3

Fixed in version 28.1

Done: Lars Ingebrigtsen <larsi <at> gnus.org>

Bug is archived. No further changes may be made.

Full log


Message #21 received at 15803 <at> debbugs.gnu.org (full text, mbox):

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Glenn Morris <rgm <at> gnu.org>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 15803 <at> debbugs.gnu.org
Subject: Re: bug#15803: default-file-name-coding-system: utf-8 better than
 latin-1 these days?
Date: Wed, 09 Sep 2020 15:15:09 +0200
Glenn Morris <rgm <at> gnu.org> writes:

> utf-8 is the sensible, "modern" (ie, non-ancient) default.
> If there is no reason to use latin-1, Emacs should use utf-8.
> I'm not claiming it's critical.
>
> Take it or leave it, as you wish.

That was the final message in the thread.  Glenn's patch from six years
ago no longer applied, so I've respun it for Emacs 28 now (included
below).

Glenn's arguments make sense to me, but I'm not a domain expert here.
Does anybody object to applying this patch to Emacs 28?

diff --git a/doc/emacs/mule.texi b/doc/emacs/mule.texi
index 6eff0ca0d2..b78019020a 100644
--- a/doc/emacs/mule.texi
+++ b/doc/emacs/mule.texi
@@ -1215,11 +1215,8 @@ File Name Coding
 
   If @code{file-name-coding-system} is @code{nil}, Emacs uses a
 default coding system determined by the selected language environment,
-and stored in the @code{default-file-name-coding-system} variable.
-@c FIXME?  Is this correct?  What is the "default language environment"?
-In the default language environment, non-@acronym{ASCII} characters in
-file names are not encoded specially; they appear in the file system
-using the internal Emacs representation.
+and stored in the @code{default-file-name-coding-system} variable
+(normally UTF-8).
 
 @cindex file-name encoding, MS-Windows
 @vindex w32-unicode-filenames
diff --git a/lisp/international/mule-cmds.el b/lisp/international/mule-cmds.el
index ccc8ac9f9e..e3155dfc52 100644
--- a/lisp/international/mule-cmds.el
+++ b/lisp/international/mule-cmds.el
@@ -1799,13 +1799,11 @@ reset-language-environment
    'raw-text)
 
   (set-default-coding-systems nil)
-  (setq default-sendmail-coding-system 'iso-latin-1)
-  ;; On Darwin systems, this should be utf-8-unix, but when this file is loaded
-  ;; that is not yet defined, so we set it in set-locale-environment instead.
-  ;; [Actually, it seems to work fine to use utf-8-unix here, and not just
-  ;; on Darwin.  The previous comment seems to be outdated?
-  ;; See patch at https://debbugs.gnu.org/15803 ]
-  (setq default-file-name-coding-system 'iso-latin-1-unix)
+  (setq default-sendmail-coding-system 'utf-8)
+  (setq default-file-name-coding-system (if (memq system-type
+                                                  '(window-nt ms-dos))
+                                            'iso-latin-1-unix
+                                          'utf-8-unix))
   ;; Preserve eol-type from existing default-process-coding-systems.
   ;; On non-unix-like systems in particular, these may have been set
   ;; carefully by the user, or by the startup code, to deal with the
@@ -1821,8 +1819,10 @@ reset-language-environment
 	(input-coding
 	 (condition-case nil
 	     (coding-system-change-text-conversion
-	      (cdr default-process-coding-system) 'iso-latin-1)
-	   (coding-system-error 'iso-latin-1))))
+	      (cdr default-process-coding-system)
+	      (if (memq system-type '(window-nt ms-dos)) 'iso-latin-1 'utf-8))
+	   (coding-system-error
+	    (if (memq system-type '(window-nt ms-dos)) 'iso-latin-1 'utf-8)))))
     (setq default-process-coding-system
 	  (cons output-coding input-coding)))
 
diff --git a/lisp/mail/sendmail.el b/lisp/mail/sendmail.el
index dd6eecbfd0..7610939e57 100644
--- a/lisp/mail/sendmail.el
+++ b/lisp/mail/sendmail.el
@@ -975,7 +975,7 @@ sendmail-coding-system
 See also the function `select-message-coding-system'.")
 
 ;;;###autoload
-(defvar default-sendmail-coding-system 'iso-latin-1
+(defvar default-sendmail-coding-system 'utf-8
   "Default coding system for encoding the outgoing mail.
 This variable is used only when `sendmail-coding-system' is nil.
 
diff --git a/lisp/mh-e/mh-comp.el b/lisp/mh-e/mh-comp.el
index f7e30bfbb3..8a69adbb75 100644
--- a/lisp/mh-e/mh-comp.el
+++ b/lisp/mh-e/mh-comp.el
@@ -305,6 +305,7 @@ mh-send-letter
   (let ((draft-buffer (current-buffer))
         (file-name buffer-file-name)
         (config mh-previous-window-config)
+        ;; FIXME this is subtly different to select-message-coding-system.
         (coding-system-for-write
          (if (fboundp 'select-message-coding-system)
              (select-message-coding-system) ; Emacs has this since at least 21.1
@@ -318,7 +319,7 @@ mh-send-letter
              (or (and (boundp 'sendmail-coding-system) sendmail-coding-system)
                  (and (default-boundp 'buffer-file-coding-system)
                       (default-value 'buffer-file-coding-system))
-                 'iso-latin-1)))))
+                 'utf-8)))))
     ;; Older versions of spost do not support -msgid and -mime.
     (unless mh-send-uses-spost-flag
       ;; Adding a Message-ID field looks good, makes it easier to search for

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




This bug report was last modified 4 years and 256 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.