GNU bug report logs - #5553
23.1.92; Archives with wrong coding system

Package: emacs;

Reported by: Juri Linkov <juri <at> jurta.org>

Date: Tue, 9 Feb 2010 21:28:02 UTC

Severity: normal

To reply to this bug, email your comments to 5553 AT debbugs.gnu.org.

Report forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#5553; Package emacs. (Tue, 09 Feb 2010 21:28:02 GMT)

Acknowledgement sent to Juri Linkov <juri <at> jurta.org>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Tue, 09 Feb 2010 21:28:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Juri Linkov <juri <at> jurta.org>
To: bug-gnu-emacs <at> gnu.org
Subject: 23.1.92; Archives with wrong coding system
Date: Tue, 09 Feb 2010 23:19:27 +0200

When `archive-mode' is enabled for an archive file with an unknown file
extension, using the rule ("\\(PK00\\)?[P]K\003\004" . archive-mode)
from `magic-fallback-mode-alist', visiting such a file fails with the
args-out-of-range error.

The following patch should fix this bug using the same regexp as in
`magic-fallback-mode-alist' and the same coding system as for archive
file extensions in `auto-coding-alist':

=== modified file 'lisp/international/mule.el'
--- lisp/international/mule.el	2010-02-01 22:57:45 +0000
+++ lisp/international/mule.el	2010-02-09 21:18:51 +0000
@@ -1653,7 +1653,9 @@ (defcustom auto-coding-regexp-alist
     ("\\`\xFE\xFF" . utf-16be-with-signature)
     ("\\`\xFF\xFE" . utf-16le-with-signature)
     ("\\`\xEF\xBB\xBF" . utf-8-with-signature)
-    ("\\`;ELC\024\0\0\0" . emacs-mule)))	; Emacs 20-compiled
+    ("\\`;ELC\024\0\0\0" . emacs-mule)	; Emacs 20-compiled
+    ;; For `archive-mode' in `magic-fallback-mode-alist':
+    ("\\(PK00\\)?[P]K\003\004" . no-conversion-multibyte)))
   "Alist of patterns vs corresponding coding systems.
 Each element looks like (REGEXP . CODING-SYSTEM).
 A file whose first bytes match REGEXP is decoded by CODING-SYSTEM on reading.

-- 
Juri Linkov
http://www.jurta.org/emacs/






Message #8 received at 5553 <at> debbugs.gnu.org:

From: Juri Linkov <juri <at> jurta.org>
To: 5553 <at> debbugs.gnu.org
Subject: Re: bug#5553: 23.1.92; Archives with wrong coding system
Date: Wed, 10 Feb 2010 00:19:35 +0200

> When `archive-mode' is enabled for an archive file with an unknown file
> extension, using the rule ("\\(PK00\\)?[P]K\003\004" . archive-mode)
> from `magic-fallback-mode-alist', visiting such a file fails with the
> args-out-of-range error.
>
> The following patch should fix this bug using the same regexp as in
> `magic-fallback-mode-alist' and the same coding system as for archive
> file extensions in `auto-coding-alist':

The same problem also exists for images.  `magic-fallback-mode-alist' contains:

  (image-type-auto-detected-p . image-mode)

but visiting an image file with a non-standard file extension
(i.e. not in `auto-mode-alist') doesn't display it as an image.

The following patch fixes this problem, but isn't duplicating the
image regexps from `image-type-header-regexps' too ugly?

=== modified file 'lisp/international/mule.el'
--- lisp/international/mule.el	2010-02-09 05:00:56 +0000
+++ lisp/international/mule.el	2010-02-09 22:16:28 +0000
@@ -1655,7 +1655,14 @@ (defcustom auto-coding-regexp-alist
     ("\\`\xEF\xBB\xBF" . utf-8-with-signature)
     ("\\`;ELC\024\0\0\0" . emacs-mule)	; Emacs 20-compiled
     ;; For `archive-mode' in `magic-fallback-mode-alist':
-    ("\\(PK00\\)?[P]K\003\004" . no-conversion-multibyte)))
+    ("\\(PK00\\)?[P]K\003\004" . no-conversion-multibyte)
+    ;; For `image-mode' in `magic-fallback-mode-alist'
+    ;; (regexps duplicated from `image-type-header-regexps'):
+    ("\\`GIF8[79]a"                 . no-conversion) ; gif
+    ("\\`\x89PNG\r\n\x1a\n"         . no-conversion) ; png
+    ("\\`\\(?:MM\0\\*\\|II\\*\0\\)" . no-conversion) ; tiff
+    ("\\`\xff\xd8"                  . no-conversion) ; jpeg
+    ))
   "Alist of patterns vs corresponding coding systems.
 Each element looks like (REGEXP . CODING-SYSTEM).
 A file whose first bytes match REGEXP is decoded by CODING-SYSTEM on reading.

-- 
Juri Linkov
http://www.jurta.org/emacs/





Message #11 received at 5553 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Juri Linkov <juri <at> jurta.org>
Cc: 5553 <at> debbugs.gnu.org
Subject: Re: bug#5553: 23.1.92; Archives with wrong coding system
Date: Wed, 10 Feb 2010 00:34:00 +0200

> From: Juri Linkov <juri <at> jurta.org>
> Date: Tue, 09 Feb 2010 23:19:27 +0200
> Cc: 
> 
> When `archive-mode' is enabled for an archive file with an unknown file
> extension, using the rule ("\\(PK00\\)?[P]K\003\004" . archive-mode)
> from `magic-fallback-mode-alist', visiting such a file fails with the
> args-out-of-range error.
> 
> The following patch should fix this bug using the same regexp as in
> `magic-fallback-mode-alist' and the same coding system as for archive
> file extensions in `auto-coding-alist':

Thanks, but please provide a self-contained recipe for reproducing the
problem, starting with "emacs -Q".





Message #14 received at 5553 <at> debbugs.gnu.org:

From: Juri Linkov <juri <at> jurta.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 5553 <at> debbugs.gnu.org
Subject: Re: bug#5553: 23.1.92; Archives with wrong coding system
Date: Wed, 10 Feb 2010 02:09:56 +0200

> Thanks, but please provide a self-contained recipe for reproducing the
> problem, starting with "emacs -Q".

AFAICS, it is not reproducible with "emacs -Q", where archives and
images with non-standard file extensions are visited in the proper modes.

The problem appears when using Unicad (http://code.google.com/p/unicad/).
Basically, what it does boils down to the following line:

  (add-to-list 'auto-coding-functions 'unicad-universal-charset-detect)

The rest is just statistical guessing of the coding system based solely
on the content of the file.  In the case of archives and images, the
guess is incorrect, and `magic-fallback-mode-alist' fails to match
a mode regexp at the beginning of the buffer.

So the question is whether we should complement entries in
`magic-fallback-mode-alist' with the corresponding entries in
`auto-coding-regexp-alist' with the same regexps (like we complement
entries in `auto-mode-alist' with entries in `auto-coding-alist')?

Or should every function in `auto-coding-functions' that determines a coding
system somehow take care of exceptions in `magic-fallback-mode-alist'?

-- 
Juri Linkov
http://www.jurta.org/emacs/
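
Editorial sketch of the first option, for trying it from an init file
instead of patching mule.el.  It reuses the ZIP regexp and coding system
from the first patch and the GIF regexp from the second.  The leading \\`
anchor on the ZIP entry is an assumption added here (the regexp in the
patch is unanchored), and none of this has been tested against Unicad.

```elisp
;; Sketch only: user-level equivalent of the proposed mule.el change.
;; The \\` anchor on the ZIP entry is an assumption; the patch omits it.
(add-to-list 'auto-coding-regexp-alist
             '("\\`\\(PK00\\)?[P]K\003\004" . no-conversion-multibyte))

;; Regexp copied from the second patch (originally from
;; `image-type-header-regexps').
(add-to-list 'auto-coding-regexp-alist
             '("\\`GIF8[79]a" . no-conversion))
```

According to the lookup order Juri gives later in this thread,
`auto-coding-regexp-alist' is consulted before `auto-coding-functions',
so entries like these should take effect before a content-based
detector gets a chance to guess.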





Message #17 received at 5553 <at> debbugs.gnu.org:

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Juri Linkov <juri <at> jurta.org>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 5553 <at> debbugs.gnu.org
Subject: Re: bug#5553: 23.1.92; Archives with wrong coding system
Date: Wed, 10 Feb 2010 15:14:42 -0500

> So the question is whether we should complement entries in
> `magic-fallback-mode-alist' with the corresponding entries in
> `auto-coding-regexp-alist' with the same regexps (like we complement
> entries in `auto-mode-alist' with entries in `auto-coding-alist')?

> Or should every function in `auto-coding-functions' that determines a coding
> system somehow take care of exceptions in `magic-fallback-mode-alist'?

I think that auto-coding-alist should allow mapping not only file-names
but also major modes to coding-systems.  This should hopefully take care
of those issues by mapping image-mode and archive-mode to no-conversion.


        Stefan





Message #20 received at 5553 <at> debbugs.gnu.org:

From: Juri Linkov <juri <at> jurta.org>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 5553 <at> debbugs.gnu.org
Subject: Re: bug#5553: 23.1.92; Archives with wrong coding system
Date: Thu, 11 Feb 2010 00:33:23 +0200

>> So the question is whether we should complement entries in
>> `magic-fallback-mode-alist' with the corresponding entries in
>> `auto-coding-regexp-alist' with the same regexps (like we complement
>> entries in `auto-mode-alist' with entries in `auto-coding-alist')?
>
>> Or should every function in `auto-coding-functions' that determines a coding
>> system somehow take care of exceptions in `magic-fallback-mode-alist'?
>
> I think that auto-coding-alist should allow mapping not only file-names
> but also major modes to coding-systems.  This should hopefully take care
> of those issues by mapping image-mode and archive-mode to no-conversion.

I don't understand how this is possible, because currently the coding system
has to be recognized before the mode is chosen:

1. Recognizing Coding Systems
1.1. coding-system-for-read if non-nil
1.2. auto-coding-alist matching a filename
1.3. auto-coding-regexp-alist matching first bytes
1.4. `-*- coding: -*-' tag
1.5. auto-coding-functions (e.g. unicad-universal-charset-detect)
1.6. file-coding-system-alist matching a filename

2. Choosing Modes
2.1. `-*- mode: -*-' tag
2.2. interpreter-mode-alist
2.3. magic-mode-alist
2.4. auto-mode-alist
2.5. magic-fallback-mode-alist
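
Editorial sketch illustrating step 1.1 of the order above: binding
`coding-system-for-read' around the visit bypasses every later coding
step, including `auto-coding-functions', while mode selection still runs
afterwards on the undecoded bytes.  The helper name
`my-find-file-no-conversion' is hypothetical, and whether `archive-mode'
would prefer `no-conversion-multibyte' (as in the patch) over
`no-conversion' is not checked here.

```elisp
;; Sketch only; `my-find-file-no-conversion' is a hypothetical helper,
;; not part of Emacs.
(defun my-find-file-no-conversion (file)
  "Visit FILE with decoding disabled, leaving mode selection untouched."
  (interactive "fVisit file without conversion: ")
  ;; A non-nil `coding-system-for-read' takes precedence over all
  ;; automatic coding-system detection for the duration of the `let'.
  (let ((coding-system-for-read 'no-conversion))
    (find-file file)))
```

This is a per-visit workaround rather than a fix: it does nothing for
files opened through the ordinary `find-file' command.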

-- 
Juri Linkov
http://www.jurta.org/emacs/





Message #23 received at 5553 <at> debbugs.gnu.org:

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Juri Linkov <juri <at> jurta.org>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 5553 <at> debbugs.gnu.org
Subject: Re: bug#5553: 23.1.92; Archives with wrong coding system
Date: Wed, 10 Feb 2010 21:12:42 -0500

>> I think that auto-coding-alist should allow mapping not only file-names
>> but also major modes to coding-systems.  This should hopefully take care
>> of those issues by mapping image-mode and archive-mode to no-conversion.
> I don't understand how this is possible, because currently the coding system
> has to be recognized before the mode is chosen:

This is the reason why my suggestion did not come with a patch ;-)
That said, I don't think it's impossible, but it would indeed require
a reorganization.


        Stefan
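
Editorial sketch of how a mode-to-coding-system mapping might be
approximated today without the reorganization, by running the
`magic-fallback-mode-alist' matchers from inside an auto-coding function.
This is not Stefan's design and not an Emacs API: the names
`my-mode-coding-alist' and `my-magic-mode-auto-coding' are hypothetical,
and the sketch assumes the matchers behave sensibly on the partial,
undecoded text that auto-coding functions are given.

```elisp
;; Sketch only; both names below are hypothetical, not part of Emacs.
(defvar my-mode-coding-alist
  '((archive-mode . no-conversion-multibyte)
    (image-mode   . no-conversion))
  "Hypothetical alist mapping major modes to coding systems.")

(defun my-magic-mode-auto-coding (_size)
  "Guess a coding system via `magic-fallback-mode-alist'.
Return the coding system that `my-mode-coding-alist' associates with
the first matching mode, or nil if there is none."
  (let* ((case-fold-search nil)
         (mode (save-excursion
                 ;; Same matching convention `set-auto-mode' uses:
                 ;; a matcher is either a predicate or a regexp.
                 (assoc-default
                  nil magic-fallback-mode-alist
                  (lambda (matcher _key)
                    (if (functionp matcher)
                        (funcall matcher)
                      (looking-at matcher)))))))
    (cdr (assq mode my-mode-coding-alist))))

;; `add-to-list' prepends by default, so this runs before detectors
;; that were added to the list earlier.
(add-to-list 'auto-coding-functions #'my-magic-mode-auto-coding)
```

Because it only covers `magic-fallback-mode-alist', this does not give
`auto-coding-alist' the general mode awareness Stefan describes; it
merely keeps the two tables from drifting apart for the archive and
image cases discussed in this report.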




This bug report was last modified 15 years and 125 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.