GNU bug report logs - #79316
End-of-line problems with text files inside zip files
> Date: Tue, 26 Aug 2025 09:29:48 +0200
> From: "R. Diez" via "Bug reports for GNU Emacs,
> the Swiss army knife of text editors" <bug-gnu-emacs <at> gnu.org>
>
> I am using Emacs 29, and I have built Emacs myself on Ubuntu 22.04. But I have had this issue with older Emacs and Ubuntu versions too.
>
> I have attached a zip file with 3 text files:
>
> test1.txt - with encoding utf-8-with-signature-dos
> test2.xml - with encoding utf-8-dos
> test3.xml - with encoding utf-8-dos too
>
> If I unpack the zip file in the shell and open those files with Emacs, everything is fine.
>
> However, if you open the zip file with Emacs, and then open those text files inside, there are end-of-line problems:
>
> test1.txt shows encoding utf-8-with-signature-unix. That is, it loses the DOS CR LF line terminators.
>
> test2.xml has a similar problem. The encoding is then shown as utf-8-unix.
>
> test3.xml has the same problem, but each line shows a ^M marker at the end. The only difference between test2.xml and test3.xml is that the latter starts with this line:
>
> <?xml version="1.0" encoding="UTF-8"?>
>
> This problem is probably known, but I could not find a description or a workaround on the Internet. Or maybe I did not come up with the right search keywords.

Thanks for an easy-to-use test case.

The root cause here was that we were losing information about the EOL
type of the file determined when we decode it after extraction from
the archive.  And the reason why this went unnoticed (at least AFAIK)
is probably that the problem rears its ugly head only when the EOL
type of the file is NOT the default EOL type of the platform (DOS on
Unix, Unix on DOS/Windows, etc.), which probably happens rather
rarely.

Long story short, please try the patch below, and see if it gives good
results without introducing any new problems.

diff --git a/lisp/arc-mode.el b/lisp/arc-mode.el
index 8f6c71a..fbfd7cc 100644
--- a/lisp/arc-mode.el
+++ b/lisp/arc-mode.el
@@ -1067,8 +1067,18 @@ archive-set-buffer-as-visiting-file
         (setq coding
               (coding-system-change-text-conversion coding 'raw-text)))
       (unless (memq coding '(nil no-conversion))
+        ;; If CODING specifies a certain EOL conversion, reset that, to
+        ;; force 'decode-coding-region' below determine EOL conversion
+        ;; from the file's data...
+        (if (numberp (coding-system-eol-type coding))
+            (setq coding (coding-system-change-eol-conversion coding nil)))
         (decode-coding-region (point-min) (point-max) coding)
-        (setq last-coding-system-used coding))
+        ;; ...then augment CODING with the actual EOL conversion
+        ;; determined from the file's data.
+        (setq last-coding-system-used
+              (coding-system-change-eol-conversion
+               coding
+               (coding-system-eol-type last-coding-system-used))))
       (set-buffer-modified-p nil)
       (kill-local-variable 'buffer-file-coding-system)
       (after-insert-file-set-coding (- (point-max) (point-min))))))
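
The two steps of the patch (reset a fixed EOL conversion before
decoding, then read back the EOL type the decoder detected) can be
tried in isolation on a string rather than on an archive member.  The
following is only a minimal sketch under stated assumptions, not code
from arc-mode.el: the helper name my-decode-detecting-eol is
hypothetical, it uses decode-coding-string instead of
decode-coding-region, and the numberp guard on the detected EOL type
is an added assumption that is not in the patch.

```elisp
;; Illustrative sketch only; `my-decode-detecting-eol' is a hypothetical
;; helper and is not part of arc-mode.el.
(defun my-decode-detecting-eol (bytes coding)
  "Decode the unibyte string BYTES using CODING, detecting EOLs from the data.
Return a cons cell (TEXT . CODING-USED)."
  ;; If CODING fixes the EOL conversion (its EOL type is a number),
  ;; reset it so that the decoder detects the EOL format itself.
  (when (numberp (coding-system-eol-type coding))
    (setq coding (coding-system-change-eol-conversion coding nil)))
  (let* ((text (decode-coding-string bytes coding))
         ;; The decoder records the coding system it actually used.
         (eol (coding-system-eol-type last-coding-system-used)))
    (cons text
          ;; Assumption: if no EOL format was detected (EOL is a vector
          ;; rather than a number), keep CODING as it is.
          (if (numberp eol)
              (coding-system-change-eol-conversion coding eol)
            coding))))

;; Example call with CRLF-terminated input and a Unix-EOL starting point.
(my-decode-detecting-eol "one\r\ntwo\r\n" 'utf-8-unix)
```

With CRLF-terminated input like this, the second element of the
result should name a DOS variant of the coding system rather than the
utf-8-unix that was passed in, which is the information the unpatched
code was discarding.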