GNU bug report logs - #79316
End-of-line problems with text files inside zip files

Package: emacs;

Reported by: "R. Diez" <rdiez-2006 <at> rd10.de>

Date: Tue, 26 Aug 2025 07:31:01 UTC

Severity: normal

Done: Eli Zaretskii <eliz <at> gnu.org>

To reply to this bug, email your comments to 79316 AT debbugs.gnu.org.
There is no need to reopen the bug first.

Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#79316; Package emacs. (Tue, 26 Aug 2025 07:31:02 GMT)

Acknowledgement sent to "R. Diez" <rdiez-2006 <at> rd10.de>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Tue, 26 Aug 2025 07:31:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: "R. Diez" <rdiez-2006 <at> rd10.de>
To: bug-gnu-emacs <at> gnu.org
Subject: End-of-line problems with text files inside zip files
Date: Tue, 26 Aug 2025 09:29:48 +0200
Hi all:

I am using Emacs 29, and I have built Emacs myself on Ubuntu 22.04. But I have had this issue with older Emacs and Ubuntu versions too.

I have attached a zip file with 3 text files:

test1.txt - with encoding utf-8-with-signature-dos
test2.xml - with encoding utf-8-dos
test3.xml - with encoding utf-8-dos too

If I unpack the zip file in the shell and open those files with Emacs, everything is fine.

However, if you open the zip file with Emacs, and then open those text files inside, there are end-of-line problems:

test1.txt shows encoding utf-8-with-signature-unix. That is, it loses the DOS CR LF line terminators.

test2.xml has a similar problem. The encoding is then shown as utf-8-unix.

test3.xml has the same problem, but each line shows a ^M marker at the end. The only difference between test2.xml and test3.xml is that the latter starts with this line:

<?xml version="1.0" encoding="UTF-8"?>

This problem is probably known, but I could not find a description or a workaround on the Internet. Or maybe I did not come up with the right search keywords.

Regards,
  rdiez
[test.zip (application/zip, attachment)]
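The attached test.zip is not reproduced in this log. As a hedged sketch for anyone wanting to reproduce the report, the Python snippet below builds a similar archive with the standard `zipfile` module. Only the three file names, their encodings and the XML declaration in test3.xml come from the report; the helper name `build_test_zip` and the remaining line contents are hypothetical placeholders.

```python
import io
import zipfile


def build_test_zip() -> bytes:
    """Return the bytes of a zip archive resembling the report's test.zip.

    File names, encodings and the XML declaration come from the report;
    the remaining line contents are placeholders.
    """
    members = {
        # utf-8-with-signature-dos: UTF-8 BOM followed by CR LF line ends.
        "test1.txt": b"\xef\xbb\xbf" + b"first line\r\nsecond line\r\n",
        # utf-8-dos: no BOM, CR LF line ends.
        "test2.xml": b"<root>\r\n  <item/>\r\n</root>\r\n",
        # utf-8-dos as well, but starting with an XML declaration.
        "test3.xml": (b'<?xml version="1.0" encoding="UTF-8"?>\r\n'
                      b"<root>\r\n  <item/>\r\n</root>\r\n"),
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, data in members.items():
            # writestr stores the bytes verbatim, so the CR LF pairs survive.
            archive.writestr(name, data)
    return buffer.getvalue()


if __name__ == "__main__":
    # List the members and their uncompressed sizes.
    with zipfile.ZipFile(io.BytesIO(build_test_zip())) as archive:
        for info in archive.infolist():
            print(info.filename, info.file_size)
```

Saving the result, e.g. with `pathlib.Path("test.zip").write_bytes(build_test_zip())`, and visiting it in Emacs should let you compare `buffer-file-coding-system` for each member with that of the same file unpacked in the shell.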

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79316; Package emacs. (Thu, 28 Aug 2025 12:26:02 GMT)

Message #8 received at 79316 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: "R. Diez" <rdiez-2006 <at> rd10.de>
Cc: 79316 <at> debbugs.gnu.org
Subject: Re: bug#79316: End-of-line problems with text files inside zip files
Date: Thu, 28 Aug 2025 15:25:42 +0300
> Date: Tue, 26 Aug 2025 09:29:48 +0200
> From:  "R. Diez" via "Bug reports for GNU Emacs,
>  the Swiss army knife of text editors" <bug-gnu-emacs <at> gnu.org>
> 
> [...]

Thanks for an easy-to-use test case.

The root cause here was that we were losing information about the EOL
type of the file, as determined when we decode it after extraction from
the archive.  And the reason why this went unnoticed (at least AFAIK)
is probably that the problem rears its ugly head only when the EOL
type of the file is NOT the default EOL type of the platform (DOS on
Unix, Unix on DOS/Windows, etc.), which probably happens rather
rarely.

Long story short, please try the patch below, and see if it gives good
results without introducing any new problems.

diff --git a/lisp/arc-mode.el b/lisp/arc-mode.el
index 8f6c71a..fbfd7cc 100644
--- a/lisp/arc-mode.el
+++ b/lisp/arc-mode.el
@@ -1067,8 +1067,18 @@ archive-set-buffer-as-visiting-file
         (setq coding
               (coding-system-change-text-conversion coding 'raw-text)))
       (unless (memq coding '(nil no-conversion))
+        ;; If CODING specifies a certain EOL conversion, reset that, to
+        ;; force 'decode-coding-region' below determine EOL conversion
+        ;; from the file's data...
+        (if (numberp (coding-system-eol-type coding))
+            (setq coding (coding-system-change-eol-conversion coding nil)))
         (decode-coding-region (point-min) (point-max) coding)
-	(setq last-coding-system-used coding))
+        ;; ...then augment CODING with the actual EOL conversion
+        ;; determined from the file's data.
+	(setq last-coding-system-used
+              (coding-system-change-eol-conversion
+               coding
+               (coding-system-eol-type last-coding-system-used))))
       (set-buffer-modified-p nil)
       (kill-local-variable 'buffer-file-coding-system)
       (after-insert-file-set-coding (- (point-max) (point-min))))))
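The two halves of the patch above can be illustrated outside Emacs. The Python sketch below is a deliberately simplified, hypothetical model, not Emacs code: a "coding" is a pair of a text encoding and an EOL type, where `None` stands for an undecided EOL type, loosely like `utf-8` as opposed to `utf-8-dos`. `old_flow` mimics recording the pre-decoding coding as the one used; `new_flow` mimics the patch, which resets a fixed EOL type before decoding and then keeps the EOL type detected while decoding. The detection rule (first line terminator wins) and the `unix` fallback are assumptions of the model.

```python
from typing import Optional, Tuple

# Toy model of a coding system: (text encoding, EOL type).  An EOL type of
# None means "undecided": detect it from the data while decoding.
Coding = Tuple[str, Optional[str]]

EOL_BYTES = {"unix": b"\n", "dos": b"\r\n", "mac": b"\r"}


def detect_eol(data: bytes) -> Optional[str]:
    """Classify DATA by its first line terminator (a simplification)."""
    for i, byte in enumerate(data):
        if byte == 0x0D:  # CR: DOS if followed by LF, otherwise old Mac
            return "dos" if data[i + 1:i + 2] == b"\n" else "mac"
        if byte == 0x0A:  # bare LF
            return "unix"
    return None


def decode(data: bytes, coding: Coding) -> Tuple[str, Coding]:
    """Decode DATA with CODING; return the text and the coding really used."""
    encoding, eol = coding
    if eol is None:
        # Assumed fallback when the data contains no line terminator.
        eol = detect_eol(data) or "unix"
    # A fixed "unix" EOL leaves every CR in place: the visible ^M symptom.
    text = data.replace(EOL_BYTES[eol], b"\n").decode(encoding)
    return text, (encoding, eol)


def old_flow(data: bytes, coding: Coding) -> Tuple[str, Coding]:
    """Before the fix: decode, then record CODING itself as the one used."""
    text, _actually_used = decode(data, coding)
    return text, coding  # any EOL type detected while decoding is lost


def new_flow(data: bytes, coding: Coding) -> Tuple[str, Coding]:
    """After the fix: reset a fixed EOL type, decode, keep the detected one."""
    encoding, _eol = coding
    text, actually_used = decode(data, (encoding, None))
    return text, (encoding, actually_used[1])


if __name__ == "__main__":
    crlf = b'<?xml version="1.0" encoding="UTF-8"?>\r\n<root/>\r\n'
    print(old_flow(crlf, ("utf-8", "unix")))  # CRs kept, EOL recorded as unix
    print(old_flow(crlf, ("utf-8", None)))    # CRs removed, EOL left undecided
    print(new_flow(crlf, ("utf-8", "unix")))  # CRs removed, EOL recorded as dos
```

In this model the first call resembles the ^M symptom reported for test3.xml, and the second the loss of the DOS EOL type reported for test1.txt and test2.xml; the thread does not say exactly which path Emacs took for each file.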




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79316; Package emacs. (Sat, 30 Aug 2025 16:32:02 GMT)

Message #11 received at 79316 <at> debbugs.gnu.org:

From: "R. Diez" <rdiez-2006 <at> rd10.de>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 79316 <at> debbugs.gnu.org
Subject: Re: bug#79316: End-of-line problems with text files inside zip files
Date: Sat, 30 Aug 2025 18:30:48 +0200
> [...]
> Long story short, please try the patch below, and see if it gives good
> results without introducing any new problems.

I had some difficulty with the patch. It seems to be meant for Git, but I haven't built Emacs from Git, and the 'patch' tool could not find the file.

But after some manual twiddling, it applied fine on my Emacs 29. I then opened the usual .zip with .xml files inside that always gave me trouble, and this time it worked fine.

Thanks,
  rdiez





Reply sent to Eli Zaretskii <eliz <at> gnu.org>:
You have taken responsibility. (Sat, 30 Aug 2025 16:59:02 GMT)

Notification sent to "R. Diez" <rdiez-2006 <at> rd10.de>:
bug acknowledged by developer. (Sat, 30 Aug 2025 16:59:03 GMT)

Message #16 received at 79316-done <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: "R. Diez" <rdiez-2006 <at> rd10.de>
Cc: 79316-done <at> debbugs.gnu.org
Subject: Re: bug#79316: End-of-line problems with text files inside zip files
Date: Sat, 30 Aug 2025 19:57:39 +0300
> Date: Sat, 30 Aug 2025 18:30:48 +0200
> Cc: 79316 <at> debbugs.gnu.org
> From: "R. Diez" <rdiez-2006 <at> rd10.de>
> 
> > [...]
> > Long story short, please try the patch below, and see if it gives good
> > results without introducing any new problems.
> 
> I had some difficulty with the patch. It seems to be meant for Git, but I haven't built Emacs from Git, and the 'patch' tool could not find the file.

Thanks.

For the future, you can apply patches meant for Git using the 'patch'
utility if you pass the -pN switch to 'patch', where N is the number
of slashes to remove from the file names mentioned in the patch.  So,
for example, if the patch says

  diff --git a/lisp/arc-mode.el b/lisp/arc-mode.el
  index 8f6c71a..fbfd7cc 100644
  --- a/lisp/arc-mode.el
  +++ b/lisp/arc-mode.el

and the unpatched arc-mode.el lives in the directory /foo/bar/lisp,
you can invoke 'patch' like this:

  $ patch -d /foo/bar/lisp -p2 < PATCH-FILE

since removing 2 slashes from a/lisp/arc-mode.el leaves you with just
arc-mode.el, which makes it the correct file name relative to the
directory /foo/bar/lisp given as argument to the -d switch.
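To make the -pN arithmetic concrete, here is a small Python sketch of how a file name in the patch maps to the file 'patch' will look for. The helper names `strip_prefix` and `target_path` are hypothetical; the rule they implement (strip the smallest prefix containing N leading slashes, counting a run of adjacent slashes as one) follows the description of -p in the GNU patch manual, and the error handling is this sketch's own simplification.

```python
import posixpath
import re


def strip_prefix(name: str, n: int) -> str:
    """Mimic 'patch -pN': drop the smallest prefix containing N slashes.

    A run of adjacent slashes counts as one slash.  Raising ValueError when
    NAME has fewer than N slashes is this sketch's choice, not patch's.
    """
    if n == 0:
        return name
    runs = list(re.finditer(r"/+", name))
    if len(runs) < n:
        raise ValueError(f"{name!r} has fewer than {n} slashes")
    return name[runs[n - 1].end():]


def target_path(name: str, n: int, directory: str = ".") -> str:
    """File that 'patch -d DIRECTORY -pN' would look for."""
    return posixpath.join(directory, strip_prefix(name, n))


if __name__ == "__main__":
    name = "a/lisp/arc-mode.el"  # as in the "--- a/lisp/arc-mode.el" line
    for n in range(3):
        print(n, target_path(name, n, "/foo/bar/lisp"))
    # 0 /foo/bar/lisp/a/lisp/arc-mode.el
    # 1 /foo/bar/lisp/lisp/arc-mode.el
    # 2 /foo/bar/lisp/arc-mode.el
```

Only -p2 lands on the existing /foo/bar/lisp/arc-mode.el. By the same rule, running 'patch' from the top of an Emacs source tree would need -p1, since stripping one slash leaves lisp/arc-mode.el.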

> But after some manual twiddling, it applied fine on my Emacs 29. I then opened the usual .zip with .xml files inside that always gave me trouble, and this time it worked fine.

Thanks for testing, I've now installed the fix (with some additional
tweaking, per some more thorough testing) on the master branch, and
I'm therefore closing this bug.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79316; Package emacs. (Sat, 30 Aug 2025 17:00:03 GMT)

Message #19 received at 79316-done <at> debbugs.gnu.org:

From: "R. Diez" <rdiez-2006 <at> rd10.de>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 79316-done <at> debbugs.gnu.org
Subject: Re: bug#79316: End-of-line problems with text files inside zip files
Date: Sat, 30 Aug 2025 18:59:28 +0200
> [...]
> I've now installed the fix (with some additional
> tweaking, per some more thorough testing) on the master branch, and
> I'm therefore closing this bug.

Great, thanks!







GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.