From unknown Sat Aug 16 13:42:57 2025
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
X-Mailer: MIME-tools 5.509 (Entity 5.509)
Content-Type: text/plain; charset=utf-8
From: bug#61005 <61005@debbugs.gnu.org>
To: bug#61005 <61005@debbugs.gnu.org>
Subject: Status: 28.1.91; Encoding not detected in HTML files inside archives
Reply-To: bug#61005 <61005@debbugs.gnu.org>
Date: Sat, 16 Aug 2025 20:42:57 +0000

retitle 61005 28.1.91; Encoding not detected in HTML files inside archives
reassign 61005 emacs
submitter 61005 Benjamin Riefenstahl
severity 61005 normal
thanks


From debbugs-submit-bounces@debbugs.gnu.org Sun Jan 22 08:14:03 2023
Received: (at submit) by debbugs.gnu.org; 22 Jan 2023 13:14:03 +0000
Received: from localhost ([127.0.0.1]:50906 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1pJaAd-0007g9-3r for submit@debbugs.gnu.org; Sun, 22 Jan 2023 08:14:03 -0500
Received: from lists.gnu.org ([209.51.188.17]:36538) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1pJaAY-0007fb-PU for submit@debbugs.gnu.org; Sun, 22 Jan 2023 08:14:01 -0500
Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1pJaAY-000134-JO for bug-gnu-emacs@gnu.org; Sun, 22 Jan 2023 08:13:58 -0500
Received: from odoacer.turtle-trading.net ([93.241.193.16]) by eggs.gnu.org with esmtps (TLS1.2:DHE_RSA_AES_128_CBC_SHA1:128) (Exim 4.90_1) (envelope-from ) id 1pJaAV-0001lJ-Om for bug-gnu-emacs@gnu.org; Sun, 22 Jan 2023 08:13:58 -0500
Received: from zenobia.turtle-trading.net ([192.168.2.111]) by odoacer.turtle-trading.net with esmtp (Exim 4.80) (envelope-from ) id 1pJaAQ-00077S-M1; Sun, 22 Jan 2023 14:13:50 +0100
Received: from benny by zenobia.turtle-trading.net with local (Exim 4.94.2) (envelope-from ) id 1pJaAQ-0009AD-Dq; Sun, 22 Jan 2023 14:13:50 +0100
From: Benjamin Riefenstahl
To: bug-gnu-emacs@gnu.org
Subject: 28.1.91; Encoding not detected in HTML files inside archives
Date: Sun, 22 Jan 2023 14:13:50 +0100
Message-ID: <87bkmqempd.fsf@turtle-trading.net>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="
Received-SPF: none client-ip=93.241.193.16; envelope-from=benny@turtle-trading.net; helo=odoacer.turtle-trading.net
X-Spam_score_int: -18
X-Spam_score: -1.9
X-Spam_bar: -
X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, HTML_MESSAGE=0.001, SPF_HELO_NONE=0.001, SPF_NONE=0.001, T_HTML_ATTACH=0.01, T_OBFU_HTML_ATT_MALW=0.01 autolearn=ham autolearn_force=no
X-Spam_action: no action
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: submit
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit"
X-Spam-Score: -3.3 (---)

--=-=-=
Content-Type: text/plain
Content-Disposition: inline

Problem
----

* Given an HTML file with charset "windows-1255".

* Opening the file from disk detects the encoding correctly.

* Opening a ZIP archive with the same file inside and then opening the
  HTML archive member does not detect the encoding; instead, the coding
  system for saving is the default according to M-x
  describe-coding-system.

Attached are two files, test.html and test.zip. Call "emacs -Q
test.html test.zip" and press RET on the archive member to reproduce.

--=-=-=
Content-Type: text/html; charset=windows-1255
Content-Disposition: attachment; filename=test.html
Content-Transfer-Encoding: quoted-printable

=F9=C8=D1=EC=E5=C9=ED

=F9=C8=D1=EC=E5=C9=ED

--=-=-=
Content-Type: application/zip
Content-Disposition: attachment; filename=test.zip
Content-Transfer-Encoding: base64

UEsDBBQAAAAIAPGdMVauwGXsbwAAAKIAAAAJABwAdGVzdC5odG1sVVQJAAM138Zj9d7GY3V4CwAB
BOgDAAAE6AMAALNRdPF3DokMcFXIKMnNseOygVAKCjYZqYkpIAaQmZtakqiQnJFYVJxaYqtUnpmX
kl9erGtoZGqqZGejD5KFKizJLMlJtVP4eeLim6cn3yrY6EMEQMbpw8yzScpPqYSqzzBEVgzkgVVC
FAD5YKcAAFBLAQIeAxQAAAAIAPGdMVauwGXsbwAAAKIAAAAJABgAAAAAAAEAAACkgQAAAAB0ZXN0
Lmh0bWxVVAUAAzXfxmN1eAsAAQToAwAABOgDAABQSwUGAAAAAAEAAQBPAAAAsgAAAAAA

--=-=-=
Content-Type: text/plain
Content-Disposition: inline

Solution
----

The problem seems to be the function
sgml-html-meta-auto-coding-function. It is missing a condition similar
to the one added to sgml-xml-auto-coding-function in commit #df7ed10e
in 2018.

modified   lisp/international/mule.el
@@ -2539,6 +2539,10 @@ sgml-html-meta-auto-coding-function
                     (bfcs-type
                      (coding-system-type buffer-file-coding-system)))
                 (if (and enable-multibyte-characters
+                         ;; 'charset' will signal an error in
+                         ;; coding-system-equal, since it isn't a
+                         ;; coding-system. So test that up front.
+                         (not (equal sym-type 'charset))
                          (coding-system-equal 'utf-8 sym-type)
                          (coding-system-equal 'utf-8 bfcs-type))
                     buffer-file-coding-system

I will send this as a patch as soon as I have a bug number to mention in
the commit message.

----

In GNU Emacs 28.1.91 (build 1, x86_64-pc-linux-gnu, GTK+ Version 3.24.24, cairo version 1.16.0)
 of 2022-08-29 built on arrian
Repository revision: f4168b8143008b787a11366462c928d761e90dd0
Repository branch: emacs-28
Windowing system distributor 'The X.Org Foundation', version 11.0.12011000
System Description: Debian GNU/Linux 11 (bullseye)

Configured features:
ACL CAIRO DBUS FREETYPE GIF GLIB GMP GNUTLS GPM GSETTINGS HARFBUZZ JPEG
JSON LCMS2 LIBOTF LIBSELINUX LIBXML2 M17N_FLT MODULES NOTIFY INOTIFY
PDUMPER PNG RSVG SECCOMP SOUND THREADS TIFF TOOLKIT_SCROLL_BARS X11 XDBE
XIM XPM GTK3 ZLIB

Important settings:
  value of $LANG: en_US.UTF-8
  locale-coding-system: utf-8-unix

Major mode: Dired by date

Minor modes in effect:
  shell-dirtrack-mode: t
  desktop-save-mode: t
  display-time-mode: t
  xclip-mode: t
  xterm-mouse-mode: t
  delete-selection-mode: t
  cua-mode: t
  display-battery-mode: t
  tooltip-mode: t
  global-eldoc-mode: t
  show-paren-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  buffer-read-only: t
  column-number-mode: t
  line-number-mode: t
  transient-mark-mode: t

Load-path shadows:
~/Projects/ttf-mode/arc-mode-compat hides ~/emacs/arc-mode-compat
/home/benny/.emacs.d/elpa/transient-20210723.1601/transient hides /usr/local/share/emacs/28.1.91/lisp/transient
/home/benny/.emacs.d/elpa/dictionary-20201001.1727/dictionary hides /usr/local/share/emacs/28.1.91/lisp/net/dictionary

Features:
(shadow sort mail-extr emacsbug message rmc puny rfc822 mml mml-sec epa
epg rfc6068 epg-config gnus-util rmail rmail-loaddefs mm-decode
mm-bodies mm-encode mailabbrev gmm-utils mailheader arc-mode
archive-mode benny-images dirtrack shell pcomplete misearch
multi-isearch thai-util thai-word lao-util enriched view tabify
benny-auto-insert ttf-glyphs rng-xsd xsd-regexp rng-cmpct rng-nxml
rng-valid rng-loc rng-uri
rng-parse nxml-parse rng-match rng-dt rng-util rng-pttrn nxml-ns
nxml-mode nxml-outln nxml-rap sgml-mode facemenu dom nxml-util nxml-enc
xmltok mule-util jka-compr dired-aux time-date bug-reference imenu
desktop frameset highline benny-calendar-cfg ange-ftp generic-x
autoinsert cc-mode cc-fonts cc-guess cc-menus cc-styles cc-align cc-cmds
cc-engine cc-vars cc-defs ps-print ps-print-loaddefs ps-def lpr advice
cl-extra help-mode dired dired-loaddefs derived benny-x-clipboard
disp-table time server protbuf xclip term/xterm xterm xt-mouse cal-china
lunar solar cal-dst cal-bahai cal-islam cal-hebrew holidays hol-loaddefs
vc-git diff-mode easy-mmode vc-dispatcher vc-fossil diary-lib
diary-loaddefs cal-menu calendar cal-loaddefs delsel grep compile
text-property-search comint ansi-color ring cua-base cus-load
format-spec battery dbus xml sendmail mail-utils .loaddefs benny-tools
autoload radix-tree lisp-mnt mail-parse rfc2231 rfc2047 rfc2045 mm-util
ietf-drums mail-prsvr edmacro kmacro info package browse-url url
url-proxy url-privacy url-expand url-methods url-history url-cookie
url-domsuf url-util mailcap url-handlers url-parse auth-source cl-seq
eieio eieio-core cl-macs eieio-loaddefs password-cache json subr-x map
url-vars seq byte-opt gv bytecomp byte-compile cconv cl-loaddefs cl-lib
iso-transl tooltip eldoc paren electric uniquify ediff-hook vc-hooks
lisp-float-type elisp-mode mwheel term/x-win x-win term/common-win x-dnd
tool-bar dnd fontset image regexp-opt fringe tabulated-list replace
newcomment text-mode lisp-mode prog-mode register page tab-bar menu-bar
rfn-eshadow isearch easymenu timer select scroll-bar mouse jit-lock
font-lock syntax font-core term/tty-colors frame minibuffer cl-generic
cham georgian utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao
korean japanese eucjp-ms cp51932 hebrew greek romanian slovak czech
european ethiopic indian cyrillic chinese composite emoji-zwj charscript
charprop case-table epa-hook jka-cmpr-hook help simple abbrev
obarray cl-preloaded nadvice button loaddefs faces cus-face macroexp
files window text-properties overlay sha1 md5 base64 format env
code-pages mule custom widget hashtable-print-readable backquote threads
dbusbind inotify lcms2 dynamic-setting system-font-setting
font-render-setting cairo move-toolbar gtk x-toolkit x multi-tty
make-network-process emacs)

Memory information:
((conses 16 273770 13520)
 (symbols 48 18619 1)
 (strings 32 66582 2920)
 (string-bytes 1 2318045)
 (vectors 16 39996)
 (vector-slots 8 1131973 174560)
 (floats 8 762 66)
 (intervals 56 1039 60)
 (buffers 992 50))

--=-=-=--


From debbugs-submit-bounces@debbugs.gnu.org Sun Jan 22 08:24:17 2023
Received: (at 61005) by debbugs.gnu.org; 22 Jan 2023 13:24:17 +0000
Received: from localhost ([127.0.0.1]:50920 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1pJaKX-0007vz-8I for submit@debbugs.gnu.org; Sun, 22 Jan 2023 08:24:17 -0500
Received: from odoacer.turtle-trading.net ([93.241.193.16]:49764) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1pJaKU-0007vk-JU for 61005@debbugs.gnu.org; Sun, 22 Jan 2023 08:24:15 -0500
Received: from zenobia.turtle-trading.net ([192.168.2.111]) by odoacer.turtle-trading.net with esmtp (Exim 4.80) (envelope-from ) id 1pJaKO-00077x-2X; Sun, 22 Jan 2023 14:24:08 +0100
Received: from benny by zenobia.turtle-trading.net with local (Exim 4.94.2) (envelope-from ) id 1pJaKN-0009wO-QW; Sun, 22 Jan 2023 14:24:07 +0100
From: Benjamin Riefenstahl
To: 61005@debbugs.gnu.org
Subject: Re: bug#61005: 28.1.91; Encoding not detected in HTML files inside archives
References: <87bkmqempd.fsf@turtle-trading.net>
Date: Sun, 22 Jan 2023 14:24:07 +0100
In-Reply-To: <87bkmqempd.fsf@turtle-trading.net> (Benjamin Riefenstahl's message of "Sun, 22 Jan 2023 14:13:50 +0100")
Message-ID: <877cxeem88.fsf@turtle-trading.net>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2.50 (gnu/linux)
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="
X-Spam-Score: 0.0 (/)
X-Debbugs-Envelope-To: 61005
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit"
X-Spam-Score: -1.0 (-)

--=-=-=
Content-Type: text/plain

The promised patch. This is against master.

Also a small test-suite for sgml-html-meta-auto-coding-function, if you
want that. If you care, I could also add one for
sgml-xml-auto-coding-function.

--=-=-=
Content-Type: text/x-diff
Content-Disposition: attachment; filename=0001-Fix-decoding-HTML-files-from-archives.patch

>From 95b63baf1bf411422c61b76470abb1aa681f2db2 Mon Sep 17 00:00:00 2001
From: Benjamin Riefenstahl
Date: Tue, 17 Jan 2023 20:08:15 +0200
Subject: [PATCH 1/2] Fix decoding HTML files from archives

* lisp/international/mule.el (sgml-xml-auto-coding-function):
Avoid signaling an error from coding-system-equal when the XML
encoding tag specifies an encoding whose type is 'charset'.
(Bug#61005)

This is the same fix as in #df7ed10e for
sgml-xml-auto-coding-function.
---
 lisp/international/mule.el | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/lisp/international/mule.el b/lisp/international/mule.el
index 4f6addea387..9480213be9a 100644
--- a/lisp/international/mule.el
+++ b/lisp/international/mule.el
@@ -2539,6 +2539,10 @@ sgml-html-meta-auto-coding-function
                     (bfcs-type
                      (coding-system-type buffer-file-coding-system)))
                 (if (and enable-multibyte-characters
+                         ;; 'charset' will signal an error in
+                         ;; coding-system-equal, since it isn't a
+                         ;; coding-system. So test that up front.
+                         (not (equal sym-type 'charset))
                          (coding-system-equal 'utf-8 sym-type)
                          (coding-system-equal 'utf-8 bfcs-type))
                     buffer-file-coding-system
-- 
2.30.2

--=-=-=
Content-Type: text/x-diff
Content-Disposition: attachment; filename=0002-Add-test-suite-for-sgml-html-meta-auto-coding-functi.patch

>From 29996e07c23c9716f731dde224c8ca47e321e697 Mon Sep 17 00:00:00 2001
From: Benjamin Riefenstahl
Date: Tue, 17 Jan 2023 20:13:39 +0200
Subject: [PATCH 2/2] Add test suite for sgml-html-meta-auto-coding-function

* test/lisp/international/mule-tests.el (sgml-html-meta-pre)
(sgml-html-meta-post, sgml-html-meta-run, sgml-html-meta-utf-8)
(sgml-html-meta-windows-hebrew, sgml-html-meta-none)
(sgml-html-meta-unknown-coding, sgml-html-meta-no-pre)
(sgml-html-meta-no-post-less-than-10lines)
(sgml-html-meta-no-post-10lines, sgml-html-meta-utf-8-with-bom): Add.
---
 test/lisp/international/mule-tests.el | 66 +++++++++++++++++++++++++++
 1 file changed, 66 insertions(+)

diff --git a/test/lisp/international/mule-tests.el b/test/lisp/international/mule-tests.el
index 4f70b275848..6e23d8c5421 100644
--- a/test/lisp/international/mule-tests.el
+++ b/test/lisp/international/mule-tests.el
@@ -70,6 +70,72 @@ mule-hz
   ;; The chinese-hz encoding is not ASCII compatible.
   (should-not (coding-system-get 'chinese-hz :ascii-compatible-p)))

+;;; Testing `sgml-html-meta-auto-coding-function'.
+
+(defconst sgml-html-meta-pre ""
+  "The beginning of a minimal HTML document.")
+
+(defconst sgml-html-meta-post ""
+  "The end of a minimal HTML document.")
+
+(defun sgml-html-meta-run (coding-system)
+  "Run `sgml-html-meta-auto-coding-function' on a minimal HTML.
+When CODING-SYSTEM is not nil, insert it, wrapped in a ''
+element. When CODING-SYSTEM contains HTML meta characters or
+white space, insert it as-is, without additional formatting. Use
+the variables `sgml-html-meta-pre' and `sgml-html-meta-post' to
+provide HTML fragments. Some tests override those variables."
+  (with-temp-buffer
+    (insert sgml-html-meta-pre
+            (cond ((not coding-system)
+                   "")
+                  ((string-match "[<>'\"\n ]" coding-system)
+                   coding-system)
+                  (t
+                   (format "" coding-system)))
+            sgml-html-meta-post)
+    (goto-char (point-min))
+    (sgml-html-meta-auto-coding-function (- (point-max) (point-min)))))
+
+(ert-deftest sgml-html-meta-utf-8 ()
+  "Baseline: UTF-8."
+  (should (eq 'utf-8 (sgml-html-meta-run "utf-8"))))
+
+(ert-deftest sgml-html-meta-windows-hebrew ()
+  "A non-Unicode charset."
+  (should (eq 'windows-1255 (sgml-html-meta-run "windows-1255"))))
+
+(ert-deftest sgml-html-meta-none ()
+  (should (eq nil (sgml-html-meta-run nil))))
+
+(ert-deftest sgml-html-meta-unknown-coding ()
+  (should (eq nil (sgml-html-meta-run "XXX"))))
+
+(ert-deftest sgml-html-meta-no-pre ()
+  "Without the prefix, so not HTML."
+  (let ((sgml-html-meta-pre ""))
+    (should (eq nil (sgml-html-meta-run "utf-8")))))
+
+(ert-deftest sgml-html-meta-no-post-less-than-10lines ()
+  "No '', detect charset in the first 10 lines."
+  (let ((sgml-html-meta-post ""))
+    (should (eq 'utf-8 (sgml-html-meta-run
+                        (concat "\n\n\n\n\n\n\n\n\n"
+                                ""))))))
+
+(ert-deftest sgml-html-meta-no-post-10lines ()
+  "No '', do not detect charset after the first 10 lines."
+  (let ((sgml-html-meta-post ""))
+    (should (eq nil (sgml-html-meta-run
+                     (concat "\n\n\n\n\n\n\n\n\n\n"
+                             ""))))))
+
+(ert-deftest sgml-html-meta-utf-8-with-bom ()
+  "Requesting 'UTF-8' does not override `utf-8-with-signature'.
+Check fix for Bug#20623."
+  (let ((buffer-file-coding-system 'utf-8-with-signature))
+    (should (eq 'utf-8-with-signature (sgml-html-meta-run "utf-8")))))
+
 ;; Stop "Local Variables" above causing confusion when visiting this file.
-- 
2.30.2

--=-=-=--


From debbugs-submit-bounces@debbugs.gnu.org Sun Jan 22 09:10:12 2023
Received: (at 61005-done) by debbugs.gnu.org; 22 Jan 2023 14:10:12 +0000
Received: from localhost ([127.0.0.1]:50959 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1pJb2y-0000ak-GW for submit@debbugs.gnu.org; Sun, 22 Jan 2023 09:10:12 -0500
Received: from eggs.gnu.org ([209.51.188.92]:42232) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1pJb2u-0000aR-EF for 61005-done@debbugs.gnu.org; Sun, 22 Jan 2023 09:10:10 -0500
Received: from fencepost.gnu.org ([2001:470:142:3::e]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1pJb2U-0008D8-O4; Sun, 22 Jan 2023 09:09:55 -0500
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org; s=fencepost-gnu-org; h=References:Subject:In-Reply-To:To:From:Date: mime-version; bh=3O/NexPgajED4XU6HlmZ/Uvz0C9WP77ekPPDufzHueY=; b=J02wPSDnsqSo 9AZnXs5sgYNSxxh4Qh8WkZQJ/lPN41DoOlALId87yrgKsx3pHuXDny0si2fN1WX9QH9s4J8E1cxcs k08EhIq+31vsjj99XWR/f8CVGsK52kMEtDcxzrSI2fVgvXVcF96iomr44/XnWYDSUnqvrrsaAKVOl MQMsoUiEd5QROr33IQ8Fu5irrVg0DTIQenwAqHVVVjtbbN6QrQLcJNxE2NXzeiya+e+5reY62P0gR Vq0OxBwEG/UGSb5b+AFsURXQvnAEdbs/5HwS2/alSbBqi3dIJ7hFL9MSp5oAL833zRG9eGr1tjwRf ew96tseVLBy7WLnG4jaGtA==;
Received: from [87.69.77.57] (helo=home-c4e4a596f7) by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1pJb2T-0008NF-Q6; Sun, 22 Jan 2023 09:09:42 -0500
Date: Sun, 22 Jan 2023 16:09:47 +0200
Message-Id: <83leluk6dw.fsf@gnu.org>
From: Eli Zaretskii
To: Benjamin Riefenstahl
In-Reply-To: <877cxeem88.fsf@turtle-trading.net> (message from Benjamin Riefenstahl on Sun, 22 Jan 2023 14:24:07 +0100)
Subject: Re: bug#61005: 28.1.91; Encoding not detected in HTML files inside archives
References: <87bkmqempd.fsf@turtle-trading.net> <877cxeem88.fsf@turtle-trading.net>
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 61005-done
Cc: 61005-done@debbugs.gnu.org
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit"
X-Spam-Score: -3.3 (---)

> From: Benjamin Riefenstahl
> Date: Sun, 22 Jan 2023 14:24:07 +0100
>
> The promised patch. This is against master.
>
> Also a small test-suite for sgml-html-meta-auto-coding-function, if you
> want that. If you care, I could also add one for
> sgml-xml-auto-coding-function.

Thanks, I installed this on the emacs-29 branch, and I'm closing the
bug.


From unknown Sat Aug 16 13:42:57 2025
Received: (at fakecontrol) by fakecontrolmessage;
To: internal_control@debbugs.gnu.org
From: Debbugs Internal Request
Subject: Internal Control
Message-Id: bug archived.
Date: Mon, 20 Feb 2023 12:24:04 +0000
User-Agent: Fakemail v42.6.9

# This is a fake control message.
#
# The action:
# bug archived.
thanks
# This fakemail brought to you by your local debbugs
# administrator