From: Visuwesh
To: 73638@debbugs.gnu.org
Cc: "Tassilo Horn"
Subject: bug#73638: 31.0.50; doc-view: imenu index cannot be made for LaTeX PDFs
Date: Sat, 05 Oct 2024 16:36:23 +0530
Message-ID: <87ploebyhc.fsf@gmail.com>
User-Agent: Gnus/5.13 (Gnus v5.13)

This is a follow up to bug#73530 where a discussion on how to obtain
the outlines for LaTeX PDFs was held.  Currently, if mutool reports the
outline as

% mutool show test.pdf outline
| "Text" #nameddest=section.1
| "Annotations" #nameddest=section.2
| "Links" #nameddest=section.3
| "Attachments" #nameddest=section.4
+ "Outline" #nameddest=section.5
+ "subsection" #nameddest=subsection.5.1
| "subsubsection" #nameddest=subsubsection.5.1.1

then nothing can be done.  Looking at the source code of mutool, it
looks like the "#..." part is simply a URI.  AFAICT, there's no way to
resolve the URI to get the page number using mutool.

However, one can write a JS script instead.  Use the "attached"
"outline.js" script and run mutool as follows with a LaTeX PDF:

% mutool run outline.js test.pdf
(
((level . 1)
(title . "Text")
(page . 0))
((level . 1)
(title . "Annotations")
(page . 1))
((level . 1)
(title . "Links")
(page . 2))
((level . 1)
(title . "Attachments")
(page . 3))
((level . 1)
(title . "Outline")
(page . 4))
((level . 2)
(title . "subsection")
(page . 4))
((level . 3)
(title . "subsubsection")
(page . 4))
)

This can be directly `read' from Emacs skipping the parsing entirely.

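For reference, the traversal that yields output of this shape can be sketched as below. This is a minimal sketch under stated assumptions, not the script from the report: it assumes each outline node carries `title`, `uri`, and an optional `down` array (the fields the report's script uses), and it takes a hypothetical `resolvePage` callback in place of `document.resolveLink`, so that it runs under Node.js with made-up data and no PDF. The names `outlineToLisp`, `sampleOutline`, and `samplePages` are illustrative only.

```javascript
// Sketch: serialize an outline tree as a Lisp alist, one field per line.
// Assumption: nodes have `title`, `uri`, and an optional `down` array.
// `resolvePage` is a hypothetical stand-in for document.resolveLink(uri)
// and is expected to return a page number.
function outlineToLisp(outline, resolvePage) {
  var lines = ["("];
  function walk(nodes, level) {
    for (var i = 0; i < nodes.length; i++) {
      var node = nodes[i];
      lines.push("((level . " + level + ")");
      // JSON.stringify gives a double-quoted, backslash-escaped string;
      // the Lisp reader should accept this for ordinary titles.
      lines.push("(title . " + JSON.stringify(node.title) + ")");
      lines.push("(page . " + resolvePage(node.uri) + "))");
      // Children are emitted right after their parent, one level deeper.
      if (node.down) walk(node.down, level + 1);
    }
  }
  walk(outline || [], 1);
  lines.push(")");
  return lines.join("\n");
}

// Stand-alone demonstration with made-up data (no PDF involved).
var sampleOutline = [
  { title: "Text", uri: "#nameddest=section.1" },
  { title: "Outline", uri: "#nameddest=section.5",
    down: [{ title: "subsection", uri: "#nameddest=subsection.5.1" }] }
];
var samplePages = {
  "#nameddest=section.1": 0,
  "#nameddest=section.5": 4,
  "#nameddest=subsection.5.1": 4
};
console.log(outlineToLisp(sampleOutline, function (uri) {
  return samplePages[uri];
}));
```

Keeping the resolver as a parameter separates the formatting from the PDF library, so the walk can be checked without mutool. Under `mutool run` one would presumably pass a function that calls `document.resolveLink` and emit the result with `print`.
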
JS evaluation takes the same amount of time as `mutool show PDF outline':

% time mutool run outline.js atkins_physical_chemistry.pdf >/dev/null
0m00.32s real     0m00.29s user     0m00.02s system
% time mutool run outline.js atkins_physical_chemistry.pdf >/dev/null
0m00.31s real     0m00.29s user     0m00.02s system
% time mutool show atkins_physical_chemistry.pdf outline >/dev/null
0m00.33s real     0m00.29s user     0m00.04s system
% time mutool show atkins_physical_chemistry.pdf outline >/dev/null
0m00.30s real     0m00.25s user     0m00.04s system

[ where atkins_physical_chemistry.pdf is the same 90+MB file I was
testing in the previous bug report. ]

I don't know JS at all so the script can probably be improved.  The
docs for the JS interface is at

https://mupdf.readthedocs.io/en/latest/mutool-run-js-api.html

If this approach is acceptable, we can simply run the JS script
instead.  WDYT?

[ I couldn't attach the JS script thanks to Gmail's blocking the
message. ]

outline.js:

var document = new Document.openDocument(scriptArgs[0], "application/pdf")
var outline = document.loadOutline()
if(!outline) quit()
print("(")
function pp(outl, level){
    print("((level . " + level + ")")
    print("(title . " + repr(outl.title) + ")")
    print("(page . " + document.resolveLink(outl.uri) + "))")
    if(outl.down){
        for(var i=0; i [...]

From: Tassilo Horn
To: Visuwesh
Cc: 73638@debbugs.gnu.org, Eli Zaretskii
In-Reply-To: <87ploebyhc.fsf@gmail.com> (Visuwesh's message of "Sat, 05 Oct 2024 16:36:23 +0530")
References: <87ploebyhc.fsf@gmail.com>
Date: Sat, 05 Oct 2024 21:56:24 +0200
Message-ID: <87ed4upbmf.fsf@gnu.org>
User-Agent: mu4e 1.12.6; emacs 31.0.50

Visuwesh writes:

> However, one can write a JS script instead.  Use the "attached"
> "outline.js" script and run mutool as follows with a LaTeX PDF:
>
> % mutool run outline.js test.pdf
> (
> ((level . 1)
> (title . "Text")
> (page . 0))
> ...
> )
>
> This can be directly `read' from Emacs skipping the parsing entirely.

That's really nice.

> JS evaluation takes the same amount of time as `mutool show PDF outline':
>
> % time mutool run outline.js atkins_physical_chemistry.pdf >/dev/null
> 0m00.32s real     0m00.29s user     0m00.02s system
> % time mutool run outline.js atkins_physical_chemistry.pdf >/dev/null
> 0m00.31s real     0m00.29s user     0m00.02s system
> % time mutool show atkins_physical_chemistry.pdf outline >/dev/null
> 0m00.33s real     0m00.29s user     0m00.04s system
> % time mutool show atkins_physical_chemistry.pdf outline >/dev/null
> 0m00.30s real     0m00.25s user     0m00.04s system
>
> [ where atkins_physical_chemistry.pdf is the same 90+MB file I was
> testing in the previous bug report. ]
>
> I don't know JS at all so the script can probably be improved.  The
> docs for the JS interface is at
>
> https://mupdf.readthedocs.io/en/latest/mutool-run-js-api.html
>
> If this approach is acceptable, we can simply run the JS script
> instead.  WDYT?

To me it sounds great.  But let's ask Eli as well.

Eli, the executive summary is this.  We can already read a PDF's
outline with mutool and use that for quick access to chapters and
sections through imenu.  However, it turned out that whether the
outline is usable for our purpose depends on the PDF at hand, where
"usable" means we get page references.  And it seems that many PDFs
(e.g., those produced by LaTeX) have no page references but named
references, which won't do the trick for doc-view.

Visuwesh figured out that one can run a JS script using "mutool run