GNU bug report logs - #73638
31.0.50; doc-view: imenu index cannot be made for LaTeX PDFs

Package: emacs;

Reported by: Visuwesh <visuweshm <at> gmail.com>

Date: Sat, 5 Oct 2024 11:07:02 UTC

Severity: normal

Found in version 31.0.50

Done: Tassilo Horn <tsdh <at> gnu.org>

Bug is archived. No further changes may be made.

From: Visuwesh <visuweshm <at> gmail.com>
To: 73638 <at> debbugs.gnu.org
Cc: "Tassilo Horn" <tsdh <at> gnu.org>
Subject: bug#73638: 31.0.50; doc-view: imenu index cannot be made for LaTeX PDFs
Date: Sat, 05 Oct 2024 16:36:23 +0530
This is a follow-up to bug#73530, which discussed how to obtain the
outlines of LaTeX PDFs.

Currently, if mutool reports the outline as

    % mutool show test.pdf outline
    |	"Text"	#nameddest=section.1
    |	"Annotations"	#nameddest=section.2
    |	"Links"	#nameddest=section.3
    |	"Attachments"	#nameddest=section.4
    +	"Outline"	#nameddest=section.5
    +		"subsection"	#nameddest=subsection.5.1
    |			"subsubsection"	#nameddest=subsubsection.5.1.1

then doc-view cannot build an imenu index, because none of the entries
carries a page number.  Judging from mutool's source code, the "#..."
part is simply a URI, and AFAICT `mutool show' offers no way to resolve
that URI to a page number.  However, mutool can run a JS script that
resolves it.  Use the outline.js script below (it could not be attached)
and run mutool as follows with a LaTeX PDF:

    % mutool run outline.js test.pdf
    (
    ((level . 1)
    (title . "Text")
    (page . 0))
    ((level . 1)
    (title . "Annotations")
    (page . 1))
    ((level . 1)
    (title . "Links")
    (page . 2))
    ((level . 1)
    (title . "Attachments")
    (page . 3))
    ((level . 1)
    (title . "Outline")
    (page . 4))
    ((level . 2)
    (title . "subsection")
    (page . 4))
    ((level . 3)
    (title . "subsubsection")
    (page . 4))
    )

Emacs can `read' this output directly, skipping the parsing entirely.
Running the JS script takes about the same time as `mutool show PDF outline':

    % time mutool run outline.js atkins_physical_chemistry.pdf >/dev/null
        0m00.32s real     0m00.29s user     0m00.02s system
    % time mutool run outline.js atkins_physical_chemistry.pdf >/dev/null
        0m00.31s real     0m00.29s user     0m00.02s system
    % time mutool show atkins_physical_chemistry.pdf outline >/dev/null
        0m00.33s real     0m00.29s user     0m00.04s system
    % time mutool show atkins_physical_chemistry.pdf outline >/dev/null
        0m00.30s real     0m00.25s user     0m00.04s system

[ where atkins_physical_chemistry.pdf is the same 90+MB file I was
  testing in the previous bug report.  ]
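For comparison, here is roughly what consuming the `mutool show ... outline' listing quoted at the top involves.  This is a minimal JS sketch that makes two assumptions:

  - The line layout is the one seen in that listing: a `|' or `+' marker, one tab per nesting level, the quoted title, a tab, then the URI.
  - Directly resolvable entries carry a `#page=N' URI.

parseOutlineLine is a hypothetical helper and is not part of mutool or doc-view:

```javascript
// Hypothetical helper, not part of mutool or doc-view: parse one line of
// `mutool show FILE outline' using the layout seen in the listing above.
function parseOutlineLine(line) {
  // marker (| or +), one tab per nesting level, "title", tab, URI
  const m = /^([|+])(\t+)"(.*)"\t(.*)$/.exec(line);
  if (!m) return null;
  const uri = m[4];
  // Assumption: directly resolvable entries look like "#page=N...".
  const direct = /^#page=(\d+)/.exec(uri);
  return {
    level: m[2].length,
    title: m[3], // escape sequences inside the quotes are not decoded
    uri: uri,
    // "#nameddest=..." carries no page number, hence null.
    page: direct ? Number(direct[1]) : null,
  };
}

const sampleLines = [
  '|\t"Text"\t#nameddest=section.1',             // from the listing above
  '+\t\t"subsection"\t#nameddest=subsection.5.1', // from the listing above
  '|\t"Intro"\t#page=3',                         // hypothetical non-LaTeX entry
];
for (const line of sampleLines) console.log(parseOutlineLine(line));
```

Every entry in the LaTeX listing comes back with page null, which is the dead end described above.  outline.js avoids this by letting mutool resolve the URI itself.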

I don't know JS at all, so the script can probably be improved.  The docs
for the JS interface are at

    https://mupdf.readthedocs.io/en/latest/mutool-run-js-api.html

If this approach is acceptable, doc-view can simply run the JS script
instead of `mutool show'.  WDYT?

[ I couldn't attach the JS script because Gmail blocked the
  message.  ]

outline.js:

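// outline.js: print the PDF outline as a list of alists that Emacs can `read'.
// Usage: mutool run outline.js FILE.pdf
var document = new Document.openDocument(scriptArgs[0], "application/pdf");
var outline = document.loadOutline();
if (!outline) quit();  // no outline: print nothing

print("(");

// Print one entry as ((level . N) (title . "...") (page . P)), then
// recurse into its children one level deeper.
function pp(outl, level) {
    print("((level . " + level + ")");
    print("(title . " + repr(outl.title) + ")");
    // resolveLink turns a URI such as #nameddest=section.1 into a
    // 0-based page index.
    print("(page . " + document.resolveLink(outl.uri) + "))");
    if (outl.down) {
        for (var i = 0; i < outl.down.length; i++) {
            pp(outl.down[i], level + 1);
        }
    }
}

for (var i = 0; i < outline.length; i++) {
    pp(outline[i], 1);
}

print(")");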
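To try the output format without mutool, here is a minimal sketch of the same traversal that runs under Node.js.  It does not use the mutool API, and the names in it are hypothetical stand-ins:

  - The outline array is hypothetical data shaped like loadOutline()'s result (title, uri, down).
  - resolvePage stands in for document.resolveLink().
  - elispString is a hypothetical helper that quotes titles for the Emacs reader instead of relying on mujs's repr().

```javascript
// Sketch only: mirrors pp() from outline.js, but runs under Node.js.

// Quote a string for the Emacs Lisp reader: escape backslashes and double quotes.
function elispString(s) {
  return '"' + s.replace(/\\/g, "\\\\").replace(/"/g, '\\"') + '"';
}

// resolvePage is a hypothetical stand-in for document.resolveLink().
function outlineToSexp(outline, resolvePage) {
  const lines = ["("];
  // Emit one alist per entry, then recurse into its children one level deeper.
  function pp(item, level) {
    const page = item.uri ? resolvePage(item.uri) : null;
    lines.push("((level . " + level + ")");
    lines.push("(title . " + elispString(item.title) + ")");
    // Unresolvable entries become nil instead of a bogus number.
    lines.push("(page . " + (Number.isInteger(page) ? page : "nil") + "))");
    for (const child of item.down || []) pp(child, level + 1);
  }
  for (const item of outline) pp(item, 1);
  lines.push(")");
  return lines.join("\n");
}

// Hypothetical data mirroring the "Outline" entries of the test.pdf listing.
const pages = { "#nameddest=section.5": 4, "#nameddest=subsection.5.1": 4 };
const sampleOutline = [
  { title: "Outline", uri: "#nameddest=section.5",
    down: [{ title: "subsection", uri: "#nameddest=subsection.5.1" }] },
];
console.log(outlineToSexp(sampleOutline, (uri) => pages[uri]));
```

One design choice differs from outline.js.  An entry without a URI, or whose URI does not resolve to an integer page, is printed as (page . nil), so the Emacs side can skip it after `read'.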



