GNU bug report logs -
#73638
31.0.50; doc-view: imenu index cannot be made for LaTeX PDFs
Reported by: Visuwesh <visuweshm <at> gmail.com>
Date: Sat, 5 Oct 2024 11:07:02 UTC
Severity: normal
Found in version 31.0.50
Done: Tassilo Horn <tsdh <at> gnu.org>
Bug is archived. No further changes may be made.
This is a follow up to bug#73530 where a discussion on how to obtain the
outlines for LaTeX PDFs was held.
Currently, if mutool reports the outline as
% mutool show test.pdf outline
| "Text" #nameddest=section.1
| "Annotations" #nameddest=section.2
| "Links" #nameddest=section.3
| "Attachments" #nameddest=section.4
+ "Outline" #nameddest=section.5
+ "subsection" #nameddest=subsection.5.1
| "subsubsection" #nameddest=subsubsection.5.1.1
then doc-view cannot build an imenu index from it, since the entries
carry no page numbers. Looking at the source code of mutool, it looks
like the "#..." part is simply a URI. AFAICT, there's no way to resolve
the URI to get the page number using mutool. However, one can write a JS
script instead. Use the "outline.js" script given at the end of this
message and run mutool as follows with a LaTeX PDF:
% mutool run outline.js test.pdf
(
((level . 1)
(title . "Text")
(page . 0))
((level . 1)
(title . "Annotations")
(page . 1))
((level . 1)
(title . "Links")
(page . 2))
((level . 1)
(title . "Attachments")
(page . 3))
((level . 1)
(title . "Outline")
(page . 4))
((level . 2)
(title . "subsection")
(page . 4))
((level . 3)
(title . "subsubsection")
(page . 4))
)
This output can be passed directly to `read' in Emacs, skipping the
parsing entirely. JS evaluation takes about the same amount of time as
`mutool show PDF outline':
% time mutool run outline.js atkins_physical_chemistry.pdf >/dev/null
0m00.32s real 0m00.29s user 0m00.02s system
% time mutool run outline.js atkins_physical_chemistry.pdf >/dev/null
0m00.31s real 0m00.29s user 0m00.02s system
% time mutool show atkins_physical_chemistry.pdf outline >/dev/null
0m00.33s real 0m00.29s user 0m00.04s system
% time mutool show atkins_physical_chemistry.pdf outline >/dev/null
0m00.30s real 0m00.25s user 0m00.04s system
[ where atkins_physical_chemistry.pdf is the same 90+MB file I was
testing in the previous bug report. ]
I don't know JS at all, so the script can probably be improved. The docs
for the JS interface are at
https://mupdf.readthedocs.io/en/latest/mutool-run-js-api.html
If this approach is acceptable, we can simply run the JS script instead.
WDYT?
[ I couldn't attach the JS script thanks to Gmail's blocking the
message. ]
outline.js:
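// Open the PDF named on the command line.
var document = new Document.openDocument(scriptArgs[0], "application/pdf")
var outline = document.loadOutline()
// Nothing to print when the PDF has no outline.
if(!outline) quit()
print("(")
// Print one outline entry as an alist, then recurse into its children.
function pp(outl, level){
    print("((level . " + level + ")")
    print("(title . " + repr(outl.title) + ")")
    // resolveLink turns the "#nameddest=..." URI into a page number.
    print("(page . " + document.resolveLink(outl.uri) + "))")
    if(outl.down){
        for(var i=0; i<outl.down.length; i++){
            pp(outl.down[i], level+1)
        }
    }
}
for(var i=0; i<outline.length; i++){
    pp(outline[i], 1)
}
print(")")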
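
A possible variant, as a rough sketch only: the same traversal written as
a function that returns the text instead of printing it, so it can be
tried outside mutool with stand-in data. The names `outlineToSexp',
`resolvePage', `sampleOutline' and `samplePages' are made up for
illustration. Using JSON.stringify to quote titles in place of repr() is
an assumption, as is the { title, uri, down } item shape, which is simply
what outline.js reads.

```javascript
// Sketch only: outlineToSexp and resolvePage are hypothetical names.
// Items are assumed to look like { title, uri, down }, as read by outline.js.
function outlineToSexp(outline, resolvePage) {
    var lines = ["("];
    function walk(items, level) {
        for (var i = 0; i < items.length; i++) {
            var item = items[i];
            lines.push("((level . " + level + ")");
            // Assumption: JSON.stringify stands in for repr() to quote the title.
            lines.push("(title . " + JSON.stringify(item.title) + ")");
            lines.push("(page . " + resolvePage(item.uri) + "))");
            if (item.down) walk(item.down, level + 1);
        }
    }
    // A missing outline yields an empty list rather than no output at all.
    walk(outline || [], 1);
    lines.push(")");
    return lines.join("\n");
}

// Stand-in data mirroring part of the example outline in this report.
var sampleOutline = [
    { title: "Links", uri: "#nameddest=section.3" },
    { title: "Outline", uri: "#nameddest=section.5", down: [
        { title: "subsection", uri: "#nameddest=subsection.5.1" }
    ] }
];
var samplePages = {
    "#nameddest=section.3": 2,
    "#nameddest=section.5": 4,
    "#nameddest=subsection.5.1": 4
};
var sampleSexp = outlineToSexp(sampleOutline, function (uri) {
    return samplePages[uri];
});
```

Under mutool, the resolver would presumably be a wrapper around
document.resolveLink and the result would be handed to print(), as in
outline.js. Unlike outline.js, this sketch emits an empty list when there
is no outline, so the reader on the Emacs side always gets a complete
form.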
This bug report was last modified 226 days ago.