GNU bug report logs -
#73638
31.0.50; doc-view: imenu index cannot be made for LaTeX PDFs
Reported by: Visuwesh <visuweshm <at> gmail.com>
Date: Sat, 5 Oct 2024 11:07:02 UTC
Severity: normal
Found in version 31.0.50
Done: Tassilo Horn <tsdh <at> gnu.org>
Bug is archived. No further changes may be made.
[Message part 1 (text/plain, inline)]
Your message dated Tue, 08 Oct 2024 17:43:33 +0200
with message-id <87o73uwqfu.fsf <at> gnu.org>
and subject line Re: bug#73638: 31.0.50; doc-view: imenu index cannot be made for LaTeX PDFs
has caused the debbugs.gnu.org bug report #73638,
regarding 31.0.50; doc-view: imenu index cannot be made for LaTeX PDFs
to be marked as done.
(If you believe you have received this mail in error, please contact
help-debbugs <at> gnu.org.)
--
73638: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=73638
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
This is a follow-up to bug#73530, which discussed how to obtain the
outlines for LaTeX PDFs.
Currently, if mutool reports the outline as
% mutool show test.pdf outline
| "Text" #nameddest=section.1
| "Annotations" #nameddest=section.2
| "Links" #nameddest=section.3
| "Attachments" #nameddest=section.4
+ "Outline" #nameddest=section.5
+ "subsection" #nameddest=subsection.5.1
| "subsubsection" #nameddest=subsubsection.5.1.1
then doc-view cannot build an imenu index, because the entries carry
named destinations rather than page numbers. Looking at the source code
of mutool, it looks like the "#..." part is simply a URI. AFAICT,
there's no way to resolve the URI to a page number with `mutool show'.
However, one can write a JS script instead. Use the outline.js script
included at the end of this message and run mutool as follows with a
LaTeX PDF:
% mutool run outline.js test.pdf
(
((level . 1)
(title . "Text")
(page . 0))
((level . 1)
(title . "Annotations")
(page . 1))
((level . 1)
(title . "Links")
(page . 2))
((level . 1)
(title . "Attachments")
(page . 3))
((level . 1)
(title . "Outline")
(page . 4))
((level . 2)
(title . "subsection")
(page . 4))
((level . 3)
(title . "subsubsection")
(page . 4))
)
Emacs can `read' this output directly, so no parsing is needed.
JS evaluation takes about the same time as `mutool show PDF outline':
% time mutool run outline.js atkins_physical_chemistry.pdf >/dev/null
0m00.32s real 0m00.29s user 0m00.02s system
% time mutool run outline.js atkins_physical_chemistry.pdf >/dev/null
0m00.31s real 0m00.29s user 0m00.02s system
% time mutool show atkins_physical_chemistry.pdf outline >/dev/null
0m00.33s real 0m00.29s user 0m00.04s system
% time mutool show atkins_physical_chemistry.pdf outline >/dev/null
0m00.30s real 0m00.25s user 0m00.04s system
[ where atkins_physical_chemistry.pdf is the same 90+MB file I was
testing in the previous bug report. ]
I don't know JS at all, so the script can probably be improved. The docs
for the JS interface are at
https://mupdf.readthedocs.io/en/latest/mutool-run-js-api.html
If this approach is acceptable, we can simply run the JS script instead.
WDYT?
[ I couldn't attach the JS script because Gmail blocked the
message. ]
outline.js:
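// Open the PDF named on the command line and load its outline.
var document = new Document.openDocument(scriptArgs[0], "application/pdf")
var outline = document.loadOutline()
// Nothing to print if the PDF has no outline.
if(!outline) quit()
print("(")
// Print one alist per outline item, then recurse into its children.
function pp(outl, level){
    print("((level . " + level + ")")
    print("(title . " + repr(outl.title) + ")")
    // resolveLink turns the "#nameddest=..." URI into a page number.
    print("(page . " + document.resolveLink(outl.uri) + "))")
    if(outl.down){
        for(var i=0; i<outl.down.length; i++){
            pp(outl.down[i], level+1)
        }
    }
}
for(var i=0; i<outline.length; i++){
    pp(outline[i], 1)
}
print(")")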
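The traversal in outline.js can be tried without mutool. The following is
only a sketch under stated assumptions: mockOutline is a hypothetical
stand-in for what loadOutline() returns, each item stores its page
directly instead of a URI passed to resolveLink(), and JSON.stringify
stands in for repr() when quoting titles (the two may differ for unusual
characters).

```javascript
// Hypothetical stand-in for the outline tree; not produced by mutool.
var mockOutline = [
  { title: "Text", page: 0 },
  { title: "Outline", page: 4, down: [
      { title: "subsection", page: 4, down: [
          { title: "subsubsection", page: 4 }
      ] }
  ] }
];

// Collect one alist per outline item, depth first, in the same order
// that outline.js prints them.
function flatten(items, level, out) {
  for (var i = 0; i < items.length; i++) {
    var item = items[i];
    out.push("((level . " + level + ") (title . " +
             JSON.stringify(item.title) + ") (page . " + item.page + "))");
    if (item.down) flatten(item.down, level + 1, out);
  }
  return out;
}

// Wrap the rows in one outer list so the result is a single form.
var sexp = "(" + flatten(mockOutline, 1, []).join("\n ") + ")";
console.log(sexp);
```

Collecting the rows into an array before printing keeps the traversal
separate from the output, which makes the nesting levels easy to check.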
[Message part 3 (message/rfc822, inline)]
Visuwesh <visuweshm <at> gmail.com> writes:
>> Nope, now I get off-by-many-hundreds errors. The Imenu entries have
>> the page number in parens, right? If so, I have many references to
>> pages that are thrice as large as the actual number of pages, e.g.,
>> here some parts of the *Completions* buffer for the Atkins book:
>>
>> FOCUS.1.The.properties.of.gases.(341)
>> FOCUS.10.Molecular.symmetry.(4181)
>> FOCUS.11.Molecular.spectroscopy.(4481)
>> FOCUS.12.Magnetic.resonance.(5181)
>> FOCUS.13.Statistical.thermodynamics.(5621)
>> FOCUS.14.Molecular.interactions.(6141)
>> FOCUS.15.Solids.(6701)
>> FOCUS.16.Molecules.in.motio.(7201)
>> FOCUS.17.Chemical.kinetics.(7521)
>> FOCUS.18.Reaction.dynamics.(8101)
>> FOCUS.19.Processes.at.solid.surfaces.(8541)
>>
>> It's large but doesn't have more than 8000 pages.
>
> I messed up by not considering the precedence of operators. :-( Fixed
> in the attached patch.
Works! Applied and pushed.
Thanks a lot,
Tassilo
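The patch itself is not included in this log. The page numbers quoted
above (341, 4181, ...) are consistent with a "1" being appended as text
rather than added as a number, which is what JS does when + mixes strings
and numbers without parentheses. The following is a minimal sketch of
that kind of mistake, not the actual fix; zeroBasedPage is a hypothetical
value.

```javascript
// Hypothetical zero-based page number, as a link resolver might return.
var zeroBasedPage = 34;

// + associates left to right: string + 34 is already a string,
// so the trailing 1 is concatenated instead of added.
var wrong = "(page . " + zeroBasedPage + 1 + ")";

// Parentheses force the numeric addition to happen first.
var right = "(page . " + (zeroBasedPage + 1) + ")";

console.log(wrong); // (page . 341)
console.log(right); // (page . 35)
```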
This bug report was last modified 226 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.