GNU bug report logs -
#73638
31.0.50; doc-view: imenu index cannot be made for LaTeX PDFs
Reported by: Visuwesh <visuweshm <at> gmail.com>
Date: Sat, 5 Oct 2024 11:07:02 UTC
Severity: normal
Found in version 31.0.50
Done: Tassilo Horn <tsdh <at> gnu.org>
Bug is archived. No further changes may be made.
[Message part 1 (text/plain, inline)]
Your bug report
#73638: 31.0.50; doc-view: imenu index cannot be made for LaTeX PDFs
which was filed against the emacs package, has been closed.
The explanation is attached below, along with your original report.
If you require more details, please reply to 73638 <at> debbugs.gnu.org.
--
73638: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=73638
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
Visuwesh <visuweshm <at> gmail.com> writes:
>> Nope, now I get off-by-many-hundreds errors. The Imenu entries have
>> the page number in parens, right? If so, I have many references to
>> pages that are thrice as large as the actual number of pages, e.g.,
>> here some parts of the *Completions* buffer for the Atkins book:
>>
>> FOCUS.1.The.properties.of.gases.(341)
>> FOCUS.10.Molecular.symmetry.(4181)
>> FOCUS.11.Molecular.spectroscopy.(4481)
>> FOCUS.12.Magnetic.resonance.(5181)
>> FOCUS.13.Statistical.thermodynamics.(5621)
>> FOCUS.14.Molecular.interactions.(6141)
>> FOCUS.15.Solids.(6701)
>> FOCUS.16.Molecules.in.motio.(7201)
>> FOCUS.17.Chemical.kinetics.(7521)
>> FOCUS.18.Reaction.dynamics.(8101)
>> FOCUS.19.Processes.at.solid.surfaces.(8541)
>>
>> It's large but doesn't have more than 8000 pages.
>
> I messed up by not considering the precedence of operators. :-( Fixed
> in the attached patch.
Works! Applied and pushed.
Thanks a lot,
Tassilo
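The patch itself is not reproduced in this log. The bogus page numbers quoted above all end in 1 (341, 4181, ...), which is consistent with a 1 being appended to the page number as a string instead of being added to it. The sketch below illustrates that class of precedence mistake in JavaScript. The function names are hypothetical, and this is a guess at the failure mode, not the actual diff.

```javascript
// Hypothetical helpers; `page` stands in for a zero-based page index
// such as the one outline.js prints.

// Buggy: `+` is left-associative, so the string on the left turns every
// later `+` into concatenation and the 1 is appended as a character.
function pageEntryBuggy(page) {
  return "(page . " + page + 1 + "))";
}

// Fixed: parentheses force the numeric addition to happen first.
function pageEntryFixed(page) {
  return "(page . " + (page + 1) + "))";
}
```

With a zero-based index of 34, the first form yields `(page . 341))` and the second yields `(page . 35))`.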
[Message part 3 (message/rfc822, inline)]
This is a follow-up to bug#73530, which discussed how to obtain the
outlines of LaTeX PDFs.
Currently, if mutool reports the outline as
% mutool show test.pdf outline
| "Text" #nameddest=section.1
| "Annotations" #nameddest=section.2
| "Links" #nameddest=section.3
| "Attachments" #nameddest=section.4
+ "Outline" #nameddest=section.5
+ "subsection" #nameddest=subsection.5.1
| "subsubsection" #nameddest=subsubsection.5.1.1
then doc-view cannot build an index, because the entries point to named
destinations rather than page numbers. Looking at the source code of
mutool, it looks like the "#..." part is simply a URI. AFAICT, there's no
way to resolve the URI to a page number using mutool directly. However,
one can write a JS script instead. Use the outline.js script (given
inline below) and run mutool as follows with a LaTeX PDF:
% mutool run outline.js test.pdf
(
((level . 1)
(title . "Text")
(page . 0))
((level . 1)
(title . "Annotations")
(page . 1))
((level . 1)
(title . "Links")
(page . 2))
((level . 1)
(title . "Attachments")
(page . 3))
((level . 1)
(title . "Outline")
(page . 4))
((level . 2)
(title . "subsection")
(page . 4))
((level . 3)
(title . "subsubsection")
(page . 4))
)
Emacs can `read' this output directly, so no parsing is needed.
JS evaluation takes about the same time as `mutool show PDF outline':
% time mutool run outline.js atkins_physical_chemistry.pdf >/dev/null
0m00.32s real 0m00.29s user 0m00.02s system
% time mutool run outline.js atkins_physical_chemistry.pdf >/dev/null
0m00.31s real 0m00.29s user 0m00.02s system
% time mutool show atkins_physical_chemistry.pdf outline >/dev/null
0m00.33s real 0m00.29s user 0m00.04s system
% time mutool show atkins_physical_chemistry.pdf outline >/dev/null
0m00.30s real 0m00.25s user 0m00.04s system
[ where atkins_physical_chemistry.pdf is the same 90+MB file I was
testing in the previous bug report. ]
I don't know JS at all so the script can probably be improved. The docs
for the JS interface are at
https://mupdf.readthedocs.io/en/latest/mutool-run-js-api.html
If this approach is acceptable, we can simply run the JS script instead.
WDYT?
[ I couldn't attach the JS script thanks to Gmail's blocking the
message. ]
outline.js:
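// Open the PDF named on the command line and load its outline.
var document = new Document.openDocument(scriptArgs[0], "application/pdf")
var outline = document.loadOutline()
// No outline: print nothing and exit.
if(!outline) quit()
print("(")
// Print one outline entry as an alist, then recurse into its children.
function pp(outl, level){
    print("((level . " + level + ")")
    print("(title . " + repr(outl.title) + ")")
    // resolveLink turns the "#nameddest=..." URI into a page number.
    print("(page . " + document.resolveLink(outl.uri) + "))")
    if(outl.down){
        for(var i=0; i<outl.down.length; i++){
            pp(outl.down[i], level+1)
        }
    }
}
for(var i=0; i<outline.length; i++){
    pp(outline[i], 1)
}
print(")")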
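To exercise the traversal without mutool, the recursion in outline.js can be separated from the MuPDF calls. The following is only a sketch under stated assumptions: `outlineToSexp` and `resolvePage` are hypothetical names, the outline is assumed to be a plain array of objects with `title`, `uri` and optional `down` fields, and `JSON.stringify` stands in for mutool's `repr`, so the quoting is only known to match Lisp string syntax for simple titles.

```javascript
// Hypothetical refactoring of outline.js: build the alist text from a plain
// outline tree. `resolvePage` is an assumed callback that maps a URI to a
// page number (under mutool it could wrap document.resolveLink).
function outlineToSexp(outline, resolvePage) {
  var lines = ["("];
  function walk(item, level) {
    lines.push("((level . " + level + ")");
    // JSON.stringify is a stand-in for mutool's repr().
    lines.push("(title . " + JSON.stringify(item.title) + ")");
    lines.push("(page . " + resolvePage(item.uri) + "))");
    var children = item.down || [];
    for (var i = 0; i < children.length; i++) {
      walk(children[i], level + 1);
    }
  }
  var top = outline || [];
  for (var i = 0; i < top.length; i++) {
    walk(top[i], 1);
  }
  lines.push(")");
  return lines.join("\n");
}
```

Unlike the script above, which exits without printing anything when there is no outline, this sketch returns an empty list in that case.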
This bug report was last modified 227 days ago.