GNU bug report logs - #73638
31.0.50; doc-view: imenu index cannot be made for LaTeX PDFs


Package: emacs;

Reported by: Visuwesh <visuweshm <at> gmail.com>

Date: Sat, 5 Oct 2024 11:07:02 UTC

Severity: normal

Found in version 31.0.50

Done: Tassilo Horn <tsdh <at> gnu.org>

Bug is archived. No further changes may be made.



Message #8 received at 73638 <at> debbugs.gnu.org (full text, mbox):

From: Tassilo Horn <tsdh <at> gnu.org>
To: Visuwesh <visuweshm <at> gmail.com>
Cc: 73638 <at> debbugs.gnu.org, Eli Zaretskii <eliz <at> gnu.org>
Subject: Re: bug#73638: 31.0.50; doc-view: imenu index cannot be made for
 LaTeX PDFs
Date: Sat, 05 Oct 2024 21:56:24 +0200

Visuwesh <visuweshm <at> gmail.com> writes:

> However, one can write a JS script instead.  Use the "attached"
> "outline.js" script and run mutool as follows with a LaTeX PDF:
>
>     % mutool run outline.js test.pdf
>     (
>     ((level . 1)
>     (title . "Text")
>     (page . 0))
>     ...
>     )
>
> This can be directly `read' from Emacs skipping the parsing entirely.

That's really nice.

> JS evaluation takes the same amount of time as `mutool show PDF outline':
>
>     % time mutool run outline.js atkins_physical_chemistry.pdf >/dev/null
>         0m00.32s real     0m00.29s user     0m00.02s system
>     % time mutool run outline.js atkins_physical_chemistry.pdf >/dev/null
>         0m00.31s real     0m00.29s user     0m00.02s system
>     % time mutool show atkins_physical_chemistry.pdf outline >/dev/null
>         0m00.33s real     0m00.29s user     0m00.04s system
>     % time mutool show atkins_physical_chemistry.pdf outline >/dev/null
>         0m00.30s real     0m00.25s user     0m00.04s system
>
> [ where atkins_physical_chemistry.pdf is the same 90+MB file I was
>   testing in the previous bug report.  ]
>
> I don't know JS at all so the script can probably be improved.  The
> docs for the JS interface are at
>
>     https://mupdf.readthedocs.io/en/latest/mutool-run-js-api.html
>
> If this approach is acceptable, we can simply run the JS script
> instead.  WDYT?

To me it sounds great.  But let's ask Eli as well.

Eli, the executive summary is this.  We can already read a PDF's outline
with mutool and use it for quick access to chapters and sections through
imenu.  However, it turned out that whether the outline is usable for
our purpose depends on the PDF at hand, where "usable" means that we get
page references.  It seems that many PDFs (e.g., those produced by
LaTeX) have named references instead of page references, which won't do
the trick for doc-view.

Visuwesh figured out that one can run a JS script using "mutool run
<script> foo.pdf" to access the PDF's internal structure through the
mupdf JS API, and wrote the simple script below, which prints the
outline with page references as a sexp structure.

Would it be ok to distribute the below JS helper script with Emacs so
that doc-view can use it?  If so, how?  Maybe the simplest way would be
to just put it in some doc-view--mutool-outline-script variable and copy
it to doc-view-cache-directory when invoking imenu on a PDF file the
first time?

Bye,
  Tassilo

> outline.js:
>
> var document = new Document.openDocument(scriptArgs[0], "application/pdf")
> var outline = document.loadOutline()
> if(!outline) quit()
>
> print("(")
>
> function pp(outl, level){
>     print("((level . " + level + ")")
>     print("(title . " + repr(outl.title) + ")")
>     print("(page . " + document.resolveLink(outl.uri) + "))")
>     if(outl.down){
>         for(var i=0; i<outl.down.length; i++){
>             pp(outl.down[i], level+1)
>         }
>     }
> }
>
> for(var i=0; i<outline.length; i++){
>     pp(outline[i], 1)
> }
>
> print(")")
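
The traversal in the script above can also be tried without mutool.
What follows is only a sketch for Node.js, not the script proposed for
doc-view: the sample outline, the page table and resolvePage are
hypothetical stand-ins for document.loadOutline() and
document.resolveLink(), and JSON.stringify is assumed to be a close
enough substitute for mutool's repr() on plain titles.  It collects the
lines instead of printing them one by one so the result can be
inspected as a single string.

```javascript
// Sketch only: runs under Node.js, not under "mutool run".
// Hypothetical outline in the shape the script walks (title, uri, down).
var sampleOutline = [
    { title: "Text", uri: "#section.1", down: [
        { title: "Sub \"quoted\"", uri: "#subsection.1.1" }
    ] },
    { title: "More", uri: "#section.2" }
];

// Hypothetical table mapping a named destination to a page index.
var pageTable = { "#section.1": 0, "#subsection.1.1": 1, "#section.2": 4 };

// Stand-in for document.resolveLink(uri) from the mutool JS API.
function resolvePage(uri) {
    return pageTable[uri];
}

// Append the three lines for one entry, then recurse into its children.
function entryLines(entry, level, lines) {
    lines.push("((level . " + level + ")");
    // JSON.stringify is an assumed substitute for mutool's repr().
    lines.push("(title . " + JSON.stringify(entry.title) + ")");
    lines.push("(page . " + resolvePage(entry.uri) + "))");
    var children = entry.down || [];
    for (var i = 0; i < children.length; i++) {
        entryLines(children[i], level + 1, lines);
    }
    return lines;
}

// Wrap all entries in one outer list so the result is a single sexp.
function outlineToSexp(outline) {
    var lines = ["("];
    for (var i = 0; i < outline.length; i++) {
        entryLines(outline[i], 1, lines);
    }
    lines.push(")");
    return lines.join("\n");
}

console.log(outlineToSexp(sampleOutline));
```

Each entry becomes a flat alist carrying its nesting level, so nested
sections are not nested lists in the output; a consumer would rebuild
the hierarchy from the level values.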




This bug report was last modified 227 days ago.


