GNU bug report logs - #73638
31.0.50; doc-view: imenu index cannot be made for LaTeX PDFs

Package: emacs

Reported by: Visuwesh <visuweshm <at> gmail.com>

Date: Sat, 5 Oct 2024 11:07:02 UTC

Severity: normal

Found in version 31.0.50

Done: Tassilo Horn <tsdh <at> gnu.org>

Bug is archived. No further changes may be made.

From: help-debbugs <at> gnu.org (GNU bug Tracking System)
To: Visuwesh <visuweshm <at> gmail.com>
Subject: bug#73638: closed (Re: bug#73638: 31.0.50; doc-view: imenu index
 cannot be made for LaTeX PDFs)
Date: Tue, 08 Oct 2024 15:44:02 +0000
[Message part 1 (text/plain, inline)]
Your bug report

#73638: 31.0.50; doc-view: imenu index cannot be made for LaTeX PDFs

which was filed against the emacs package, has been closed.

The explanation is attached below, along with your original report.
If you require more details, please reply to 73638 <at> debbugs.gnu.org.

-- 
73638: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=73638
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
From: Tassilo Horn <tsdh <at> gnu.org>
To: Visuwesh <visuweshm <at> gmail.com>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 73638-done <at> debbugs.gnu.org
Subject: Re: bug#73638: 31.0.50; doc-view: imenu index cannot be made for
 LaTeX PDFs
Date: Tue, 08 Oct 2024 17:43:33 +0200

Visuwesh <visuweshm <at> gmail.com> writes:

>> Nope, now I get off-by-many-hundreds errors.  The Imenu entries have
>> the page number in parens, right?  If so, I have many references to
>> pages that are thrice as large as the actual number of pages, e.g.,
>> here some parts of the *Completions* buffer for the Atkins book:
>>
>> FOCUS.1.The.properties.of.gases.(341)
>> FOCUS.10.Molecular.symmetry.(4181)
>> FOCUS.11.Molecular.spectroscopy.(4481)
>> FOCUS.12.Magnetic.resonance.(5181)
>> FOCUS.13.Statistical.thermodynamics.(5621)
>> FOCUS.14.Molecular.interactions.(6141)
>> FOCUS.15.Solids.(6701)
>> FOCUS.16.Molecules.in.motio.(7201)
>> FOCUS.17.Chemical.kinetics.(7521)
>> FOCUS.18.Reaction.dynamics.(8101)
>> FOCUS.19.Processes.at.solid.surfaces.(8541)
>>
>> It's large but doesn't have more than 8000 pages.
>
> I messed up by not considering the precedence of operators.  :-( Fixed
> in the attached patch.

Works!  Applied and pushed.

Thanks a lot,
  Tassilo
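
The patch itself is not reproduced in this log.  The page numbers quoted
above all end in 1 (341, 4181, ...), which would be consistent with a
number being appended as a string instead of added.  A minimal sketch of
that kind of mistake, assuming the script adds 1 to a 0-based page index
while building the output string (the variable names are hypothetical and
not taken from the patch):

```javascript
// Hypothetical illustration; the actual patch is not shown in this log.
var page = 34; // assumed 0-based page index from the outline resolver

// `+` is evaluated left to right: the string absorbs `page` first,
// and then "1" is appended as text.
var wrong = "(page . " + page + 1 + ")";

// Parentheses force the numeric addition before the concatenation.
var right = "(page . " + (page + 1) + ")";

console.log(wrong); // (page . 341)
console.log(right); // (page . 35)
```

Under that assumption, an entry such as (8541) would correspond to page
855, which fits a book of fewer than 8000 pages.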

[Message part 3 (message/rfc822, inline)]
From: Visuwesh <visuweshm <at> gmail.com>
To: bug-gnu-emacs <at> gnu.org
Subject: 31.0.50; doc-view: imenu index cannot be made for LaTeX PDFs
Date: Sat, 05 Oct 2024 16:36:23 +0530

This is a follow-up to bug#73530, which discussed how to obtain the
outlines for LaTeX PDFs.

Currently, if mutool reports the outline as

    % mutool show test.pdf outline
    |	"Text"	#nameddest=section.1
    |	"Annotations"	#nameddest=section.2
    |	"Links"	#nameddest=section.3
    |	"Attachments"	#nameddest=section.4
    +	"Outline"	#nameddest=section.5
    +		"subsection"	#nameddest=subsection.5.1
    |			"subsubsection"	#nameddest=subsubsection.5.1.1

then doc-view cannot build the imenu index, because the entries carry
named destinations instead of page numbers.  Looking at the source code
of mutool, the "#..." part appears to be simply a URI.  AFAICT, there's
no way to resolve the URI to a page number from the mutool command line.
However, one can write a JS script instead.  Use the outline.js script
(inlined at the end of this message) and run mutool as follows with a
LaTeX PDF:

    % mutool run outline.js test.pdf
    (
    ((level . 1)
    (title . "Text")
    (page . 0))
    ((level . 1)
    (title . "Annotations")
    (page . 1))
    ((level . 1)
    (title . "Links")
    (page . 2))
    ((level . 1)
    (title . "Attachments")
    (page . 3))
    ((level . 1)
    (title . "Outline")
    (page . 4))
    ((level . 2)
    (title . "subsection")
    (page . 4))
    ((level . 3)
    (title . "subsubsection")
    (page . 4))
    )

Emacs can `read' this output directly, skipping the parsing entirely.
JS evaluation takes about the same amount of time as `mutool show PDF outline':

    % time mutool run outline.js atkins_physical_chemistry.pdf >/dev/null
        0m00.32s real     0m00.29s user     0m00.02s system
    % time mutool run outline.js atkins_physical_chemistry.pdf >/dev/null
        0m00.31s real     0m00.29s user     0m00.02s system
    % time mutool show atkins_physical_chemistry.pdf outline >/dev/null
        0m00.33s real     0m00.29s user     0m00.04s system
    % time mutool show atkins_physical_chemistry.pdf outline >/dev/null
        0m00.30s real     0m00.25s user     0m00.04s system

[ where atkins_physical_chemistry.pdf is the same 90+MB file I was
  testing in the previous bug report.  ]

I don't know JS at all, so the script can probably be improved.  The docs
for the JS interface are at

    https://mupdf.readthedocs.io/en/latest/mutool-run-js-api.html

If this approach is acceptable, we can simply run the JS script instead.
WDYT?

[ I couldn't attach the JS script because Gmail blocked the message, so
  it is inlined below.  ]

outline.js:

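// Open the PDF named by the first script argument.
var document = new Document.openDocument(scriptArgs[0], "application/pdf")
var outline = document.loadOutline()
// Nothing to print if the document has no outline.
if(!outline) quit()

print("(")

// Print one outline entry as an alist, then recurse into its children.
function pp(outl, level){
    print("((level . " + level + ")")
    print("(title . " + repr(outl.title) + ")")
    // resolveLink turns the entry's URI into a page number.
    print("(page . " + document.resolveLink(outl.uri) + "))")
    if(outl.down){
        for(var i=0; i<outl.down.length; i++){
            pp(outl.down[i], level+1)
        }
    }
}

for(var i=0; i<outline.length; i++){
    pp(outline[i], 1)
}

print(")")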
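
The traversal above can be tried without mutool by feeding it a mock
outline.  The following is only a sketch under stated assumptions:
`formatOutline' and `mockOutline' are hypothetical names, each node
carries its page number directly instead of a URI to resolve, and
JSON.stringify stands in for mutool's repr (adequate for simple titles).

```javascript
// Sketch: serialize an outline tree as a Lisp-readable list of alists.
// formatOutline and mockOutline are hypothetical names; `page` is stored
// on each node here, whereas outline.js resolves it from the node's URI.
function formatOutline(items) {
    var lines = ["("];
    function walk(item, level) {
        lines.push("((level . " + level + ")");
        // JSON.stringify stands in for mutool's repr(); it produces a
        // double-quoted string with backslash escapes.
        lines.push("(title . " + JSON.stringify(item.title) + ")");
        lines.push("(page . " + item.page + "))");
        var children = item.down || [];
        for (var i = 0; i < children.length; i++) {
            walk(children[i], level + 1);
        }
    }
    for (var i = 0; i < items.length; i++) {
        walk(items[i], 1);
    }
    lines.push(")");
    return lines.join("\n");
}

// Mock data shaped like the nested part of the sample outline.
var mockOutline = [
    { title: "Text", page: 0 },
    { title: "Outline", page: 4, down: [
        { title: "subsection", page: 4, down: [
            { title: "subsubsection", page: 4 }
        ] }
    ] }
];

console.log(formatOutline(mockOutline));
```

Collecting the lines and joining them once keeps the function free of
side effects, so the same text can be printed or compared in a test.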



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.