GNU bug report logs - #48334
No <title> elements in HTML manual pages

Package: emacs

Reported by: Maxim Nikulin <m.a.nikulin <at> gmail.com>

Date: Mon, 10 May 2021 14:49:02 UTC

Severity: normal

Fixed in version 29.1

Done: Lars Ingebrigtsen <larsi <at> gnus.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 48334 in the body.
You can then email your comments to 48334 AT debbugs.gnu.org in the normal way.

Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#48334; Package emacs. (Mon, 10 May 2021 14:49:02 GMT)

Acknowledgement sent to Maxim Nikulin <m.a.nikulin <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Mon, 10 May 2021 14:49:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Maxim Nikulin <m.a.nikulin <at> gmail.com>
To: bug-gnu-emacs <at> gnu.org
Subject: No <title> elements in HTML manual pages
Date: Mon, 10 May 2021 21:48:26 +0700
HTML pages of the Emacs manual, e.g.
https://www.gnu.org/software/emacs/manual/html_node/elisp/Motion.html
do not have a <title> element. Open the page source in a browser, use the
inspector in the browser developer tools, or just fetch the page
with e.g. curl to see that the metadata in the <head> element
is rather scarce.
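
A minimal sketch of how such a check could be automated, using only the Python standard library. The two sample pages are hypothetical stand-ins, not copies of the real manual pages, and the helper names (TitleFinder, find_title) are made up for this illustration.

```python
from html.parser import HTMLParser


class TitleFinder(HTMLParser):
    """Collect the text of the first <title> element, if any."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = None

    def handle_starttag(self, tag, attrs):
        # Only the first <title> is recorded.
        if tag == "title" and self.title is None:
            self.in_title = True
            self.title = ""

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def find_title(html_text):
    """Return the stripped <title> text, or None when there is no <title>."""
    finder = TitleFinder()
    finder.feed(html_text)
    finder.close()
    return finder.title.strip() if finder.title is not None else None


# Hypothetical inputs: one page with a title, one without.
with_title = (
    "<html><head><title>Motion (GNU Emacs Lisp Reference Manual)</title>"
    "</head><body></body></html>"
)
without_title = (
    "<html><head><meta charset='utf-8'></head>"
    "<body><h2>30.2 Motion</h2></body></html>"
)

print(find_title(with_title))
print(find_title(without_title))
```

For a live page, the text passed to find_title could come from urllib.request.urlopen(url).read().decode(), assuming the page is served as UTF-8.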

As a result, the browser tab title is not informative. In the case of
Firefox it can be "google.com/url?q=http..." due to an intermediate
redirection and a bug in Firefox: https://bugzilla.mozilla.org/1401091
Even if Firefox did not have this bug, node names instead of URLs
in tab titles would provide a better user experience.

For the particular page, my expectation for <title> element content
is something like
- "30.2 Motion (Emacs Lisp)"
- "(elisp) Motion"
- "30.2 Motion"

The Texinfo manual is not affected; its pages contain a reasonable
<title>, e.g.
https://www.gnu.org/software/texinfo/manual/texinfo/html_node/Generating-HTML.html
I hope it is enough to change some settings of the HTML export for the Emacs
manuals to improve the quality of the generated pages. However, I am not
familiar enough with Texinfo to say which options should be tuned.

The reason I use the HTML format of the Emacs manuals is that I do not have
enough experience with Emacs yet, so it is easier to find particular
sections using search engines that take into account relevance or even
synonyms. Docstrings for Emacs functions and variables rarely have
direct links to the Texinfo nodes of the manuals that provide a higher-level
overview or guide for the related functionality.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#48334; Package emacs. (Tue, 05 Oct 2021 14:15:02 GMT)

Message #8 received at 48334 <at> debbugs.gnu.org:

From: Maxim Nikulin <m.a.nikulin <at> gmail.com>
To: 48334 <at> debbugs.gnu.org
Subject: Re: No <title> elements in HTML manual pages
Date: Tue, 5 Oct 2021 21:14:04 +0700
> HTML pages of Emacs manual, e.g.
> https://www.gnu.org/software/emacs/manual/html_node/elisp/Motion.html
> do not have <title> element.
...
> Texinfo manual is not affected, its pages contain a reasonable
> <title>, e.g.
> https://www.gnu.org/software/texinfo/manual/texinfo/html_node/Generating-HTML.html 

The Emacs manual is generated by texi2html; the Texinfo and e.g. Org mode manuals by
    makeinfo --html ...
In the latter case pages have a <title> element; in the former they do not
(at least without some tuning).




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#48334; Package emacs. (Sat, 02 Jul 2022 16:20:02 GMT)

Message #11 received at 48334 <at> debbugs.gnu.org:

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Maxim Nikulin <m.a.nikulin <at> gmail.com>
Cc: 48334 <at> debbugs.gnu.org
Subject: Re: bug#48334: No <title> elements in HTML manual pages
Date: Sat, 02 Jul 2022 18:19:44 +0200
Maxim Nikulin <m.a.nikulin <at> gmail.com> writes:

>> HTML pages of Emacs manual, e.g.
>> https://www.gnu.org/software/emacs/manual/html_node/elisp/Motion.html
>> do not have <title> element.
> ...
>> Texinfo manual is not affected, its pages contain a reasonable
>> <title>, e.g.
>> https://www.gnu.org/software/texinfo/manual/texinfo/html_node/Generating-HTML.html
>
> Emacs manual is generated by texi2html, texinfo and e.g. Org mode by
>     makeinfo --html ...
> In the latter case pages have <title> element, in the former they do
> not (at least without some tuning).

(I'm going through old bug reports that unfortunately weren't resolved
at the time.)

These manuals still seem to be missing <title>s.  And texi2html has been
superseded by texi2any, which should be adding <title> elements
according to:

https://www.gnu.org/software/texinfo/manual/texinfo/html_node/HTML-Customization-Variables.html

Anybody know who's responsible for generating the HTML manuals?  

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#48334; Package emacs. (Sat, 02 Jul 2022 17:03:01 GMT)

Message #14 received at 48334 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: 48334 <at> debbugs.gnu.org, m.a.nikulin <at> gmail.com
Subject: Re: bug#48334: No <title> elements in HTML manual pages
Date: Sat, 02 Jul 2022 20:02:26 +0300
> Cc: 48334 <at> debbugs.gnu.org
> From: Lars Ingebrigtsen <larsi <at> gnus.org>
> Date: Sat, 02 Jul 2022 18:19:44 +0200
> 
> Maxim Nikulin <m.a.nikulin <at> gmail.com> writes:
> 
> >> HTML pages of Emacs manual, e.g.
> >> https://www.gnu.org/software/emacs/manual/html_node/elisp/Motion.html
> >> do not have <title> element.
> > ...
> >> Texinfo manual is not affected, its pages contain a reasonable
> >> <title>, e.g.
> >> https://www.gnu.org/software/texinfo/manual/texinfo/html_node/Generating-HTML.html
> >
> > Emacs manual is generated by texi2html, texinfo and e.g. Org mode by
> >     makeinfo --html ...
> > In the latter case pages have <title> element, in the former they do
> > not (at least without some tuning).
> 
> (I'm going through old bug reports that unfortunately weren't resolved
> at the time.)
> 
> These manuals still seem to be missing <title>s.  And texi2html has been
> superseded by texi2any, which should be adding <title> elements
> according to:
> 
> https://www.gnu.org/software/texinfo/manual/texinfo/html_node/HTML-Customization-Variables.html
> 
> Anybody know who's responsible for generating the HTML manuals?  

We are.  See the instructions in admin/make-tarball.txt and the
scripts admin/make-manuals and admin/upload-manuals.

I don't remember if texi2any produces <title>, but the above scripts
modify the HTML produced by texi2any, so what we eventually have is
the result of those scripts.

We could decide to drop admin/make-manuals, or at least the parts
that modify the produced HTML, but presumably those parts were written
for a reason.  Unfortunately, I see no detailed documentation of the
reasons for those changes, so it's hard to decide whether any of them
are still valid, what with Texinfo's progress since the time those
changes were coded.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#48334; Package emacs. (Sun, 03 Jul 2022 12:17:01 GMT)

Message #17 received at 48334 <at> debbugs.gnu.org:

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 48334 <at> debbugs.gnu.org, m.a.nikulin <at> gmail.com
Subject: Re: bug#48334: No <title> elements in HTML manual pages
Date: Sun, 03 Jul 2022 14:16:27 +0200
Eli Zaretskii <eliz <at> gnu.org> writes:

>> > Emacs manual is generated by texi2html, texinfo and e.g. Org mode by
>> >     makeinfo --html ...
>> > In the latter case pages have <title> element, in the former they do
>> > not (at least without some tuning).

[...]

> We are.  See the instructions in admin/make-tarball.txt and the
> scripts admin/make-manuals and admin/upload-manuals.
>
> I don't remember if texi2any produces <title>, but the above scripts
> modify the HTML produced by texi2any, so what we eventually have is
> the result of those scripts.

Hm...  it looks like the manuals are produced with "makeinfo --html",
though -- I can't see any usage of texi2html or texi2any there, but I
may be missing something.

> We could decide to drop admin/make-manuals, or at least the parts
> that modify the produced HTML, but presumably those parts were written
> for a reason.  Unfortunately, I see no detailed documentation of the
> reasons for those changes, so it's hard to decide whether any of them
> are still valid, what with Texinfo's progress since the time those
> changes were coded.

Ah, it's this code:

(defun manual-html-fix-headers ()
  "Fix up HTML headers for the Emacs manual in the current buffer."
  (let ((texi5 (search-forward "<!DOCTYPE" nil t))
	opoint)

[...]

    (search-forward "<meta")
    (setq opoint (match-beginning 0))
    (unless texi5
      (search-forward "<!--")
      (goto-char (match-beginning 0))
      (delete-region opoint (point))
      (search-forward "<meta http-equiv=\"Content-Style")
      (setq opoint (match-beginning 0)))
    (search-forward "</title>\n")
    (delete-region opoint (point))

So we delete the <title> that makeinfo --html has created.  Perhaps
that's just a bug?  I see that you adjusted this code in May...

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#48334; Package emacs. (Sun, 03 Jul 2022 13:14:01 GMT)

Message #20 received at 48334 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: 48334 <at> debbugs.gnu.org, m.a.nikulin <at> gmail.com
Subject: Re: bug#48334: No <title> elements in HTML manual pages
Date: Sun, 03 Jul 2022 16:13:20 +0300
> From: Lars Ingebrigtsen <larsi <at> gnus.org>
> Cc: m.a.nikulin <at> gmail.com,  48334 <at> debbugs.gnu.org
> Date: Sun, 03 Jul 2022 14:16:27 +0200
> 
> Eli Zaretskii <eliz <at> gnu.org> writes:
> 
> > I don't remember if texi2any produces <title>, but the above scripts
> > modify the HTML produced by texi2any, so what we eventually have is
> > the result of those scripts.
> 
> Hm...  it looks like the manuals are produced with "makeinfo --html",
> though -- I can't see any usage of texi2html or texi2any there, but I
> may be missing something.

makeinfo is supposed to be a symlink to texi2any.

> Ah, it's this code:
> 
> (defun manual-html-fix-headers ()
>   "Fix up HTML headers for the Emacs manual in the current buffer."
>   (let ((texi5 (search-forward "<!DOCTYPE" nil t))
> 	opoint)
> 
> [...]
> 
>     (search-forward "<meta")
>     (setq opoint (match-beginning 0))
>     (unless texi5
>       (search-forward "<!--")
>       (goto-char (match-beginning 0))
>       (delete-region opoint (point))
>       (search-forward "<meta http-equiv=\"Content-Style")
>       (setq opoint (match-beginning 0)))
>     (search-forward "</title>\n")
>     (delete-region opoint (point))

Yes.  (But that's not the only editing we do, although the rest isn't
relevant to <title>, I think.)

> So we delete the <title> that makeinfo --html has created.  Perhaps
> that's just a bug?

It is definitely done on purpose, but I don't know what the purpose
of deleting <title> (and many other parts of the headers as well) is.

> I see that you adjusted this code in May...

I made changes there because someone reported a problem with reading
the manuals on mobile devices, because we were deleting the line with
'<meta name="viewport"...', which in latest Texinfo takes care of
adjusting the viewport to the width of the device display.  My changes
were supposed to avoid deletion of this header (and a few others), but
I don't think I kept <title>.

I think the solution to this is for some HTML5 expert to look at our
edits vs what Texinfo 6.8 produces, and tell which parts of the
editing are needed (and why) and which aren't.  I'm far from being
that expert.

Failing that, I think the only alternative is to see how the original
Texinfo output looks in a browser, compare that with the edited
manuals, and then decide which of the edits are really needed.  One
problem with that is that we'll probably have to require Texinfo 6.8
or later if we go that way, because maintaining compatibility with
multiple Texinfo versions is really too much.  Ideally, we should keep
the edits to the absolute minimum.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#48334; Package emacs. (Sun, 03 Jul 2022 14:49:02 GMT)

Message #23 received at 48334 <at> debbugs.gnu.org:

From: Max Nikulin <manikulin <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>, Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: 48334 <at> debbugs.gnu.org
Subject: Re: bug#48334: No <title> elements in HTML manual pages
Date: Sun, 3 Jul 2022 21:48:13 +0700
On 03/07/2022 20:13, Eli Zaretskii wrote:
>> From: Lars Ingebrigtsen
>> Date: Sun, 03 Jul 2022 14:16:27 +0200
>>        (setq opoint (match-beginning 0)))
>>      (search-forward "</title>\n")
>>      (delete-region opoint (point))
> 
> Yes.  (But that's not the only editing we do, although the rest isn't
> relevant to <title>, I think.)

Deleting text only up to "<title>" should be a rather local change. Until
May, the whole region up to "</head>" was removed.

By the way, is there a reason why the DC.title meta is set to gnu.org, not
to the title of the current node or at least of the manual? I am not familiar
with Dublin Core, but I expect it is rich enough to express both, and
gnu.org as well.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#48334; Package emacs. (Mon, 04 Jul 2022 10:43:02 GMT)

Message #26 received at 48334 <at> debbugs.gnu.org:

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 48334 <at> debbugs.gnu.org, m.a.nikulin <at> gmail.com
Subject: Re: bug#48334: No <title> elements in HTML manual pages
Date: Mon, 04 Jul 2022 12:42:42 +0200
Eli Zaretskii <eliz <at> gnu.org> writes:

> makeinfo is supposed to be a symlink to texi2any.

Yes, indeed.

> I made changes there because someone reported a problem with reading
> the manuals on mobile devices, because we were deleting the line with
> '<meta name="viewport"...', which in latest Texinfo takes care of
> adjusting the viewport to the width of the device display.  My changes
> were supposed to avoid deletion of this header (and a few others), but
> I don't think I kept <title>.

I tried running the code now (and commented out the
manual-html-fix-headers function), and I ended up with:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<!-- Created by GNU Texinfo 6.8, https://www.gnu.org/software/texinfo/ -->
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<!-- This file describes the Emacs auth-source library.

Copyright (C) 2008-2022 Free Software Foundation, Inc.

Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation; with no
Invariant Sections, with the Front-Cover Texts being "A GNU Manual,"
and with the Back-Cover Texts as in (a) below.  A copy of the license
is included in the section entitled "GNU Free Documentation License".

(a) The FSF's Back-Cover Text is: "You have the freedom to copy and
modify this GNU manual." -->
<title>Emacs auth-source Library 0.3</title>

This is with texi2any (GNU texinfo) 6.8.  If I'm reading the code right,
the delete-region here is just deleting that <meta, the comment, and the
<title>.

It's probably different in every texinfo version, but altering the

    (search-forward "</title>\n")

to

    (search-forward "<title>")

should be safe in any case, so I'll go ahead and do that.
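
The effect of the two end markers can be sketched outside Emacs. The Python below is only an illustration of the string logic, not the code in admin; strip_head and its parameters are hypothetical names, and SAMPLE is a shortened stand-in for the Texinfo 6.8 output quoted above. One detail worth checking in the real function: search-forward leaves point after the match, so the deletion has to stop at the beginning of the "<title>" match (as the non-texi5 branch does with goto-char and match-beginning), otherwise the opening tag itself is removed.

```python
def strip_head(html_text, end_marker, stop_at_match_start):
    """Delete from the first "<meta" up to end_marker.

    stop_at_match_start=False mimics point being left after the match,
    so the marker itself is deleted; True stops just before the marker.
    """
    start = html_text.index("<meta")
    match = html_text.index(end_marker, start)
    end = match if stop_at_match_start else match + len(end_marker)
    return html_text[:start] + html_text[end:]


# Shortened stand-in for the generated header (assumption, not real output).
SAMPLE = (
    "<head>\n"
    '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">\n'
    "<!-- This file describes the Emacs auth-source library. -->\n"
    "<title>Emacs auth-source Library 0.3</title>\n"
    "\n"
    "</head>\n"
)

# Old marker: everything through "</title>\n" goes away, title included.
old = strip_head(SAMPLE, "</title>\n", stop_at_match_start=False)
# New marker, stopping before it: the whole <title> element survives.
new = strip_head(SAMPLE, "<title>", stop_at_match_start=True)
# New marker, but deleting through the end of the match: the opening tag is lost.
broken = strip_head(SAMPLE, "<title>", stop_at_match_start=False)

print("<title>" in old)
print("<title>" in new)
print("<title>" in broken)
```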

> Failing that, I think the only alternative is to see how the original
> Texinfo output looks in a browser, compare that with the edited
> manuals, and then decide which of the edits are really needed.  One
> problem with that is that we'll probably have to require Texinfo 6.8
> or later if we go that way, because maintaining compatibility with
> multiple Texinfo versions is really too much.  Ideally, we should keep
> the edits to the absolute minimum.

I think altering the HTML in this way isn't ideal.  It'd be much better
to just parse the HTML, alter the DOM (to remove/insert elements), and
then write the DOM out to HTML again.  That'd be a whole lot less
brittle.
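
As a rough illustration of the parser-based idea, here is a Python sketch. It uses the standard library's event-based HTMLParser rather than a real DOM tree, so it is a token-level filter and not the DOM rewrite described above. HeadFilter, filter_html and the is_content_type_meta rule are hypothetical names chosen for the demonstration; which elements the Emacs scripts actually need to drop is not settled in this thread. The filter only handles dropping void tags such as <meta>.

```python
from html.parser import HTMLParser


class HeadFilter(HTMLParser):
    """Re-emit HTML as parsed, except for start tags rejected by drop_tag."""

    def __init__(self, drop_tag):
        # convert_charrefs=False keeps entities such as &amp; as written.
        super().__init__(convert_charrefs=False)
        self.drop_tag = drop_tag  # callable(tag, attrs_dict) -> bool
        self.out = []

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")

    def handle_starttag(self, tag, attrs):
        if not self.drop_tag(tag, dict(attrs)):
            # Original source text of the tag, attributes untouched.
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")


def filter_html(html_text, drop_tag):
    parser = HeadFilter(drop_tag)
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


def is_content_type_meta(tag, attrs):
    # Hypothetical rule, used only to have something to drop.
    return tag == "meta" and (attrs.get("http-equiv") or "").lower() == "content-type"


SAMPLE = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" '
    '"http://www.w3.org/TR/html4/loose.dtd">\n'
    "<html>\n<head>\n"
    '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">\n'
    "<title>Emacs auth-source Library 0.3</title>\n"
    "</head>\n<body><p>a &amp; b</p></body>\n</html>\n"
)

result = filter_html(SAMPLE, is_content_type_meta)
print(result)
```

Because elements are selected by tag and attributes rather than by their position in the text, the <title> is untouched regardless of what precedes it in the header.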

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




bug marked as fixed in version 29.1, send any further explanations to 48334 <at> debbugs.gnu.org and Maxim Nikulin <m.a.nikulin <at> gmail.com> Request was from Lars Ingebrigtsen <larsi <at> gnus.org> to control <at> debbugs.gnu.org. (Mon, 04 Jul 2022 10:48:01 GMT)

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#48334; Package emacs. (Mon, 04 Jul 2022 11:37:02 GMT)

Message #31 received at 48334 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: 48334 <at> debbugs.gnu.org, m.a.nikulin <at> gmail.com
Subject: Re: bug#48334: No <title> elements in HTML manual pages
Date: Mon, 04 Jul 2022 14:36:07 +0300
> From: Lars Ingebrigtsen <larsi <at> gnus.org>
> Cc: m.a.nikulin <at> gmail.com,  48334 <at> debbugs.gnu.org
> Date: Mon, 04 Jul 2022 12:42:42 +0200
> 
> I tried running the code now (and commented out the
> manual-html-fix-headers function), and I ended up with:
> 
> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
> <html>
> <!-- Created by GNU Texinfo 6.8, https://www.gnu.org/software/texinfo/ -->
> <head>
> <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
> <!-- This file describes the Emacs auth-source library.
> 
> Copyright (C) 2008-2022 Free Software Foundation, Inc.
> 
> Permission is granted to copy, distribute and/or modify this document
> under the terms of the GNU Free Documentation License, Version 1.3 or
> any later version published by the Free Software Foundation; with no
> Invariant Sections, with the Front-Cover Texts being "A GNU Manual,"
> and with the Back-Cover Texts as in (a) below.  A copy of the license
> is included in the section entitled "GNU Free Documentation License".
> 
> (a) The FSF's Back-Cover Text is: "You have the freedom to copy and
> modify this GNU manual." -->
> <title>Emacs auth-source Library 0.3</title>
> 
> This is with texi2any (GNU texinfo) 6.8.  If I'm reading the code right,
> the delete-region here is just deleting that <meta, the comment, and the
> <title>.

That's strange, because I remember testing the changes, and I also
used Texinfo 6.8.  Did you compare the produced HTML with what's on
the Web site?  That should show the differences clearly.  Also, I
think the title (and the file I mostly worked on) is index.html -- did
you look at that, or did you look at some other file?
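
One possible way to do such a comparison, sketched in Python with difflib: extract the <head> section of the freshly generated file and of the published one and diff them. The function names and the two sample strings are assumptions made for this illustration; real input would be the generated file and the page fetched from the Web site.

```python
import difflib


def head_section(html_text):
    """Return the lines from <head ...> through </head>, or [] if absent."""
    start = html_text.find("<head")
    end = html_text.find("</head>")
    if start == -1 or end == -1:
        return []
    return html_text[start:end + len("</head>")].splitlines()


def diff_heads(generated, published):
    """Unified diff of the two <head> sections, as a list of lines."""
    return list(
        difflib.unified_diff(
            head_section(generated),
            head_section(published),
            fromfile="generated",
            tofile="published",
            lineterm="",
        )
    )


# Hypothetical samples: the published copy has lost its <title>.
generated = (
    "<html>\n<head>\n"
    "<title>Emacs auth-source Library 0.3</title>\n"
    "</head>\n<body></body>\n</html>\n"
)
published = "<html>\n<head>\n</head>\n<body></body>\n</html>\n"

for line in diff_heads(generated, published):
    print(line)
```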

> > > Failing that, I think the only alternative is to see how the original
> > > Texinfo output looks in a browser, compare that with the edited
> > > manuals, and then decide which of the edits are really needed.  One
> > > problem with that is that we'll probably have to require Texinfo 6.8
> > > or later if we go that way, because maintaining compatibility with
> > > multiple Texinfo versions is really too much.  Ideally, we should keep
> > > the edits to the absolute minimum.
> > 
> > I think altering the HTML in this way isn't ideal.  It'd be much better
> > to just parse the HTML, alter the DOM (to remove/insert elements), and
> > then write the DOM out to HTML again.  That'd be a whole lot less
> > brittle.

That's fine with me, but that, too, assumes someone who can understand
the resulting DOM, and which of its parts we want to change and why.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#48334; Package emacs. (Tue, 05 Jul 2022 11:10:02 GMT)

Message #34 received at 48334 <at> debbugs.gnu.org:

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 48334 <at> debbugs.gnu.org, m.a.nikulin <at> gmail.com
Subject: Re: bug#48334: No <title> elements in HTML manual pages
Date: Tue, 05 Jul 2022 13:09:05 +0200
Eli Zaretskii <eliz <at> gnu.org> writes:

> That's strange, because I remember testing the changes, and I also
> used Texinfo 6.8.  Did you compare the produced HTML with what's on
> the Web site?  That should show the differences clearly.  Also, I
> think the title (and the file I mostly worked on) is index.html -- did
> you look at that, or did you look at some other file?

I looked at the auth-source mono version of the HTML mainly.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Tue, 02 Aug 2022 11:24:04 GMT)

This bug report was last modified 3 years and 8 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.