GNU bug report logs - #34023
Support double colons in Info index entries

Package: emacs;

Reported by: Gavin Smith <gavinsmith0123 <at> gmail.com>

Date: Wed, 9 Jan 2019 21:14:01 UTC

Severity: normal

To reply to this bug, email your comments to 34023 AT debbugs.gnu.org.


Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#34023; Package emacs. (Wed, 09 Jan 2019 21:14:01 GMT)

Acknowledgement sent to Gavin Smith <gavinsmith0123 <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Wed, 09 Jan 2019 21:14:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Gavin Smith <gavinsmith0123 <at> gmail.com>
To: bug-gnu-emacs <at> gnu.org
Cc: bug-texinfo <at> gnu.org
Subject: Support double colons in Info index entries
Date: Wed, 9 Jan 2019 21:14:33 +0000
Emacs version checked: 26.1.

In the Info format colons are special, and for this reason, there is 
limited support for colons in index entries.  The Emacs Info mode 
supports single colons in index entries as long as they are not followed 
by a space.

There is this comment at the start of info.el:

;; Note that nowadays we expect Info files to be made using makeinfo.
;; In particular we make these assumptions:
;;  - a menu item MAY contain colons but not colon-space ": "
;;  - a menu item ending with ": " (but not ":: ") is an index entry
;;  - a node name MAY NOT contain a colon
;; This distinction is to support indexing of computer programming
;; language terms that may contain ":" but not ": ".

The comment doesn't say so, but when I tested it, double colons didn't 
work even when they were not followed by a space.
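
A minimal sketch of one place where a double colon may go wrong, assuming the pattern `\* +\([^:]*\)::` that appears in the `Info-try-follow-nearest-node` hunk of the patch later in this thread. The Emacs regexp is translated to Python syntax, and the function name `node_from_double_colon_form` is hypothetical:

```python
import re

# Python translation of the Emacs regexp "\\* +\\([^:]*\\)::" quoted in the
# Info-try-follow-nearest-node hunk of the patch in this thread.
MENU_NODE_RE = re.compile(r"\* +([^:]*)::")

def node_from_double_colon_form(line):
    """Return the node name if LINE is read as a '* Node::' menu item."""
    m = MENU_NODE_RE.match(line)
    return m.group(1) if m else None

# An ordinary menu item in the short form.
print(node_from_double_colon_form("* Top::"))                          # Top
# An index entry whose text contains '::' matches the same pattern.
print(node_from_double_colon_form("* a::b:  a colon b.  (line 129)"))  # a
```

Under this pattern alone, the entry `* a::b:` would be taken as a menu item for a node named `a`, which is consistent with the behaviour described above.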

There is a fairly simple solution to this problem that I haven't seen 
suggested in all the messages posted on this topic in the mailing list 
archives. In index nodes only (which have a special marker included, 
^@^H[index^@^H]), use a colon to terminate the text of the index entry, 
but instead of looking for the first colon in the line, look for the 
last.  So this entry:

* a::b:  a colon b.  (line 129)

would refer to line 129 of the node "a colon b".  This is possible 
because node names cannot contain colons.  This restriction is not too 
important, whereas the inability to index items containing colons is 
quite important.  This is what is implemented in the standalone info 
browser (since change on 2017-04-08).
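
The last-colon rule can be sketched as follows. This is an illustration only, not the code of the standalone info browser; the function name `split_index_entry` and the regexp for the "NODE.  (line N)" tail are assumptions:

```python
import re

# Assumed shape of what follows the last colon: "  NODE.  (line N)".
TARGET_RE = re.compile(r"\s*(.*?)\.\s*(?:\(line\s+(\d+)\))?\s*$")

def split_index_entry(line):
    """Split '* TEXT: NODE.  (line N)' at the LAST colon in the line."""
    body = line[1:].lstrip()               # drop the leading '*' and spaces
    text, sep, rest = body.rpartition(":")
    if not sep:
        return None                        # no colon at all: not an entry
    m = TARGET_RE.match(rest)
    if not m:
        return None
    line_no = int(m.group(2)) if m.group(2) else None
    return text, m.group(1), line_no

print(split_index_entry("* a::b:  a colon b.  (line 129)"))
# ('a::b', 'a colon b', 129)
```

Because the node name and the "(line ...)" part contain no colon, everything before the last colon can safely be treated as the index text.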

This change shouldn't be made for all nodes, because the comment after 
the closing '.' could contain a colon:

* label: node.  comment: with a colon.

This shouldn't be interpreted as referring to a node "with a colon".

However, the "(line ...)" comment can't contain a colon.
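
A small sketch of why the last-colon rule has to be limited to index nodes. The marker string is the one given above (^@^H[index^@^H]); the function name `entry_text` and the plain first-colon rule for other nodes are simplifying assumptions:

```python
# The index-node marker from the report: ^@^H[index^@^H]
INDEX_MARKER = "\x00\x08[index\x00\x08]"

def entry_text(line, node_body):
    """Return the entry text of a '* ...' line that occurs in NODE_BODY."""
    body = line[1:].lstrip()                # drop the leading '*' and spaces
    if INDEX_MARKER in node_body:
        return body.rpartition(":")[0]      # index node: split at the last colon
    return body.partition(":")[0]           # other nodes: split at the first colon

menu_line = "* label: node.  comment: with a colon."
print(entry_text(menu_line, "an ordinary node"))          # label
# Applying the last-colon rule here would swallow the node name and comment:
print(entry_text(menu_line, INDEX_MARKER + " an index"))  # label: node.  comment
```

The second call shows the misreading that the restriction avoids: in an ordinary menu the trailing comment may contain a colon, so only nodes carrying the index marker should use the last-colon rule.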

I'm not familiar with Emacs Lisp enough to propose a patch to implement 
this change myself.

The standalone info program also implemented a quoting mechanism 
(surrounding the text with a pair of 0x7F bytes) to allow nearly all 
characters to be included in node names and index entries.  This has 
never been implemented in Emacs Info and has never been used by default 
in texi2any's output.  I think my suggestion above would be sufficient 
and would work with existing Info files and versions of 
texi2any/makeinfo without anything breaking.  The quoting mechanism could 
potentially be removed from texi2any and info as nobody has ever used it 
and it makes things more complicated for no reason.
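
For comparison, the quoting mechanism could be read along these lines. This is a rough sketch under the assumption that the quoted text is wrapped in exactly one pair of 0x7F bytes and is followed directly by the terminating colon; the function name `read_entry_text` is hypothetical:

```python
DEL = "\x7f"  # the 0x7F byte used as a quote character

def read_entry_text(body):
    """Return (text, remainder) for an entry body, honouring DEL quoting."""
    if body.startswith(DEL):
        end = body.index(DEL, 1)            # raises ValueError if unterminated
        text, rest = body[1:end], body[end + 1:]
        if not rest.startswith(":"):
            raise ValueError("expected ':' after the quoted text")
        return text, rest[1:]
    text, sep, rest = body.partition(":")   # unquoted: stop at the first colon
    if not sep:
        raise ValueError("no ':' terminator found")
    return text, rest

print(read_entry_text(DEL + "a: b" + DEL + ": node."))  # ('a: b', ' node.')
print(read_entry_text("label: node."))                  # ('label', ' node.')
```

Quoting handles even colon-space inside the text, at the cost of an extra case in every reader, which is the added complexity mentioned above.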




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#34023; Package emacs. (Fri, 11 Jan 2019 00:10:02 GMT)

Message #8 received at 34023 <at> debbugs.gnu.org:

From: Juri Linkov <juri <at> linkov.net>
To: Gavin Smith <GavinSmith0123 <at> gmail.com>
Cc: 34023 <at> debbugs.gnu.org, bug-texinfo <at> gnu.org
Subject: Re: bug#34023: Support double colons in Info index entries
Date: Fri, 11 Jan 2019 02:04:32 +0200
Hi Gavin,

> In the Info format colons are special, and for this reason, there is 
> limited support for colons in index entries.  The Emacs Info mode 
> supports single colons in index entries as long as they are not followed 
> by a space.

Thanks for the detailed description.

> The comment doesn't say so, but when I tested it, double colons didn't 
> work even when they were not followed by a space.
>
> There is a fairly simple solution to this problem that I haven't seen 
> suggested in all the messages posted on this topic in the mailing list 
> archives. In index nodes only (which have a special marker included, 
> ^@^H[index^@^H]), use a colon to terminate the text of the index entry, 
> but instead of looking for the first colon in the line, look for the 
> last.  So this entry:
>
> * a::b:  a colon b.  (line 129)
>
> would refer to line 129 of the node "a colon b".  This is possible 
> because node names cannot contain colons.  This restriction is not too 
> important, whereas the inability to index items containing colons is 
> quite important.  This is what is implemented in the standalone info 
> browser (since change on 2017-04-08).

The following patch handles the cases that you presented,
but it's hard to predict what other cases it might break.

Do you have a sample test file that covers different cases?
We could add such a file to the Emacs regression tests.

> This change shouldn't be made for all nodes, because the comment after 
> the closing '.' could contain a colon:
>
> * label: node.  comment: with a colon.
>
> This shouldn't be interpreted as referring to a node "with a colon".
>
> However, the "(line ...)" comment can't contain a colon.

The following change is made only for index nodes.

I have to say that the current regexp-based parsing is
an inherently fragile approach.  Do you think it would be possible
to add more markup to Info files instead of relying on regexps?

Like index nodes having a special marker ^@^H[index^@^H]
maybe adding some markers to identify index entries,
node references, line numbers?

Better yet would be to read Info manuals in HTML format in the Info reader.
That would allow extracting all information unambiguously.

[info.el.support-double-colons-in-Info-index-entries.patch (text/x-diff, inline)]
diff --git a/lisp/info.el b/lisp/info.el
index 6038273c37..2f7e293297 100644
--- a/lisp/info.el
+++ b/lisp/info.el
@@ -2664,9 +2664,15 @@ Info-menu-entry-name-re
 Because of ambiguities, this should be concatenated with something like
 `:' and `Info-following-node-name-re'.")
 
+(defconst Info-index-entry-name-re "\\(?:[^:]\\|:[^,.;() \t\n]\\)*"
+  "Regexp that matches an index entry name possibly including a colon.")
+
 (defun Info-extract-menu-node-name (&optional multi-line index-node)
   (skip-chars-forward " \t\n")
-  (when (looking-at (concat Info-menu-entry-name-re ":\\(:\\|"
+  (when (looking-at (concat (if index-node
+                                Info-index-entry-name-re
+                              Info-menu-entry-name-re)
+                            ":\\(:\\|"
 			    (Info-following-node-name-re
                              (cond
                               (index-node "^,\t\n")
@@ -2741,7 +2747,9 @@ Info-complete-menu-item
          (t
           (let ((pattern (concat "\n\\* +\\("
                                  (regexp-quote string)
-                                 Info-menu-entry-name-re "\\):"
+                                 (if (Info-index-node)
+                                     Info-index-entry-name-re
+                                   Info-menu-entry-name-re) "\\):"
                                  Info-node-spec-re))
                 completions
                 (complete-nodes Info-complete-nodes))
@@ -3966,7 +3974,8 @@ Info-try-follow-nearest-node
 	      (setq node t))
 	  (setq node nil))))
      ;; menu item: node name
-     ((setq node (Info-get-token (point) "\\* +" "\\* +\\([^:]*\\)::"))
+     ((setq node (unless (Info-index-node)
+                   (Info-get-token (point) "\\* +" "\\* +\\([^:]*\\)::")))
       (Info-goto-node node fork))
      ;; menu item: node name or index entry
      ((Info-get-token (point) "\\* +" "\\* +\\(.*\\): ")
@@ -4929,7 +4938,9 @@ Info-fontify-node
         (let ((n 0)
               cont)
           (while (re-search-forward
-                  (concat "^\\* Menu:\\|\\(?:^\\* +\\(" Info-menu-entry-name-re "\\)\\(:"
+                  (concat "^\\* Menu:\\|\\(?:^\\* +\\(" (if (Info-index-node)
+                                                            Info-index-entry-name-re
+                                                          Info-menu-entry-name-re) "\\)\\(:"
                           Info-node-spec-re "\\([ \t]*\\)\\)\\)")
                   nil t)
 	    (when (match-beginning 1)
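
To see what the new `Info-index-entry-name-re` accepts, here is a minimal sketch that translates the regexp from the patch into Python syntax and appends the terminating colon. The surrounding `\* +` prefix and the function name `index_entry_name` are assumptions made for the illustration; the real code concatenates further node-spec regexps that are not reproduced here:

```python
import re

# Python translation of the Emacs regexp from the patch:
#   "\\(?:[^:]\\|:[^,.;() \t\n]\\)*"
# i.e. any non-colon character, or a colon followed by a character that is
# not one of , . ; ( ) space, tab or newline.
INDEX_ENTRY_NAME_RE = r"(?:[^:]|:[^,.;() \t\n])*"

# Entry name, then the colon that terminates it.
ENTRY_RE = re.compile(r"\* +(" + INDEX_ENTRY_NAME_RE + r"):")

def index_entry_name(line):
    """Return the index entry name matched at the start of LINE, or None."""
    m = ENTRY_RE.match(line)
    return m.group(1) if m else None

print(index_entry_name("* a::b:  a colon b.  (line 129)"))  # a::b
print(index_entry_name("* :type:  Some Node.  (line 7)"))   # :type
print(index_entry_name("* a: b:  Some Node.  (line 9)"))    # a
```

The last call shows the remaining limitation: a colon followed by a space still ends the entry name, so only colons not followed by whitespace or punctuation are accepted inside index text.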

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#34023; Package emacs. (Fri, 11 Jan 2019 00:29:01 GMT)

Message #11 received at 34023 <at> debbugs.gnu.org:

From: Drew Adams <drew.adams <at> oracle.com>
To: Juri Linkov <juri <at> linkov.net>, Gavin Smith <GavinSmith0123 <at> gmail.com>
Cc: 34023 <at> debbugs.gnu.org, bug-texinfo <at> gnu.org
Subject: RE: bug#34023: Support double colons in Info index entries
Date: Thu, 10 Jan 2019 16:28:00 -0800 (PST)
> The Emacs Info mode supports single colons in index
> entries as long as they are not followed by a space.

I thought they were verboten altogether.  Does this
mean that we can finally have index entries such as
`:type'?  That would be good.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#34023; Package emacs. (Fri, 11 Jan 2019 00:54:01 GMT)

Message #14 received at 34023 <at> debbugs.gnu.org:

From: Glenn Morris <rgm <at> gnu.org>
To: Gavin Smith <GavinSmith0123 <at> gmail.com>
Cc: 34023 <at> debbugs.gnu.org, bug-texinfo <at> gnu.org
Subject: Re: bug#34023: Support double colons in Info index entries
Date: Thu, 10 Jan 2019 19:53:52 -0500
Gavin Smith wrote:

> This is what is implemented in the standalone info browser (since
> change on 2017-04-08).

"Defining the Entries of an Index" in the Texinfo manual continues to
say (through Texinfo 6.5.90) "Caution: Do not use a colon in an index entry".




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#34023; Package emacs. (Fri, 11 Jan 2019 19:49:02 GMT)

Message #17 received at 34023 <at> debbugs.gnu.org:

From: Gavin Smith <gavinsmith0123 <at> gmail.com>
To: Juri Linkov <juri <at> linkov.net>, 34023 <at> debbugs.gnu.org, bug-texinfo <at> gnu.org
Subject: Re: bug#34023: Support double colons in Info index entries
Date: Fri, 11 Jan 2019 19:49:32 +0000
On Fri, Jan 11, 2019 at 07:46:31PM +0000, Gavin Smith wrote:
> I've attached a file that includes different possibilities.

Attaching file.
[index-test-cases.info (text/plain, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#34023; Package emacs. (Fri, 11 Jan 2019 19:53:02 GMT)

Message #20 received at 34023 <at> debbugs.gnu.org:

From: Gavin Smith <gavinsmith0123 <at> gmail.com>
To: Juri Linkov <juri <at> linkov.net>
Cc: 34023 <at> debbugs.gnu.org, bug-texinfo <at> gnu.org
Subject: Re: bug#34023: Support double colons in Info index entries
Date: Fri, 11 Jan 2019 19:46:32 +0000
On Fri, Jan 11, 2019 at 02:04:32AM +0200, Juri Linkov wrote:
> The following patch handles the cases that you presented,
> but it's hard to predict what other cases it might break.
> 
> Do you have a sample test file that covers different cases?
> We could add such a file to the Emacs regression tests.

I've attached a file that includes different possibilities.

> I have to say that the current regexp-based parsing is
> an inherently fragile approach.  Do you think it would be possible
> to add more markup to Info files instead of relying on regexps?

I don't understand.  Whatever markup is added has to be read somehow, 
with regexp or other.

> Better yet would be to read Info manuals in HTML format in the Info reader.
> That would allow extracting all information unambiguously.

That would be a different project with several unresolved questions; this 
could be the way forward in the long term.  I would be opposed to making 
the standalone info program read HTML as this would be a complete 
rewrite of the program and there are probably better ways of dealing 
with it.







Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#34023; Package emacs. (Fri, 11 Jan 2019 20:12:01 GMT)

Message #23 received at 34023 <at> debbugs.gnu.org:

From: Gavin Smith <gavinsmith0123 <at> gmail.com>
To: Glenn Morris <rgm <at> gnu.org>
Cc: 34023 <at> debbugs.gnu.org, bug-texinfo <at> gnu.org
Subject: Re: bug#34023: Support double colons in Info index entries
Date: Fri, 11 Jan 2019 20:13:23 +0000
On Thu, Jan 10, 2019 at 07:53:52PM -0500, Glenn Morris wrote:
> Gavin Smith wrote:
> 
> > This is what is implemented in the standalone info browser (since
> > change on 2017-04-08).
> 
> "Defining the Entries of an Index" in the Texinfo manual continues to
> say (through Texinfo 6.5.90) "Caution: Do not use a colon in an index entry".

Even if Info mode and the standalone Info browser are changed to 
support colons in index entries, people running older versions of these 
won't be able to read them.  However, texi2any does output the colon in 
the index entry without complaint.  See attached Texinfo input and Info 
output.  Newer versions of 'info' can deal with the colons in the index 
entries that are output here.

[colon-index.info (text/plain, attachment)]
[colon-index.texi (application/x-texinfo, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#34023; Package emacs. (Fri, 11 Jan 2019 20:14:02 GMT)

Message #26 received at 34023 <at> debbugs.gnu.org:

From: Gavin Smith <gavinsmith0123 <at> gmail.com>
To: Glenn Morris <rgm <at> gnu.org>, bug-texinfo <at> gnu.org, 34023 <at> debbugs.gnu.org
Subject: Re: bug#34023: Support double colons in Info index entries
Date: Fri, 11 Jan 2019 20:14:44 +0000
On Fri, Jan 11, 2019 at 08:13:23PM +0000, Gavin Smith wrote:
> On Thu, Jan 10, 2019 at 07:53:52PM -0500, Glenn Morris wrote:
> > Gavin Smith wrote:
> > 
> > > This is what is implemented in the standalone info browser (since
> > > change on 2017-04-08).
> > 
> > "Defining the Entries of an Index" in the Texinfo manual continues to
> > say (through Texinfo 6.5.90) "Caution: Do not use a colon in an index entry".
> 
> Even if Info mode and the standalone Info browser are changed to 
> support colons in index entries, people running older versions of these 
> won't be able to read them.  However, texi2any does output the colon in 
> the index entry without complaint.  See attached Texinfo input and Info 
> output.  Newer versions of 'info' can deal with the colons in the index 
> entries that are output here.
> 

There should still be a warning about this in the Texinfo manual, but it 
could be toned down.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#34023; Package emacs. (Fri, 11 Jan 2019 20:33:01 GMT)

Message #29 received at 34023 <at> debbugs.gnu.org:

From: Glenn Morris <rgm <at> gnu.org>
To: Gavin Smith <GavinSmith0123 <at> gmail.com>
Cc: 34023 <at> debbugs.gnu.org, bug-texinfo <at> gnu.org
Subject: Re: bug#34023: Support double colons in Info index entries
Date: Fri, 11 Jan 2019 15:32:35 -0500
Gavin Smith wrote:

> Even if Info mode and the standalone Info browser are changed to 
> support colons in index entries, people running older versions of these 
> won't be able to read them.

Sure. However, if Texinfo is intending to support them from version X,
IMO it should document that.

> However, texi2any does output the colon in the index entry without
> complaint.

Personally I think this is a bug, but Texinfo's previous maintainer
disagreed about what warnings were appropriate.

http://lists.gnu.org/r/bug-texinfo/2014-02/msg00029.html




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#34023; Package emacs. (Sun, 13 Jan 2019 03:05:03 GMT)

Message #32 received at 34023 <at> debbugs.gnu.org:

From: Juri Linkov <juri <at> linkov.net>
To: Gavin Smith <GavinSmith0123 <at> gmail.com>
Cc: 34023 <at> debbugs.gnu.org, bug-texinfo <at> gnu.org
Subject: Re: bug#34023: Support double colons in Info index entries
Date: Sun, 13 Jan 2019 02:55:17 +0200
>> The following patch handles the cases that you presented,
>> but it's hard to predict what other cases it might break.
>>
>> Do you have a sample test file that covers different cases?
>> We could add such a file to the Emacs regression tests.
>
> I've attached a file that includes different possibilities.

Thanks.

>> I have to say that the current regexp-based parsing is
>> an inherently fragile approach.  Do you think it would be possible
>> to add more markup to Info files instead of relying on regexps?
>
> I don't understand.  Whatever markup is added has to be read somehow,
> with regexp or other.

This is a hint for using more XML-like markup languages with more
reliable parsing.

>> Better yet would be to read Info manuals in HTML format in the Info reader.
>> That would allow extracting all information unambiguously.
>
> That would be a different project with several unresolved questions; this
> could be the way forward in the long term.  I would be opposed to making
> the standalone info program read HTML as this would be a complete
> rewrite of the program and there are probably better ways of dealing
> with it.

Maybe not a rewrite, but just an HTML "add-on" for the info program.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#34023; Package emacs. (Wed, 16 Jan 2019 19:17:02 GMT)

Message #35 received at 34023 <at> debbugs.gnu.org:

From: Gavin Smith <gavinsmith0123 <at> gmail.com>
To: Glenn Morris <rgm <at> gnu.org>
Cc: 34023 <at> debbugs.gnu.org, bug-texinfo <at> gnu.org
Subject: Re: bug#34023: Support double colons in Info index entries
Date: Wed, 16 Jan 2019 19:17:44 +0000
On Fri, Jan 11, 2019 at 03:32:35PM -0500, Glenn Morris wrote:
> Gavin Smith wrote:
> 
> > Even if Info mode and the standalone Info browser are changed to 
> > support colons in index entries, people running older versions of these 
> > won't be able to read them.
> 
> Sure. However, if Texinfo is intending to support them from version X,
> IMO it should document that.

I changed the wording a bit in git revision 3381bcb.




This bug report was last modified 6 years and 158 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.