GNU bug report logs - #21055
Info reader fails to follow xrefs to anchors

Package: emacs;

Reported by: Eli Zaretskii <eliz <at> gnu.org>

Date: Tue, 14 Jul 2015 14:59:02 UTC

Severity: normal

Done: Eli Zaretskii <eliz <at> gnu.org>

Bug is archived. No further changes may be made.


Message #11 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Juri Linkov <juri <at> jurta.org>
Cc: ludo <at> gnu.org, bug-gnu-emacs <at> gnu.org
Subject: Re: Info reader fails to follow xrefs to anchors
Date: Wed, 15 Jul 2015 18:09:56 +0300
> From: Juri Linkov <juri <at> jurta.org>
> Cc: ludo <at> gnu.org (Ludovic Courtès),
>   bug-gnu-emacs <at> gnu.org
> Date: Wed, 15 Jul 2015 02:16:32 +0300
> 
> I'm attaching here all the files that I used to fix bug#14125,
> so you could compare the output of different makeinfo versions
> and see the problem.  The command line used to translate
> Texinfo files was: makeinfo --split-size=2000 test.texi

Thanks.

I see the problem now.  It only happens with makeinfo 5.0 and 5.1, and
has been fixed since 5.2.  Furthermore, it only rears its ugly head if
the Texinfo source has an @ifnottex block before the Top node; other
blurbs usually put there, such as @copying and @direntry, don't
trigger the problem even in those 2 versions of makeinfo.  Moreover,
when this problem happens, it only affects the 1st subfile; the rest
have their offsets set correctly.  So it's a pretty rare combination
of conditions.

Therefore, I think we should fix the anchor use case by making the
value returned from Info-read-subfile as accurate as possible, and
then cater to the problematic output of makeinfo 5.0 and 5.1 by
attempting another search for a node with a larger slop value.
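
As a rough illustration of the "search again with a larger slop" idea,
here is a minimal standalone sketch.  The helper name
demo-find-with-slop and the slop values are hypothetical and are not
part of info.el; the sketch only shows why backing up further helps
when the guessed position overshoots the node.

```elisp
;; Hypothetical helper, not part of info.el: retry a forward search
;; from progressively earlier starting points.
(defun demo-find-with-slop (regexp guesspos slops)
  "Search forward for REGEXP, starting SLOP characters before GUESSPOS.
Try each value in SLOPS in order; return the match start, or nil."
  (catch 'found
    (dolist (slop slops)
      ;; Back up by SLOP, but never before the start of the buffer.
      (goto-char (max (point-min) (- guesspos slop)))
      (when (re-search-forward regexp nil t)
        (throw 'found (match-beginning 0))))
    nil))

;; Usage: the node starts far before the (overshooting) guess, so the
;; first, smaller slop misses it and the second, larger one finds it.
(with-temp-buffer
  (insert (make-string 500 ?x) "\nNode: Top\n" (make-string 5000 ?y))
  (demo-find-with-slop "^Node: Top" 4000 '(1000 10000)))
```

Trying the small slop first keeps the common case cheap and precise;
the larger slop is only a fallback for the rare overshoot.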

So any objections to the patch below?  It introduces a new function,
bufferpos-to-filepos, and then uses it to get the file byte offset
corresponding to the first node in a subfile.
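
To make the distinction between buffer positions and file byte
offsets concrete, here is a brute-force sketch.  The helper name
demo-bufferpos-to-fileoffset is hypothetical; it simply encodes the
text before a position and counts the resulting bytes, which is
roughly the idea behind the slowest, exact path and not a substitute
for the function in the patch.

```elisp
;; Hypothetical helper for illustration only.
(defun demo-bufferpos-to-fileoffset (position coding-system)
  "Return the 0-based file byte offset of buffer POSITION.
Encode the text before POSITION with CODING-SYSTEM and count bytes."
  (save-restriction
    (widen)
    (string-bytes
     (encode-coding-string
      (buffer-substring-no-properties (point-min) position)
      coding-system))))

;; Usage: the same 11-character text maps to different byte offsets
;; depending on the encoding and the end-of-line convention.
(with-temp-buffer
  (insert "na\u00efve\ncaf\u00e9\n")
  (list (demo-bufferpos-to-fileoffset (point-max) 'utf-8-unix)
        (demo-bufferpos-to-fileoffset (point-max) 'utf-8-dos)
        (demo-bufferpos-to-fileoffset (point-max) 'iso-latin-1-unix)))
```

Re-encoding everything up to the position is simple but costs time
proportional to the text before it, which is why cheaper shortcuts
for fixed-width encodings and UTF-8 are attractive.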

--- lisp/international/mule-util.el~0	2015-06-21 06:45:33.000000000 +0300
+++ lisp/international/mule-util.el	2015-07-15 18:00:57.053036400 +0300
@@ -412,6 +412,79 @@
                        (decode-coding-region (point-min)
                                              (min (point-max) (+ pm byte))
                                              coding-system t))))))))))))
+;;;###autoload
+(defun bufferpos-to-filepos (position &optional quality coding-system)
+  "Try to return the file byte corresponding to a particular buffer POSITION.
+Value is the file position given as a (0-based) byte count.
+The function presumes the file is encoded with CODING-SYSTEM, which defaults
+to `buffer-file-coding-system'.
+QUALITY can be:
+  `approximate', in which case we may cut some corners to avoid
+    excessive work.
+  `exact', in which case we may end up re-(en/de)coding a large
+    part of the file/buffer.
+  nil, in which case we may return nil rather than an approximation."
+  (unless coding-system (setq coding-system buffer-file-coding-system))
+  (let* ((eol (coding-system-eol-type coding-system))
+         (lineno (if (= eol 1) (1- (line-number-at-pos position)) 0))
+         (type (coding-system-type coding-system))
+         (base (coding-system-base coding-system))
+         byte)
+    (and (eq type 'utf-8)
+         ;; Any post-read/pre-write conversions mean it's not really UTF-8.
+         (not (null (coding-system-get coding-system :post-read-conversion)))
+         (setq type 'not-utf-8))
+    (and (memq type '(charset raw-text undecided))
+         ;; The following are all of type 'charset', but they are
+         ;; actually variable-width encodings.
+         (not (memq base '(chinese-gbk chinese-gb18030 euc-tw euc-jis-2004
+                                       korean-iso-8bit chinese-iso-8bit
+                                       japanese-iso-8bit chinese-big5-hkscs
+                                       japanese-cp932 korean-cp949)))
+         (setq type 'single-byte))
+    (pcase type
+      (`utf-8
+       (setq byte (position-bytes position))
+       (when (null byte)
+         (if (<= position 0)
+             (setq byte 1)
+           (setq byte (position-bytes (point-max)))))
+       (setq byte (1- byte))
+       (+ byte
+          ;; Account for BOM, if any.
+          (if (coding-system-get coding-system :bom) 3 0)
+          ;; Account for CR in CRLF pairs.
+          lineno))
+      (`single-byte
+       (+ position -1 lineno))
+      ((and `utf-16
+            ;; FIXME: For utf-16, we could use the same approach as used for
+            ;; dos EOLs (counting the number of non-BMP chars instead of the
+            ;; number of lines).
+            (guard (not (eq quality 'exact))))
+       ;; In approximate mode, assume all characters are within the
+       ;; BMP, i.e. each one takes up 2 bytes.
+       (+ (* (1- position) 2)
+          ;; Account for BOM, if any.
+          (if (coding-system-get coding-system :bom) 2 0)
+          ;; Account for CR in CRLF pairs.
+          lineno))
+      (_
+       (pcase quality
+         (`approximate (+ (position-bytes position) -1 lineno))
+         (`exact
+          ;; Rather than assume that the file exists and still holds the right
+          ;; data, we reconstruct its relevant portion.
+          (let ((buf (current-buffer)))
+            (with-temp-buffer
+              (set-buffer-multibyte nil)
+              (let ((tmp-buf (current-buffer)))
+                (with-current-buffer buf
+                  (save-restriction
+                    (widen)
+                    (encode-coding-region (point-min) (min (point-max) position)
+                                          coding-system tmp-buf)))
+                (1- (point-max)))))))))))
 
 (provide 'mule-util)
 
--- lisp/info.el~0	2015-06-16 10:34:22.000000000 +0300
+++ lisp/info.el	2015-07-15 18:08:58.585385400 +0300
@@ -1217,6 +1217,18 @@
 		  (goto-char pos)
 		  (throw 'foo t)))
 
+              ;; If the Texinfo source had an @ifnottex block of text
+              ;; before the Top node, makeinfo 5.0 and 5.1 mistakenly
+              ;; omitted that block's size from the starting position
+              ;; of the 1st subfile, which makes GUESSPOS overshoot
+              ;; the correct position by the length of that text.  So
+              ;; we try again with a larger slop.
+              (goto-char (max (point-min) (- guesspos 10000)))
+	      (let ((pos (Info-find-node-in-buffer regexp strict-case)))
+		(when pos
+		  (goto-char pos)
+		  (throw 'foo t)))
+
               (when (string-match "\\([^.]+\\)\\." nodename)
                 (let (Info-point-loc)
                   (Info-find-node-2
@@ -1553,10 +1565,13 @@
     (if (looking-at "\^_")
 	(forward-char 1)
       (search-forward "\n\^_"))
-    ;; Don't add the length of the skipped summary segment to
-    ;; the value returned to `Info-find-node-2'.  (Bug#14125)
     (if (numberp nodepos)
-	(- nodepos lastfilepos))))
+        ;; Our caller ('Info-find-node-2') wants the (zero-based) byte
+        ;; offset corresponding to NODEPOS, from the beginning of the
+        ;; subfile.  This is especially important if NODEPOS is for an
+        ;; anchor reference, because for those the position is all we
+        ;; have.
+	(+ (- nodepos lastfilepos) (bufferpos-to-filepos (point) 'exact)))))
 
 (defun Info-unescape-quotes (value)
   "Unescape double quotes and backslashes in VALUE."





This bug report was last modified 10 years and 42 days ago.
