#20704 - info.el bug fix; Interprets Info format wrongly

GNU bug report logs - #20704
info.el bug fix; Interprets Info format wrongly

Package: emacs;

Reported by: Teddy Hogeborn <teddy <at> recompile.se>

Date: Sun, 31 May 2015 17:53:03 UTC

Severity: normal

Tags: patch

Merged with 13431

Found in version 24.2

Done: Lars Ingebrigtsen <larsi <at> gnus.org>

Bug is archived. No further changes may be made.

Message #21 received at 20704 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Teddy Hogeborn <teddy <at> recompile.se>
Cc: monnier <at> iro.umontreal.ca, 20704 <at> debbugs.gnu.org
Subject: Re: bug#20704: info.el bug fix; Interprets Info format wrongly
Date: Tue, 09 Jun 2015 17:29:09 +0300

> From: Teddy Hogeborn <teddy <at> recompile.se>
> Cc: Stefan Monnier <monnier <at> iro.umontreal.ca>, 20704 <at> debbugs.gnu.org
> Date: Tue, 09 Jun 2015 13:09:08 +0200
>
> Eli Zaretskii <eliz <at> gnu.org> writes:
>
> > > > +            (+ (point-min) (byte-to-position
> > > > +                            (read (current-buffer))))
> > >
> > > Hmm... this only works if the Info file is encoded in UTF-8.  I
> > > guess in the case of Info, 99% of the files are just ASCII and
> > > there's a chance that the vast majority of the rest is (or will be)
> > > UTF-8, so maybe this hack works well in practice.
> >
> > Using byte-to-position would make things worse for Latin-1 and the
> > likes.
>
> No, byte-to-position already checks for that:
>
> ---- src/marker.c, line 302
>   /* If this buffer has as many characters as bytes,
>      each character must be one byte.
>      This takes care of the case where enable-multibyte-characters is nil.  */
>   if (best_above == best_above_byte)
>     return bytepos;
> ----

I think you are misreading the code: the above snippet is for unibyte
buffers, whereas a Latin-1 encoded Info file will be read into a
multibyte buffer (and decoded into the internal Emacs representation of
characters during the read).  So this optimization is not going to work
in that case.  IOW, what matters for byte-to-position is the encoding
used in representing characters in Emacs buffers, not the one used
externally by the Info file on disk.

> Therefore, an Info file in Latin-1 should work just fine.
>
> > But it shouldn't be hard to add a simple test of
> > buffer-file-coding-system: if it states fixed-size encoding, like any
> > of the 8-bit encodings, or UTF-16,
> > the conversion to character position is trivial.
>
> I think you mean UTF-32 instead of UTF-16, since UTF-16 is variable-
> length.

UTF-16 is fixed length for characters in the BMP.
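To make the distinction in this exchange concrete, here is a minimal C sketch under stated assumptions. The names byte_to_char_offset, latin1_to_utf8 and enum sketch_encoding are hypothetical and are not Emacs internals, and plain UTF-8 stands in for Emacs's internal multibyte representation (a superset of UTF-8). It illustrates two points made above: for fixed-width encodings (any 8-bit encoding, UTF-32, or UTF-16 when the text stays within the BMP) a file byte offset becomes a character offset with a single division, whereas once a Latin-1 file has been decoded into a multibyte buffer, each non-ASCII character occupies two bytes, so byte offsets taken from the file no longer line up with buffer byte positions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical encoding tags for this sketch only; Emacs itself uses
   coding-system symbols rather than an enum like this. */
enum sketch_encoding {
    ENC_8BIT,      /* Latin-1 and other single-byte encodings: 1 byte/char */
    ENC_UTF16_BMP, /* UTF-16, assuming BMP-only text (no surrogates), no BOM */
    ENC_UTF32,     /* always 4 bytes/char, assuming no BOM */
    ENC_UTF8       /* variable width: 1 to 4 bytes/char */
};

/* Convert a byte offset into a character count.  Fixed-width encodings
   need only a division; UTF-8 needs a scan that counts every byte that
   is not a continuation byte (10xxxxxx). */
size_t byte_to_char_offset(const unsigned char *buf, size_t byte_off,
                           enum sketch_encoding enc)
{
    switch (enc) {
    case ENC_8BIT:
        return byte_off;
    case ENC_UTF16_BMP:
        assert(byte_off % 2 == 0); /* must fall on a code-unit boundary */
        return byte_off / 2;
    case ENC_UTF32:
        assert(byte_off % 4 == 0);
        return byte_off / 4;
    case ENC_UTF8: {
        size_t chars = 0;
        for (size_t i = 0; i < byte_off; i++)
            if ((buf[i] & 0xC0) != 0x80) /* lead byte or ASCII */
                chars++;
        return chars;
    }
    }
    return byte_off; /* unreachable for valid enum values */
}

/* Decode Latin-1 into UTF-8, standing in for the decoding Emacs does
   when it reads a Latin-1 file into a multibyte buffer.  Every byte
   >= 0x80 becomes two bytes.  OUT must have room for 2 * LEN bytes.
   Returns the number of bytes written. */
size_t latin1_to_utf8(const unsigned char *in, size_t len, unsigned char *out)
{
    size_t o = 0;
    for (size_t i = 0; i < len; i++) {
        if (in[i] < 0x80) {
            out[o++] = in[i];
        } else {
            out[o++] = (unsigned char)(0xC0 | (in[i] >> 6));
            out[o++] = (unsigned char)(0x80 | (in[i] & 0x3F));
        }
    }
    return o;
}
```

For example, in the Latin-1 text "caf\xE9 x" the final x sits at file byte 5, which is also character 5. After decoding, the same x sits at byte 6 of the multibyte text, and feeding the file's byte offset 5 to the UTF-8 scan lands on character 4 instead: one position early for each preceding non-ASCII character, which is the mismatch the unibyte shortcut in marker.c cannot catch.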


