GNU bug report logs - #20783
25.0.50; [PATCH] byte-to-position has internal off-by-one bug


Package: emacs;

Reported by: Wolfgang Jenkner <wjenkner <at> inode.at>

Date: Wed, 10 Jun 2015 15:20:05 UTC

Severity: normal

Tags: patch

Found in version 25.0.50

Fixed in version 25.1

Done: Wolfgang Jenkner <wjenkner <at> inode.at>

Bug is archived. No further changes may be made.



Message #23 received at 20783 <at> debbugs.gnu.org (full text, mbox):

From: Wolfgang Jenkner <wjenkner <at> inode.at>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 20783 <at> debbugs.gnu.org
Subject: Re: bug#20783: 25.0.50;
 [PATCH] byte-to-position has internal off-by-one bug
Date: Tue, 16 Jun 2015 17:40:38 +0200
[Message part 1 (text/plain, inline)]
On Thu, Jun 11 2015, Wolfgang Jenkner wrote:

> The loop could be improved a bit by doing pointer arithmetic like in
> DEC_POS

I wondered whether such a change (to avoid unnecessary buffer gap
considerations while in the middle of a multibyte character) would
actually make a measurable difference.  So, silly as that may be, I wrote
a simple benchmark for byte-to-position, using the tutorials as data
samples, and passed the results to ministat(1)[*].  Please see the
attached btp-ministat.el and ministat.sh for details.

[*] https://www.freebsd.org/cgi/man.cgi?query=ministat&sektion=1&manpath=FreeBSD+10.1-RELEASE

The result is that ministat reports statistically significant differences
for several of the tutorials (though not for the same languages in every
run; system load apparently generates too much noise).  In those cases
the version with the DEC_POS-like loop is _always_ faster, judging from
the average values or just from a quick look at the histograms.

So, while this is not really very important, it seems that it would be
safe to use the following patch with the improved loop instead:

[0001-src-editfns.c-Fbyte_to_position-Fix-bytepos-not-at-c.patch (text/x-diff, inline)]
From be2adf5b7b427ee5d84c9ae011d8d11d452c2f4e Mon Sep 17 00:00:00 2001
From: Wolfgang Jenkner <wjenkner <at> inode.at>
Date: Thu, 11 Jun 2015 16:21:21 +0200
Subject: [PATCH] * src/editfns.c (Fbyte_to_position): Fix bytepos not at char
 boundary.

The behavior now matches the description in the manual.  (Bug#20783)
---
 src/editfns.c | 22 ++++++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/src/editfns.c b/src/editfns.c
index cddb0d4..ff54e73 100644
--- a/src/editfns.c
+++ b/src/editfns.c
@@ -1025,10 +1025,28 @@ DEFUN ("byte-to-position", Fbyte_to_position, Sbyte_to_position, 1, 1, 0,
 If BYTEPOS is out of range, the value is nil.  */)
   (Lisp_Object bytepos)
 {
+  ptrdiff_t pos_byte;
+
   CHECK_NUMBER (bytepos);
-  if (XINT (bytepos) < BEG_BYTE || XINT (bytepos) > Z_BYTE)
+  pos_byte = XINT (bytepos);
+  if (pos_byte < BEG_BYTE || pos_byte > Z_BYTE)
     return Qnil;
-  return make_number (BYTE_TO_CHAR (XINT (bytepos)));
+  if (Z != Z_BYTE)
+    /* There are multibyte characters in the buffer.
+       The argument of BYTE_TO_CHAR must be a byte position at
+       a character boundary, so search for the start of the current
+       character.  */
+    {
+      unsigned char *chp = BYTE_POS_ADDR (pos_byte);
+
+      while (!CHAR_HEAD_P (*chp))
+	{
+	  pos_byte--;
+	  /* There's no buffer gap in the middle of a character.  */
+	  chp--;
+	}
+    }
+  return make_number (BYTE_TO_CHAR (pos_byte));
 }
 
 DEFUN ("following-char", Ffollowing_char, Sfollowing_char, 0, 0, 0,
-- 
2.4.2

[btp-ministat.el (application/emacs-lisp, attachment)]
[ministat.sh (text/x-sh, attachment)]
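
The backward scan in the patch can be illustrated outside Emacs with a
minimal, standalone sketch.  It assumes a flat UTF-8 byte array with no
buffer gap and 0-based offsets, so it does not reproduce BYTE_POS_ADDR
or BYTE_TO_CHAR; the names char_head_p and byte_to_char_index are
hypothetical and exist only for this illustration.  Only the bit test
itself mirrors what CHAR_HEAD_P checks.

```c
#include <assert.h>
#include <stddef.h>

/* A byte starts a character unless it is a UTF-8 continuation byte
   (bit pattern 10xxxxxx).  This is the test CHAR_HEAD_P performs.  */
static int
char_head_p (unsigned char byte)
{
  return (byte & 0xC0) != 0x80;
}

/* Hypothetical helper: map a 0-based byte offset into a flat UTF-8
   buffer to a 0-based character index.  An offset that falls inside
   a multibyte character is first moved back to that character's
   head byte, as in the patched Fbyte_to_position.  */
static size_t
byte_to_char_index (const unsigned char *buf, size_t len, size_t byte_off)
{
  size_t chars = 0;
  size_t i;

  assert (byte_off <= len);

  /* Back up to the start of the current character.  An offset equal
     to LEN is the end of the buffer and is already a boundary.  */
  if (byte_off < len)
    while (byte_off > 0 && !char_head_p (buf[byte_off]))
      byte_off--;

  /* Count the characters that start before the adjusted offset.  */
  for (i = 0; i < byte_off; i++)
    if (char_head_p (buf[i]))
      chars++;

  return chars;
}
```

For the four bytes 'a', 0xC3, 0xA9, 'z' (the string "aéz"), offsets 1
and 2 both map to character index 1, because offset 2 lies inside the
two-byte "é" and is moved back to its head byte.  The linear count is
only for illustration; Emacs itself converts positions through
BYTE_TO_CHAR.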

This bug report was last modified 9 years and 345 days ago.
