From unknown Mon Jun 23 11:27:10 2025
Subject: bug#20783: 25.0.50; [PATCH] byte-to-position has internal off-by-one bug
From: Wolfgang Jenkner
To: 20783@debbugs.gnu.org
Date: Wed, 10 Jun 2015 17:13:30 +0200
Message-ID: <85fv5za8vv.fsf@iznogoud.viz>
User-Agent: Gnus/5.130014 (Ma Gnus v0.14) Emacs/25.0.50 (berkeley-unix)

Here's a test case for the bug:

(with-temp-buffer
  (insert "éé")
  (let ((i 1) pos res)
    (while (setq pos (byte-to-position i))
      (push pos res)
      (setq i (1+ i)))
    (nreverse res)))

=> (1 2 2 2 3)

while the correct result is

=> (1 1 2 2 3)

I found that this bug had been noticed before in

http://stackoverflow.com/questions/17588117/emacs-byte-to-position-function-is-not-consistent-with-document

Here's a patch.
The fix may look a bit clumsy but it's actually meant
to avoid pessimizing the presumably common case where the initial
bytepos is at a character boundary.

-- >8 --
Subject: [PATCH] * src/marker.c (buf_bytepos_to_charpos): Fix best_below_byte
 count.

If bytepos is not after a character boundary the preceding loop
overshoots by one character position.
---
 src/marker.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/src/marker.c b/src/marker.c
index 73928ba..94d676b 100644
--- a/src/marker.c
+++ b/src/marker.c
@@ -341,6 +341,12 @@ buf_bytepos_to_charpos (struct buffer *b, ptrdiff_t bytepos)
           BUF_INC_POS (b, best_below_byte);
         }
 
+      if (best_below_byte != bytepos)
+        {
+          best_below--;
+          BUF_DEC_POS (b, best_below_byte);
+        }
+
       /* If this position is quite far from the nearest known position,
          cache the correspondence by creating a marker here.
          It will last until the next GC.
-- 
2.4.2

From unknown Mon Jun 23 11:27:10 2025
Subject: bug#20783: 25.0.50; [PATCH] byte-to-position has internal off-by-one bug
From: Eli Zaretskii
To: Wolfgang Jenkner
Cc: 20783@debbugs.gnu.org
Date: Wed, 10 Jun 2015 20:17:20 +0300
In-reply-to: <85fv5za8vv.fsf@iznogoud.viz>
Message-id: <83vbevsctb.fsf@gnu.org>
References: <85fv5za8vv.fsf@iznogoud.viz>

> From: Wolfgang Jenkner
> Date: Wed, 10 Jun 2015 17:13:30 +0200
>
> Here's a test case for the bug:
>
> (with-temp-buffer
>   (insert "éé")
>   (let ((i 1) pos res)
>     (while (setq pos (byte-to-position i))
>       (push pos res)
>       (setq i (1+ i)))
>     (nreverse res)))
>
> => (1 2 2 2 3)
>
> while the correct result is
>
> => (1 1 2 2 3)
>
> I found that this bug had been noticed before in
>
> http://stackoverflow.com/questions/17588117/emacs-byte-to-position-function-is-not-consistent-with-document
>
> Here's a patch.  The fix may look a bit clumsy but it's actually meant
> to avoid pessimizing the presumably common case where the initial
> bytepos is at a character boundary.

Wouldn't it be better to handle this use case in Fbyte_to_position?
The BYTE_TO_CHAR macro is called an awful lot in the Emacs innermost
loops, and it's _always_ called with a byte position that's on a
character boundary.
So punishing all that code with even a single
comparison, for the benefit of a use case whose importance is unclear
to me is not necessarily TRT.

Thanks.

From unknown Mon Jun 23 11:27:10 2025
Subject: bug#20783: 25.0.50; [PATCH] byte-to-position has internal off-by-one bug
From: Wolfgang Jenkner
To: Eli Zaretskii
Cc: 20783@debbugs.gnu.org
Date: Thu, 11 Jun 2015 17:24:42 +0200
In-Reply-To: <85fv5za8vv.fsf@iznogoud.viz>
References: <85fv5za8vv.fsf@iznogoud.viz> <83vbevsctb.fsf@gnu.org>
Message-ID: <85oakmxn4i.fsf@iznogoud.viz>
User-Agent: Gnus/5.130014 (Ma Gnus v0.14) Emacs/25.0.50 (berkeley-unix)

On Wed, Jun 10 2015, Eli Zaretskii wrote:

> Wouldn't it be better to handle this use case in Fbyte_to_position?
> The BYTE_TO_CHAR macro is called an awful lot in the Emacs innermost
> loops, and it's _always_ called with a byte position that's on a
> character boundary.

I see.  How about something like the patch below?

The loop could be improved a bit by doing pointer arithmetic like in
DEC_POS but it's perhaps not worth complicating things for the case
where bytepos is not at a character boundary.

-- >8 --
Subject: [PATCH] * editfns.c (Fbyte_to_position): Fix bytepos not at char
 boundary.

The behavior now matches the description in the manual.  (Bug#20783)
---
 src/editfns.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/src/editfns.c b/src/editfns.c
index cddb0d4..94715fe 100644
--- a/src/editfns.c
+++ b/src/editfns.c
@@ -1025,10 +1025,18 @@ DEFUN ("byte-to-position", Fbyte_to_position, Sbyte_to_position, 1, 1, 0,
 If BYTEPOS is out of range, the value is nil.  */)
   (Lisp_Object bytepos)
 {
+  ptrdiff_t pos_byte;
+
   CHECK_NUMBER (bytepos);
-  if (XINT (bytepos) < BEG_BYTE || XINT (bytepos) > Z_BYTE)
+  pos_byte = XINT (bytepos);
+  if (pos_byte < BEG_BYTE || pos_byte > Z_BYTE)
     return Qnil;
-  return make_number (BYTE_TO_CHAR (XINT (bytepos)));
+  if (Z != Z_BYTE)
+    /* There are multibyte characters in the buffer.
+       Search for the start of the current character.  */
+    while (!CHAR_HEAD_P (FETCH_BYTE (pos_byte)))
+      pos_byte--;
+  return make_number (BYTE_TO_CHAR (pos_byte));
 }
 
 DEFUN ("following-char", Ffollowing_char, Sfollowing_char, 0, 0, 0,
-- 
2.4.2

From unknown Mon Jun 23 11:27:10 2025
Subject: bug#20783: 25.0.50; [PATCH] byte-to-position has internal off-by-one bug
From: Eli Zaretskii
To: Wolfgang Jenkner
Cc: 20783@debbugs.gnu.org
Date: Thu, 11 Jun 2015 19:04:24 +0300
In-reply-to: <85oakmxn4i.fsf@iznogoud.viz>
Message-id: <83bngms03b.fsf@gnu.org>
References: <85fv5za8vv.fsf@iznogoud.viz> <83vbevsctb.fsf@gnu.org> <85oakmxn4i.fsf@iznogoud.viz>

> From: Wolfgang Jenkner
> Cc: 20783@debbugs.gnu.org
> Date: Thu, 11 Jun 2015 17:24:42 +0200
>
> I see.  How about something like the patch below?
>
> The loop could be improved a bit by doing pointer arithmetic like in
> DEC_POS but it's perhaps not worth complicating things for the case
> where bytepos is not at a character boundary.
>
> -- >8 --
> Subject: [PATCH] * editfns.c (Fbyte_to_position): Fix bytepos not at char
>  boundary.
>
> The behavior now matches the description in the manual.  (Bug#20783)
> ---
>  src/editfns.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/src/editfns.c b/src/editfns.c
> index cddb0d4..94715fe 100644
> --- a/src/editfns.c
> +++ b/src/editfns.c
> @@ -1025,10 +1025,18 @@ DEFUN ("byte-to-position", Fbyte_to_position, Sbyte_to_position, 1, 1, 0,
>  If BYTEPOS is out of range, the value is nil.  */)
>    (Lisp_Object bytepos)
>  {
> +  ptrdiff_t pos_byte;
> +
>    CHECK_NUMBER (bytepos);
> -  if (XINT (bytepos) < BEG_BYTE || XINT (bytepos) > Z_BYTE)
> +  pos_byte = XINT (bytepos);
> +  if (pos_byte < BEG_BYTE || pos_byte > Z_BYTE)
>      return Qnil;
> -  return make_number (BYTE_TO_CHAR (XINT (bytepos)));
> +  if (Z != Z_BYTE)
> +    /* There are multibyte characters in the buffer.
> +       Search for the start of the current character.  */
> +    while (!CHAR_HEAD_P (FETCH_BYTE (pos_byte)))
> +      pos_byte--;
> +  return make_number (BYTE_TO_CHAR (pos_byte));
>  }

Works for me, thanks.  But please add a comment there about
BYTE_TO_CHAR expecting byte positions that are on a character
boundary, so that the reason for the loop is clear.

From unknown Mon Jun 23 11:27:10 2025
Subject: bug#20783: 25.0.50; [PATCH] byte-to-position has internal off-by-one bug
From: Wolfgang Jenkner
To: Eli Zaretskii
Cc: 20783@debbugs.gnu.org
References: <85fv5za8vv.fsf@iznogoud.viz> <83vbevsctb.fsf@gnu.org> <85oakmxn4i.fsf@iznogoud.viz> <83bngms03b.fsf@gnu.org>
Date: Thu, 11 Jun 2015 18:41:35 +0200
In-Reply-To: <83bngms03b.fsf@gnu.org> (Eli Zaretskii's message of "Thu, 11 Jun 2015 19:04:24 +0300")
Message-ID: <85k2vaxkn4.fsf@iznogoud.viz>
User-Agent: Gnus/5.130014 (Ma Gnus v0.14) Emacs/25.0.50 (berkeley-unix)

On Thu, Jun 11 2015, Eli Zaretskii wrote:

> But please add a comment there about
> BYTE_TO_CHAR expecting byte positions that are on a character
> boundary,

Wouldn't it make sense to add this to the comment before the definition
of BYTE_TO_CHAR instead (or to both)?
From unknown Mon Jun 23 11:27:10 2025
Subject: bug#20783: 25.0.50; [PATCH] byte-to-position has internal off-by-one bug
From: Eli Zaretskii
To: Wolfgang Jenkner
Cc: 20783@debbugs.gnu.org
Date: Thu, 11 Jun 2015 22:08:16 +0300
In-reply-to: <85k2vaxkn4.fsf@iznogoud.viz>
Message-id: <83a8w6rrkv.fsf@gnu.org>
References: <85fv5za8vv.fsf@iznogoud.viz> <83vbevsctb.fsf@gnu.org> <85oakmxn4i.fsf@iznogoud.viz> <83bngms03b.fsf@gnu.org> <85k2vaxkn4.fsf@iznogoud.viz>

> From: Wolfgang Jenkner
> Cc: 20783@debbugs.gnu.org
> Date: Thu, 11 Jun 2015 18:41:35 +0200
>
> On Thu, Jun 11 2015, Eli Zaretskii wrote:
>
> > But please add a comment there about
> > BYTE_TO_CHAR expecting byte positions that are on a character
> > boundary,
>
> Wouldn't it make sense to add this to the comment before the definition
> of BYTE_TO_CHAR instead (or to both)?

Both, I'd say.
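
The behaviour agreed on in this thread can be illustrated outside Emacs.
What follows is a minimal standalone C sketch, not Emacs source: it
assumes a flat UTF-8 byte array with no buffer gap and no position
cache, and the names char_head_p and byte_to_position are hypothetical
stand-ins for CHAR_HEAD_P and Fbyte_to_position.  It steps back from a
byte position to the head of the character containing it before
counting, and returns 0 where the Lisp function would return nil.

```c
#include <assert.h>
#include <stddef.h>

/* A byte starts a character unless it is a UTF-8 continuation
   byte of the form 10xxxxxx.  (Stand-in for CHAR_HEAD_P.)  */
static int
char_head_p (unsigned char c)
{
  return (c & 0xC0) != 0x80;
}

/* Map a 1-based byte position in TEXT (NBYTES bytes of UTF-8) to a
   1-based character position.  Valid byte positions run from 1 to
   NBYTES + 1, the last one being the end of the text.  Return 0 for
   an out-of-range position (this sketch's substitute for nil).  */
static size_t
byte_to_position (const unsigned char *text, size_t nbytes, size_t bytepos)
{
  size_t charpos = 1;
  size_t i;

  assert (text != NULL);
  if (bytepos < 1 || bytepos > nbytes + 1)
    return 0;

  /* If BYTEPOS points into the middle of a character, move back to
     the head of that character.  The end of the text is already a
     boundary, so nothing is fetched there.  */
  while (bytepos > 1 && bytepos <= nbytes
         && !char_head_p (text[bytepos - 1]))
    bytepos--;

  /* Count the characters that start strictly before BYTEPOS.
     This is a plain linear scan, chosen only for clarity.  */
  for (i = 0; i + 1 < bytepos; i++)
    if (char_head_p (text[i]))
      charpos++;

  return charpos;
}
```

For the four bytes of "éé" (C3 A9 C3 A9), byte positions 1 through 5
map to 1 1 2 2 3, which is the corrected result from the original
report, and position 6 is out of range.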
From unknown Mon Jun 23 11:27:10 2025
Subject: bug#20783: 25.0.50; [PATCH] byte-to-position has internal off-by-one bug
From: Wolfgang Jenkner
To: Eli Zaretskii
Cc: 20783@debbugs.gnu.org
Date: Tue, 16 Jun 2015 17:40:38 +0200
In-Reply-To: <85fv5za8vv.fsf@iznogoud.viz>
References: <85fv5za8vv.fsf@iznogoud.viz> <83vbevsctb.fsf@gnu.org> <85oakmxn4i.fsf@iznogoud.viz>
Message-ID: <85k2v3veg7.fsf@iznogoud.viz>
User-Agent: Gnus/5.130014 (Ma Gnus v0.14) Emacs/25.0.50 (berkeley-unix)
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="

--=-=-=
Content-Type: text/plain

On Thu, Jun 11 2015, Wolfgang Jenkner wrote:

> The loop could be improved a bit by doing pointer arithmetic like in
> DEC_POS

I wondered whether such a change (to avoid unnecessary buffer gap
considerations while in the middle of a multibyte character) would
actually make a measurable difference, so, silly as that may be, I wrote
a simple benchmark for byte-to-position, using the tutorials as data
samples, and passed the results to ministat(1)[*], please see the
attached btp-ministat.el and ministat.sh for details.

[*] https://www.freebsd.org/cgi/man.cgi?query=ministat&sektion=1&manpath=FreeBSD+10.1-RELEASE

The result is that ministat reports statistical differences for several
of the tutorials (but not generally for the same languages at each run,
system load apparently generating too much statistical noise) and I find
that the version with the DEC_POS like loop is _always_ faster in those
cases (judging from the average values or just by taking a quick look at
the histograms).

So, while this is not really very important, it seems that it would be
safe to use the following patch with the improved loop instead:

--=-=-=
Content-Type: text/x-diff
Content-Disposition: inline; filename=0001-src-editfns.c-Fbyte_to_position-Fix-bytepos-not-at-c.patch
Content-Description: improve loop

>From be2adf5b7b427ee5d84c9ae011d8d11d452c2f4e Mon Sep 17 00:00:00 2001
From: Wolfgang Jenkner
Date: Thu, 11 Jun 2015 16:21:21 +0200
Subject: [PATCH] * src/editfns.c (Fbyte_to_position): Fix bytepos not at char
 boundary.

The behavior now matches the description in the manual.
(Bug#20783)
---
 src/editfns.c | 22 ++++++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/src/editfns.c b/src/editfns.c
index cddb0d4..ff54e73 100644
--- a/src/editfns.c
+++ b/src/editfns.c
@@ -1025,10 +1025,28 @@ DEFUN ("byte-to-position", Fbyte_to_position, Sbyte_to_position, 1, 1, 0,
 If BYTEPOS is out of range, the value is nil.  */)
   (Lisp_Object bytepos)
 {
+  ptrdiff_t pos_byte;
+
   CHECK_NUMBER (bytepos);
-  if (XINT (bytepos) < BEG_BYTE || XINT (bytepos) > Z_BYTE)
+  pos_byte = XINT (bytepos);
+  if (pos_byte < BEG_BYTE || pos_byte > Z_BYTE)
     return Qnil;
-  return make_number (BYTE_TO_CHAR (XINT (bytepos)));
+  if (Z != Z_BYTE)
+    /* There are multibyte characters in the buffer.
+       The argument of BYTE_TO_CHAR must be a byte position at
+       a character boundary, so search for the start of the current
+       character.  */
+    {
+      unsigned char *chp = BYTE_POS_ADDR (pos_byte);
+
+      while (!CHAR_HEAD_P (*chp))
+        {
+          pos_byte--;
+          /* There's no buffer gap in the middle of a character.  */
+          chp--;
+        }
+    }
+  return make_number (BYTE_TO_CHAR (pos_byte));
 }
 
 DEFUN ("following-char", Ffollowing_char, Sfollowing_char, 0, 0, 0,
-- 
2.4.2

--=-=-=
Content-Type: application/emacs-lisp
Content-Disposition: attachment; filename=btp-ministat.el
Content-Transfer-Encoding: quoted-printable
Content-Description: byte-to-position benchmark

(defun benchmark--byte-to-position (file n &optional move-gap)
  "Loop N times through all bytes in FILE and compute their position."
  (let ((form '(let ((i 1))
                 (while (byte-to-position i)
                   (setq i (1+ i))))))
    (with-temp-buffer
      (insert-file-contents file)
      (when move-gap
        ;; Try to minimize buffer gap calculations by moving it to eob.
        (goto-char (point-max))
        (insert " "))
      (eval `(benchmark-run-compiled ,n ,form)))))

(defun benchmark--byte-to-position-files ()
  "Return a list of the absolute file names of the emacs tutorials."
  (directory-files (expand-file-name "tutorials" data-directory)
                   t
                   "TUTORIAL\\(?:\\.[a-z][a-z]\\(?:_[A-Z][A-Z]\\)?\\)?\\'"))

(defun benchmark--byte-to-position-results (n &optional move-gap)
  "Generate an input file suitable for ministat(1).
Each column corresponds to one of the tutorials and holds the results
of 10 * N times running N loops of the benchmark above for it."
  (let ((files (benchmark--byte-to-position-files)))
    ;; Print a header line.
    (princ "#")
    (dolist (file files)
      (let ((locale (or (file-name-extension file) "en"))
            ;; Compute the average byte/char count.
            (avg (with-temp-buffer
                   (insert-file-contents file)
                   (/ (position-bytes (point-max)) (point-max) 1.0))))
        (princ (format "%s:%f\t" locale avg))))
    (dotimes (_ (* 10 n))
      (terpri)
      (dolist (file files)
        (let* ((result (benchmark--byte-to-position file n move-gap)))
          (princ (format "%f\t" (car result)) t))))
    (terpri)))

--=-=-=
Content-Type: text/x-sh
Content-Disposition: attachment; filename=ministat.sh
Content-Description: ministat wrapper

#! /bin/sh

old="$(mktemp -t old)"
new="$(mktemp -t new)"
emacs_version=25.0.50
old_emacs=./src/emacs-${emacs_version}.1
new_emacs=./src/emacs-${emacs_version}.2
emacs_flags="--batch -Q -L .
--load btp-ministat.el --eval '(benchmark--byte-to-position-results 10)'"
eval $old_emacs $emacs_flags >"$old"
eval $new_emacs $emacs_flags >"$new"
locales="$(head -1 "$old" | sed s'/^#//')"
i=1
for l in $locales; do
    echo
    echo "--- $l ---"
    ministat -s -C$i "$old" "$new"
    i=$(($i + 1))
done

--=-=-=--

From unknown Mon Jun 23 11:27:10 2025
X-Loop: help-debbugs@gnu.org
Subject: bug#20783: 25.0.50; [PATCH] byte-to-position has internal off-by-one bug
Resent-From: Eli Zaretskii
Original-Sender: "Debbugs-submit"
Resent-CC: bug-gnu-emacs@gnu.org
Resent-Date: Tue, 16 Jun 2015 16:10:03 +0000
Resent-Message-ID:
Resent-Sender: help-debbugs@gnu.org
X-GNU-PR-Message: followup 20783
X-GNU-PR-Package: emacs
X-GNU-PR-Keywords: patch
To: Wolfgang Jenkner
Cc: 20783@debbugs.gnu.org
Reply-To: Eli Zaretskii
Received: via spool by 20783-submit@debbugs.gnu.org id=B20783.143447096024155 (code B ref 20783); Tue, 16 Jun 2015 16:10:03 +0000
Received: (at 20783) by debbugs.gnu.org; 16 Jun 2015 16:09:20 +0000
Received: from localhost ([127.0.0.1]:56030 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1Z4tQN-0006HW-Oi for submit@debbugs.gnu.org; Tue, 16 Jun 2015 12:09:20 -0400
Received: from mtaout20.012.net.il ([80.179.55.166]:54133) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1Z4tQK-0006HI-SS for 20783@debbugs.gnu.org; Tue, 16 Jun 2015 12:09:18 -0400
Received: from conversion-daemon.a-mtaout20.012.net.il by a-mtaout20.012.net.il (HyperSendmail v2007.08) id <0NQ100100OKZ8100@a-mtaout20.012.net.il> for 20783@debbugs.gnu.org; Tue, 16 Jun 2015 19:09:10 +0300 (IDT)
Received: from HOME-C4E4A596F7 ([87.69.4.28]) by a-mtaout20.012.net.il (HyperSendmail v2007.08) with ESMTPA id <0NQ1001R3OV10UA0@a-mtaout20.012.net.il>; Tue, 16 Jun 2015 19:09:01 +0300 (IDT)
Date: Tue, 16 Jun 2015 19:08:49 +0300
From: Eli Zaretskii
In-reply-to: <85k2v3veg7.fsf@iznogoud.viz>
X-012-Sender: halo1@inter.net.il
Message-id: <83mvzzmy9a.fsf@gnu.org>
References:
 <85fv5za8vv.fsf@iznogoud.viz> <83vbevsctb.fsf@gnu.org> <85oakmxn4i.fsf@iznogoud.viz> <85k2v3veg7.fsf@iznogoud.viz>
X-Spam-Score: 1.0 (+)
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit"
X-Spam-Score: 1.0 (+)

> From: Wolfgang Jenkner
> Cc: 20783@debbugs.gnu.org
> Date: Tue, 16 Jun 2015 17:40:38 +0200
>
> +      while (!CHAR_HEAD_P (*chp))
> +        {
> +          pos_byte--;
> +          /* There's no buffer gap in the middle of a character. */
> +          chp--;
> +        }

Thanks, but I'd prefer we didn't have code that manipulated pointers
to buffer text directly. E.g., if we ever have some kind of
multi-threading, or even if at some point someone adds a non-trivial
function call to this loop, this code will be a subtle bug waiting to
bite.

It's fundamentally not safe to do this, and not only due to the gap
considerations, but also because in general BEG_ADDR might change
under certain circumstances behind your back. (Buffer text and string
data are implemented with double indirection for good reasons.)

For some very tight loops, it might be justified to take these
shortcuts (with WARNING COMMENTS CRYING BLOODY MURDER all around), but
this function doesn't belong to those cases.

So I prefer the previous variant, even though it will lose that
benchmark.
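The "previous variant" is not quoted in this part of the thread, so the following is only a minimal sketch of the general idea behind the objection, not the code that was pushed. It assumes a plain UTF-8 byte array in place of an Emacs buffer (so there is no gap), and the names char_head_p, fetch_byte and char_start are hypothetical stand-ins. The loop walks back by index and reaches the text through its base pointer on every access, so it never keeps a raw pointer into the text across iterations.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for CHAR_HEAD_P: in UTF-8, every byte except
   a continuation byte (10xxxxxx) starts a character.  */
static int
char_head_p (unsigned char byte)
{
  return (byte & 0xC0) != 0x80;
}

/* Hypothetical accessor: the text is reached through BASE on every
   call, so no pointer into the text outlives a single lookup.  */
static unsigned char
fetch_byte (unsigned char *const *base, size_t pos)
{
  return (*base)[pos];
}

/* Return the largest 0-based index <= POS that holds a character head.
   Assumes valid UTF-8 and POS < LEN.  */
static size_t
char_start (unsigned char *const *base, size_t len, size_t pos)
{
  assert (pos < len);
  while (pos > 0 && !char_head_p (fetch_byte (base, pos)))
    pos--;
  return pos;
}
```

With the four bytes of "éé" (0xC3 0xA9 0xC3 0xA9), indices 0 and 1 both map to 0 and indices 2 and 3 both map to 2, which mirrors the corrected (1 1 2 2 ...) result from the original report once positions are counted from 1. The cost compared with the pointer version is one extra indirection per byte, which is the trade-off weighed in the message above.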
From unknown Mon Jun 23 11:27:10 2025
X-Loop: help-debbugs@gnu.org
Subject: bug#20783: 25.0.50; [PATCH] byte-to-position has internal off-by-one bug
Resent-From: Wolfgang Jenkner
Original-Sender: "Debbugs-submit"
Resent-CC: bug-gnu-emacs@gnu.org
Resent-Date: Tue, 16 Jun 2015 16:32:02 +0000
Resent-Message-ID:
Resent-Sender: help-debbugs@gnu.org
X-GNU-PR-Message: followup 20783
X-GNU-PR-Package: emacs
X-GNU-PR-Keywords: patch
To: Eli Zaretskii
Cc: 20783@debbugs.gnu.org
Received: via spool by 20783-submit@debbugs.gnu.org id=B20783.143447228126078 (code B ref 20783); Tue, 16 Jun 2015 16:32:02 +0000
Received: (at 20783) by debbugs.gnu.org; 16 Jun 2015 16:31:21 +0000
Received: from localhost ([127.0.0.1]:56037 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1Z4tlg-0006mX-3m for submit@debbugs.gnu.org; Tue, 16 Jun 2015 12:31:20 -0400
Received: from b2bfep13.mx.upcmail.net ([62.179.121.58]:46296) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1Z4tld-0006mJ-CK for 20783@debbugs.gnu.org; Tue, 16 Jun 2015 12:31:18 -0400
Received: from edge11.upcmail.net ([192.168.13.81]) by b2bfep13.mx.upcmail.net (InterMail vM.8.01.05.11 201-2260-151-128-20120928) with ESMTP id <20150616163110.GIZU20529.b2bfep13-int.chello.at@edge11.upcmail.net> for <20783@debbugs.gnu.org>; Tue, 16 Jun 2015 18:31:10 +0200
Received: from iznogoud.viz ([91.119.133.139]) by edge11.upcmail.net with edge id h4X81q00v30cwdn0B4X81s; Tue, 16 Jun 2015 18:31:10 +0200
X-SourceIP: 91.119.133.139
Received: from wolfgang by iznogoud.viz with local (Exim 4.85 (FreeBSD)) (envelope-from ) id 1Z4tlU-0001Li-Hf; Tue, 16 Jun 2015 18:31:08 +0200
From: Wolfgang Jenkner
References: <85fv5za8vv.fsf@iznogoud.viz> <83vbevsctb.fsf@gnu.org> <85oakmxn4i.fsf@iznogoud.viz> <85k2v3veg7.fsf@iznogoud.viz> <83mvzzmy9a.fsf@gnu.org>
Date: Tue, 16 Jun 2015 18:31:08 +0200
In-Reply-To: <83mvzzmy9a.fsf@gnu.org> (Eli Zaretskii's message of "Tue, 16 Jun 2015 19:08:49 +0300")
Message-ID:
 <85fv5rvcmr.fsf@iznogoud.viz>
User-Agent: Gnus/5.130014 (Ma Gnus v0.14) Emacs/25.0.50 (berkeley-unix)
MIME-Version: 1.0
Content-Type: text/plain
X-Spam-Score: 0.0 (/)
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit"
X-Spam-Score: 0.0 (/)

On Tue, Jun 16 2015, Eli Zaretskii wrote:

> So I prefer the previous variant, even though it will lose that
> benchmark.

Neither do I care about winning the benchmark. It just seemed the
better code from a micro-optimization point of view (in
a single-threaded emacs).

But I can't argue against your global arguments, of course. Thanks for
explaining things.

From unknown Mon Jun 23 11:27:10 2025
MIME-Version: 1.0
X-Mailer: MIME-tools 5.503 (Entity 5.503)
X-Loop: help-debbugs@gnu.org
From: help-debbugs@gnu.org (GNU bug Tracking System)
To: Wolfgang Jenkner
Subject: bug#20783: closed (Re: bug#20783: 25.0.50; [PATCH] byte-to-position has internal off-by-one bug)
Message-ID:
References: <85oakemsbc.fsf@iznogoud.viz> <85fv5za8vv.fsf@iznogoud.viz>
X-Gnu-PR-Message: they-closed 20783
X-Gnu-PR-Package: emacs
X-Gnu-PR-Keywords: patch
Reply-To: 20783@debbugs.gnu.org
Date: Wed, 17 Jun 2015 12:30:07 +0000
Content-Type: multipart/mixed; boundary="----------=_1434544207-22850-1"

This is a multi-part message in MIME format...

------------=_1434544207-22850-1
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Your bug report

#20783: 25.0.50; [PATCH] byte-to-position has internal off-by-one bug

which was filed against the emacs package, has been closed.

The explanation is attached below, along with your original report.
If you require more details, please reply to 20783@debbugs.gnu.org.
-- 
20783: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=20783
GNU Bug Tracking System
Contact help-debbugs@gnu.org with problems

------------=_1434544207-22850-1
Content-Type: message/rfc822
Content-Disposition: inline
Content-Transfer-Encoding: 7bit

Received: (at 20783-done) by debbugs.gnu.org; 17 Jun 2015 12:29:38 +0000
Received: from localhost ([127.0.0.1]:56993 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1Z5CTK-0005vh-8i for submit@debbugs.gnu.org; Wed, 17 Jun 2015 08:29:38 -0400
Received: from b2bfep13.mx.upcmail.net ([62.179.121.58]:55861) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1Z5CTH-0005vS-3l for 20783-done@debbugs.gnu.org; Wed, 17 Jun 2015 08:29:36 -0400
Received: from edge11.upcmail.net ([192.168.13.81]) by b2bfep13.mx.upcmail.net (InterMail vM.8.01.05.11 201-2260-151-128-20120928) with ESMTP id <20150617122928.XRYJ20529.b2bfep13-int.chello.at@edge11.upcmail.net> for <20783-done@debbugs.gnu.org>; Wed, 17 Jun 2015 14:29:28 +0200
Received: from iznogoud.viz ([85.127.11.204]) by edge11.upcmail.net with edge id hQVT1q0284Q8eCd0BQVUVo; Wed, 17 Jun 2015 14:29:28 +0200
X-SourceIP: 85.127.11.204
Received: from wolfgang by iznogoud.viz with local (Exim 4.85 (FreeBSD)) (envelope-from ) id 1Z5CT9-0003NB-GB; Wed, 17 Jun 2015 14:29:27 +0200
From: Wolfgang Jenkner
To: Eli Zaretskii
Subject: Re: bug#20783: 25.0.50; [PATCH] byte-to-position has internal off-by-one bug
Date: Wed, 17 Jun 2015 14:19:10 +0200
References: <85fv5za8vv.fsf@iznogoud.viz> <83vbevsctb.fsf@gnu.org> <85oakmxn4i.fsf@iznogoud.viz> <83bngms03b.fsf@gnu.org>
Message-ID: <85oakemsbc.fsf@iznogoud.viz>
User-Agent: Gnus/5.130014 (Ma Gnus v0.14) Emacs/25.0.50 (berkeley-unix)
MIME-Version: 1.0
Content-Type: text/plain
X-Spam-Score: 0.0 (/)
X-Debbugs-Envelope-To: 20783-done
Cc: 20783-done@debbugs.gnu.org
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit"
X-Spam-Score: 0.0 (/)

Version: 25.1

On Thu, Jun 11 2015, Eli Zaretskii wrote:

> Works for me, thanks. But please add a comment there about
> BYTE_TO_CHAR expecting byte positions that are on a character
> boundary, so that the reason for the loop is clear.

Done and pushed.

However, I didn't follow my own suggestion of adding a remark to the
comment above BYTE_TO_CHAR, after all, as it is true for most macros or
functions with a byte position argument, so adding such a comment to
just one of them could be confusing, I think.

Thank you for steering this change in the right direction.

------------=_1434544207-22850-1
Content-Type: message/rfc822
Content-Disposition: inline
Content-Transfer-Encoding: 7bit

Received: (at submit) by debbugs.gnu.org; 10 Jun 2015 15:19:46 +0000
Received: from localhost ([127.0.0.1]:47377 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1Z2hn8-0005ke-3y for submit@debbugs.gnu.org; Wed, 10 Jun 2015 11:19:46 -0400
Received: from eggs.gnu.org ([208.118.235.92]:46890) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1Z2hn6-0005kV-EX for submit@debbugs.gnu.org; Wed, 10 Jun 2015 11:19:45 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1Z2hn5-0005zg-HM for submit@debbugs.gnu.org; Wed, 10 Jun 2015 11:19:44 -0400
X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org
X-Spam-Level:
X-Spam-Status: No, score=0.8 required=5.0 tests=BAYES_50 autolearn=disabled version=3.3.2
Received: from lists.gnu.org ([2001:4830:134:3::11]:60872) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Z2hn5-0005zc-6e for submit@debbugs.gnu.org; Wed, 10 Jun 2015 11:19:43 -0400
Received: from eggs.gnu.org ([2001:4830:134:3::10]:60387) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Z2hn4-0006Sq-0y for bug-gnu-emacs@gnu.org; Wed, 10 Jun
 2015 11:19:43 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1Z2hmy-0005xT-W2 for bug-gnu-emacs@gnu.org; Wed, 10 Jun 2015 11:19:41 -0400
Received: from b2bfep12.mx.upcmail.net ([62.179.121.57]:39296) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Z2hmy-0005vy-N3 for bug-gnu-emacs@gnu.org; Wed, 10 Jun 2015 11:19:36 -0400
Received: from edge12.upcmail.net ([192.168.13.82]) by b2bfep12.mx.upcmail.net (InterMail vM.8.01.05.11 201-2260-151-128-20120928) with ESMTP id <20150610151933.QUNI604.b2bfep12-int.chello.at@edge12.upcmail.net> for ; Wed, 10 Jun 2015 17:19:33 +0200
Received: from iznogoud.viz ([91.119.92.228]) by edge12.upcmail.net with edge id efKY1q01E4vdLJb0CfKYSp; Wed, 10 Jun 2015 17:19:33 +0200
X-SourceIP: 91.119.92.228
Received: from wolfgang by iznogoud.viz with local (Exim 4.85 (FreeBSD)) (envelope-from ) id 1Z2hmu-000157-Hj for bug-gnu-emacs@gnu.org; Wed, 10 Jun 2015 17:19:32 +0200
From: Wolfgang Jenkner
To: bug-gnu-emacs@gnu.org
Subject: 25.0.50; [PATCH] byte-to-position has internal off-by-one bug
Date: Wed, 10 Jun 2015 17:13:30 +0200
Message-ID: <85fv5za8vv.fsf@iznogoud.viz>
User-Agent: Gnus/5.130014 (Ma Gnus v0.14) Emacs/25.0.50 (berkeley-unix)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6.x
X-detected-operating-system: by eggs.gnu.org: Error: Malformed IPv6 address (bad octet value).
X-Received-From: 2001:4830:134:3::11
X-Spam-Score: -5.0 (-----)
X-Debbugs-Envelope-To: submit
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit"
X-Spam-Score: -5.0 (-----)

Here's a test case for the bug:

(with-temp-buffer
  (insert "éé")
  (let ((i 1) pos res)
    (while (setq pos (byte-to-position i))
      (push pos res)
      (setq i (1+ i)))
    (nreverse res)))

=> (1 2 2 2 3)

while the correct result is

=> (1 1 2 2 3)

I found that this bug had been noticed before in

http://stackoverflow.com/questions/17588117/emacs-byte-to-position-function-is-not-consistent-with-document

Here's a patch. The fix may look a bit clumsy but it's actually meant
to avoid pessimizing the presumably common case where the initial
bytepos is at a character boundary.

-- >8 --
Subject: [PATCH] * src/marker.c (buf_bytepos_to_charpos): Fix best_below_byte count.

If bytepos is not after a character boundary the preceding loop
overshoots by one character position.
---
 src/marker.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/src/marker.c b/src/marker.c
index 73928ba..94d676b 100644
--- a/src/marker.c
+++ b/src/marker.c
@@ -341,6 +341,12 @@ buf_bytepos_to_charpos (struct buffer *b, ptrdiff_t bytepos)
           BUF_INC_POS (b, best_below_byte);
         }
 
+      if (best_below_byte != bytepos)
+        {
+          best_below--;
+          BUF_DEC_POS (b, best_below_byte);
+        }
+
       /* If this position is quite far from the nearest known position,
          cache the correspondence by creating a marker here.
          It will last until the next GC.
-- 
2.4.2

------------=_1434544207-22850-1--

From unknown Mon Jun 23 11:27:10 2025
X-Loop: help-debbugs@gnu.org
Subject: bug#20783: 25.0.50; [PATCH] byte-to-position has internal off-by-one bug
Resent-From: Eli Zaretskii
Original-Sender: "Debbugs-submit"
Resent-CC: bug-gnu-emacs@gnu.org
Resent-Date: Wed, 17 Jun 2015 16:58:02 +0000
Resent-Message-ID:
Resent-Sender: help-debbugs@gnu.org
X-GNU-PR-Message: followup 20783
X-GNU-PR-Package: emacs
X-GNU-PR-Keywords: patch
To: Wolfgang Jenkner
Cc: 20783@debbugs.gnu.org
Reply-To: Eli Zaretskii
Received: via spool by 20783-submit@debbugs.gnu.org id=B20783.14345602476666 (code B ref 20783); Wed, 17 Jun 2015 16:58:02 +0000
Received: (at 20783) by debbugs.gnu.org; 17 Jun 2015 16:57:27 +0000
Received: from localhost ([127.0.0.1]:49885 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1Z5GeV-0001jS-8O for submit@debbugs.gnu.org; Wed, 17 Jun 2015 12:57:27 -0400
Received: from mtaout22.012.net.il ([80.179.55.172]:33500) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1Z5GeS-0001jE-JI for 20783@debbugs.gnu.org; Wed, 17 Jun 2015 12:57:25 -0400
Received: from conversion-daemon.a-mtaout22.012.net.il by a-mtaout22.012.net.il (HyperSendmail v2007.08) id <0NQ300G00LOZRV00@a-mtaout22.012.net.il> for 20783@debbugs.gnu.org; Wed, 17 Jun 2015 19:57:18 +0300 (IDT)
Received: from HOME-C4E4A596F7 ([87.69.4.28]) by a-mtaout22.012.net.il (HyperSendmail v2007.08) with ESMTPA id <0NQ300GKVLRHKL60@a-mtaout22.012.net.il>; Wed, 17 Jun 2015 19:57:18 +0300 (IDT)
Date: Wed, 17 Jun 2015 19:57:08 +0300
From: Eli Zaretskii
In-reply-to: <85oakemsbc.fsf@iznogoud.viz>
X-012-Sender: halo1@inter.net.il
Message-id: <834mm6mfx7.fsf@gnu.org>
References: <85fv5za8vv.fsf@iznogoud.viz> <83vbevsctb.fsf@gnu.org> <85oakmxn4i.fsf@iznogoud.viz> <83bngms03b.fsf@gnu.org> <85oakemsbc.fsf@iznogoud.viz>
X-Spam-Score: 1.0 (+)
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit"
X-Spam-Score: 1.0 (+)

> From: Wolfgang Jenkner
> Cc: 20783-done@debbugs.gnu.org
> Date: Wed, 17 Jun 2015 14:19:10 +0200
>
> Version: 25.1
>
> On Thu, Jun 11 2015, Eli Zaretskii wrote:
>
> > Works for me, thanks. But please add a comment there about
> > BYTE_TO_CHAR expecting byte positions that are on a character
> > boundary, so that the reason for the loop is clear.
>
> Done and pushed.

Thanks.