From unknown Thu Aug 14 12:21:22 2025 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-Mailer: MIME-tools 5.509 (Entity 5.509) Content-Type: text/plain; charset=utf-8 From: bug#77935 <77935@debbugs.gnu.org> To: bug#77935 <77935@debbugs.gnu.org> Subject: Status: 31.0.50; diff-add-change-log-entries-other-window Reply-To: bug#77935 <77935@debbugs.gnu.org> Date: Thu, 14 Aug 2025 19:21:22 +0000 retitle 77935 31.0.50; diff-add-change-log-entries-other-window reassign 77935 emacs submitter 77935 Gerd M=C3=B6llmann severity 77935 normal thanks From debbugs-submit-bounces@debbugs.gnu.org Sun Apr 20 03:51:51 2025 Received: (at submit) by debbugs.gnu.org; 20 Apr 2025 07:51:52 +0000 Received: from localhost ([127.0.0.1]:44642 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1u6PSx-00039F-Or for submit@debbugs.gnu.org; Sun, 20 Apr 2025 03:51:51 -0400 Received: from lists.gnu.org ([2001:470:142::17]:39576) by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.84_2) (envelope-from ) id 1u6PSv-000386-DC for submit@debbugs.gnu.org; Sun, 20 Apr 2025 03:51:49 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1u6PSo-00076e-MX for bug-gnu-emacs@gnu.org; Sun, 20 Apr 2025 03:51:42 -0400 Received: from mail-wr1-x435.google.com ([2a00:1450:4864:20::435]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1u6PSj-0001yL-6z for bug-gnu-emacs@gnu.org; Sun, 20 Apr 2025 03:51:42 -0400 Received: by mail-wr1-x435.google.com with SMTP id ffacd0b85a97d-39c1ee0fd43so2778755f8f.0 for ; Sun, 20 Apr 2025 00:51:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1745135494; x=1745740294; darn=gnu.org; h=mime-version:message-id:date:subject:to:from:from:to:cc:subject :date:message-id:reply-to; bh=z48CNL/XzNv3LM+qp3vJkIzIKWiqoFlUlKa4UaQsJvA=; b=ViOFwsiRCOhgHroe+XgD/rHRc1wNijK7cy1qQMueGUlCC5PyWiGjks1EvF1azDtnvA yrc+udxpEy/6buo92UfO/ous2ASRnscOWCHK+6vKjYyL1BArSCNl/s+YovdlEVe31sGU S2abhau2JEL8ONk4fwrJWt9rJpXOVo47V9ESn76qTF1V16bmD1e69Rp8sQO5DvLdDDMu VwHkbvt3sM30bI5Y9H9aQe2v8QPhKiO4fJa3QvXcIx9hpiXBxyKR6f+0K4gbFh/irGaW k5BxRCFzqfdbWbDoRMfA9zsLdKVJYrF6PsLVL/D7kSlgCSzRUXi2OwVgq53pqzaVBh1T 13OA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1745135494; x=1745740294; h=mime-version:message-id:date:subject:to:from:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=z48CNL/XzNv3LM+qp3vJkIzIKWiqoFlUlKa4UaQsJvA=; b=ig8lTaA050lOuKbeDG01XcZsECfYx9wN/FUxaYonjuwnxLkpPg/TglfMQvvEhDITaY y0cMIDZf4scm5KoxDdVtJ4Te0oO/O5TFsgoVPPLSUg97PyOYJjyz1naFZ5o5oJZ9DEUA 2h7kPcMwPBXhQj9rAjxM1aETJzcIcJ0r1WC+RDBVK8Za1JVgekdEBKeUHc5YwzootgsZ A/oYDIJPpChrquGnuhXk21IkQqvJeZ7PKiR18X7X5WmeMdSfCP6N517x2fVAUMoq3Eo5 9XKBjAm8aECdIuv6qHB1FXdpUDf0vndU/KW7e1d0CkwCBhQxW6qAJEircCmY0ZAWPqDQ BVwA== X-Gm-Message-State: AOJu0Yx1fTiwaXW1XllueJ4wbs5WhBHHQtZhzBiV7Ux0MLJxHoVVA5P6 Ydl0Gw5h4vJno9QSTzLuBNrIIo6Njs7Cn9U4EbigWweg7aghWIfvm+QDIRVM X-Gm-Gg: ASbGncuNrpchwR3qNAqWAp6GcpJ3KtnfeQuFpi/SxHK7WydsoQJB+guVADESJXMXysG 5xebrha5Ande1uR17Leoua8QMIYPjvO68r0hmzhWTeqwLnvpSMXdhg8nT2jjQSuYB+dn9+e5odj zGSgJ7VDrCWx2ZEDGmt5cI/BZioLZyl5LxvnOYn01YdGndXo8dW36kb8mJQmrQ3r4TDytTEAtAq rzsGRn6rbJYDrnT8EwzqEsTnd7k+tha8g3BlnlFsXViEYpL8Z5rWwtoNNtR38aSaVjuaztnWFZH lukJ8CWtC2SUwQ1dSB9rp42U9Bi1rw7Crl4ZxacZuxT+MJlLrGHTBDxFekR1rhHTErksfIBn5JN bNzfWZVodvA1W7X+NEwVJHtp4Nj5vVYDY0GxU3A39yxR1 X-Google-Smtp-Source: AGHT+IGyo8/qCwjrlr0hxOrC8XH7zscplYyh1+moJNVM/HkFk9yEkZ/ertnei0WpVtjBwQpgwxcyIg== X-Received: by 2002:a05:6000:188e:b0:390:e9b5:d69c with SMTP id ffacd0b85a97d-39efba61aa5mr6547199f8f.25.1745135494175; Sun, 20 Apr 2025 00:51:34 -0700 (PDT) Received: from pro2 (p200300e0b720a300c964490db7e76205.dip0.t-ipconnect.de. [2003:e0:b720:a300:c964:490d:b7e7:6205]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-39efa43ca78sm8068242f8f.47.2025.04.20.00.51.32 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 20 Apr 2025 00:51:33 -0700 (PDT) From: =?utf-8?Q?Gerd_M=C3=B6llmann?= To: bug-gnu-emacs@gnu.org Subject: 31.0.50; diff-add-change-log-entries-other-window X-Debbugs-Cc: Date: Sun, 20 Apr 2025 09:51:32 +0200 Message-ID: MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" Received-SPF: pass client-ip=2a00:1450:4864:20::435; envelope-from=gerd.moellmann@gmail.com; helo=mail-wr1-x435.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_FROM=0.001, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-Debbugs-Envelope-To: submit X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" --=-=-= Content-Type: text/plain Take the attached diff and store in in the root of an Emacs repository. Find the diff in Emacs, and C-x 4 A. I have a few of complaints/wishes: - The change log entries that are produced are in reverse order of the diff, which is inconvenient when writing the log entries while following the diff - Some changes in the diff produce multiple entries for the same function. Example is what is produced for the main function in emacs.c. - Add the following to the start of the diff and start over. 27 files changed, 1629 insertions(+), 832 deletions(-) src/Makefile.in | 3 +- src/alloc.c | 52 ++-- src/buffer.c | 126 ++++---- src/buffer.h | 19 +- src/callint.c | 4 +- You will see that Emacs asks "Use file:" without saying for what it wants to use what. That's confusing. Event better in this particular case would be if it would skip over such lines to the first hunk, but that's another story. --=-=-= Content-Type: text/x-patch; charset=utf-8 Content-Disposition: attachment; filename=text-index.diff Content-Transfer-Encoding: quoted-printable modified src/Makefile.in @@ -451,7 +451,7 @@ base_obj =3D charset.o coding.o category.o ccl.o character.o chartab.o bidi.o \ $(CM_OBJ) term.o terminal.o xfaces.o $(XOBJ) $(GTK_OBJ) $(DBUS_OBJ) \ emacs.o keyboard.o macros.o keymap.o sysdep.o \ - bignum.o buffer.o filelock.o insdel.o marker.o \ + bignum.o buffer.o text-index.o filelock.o insdel.o marker.o \ minibuf.o fileio.o dired.o \ cmds.o casetab.o casefiddle.o indent.o search.o regex-emacs.o undo.o \ alloc.o pdumper.o data.o doc.o editfns.o callint.o \ @@ -464,6 +464,7 @@ base_obj =3D profiler.o decompress.o \ thread.o systhread.o sqlite.o treesit.o \ itree.o json.o \ + marker-vector.o \ $(MSDOS_OBJ) $(MSDOS_X_OBJ) $(NS_OBJ) $(CYGWIN_OBJ) $(FONT_OBJ) \ $(W32_OBJ) $(WINDOW_SYSTEM_OBJ) $(XGSELOBJ) \ $(HAIKU_OBJ) $(PGTK_OBJ) $(ANDROID_OBJ) modified src/alloc.c @@ -3807,9 +3807,7 @@ DEFUN ("make-marker", Fmake_marker, Smake_marker, 0, = 0, 0, struct Lisp_Marker *p =3D ALLOCATE_PLAIN_PSEUDOVECTOR (struct Lisp_Marke= r, PVEC_MARKER); p->buffer =3D 0; - p->bytepos =3D 0; - p->charpos =3D 0; - p->next =3D NULL; + p->entry =3D 0; p->insertion_type =3D 0; p->need_adjustment =3D 0; return make_lisp_ptr (p, Lisp_Vectorlike); @@ -3819,23 +3817,18 @@ DEFUN ("make-marker", Fmake_marker, Smake_marker, 0= , 0, 0, at character position CHARPOS and byte position BYTEPOS. */ =20 Lisp_Object -build_marker (struct buffer *buf, ptrdiff_t charpos, ptrdiff_t bytepos) +build_marker (struct buffer *buf, ptrdiff_t charpos) { /* No dead buffers here. */ eassert (BUFFER_LIVE_P (buf)); =20 - /* Every character is at least one byte. */ - eassert (charpos <=3D bytepos); - struct Lisp_Marker *m =3D ALLOCATE_PLAIN_PSEUDOVECTOR (struct Lisp_Marke= r, PVEC_MARKER); m->buffer =3D buf; - m->charpos =3D charpos; - m->bytepos =3D bytepos; m->insertion_type =3D 0; m->need_adjustment =3D 0; - m->next =3D BUF_MARKERS (buf); - BUF_MARKERS (buf) =3D m; + marker_vector_add (buf, m); + marker_vector_set_charpos (m, charpos); return make_lisp_ptr (m, Lisp_Vectorlike); } =20 @@ -6264,6 +6257,13 @@ mark_buffer (struct buffer *buffer) =20 mark_interval_tree (buffer_intervals (buffer)); =20 + /* Mark the marker vector itself live, but don't process its contents + yet, because these are weak references. The marker vector can + already have been set to nil in kill-buffer. */ + Lisp_Object mv =3D BUF_MARKERS (buffer); + if (!NILP (mv)) + set_vector_marked (XVECTOR (mv)); + /* For now, we just don't mark the undo_list. It's done later in a special way just before the sweep phase, and after stripping some of its elements that are not needed any more. @@ -6271,7 +6271,7 @@ mark_buffer (struct buffer *buffer) for dead buffers, the undo_list should be nil (set by Fkill_buffer), but just to be on the safe side, we mark it here. */ if (!BUFFER_LIVE_P (buffer)) - mark_object (BVAR (buffer, undo_list)); + mark_object (BVAR (buffer, undo_list)); =20 if (!itree_empty_p (buffer->overlays)) mark_overlays (buffer->overlays->root); @@ -7103,21 +7103,23 @@ sweep_symbols (void) gcstat.total_free_symbols =3D num_free; } =20 -/* Remove BUFFER's markers that are due to be swept. This is needed since - we treat BUF_MARKERS and markers's `next' field as weak pointers. */ +/* Remove BUFFER's markers that are due to be swept. This is needed + since we treat marker references from BUF_MARKER weak references. + This relies on the fact that marker vectors only contain markers, + fixnums and nil, which means we only have to look at the marker + slots. If such a marker is not yet marked, it's dead and we remove + it from the marker vector. If it is not dead, it won't be freed + my sweep_vectors which is called after sweep_buffers. */ + static void -unchain_dead_markers (struct buffer *buffer) +unchain_dead_markers (struct buffer *b) { - struct Lisp_Marker *this, **prev =3D &BUF_MARKERS (buffer); - - while ((this =3D *prev)) - if (vectorlike_marked_p (&this->header)) - prev =3D &this->next; - else - { - this->buffer =3D NULL; - *prev =3D this->next; - } + DO_MARKERS (b, m) + { + if (!vectorlike_marked_p (&m->header)) + marker_vector_remove (XVECTOR (BUF_MARKERS (b)), m); + } + END_DO_MARKERS; } =20 NO_INLINE /* For better stack traces */ modified src/buffer.c @@ -43,6 +43,7 @@ Copyright (C) 1985-2025 Free Software Foundation, Inc. #include "xwidget.h" #include "itree.h" #include "pdumper.h" +#include "text-index.h" =20 #ifdef WINDOWSNT #include "w32heap.h" /* for mmap_* */ @@ -592,6 +593,7 @@ DEFUN ("get-buffer-create", Fget_buffer_create, Sget_bu= ffer_create, 1, 2, 0, =20 /* An ordinary buffer uses its own struct buffer_text. */ b->text =3D &b->own_text; + b->text->index =3D NULL; b->base_buffer =3D NULL; /* No one shares the text with us now. */ b->indirections =3D 0; @@ -616,6 +618,8 @@ DEFUN ("get-buffer-create", Fget_buffer_create, Sget_bu= ffer_create, 1, 2, 0, b->begv_byte =3D BEG_BYTE; b->zv_byte =3D BEG_BYTE; =20 + BUF_MARKERS (b) =3D make_marker_vector (); + BUF_GPT (b) =3D BEG; BUF_GPT_BYTE (b) =3D BEG_BYTE; =20 @@ -659,8 +663,6 @@ DEFUN ("get-buffer-create", Fget_buffer_create, Sget_bu= ffer_create, 1, 2, 0, reset_buffer_local_variables (b, 1); =20 bset_mark (b, Fmake_marker ()); - BUF_MARKERS (b) =3D NULL; - /* Put this in the alist of all live buffers. */ XSETBUFFER (buffer, b); Vbuffer_alist =3D nconc2 (Vbuffer_alist, list1 (Fcons (name, buffer))); @@ -729,8 +731,8 @@ clone_per_buffer_values (struct buffer *from, struct bu= ffer *to) if (MARKERP (obj) && XMARKER (obj)->buffer =3D=3D from) { struct Lisp_Marker *m =3D XMARKER (obj); - - obj =3D build_marker (to, m->charpos, m->bytepos); + const ptrdiff_t charpos =3D marker_vector_charpos (m); + obj =3D build_marker (to, charpos); XMARKER (obj)->insertion_type =3D m->insertion_type; } =20 @@ -761,9 +763,9 @@ record_buffer_markers (struct buffer *b) eassert (!NILP (BVAR (b, zv_marker))); =20 XSETBUFFER (buffer, b); - set_marker_both (BVAR (b, pt_marker), buffer, b->pt, b->pt_byte); - set_marker_both (BVAR (b, begv_marker), buffer, b->begv, b->begv_byt= e); - set_marker_both (BVAR (b, zv_marker), buffer, b->zv, b->zv_byte); + set_marker_both (BVAR (b, pt_marker), buffer, b->pt); + set_marker_both (BVAR (b, begv_marker), buffer, b->begv); + set_marker_both (BVAR (b, zv_marker), buffer, b->zv); } } =20 @@ -845,6 +847,8 @@ DEFUN ("make-indirect-buffer", Fmake_indirect_buffer, S= make_indirect_buffer, =20 /* Use the base buffer's text object. */ b->text =3D b->base_buffer->text; + /* Make sure index has a defined value. */ + b->own_text.index =3D NULL; /* We have no own text. */ b->indirections =3D -1; /* Notify base buffer that we share the text now. */ @@ -895,16 +899,13 @@ DEFUN ("make-indirect-buffer", Fmake_indirect_buffer,= Smake_indirect_buffer, eassert (NILP (BVAR (b->base_buffer, zv_marker))); =20 bset_pt_marker (b->base_buffer, - build_marker (b->base_buffer, b->base_buffer->pt, - b->base_buffer->pt_byte)); + build_marker (b->base_buffer, b->base_buffer->pt)); =20 bset_begv_marker (b->base_buffer, - build_marker (b->base_buffer, b->base_buffer->begv, - b->base_buffer->begv_byte)); + build_marker (b->base_buffer, b->base_buffer->begv)); =20 bset_zv_marker (b->base_buffer, - build_marker (b->base_buffer, b->base_buffer->zv, - b->base_buffer->zv_byte)); + build_marker (b->base_buffer, b->base_buffer->zv)); =20 XMARKER (BVAR (b->base_buffer, zv_marker))->insertion_type =3D 1; } @@ -912,9 +913,9 @@ DEFUN ("make-indirect-buffer", Fmake_indirect_buffer, S= make_indirect_buffer, if (NILP (clone)) { /* Give the indirect buffer markers for its narrowing. */ - bset_pt_marker (b, build_marker (b, b->pt, b->pt_byte)); - bset_begv_marker (b, build_marker (b, b->begv, b->begv_byte)); - bset_zv_marker (b, build_marker (b, b->zv, b->zv_byte)); + bset_pt_marker (b, build_marker (b, b->pt)); + bset_begv_marker (b, build_marker (b, b->begv)); + bset_zv_marker (b, build_marker (b, b->zv)); XMARKER (BVAR (b, zv_marker))->insertion_type =3D 1; } else @@ -1890,7 +1891,6 @@ DEFUN ("kill-buffer", Fkill_buffer, Skill_buffer, 0, = 1, "bKill buffer: ", Lisp_Object buffer; struct buffer *b; Lisp_Object tem; - struct Lisp_Marker *m; =20 if (NILP (buffer_or_name)) buffer =3D Fcurrent_buffer (); @@ -2075,18 +2075,14 @@ DEFUN ("kill-buffer", Fkill_buffer, Skill_buffer, 0= , 1, "bKill buffer: ", /* Unchain all markers that belong to this indirect buffer. Don't unchain the markers that belong to the base buffer or its other indirect buffers. */ - struct Lisp_Marker **mp =3D &BUF_MARKERS (b); - while ((m =3D *mp)) + DO_MARKERS (b, m) { if (m->buffer =3D=3D b) - { - m->buffer =3D NULL; - *mp =3D m->next; - } - else - mp =3D &m->next; + marker_vector_remove (XVECTOR (BUF_MARKERS (b)), m); } - /* Intervals should be owned by the base buffer (Bug#16502). */ + END_DO_MARKERS; + + /* Intervals should be owned by the base buffer (Bug#16502). */ i =3D buffer_intervals (b); if (i) { @@ -2099,14 +2095,7 @@ DEFUN ("kill-buffer", Fkill_buffer, Skill_buffer, 0,= 1, "bKill buffer: ", { /* Unchain all markers of this buffer and its indirect buffers. and leave them pointing nowhere. */ - for (m =3D BUF_MARKERS (b); m; ) - { - struct Lisp_Marker *next =3D m->next; - m->buffer =3D 0; - m->next =3D NULL; - m =3D next; - } - BUF_MARKERS (b) =3D NULL; + marker_vector_reset (b); set_buffer_intervals (b, NULL); =20 /* Perhaps we should explicitly free the interval tree here... */ @@ -2156,6 +2145,7 @@ DEFUN ("kill-buffer", Fkill_buffer, Skill_buffer, 0, = 1, "bKill buffer: ", free_region_cache (b->bidi_paragraph_cache); b->bidi_paragraph_cache =3D 0; } + text_index_free (b->own_text.index); bset_width_table (b, Qnil); unblock_input (); =20 @@ -2630,22 +2620,28 @@ #define swapfield_(field, type) \ other_buffer->text->end_unchanged =3D other_buffer->text->gpt; swap_buffer_overlays (current_buffer, other_buffer); { - struct Lisp_Marker *m; - for (m =3D BUF_MARKERS (current_buffer); m; m =3D m->next) - if (m->buffer =3D=3D other_buffer) - m->buffer =3D current_buffer; - else - /* Since there's no indirect buffer in sight, markers on - BUF_MARKERS(buf) should either be for `buf' or dead. */ - eassert (!m->buffer); - for (m =3D BUF_MARKERS (other_buffer); m; m =3D m->next) - if (m->buffer =3D=3D current_buffer) - m->buffer =3D other_buffer; - else - /* Since there's no indirect buffer in sight, markers on - BUF_MARKERS(buf) should either be for `buf' or dead. */ - eassert (!m->buffer); + DO_MARKERS (current_buffer, m) + { + if (m->buffer =3D=3D other_buffer) + m->buffer =3D current_buffer; + else + /* Since there's no indirect buffer in sight, markers on + BUF_MARKERS(buf) should either be for `buf' or dead. */ + eassert (!m->buffer); + } + END_DO_MARKERS; + DO_MARKERS (other_buffer, m) + { + if (m->buffer =3D=3D current_buffer) + m->buffer =3D other_buffer; + else + /* Since there's no indirect buffer in sight, markers on + BUF_MARKERS(buf) should either be for `buf' or dead. */ + eassert (!m->buffer); + } + END_DO_MARKERS; } + { /* Some of the C code expects that both window markers of a live window points to that window's buffer. So since we just swapped the markers between the two buffers, we need @@ -2707,7 +2703,6 @@ DEFUN ("set-buffer-multibyte", Fset_buffer_multibyte,= Sset_buffer_multibyte, current buffer is cleared. */) (Lisp_Object flag) { - struct Lisp_Marker *tail, *markers; Lisp_Object btail, other; ptrdiff_t begv, zv; bool narrowed =3D (BEG !=3D BEGV || Z !=3D ZV); @@ -2756,9 +2751,12 @@ DEFUN ("set-buffer-multibyte", Fset_buffer_multibyte= , Sset_buffer_multibyte, GPT =3D GPT_BYTE; TEMP_SET_PT_BOTH (PT_BYTE, PT_BYTE); =20 - - for (tail =3D BUF_MARKERS (current_buffer); tail; tail =3D tail->nex= t) - tail->charpos =3D tail->bytepos; + DO_MARKERS (current_buffer, tail) + { + const ptrdiff_t bytepos =3D marker_vector_bytepos (tail); + marker_vector_set_charpos (tail, bytepos); + } + END_DO_MARKERS; =20 /* Convert multibyte form of 8-bit characters to unibyte. */ pos =3D BEG; @@ -2908,25 +2906,13 @@ DEFUN ("set-buffer-multibyte", Fset_buffer_multibyt= e, Sset_buffer_multibyte, TEMP_SET_PT_BOTH (position, byte); } =20 - tail =3D markers =3D BUF_MARKERS (current_buffer); - - /* This prevents BYTE_TO_CHAR (that is, buf_bytepos_to_charpos) from - getting confused by the markers that have not yet been updated. - It is also a signal that it should never create a marker. */ - BUF_MARKERS (current_buffer) =3D NULL; - - for (; tail; tail =3D tail->next) + DO_MARKERS (current_buffer, tail) { - tail->bytepos =3D advance_to_char_boundary (tail->bytepos); - tail->charpos =3D BYTE_TO_CHAR (tail->bytepos); + ptrdiff_t bytepos =3D marker_vector_bytepos (tail); + bytepos =3D advance_to_char_boundary (bytepos); + marker_vector_set_charpos (tail, BYTE_TO_CHAR (bytepos)); } - - /* Make sure no markers were put on the chain - while the chain value was incorrect. */ - if (BUF_MARKERS (current_buffer)) - emacs_abort (); - - BUF_MARKERS (current_buffer) =3D markers; + END_DO_MARKERS; =20 /* Do this last, so it can calculate the new correspondences between chars and bytes. */ modified src/buffer.h @@ -217,9 +217,11 @@ #define BUF_BYTES_MAX \ Do not check that the position is in range. */ =20 #define FETCH_BYTE(n) (*BYTE_POS_ADDR (n)) - + /* Define the actual buffer data structures. */ =20 +struct text_index; + /* This data structure describes the actual text contents of a buffer. It is shared between indirect buffers and their base buffer. */ =20 @@ -271,14 +273,8 @@ #define FETCH_BYTE(n) (*BYTE_POS_ADDR (n)) /* Properties of this buffer's text. */ INTERVAL intervals; =20 - /* The markers that refer to this buffer. - This is actually a single marker --- - successive elements in its marker `chain' - are the other markers referring to this buffer. - This is a singly linked unordered list, which means that it's - very cheap to add a marker to the list and it's also very cheap - to move a marker within a buffer. */ - struct Lisp_Marker *markers; + /* Marker vector. */ + Lisp_Object markers; =20 /* Usually false. Temporarily true in decode_coding_gap to prevent Fgarbage_collect from shrinking the gap and losing @@ -287,6 +283,9 @@ #define FETCH_BYTE(n) (*BYTE_POS_ADDR (n)) =20 /* True if it needs to be redisplayed. */ bool_bf redisplay : 1; + + /* Index supporting char <-> byte position mapping. */ + struct text_index *index; }; =20 /* Most code should use this macro to access Lisp fields in struct buffer.= */ @@ -1744,4 +1743,6 @@ dec_both (ptrdiff_t *charpos, ptrdiff_t *bytepos) int compare_overlays (const void *v1, const void *v2); void make_sortvec_item (struct sortvec *item, Lisp_Object overlay); =20 +#include "marker-vector.h" + #endif /* EMACS_BUFFER_H */ modified src/callint.c @@ -500,7 +500,7 @@ DEFUN ("call-interactively", Fcall_interactively, Scall= _interactively, 1, 3, 0, break; =20 case 'd': /* Value of point. Does not do I/O. */ - set_marker_both (point_marker, Qnil, PT, PT_BYTE); + set_marker_both (point_marker, Qnil, PT); args[i] =3D point_marker; /* visargs[i] =3D Qnil; */ varies[i] =3D 1; @@ -658,7 +658,7 @@ DEFUN ("call-interactively", Fcall_interactively, Scall= _interactively, 1, 3, 0, case 'r': /* Region, point and mark as 2 args. */ { check_mark (true); - set_marker_both (point_marker, Qnil, PT, PT_BYTE); + set_marker_both (point_marker, Qnil, PT); ptrdiff_t mark =3D marker_position (BVAR (current_buffer, mark)); /* visargs[i] =3D visargs[i + 1] =3D Qnil; */ args[i] =3D PT < mark ? point_marker : BVAR (current_buffer, mark); modified src/coding.c @@ -8118,14 +8118,14 @@ decode_coding_object (struct coding_system *coding, move_gap_both (from, from_byte); if (BASE_EQ (src_object, dst_object)) { - struct Lisp_Marker *tail; - - for (tail =3D BUF_MARKERS (current_buffer); tail; tail =3D tail->next) + DO_MARKERS (current_buffer, tail) { + const ptrdiff_t charpos =3D marker_vector_charpos (tail); tail->need_adjustment - =3D tail->charpos =3D=3D (tail->insertion_type ? from : to); + =3D charpos =3D=3D (tail->insertion_type ? from : to); need_marker_adjustment |=3D tail->need_adjustment; } + END_DO_MARKERS; saved_pt =3D PT, saved_pt_byte =3D PT_BYTE; TEMP_SET_PT_BOTH (from, from_byte); current_buffer->text->inhibit_shrinking =3D true; @@ -8250,25 +8250,26 @@ decode_coding_object (struct coding_system *coding, =20 if (need_marker_adjustment) { - struct Lisp_Marker *tail; - - for (tail =3D BUF_MARKERS (current_buffer); tail; tail =3D tail->next) - if (tail->need_adjustment) - { - tail->need_adjustment =3D 0; - if (tail->insertion_type) - { - tail->bytepos =3D from_byte; - tail->charpos =3D from; - } - else - { - tail->bytepos =3D from_byte + coding->produced; - tail->charpos - =3D (NILP (BVAR (current_buffer, enable_multibyte_characters)) - ? tail->bytepos : from + coding->produced_char); - } - } + DO_MARKERS (current_buffer, tail) + { + if (tail->need_adjustment) + { + tail->need_adjustment =3D 0; + if (tail->insertion_type) + { + marker_vector_set_charpos (tail, from); + } + else + { + ptrdiff_t bytepos =3D from_byte + coding->produced; + ptrdiff_t charpos + =3D (NILP (BVAR (current_buffer, enable_multibyte_characters)) + ? bytepos : from + coding->produced_char); + marker_vector_set_charpos (tail, charpos); + } + } + } + END_DO_MARKERS; } } =20 @@ -8338,16 +8339,15 @@ encode_coding_object (struct coding_system *coding, bool same_buffer =3D false; if (BASE_EQ (src_object, dst_object) && BUFFERP (src_object)) { - struct Lisp_Marker *tail; - same_buffer =3D true; - - for (tail =3D BUF_MARKERS (XBUFFER (src_object)); tail; tail =3D tai= l->next) + DO_MARKERS (XBUFFER (src_object), tail) { + const ptrdiff_t charpos =3D marker_vector_charpos (tail); tail->need_adjustment - =3D tail->charpos =3D=3D (tail->insertion_type ? from : to); + =3D charpos =3D=3D (tail->insertion_type ? from : to); need_marker_adjustment |=3D tail->need_adjustment; } + END_DO_MARKERS; } =20 if (! NILP (CODING_ATTR_PRE_WRITE (attrs))) @@ -8505,25 +8505,26 @@ encode_coding_object (struct coding_system *coding, =20 if (need_marker_adjustment) { - struct Lisp_Marker *tail; - - for (tail =3D BUF_MARKERS (current_buffer); tail; tail =3D tail->next) - if (tail->need_adjustment) - { - tail->need_adjustment =3D 0; - if (tail->insertion_type) - { - tail->bytepos =3D from_byte; - tail->charpos =3D from; - } - else - { - tail->bytepos =3D from_byte + coding->produced; - tail->charpos - =3D (NILP (BVAR (current_buffer, enable_multibyte_characters)) - ? tail->bytepos : from + coding->produced_char); - } - } + DO_MARKERS (current_buffer, tail) + { + if (tail->need_adjustment) + { + tail->need_adjustment =3D 0; + if (tail->insertion_type) + { + marker_vector_set_charpos (tail, from); + } + else + { + const ptrdiff_t bytepos =3D from_byte + coding->produced; + const ptrdiff_t charpos + =3D (NILP (BVAR (current_buffer, enable_multibyte_characters)) + ? bytepos : from + coding->produced_char); + marker_vector_set_charpos (tail, charpos); + } + } + } + END_DO_MARKERS; } } =20 modified src/composite.c @@ -934,7 +934,7 @@ autocmp_chars (Lisp_Object rule, ptrdiff_t charpos, ptr= diff_t bytepos, specpdl_ref count =3D SPECPDL_INDEX (); Lisp_Object pos =3D make_fixnum (charpos); ptrdiff_t to; - ptrdiff_t pt =3D PT, pt_byte =3D PT_BYTE; + ptrdiff_t pt =3D PT; Lisp_Object re, font_object, lgstring; ptrdiff_t len; =20 @@ -975,7 +975,7 @@ autocmp_chars (Lisp_Object rule, ptrdiff_t charpos, ptr= diff_t bytepos, /* Save point as marker before calling out to lisp. */ if (NILP (string)) record_unwind_protect (restore_point_unwind, - build_marker (current_buffer, pt, pt_byte)); + build_marker (current_buffer, pt)); lgstring =3D safe_calln (Vauto_composition_function, AREF (rule, 2), pos, make_fixnum (to), font_object, string, direction); modified src/dispextern.h @@ -322,7 +322,7 @@ #define CLIP_TEXT_POS_FROM_MARKER(POS, MARKER) \ /* Set marker MARKER from text position POS. */ =20 #define SET_MARKER_FROM_TEXT_POS(MARKER, POS) \ - set_marker_both (MARKER, Qnil, CHARPOS (POS), BYTEPOS (POS)) + set_marker_both (MARKER, Qnil, CHARPOS (POS)) =20 /* Value is non-zero if character and byte positions of POS1 and POS2 are equal. */ modified src/editfns.c @@ -199,7 +199,7 @@ DEFUN ("point-marker", Fpoint_marker, Spoint_marker, 0,= 0, 0, doc: /* Return value of point, as a marker object. */) (void) { - return build_marker (current_buffer, PT, PT_BYTE); + return build_marker (current_buffer, PT); } =20 DEFUN ("goto-char", Fgoto_char, Sgoto_char, 1, 1, @@ -889,7 +889,7 @@ DEFUN ("point-min-marker", Fpoint_min_marker, Spoint_mi= n_marker, 0, 0, 0, This is the beginning, unless narrowing (a buffer restriction) is in effec= t. */) (void) { - return build_marker (current_buffer, BEGV, BEGV_BYTE); + return build_marker (current_buffer, BEGV); } =20 DEFUN ("point-max", Fpoint_max, Spoint_max, 0, 0, 0, @@ -909,7 +909,7 @@ DEFUN ("point-max-marker", Fpoint_max_marker, Spoint_ma= x_marker, 0, 0, 0, is in effect, in which case it is less. */) (void) { - return build_marker (current_buffer, ZV, ZV_BYTE); + return build_marker (current_buffer, ZV); } =20 DEFUN ("gap-position", Fgap_position, Sgap_position, 0, 0, 0, @@ -3022,8 +3022,8 @@ save_restriction_save_1 (void) { Lisp_Object beg, end; =20 - beg =3D build_marker (current_buffer, BEGV, BEGV_BYTE); - end =3D build_marker (current_buffer, ZV, ZV_BYTE); + beg =3D build_marker (current_buffer, BEGV); + end =3D build_marker (current_buffer, ZV); =20 /* END must move forward if text is inserted at its exact location. = */ XMARKER (end)->insertion_type =3D 1; @@ -3056,22 +3056,27 @@ save_restriction_restore_1 (Lisp_Object data) struct Lisp_Marker *end =3D XMARKER (XCDR (data)); eassert (buf =3D=3D end->buffer); =20 + ptrdiff_t beg_charpos, end_charpos; if (buf /* Verify marker still points to a buffer. */ - && (beg->charpos !=3D BUF_BEGV (buf) || end->charpos !=3D BUF_ZV (buf))) + && (beg_charpos =3D marker_vector_charpos (beg), + end_charpos =3D marker_vector_charpos (end), + beg_charpos !=3D BUF_BEGV (buf) || end_charpos !=3D BUF_ZV (buf))) /* The restriction has changed from the saved one, so restore the saved restriction. */ { ptrdiff_t pt =3D BUF_PT (buf); + const ptrdiff_t beg_bytepos =3D marker_vector_bytepos (beg); + const ptrdiff_t end_bytepos =3D marker_vector_bytepos (end); =20 - SET_BUF_BEGV_BOTH (buf, beg->charpos, beg->bytepos); - SET_BUF_ZV_BOTH (buf, end->charpos, end->bytepos); + SET_BUF_BEGV_BOTH (buf, beg_charpos, beg_bytepos); + SET_BUF_ZV_BOTH (buf, end_charpos, end_bytepos); =20 - if (pt < beg->charpos || pt > end->charpos) + if (pt < beg_charpos || pt > end_charpos) /* The point is outside the new visible range, move it inside. */ SET_BUF_PT_BOTH (buf, - clip_to_bounds (beg->charpos, pt, end->charpos), - clip_to_bounds (beg->bytepos, BUF_PT_BYTE (buf), - end->bytepos)); + clip_to_bounds (beg_charpos, pt, end_charpos), + clip_to_bounds (beg_bytepos, BUF_PT_BYTE (buf), + end_bytepos)); =20 buf->clip_changed =3D 1; /* Remember that the narrowing changed. */ } @@ -4401,8 +4406,7 @@ transpose_markers (ptrdiff_t start1, ptrdiff_t end1, ptrdiff_t start1_byte, ptrdiff_t end1_byte, ptrdiff_t start2_byte, ptrdiff_t end2_byte) { - register ptrdiff_t amt1, amt1_byte, amt2, amt2_byte, diff, diff_byte, mp= os; - register struct Lisp_Marker *marker; + register ptrdiff_t amt1, amt2, diff, mpos; =20 /* Update point as if it were a marker. */ if (PT < start1) @@ -4428,29 +4432,15 @@ transpose_markers (ptrdiff_t start1, ptrdiff_t end1, =20 /* The difference between the region's lengths */ diff =3D (end2 - start2) - (end1 - start1); - diff_byte =3D (end2_byte - start2_byte) - (end1_byte - start1_byte); =20 /* For shifting each marker in a region by the length of the other region plus the distance between the regions. */ amt1 =3D (end2 - start2) + (start2 - end1); amt2 =3D (end1 - start1) + (start2 - end1); - amt1_byte =3D (end2_byte - start2_byte) + (start2_byte - end1_byte); - amt2_byte =3D (end1_byte - start1_byte) + (start2_byte - end1_byte); =20 - for (marker =3D BUF_MARKERS (current_buffer); marker; marker =3D marker-= >next) + DO_MARKERS (current_buffer, marker) { - mpos =3D marker->bytepos; - if (mpos >=3D start1_byte && mpos < end2_byte) - { - if (mpos < end1_byte) - mpos +=3D amt1_byte; - else if (mpos < start2_byte) - mpos +=3D diff_byte; - else - mpos -=3D amt2_byte; - marker->bytepos =3D mpos; - } - mpos =3D marker->charpos; + mpos =3D marker_vector_charpos (marker); if (mpos >=3D start1 && mpos < end2) { if (mpos < end1) @@ -4460,8 +4450,9 @@ transpose_markers (ptrdiff_t start1, ptrdiff_t end1, else mpos -=3D amt2; } - marker->charpos =3D mpos; + marker_vector_set_charpos (marker, mpos); } + END_DO_MARKERS; } =20 DEFUN ("transpose-regions", Ftranspose_regions, Stranspose_regions, 4, 5, modified src/emacs.c @@ -95,6 +95,7 @@ #define MAIN_PROGRAM #include "intervals.h" #include "character.h" #include "buffer.h" +#include "text-index.h" #include "window.h" #include "xwidget.h" #include "atimer.h" @@ -2148,6 +2149,7 @@ android_emacs_init (int argc, char **argv, char *dump= _file) #endif =20 /* Init buffer storage and default directory of main buffer. */ + init_text_index (); init_buffer (); =20 /* Must precede init_cmdargs and init_sys_modes. */ @@ -2253,6 +2255,7 @@ android_emacs_init (int argc, char **argv, char *dump= _file) syms_of_eval (); syms_of_floatfns (); =20 + syms_of_text_index (); syms_of_buffer (); syms_of_bytecode (); syms_of_callint (); modified src/fns.c @@ -2934,7 +2934,8 @@ internal_equal_1 (Lisp_Object o1, Lisp_Object o2, enu= m equal_kind equal_kind, { return (XMARKER (o1)->buffer =3D=3D XMARKER (o2)->buffer && (XMARKER (o1)->buffer =3D=3D 0 - || XMARKER (o1)->bytepos =3D=3D XMARKER (o2)->bytepos)); + || (marker_vector_charpos (XMARKER (o1)) + =3D=3D marker_vector_charpos (XMARKER (o2))))); } if (BOOL_VECTOR_P (o1)) { @@ -3160,8 +3161,8 @@ value_cmp (Lisp_Object a, Lisp_Object b, int maxdepth) int cmp =3D value_cmp (buf_a, buf_b, maxdepth - 1); if (cmp !=3D 0) return cmp; - ptrdiff_t pa =3D XMARKER (a)->charpos; - ptrdiff_t pb =3D XMARKER (b)->charpos; + ptrdiff_t pa =3D marker_vector_charpos (XMARKER (a)); + ptrdiff_t pb =3D marker_vector_charpos (XMARKER (b)); return pa < pb ? -1 : pa > pb; } =20 @@ -5484,10 +5485,11 @@ sxhash_obj (Lisp_Object obj, int depth) return sxhash_bignum (obj); else if (pvec_type =3D=3D PVEC_MARKER) { - ptrdiff_t bytepos - =3D XMARKER (obj)->buffer ? XMARKER (obj)->bytepos : 0; + const ptrdiff_t charpos =3D (XMARKER (obj)->buffer + ? marker_vector_charpos (XMARKER (obj)) + : 0); EMACS_UINT hash - =3D sxhash_combine ((intptr_t) XMARKER (obj)->buffer, bytepos); + =3D sxhash_combine ((intptr_t) XMARKER (obj)->buffer, charpos); return hash; } else if (pvec_type =3D=3D PVEC_BOOL_VECTOR) modified src/indent.c @@ -539,7 +539,7 @@ check_display_width (Lisp_Object window, make_fixnum (marker_byte_position (w->pointm))); record_unwind_protect (restore_window_buffer, oldbuf); wset_buffer (w, Fcurrent_buffer ()); - set_marker_both (w->pointm, w->contents, pos, pos_byte); + set_marker_both (w->pointm, w->contents, pos); } =20 struct text_pos startpos; @@ -2196,8 +2196,7 @@ restore_window_buffer (Lisp_Object list) wset_buffer (w, XCAR (list)); list =3D XCDR (list); set_marker_both (w->pointm, w->contents, - XFIXNAT (XCAR (list)), - XFIXNAT (XCAR (XCDR (list)))); + XFIXNAT (XCAR (list))); } =20 DEFUN ("vertical-motion", Fvertical_motion, Svertical_motion, 1, 3, 0, @@ -2274,7 +2273,7 @@ line (if such column exists on that line, that is). = If the line is record_unwind_protect (restore_window_buffer, old); wset_buffer (w, Fcurrent_buffer ()); set_marker_both (w->pointm, w->contents, - BUF_PT (current_buffer), BUF_PT_BYTE (current_buffer)); + BUF_PT (current_buffer)); } =20 if (noninteractive) modified src/insdel.c @@ -27,6 +27,8 @@ #include "intervals.h" #include "character.h" #include "buffer.h" +#include "marker-vector.h" +#include "text-index.h" #include "window.h" #include "region-cache.h" #include "pdumper.h" @@ -229,15 +231,14 @@ adjust_suspend_auto_hscroll (ptrdiff_t from, ptrdiff_= t to) { struct window *w =3D XWINDOW (selected_window); =20 - if (BUFFERP (w->contents) - && XBUFFER (w->contents) =3D=3D current_buffer - && XMARKER (w->old_pointm)->charpos >=3D from - && XMARKER (w->old_pointm)->charpos <=3D to) + ptrdiff_t charpos; + if (BUFFERP (w->contents) && XBUFFER (w->contents) =3D=3D current_bu= ffer + && (charpos =3D marker_vector_charpos (XMARKER (w->old_pointm)), + charpos >=3D from && charpos <=3D to)) w->suspend_auto_hscroll =3D 0; } } =20 - /* Adjust all markers for a deletion whose range in bytes is FROM_BYTE to TO_BYTE. The range in charpos is FROM to TO. @@ -249,29 +250,9 @@ adjust_suspend_auto_hscroll (ptrdiff_t from, ptrdiff_t= to) adjust_markers_for_delete (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t to, ptrdiff_t to_byte) { - struct Lisp_Marker *m; - ptrdiff_t charpos; - + text_index_invalidate (current_buffer, from_byte); adjust_suspend_auto_hscroll (from, to); - for (m =3D BUF_MARKERS (current_buffer); m; m =3D m->next) - { - charpos =3D m->charpos; - eassert (charpos <=3D Z); - - /* If the marker is after the deletion, - relocate by number of chars / bytes deleted. */ - if (charpos > to) - { - m->charpos -=3D to - from; - m->bytepos -=3D to_byte - from_byte; - } - /* Here's the case where a marker is inside text being deleted. */ - else if (charpos > from) - { - m->charpos =3D from; - m->bytepos =3D from_byte; - } - } + marker_vector_adjust_for_delete (current_buffer, from, to); adjust_overlays_for_delete (from, to - from); } =20 @@ -288,30 +269,9 @@ adjust_markers_for_delete (ptrdiff_t from, ptrdiff_t f= rom_byte, adjust_markers_for_insert (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t to, ptrdiff_t to_byte, bool before_markers) { - struct Lisp_Marker *m; - ptrdiff_t nchars =3D to - from; - ptrdiff_t nbytes =3D to_byte - from_byte; - + text_index_invalidate (current_buffer, from_byte); adjust_suspend_auto_hscroll (from, to); - for (m =3D BUF_MARKERS (current_buffer); m; m =3D m->next) - { - eassert (m->bytepos >=3D m->charpos - && m->bytepos - m->charpos <=3D Z_BYTE - Z); - - if (m->bytepos =3D=3D from_byte) - { - if (m->insertion_type || before_markers) - { - m->bytepos =3D to_byte; - m->charpos =3D to; - } - } - else if (m->bytepos > from_byte) - { - m->bytepos +=3D nbytes; - m->charpos +=3D nchars; - } - } + marker_vector_adjust_for_insert (current_buffer, from, to, before_marker= s); adjust_overlays_for_insert (from, to - from, before_markers); } =20 @@ -343,10 +303,7 @@ adjust_markers_for_replace (ptrdiff_t from, ptrdiff_t = from_byte, ptrdiff_t old_chars, ptrdiff_t old_bytes, ptrdiff_t new_chars, ptrdiff_t new_bytes) { - register struct Lisp_Marker *m; - ptrdiff_t prev_to_byte =3D from_byte + old_bytes; - ptrdiff_t diff_chars =3D new_chars - old_chars; - ptrdiff_t diff_bytes =3D new_bytes - old_bytes; + text_index_invalidate (current_buffer, from_byte); =20 if (old_chars =3D=3D 0) { @@ -361,45 +318,12 @@ adjust_markers_for_replace (ptrdiff_t from, ptrdiff_t= from_byte, } =20 adjust_suspend_auto_hscroll (from, from + old_chars); - - for (m =3D BUF_MARKERS (current_buffer); m; m =3D m->next) - { - if (m->bytepos >=3D prev_to_byte) - { - m->charpos +=3D diff_chars; - m->bytepos +=3D diff_bytes; - } - else if (m->bytepos > from_byte) - { - m->charpos =3D from; - m->bytepos =3D from_byte; - } - } - + marker_vector_adjust_for_replace (current_buffer, from, old_chars, new_c= hars); check_markers (); - adjust_overlays_for_insert (from + old_chars, new_chars, true); adjust_overlays_for_delete (from, old_chars); } =20 -/* Starting at POS (BYTEPOS), find the byte position corresponding to - ENDPOS, which could be either before or after POS. */ -static ptrdiff_t -count_bytes (ptrdiff_t pos, ptrdiff_t bytepos, ptrdiff_t endpos) -{ - eassert (BEG_BYTE <=3D bytepos && bytepos <=3D Z_BYTE - && BEG <=3D endpos && endpos <=3D Z); - - if (pos <=3D endpos) - for ( ; pos < endpos; pos++) - bytepos +=3D next_char_len (bytepos); - else - for ( ; pos > endpos; pos--) - bytepos -=3D prev_char_len (bytepos); - - return bytepos; -} - /* Adjust byte positions of markers when their character positions didn't change. This is used in several places that replace text, but keep the character positions of the markers unchanged -- the @@ -413,44 +337,7 @@ count_bytes (ptrdiff_t pos, ptrdiff_t bytepos, ptrdiff= _t endpos) adjust_markers_bytepos (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t to, ptrdiff_t to_byte, int to_z) { - register struct Lisp_Marker *m; - ptrdiff_t beg =3D from, begbyte =3D from_byte; - adjust_suspend_auto_hscroll (from, to); - - if (Z =3D=3D Z_BYTE || (!to_z && to =3D=3D to_byte)) - { - /* Make sure each affected marker's bytepos is equal to - its charpos. */ - for (m =3D BUF_MARKERS (current_buffer); m; m =3D m->next) - { - if (m->bytepos > from_byte - && (to_z || m->bytepos <=3D to_byte)) - m->bytepos =3D m->charpos; - } - } - else - { - for (m =3D BUF_MARKERS (current_buffer); m; m =3D m->next) - { - /* Recompute each affected marker's bytepos. */ - if (m->bytepos > from_byte - && (to_z || m->bytepos <=3D to_byte)) - { - if (m->charpos < beg - && beg - m->charpos > m->charpos - from) - { - beg =3D from; - begbyte =3D from_byte; - } - m->bytepos =3D count_bytes (beg, begbyte, m->charpos); - beg =3D m->charpos; - begbyte =3D m->bytepos; - } - } - } - - /* Make sure cached charpos/bytepos is invalid. */ clear_charpos_cache (current_buffer); } =20 modified src/lisp.h @@ -1757,11 +1757,17 @@ ASIZE (Lisp_Object array) return size; } =20 +INLINE ptrdiff_t +gc_vsize (const struct Lisp_Vector *v) +{ + return v->header.size & ~ARRAY_MARK_FLAG; +} + INLINE ptrdiff_t gc_asize (Lisp_Object array) { /* Like ASIZE, but also can be used in the garbage collector. */ - return XVECTOR (array)->header.size & ~ARRAY_MARK_FLAG; + return gc_vsize (XVECTOR (array)); } =20 INLINE ptrdiff_t @@ -2836,23 +2842,8 @@ knuth_hash (hash_hash_t hash, unsigned bits) leaves the marker after the inserted text. */ bool_bf insertion_type : 1; =20 - /* The remaining fields are meaningless in a marker that - does not point anywhere. */ - - /* For markers that point somewhere, - this is used to chain of all the markers in a given buffer. - The chain does not preserve markers from garbage collection; - instead, markers are removed from the chain when freed by GC. */ - /* We could remove it and use an array in buffer_text instead. - That would also allow us to preserve it ordered. */ - struct Lisp_Marker *next; - /* This is the char position where the marker points. */ - ptrdiff_t charpos; - /* This is the byte position. - It's mostly used as a charpos<->bytepos cache (i.e. it's not directly - used to implement the functionality of markers, but rather to (ab)use - markers as a cache for char<->byte mappings). */ - ptrdiff_t bytepos; + /* If in a buffer's marker vector, this is the entry where it is stored.= */ + ptrdiff_t entry; } GCALIGNED_STRUCT; =20 struct Lisp_Overlay @@ -4968,10 +4959,10 @@ XMODULE_FUNCTION (Lisp_Object o) extern void detach_marker (Lisp_Object); extern void unchain_marker (struct Lisp_Marker *); extern Lisp_Object set_marker_restricted (Lisp_Object, Lisp_Object, Lisp_O= bject); -extern Lisp_Object set_marker_both (Lisp_Object, Lisp_Object, ptrdiff_t, p= trdiff_t); +extern Lisp_Object set_marker_both (Lisp_Object, Lisp_Object, ptrdiff_t); extern Lisp_Object set_marker_restricted_both (Lisp_Object, Lisp_Object, - ptrdiff_t, ptrdiff_t); -extern Lisp_Object build_marker (struct buffer *, ptrdiff_t, ptrdiff_t); + ptrdiff_t); +extern Lisp_Object build_marker (struct buffer *, ptrdiff_t); extern void syms_of_marker (void); =20 /* Defined in fileio.c. */ modified src/lread.c @@ -353,9 +353,8 @@ readchar (Lisp_Object readcharfun, bool *multibyte) bytepos++; } =20 - XMARKER (readcharfun)->bytepos =3D bytepos; - XMARKER (readcharfun)->charpos++; - + const ptrdiff_t charpos =3D marker_vector_charpos (XMARKER (readchar= fun)); + marker_vector_set_charpos (XMARKER (readcharfun), charpos + 1); return c; } =20 @@ -502,16 +501,8 @@ unreadchar (Lisp_Object readcharfun, int c) } else if (MARKERP (readcharfun)) { - struct buffer *b =3D XMARKER (readcharfun)->buffer; - ptrdiff_t bytepos =3D XMARKER (readcharfun)->bytepos; - - XMARKER (readcharfun)->charpos--; - if (! NILP (BVAR (b, enable_multibyte_characters))) - bytepos -=3D buf_prev_char_len (b, bytepos); - else - bytepos--; - - XMARKER (readcharfun)->bytepos =3D bytepos; + const ptrdiff_t charpos =3D marker_vector_charpos (XMARKER (readchar= fun)); + marker_vector_set_charpos (XMARKER (readcharfun), charpos - 1); } else if (STRINGP (readcharfun)) { new file src/marker-vector.c @@ -0,0 +1,398 @@ +/* Marker vectors.. + Copyright (C) 2025 Free Software Foundation, Inc. + +This file is part of GNU Emacs. + +GNU Emacs is free software: you can redistribute it and/or modify +it under the terms of the GNU General Public License as published by +the Free Software Foundation, either version 3 of the License, or (at +your option) any later version. + +GNU Emacs is distributed in the hope that it will be useful, +but WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +GNU General Public License for more details. + +You should have received a copy of the GNU General Public License +along with GNU Emacs. If not, see . */ + +/* A marker vector is used to hold the markers of a buffer. The vector + is a normal Lisp vector that consists of a header and a number of + entries for each marker. A Lisp vector is used because the vector + references markers "weakly", and that's what easy for igc. + + +------+-----------+---------+---------+--------------+ + | FREE | MAX_ENTRY | entry 0 | entry 1 | ... | + +------+-----------+---------+---------+--------------+ + |<----- header --->| + + Entries consist of 2 vector slots MARKER and CHARPOS. MARKER holds a + marker, if the entry is in use. CHARPOS is not yet used. (The idea is + to move the positions from Lisp_Marker here, which speeds up + adjusting positions when the text changes.) + + FREE is the array index of the start of the next free entry in the + marker vector. Free entries form a singly-linked list using the + MARKER field of entries. + + MAX_ENTRY is the largest index ever used to store a marker. This is + used to (supposedly) speed up iteration over the marker vector, with + the assumption that there might be a tail of slots in the marker + vector that is never used. Or, IOW, that we over-allocate room in the + marker vector. + + Lisp_Marker objects contain the index under which they are stored in + the marker vector. + + The use of a free list gives O(1) for adding a marker. The index + stored in the Lisp_Marker provides O(1) deletion of a marker from + the markers of a buffer. + + Iteration over marker vectors is done by iterating over all slots of + the vector that can contain markers, skipping those that don't. + + Iteration over markers is O(N) where N is the size of the marker + vector. This could be improved to N being the number of live markers + by putting marker entries in a doubly-linked list. The downside is + that iteration then might access the marker vector slots in an + unpredictable order, while the current method scans the vector + sequentially which should be fast. */ + +#include +#include "lisp.h" +#include "buffer.h" +#include "text-index.h" +#include "marker-vector.h" +#ifdef HAVE_MPS +#include "igc.h" +#endif + +/* Number of entries to allocate initially. */ +#define MARKER_VECTOR_INITIAL_ENTRIES 20 + +/* Access fields of an entry E of marker vector V as lvalues. */ +#define MARKER(v, e) (v)->contents[(e) + MARKER_VECTOR_OFFSET_MARKER] +#define CHARPOS(v, e) (v)->contents[(e) + MARKER_VECTOR_OFFSET_CHARPOS] + +/* Index of next free entry in the header of a marker vector. This must + use the marker field so that putting an entry on the free list + implicitly sets the marker slot to a non-marker. See + fix_marker_vector in igc.c. */ +#define NEXT_FREE(v, e) MARKER (v, e) + +/* Value denoting end of the free list. */ +#define FREE_LIST_END make_fixnum (0) +#define IS_FREE_LIST_END(x) EQ ((x), FREE_LIST_END) + +/* Access header fields of marker vector V as lvalues. */ +#define FREE(v) (v)->contents[MARKER_VECTOR_FREE] +#define MAX_ENTRY(v) (v)->contents[MARKER_VECTOR_MAX_ENTRY] + +/* Check that index ENTRY is a valid entry start index in vector V. */ + +static void +check_is_entry (const struct Lisp_Vector *v, ptrdiff_t entry) +{ + eassert (entry >=3D MARKER_VECTOR_HEADER_SIZE); + eassert (entry < gc_vsize (v)); + eassert ((entry - MARKER_VECTOR_HEADER_SIZE) % MARKER_VECTOR_ENTRY_SIZE = =3D=3D 0); +} + +/* Push entry ENTRY of V to its free-list. This must set MARKER to not + be a marker, which is done by using the MARKER field of entry + to form the free-list. */ + +static void +push_free (struct Lisp_Vector *v, const ptrdiff_t entry) +{ + check_is_entry (v, entry); + NEXT_FREE (v, entry) =3D FREE (v); + FREE (v) =3D make_fixnum (entry); +} + +/* Pop the next free entry from the free-list of V and return its entry + start index. */ + +static ptrdiff_t +pop_free (struct Lisp_Vector *v) +{ + const ptrdiff_t free =3D XFIXNUM (FREE (v)); + FREE (v) =3D NEXT_FREE (v, free); + check_is_entry (v, free); + return free; +} + +/* Add a new entry for marker M to marker vector MV and return its entry + start index. */ + +static ptrdiff_t +add_entry (Lisp_Object mv, struct Lisp_Marker *m) +{ + struct Lisp_Vector *v =3D XVECTOR (mv); + const ptrdiff_t entry =3D pop_free (v); + MARKER (v, entry) =3D make_lisp_ptr (m, Lisp_Vectorlike); + const ptrdiff_t max_entry =3D XFIXNUM (MAX_ENTRY (v)); + MAX_ENTRY (v) =3D make_fixnum (max (entry, max_entry)); + return entry; +} + +/* Allocate a marker vector of length LEN. */ + +Lisp_Object +alloc_marker_vector (ptrdiff_t len) +{ +#ifdef HAVE_MPS + return igc_alloc_marker_vector (len, FREE_LIST_END); +#else + return make_vector (len, FREE_LIST_END); +#endif +} + +/* Expensive pre- and post-condition checking. V is the marker vector to + check. ALLOCATING true means we are called from allocation functions + where V may be different from the underlying buffer's marker + vector. */ + +static void +check_marker_vector (struct Lisp_Vector *v, bool allocating) +{ +#ifdef ENABLE_CHECKING + size_t nfree =3D 0; + for (Lisp_Object e =3D FREE (v); !IS_FREE_LIST_END (e); + e =3D NEXT_FREE (v, XFIXNUM (e))) + ++nfree; + + size_t nused =3D 0; + Lisp_Object mv =3D make_lisp_ptr (v, Lisp_Vectorlike); + DO_MARKERS_OF_VECTOR (mv, m) + { + eassert (m->entry =3D=3D e_); + eassert (m->buffer !=3D NULL); + if (!allocating) + { + struct Lisp_Vector *mv =3D XVECTOR (BUF_MARKERS (m->buffer)); + eassert (mv =3D=3D v); + } + ++nused; + } + END_DO_MARKERS; + + eassert ((nused + nfree) * MARKER_VECTOR_ENTRY_SIZE + + MARKER_VECTOR_HEADER_SIZE =3D=3D gc_vsize (v)); +#endif +} + +/* Add all entries of MV starting with FIRST to the end of marker vector MV + to its free list. */ + +static void +add_to_free_list (Lisp_Object mv, ptrdiff_t first) +{ + struct Lisp_Vector *v =3D XVECTOR (mv); + for (ptrdiff_t e =3D ASIZE (mv) - MARKER_VECTOR_ENTRY_SIZE; + e >=3D first; e -=3D MARKER_VECTOR_ENTRY_SIZE) + push_free (v, e); +} + +/* Make a new marker vector. */ + +Lisp_Object +make_marker_vector (void) +{ + const ptrdiff_t len + =3D (MARKER_VECTOR_INITIAL_ENTRIES * MARKER_VECTOR_ENTRY_SIZE + + MARKER_VECTOR_HEADER_SIZE); + Lisp_Object mv =3D alloc_marker_vector (len); + add_to_free_list (mv, MARKER_VECTOR_HEADER_SIZE); + check_marker_vector (XVECTOR (mv), true); + return mv; +} + +/* Return a new marker vector that is like OLD_MV but larger. */ + +static Lisp_Object +larger_marker_vector (Lisp_Object old_mv) +{ + const ptrdiff_t old_size =3D ASIZE (old_mv); + const ptrdiff_t old_entries_size =3D old_size - MARKER_VECTOR_HEADER_SIZ= E; + const ptrdiff_t new_size =3D 2 * old_entries_size + MARKER_VECTOR_HEADER= _SIZE; + + /* Allocate a new marker vector. */ + Lisp_Object new_mv =3D alloc_marker_vector (new_size); + struct Lisp_Vector *new_v =3D XVECTOR (new_mv); + + /* Copy existing entries. */ + const struct Lisp_Vector *old_v =3D XVECTOR (old_mv); + const size_t nbytes =3D old_size * sizeof (Lisp_Object); + memcpy (new_v->contents, old_v->contents, nbytes); + + /* Add new entries to free-list. */ + add_to_free_list (new_mv, old_size); + check_marker_vector (new_v, true); + return new_mv; +} + +/* Make sure that the marker vector of buffer B has room for a new + entry. Make a larger marker vector if not. Value is the marker + vector of B at the end. */ + +static Lisp_Object +ensure_room (struct buffer *b) +{ + Lisp_Object mv =3D BUF_MARKERS (b); + if (IS_FREE_LIST_END (FREE (XVECTOR (mv)))) + { + mv =3D larger_marker_vector (mv); + BUF_MARKERS (b) =3D mv; + } + return mv; +} + +/* Add marker M to the marker vector of buffer B. */ + +void +marker_vector_add (struct buffer *b, struct Lisp_Marker *m) +{ + const Lisp_Object mv =3D ensure_room (b); + check_marker_vector (XVECTOR (mv), false); + m->buffer =3D b; + m->entry =3D add_entry (mv, m); + check_marker_vector (XVECTOR (mv), false); +} + +/* Remove marker M from marker vector V. */ + +void +marker_vector_remove (struct Lisp_Vector *v, struct Lisp_Marker *m) +{ + check_marker_vector (v, false); + check_is_entry (v, m->entry); + eassert (MARKERP (MARKER (v, m->entry))); + eassert (XMARKER (MARKER (v, m->entry)) =3D=3D m); + push_free (v, m->entry); + /* The old GC contains at least one assertion that unchaining markers + in kill-buffer resets the markers' buffers. IGC does not do this, + can't do this, and does not need it. */ + m->buffer =3D NULL; + check_marker_vector (v, false); +} + +/* Reset markers of buffer B. Called from kill-buffer. */ + +void +marker_vector_reset (struct buffer *b) +{ + /* The old GC contains at least one assertion that unchaining markers + in kill-buffer resets the markers' buffers. IGC does not do this, + can't do this, and does not need it. */ + DO_MARKERS (b, m) + { + m->buffer =3D NULL; + } + END_DO_MARKERS; + BUF_MARKERS (b) =3D Qnil; +} + +/* Set marker M's character position to CHARPOS. */ + +void +marker_vector_set_charpos (struct Lisp_Marker *m, ptrdiff_t charpos) +{ + eassert (m->buffer); + struct Lisp_Vector *v =3D XVECTOR (BUF_MARKERS (m->buffer)); + check_is_entry (v, m->entry); + CHARPOS (v, m->entry) =3D make_fixnum (charpos); +} + +/* Return marker M's character position. */ + +ptrdiff_t +marker_vector_charpos (const struct Lisp_Marker *m) +{ + eassert (m->buffer); + struct Lisp_Vector *v =3D XVECTOR (BUF_MARKERS (m->buffer)); + check_is_entry (v, m->entry); + return XFIXNUM (CHARPOS (v, m->entry)); +} + +/* Return marker M's byte position. */ + +ptrdiff_t +marker_vector_bytepos (const struct Lisp_Marker *m) +{ + const ptrdiff_t charpos =3D marker_vector_charpos (m); + return text_index_charpos_to_bytepos (m->buffer, charpos); +} + +/* Adjust all marker positions of buffer B for a deletion of a range + FROM_CHARPOS to TO_CHARPOS characters. */ + +void +marker_vector_adjust_for_delete (struct buffer *b, + const ptrdiff_t from_charpos, + const ptrdiff_t to_charpos) +{ + const ptrdiff_t nchars =3D to_charpos - from_charpos; + struct Lisp_Vector *v =3D XVECTOR (BUF_MARKERS (b)); + DO_MARKERS (b, m) + { + const ptrdiff_t charpos =3D XFIXNUM (CHARPOS (v, m->entry)); + eassert (charpos <=3D Z); + if (charpos > to_charpos) + CHARPOS (v, m->entry) =3D make_fixnum (charpos - nchars); + else if (charpos > from_charpos) + CHARPOS (v, m->entry) =3D make_fixnum (from_charpos); + } + END_DO_MARKERS; +} + +/* Adjust marker positions in buffer B for an insertion that stretches + from FROM_CHARPOS to TO_CHARPOS. When a marker points at the + insertion point FROM_CHARPOS, we advance it if either its + insertion-type is t or BEFORE_MARKERS is true. */ + +void +marker_vector_adjust_for_insert (struct buffer *b, + const ptrdiff_t from_charpos, + const ptrdiff_t to_charpos, + const bool before_markers) +{ + const ptrdiff_t nchars =3D to_charpos - from_charpos; + struct Lisp_Vector *v =3D XVECTOR (BUF_MARKERS (b)); + DO_MARKERS (b, m) + { + const ptrdiff_t charpos =3D XFIXNUM (CHARPOS (v, m->entry)); + if (charpos =3D=3D from_charpos) + { + if (m->insertion_type || before_markers) + CHARPOS (v, m->entry) =3D make_fixnum (to_charpos); + } + else if (charpos > from_charpos) + CHARPOS (v, m->entry) =3D make_fixnum (charpos + nchars); + } + END_DO_MARKERS; +} + +/* Adjust marker positions of buffer Bs for a replacement of text at + FROM_CHARPOS of length OLD_NCHARS to a new text of length NEW_NCHARS. + It is assumed that OLD_CHARS > 0, i.e., this is not an insertion. */ + +void +marker_vector_adjust_for_replace (struct buffer *b, + const ptrdiff_t from_charpos, + const ptrdiff_t old_nchars, + const ptrdiff_t new_nchars) +{ + const ptrdiff_t diff_nchars =3D new_nchars - old_nchars; + const ptrdiff_t old_to_charpos =3D from_charpos + old_nchars; + struct Lisp_Vector *v =3D XVECTOR (BUF_MARKERS (b)); + DO_MARKERS (b, m) + { + const ptrdiff_t charpos =3D XFIXNUM (CHARPOS (v, m->entry)); + if (charpos >=3D old_to_charpos) + CHARPOS (v, m->entry) =3D make_fixnum (charpos + diff_nchars); + else if (charpos > from_charpos) + CHARPOS (v, m->entry) =3D make_fixnum (from_charpos); + } + END_DO_MARKERS; +} new file src/marker-vector.h @@ -0,0 +1,78 @@ +/* Marker vectors. + Copyright (C) 2025 Free Software Foundation, Inc. + +This file is part of GNU Emacs. + +GNU Emacs is free software: you can redistribute it and/or modify +it under the terms of the GNU General Public License as published by +the Free Software Foundation, either version 3 of the License, or (at +your option) any later version. + +GNU Emacs is distributed in the hope that it will be useful, +but WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +GNU General Public License for more details. + +You should have received a copy of the GNU General Public License +along with GNU Emacs. If not, see . */ + +#ifndef EMACS_MARKER_VECTOR_H +#define EMACS_MARKER_VECTOR_H + +#include +#include "lisp.h" + +/* A marker vector is a Lisp vector starting with a header of + MARKER_VECTOR_HEADER_SIZE Lisp_Objects, followed by entries of + MARKER_VECTOR_ENTRY_SIZE Lisp_Objects. */ + +enum +{ + /* Header. */ + MARKER_VECTOR_FREE =3D 0, + MARKER_VECTOR_MAX_ENTRY =3D 1, + MARKER_VECTOR_HEADER_SIZE =3D 2, + + /* Entries. */ + MARKER_VECTOR_OFFSET_MARKER =3D 0, + MARKER_VECTOR_OFFSET_CHARPOS =3D 1, + MARKER_VECTOR_ENTRY_SIZE =3D 2, +}; + +/* Iterate over markers in marker vector MV, binding a variable with + name M to a pointer to Lisp_Marker. The loop must be ended + with an END_DO_MARKERS. */ + +# define DO_MARKERS_OF_VECTOR(mv, m) \ + for (ptrdiff_t e_ =3D MARKER_VECTOR_HEADER_SIZE, \ + end_ =3D XFIXNUM (AREF (mv, MARKER_VECTOR_MAX_ENTRY)); \ + e_ <=3D end_; \ + e_ +=3D MARKER_VECTOR_ENTRY_SIZE) \ + { \ + Lisp_Object m_ =3D AREF (mv, e_ + MARKER_VECTOR_OFFSET_MARKER); \ + if (MARKERP (m_)) \ + { \ + struct Lisp_Marker *m =3D XMARKER (m_); + +/* Iterate over markers of buffer B, binding a variable with name M to a + pointer to Lisp_Marker. The loop must be ended with an + END_DO_MARKERS. */ + +# define DO_MARKERS(b, m) DO_MARKERS_OF_VECTOR (BUF_MARKERS (b), m) +# define END_DO_MARKERS }} + +Lisp_Object make_marker_vector (void); +Lisp_Object alloc_marker_vector (ptrdiff_t len); +void marker_vector_add (struct buffer *b, struct Lisp_Marker *m); +void marker_vector_remove (struct Lisp_Vector *v, struct Lisp_Marker *m); +void marker_vector_reset (struct buffer *b); +void marker_vector_set_charpos (struct Lisp_Marker *m, ptrdiff_t charpos); +ptrdiff_t marker_vector_charpos (const struct Lisp_Marker *m); +ptrdiff_t marker_vector_bytepos (const struct Lisp_Marker *m); +void marker_vector_adjust_for_delete (struct buffer *b, ptrdiff_t from, pt= rdiff_t to); +void marker_vector_adjust_for_insert (struct buffer *b, const ptrdiff_t fr= om, + ptrdiff_t to, bool before_markers); +void marker_vector_adjust_for_replace (struct buffer *b, ptrdiff_t from, + ptrdiff_t old_chars, ptrdiff_t new_chars); + +#endif /* EMACS_MARKER_VECTOR_H */ modified src/marker.c @@ -28,16 +28,9 @@ #include "lisp.h" #include "character.h" #include "buffer.h" +#include "text-index.h" #include "window.h" =20 -/* Record one cached position found recently by - buf_charpos_to_bytepos or buf_bytepos_to_charpos. */ - -static ptrdiff_t cached_charpos; -static ptrdiff_t cached_bytepos; -static struct buffer *cached_buffer; -static modiff_count cached_modiff; - /* Juanma Barranquero reported ~3x increased bootstrap time when byte_char_debug_check is enabled; so this is never turned on by --enable-checking configure option. */ @@ -78,240 +71,24 @@ #define byte_char_debug_check(b, charpos, bytepos) do = { } while (0) void clear_charpos_cache (struct buffer *b) { - if (cached_buffer =3D=3D b) - cached_buffer =3D 0; + /* FIXME: Maybe check again that this does not need to do anything. + when using marker vectors. I don't think so, but anyway. */ } /* Converting between character positions and byte positions. */ =20 -/* There are several places in the buffer where we know - the correspondence: BEG, BEGV, PT, GPT, ZV and Z, - and everywhere there is a marker. So we find the one of these places - that is closest to the specified position, and scan from there. */ - -/* This macro is a subroutine of buf_charpos_to_bytepos. - Note that it is desirable that BYTEPOS is not evaluated - except when we really want its value. */ - -#define CONSIDER(CHARPOS, BYTEPOS) \ -{ \ - ptrdiff_t this_charpos =3D (CHARPOS); \ - bool changed =3D false; \ - \ - if (this_charpos =3D=3D charpos) \ - { \ - ptrdiff_t value =3D (BYTEPOS); \ - \ - byte_char_debug_check (b, charpos, value); \ - return value; \ - } \ - else if (this_charpos > charpos) \ - { \ - if (this_charpos < best_above) \ - { \ - best_above =3D this_charpos; \ - best_above_byte =3D (BYTEPOS); \ - changed =3D true; \ - } \ - } \ - else if (this_charpos > best_below) \ - { \ - best_below =3D this_charpos; \ - best_below_byte =3D (BYTEPOS); \ - changed =3D true; \ - } \ - \ - if (changed) \ - { \ - if (best_above - best_below =3D=3D best_above_byte - best_below_byte= ) \ - { \ - ptrdiff_t value =3D best_below_byte + (charpos - best_below); \ - \ - byte_char_debug_check (b, charpos, value); \ - return value; \ - } \ - } \ -} - static void CHECK_MARKER (Lisp_Object x) { CHECK_TYPE (MARKERP (x), Qmarkerp, x); } =20 -/* When converting bytes from/to chars, we look through the list of - markers to try and find a good starting point (since markers keep - track of both bytepos and charpos at the same time). - But if there are many markers, it can take too much time to find a "goo= d" - marker from which to start. Worse yet: if it takes a long time and we = end - up finding a nearby markers, we won't add a new marker to cache this - result, so next time around we'll have to go through this same long list - to (re)find this best marker. So the further down the list of - markers we go, the less demanding we are w.r.t what is a good marker. - - The previous code used INITIAL=3D50 and INCREMENT=3D0 and this lead to - really poor performance when there are many markers. - I haven't tried to tweak INITIAL, but experiments on my trusty Thinkpad - T61 using various artificial test cases seem to suggest that INCREMENT= =3D50 - might be "the best compromise": it significantly improved the - worst case and it was rarely slower and never by much. - - The asymptotic behavior is still poor, tho, so in largish buffers with = many - overlays (e.g. 300KB and 30K overlays), it can still be a bottleneck. = */ -#define BYTECHAR_DISTANCE_INITIAL 50 -#define BYTECHAR_DISTANCE_INCREMENT 50 - /* Return the byte position corresponding to CHARPOS in B. */ =20 ptrdiff_t buf_charpos_to_bytepos (struct buffer *b, ptrdiff_t charpos) { - struct Lisp_Marker *tail; - ptrdiff_t best_above, best_above_byte; - ptrdiff_t best_below, best_below_byte; - ptrdiff_t distance =3D BYTECHAR_DISTANCE_INITIAL; - - eassert (BUF_BEG (b) <=3D charpos && charpos <=3D BUF_Z (b)); - - best_above =3D BUF_Z (b); - best_above_byte =3D BUF_Z_BYTE (b); - - /* If this buffer has as many characters as bytes, - each character must be one byte. - This takes care of the case where enable-multibyte-characters is nil.= */ - if (best_above =3D=3D best_above_byte) - return charpos; - - best_below =3D BEG; - best_below_byte =3D BEG_BYTE; - - /* We find in best_above and best_above_byte - the closest known point above CHARPOS, - and in best_below and best_below_byte - the closest known point below CHARPOS, - - If at any point we can tell that the space between those - two best approximations is all single-byte, - we interpolate the result immediately. */ - - CONSIDER (BUF_PT (b), BUF_PT_BYTE (b)); - CONSIDER (BUF_GPT (b), BUF_GPT_BYTE (b)); - CONSIDER (BUF_BEGV (b), BUF_BEGV_BYTE (b)); - CONSIDER (BUF_ZV (b), BUF_ZV_BYTE (b)); - - if (b =3D=3D cached_buffer && BUF_MODIFF (b) =3D=3D cached_modiff) - CONSIDER (cached_charpos, cached_bytepos); - - for (tail =3D BUF_MARKERS (b); - /* If we are down to a range of DISTANCE chars, - don't bother checking any other markers; - scan the intervening chars directly now. */ - tail && !(best_above - charpos < distance - || charpos - best_below < distance); - tail =3D tail->next, - distance +=3D BYTECHAR_DISTANCE_INCREMENT) - CONSIDER (tail->charpos, tail->bytepos); - - /* We get here if we did not exactly hit one of the known places. - We have one known above and one known below. - Scan, counting characters, from whichever one is closer. */ - - eassert (best_below <=3D charpos && charpos <=3D best_above); - if (charpos - best_below < best_above - charpos) - { - bool record =3D charpos - best_below > 5000; - - while (best_below < charpos) - { - best_below++; - best_below_byte +=3D buf_next_char_len (b, best_below_byte); - } - - /* If this position is quite far from the nearest known position, - cache the correspondence by creating a marker here. - It will last until the next GC. */ - if (record) - build_marker (b, best_below, best_below_byte); - - byte_char_debug_check (b, best_below, best_below_byte); - - cached_buffer =3D b; - cached_modiff =3D BUF_MODIFF (b); - cached_charpos =3D best_below; - cached_bytepos =3D best_below_byte; - - return best_below_byte; - } - else - { - bool record =3D best_above - charpos > 5000; - - while (best_above > charpos) - { - best_above--; - best_above_byte -=3D buf_prev_char_len (b, best_above_byte); - } - - /* If this position is quite far from the nearest known position, - cache the correspondence by creating a marker here. - It will last until the next GC. */ - if (record) - build_marker (b, best_above, best_above_byte); - - byte_char_debug_check (b, best_above, best_above_byte); - - cached_buffer =3D b; - cached_modiff =3D BUF_MODIFF (b); - cached_charpos =3D best_above; - cached_bytepos =3D best_above_byte; - - return best_above_byte; - } -} - -#undef CONSIDER - -/* This macro is a subroutine of buf_bytepos_to_charpos. - It is used when BYTEPOS is actually the byte position. */ - -#define CONSIDER(BYTEPOS, CHARPOS) \ -{ \ - ptrdiff_t this_bytepos =3D (BYTEPOS); \ - int changed =3D false; \ - \ - if (this_bytepos =3D=3D bytepos) \ - { \ - ptrdiff_t value =3D (CHARPOS); \ - \ - byte_char_debug_check (b, value, bytepos); \ - return value; \ - } \ - else if (this_bytepos > bytepos) \ - { \ - if (this_bytepos < best_above_byte) \ - { \ - best_above =3D (CHARPOS); \ - best_above_byte =3D this_bytepos; \ - changed =3D true; \ - } \ - } \ - else if (this_bytepos > best_below_byte) \ - { \ - best_below =3D (CHARPOS); \ - best_below_byte =3D this_bytepos; \ - changed =3D true; \ - } \ - \ - if (changed) \ - { \ - if (best_above - best_below =3D=3D best_above_byte - best_below_byte= ) \ - { \ - ptrdiff_t value =3D best_below + (bytepos - best_below_byte); \ - \ - byte_char_debug_check (b, value, bytepos); \ - return value; \ - } \ - } \ + return text_index_charpos_to_bytepos (b, charpos); } =20 /* Return the character position corresponding to BYTEPOS in B. */ @@ -319,108 +96,8 @@ #define CONSIDER(BYTEPOS, CHARPOS) \ ptrdiff_t buf_bytepos_to_charpos (struct buffer *b, ptrdiff_t bytepos) { - struct Lisp_Marker *tail; - ptrdiff_t best_above, best_above_byte; - ptrdiff_t best_below, best_below_byte; - ptrdiff_t distance =3D BYTECHAR_DISTANCE_INITIAL; - - eassert (BUF_BEG_BYTE (b) <=3D bytepos && bytepos <=3D BUF_Z_BYTE (b)); - - best_above =3D BUF_Z (b); - best_above_byte =3D BUF_Z_BYTE (b); - - /* If this buffer has as many characters as bytes, - each character must be one byte. - This takes care of the case where enable-multibyte-characters is nil.= */ - if (best_above =3D=3D best_above_byte) - return bytepos; - - /* Check bytepos is not in the middle of a character. */ - eassert (bytepos >=3D BUF_Z_BYTE (b) - || CHAR_HEAD_P (BUF_FETCH_BYTE (b, bytepos))); - - best_below =3D BEG; - best_below_byte =3D BEG_BYTE; - - CONSIDER (BUF_PT_BYTE (b), BUF_PT (b)); - CONSIDER (BUF_GPT_BYTE (b), BUF_GPT (b)); - CONSIDER (BUF_BEGV_BYTE (b), BUF_BEGV (b)); - CONSIDER (BUF_ZV_BYTE (b), BUF_ZV (b)); - - if (b =3D=3D cached_buffer && BUF_MODIFF (b) =3D=3D cached_modiff) - CONSIDER (cached_bytepos, cached_charpos); - - for (tail =3D BUF_MARKERS (b); - /* If we are down to a range of DISTANCE bytes, - don't bother checking any other markers; - scan the intervening chars directly now. */ - tail && !(best_above_byte - bytepos < distance - || bytepos - best_below_byte < distance); - tail =3D tail->next, - distance +=3D BYTECHAR_DISTANCE_INCREMENT) - CONSIDER (tail->bytepos, tail->charpos); - - /* We get here if we did not exactly hit one of the known places. - We have one known above and one known below. - Scan, counting characters, from whichever one is closer. */ - - if (bytepos - best_below_byte < best_above_byte - bytepos) - { - bool record =3D bytepos - best_below_byte > 5000; - - while (best_below_byte < bytepos) - { - best_below++; - best_below_byte +=3D buf_next_char_len (b, best_below_byte); - } - - /* If this position is quite far from the nearest known position, - cache the correspondence by creating a marker here. - It will last until the next GC. - But don't do it if BUF_MARKERS is nil; - that is a signal from Fset_buffer_multibyte. */ - if (record && BUF_MARKERS (b)) - build_marker (b, best_below, best_below_byte); - - byte_char_debug_check (b, best_below, best_below_byte); - - cached_buffer =3D b; - cached_modiff =3D BUF_MODIFF (b); - cached_charpos =3D best_below; - cached_bytepos =3D best_below_byte; - - return best_below; - } - else - { - bool record =3D best_above_byte - bytepos > 5000; - - while (best_above_byte > bytepos) - { - best_above--; - best_above_byte -=3D buf_prev_char_len (b, best_above_byte); - } - - /* If this position is quite far from the nearest known position, - cache the correspondence by creating a marker here. - It will last until the next GC. - But don't do it if BUF_MARKERS is nil; - that is a signal from Fset_buffer_multibyte. */ - if (record && BUF_MARKERS (b)) - build_marker (b, best_above, best_above_byte); - - byte_char_debug_check (b, best_above, best_above_byte); - - cached_buffer =3D b; - cached_modiff =3D BUF_MODIFF (b); - cached_charpos =3D best_above; - cached_bytepos =3D best_above_byte; - - return best_above; - } + return text_index_bytepos_to_charpos (b, bytepos); } - -#undef CONSIDER /* Operations on markers. */ =20 @@ -450,7 +127,7 @@ DEFUN ("marker-position", Fmarker_position, Smarker_pos= ition, 1, 1, 0, { CHECK_MARKER (marker); if (XMARKER (marker)->buffer) - return make_fixnum (XMARKER (marker)->charpos); + return make_fixnum (marker_vector_charpos (XMARKER (marker))); =20 return Qnil; } @@ -464,32 +141,23 @@ DEFUN ("marker-last-position", Fmarker_last_position,= Smarker_last_position, 1, { CHECK_MARKER (marker); =20 - return make_fixnum (XMARKER (marker)->charpos); + return make_fixnum (marker_vector_charpos (XMARKER (marker))); } =20 /* Change M so it points to B at CHARPOS and BYTEPOS. */ =20 static void attach_marker (struct Lisp_Marker *m, struct buffer *b, - ptrdiff_t charpos, ptrdiff_t bytepos) + ptrdiff_t charpos) { - /* In a single-byte buffer, two positions must be equal. - Otherwise, every character is at least one byte. */ - if (BUF_Z (b) =3D=3D BUF_Z_BYTE (b)) - eassert (charpos =3D=3D bytepos); - else - eassert (charpos <=3D bytepos); - - m->charpos =3D charpos; - m->bytepos =3D bytepos; - - if (m->buffer !=3D b) + if (m->buffer !=3D b) { unchain_marker (m); - m->buffer =3D b; - m->next =3D BUF_MARKERS (b); - BUF_MARKERS (b) =3D m; + marker_vector_add (b, m); } + + eassert (m->buffer =3D=3D b); + marker_vector_set_charpos (m, charpos); } =20 /* If BUFFER is nil, return current buffer pointer. Next, check @@ -528,13 +196,13 @@ set_marker_internal (Lisp_Object marker, Lisp_Object = position, else if (MARKERP (position) && b =3D=3D XMARKER (position)->buffer && b =3D=3D m->buffer) { - m->bytepos =3D XMARKER (position)->bytepos; - m->charpos =3D XMARKER (position)->charpos; + const ptrdiff_t charpos =3D marker_vector_charpos (XMARKER (position= )); + marker_vector_set_charpos (m, charpos); } =20 else { - register ptrdiff_t charpos, bytepos; + register ptrdiff_t charpos; =20 /* Do not use CHECK_FIXNUM_COERCE_MARKER because we don't want to call buf_charpos_to_bytepos if POSITION @@ -547,15 +215,13 @@ set_marker_internal (Lisp_Object marker, Lisp_Object = position, if (cpos > PTRDIFF_MAX) cpos =3D PTRDIFF_MAX; charpos =3D cpos; - bytepos =3D -1; #else - charpos =3D XFIXNUM (position), bytepos =3D -1; + charpos =3D XFIXNUM (position); #endif } else if (MARKERP (position)) { - charpos =3D XMARKER (position)->charpos; - bytepos =3D XMARKER (position)->bytepos; + charpos =3D marker_vector_charpos (XMARKER (position)); } else wrong_type_argument (Qinteger_or_marker_p, position); @@ -563,18 +229,8 @@ set_marker_internal (Lisp_Object marker, Lisp_Object p= osition, charpos =3D clip_to_bounds (restricted ? BUF_BEGV (b) : BUF_BEG (b), charpos, restricted ? BUF_ZV (b) : BUF_Z (b)); - /* Don't believe BYTEPOS if it comes from a different buffer, - since that buffer might have a very different correspondence - between character and byte positions. */ - if (bytepos =3D=3D -1 - || !(MARKERP (position) && XMARKER (position)->buffer =3D=3D b)) - bytepos =3D buf_charpos_to_bytepos (b, charpos); - else - bytepos =3D clip_to_bounds - (restricted ? BUF_BEGV_BYTE (b) : BUF_BEG_BYTE (b), - bytepos, restricted ? BUF_ZV_BYTE (b) : BUF_Z_BYTE (b)); =20 - attach_marker (m, b, charpos, bytepos); + attach_marker (m, b, charpos); } =20 #ifdef HAVE_TEXT_CONVERSION @@ -628,7 +284,7 @@ set_marker_restricted (Lisp_Object marker, Lisp_Object = position, =20 Lisp_Object set_marker_both (Lisp_Object marker, Lisp_Object buffer, - ptrdiff_t charpos, ptrdiff_t bytepos) + ptrdiff_t charpos) { register struct Lisp_Marker *m; register struct buffer *b =3D live_buffer (buffer); @@ -637,7 +293,7 @@ set_marker_both (Lisp_Object marker, Lisp_Object buffer, m =3D XMARKER (marker); =20 if (b) - attach_marker (m, b, charpos, bytepos); + attach_marker (m, b, charpos); else unchain_marker (m); return marker; @@ -647,7 +303,7 @@ set_marker_both (Lisp_Object marker, Lisp_Object buffer, =20 Lisp_Object set_marker_restricted_both (Lisp_Object marker, Lisp_Object buffer, - ptrdiff_t charpos, ptrdiff_t bytepos) + ptrdiff_t charpos) { register struct Lisp_Marker *m; register struct buffer *b =3D live_buffer (buffer); @@ -657,10 +313,9 @@ set_marker_restricted_both (Lisp_Object marker, Lisp_O= bject buffer, =20 if (b) { - attach_marker - (m, b, - clip_to_bounds (BUF_BEGV (b), charpos, BUF_ZV (b)), - clip_to_bounds (BUF_BEGV_BYTE (b), bytepos, BUF_ZV_BYTE (b))); + attach_marker (m, b, + clip_to_bounds (BUF_BEGV (b), + charpos, BUF_ZV (b))); } else unchain_marker (m); @@ -681,40 +336,12 @@ detach_marker (Lisp_Object marker) buffer NULL. */ =20 void -unchain_marker (register struct Lisp_Marker *marker) +unchain_marker (struct Lisp_Marker *marker) { - register struct buffer *b =3D marker->buffer; - - if (b) + if (marker->buffer) { - register struct Lisp_Marker *tail, **prev; - - /* No dead buffers here. */ - eassert (BUFFER_LIVE_P (b)); - - marker->buffer =3D NULL; - prev =3D &BUF_MARKERS (b); - - for (tail =3D BUF_MARKERS (b); tail; prev =3D &tail->next, tail =3D = *prev) - if (marker =3D=3D tail) - { - if (*prev =3D=3D BUF_MARKERS (b)) - { - /* Deleting first marker from the buffer's chain. Crash - if new first marker in chain does not say it belongs - to the same buffer, or at least that they have the same - base buffer. */ - if (tail->next && b->text !=3D tail->next->buffer->text) - emacs_abort (); - } - *prev =3D tail->next; - /* We have removed the marker from the chain; - no need to scan the rest of the chain. */ - break; - } - - /* Error if marker was not in it's chain. */ - eassert (tail !=3D NULL); + Lisp_Object mv =3D BUF_MARKERS (marker->buffer); + marker_vector_remove (XVECTOR (mv), marker); } } =20 @@ -729,9 +356,9 @@ marker_position (Lisp_Object marker) if (!buf) error ("Marker does not point anywhere"); =20 - eassert (BUF_BEG (buf) <=3D m->charpos && m->charpos <=3D BUF_Z (buf)); - - return m->charpos; + const ptrdiff_t charpos =3D marker_vector_charpos (m); + eassert (BUF_BEG (buf) <=3D charpos && charpos <=3D BUF_Z (buf)); + return charpos; } =20 /* Return the byte position of marker MARKER, as a C integer. */ @@ -745,9 +372,9 @@ marker_byte_position (Lisp_Object marker) if (!buf) error ("Marker does not point anywhere"); =20 - eassert (BUF_BEG_BYTE (buf) <=3D m->bytepos && m->bytepos <=3D BUF_Z_BYT= E (buf)); - - return m->bytepos; + const ptrdiff_t bytepos =3D marker_vector_bytepos (m); + eassert (BUF_BEG_BYTE (buf) <=3D bytepos && bytepos <=3D BUF_Z_BYTE (buf= )); + return bytepos; } DEFUN ("copy-marker", Fcopy_marker, Scopy_marker, 0, 2, 0, modified src/pdumper.c @@ -2129,10 +2129,7 @@ dump_marker (struct dump_context *ctx, const struct = Lisp_Marker *marker) { dump_field_lv_rawptr (ctx, out, marker, &marker->buffer, Lisp_Vectorlike, WEIGHT_NORMAL); - dump_field_lv_rawptr (ctx, out, marker, &marker->next, - Lisp_Vectorlike, WEIGHT_STRONG); - DUMP_FIELD_COPY (out, marker, charpos); - DUMP_FIELD_COPY (out, marker, bytepos); + DUMP_FIELD_COPY (out, marker, entry); } return finish_dump_pvec (ctx, &out->header); } @@ -2825,6 +2822,7 @@ dump_buffer (struct dump_context *ctx, const struct b= uffer *in_buffer) buffer->clip_changed =3D 0; buffer->last_window_start =3D -1; buffer->point_before_scroll_ =3D Qnil; + buffer->own_text.index =3D NULL; =20 dump_off base_offset =3D 0; if (buffer->base_buffer) @@ -2873,8 +2871,8 @@ dump_buffer (struct dump_context *ctx, const struct b= uffer *in_buffer) DUMP_FIELD_COPY (out, buffer, own_text.overlay_unchanged_modified); if (buffer->own_text.intervals) dump_field_fixup_later (ctx, out, buffer, &buffer->own_text.interv= als); - dump_field_lv_rawptr (ctx, out, buffer, &buffer->own_text.markers, - Lisp_Vectorlike, WEIGHT_NORMAL); + dump_field_lv (ctx, out, buffer, &buffer->own_text.markers, + WEIGHT_NORMAL); DUMP_FIELD_COPY (out, buffer, own_text.inhibit_shrinking); DUMP_FIELD_COPY (out, buffer, own_text.redisplay); } modified src/print.c @@ -220,7 +220,7 @@ print_finish (struct print_context *pc) signal_after_change (PT - print_buffer.pos, 0, print_buffer.pos); } if (MARKERP (pc->old_printcharfun)) - set_marker_both (pc->old_printcharfun, Qnil, PT, PT_BYTE); + set_marker_both (pc->old_printcharfun, Qnil, PT); if (pc->old_point >=3D 0) SET_PT_BOTH (pc->old_point + (pc->old_point >=3D pc->start_point modified src/process.c @@ -1284,8 +1284,7 @@ update_process_mark (struct Lisp_Process *p) if (BUFFERP (buffer) && XMARKER (p->mark)->buffer !=3D XBUFFER (buffer)) set_marker_both (p->mark, buffer, - BUF_ZV (XBUFFER (buffer)), - BUF_ZV_BYTE (XBUFFER (buffer))); + BUF_ZV (XBUFFER (buffer))); } =20 DEFUN ("set-process-buffer", Fset_process_buffer, Sset_process_buffer, @@ -6339,9 +6338,9 @@ read_process_output_after_insert (struct Lisp_Process= *p, Lisp_Object *old_read_ W3 is known to do that. */ if (BUFFERP (p->buffer) && (b =3D XBUFFER (p->buffer), b !=3D current_buffer)) - set_marker_both (p->mark, p->buffer, BUF_PT (b), BUF_PT_BYTE (b)); + set_marker_both (p->mark, p->buffer, BUF_PT (b)); else - set_marker_both (p->mark, p->buffer, PT, PT_BYTE); + set_marker_both (p->mark, p->buffer, PT); =20 update_mode_lines =3D 23; =20 @@ -7942,7 +7941,7 @@ DEFUN ("internal-default-process-sentinel", Finternal= _default_process_sentinel, insert_string (" "); Finsert (1, &msg); bset_read_only (current_buffer, tem); - set_marker_both (p->mark, p->buffer, PT, PT_BYTE); + set_marker_both (p->mark, p->buffer, PT); =20 if (opoint >=3D before) SET_PT_BOTH (opoint + (PT - before), new file src/text-index.c @@ -0,0 +1,722 @@ +/* Text index for character positions. + Copyright (C) 2025 Free Software Foundation, Inc. + +This file is part of GNU Emacs. + +GNU Emacs is free software: you can redistribute it and/or modify +it under the terms of the GNU General Public License as published by +the Free Software Foundation, either version 3 of the License, or (at +your option) any later version. + +GNU Emacs is distributed in the hope that it will be useful, +but WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +GNU General Public License for more details. + +You should have received a copy of the GNU General Public License +along with GNU Emacs. If not, see . */ + +/* A text index is used to map character positions in buffer text to + byte positions and vice versa. (One could also think of using it for + other things like line numbers, but that is currently not done.) + + The index divides buffer text into intervals of constant size =3D + number of bytes. + + BEG_BYTE Z_BYTE + |-------------------------------------------------| + | interval | interval | interval | interval | | + 0 1 2 3 4 - + index in character position array + + The index consists of an array of character positions. The entry at + index ENTRY is the character position of the character containing the + byte position ENTRY * interval size. There is no entry for a partial + interval at the end of the text. The position (Z, Z_BYTE) is instead + handled specially in the code. + + Note that the byte positions at interval boundaries can be in the + middle of a multi-byte character. + + character start byte position + | + ------01234-------- bytes of a character (up to 5 in Emacs' + | internal encoding) + N * interval + + To find the character position corresponding to a byte position + BYTEPOS, we look up the character position in the index at BYTEPOS / + interval. From there, we can scan forward in the text until we reach + BYTEPOS, counting characters, or we scan backward from the interval + end, if that is closer. + + To find the byte position BYTEPOS corresponding to a given character + position CHARPOS, we search the index for the last entry ENTRY whose + character position is <=3D CHARPOS. That entry corresponds to a byte + position ENTRY * interval size. From there, we scan the text until + we reach BYTEPOS, counting characters until we reach CHARPOS. The + byte position reached at the end is BYTEPOS. We also scan backward + from the interval end, if that looks advantageous. + + Why divide the text into intervals of bytes instead of characters? + Dividing the text into intervals of characters makes scanning + overhead less uniform, since characters can be of different lengths + (1 to 5 bytes). */ + +#include +#include "lisp.h" +#include "buffer.h" +#include "dispextern.h" /* for struct text_pos */ +#include "text-index.h" + +// clang-format off + +struct text_index +{ + /* Value at index IDX is the character position of byte position IDX * + INTERVAL. Note that that byte position may be in the middle of a + character. The value at index 0 is BEG. */ + ptrdiff_t *charpos; + + /* Number of valid entries in the above array. This is always at least 1 + because the first entry is for BEG. */ + size_t nentries; + + /* Number of entries allocated. */ + size_t capacity; + + /* Known position cache. This is the last position conversion result. */ + struct text_pos cache; +}; + +enum +{ + /* Number of bytes in an interval. */ + TEXT_INDEX_INTERVAL =3D 4096, + + /* Default capacity in number of intervals for text indices. */ + TEXT_INDEX_DEFAULT_CAPACITY =3D 20, + + /* Value indicating a non-position. */ + TEXT_INDEX_INVALID_POSITION =3D -1 +}; + +/* Cache (CHARPOS, BYTEPOS) as known position in index TI. */ + +static void +cache (struct text_index *ti, ptrdiff_t charpos, ptrdiff_t bytepos) +{ + ti->cache.charpos =3D charpos; + ti->cache.bytepos =3D bytepos; +} + +/* Invalidate known position cache of TI. */ + +static void +invalidate_cache (struct text_index *ti) +{ + ti->cache.charpos =3D TEXT_INDEX_INVALID_POSITION; + ti->cache.bytepos =3D TEXT_INDEX_INVALID_POSITION; +} + +/* Value is true is known position cache of TI is valid. */ + +static bool +is_cache_valid (const struct text_index *ti) +{ + return ti->cache.bytepos !=3D TEXT_INDEX_INVALID_POSITION; +} + +/* Return the byte position in index TI corresponding to index entry + ENTRY. Note that this position cab be in the middle of a multi-byte + character. */ + +static ptrdiff_t +index_bytepos (const struct text_index *ti, ptrdiff_t entry) +{ + return BEG_BYTE + entry * TEXT_INDEX_INTERVAL; +} + +/* Return the character position in index TI corresponding index entry + ENTRY. */ + +static ptrdiff_t +index_charpos (const struct text_index *ti, ptrdiff_t entry) +{ + eassert (entry >=3D 0 && entry < ti->nentries); + return ti->charpos[entry]; +} + +/* Return the index entry for BYTEPOS in index TI. */ + +static ptrdiff_t +index_bytepos_entry (const struct text_index *ti, ptrdiff_t bytepos) +{ + return (bytepos - BEG_BYTE) / TEXT_INDEX_INTERVAL; +} + +/* Return the entry of index TI for the largest character position <=3D + CHARPOS. */ + +static ptrdiff_t +index_charpos_entry (const struct text_index *ti, ptrdiff_t charpos) +{ + ptrdiff_t entry =3D -1; + for (ptrdiff_t low =3D 0, high =3D ti->nentries - 1; low <=3D high;) + { + ptrdiff_t mid =3D low + (high - low) / 2; + if (ti->charpos[mid] <=3D charpos) + { + entry =3D mid; + low =3D mid + 1; + } + else + high =3D mid - 1; + } + eassert (entry >=3D 0 && entry < ti->nentries); + return entry; +} + +/* Return TI's index entry ENTRY as a struct text_pos. */ + +static struct text_pos +index_text_pos (const struct text_index *ti, ptrdiff_t entry) +{ + eassert (entry >=3D 0 && entry < ti->nentries); + return (struct text_pos) { + .charpos =3D index_charpos (ti, entry), + .bytepos =3D index_bytepos (ti, entry) + }; +} + +/* Return index TI's maximum indexed character position. */ + +static ptrdiff_t +max_indexed_charpos (const struct text_index *ti) +{ + return index_charpos (ti, ti->nentries - 1); +} + +/* Return index TI's maximum indexed byte position. */ + +static ptrdiff_t +max_indexed_bytepos (const struct text_index *ti) +{ + return index_bytepos (ti, ti->nentries - 1); +} + +/* Given a byte position BYTEPOS in buffer B, return the byte position + where the character starts that contains BYTEPOS: */ + +static ptrdiff_t +char_start_bytepos (struct buffer *b, ptrdiff_t bytepos) +{ + while (!CHAR_HEAD_P (BUF_FETCH_BYTE (b, bytepos))) + --bytepos; + return bytepos; +} + +/* Allocate and return a text index structure with enough room for a + text of length NBYTES bytes. */ + +static struct text_index * +make_text_index (size_t nbytes) +{ + struct text_index *ti =3D xzalloc (sizeof *ti); + ti->capacity =3D TEXT_INDEX_DEFAULT_CAPACITY; + ti->charpos =3D xnmalloc (ti->capacity, sizeof *ti->charpos); + ti->charpos[0] =3D BEG; + ti->nentries =3D 1; + invalidate_cache (ti); + return ti; +} + +/* Free the text index TI. TI may be NULL. */ + +void +text_index_free (struct text_index *ti) +{ + if (ti =3D=3D NULL) + return; + xfree (ti->charpos); + xfree (ti); +} + +/* Append entry for CHARPOS to index TI. */ + +static void +append_entry (struct text_index *ti, ptrdiff_t charpos) +{ + if (ti->nentries =3D=3D ti->capacity) + { + eassert (ti->capacity > 0); + ti->capacity =3D 2 * ti->capacity; + ti->charpos =3D xnrealloc (ti->charpos, ti->capacity, + sizeof *ti->charpos); + } + ti->charpos[ti->nentries] =3D charpos; + ++ti->nentries; +} + +/* Build text index of buffer B up to and including position TO. + One of TO.charpos or TO.bytepos must be non-zero. */ + +static void +build_index (struct buffer *b, const struct text_pos to) +{ + struct text_index *ti =3D b->text->index; + + eassert (to.charpos !=3D TEXT_INDEX_INVALID_POSITION + || to.bytepos !=3D TEXT_INDEX_INVALID_POSITION); + eassert (to.charpos =3D=3D TEXT_INDEX_INVALID_POSITION + || to.bytepos =3D=3D TEXT_INDEX_INVALID_POSITION); + eassert (to.bytepos =3D=3D TEXT_INDEX_INVALID_POSITION + || (to.bytepos >=3D BEG_BYTE + && to.bytepos <=3D BUF_Z_BYTE (b))); + eassert (to.bytepos =3D=3D TEXT_INDEX_INVALID_POSITION + || to.bytepos > max_indexed_bytepos (ti)); + eassert (to.charpos =3D=3D TEXT_INDEX_INVALID_POSITION + || (to.charpos >=3D BEG && to.charpos <=3D BUF_Z (b))); + eassert (to.charpos =3D=3D TEXT_INDEX_INVALID_POSITION + || to.charpos > max_indexed_charpos (ti)); + + /* Start at the byte position of the last index entry. if TO_BYTEPOS + equals the byte position of that entry, this is okay, because the + character position at that byte position cannot have changed. */ + const ptrdiff_t last_entry =3D ti->nentries - 1; + ptrdiff_t charpos =3D index_charpos (ti, last_entry); + ptrdiff_t bytepos =3D index_bytepos (ti, last_entry); + ptrdiff_t next_stop =3D bytepos + TEXT_INDEX_INTERVAL; + + /* Quickly give up if there are not enough bytes left to scan to make + a new index entry. */ + const ptrdiff_t z_byte =3D BUF_Z_BYTE (b); + if (next_stop >=3D z_byte) + return; + + /* Loop over bytes, starting one after the index entry we start from + because we are only interested in yet unknown entries, and the + one at EMTRY can be assumed to stay unchanged. */ + for (++bytepos; bytepos < z_byte; ++bytepos) + { + if (CHAR_HEAD_P (BUF_FETCH_BYTE (b, bytepos))) + ++charpos; + + if (bytepos =3D=3D next_stop) + { + /* Add a new index entry. */ + append_entry (ti, charpos); + + /* If we reached the one after the position we are interested + in, we're done since we can then scan forward and backward + to BYTEPOS. */ + if ((to.bytepos !=3D TEXT_INDEX_INVALID_POSITION + && bytepos > to.bytepos) + || (to.charpos !=3D TEXT_INDEX_INVALID_POSITION + && charpos > to.charpos)) + break; + + /* Compute next stop. We are done if no next entry + can be built. */ + next_stop +=3D TEXT_INDEX_INTERVAL; + if (next_stop >=3D z_byte) + break; + } + } +} + +/* Make sure that buffer B has a text index. */ + +static struct text_index * +ensure_has_index (struct buffer *b) +{ + if (b->text->index =3D=3D NULL) + b->text->index =3D make_text_index (BUF_Z_BYTE (b)); + return b->text->index; +} + +/* Make sure that buffer B's text index contains BYTEPOS. */ + +static void +ensure_bytepos_indexed (struct buffer *b, ptrdiff_t bytepos) +{ + struct text_index *ti =3D ensure_has_index (b); + if (bytepos > max_indexed_bytepos (ti)) + { + struct text_pos to + =3D {.charpos =3D TEXT_INDEX_INVALID_POSITION, .bytepos =3D bytepos}; + build_index (b, to); + } +} + +/* Make sure that buffer B's text index contains CHARPOS. */ + +static void +ensure_charpos_indexed (struct buffer *b, ptrdiff_t charpos) +{ + struct text_index *ti =3D ensure_has_index (b); + if (charpos > max_indexed_charpos (ti)) + { + struct text_pos to + =3D {.charpos =3D charpos, .bytepos =3D TEXT_INDEX_INVALID_POSITION}; + build_index (b, to); + } +} + +/* In buffer B, starting from index entry ENTRY, scan forward in B's + text to TO_BYTEPOS, and return the corresponding character + position. */ + +static ptrdiff_t +charpos_forward_to_bytepos (struct buffer *b, const struct text_pos from, + const ptrdiff_t to_bytepos) +{ + eassert (from.bytepos <=3D to_bytepos); + ptrdiff_t bytepos =3D from.bytepos; + ptrdiff_t charpos =3D from.charpos; + while (bytepos < to_bytepos) + { + ++bytepos; + if (CHAR_HEAD_P (BUF_FETCH_BYTE (b, bytepos))) + ++charpos; + } + return charpos; +} + +/* In buffer B, starting from FROM, scan backward in B's text to + TO_BYTEPOS, and return the corresponding character position. */ + +static ptrdiff_t +charpos_backward_to_bytepos (struct buffer *b, const struct text_pos from, + const ptrdiff_t to_bytepos) +{ + eassert (from.bytepos >=3D to_bytepos); + ptrdiff_t bytepos =3D char_start_bytepos (b, from.bytepos); + ptrdiff_t charpos =3D from.charpos; + while (bytepos > to_bytepos) + { + --bytepos; + if (CHAR_HEAD_P (BUF_FETCH_BYTE (b, bytepos))) + --charpos; + } + return charpos; +} + +/* In buffer B, starting from FROM, scan forward in B's text to + TO_CHARPOS, and return the corresponding byte position. The byte + position is the one of the character start. FROM's charpos + must be < TO_CHARPOS. */ + +static ptrdiff_t +bytepos_forward_to_charpos (struct buffer *b, const struct text_pos from, + ptrdiff_t to_charpos) +{ + eassert (from.charpos < to_charpos); + ptrdiff_t bytepos =3D from.bytepos; + ptrdiff_t charpos =3D from.charpos; + while (charpos < to_charpos) + { + ++bytepos; + if (CHAR_HEAD_P (BUF_FETCH_BYTE (b, bytepos))) + ++charpos; + } + eassert (CHAR_HEAD_P (BUF_FETCH_BYTE (b, bytepos))); + return bytepos; +} + +/* In buffer B, starting from FROM, scan backward in B's text to + TO_CHARPOS, and return the corresponding byte position. The byte + position is the one of the character start. FROM's charpos must be + >=3D TO_CHARPOS. */ + +static ptrdiff_t +bytepos_backward_to_charpos (struct buffer *b, const struct text_pos from, + const ptrdiff_t to_charpos) +{ + eassert (from.charpos >=3D to_charpos); + ptrdiff_t bytepos =3D char_start_bytepos (b, from.bytepos); + ptrdiff_t charpos =3D from.charpos; + while (charpos > to_charpos) + { + --bytepos; + if (CHAR_HEAD_P (BUF_FETCH_BYTE (b, bytepos))) + --charpos; + } + eassert (CHAR_HEAD_P (BUF_FETCH_BYTE (b, bytepos))); + return bytepos; +} + +/* Return the next known (char, byte) position in buffer B after the one + in index entry ENTRY. */ + +static struct text_pos +next_known_text_pos (struct buffer *b, ptrdiff_t entry) +{ + const struct text_index *ti =3D b->text->index; + if (entry + 1 < ti->nentries) + return index_text_pos (ti, entry + 1); + return (struct text_pos) { .charpos =3D BUF_Z (b), + .bytepos =3D BUF_Z_BYTE (b) }; +} + +/* Improve the known bytepos bounds *PREV and *NEXT if KNOWN is closer + to BYTEPOS. If KNOWN is an exact match for BYTEPOS return true. */ + +static bool +narrow_bytepos_bounds_1 (const struct text_pos known, struct text_pos *pre= v, + struct text_pos *next, const ptrdiff_t bytepos) +{ + eassert (bytepos >=3D prev->bytepos && bytepos <=3D next->bytepos); + eassert (known.bytepos !=3D TEXT_INDEX_INVALID_POSITION); + if (known.bytepos =3D=3D bytepos) + return true; + + /* If KNOWN is in (PREV, BYTEPOS] it is a better PREV. */ + if (known.bytepos < bytepos + && known.bytepos > prev->bytepos) + *prev =3D known; + + /* If KNOWN is in [BYTEPOS NEXT) it is a better NEXT. */ + if (known.bytepos > bytepos + && known.bytepos < next->bytepos) + *next =3D known; + + return false; +} + +/* Improve the known bytepos bounds *PREV and *NEXT of buffer B using + known positions in B. BYTEPOS is a byte position to convert to a + character position. If an exact match for BYTEPOS is found, return + its charpos, otherwise return TEXT_INDEX_INVALID_POSITION. */ + +static ptrdiff_t +narrow_bytepos_bounds (struct buffer *b, struct text_pos *prev, + struct text_pos *next, const ptrdiff_t bytepos) +{ + const struct text_pos pt + =3D {.charpos =3D BUF_PT (b), .bytepos =3D BUF_PT_BYTE (b)}; + if (narrow_bytepos_bounds_1 (pt, prev, next, bytepos)) + return pt.charpos; + + const struct text_pos gpt + =3D {.charpos =3D BUF_GPT (b), .bytepos =3D BUF_GPT_BYTE (b)}; + if (narrow_bytepos_bounds_1 (gpt, prev, next, bytepos)) + return gpt.charpos; + + struct text_index *ti =3D b->text->index; + if (is_cache_valid (ti) + && narrow_bytepos_bounds_1 (ti->cache, prev, next, bytepos)) + return ti->cache.charpos; + + return TEXT_INDEX_INVALID_POSITION; +} + +/* Improve the known bytepos bounds *PREV and *NEXT if KNOWN is closer + to BYTEPOS. If KNOWN is an exact match for BYTEPOS return true. */ + +static bool +narrow_charpos_bounds_1 (const struct text_pos known, struct text_pos *pre= v, + struct text_pos *next, const ptrdiff_t charpos) +{ + eassert (charpos >=3D prev->charpos && charpos <=3D next->charpos); + eassert (known.charpos !=3D TEXT_INDEX_INVALID_POSITION); + if (known.charpos =3D=3D charpos) + return true; + + /* If KNOWN is in (PREV, BYTEPOS] it is a better PREV. */ + if (known.charpos < charpos + && known.charpos > prev->charpos) + *prev =3D known; + + /* If KNOWN is in [BYTEPOS NEXT) it is a better NEXT. */ + if (known.charpos > charpos + && known.charpos < next->charpos) + *next =3D known; + + return false; +} + +/* Improve the known bytepos bounds *PREV and *NEXT of buffer B using + known positions in B. BYTEPOS is a byte position to convert to a + character position. If an exact match for BYTEPOS is found, return + its charpos, otherwise return TEXT_INDEX_INVALID_POSITION. */ + +static ptrdiff_t +narrow_charpos_bounds (struct buffer *b, struct text_pos *prev, + struct text_pos *next, const ptrdiff_t charpos) +{ + const struct text_pos pt + =3D {.charpos =3D BUF_PT (b), .bytepos =3D BUF_PT_BYTE (b)}; + if (narrow_charpos_bounds_1 (pt, prev, next, charpos)) + return pt.bytepos; + + const struct text_pos gpt + =3D {.charpos =3D BUF_GPT (b), .bytepos =3D BUF_GPT_BYTE (b)}; + if (narrow_charpos_bounds_1 (gpt, prev, next, charpos)) + return gpt.bytepos; + + struct text_index *ti =3D b->text->index; + if (is_cache_valid (ti) + && narrow_charpos_bounds_1 (ti->cache, prev, next, charpos)) + return ti->cache.bytepos; + + return TEXT_INDEX_INVALID_POSITION; +} + +/* Return the character position in buffer B corresponding to + byte position BYTEPOS. */ + +ptrdiff_t +text_index_bytepos_to_charpos (struct buffer *b, const ptrdiff_t bytepos) +{ + /* If this buffer has as many characters as bytes, each character must + be one byte. This takes care of the case where + enable-multibyte-characters is nil. */ + if (BUF_Z (b) =3D=3D BUF_Z_BYTE (b)) + return bytepos; + + /* BYTEPOS =3D=3D Z_BYTE, and BYTEPOS is an interval boundary, + then BYTEPOS does not have an index entry because we don't want + extra entries for (Z, Z_BYTE). Changing that would be possible + but leads to more code than this if-statement, so it's probably + not worth it. */ + if (bytepos =3D=3D BUF_Z_BYTE (b)) + return BUF_Z (b); + ensure_bytepos_indexed (b, bytepos); + + struct text_index *ti =3D b->text->index; + const ptrdiff_t entry =3D index_bytepos_entry (ti, bytepos); + struct text_pos prev =3D index_text_pos (ti, entry); + struct text_pos next =3D next_known_text_pos (b, entry); + + ptrdiff_t charpos =3D narrow_bytepos_bounds (b, &prev, &next, bytepos); + if (charpos !=3D TEXT_INDEX_INVALID_POSITION) + return charpos; + + /* Scan forward if the distance to the previous known position is + smaller than the distance to the next known position. */ + if (bytepos - prev.bytepos < next.bytepos - bytepos) + charpos =3D charpos_forward_to_bytepos (b, prev, bytepos); + else + charpos =3D charpos_backward_to_bytepos (b, next, bytepos); + + cache (ti, charpos, bytepos); + return charpos; +} + +/* Return the byte position in buffer B corresponding to character + position CHARPOS. */ + +ptrdiff_t +text_index_charpos_to_bytepos (struct buffer *b, const ptrdiff_t charpos) +{ + /* If this buffer has as many characters as bytes, each character must + be one byte. This takes care of the case where + enable-multibyte-characters is nil. */ + if (BUF_Z (b) =3D=3D BUF_Z_BYTE (b)) + return charpos; + + if (charpos =3D=3D BUF_Z (b)) + return BUF_Z_BYTE (b); + ensure_charpos_indexed (b, charpos); + + struct text_index *ti =3D b->text->index; + const ptrdiff_t entry =3D index_charpos_entry (ti, charpos); + struct text_pos prev =3D index_text_pos (ti, entry); + struct text_pos next =3D next_known_text_pos (b, entry); + + ptrdiff_t bytepos =3D narrow_charpos_bounds (b, &prev, &next, charpos); + if (bytepos !=3D TEXT_INDEX_INVALID_POSITION) + return bytepos; + + /* Don't scan forward if CHARPOS is exactly on the previous know + position because the index bytepos can be in the middle of a + character, which is found by scanning backwards. Otherwise, scan + forward if we believe that's less expensive. */ + if (charpos > prev.charpos + && charpos - prev.charpos < next.charpos - charpos) + bytepos =3D bytepos_forward_to_charpos (b, prev, charpos); + else + bytepos =3D bytepos_backward_to_charpos (b, next, charpos); + + cache (ti, charpos, bytepos); + return bytepos; +} + +/* Invalidate index entries for all positions > BYTEPOS in buffer B. + Note that the entry for BYTEPOS itself, if it is at an interval + boundary, remains unchanged. */ + +void +text_index_invalidate (struct buffer *b, ptrdiff_t bytepos) +{ + struct text_index *ti =3D b->text->index; + if (ti =3D=3D NULL) + return; + + const ptrdiff_t last_valid_entry =3D index_bytepos_entry (ti, bytepos); + ti->nentries =3D min (ti->nentries, last_valid_entry + 1); + + if (ti->cache.bytepos > bytepos) + invalidate_cache (ti); +} + +DEFUN ("text-index--charpos-to-bytepos", Ftext_index__charpos_to_bytepos, + Stext_index__charpos_to_bytepos, 1, 1, 0, + doc: /* Convert CHARPOS to a bytepos in current buffer. +If POSITION is out of range, the value is nil. */) + (Lisp_Object charpos) +{ + const EMACS_INT pos =3D fix_position (charpos); + if (pos < BEG || pos > Z) + return Qnil; + ptrdiff_t bytepos =3D text_index_charpos_to_bytepos (current_buffer, pos= ); + return make_fixnum (bytepos); +} + +DEFUN ("text-index--bytepos-to-charpos", Ftext_index__bytepos_to_charpos, + Stext_index__bytepos_to_charpos, 1, 1, 0, + doc: /* Convert BYTEPOS to a charpos in current buffer. +If BYTEPOS is out of range, the value is nil. */) + (Lisp_Object bytepos) +{ + CHECK_FIXNUM (bytepos); + const ptrdiff_t pos_byte =3D XFIXNUM (bytepos); + if (pos_byte < BEG_BYTE || pos_byte > Z_BYTE) + return Qnil; + ptrdiff_t charpos =3D text_index_bytepos_to_charpos (current_buffer, pos= _byte); + return make_fixnum (charpos); +} + +DEFUN ("text-index--charpos-to-bytepos-brute", + Ftext_index__charpos_to_bytepos_brute, + Stext_index__charpos_to_bytepos_brute, 1, 1, 0, + doc: /* Convert CHARPOS to a bytepos in current buffer. +Compute with brute force. */) + (Lisp_Object pos) +{ + const EMACS_INT to_charpos =3D fix_position (pos); + if (to_charpos < BEG || to_charpos > Z) + return Qnil; + ptrdiff_t charpos =3D BEG, bytepos =3D BEG_BYTE; + while (charpos < to_charpos) + { + ++bytepos; + if (CHAR_HEAD_P (FETCH_BYTE (bytepos))) + ++charpos; + } + return make_fixnum (bytepos); +} + +void +syms_of_text_index (void) +{ + defsubr (&Stext_index__charpos_to_bytepos); + defsubr (&Stext_index__bytepos_to_charpos); + defsubr (&Stext_index__charpos_to_bytepos_brute); +} + +void +init_text_index (void) +{ +} new file src/text-index.h @@ -0,0 +1,34 @@ +/* Text index for character positions. + Copyright (C) 2025 Free Software Foundation, Inc. + +This file is part of GNU Emacs. + +GNU Emacs is free software: you can redistribute it and/or modify +it under the terms of the GNU General Public License as published by +the Free Software Foundation, either version 3 of the License, or (at +your option) any later version. + +GNU Emacs is distributed in the hope that it will be useful, +but WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +GNU General Public License for more details. + +You should have received a copy of the GNU General Public License +along with GNU Emacs. If not, see . */ + +#ifndef EMACS_TEXT_INDEX_H +# define EMACS_TEXT_INDEX_H + +#include "config.h" +# include "lisp.h" + +struct text_index; + +void init_text_index (void); +void syms_of_text_index (void); +void text_index_free (struct text_index *ti); +ptrdiff_t text_index_bytepos_to_charpos (struct buffer *b, ptrdiff_t bytep= os); +ptrdiff_t text_index_charpos_to_bytepos (struct buffer *b, ptrdiff_t charp= os); +void text_index_invalidate (struct buffer *b, ptrdiff_t from_byte); + +#endif /* EMACS_TEXT_INDEX_H */ modified src/undo.c @@ -128,9 +128,9 @@ record_marker_adjustments (ptrdiff_t from, ptrdiff_t to) { prepare_record (); =20 - for (struct Lisp_Marker *m =3D BUF_MARKERS (current_buffer); m; m =3D m-= >next) + DO_MARKERS (current_buffer, m) { - ptrdiff_t charpos =3D m->charpos; + ptrdiff_t charpos =3D marker_vector_charpos (m); eassert (charpos <=3D Z); =20 if (from <=3D charpos && charpos <=3D to) @@ -154,6 +154,7 @@ record_marker_adjustments (ptrdiff_t from, ptrdiff_t to) } } } + END_DO_MARKERS; } =20 /* Record that a deletion is about to take place, of the characters in modified src/window.c @@ -599,8 +599,7 @@ select_window_1 (Lisp_Object window, bool inhibit_point= _swap) struct window *ow =3D XWINDOW (selected_window); if (BUFFERP (ow->contents)) set_marker_both (ow->pointm, ow->contents, - BUF_PT (XBUFFER (ow->contents)), - BUF_PT_BYTE (XBUFFER (ow->contents))); + BUF_PT (XBUFFER (ow->contents))); } =20 selected_window =3D window; @@ -1781,7 +1780,7 @@ window_point (struct window *w) { return (w =3D=3D XWINDOW (selected_window) ? BUF_PT (XBUFFER (w->contents)) - : XMARKER (w->pointm)->charpos); + : marker_vector_charpos (XMARKER (w->pointm))); } =20 DEFUN ("window-point", Fwindow_point, Swindow_point, 0, 1, 0, @@ -3689,7 +3688,7 @@ DEFUN ("delete-other-windows-internal", Fdelete_other= _windows_internal, can have unwanted side effects due to text properties. */ pos =3D *vmotion (startpos, startbyte, -top, w); =20 - set_marker_both (w->start, w->contents, pos.bufpos, pos.bytepos); + set_marker_both (w->start, w->contents, pos.bufpos); w->window_end_valid =3D false; w->start_at_line_beg =3D (pos.bytepos =3D=3D BEGV_BYTE || FETCH_BYTE (pos.bytepos - 1) =3D=3D '\n'); @@ -4345,8 +4344,8 @@ set_window_buffer (Lisp_Object window, Lisp_Object bu= ffer, w->hscroll =3D w->min_hscroll =3D w->hscroll_whole =3D 0; w->suspend_auto_hscroll =3D false; w->vscroll =3D 0; - set_marker_both (w->pointm, buffer, BUF_PT (b), BUF_PT_BYTE (b)); - set_marker_both (w->old_pointm, buffer, BUF_PT (b), BUF_PT_BYTE (b)); + set_marker_both (w->pointm, buffer, BUF_PT (b)); + set_marker_both (w->old_pointm, buffer, BUF_PT (b)); set_marker_restricted (w->start, make_fixnum (b->last_window_start), buffer); @@ -4532,9 +4531,9 @@ temp_output_buffer_show (register Lisp_Object buf) w =3D XWINDOW (window); w->hscroll =3D w->min_hscroll =3D w->hscroll_whole =3D 0; w->suspend_auto_hscroll =3D false; - set_marker_restricted_both (w->start, buf, BEG, BEG); - set_marker_restricted_both (w->pointm, buf, BEG, BEG); - set_marker_restricted_both (w->old_pointm, buf, BEG, BEG); + set_marker_restricted_both (w->start, buf, BEG); + set_marker_restricted_both (w->pointm, buf, BEG); + set_marker_restricted_both (w->old_pointm, buf, BEG); =20 /* Run temp-buffer-show-hook, with the chosen window selected and its buffer current. */ @@ -5398,11 +5397,11 @@ DEFUN ("split-window-internal", Fsplit_window_inter= nal, Ssplit_window_internal, /* Get dead window back its old buffer and markers. */ wset_buffer (n, n->old_buffer); set_marker_restricted - (n->start, make_fixnum (XMARKER (n->start)->charpos), n->contents); + (n->start, make_fixnum (marker_vector_charpos (XMARKER (n->start))), n->c= ontents); set_marker_restricted - (n->pointm, make_fixnum (XMARKER (n->pointm)->charpos), n->contents); + (n->pointm, make_fixnum (marker_vector_charpos (XMARKER (n->pointm))), n-= >contents); set_marker_restricted - (n->old_pointm, make_fixnum (XMARKER (n->old_pointm)->charpos), + (n->old_pointm, make_fixnum (marker_vector_charpos (XMARKER (n->old_point= m))), n->contents); =20 Vwindow_list =3D Qnil; @@ -6017,7 +6016,7 @@ window_scroll_for_long_lines (struct window *w, int n= , bool noerror) if (pos.bufpos < ZV) { set_marker_restricted_both (w->start, w->contents, - pos.bufpos, pos.bytepos); + pos.bufpos); w->start_at_line_beg =3D bolp; wset_update_mode_line (w); /* Set force_start so that redisplay_window will run @@ -6339,8 +6338,7 @@ window_scroll_pixel_based (Lisp_Object window, int n,= bool whole, bool noerror) } =20 /* Set the window start, and set up the window for redisplay. */ - set_marker_restricted_both (w->start, w->contents, IT_CHARPOS (it), - IT_BYTEPOS (it)); + set_marker_restricted_both (w->start, w->contents, IT_CHARPOS (it)); bytepos =3D marker_byte_position (w->start); w->start_at_line_beg =3D (pos =3D=3D BEGV || FETCH_BYTE (bytepos - 1= ) =3D=3D '\n'); wset_update_mode_line (w); @@ -6602,7 +6600,7 @@ window_scroll_line_based (Lisp_Object window, int n, = bool whole, bool noerror) { int this_scroll_margin =3D window_scroll_margin (w, MARGIN_IN_LINES); =20 - set_marker_restricted_both (w->start, w->contents, pos, pos_byte); + set_marker_restricted_both (w->start, w->contents, pos); w->start_at_line_beg =3D !NILP (bolp); wset_update_mode_line (w); /* Set force_start so that redisplay_window will run @@ -6755,8 +6753,8 @@ scroll_command (Lisp_Object window, Lisp_Object n, in= t direction) =20 if (other_window) { - set_marker_both (w->pointm, Qnil, PT, PT_BYTE); - set_marker_both (w->old_pointm, Qnil, PT, PT_BYTE); + set_marker_both (w->pointm, Qnil, PT); + set_marker_both (w->old_pointm, Qnil, PT); } =20 unbind_to (count, Qnil); @@ -7176,7 +7174,7 @@ DEFUN ("recenter", Frecenter, Srecenter, 0, 2, "P\np", } =20 /* Set the new window start. */ - set_marker_both (w->start, w->contents, charpos, bytepos); + set_marker_both (w->start, w->contents, charpos); =20 /* The window start was calculated with an iterator already adjusted by the existing vscroll, so w->start must not be combined with @@ -7269,7 +7267,7 @@ DEFUN ("move-to-window-line", Fmove_to_window_line, S= move_to_window_line, { int height =3D window_internal_height (w); Fvertical_motion (make_fixnum (- (height / 2)), window, Qnil); - set_marker_both (w->start, w->contents, PT, PT_BYTE); + set_marker_both (w->start, w->contents, PT); w->start_at_line_beg =3D !NILP (Fbolp ()); w->force_start =3D true; =20 @@ -7522,8 +7520,7 @@ DEFUN ("set-window-configuration", Fset_window_config= uration, w =3D XWINDOW (selected_window); set_marker_both (w->pointm, w->contents, - BUF_PT (XBUFFER (w->contents)), - BUF_PT_BYTE (XBUFFER (w->contents))); + BUF_PT (XBUFFER (w->contents))); } =20 fset_redisplay (f); @@ -7660,17 +7657,15 @@ DEFUN ("set-window-configuration", Fset_window_conf= iguration, { /* Set window markers at start of visible range. */ if (XMARKER (w->start)->buffer =3D=3D 0) - set_marker_restricted_both (w->start, w->contents, 0, 0); + set_marker_restricted_both (w->start, w->contents, 0); if (XMARKER (w->pointm)->buffer =3D=3D 0) set_marker_restricted_both (w->pointm, w->contents, - BUF_PT (XBUFFER (w->contents)), - BUF_PT_BYTE (XBUFFER (w->contents))); + BUF_PT (XBUFFER (w->contents))); if (XMARKER (w->old_pointm)->buffer =3D=3D 0) set_marker_restricted_both (w->old_pointm, w->contents, - BUF_PT (XBUFFER (w->contents)), - BUF_PT_BYTE (XBUFFER (w->contents))); + BUF_PT (XBUFFER (w->contents))); w->start_at_line_beg =3D true; if (FUNCTIONP (window_restore_killed_buffer_windows) && !MINI_WINDOW_P (w)) @@ -7691,9 +7686,9 @@ DEFUN ("set-window-configuration", Fset_window_config= uration, window_discard_buffer_from_window (w->contents, window, false); /* This will set the markers to beginning of visible range. */ - set_marker_restricted_both (w->start, w->contents, 0, 0); - set_marker_restricted_both (w->pointm, w->contents, 0, 0); - set_marker_restricted_both (w->old_pointm, w->contents, 0, 0); + set_marker_restricted_both (w->start, w->contents, 0); + set_marker_restricted_both (w->pointm, w->contents, 0); + set_marker_restricted_both (w->old_pointm, w->contents, 0); w->start_at_line_beg =3D true; if (!MINI_WINDOW_P (w)) { @@ -8052,8 +8047,7 @@ save_window_save (Lisp_Object window, struct Lisp_Vec= tor *vector, ptrdiff_t i) the buffer; pointm is garbage in the selected window. */ if (EQ (window, selected_window)) p->pointm =3D build_marker (XBUFFER (w->contents), - BUF_PT (XBUFFER (w->contents)), - BUF_PT_BYTE (XBUFFER (w->contents))); + BUF_PT (XBUFFER (w->contents))); else p->pointm =3D Fcopy_marker (w->pointm, Qnil); p->old_pointm =3D Fcopy_marker (w->old_pointm, Qnil); modified src/xdisp.c @@ -12004,8 +12004,8 @@ DEFUN ("buffer-text-pixel-size", Fbuffer_text_pixel= _size, Sbuffer_text_pixel_siz if (!EQ (buffer, w->contents)) { wset_buffer (w, buffer); - set_marker_both (w->pointm, buffer, BEG, BEG_BYTE); - set_marker_both (w->old_pointm, buffer, BEG, BEG_BYTE); + set_marker_both (w->pointm, buffer, BEG); + set_marker_both (w->old_pointm, buffer, BEG); } =20 value =3D window_text_pixel_size (window, Qnil, Qnil, x_limit, y_limit, = Qnil, @@ -12172,11 +12172,11 @@ message_dolog (const char *m, ptrdiff_t nbytes, b= ool nlflag, bool multibyte) bset_cache_long_scans (current_buffer, Qnil); =20 oldpoint =3D message_dolog_marker1; - set_marker_restricted_both (oldpoint, Qnil, PT, PT_BYTE); + set_marker_restricted_both (oldpoint, Qnil, PT); oldbegv =3D message_dolog_marker2; - set_marker_restricted_both (oldbegv, Qnil, BEGV, BEGV_BYTE); + set_marker_restricted_both (oldbegv, Qnil, BEGV); oldzv =3D message_dolog_marker3; - set_marker_restricted_both (oldzv, Qnil, ZV, ZV_BYTE); + set_marker_restricted_both (oldzv, Qnil, ZV); =20 if (PT =3D=3D Z) point_at_end =3D 1; @@ -12750,8 +12750,8 @@ with_echo_area_buffer (struct window *w, int which, if (w) { wset_buffer (w, buffer); - set_marker_both (w->pointm, buffer, BEG, BEG_BYTE); - set_marker_both (w->old_pointm, buffer, BEG, BEG_BYTE); + set_marker_both (w->pointm, buffer, BEG); + set_marker_both (w->old_pointm, buffer, BEG); } =20 bset_undo_list (current_buffer, Qt); @@ -12839,14 +12839,11 @@ unwind_with_echo_area_buffer (Lisp_Object vector) =20 wset_buffer (w, buffer); set_marker_restricted_both (w->pointm, buffer, - XFIXNAT (AREF (vector, 5)), - XFIXNAT (AREF (vector, 6))); + XFIXNAT (AREF (vector, 5))); set_marker_restricted_both (w->old_pointm, buffer, - XFIXNAT (AREF (vector, 7)), - XFIXNAT (AREF (vector, 8))); + XFIXNAT (AREF (vector, 7))); set_marker_restricted_both (w->start, buffer, - XFIXNAT (AREF (vector, 9)), - XFIXNAT (AREF (vector, 10))); + XFIXNAT (AREF (vector, 9))); } =20 Vwith_echo_area_save_vector =3D vector; @@ -13077,8 +13074,7 @@ resize_mini_window (struct window *w, bool exact_p) /* By default, start display at the beginning. */ if (redisplay_adhoc_scroll_in_resize_mini_windows) set_marker_both (w->start, w->contents, - BUF_BEGV (XBUFFER (w->contents)), - BUF_BEGV_BYTE (XBUFFER (w->contents))); + BUF_BEGV (XBUFFER (w->contents))); =20 /* Nil means don't try to resize. */ if ((NILP (Vresize_mini_windows) @@ -13733,7 +13729,7 @@ format_mode_line_unwind_data (struct frame *target_= frame, unwinding (Bug#32777). */ ASET (vector, 10, buffer); current_buffer =3D b; - ASET (vector, 11, build_marker (current_buffer, PT, PT_BYTE)); + ASET (vector, 11, build_marker (current_buffer, PT)); current_buffer =3D cb; } =20 @@ -20350,13 +20346,13 @@ redisplay_window (Lisp_Object window, bool just_t= his_one_p) { new_pt =3D BEGV; new_pt_byte =3D BEGV_BYTE; - set_marker_both (w->pointm, Qnil, BEGV, BEGV_BYTE); + set_marker_both (w->pointm, Qnil, BEGV); } else if (new_pt > (ZV - 1)) { new_pt =3D ZV; new_pt_byte =3D ZV_BYTE; - set_marker_both (w->pointm, Qnil, ZV, ZV_BYTE); + set_marker_both (w->pointm, Qnil, ZV); } =20 /* We don't use SET_PT so that the point-motion hooks don't run. */ @@ -20616,7 +20612,7 @@ redisplay_window (Lisp_Object window, bool just_thi= s_one_p) MATRIX_ROW_START_BYTEPOS (row)); =20 if (w !=3D XWINDOW (selected_window)) - set_marker_both (w->pointm, Qnil, PT, PT_BYTE); + set_marker_both (w->pointm, Qnil, PT); else if (current_buffer =3D=3D old) SET_TEXT_POS (lpoint, PT, PT_BYTE); =20 @@ -20985,7 +20981,7 @@ redisplay_window (Lisp_Object window, bool just_thi= s_one_p) /* Set the window start position here explicitly, to avoid an infinite loop in case the functions in window-scroll-functions get errors. */ - set_marker_both (w->start, Qnil, IT_CHARPOS (it), IT_BYTEPOS (it)); + set_marker_both (w->start, Qnil, IT_CHARPOS (it)); =20 /* Run scroll hooks. */ startp =3D run_window_scroll_functions (window, it.current.pos); @@ -21407,7 +21403,7 @@ try_window (Lisp_Object window, struct text_pos pos= , int flags) int cursor_vpos =3D w->cursor.vpos; =20 /* Make POS the new window start. */ - set_marker_both (w->start, Qnil, CHARPOS (pos), BYTEPOS (pos)); + set_marker_both (w->start, Qnil, CHARPOS (pos)); =20 /* Mark cursor position as unknown. No overlay arrow seen. */ w->cursor.vpos =3D -1; new file test/src/text-index-tests.el @@ -0,0 +1,95 @@ +;;; text-index-tests.el --- tests for src/text-index.c -*- lexical-bindin= g:t -*- + +;; Copyright (C) 2025 Free Software Foundation, Inc. + +;; This file is part of GNU Emacs. + +;; GNU Emacs is free software: you can redistribute it and/or modify +;; it under the terms of the GNU General Public License as published by +;; the Free Software Foundation, either version 3 of the License, or +;; (at your option) any later version. + +;; GNU Emacs is distributed in the hope that it will be useful, +;; but WITHOUT ANY WARRANTY; without even the implied warranty of +;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +;; GNU General Public License for more details. + +;; You should have received a copy of the GNU General Public License +;; along with GNU Emacs. If not, see . + +;;; Commentary: + +;;; Code: + +(require 'ert) +(require 'ert-x) +(require 'cl-lib) + +;; Note that the tests can be quite expensive on large files because of +;; the calls to 'text-index--charpos-to-bytepos-brute' which always +;; scans the text from the beginning. + +(defvar text-index-small-test-files + (list (expand-file-name "HELLO" data-directory) + (expand-file-name "src/buffer.c" source-directory))) + +(defvar text-index-big-test-files + (list (expand-file-name "src/buffer.c" source-directory))) + +(defvar text-index-all-test-files + (append text-index-small-test-files + text-index-big-test-files)) + +(cl-defun test-check-charpos (charpos) + (let* ((real-bytepos (text-index--charpos-to-bytepos-brute charpos)) + (index-bytepos (text-index--charpos-to-bytepos charpos)) + (index-charpos (and index-bytepos + (text-index--bytepos-to-charpos real-bytepos)= ))) + (cond ((not (eq real-bytepos index-bytepos)) + (message "Different bytepos at charpos %d (real %S index %S)" + charpos real-bytepos index-bytepos) + nil) + ((and index-charpos + (not (eq index-charpos charpos))) + (message "Different charpos at bytepos %S (charpos %S index %S)" + real-bytepos charpos index-charpos) + nil) + (t t)))) + +(cl-defmacro text-index-with-buffer ((file) &rest body) + (declare (indent 1)) + `(ert-with-test-buffer () + (insert-file-contents ,file) + (progn ,@body))) + +(ert-deftest text-index-test-forward () + (cl-loop for file in text-index-small-test-files do + (text-index-with-buffer (file) + (cl-loop for charpos from (point-min) below (point-max) do + (should (test-check-charpos charpos)))))) + +(ert-deftest text-index-test-backward () + (cl-loop for file in text-index-small-test-files do + (text-index-with-buffer (file) + (cl-loop for charpos from (point-max) downto (point-min) do + (should (test-check-charpos charpos)))))) + +(ert-deftest text-index-test-random-charpos () + (cl-loop for file in text-index-all-test-files do + (text-index-with-buffer (file) + (cl-loop repeat 10000 do + (should (test-check-charpos (random (point-max))))))= )) + +(defvar text-index-test-strings + ["1" "=F0=9F=92=A1" "=F0=9F=92=A13" "=F0=9F=92=A1=F0=9F=92=A1" "=F0=9F= =92=A1=F0=9F=92=A15"]) + +(ert-deftest text-index-test-random-insert () + (cl-loop for file in text-index-all-test-files do + (text-index-with-buffer (file) + (cl-loop repeat 10000 + for charpos =3D (1+ (random (buffer-size))) + for string =3D (aref text-index-test-strings + (random (length text-index-test-s= trings))) + do (progn (goto-char charpos) + (insert string) + (test-check-charpos (random (point-max)))))))) [back] --=-=-=--