GNU bug report logs - #77325
Crash in Fjson_parse_buffer: ZV changes underneath it?

Previous Next

Package: emacs;

Reported by: Daniel Colascione <dancol <at> dancol.org>

Date: Fri, 28 Mar 2025 01:08:02 UTC

Severity: normal

To reply to this bug, email your comments to 77325 AT debbugs.gnu.org.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#77325; Package emacs. (Fri, 28 Mar 2025 01:08:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Daniel Colascione <dancol <at> dancol.org>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Fri, 28 Mar 2025 01:08:03 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Daniel Colascione <dancol <at> dancol.org>
To: bug-gnu-emacs <at> gnu.org
Subject: Crash in Fjson_parse_buffer: ZV changes underneath it?
Date: Thu, 27 Mar 2025 21:07:02 -0400
Somehow, the buffer changes underneath json_parse.  We pass an
out-of-bounds position to SET_PT_BOTH (position, byte), which either
asserts or crashes.  Not sure how the buffer could have changed ---
maybe a handler-bind?  The JSON parser doesn't seem to do anything
except allocate and signal.

The buffer itself is plenty big enough --- it's just that the accessible
region has somehow shrunk to nothing.

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 2.1
    frame #0: 0x000000010026d7a0 emacs`die(msg="charpos <= ZV && charpos >= BEGV", file="intervals.c", line=1873) at alloc.c:7450:12
    frame #1: 0x000000010037e3dc emacs`set_point_both(charpos=8, bytepos=8) at intervals.c:1873:3
    frame #2: 0x00000001003baf1c emacs`SET_PT_BOTH(position=8, byte=8) at buffer.h:182:3
  * frame #3: 0x00000001003bbadc emacs`Fjson_parse_buffer(nargs=6, args=(struct Lisp_Symbol *) $4 = 0x0000000270365e38) at json.c:1779:3
    frame #4: 0x00000001002bcd64 emacs`eval_sub(form=(struct Lisp_Cons
    *) $116 = 0x000000010831a440) at eval.c:2564:10


-> 1873	 eassert (charpos <= ZV && charpos >= BEGV);
   1874	
   1875	 have_overlays = buffer_has_overlays ();
   1876	
(lldb) print charpos
(ptrdiff_t) 8


(lldb) print *current_thread->m_current_buffer
(buffer) {
  header = (size = 4611686018645684300)
  name_ = 0x000000011ed08374 (struct Lisp_String *) $120 = 0x000000011ed08370
  last_name_ = 0x000000011ed08374 (struct Lisp_String *) $120 = 0x000000011ed08370
  filename_ = NULL
  directory_ = 0x0000000103708724 (struct Lisp_String *) $124 = 0x0000000103708720
  backed_up_ = NULL
  save_length_ = 0x0000000000000002 (EMACS_INT) $125 = 0
  auto_save_file_name_ = NULL
  read_only_ = NULL
  mark_ = 0x000000011e0113e5 (struct Lisp_Marker *) $128 = 0x000000011e0113e0
  local_var_alist_ = 0x000000012078c0b3 (struct Lisp_Cons *) $129 = 0x000000012078c0b0
  major_mode_ = 0x000000001f379d28 (struct Lisp_Symbol *) $131 = 0x000000011f8e8bc0
  local_minor_modes_ = 0x000000013042cff3 (struct Lisp_Cons *) $133 = 0x000000013042cff0
  mode_name_ = 0x000000013ee94fe4 (struct Lisp_String *) $134 = 0x000000013ee94fe0
  mode_line_format_ = 0x0000000110141db3 (struct Lisp_Cons *) $135 = 0x0000000110141db0
  header_line_format_ = NULL
  tab_line_format_ = NULL
  keymap_ = 0x00000001205abc33 (struct Lisp_Cons *) $136 = 0x00000001205abc30
  abbrev_table_ = 0x000000013f116a2d (struct Lisp_Obarray *) $139 = 0x000000013f116a28
  syntax_table_ = 0x000000013f116805 (struct Lisp_Vector *) $140 = 0x000000013f116800
  category_table_ = 0x000000010269d35d (struct Lisp_Vector *) $142 = 0x000000010269d358
  tab_width_ = 0x0000000000000022 (EMACS_INT) $143 = 8
  fill_column_ = 0x000000000000011a (EMACS_INT) $144 = 70
  left_margin_ = 0x0000000000000002 (EMACS_INT) $125 = 0
  auto_fill_function_ = NULL
  downcase_table_ = 0x000000010266a17d (struct Lisp_Vector *) $145 = 0x000000010266a178
  upcase_table_ = 0x00000001026590dd (struct Lisp_Vector *) $146 = 0x00000001026590d8
  case_canon_table_ = 0x000000010267b45d (struct Lisp_Vector *) $147 = 0x000000010267b458
  case_eqv_table_ = 0x000000010266a3bd (struct Lisp_Vector *) $148 = 0x000000010266a3b8
  truncate_lines_ = 0x0000000000000030 (struct Lisp_Symbol *) $150 = 0x000000010056eec8
  word_wrap_ = NULL
  ctl_arrow_ = 0x0000000000000030 (struct Lisp_Symbol *) $150 = 0x000000010056eec8
  bidi_display_reordering_ = 0x0000000000000030 (struct Lisp_Symbol *) $150 = 0x000000010056eec8
  bidi_paragraph_direction_ = NULL
  bidi_paragraph_separate_re_ = NULL
  bidi_paragraph_start_re_ = NULL
  selective_display_ = NULL
  selective_display_ellipses_ = 0x0000000000000030 (struct Lisp_Symbol *) $150 = 0x000000010056eec8
  overwrite_mode_ = NULL
  abbrev_mode_ = NULL
  display_table_ = NULL
  mark_active_ = NULL
  enable_multibyte_characters_ = 0x0000000000000030 (struct Lisp_Symbol *) $150 = 0x000000010056eec8
  buffer_file_coding_system_ = 0x00000000000127e0 (struct Lisp_Symbol *) $153 = 0x0000000100581678
  file_format_ = NULL
  auto_save_file_format_ = 0x0000000000000030 (struct Lisp_Symbol *) $150 = 0x000000010056eec8
  cache_long_scans_ = 0x0000000000000030 (struct Lisp_Symbol *) $150 = 0x000000010056eec8
  width_table_ = NULL
  pt_marker_ = NULL
  begv_marker_ = NULL
  zv_marker_ = NULL
  point_before_scroll_ = NULL
  file_truename_ = NULL
  invisibility_spec_ = 0x0000000000000030 (struct Lisp_Symbol *) $150 = 0x000000010056eec8
  last_selected_window_ = 0x0000000148490c15 (struct window *) $157 = 0x0000000148490c10
  display_count_ = 0x0000000000000006 (EMACS_INT) $159 = 1
  left_margin_cols_ = 0x0000000000000002 (EMACS_INT) $125 = 0
  right_margin_cols_ = 0x0000000000000002 (EMACS_INT) $125 = 0
  left_fringe_width_ = NULL
  right_fringe_width_ = NULL
  fringes_outside_margins_ = NULL
  scroll_bar_width_ = NULL
  scroll_bar_height_ = NULL
  vertical_scroll_bar_type_ = 0x0000000000000030 (struct Lisp_Symbol *) $150 = 0x000000010056eec8
  horizontal_scroll_bar_type_ = 0x0000000000000030 (struct Lisp_Symbol *) $150 = 0x000000010056eec8
  indicate_empty_lines_ = NULL
  indicate_buffer_boundaries_ = NULL
  fringe_indicator_alist_ = 0x0000000102656e8b (struct Lisp_Cons *) $160 = 0x0000000102656e88
  fringe_cursor_alist_ = 0x00000001026560db (struct Lisp_Cons *) $161 = 0x00000001026560d8
  display_time_ = 0x0000000110c15e73 (struct Lisp_Cons *) $162 = 0x0000000110c15e70
  scroll_up_aggressively_ = NULL
  scroll_down_aggressively_ = NULL
  cursor_type_ = NULL
  extra_line_spacing_ = NULL
  ts_parser_list_ = NULL
  text_conversion_style_ = NULL
  cursor_in_non_selected_windows_ = 0x0000000000009ba0 (struct Lisp_Symbol *) $164 = 0x0000000100578a38
  own_text = {
    beg = 0x0000000130088000 ""
    gpt = 1
    z = 74465
    gpt_byte = 1
    z_byte = 76476
    gap_size = 60247
    modiff = 15338
    chars_modiff = 15338
    save_modiff = 1
    overlay_modiff = 757
    compact = 1
    beg_unchanged = 0
    end_unchanged = 0
    unchanged_modified = 4374
    overlay_unchanged_modified = 755
    intervals = 0x000000011f38caa8
    markers = 0x000000011ffa4288
    inhibit_shrinking = false
    redisplay = true
  }
  text = 0x000000011e011268
  pt = 1
  pt_byte = 1
  begv = 1
  begv_byte = 1
  zv = 1
  zv_byte = 1
  base_buffer = NULL
  indirections = 0
  window_count = 1
  local_flags = "\0\0\0\0\0\0\0\0\U00000001\0\0\0\0\U00000001\0\U00000001\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\U00000001"
  modtime = (tv_sec = 0, tv_nsec = -2)
  modtime_size = -1
  auto_save_modified = 0
  display_error_modiff = 0
  auto_save_failure_time = 0
  last_window_start = 1
  newline_cache = 0x0000600000666300
  width_run_cache = NULL
  bidi_paragraph_cache = 0x000060000067afc0
  prevent_redisplay_optimizations_p = true
  clip_changed = true
  inhibit_buffer_hooks = false
  long_line_optimizations_p = false
  overlays = 0x00006000013c2a40
  undo_list_ = 0x0000000108985be3 (struct Lisp_Cons *) $166 =
  0x0000000108985be0


(lldb) print p.point_of_current_line 
(ptrdiff_t) 1
(lldb) print p.current_column 
(ptrdiff_t) 6

input_begin = 0x0000000130096b57 "\n     6 pass\n     620 skip\n [...]
input_current = 0x0000000130096b5e " pass\n     620 skip\n

The actual JSON we're parsing appears to be mangled somehow --- raw
newlines embedded in the output instead of being encapsulated inside
a string --- but that's a separate bug.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#77325; Package emacs. (Fri, 28 Mar 2025 02:44:02 GMT) Full text and rfc822 format available.

Message #8 received at 77325 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> protonmail.com>
To: Daniel Colascione <dancol <at> dancol.org>
Cc: 77325 <at> debbugs.gnu.org
Subject: Re: bug#77325: Crash in Fjson_parse_buffer: ZV changes underneath it?
Date: Fri, 28 Mar 2025 02:43:36 +0000
"Daniel Colascione" <dancol <at> dancol.org> writes:

> Somehow, the buffer changes underneath json_parse.  We pass an

Do we know that the buffer changed after we entered json-parse-buffer?
It looks to me like the buffer was narrowed to nothing before we called
json-parse-buffer, like this:

(with-temp-buffer
  (insert "3")
  (narrow-to-region (point-min) (point-min))
  (message "%S" (json-parse-buffer)))

json.c proceeds to read past ZV, all the way to Z, then hits the
assertion just as it did for you, so this code currently causes a crash.

Do you still see the crash if you change json-parse-buffer to honor
buffer narrowing, like this?

From 073c00135e6f0e213fc8671fc0a52a67ee5b56ce Mon Sep 17 00:00:00 2001
From: Pip Cet <pipcet <at> protonmail.com>
Subject: [PATCH] Respect narrowed buffers when parsing JSON (bug#77325)

* src/json.c (Fjson_parse_buffer): Only read to ZV, not all the way to
Z.
---
 src/json.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/src/json.c b/src/json.c
index f438d191bde..a0480718ca8 100644
--- a/src/json.c
+++ b/src/json.c
@@ -1757,12 +1757,16 @@ DEFUN ("json-parse-buffer", Fjson_parse_buffer, Sjson_parse_buffer,
   unsigned char *end = GPT_ADDR;
   unsigned char *secondary_begin = NULL;
   unsigned char *secondary_end = NULL;
-  if (GPT_ADDR < Z_ADDR)
+  if (GPT_ADDR < ZV_ADDR)
     {
       secondary_begin = GAP_END_ADDR;
       if (secondary_begin < PT_ADDR)
 	secondary_begin = PT_ADDR;
-      secondary_end = Z_ADDR;
+      secondary_end = ZV_ADDR;
+    }
+  else if (ZV_ADDR < GPT_ADDR)
+    {
+      end = ZV_ADDR;
     }
 
   json_parser_init (&p, conf, begin, end, secondary_begin,
-- 
2.48.1

> input_begin = 0x0000000130096b57 "\n     6 pass\n     620 skip\n [...]
> input_current = 0x0000000130096b5e " pass\n     620 skip\n
>
> The actual JSON we're parsing appears to be mangled somehow --- raw
> newlines embedded in the output instead of being encapsulated inside
> a string --- but that's a separate bug.

Certainly doesn't look like JSON, but maybe that's why it's outside the
accessible region?

Pip





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#77325; Package emacs. (Fri, 28 Mar 2025 11:06:02 GMT) Full text and rfc822 format available.

Message #11 received at 77325 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Daniel Colascione <dancol <at> dancol.org>
Cc: 77325 <at> debbugs.gnu.org
Subject: Re: bug#77325: Crash in Fjson_parse_buffer: ZV changes underneath it?
Date: Fri, 28 Mar 2025 14:05:29 +0300
> From: Daniel Colascione <dancol <at> dancol.org>
> Date: Thu, 27 Mar 2025 21:07:02 -0400
> 
> 
> Somehow, the buffer changes underneath json_parse.  We pass an
> out-of-bounds position to SET_PT_BOTH (position, byte), which either
> asserts or crashes.  Not sure how the buffer could have changed ---
> maybe a handler-bind?  The JSON parser doesn't seem to do anything
> except allocate and signal.

Can you post a recipe for reproducing this?

>   own_text = {
>     beg = 0x0000000130088000 ""
>     gpt = 1
>     z = 74465
>     gpt_byte = 1
>     z_byte = 76476
>     gap_size = 60247
>     modiff = 15338
>     chars_modiff = 15338
>     save_modiff = 1
>     overlay_modiff = 757
>     compact = 1
>     beg_unchanged = 0
>     end_unchanged = 0
>     unchanged_modified = 4374
>     overlay_unchanged_modified = 755
>     intervals = 0x000000011f38caa8
>     markers = 0x000000011ffa4288
>     inhibit_shrinking = false
>     redisplay = true
>   }
>   text = 0x000000011e011268
>   pt = 1
>   pt_byte = 1
>   begv = 1
>   begv_byte = 1
>   zv = 1
>   zv_byte = 1

This seems to tell that the buffer is narrowed to an empty region.
Does that make sense in the scenario where you had this problem?

> (lldb) print p.point_of_current_line 
> (ptrdiff_t) 1
> (lldb) print p.current_column 
> (ptrdiff_t) 6
> 
> input_begin = 0x0000000130096b57 "\n     6 pass\n     620 skip\n [...]
> input_current = 0x0000000130096b5e " pass\n     620 skip\n

Give BEGV and ZV, this seems to mean we are accessing beyond the
restriction, which should never happen.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#77325; Package emacs. (Fri, 28 Mar 2025 14:13:02 GMT) Full text and rfc822 format available.

Message #14 received at 77325 <at> debbugs.gnu.org (full text, mbox):

From: Daniel Colascione <dancol <at> dancol.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: Pip Cet <pipcet <at> protonmail.com>, 77325 <at> debbugs.gnu.org
Subject: Re: bug#77325: Crash in Fjson_parse_buffer: ZV changes underneath it?
Date: Fri, 28 Mar 2025 10:11:58 -0400
Eli Zaretskii <eliz <at> gnu.org> writes:

>> From: Daniel Colascione <dancol <at> dancol.org>
>> Date: Thu, 27 Mar 2025 21:07:02 -0400
>> 
>> 
>> Somehow, the buffer changes underneath json_parse.  We pass an
>> out-of-bounds position to SET_PT_BOTH (position, byte), which either
>> asserts or crashes.  Not sure how the buffer could have changed ---
>> maybe a handler-bind?  The JSON parser doesn't seem to do anything
>> except allocate and signal.
>
> Can you post a recipe for reproducing this?

Didn't have a good repro.  Pip's fix works though.  I was barking up
the wrong tree: I'm parsing JSON out of a process buffer in a loop and
dispatching commands as they come in. One of these commands switched the
buffer, so in the next iteration of the loop, I started parsing JSON out
of some other random buffer.  It just so happened that other buffer was
narrowed, so we crashed.  I'll let Pip do the honors of checking in the
fix if he wants.

I initially thought a GC finalizer might have been switching the buffer,
but turns out GC doesn't actually run for me while parsing.

IGC does GC all the time --- but it's not observable because we pump
messages from the GC only at dedicated points and run GC hooks only in
response to these messages. however, notice that on the IGC branch that
we pump GC messages, including finalizer callbacks, on the allocation
path for, e.g. various pseudovectors.  That'll cause Lisp to run where
it wouldn't have before. Is that going to be a problem?  ISTM we can
either pump messages in maybe_quit() or just rely on igc_on_idle().





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#77325; Package emacs. (Fri, 28 Mar 2025 15:06:01 GMT) Full text and rfc822 format available.

Message #17 received at 77325 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> protonmail.com>
To: Daniel Colascione <dancol <at> dancol.org>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 77325 <at> debbugs.gnu.org
Subject: Re: bug#77325: Crash in Fjson_parse_buffer: ZV changes underneath it?
Date: Fri, 28 Mar 2025 15:05:22 +0000
"Daniel Colascione" <dancol <at> dancol.org> writes:

> Eli Zaretskii <eliz <at> gnu.org> writes:
>
>>> From: Daniel Colascione <dancol <at> dancol.org>
>>> Date: Thu, 27 Mar 2025 21:07:02 -0400
>>>
>>>
>>> Somehow, the buffer changes underneath json_parse.  We pass an
>>> out-of-bounds position to SET_PT_BOTH (position, byte), which either
>>> asserts or crashes.  Not sure how the buffer could have changed ---
>>> maybe a handler-bind?  The JSON parser doesn't seem to do anything
>>> except allocate and signal.
>>
>> Can you post a recipe for reproducing this?
>
> Didn't have a good repro.  Pip's fix works though.  I was barking up
> the wrong tree: I'm parsing JSON out of a process buffer in a loop and
> dispatching commands as they come in. One of these commands switched the
> buffer, so in the next iteration of the loop, I started parsing JSON out
> of some other random buffer.  It just so happened that other buffer was
> narrowed, so we crashed.  I'll let Pip do the honors of checking in the
> fix if he wants.

Eli, is that okay?  I'll simplify the else branch, which has an
unnecessary "else if" in the original patch.

> I initially thought a GC finalizer might have been switching the buffer,
> but turns out GC doesn't actually run for me while parsing.

json.c assumes no GC on the master branch, because it doesn't protect
its object workspace (and possibly for other reasons).

> IGC does GC all the time --- but it's not observable because we pump
> messages from the GC only at dedicated points and run GC hooks only in
> response to these messages. however, notice that on the IGC branch that
> we pump GC messages, including finalizer callbacks, on the allocation
> path for, e.g. various pseudovectors.  That'll cause Lisp to run where
> it wouldn't have before. Is that going to be a problem?  ISTM we can
> either pump messages in maybe_quit() or just rely on igc_on_idle().

Oh, I forgot about that one!  I'll open a new bug so this gets a
number: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=77338

Pip





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#77325; Package emacs. (Fri, 28 Mar 2025 16:23:02 GMT) Full text and rfc822 format available.

Message #20 received at 77325 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Daniel Colascione <dancol <at> dancol.org>
Cc: pipcet <at> protonmail.com, 77325 <at> debbugs.gnu.org
Subject: Re: bug#77325: Crash in Fjson_parse_buffer: ZV changes underneath it?
Date: Fri, 28 Mar 2025 19:21:57 +0300
> From: Daniel Colascione <dancol <at> dancol.org>
> Cc: 77325 <at> debbugs.gnu.org, Pip Cet <pipcet <at> protonmail.com>
> Date: Fri, 28 Mar 2025 10:11:58 -0400
> 
> Eli Zaretskii <eliz <at> gnu.org> writes:
> 
> Didn't have a good repro.  Pip's fix works though.  I was barking up
> the wrong tree: I'm parsing JSON out of a process buffer in a loop and
> dispatching commands as they come in. One of these commands switched the
> buffer, so in the next iteration of the loop, I started parsing JSON out
> of some other random buffer.  It just so happened that other buffer was
> narrowed, so we crashed.  I'll let Pip do the honors of checking in the
> fix if he wants.

I think we should simply replace each BEG with BEGV and each Z with
ZV.  Emacs should never look outside of the current restriction.

> IGC does GC all the time --- but it's not observable because we pump
> messages from the GC only at dedicated points and run GC hooks only in
> response to these messages. however, notice that on the IGC branch that
> we pump GC messages, including finalizer callbacks, on the allocation
> path for, e.g. various pseudovectors.

You mean, we'll run Lisp as part of displaying messages?  Or what do
you mean by "pump GC messages"?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#77325; Package emacs. (Fri, 28 Mar 2025 16:31:01 GMT) Full text and rfc822 format available.

Message #23 received at 77325 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Pip Cet <pipcet <at> protonmail.com>
Cc: dancol <at> dancol.org, 77325 <at> debbugs.gnu.org
Subject: Re: bug#77325: Crash in Fjson_parse_buffer: ZV changes underneath it?
Date: Fri, 28 Mar 2025 19:29:14 +0300
> Date: Fri, 28 Mar 2025 15:05:22 +0000
> From: Pip Cet <pipcet <at> protonmail.com>
> Cc: Eli Zaretskii <eliz <at> gnu.org>, 77325 <at> debbugs.gnu.org
> 
> "Daniel Colascione" <dancol <at> dancol.org> writes:
> 
> > Didn't have a good repro.  Pip's fix works though.  I was barking up
> > the wrong tree: I'm parsing JSON out of a process buffer in a loop and
> > dispatching commands as they come in. One of these commands switched the
> > buffer, so in the next iteration of the loop, I started parsing JSON out
> > of some other random buffer.  It just so happened that other buffer was
> > narrowed, so we crashed.  I'll let Pip do the honors of checking in the
> > fix if he wants.
> 
> Eli, is that okay?  I'll simplify the else branch, which has an
> unnecessary "else if" in the original patch.

Can we discuss why you don't simply replace Z with ZV and BEG with
BEGV?  I'm not sure I understand some parts of the change you
proposed.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#77325; Package emacs. (Fri, 28 Mar 2025 17:01:02 GMT) Full text and rfc822 format available.

Message #26 received at 77325 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> protonmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: dancol <at> dancol.org, 77325 <at> debbugs.gnu.org
Subject: Re: bug#77325: Crash in Fjson_parse_buffer: ZV changes underneath it?
Date: Fri, 28 Mar 2025 17:00:24 +0000
"Eli Zaretskii" <eliz <at> gnu.org> writes:

>> Date: Fri, 28 Mar 2025 15:05:22 +0000
>> From: Pip Cet <pipcet <at> protonmail.com>
>> Cc: Eli Zaretskii <eliz <at> gnu.org>, 77325 <at> debbugs.gnu.org
>>
>> "Daniel Colascione" <dancol <at> dancol.org> writes:
>>
>> > Didn't have a good repro.  Pip's fix works though.  I was barking up
>> > the wrong tree: I'm parsing JSON out of a process buffer in a loop and
>> > dispatching commands as they come in. One of these commands switched the
>> > buffer, so in the next iteration of the loop, I started parsing JSON out
>> > of some other random buffer.  It just so happened that other buffer was
>> > narrowed, so we crashed.  I'll let Pip do the honors of checking in the
>> > fix if he wants.
>>
>> Eli, is that okay?  I'll simplify the else branch, which has an
>> unnecessary "else if" in the original patch.
>
> Can we discuss why you don't simply replace Z with ZV and BEG with
> BEGV?  I'm not sure I understand some parts of the change you
> proposed.

Because the code assumes GPT <= Z, and GPT <= ZV isn't always true.

Pip





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#77325; Package emacs. (Sat, 29 Mar 2025 07:16:01 GMT) Full text and rfc822 format available.

Message #29 received at 77325 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Pip Cet <pipcet <at> protonmail.com>
Cc: dancol <at> dancol.org, 77325 <at> debbugs.gnu.org
Subject: Re: bug#77325: Crash in Fjson_parse_buffer: ZV changes underneath it?
Date: Sat, 29 Mar 2025 10:14:51 +0300
> Date: Fri, 28 Mar 2025 17:00:24 +0000
> From: Pip Cet <pipcet <at> protonmail.com>
> Cc: dancol <at> dancol.org, 77325 <at> debbugs.gnu.org
> 
> "Eli Zaretskii" <eliz <at> gnu.org> writes:
> 
> >> Date: Fri, 28 Mar 2025 15:05:22 +0000
> >> From: Pip Cet <pipcet <at> protonmail.com>
> >> Cc: Eli Zaretskii <eliz <at> gnu.org>, 77325 <at> debbugs.gnu.org
> >>
> >> "Daniel Colascione" <dancol <at> dancol.org> writes:
> >>
> >> > Didn't have a good repro.  Pip's fix works though.  I was barking up
> >> > the wrong tree: I'm parsing JSON out of a process buffer in a loop and
> >> > dispatching commands as they come in. One of these commands switched the
> >> > buffer, so in the next iteration of the loop, I started parsing JSON out
> >> > of some other random buffer.  It just so happened that other buffer was
> >> > narrowed, so we crashed.  I'll let Pip do the honors of checking in the
> >> > fix if he wants.
> >>
> >> Eli, is that okay?  I'll simplify the else branch, which has an
> >> unnecessary "else if" in the original patch.
> >
> > Can we discuss why you don't simply replace Z with ZV and BEG with
> > BEGV?  I'm not sure I understand some parts of the change you
> > proposed.
> 
> Because the code assumes GPT <= Z, and GPT <= ZV isn't always true.

Sorry, I don't understand: if the gap is beyond ZV, then there's no
"secondary" region for json.c's purposes, which AFAIU is the only
thing json-parse-buffer needs to know.  In addition, the value of
'end' should be limited to not exceed ZV_ADDR.  Or what am I missing?

IOW, why does json-parse-buffer ignore the restriction?  No other
primitive does, with rare exceptions that are explicitly documented.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#77325; Package emacs. (Sat, 29 Mar 2025 11:55:02 GMT) Full text and rfc822 format available.

Message #32 received at 77325 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> protonmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: dancol <at> dancol.org, 77325 <at> debbugs.gnu.org
Subject: Re: bug#77325: Crash in Fjson_parse_buffer: ZV changes underneath it?
Date: Sat, 29 Mar 2025 11:53:49 +0000
"Eli Zaretskii" <eliz <at> gnu.org> writes:

>> Date: Fri, 28 Mar 2025 17:00:24 +0000
>> From: Pip Cet <pipcet <at> protonmail.com>
>> Cc: dancol <at> dancol.org, 77325 <at> debbugs.gnu.org
>>
>> "Eli Zaretskii" <eliz <at> gnu.org> writes:
>>
>> >> Date: Fri, 28 Mar 2025 15:05:22 +0000
>> >> From: Pip Cet <pipcet <at> protonmail.com>
>> >> Cc: Eli Zaretskii <eliz <at> gnu.org>, 77325 <at> debbugs.gnu.org
>> >>
>> >> "Daniel Colascione" <dancol <at> dancol.org> writes:
>> >>
>> >> > Didn't have a good repro.  Pip's fix works though.  I was barking up
>> >> > the wrong tree: I'm parsing JSON out of a process buffer in a loop and
>> >> > dispatching commands as they come in. One of these commands switched the
>> >> > buffer, so in the next iteration of the loop, I started parsing JSON out
>> >> > of some other random buffer.  It just so happened that other buffer was
>> >> > narrowed, so we crashed.  I'll let Pip do the honors of checking in the
>> >> > fix if he wants.
>> >>
>> >> Eli, is that okay?  I'll simplify the else branch, which has an
>> >> unnecessary "else if" in the original patch.
>> >
>> > Can we discuss why you don't simply replace Z with ZV and BEG with
>> > BEGV?  I'm not sure I understand some parts of the change you
>> > proposed.
>>
>> Because the code assumes GPT <= Z, and GPT <= ZV isn't always true.
>
> Sorry, I don't understand: if the gap is beyond ZV, then there's no
> "secondary" region for json.c's purposes, which AFAIU is the only
> thing json-parse-buffer needs to know.

It's about the primary selection, not the secondary one.

The code currently reads:

  unsigned char *begin = PT_ADDR;
  unsigned char *end = GPT_ADDR;
  unsigned char *secondary_begin = NULL;
  unsigned char *secondary_end = NULL;
  if (GPT_ADDR < Z_ADDR)
    {
      secondary_begin = GAP_END_ADDR;
      if (secondary_begin < PT_ADDR)
	secondary_begin = PT_ADDR;
      secondary_end = Z_ADDR;
    }

Simply replacing Z_ADDR by ZV_ADDR would still set up the primary region
to be [PT, GPT].  If GPT > ZV, that would mean that the primary region
extends beyond ZV, which would mean we parse buffer text that should be
inaccessible.

So, in this case, we need to limit the primary region to end at ZV_ADDR.
That's what my patch does.

The code for the secondary region is correct, if unnecessary because
sending up a paradoxical [GPT, ZV] range if ZV < GPT wouldn't hurt.

> In addition, the value of 'end' should be limited to not exceed
> ZV_ADDR.  Or what am I missing?

That's what my patch does, yes.

> IOW, why does json-parse-buffer ignore the restriction?  No other
> primitive does, with rare exceptions that are explicitly documented.

I assumed it was an accident, and that's why my patch changes it to
respect the restriction.

Pip





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#77325; Package emacs. (Sat, 29 Mar 2025 12:39:02 GMT) Full text and rfc822 format available.

Message #35 received at 77325 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Pip Cet <pipcet <at> protonmail.com>
Cc: dancol <at> dancol.org, 77325 <at> debbugs.gnu.org
Subject: Re: bug#77325: Crash in Fjson_parse_buffer: ZV changes underneath it?
Date: Sat, 29 Mar 2025 15:38:42 +0300
> Date: Sat, 29 Mar 2025 11:53:49 +0000
> From: Pip Cet <pipcet <at> protonmail.com>
> Cc: dancol <at> dancol.org, 77325 <at> debbugs.gnu.org
> 
> "Eli Zaretskii" <eliz <at> gnu.org> writes:
> 
> Simply replacing Z_ADDR by ZV_ADDR would still set up the primary region
> to be [PT, GPT].  If GPT > ZV, that would mean that the primary region
> extends beyond ZV, which would mean we parse buffer text that should be
> inaccessible.
> 
> So, in this case, we need to limit the primary region to end at ZV_ADDR.
> That's what my patch does.
> 
> The code for the secondary region is correct, if unnecessary because
> sending up a paradoxical [GPT, ZV] range if ZV < GPT wouldn't hurt.
> 
> > In addition, the value of 'end' should be limited to not exceed
> > ZV_ADDR.  Or what am I missing?
> 
> That's what my patch does, yes.
> 
> > IOW, why does json-parse-buffer ignore the restriction?  No other
> > primitive does, with rare exceptions that are explicitly documented.
> 
> I assumed it was an accident, and that's why my patch changes it to
> respect the restriction.

Sorry, I've misread your patch.  It's fine (but please don't use
braces for a 1-line block).




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#77325; Package emacs. (Sat, 29 Mar 2025 15:38:01 GMT) Full text and rfc822 format available.

Message #38 received at 77325 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> protonmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: dancol <at> dancol.org, 77325 <at> debbugs.gnu.org
Subject: Re: bug#77325: Crash in Fjson_parse_buffer: ZV changes underneath it?
Date: Sat, 29 Mar 2025 15:37:17 +0000
"Eli Zaretskii" <eliz <at> gnu.org> writes:

>> Date: Sat, 29 Mar 2025 11:53:49 +0000
>> From: Pip Cet <pipcet <at> protonmail.com>
>> Cc: dancol <at> dancol.org, 77325 <at> debbugs.gnu.org
>>
>> "Eli Zaretskii" <eliz <at> gnu.org> writes:
>>
>> Simply replacing Z_ADDR by ZV_ADDR would still set up the primary region
>> to be [PT, GPT].  If GPT > ZV, that would mean that the primary region
>> extends beyond ZV, which would mean we parse buffer text that should be
>> inaccessible.
>>
>> So, in this case, we need to limit the primary region to end at ZV_ADDR.
>> That's what my patch does.
>>
>> The code for the secondary region is correct, if unnecessary because
>> sending up a paradoxical [GPT, ZV] range if ZV < GPT wouldn't hurt.
>>
>> > In addition, the value of 'end' should be limited to not exceed
>> > ZV_ADDR.  Or what am I missing?
>>
>> That's what my patch does, yes.
>>
>> > IOW, why does json-parse-buffer ignore the restriction?  No other
>> > primitive does, with rare exceptions that are explicitly documented.
>>
>> I assumed it was an accident, and that's why my patch changes it to
>> respect the restriction.
>
> Sorry, I've misread your patch.

No problem at all.

The code does seem a bit complicated for what it's trying to achieve, to
be honest. I think it'd be clearer just to write:

  unsigned char *begin = PT_ADDR;
  unsigned char *end = min (GPT_ADDR, ZV_ADDR);
  unsigned char *secondary_begin = min (GAP_END_ADDR, ZV_ADDR));
  unsigned char *secondary_end = ZV_ADDR;

  json_parser_init (&p, conf, begin, end, secondary_begin,
		    secondary_end);

json_parser_init fixes up secondary_begin and secondary_end to be NULL
pointers in this case.

("min (GAP_END_ADDR, ZV_ADDR)" could also be replaced by just
"GAP_END_ADDR", since json_parser_init fixes paradoxical ranges itself,
but I think we should avoid ever creating those where possible.)

But if that's not okay, let me know and I'll push the other patch.

Pip





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#77325; Package emacs. (Sat, 29 Mar 2025 15:48:02 GMT) Full text and rfc822 format available.

Message #41 received at 77325 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Pip Cet <pipcet <at> protonmail.com>
Cc: dancol <at> dancol.org, 77325 <at> debbugs.gnu.org
Subject: Re: bug#77325: Crash in Fjson_parse_buffer: ZV changes underneath it?
Date: Sat, 29 Mar 2025 18:47:07 +0300
> Date: Sat, 29 Mar 2025 15:37:17 +0000
> From: Pip Cet <pipcet <at> protonmail.com>
> Cc: dancol <at> dancol.org, 77325 <at> debbugs.gnu.org
> 
> The code does seem a bit complicated for what it's trying to achieve, to
> be honest. I think it'd be clearer just to write:
> 
>   unsigned char *begin = PT_ADDR;
>   unsigned char *end = min (GPT_ADDR, ZV_ADDR);
>   unsigned char *secondary_begin = min (GAP_END_ADDR, ZV_ADDR));
>   unsigned char *secondary_end = ZV_ADDR;
> 
>   json_parser_init (&p, conf, begin, end, secondary_begin,
> 		    secondary_end);
> 
> json_parser_init fixes up secondary_begin and secondary_end to be NULL
> pointers in this case.

I'd prefer the secondary_* values to be set NULL explicitly, otherwise
it is hard to understand what the code does without looking deep into
the subroutines of json_parse.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#77325; Package emacs. (Sun, 30 Mar 2025 10:41:02 GMT) Full text and rfc822 format available.

Message #44 received at 77325 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> protonmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: dancol <at> dancol.org, 77325 <at> debbugs.gnu.org
Subject: Re: bug#77325: Crash in Fjson_parse_buffer: ZV changes underneath it?
Date: Sun, 30 Mar 2025 10:40:42 +0000
"Eli Zaretskii" <eliz <at> gnu.org> writes:

>> Date: Sat, 29 Mar 2025 15:37:17 +0000
>> From: Pip Cet <pipcet <at> protonmail.com>
>> Cc: dancol <at> dancol.org, 77325 <at> debbugs.gnu.org
>>
>> The code does seem a bit complicated for what it's trying to achieve, to
>> be honest. I think it'd be clearer just to write:
>>
>>   unsigned char *begin = PT_ADDR;
>>   unsigned char *end = min (GPT_ADDR, ZV_ADDR);
>>   unsigned char *secondary_begin = min (GAP_END_ADDR, ZV_ADDR));
>>   unsigned char *secondary_end = ZV_ADDR;
>>
>>   json_parser_init (&p, conf, begin, end, secondary_begin,
>> 		    secondary_end);
>>
>> json_parser_init fixes up secondary_begin and secondary_end to be NULL
>> pointers in this case.
>
> I'd prefer the secondary_* values to be set NULL explicitly, otherwise
> it is hard to understand what the code does without looking deep into
> the subroutines of json_parse.

Sorry, it took me a while.  I was surprised by how tricky this is, and I
admit I was confused briefly because ZV == GPT doesn't make ZV_ADDR ==
GPT_ADDR.  I can see that most places in the code manipulate byte
positions, not pointers, and I think I understand the reason now :-)

From dd30d60a7992da0cef4e09a8a7e8989249dab4f1 Mon Sep 17 00:00:00 2001
From: Pip Cet <pipcet <at> protonmail.com>
Subject: [PATCH] Respect narrowed buffers when parsing JSON (bug#77325)

* src/json.c (Fjson_insert): Simplify 'memcpy' argument.
(Fjson_parse_buffer): Only read to ZV, not all the way to Z.
* test/src/json-tests.el (with-all-gap-positions-in-temp-buffer):
New macro.
(json-parse-buffer/restricted): New test.
---
 src/json.c             | 13 +++++++------
 test/src/json-tests.el | 26 ++++++++++++++++++++++++++
 2 files changed, 33 insertions(+), 6 deletions(-)

diff --git a/src/json.c b/src/json.c
index f438d191bde..5795c582ce0 100644
--- a/src/json.c
+++ b/src/json.c
@@ -641,7 +641,7 @@ DEFUN ("json-insert", Fjson_insert, Sjson_insert, 1, MANY,
   move_gap_both (PT, PT_BYTE);
   if (GAP_SIZE < jo.size)
     make_gap (jo.size - GAP_SIZE);
-  memcpy ((char *) BEG_ADDR + PT_BYTE - BEG_BYTE, jo.buf, jo.size);
+  memcpy (GPT_ADDR, jo.buf, jo.size);
 
   /* No need to keep allocation beyond this point.  */
   unbind_to (count, Qnil);
@@ -1754,15 +1754,16 @@ DEFUN ("json-parse-buffer", Fjson_parse_buffer, Sjson_parse_buffer,
 
   struct json_parser p;
   unsigned char *begin = PT_ADDR;
-  unsigned char *end = GPT_ADDR;
+  unsigned char *end = (GPT == ZV) ? GPT_ADDR : ZV_ADDR;
   unsigned char *secondary_begin = NULL;
   unsigned char *secondary_end = NULL;
-  if (GPT_ADDR < Z_ADDR)
+  if (PT == ZV)
+    begin = end = NULL;
+  else if (GPT > PT && GPT < ZV && GAP_SIZE > 0)
     {
+      end = GPT_ADDR;
       secondary_begin = GAP_END_ADDR;
-      if (secondary_begin < PT_ADDR)
-	secondary_begin = PT_ADDR;
-      secondary_end = Z_ADDR;
+      secondary_end = ZV_ADDR;
     }
 
   json_parser_init (&p, conf, begin, end, secondary_begin,
diff --git a/test/src/json-tests.el b/test/src/json-tests.el
index 94b6cfcffca..1cb667ddeac 100644
--- a/test/src/json-tests.el
+++ b/test/src/json-tests.el
@@ -315,6 +315,32 @@ json-parse-buffer/trailing
     (should-not (bobp))
     (should (looking-at-p (rx " [456]" eos)))))
 
+(defmacro with-all-gap-positions-in-temp-buffer (string &rest body)
+  "Create a temporary buffer containing STRING, and evaluate BODY
+with each possible gap position.
+See also `with-temp-buffer'."
+  `(with-temp-buffer
+     (insert ,string)
+     (dotimes (i (- (point-max) (point-min)))
+       (goto-char (- (point-max) i))
+       (insert "X")
+       (delete-region (1- (point)) (point))
+       ,@body)))
+
+(ert-deftest json-parse-buffer/restricted ()
+  (with-all-gap-positions-in-temp-buffer
+   "[123] [456] [789]"
+   (pcase-dolist (`((,beg . ,end) ,result)
+                  '(((7 . 12) [456])
+                    ((1 . 6) [123])
+                    ((13 . 18) [789])))
+     (goto-char beg)
+     (narrow-to-region beg end)
+     (should (equal (json-parse-buffer) result))
+     (should (= (point) end))
+     (should-error (json-parse-buffer) :type 'json-end-of-file)
+     (widen))))
+
 (ert-deftest json-parse-with-custom-null-and-false-objects ()
   (let* ((input
           "{ \"abc\" : [9, false] , \"def\" : null }")
-- 
2.48.1






Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#77325; Package emacs. (Sun, 30 Mar 2025 11:46:02 GMT) Full text and rfc822 format available.

Message #47 received at 77325 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Pip Cet <pipcet <at> protonmail.com>
Cc: dancol <at> dancol.org, 77325 <at> debbugs.gnu.org
Subject: Re: bug#77325: Crash in Fjson_parse_buffer: ZV changes underneath it?
Date: Sun, 30 Mar 2025 14:45:20 +0300
> Date: Sun, 30 Mar 2025 10:40:42 +0000
> From: Pip Cet <pipcet <at> protonmail.com>
> Cc: dancol <at> dancol.org, 77325 <at> debbugs.gnu.org
> 
> "Eli Zaretskii" <eliz <at> gnu.org> writes:
> 
> > I'd prefer the secondary_* values to be set NULL explicitly, otherwise
> > it is hard to understand what the code does without looking deep into
> > the subroutines of json_parse.
> 
> Sorry, it took me a while.  I was surprised by how tricky this is, and I
> admit I was confused briefly because ZV == GPT doesn't make ZV_ADDR ==
> GPT_ADDR.  I can see that most places in the code manipulate byte
> positions, not pointers, and I think I understand the reason now :-)

Yes, it's a bit tricky.

> >From dd30d60a7992da0cef4e09a8a7e8989249dab4f1 Mon Sep 17 00:00:00 2001
> From: Pip Cet <pipcet <at> protonmail.com>
> Subject: [PATCH] Respect narrowed buffers when parsing JSON (bug#77325)
> 
> * src/json.c (Fjson_insert): Simplify 'memcpy' argument.
> (Fjson_parse_buffer): Only read to ZV, not all the way to Z.
> * test/src/json-tests.el (with-all-gap-positions-in-temp-buffer):
> New macro.
> (json-parse-buffer/restricted): New test.

Thanks, LGTM.




This bug report was last modified 78 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.