From debbugs-submit-bounces@debbugs.gnu.org Wed Jan 22 15:12:18 2025 Received: (at submit) by debbugs.gnu.org; 22 Jan 2025 20:12:18 +0000 Received: from localhost ([127.0.0.1]:36709 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1tah5F-0006X4-Rv for submit@debbugs.gnu.org; Wed, 22 Jan 2025 15:12:18 -0500 Received: from lists.gnu.org ([2001:470:142::17]:42226) by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.84_2) (envelope-from ) id 1tah5C-0006Wh-Gn for submit@debbugs.gnu.org; Wed, 22 Jan 2025 15:12:15 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1tah56-00063Z-L1 for bug-gnu-emacs@gnu.org; Wed, 22 Jan 2025 15:12:08 -0500 Received: from mail-4322.protonmail.ch ([185.70.43.22]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1tah54-0008Se-8b for bug-gnu-emacs@gnu.org; Wed, 22 Jan 2025 15:12:08 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=protonmail.com; s=protonmail3; t=1737576722; x=1737835922; bh=LfzfwOAZTCEbOasxe+eLyMyt4vYRiEccJrndzc95KYU=; h=Date:To:From:Subject:Message-ID:Feedback-ID:From:To:Cc:Date: Subject:Reply-To:Feedback-ID:Message-ID:BIMI-Selector: List-Unsubscribe:List-Unsubscribe-Post; b=Uj+8TQp4uLjD3es/trk4B6QPr+CZdQX/mSa4sDNtVK1zMsOBzcbccXQD54Z5/e7Kl F1qZg1fIbZ2m+GLIX+9I/ugirMnPqA48rBON28aYqCBXv5L7OBVzf33B+Yu2nXopyS mOIiXgcx+MtPyq3dyGD7JTpqkIZoq4dR5N3+SY4qN31kT1vWyucren+5fSPmgGXBki povAM7jZDGSuUHjMCgRvwPLsKXyvfU7IsVBfKcEW/NbjmyW5X9HZurX67FfHwVO8ii fFvOpll/Uv+rj6ipG3H0yarlZlxsVVdzOt49FZ3JKgwyHMUZH+WCJxO603kCRpo7ez 1nTMf3d/EZFtQ== Date: Wed, 22 Jan 2025 20:11:58 +0000 To: bug-gnu-emacs@gnu.org From: Pip Cet Subject: Missing assertions to detect access to GC-freed strings Message-ID: <87tt9q8v7e.fsf@protonmail.com> Feedback-ID: 112775352:user:proton X-Pm-Message-ID: d239b0609c7fbf8a5a52553ccb3f288bd59a245e MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Received-SPF: pass client-ip=185.70.43.22; envelope-from=pipcet@protonmail.com; helo=mail-4322.protonmail.ch X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_FROM=0.001, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H2=-0.043, RCVD_IN_VALIDITY_CERTIFIED_BLOCKED=0.001, RCVD_IN_VALIDITY_RPBL_BLOCKED=0.001, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-Spam-Score: 1.0 (+) X-Debbugs-Envelope-To: submit X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.0 (/) This is a spin-off from bug#75754. I produced an example program to demonstrate the correctness bug described therein: (setq print-unreadable-function (lambda (&rest args) (garbage-collect) (make-string 100 ?=CE=B8))) (message "%S" (format "%S %S" [1] (symbol-function '+))) This program sometimes completes without throwing an error or causing a segfault on the master branch, but produces incorrect output: the correct output starts with "[1] "; the incorrect output omits the "[1]". This is because the freed string looks like this: (gdb) p info[0].argument $7 =3D XIL(0x555555fdcea4) (gdb) pp $ # (gdb) p XSTRING($7) $8 =3D (struct Lisp_String *) 0x555555fdcea0 (gdb) p *$ $9 =3D { u =3D { s =3D { size =3D 0, size_byte =3D -1, intervals =3D 0x0, data =3D 0x0 }, next =3D 0x0, gcaligned =3D 0 '\000' } } This is obviously invalid; as Emacs strings include a trailing NUL byte, their data pointer must never be NULL. However, as the size field is 0 (this is invalid for all but the pair of empty strings), we never attempt to dereference the NULL pointer, and that means we do not abort or crash soon enough. This needs to be fixed. GC bugs usually lead to segmentation faults and crashes; producing incorrect output and silently corrupting user data is much, much worse. I'm proposing to add checks, at least temporarily, to the --enable-checking builds: these checks would ensure that a string contains a non-NULL data field if it is accessed in any way. Even with this patch, we would fail to recognize impossible strings automatically if even a single bit in the data pointer is set. I don't have a better solution yet, though, but the patch (to follow as soon as this has a bug number) would have detected this particular problem a little sooner. From debbugs-submit-bounces@debbugs.gnu.org Thu Jan 23 12:42:30 2025 Received: (at 75768) by debbugs.gnu.org; 23 Jan 2025 17:42:30 +0000 Received: from localhost ([127.0.0.1]:42198 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1tb1Dq-0003AV-9n for submit@debbugs.gnu.org; Thu, 23 Jan 2025 12:42:30 -0500 Received: from mail-4322.protonmail.ch ([185.70.43.22]:17153) by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.84_2) (envelope-from ) id 1tb1Do-0003AC-3T for 75768@debbugs.gnu.org; Thu, 23 Jan 2025 12:42:29 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=protonmail.com; s=protonmail3; t=1737654141; x=1737913341; bh=Yct1bGMqNdGiLIu0ccAByk3T2LHJiV5CpjhE0/QYmZs=; h=Date:To:From:Subject:Message-ID:Feedback-ID:From:To:Cc:Date: Subject:Reply-To:Feedback-ID:Message-ID:BIMI-Selector: List-Unsubscribe:List-Unsubscribe-Post; b=u2if33msUQynkcLc+c5EjilHyU+TKHYC1m8ld61TdoNY3KUwqiizO/pgXgZgepxU6 Avgk4aSu3UZqQfCHRgXL/gdv4TizC9AiU1eL/2DGFgtkE7dL4BeA8B96XVi6N/mMRh xxsc95e8QqeSrE9wpcOkYeZGIk4wjVttjN20WK5TP3bgfiveSrJlR+QIeWChIwv0ck VAOeBiZRAp2GuLnux132riWkvhR8pdwEymES8MJECkSweFSyM9hfTAJkFQbA53VaIn HPCZOxcfG0GJV85XlqnmaxneMl0j82qjGOhsqoif0JXmNVIIrdpa/PWAwWD8bceBFE DR3Tbah/rSzRg== Date: Thu, 23 Jan 2025 17:42:16 +0000 To: 75768@debbugs.gnu.org From: Pip Cet Subject: Re: Missing assertions to detect access to GC-freed strings Message-ID: <87ikq54ec1.fsf@protonmail.com> Feedback-ID: 112775352:user:proton X-Pm-Message-ID: 214d12005aa683d5bf04eb812f3c71054826152f MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spam-Score: -0.0 (/) X-Debbugs-Envelope-To: 75768 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-) Pip Cet writes: > Even with this patch, we would fail to recognize impossible strings > automatically if even a single bit in the data pointer is set. I don't > have a better solution yet, though, but the patch (to follow as soon as > this has a bug number) would have detected this particular problem a > little sooner. Patch follows, I've confirmed that it would have detected this problem very slightly sooner. >From 5ddd7c3cb24fe1d0b2f53a44500f11db69a09831 Mon Sep 17 00:00:00 2001 From: Pip Cet Subject: [PATCH] Add debug assertions that we never access a free'd string (bug#75768) * src/lisp.h (LIVE_STRING_P): New function. (SDATA): (SCHARS): (STRING_BYTES): (string_immovable_p): Use it in assertions. --- src/lisp.h | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/src/lisp.h b/src/lisp.h index 28fa4c8037e..498aa2b98af 100644 --- a/src/lisp.h +++ b/src/lisp.h @@ -1617,6 +1617,13 @@ STRINGP (Lisp_Object x) return TAGGEDP (x, Lisp_String); } =20 +/* /\* Should always return true. */ */ +INLINE bool +LIVE_STRING_P (Lisp_Object string) +{ + return XSTRING (string)->u.s.data !=3D NULL; +} + INLINE void CHECK_STRING (Lisp_Object x) { @@ -1685,6 +1692,7 @@ STRING_SET_MULTIBYTE (Lisp_Object str) INLINE unsigned char * SDATA (Lisp_Object string) { + eassert (LIVE_STRING_P (string)); return XSTRING (string)->u.s.data; } INLINE char * @@ -1706,6 +1714,7 @@ SSET (Lisp_Object string, ptrdiff_t index, unsigned c= har new) INLINE ptrdiff_t SCHARS (Lisp_Object string) { + eassert (LIVE_STRING_P (string)); ptrdiff_t nchars =3D XSTRING (string)->u.s.size; eassume (0 <=3D nchars); return nchars; @@ -1717,6 +1726,7 @@ SCHARS (Lisp_Object string) INLINE ptrdiff_t STRING_BYTES (struct Lisp_String *s) { + eassert (LIVE_STRING_P (make_lisp_ptr (s, Lisp_String))); #ifdef GC_CHECK_STRING_BYTES ptrdiff_t nbytes =3D string_bytes (s); #else @@ -1753,6 +1763,7 @@ CHECK_STRING_NULL_BYTES (Lisp_Object string) INLINE bool string_immovable_p (Lisp_Object str) { + eassert (LIVE_STRING_P (str)); return XSTRING (str)->u.s.size_byte =3D=3D -3; } =20 --=20 2.47.1 From debbugs-submit-bounces@debbugs.gnu.org Thu Jan 23 12:45:09 2025 Received: (at 75768) by debbugs.gnu.org; 23 Jan 2025 17:45:09 +0000 Received: from localhost ([127.0.0.1]:42203 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1tb1GO-0003Fb-PH for submit@debbugs.gnu.org; Thu, 23 Jan 2025 12:45:09 -0500 Received: from mail-10631.protonmail.ch ([79.135.106.31]:51145) by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.84_2) (envelope-from ) id 1tb1GL-0003Ei-JF for 75768@debbugs.gnu.org; Thu, 23 Jan 2025 12:45:06 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=protonmail.com; s=protonmail3; t=1737654298; x=1737913498; bh=QqLJ5LGAo7ENPBN70y7hhaVdAG5sF1NOu8bu/chNwmI=; h=Date:To:From:Subject:Message-ID:In-Reply-To:References: Feedback-ID:From:To:Cc:Date:Subject:Reply-To:Feedback-ID: Message-ID:BIMI-Selector:List-Unsubscribe:List-Unsubscribe-Post; b=Xc3hLbEHVHVaimQC4H8kTa5rqeIyXDKGsnCyBIf3/MpQgb3ZxmV5X7o7beYMMt5I1 tzTi7D3kNE0tpmT5HyeNQBKHACjI5O0+NDt2CfPXoBzrY8EohpVizElUJJrhG1RZsr RbxsufV/JIQHc7EMRW9mfKh0U2As8jrRGoL95jHqP18GKmSlNGqo1ltkPkZTxOdpio riDcpVxLyCraBgTSKs3/RnoHkJXjWG2KcrBJOhP4lBzND52hST8x1yg1Xf6rt4+coF yk2qT6avEF9QxRIGI7bdWhSIO5z+suFiaPK9Wpr9c7OFgHZXBn0MPb+TsICVKUsY9f +8HCC8BV/sEzQ== Date: Thu, 23 Jan 2025 17:44:54 +0000 To: 75768@debbugs.gnu.org From: Pip Cet Subject: Re: Missing assertions to detect access to GC-freed strings Message-ID: <87cygd4e7n.fsf@protonmail.com> In-Reply-To: <87ikq54ec1.fsf@protonmail.com> References: <87ikq54ec1.fsf@protonmail.com> Feedback-ID: 112775352:user:proton X-Pm-Message-ID: f2849b56c884773c107ecdefc2018b6d47ef89f3 MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spam-Score: 0.0 (/) X-Debbugs-Envelope-To: 75768 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-) "Pip Cet" writes: > Pip Cet writes: > >> Even with this patch, we would fail to recognize impossible strings >> automatically if even a single bit in the data pointer is set. I don't >> have a better solution yet, though, but the patch (to follow as soon as >> this has a bug number) would have detected this particular problem a >> little sooner. > > Patch follows, I've confirmed that it would have detected this problem Sorry, wrong patch! >From 5018bdbf3729351f5273c7eecee3b67131aa2859 Mon Sep 17 00:00:00 2001 From: Pip Cet Subject: [PATCH] Additional checks for access to freed strings (bug#75768) * src/lisp.h (LIVE_STRING_P): New function. (SDATA): (SCHARS): (STRING_BYTES): (string_immovable_p): Use it in assertions. --- src/lisp.h | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/src/lisp.h b/src/lisp.h index 28fa4c8037e..678f1ba0460 100644 --- a/src/lisp.h +++ b/src/lisp.h @@ -1672,6 +1672,13 @@ STRING_SET_MULTIBYTE (Lisp_Object str) XSTRING (str)->u.s.size_byte =3D XSTRING (str)->u.s.size; } =20 +/* Should always return true. */ +INLINE bool +LIVE_STRING_P (Lisp_Object string) +{ + return XSTRING (string)->u.s.data !=3D NULL; +} + /* Convenience functions for dealing with Lisp strings. */ =20 /* WARNING: Use the 'char *' pointers to string data with care in code @@ -1685,6 +1692,7 @@ STRING_SET_MULTIBYTE (Lisp_Object str) INLINE unsigned char * SDATA (Lisp_Object string) { + eassert (LIVE_STRING_P (string)); return XSTRING (string)->u.s.data; } INLINE char * @@ -1706,6 +1714,7 @@ SSET (Lisp_Object string, ptrdiff_t index, unsigned c= har new) INLINE ptrdiff_t SCHARS (Lisp_Object string) { + eassert (LIVE_STRING_P (string)); ptrdiff_t nchars =3D XSTRING (string)->u.s.size; eassume (0 <=3D nchars); return nchars; @@ -1717,6 +1726,7 @@ SCHARS (Lisp_Object string) INLINE ptrdiff_t STRING_BYTES (struct Lisp_String *s) { + eassert (LIVE_STRING_P (make_lisp_ptr (s, Lisp_String))); #ifdef GC_CHECK_STRING_BYTES ptrdiff_t nbytes =3D string_bytes (s); #else @@ -1753,6 +1763,7 @@ CHECK_STRING_NULL_BYTES (Lisp_Object string) INLINE bool string_immovable_p (Lisp_Object str) { + eassert (LIVE_STRING_P (str)); return XSTRING (str)->u.s.size_byte =3D=3D -3; } =20 --=20 2.47.1 From debbugs-submit-bounces@debbugs.gnu.org Thu Jan 23 13:17:23 2025 Received: (at 75768) by debbugs.gnu.org; 23 Jan 2025 18:17:23 +0000 Received: from localhost ([127.0.0.1]:42308 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1tb1la-0004uM-Jq for submit@debbugs.gnu.org; Thu, 23 Jan 2025 13:17:23 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]:52124) by debbugs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.84_2) (envelope-from ) id 1tb1lY-0004u0-8X for 75768@debbugs.gnu.org; Thu, 23 Jan 2025 13:17:20 -0500 Received: from fencepost.gnu.org ([2001:470:142:3::e]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1tb1lS-0000Kv-94; Thu, 23 Jan 2025 13:17:14 -0500 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org; s=fencepost-gnu-org; h=References:Subject:In-Reply-To:To:From:Date: mime-version; bh=LFqsxeqSxUaNBFRKh9HIPFNifvsy4R2+5Ft72fvHXzc=; b=Gf4v8AJDabGC pEgNdFIHSRvhDavxa7jYRx3Um7HxQo6dgFKX3Rg8ZFrfznlYsljH5JdpClQkvjSvNwNiPMRoUKqc/ SoBSoMSfNbuHCkDgSAdb+dyXpWejhppNtUMqDD4YOqlbO3VkLfQZ2PIdZwx6MaS4RQeNMwcjGuxZZ OctFNGXKnswwt8yW1RwKT/SLCiGXxRzOHqjwhIxelL35gNbujerSX0ABXOn+QkaOtDb7SVtDnIPNC IVmLgX/kkRr4jyoFAkiB0K7Nk0FCBfNhCJIr+WEbA52ln7GPZ+IQ9DyL3Tw+3mpoJZhoEh5N0QAt0 ufVbLDDqygkKw1H+4OoAIw==; Date: Thu, 23 Jan 2025 20:16:46 +0200 Message-Id: <86frl9z981.fsf@gnu.org> From: Eli Zaretskii To: Pip Cet In-Reply-To: <87ikq54ec1.fsf@protonmail.com> (bug-gnu-emacs@gnu.org) Subject: Re: bug#75768: Missing assertions to detect access to GC-freed strings References: <87tt9q8v7e.fsf@protonmail.com> <87ikq54ec1.fsf@protonmail.com> X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 75768 Cc: 75768@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) > Date: Thu, 23 Jan 2025 17:42:16 +0000 > From: Pip Cet via "Bug reports for GNU Emacs, > the Swiss army knife of text editors" > > Pip Cet writes: > > > Even with this patch, we would fail to recognize impossible strings > > automatically if even a single bit in the data pointer is set. I don't > > have a better solution yet, though, but the patch (to follow as soon as > > this has a bug number) would have detected this particular problem a > > little sooner. > > Patch follows, I've confirmed that it would have detected this problem > very slightly sooner. Thanks. > +/* /\* Should always return true. */ */ I wonder why did you comment out the comment? > +INLINE bool > +LIVE_STRING_P (Lisp_Object string) > +{ > + return XSTRING (string)->u.s.data != NULL; Can we please have a comment here saying that sweep_strings sets the data pointer to NULL, and thus this function checks whether a string was GC'ed? > INLINE ptrdiff_t > SCHARS (Lisp_Object string) > { > + eassert (LIVE_STRING_P (string)); > ptrdiff_t nchars = XSTRING (string)->u.s.size; > eassume (0 <= nchars); > return nchars; > @@ -1717,6 +1726,7 @@ SCHARS (Lisp_Object string) > INLINE ptrdiff_t > STRING_BYTES (struct Lisp_String *s) > { > + eassert (LIVE_STRING_P (make_lisp_ptr (s, Lisp_String))); I understand why we want an assertion in SDATA and SSDATA, but do we really want it in SCHARS and STRING_BYTES? Those do not access the string text, so how can they be harmful?