From debbugs-submit-bounces@debbugs.gnu.org Tue May 16 06:57:57 2023
Date: Tue, 16 May 2023 10:57:40 +0000
From: Alan Mackenzie
To: bug-gnu-emacs@gnu.org
Subject: Master branch: Error in forw_comment (syntax.c) handling of escaped LFs

Hello, Emacs.

In the master branch:

Consider the following C++ Mode buffer:

// comment \
comment line 2
line_3();

.. The backslash at the end of line 1 extends the comment into line 2.

Put point at // on L1, and do:

  M-: (setq s (parse-partial-sexp (point) (+ (point) 9)))

.. s gets the parse state of the inside of the comment.

Now put point at EOL 1, between the backslash and the LF.  Do

  M-: (parse-partial-sexp (point) (point-max) nil nil s 'syntax-table)

.. This ought to leave point at BOL 2, since the syntax before the LF at
EOL 1 is that of a C++ comment, otherwise neutral.  Instead, it leaves
point wrongly at BOL 3.

#########################################################################

The reason for this bug is at L+42 of forw_comment (in syntax.c).  There
we have

          && !(comment_end_can_be_escaped && char_quoted (from, from_byte))

.. Checking char_quoted is wrong.  Instead the function should check the
current parse state.

-- 
Alan Mackenzie (Nuremberg, Germany).

From debbugs-submit-bounces@debbugs.gnu.org Tue May 16 10:03:30 2023
Date: Tue, 16 May 2023 14:03:18 +0000
From: Alan Mackenzie
To: 63535@debbugs.gnu.org
Cc: acm@muc.de
Subject: Re: bug#63535: Master branch: Error in forw_comment (syntax.c) handling of escaped LFs

On Tue, May 16, 2023 at 10:57:40 +0000, Alan Mackenzie wrote:
> Hello, Emacs.

> In the master branch:

> Consider the following C++ Mode buffer:

> // comment \
> comment line 2
> line_3();

> .. The backslash at the end of line 1 extends the comment into line 2.

> Put point at // on L1, and do:

>   M-: (setq s (parse-partial-sexp (point) (+ (point) 9)))

> .. s gets the parse state of the inside of the comment.

> Now put point at EOL 1, between the backslash and the LF.  Do

>   M-: (parse-partial-sexp (point) (point-max) nil nil s 'syntax-table)

> .. This ought to leave point at BOL 2, since the syntax before the LF at
> EOL 1 is that of a C++ comment, otherwise neutral.  Instead, it leaves
> point wrongly at BOL 3.

> #########################################################################

> The reason for this bug is at L+42 of forw_comment (in syntax.c).  There
> we have

>           && !(comment_end_can_be_escaped && char_quoted (from, from_byte))

> .. Checking char_quoted is wrong.  Instead the function should check the
> current parse state.

And here is a patch which fixes it.  I will apply this patch to master
soon if I don't hear any objection.

diff --git a/src/syntax.c b/src/syntax.c
index e9e04e2d638..76d9f16e4ed 100644
--- a/src/syntax.c
+++ b/src/syntax.c
@@ -2344,7 +2344,9 @@ forw_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop,
 	  && SYNTAX_FLAGS_COMMENT_STYLE (syntax, 0) == style
 	  && (SYNTAX_FLAGS_COMMENT_NESTED (syntax) ?
 	      (nesting > 0 && --nesting == 0) : nesting < 0)
-          && !(comment_end_can_be_escaped && char_quoted (from, from_byte)))
+          && !(comment_end_can_be_escaped &&
+               (((prev_syntax & 0xff) == Sescape)
+                || ((prev_syntax & 0xff) == Scharquote))))
 	/* We have encountered a comment end of the same style
 	   as the comment sequence which began this comment
 	   section.  */

-- 
Alan Mackenzie (Nuremberg, Germany).

From debbugs-submit-bounces@debbugs.gnu.org Tue May 16 11:44:01 2023
Date: Tue, 16 May 2023 18:43:59 +0300
Message-Id: <83ttwcz3gg.fsf@gnu.org>
From: Eli Zaretskii
To: Alan Mackenzie
Cc: 63535@debbugs.gnu.org
In-Reply-To: (message from Alan Mackenzie on Tue, 16 May 2023 10:57:40 +0000)
Subject: Re: bug#63535: Master branch: Error in forw_comment (syntax.c) handling of escaped LFs

> Date: Tue, 16 May 2023 10:57:40 +0000
> From: Alan Mackenzie
>
> Hello, Emacs.
>
> In the master branch:

Is it different on emacs-29?

>           && !(comment_end_can_be_escaped && char_quoted (from, from_byte))
>
> .. Checking char_quoted is wrong.  Instead the function should check the
> current parse state.

Why not both?  IOW, please explain why char_quoted is not TRT here.

From debbugs-submit-bounces@debbugs.gnu.org Tue May 16 12:15:33 2023
Date: Tue, 16 May 2023 16:15:24 +0000
From: Alan Mackenzie
To: Eli Zaretskii
Cc: 63535@debbugs.gnu.org
In-Reply-To: <83ttwcz3gg.fsf@gnu.org>
References: <83ttwcz3gg.fsf@gnu.org>
Subject: Re: bug#63535: Master branch: Error in forw_comment (syntax.c) handling of escaped LFs

Hello, Eli.

On Tue, May 16, 2023 at 18:43:59 +0300, Eli Zaretskii wrote:
> > Date: Tue, 16 May 2023 10:57:40 +0000
> > From: Alan Mackenzie

> > Hello, Emacs.

> > In the master branch:

> Is it different on emacs-29?

No, the bug has been there since ?2016, having been coded, almost
certainly, by me.  ;-(  The context in 2016 was making an escaped NL in a
C++ line comment continue the comment's fontification onto the next line.
The (then) new variable comment-end-can-be-escaped configured the effect
of the backslash at EOL.

I have been assuming that it is too unimportant a bug to go into emacs-29
at this late stage.

> >           && !(comment_end_can_be_escaped && char_quoted (from, from_byte))

> > .. Checking char_quoted is wrong.  Instead the function should check the
> > current parse state.

> Why not both?  IOW, please explain why char_quoted is not TRT here.

Because parse-partial-sexp is not scanning the backslash.  The scan
starts one character after the backslash, and the syntactic effect of
that backslash is not in the OLDSTATE argument to parse-partial-sexp.

-- 
Alan Mackenzie (Nuremberg, Germany)

From debbugs-submit-bounces@debbugs.gnu.org Tue May 16 12:29:27 2023
Date: Tue, 16 May 2023 19:29:26 +0300
Message-Id: <83edngz1cp.fsf@gnu.org>
From: Eli Zaretskii
To: Alan Mackenzie
Cc: 63535@debbugs.gnu.org
In-Reply-To: (message from Alan Mackenzie on Tue, 16 May 2023 16:15:24 +0000)
References: <83ttwcz3gg.fsf@gnu.org>
Subject: Re: bug#63535: Master branch: Error in forw_comment (syntax.c) handling of escaped LFs

> Date: Tue, 16 May 2023 16:15:24 +0000
> Cc: 63535@debbugs.gnu.org
> From: Alan Mackenzie
>
> > >           && !(comment_end_can_be_escaped && char_quoted (from, from_byte))
>
> > > .. Checking char_quoted is wrong.  Instead the function should check the
> > > current parse state.
>
> > Why not both?  IOW, please explain why char_quoted is not TRT here.
>
> Because parse-partial-sexp is not scanning the backslash.  The scan
> starts one character after the backslash, and the syntactic effect of
> that backslash is not in the OLDSTATE argument to parse-partial-sexp.

Sorry, I still don't follow: char_quoted doesn't call
parse-partial-sexp, AFAICT.  So why does it matter what
parse-partial-sexp does when we are discussing why char_quoted is not
TRT?

From debbugs-submit-bounces@debbugs.gnu.org Tue May 16 12:59:04 2023
Date: Tue, 16 May 2023 16:58:55 +0000
From: Alan Mackenzie
To: Eli Zaretskii
Cc: 63535@debbugs.gnu.org
In-Reply-To: <83edngz1cp.fsf@gnu.org>
References: <83ttwcz3gg.fsf@gnu.org> <83edngz1cp.fsf@gnu.org>
Subject: Re: bug#63535: Master branch: Error in forw_comment (syntax.c) handling of escaped LFs

Hello, Eli.

On Tue, May 16, 2023 at 19:29:26 +0300, Eli Zaretskii wrote:
> > Date: Tue, 16 May 2023 16:15:24 +0000
> > Cc: 63535@debbugs.gnu.org
> > From: Alan Mackenzie

> > > >           && !(comment_end_can_be_escaped && char_quoted (from, from_byte))

> > > > .. Checking char_quoted is wrong.  Instead the function should check the
> > > > current parse state.

> > > Why not both?  IOW, please explain why char_quoted is not TRT here.

> > Because parse-partial-sexp is not scanning the backslash.  The scan
> > starts one character after the backslash, and the syntactic effect of
> > that backslash is not in the OLDSTATE argument to parse-partial-sexp.

> Sorry, I still don't follow: char_quoted doesn't call
> parse-partial-sexp, AFAICT.

parse-partial-sexp calls forw_comment which (wrongly) calls char_quoted.

> So why does it matter what parse-partial-sexp does when we are
> discussing why char_quoted is not TRT?

parse-partial-sexp is the context in which the bug becomes evident.  If,
in the C++ line comment with escaped NL, you start parse-partial-sexp at
the NL, it behaves as though the scan started at the backslash.  This is
the bug.

The cause of the bug is the use of char_quoted at line 42 of
forw_comment.

-- 
Alan Mackenzie (Nuremberg, Germany).

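The distinction drawn above, between looking behind in the buffer and
trusting the supplied parse state, can be modelled outside Emacs.  What
follows is only a minimal sketch in plain C, not code from syntax.c:
toy_state, toy_char_quoted, toy_end_lookbehind and toy_end_from_state
are hypothetical names, the "state" is reduced to a single flag, and a
backslash is assumed to be the only escape character.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, much-reduced stand-in for the state handed to a resumed
   scan: it records only whether the previous character was an escape.  */
struct toy_state { bool quoted; };

/* Look-behind test in the spirit of char_quoted: it inspects the buffer
   text before POS and knows nothing about where the scan started.  */
static bool
toy_char_quoted (const char *buf, size_t pos)
{
  bool quoted = false;
  while (pos > 0 && buf[pos - 1] == '\\')
    {
      quoted = !quoted;
      pos--;
    }
  return quoted;
}

/* Variant A: decide by look-behind.  Returns the index just after the
   newline that ends the line comment, or LEN if there is none.  */
size_t
toy_end_lookbehind (const char *buf, size_t len, size_t from)
{
  assert (from <= len);
  for (size_t i = from; i < len; i++)
    if (buf[i] == '\n' && !toy_char_quoted (buf, i))
      return i + 1;
  return len;
}

/* Variant B: decide from the caller's state plus what this scan itself
   has passed over.  Text before FROM is never consulted.  */
size_t
toy_end_from_state (const char *buf, size_t len, size_t from,
                    struct toy_state st)
{
  assert (from <= len);
  bool quoted = st.quoted;
  for (size_t i = from; i < len; i++)
    {
      if (quoted)
        quoted = false;         /* This character is escaped; skip it.  */
      else if (buf[i] == '\\')
        quoted = true;          /* The next character is escaped.  */
      else if (buf[i] == '\n')
        return i + 1;           /* Unescaped newline ends the comment.  */
    }
  return len;
}
```

With the three-line buffer from the report held as a C string, the
backslash is at index 11 and the LF at index 12.  Starting both variants
at index 12 with a state that says "not quoted", variant A skips the LF
and stops after the second newline (the analogue of BOL 3), while
variant B stops right after the first one (the analogue of BOL 2).
Started at index 0, the two variants agree.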
From debbugs-submit-bounces@debbugs.gnu.org Tue May 16 13:50:52 2023
Date: Tue, 16 May 2023 20:50:50 +0300
Message-Id: <83bkikyxl1.fsf@gnu.org>
From: Eli Zaretskii
To: Alan Mackenzie, Stefan Monnier
Cc: 63535@debbugs.gnu.org
In-Reply-To: (message from Alan Mackenzie on Tue, 16 May 2023 16:58:55 +0000)
References: <83ttwcz3gg.fsf@gnu.org> <83edngz1cp.fsf@gnu.org>
Subject: Re: bug#63535: Master branch: Error in forw_comment (syntax.c) handling of escaped LFs

> Date: Tue, 16 May 2023 16:58:55 +0000
> Cc: 63535@debbugs.gnu.org
> From: Alan Mackenzie
>
> On Tue, May 16, 2023 at 19:29:26 +0300, Eli Zaretskii wrote:
> > > Date: Tue, 16 May 2023 16:15:24 +0000
> > > Cc: 63535@debbugs.gnu.org
> > > From: Alan Mackenzie
>
> > > > >           && !(comment_end_can_be_escaped && char_quoted (from, from_byte))
>
> > > > > .. Checking char_quoted is wrong.  Instead the function should check the
> > > > > current parse state.
>
> > > > Why not both?  IOW, please explain why char_quoted is not TRT here.
>
> > > Because parse-partial-sexp is not scanning the backslash.  The scan
> > > starts one character after the backslash, and the syntactic effect of
> > > that backslash is not in the OLDSTATE argument to parse-partial-sexp.
>
> > Sorry, I still don't follow: char_quoted doesn't call
> > parse-partial-sexp, AFAICT.
>
> parse-partial-sexp calls forw_comment which (wrongly) calls char_quoted.
>
> > So why does it matter what parse-partial-sexp does when we are
> > discussing why char_quoted is not TRT?
>
> parse-partial-sexp is the context in which the bug becomes evident.  If,
> in the C++ line comment with escaped NL, you start parse-partial-sexp at
> the NL, it behaves as though the scan started at the backslash.  This is
> the bug.
>
> The cause of the bug is the use of char_quoted at line 42 of
> forw_comment.

Thanks, let's see what Stefan has to say about this.

From debbugs-submit-bounces@debbugs.gnu.org Wed May 17 18:01:47 2023
Date: Wed, 17 May 2023 18:01:32 -0400
From: Stefan Monnier
To: Alan Mackenzie
Cc: 63535@debbugs.gnu.org
In-Reply-To: (Alan Mackenzie's message of "Tue, 16 May 2023 14:03:18 +0000")
Subject: Re: bug#63535: Master branch: Error in forw_comment (syntax.c) handling of escaped LFs

Hi Alan,

> diff --git a/src/syntax.c b/src/syntax.c
> index e9e04e2d638..76d9f16e4ed 100644
> --- a/src/syntax.c
> +++ b/src/syntax.c
> @@ -2344,7 +2344,9 @@ forw_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop,
>  	  && SYNTAX_FLAGS_COMMENT_STYLE (syntax, 0) == style
>  	  && (SYNTAX_FLAGS_COMMENT_NESTED (syntax) ?
>  	      (nesting > 0 && --nesting == 0) : nesting < 0)
> -          && !(comment_end_can_be_escaped && char_quoted (from, from_byte)))
> +          && !(comment_end_can_be_escaped &&
> +               (((prev_syntax & 0xff) == Sescape)
> +                || ((prev_syntax & 0xff) == Scharquote))))
>  	/* We have encountered a comment end of the same style
>  	   as the comment sequence which began this comment
>  	   section.  */

AFAIK this is your code, so you should know better, but AFAICT
`prev_syntax` is not updated in the loop, so it only reflects the syntax
before the beginning of the scanned text, rather than anything near `from`.

Are you sure this is right?

        Stefan

From debbugs-submit-bounces@debbugs.gnu.org Mon May 22 10:59:45 2023
Date: Mon, 22 May 2023 14:59:32 +0000
From: Alan Mackenzie
To: Stefan Monnier
Cc: 63535@debbugs.gnu.org
Subject: Re: bug#63535: Master branch: Error in forw_comment (syntax.c) handling of escaped LFs

Hello, Stefan.

On Wed, May 17, 2023 at 18:01:32 -0400, Stefan Monnier wrote:
> Hi Alan,

[ .... ]

> AFAIK this is your code, so you should know better, but AFAICT
> `prev_syntax` is not updated in the loop, so it only reflects the syntax
> before the beginning of the scanned text, rather than anything near `from`.

> Are you sure this is right?

Thanks, you are correct, the patch was not good.

It turned out to be quite tricky to get working.  As well as
forw_comment, I had to amend scan_sexps_forward to make it return a
quoted state to its caller when this happens at the limit of the scan.

I think the following patch is better.  Would you please have a look at
it, in the hope I haven't made any other silly mistakes.  Thanks!

diff --git a/src/syntax.c b/src/syntax.c
index e9e04e2d638..94b2ac2b591 100644
--- a/src/syntax.c
+++ b/src/syntax.c
@@ -2338,13 +2338,16 @@ forw_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop,
 	  return 0;
 	}
       c = FETCH_CHAR_AS_MULTIBYTE (from_byte);
+      prev_syntax = syntax;
       syntax = SYNTAX_WITH_FLAGS (c);
       code = syntax & 0xff;
       if (code == Sendcomment
 	  && SYNTAX_FLAGS_COMMENT_STYLE (syntax, 0) == style
 	  && (SYNTAX_FLAGS_COMMENT_NESTED (syntax) ?
 	      (nesting > 0 && --nesting == 0) : nesting < 0)
-          && !(comment_end_can_be_escaped && char_quoted (from, from_byte)))
+          && !(comment_end_can_be_escaped
+               && ((prev_syntax & 0xff) == Sescape
+                   || (prev_syntax & 0xff) == Scharquote)))
 	/* We have encountered a comment end of the same style
 	   as the comment sequence which began this comment
 	   section.  */
@@ -2368,7 +2371,11 @@ forw_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop,
 	  inc_both (&from, &from_byte);
 	  UPDATE_SYNTAX_TABLE_FORWARD (from);
 	  if (from == stop) continue; /* Failure */
-	}
+	  c = FETCH_CHAR_AS_MULTIBYTE (from_byte);
+	  prev_syntax = syntax;
+	  syntax = Smax;
+	  code = syntax;
+	}
       inc_both (&from, &from_byte);
       UPDATE_SYNTAX_TABLE_FORWARD (from);
 
@@ -3349,7 +3356,14 @@ do { prev_from = from;				\
 		 are invalid now.  Luckily, the `done' doesn't use them
 		 and the INC_FROM sets them to a sane value without
 		 looking at them.  */
-	      if (!found) goto done;
+	      if (!found)
+                {
+                  if ((prev_from_syntax & 0xff) == Sescape
+                      || (prev_from_syntax & 0xff) == Scharquote)
+                    goto endquoted;
+                  else
+                    goto done;
+                }
 	      INC_FROM;
 	      state->incomment = 0;
 	      state->comstyle = 0;	/* reset the comment style */

>         Stefan

-- 
Alan Mackenzie (Nuremberg, Germany).

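The two ideas in this revised patch, updating the "previous syntax" on
every loop iteration and reporting a quoted state when the scan stops at
its limit, can be illustrated with a toy scanner.  This is a sketch
under loud assumptions, not the Emacs implementation: toy_scan and
toy_forw_comment are hypothetical names, the comment is a line comment
ended by a newline, and a backslash is the only escape character.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result record for one (possibly partial) scan.  */
struct toy_scan
{
  size_t pos;     /* Where the scan stopped.  */
  bool found;     /* True if an unescaped newline ended the comment.  */
  bool quoted;    /* True if the scan stopped right after an escape.  */
};

/* Scan BUF from FROM up to, but not including, STOP.  QUOTED says
   whether the character at FROM is already escaped, i.e. it is the
   state carried over from an earlier scan.  */
struct toy_scan
toy_forw_comment (const char *buf, size_t from, size_t stop, bool quoted)
{
  assert (from <= stop);
  for (size_t i = from; i < stop; i++)
    {
      /* Refreshed on every iteration, so it always describes the
         character just before buf[i], not the start of the scan.  */
      bool prev_quoted = quoted;
      quoted = !prev_quoted && buf[i] == '\\';
      if (buf[i] == '\n' && !prev_quoted)
        return (struct toy_scan) { i + 1, true, false };
    }
  /* Reached the limit: hand the pending escape back to the caller.  */
  return (struct toy_scan) { stop, false, quoted };
}
```

With the buffer from the original report, one scan over the whole text
and two scans split at index 12 (between the backslash and the LF) reach
the same end position, provided the second scan is given the quoted flag
returned by the first.  Dropping that flag makes the resumed scan stop
after the first LF instead, which is why the limit case has to report
the quoted state in this model.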
From debbugs-submit-bounces@debbugs.gnu.org Mon May 22 11:16:50 2023
Date: Mon, 22 May 2023 11:16:40 -0400
From: Stefan Monnier
To: Alan Mackenzie
Cc: 63535@debbugs.gnu.org
In-Reply-To: (Alan Mackenzie's message of "Mon, 22 May 2023 14:59:32 +0000")
Subject: Re: bug#63535: Master branch: Error in forw_comment (syntax.c) handling of escaped LFs

> I think the following patch is better.  Would you please have a look at
> it, in the hope I haven't made any other silly mistakes.  Thanks!

I don't see any silly mistake there, sorry.


        Stefan


PS: It does remind me that we really should do ourselves a favor and get rid
of the distinction between `Sescape` and `Scharquote`.
IIRC there's a risk of backward incompatibility, so it has to be done
progressively, but we should start the process.  E.g. first declare one of the
two as obsolete, then emit a warning when we see it being used, ...

From debbugs-submit-bounces@debbugs.gnu.org Mon May 22 12:17:00 2023
Date: Mon, 22 May 2023 16:16:49 +0000
From: Alan Mackenzie
To: Stefan Monnier
Cc: acm@muc.de, 63535-done@debbugs.gnu.org
Subject: Re: bug#63535: Master branch: Error in forw_comment (syntax.c) handling of escaped LFs

Hello, Stefan.

On Mon, May 22, 2023 at 11:16:40 -0400, Stefan Monnier wrote:
> > I think the following patch is better.  Would you please have a look at
> > it, in the hope I haven't made any other silly mistakes.  Thanks!

> I don't see any silly mistake there, sorry.

Thanks!  I've committed the patch, and I'm closing the bug.

>         Stefan

> PS: It does remind me that we really should do ourselves a favor and get rid
> of the distinction between `Sescape` and `Scharquote`.
> IIRC there's a risk of backward incompatibility, so it has to be done
> progressively, but we should start the process.  E.g. first declare one of the
> two as obsolete, then emit a warning when we see it being used, ...

Yes.  I think Sescape should be the survivor.  I don't know if Scharquote
is used at all in Emacs code.

-- 
Alan Mackenzie (Nuremberg, Germany).

From unknown Wed Sep 10 19:15:55 2025
Date: Tue, 20 Jun 2023 11:24:06 +0000
From: Debbugs Internal Request
To: internal_control@debbugs.gnu.org
Subject: Internal Control
Message-Id: bug archived.

# This is a fake control message.
#
# The action:
# bug archived.
thanks
# This fakemail brought to you by your local debbugs
# administrator