From unknown Mon Aug 18 14:19:41 2025
From: bug#23019 <23019@debbugs.gnu.org>
To: bug#23019 <23019@debbugs.gnu.org>
Subject: Status: parse-partial-sexp doesn't output the full state needed for its continuance.
Reply-To: bug#23019 <23019@debbugs.gnu.org>
Date: Mon, 18 Aug 2025 21:19:41 +0000

retitle 23019 parse-partial-sexp doesn't output the full state needed for its continuance.
reassign 23019 emacs
submitter 23019 Alan Mackenzie
severity 23019 normal
thanks

From debbugs-submit-bounces@debbugs.gnu.org Tue Mar 15 05:11:31 2016
Date: Tue, 15 Mar 2016 09:13:55 +0000
To: bug-gnu-emacs@gnu.org
Subject: parse-partial-sexp doesn't output the full state needed for its continuance.
Message-ID: <20160315091355.GA2263@acm.fritz.box>
From: Alan Mackenzie
User-Agent: Mutt/1.5.24 (2015-08-30)

Hello, Emacs.

When parse-partial-sexp finishes a parse, it fails to record whether or
not its end point is just after the first character of a two character
comment starter or ender. When the resulting state is used as an
argument to resume the parse, p-p-s will be unaware that the comment has
started or ended and produce false results.

Proposed solution: Add an extra element to the parser state, recording the
syntax of the last character passed over before the end of the parse.
This would be used by parse-partial-sexp to initialise its parse.

Also: the existing element 9 (the list of currently open parens) and the
new element should be explicitly documented in the Elisp manual, together
with a statement that there may be further elements in the parse state
used internally by parse-partial-sexp (for future expansion).

-- 
Alan Mackenzie (Nuremberg, Germany).

From debbugs-submit-bounces@debbugs.gnu.org Tue Mar 15 05:34:01 2016
Subject: Re: bug#23019: parse-partial-sexp doesn't output the full state needed for its continuance.
To: bug-gnu-emacs@gnu.org
Cc: Alan Mackenzie
References: <20160315091355.GA2263@acm.fritz.box>
From: Andreas Röhler
Message-ID: <56E7D74C.4070805@easy-emacs.de>
Date: Tue, 15 Mar 2016 10:35:08 +0100
In-Reply-To: <20160315091355.GA2263@acm.fritz.box>

On 15.03.2016 10:13, Alan Mackenzie wrote:
> Hello, Emacs.
>
> When parse-partial-sexp finishes a parse, it fails to record whether or
> not its end point is just after the first character of a two character
> comment starter or ender. When the resulting state is used as an
> argument to resume the parse, p-p-s will be unaware that the comment has
> started or ended and produce false results.
>
> Proposed solution: Add an extra element to the parser state, recording the
> syntax of the last character passed over before the end of the parse.
> This would be used by parse-partial-sexp to initialise its parse.
>
> Also: the existing element 9 (the list of currently open parens) and the
> new element should be explicitly documented in the Elisp manual, together
> with a statement that there may be further elements in the parse state
> used internally by parse-partial-sexp (for future expansion).
>

Hi Alan,

a comment start might be composed not just by two characters, but by three
or more. What then?
Cheers,
Andreas

From debbugs-submit-bounces@debbugs.gnu.org Tue Mar 15 06:12:55 2016
Date: Tue, 15 Mar 2016 10:15:21 +0000
To: Andreas Röhler
Cc: bug-gnu-emacs@gnu.org
Subject: Re: bug#23019: parse-partial-sexp doesn't output the full state needed for its continuance.
Message-ID: <20160315101521.GB2263@acm.fritz.box>
References: <20160315091355.GA2263@acm.fritz.box> <56E7D74C.4070805@easy-emacs.de>
In-Reply-To: <56E7D74C.4070805@easy-emacs.de>
From: Alan Mackenzie
User-Agent: Mutt/1.5.24 (2015-08-30)

Hello, Andreas.

On Tue, Mar 15, 2016 at 10:35:08AM +0100, Andreas Röhler wrote:

> On 15.03.2016 10:13, Alan Mackenzie wrote:
> > Hello, Emacs.

> > When parse-partial-sexp finishes a parse, it fails to record whether or
> > not its end point is just after the first character of a two character
> > comment starter or ender. When the resulting state is used as an
> > argument to resume the parse, p-p-s will be unaware that the comment has
> > started or ended and produce false results.

> > Proposed solution: Add an extra element to the parser state, recording the
> > syntax of the last character passed over before the end of the parse.
> > This would be used by parse-partial-sexp to initialise its parse.
> > Also: the existing element 9 (the list of currently open parens) and the
> > new element should be explicitly documented in the Elisp manual, together
> > with a statement that there may be further elements in the parse state
> > used internally by parse-partial-sexp (for future expansion).

> a comment start might be composed not just by two characters, but by
> three or more. What then?

We'd have to start thinking about extending parse-partial-sexp, or
invent some workaround. Maybe. There must be some languages (?html)
where this is the case. What is done in these?

> Cheers,
> Andreas

-- 
Alan Mackenzie (Nuremberg, Germany).

From debbugs-submit-bounces@debbugs.gnu.org Tue Mar 15 09:37:42 2016
Subject: Re: bug#23019: parse-partial-sexp doesn't output the full state needed for its continuance.
To: Alan Mackenzie
Cc: bug-gnu-emacs@gnu.org
References: <20160315091355.GA2263@acm.fritz.box> <56E7D74C.4070805@easy-emacs.de> <20160315101521.GB2263@acm.fritz.box>
From: Andreas Röhler
Message-ID: <56E8106E.7020402@easy-emacs.de>
Date: Tue, 15 Mar 2016 14:38:54 +0100
In-Reply-To: <20160315101521.GB2263@acm.fritz.box>

On 15.03.2016 11:15, Alan Mackenzie wrote:
> Hello, Andreas.
>
> On Tue, Mar 15, 2016 at 10:35:08AM +0100, Andreas Röhler wrote:
>
>
>> On 15.03.2016 10:13, Alan Mackenzie wrote:
>>> Hello, Emacs.
>>> When parse-partial-sexp finishes a parse, it fails to record whether or
>>> not its end point is just after the first character of a two character
>>> comment starter or ender. When the resulting state is used as an
>>> argument to resume the parse, p-p-s will be unaware that the comment has
>>> started or ended and produce false results.
>>> Proposed solution: Add an extra element to the parser state, recording the
>>> syntax of the last character passed over before the end of the parse.
>>> This would be used by parse-partial-sexp to initialise its parse.
>>> Also: the existing element 9 (the list of currently open parens) and the
>>> new element should be explicitly documented in the Elisp manual, together
>>> with a statement that there may be further elements in the parse state
>>> used internally by parse-partial-sexp (for future expansion).
>
>> a comment start might be composed not just by two characters, but by
>> three or more. What then?
> We'd have to start thinking about extending parse-partial-sexp, or
> invent some workaround. Maybe. There must be some languages (?html)
> where this is the case. What is done in these?

May you send me this (or more) example use-cases? Couldn't find the one
already given, sorry.

Addressed this issue in my generic beg-end.el

https://github.com/andreas-roehler/werkstatt/blob/master/subroutines/beg-end.el

In case beg-end forms used a start-string, look if the char-at-point
would match this string. Then look if the char-before is before in
string, etc.

From debbugs-submit-bounces@debbugs.gnu.org Thu Mar 17 08:58:36 2016
From: Stefan Monnier
To: Alan Mackenzie
Cc: 23019@debbugs.gnu.org
Subject: Re: bug#23019: parse-partial-sexp doesn't output the full state needed for its continuance.
References: <20160315091355.GA2263@acm.fritz.box>
Date: Thu, 17 Mar 2016 08:58:27 -0400
In-Reply-To: <20160315091355.GA2263@acm.fritz.box> (Alan Mackenzie's message of "Tue, 15 Mar 2016 09:13:55 +0000")
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1.50 (gnu/linux)

> Proposed solution: Add an extra element to the parser state, recording the
> syntax of the last character passed over before the end of the parse.
> This would be used by parse-partial-sexp to initialise its parse.

Another option is to record "the start of current element" (in case we
were in the middle of an element). This could potentially reuse (nth
5 ppss) by generalizing it, or it could use a new entry.

The choice probably doesn't matter much and will probably be more
a question of "what's easier to implement".

> Also: the existing element 9 (the list of currently open parens) and the
> new element should be explicitly documented in the Elisp manual, together
> with a statement that there may be further elements in the parse state
> used internally by parse-partial-sexp (for future expansion).

Indeed.

Andreas Röhler added:
> a comment start might be composed not just by two characters, but by three
> or more. What then?

Andreas, I suggest that you go back and take a closer look at
parse-partial-sexp, syntax-ppss, and syntax-tables in general because
lately you've made several comments like the one here which show you're
just not familiar with the topic at all.

Syntax tables do not support comment markers longer than 2 characters
(currently).
Emacs supports those via the `syntax-table' text-property only (which
typically marks the first char of each "long comment starter" as being
"the comment starter").


        Stefan

From debbugs-submit-bounces@debbugs.gnu.org Thu Mar 17 17:46:56 2016
Date: Thu, 17 Mar 2016 21:49:34 +0000
To: Stefan Monnier
Cc: 23019@debbugs.gnu.org
Subject: Re: bug#23019: parse-partial-sexp doesn't output the full state needed for its continuance.
Message-ID: <20160317214934.GB9038@acm.fritz.box>
References: <20160315091355.GA2263@acm.fritz.box>
From: Alan Mackenzie
User-Agent: Mutt/1.5.24 (2015-08-30)

On Thu, Mar 17, 2016 at 08:58:27AM -0400, Stefan Monnier wrote:
> > Proposed solution: Add an extra element to the parser state, recording the
> > syntax of the last character passed over before the end of the parse.
> > This would be used by parse-partial-sexp to initialise its parse.

> Another option is to record "the start of current element" (in case we
> were in the middle of an element). This could potentially reuse (nth
> 5 ppss) by generalizing it, or it could use a new entry.

> The choice probably doesn't matter much and will probably be more
> a question of "what's easier to implement".

> > Also: the existing element 9 (the list of currently open parens) and the
> > new element should be explicitly documented in the Elisp manual, together
> > with a statement that there may be further elements in the parse state
> > used internally by parse-partial-sexp (for future expansion).

> Indeed.

OK, I've got a patch ready. It's bigger than anticipated, purely because
it also does some refactoring. It actually adds two elements to the
parser state, and I believe that makes the parser state complete.
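[Illustration for readers of the thread; not part of the patch.] The failure described above can be reproduced outside Emacs with a toy scanner. This is only a sketch under stated assumptions: `toy_state`, `toy_scan` and `toy_split_parse` are hypothetical names, the scanner knows only slash-star comments, and nothing here is taken from src/syntax.c. It is meant to show why a resumed parse needs to know the last character scanned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy state; NOT Emacs's struct lisp_parse_state.  */
struct toy_state
{
  bool in_comment;  /* True while inside a slash-star comment.  */
  char prev;        /* Last character scanned, or 0 if none is pending.  */
};

/* Scan TEXT from FROM (inclusive) to TO (exclusive), updating *ST.
   If CARRY_PREV is false, the character scanned just before FROM is
   forgotten, which models a state that lacks the extra element.  */
void
toy_scan (const char *text, size_t from, size_t to,
          struct toy_state *st, bool carry_prev)
{
  char prev = carry_prev ? st->prev : 0;

  for (size_t i = from; i < to; i++)
    {
      char c = text[i];
      if (!st->in_comment && prev == '/' && c == '*')
        {
          st->in_comment = true;
          prev = 0;  /* Opener consumed; its star cannot also close.  */
        }
      else if (st->in_comment && prev == '*' && c == '/')
        {
          st->in_comment = false;
          prev = 0;  /* Closer consumed.  */
        }
      else
        prev = c;
    }
  st->prev = prev;
}

/* Parse TEXT in two calls, stopping the first one at SPLIT, and report
   whether the whole parse ends inside a comment.  */
bool
toy_split_parse (const char *text, size_t split, bool carry_prev)
{
  struct toy_state st = { false, 0 };
  size_t len = strlen (text);

  assert (split <= len);
  toy_scan (text, 0, split, &st, true);
  toy_scan (text, split, len, &st, carry_prev);
  return st.in_comment;
}
```

With this sketch, `toy_split_parse ("x /* y", 3, false)` stops between the slash and the star and then misses the comment opener, while the same call with `carry_prev` set to `true` ends inside the comment, as an unsplit parse would. The comment ender behaves the same way for `"/* a */ b"` split at offset 6.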
Here's the patch:


Enhance parse-partial-sexp correctly to handle two character comment delimiters

Do this by adding two new fields to the parser state: the syntax of the last
character scanned, and the last end of comment scanned.  This should make the
parser state complete.
Also document element 9 of the parser state.  Also refactor the code a bit.

* src/syntax.c (struct lisp_parse_state): Add two new fields.
(internalize_parse_state): New function, extracted from scan_sexps_forward.
(back_comment): Call internalize_parse_state.
(forw_comment): Return the syntax of the last character scanned to the caller.
(Fforward_comment, scan_lists): New dummy variables, passed to forw_comment.
(scan_sexps_forward): Remove a redundant state parameter.  Access all `state'
information via the address parameter `state'.  Remove the code which converts
from external to internal form of `state'.  Access buffer contents only from
`from' onwards.  Reformulate code at the top of the main loop correctly to
recognize comment openers when starting in the middle of one.  Call
forw_comment with extra argument (for return of final syntax value).
(Fparse_partial_sexp): Document elements 9, 10, 11 of the parser state in the
doc string.  Clarify the doc string in general.  Call
internalize_parse_state.  Take account of the new elements when consing up the
output parser state.

* doc/lispref/syntax.texi: (Parser State): Document element 9 and the new
elements 10 and 11.  Minor wording corrections (remove reference to "trivial
cases").
(Low Level Parsing): Minor corrections


diff --git a/doc/lispref/syntax.texi b/doc/lispref/syntax.texi
index d5a7eba..67a00d7 100644
--- a/doc/lispref/syntax.texi
+++ b/doc/lispref/syntax.texi
@@ -791,10 +791,10 @@ Parser State
 @subsection Parser State
 @cindex parser state
 
-  A @dfn{parser state} is a list of ten elements describing the state
-of the syntactic parser, after it parses the text between a specified
-starting point and a specified end point in the buffer.  Parsing
-functions such as @code{syntax-ppss}
+  A @dfn{parser state} is a list of (currently) twelve elements
+describing the state of the syntactic parser, after it parses the text
+between a specified starting point and a specified end point in the
+buffer.  Parsing functions such as @code{syntax-ppss}
 @ifnottex
 (@pxref{Position Parse})
 @end ifnottex
@@ -851,15 +851,21 @@ Parser State
 this element is @code{nil}.
 
 @item
-Internal data for continuing the parsing.  The meaning of this
-data is subject to change; it is used if you pass this list
-as the @var{state} argument to another call.
+The list of the positions of the currently open parentheses, starting
+with the outermost.
+
+@item
+The @var{syntax-code} (@pxref{Syntax Table Internals}) of the last
+buffer position scanned, or @code{nil} if no scanning has happened.
+
+@item
+The position after the previous end of comment, or @code{nil} if the
+scanning has not passed a comment end.
 @end enumerate
 
   Elements 1, 2, and 6 are ignored in a state which you pass as an
-argument to continue parsing, and elements 8 and 9 are used only in
-trivial cases.  Those elements are mainly used internally by the
-parser code.
+argument to continue parsing.  Elements 9 to 11 are mainly used
+internally by the parser code.
 
   One additional piece of useful information is available from a
 parser state using this function:
@@ -898,11 +904,11 @@ Low-Level Parsing
 
 If the fourth argument @var{stop-before} is non-@code{nil}, parsing
 stops when it comes to any character that starts a sexp.  If
-@var{stop-comment} is non-@code{nil}, parsing stops when it comes to the
-start of an unnested comment.  If @var{stop-comment} is the symbol
+@var{stop-comment} is non-@code{nil}, parsing stops after the start of
+an unnested comment.  If @var{stop-comment} is the symbol
 @code{syntax-table}, parsing stops after the start of an unnested
-comment or a string, or the end of an unnested comment or a string,
-whichever comes first.
+comment or a string, or after the end of an unnested comment or a
+string, whichever comes first.
 
 If @var{state} is @code{nil}, @var{start} is assumed to be at the top
 level of parenthesis structure, such as the beginning of a function
diff --git a/src/syntax.c b/src/syntax.c
index 249d0d5..11b1ff0 100644
--- a/src/syntax.c
+++ b/src/syntax.c
@@ -153,6 +153,9 @@ struct lisp_parse_state
     ptrdiff_t comstr_start;  /* Position of last comment/string starter.  */
     Lisp_Object levelstarts; /* Char numbers of starts-of-expression
                                 of levels (starting from outermost).  */
+    int prev_syntax; /* Syntax of previous character scanned, or Smax.  */
+    ptrdiff_t prev_comment_end; /* Position after end of last closed
+                                   comment, or -1.  */
   };
 
 /* These variables are a cache for finding the start of a defun.
@@ -176,7 +179,8 @@ static Lisp_Object skip_syntaxes (bool, Lisp_Object, Lisp_Object);
 static Lisp_Object scan_lists (EMACS_INT, EMACS_INT, EMACS_INT, bool);
 static void scan_sexps_forward (struct lisp_parse_state *,
                                 ptrdiff_t, ptrdiff_t, ptrdiff_t, EMACS_INT,
-                                bool, Lisp_Object, int);
+                                bool, int);
+static void internalize_parse_state (Lisp_Object, struct lisp_parse_state *);
 static bool in_classes (int, Lisp_Object);
 static void parse_sexp_propertize (ptrdiff_t charpos);
 
@@ -911,10 +915,11 @@ back_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop,
         }
       do
         {
+          internalize_parse_state (Qnil, &state);
           scan_sexps_forward (&state,
                               defun_start, defun_start_byte,
                               comment_end, TYPE_MINIMUM (EMACS_INT),
-                              0, Qnil, 0);
+                              0, 0);
           defun_start = comment_end;
           if (!adjusted)
             {
@@ -2314,7 +2319,9 @@ in_classes (int c, Lisp_Object iso_classes)
    into *CHARPOS_PTR and the corresponding bytepos into *BYTEPOS_PTR.
    Else, return false and store the charpos STOP into *CHARPOS_PTR, the
    corresponding bytepos into *BYTEPOS_PTR and the current nesting
-   (as defined for state.incomment) in *INCOMMENT_PTR.
+   (as defined for state->incomment) in *INCOMMENT_PTR.  The
+   SYNTAX_WITH_FLAGS of the last character scanned in the comment is
+   stored into *last_syntax_ptr.
 
    The comment end is the last character of the comment rather than
    the character just after the comment.
@@ -2326,7 +2333,7 @@ static bool
 forw_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop,
               EMACS_INT nesting, int style, int prev_syntax,
               ptrdiff_t *charpos_ptr, ptrdiff_t *bytepos_ptr,
-              EMACS_INT *incomment_ptr)
+              EMACS_INT *incomment_ptr, int *last_syntax_ptr)
 {
   register int c, c1;
   register enum syntaxcode code;
@@ -2346,6 +2353,7 @@ forw_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop,
           *incomment_ptr = nesting;
           *charpos_ptr = from;
           *bytepos_ptr = from_byte;
+          *last_syntax_ptr = syntax;
           return 0;
         }
       c = FETCH_CHAR_AS_MULTIBYTE (from_byte);
@@ -2415,6 +2423,7 @@ forw_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop,
     }
   *charpos_ptr = from;
   *bytepos_ptr = from_byte;
+  *last_syntax_ptr = syntax;
   return 1;
 }
 
@@ -2436,6 +2445,7 @@ between them, return t; otherwise return nil.  */)
   EMACS_INT count1;
   ptrdiff_t out_charpos, out_bytepos;
   EMACS_INT dummy;
+  int dummy2;
 
   CHECK_NUMBER (count);
   count1 = XINT (count);
@@ -2499,7 +2509,7 @@ between them, return t; otherwise return nil.  */)
         }
       /* We're at the start of a comment.  */
       found = forw_comment (from, from_byte, stop, comnested, comstyle, 0,
-                            &out_charpos, &out_bytepos, &dummy);
+                            &out_charpos, &out_bytepos, &dummy, &dummy2);
       from = out_charpos; from_byte = out_bytepos;
       if (!found)
         {
@@ -2659,6 +2669,7 @@ scan_lists (EMACS_INT from, EMACS_INT count, EMACS_INT depth, bool sexpflag)
   ptrdiff_t from_byte;
   ptrdiff_t out_bytepos, out_charpos;
   EMACS_INT dummy;
+  int dummy2;
   bool multibyte_symbol_p = sexpflag && multibyte_syntax_as_symbol;
 
   if (depth > 0) min_depth = 0;
@@ -2755,7 +2766,8 @@ scan_lists (EMACS_INT from, EMACS_INT count, EMACS_INT depth, bool sexpflag)
               UPDATE_SYNTAX_TABLE_FORWARD (from);
               found = forw_comment (from, from_byte, stop,
                                     comnested, comstyle, 0,
-                                    &out_charpos, &out_bytepos, &dummy);
+                                    &out_charpos, &out_bytepos, &dummy,
+                                    &dummy2);
               from = out_charpos, from_byte = out_bytepos;
               if (!found)
                 {
@@ -3119,7 +3131,7 @@ the prefix syntax flag (p).  */)
 }
 
 /* Parse forward from FROM / FROM_BYTE to END,
-   assuming that FROM has state OLDSTATE (nil means FROM is start of function),
+   assuming that FROM has state STATE (nil means FROM is start of function),
    and return a description of the state of the parse at END.
    If STOPBEFORE, stop at the start of an atom.
    If COMMENTSTOP is 1, stop at the start of a comment.
@@ -3127,12 +3139,11 @@ the prefix syntax flag (p).  */)
    after the beginning of a string, or after the end of a string.  */
 
 static void
-scan_sexps_forward (struct lisp_parse_state *stateptr,
+scan_sexps_forward (struct lisp_parse_state *state,
                     ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t end,
                     EMACS_INT targetdepth, bool stopbefore,
-                    Lisp_Object oldstate, int commentstop)
+                    int commentstop)
 {
-  struct lisp_parse_state state;
   enum syntaxcode code;
   int c1;
   bool comnested;
@@ -3148,7 +3159,7 @@ scan_sexps_forward (struct lisp_parse_state *stateptr,
   Lisp_Object tem;
   ptrdiff_t prev_from;          /* Keep one character before FROM.  */
   ptrdiff_t prev_from_byte;
-  int prev_from_syntax;
+  int prev_from_syntax, prev_prev_from_syntax;
   bool boundary_stop = commentstop == -1;
   bool nofence;
   bool found;
@@ -3165,6 +3176,7 @@ scan_sexps_forward (struct lisp_parse_state *stateptr,
 do { prev_from = from;                          \
      prev_from_byte = from_byte;                \
      temp = FETCH_CHAR_AS_MULTIBYTE (prev_from_byte);   \
+     prev_prev_from_syntax = prev_from_syntax;  \
      prev_from_syntax = SYNTAX_WITH_FLAGS (temp); \
      INC_BOTH (from, from_byte);                \
      if (from < end)                            \
@@ -3174,88 +3186,38 @@ do { prev_from = from;                          \
   immediate_quit = 1;
   QUIT;
 
-  if (NILP (oldstate))
-    {
-      depth = 0;
-      state.instring = -1;
-      state.incomment = 0;
-      state.comstyle = 0;       /* comment style a by default.  */
-      state.comstr_start = -1;  /* no comment/string seen.  */
-    }
-  else
-    {
-      tem = Fcar (oldstate);
-      if (!NILP (tem))
-        depth = XINT (tem);
-      else
-        depth = 0;
-
-      oldstate = Fcdr (oldstate);
-      oldstate = Fcdr (oldstate);
-      oldstate = Fcdr (oldstate);
-      tem = Fcar (oldstate);
-      /* Check whether we are inside string_fence-style string: */
-      state.instring = (!NILP (tem)
-                        ? (CHARACTERP (tem) ? XFASTINT (tem) : ST_STRING_STYLE)
-                        : -1);
-
-      oldstate = Fcdr (oldstate);
-      tem = Fcar (oldstate);
-      state.incomment = (!NILP (tem)
-                         ? (INTEGERP (tem) ? XINT (tem) : -1)
-                         : 0);
-
-      oldstate = Fcdr (oldstate);
-      tem = Fcar (oldstate);
-      start_quoted = !NILP (tem);
+  depth = state->depth;
+  start_quoted = state->quoted;
+  prev_prev_from_syntax = Smax;
+  prev_from_syntax = state->prev_syntax;
 
-      /* if the eighth element of the list is nil, we are in comment
-         style a.  If it is non-nil, we are in comment style b */
-      oldstate = Fcdr (oldstate);
-      oldstate = Fcdr (oldstate);
-      tem = Fcar (oldstate);
-      state.comstyle = (NILP (tem)
-                        ? 0
-                        : (RANGED_INTEGERP (0, tem, ST_COMMENT_STYLE)
-                           ? XINT (tem)
-                           : ST_COMMENT_STYLE));
-
-      oldstate = Fcdr (oldstate);
-      tem = Fcar (oldstate);
-      state.comstr_start =
-        RANGED_INTEGERP (PTRDIFF_MIN, tem, PTRDIFF_MAX) ?
XINT (tem) : -1; - oldstate = Fcdr (oldstate); - tem = Fcar (oldstate); - while (!NILP (tem)) /* >= second enclosing sexps. */ - { - Lisp_Object temhd = Fcar (tem); - if (RANGED_INTEGERP (PTRDIFF_MIN, temhd, PTRDIFF_MAX)) - curlevel->last = XINT (temhd); - if (++curlevel == endlevel) - curlevel--; /* error ("Nesting too deep for parser"); */ - curlevel->prev = -1; - curlevel->last = -1; - tem = Fcdr (tem); - } + tem = state->levelstarts; + while (!NILP (tem)) /* >= second enclosing sexps. */ + { + Lisp_Object temhd = Fcar (tem); + if (RANGED_INTEGERP (PTRDIFF_MIN, temhd, PTRDIFF_MAX)) + curlevel->last = XINT (temhd); + if (++curlevel == endlevel) + curlevel--; /* error ("Nesting too deep for parser"); */ + curlevel->prev = -1; + curlevel->last = -1; + tem = Fcdr (tem); } - state.quoted = 0; - mindepth = depth; - curlevel->prev = -1; curlevel->last = -1; - SETUP_SYNTAX_TABLE (prev_from, 1); - temp = FETCH_CHAR (prev_from_byte); - prev_from_syntax = SYNTAX_WITH_FLAGS (temp); - UPDATE_SYNTAX_TABLE_FORWARD (from); + state->quoted = 0; + mindepth = depth; + + SETUP_SYNTAX_TABLE (from, 1); /* Enter the loop at a place appropriate for initial state. */ - if (state.incomment) + if (state->incomment) goto startincomment; - if (state.instring >= 0) + if (state->instring >= 0) { - nofence = state.instring != ST_STRING_STYLE; + nofence = state->instring != ST_STRING_STYLE; if (start_quoted) goto startquotedinstring; goto startinstring; @@ -3266,10 +3228,10 @@ do { prev_from = from; \ while (from < end) { int syntax; - INC_FROM; - code = prev_from_syntax & 0xff; if (from < end + && (state->prev_comment_end == -1 + || prev_from >= state->prev_comment_end) && SYNTAX_FLAGS_COMSTART_FIRST (prev_from_syntax) && (c1 = FETCH_CHAR (from_byte), syntax = SYNTAX_WITH_FLAGS (c1), @@ -3280,32 +3242,37 @@ do { prev_from = from; \ /* Record the comment style we have entered so that only the comment-end sequence of the same style actually terminates the comment section. 
*/ - state.comstyle + state->comstyle = SYNTAX_FLAGS_COMMENT_STYLE (syntax, prev_from_syntax); comnested = (SYNTAX_FLAGS_COMMENT_NESTED (prev_from_syntax) | SYNTAX_FLAGS_COMMENT_NESTED (syntax)); - state.incomment = comnested ? 1 : -1; - state.comstr_start = prev_from; + state->incomment = comnested ? 1 : -1; + state->comstr_start = prev_from; INC_FROM; code = Scomment; } - else if (code == Scomment_fence) - { - /* Record the comment style we have entered so that only - the comment-end sequence of the same style actually - terminates the comment section. */ - state.comstyle = ST_COMMENT_STYLE; - state.incomment = -1; - state.comstr_start = prev_from; - code = Scomment; - } - else if (code == Scomment) - { - state.comstyle = SYNTAX_FLAGS_COMMENT_STYLE (prev_from_syntax, 0); - state.incomment = (SYNTAX_FLAGS_COMMENT_NESTED (prev_from_syntax) ? - 1 : -1); - state.comstr_start = prev_from; - } + else + { + INC_FROM; + code = prev_from_syntax & 0xff; + if (code == Scomment_fence) + { + /* Record the comment style we have entered so that only + the comment-end sequence of the same style actually + terminates the comment section. */ + state->comstyle = ST_COMMENT_STYLE; + state->incomment = -1; + state->comstr_start = prev_from; + code = Scomment; + } + else if (code == Scomment) + { + state->comstyle = SYNTAX_FLAGS_COMMENT_STYLE (prev_from_syntax, 0); + state->incomment = (SYNTAX_FLAGS_COMMENT_NESTED (prev_from_syntax) ? + 1 : -1); + state->comstr_start = prev_from; + } + } if (SYNTAX_FLAGS_PREFIX (prev_from_syntax)) continue; @@ -3357,18 +3324,21 @@ do { prev_from = from; \ middle of it. We don't want to do that if we're just at the beginning of the comment (think of (*) ... (*)). */ found = forw_comment (from, from_byte, end, - state.incomment, state.comstyle, - (from == BEGV || from < state.comstr_start + 3) + state->incomment, state->comstyle, + (from == BEGV || from < state->comstr_start + 3) ? 
0 : prev_from_syntax, - &out_charpos, &out_bytepos, &state.incomment); + &out_charpos, &out_bytepos, &state->incomment, + &prev_from_syntax); from = out_charpos; from_byte = out_bytepos; - /* Beware! prev_from and friends are invalid now. - Luckily, the `done' doesn't use them and the INC_FROM - sets them to a sane value without looking at them. */ + /* Beware! prev_from and friends (except prev_from_syntax) + are invalid now. Luckily, the `done' doesn't use them + and the INC_FROM sets them to a sane value without + looking at them. */ if (!found) goto done; INC_FROM; - state.incomment = 0; - state.comstyle = 0; /* reset the comment style */ + state->incomment = 0; + state->comstyle = 0; /* reset the comment style */ + state->prev_comment_end = from; if (boundary_stop) goto done; break; @@ -3396,16 +3366,16 @@ do { prev_from = from; \ case Sstring: case Sstring_fence: - state.comstr_start = from - 1; + state->comstr_start = from - 1; if (stopbefore) goto stop; /* this arg means stop at sexp start */ curlevel->last = prev_from; - state.instring = (code == Sstring + state->instring = (code == Sstring ? (FETCH_CHAR_AS_MULTIBYTE (prev_from_byte)) : ST_STRING_STYLE); if (boundary_stop) goto done; startinstring: { - nofence = state.instring != ST_STRING_STYLE; + nofence = state->instring != ST_STRING_STYLE; while (1) { @@ -3419,7 +3389,7 @@ do { prev_from = from; \ /* Check C_CODE here so that if the char has a syntax-table property which says it is NOT a string character, it does not end the string. */ - if (nofence && c == state.instring && c_code == Sstring) + if (nofence && c == state->instring && c_code == Sstring) break; switch (c_code) @@ -3442,7 +3412,7 @@ do { prev_from = from; \ } } string_end: - state.instring = -1; + state->instring = -1; curlevel->prev = curlevel->last; INC_FROM; if (boundary_stop) goto done; @@ -3461,25 +3431,99 @@ do { prev_from = from; \ stop: /* Here if stopping before start of sexp. 
*/ from = prev_from; /* We have just fetched the char that starts it; */ from_byte = prev_from_byte; + prev_from_syntax = prev_prev_from_syntax; goto done; /* but return the position before it. */ endquoted: - state.quoted = 1; + state->quoted = 1; done: - state.depth = depth; - state.mindepth = mindepth; - state.thislevelstart = curlevel->prev; - state.prevlevelstart + state->depth = depth; + state->mindepth = mindepth; + state->thislevelstart = curlevel->prev; + state->prevlevelstart = (curlevel == levelstart) ? -1 : (curlevel - 1)->last; - state.location = from; - state.location_byte = from_byte; - state.levelstarts = Qnil; + state->location = from; + state->location_byte = from_byte; + state->levelstarts = Qnil; while (curlevel > levelstart) - state.levelstarts = Fcons (make_number ((--curlevel)->last), - state.levelstarts); + state->levelstarts = Fcons (make_number ((--curlevel)->last), + state->levelstarts); + state->prev_syntax = prev_from_syntax; immediate_quit = 0; +} + +/* Convert a (lisp) parse state to the internal form used in + scan_sexps_forward. */ +static void +internalize_parse_state (Lisp_Object external, struct lisp_parse_state *state) +{ + Lisp_Object tem; + + if (NILP (external)) + { + state->depth = 0; + state->instring = -1; + state->incomment = 0; + state->quoted = 0; + state->comstyle = 0; /* comment style a by default. */ + state->comstr_start = -1; /* no comment/string seen. */ + state->levelstarts = Qnil; + state->prev_syntax = Smax; + state->prev_comment_end = -1; + } + else + { + tem = Fcar (external); + if (!NILP (tem)) + state->depth = XINT (tem); + else + state->depth = 0; + + external = Fcdr (external); + external = Fcdr (external); + external = Fcdr (external); + tem = Fcar (external); + /* Check whether we are inside string_fence-style string: */ + state->instring = (!NILP (tem) + ? (CHARACTERP (tem) ? 
XFASTINT (tem) : ST_STRING_STYLE) + : -1); + + external = Fcdr (external); + tem = Fcar (external); + state->incomment = (!NILP (tem) + ? (INTEGERP (tem) ? XINT (tem) : -1) + : 0); - *stateptr = state; + external = Fcdr (external); + tem = Fcar (external); + state->quoted = !NILP (tem); + + /* if the eighth element of the list is nil, we are in comment + style a. If it is non-nil, we are in comment style b */ + external = Fcdr (external); + external = Fcdr (external); + tem = Fcar (external); + state->comstyle = (NILP (tem) + ? 0 + : (RANGED_INTEGERP (0, tem, ST_COMMENT_STYLE) + ? XINT (tem) + : ST_COMMENT_STYLE)); + + external = Fcdr (external); + tem = Fcar (external); + state->comstr_start = + RANGED_INTEGERP (PTRDIFF_MIN, tem, PTRDIFF_MAX) ? XINT (tem) : -1; + external = Fcdr (external); + tem = Fcar (external); + state->levelstarts = tem; + + external = Fcdr (external); + tem = Fcar (external); + state->prev_syntax = NILP (tem) ? Smax : XINT (tem); + external = Fcdr (external); + tem = Fcar (external); + state->prev_comment_end = NILP (tem) ? -1 : XINT (tem); + } } DEFUN ("parse-partial-sexp", Fparse_partial_sexp, Sparse_partial_sexp, 2, 6, 0, @@ -3488,6 +3532,7 @@ Parsing stops at TO or when certain criteria are met; point is set to where parsing stops. If fifth arg OLDSTATE is omitted or nil, parsing assumes that FROM is the beginning of a function. + Value is a list of elements describing final state of parsing: 0. depth in parens. 1. character address of start of innermost containing list; nil if none. @@ -3501,16 +3546,20 @@ Value is a list of elements describing final state of parsing: 6. the minimum paren-depth encountered during this scan. 7. style of comment, if any. 8. character address of start of comment or string; nil if not in one. - 9. Intermediate data for continuation of parsing (subject to change). + 9. List of positions of currently open parens, outermost first. +10. Syntax of last character scanned, or nil if no scanning has happened. +11. 
Position after end of previous comment scanned, or nil. +12..... Possible further internal information used by `parse-partial-sexp'. + If third arg TARGETDEPTH is non-nil, parsing stops if the depth in parentheses becomes equal to TARGETDEPTH. -Fourth arg STOPBEFORE non-nil means stop when come to +Fourth arg STOPBEFORE non-nil means stop when we come to any character that starts a sexp. Fifth arg OLDSTATE is a list like what this function returns. It is used to initialize the state of the parse. Elements number 1, 2, 6 are ignored. -Sixth arg COMMENTSTOP non-nil means stop at the start of a comment. - If it is symbol `syntax-table', stop after the start of a comment or a +Sixth arg COMMENTSTOP non-nil means stop after the start of a comment. + If it is the symbol `syntax-table', stop after the start of a comment or a string, or after end of a comment or a string. */) (Lisp_Object from, Lisp_Object to, Lisp_Object targetdepth, Lisp_Object stopbefore, Lisp_Object oldstate, Lisp_Object commentstop) @@ -3527,15 +3576,17 @@ Sixth arg COMMENTSTOP non-nil means stop at the start of a comment. target = TYPE_MINIMUM (EMACS_INT); /* We won't reach this depth. */ validate_region (&from, &to); + internalize_parse_state (oldstate, &state); scan_sexps_forward (&state, XINT (from), CHAR_TO_BYTE (XINT (from)), XINT (to), - target, !NILP (stopbefore), oldstate, + target, !NILP (stopbefore), (NILP (commentstop) ? 0 : (EQ (commentstop, Qsyntax_table) ? -1 : 1))); SET_PT_BOTH (state.location, state.location_byte); - return Fcons (make_number (state.depth), + return + Fcons (make_number (state.depth), Fcons (state.prevlevelstart < 0 ? Qnil : make_number (state.prevlevelstart), Fcons (state.thislevelstart < 0 @@ -3553,11 +3604,18 @@ Sixth arg COMMENTSTOP non-nil means stop at the start of a comment. ? Qsyntax_table : make_number (state.comstyle)) : Qnil), - Fcons (((state.incomment - || (state.instring >= 0)) - ? 
make_number (state.comstr_start) - : Qnil), - Fcons (state.levelstarts, Qnil)))))))))); + Fcons (((state.incomment + || (state.instring >= 0)) + ? make_number (state.comstr_start) + : Qnil), + Fcons (state.levelstarts, + Fcons (state.prev_syntax == Smax + ? Qnil + : make_number (state.prev_syntax), + Fcons (state.prev_comment_end == -1 + ? Qnil + : make_number (state.prev_comment_end), + Qnil)))))))))))); } void > Stefan -- Alan Mackenzie (Nuremberg, Germany). From debbugs-submit-bounces@debbugs.gnu.org Fri Mar 18 00:49:18 2016 Received: (at 23019) by debbugs.gnu.org; 18 Mar 2016 04:49:18 +0000 Received: from localhost ([127.0.0.1]:51652 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1agmLd-0001jw-OV for submit@debbugs.gnu.org; Fri, 18 Mar 2016 00:49:18 -0400 Received: from pruche.dit.umontreal.ca ([132.204.246.22]:42879) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1agmLZ-0001jm-U6 for 23019@debbugs.gnu.org; Fri, 18 Mar 2016 00:49:15 -0400 Received: from ceviche.home (lechon.iro.umontreal.ca [132.204.27.242]) by pruche.dit.umontreal.ca (8.14.7/8.14.1) with ESMTP id u2I4n86m012383; Fri, 18 Mar 2016 00:49:10 -0400 Received: by ceviche.home (Postfix, from userid 20848) id E0120661AA; Fri, 18 Mar 2016 00:49:07 -0400 (EDT) From: Stefan Monnier To: Alan Mackenzie Subject: Re: bug#23019: parse-partial-sexp doesn't output the full state needed for its continuance. 
References: <20160315091355.GA2263@acm.fritz.box> <20160317214934.GB9038@acm.fritz.box>
Date: Fri, 18 Mar 2016 00:49:07 -0400
In-Reply-To: <20160317214934.GB9038@acm.fritz.box> (Alan Mackenzie's message of "Thu, 17 Mar 2016 21:49:34 +0000")
Cc: 23019@debbugs.gnu.org

> Do this by adding two new fields to the parser state: the syntax of the last
> character scanned, and the last end of comment scanned. This should make the
> parser state complete.

Thanks. I like the "syntax of the last character scanned", but I don't
understand the reasoning behind "last end of comment scanned".
Why is this relevant?

Is it in case the "last character scanned" was a "slash ending
a comment" so as to avoid treating "*/*" as both a comment closer and
a subsequent opener?

If so, I'm not sure I like it. It sounds to me like there's a chance
it's actually incomplete (e.g. it doesn't address the similar problem
when the "last character scanned" is an end of a string which also
happens to be a valid first-char of a comment-starter), and even if it
isn't, it "feels ad-hoc" to me.

Would it be difficult to do the following instead:
- get rid of element 11.
- change element 10 so it's nil if the last char was an "end of something".

Another way to look at it is that element 10 should only be non-nil
if the "next lexeme" might start on that previous character.

I also have a side question: IIUC your patch makes the 5th element
redundant (can be replaced with a test whether "last char syntax" was
"escape"), is that right?


        Stefan


> Also document element 9 of the parser state. Also refactor the code a bit.
> * src/syntax.c (struct lisp_parse_state): Add two new fields.
> (internalize_parse_state): New function, extracted from scan_sexps_forward.
> (back_comment): Call internalize_parse_state.
> (forw_comment): Return the syntax of the last character scanned to the caller.
> (Fforward_comment, scan_lists): New dummy variables, passed to forw_comment.
> (scan_sexps_forward): Remove a redundant state parameter. Access all `state'
> information via the address parameter `state'. Remove the code which converts
> from external to internal form of `state'. Access buffer contents only from
> `from' onwards. Reformulate code at the top of the main loop correctly to
> recognize comment openers when starting in the middle of one. Call
> forw_comment with extra argument (for return of final syntax value).
> (Fparse_partial_sexp): Document elements 9, 10, 11 of the parser state in the
> doc string. Clarify the doc string in general. Call
> internalize_parse_state. Take account of the new elements when consing up the
> output parser state.
> * doc/lispref/syntax.texi: (Parser State): Document element 9 and the new
> elements 10 and 11. Minor wording corrections (remove reference to "trivial
> cases").
> (Low Level Parsing): Minor corrections
> diff --git a/doc/lispref/syntax.texi b/doc/lispref/syntax.texi
> index d5a7eba..67a00d7 100644
> --- a/doc/lispref/syntax.texi
> +++ b/doc/lispref/syntax.texi
> @@ -791,10 +791,10 @@ Parser State
>  @subsection Parser State
>  @cindex parser state
>
> -  A @dfn{parser state} is a list of ten elements describing the state
> -of the syntactic parser, after it parses the text between a specified
> -starting point and a specified end point in the buffer. Parsing
> -functions such as @code{syntax-ppss}
> +  A @dfn{parser state} is a list of (currently) twelve elements
> +describing the state of the syntactic parser, after it parses the text
> +between a specified starting point and a specified end point in the
> +buffer. Parsing functions such as @code{syntax-ppss}
>  @ifnottex
>  (@pxref{Position Parse})
>  @end ifnottex
> @@ -851,15 +851,21 @@ Parser State
>  this element is @code{nil}.
>
>  @item
> -Internal data for continuing the parsing. The meaning of this
> -data is subject to change; it is used if you pass this list
> -as the @var{state} argument to another call.
> +The list of the positions of the currently open parentheses, starting
> +with the outermost.
> +
> +@item
> +The @var{syntax-code} (@pxref{Syntax Table Internals}) of the last
> +buffer position scanned, or @code{nil} if no scanning has happened.
> +
> +@item
> +The position after the previous end of comment, or @code{nil} if the
> +scanning has not passed a comment end.
>  @end enumerate
>
>    Elements 1, 2, and 6 are ignored in a state which you pass as an
> -argument to continue parsing, and elements 8 and 9 are used only in
> -trivial cases. Those elements are mainly used internally by the
> -parser code.
> +argument to continue parsing. Elements 9 to 11 are mainly used
> +internally by the parser code.
>
>    One additional piece of useful information is available from a
>  parser state using this function:
> @@ -898,11 +904,11 @@ Low-Level Parsing
>
>  If the fourth argument @var{stop-before} is non-@code{nil}, parsing
>  stops when it comes to any character that starts a sexp. If
> -@var{stop-comment} is non-@code{nil}, parsing stops when it comes to the
> -start of an unnested comment. If @var{stop-comment} is the symbol
> +@var{stop-comment} is non-@code{nil}, parsing stops after the start of
> +an unnested comment. If @var{stop-comment} is the symbol
>  @code{syntax-table}, parsing stops after the start of an unnested
> -comment or a string, or the end of an unnested comment or a string,
> -whichever comes first.
> +comment or a string, or after the end of an unnested comment or a
> +string, whichever comes first.
>
>    If @var{state} is @code{nil}, @var{start} is assumed to be at the top
>  level of parenthesis structure, such as the beginning of a function
> diff --git a/src/syntax.c b/src/syntax.c
> index 249d0d5..11b1ff0 100644
> --- a/src/syntax.c
> +++ b/src/syntax.c
> @@ -153,6 +153,9 @@ struct lisp_parse_state
>      ptrdiff_t comstr_start; /* Position of last comment/string starter. */
>      Lisp_Object levelstarts; /* Char numbers of starts-of-expression
>                                  of levels (starting from outermost). */
> +    int prev_syntax; /* Syntax of previous character scanned, or Smax. */
> +    ptrdiff_t prev_comment_end; /* Position after end of last closed
> +                                   comment, or -1. */
>    };
>
>  /* These variables are a cache for finding the start of a defun.
> @@ -176,7 +179,8 @@ static Lisp_Object skip_syntaxes (bool, Lisp_Object, Lisp_Object);
>  static Lisp_Object scan_lists (EMACS_INT, EMACS_INT, EMACS_INT, bool);
>  static void scan_sexps_forward (struct lisp_parse_state *,
>                                  ptrdiff_t, ptrdiff_t, ptrdiff_t, EMACS_INT,
> -                                bool, Lisp_Object, int);
> +                                bool, int);
> +static void internalize_parse_state (Lisp_Object, struct lisp_parse_state *);
>  static bool in_classes (int, Lisp_Object);
>  static void parse_sexp_propertize (ptrdiff_t charpos);
>
> @@ -911,10 +915,11 @@ back_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop,
>          }
>        do
>          {
> +          internalize_parse_state (Qnil, &state);
>            scan_sexps_forward (&state,
>                                defun_start, defun_start_byte,
>                                comment_end, TYPE_MINIMUM (EMACS_INT),
> -                              0, Qnil, 0);
> +                              0, 0);
>            defun_start = comment_end;
>            if (!adjusted)
>              {
> @@ -2314,7 +2319,9 @@ in_classes (int c, Lisp_Object iso_classes)
>     into *CHARPOS_PTR and the corresponding bytepos into *BYTEPOS_PTR.
>     Else, return false and store the charpos STOP into *CHARPOS_PTR, the
>     corresponding bytepos into *BYTEPOS_PTR and the current nesting
> -   (as defined for state.incomment) in *INCOMMENT_PTR.
> +   (as defined for state->incomment) in *INCOMMENT_PTR. The
> +   SYNTAX_WITH_FLAGS of the last character scanned in the comment is
> +   stored into *last_syntax_ptr.
>
>     The comment end is the last character of the comment rather than the
>     character just after the comment.
> @@ -2326,7 +2333,7 @@ static bool
>  forw_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop,
>                EMACS_INT nesting, int style, int prev_syntax,
>                ptrdiff_t *charpos_ptr, ptrdiff_t *bytepos_ptr,
> -              EMACS_INT *incomment_ptr)
> +              EMACS_INT *incomment_ptr, int *last_syntax_ptr)
>  {
>    register int c, c1;
>    register enum syntaxcode code;
> @@ -2346,6 +2353,7 @@ forw_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop,
>            *incomment_ptr = nesting;
>            *charpos_ptr = from;
>            *bytepos_ptr = from_byte;
> +          *last_syntax_ptr = syntax;
>            return 0;
>          }
>        c = FETCH_CHAR_AS_MULTIBYTE (from_byte);
> @@ -2415,6 +2423,7 @@ forw_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop,
>      }
>    *charpos_ptr = from;
>    *bytepos_ptr = from_byte;
> +  *last_syntax_ptr = syntax;
>    return 1;
>  }
>
> @@ -2436,6 +2445,7 @@ between them, return t; otherwise return nil. */)
>    EMACS_INT count1;
>    ptrdiff_t out_charpos, out_bytepos;
>    EMACS_INT dummy;
> +  int dummy2;
>
>    CHECK_NUMBER (count);
>    count1 = XINT (count);
> @@ -2499,7 +2509,7 @@ between them, return t; otherwise return nil. */)
>              }
>            /* We're at the start of a comment. */
>            found = forw_comment (from, from_byte, stop, comnested, comstyle, 0,
> -                                &out_charpos, &out_bytepos, &dummy);
> +                                &out_charpos, &out_bytepos, &dummy, &dummy2);
>            from = out_charpos; from_byte = out_bytepos;
>            if (!found)
>              {
> @@ -2659,6 +2669,7 @@ scan_lists (EMACS_INT from, EMACS_INT count, EMACS_INT depth, bool sexpflag)
>    ptrdiff_t from_byte;
>    ptrdiff_t out_bytepos, out_charpos;
>    EMACS_INT dummy;
> +  int dummy2;
>    bool multibyte_symbol_p = sexpflag && multibyte_syntax_as_symbol;
>
>    if (depth > 0) min_depth = 0;
> @@ -2755,7 +2766,8 @@ scan_lists (EMACS_INT from, EMACS_INT count, EMACS_INT depth, bool sexpflag)
>                UPDATE_SYNTAX_TABLE_FORWARD (from);
>                found = forw_comment (from, from_byte, stop,
>                                      comnested, comstyle, 0,
> -                                    &out_charpos, &out_bytepos, &dummy);
> +                                    &out_charpos, &out_bytepos, &dummy,
> +                                    &dummy2);
>                from = out_charpos, from_byte = out_bytepos;
>                if (!found)
>                  {
> @@ -3119,7 +3131,7 @@ the prefix syntax flag (p). */)
>  }
>
>  /* Parse forward from FROM / FROM_BYTE to END,
> -   assuming that FROM has state OLDSTATE (nil means FROM is start of function),
> +   assuming that FROM has state STATE (nil means FROM is start of function),
>     and return a description of the state of the parse at END.
>     If STOPBEFORE, stop at the start of an atom.
>     If COMMENTSTOP is 1, stop at the start of a comment.
> @@ -3127,12 +3139,11 @@ the prefix syntax flag (p). */)
>     after the beginning of a string, or after the end of a string. */
>
>  static void
> -scan_sexps_forward (struct lisp_parse_state *stateptr,
> +scan_sexps_forward (struct lisp_parse_state *state,
>                      ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t end,
>                      EMACS_INT targetdepth, bool stopbefore,
> -                    Lisp_Object oldstate, int commentstop)
> +                    int commentstop)
>  {
> -  struct lisp_parse_state state;
>    enum syntaxcode code;
>    int c1;
>    bool comnested;
> @@ -3148,7 +3159,7 @@ scan_sexps_forward (struct lisp_parse_state *stateptr,
>    Lisp_Object tem;
>    ptrdiff_t prev_from; /* Keep one character before FROM. */
>    ptrdiff_t prev_from_byte;
> -  int prev_from_syntax;
> +  int prev_from_syntax, prev_prev_from_syntax;
>    bool boundary_stop = commentstop == -1;
>    bool nofence;
>    bool found;
> @@ -3165,6 +3176,7 @@ scan_sexps_forward (struct lisp_parse_state *stateptr,
>  do { prev_from = from; \
>      prev_from_byte = from_byte; \
>      temp = FETCH_CHAR_AS_MULTIBYTE (prev_from_byte); \
> +    prev_prev_from_syntax = prev_from_syntax; \
>      prev_from_syntax = SYNTAX_WITH_FLAGS (temp); \
>      INC_BOTH (from, from_byte); \
>      if (from < end) \
> @@ -3174,88 +3186,38 @@ do { prev_from = from; \
>    immediate_quit = 1;
>    QUIT;
>
> -  if (NILP (oldstate))
> -    {
> -      depth = 0;
> -      state.instring = -1;
> -      state.incomment = 0;
> -      state.comstyle = 0; /* comment style a by default. */
> -      state.comstr_start = -1; /* no comment/string seen. */
> -    }
> -  else
> -    {
> -      tem = Fcar (oldstate);
> -      if (!NILP (tem))
> -        depth = XINT (tem);
> -      else
> -        depth = 0;
> -
> -      oldstate = Fcdr (oldstate);
> -      oldstate = Fcdr (oldstate);
> -      oldstate = Fcdr (oldstate);
> -      tem = Fcar (oldstate);
> -      /* Check whether we are inside string_fence-style string: */
> -      state.instring = (!NILP (tem)
> -                        ? (CHARACTERP (tem) ? XFASTINT (tem) : ST_STRING_STYLE)
> -                        : -1);
> -
> -      oldstate = Fcdr (oldstate);
> -      tem = Fcar (oldstate);
> -      state.incomment = (!NILP (tem)
> -                         ? (INTEGERP (tem) ? XINT (tem) : -1)
> -                         : 0);
> -
> -      oldstate = Fcdr (oldstate);
> -      tem = Fcar (oldstate);
> -      start_quoted = !NILP (tem);
> +  depth = state->depth;
> +  start_quoted = state->quoted;
> +  prev_prev_from_syntax = Smax;
> +  prev_from_syntax = state->prev_syntax;
>
> -      /* if the eighth element of the list is nil, we are in comment
> -         style a. If it is non-nil, we are in comment style b */
> -      oldstate = Fcdr (oldstate);
> -      oldstate = Fcdr (oldstate);
> -      tem = Fcar (oldstate);
> -      state.comstyle = (NILP (tem)
> -                        ? 0
> -                        : (RANGED_INTEGERP (0, tem, ST_COMMENT_STYLE)
> -                           ? XINT (tem)
> -                           : ST_COMMENT_STYLE));
> -
> -      oldstate = Fcdr (oldstate);
> -      tem = Fcar (oldstate);
> -      state.comstr_start =
> -        RANGED_INTEGERP (PTRDIFF_MIN, tem, PTRDIFF_MAX) ? XINT (tem) : -1;
> -      oldstate = Fcdr (oldstate);
> -      tem = Fcar (oldstate);
> -      while (!NILP (tem)) /* >= second enclosing sexps. */
> -        {
> -          Lisp_Object temhd = Fcar (tem);
> -          if (RANGED_INTEGERP (PTRDIFF_MIN, temhd, PTRDIFF_MAX))
> -            curlevel->last = XINT (temhd);
> -          if (++curlevel == endlevel)
> -            curlevel--; /* error ("Nesting too deep for parser"); */
> -          curlevel->prev = -1;
> -          curlevel->last = -1;
> -          tem = Fcdr (tem);
> -        }
> +  tem = state->levelstarts;
> +  while (!NILP (tem)) /* >= second enclosing sexps. */
> +    {
> +      Lisp_Object temhd = Fcar (tem);
> +      if (RANGED_INTEGERP (PTRDIFF_MIN, temhd, PTRDIFF_MAX))
> +        curlevel->last = XINT (temhd);
> +      if (++curlevel == endlevel)
> +        curlevel--; /* error ("Nesting too deep for parser"); */
> +      curlevel->prev = -1;
> +      curlevel->last = -1;
> +      tem = Fcdr (tem);
>      }
> -  state.quoted = 0;
> -  mindepth = depth;
> -
>    curlevel->prev = -1;
>    curlevel->last = -1;
>
> -  SETUP_SYNTAX_TABLE (prev_from, 1);
> -  temp = FETCH_CHAR (prev_from_byte);
> -  prev_from_syntax = SYNTAX_WITH_FLAGS (temp);
> -  UPDATE_SYNTAX_TABLE_FORWARD (from);
> +  state->quoted = 0;
> +  mindepth = depth;
> +
> +  SETUP_SYNTAX_TABLE (from, 1);
>
>    /* Enter the loop at a place appropriate for initial state. */
>
> -  if (state.incomment)
> +  if (state->incomment)
>      goto startincomment;
> -  if (state.instring >= 0)
> +  if (state->instring >= 0)
>      {
> -      nofence = state.instring != ST_STRING_STYLE;
> +      nofence = state->instring != ST_STRING_STYLE;
>        if (start_quoted)
>          goto startquotedinstring;
>        goto startinstring;
> @@ -3266,10 +3228,10 @@ do { prev_from = from; \
>    while (from < end)
>      {
>        int syntax;
> -      INC_FROM;
> -      code = prev_from_syntax & 0xff;
>
>        if (from < end
> +          && (state->prev_comment_end == -1
> +              || prev_from >= state->prev_comment_end)
>            && SYNTAX_FLAGS_COMSTART_FIRST (prev_from_syntax)
>            && (c1 = FETCH_CHAR (from_byte),
>                syntax = SYNTAX_WITH_FLAGS (c1),
> @@ -3280,32 +3242,37 @@ do { prev_from = from; \
>            /* Record the comment style we have entered so that only
>               the comment-end sequence of the same style actually
>               terminates the comment section. */
> -          state.comstyle
> +          state->comstyle
>              = SYNTAX_FLAGS_COMMENT_STYLE (syntax, prev_from_syntax);
>            comnested = (SYNTAX_FLAGS_COMMENT_NESTED (prev_from_syntax)
>                         | SYNTAX_FLAGS_COMMENT_NESTED (syntax));
> -          state.incomment = comnested ? 1 : -1;
> -          state.comstr_start = prev_from;
> +          state->incomment = comnested ? 1 : -1;
> +          state->comstr_start = prev_from;
>            INC_FROM;
>            code = Scomment;
>          }
> -      else if (code == Scomment_fence)
> -        {
> -          /* Record the comment style we have entered so that only
> -             the comment-end sequence of the same style actually
> -             terminates the comment section. */
> -          state.comstyle = ST_COMMENT_STYLE;
> -          state.incomment = -1;
> -          state.comstr_start = prev_from;
> -          code = Scomment;
> -        }
> -      else if (code == Scomment)
> -        {
> -          state.comstyle = SYNTAX_FLAGS_COMMENT_STYLE (prev_from_syntax, 0);
> -          state.incomment = (SYNTAX_FLAGS_COMMENT_NESTED (prev_from_syntax) ?
> -                             1 : -1);
> -          state.comstr_start = prev_from;
> -        }
> +      else
> +        {
> +          INC_FROM;
> +          code = prev_from_syntax & 0xff;
> +          if (code == Scomment_fence)
> +            {
> +              /* Record the comment style we have entered so that only
> +                 the comment-end sequence of the same style actually
> +                 terminates the comment section. */
> +              state->comstyle = ST_COMMENT_STYLE;
> +              state->incomment = -1;
> +              state->comstr_start = prev_from;
> +              code = Scomment;
> +            }
> +          else if (code == Scomment)
> +            {
> +              state->comstyle = SYNTAX_FLAGS_COMMENT_STYLE (prev_from_syntax, 0);
> +              state->incomment = (SYNTAX_FLAGS_COMMENT_NESTED (prev_from_syntax) ?
> +                                  1 : -1);
> +              state->comstr_start = prev_from;
> +            }
> +        }
>
>        if (SYNTAX_FLAGS_PREFIX (prev_from_syntax))
>          continue;
> @@ -3357,18 +3324,21 @@ do { prev_from = from; \
>               middle of it. We don't want to do that if we're just at the
>               beginning of the comment (think of (*) ... (*)). */
>            found = forw_comment (from, from_byte, end,
> -                                state.incomment, state.comstyle,
> -                                (from == BEGV || from < state.comstr_start + 3)
> +                                state->incomment, state->comstyle,
> +                                (from == BEGV || from < state->comstr_start + 3)
>                                  ? 0 : prev_from_syntax,
> -                                &out_charpos, &out_bytepos, &state.incomment);
> +                                &out_charpos, &out_bytepos, &state->incomment,
> +                                &prev_from_syntax);
>            from = out_charpos; from_byte = out_bytepos;
> -          /* Beware! prev_from and friends are invalid now.
> -             Luckily, the `done' doesn't use them and the INC_FROM
> -             sets them to a sane value without looking at them. */
> +          /* Beware! prev_from and friends (except prev_from_syntax)
> +             are invalid now. Luckily, the `done' doesn't use them
> +             and the INC_FROM sets them to a sane value without
> +             looking at them. */
>            if (!found) goto done;
>            INC_FROM;
> -          state.incomment = 0;
> -          state.comstyle = 0; /* reset the comment style */
> +          state->incomment = 0;
> +          state->comstyle = 0; /* reset the comment style */
> +          state->prev_comment_end = from;
>            if (boundary_stop) goto done;
>            break;
>
> @@ -3396,16 +3366,16 @@ do { prev_from = from; \
>
>          case Sstring:
>          case Sstring_fence:
> -          state.comstr_start = from - 1;
> +          state->comstr_start = from - 1;
>            if (stopbefore) goto stop; /* this arg means stop at sexp start */
>            curlevel->last = prev_from;
> -          state.instring = (code == Sstring
> +          state->instring = (code == Sstring
>                              ? (FETCH_CHAR_AS_MULTIBYTE (prev_from_byte))
>                              : ST_STRING_STYLE);
>            if (boundary_stop) goto done;
>          startinstring:
>            {
> -            nofence = state.instring != ST_STRING_STYLE;
> +            nofence = state->instring != ST_STRING_STYLE;
>
>              while (1)
>                {
> @@ -3419,7 +3389,7 @@ do { prev_from = from; \
>                  /* Check C_CODE here so that if the char has
>                     a syntax-table property which says it is NOT
>                     a string character, it does not end the string. */
> -                if (nofence && c == state.instring && c_code == Sstring)
> +                if (nofence && c == state->instring && c_code == Sstring)
>                    break;
>
>                  switch (c_code)
> @@ -3442,7 +3412,7 @@ do { prev_from = from; \
>                }
>            }
>          string_end:
> -          state.instring = -1;
> +          state->instring = -1;
>            curlevel->prev = curlevel->last;
>            INC_FROM;
>            if (boundary_stop) goto done;
> @@ -3461,25 +3431,99 @@ do { prev_from = from; \
>   stop: /* Here if stopping before start of sexp. */
>    from = prev_from; /* We have just fetched the char that starts it; */
>    from_byte = prev_from_byte;
> +  prev_from_syntax = prev_prev_from_syntax;
>    goto done; /* but return the position before it.
*/ > endquoted: > - state.quoted = 1; > + state->quoted = 1; > done: > - state.depth = depth; > - state.mindepth = mindepth; > - state.thislevelstart = curlevel->prev; > - state.prevlevelstart > + state->depth = depth; > + state->mindepth = mindepth; > + state->thislevelstart = curlevel->prev; > + state->prevlevelstart > = (curlevel == levelstart) ? -1 : (curlevel - 1)->last; > - state.location = from; > - state.location_byte = from_byte; > - state.levelstarts = Qnil; > + state->location = from; > + state->location_byte = from_byte; > + state->levelstarts = Qnil; > while (curlevel > levelstart) > - state.levelstarts = Fcons (make_number ((--curlevel)->last), > - state.levelstarts); > + state->levelstarts = Fcons (make_number ((--curlevel)->last), > + state->levelstarts); > + state->prev_syntax = prev_from_syntax; > immediate_quit = 0; > +} > + > +/* Convert a (lisp) parse state to the internal form used in > + scan_sexps_forward. */ > +static void > +internalize_parse_state (Lisp_Object external, struct lisp_parse_state *state) > +{ > + Lisp_Object tem; > + > + if (NILP (external)) > + { > + state->depth = 0; > + state->instring = -1; > + state->incomment = 0; > + state->quoted = 0; > + state->comstyle = 0; /* comment style a by default. */ > + state->comstr_start = -1; /* no comment/string seen. */ > + state->levelstarts = Qnil; > + state->prev_syntax = Smax; > + state->prev_comment_end = -1; > + } > + else > + { > + tem = Fcar (external); > + if (!NILP (tem)) > + state->depth = XINT (tem); > + else > + state->depth = 0; > + > + external = Fcdr (external); > + external = Fcdr (external); > + external = Fcdr (external); > + tem = Fcar (external); > + /* Check whether we are inside string_fence-style string: */ > + state->instring = (!NILP (tem) > + ? (CHARACTERP (tem) ? XFASTINT (tem) : ST_STRING_STYLE) > + : -1); > + > + external = Fcdr (external); > + tem = Fcar (external); > + state->incomment = (!NILP (tem) > + ? (INTEGERP (tem) ? 
XINT (tem) : -1) > + : 0); > - *stateptr = state; > + external = Fcdr (external); > + tem = Fcar (external); > + state->quoted = !NILP (tem); > + > + /* if the eighth element of the list is nil, we are in comment > + style a. If it is non-nil, we are in comment style b */ > + external = Fcdr (external); > + external = Fcdr (external); > + tem = Fcar (external); > + state->comstyle = (NILP (tem) > + ? 0 > + : (RANGED_INTEGERP (0, tem, ST_COMMENT_STYLE) > + ? XINT (tem) > + : ST_COMMENT_STYLE)); > + > + external = Fcdr (external); > + tem = Fcar (external); > + state->comstr_start = > + RANGED_INTEGERP (PTRDIFF_MIN, tem, PTRDIFF_MAX) ? XINT (tem) : -1; > + external = Fcdr (external); > + tem = Fcar (external); > + state->levelstarts = tem; > + > + external = Fcdr (external); > + tem = Fcar (external); > + state->prev_syntax = NILP (tem) ? Smax : XINT (tem); > + external = Fcdr (external); > + tem = Fcar (external); > + state->prev_comment_end = NILP (tem) ? -1 : XINT (tem); > + } > } > DEFUN ("parse-partial-sexp", Fparse_partial_sexp, Sparse_partial_sexp, 2, 6, 0, > @@ -3488,6 +3532,7 @@ Parsing stops at TO or when certain criteria are met; > point is set to where parsing stops. > If fifth arg OLDSTATE is omitted or nil, > parsing assumes that FROM is the beginning of a function. > + > Value is a list of elements describing final state of parsing: > 0. depth in parens. > 1. character address of start of innermost containing list; nil if none. > @@ -3501,16 +3546,20 @@ Value is a list of elements describing final state of parsing: > 6. the minimum paren-depth encountered during this scan. > 7. style of comment, if any. > 8. character address of start of comment or string; nil if not in one. > - 9. Intermediate data for continuation of parsing (subject to change). > + 9. List of positions of currently open parens, outermost first. > +10. Syntax of last character scanned, or nil if no scanning has happened. > +11. Position after end of previous comment scanned, or nil. 
> +12..... Possible further internal information used by `parse-partial-sexp'. > + > If third arg TARGETDEPTH is non-nil, parsing stops if the depth > in parentheses becomes equal to TARGETDEPTH. > -Fourth arg STOPBEFORE non-nil means stop when come to > +Fourth arg STOPBEFORE non-nil means stop when we come to > any character that starts a sexp. > Fifth arg OLDSTATE is a list like what this function returns. > It is used to initialize the state of the parse. Elements number 1, 2, 6 > are ignored. > -Sixth arg COMMENTSTOP non-nil means stop at the start of a comment. > - If it is symbol `syntax-table', stop after the start of a comment or a > +Sixth arg COMMENTSTOP non-nil means stop after the start of a comment. > + If it is the symbol `syntax-table', stop after the start of a comment or a > string, or after end of a comment or a string. */) > (Lisp_Object from, Lisp_Object to, Lisp_Object targetdepth, > Lisp_Object stopbefore, Lisp_Object oldstate, Lisp_Object commentstop) > @@ -3527,15 +3576,17 @@ Sixth arg COMMENTSTOP non-nil means stop at the start of a comment. > target = TYPE_MINIMUM (EMACS_INT); /* We won't reach this depth. */ > validate_region (&from, &to); > + internalize_parse_state (oldstate, &state); > scan_sexps_forward (&state, XINT (from), CHAR_TO_BYTE (XINT (from)), > XINT (to), > - target, !NILP (stopbefore), oldstate, > + target, !NILP (stopbefore), > (NILP (commentstop) > ? 0 : (EQ (commentstop, Qsyntax_table) ? -1 : 1))); > SET_PT_BOTH (state.location, state.location_byte); > - return Fcons (make_number (state.depth), > + return > + Fcons (make_number (state.depth), > Fcons (state.prevlevelstart < 0 > ? Qnil : make_number (state.prevlevelstart), > Fcons (state.thislevelstart < 0 > @@ -3553,11 +3604,18 @@ Sixth arg COMMENTSTOP non-nil means stop at the start of a comment. > ? Qsyntax_table > : make_number (state.comstyle)) > : Qnil), > - Fcons (((state.incomment > - || (state.instring >= 0)) > - ? 
make_number (state.comstr_start) > - : Qnil), > - Fcons (state.levelstarts, Qnil)))))))))); > + Fcons (((state.incomment > + || (state.instring >= 0)) > + ? make_number (state.comstr_start) > + : Qnil), > + Fcons (state.levelstarts, > + Fcons (state.prev_syntax == Smax > + ? Qnil > + : make_number (state.prev_syntax), > + Fcons (state.prev_comment_end == -1 > + ? Qnil > + : make_number (state.prev_comment_end), > + Qnil)))))))))))); > } > > void >> Stefan > -- > Alan Mackenzie (Nuremberg, Germany). From debbugs-submit-bounces@debbugs.gnu.org Fri Mar 18 11:09:13 2016 Received: (at 23019) by debbugs.gnu.org; 18 Mar 2016 15:09:13 +0000 Received: from localhost ([127.0.0.1]:52783 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1agw1Z-0003Jy-0L for submit@debbugs.gnu.org; Fri, 18 Mar 2016 11:09:13 -0400 Received: from mail.muc.de ([193.149.48.3]:48652) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1agw1X-0003Jq-RT for 23019@debbugs.gnu.org; Fri, 18 Mar 2016 11:09:12 -0400 Received: (qmail 21732 invoked by uid 3782); 18 Mar 2016 15:09:09 -0000 Received: from acm.muc.de (p548A53B1.dip0.t-ipconnect.de [84.138.83.177]) by colin.muc.de (tmda-ofmipd) with ESMTP; Fri, 18 Mar 2016 16:09:07 +0100 Received: (qmail 9531 invoked by uid 1000); 18 Mar 2016 15:11:55 -0000 Date: Fri, 18 Mar 2016 15:11:55 +0000 To: Stefan Monnier Subject: Re: bug#23019: parse-partial-sexp doesn't output the full state needed for its continuance. 
Message-ID: <20160318151154.GA9433@acm.fritz.box> References: <20160315091355.GA2263@acm.fritz.box> <20160317214934.GB9038@acm.fritz.box> From: Alan Mackenzie Cc: 23019@debbugs.gnu.org Hello, Stefan. On Fri, Mar 18, 2016 at 12:49:07AM -0400, Stefan Monnier wrote: > > Do this by adding two new fields to the parser state: the syntax of the last > > character scanned, and the last end of comment scanned. This should make the > > parser state complete. > Thanks. I like the "syntax of the last character scanned", but I don't > understand the reasoning behind "last end of comment scanned". Why is > this relevant? Is it in case the "last character scanned" was a "slash > ending a comment" so as to avoid treating "*/*" as both a comment closer and > a subsequent opener? That's exactly the reason. > If so, I'm not sure I like it. I don't really like it either. > It sounds to me like there's a chance it's actually incomplete (e.g. > it doesn't address the similar problem when the "last character > scanned" is an end of a string which also happens to be a valid > first-char of a comment-starter), and even if it isn't, it "feels > ad-hoc" to me. Now even I wouldn't have come up with that end-of-string scenario. ;-) Such a scenario is presumably one reason why, in scan_sexps_forward, two character comment delimiters are handled before strings. > Would it be difficult to do the following instead: > - get rid of element 11. Done. 
> - change element 10 so it's nil if the last char was an "end of > something". Another way to look at it, is that the element 10 should > only be non-nil if the "next lexeme" might start on that > previous character. I've tried this, and it's somewhat ugly. Setting the "previous_syntax" to nil is also needed for the asterisk in "/*". The nil would appear to mean "the syntactic value of the last character has already been used up". So the "previous_syntax" is nil in the most interesting cases. It also feels somewhat ad-hoc. How about this idea: element 10 will record the syntax of the previous character ONLY when it is potentially the first character of a two character comment delimiter, otherwise it'll be nil. At least that's being honest about what the thing's being used for. > I also have a side question: IIUC your patch makes the 5th element > redundant (can be replaced with a test whether "last char syntax" was > "escape"), is that right? It would appear to be, yes. We really can't get rid of element 5, though, because there will surely be code out there that uses it. But if I change element 10 as outlined above, element 5 will no longer be redundant. > Stefan -- Alan Mackenzie (Nuremberg, Germany). 
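To make the two-character-delimiter problem discussed above concrete, here is a minimal sketch in C. It is a toy scanner, not the code in src/syntax.c: the names toy_state, toy_scan and toy_count_comments are hypothetical, and it only understands C-style block comments. It is meant to show that a scan interrupted between the two characters of a comment opener agrees with an uninterrupted scan only if the state carries over the pending first character, and that the pending value is cleared once a delimiter has been completed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy parse state; NOT the real struct lisp_parse_state.  */
struct toy_state
{
  bool in_comment;  /* True while inside a block comment.  */
  int pending;      /* Previous char if it may be the first char of a
                       two-char delimiter, -1 otherwise (the role the
                       thread proposes for element 10).  */
  int comments;     /* Number of comment openers recognized so far.  */
};

/* Scan TEXT from index FROM up to (not including) TO, updating *S.  */
static void
toy_scan (struct toy_state *s, const char *text, size_t from, size_t to)
{
  for (size_t i = from; i < to; i++)
    {
      char c = text[i];
      if (!s->in_comment && s->pending == '/' && c == '*')
        {
          /* Opener completed; the star is used up and must not also
             serve as the first char of a closer.  */
          s->in_comment = true;
          s->comments++;
          s->pending = -1;
        }
      else if (s->in_comment && s->pending == '*' && c == '/')
        {
          /* Closer completed; the slash is used up and must not also
             serve as the first char of a new opener.  */
          s->in_comment = false;
          s->pending = -1;
        }
      else if ((!s->in_comment && c == '/') || (s->in_comment && c == '*'))
        s->pending = c;   /* Might start a two-char delimiter.  */
      else
        s->pending = -1;
    }
}

/* Scan TEXT in two calls split at SPLIT.  If CARRY is false, drop the
   pending char between the calls, imitating a state list that lacks
   the new element.  Return the number of comments recognized.  */
static int
toy_count_comments (const char *text, size_t split, bool carry)
{
  struct toy_state s = { false, -1, 0 };
  assert (split <= strlen (text));
  toy_scan (&s, text, 0, split);
  if (!carry)
    s.pending = -1;
  toy_scan (&s, text, split, strlen (text));
  return s.comments;
}
```

With the text `a /* b */ c` split at index 3 (just after the slash), toy_count_comments finds one comment when the pending character is carried over and none when it is dropped.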
From debbugs-submit-bounces@debbugs.gnu.org Fri Mar 18 11:19:35 2016 Date: Fri, 18 Mar 2016 15:22:18 +0000 To: Stefan Monnier Subject: Re: bug#23019: parse-partial-sexp doesn't output the full state needed for its continuance. Message-ID: <20160318152218.GA9552@acm.fritz.box> References: <20160315091355.GA2263@acm.fritz.box> <20160317214934.GB9038@acm.fritz.box> <20160318151154.GA9433@acm.fritz.box> In-Reply-To: <20160318151154.GA9433@acm.fritz.box> From: Alan Mackenzie Cc: 23019@debbugs.gnu.org Hello again, Stefan. On Fri, Mar 18, 2016 at 03:11:55PM +0000, Alan Mackenzie wrote: > Done. > > - change element 10 so it's nil if the last char was an "end of > > something". 
Another way to look at it, is that the element 10 should > > only be non-nil if the "next lexeme" might start on that > > previous character. [ .... ] > How about this idea: element 10 will record the syntax of the previous > character ONLY when it is potentially the first character of a two > character comment delimiter, otherwise it'll be nil. At least that's > being honest about what the thing's being used for. That's exactly what you suggested. Apologies for not reading your post a bit more carefully. I think we're agreed, then. I'll implement it. > > Stefan -- Alan Mackenzie (Nuremberg, Germany). From debbugs-submit-bounces@debbugs.gnu.org Fri Mar 18 12:23:10 2016 From: Stefan Monnier To: Alan Mackenzie Subject: Re: bug#23019: parse-partial-sexp doesn't output the full state needed for its continuance. 
References: <20160315091355.GA2263@acm.fritz.box> <20160317214934.GB9038@acm.fritz.box> <20160318151154.GA9433@acm.fritz.box> Date: Fri, 18 Mar 2016 12:23:02 -0400 In-Reply-To: <20160318151154.GA9433@acm.fritz.box> (Alan Mackenzie's message of "Fri, 18 Mar 2016 15:11:55 +0000") Cc: 23019@debbugs.gnu.org >> It sounds to me like there's a chance it's actually incomplete (e.g. >> it doesn't address the similar problem when the "last character >> scanned" is an end of a string which also happens to be a valid >> first-char of a comment-starter), and even if it isn't, it "feels >> ad-hoc" to me. > Now even I wouldn't have come up with that end-of-string scenario. ;-) I don't work in embedded systems, but Coq/Agda's total functions force you to consider all possible cases. > Such a scenario is presumably one reason why, in scan_sexps_forward, two > character comment delimiters are handled before strings. It doesn't handle the exact same situation, but it's closely related indeed. >> - change element 10 so it's nil if the last char was an "end of >> something". Another way to look at it, is that the element 10 should >> only be non-nil if the "next lexeme" might start on that >> previous character. > I've tried this, and it's somewhat ugly. Setting the "previous_syntax" > to nil is also needed for the asterisk in "/*". The nil would appear to > mean "the syntactic value of the last character has already been used > up". So the "previous_syntax" is nil in the most interesting cases. It > also feels somewhat ad-hoc. 
> How about this idea: element 10 will record the syntax of the previous > character ONLY when it is potentially the first character of a two > character comment delimiter, otherwise it'll be nil. At least that's > being honest about what the thing's being used for. IIUC the only difference between what I (think I) suggested and what you're proposing is that you want to return nil for the "prev is backslash" whereas I was suggesting to return non-nil in that case. [ AFAIK the only two-char elements we handle so far are the comment delimiters and the backslash escapes. ] Do I understand this right? > It would appear to be, yes. We really can't get rid of element 5, > though, because there will surely be code out there that uses it. But > if I change element 10 as outlined above, element 5 will no longer be > redundant. I'd even be tempted to re-use element 5, although it might conceivably break some code out there. But even if we don't re-use element 5, I would actually much prefer to render element 5 redundant. 
Stefan From debbugs-submit-bounces@debbugs.gnu.org Fri Mar 18 12:27:50 2016 From: Stefan Monnier To: Alan Mackenzie Subject: Re: bug#23019: parse-partial-sexp doesn't output the full state needed for its continuance. 
References: <20160315091355.GA2263@acm.fritz.box> <20160317214934.GB9038@acm.fritz.box> Date: Fri, 18 Mar 2016 12:27:36 -0400 In-Reply-To: <20160317214934.GB9038@acm.fritz.box> (Alan Mackenzie's message of "Thu, 17 Mar 2016 21:49:34 +0000") Cc: 23019@debbugs.gnu.org > (scan_sexps_forward): Remove a redundant state parameter. Access all `state' > information via the address parameter `state'. Have you taken a look at the performance impact of this part of the change? I don't expect it will make much difference, but I'm actually wondering whether it makes things slower or faster. 
Stefan From debbugs-submit-bounces@debbugs.gnu.org Fri Mar 18 14:23:07 2016 Date: Fri, 18 Mar 2016 18:25:47 +0000 To: Stefan Monnier Subject: Re: bug#23019: parse-partial-sexp doesn't output the full state needed for its continuance. Message-ID: <20160318182547.GB9433@acm.fritz.box> References: <20160315091355.GA2263@acm.fritz.box> <20160317214934.GB9038@acm.fritz.box> <20160318151154.GA9433@acm.fritz.box> From: Alan Mackenzie Cc: 23019@debbugs.gnu.org Hello, Stefan. On Fri, Mar 18, 2016 at 12:23:02PM -0400, Stefan Monnier wrote: > >> - change element 10 so it's nil if the last char was an "end of > >> something". 
Another way to look at it, is that the element 10 should > >> only be non-nil if the "next lexeme" might start on that > >> previous character. > > I've tried this, and it's somewhat ugly. Setting the "previous_syntax" > > to nil is also needed for the asterisk in "/*". The nil would appear to > > mean "the syntactic value of the last character has already been used > > up". So the "previous_syntax" is nil in the most interesting cases. It > > also feels somewhat ad-hoc. > > How about this idea: element 10 will record the syntax of the previous > > character ONLY when it is potentially the first character of a two > > character comment delimiter, otherwise it'll be nil. At least that's > > being honest about what the thing's being used for. > IIUC the only difference between what I (think I) suggested and what > you're proposing is that you want to return nil for the "prev is > backslash" whereas I was suggesting to return non-nil in that case. > [ AFAIK the only two-char elements we handle so far are the comment > delimiters and the backslash escapes. ] We also have Scharquote, which scan_sexps_forward handles identically to Sescape. > Do I understand this right? Yes, but I've no strong feelings on the matter. > > It would appear to be, yes. We really can't get rid of element 5, > > though, because there will surely be code out there that uses it. But > > if I change element 10 as outlined above, element 5 will no longer be > > redundant. > I'd even be tempted to re-use element 5, although it might > conceivably break some code out there. I have bad feelings about that. Is it really worth the risk, just to save one cons cell on a list of which not that many instances exist at any time? > But even if we don't re-use element 5, I would actually much prefer to > render element 5 redundant. OK. Here's an updated patch which does just that. Comments would be welcome. 
> Stefan Amend parse-partial-sexp correctly to handle two character comment delimiters Do this by adding a new field to the parser state: the syntax of the last character scanned, should that be the first char of a (potential) two char construct, nil otherwise. This should make the parser state complete. Also document element 9 of the parser state. Also refactor the code a bit. * src/syntax.c (struct lisp_parse_state): Add a new field. (SYNTAX_FLAGS_COMSTARTEND_FIRST): New function. (internalize_parse_state): New function, extracted from scan_sexps_forward. (back_comment): Call internalize_parse_state. (forw_comment): Return the syntax of the last character scanned to the caller. (Fforward_comment, scan_lists): New dummy variables, passed to forw_comment. (scan_sexps_forward): Remove a redundant state parameter. Access all `state' information via the address parameter `state'. Remove the code which converts from external to internal form of `state'. Access buffer contents only from `from' onwards. Reformulate code at the top of the main loop correctly to recognize comment openers when starting in the middle of one. Call forw_comment with extra argument (for return of final syntax value). (Fparse_partial_sexp): Document elements 9, 10 of the parser state in the doc string. Clarify the doc string in general. Call internalize_parse_state. Take account of the new elements when consing up the output parser state. * doc/lispref/syntax.texi: (Parser State): Document element 9 and the new element 10. Minor wording corrections (remove reference to "trivial cases"). (Low Level Parsing): Minor corrections. 
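The ChangeLog entry above lists SYNTAX_FLAGS_COMSTARTEND_FIRST as a new function; in the diff its body tests the mask 0x50000. As a rough sketch of what that mask covers, the standalone C below spells it out. It assumes that the comment-start-first and comment-end-first flags are bits 16 and 18, which is what the mask implies and which fits the neighbouring accessors in the diff (COMEND_SECOND at bit 19, PREFIX at bit 20). The TOY_ and toy_ names are illustrative only and do not exist in Emacs.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed bit positions, inferred from the 0x50000 mask in the patch;
   these TOY_ names are hypothetical and not part of src/syntax.c.  */
enum
{
  TOY_COMSTART_FIRST = 1 << 16,  /* First char of a two-char comment opener.  */
  TOY_COMEND_FIRST   = 1 << 18   /* First char of a two-char comment closer.  */
};

/* True if FLAGS marks a char that can begin either kind of two-char
   comment delimiter; mirrors the test (flags & 0x50000) != 0.  */
static bool
toy_comstartend_first (int flags)
{
  return (flags & (TOY_COMSTART_FIRST | TOY_COMEND_FIRST)) != 0;
}

/* Check that the two assumed bits together give the patch's mask.  */
static void
toy_self_check (void)
{
  assert ((TOY_COMSTART_FIRST | TOY_COMEND_FIRST) == 0x50000);
}
```

Testing both bits in one predicate fits the stated purpose of the new state field: the scanner only needs to know whether the previous character could begin some two-character comment delimiter, whether an opener or a closer.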
diff --git a/doc/lispref/syntax.texi b/doc/lispref/syntax.texi index d5a7eba..f81c164 100644 --- a/doc/lispref/syntax.texi +++ b/doc/lispref/syntax.texi @@ -791,10 +791,10 @@ Parser State @subsection Parser State @cindex parser state - A @dfn{parser state} is a list of ten elements describing the state -of the syntactic parser, after it parses the text between a specified -starting point and a specified end point in the buffer. Parsing -functions such as @code{syntax-ppss} + A @dfn{parser state} is a list of (currently) eleven elements +describing the state of the syntactic parser, after it parses the text +between a specified starting point and a specified end point in the +buffer. Parsing functions such as @code{syntax-ppss} @ifnottex (@pxref{Position Parse}) @end ifnottex @@ -851,15 +851,20 @@ Parser State this element is @code{nil}. @item -Internal data for continuing the parsing. The meaning of this -data is subject to change; it is used if you pass this list -as the @var{state} argument to another call. +The list of the positions of the currently open parentheses, starting +with the outermost. + +@item +When the last buffer position scanned was the (potential) first +character of a two character construct (comment delimiter or +escaped/char-quoted character pair), the @var{syntax-code} +(@pxref{Syntax Table Internals}) of that position. Otherwise +@code{nil}. @end enumerate Elements 1, 2, and 6 are ignored in a state which you pass as an -argument to continue parsing, and elements 8 and 9 are used only in -trivial cases. Those elements are mainly used internally by the -parser code. +argument to continue parsing. Elements 9 and 10 are mainly used +internally by the parser code. One additional piece of useful information is available from a parser state using this function: @@ -898,11 +903,11 @@ Low-Level Parsing If the fourth argument @var{stop-before} is non-@code{nil}, parsing stops when it comes to any character that starts a sexp. 
If -@var{stop-comment} is non-@code{nil}, parsing stops when it comes to the -start of an unnested comment. If @var{stop-comment} is the symbol +@var{stop-comment} is non-@code{nil}, parsing stops after the start of +an unnested comment. If @var{stop-comment} is the symbol @code{syntax-table}, parsing stops after the start of an unnested -comment or a string, or the end of an unnested comment or a string, -whichever comes first. +comment or a string, or after the end of an unnested comment or a +string, whichever comes first. If @var{state} is @code{nil}, @var{start} is assumed to be at the top level of parenthesis structure, such as the beginning of a function diff --git a/src/syntax.c b/src/syntax.c index 249d0d5..e6a1942 100644 --- a/src/syntax.c +++ b/src/syntax.c @@ -81,6 +81,11 @@ SYNTAX_FLAGS_COMEND_SECOND (int flags) return (flags >> 19) & 1; } static bool +SYNTAX_FLAGS_COMSTARTEND_FIRST (int flags) +{ + return (flags & 0x50000) != 0; +} +static bool SYNTAX_FLAGS_PREFIX (int flags) { return (flags >> 20) & 1; @@ -153,6 +158,10 @@ struct lisp_parse_state ptrdiff_t comstr_start; /* Position of last comment/string starter. */ Lisp_Object levelstarts; /* Char numbers of starts-of-expression of levels (starting from outermost). */ + int prev_syntax; /* Syntax of previous position scanned, when + that position (potentially) holds the first char + of a 2-char construct, i.e. comment delimiter + or Sescape, etc. Smax otherwise. */ }; /* These variables are a cache for finding the start of a defun. 
@@ -176,7 +185,8 @@ static Lisp_Object skip_syntaxes (bool, Lisp_Object, Lisp_Object); static Lisp_Object scan_lists (EMACS_INT, EMACS_INT, EMACS_INT, bool); static void scan_sexps_forward (struct lisp_parse_state *, ptrdiff_t, ptrdiff_t, ptrdiff_t, EMACS_INT, - bool, Lisp_Object, int); + bool, int); +static void internalize_parse_state (Lisp_Object, struct lisp_parse_state *); static bool in_classes (int, Lisp_Object); static void parse_sexp_propertize (ptrdiff_t charpos); @@ -911,10 +921,11 @@ back_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop, } do { + internalize_parse_state (Qnil, &state); scan_sexps_forward (&state, defun_start, defun_start_byte, comment_end, TYPE_MINIMUM (EMACS_INT), - 0, Qnil, 0); + 0, 0); defun_start = comment_end; if (!adjusted) { @@ -2314,7 +2325,9 @@ in_classes (int c, Lisp_Object iso_classes) into *CHARPOS_PTR and the corresponding bytepos into *BYTEPOS_PTR. Else, return false and store the charpos STOP into *CHARPOS_PTR, the corresponding bytepos into *BYTEPOS_PTR and the current nesting - (as defined for state.incomment) in *INCOMMENT_PTR. + (as defined for state->incomment) in *INCOMMENT_PTR. The + SYNTAX_WITH_FLAGS of the last character scanned in the comment is + stored into *last_syntax_ptr. The comment end is the last character of the comment rather than the character just after the comment. 
@@ -2326,7 +2339,7 @@ static bool forw_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop, EMACS_INT nesting, int style, int prev_syntax, ptrdiff_t *charpos_ptr, ptrdiff_t *bytepos_ptr, - EMACS_INT *incomment_ptr) + EMACS_INT *incomment_ptr, int *last_syntax_ptr) { register int c, c1; register enum syntaxcode code; @@ -2346,6 +2359,7 @@ forw_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop, *incomment_ptr = nesting; *charpos_ptr = from; *bytepos_ptr = from_byte; + *last_syntax_ptr = syntax; return 0; } c = FETCH_CHAR_AS_MULTIBYTE (from_byte); @@ -2415,6 +2429,7 @@ forw_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop, } *charpos_ptr = from; *bytepos_ptr = from_byte; + *last_syntax_ptr = syntax; return 1; } @@ -2436,6 +2451,7 @@ between them, return t; otherwise return nil. */) EMACS_INT count1; ptrdiff_t out_charpos, out_bytepos; EMACS_INT dummy; + int dummy2; CHECK_NUMBER (count); count1 = XINT (count); @@ -2499,7 +2515,7 @@ between them, return t; otherwise return nil. */) } /* We're at the start of a comment. */ found = forw_comment (from, from_byte, stop, comnested, comstyle, 0, - &out_charpos, &out_bytepos, &dummy); + &out_charpos, &out_bytepos, &dummy, &dummy2); from = out_charpos; from_byte = out_bytepos; if (!found) { @@ -2659,6 +2675,7 @@ scan_lists (EMACS_INT from, EMACS_INT count, EMACS_INT depth, bool sexpflag) ptrdiff_t from_byte; ptrdiff_t out_bytepos, out_charpos; EMACS_INT dummy; + int dummy2; bool multibyte_symbol_p = sexpflag && multibyte_syntax_as_symbol; if (depth > 0) min_depth = 0; @@ -2755,7 +2772,8 @@ scan_lists (EMACS_INT from, EMACS_INT count, EMACS_INT depth, bool sexpflag) UPDATE_SYNTAX_TABLE_FORWARD (from); found = forw_comment (from, from_byte, stop, comnested, comstyle, 0, - &out_charpos, &out_bytepos, &dummy); + &out_charpos, &out_bytepos, &dummy, + &dummy2); from = out_charpos, from_byte = out_bytepos; if (!found) { @@ -3119,7 +3137,7 @@ the prefix syntax flag (p). 
*/) } /* Parse forward from FROM / FROM_BYTE to END, - assuming that FROM has state OLDSTATE (nil means FROM is start of function), + assuming that FROM has state STATE, and return a description of the state of the parse at END. If STOPBEFORE, stop at the start of an atom. If COMMENTSTOP is 1, stop at the start of a comment. @@ -3127,12 +3145,11 @@ the prefix syntax flag (p). */) after the beginning of a string, or after the end of a string. */ static void -scan_sexps_forward (struct lisp_parse_state *stateptr, +scan_sexps_forward (struct lisp_parse_state *state, ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t end, EMACS_INT targetdepth, bool stopbefore, - Lisp_Object oldstate, int commentstop) + int commentstop) { - struct lisp_parse_state state; enum syntaxcode code; int c1; bool comnested; @@ -3148,7 +3165,7 @@ scan_sexps_forward (struct lisp_parse_state *stateptr, Lisp_Object tem; ptrdiff_t prev_from; /* Keep one character before FROM. */ ptrdiff_t prev_from_byte; - int prev_from_syntax; + int prev_from_syntax, prev_prev_from_syntax; bool boundary_stop = commentstop == -1; bool nofence; bool found; @@ -3165,6 +3182,7 @@ scan_sexps_forward (struct lisp_parse_state *stateptr, do { prev_from = from; \ prev_from_byte = from_byte; \ temp = FETCH_CHAR_AS_MULTIBYTE (prev_from_byte); \ + prev_prev_from_syntax = prev_from_syntax; \ prev_from_syntax = SYNTAX_WITH_FLAGS (temp); \ INC_BOTH (from, from_byte); \ if (from < end) \ @@ -3174,88 +3192,38 @@ do { prev_from = from; \ immediate_quit = 1; QUIT; - if (NILP (oldstate)) - { - depth = 0; - state.instring = -1; - state.incomment = 0; - state.comstyle = 0; /* comment style a by default. */ - state.comstr_start = -1; /* no comment/string seen. 
*/ - } - else - { - tem = Fcar (oldstate); - if (!NILP (tem)) - depth = XINT (tem); - else - depth = 0; - - oldstate = Fcdr (oldstate); - oldstate = Fcdr (oldstate); - oldstate = Fcdr (oldstate); - tem = Fcar (oldstate); - /* Check whether we are inside string_fence-style string: */ - state.instring = (!NILP (tem) - ? (CHARACTERP (tem) ? XFASTINT (tem) : ST_STRING_STYLE) - : -1); + depth = state->depth; + start_quoted = state->quoted; + prev_prev_from_syntax = Smax; + prev_from_syntax = state->prev_syntax; - oldstate = Fcdr (oldstate); - tem = Fcar (oldstate); - state.incomment = (!NILP (tem) - ? (INTEGERP (tem) ? XINT (tem) : -1) - : 0); - - oldstate = Fcdr (oldstate); - tem = Fcar (oldstate); - start_quoted = !NILP (tem); - - /* if the eighth element of the list is nil, we are in comment - style a. If it is non-nil, we are in comment style b */ - oldstate = Fcdr (oldstate); - oldstate = Fcdr (oldstate); - tem = Fcar (oldstate); - state.comstyle = (NILP (tem) - ? 0 - : (RANGED_INTEGERP (0, tem, ST_COMMENT_STYLE) - ? XINT (tem) - : ST_COMMENT_STYLE)); - - oldstate = Fcdr (oldstate); - tem = Fcar (oldstate); - state.comstr_start = - RANGED_INTEGERP (PTRDIFF_MIN, tem, PTRDIFF_MAX) ? XINT (tem) : -1; - oldstate = Fcdr (oldstate); - tem = Fcar (oldstate); - while (!NILP (tem)) /* >= second enclosing sexps. */ - { - Lisp_Object temhd = Fcar (tem); - if (RANGED_INTEGERP (PTRDIFF_MIN, temhd, PTRDIFF_MAX)) - curlevel->last = XINT (temhd); - if (++curlevel == endlevel) - curlevel--; /* error ("Nesting too deep for parser"); */ - curlevel->prev = -1; - curlevel->last = -1; - tem = Fcdr (tem); - } + tem = state->levelstarts; + while (!NILP (tem)) /* >= second enclosing sexps. 
*/ + { + Lisp_Object temhd = Fcar (tem); + if (RANGED_INTEGERP (PTRDIFF_MIN, temhd, PTRDIFF_MAX)) + curlevel->last = XINT (temhd); + if (++curlevel == endlevel) + curlevel--; /* error ("Nesting too deep for parser"); */ + curlevel->prev = -1; + curlevel->last = -1; + tem = Fcdr (tem); } - state.quoted = 0; - mindepth = depth; - curlevel->prev = -1; curlevel->last = -1; - SETUP_SYNTAX_TABLE (prev_from, 1); - temp = FETCH_CHAR (prev_from_byte); - prev_from_syntax = SYNTAX_WITH_FLAGS (temp); - UPDATE_SYNTAX_TABLE_FORWARD (from); + state->quoted = 0; + mindepth = depth; + + SETUP_SYNTAX_TABLE (from, 1); /* Enter the loop at a place appropriate for initial state. */ - if (state.incomment) + if (state->incomment) goto startincomment; - if (state.instring >= 0) + if (state->instring >= 0) { - nofence = state.instring != ST_STRING_STYLE; + nofence = state->instring != ST_STRING_STYLE; if (start_quoted) goto startquotedinstring; goto startinstring; @@ -3266,11 +3234,8 @@ do { prev_from = from; \ while (from < end) { int syntax; - INC_FROM; - code = prev_from_syntax & 0xff; - if (from < end - && SYNTAX_FLAGS_COMSTART_FIRST (prev_from_syntax) + if (SYNTAX_FLAGS_COMSTART_FIRST (prev_from_syntax) && (c1 = FETCH_CHAR (from_byte), syntax = SYNTAX_WITH_FLAGS (c1), SYNTAX_FLAGS_COMSTART_SECOND (syntax))) @@ -3280,32 +3245,39 @@ do { prev_from = from; \ /* Record the comment style we have entered so that only the comment-end sequence of the same style actually terminates the comment section. */ - state.comstyle + state->comstyle = SYNTAX_FLAGS_COMMENT_STYLE (syntax, prev_from_syntax); comnested = (SYNTAX_FLAGS_COMMENT_NESTED (prev_from_syntax) | SYNTAX_FLAGS_COMMENT_NESTED (syntax)); - state.incomment = comnested ? 1 : -1; - state.comstr_start = prev_from; + state->incomment = comnested ? 1 : -1; + state->comstr_start = prev_from; INC_FROM; + prev_from_syntax = Smax; /* the syntax has already been + "used up". 
*/ code = Scomment; } - else if (code == Scomment_fence) - { - /* Record the comment style we have entered so that only - the comment-end sequence of the same style actually - terminates the comment section. */ - state.comstyle = ST_COMMENT_STYLE; - state.incomment = -1; - state.comstr_start = prev_from; - code = Scomment; - } - else if (code == Scomment) - { - state.comstyle = SYNTAX_FLAGS_COMMENT_STYLE (prev_from_syntax, 0); - state.incomment = (SYNTAX_FLAGS_COMMENT_NESTED (prev_from_syntax) ? - 1 : -1); - state.comstr_start = prev_from; - } + else + { + INC_FROM; + code = prev_from_syntax & 0xff; + if (code == Scomment_fence) + { + /* Record the comment style we have entered so that only + the comment-end sequence of the same style actually + terminates the comment section. */ + state->comstyle = ST_COMMENT_STYLE; + state->incomment = -1; + state->comstr_start = prev_from; + code = Scomment; + } + else if (code == Scomment) + { + state->comstyle = SYNTAX_FLAGS_COMMENT_STYLE (prev_from_syntax, 0); + state->incomment = (SYNTAX_FLAGS_COMMENT_NESTED (prev_from_syntax) ? + 1 : -1); + state->comstr_start = prev_from; + } + } if (SYNTAX_FLAGS_PREFIX (prev_from_syntax)) continue; @@ -3350,25 +3322,28 @@ do { prev_from = from; \ case Scomment_fence: /* Can't happen because it's handled above. */ case Scomment: - if (commentstop || boundary_stop) goto done; + if (commentstop || boundary_stop) goto done; startincomment: /* The (from == BEGV) test was to enter the loop in the middle so that we find a 2-char comment ender even if we start in the middle of it. We don't want to do that if we're just at the beginning of the comment (think of (*) ... (*)). */ found = forw_comment (from, from_byte, end, - state.incomment, state.comstyle, - (from == BEGV || from < state.comstr_start + 3) - ? 0 : prev_from_syntax, - &out_charpos, &out_bytepos, &state.incomment); + state->incomment, state->comstyle, + from == BEGV ? 
0 : prev_from_syntax, + &out_charpos, &out_bytepos, &state->incomment, + &prev_from_syntax); from = out_charpos; from_byte = out_bytepos; - /* Beware! prev_from and friends are invalid now. - Luckily, the `done' doesn't use them and the INC_FROM - sets them to a sane value without looking at them. */ + /* Beware! prev_from and friends (except prev_from_syntax) + are invalid now. Luckily, the `done' doesn't use them + and the INC_FROM sets them to a sane value without + looking at them. */ if (!found) goto done; INC_FROM; - state.incomment = 0; - state.comstyle = 0; /* reset the comment style */ + state->incomment = 0; + state->comstyle = 0; /* reset the comment style */ + prev_from_syntax = Smax; /* Ensure "*|*" can't open a spurious new + comment. */ if (boundary_stop) goto done; break; @@ -3396,16 +3371,16 @@ do { prev_from = from; \ case Sstring: case Sstring_fence: - state.comstr_start = from - 1; + state->comstr_start = from - 1; if (stopbefore) goto stop; /* this arg means stop at sexp start */ curlevel->last = prev_from; - state.instring = (code == Sstring + state->instring = (code == Sstring ? (FETCH_CHAR_AS_MULTIBYTE (prev_from_byte)) : ST_STRING_STYLE); if (boundary_stop) goto done; startinstring: { - nofence = state.instring != ST_STRING_STYLE; + nofence = state->instring != ST_STRING_STYLE; while (1) { @@ -3419,7 +3394,7 @@ do { prev_from = from; \ /* Check C_CODE here so that if the char has a syntax-table property which says it is NOT a string character, it does not end the string. */ - if (nofence && c == state.instring && c_code == Sstring) + if (nofence && c == state->instring && c_code == Sstring) break; switch (c_code) @@ -3442,7 +3417,7 @@ do { prev_from = from; \ } } string_end: - state.instring = -1; + state->instring = -1; curlevel->prev = curlevel->last; INC_FROM; if (boundary_stop) goto done; @@ -3461,25 +3436,96 @@ do { prev_from = from; \ stop: /* Here if stopping before start of sexp. 
*/ from = prev_from; /* We have just fetched the char that starts it; */ from_byte = prev_from_byte; + prev_from_syntax = prev_prev_from_syntax; goto done; /* but return the position before it. */ endquoted: - state.quoted = 1; + state->quoted = 1; done: - state.depth = depth; - state.mindepth = mindepth; - state.thislevelstart = curlevel->prev; - state.prevlevelstart + state->depth = depth; + state->mindepth = mindepth; + state->thislevelstart = curlevel->prev; + state->prevlevelstart = (curlevel == levelstart) ? -1 : (curlevel - 1)->last; - state.location = from; - state.location_byte = from_byte; - state.levelstarts = Qnil; + state->location = from; + state->location_byte = from_byte; + state->levelstarts = Qnil; while (curlevel > levelstart) - state.levelstarts = Fcons (make_number ((--curlevel)->last), - state.levelstarts); + state->levelstarts = Fcons (make_number ((--curlevel)->last), + state->levelstarts); + state->prev_syntax = (SYNTAX_FLAGS_COMSTARTEND_FIRST (prev_from_syntax) + || state->quoted) ? prev_from_syntax : Smax; immediate_quit = 0; +} + +/* Convert a (lisp) parse state to the internal form used in + scan_sexps_forward. */ +static void +internalize_parse_state (Lisp_Object external, struct lisp_parse_state *state) +{ + Lisp_Object tem; + + if (NILP (external)) + { + state->depth = 0; + state->instring = -1; + state->incomment = 0; + state->quoted = 0; + state->comstyle = 0; /* comment style a by default. */ + state->comstr_start = -1; /* no comment/string seen. */ + state->levelstarts = Qnil; + state->prev_syntax = Smax; + } + else + { + tem = Fcar (external); + if (!NILP (tem)) + state->depth = XINT (tem); + else + state->depth = 0; + + external = Fcdr (external); + external = Fcdr (external); + external = Fcdr (external); + tem = Fcar (external); + /* Check whether we are inside string_fence-style string: */ + state->instring = (!NILP (tem) + ? (CHARACTERP (tem) ? 
XFASTINT (tem) : ST_STRING_STYLE) + : -1); + + external = Fcdr (external); + tem = Fcar (external); + state->incomment = (!NILP (tem) + ? (INTEGERP (tem) ? XINT (tem) : -1) + : 0); + + external = Fcdr (external); + tem = Fcar (external); + state->quoted = !NILP (tem); - *stateptr = state; + /* if the eighth element of the list is nil, we are in comment + style a. If it is non-nil, we are in comment style b */ + external = Fcdr (external); + external = Fcdr (external); + tem = Fcar (external); + state->comstyle = (NILP (tem) + ? 0 + : (RANGED_INTEGERP (0, tem, ST_COMMENT_STYLE) + ? XINT (tem) + : ST_COMMENT_STYLE)); + + external = Fcdr (external); + tem = Fcar (external); + state->comstr_start = + RANGED_INTEGERP (PTRDIFF_MIN, tem, PTRDIFF_MAX) ? XINT (tem) : -1; + external = Fcdr (external); + tem = Fcar (external); + state->levelstarts = tem; + + external = Fcdr (external); + tem = Fcar (external); + state->prev_syntax = NILP (tem) ? Smax : XINT (tem); + } } DEFUN ("parse-partial-sexp", Fparse_partial_sexp, Sparse_partial_sexp, 2, 6, 0, @@ -3488,6 +3534,7 @@ Parsing stops at TO or when certain criteria are met; point is set to where parsing stops. If fifth arg OLDSTATE is omitted or nil, parsing assumes that FROM is the beginning of a function. + Value is a list of elements describing final state of parsing: 0. depth in parens. 1. character address of start of innermost containing list; nil if none. @@ -3501,16 +3548,22 @@ Value is a list of elements describing final state of parsing: 6. the minimum paren-depth encountered during this scan. 7. style of comment, if any. 8. character address of start of comment or string; nil if not in one. - 9. Intermediate data for continuation of parsing (subject to change). + 9. List of positions of currently open parens, outermost first. +10. When the last position scanned holds the first character of a + (potential) two character construct, the syntax of that position, + otherwise nil. 
That construct can be a two character comment + delimiter or an Escaped or Char-quoted character. +11..... Possible further internal information used by `parse-partial-sexp'. + If third arg TARGETDEPTH is non-nil, parsing stops if the depth in parentheses becomes equal to TARGETDEPTH. -Fourth arg STOPBEFORE non-nil means stop when come to +Fourth arg STOPBEFORE non-nil means stop when we come to any character that starts a sexp. Fifth arg OLDSTATE is a list like what this function returns. It is used to initialize the state of the parse. Elements number 1, 2, 6 are ignored. -Sixth arg COMMENTSTOP non-nil means stop at the start of a comment. - If it is symbol `syntax-table', stop after the start of a comment or a +Sixth arg COMMENTSTOP non-nil means stop after the start of a comment. + If it is the symbol `syntax-table', stop after the start of a comment or a string, or after end of a comment or a string. */) (Lisp_Object from, Lisp_Object to, Lisp_Object targetdepth, Lisp_Object stopbefore, Lisp_Object oldstate, Lisp_Object commentstop) @@ -3527,15 +3580,17 @@ Sixth arg COMMENTSTOP non-nil means stop at the start of a comment. target = TYPE_MINIMUM (EMACS_INT); /* We won't reach this depth. */ validate_region (&from, &to); + internalize_parse_state (oldstate, &state); scan_sexps_forward (&state, XINT (from), CHAR_TO_BYTE (XINT (from)), XINT (to), - target, !NILP (stopbefore), oldstate, + target, !NILP (stopbefore), (NILP (commentstop) ? 0 : (EQ (commentstop, Qsyntax_table) ? -1 : 1))); SET_PT_BOTH (state.location, state.location_byte); - return Fcons (make_number (state.depth), + return + Fcons (make_number (state.depth), Fcons (state.prevlevelstart < 0 ? Qnil : make_number (state.prevlevelstart), Fcons (state.thislevelstart < 0 @@ -3553,11 +3608,15 @@ Sixth arg COMMENTSTOP non-nil means stop at the start of a comment. ? Qsyntax_table : make_number (state.comstyle)) : Qnil), - Fcons (((state.incomment - || (state.instring >= 0)) - ? 
make_number (state.comstr_start) - : Qnil), - Fcons (state.levelstarts, Qnil)))))))))); + Fcons (((state.incomment + || (state.instring >= 0)) + ? make_number (state.comstr_start) + : Qnil), + Fcons (state.levelstarts, + Fcons (state.prev_syntax == Smax + ? Qnil + : make_number (state.prev_syntax), + Qnil))))))))))); } void -- Alan Mackenzie (Nuremberg, Germany). From debbugs-submit-bounces@debbugs.gnu.org Fri Mar 18 15:13:51 2016 Received: (at 23019) by debbugs.gnu.org; 18 Mar 2016 19:13:51 +0000 Received: from localhost ([127.0.0.1]:52954 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1agzqJ-0000mU-8I for submit@debbugs.gnu.org; Fri, 18 Mar 2016 15:13:51 -0400 Received: from mail.muc.de ([193.149.48.3]:19511) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1agzqH-0000mM-CF for 23019@debbugs.gnu.org; Fri, 18 Mar 2016 15:13:49 -0400 Received: (qmail 84555 invoked by uid 3782); 18 Mar 2016 19:13:48 -0000 Received: from acm.muc.de (p548A53B1.dip0.t-ipconnect.de [84.138.83.177]) by colin.muc.de (tmda-ofmipd) with ESMTP; Fri, 18 Mar 2016 20:13:47 +0100 Received: (qmail 11916 invoked by uid 1000); 18 Mar 2016 19:16:33 -0000 Date: Fri, 18 Mar 2016 19:16:33 +0000 To: Stefan Monnier Subject: Re: bug#23019: parse-partial-sexp doesn't output the full state needed for its continuance. 
Message-ID: <20160318191633.GC9433@acm.fritz.box> References: <20160315091355.GA2263@acm.fritz.box> <20160317214934.GB9038@acm.fritz.box> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.24 (2015-08-30) X-Delivery-Agent: TMDA/1.1.12 (Macallan) From: Alan Mackenzie X-Primary-Address: acm@muc.de X-Spam-Score: -0.0 (/) X-Debbugs-Envelope-To: 23019 Cc: 23019@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.0 (/) Hello, Stefan. On Fri, Mar 18, 2016 at 12:27:36PM -0400, Stefan Monnier wrote: > > (scan_sexps_forward): Remove a redundant state parameter. Access all `state' > > information via the address parameter `state'. > Have you taken a look at the performance impact of this part of the change? > I don't expect it will make much difference, but I'm actually wondering > whether it makes things slower or faster. I didn't give all that much thought to it. With a "local" state, state.field will be addressed as a constant offset from the stack frame base register. With a "remote" state, state->field will be addressed as a constant offset from some address register. Provided the processor has enough registers available, it shouldn't make a difference. But on an architecture with a restricted set of registers (?old 80x86), it might make things slower if an address register needs to be repeatedly loaded, or even repeatedly stacked around function calls. I'm going to try timing it both ways: (parse-partial-sexp (point-min) (point-max)) on xdisp.c (what else?): Code with "->": 0.03793740272521973 seconds. Code with "." : 0.03828787803649902 seconds. So, at least on my machine, the "indirect" version is faster, by around 1%. 
Not a great difference, but I'm surprised by the way it went. > Stefan -- Alan Mackenzie (Nuremberg, Germany). From debbugs-submit-bounces@debbugs.gnu.org Fri Mar 18 15:36:50 2016 Received: (at 23019) by debbugs.gnu.org; 18 Mar 2016 19:36:50 +0000 Received: from localhost ([127.0.0.1]:52959 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ah0CY-0001JU-61 for submit@debbugs.gnu.org; Fri, 18 Mar 2016 15:36:50 -0400 Received: from ironport2-out.teksavvy.com ([206.248.154.181]:50229) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ah0CW-0001JG-EH for 23019@debbugs.gnu.org; Fri, 18 Mar 2016 15:36:49 -0400 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: A0A+FgA731xV/xSQs2tcgxCEAoVVwwsEAgKBPD0QAQEBAQEBAYEKQQWDXQEBAwFWIwULCw4mEhQYDSSINwjPIwEBAQEGAQEBAR6LOoUFB4QtBbUEI4I7gVkigngBAQE X-IPAS-Result: A0A+FgA731xV/xSQs2tcgxCEAoVVwwsEAgKBPD0QAQEBAQEBAYEKQQWDXQEBAwFWIwULCw4mEhQYDSSINwjPIwEBAQEGAQEBAR6LOoUFB4QtBbUEI4I7gVkigngBAQE X-IronPort-AV: E=Sophos;i="5.13,465,1427774400"; d="scan'208";a="196706370" Received: from 107-179-144-20.cpe.teksavvy.com (HELO pastel.home) ([107.179.144.20]) by ironport2-out.teksavvy.com with ESMTP; 18 Mar 2016 15:36:41 -0400 Received: by pastel.home (Postfix, from userid 20848) id 066055FE67; Fri, 18 Mar 2016 15:36:41 -0400 (EDT) From: Stefan Monnier To: Alan Mackenzie Subject: Re: bug#23019: parse-partial-sexp doesn't output the full state needed for its continuance. 
Message-ID: References: <20160315091355.GA2263@acm.fritz.box> <20160317214934.GB9038@acm.fritz.box> <20160318151154.GA9433@acm.fritz.box> <20160318182547.GB9433@acm.fritz.box> Date: Fri, 18 Mar 2016 15:36:40 -0400 In-Reply-To: <20160318182547.GB9433@acm.fritz.box> (Alan Mackenzie's message of "Fri, 18 Mar 2016 18:25:47 +0000") User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1.50 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-Spam-Score: 0.3 (/) X-Debbugs-Envelope-To: 23019 Cc: 23019@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 0.3 (/) > We also have Scharquote, which scan_sexps_forward handles identically to > Sescape. Yes, it's two syntax codes which are 100% equivalent. An accident of history I guess. > I have bad feelings about that. Is it really worth the risk, just to > save one cons cell on a list that not that many instances of exist at > any time? As you know, I like to take short term risks for long term benefits. >> But even if we don't re-use element 5, I would actually much prefer to >> render element 5 redundant. > OK. Here's an updated patch which does just that. Comments would be > welcome. I'll take a closer look later, thanks. 
Stefan From debbugs-submit-bounces@debbugs.gnu.org Fri Mar 18 15:40:41 2016 Received: (at 23019) by debbugs.gnu.org; 18 Mar 2016 19:40:41 +0000 Received: from localhost ([127.0.0.1]:52963 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ah0GH-0001Ox-M9 for submit@debbugs.gnu.org; Fri, 18 Mar 2016 15:40:41 -0400 Received: from ironport2-out.teksavvy.com ([206.248.154.181]:51628) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ah0GF-0001Ok-R9 for 23019@debbugs.gnu.org; Fri, 18 Mar 2016 15:40:40 -0400 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: A0A3FgA731xV/xSQs2tcgxCEAoVVwwsEAgKBPDwRAQEBAQEBAYEKQQWDXQEBAwEnLyMFCwsOBCISFBgNEBSINwjPIwEBAQEGAQEBAR6LOoUFB4QtBbUEI4I7gVkigngBAQE X-IPAS-Result: A0A3FgA731xV/xSQs2tcgxCEAoVVwwsEAgKBPDwRAQEBAQEBAYEKQQWDXQEBAwEnLyMFCwsOBCISFBgNEBSINwjPIwEBAQEGAQEBAR6LOoUFB4QtBbUEI4I7gVkigngBAQE X-IronPort-AV: E=Sophos;i="5.13,465,1427774400"; d="scan'208";a="196706977" Received: from 107-179-144-20.cpe.teksavvy.com (HELO pastel.home) ([107.179.144.20]) by ironport2-out.teksavvy.com with ESMTP; 18 Mar 2016 15:40:34 -0400 Received: by pastel.home (Postfix, from userid 20848) id 3BC065FE67; Fri, 18 Mar 2016 15:40:34 -0400 (EDT) From: Stefan Monnier To: Alan Mackenzie Subject: Re: bug#23019: parse-partial-sexp doesn't output the full state needed for its continuance. 
Message-ID: References: <20160315091355.GA2263@acm.fritz.box> <20160317214934.GB9038@acm.fritz.box> <20160318191633.GC9433@acm.fritz.box> Date: Fri, 18 Mar 2016 15:40:34 -0400 In-Reply-To: <20160318191633.GC9433@acm.fritz.box> (Alan Mackenzie's message of "Fri, 18 Mar 2016 19:16:33 +0000") User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1.50 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-Spam-Score: 0.3 (/) X-Debbugs-Envelope-To: 23019 Cc: 23019@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 0.3 (/) > I didn't give all that much thought to it. With a "local" state, > state.field will be addressed as a constant offset from the stack frame > base register. With a "remote" state, state->field will be addressed as > a constant offset from some address register. Provided the processor > has enough registers available, it shouldn't make a difference. But on > an architecture with a restricted set of registers (?old 80x86), it might > make things slower if an address register needs to be repeatedly loaded, > or even repeatedly stacked around function calls. That was my first reaction as well. But my other self was telling me "I can't say why, but my gut feeling says that this code is "cleaner" and should hence be easier to optimize". > So, at least on my machine, the "indirect" version is faster, by > around 1%. Not a great difference, but I'm surprised by the way > it went. Thanks for the test. 
As expected, it's a wash, but it's good to confirm that the cleaner version is at least no slower, Stefan From debbugs-submit-bounces@debbugs.gnu.org Sat Mar 19 13:03:42 2016 Received: (at 23019) by debbugs.gnu.org; 19 Mar 2016 17:03:42 +0000 Received: from localhost ([127.0.0.1]:53871 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ahKHu-00008C-59 for submit@debbugs.gnu.org; Sat, 19 Mar 2016 13:03:42 -0400 Received: from mail.muc.de ([193.149.48.3]:61345) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ahKHs-000083-EW for 23019@debbugs.gnu.org; Sat, 19 Mar 2016 13:03:40 -0400 Received: (qmail 14430 invoked by uid 3782); 19 Mar 2016 17:03:39 -0000 Received: from acm.muc.de (p548A5545.dip0.t-ipconnect.de [84.138.85.69]) by colin.muc.de (tmda-ofmipd) with ESMTP; Sat, 19 Mar 2016 18:03:37 +0100 Received: (qmail 5044 invoked by uid 1000); 19 Mar 2016 17:06:24 -0000 Date: Sat, 19 Mar 2016 17:06:24 +0000 To: Stefan Monnier Subject: Re: bug#23019: parse-partial-sexp doesn't output the full state needed for its continuance. Message-ID: <20160319170624.GC2644@acm.fritz.box> References: <20160315091355.GA2263@acm.fritz.box> <20160317214934.GB9038@acm.fritz.box> <20160318151154.GA9433@acm.fritz.box> <20160318182547.GB9433@acm.fritz.box> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.24 (2015-08-30) X-Delivery-Agent: TMDA/1.1.12 (Macallan) From: Alan Mackenzie X-Primary-Address: acm@muc.de X-Spam-Score: -0.0 (/) X-Debbugs-Envelope-To: 23019 Cc: 23019@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.0 (/) Hello, Stefan. On Fri, Mar 18, 2016 at 03:36:40PM -0400, Stefan Monnier wrote: > > OK. 
Here's an updated patch which does just that. Comments would be > > welcome. > I'll take a closer look later, thanks. I found some problems at ends of comments. The upshot is that forw_comment must inform scan_sexps_forward, on a failed search, whether the last character it scanned is still "syntactically live", or whether that last character's syntax was "used up" in closing or opening a comment. On a successful search, that character's syntax is always "used up" in closing the comment. Would you like to see the patch again, or should I just commit it? > Stefan -- Alan Mackenzie (Nuremberg, Germany). From debbugs-submit-bounces@debbugs.gnu.org Sat Mar 19 21:30:38 2016 Received: (at 23019) by debbugs.gnu.org; 20 Mar 2016 01:30:38 +0000 Received: from localhost ([127.0.0.1]:54040 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ahSCT-0001e9-ST for submit@debbugs.gnu.org; Sat, 19 Mar 2016 21:30:38 -0400 Received: from chene.dit.umontreal.ca ([132.204.246.20]:37431) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ahSCR-0001dz-AW for 23019@debbugs.gnu.org; Sat, 19 Mar 2016 21:30:36 -0400 Received: from fmsmemgm.homelinux.net (lechon.iro.umontreal.ca [132.204.27.242]) by chene.dit.umontreal.ca (8.14.1/8.14.1) with ESMTP id u2K1V0QX010508; Sat, 19 Mar 2016 21:31:01 -0400 Received: by fmsmemgm.homelinux.net (Postfix, from userid 20848) id 23452AE665; Sat, 19 Mar 2016 21:30:32 -0400 (EDT) From: Stefan Monnier To: Alan Mackenzie Subject: Re: bug#23019: parse-partial-sexp doesn't output the full state needed for its continuance. 
Message-ID: References: <20160315091355.GA2263@acm.fritz.box> <20160317214934.GB9038@acm.fritz.box> <20160318151154.GA9433@acm.fritz.box> <20160318182547.GB9433@acm.fritz.box> <20160319170624.GC2644@acm.fritz.box> Date: Sat, 19 Mar 2016 21:30:32 -0400 In-Reply-To: <20160319170624.GC2644@acm.fritz.box> (Alan Mackenzie's message of "Sat, 19 Mar 2016 17:06:24 +0000") User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1.50 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-NAI-Spam-Flag: NO X-NAI-Spam-Level: X-NAI-Spam-Threshold: 5 X-NAI-Spam-Score: 0.2 X-NAI-Spam-Rules: 2 Rules triggered GEN_SPAM_FEATRE=0.2, RV5615=0 X-NAI-Spam-Version: 2.3.0.9418 : core <5615> : inlines <4535> : streams <1605715> : uri <2170232> X-Spam-Score: -1.3 (-) X-Debbugs-Envelope-To: 23019 Cc: 23019@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.3 (-) > Would you like to see the patch again, or should I just commit it? I'd like to hear what John thinks about the idea of re-using "nth 5" instead of adding a new entry, but other than that, I think it's OK to commit, thanks. 
Stefan From debbugs-submit-bounces@debbugs.gnu.org Sun Mar 20 09:38:50 2016 Received: (at 23019-done) by debbugs.gnu.org; 20 Mar 2016 13:38:50 +0000 Received: from localhost ([127.0.0.1]:54297 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ahdZC-0001Wq-6o for submit@debbugs.gnu.org; Sun, 20 Mar 2016 09:38:50 -0400 Received: from mail.muc.de ([193.149.48.3]:10130) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1ahdZA-0001Wh-Da for 23019-done@debbugs.gnu.org; Sun, 20 Mar 2016 09:38:48 -0400 Received: (qmail 23619 invoked by uid 3782); 20 Mar 2016 13:38:47 -0000 Received: from acm.muc.de (p5B146DE7.dip0.t-ipconnect.de [91.20.109.231]) by colin.muc.de (tmda-ofmipd) with ESMTP; Sun, 20 Mar 2016 14:38:46 +0100 Received: (qmail 3746 invoked by uid 1000); 20 Mar 2016 13:41:34 -0000 Date: Sun, 20 Mar 2016 13:41:34 +0000 To: 23019-done@debbugs.gnu.org Subject: Re: bug#23019: parse-partial-sexp doesn't output the full state needed for its continuance. Message-ID: <20160320134134.GA3603@acm.fritz.box> References: <20160315091355.GA2263@acm.fritz.box> <20160317214934.GB9038@acm.fritz.box> <20160318151154.GA9433@acm.fritz.box> <20160318182547.GB9433@acm.fritz.box> <20160319170624.GC2644@acm.fritz.box> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.24 (2015-08-30) X-Delivery-Agent: TMDA/1.1.12 (Macallan) From: Alan Mackenzie X-Primary-Address: acm@muc.de X-Spam-Score: -0.0 (/) X-Debbugs-Envelope-To: 23019-done X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.0 (/) Bug fixed. On Sat, Mar 19, 2016 at 09:30:32PM -0400, Stefan Monnier wrote: -- Alan Mackenzie (Nuremberg, Germany). 
From debbugs-submit-bounces@debbugs.gnu.org Sun Apr 03 18:53:33 2016 Received: (at 23019) by debbugs.gnu.org; 3 Apr 2016 22:53:33 +0000 Received: from localhost ([127.0.0.1]:50396 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1amqth-0004QT-LM for submit@debbugs.gnu.org; Sun, 03 Apr 2016 18:53:33 -0400 Received: from mail-oi0-f49.google.com ([209.85.218.49]:35020) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1amqtf-0004QF-5I for 23019@debbugs.gnu.org; Sun, 03 Apr 2016 18:53:31 -0400 Received: by mail-oi0-f49.google.com with SMTP id p188so145115771oih.2 for <23019@debbugs.gnu.org>; Sun, 03 Apr 2016 15:53:31 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=from:to:cc:subject:in-reply-to:date:message-id:references :user-agent:mime-version; bh=ixXGX4/AV+yNnE9dKhFehN5LVS/GFLn4UijDr8MVL90=; b=kQdjXSyd9K9/11rSEPpW8rTb/UB7nBkk8UpViulaIOX9TRPq4dw3zwwoTI78Xh7a5+ cJYjf5ks4rQjP1FEEz2NdYNwtaU+fbEJT1zp8JGqH1FighQ0yjwsCHMK3kzp/EZNWRPc plY5DE5Gk6TXfWwdeCQPPrqmeRelqqBj39HM7z7KcXoX3xCZaecAb8oYddLuYKNQCRbU SEE+6i+WCsrJGlSRNDBhg77crb3ZBiO9oSAqhO//qP45p3JqaxxKBGDJEAb0NnI3ropu Fkj1Sa2F55OoAxIwPOhjtzx9PAg0s7nXpQ5cV/zJ4OSVbq4fmyVDxnSgYQaUIGzJHB8C dWOg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:from:to:cc:subject:in-reply-to:date:message-id :references:user-agent:mime-version; bh=ixXGX4/AV+yNnE9dKhFehN5LVS/GFLn4UijDr8MVL90=; b=RI7r95VlZK6QZen7pL2IbZp2N0hPggEKfs8gQF4+k3gM7ZgogidKymeEJCM0eagGjz IBrSPzJ1ggEAd2EiMhbaTUl6rP3H0kwteh1j7b1IepQUOSPC0hINQKZ08KZ1xRjruuD2 q+ZcnKVtGmMI0Mc+OIl01HqN1Suf/Q6gs0mj46O+1puDseF7vomIOFdgtgiSQBVst0WW ulWA3UORB/c6ZZWko7txD8VxPQusVPJLcQgvQQL6OnzF58eZJUR+qU6gGRIqxcqyc41i TkkZl2sN3yrCqsIVo4kDUpbEOGQLwVkz7vmd/HzgfE6eSneqE3eY9z3kvuoVpmtPDABa gs5g== X-Gm-Message-State: AD7BkJIH/lxnVpc91QUJU464+KtAgP8RKB/x0h37O9Nh7qbXqSVfM/0losSVvIDzOydRjw== X-Received: by 10.157.14.7 with SMTP id 
c7mr1927381otc.106.1459724005617; Sun, 03 Apr 2016 15:53:25 -0700 (PDT) Received: from Vulcan.local (76-234-68-79.lightspeed.frokca.sbcglobal.net. [76.234.68.79]) by smtp.gmail.com with ESMTPSA id yn3sm7642058obc.27.2016.04.03.15.53.24 (version=TLS1 cipher=AES128-SHA bits=128/128); Sun, 03 Apr 2016 15:53:24 -0700 (PDT) From: John Wiegley X-Google-Original-From: "John Wiegley" Received: by Vulcan.local (Postfix, from userid 501) id AEB6D13DAE07E; Sun, 3 Apr 2016 15:53:23 -0700 (PDT) To: Stefan Monnier Subject: Re: bug#23019: parse-partial-sexp doesn't output the full state needed for its continuance. In-Reply-To: (Stefan Monnier's message of "Sat, 19 Mar 2016 21:30:32 -0400") Date: Sun, 03 Apr 2016 15:53:02 -0700 Message-ID: References: <20160315091355.GA2263@acm.fritz.box> <20160317214934.GB9038@acm.fritz.box> <20160318151154.GA9433@acm.fritz.box> <20160318182547.GB9433@acm.fritz.box> <20160319170624.GC2644@acm.fritz.box> User-Agent: Gnus/5.130014 (Ma Gnus v0.14) Emacs/25.1.50 (darwin) MIME-Version: 1.0 Content-Type: text/plain X-Spam-Score: -0.7 (/) X-Debbugs-Envelope-To: 23019 Cc: Alan Mackenzie , 23019@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.7 (/) >>>>> Stefan Monnier writes: > I'd like to hear what John thinks about the idea of re-using "nth 5" instead > of adding a new entry, but other than that, I think it's OK to commit, > thanks. How long has this stuff been out in the field? Do you think it's well known enough that anyone is depending on the earlier behavior of the nth 5 value? I have a feeling it's OK to re-use it. 
-- John Wiegley GPG fingerprint = 4710 CF98 AF9B 327B B80F http://newartisans.com 60E1 46C4 BD1A 7AC1 4BA2 From debbugs-submit-bounces@debbugs.gnu.org Mon Apr 04 08:16:01 2016 Received: (at 23019) by debbugs.gnu.org; 4 Apr 2016 12:16:01 +0000 Received: from localhost ([127.0.0.1]:50679 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1an3QH-0007a5-4A for submit@debbugs.gnu.org; Mon, 04 Apr 2016 08:16:01 -0400 Received: from ironport2-out.teksavvy.com ([206.248.154.181]:12491) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1an3QD-0007Zq-M2 for 23019@debbugs.gnu.org; Mon, 04 Apr 2016 08:15:59 -0400 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: A0AyFgA731xV/0+KpUVcgxCEAoVVu0CHSwQCAoE8OhMBAQEBAQEBgQpBBYNdAQEDAVYjBQsLDiYSFBgNJIg3CM8jAQEBAQYBAQEBHos6hQUHhC0FkDSjC4FFI4I7gVkigngBAQE X-IPAS-Result: A0AyFgA731xV/0+KpUVcgxCEAoVVu0CHSwQCAoE8OhMBAQEBAQEBgQpBBYNdAQEDAVYjBQsLDiYSFBgNJIg3CM8jAQEBAQYBAQEBHos6hQUHhC0FkDSjC4FFI4I7gVkigngBAQE X-IronPort-AV: E=Sophos;i="5.13,465,1427774400"; d="scan'208";a="204805780" Received: from 69-165-138-79.dsl.teksavvy.com (HELO pastel.home) ([69.165.138.79]) by ironport2-out.teksavvy.com with ESMTP; 04 Apr 2016 08:15:52 -0400 Received: by pastel.home (Postfix, from userid 20848) id 06ADD6226D; Mon, 4 Apr 2016 08:15:52 -0400 (EDT) From: Stefan Monnier To: John Wiegley Subject: Re: bug#23019: parse-partial-sexp doesn't output the full state needed for its continuance. 
Message-ID: References: <20160315091355.GA2263@acm.fritz.box> <20160317214934.GB9038@acm.fritz.box> <20160318151154.GA9433@acm.fritz.box> <20160318182547.GB9433@acm.fritz.box> <20160319170624.GC2644@acm.fritz.box> Date: Mon, 04 Apr 2016 08:15:52 -0400 In-Reply-To: (John Wiegley's message of "Sun, 03 Apr 2016 15:53:02 -0700") User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1.50 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-Spam-Score: 0.3 (/) X-Debbugs-Envelope-To: 23019 Cc: Alan Mackenzie , 23019@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 0.3 (/)

>> I'd like to hear what John thinks about the idea of re-using "nth 5" instead
>> of adding a new entry, but other than that, I think it's OK to commit,
>> thanks.
> How long has this stuff been out in the field?

Many many years.

> Do you think it's well known enough that anyone is depending on the
> earlier behavior of the nth 5 value?

There are definitely packages which use the (nth 5 ..) value returned
from parse-partial-sexp. E.g. cperl-mode does:

        state (parse-partial-sexp pre-B p))
      (or (nth 3 state)
          (nth 4 state)
          (nth 5 state)
          (error "`%s' inside `%s' BLOCK" A if-string))

as well as

      (let ((pps (parse-partial-sexp (point) found)))
        (or (nth 3 pps) (nth 4 pps) (nth 5 pps)))))

and verilog-mode does:

      (setq state (parse-partial-sexp (point) end-mod-point 0 t nil))
      (or (> (car state) 0) ; in parens
          (nth 5 state) ; comment
          ))

sh-script also uses it, along with perl-mode.

> I have a feeling it's OK to re-use it.

That's also my feeling. All the uses I've found would be unaffected
(e.g. because they're in modes where there are no 2-char comment
markers, so there is really no change in behavior; or because it's only
used at positions which can't be in the middle of a 2-char comment marker).
It's a "natural extension" of the previous meaning of "nth 5". But admittedly, it's hard/impossible to find all uses, so I can't claim with confidence that it won't break some code somewhere. Stefan From debbugs-submit-bounces@debbugs.gnu.org Tue Apr 05 08:51:16 2016 Received: (at 23019) by debbugs.gnu.org; 5 Apr 2016 12:51:16 +0000 Received: from localhost ([127.0.0.1]:51793 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1anQRw-00010C-6H for submit@debbugs.gnu.org; Tue, 05 Apr 2016 08:51:16 -0400 Received: from mail.muc.de ([193.149.48.3]:62402) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1anQRs-000102-SQ for 23019@debbugs.gnu.org; Tue, 05 Apr 2016 08:51:14 -0400 Received: (qmail 63018 invoked by uid 3782); 5 Apr 2016 12:51:10 -0000 Received: from acm.muc.de (p548A5A8B.dip0.t-ipconnect.de [84.138.90.139]) by colin.muc.de (tmda-ofmipd) with ESMTP; Tue, 05 Apr 2016 14:51:08 +0200 Received: (qmail 4502 invoked by uid 1000); 5 Apr 2016 12:54:09 -0000 Date: Tue, 5 Apr 2016 12:54:09 +0000 To: John Wiegley Subject: Re: bug#23019: parse-partial-sexp doesn't output the full state needed for its continuance. Message-ID: <20160405125409.GB3463@acm.fritz.box> References: <20160317214934.GB9038@acm.fritz.box> <20160318151154.GA9433@acm.fritz.box> <20160318182547.GB9433@acm.fritz.box> <20160319170624.GC2644@acm.fritz.box> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.24 (2015-08-30) X-Delivery-Agent: TMDA/1.1.12 (Macallan) From: Alan Mackenzie X-Primary-Address: acm@muc.de X-Spam-Score: -1.0 (-) X-Debbugs-Envelope-To: 23019 Cc: 23019@debbugs.gnu.org, Stefan Monnier X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-) Hello, John. 
On Sun, Apr 03, 2016 at 03:53:02PM -0700, John Wiegley wrote: > >>>>> Stefan Monnier writes: > > I'd like to hear what John thinks about the idea of re-using "nth 5" instead > > of adding a new entry, but other than that, I think it's OK to commit, > > thanks. > How long has this stuff been out in the field? Do you think it's well known > enough that anyone is depending on the earlier behavior of the nth 5 value? I > have a feeling it's OK to re-use it. My feeling is that it would be better not to change the definition of the fifth element, but it's not a strong feeling. One concern I have is that there is code out there which compensates for the previous inadequate behaviour (I know there is in CC Mode), and it may be more difficult to switch off this compensation if there isn't an easy way to distinguish new from old, such as (> (length state) 10). > -- > John Wiegley GPG fingerprint = 4710 CF98 AF9B 327B B80F > http://newartisans.com 60E1 46C4 BD1A 7AC1 4BA2 -- Alan Mackenzie (Nuremberg, Germany). 
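The length test mentioned above can be sketched as follows. This is only
an illustration, under the assumption that the fix appends an eleventh
element to the state list rather than re-using (nth 5 state); the thread
had not settled that point at this stage. The helper name
my-pps-state-has-extra-element-p is hypothetical and is not part of
Emacs or CC Mode.

```elisp
;; Hypothetical helper, not part of Emacs or CC Mode.
;; Assumption: a "new" parse-partial-sexp returns a state list with
;; more than ten elements, while an "old" one returns exactly ten.
(defun my-pps-state-has-extra-element-p (state)
  "Return non-nil if STATE has more than ten elements.
STATE is a list as returned by `parse-partial-sexp'."
  (> (length state) 10))

;; Usage: the check needs only a state value, so it can be applied to
;; any parse result at run time.  What it returns depends on the Emacs
;; version in use.
(with-temp-buffer
  (insert "foo /")
  (my-pps-state-has-extra-element-p
   (parse-partial-sexp (point-min) (point-max))))
```

Because the test inspects only the returned list, compensation code could
consult it wherever a state is already at hand, without building a
scratch buffer or a syntax table first.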
From debbugs-submit-bounces@debbugs.gnu.org Tue Apr 05 09:50:48 2016 Received: (at 23019) by debbugs.gnu.org; 5 Apr 2016 13:50:48 +0000 Received: from localhost ([127.0.0.1]:51812 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1anRNY-0002Pr-Ig for submit@debbugs.gnu.org; Tue, 05 Apr 2016 09:50:48 -0400 Received: from ironport2-out.teksavvy.com ([206.248.154.181]:56207) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1anRNX-0002Pb-FJ for 23019@debbugs.gnu.org; Tue, 05 Apr 2016 09:50:48 -0400 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: A0A2FgA731xV/0+KpUVcgxCEAoVVwwsEAgKBPDwRAQEBAQEBAYEKQQWDXQEBAwFWIwULCw4mEhQYDSSINwjPIwEBAQEGAQEBAR6LOoUFB4QtBbM/gUUjgjuBWSKCeAEBAQ X-IPAS-Result: A0A2FgA731xV/0+KpUVcgxCEAoVVwwsEAgKBPDwRAQEBAQEBAYEKQQWDXQEBAwFWIwULCw4mEhQYDSSINwjPIwEBAQEGAQEBAR6LOoUFB4QtBbM/gUUjgjuBWSKCeAEBAQ X-IronPort-AV: E=Sophos;i="5.13,465,1427774400"; d="scan'208";a="204959017" Received: from 69-165-138-79.dsl.teksavvy.com (HELO pastel.home) ([69.165.138.79]) by ironport2-out.teksavvy.com with ESMTP; 05 Apr 2016 09:50:41 -0400 Received: by pastel.home (Postfix, from userid 20848) id 12D196225E; Tue, 5 Apr 2016 09:50:41 -0400 (EDT) From: Stefan Monnier To: Alan Mackenzie Subject: Re: bug#23019: parse-partial-sexp doesn't output the full state needed for its continuance. Message-ID: References: <20160317214934.GB9038@acm.fritz.box> <20160318151154.GA9433@acm.fritz.box> <20160318182547.GB9433@acm.fritz.box> <20160319170624.GC2644@acm.fritz.box> <20160405125409.GB3463@acm.fritz.box> Date: Tue, 05 Apr 2016 09:50:41 -0400 In-Reply-To: <20160405125409.GB3463@acm.fritz.box> (Alan Mackenzie's message of "Tue, 5 Apr 2016 12:54:09 +0000") User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1.50 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-Spam-Score: 1.8 (+) X-Spam-Report: Spam detection software, running on the system "debbugs.gnu.org", has NOT identified this incoming email as spam. 
The original message has been attached to this so you can view it or label similar future email. If you have any questions, see the administrator of that system for details. Content preview: > One concern I have is that there is code out there which compensates for > the previous inadequate behaviour (I know there is in CC Mode), and it > may be more difficult to switch off this compensation if there isn't an > easy way to distinguish new from old, such as (> (length state) 10). [...] Content analysis details: (1.8 points, 10.0 required) pts rule name description ---- ---------------------- -------------------------------------------------- -0.0 RCVD_IN_MSPIKE_H3 RBL: Good reputation (+3) [206.248.154.181 listed in wl.mailspike.net] -0.7 RCVD_IN_DNSWL_LOW RBL: Sender listed at http://www.dnswl.org/, low trust [206.248.154.181 listed in list.dnswl.org] 1.0 SPF_SOFTFAIL SPF: sender does not match SPF record (softfail) -0.0 RCVD_IN_MSPIKE_WL Mailspike good senders 1.5 COMPENSATION "Compensation" X-Debbugs-Envelope-To: 23019 Cc: John Wiegley , 23019@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 0.3 (/)

> One concern I have is that there is code out there which compensates for
> the previous inadequate behaviour (I know there is in CC Mode), and it
> may be more difficult to switch off this compensation if there isn't an
> easy way to distinguish new from old, such as (> (length state) 10).

I'd be very surprised if other packages went to that trouble, but if
needed you can still distinguish the new from the old with something like:

    (defconst pps-is-new
      (let ((st (make-syntax-table)))
        (modify-syntax-entry ?/ ". 14" st)
        (with-temp-buffer
          (with-syntax-table st
            (insert "/")
            (nth 5 (parse-partial-sexp (point-min) (point-max)))))))

-- Stefan

From debbugs-submit-bounces@debbugs.gnu.org Tue Apr 05 10:41:57 2016 Received: (at 23019) by debbugs.gnu.org; 5 Apr 2016 14:41:57 +0000 Received: from localhost ([127.0.0.1]:52428 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1anSB3-0003tO-Bg for submit@debbugs.gnu.org; Tue, 05 Apr 2016 10:41:57 -0400 Received: from mail.muc.de ([193.149.48.3]:63008) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1anSB1-0003tF-HB for 23019@debbugs.gnu.org; Tue, 05 Apr 2016 10:41:56 -0400 Received: (qmail 86609 invoked by uid 3782); 5 Apr 2016 14:41:53 -0000 Received: from acm.muc.de (p548A5A8B.dip0.t-ipconnect.de [84.138.90.139]) by colin.muc.de (tmda-ofmipd) with ESMTP; Tue, 05 Apr 2016 16:41:52 +0200 Received: (qmail 4863 invoked by uid 1000); 5 Apr 2016 14:44:53 -0000 Date: Tue, 5 Apr 2016 14:44:53 +0000 To: Stefan Monnier Subject: Re: bug#23019: parse-partial-sexp doesn't output the full state needed for its continuance. Message-ID: <20160405144453.GC3463@acm.fritz.box> References: <20160318151154.GA9433@acm.fritz.box> <20160318182547.GB9433@acm.fritz.box> <20160319170624.GC2644@acm.fritz.box> <20160405125409.GB3463@acm.fritz.box> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.24 (2015-08-30) X-Delivery-Agent: TMDA/1.1.12 (Macallan) From: Alan Mackenzie X-Primary-Address: acm@muc.de X-Spam-Score: -1.0 (-) X-Debbugs-Envelope-To: 23019 Cc: John Wiegley , 23019@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-)

Hello, Stefan.
On Tue, Apr 05, 2016 at 09:50:41AM -0400, Stefan Monnier wrote:
> > One concern I have is that there is code out there which compensates for
> > the previous inadequate behaviour (I know there is in CC Mode), and it
> > may be more difficult to switch off this compensation if there isn't an
> > easy way to distinguish new from old, such as (> (length state) 10).

> I'd be very surprised if other packages went to that trouble, but if
> needed you can still distinguish the new from the old with something like:

>     (defconst pps-is-new
>       (let ((st (make-syntax-table)))
>         (modify-syntax-entry ?/ ". 14" st)
>         (with-temp-buffer
>           (with-syntax-table st
>             (insert "/")
>             (nth 5 (parse-partial-sexp (point-min) (point-max)))))))

It can certainly be done, yes, but that way it can only really be done
at set up time, whereas (> (length state) 10) could be done more or less
at any time. It was just a small point, really.

> -- Stefan

-- 
Alan Mackenzie (Nuremberg, Germany).

From unknown Mon Aug 18 14:19:41 2025 Received: (at fakecontrol) by fakecontrolmessage; To: internal_control@debbugs.gnu.org From: Debbugs Internal Request Subject: Internal Control Message-Id: bug archived. Date: Wed, 04 May 2016 11:24:04 +0000 User-Agent: Fakemail v42.6.9 # This is a fake control message. # # The action: # bug archived. thanks # This fakemail brought to you by your local debbugs # administrator