GNU bug report logs - #31290
Fundamental bugs in syntax-propertize

Package: emacs

Reported by: Alan Mackenzie <acm <at> muc.de>

Date: Fri, 27 Apr 2018 21:15:02 UTC

Severity: normal

To reply to this bug, email your comments to 31290 AT debbugs.gnu.org.

Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#31290; Package emacs. (Fri, 27 Apr 2018 21:15:02 GMT)

Acknowledgement sent to Alan Mackenzie <acm <at> muc.de>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Fri, 27 Apr 2018 21:15:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Alan Mackenzie <acm <at> muc.de>
To: bug-gnu-emacs <at> gnu.org
Subject: Fundamental bugs in syntax-propertize
Date: Fri, 27 Apr 2018 21:08:59 +0000

Hello, Emacs.

There are fundamental bugs in syntax-propertize and
syntax-propertize-function.  The doc string of the latter states:

    The specified function may call `syntax-ppss' on any position before
    END, ....

This is untrue.  In fact, syntax-ppss can safely be called only on
positions up to syntax-propertize--done.  Beyond this point, the
syntax-table properties haven't been applied, so calling syntax-ppss is,
in general, going to give a false result.

At least that would be true if syntax-propertize--done hadn't been
prematurely and spuriously increased, crudely to prevent an infinite
recursion, falsely indicating to the syntax-ppss infrastructure that the
syntax-table properties have already been applied to the region (BEGIN
END).

    .... but it should not call `syntax-ppss-flush-cache', ....

Why not?  Because syntax-ppss-flush-cache sets syntax-propertize--done
back to its true value, allowing the wrongly allowed syntax-ppss calls at
a later position to cause a recursive loop.

    .... which means that it should not call `syntax-ppss' on some
    position and later modify the buffer on some earlier position.

This is a bad restriction, because sometimes syntax-table properties can
only be correctly determined by examining the syntax of later buffer
positions.  An example of this is giving the string-fence syntax-table
text property to an unbalanced opening string quote, but not to correctly
matched quotes.

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

The plain fact is that (syntax-ppss pos) calls (syntax-propertize pos),
so syntax-propertize cannot itself use syntax-ppss because of the
recursive loop thus created.

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

Proposed solutions:

1. Major modes' syntax-propertize-function's are somehow given read
access to syntax-propertize--done, and may call syntax-ppss up to that
point only.  syntax-propertize--done is updated only after the
syntax-table properties have been applied.  Or....

2. syntax-propertize-function's are banned from using syntax-ppss, the
documentation instead directing them to use parse-partial-sexp directly.

In either solution, the restriction on using syntax-ppss-flush-cache
would no longer be necessary, and there would be no restriction on
setting syntax-table text properties at an earlier position than the one
currently being analysed.

I think solution 2 is the better one.
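
As a rough illustration of solution 2, and not code taken from any
existing mode, a syntax-propertize-function could get its parse state
from parse-partial-sexp directly.  The names my-mode--state-at and
my-mode-syntax-propertize are hypothetical, escaped quotes are
deliberately ignored, and rescanning from point-min on every call is
the simplest choice rather than an efficient one:

```elisp
;;; -*- lexical-binding: t -*-
;; Hypothetical names throughout: neither function exists in Emacs.

(defun my-mode--state-at (pos)
  "Return the parser state at POS, scanning from `point-min'.
This bypasses `syntax-ppss' and its cache, so every call is a full scan."
  (save-excursion
    ;; Honour syntax-table text properties already applied before POS.
    (let ((parse-sexp-lookup-properties t))
      (parse-partial-sexp (point-min) pos))))

(defun my-mode-syntax-propertize (start end)
  "Give string-fence syntax to opening quotes left unclosed on their line.
Meant as a `syntax-propertize-function'; it never calls `syntax-ppss'.
Simplification: a backslash-escaped quote is treated like any other."
  (goto-char start)
  (while (re-search-forward "\"" end t)
    (let* ((quote-pos (match-beginning 0))
           (state (my-mode--state-at quote-pos))
           (eol (line-end-position)))
      ;; (nth 3 state) is non-nil inside a string, (nth 4 state) inside
      ;; a comment; only a quote outside both opens a new string.
      (unless (or (nth 3 state) (nth 4 state))
        ;; A closing quote on the same line means the pair is balanced;
        ;; point then moves past it and the pair is left alone.
        (unless (search-forward "\"" eol t)
          (put-text-property quote-pos (1+ quote-pos)
                             'syntax-table (string-to-syntax "|"))
          ;; Close the fenced string at the newline, if there is one.
          (when (< eol (point-max))
            (put-text-property eol (1+ eol)
                               'syntax-table (string-to-syntax "|"))))))))
```

A mode would install this with
(setq-local syntax-propertize-function #'my-mode-syntax-propertize).
Each state lookup rescans the buffer from point-min, which is the price
of giving up the syntax-ppss cache.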

--
Alan Mackenzie (Nuremberg, Germany).

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#31290; Package emacs. (Tue, 08 May 2018 12:36:02 GMT)

Message #8 received at 31290 <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dgutov <at> yandex.ru>
To: Alan Mackenzie <acm <at> muc.de>, 31290 <at> debbugs.gnu.org
Subject: Re: bug#31290: Fundamental bugs in syntax-propertize
Date: Tue, 8 May 2018 15:35:14 +0300

On 4/28/18 12:08 AM, Alan Mackenzie wrote:

> At least that would be true if syntax-propertize--done hadn't been
> prematurely and spuriously increased, crudely to prevent an infinite
> recursion, falsely indicating to the syntax-ppss infrastructure that the
> syntax-table properties have already been applied to the region (BEGIN
> END).
> 
>      .... but it should not call `syntax-ppss-flush-cache', ....
> 
> Why not?  Because syntax-ppss-flush-cache sets syntax-propertize--done
> back to its true value, allowing the wrongly allowed syntax-ppss calls at
> a later position to cause a recursive loop.

Maybe we should "allow" it to loop, in certain cases? Leaving it to be 
the responsibility of the programmer, to make sure the result doesn't 
infloop, even if these rules are violated.

>      .... which means that it should not call `syntax-ppss' on some
>      position and later modify the buffer on some earlier position.
> 
> This is a bad restriction, because sometimes syntax-table properties can
> only be correctly determined by examining the syntax of later buffer
> positions.  An example of this is giving the string-fence syntax-table
> text property to an unbalanced opening string quote, but not to correctly
> matched quotes.

I'm not exactly convinced by the given example (why would we use the 
string-fence in that case?), but it might be better if something like 
this was possible, indeed.

> 2. syntax-propertize-function's are banned from using syntax-ppss, the
> documentation instead directing them to use parse-partial-sexp directly.

The ones that currently call syntax-ppss, can't simply switch over to 
parse-partial-sexp without becoming slower due to the lack of cache.

Before tackling this bug, I'd rather we see a real-world problem that it 
caused, and pick a particular approach based on it.

But off the top of my head, we could introduce a "stricter but somewhat 
slower" variation of syntax-ppss to be called inside 
syntax-propertize-function's, which would treat the values in question 
more carefully, somehow.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#31290; Package emacs. (Sat, 12 May 2018 11:34:01 GMT)

Message #11 received at 31290 <at> debbugs.gnu.org (full text, mbox):

From: Alan Mackenzie <acm <at> muc.de>
To: Dmitry Gutov <dgutov <at> yandex.ru>
Cc: 31290 <at> debbugs.gnu.org
Subject: Re: bug#31290: Fundamental bugs in syntax-propertize
Date: Sat, 12 May 2018 11:26:12 +0000

Hello, Dmitry.

On Tue, May 08, 2018 at 15:35:14 +0300, Dmitry Gutov wrote:
> On 4/28/18 12:08 AM, Alan Mackenzie wrote:

> > At least that would be true if syntax-propertize--done hadn't been
> > prematurely and spuriously increased, crudely to prevent an infinite
> > recursion, falsely indicating to the syntax-ppss infrastructure that the
> > syntax-table properties have already been applied to the region (BEGIN
> > END).

> >      .... but it should not call `syntax-ppss-flush-cache', ....

> > Why not?  Because syntax-ppss-flush-cache sets syntax-propertize--done
> > back to its true value, allowing the wrongly allowed syntax-ppss calls at
> > a later position to cause a recursive loop.

> Maybe we should "allow" it to loop, in certain cases? Leaving it to be 
> the responsibility of the programmer, to make sure the result doesn't 
> infloop, even if these rules are violated.

I'm not sure how this could work.  We would need to formalise the rules
very carefully, to avoid the need to read syntax.{c,el}'s source code.

> >      .... which means that it should not call `syntax-ppss' on some
> >      position and later modify the buffer on some earlier position.

> > This is a bad restriction, because sometimes syntax-table properties can
> > only be correctly determined by examining the syntax of later buffer
> > positions.  An example of this is giving the string-fence syntax-table
> > text property to an unbalanced opening string quote, but not to correctly
> > matched quotes.

> I'm not exactly convinced by the given example (why would we use the 
> string-fence in that case?), but it might be better if something like 
> this was possible, indeed.

String fence can be used to signal to font lock that the delimiter
(together with the "mismatching" unescaped EOL) should be fontified in
warning face.

A better example might be C++ Mode's marking of a "< ... >" pair with
paren syntax.  This isn't done with syntax-propertize-function (as you
know), but it would be nice if this were possible.

> > 2. syntax-propertize-function's are banned from using syntax-ppss, the
> > documentation instead directing them to use parse-partial-sexp directly.

> The ones that currently call syntax-ppss, can't simply switch over to 
> parse-partial-sexp without becoming slower due to the lack of cache.

The cache at the pertinent buffer position doesn't exist at the time:
consistent syntax-table properties aren't on the preceding buffer
positions.

> Before tackling this bug, I'd rather we see a real-world problem that it 
> caused, and pick a particular approach based on it.

My enhancements for bug#30393: "24.4; cperl-mode: indentation failure -
Documentation enhancements", where (almost) any change which affects the
syntactic state is programmed to call syntax-ppss-flush-cache from the C
level, clashes with the mechanism in this bug report.  Most of the time
it's fine, but when a change affecting the syntactic state is made from
inside a syntax-propertize-function, Emacs goes into an infinite recursive
loop.

This isn't good.

> But off the top of my head, we could introduce a "stricter but somewhat 
> slower" variation of syntax-ppss to be called inside 
> syntax-propertize-function's, which would treat the values in question 
> more carefully, somehow.

That's an idea worth exploring.

-- 
Alan Mackenzie (Nuremberg, Germany).

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#31290; Package emacs. (Sun, 13 May 2018 07:33:01 GMT)

Message #14 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Andreas Röhler <andreas.roehler <at> easy-emacs.de>
To: bug-gnu-emacs <at> gnu.org
Subject: Re: bug#31290: Fundamental bugs in syntax-propertize
Date: Sun, 13 May 2018 09:33:20 +0200

On 12.05.2018 13:26, Alan Mackenzie wrote:
> Hello, Dmitry.
> 
> On Tue, May 08, 2018 at 15:35:14 +0300, Dmitry Gutov wrote:
>> On 4/28/18 12:08 AM, Alan Mackenzie wrote:
> 
>>> At least that would be true if syntax-propertize--done hadn't been
>>> prematurely and spuriously increased, crudely to prevent an infinite
>>> recursion, falsely indicating to the syntax-ppss infrastructure that the
>>> syntax-table properties have already been applied to the region (BEGIN
>>> END).
> 
>>>       .... but it should not call `syntax-ppss-flush-cache', ....
> 
>>> Why not?  Because syntax-ppss-flush-cache sets syntax-propertize--done
>>> back to its true value, allowing the wrongly allowed syntax-ppss calls at
>>> a later position to cause a recursive loop.
> 
>> Maybe we should "allow" it to loop, in certain cases? Leaving it to be
>> the responsibility of the programmer, to make sure the result doesn't
>> infloop, even if these rules are violated.
> 
> I'm not sure how this could work.  We would need to formalise the rules
> very carefully, to avoid the need to read syntax.{c,el}'s source code.
> 
>>>       .... which means that it should not call `syntax-ppss' on some
>>>       position and later modify the buffer on some earlier position.
> 
>>> This is a bad restriction, because sometimes syntax-table properties can
>>> only be correctly determined by examining the syntax of later buffer
>>> positions.  An example of this is giving the string-fence syntax-table
>>> text property to an unbalanced opening string quote, but not to correctly
>>> matched quotes.
> 
>> I'm not exactly convinced by the given example (why would we use the
>> string-fence in that case?), but it might be better if something like
>> this was possible, indeed.
> 
> String fence can be used to signal to font lock that the delimiter
> (together with the "mismatching" unescaped EOL) should be fontified in
> warning face.
> 
> A better example might be C++ Mode's marking of a "< ... >" pair with
> paren syntax.  This isn't done with syntax-propertize-function (as you
> know), but it would be nice if this were possible.
> 
>>> 2. syntax-propertize-function's are banned from using syntax-ppss, the
>>> documentation instead directing them to use parse-partial-sexp directly.
> 
>> The ones that currently call syntax-ppss, can't simply switch over to
>> parse-partial-sexp without becoming slower due to the lack of cache.
> 
> The cache at the pertinent buffer position doesn't exist at the time:
> consistent syntax-table properties aren't on the preceding buffer
> positions.
> 
>> Before tackling this bug, I'd rather we see a real-world problem that it
>> caused, and pick a particular approach based on it.
> 
> My enhancements for bug#30393: "24.4; cperl-mode: indentation failure -
> Documentation enhancements", where (almost) any change which affects the
> syntactic state is programmed to call syntax-ppss-flush-cache from the C
> level, clashes with the mechanism in this bug report.  Most of the time
> it's fine, but when a change affecting the syntactic state is made from
> inside a syntax-propertize-function, Emacs goes into an infinite recursive
> loop.
> 
> This isn't good.
> 
>> But off the top of my head, we could introduce a "stricter but somewhat
>> slower" variation of syntax-ppss to be called inside
>> syntax-propertize-function's, which would treat the values in question
>> more carefully, somehow.
> 
> That's an idea worth exploring.
> 

Hi folks,

From what I saw some months ago, I can only stress the term
"fundamental".  I gave up following the details of the check-ins made.

That part needs someone to treat the Gordian knot as its nature
demands...

This bug report was last modified 7 years and 96 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.