GNU bug report logs - #67262
python-ts-mode cannot identify triple-quoted-strings

Package: emacs;

Reported by: JD Smith <jdtsmith <at> gmail.com>

Date: Sat, 18 Nov 2023 15:53:02 UTC

Severity: normal

Merged with 67394

Found in version 29.1.90

Done: Yuan Fu <casouri <at> gmail.com>

Bug is archived. No further changes may be made.


Message #23 received at 67262 <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dmitry <at> gutov.dev>
To: JD Smith <jdtsmith <at> gmail.com>, Eli Zaretskii <eliz <at> gnu.org>,
 Yuan Fu <casouri <at> gmail.com>
Cc: 67262 <at> debbugs.gnu.org
Subject: Re: bug#67262: python-ts-mode cannot identify triple-quoted-strings
Date: Sun, 26 Nov 2023 04:04:07 +0200
[Message part 1 (text/plain, inline)]
On 25/11/2023 16:42, JD Smith wrote:
> Bridging emacs syntax to treesitter in a robust way seems like it could be a subtle enterprise, so I’d prefer to leave that to one of the experts.  Right now the syntax-propertize-function in python-mode does one simple thing: ensure triple quotes are properly marked as strings.  Since the treesitter grammar doesn’t distinguish between different flavors of strings, something similar would still be needed, if we want to continue to treat various string flavors distinctly using syntax.
> 
> Is moving all syntactification (beyond just font-lock) over to TS an explicit goal for all the *-ts-mode’s?

It would make sense - since this way we would only have one source of 
syntax-recognition bugs, rather than two (both the grammar and the 
definition in Elisp).

Attached is a patch you can try (that uses treesit for s-p-f).

Unfortunately, it's not quite perfect (nor is python-syntax-stringify, 
according to its FIXME inside): after certain modifications, the 
syntax-table property is not applied.
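
For readers without the attachment, here is a rough sketch of what such a
treesit-based syntax-propertize function can look like. It is reconstructed
from the closure visible in the backtrace at the end of this message, with
the debugging `message' calls removed, so the attached patch may well differ.
The function name is hypothetical, and the node type names ("string_content"
and "\"") are assumed to match the Python grammar in use.

```elisp
(require 'treesit)

;; Hypothetical name; not the function from the attached patch.
(defun my/python-treesit-syntax-propertize (start end)
  "Mark triple-quote delimiters between START and END as string fences."
  (goto-char start)
  (while (and (< (point) end)
              (re-search-forward "\\(?:\"\"\"\\|'''\\)" end t))
    ;; Point is now just after the three quote characters.
    (let ((node (treesit-node-at (point))))
      (cond
       ;; Opening delimiter: string content follows, so put the
       ;; generic string fence on the first quote character.
       ((equal (treesit-node-type node) "string_content")
        (put-text-property (- (point) 3) (- (point) 2)
                           'syntax-table (string-to-syntax "|")))
       ;; Closing delimiter: the node is the quote token that starts
       ;; three characters back, so fence the last quote character.
       ((and (equal (treesit-node-type node) "\"")
             (= (treesit-node-start node) (- (point) 3)))
        (put-text-property (1- (point)) (point)
                           'syntax-table (string-to-syntax "|")))))))
```

A mode would install it with something like
(setq-local syntax-propertize-function #'my/python-treesit-syntax-propertize).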

I've done some print-debugging in python--treesit-parser-after-change, 
and it looks like the problem is this: in certain cases (e.g. when 
electric-pair-post-self-insert-function fires) the parser notifier fires 
only after syntax-propertize has been called -- and it fires inside of 
it. Meaning it's too late to flush the syntax-propertize cache at that 
point.

The reason for it is, overall, the fact that we trigger the parser's
after-change notifiers lazily: only after some other feature has to
initialize the parser, calling treesit_ensure_parsed from
treesit-parser-root-node.
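
To make the ordering problem concrete, the notifier in question can be
sketched as below. This is reconstructed from the expanded frames in the
backtrace at the end of this message, minus the print-debugging; the `my/'
names are hypothetical and the real python--treesit-parser-after-change may
differ.

```elisp
(require 'treesit)

;; Hypothetical name, modelled on the frames in the backtrace.
(defun my/python-treesit-parser-after-change (ranges parser)
  "Flush the syntax cache from the earliest position in RANGES.
RANGES is a list of (BEG . END) conses reported by PARSER."
  (when ranges
    (with-current-buffer (treesit-parser-buffer parser)
      ;; If this runs while `syntax-propertize' is already in
      ;; progress, the flush comes too late to affect that call.
      (syntax-ppss-flush-cache
       (apply #'min (mapcar #'car ranges))))))

;; Hypothetical setup function, e.g. for `python-ts-mode-hook'.
(defun my/python-ts-install-notifier ()
  "Register the notifier on the buffer's Python parser."
  (treesit-parser-add-notifier
   (treesit-parser-create 'python)
   #'my/python-treesit-parser-after-change))
```

Because the notifier only runs when something forces a reparse, the first
caller after an edit may be syntax-propertize itself, which is the situation
the backtrace shows.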

I think bug#66732 might also be a variation of this problem.

As for what to do about this one -- probably something involving 
syntax-propertize-extend-region-functions, adding an entry which would 
initialize the parser, but not call syntax-ppss-flush-cache directly (or 
at least not just that). It would signal the earlier position to extend 
to through some dynamic variable. This is getting tricky enough to move 
from the individual major modes into treesit.el proper, I think.
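
A minimal, untested sketch of that idea follows, assuming a Python parser
already exists in the buffer. All `my/' names are hypothetical, and whether
syntax-propertize copes well with a region extended this way is exactly the
open question, so treat this as an illustration rather than a fix.

```elisp
(require 'treesit)

(defvar my/treesit--collecting nil
  "Non-nil while `my/treesit-extend-region' is forcing a reparse.")

(defvar my/treesit--earliest-change nil
  "Earliest changed position seen during the forced reparse, or nil.")

(defun my/treesit-notifier (ranges parser)
  "Record or flush the earliest position in RANGES for PARSER."
  (when ranges
    (let ((beg (apply #'min (mapcar #'car ranges))))
      (if my/treesit--collecting
          ;; Inside syntax-propertize: only record the position.
          (setq my/treesit--earliest-change
                (min beg (or my/treesit--earliest-change beg)))
        ;; Outside of it, flushing directly is still safe.
        (with-current-buffer (treesit-parser-buffer parser)
          (syntax-ppss-flush-cache beg))))))

(defun my/treesit-extend-region (start end)
  "Force a reparse, then extend START back to the earliest change.
Intended for `syntax-propertize-extend-region-functions'."
  (let ((my/treesit--collecting t)
        (my/treesit--earliest-change nil))
    ;; Asking for the root node makes the parser catch up, which
    ;; runs the notifiers for any pending changes.
    (treesit-buffer-root-node 'python)
    (when (and my/treesit--earliest-change
               (< my/treesit--earliest-change start))
      (cons my/treesit--earliest-change end))))

;; Hypothetical buffer-local installation:
;; (add-hook 'syntax-propertize-extend-region-functions
;;           #'my/treesit-extend-region nil t)
```

The two special variables stand in for the "dynamic variable" mentioned
above; a version living in treesit.el would presumably not hard-code the
language.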

Yuan and others, thoughts welcome.

JD, I do believe the attached patch is TRT (or close to it), but 
depending on how it works for you, and how quickly we deal with the 
above problem, it might make sense to enact your original suggestion first.

And finally, here's the backtrace that led me to the above conclusions:

  backtrace()
  (message "in progress, backtrace %s" (backtrace))
  (progn (message "in progress, backtrace %s" (backtrace)))
  (if (syntax-propertize--in-process-p) (progn (message "in progress, 
backtrace %s" (backtrace))))
  (save-current-buffer (set-buffer (treesit-parser-buffer parser)) 
(message "flushing %s up to %s" ranges (let* ((--cl-var-- ranges) (r 
nil) (--cl-var-- nil)) (while (consp --cl-var--) (setq r (car 
--cl-var--)) (let* ((temp (car r))) (setq --cl-var-- (if --cl-var-- (min 
--cl-var-- temp) temp))) (setq --cl-var-- (cdr --cl-var--))) 
--cl-var--)) (syntax-ppss-flush-cache (let* ((--cl-var-- ranges) (r nil) 
(--cl-var-- nil)) (while (consp --cl-var--) (setq r (car --cl-var--)) 
(let* ((temp (car r))) (setq --cl-var-- (if --cl-var-- (min --cl-var-- 
temp) temp))) (setq --cl-var-- (cdr --cl-var--))) --cl-var--)) (if 
(syntax-propertize--in-process-p) (progn (message "in progress, 
backtrace %s" (backtrace)))) (message "flushed up to %d, %s" 
syntax-propertize--done syntax-ppss-wide))
  (progn (save-current-buffer (set-buffer (treesit-parser-buffer 
parser)) (message "flushing %s up to %s" ranges (let* ((--cl-var-- 
ranges) (r nil) (--cl-var-- nil)) (while (consp --cl-var--) (setq r (car 
--cl-var--)) (let* ((temp ...)) (setq --cl-var-- (if --cl-var-- ... 
temp))) (setq --cl-var-- (cdr --cl-var--))) --cl-var--)) 
(syntax-ppss-flush-cache (let* ((--cl-var-- ranges) (r nil) (--cl-var-- 
nil)) (while (consp --cl-var--) (setq r (car --cl-var--)) (let* ((temp 
...)) (setq --cl-var-- (if --cl-var-- ... temp))) (setq --cl-var-- (cdr 
--cl-var--))) --cl-var--)) (if (syntax-propertize--in-process-p) (progn 
(message "in progress, backtrace %s" (backtrace)))) (message "flushed up 
to %d, %s" syntax-propertize--done syntax-ppss-wide)))
  (if ranges (progn (save-current-buffer (set-buffer 
(treesit-parser-buffer parser)) (message "flushing %s up to %s" ranges 
(let* ((--cl-var-- ranges) (r nil) (--cl-var-- nil)) (while (consp 
--cl-var--) (setq r (car --cl-var--)) (let* (...) (setq --cl-var-- ...)) 
(setq --cl-var-- (cdr --cl-var--))) --cl-var--)) 
(syntax-ppss-flush-cache (let* ((--cl-var-- ranges) (r nil) (--cl-var-- 
nil)) (while (consp --cl-var--) (setq r (car --cl-var--)) (let* (...) 
(setq --cl-var-- ...)) (setq --cl-var-- (cdr --cl-var--))) --cl-var--)) 
(if (syntax-propertize--in-process-p) (progn (message "in progress, 
backtrace %s" (backtrace)))) (message "flushed up to %d, %s" 
syntax-propertize--done syntax-ppss-wide))))
  python--treesit-parser-after-change(((27 . 50)) #<treesit-parser for 
python>)
  treesit-buffer-root-node(python)
  treesit-node-at(42)
  (let ((node (treesit-node-at (point)))) (cond ((equal 
(treesit-node-type node) "string_content") (put-text-property (- (point) 
3) (- (point) 2) 'syntax-table (string-to-syntax "|"))) ((and (equal 
(treesit-node-type node) "\"") (= (treesit-node-start node) (- (point) 
3))) (put-text-property (1- (point)) (point) 'syntax-table 
(string-to-syntax "|")))))
  (cond (t (message "pt %s" (point)) (let ((node (treesit-node-at 
(point)))) (cond ((equal (treesit-node-type node) "string_content") 
(put-text-property (- (point) 3) (- (point) 2) 'syntax-table 
(string-to-syntax "|"))) ((and (equal (treesit-node-type node) "\"") (= 
(treesit-node-start node) (- ... 3))) (put-text-property (1- (point)) 
(point) 'syntax-table (string-to-syntax "|")))))))
  (while (and (< (point) end) (re-search-forward "\\(?:\"\"\"\\|'''\\)" 
end t)) (cond (t (message "pt %s" (point)) (let ((node (treesit-node-at 
(point)))) (cond ((equal (treesit-node-type node) "string_content") 
(put-text-property (- ... 3) (- ... 2) 'syntax-table (string-to-syntax 
"|"))) ((and (equal ... "\"") (= ... ...)) (put-text-property (1- ...) 
(point) 'syntax-table (string-to-syntax "|"))))))))
  (closure (t) (start end) (goto-char start) (while (and (< (point) 
end) (re-search-forward "\\(?:\"\"\"\\|'''\\)" end t)) (cond (t (message 
"pt %s" (point)) (let ((node ...)) (cond (... ...) (... ...)))))))(39 50)
  funcall((closure (t) (start end) (goto-char start) (while (and (< 
(point) end) (re-search-forward "\\(?:\"\"\"\\|'''\\)" end t)) (cond (t 
(message "pt %s" (point)) (let ((node ...)) (cond (... ...) (... 
...))))))) 39 50)
  python--treesit-syntax-propertize-function-1(39 50)
  syntax-propertize(42)
  syntax-ppss(42)
  electric-pair-syntax-info(39)
  electric-pair-post-self-insert-function()
  self-insert-command(1 39)
  funcall-interactively(self-insert-command 1 39)
  #<subr call-interactively>(self-insert-command nil nil)
  call-interactively@ido-cr+-record-current-command(#<subr 
call-interactively> self-insert-command nil nil)
  apply(call-interactively@ido-cr+-record-current-command #<subr 
call-interactively> (self-insert-command nil nil))
  call-interactively(self-insert-command nil nil)
  command-execute(self-insert-command)
[python--treesit-syntax-propertize-function.diff (text/x-patch, attachment)]

This bug report was last modified 1 year and 152 days ago.

