GNU bug report logs - #63622
lisp/progmodes/python.el: performance regression introduced by multiline font-lock

Previous Next

Package: emacs;

Reported by: Tom Gillespie <tgbugs <at> gmail.com>

Date: Sun, 21 May 2023 03:15:02 UTC

Severity: normal

Done: Eli Zaretskii <eliz <at> gnu.org>

Bug is archived. No further changes may be made.

Full log


Message #32 received at 63622 <at> debbugs.gnu.org (full text, mbox):

From: kobarity <kobarity <at> gmail.com>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: Tom Gillespie <tgbugs <at> gmail.com>, Eli Zaretskii <eliz <at> gnu.org>,
 63622 <at> debbugs.gnu.org
Subject: Re: bug#63622: lisp/progmodes/python.el: performance regression
 introduced by multiline font-lock
Date: Wed, 24 May 2023 00:45:27 +0900
[Message part 1 (text/plain, inline)]
Stefan Monnier wrote:
> > @@ -6708,8 +6710,8 @@ python-mode
> >                  nil nil nil nil
> >                  (font-lock-syntactic-face-function
> >                   . python-font-lock-syntactic-face-function)
> > -                (font-lock-extend-after-change-region-function
> > -                 . python-font-lock-extend-region)))
> > +                (font-lock-extend-region-functions
> > +                 . (python-font-lock-extend-region))))
> >    (setq-local syntax-propertize-function
> >                python-syntax-propertize-function)
> >    (setq-local imenu-create-index-function
> 
> This is bound to break in some cases.  Please use `add-hook` instead.

Thanks.  I also revised python-font-lock-extend-region.  The attached
0001-Use-font-lock-extend-region-functions-in-python-mode.patch is the
revised patch.

As for the performance degradation, I am almost certain that it is a
bug in python-info-docstring-p and/or python-rx string-delimiter.
python-info-docstring-p uses the following regex as one of the
conditions to determine if it is a docstring.

          (re (concat "[uU]?[rR]?"
                      (python-rx string-delimiter))))

One of the problems is that string-delimiter matches even if there is
a single character preceding the string.  For example,
string-delimiter matches both x" and 1".  It might be reasonable to
match r" or b", for example.  Because they are correct string starter
in Python.  However, there is no reason to match x" or 1".  Besides,
python-info-docstring-p explicitly states "[uU]?[rR]?" as above.  So
(python-rx string-delimiter) in the above code should only match
string delimiter without prefixes.  Current python-info-docstring-p
incorrectly considers a code like [""] or {""} to be a docstring.

The performance problem in the example shown by Tom can be resolved by
modifying the above code as follows:

          (re (concat "[uU]?[rR]?"
                      (rx (or "\"\"\"" "\"" "'''" "'")))))

It breaks some ERTs, but I think we should fix the ERTs.  However,
there seems to be another problem in python-info-docstring-p.  It
intentionally considers contiguous strings as docstring as in the ERT
python-info-docstring-p-1:

'''
Module Docstring Django style.
'''
u'''Additional module docstring.'''
'''Not a module docstring.'''

However, as far as I have tried with Python 3 and Python 2.7, this is
not correct.  Therefore, I feel it is better to mark the ERTs as
expected failures than to modify it at this stage.  The attached
0001-Fix-python-info-docstring-p.patch is the patch to do this.
[0001-Use-font-lock-extend-region-functions-in-python-mode.patch (application/octet-stream, attachment)]
[0001-Fix-python-info-docstring-p.patch (application/octet-stream, attachment)]

This bug report was last modified 1 year and 361 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.