GNU bug report logs - #38104
27.0.50; elixir-mode fontification is very slow
Reported by: Dmitry Gutov <dgutov <at> yandex.ru>
Date: Thu, 7 Nov 2019 15:41:02 UTC
Severity: normal
Found in version 27.0.50
Done: Dmitry Gutov <dgutov <at> yandex.ru>
Bug is archived. No further changes may be made.
Message #25 received at 38104-done <at> debbugs.gnu.org:
Hi Mattias,
On 26.11.2019 21:32, Mattias Engdegård wrote:
> As it turned out, rx is fine (now); elixir-mode, not quite. In elixir-mode.el, we have
>
> (identifiers . ,(rx (one-or-more (any "A-Z" "a-z" "_"))
>                     (zero-or-more (any "A-Z" "a-z" "0-9" "_"))
>                     (optional (or "?" "!"))))
>
> First, this regex is suboptimal: the first character of an identifier should occur exactly once, or you get bad backtracking behaviour. Just remove the one-or-more construct:
>
> (identifiers . ,(rx (any "A-Z" "a-z" "_")
>                     (zero-or-more (any "A-Z" "a-z" "0-9" "_"))
>                     (optional (or "?" "!"))))
>
> This definition is then used in several places, but two in particular are of interest to us:
>
> ;; Module attributes
> (,(elixir-rx (and "@" (1+ identifiers)))
>
> The construct (1+ identifiers) was perhaps meant to match multiple identifiers, but it doesn't (no separator); it just matches an identifier in several ways, which again leads to bad backtracking behaviour.
> The same problem here:
>
> ;; Map keys
> (,(elixir-rx (group (and (one-or-more identifiers) ":")) space)
>
> Remove the 1+ and one-or-more and it's fast again.
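To try the corrected patterns outside elixir-mode, here is a minimal standalone sketch. It assumes plain `rx` and `rx-to-string` in place of elixir-mode's `elixir-rx` macro. The names `my-elixir-identifier-re`, `my-elixir-attribute-re`, `my-elixir-map-key-re` and `my-elixir-identifier-p` are hypothetical. Each pattern uses the identifier exactly once, which leaves the regexp engine far fewer ways to split a given identifier.

```elisp
(require 'rx)

;; Hypothetical standalone names, not part of elixir-mode.
;; One identifier: exactly one leading character, then the rest,
;; then an optional trailing "?" or "!".
(defconst my-elixir-identifier-re
  (rx (any "A-Z" "a-z" "_")
      (zero-or-more (any "A-Z" "a-z" "0-9" "_"))
      (optional (or "?" "!"))))

;; Module attributes: "@" followed by a single identifier (no 1+).
(defconst my-elixir-attribute-re
  (rx-to-string `(and "@" (regexp ,my-elixir-identifier-re)) t))

;; Map keys: one identifier plus ":" captured as group 1,
;; then one whitespace character (no one-or-more).
(defconst my-elixir-map-key-re
  (rx-to-string `(and (group (regexp ,my-elixir-identifier-re) ":") space) t))

(defun my-elixir-identifier-p (string)
  "Return non-nil if STRING consists of exactly one identifier."
  (string-match-p (concat "\\`" my-elixir-identifier-re "\\'") string))
```

For example, `(my-elixir-identifier-p "valid?")` returns 0, while `"9lives"` and `"foo?!"` are rejected.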
That makes a lot of sense. I removed these one-or-more's and 1+ (and a
few others), and it became fast again.
I'll send a patch upstream. Thanks for your help!
(Looking at the tracker, a smaller version of this change has already
been submitted.)
> Why did this "work" with the old rx implementation? Because that code had a nasty bug: it did not bracket definitions in rx-constituents properly. Example:
>
> (let ((rx-constituents (cons '(hello . "HELLO") rx-constituents)))
>   (rx-to-string '(1+ hello) t))
> => "HELLO+"
>
> The new rx implementation does not suffer from this bug.
>
> The result in your case is that the old rx, when translating (1+ identifiers), only tacked the "+" onto whatever regexp 'identifiers' produced, resulting in
>
> "[A-Z_a-z]+[0-9A-Z_a-z]*[!?]?+"
>
> which is a lot faster, since only the final [!?] is repeated twice (and it probably doesn't match very often).
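The bracketing bug comes down to operator precedence in regexp strings, so it can be reproduced without rx. The sketch below uses a hypothetical helper, `my-full-match-p`. It splices "HELLO" directly before "+", as in the "HELLO+" output above, and compares the result with a version wrapped in a shy group `\\(?:...\\)`.

```elisp
(defun my-full-match-p (regexp string)
  "Return non-nil if REGEXP matches all of STRING (hypothetical helper)."
  (string-match-p (concat "\\`\\(?:" regexp "\\)\\'") string))

(let ((unbracketed (concat "HELLO" "+"))           ; "+" binds to the final O only
      (bracketed (concat "\\(?:" "HELLO" "\\)+"))) ; "+" repeats the whole word
  (list (my-full-match-p unbracketed "HELLOOO")
        (my-full-match-p unbracketed "HELLOHELLO")
        (my-full-match-p bracketed "HELLOHELLO")))
;; => (0 nil 0)
```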
It's funny to think that someone probably beat the current code into
submission by trial and error.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.