GNU bug report logs - #56682
Fix the long lines font locking related slowdowns


Package: emacs

Reported by: Gregory Heytings <gregory <at> heytings.org>

Date: Thu, 21 Jul 2022 18:01:01 UTC

Severity: normal

Done: Gregory Heytings <gregory <at> heytings.org>

Bug is archived. No further changes may be made.


From: Dmitry Gutov <dgutov <at> yandex.ru>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 56682 <at> debbugs.gnu.org, gregory <at> heytings.org, monnier <at> iro.umontreal.ca
Subject: bug#56682: Fix the long lines font locking related slowdowns
Date: Sun, 14 Aug 2022 23:46:13 +0300
On 14.08.2022 20:59, Eli Zaretskii wrote:
>> Date: Sun, 14 Aug 2022 20:47:40 +0300
>> Cc: 56682 <at> debbugs.gnu.org, gregory <at> heytings.org, monnier <at> iro.umontreal.ca
>> From: Dmitry Gutov <dgutov <at> yandex.ru>
>>
>>> The better way is to acknowledge that some inaccuracies are acceptable
>>> in those cases.  With that in mind, one can design a syntax analyzer
>>> that looks back only a short ways, until it finds some place that
>>> could reasonably serve as an anchor point for heuristic decisions
>>> about whether we are inside or outside a string or comment, and then
>>> verifying that guess with some telltale syntactic elements that follow
>>> (like semi-colons or comment-end delimiters in C).  While this kind of
>>> heuristics can sometimes fail, if they only fail rarely, the result is
>>> a huge win.
>>
>> You cannot design a language-agnostic syntax analyzer like that.
> 
> _I_ cannot, but hopefully someone else will.

That seems unlikely. Nothing's impossible, of course, but I wouldn't 
want to wait for such an invention to come up before we make the 
decision on how to proceed now.

What _can_ be done is make syntax-ppss's cache invalidations more local 
by introducing a "repair" step. That would only speed up certain 
operations, at most, and the initial wait near EOB can't be avoided this 
way.
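
To illustrate what such a "repair" step could look like, here is a minimal
sketch in Python. It is not Emacs Lisp and not how syntax-ppss is actually
implemented. It assumes a toy scanner whose whole state is "inside a
double-quoted string or not", and the names scan, StateCache, state_at and
edit are hypothetical. After an edit, cached states are rescanned only
until a recomputed state matches the cached one; from that point on, the
rest of the cache is kept instead of being flushed.

```python
from bisect import bisect_right


def scan(text, start, end, in_string):
    """Toy stand-in for a partial parse: toggle the state on every '"'."""
    for ch in text[start:end]:
        if ch == '"':
            in_string = not in_string
    return in_string


class StateCache:
    """Parser states cached at regular checkpoints (hypothetical design)."""

    def __init__(self, text, step=8):
        self.text = text
        self.positions = list(range(0, len(text) + 1, step))
        self.states = []
        self.rescanned = 0  # characters rescanned by the most recent edit
        state, prev = False, 0
        for pos in self.positions:
            state = scan(text, prev, pos, state)
            self.states.append(state)
            prev = pos

    def state_at(self, pos):
        """State at pos, scanning only from the nearest earlier checkpoint."""
        i = bisect_right(self.positions, pos) - 1
        return scan(self.text, self.positions[i], pos, self.states[i])

    def edit(self, start, end, replacement):
        """Replace text[start:end] and repair the cache locally."""
        delta = len(replacement) - (end - start)
        self.text = self.text[:start] + replacement + self.text[end:]
        # Checkpoints at or before `start` cannot be affected by the edit.
        keep = bisect_right(self.positions, start)
        # Checkpoints inside the replaced region are dropped; later ones
        # are shifted by `delta` and treated as suspect.
        tail = [(p + delta, s)
                for p, s in zip(self.positions[keep:], self.states[keep:])
                if p >= end]
        self.positions = self.positions[:keep]
        self.states = self.states[:keep]
        state, prev = self.states[-1], self.positions[-1]
        self.rescanned = 0
        for i, (pos, cached) in enumerate(tail):
            state = scan(self.text, prev, pos, state)
            self.rescanned += pos - prev
            prev = pos
            self.positions.append(pos)
            self.states.append(state)
            if state == cached:
                # Converged: everything after this checkpoint is still valid.
                for p, s in tail[i + 1:]:
                    self.positions.append(p)
                    self.states.append(s)
                break


cache = StateCache('x = "aa"; ' * 20)
cache.edit(5, 6, "b")  # change one character inside the first string
print(cache.rescanned, "of", len(cache.text), "characters rescanned")
```

An edit that does not change the state downstream (like the one above)
converges at the first checkpoint after it. An edit that inserts an
unbalanced quote never converges and rescans to the end, which is the
same cost as a full flush. Neither case helps with the first scan of the
whole buffer.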

>>> In any case, the way to speed up these cases is to look at the profile
>>> and identify the code that is slowing us down; then attempt to make it
>>> faster.  (20 sec is actually long enough for us to interrupt Emacs
>>> under a debugger and look at the backtrace to find the culprit.)
>>
>> I've profiled and benchmarked this scenario already: all of the delay
>> (17 seconds, to be precise) comes from parse-partial-sexp. 1 GB is a lot.
> 
> Before we get to 1GB files, there are 20MB files and 250MB files.  I
> found quite a few low-hanging fruit in those that are worth plucking,
> while we wait for parse-partial-sexp to get its act together.

Definitely.

But when the profiler output in a 1 GB file comes down to syntax-ppss 
only, that means the low-hanging fruit has been picked.

>>> If that solves the problems in a reasonable way for very long lines,
>>> maybe we will eventually have such an option.
>>
>> Can I merge the branch, then?
> 
> Please wait until I have time to review it.
> 
>> I was hoping for a stylistic review, perhaps. Like, whether you like the
>> name of the variable, and should it be split in two.
>>
>> A change of the default value(s) is on the table too.
> 
> Will definitely do, I'm just busy with "other things" right now, most
> of them related to other aspects of long lines.

Roger that.

>>> One such major mode and one such file were presented long ago: a
>>> single-line XML file.
>>
>> XML is indeed slower. It takes almost 3 seconds for me to scroll to the
>> end of a 20 MB XML file.
>>
>> Most of it comes from sgml--syntax-propertize-ppss, which is probably
>> justified: XML is a more complex language.
> 
> Did you wait till nxml-mode did its initial scan and displayed "Valid"
> in the mode line?  The performance is quite different before and after
> that.

It takes a while to switch from "Validated: 0" to "Valid", but the 
performance seems about the same in both states.

Maybe some other example file would show different behavior, IDK.

>> But other than the initial delay, scrolling, and isearch, and local
>> editing, all work fast, unlike the original situation with JSON.
> 
> With which branch?

scratch/font_lock_large_files, with 'emacs -Q'

I've also run this test on master now, and M-> is not instant there 
either. Apparently, a fair amount of time is also spent in 
nxml-extend-region (which calls sgml-syntax-propertize and syntax-ppss).

Not sure why it would spend any significant time in either, though, if 
they're called inside a narrowing.




This bug report was last modified 2 years and 8 days ago.
