GNU bug report logs -
#56682
Fix the long lines font locking related slowdowns
On 14.08.2022 20:59, Eli Zaretskii wrote:
>> Date: Sun, 14 Aug 2022 20:47:40 +0300
>> Cc: 56682 <at> debbugs.gnu.org, gregory <at> heytings.org, monnier <at> iro.umontreal.ca
>> From: Dmitry Gutov <dgutov <at> yandex.ru>
>>
>>> The better way is to acknowledge that some inaccuracies are acceptable
>>> in those cases. With that in mind, one can design a syntax analyzer
>>> that looks back only a short ways, until it finds some place that
>>> could reasonably serve as an anchor point for heuristic decisions
>>> about whether we are inside or outside a string or comment, and then
>>> verifying that guess with some telltale syntactic elements that follow
>>> (like semi-colons or comment-end delimiters in C). While this kind of
>>> heuristics can sometimes fail, if they only fail rarely, the result is
>>> a huge win.
>>
>> You cannot design a language-agnostic syntax analyzer like that.
>
> _I_ cannot, but hopefully someone else will.
That seems unlikely. Nothing's impossible, of course, but I wouldn't
want to wait for such an invention to come up before we make the
decision on how to proceed now.
What _can_ be done is make syntax-ppss's cache invalidations more local
by introducing a "repair" step. That would only speed up certain
operations, at most, and the initial wait near EOB can't be avoided this
way.
>>> In any case, the way to speed up these cases is to look at the profile
>>> and identify the code that is slowing us down; then attempt to make it
>>> faster. (20 sec is actually long enough for us to interrupt Emacs
>>> under a debugger and look at the backtrace to find the culprit.)
>>
>> I've profiled and benchmarked this scenario already: all of the delay
>> (17 seconds, to be precise) comes from parse-partial-sexp. 1 GB is a lot.
>
> Before we get to 1GB files, there are 20MB files and 250MB files. I
> found quite a few low-hanging fruit in those that are worth plucking,
> while we wait for parse-partial-sexp to get its act together.
Definitely.
But when the profiler output in a 1 GB file comes down to syntax-ppss
only, that means the low-hanging fruit has been picked.
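For reference, a measurement of this kind can be reproduced with something like the sketch below. The functions are standard Emacs ones, but evaluating them with M-: in the buffer visiting the large file is an assumed setup, not necessarily the exact commands used for the numbers in this thread.

```elisp
;; Time one full syntactic scan of the current buffer.
;; `benchmark-run' returns (ELAPSED-SECONDS GC-COUNT GC-SECONDS).
;; `save-excursion' is used because `parse-partial-sexp' moves point.
(benchmark-run 1
  (save-excursion
    (parse-partial-sexp (point-min) (point-max))))

;; Alternatively, profile an interactive command and read the report.
(profiler-start 'cpu)
;; ... type M-> here and wait for redisplay to finish ...
(profiler-report)
(profiler-stop)
```

If the report shows nearly all samples under syntax-ppss or parse-partial-sexp, the remaining cost is the scan itself rather than the surrounding font-lock code.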
>>> If that solves the problems in a reasonable way for very long lines,
>>> maybe we will eventually have such an option.
>>
>> Can I merge the branch, then?
>
> Please wait until I have time to review it.
>
>> I was hoping for a stylistic review, perhaps. Like, whether you like the
>> name of the variable, and should it be split in two.
>>
>> A change of the default value(s) is on the table too.
>
> Will definitely do, I'm just busy with "other things" right now, most
> of them related to other aspects of long lines.
Roger that.
>>> One such major mode and one such file was presented long ago: a
>>> single-line XML file.
>>
>> XML is indeed slower. It takes almost 3 seconds for me to scroll to the
>> end of a 20 MB XML file.
>>
>> Most of it comes from sgml--syntax-propertize-ppss, which is probably
>> justified: XML is a more complex language.
>
> Did you wait till nxml-mode did its initial scan and displayed "Valid"
> in the mode line? The performance is quite different before and after
> that.
It takes a while to switch from "Validated: 0" to "Valid", but the
performance seems about the same in both states.
Maybe some other example file would show different behavior, IDK.
>> But other than the initial delay, scrolling, and isearch, and local
>> editing, all work fast, unlike the original situation with JSON.
>
> With which branch?
scratch/font_lock_large_files, with 'emacs -Q'
I've also run this test on master now, and M-> is not instant there
either. Apparently, a fair amount of time is also spent in
nxml-extend-region (which calls sgml-syntax-propertize and syntax-ppss).
Not sure why it would spend any significant time in either, though, if
they're called inside a narrowing.
This bug report was last modified 2 years and 8 days ago.