GNU bug report logs - #56682
Fix the long lines font locking related slowdowns


Package: emacs;

Reported by: Gregory Heytings <gregory <at> heytings.org>

Date: Thu, 21 Jul 2022 18:01:01 UTC

Severity: normal

Done: Gregory Heytings <gregory <at> heytings.org>

Bug is archived. No further changes may be made.



Message #1408 received at 56682 <at> debbugs.gnu.org:

From: Dmitry Gutov <dgutov <at> yandex.ru>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 56682 <at> debbugs.gnu.org, gregory <at> heytings.org, monnier <at> iro.umontreal.ca
Subject: Re: bug#56682: Fix the long lines font locking related slowdowns
Date: Sun, 14 Aug 2022 20:47:40 +0300
On 14.08.2022 16:15, Eli Zaretskii wrote:

>> There's no point in doing that. Either we narrow to some area around
>> point (might even use a larger radius like 1 MB), or we only
>> fontify up to some position. The former easily creates bad fontification.
>>
>> The alternative, of course, is to pay the price of syntax-ppss on larger
>> spans and wait the corresponding amount of time the first time the user
>> scrolls to EOB. That's what the current default on the branch does.
> 
> You are still thinking in terms of the original design of syntactical
> analysis which strives to produce 100% accurate results.  That design
> principle doesn't work with very long lines, so sticking to it would
> indeed lead us to give up on solving the problem.

s/very long lines/very large files

In any case, the "original design" is not going anywhere (as the only 
way to achieve correctness), and I'm talking in terms of balance between 
accuracy and performance. To use Gregory's narrowing approach in 
font-lock, check out the branch under discussion 
(scratch/font_lock_large_files) and evaluate

  (setq font-lock-large-files '(narrow . 5000))

You'll see the same behavior as on master now (except narrowing isn't 
"hard"), with the same performance characteristics.

> The better way is to acknowledge that some inaccuracies are acceptable
> in those cases.  With that in mind, one can design a syntax analyzer
> that looks back only a short ways, until it finds some place that
> could reasonably serve as an anchor point for heuristic decisions
> about whether we are inside or outside a string or comment, and then
> verifying that guess with some telltale syntactic elements that follow
> (like semi-colons or comment-end delimiters in C).  While this kind of
> heuristics can sometimes fail, if they only fail rarely, the result is
> a huge win.

You cannot design a language-agnostic syntax analyzer like that. It's 
something every major mode would have to figure out how to implement.

It's relatively easy to design for JSON (again) because the syntax is so 
simple, but for others -- not so much.

So we need to settle on the basic design first. The code on the branch 
includes the narrowing approach which is trivially extended to use the 
"find safe place" hook when it's available. But it won't be always 
available.
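
To make that concrete, here is a rough, hypothetical sketch (not code 
from the branch) of what such a per-mode "find safe place" hook could 
look like for JSON, where the two-character sequence ": (a key's closing 
quote followed by a colon) almost never occurs inside a string, escaped 
quotes aside:

  ;; Hypothetical sketch; `my-json--find-safe-place' is not part of the
  ;; branch, just an illustration of the kind of heuristic discussed here.
  (defun my-json--find-safe-place (pos)
    "Return a position at or before POS that is likely outside a string."
    (save-excursion
      (goto-char pos)
      ;; Look back at most 10000 chars for a `":' anchor.
      (if (search-backward "\":" (max (point-min) (- pos 10000)) t)
          ;; Right after the colon we can assume "not inside a string".
          (+ (point) 2)
        ;; No anchor nearby: fall back to scanning from the beginning.
        (point-min))))

  ;; font-lock could then start its syntactic scan from that position:
  ;; (parse-partial-sexp (my-json--find-safe-place (point)) (point))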

>> But as Gregory shows, when you get to _really_ large files (like 1 GB
>> JSON file in his example), pressing M-> will still make you wait (I have
>> to wait around 20 seconds).
> 
> Try with the latest master, it might have improved (fingers crossed).

All improvements are welcome, but that's unlikely:

> In any case, the way to speed up these cases is to look at the profile
> and identify the code that is slowing us down; then attempt to make it
> faster.  (20 sec is actually long enough for us to interrupt Emacs
> under a debugger and look at the backtrace to find the culprit.)

I've profiled and benchmarked this scenario already: all of the delay 
(17 seconds, to be precise) comes from parse-partial-sexp. 1 GB is a lot.
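
For reference, that kind of measurement can be approximated with 
something like the following (the file name is made up; the point is 
only that a single parse-partial-sexp over the whole buffer dominates 
the cost):

  (require 'benchmark)
  (with-current-buffer (find-file-noselect "huge.json") ; hypothetical 1 GB file
    (benchmark-run 1
      (parse-partial-sexp (point-min) (point-max))))
  ;; => (SECONDS GC-RUNS GC-SECONDS); the first number accounts for
  ;;    essentially all of the M-> delay described above.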

>>>> So the "don't fontify past X" strategy is simply based on the idea
>>>> that no fontification is probably better than unreliable and
>>>> obviously incorrect one.
>>>
>>> I disagree with that idea, but if someone agrees with you, they can
>>> simply turn off font-lock.  As was already mentioned many times in
>>> this endless discussion.
>>
>> If someone agrees with me, they will simply be able to customize
>> font-lock-large-files to choose this strategy.
> 
> If that solves the problems in a reasonable way for very long lines,
> maybe we will eventually have such an option.

Can I merge the branch, then?

I was hoping for a stylistic review, perhaps: whether you like the 
name of the variable, and whether it should be split in two.

A change of the default value(s) is on the table too.

>> I'm still waiting for people to come forward with other major modes
>> which have the same kind of problems. Preferably ones that are likely to
>> be used with large files.
> 
> One such major mode and one such file was presented long ago: a
> single-line XML file.

XML is indeed slower. It takes almost 3 seconds for me to scroll to the 
end of a 20 MB XML file.

Most of it comes from sgml--syntax-propertize-ppss, which is probably 
justified: XML is a more complex language.

But aside from the initial delay, scrolling, isearch, and local 
editing all work fast, unlike the original situation with JSON.
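
For anyone who wants to reproduce this, the scenario can be profiled 
roughly like this (a sketch, assuming a large single-line XML file is 
already open in the current buffer):

  (profiler-start 'cpu)
  (goto-char (point-max))   ; the slow part: M-> in the XML buffer
  (redisplay)               ; let jit-lock fontify the now-visible text
  (profiler-stop)
  (profiler-report)         ; look for sgml--syntax-propertize-ppss here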



