GNU bug report logs - #57245
29.0.50; M-> in a large XML file (without long lines) is slow

Package: emacs;

Reported by: Dmitry Gutov <dgutov <at> yandex.ru>

Date: Tue, 16 Aug 2022 14:35:02 UTC

Severity: normal

Found in version 29.0.50

From: Dmitry Gutov <dgutov <at> yandex.ru>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 57245 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca
Subject: bug#57245: 29.0.50; M-> in a large XML file (without long lines) is slow
Date: Wed, 17 Aug 2022 15:14:07 +0300

On 17.08.2022 14:24, Eli Zaretskii wrote:
>> Date: Tue, 16 Aug 2022 22:32:23 +0300
>> Cc: 57245 <at> debbugs.gnu.org
>> From: Dmitry Gutov <dgutov <at> yandex.ru>
>>
>> On 16.08.2022 19:54, Eli Zaretskii wrote:
>>> Stefan, can you see why syntax-related stuff in sgml-mode is so heavy
>>> here?
>>
>> nxml-syntax-propertize might well be heavier than average, but the delay
>> scales linearly with the size of the file.
> 
> Which is generally not a good scaling factor, especially if the
> coefficient is quite large (as it seems to be in this case).

Someone can work on the coefficient, but any accurate parser has to scan 
the buffer from the beginning. At least once.

Migration to tree-sitter might give us a better coefficient later, but 
the principle will remain.
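
To illustrate the point about scanning "at least once", here is a minimal Python sketch of a cached scan. It is not the code of syntax-propertize or syntax-ppss; ScanCache, in_comment_at, and step are hypothetical names, and the "parser" only tracks whether a position is inside an XML comment. The first query near the end of the text pays for a full scan from the beginning; later queries resume from the nearest saved checkpoint.

```python
from bisect import bisect_right


class ScanCache:
    """Toy model of a parser that caches its state at checkpoints.

    Assumption: the only state tracked is (in_comment, body_start),
    where body_start is the index just past the "<!--" opener.
    """

    def __init__(self, text, step=4096):
        self.text = text
        self.step = step            # distance between checkpoints
        self.positions = [0]        # sorted checkpoint positions
        self.states = [(False, 0)]  # parser state at each checkpoint
        self.chars_scanned = 0      # total work done so far

    def _advance(self, cur, state):
        """Consume the character at index cur and return the new state."""
        in_comment, body_start = state
        end = cur + 1
        if not in_comment:
            if self.text.endswith("<!--", 0, end):
                return (True, end)
        elif self.text.endswith("-->", body_start, end):
            # The closer must lie entirely after the opener.
            return (False, 0)
        return state

    def in_comment_at(self, pos):
        """Return True if pos is inside a comment, scanning as needed."""
        # Resume from the last checkpoint at or before pos.
        i = bisect_right(self.positions, pos) - 1
        cur, state = self.positions[i], self.states[i]
        while cur < pos:
            state = self._advance(cur, state)
            cur += 1
            self.chars_scanned += 1
            if cur % self.step == 0 and cur > self.positions[-1]:
                self.positions.append(cur)
                self.states.append(state)
        return state[0]


if __name__ == "__main__":
    text = "<a>" * 1000 + "<!-- note -->" + "<b/>" * 1000
    cache = ScanCache(text, step=1000)
    cache.in_comment_at(len(text))   # first jump to the end: full scan
    first = cache.chars_scanned
    cache.in_comment_at(len(text))   # second jump: resumes from a checkpoint
    print(first, cache.chars_scanned - first)
```

In this model the cost of the first jump grows linearly with the text size, whatever the per-character constant is; only the repeated queries become cheap.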

>> Which seems to be exactly the behavior the "font-lock narrowing" was
>> supposed to guard from?
> 
> No.  It wasn't supposed to fix modes that foolishly scan the buffer
> from BOB to point.

You might want to choose words better.

> It was supposed to fix modes which scan from the
> beginning of line, and that is (a) only a problem when lines are very
> long, and (b) much harder to solve in the mode itself, because
> font-lock very frequently uses anchored regexps and otherwise likes to
> start from BOL, and syntax processing also likes starting from BOL.

syntax-wholelines-max handles that problem.

Though it might depend on what you mean by "anchored regexps".

> Btw, do nXML and/or sgml-mode use libxml for their analysis?  If
> not, why not?  Wouldn't that be faster (and possibly more accurate)?

Might be "a simple matter of coding".

But we do need syntax-propertize to run, so that the user commands can 
rely on proper syntax information in the buffer. It remains to be seen 
whether xml-parse-region is a good base for nxml-syntax-propertize, and 
how much of a performance improvement it can bring (with all the string 
marshaling around).

nxml also probably handles invalid documents better, which might or 
might not be important.




This bug report was last modified 2 years and 304 days ago.
