GNU bug report logs -
#25706
26.0.50; Slow C file fontification
Reported by: Sujith <m.sujith <at> gmail.com>
Date: Mon, 13 Feb 2017 18:41:01 UTC
Severity: normal
Tags: moreinfo
Found in version 26.0.50
Done: Alan Mackenzie <acm <at> muc.de>
Bug is archived. No further changes may be made.
Message #107 received at 25706 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
First, some Emacs regexp basics:
1. If A and B match single characters, then A\|B should be written [AB] whenever possible. The reason is that A\|B pushes a backtrack record, which uses stack space and wastes time if matching fails later on. The cost can be quite noticeable, as we have seen.
2. Syntax-class constructs are usually better written as character alternatives when possible.
The \sX construct, for some X, is typically somewhat slower to match than explicitly listing the characters to match. For example, if all you care about are space and tab, then "\\s *" should be written "[ \t]*".
3. Unicode character classes are slower to match than ASCII-only ones. For example, [[:alpha:]] is slower than [A-Za-z], assuming only those characters are of interest.
4. [^...] will match \n unless \n is included in the set. For example, "[^a]\\|$" will almost never match the $ (end-of-line) branch, because a newline will be matched by the first branch. The only exception is at the very end of the buffer if it is not newline-terminated, but that is rarely worth considering for source code.
5. \r (carriage return) normally doesn't appear in buffers even if the file uses DOS line endings. Line endings are converted into a single \n (newline) when the buffer is read. In particular, $ does NOT match at \r, only before \n.
When \r appears it is usually because the file contains a mixture of line-ending styles, typically from being edited using broken tools. Whether you want to take such files into account is a matter of judgement; most modes don't bother.
6. Capturing groups cost more than non-capturing groups, but you already know that.
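To make points 1 to 4 concrete, here is a small sketch that can be evaluated in a scratch buffer. The helper names (my-all-matches, my-first-match) and the sample strings are made up for illustration and are not taken from cc-langs.el. It only checks that the rewritten forms match the same text on these inputs, and shows which branch wins in the "[^a]\\|$" case; it does not measure speed.

```elisp
;; Illustrative sketch only: helper names and sample text are hypothetical.

(defun my-all-matches (regexp string)
  "Return a list of all non-overlapping matches of REGEXP in STRING."
  (let ((start 0) (matches '()))
    (while (and (<= start (length string))
                (string-match regexp string start))
      (push (match-string 0 string) matches)
      ;; Step over empty matches so the loop always advances.
      (setq start (if (= (match-end 0) (match-beginning 0))
                      (1+ (match-end 0))
                    (match-end 0))))
    (nreverse matches)))

(defun my-first-match (regexp text)
  "Return (BEG . END) of the first match of REGEXP in a temp buffer holding TEXT."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (when (re-search-forward regexp nil t)
      (cons (match-beginning 0) (match-end 0)))))

;; Point 1: an alternation of single characters vs. a character alternative.
(equal (my-all-matches "a\\|b" "cabbage")
       (my-all-matches "[ab]" "cabbage"))          ; => t

;; Point 2: whitespace syntax class vs. explicit space and tab.
;; The standard syntax table is used so the result does not depend
;; on whatever buffer happens to be current.
(with-syntax-table (standard-syntax-table)
  (equal (my-all-matches "\\s +" "int \t x")
         (my-all-matches "[ \t]+" "int \t x")))    ; => t

;; Point 3: named class vs. ASCII range, on ASCII-only input.
(equal (my-all-matches "[[:alpha:]]+" "foo_bar42")
       (my-all-matches "[A-Za-z]+" "foo_bar42"))   ; => t

;; Point 4: [^a] consumes the newline, so the $ branch is not reached.
(my-first-match "[^a]\\|$" "a\na")    ; => (2 . 3), the newline itself
;; With \n excluded from the set, $ matches (emptily) before the newline.
(my-first-match "[^a\n]\\|$" "a\na")  ; => (2 . 2)
```

Note that the equivalences in points 2 and 3 hold only for the input you care about: in a buffer whose syntax table gives other characters whitespace syntax, or in text containing non-ASCII letters, the two forms can match differently.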
On to specifics: here are annotations for possible improvements in cc-langs.el. (I didn't bother about capturing groups here.)
[cc-regexp-annot.diff (application/octet-stream, attachment)]
This bug report was last modified 4 years and 213 days ago.