GNU bug report logs - #16800
24.3; flyspell works slow on very short words at the end of big file


Package: emacs;

Reported by: Aleksey Cherepanov <aleksey.4erepanov <at> gmail.com>

Date: Tue, 18 Feb 2014 20:59:02 UTC

Severity: normal

Found in version 24.3

Fixed in version 24.5

Done: Agustin Martin <agustin6martin <at> gmail.com>

Bug is archived. No further changes may be made.


From: Aleksey Cherepanov <aleksey.4erepanov <at> gmail.com>
To: Agustin Martin <agustin.martin <at> hispalinux.es>
Cc: 16800 <at> debbugs.gnu.org, Eli Zaretskii <eliz <at> gnu.org>
Subject: bug#16800: 24.3; flyspell works slow on very short words at the end of big file
Date: Mon, 24 Feb 2014 03:02:51 +0400

I ran some timing tests on my .org file (in my normal setup, not in emacs -Q):

;; Every search runs backward from point; save-excursion restores point
;; before the next regexp is tried. The results are inserted at point.
(insert
 (mapconcat (lambda (re)
              (save-excursion
                (let ((time (current-time))
                      (count 0))
                  ;; Count every match of RE between point and buffer start.
                  (while (re-search-backward re nil t)
                    (setq count (1+ count)))
                  ;; Report: match count, elapsed time as a time list, regexp.
                  (format "%d: %S :: %s" count (time-subtract (current-time) time) re))))
            '("\\<[[:alpha:]]"
              "\\b[[:alpha:]]"
              "\\([^[:alpha:]]\\|\\b\\)[[:alpha:]]"
              "\\([^[:alpha:]]\\|\\`\\)[[:alpha:]]"
              "\\(?:[^[:alpha:]]\\|\\`\\)[[:alpha:]]"
              "\\(?:[^[:alpha:]]\\)[[:alpha:]]"
              "[^[:alpha:]][[:alpha:]]"
              "\\(?:\\b\\|'\\)[[:alpha:]]"
              "\\(?:[^[:alpha:]]\\|\\`\\)\\([[:alpha:]]+\\)"
              "\\([^[:alpha:]]\\|\\`\\)\\(?:[[:alpha:]]+\\)"
              "\\([^[:alpha:]]\\|\\`\\)[[:alpha:]]+")
            "\n"))

Matches: Time (HIGH LOW USEC PSEC) :: Regexp tried
299158: (0 2 841190 614000) :: \<[[:alpha:]]
299158: (0 2 876846 547000) :: \b[[:alpha:]]
307919: (0 3 321676 163000) :: \([^[:alpha:]]\|\b\)[[:alpha:]]
307899: (0 3 291931 838000) :: \([^[:alpha:]]\|\`\)[[:alpha:]]
307899: (0 2 821347 257000) :: \(?:[^[:alpha:]]\|\`\)[[:alpha:]]
307899: (0 2 760125 839000) :: \(?:[^[:alpha:]]\)[[:alpha:]]
307899: (0 2 765410 758000) :: [^[:alpha:]][[:alpha:]]
299518: (0 2 998895 976000) :: \(?:\b\|'\)[[:alpha:]]
307899: (0 3 174172 939000) :: \(?:[^[:alpha:]]\|\`\)\([[:alpha:]]+\)
307899: (0 3 250515 907000) :: \([^[:alpha:]]\|\`\)\(?:[[:alpha:]]+\)
307899: (0 3 218270 136000) :: \([^[:alpha:]]\|\`\)[[:alpha:]]+
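
The time lists are easier to compare as plain seconds. A minimal sketch, assuming the standard float-time function is used for the conversion; the names my-regexp-timings and my-timings-in-seconds are hypothetical, and only three of the rows above are copied in:

```elisp
;; Three rows copied from the results above: (REGEXP . TIME-LIST).
(defvar my-regexp-timings
  '(("\\<[[:alpha:]]" . (0 2 841190 614000))
    ("\\([^[:alpha:]]\\|\\`\\)[[:alpha:]]" . (0 3 291931 838000))
    ("\\(?:[^[:alpha:]]\\|\\`\\)[[:alpha:]]" . (0 2 821347 257000))))

(defun my-timings-in-seconds (timings)
  "Return TIMINGS with each time list converted to float seconds."
  (mapcar (lambda (entry)
            ;; float-time accepts a (HIGH LOW USEC PSEC) time list.
            (cons (car entry) (float-time (cdr entry))))
          timings))

(my-timings-in-seconds my-regexp-timings)
```

Read this way, the three rows come to roughly 2.84, 3.29 and 2.82 seconds.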

I have to admit that word-boundary searching breaks things even in a setup
with [[:alpha:]]: a0a is one word for Emacs but two words for flyspell. I
missed this because Russian behaves differently (there is a word boundary
between digits and Russian letters). My bad.
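
The a0a case can be reproduced in a scratch buffer. A minimal sketch, assuming the standard syntax table of a temporary buffer (where digits have word syntax); the helper name my-count-matches-backward is hypothetical:

```elisp
(defun my-count-matches-backward (regexp text)
  "Count matches of REGEXP in TEXT, searching backward from the end."
  (with-temp-buffer
    (insert text)                       ; point is now at the end of TEXT
    (let ((count 0))
      (while (re-search-backward regexp nil t)
        (setq count (1+ count)))
      count)))

;; Word-boundary regexp versus the [[:alpha:]]-based group, both on "a0a".
(list (my-count-matches-backward "\\<[[:alpha:]]" "a0a")
      (my-count-matches-backward "\\(?:[^[:alpha:]]\\|\\`\\)[[:alpha:]]" "a0a"))
```

Under that assumption the first count is 1 and the second is 2, which is the mismatch described above.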

307899: (0 2 760125 839000) :: \(?:[^[:alpha:]]\)[[:alpha:]]
307899: (0 2 765410 758000) :: [^[:alpha:]][[:alpha:]]
These two results suggest that we might gain some speed by checking for the
beginning of the buffer separately instead of inside the regexp. But I
doubt it is worth it.
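
For illustration, here is one way the separate check could look. This is only a sketch and not flyspell's code: both function names are hypothetical, and the second function exists only to compare against the regexp that includes the buffer-start alternative.

```elisp
(defun my-count-word-starts-split (text)
  "Count starts of alphabetic runs in TEXT, checking buffer start in Lisp."
  (with-temp-buffer
    (insert text)
    (let ((count 0))
      ;; Simple regexp without the beginning-of-buffer alternative.
      (while (re-search-backward "[^[:alpha:]][[:alpha:]]" nil t)
        (setq count (1+ count)))
      ;; Separate check for a run that starts at the very first character.
      (goto-char (point-min))
      (when (looking-at "[[:alpha:]]")
        (setq count (1+ count)))
      count)))

(defun my-count-word-starts-combined (text)
  "Count the same thing with the beginning-of-buffer test inside the regexp."
  (with-temp-buffer
    (insert text)
    (let ((count 0))
      (while (re-search-backward "\\(?:[^[:alpha:]]\\|\\`\\)[[:alpha:]]" nil t)
        (setq count (1+ count)))
      count)))

(list (my-count-word-starts-split "ab cd e")
      (my-count-word-starts-combined "ab cd e"))
```

Both functions should return the same count for any text; whether the split version is measurably faster on a large buffer would still have to be timed.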

On Sun, Feb 23, 2014 at 11:56:59PM +0400, Aleksey Cherepanov wrote:
> Also a non-capturing group ("\\(?:") could be used because we do not
> need the match data of the first group. It should work faster but I
> don't really know.

307899: (0 3 291931 838000) :: \([^[:alpha:]]\|\`\)[[:alpha:]]
307899: (0 2 821347 257000) :: \(?:[^[:alpha:]]\|\`\)[[:alpha:]]
The test shows that the non-capturing group is faster.

> Maybe it would be faster to not capture the word but capture one char or
> nothing, but I doubt the difference would be noticeable.

307899: (0 3 174172 939000) :: \(?:[^[:alpha:]]\|\`\)\([[:alpha:]]+\)
307899: (0 3 250515 907000) :: \([^[:alpha:]]\|\`\)\(?:[[:alpha:]]+\)
307899: (0 3 218270 136000) :: \([^[:alpha:]]\|\`\)[[:alpha:]]+
Unexpectedly, capturing the word is slightly faster. Maybe what matters is
not that the word is captured but that the capture is the second group, and
it would behave differently for a forward search. Or maybe [[:alpha:]]+
instead of a fixed word caused it. Anyway, the difference is very small.

Capturing the word lets us write a function that wraps a word in a regexp,
the way the word-search-regexp function wraps a word for
word-search-forward and word-search-backward.
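
Such a wrapper might look roughly like this. It is a sketch under assumptions: the names my-flyspell-word-regexp and my-count-whole-word are hypothetical, and the trailing delimiter group is my addition (the regexps timed above only have the leading part).

```elisp
(defun my-flyspell-word-regexp (word)
  "Return a regexp matching WORD delimited by non-alphabetic characters.
Group 1 captures WORD itself."
  (concat "\\(?:[^[:alpha:]]\\|\\`\\)"     ; leading delimiter or buffer start
          "\\(" (regexp-quote word) "\\)"  ; the word, captured as group 1
          "\\(?:[^[:alpha:]]\\|\\'\\)"))   ; assumed trailing delimiter or buffer end

(defun my-count-whole-word (word text)
  "Count delimited occurrences of WORD in TEXT."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (let ((count 0)
          (regexp (my-flyspell-word-regexp word)))
      (while (re-search-forward regexp nil t)
        (setq count (1+ count))
        ;; Resume right after the word, so that its trailing delimiter
        ;; can serve as the leading delimiter of the next match.
        (goto-char (match-end 1)))
      count)))

(my-count-whole-word "foo" "foo foobar foo")
```

The goto-char after each match is needed because the delimiters are consumed by the match; without it, two occurrences separated by a single character would be counted once.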

> I guess that \b would work faster than the group so we could have 'if'
> statement around the whole loop that has one implementation with \b
> for case when casechars are "[[:alpha:]]" and not-casechars are
> "[^[:alpha:]]" and another implementation as above for other cases.
> But it seems cumbersome.

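My guess was wrong: \b is slower than the non-capturing group (compare the
second and fifth rows of the results). It is also not appropriate at all,
because of the a0a case described above.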

Thanks!

-- 
Regards,
Aleksey Cherepanov




This bug report was last modified 10 years and 137 days ago.
