#16800 - 24.3; flyspell works slow on very short words at the end of big file

GNU bug report logs - #16800
24.3; flyspell works slow on very short words at the end of big file

Package: emacs;

Reported by: Aleksey Cherepanov <aleksey.4erepanov <at> gmail.com>

Date: Tue, 18 Feb 2014 20:59:02 UTC

Severity: normal

Found in version 24.3

Fixed in version 24.5

Done: Agustin Martin <agustin6martin <at> gmail.com>

Bug is archived. No further changes may be made.

Message #47 received at 16800 <at> debbugs.gnu.org (full text, mbox):

From: Aleksey Cherepanov <aleksey.4erepanov <at> gmail.com>
To: Agustin Martin <agustin.martin <at> hispalinux.es>
Cc: 16800 <at> debbugs.gnu.org, Eli Zaretskii <eliz <at> gnu.org>
Subject: Re: bug#16800: 24.3; flyspell works slow on very short words at the end of big file
Date: Sun, 23 Feb 2014 23:56:59 +0400

On Sun, Feb 23, 2014 at 02:26:00AM +0100, Agustin Martin wrote:
> 2014-02-22 22:03 GMT+01:00 Eli Zaretskii <eliz <at> gnu.org>:
>
> > > Date: Sat, 22 Feb 2014 22:55:11 +0400
> > > From: Aleksey Cherepanov <aleksey.4erepanov <at> gmail.com>
> > >
> > > > > Emacs words are language sensitive too.
> > > >
> > > > But not in the same way as ispell/flyspell is. The CASECHARS,
> > > > NON-CASECHARS, and OTHERCHARS parameters of the dictionary are only
> > > > taken into account by ispell/flyspell.
> > >
> > > I think one could define a dictionary like: ("my" "[a]" "[^a]" "" ...)
> > > So the only letter for flyspell words is "a". That way "qqaaqqaaqq" is
> > > one word for emacs and two words with garbage around for flyspell. I
> > > think my solution fails in such case.
> >
> > It's more complex than that: with some languages, and at least with
> > aspell, we take these parameters from the dictionary. So they cannot
> > be known in advance in some cases.
> >
>
> Hi,
>
> Not yet sure if I am missing something important, but I am playing with a
> regexp search in flyspell-word-search-* functions based on what flyspell
> thinks is the word to spellcheck (`word') and what it thinks should not be
> part of a word (`NOTCASECHARS'). Since no OTHERCHARS is used there may be
> some intermediate matches being false positives that will be discarded once
> flyspell-word checks them.
>
> I have tested this in Aleksey's file and it is apparently working well and in
> this particular case with much better efficiency. Need to think about more
> ad-hoc situations where it may fail or slow down things. Suggestions for
> possible failures are welcome.
>
> Patch is attached. I did the tests against an old and patched version of
> flyspell.el (that shipped with Debian stable) and built the patch for it.
> Should apply and work similarly in trunk's flyspell.el.
>
> --- flyspell.el.orig 2014-02-23 02:17:03.680107519 +0100
> +++ flyspell.el 2014-02-23 02:50:50.634625248 +0100
> @@ -1050,8 +1050,19 @@
>    (save-excursion
>      (let ((r '())
>            (inhibit-point-motion-hooks t)
> +          (flyspell-not-casechars (flyspell-get-not-casechars))

I'd move concat here too so it is out of the inner loop.

>            p)
> -      (while (and (not r) (setq p (search-backward word bound t)))
> +      (while
> +          (and (not r)
> +               (setq p
> +                     (re-search-backward
> +                      (concat
> +                       "\\(" flyspell-not-casechars "\\|\\b\\)"

I think \b here could be replaced with \` (beginning of buffer). I
think it is the only boundary we need that is not described by the
not-casechars, word sequence. Similarly, \' (end of buffer) could be
used for the forward search.

Also, a non-capturing group ("\\(?:") could be used because we do not
need the match data of the first group. It should work faster but I
don't really know. Maybe it would be faster to not capture the word but
capture one char or nothing, but I doubt the difference would be
noticeable.

> +                       "\\(" word "\\)"

I think regexp-quote around the word is necessary here.

> +                       flyspell-not-casechars
> +                       )
> +                      bound t)))
> +        (goto-char (match-beginning 2))

s/2/1/ if the first group is not capturing.

>          (let ((lw (flyspell-get-word)))
>            (if (and (consp lw)
>                     (if ignore-case
> @@ -1068,8 +1079,19 @@
>    (save-excursion
>      (let ((r '())
>            (inhibit-point-motion-hooks t)
> +          (flyspell-not-casechars (flyspell-get-not-casechars))

concat here as above.

>            p)
> -      (while (and (not r) (setq p (search-forward word bound t)))
> +      (while
> +          (and (not r)
> +               (setq p
> +                     (re-search-forward
> +                      (concat
> +                       flyspell-not-casechars
> +                       "\\(" word "\\)"

regexp-quote as above.

> +                       "\\(" flyspell-not-casechars "\\|\\b\\)"

I think \b could be replaced by \' here as described above. The second
group could be non-capturing here.

> +                       )
> +                      bound t)))
> +        (goto-char (match-beginning 1))

I guess match-end should be used here.

>          (let ((lw (flyspell-get-word)))
>            (if (and (consp lw) (string-equal (car lw) word))
>                (setq r p)

I guess that \b would work faster than the group, so we could have an
'if' statement around the whole loop that has one implementation with
\b for the case when casechars are "[[:alpha:]]" and not-casechars are
"[^[:alpha:]]", and another implementation as above for other cases.
But it seems cumbersome.

Thanks!

--
Regards,
Aleksey Cherepanov
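
The review comments in this message can be collected into a small standalone sketch. It is an illustration only, not code from flyspell.el and not the patch under review: the names my-word-backward-regexp, my-word-forward-regexp and my-find-word-backward are hypothetical, and NOT-CASECHARS is assumed to be a regexp that matches exactly one non-word character (such as "[^a]" from the dictionary example quoted above). It applies regexp-quote to the word, uses a non-capturing group with \` or \' in place of \b, and builds the regexp once before searching.

```elisp
;; Sketch only: every `my-' name below is hypothetical, not part of flyspell.el.

(defun my-word-backward-regexp (word not-casechars)
  "Return a regexp for a backward search of WORD as a standalone word.
NOT-CASECHARS is assumed to match exactly one non-word character."
  (concat "\\(?:" not-casechars "\\|\\`\\)"  ; non-word char or buffer start
          "\\(" (regexp-quote word) "\\)"    ; group 1: the literal word
          not-casechars))                    ; a non-word char must follow

(defun my-word-forward-regexp (word not-casechars)
  "Return a regexp for a forward search of WORD as a standalone word.
NOT-CASECHARS is assumed to match exactly one non-word character."
  (concat not-casechars                      ; a non-word char must precede
          "\\(" (regexp-quote word) "\\)"    ; group 1: the literal word
          "\\(?:" not-casechars "\\|\\'\\)")) ; non-word char or buffer end

(defun my-find-word-backward (word not-casechars text)
  "Return the position in TEXT of the last standalone WORD, or nil.
Positions are 1-based buffer positions in a temporary buffer."
  (with-temp-buffer
    (insert text)                            ; point is now at end of buffer
    ;; Build the regexp once, outside any search loop.
    (let ((re (my-word-backward-regexp word not-casechars)))
      (when (re-search-backward re nil t)
        ;; Group 1 is the word because the first group is non-capturing.
        (match-beginning 1)))))

;; Dictionary where "a" is the only letter, as in the quoted example:
;; (my-find-word-backward "aa" "[^a]" "qqaaqqaaqq")   ; => 7
;; A word at the very start of the buffer is found through \`:
;; (my-find-word-backward "aa" "[^a]" "aa qq")        ; => 1
;; With regexp-quote, "." in the word is literal, so "abc" is not a match:
;; (my-find-word-backward "a.c" "[^a-z.]" "abc xyz ") ; => nil
```

As in the patch, a hit from such a search is only a candidate: flyspell-get-word still has to confirm it, because OTHERCHARS is not taken into account by the regexp.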

This bug report was last modified 10 years and 136 days ago.

