GNU bug report logs -
#16800
24.3; flyspell works slow on very short words at the end of big file
Reported by: Aleksey Cherepanov <aleksey.4erepanov <at> gmail.com>
Date: Tue, 18 Feb 2014 20:59:02 UTC
Severity: normal
Found in version 24.3
Fixed in version 24.5
Done: Agustin Martin <agustin6martin <at> gmail.com>
Bug is archived. No further changes may be made.
Message #47 received at 16800 <at> debbugs.gnu.org (full text, mbox):
On Sun, Feb 23, 2014 at 02:26:00AM +0100, Agustin Martin wrote:
> 2014-02-22 22:03 GMT+01:00 Eli Zaretskii <eliz <at> gnu.org>:
>
> > > Date: Sat, 22 Feb 2014 22:55:11 +0400
> > > From: Aleksey Cherepanov <aleksey.4erepanov <at> gmail.com>
> > >
> > > > > Emacs words are language sensitive too.
> > > >
> > > > But not in the same way as ispell/flyspell is. The CASECHARS,
> > > > NON-CASECHARS, and OTHERCHARS parameters of the dictionary are only
> > > > taken into account by ispell/flyspell.
> > >
> > > I think one could define a dictionary like: ("my" "[a]" "[^a]" "" ...)
> > > So the only letter for flyspell words is "a". That way "qqaaqqaaqq" is
> > > one word for emacs and two words with garbage around for flyspell. I
> > > think my solution fails in such case.
> >
> > It's more complex than that: with some languages, and at least with
> > aspell, we take these parameters from the dictionary. So they cannot
> > be known in advance in some cases.
> >
>
> Hi,
>
> Not yet sure if I am missing something important, but I am playing with a
> regexp search in flyspell-word-search-* functions based on what flyspell
> thinks is the word to spellcheck (`word') and what it thinks should not be
> part of a word (`NOTCASECHARS'). Since no OTHERCHARS is used there may be
> some intermediate matches being false positives that will be discarded once
> flyspell-word checks them.
>
> I have tested this on Aleksey's file and it is apparently working well, and in
> this particular case with much better efficiency. Need to think about more
> ad-hoc situations where it may fail or slow down things. Suggestions for
> possible failures are welcome.
>
> Patch is attached. I did the tests against an old and patched version of
> flyspell.el (that shipped with Debian stable) and built the patch for it.
> Should apply and work similarly in trunk's flyspell.el.
>
> --- flyspell.el.orig 2014-02-23 02:17:03.680107519 +0100
> +++ flyspell.el 2014-02-23 02:50:50.634625248 +0100
> @@ -1050,8 +1050,19 @@
> (save-excursion
> (let ((r '())
> (inhibit-point-motion-hooks t)
> + (flyspell-not-casechars (flyspell-get-not-casechars))
I'd move the concat here too, so it is out of the inner loop.
> p)
> - (while (and (not r) (setq p (search-backward word bound t)))
> + (while
> + (and (not r)
> + (setq p
> + (re-search-backward
> + (concat
> + "\\(" flyspell-not-casechars "\\|\\b\\)"
I think \b here could be replaced with \` (beginning of buffer). I
think it is the only boundary we need that is not already covered by
the not-casechars/word sequence. Similarly, \' (end of buffer) could be
used for the forward search.
Also, a non-capturing group ("\\(?:") could be used, because we do not
need the match data of the first group. It should be faster, but I
don't really know.
Maybe it would be faster still not to capture the word but to capture a
single char or nothing, but I doubt the difference would be noticeable.
> + "\\(" word "\\)"
I think regexp-quote around the word is necessary here.
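To illustrate the quoting concern, here is a minimal sketch. It assumes a word that contains a regexp-special character, which can happen when OTHERCHARS includes something like "."; the helper name `demo-word-matches-p' and the sample strings are hypothetical and not part of flyspell.

```elisp
;; Hypothetical helper, for illustration only (not flyspell code).
(defun demo-word-matches-p (word text &optional quote)
  "Return non-nil if WORD, spliced into a regexp, matches all of TEXT.
With QUOTE non-nil, pass WORD through `regexp-quote' first."
  (string-match-p
   ;; \\` and \\' anchor the match to the whole of TEXT.
   (concat "\\`" (if quote (regexp-quote word) word) "\\'")
   text))

;; Unquoted, the "." in "e.g" matches any character, so "eXg" is
;; accepted as if it were the same word:
(demo-word-matches-p "e.g" "eXg")      ; non-nil (false match)
;; Quoted, only the literal text matches:
(demo-word-matches-p "e.g" "eXg" t)    ; nil
(demo-word-matches-p "e.g" "e.g" t)    ; non-nil
```

A false match like this would probably still be discarded by the later flyspell-get-word comparison, but other special characters (an unbalanced "[" for instance) could make the regexp invalid.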
> + flyspell-not-casechars
> + )
> + bound t)))
> + (goto-char (match-beginning 2))
s/2/1/ if the first group is not capturing.
> (let ((lw (flyspell-get-word)))
> (if (and (consp lw)
> (if ignore-case
> @@ -1068,8 +1079,19 @@
> (save-excursion
> (let ((r '())
> (inhibit-point-motion-hooks t)
> + (flyspell-not-casechars (flyspell-get-not-casechars))
concat here as above.
> p)
> - (while (and (not r) (setq p (search-forward word bound t)))
> + (while
> + (and (not r)
> + (setq p
> + (re-search-forward
> + (concat
> + flyspell-not-casechars
> + "\\(" word "\\)"
regexp-quote as above.
> + "\\(" flyspell-not-casechars "\\|\\b\\)"
I think \b could be replaced by \' here, as described above.
The second group could be non-capturing here.
> + )
> + bound t)))
> + (goto-char (match-beginning 1))
I guess match-end should be used here.
> (let ((lw (flyspell-get-word)))
> (if (and (consp lw) (string-equal (car lw) word))
> (setq r p)
I guess that \b would be faster than the group, so we could wrap the
whole loop in an 'if': one implementation using \b for the case where
casechars is "[[:alpha:]]" and not-casechars is "[^[:alpha:]]", and
another implementation as above for all other cases.
But that seems cumbersome.
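Putting the suggestions for the backward search together, a rough standalone sketch could look like the following. It is not the patch itself: the function name `demo-word-search-backward' and its arguments are hypothetical, and NOT-CASECHARS is passed in by hand instead of coming from the dictionary. The sketch also allows \' after the word, so that an occurrence ending exactly at the end of the buffer is found.

```elisp
;; Hypothetical sketch, for illustration only (not flyspell code).
(defun demo-word-search-backward (word not-casechars &optional bound)
  "Search backward from point for WORD as a whole word.
NOT-CASECHARS is a regexp matching one non-word character, for
example \"[^[:alpha:]]\".  Return the position where WORD starts,
or nil if it is not found.  Point is not moved."
  ;; The regexp is built once, outside of any search loop.
  (let ((re (concat "\\(?:" not-casechars "\\|\\`\\)"   ; non-capturing
                    "\\(" (regexp-quote word) "\\)"     ; group 1: the word
                    "\\(?:" not-casechars "\\|\\'\\)")))
    (save-excursion
      ;; Case folding follows `case-fold-search', as with any search.
      (when (re-search-backward re bound t)
        ;; Group 1 is the word because the first group does not capture.
        (match-beginning 1)))))

;; Usage: "foobar" is skipped, because "foo" is followed by a word char.
(with-temp-buffer
  (insert "foo foobar foo")
  (goto-char (point-max))
  (let ((last (demo-word-search-backward "foo" "[^[:alpha:]]")))
    (goto-char last)
    (list last (demo-word-search-backward "foo" "[^[:alpha:]]"))))
```

In the real functions the result would still have to be confirmed with flyspell-get-word, since OTHERCHARS is not considered here.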
Thanks!
--
Regards,
Aleksey Cherepanov
This bug report was last modified 10 years and 136 days ago.