From unknown Sat Jun 21 05:11:34 2025 X-Loop: help-debbugs@gnu.org Subject: bug#7343: Making flyspell incredibly fast when checking whole files Resent-From: Brandon Craig Rhodes Original-Sender: debbugs-submit-bounces@debbugs.gnu.org Resent-To: owner@debbugs.gnu.org Resent-CC: bug-gnu-emacs@gnu.org Resent-Date: Sat, 06 Nov 2010 15:18:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: report 7343 X-GNU-PR-Package: emacs X-GNU-PR-Keywords: To: 7343@debbugs.gnu.org X-Debbugs-Original-To: bug-gnu-emacs@gnu.org Received: via spool by submit@debbugs.gnu.org id=B.12890566271556 (code B ref -1); Sat, 06 Nov 2010 15:18:02 +0000 Received: (at submit) by debbugs.gnu.org; 6 Nov 2010 15:17:07 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PEkVu-0000P2-02 for submit@debbugs.gnu.org; Sat, 06 Nov 2010 11:17:06 -0400 Received: from eggs.gnu.org ([140.186.70.92]) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PEjM0-0008Ke-7p for submit@debbugs.gnu.org; Sat, 06 Nov 2010 10:02:49 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1PEjQO-000585-8U for submit@debbugs.gnu.org; Sat, 06 Nov 2010 10:07:21 -0400 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,T_RP_MATCHES_RCVD, T_TVD_MIME_NO_HEADERS autolearn=unavailable version=3.3.1 Received: from lists.gnu.org ([199.232.76.165]:46088) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1PEjQ1-00054f-3Y for submit@debbugs.gnu.org; Sat, 06 Nov 2010 10:07:20 -0400 Received: from [140.186.70.92] (port=49910 helo=eggs.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1PEjNj-00020z-Ox for bug-gnu-emacs@gnu.org; Sat, 06 Nov 2010 10:05:15 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1PEjN4-0004fs-BC for 
bug-gnu-emacs@gnu.org; Sat, 06 Nov 2010 10:03:56 -0400 Received: from asaph.rhodesmill.org ([74.207.234.78]:35858) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1PEjN4-0004fX-7i for bug-gnu-emacs@gnu.org; Sat, 06 Nov 2010 10:03:54 -0400 Received: by asaph.rhodesmill.org (Postfix, from userid 1000) id 957BE2E103; Sat, 6 Nov 2010 10:03:52 -0400 (EDT) From: Brandon Craig Rhodes Date: Sat, 06 Nov 2010 10:03:52 -0400 Message-ID: <874obu5z1z.fsf@asaph.rhodesmill.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.1.50 (gnu/linux) MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 1) X-detected-operating-system: by eggs.gnu.org: Error: This connection is not (no longer?) in the cache. X-Spam-Score: -5.9 (-----) X-Mailman-Approved-At: Sat, 06 Nov 2010 11:17:04 -0400 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -5.9 (-----) --=-=-= Spell-checking programs like "aspell" and "hunspell" are blazingly fast when simply asked to check words with their "-l" option, but become much slower (the difference is often one or more orders of magnitude, depending on the dictionary size) when asked about words interactively because in that case they generate helpful near-misses. (Actually, "aspell" can turn this off even in interactive mode, which might become the basis of a further patch; but right now I will confine myself to submitting this one, since it has gotten my Emacs running fast enough again that I am happy.) Anyway, flyspell does try to take advantage of the above behavior by checking whether a region is larger than flyspell-large-region characters, and if so then it runs the spell checker as a separate process with "-l". 
But then it does something that, in many cases, is rather ruinous: it takes every misspelling so identified, and passes it *back* through the normal interactive spell-checking logic! This is because all of the real logic of what to do with a misspelling - how to highlight it, how to search for nearby instances of the same word, how to cache spellings, and so forth - is bound up in flyspell-word, so the flyspell-external-point-words function, which processes the actual misspellings discovered by flyspell-large-region, really has no other choice but to call flyspell-word for each misspelling.

So to let flyspell-large-region enjoy the speed that it really should, we need to tell it never to re-check its words against the live *spell process attached to Emacs, because that is (a) redundant and (b) very expensive since, this second time, the spell checker will pause to generate near-misses.

A patch is attached that fixes this problem, and - here on my laptop, at least - makes flyspell blazing fast at even large files. The mechanism is simple: I have added a second optional argument to flyspell-word, named "known-misspelling", that tells flyspell-word that the word has already been checked and is a misspelling and does not need to be checked again. Then, down in the function, I simply placed the entire interactive session with ispell/aspell/hunspell inside of an "if".

I apologize in advance that this diff is constructed against the Ubuntu version of flyspell, which has who-knows-how-many differences with the official Emacs one. If the patch is too difficult to apply, let me know, and I will find the time to check out Emacs from trunk myself and reapply the patch there. Thanks!
--=-=-=
Content-Type: text/x-diff
Content-Disposition: inline; filename=flyspell-large-fast.patch

diff -r 0de00de3360c -r 06af33083844 site-lisp/flyspell.el
--- a/site-lisp/flyspell.el Sat Nov 06 08:10:07 2010 -0400
+++ b/site-lisp/flyspell.el Sat Nov 06 08:23:58 2010 -0400
@@ -1009,7 +1009,7 @@
 ;;*---------------------------------------------------------------------*/
 ;;* flyspell-word ... */
 ;;*---------------------------------------------------------------------*/
-(defun flyspell-word (&optional following)
+(defun flyspell-word (&optional following known-misspelling)
   "Spell check a word."
   (interactive (list ispell-following-word))
   (ispell-set-spellchecker-params) ; Initialize variables and dicts alists
@@ -1071,29 +1071,35 @@
                 (setq flyspell-word-cache-end end)
                 (setq flyspell-word-cache-word word)
                 ;; now check spelling of word.
-                (ispell-send-string "%\n")
-                ;; put in verbose mode
-                (ispell-send-string (concat "^" word "\n"))
-                ;; we mark the ispell process so it can be killed
-                ;; when emacs is exited without query
-                (set-process-query-on-exit-flag ispell-process nil)
-                ;; Wait until ispell has processed word. Since this code is often
-                ;; executed from post-command-hook but the ispell process may not
-                ;; be responsive, it's important to make sure we re-enable C-g.
-                (with-local-quit
-                  (while (progn
-                           (accept-process-output ispell-process)
-                           (not (string= "" (car ispell-filter))))))
-                ;; (ispell-send-string "!\n")
-                ;; back to terse mode.
-                ;; Remove leading empty element
-                (setq ispell-filter (cdr ispell-filter))
-                ;; ispell process should return something after word is sent.
-                ;; Tag word as valid (i.e., skip) otherwise
-                (or ispell-filter
-                    (setq ispell-filter '(*)))
-                (if (consp ispell-filter)
-                    (setq poss (ispell-parse-output (car ispell-filter))))
+                (if (not known-misspelling)
+                    (progn
+                      (ispell-send-string "%\n")
+                      ;; put in verbose mode
+                      (ispell-send-string (concat "^" word "\n"))
+                      ;; we mark the ispell process so it can be killed
+                      ;; when emacs is exited without query
+                      (set-process-query-on-exit-flag ispell-process nil)
+                      ;; Wait until ispell has processed word. Since this
+                      ;; code is often executed from post-command-hook but
+                      ;; the ispell process may not be responsive, it's
+                      ;; important to make sure we re-enable C-g.
+                      (with-local-quit
+                        (while (progn
+                                 (accept-process-output ispell-process)
+                                 (not (string= "" (car ispell-filter))))))
+                      ;; (ispell-send-string "!\n")
+                      ;; back to terse mode.
+                      ;; Remove leading empty element
+                      (setq ispell-filter (cdr ispell-filter))
+                      ;; ispell process should return something after word is sent.
+                      ;; Tag word as valid (i.e., skip) otherwise
+                      (or ispell-filter
+                          (setq ispell-filter '(*)))
+                      (if (consp ispell-filter)
+                          (setq poss (ispell-parse-output (car ispell-filter)))))
+                  ;; Else, this was a known misspelling to begin with, and
+                  ;; we should forge an ispell return value.
+                  (setq poss (list word 0 '() '())))
                 (let ((res (cond ((eq poss t)
                                   ;; correct
                                   (setq flyspell-word-cache-result t)
@@ -1424,7 +1430,7 @@
                               t
                             nil))))
                     (setq keep nil)
-                    (flyspell-word)
+                    (flyspell-word nil t)
                     ;; Search for next misspelled word will begin from
                     ;; end of last validated match.
                     (setq buffer-scan-pos (point))))

--=-=-=

-- 
Brandon Craig Rhodes brandon@rhodesmill.org http://rhodesmill.org/brandon

--=-=-=--

From unknown Sat Jun 21 05:11:34 2025 X-Loop: help-debbugs@gnu.org Subject: bug#7343: Making flyspell incredibly fast when checking whole files Resent-From: Stefan Monnier Original-Sender: debbugs-submit-bounces@debbugs.gnu.org Resent-To: owner@debbugs.gnu.org Resent-CC: bug-gnu-emacs@gnu.org Resent-Date: Mon, 08 Nov 2010 18:07:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 7343 X-GNU-PR-Package: emacs X-GNU-PR-Keywords: To: Brandon Craig Rhodes Cc: Agustin Martin , 7343@debbugs.gnu.org Received: via spool by 7343-submit@debbugs.gnu.org id=B7343.128923957027243 (code B ref 7343); Mon, 08 Nov 2010 18:07:02 +0000 Received: (at 7343) by debbugs.gnu.org; 8 Nov 2010 18:06:10 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PFW6b-00075M-MK for submit@debbugs.gnu.org; Mon, 08 Nov 2010 13:06:09 -0500 Received: from ironport2-out.teksavvy.com ([206.248.154.183] helo=ironport2-out.pppoe.ca) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PFW6Z-00075H-G7 for 7343@debbugs.gnu.org; Mon, 08 Nov 2010 13:06:08 -0500 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AtEKAJDL10xMCpqE/2dsb2JhbAChCIECcr1GhUgEhFiNWg X-IronPort-AV: E=Sophos;i="4.59,169,1288584000"; d="scan'208";a="81943336" Received: from 76-10-154-132.dsl.teksavvy.com (HELO pastel.home) ([76.10.154.132]) by ironport2-out.pppoe.ca with ESMTP/TLS/ADH-AES256-SHA; 08 Nov 2010 13:10:42 -0500 Received: by pastel.home (Postfix, from userid 20848) id 7F140A86D8; Mon, 8 Nov 2010 13:10:42 -0500 (EST) From: Stefan Monnier Message-ID: References: <874obu5z1z.fsf@asaph.rhodesmill.org> Date: Mon, 08 Nov 2010 13:10:42 -0500 In-Reply-To: <874obu5z1z.fsf@asaph.rhodesmill.org> (Brandon Craig Rhodes's message of "Sat, 06 Nov 2010 10:03:52 -0400") User-Agent:
Gnus/5.13 (Gnus v5.13) Emacs/24.0.50 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-Spam-Score: -2.1 (--) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -2.1 (--)

> Anyway, flyspell does try to take advantage of the above behavior by
> checking whether a region is larger than flyspell-large-region
> characters, and if so then it runs the spell checker as a separate
> process with "-l". But then it does something that, in many cases, is
> rather ruinous: it takes every misspelling so identified, and passes it
> *back* through the normal interactive spell-checking logic! This is
> because all of the real logic of what to do with a misspelling - how to
> highlight it, how to search for nearby instances of the same word, how
> to cache spellings, and so forth - is bound up in flyspell-word, so the
> flyspell-external-point-words function, which processes the actual
> misspellings discovered by flyspell-large-region, really has no other
> choice but to call flyspell-word for each misspelling.

IIUC this sounds very good (tho it only speeds up flyspell-region and
not flyspell-post-command-hook) and the patch looks good and small
enough for inclusion as a "tiny patch". Agustin, could you double check
that it's OK and install it in the trunk if so?

Stefan From unknown Sat Jun 21 05:11:34 2025 X-Loop: help-debbugs@gnu.org Subject: bug#7343: Making flyspell incredibly fast when checking whole files Resent-From: Agustin Martin Original-Sender: debbugs-submit-bounces@debbugs.gnu.org Resent-To: owner@debbugs.gnu.org Resent-CC: bug-gnu-emacs@gnu.org Resent-Date: Mon, 08 Nov 2010 18:42:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 7343 X-GNU-PR-Package: emacs X-GNU-PR-Keywords: To: 7343@debbugs.gnu.org, Brandon Craig Rhodes Received: via spool by 7343-submit@debbugs.gnu.org id=B7343.128924172228176 (code B ref 7343); Mon, 08 Nov 2010 18:42:02 +0000 Received: (at 7343) by debbugs.gnu.org; 8 Nov 2010 18:42:02 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PFWfJ-0007KP-KY for submit@debbugs.gnu.org; Mon, 08 Nov 2010 13:42:01 -0500 Received: from fibonacci.ccupm.upm.es ([138.100.198.70] helo=smtp.upm.es) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PFWfH-0007KE-I6 for 7343@debbugs.gnu.org; Mon, 08 Nov 2010 13:42:00 -0500 Received: from agmartin.aq.upm.es (Agmartin.aq.upm.es [138.100.41.131]) by smtp.upm.es (8.14.3/8.14.3/fibonacci-001) with ESMTP id oA8IkbrV011383; Mon, 8 Nov 2010 19:46:37 +0100 Received: by agmartin.aq.upm.es (Postfix, from userid 1000) id 1ED9E598BC; Mon, 8 Nov 2010 19:46:37 +0100 (CET) Date: Mon, 8 Nov 2010 19:46:37 +0100 From: Agustin Martin Message-ID: <20101108184636.GA9940@agmartin.aq.upm.es> References: <874obu5z1z.fsf@asaph.rhodesmill.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.20 (2009-06-14) X-Spam-Score: -6.4 (------) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: 
debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -6.4 (------)

On Mon, Nov 08, 2010 at 01:10:42PM -0500, Stefan Monnier wrote:
> > Anyway, flyspell does try to take advantage of the above behavior by
> > checking whether a region is larger than flyspell-large-region
> > characters, and if so then it runs the spell checker as a separate
> > process with "-l". But then it does something that, in many cases, is
> > rather ruinous: it takes every misspelling so identified, and passes it
> > *back* through the normal interactive spell-checking logic! This is
> > because all of the real logic of what to do with a misspelling - how to
> > highlight it, how to search for nearby instances of the same word, how
> > to cache spellings, and so forth - is bound up in flyspell-word, so the
> > flyspell-external-point-words function, which processes the actual
> > misspellings discovered by flyspell-large-region, really has no other
> > choice but to call flyspell-word for each misspelling.
>
> IIUC this sounds very good (tho it only speeds up flyspell-region and
> not flyspell-post-command-hook) and the patch looks good and small
> enough for inclusion as a "tiny patch". Agustin, could you double check
> that it's OK and install it in the trunk if so?

Hi,

I also agree that Brandon's patch sounds very good, although I could not yet really test it. I hope to have time for this in no more than two or three days. I will also add something in the docstring about the new option.

-- Agustin From unknown Sat Jun 21 05:11:34 2025 MIME-Version: 1.0 X-Mailer: MIME-tools 5.427 (Entity 5.427) X-Loop: help-debbugs@gnu.org From: help-debbugs@gnu.org (GNU bug Tracking System) To: Brandon Craig Rhodes Subject: bug#7343: closed (Re: bug#7343: Making flyspell incredibly fast when checking whole files) Message-ID: References: <20101110144414.GA4565@agmartin.aq.upm.es> <874obu5z1z.fsf@asaph.rhodesmill.org> X-Gnu-PR-Message: they-closed 7343 X-Gnu-PR-Package: emacs Reply-To: 7343@debbugs.gnu.org Date: Wed, 10 Nov 2010 14:40:05 +0000 Content-Type: multipart/mixed; boundary="----------=_1289400005-3376-1" This is a multi-part message in MIME format... ------------=_1289400005-3376-1 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Your bug report #7343: Making flyspell incredibly fast when checking whole files which was filed against the emacs package, has been closed. The explanation is attached below, along with your original report. If you require more details, please reply to 7343@debbugs.gnu.org. 
--=20 7343: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=3D7343 GNU Bug Tracking System Contact help-debbugs@gnu.org with problems ------------=_1289400005-3376-1 Content-Type: message/rfc822 Content-Disposition: inline Content-Transfer-Encoding: 7bit Received: (at 7343-done) by debbugs.gnu.org; 10 Nov 2010 14:39:39 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PGBpq-0000s2-ME for submit@debbugs.gnu.org; Wed, 10 Nov 2010 09:39:39 -0500 Received: from edison.ccupm.upm.es ([138.100.198.71] helo=smtp.upm.es) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1PGBpk-0000rw-Hn for 7343-done@debbugs.gnu.org; Wed, 10 Nov 2010 09:39:36 -0500 Received: from agmartin.aq.upm.es (Agmartin.aq.upm.es [138.100.41.131]) by smtp.upm.es (8.14.3/8.14.3/edison-001) with ESMTP id oAAEiEa2005718; Wed, 10 Nov 2010 15:44:14 +0100 Received: by agmartin.aq.upm.es (Postfix, from userid 1000) id 3BCE1598B2; Wed, 10 Nov 2010 15:44:14 +0100 (CET) Date: Wed, 10 Nov 2010 15:44:14 +0100 From: Agustin Martin To: Brandon Craig Rhodes , 7343-done@debbugs.gnu.org Subject: Re: bug#7343: Making flyspell incredibly fast when checking whole files Message-ID: <20101110144414.GA4565@agmartin.aq.upm.es> References: <874obu5z1z.fsf@asaph.rhodesmill.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <874obu5z1z.fsf@asaph.rhodesmill.org> User-Agent: Mutt/1.5.20 (2009-06-14) X-Spam-Score: -6.4 (------) X-Debbugs-Envelope-To: 7343-done X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -6.4 (------) On Sat, Nov 06, 2010 at 10:03:52AM -0400, Brandon Craig Rhodes wrote: > A patch is attached that fixes this problem, and - here on my laptop, 
at
> least - makes flyspell blazing fast at even large files. The mechanism
> is simple: I have added a second optional argument to flyspell-word,
> named "known-misspelling", that tells flyspell-word that the word has
> already been checked and is a misspelling and does not need to be
> checked again. Then, down in the function, I simply placed the entire
> interactive session with ispell/aspell/hunspell inside of an "if".

Installed in the Emacs bzr repo. Closing bug report.

Thanks a lot for your patch,

-- 
Agustin

------------=_1289400005-3376-1--
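
The mechanism described in the report (an optional known-misspelling argument that bypasses the interactive spell process and forges a result) can be sketched in isolation as below. This is a minimal illustration under stated assumptions, not the installed patch: every demo-* name is hypothetical, and the stub checker with its three-word list merely stands in for the live ispell/aspell/hunspell process. Only the shape of the forged value, (list word 0 '() '()), is taken from the patch itself.

```elisp
;;; Sketch only: the demo-* names are hypothetical and are not part of
;;; flyspell.el or ispell.el.

(defvar demo-interactive-calls 0
  "How many times the stand-in interactive checker has been consulted.")

(defun demo-interactive-check (word)
  "Stand-in for the slow round trip to the live spell process.
Return t if WORD is in a tiny built-in word list; otherwise return
a list (WORD OFFSET MISS-LIST GUESS-LIST), the same shape as the
value the patch forges."
  (setq demo-interactive-calls (1+ demo-interactive-calls))
  (if (member word '("the" "quick" "fox"))
      t
    (list word 0 '("placeholder-near-miss") '())))

(defun demo-check-word (word &optional known-misspelling)
  "Return a check result for WORD.
When KNOWN-MISSPELLING is non-nil, skip the interactive checker and
forge a misspelling result with no suggestions, as the patch does."
  (if (not known-misspelling)
      (demo-interactive-check word)
    ;; Already reported as misspelled by the batch run: no second query.
    (list word 0 '() '())))

(defun demo-check-batch (misspellings)
  "Process MISSPELLINGS already reported by a list-mode (\"-l\") run.
Every word is passed with KNOWN-MISSPELLING set, so the interactive
checker is never consulted."
  (mapcar (lambda (word) (demo-check-word word t)) misspellings))
```

Calling (demo-check-batch '("teh" "recieve")) leaves demo-interactive-calls unchanged, which is the point of the patch: words flagged by the batch run cost no further round trips. The trade-off, visible in the forged value, is that such words carry no suggestion lists until they are checked interactively.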