From unknown Sun Jun 22 07:29:24 2025 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-Mailer: MIME-tools 5.509 (Entity 5.509) Content-Type: text/plain; charset=utf-8 From: bug#22090 <22090@debbugs.gnu.org> To: bug#22090 <22090@debbugs.gnu.org> Subject: Status: Isearch is sluggish and eventually refuses further service with "[Too many words]". Reply-To: bug#22090 <22090@debbugs.gnu.org> Date: Sun, 22 Jun 2025 14:29:24 +0000 retitle 22090 Isearch is sluggish and eventually refuses further service with "[Too many words]". reassign 22090 emacs submitter 22090 Alan Mackenzie severity 22090 normal thanks From debbugs-submit-bounces@debbugs.gnu.org Thu Dec 03 23:25:51 2015 Received: (at submit) by debbugs.gnu.org; 4 Dec 2015 04:25:51 +0000 Received: from localhost ([127.0.0.1]:38021 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a4hwN-0006xX-9k for submit@debbugs.gnu.org; Thu, 03 Dec 2015 23:25:51 -0500 Received: from eggs.gnu.org ([208.118.235.92]:45777) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a4hw3-0006x5-FV for submit@debbugs.gnu.org; Thu, 03 Dec 2015 23:25:50 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1a4hw2-0003Qa-CB for submit@debbugs.gnu.org; Thu, 03 Dec 2015 23:25:31 -0500 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=0.8 required=5.0 tests=BAYES_50 autolearn=disabled version=3.3.2 Received: from lists.gnu.org ([2001:4830:134:3::11]:50022) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1a4hw2-0003QW-8o for submit@debbugs.gnu.org; Thu, 03 Dec 2015 23:25:30 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:55775) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1a4hw1-0004Mg-5K for bug-gnu-emacs@gnu.org; Thu, 03 Dec 2015 23:25:30 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) 
(envelope-from ) id 1a4hvx-0003N4-Sx for bug-gnu-emacs@gnu.org; Thu, 03 Dec 2015 23:25:29 -0500 Received: from mail.muc.de ([193.149.48.3]:33394) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1a4hvx-0003Ma-Kt for bug-gnu-emacs@gnu.org; Thu, 03 Dec 2015 23:25:25 -0500 Received: (qmail 33552 invoked by uid 3782); 4 Dec 2015 04:18:43 -0000 Received: from acm.muc.de (p579E9292.dip0.t-ipconnect.de [87.158.146.146]) by colin.muc.de (tmda-ofmipd) with ESMTP; Fri, 04 Dec 2015 05:18:42 +0100 Received: (qmail 2126 invoked by uid 1000); 4 Dec 2015 04:20:52 -0000 Date: Fri, 4 Dec 2015 04:20:52 +0000 To: bug-gnu-emacs@gnu.org Subject: Isearch is sluggish and eventually refuses further service with "[Too many words]". Message-ID: <20151204042052.GA1965@acm.fritz.box> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.23 (2014-03-12) X-Delivery-Agent: TMDA/1.1.12 (Macallan) From: Alan Mackenzie X-Primary-Address: acm@muc.de X-detected-operating-system: by eggs.gnu.org: FreeBSD 9.x X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6.x X-Received-From: 2001:4830:134:3::11 X-Spam-Score: -4.3 (----) X-Debbugs-Envelope-To: submit X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -4.3 (----) Hello, Emacs With a recent emacs-25 (last update eaa1fd6dbff8346eb38485de5ebf0fbfacf374d9 from Thursday 2015-12-03): emacs -Q C-c C-f src/xdisp.c Move point to L30 (paragraph beginning "Updating the display is triggered by the Lisp interpreter ...") C-s C-w repeatedly, to yank words onto the search string. After ~29 words have been yanked, the response becomes sluggish, pausing for between 0.5s and 1s before highlighting the "for" at the end of L31. 
Carrying on with C-w, some words are taking 2 or 3 seconds to be registered by Isearch. This is Bad. After having yanked "you as part of" from L32, (i) the " of" gets highlighted in the isearch-error face in the echo area; (ii) the text "[Too many words]" is appended to the echo area; (iii) the highlighting is removed from the match; (iv) point is placed at the start of the match (i.e. BOL 30). At this point, will still behave as expected, except its action too is very sluggish - to remove two words from the current search took several seconds. Observation: it may be that C-w done in the vicinity of two or several spaces experiences extra delay. -- Alan Mackenzie (Nuremberg, Germany). From debbugs-submit-bounces@debbugs.gnu.org Fri Dec 04 04:24:00 2015 Received: (at 22090) by debbugs.gnu.org; 4 Dec 2015 09:24:01 +0000 Received: from localhost ([127.0.0.1]:38213 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a4mau-0005mr-Ah for submit@debbugs.gnu.org; Fri, 04 Dec 2015 04:24:00 -0500 Received: from mtaout21.012.net.il ([80.179.55.169]:64548) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a4mas-0005mh-HR for 22090@debbugs.gnu.org; Fri, 04 Dec 2015 04:23:59 -0500 Received: from conversion-daemon.a-mtaout21.012.net.il by a-mtaout21.012.net.il (HyperSendmail v2007.08) id <0NYT00J00TQQBX00@a-mtaout21.012.net.il> for 22090@debbugs.gnu.org; Fri, 04 Dec 2015 11:23:57 +0200 (IST) Received: from HOME-C4E4A596F7 ([84.94.185.246]) by a-mtaout21.012.net.il (HyperSendmail v2007.08) with ESMTPA id <0NYT00J4ZU3WAV40@a-mtaout21.012.net.il>; Fri, 04 Dec 2015 11:23:57 +0200 (IST) Date: Fri, 04 Dec 2015 11:23:43 +0200 From: Eli Zaretskii Subject: Re: bug#22090: Isearch is sluggish and eventually refuses further service with "[Too many words]". 
In-reply-to: <20151204042052.GA1965@acm.fritz.box> X-012-Sender: halo1@inter.net.il To: Alan Mackenzie Message-id: <834mfyin34.fsf@gnu.org> References: <20151204042052.GA1965@acm.fritz.box> X-Spam-Score: 0.9 (/) X-Debbugs-Envelope-To: 22090 Cc: 22090@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list Reply-To: Eli Zaretskii List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 0.9 (/) > Date: Fri, 4 Dec 2015 04:20:52 +0000 > From: Alan Mackenzie > > With a recent emacs-25 (last update > eaa1fd6dbff8346eb38485de5ebf0fbfacf374d9 from Thursday 2015-12-03): > > emacs -Q > C-c C-f src/xdisp.c > Move point to L30 (paragraph beginning "Updating the display is triggered > by the Lisp interpreter ...") > > C-s > C-w repeatedly, to yank words onto the search string. > > After ~29 words have been yanked, the response becomes sluggish, pausing > for between 0.5s and 1s before highlighting the "for" at the end of L31. > > Carrying on with C-w, some words are taking 2 or 3 seconds to be > registered by Isearch. This is Bad. 
Here's a profile for this part: - command-execute 1762 99% - call-interactively 1762 99% - funcall-interactively 1762 99% - isearch-yank-word-or-char 1760 99% - isearch-yank-internal 1760 99% - isearch-yank-string 1760 99% - isearch-process-search-string 1760 99% - isearch-search-and-update 1760 99% - isearch-update 1760 99% - if 1760 99% - progn 1760 99% - while 1757 99% - let 1757 99% - isearch-lazy-highlight-search 1757 99% - condition-case 1757 99% - let 1757 99% - while 1757 99% - setq 1757 99% - isearch-search-string 1757 99% - let* 1757 99% - save-excursion 1757 99% - funcall 1757 99% - # 1757 99% - let 1757 99% - condition-case 1757 99% - funcall 1757 99% - cond 7 0% - let 7 0% - if 7 0% - funcall 7 0% character-fold-to-regexp 7 0% - isearch-lazy-highlight-new-loop 2 0% - if 2 0% - and 2 0% - sit-for 2 0% redisplay 2 0% - if 1 0% - if 1 0% - isearch-message 1 0% - let 1 0% - if 1 0% let 1 0% - isearch-forward 1 0% - isearch-mode 1 0% - isearch-update 1 0% - if 1 0% - progn 1 0% - isearch-lazy-highlight-new-loop 1 0% - if 1 0% - and 1 0% - sit-for 1 0% - redisplay 1 0% - redisplay_internal (C function) 1 0% - find-image 1 0% image-search-load-path 1 0% - execute-extended-command 1 0% - command-execute 1 0% - call-interactively 1 0% - funcall-interactively 1 0% - profiler-report 1 0% - profiler-report-cpu 1 0% profiler-cpu-profile 1 0% - ... 
5 0% Automatic GC 5 0% From debbugs-submit-bounces@debbugs.gnu.org Fri Dec 04 10:16:27 2015 Received: (at 22090) by debbugs.gnu.org; 4 Dec 2015 15:16:27 +0000 Received: from localhost ([127.0.0.1]:39127 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a4s5z-0007fR-97 for submit@debbugs.gnu.org; Fri, 04 Dec 2015 10:16:27 -0500 Received: from mail-lf0-f44.google.com ([209.85.215.44]:36385) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a4s5w-0007fH-8I for 22090@debbugs.gnu.org; Fri, 04 Dec 2015 10:16:25 -0500 Received: by lfs39 with SMTP id 39so110662990lfs.3 for <22090@debbugs.gnu.org>; Fri, 04 Dec 2015 07:16:23 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:reply-to:sender:in-reply-to:references:date:message-id :subject:from:to:cc:content-type:content-transfer-encoding; bh=PqD45NIkj3PUlfDglK1B9YjW+uQ9FrjfVH4/br0JpKU=; b=P7T6JSPYU914SPH0Jlag/7jNj/TZl6qCulNZiwwDyq1hlNiBG4Gtg3FJ53cyLM1D2l m8pY/znvebMqzDSPOxRcGGRuAjBnVALDtAmo+WckjeWwILpdF8Ut4ZtwBPwhTvFXgZq3 CVZLrwrw6cmhhGW0EOU7KHbJqAaIlR1agN8k4Z70KvQfaSIOxZG1ElehjOSKPuDRYRcQ noWjbWvo05deanuD3c+Z+K5TbcUOL1QG/K1VnD/9ScMaKKm5kFvLLQHie6vC2RN9Rz7a XlrITfet4U7ID/KASNJT4sDRJLRRTiidXBORWRCDdg1mo+ZNXBs16R7wP1304kzFXkfF jHQw== MIME-Version: 1.0 X-Received: by 10.25.137.7 with SMTP id l7mr7014396lfd.63.1449242183327; Fri, 04 Dec 2015 07:16:23 -0800 (PST) Received: by 10.112.202.99 with HTTP; Fri, 4 Dec 2015 07:16:23 -0800 (PST) In-Reply-To: <834mfyin34.fsf@gnu.org> References: <20151204042052.GA1965@acm.fritz.box> <834mfyin34.fsf@gnu.org> Date: Fri, 4 Dec 2015 15:16:23 +0000 X-Google-Sender-Auth: ErEapYeb3hiuKurSmEGa0iw3I68 Message-ID: Subject: Re: bug#22090: Isearch is sluggish and eventually refuses further service with "[Too many words]". 
From: Artur Malabarba To: Eli Zaretskii Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-Spam-Score: -0.7 (/) X-Debbugs-Envelope-To: 22090 Cc: 22090@debbugs.gnu.org, Alan Mackenzie X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list Reply-To: bruce.connor.am@gmail.com List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.7 (/) 2015-12-04 9:23 GMT+00:00 Eli Zaretskii : >> Date: Fri, 4 Dec 2015 04:20:52 +0000 >> From: Alan Mackenzie >> >> With a recent emacs-25 (last update >> eaa1fd6dbff8346eb38485de5ebf0fbfacf374d9 from Thursday 2015-12-03): >> >> emacs -Q >> C-c C-f src/xdisp.c >> Move point to L30 (paragraph beginning "Updating the display is triggered >> by the Lisp interpreter ...") >> >> C-s >> C-w repeatedly, to yank words onto the search string. >> >> After ~29 words have been yanked, the response becomes sluggish, pausing >> for between 0.5s and 1s before highlighting the "for" at the end of L31. Thanks for the report. The source for this (and for a similar bug mentioned on a thread in emacs-devel) was the code I had added for special case-folding support. For now, I've just removed the code. I can think of a way of solving this, but it adds some complexity to isearch, which I don't wanna do (and I don't think this feature was that important anyway). Here's a full copy of the commit message explaining why the bug happens. 30f3432 * lisp/character-fold.el: Remove special case-folding support (character-fold-to-regexp): Remove special code for case-folding. Char-fold search still respects the `case-fold-search' variable (i.e., f matches F). This only removes the code that was added to ensure that f also matched all chars that F matched. For instance, after this commit, f no longer matches 𝔽. 
This was necessary because the logic created a regexp with 2^(length of the string) redundant paths. So, when a very long string "almost" matched, Emacs took a very long time to figure out that it didn't. This became particularly relevant because isearch's lazy-highlight does a search bounded by (1- match-end) (which, in most circumstances, is a search that almost matches). A recipe for this can be found in bug#22090. From debbugs-submit-bounces@debbugs.gnu.org Fri Dec 04 10:24:13 2015 Received: (at 22090) by debbugs.gnu.org; 4 Dec 2015 15:24:13 +0000 Received: from localhost ([127.0.0.1]:39144 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a4sDV-0007rn-20 for submit@debbugs.gnu.org; Fri, 04 Dec 2015 10:24:13 -0500 Received: from mtaout22.012.net.il ([80.179.55.172]:36214) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a4sD9-0007r6-Ua for 22090@debbugs.gnu.org; Fri, 04 Dec 2015 10:24:10 -0500 Received: from conversion-daemon.a-mtaout22.012.net.il by a-mtaout22.012.net.il (HyperSendmail v2007.08) id <0NYU00J00APW6R00@a-mtaout22.012.net.il> for 22090@debbugs.gnu.org; Fri, 04 Dec 2015 17:23:50 +0200 (IST) Received: from HOME-C4E4A596F7 ([84.94.185.246]) by a-mtaout22.012.net.il (HyperSendmail v2007.08) with ESMTPA id <0NYU00IJ2ARQFGG0@a-mtaout22.012.net.il>; Fri, 04 Dec 2015 17:23:50 +0200 (IST) Date: Fri, 04 Dec 2015 17:23:37 +0200 From: Eli Zaretskii Subject: Re: bug#22090: Isearch is sluggish and eventually refuses further service with "[Too many words]". 
In-reply-to: X-012-Sender: halo1@inter.net.il To: bruce.connor.am@gmail.com Message-id: <83h9jygruu.fsf@gnu.org> MIME-version: 1.0 Content-type: text/plain; charset=utf-8 Content-transfer-encoding: 8BIT References: <20151204042052.GA1965@acm.fritz.box> <834mfyin34.fsf@gnu.org> X-Spam-Score: 0.9 (/) X-Debbugs-Envelope-To: 22090 Cc: 22090@debbugs.gnu.org, acm@muc.de X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list Reply-To: Eli Zaretskii List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 0.9 (/) > Date: Fri, 4 Dec 2015 15:16:23 +0000 > From: Artur Malabarba > Cc: Alan Mackenzie , 22090@debbugs.gnu.org > > 30f3432 * lisp/character-fold.el: Remove special case-folding support > > (character-fold-to-regexp): Remove special code for > case-folding. Char-fold search still respects the > `case-fold-search' variable (i.e., f matches F). This only > removes the code that was added to ensure that f also matched > all chars that F matched. For instance, after this commit, f > no longer matches 𝔽. Thanks. Is there any reasonably simple way of describing the resulting limitations (i.e. what will NOT match) on the user manual level? 
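To make the commit message's "2^(length of the string) redundant paths" concrete, here is a toy sketch in Python. It is not Emacs's regexp engine and not the regexp that `character-fold-to-regexp' produced; `count_attempts` and `redundant_pattern` are names invented for this illustration. The sketch assumes a naive backtracking matcher and a pattern in which every position offers two equivalent alternatives, then feeds it a string that "almost" matches, which is the situation lazy-highlight creates when it bounds the search at (1- match-end).

```python
def count_attempts(alternatives, text):
    """Count the branch attempts a naive backtracking matcher makes.

    `alternatives` is a list of lists of literal strings. The pattern
    matches if one literal from each list can be matched in sequence,
    starting at position 0 of `text`.
    """
    attempts = 0

    def match(i, pos):
        nonlocal attempts
        if i == len(alternatives):
            return True
        for alt in alternatives[i]:
            attempts += 1
            if text.startswith(alt, pos) and match(i + 1, pos + len(alt)):
                return True
        return False

    match(0, 0)
    return attempts


def redundant_pattern(n):
    """n positions that each offer two equivalent ways to match "a",
    followed by a "b" that the almost-matching text never contains."""
    return [["a", "a"] for _ in range(n)] + [["b"]]


if __name__ == "__main__":
    for n in (5, 10, 15):
        # The text matches every position except the last one.
        print(n, count_attempts(redundant_pattern(n), "a" * n))
```

In this model the number of attempts on a failing search roughly doubles with every extra position, while a search that succeeds needs only one attempt per position. That is consistent with the symptom reported above: each additional yanked word makes the failing lazy-highlight search disproportionately slower.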
From debbugs-submit-bounces@debbugs.gnu.org Fri Dec 04 10:50:22 2015 Received: (at submit) by debbugs.gnu.org; 4 Dec 2015 15:50:22 +0000 Received: from localhost ([127.0.0.1]:39160 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a4scn-0008Ul-J2 for submit@debbugs.gnu.org; Fri, 04 Dec 2015 10:50:21 -0500 Received: from eggs.gnu.org ([208.118.235.92]:35148) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a4scT-0008Tv-FM for submit@debbugs.gnu.org; Fri, 04 Dec 2015 10:50:20 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1a4scS-0002qd-BI for submit@debbugs.gnu.org; Fri, 04 Dec 2015 10:50:01 -0500 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=0.8 required=5.0 tests=BAYES_50,FREEMAIL_FROM autolearn=disabled version=3.3.2 Received: from lists.gnu.org ([2001:4830:134:3::11]:41100) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1a4scS-0002qZ-8V for submit@debbugs.gnu.org; Fri, 04 Dec 2015 10:50:00 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:45153) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1a4scR-0007Z0-AJ for bug-gnu-emacs@gnu.org; Fri, 04 Dec 2015 10:50:00 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1a4scN-0002py-9T for bug-gnu-emacs@gnu.org; Fri, 04 Dec 2015 10:49:59 -0500 Received: from plane.gmane.org ([80.91.229.3]:37458) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1a4scN-0002pc-2z for bug-gnu-emacs@gnu.org; Fri, 04 Dec 2015 10:49:55 -0500 Received: from list by plane.gmane.org with local (Exim 4.69) (envelope-from ) id 1a4scH-0003ch-TN for bug-gnu-emacs@gnu.org; Fri, 04 Dec 2015 16:49:50 +0100 Received: from c-68-39-146-59.hsd1.in.comcast.net ([68.39.146.59]) by main.gmane.org with esmtp (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Fri, 04 Dec 2015 16:49:49 +0100 Received: from random832 
by c-68-39-146-59.hsd1.in.comcast.net with local (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Fri, 04 Dec 2015 16:49:49 +0100 X-Injected-Via-Gmane: http://gmane.org/ To: bug-gnu-emacs@gnu.org From: Random832 Subject: Re: bug#22090: Isearch is sluggish and eventually refuses further service with "[Too many words]". Date: Fri, 4 Dec 2015 15:49:43 +0000 (UTC) Lines: 13 Message-ID: References: <20151204042052.GA1965@acm.fritz.box> <834mfyin34.fsf@gnu.org> X-Complaints-To: usenet@ger.gmane.org X-Gmane-NNTP-Posting-Host: c-68-39-146-59.hsd1.in.comcast.net User-Agent: slrn/pre1.0.3-7 (Linux) X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6.x X-Received-From: 2001:4830:134:3::11 X-Spam-Score: -4.1 (----) X-Debbugs-Envelope-To: submit X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -4.1 (----) On 2015-12-04, Artur Malabarba wrote: > This was necessary because the logic created a regexp with > 2^(length of the string) redundant paths. So, when a very > long string "almost" matched, Emacs took a very long time to > figure out that it didn't. This became particularly relevant > because isearch's lazy-highlight does a search bounded by (1- > match-end) (which, in most circumstances, is a search that > almost matches). A recipe for this can be found in bug#22090. So has any thought been given to implementing folding searches via matching a simple regexp against a projected version of the buffer rather than the current mechanism of creating a regexp that will always match when it should? 
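The "projected version of the buffer" idea can be sketched as follows, in Python rather than Emacs Lisp. This is only one possible reading of the proposal: it assumes the projection is Unicode NFKD decomposition, removal of combining marks, and case folding, and that a plain substring search is then run on the projected text. The names `project` and `folding_search` are hypothetical; nothing here is taken from Emacs.

```python
import unicodedata


def project(text):
    """Return (folded, index_map).

    `folded` is the projected text; `index_map[k]` is the index in the
    original `text` of the character that produced `folded[k]`.
    """
    folded = []
    index_map = []
    for i, ch in enumerate(text):
        for d in unicodedata.normalize("NFKD", ch):
            if unicodedata.combining(d):
                continue  # drop accents and other combining marks
            for f in d.casefold():
                folded.append(f)
                index_map.append(i)
    return "".join(folded), index_map


def folding_search(needle, haystack):
    """Return (start, end) of the first folded match in the original
    haystack, or None if there is no match."""
    folded_needle, _ = project(needle)
    folded_hay, index_map = project(haystack)
    if not folded_needle:
        return None
    pos = folded_hay.find(folded_needle)
    if pos < 0:
        return None
    start = index_map[pos]
    end = index_map[pos + len(folded_needle) - 1] + 1
    return start, end


if __name__ == "__main__":
    # U+1D53D is the double-struck capital F from the commit message.
    print(folding_search("f", "x\U0001d53dy"))
    print(folding_search("cafe", "Un CAF\u00c9"))
```

Because both sides are projected before a literal search, the search pattern never contains alternations, so the exponential backtracking described in the commit message cannot arise in this model. The cost moves to computing the projection and keeping the index map, which a real implementation would have to maintain incrementally as the buffer changes.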
From debbugs-submit-bounces@debbugs.gnu.org Fri Dec 04 11:07:02 2015 Received: (at 22090) by debbugs.gnu.org; 4 Dec 2015 16:07:02 +0000 Received: from localhost ([127.0.0.1]:39165 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a4ssv-0000RZ-Ec for submit@debbugs.gnu.org; Fri, 04 Dec 2015 11:07:01 -0500 Received: from mail-lf0-f42.google.com ([209.85.215.42]:35888) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a4ssb-0000R4-Ba for 22090@debbugs.gnu.org; Fri, 04 Dec 2015 11:07:00 -0500 Received: by lfs39 with SMTP id 39so111669410lfs.3 for <22090@debbugs.gnu.org>; Fri, 04 Dec 2015 08:06:40 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:reply-to:sender:in-reply-to:references:date:message-id :subject:from:to:cc:content-type:content-transfer-encoding; bh=ZpxGi2ut3VcU3qINIjqJOGRHOzKUtOcniyO0AVepNis=; b=Bq68ods5U3jSBQ4eQcAaMt2yjlBeVg/b5oh4YY5qmSPmsHYw9Mc4RAvp8HeuAu8sqA SwwZhAmbX7ikLN/4GYGT9aXxOIyPjw4qYF93hAEi7BsPS4AsBHkdm6j2zKxWkTWFtasy mUXcKpZOPSaGiEdHU8Z84aFxQoWLUyY8lK+5OzIe6P1DVMlfaZ/HzvzUCaA2F0HAFHCb 7xBRw8OjU34st/mMgZ5blcry0Y8OT2AV9v6/b2iUXxWUWQOWwl661I0fy0FjBSXfkHEw xym1jZ7rD778Kax7/TctI8uPOf6mONOUXnSoSB704bez2cvbA46qx9jm7z6KqZOSLHpp YvJg== MIME-Version: 1.0 X-Received: by 10.25.137.7 with SMTP id l7mr7099830lfd.63.1449245200494; Fri, 04 Dec 2015 08:06:40 -0800 (PST) Received: by 10.112.202.99 with HTTP; Fri, 4 Dec 2015 08:06:40 -0800 (PST) In-Reply-To: <83h9jygruu.fsf@gnu.org> References: <20151204042052.GA1965@acm.fritz.box> <834mfyin34.fsf@gnu.org> <83h9jygruu.fsf@gnu.org> Date: Fri, 4 Dec 2015 16:06:40 +0000 X-Google-Sender-Auth: RXI-wGvlMpET9nVE0uBNAQYV2PM Message-ID: Subject: Re: bug#22090: Isearch is sluggish and eventually refuses further service with "[Too many words]". 
From: Artur Malabarba To: Eli Zaretskii Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-Spam-Score: -0.7 (/) X-Debbugs-Envelope-To: 22090 Cc: 22090@debbugs.gnu.org, Alan Mackenzie X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list Reply-To: bruce.connor.am@gmail.com List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.7 (/) 2015-12-04 15:23 GMT+00:00 Eli Zaretskii : >> Date: Fri, 4 Dec 2015 15:16:23 +0000 >> From: Artur Malabarba >> Cc: Alan Mackenzie , 22090@debbugs.gnu.org >> >> 30f3432 * lisp/character-fold.el: Remove special case-folding support >> >> (character-fold-to-regexp): Remove special code for >> case-folding. Char-fold search still respects the >> `case-fold-search' variable (i.e., f matches F). This only >> removes the code that was added to ensure that f also matched >> all chars that F matched. For instance, after this commit, f >> no longer matches 𝔽. > > Thanks. Is there any reasonably simple way of describing the > resulting limitations (i.e. what will NOT match) on the user manual > level? Basically, 'a' will match similar characters (like '𝑎' and 'á') and their upper-case equivalents (like 'Á'). 'a' will NOT match characters similar to 'A' that don't have a lower-case equivalent (like '𝔸') in the unicode standard. 
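The rule described in the message above can be modelled roughly like this. The sketch is in Python and is only an approximation of the stated behaviour, not the logic in character-fold.el: it assumes "similar" means "has the query letter as its NFKD base", and it uses Python's `str.lower()` as a stand-in for the Unicode case mapping. `base_letter` and `would_match` are hypothetical helper names.

```python
import unicodedata


def base_letter(ch):
    """Decompose `ch` with NFKD and drop combining marks."""
    return "".join(
        c for c in unicodedata.normalize("NFKD", ch)
        if not unicodedata.combining(c)
    )


def would_match(query, ch):
    """Approximate the rule: `query` matches `ch` if `ch` is similar to
    `query`, or if `ch` has a lower-case mapping that is similar."""
    if base_letter(ch) == query:
        return True
    lowered = ch.lower()
    # Only characters with a real case mapping get the second chance.
    return lowered != ch and base_letter(lowered) == query


if __name__ == "__main__":
    samples = {
        "a with acute": "\u00e1",
        "mathematical italic small a": "\U0001d44e",
        "A with acute": "\u00c1",
        "double-struck capital A": "\U0001d538",
    }
    for name, ch in samples.items():
        print(name, would_match("a", ch))
```

Under these assumptions the first three samples match and the last does not: the double-struck capital A decomposes to 'A', but it has no lower-case mapping that would lead back to 'a'.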
From debbugs-submit-bounces@debbugs.gnu.org Fri Dec 04 11:21:07 2015 Received: (at 22090) by debbugs.gnu.org; 4 Dec 2015 16:21:07 +0000 Received: from localhost ([127.0.0.1]:39179 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a4t6Z-0000nE-FI for submit@debbugs.gnu.org; Fri, 04 Dec 2015 11:21:07 -0500 Received: from mail-lf0-f42.google.com ([209.85.215.42]:33983) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a4t6X-0000n6-Se for 22090@debbugs.gnu.org; Fri, 04 Dec 2015 11:21:06 -0500 Received: by lffu14 with SMTP id u14so115383580lff.1 for <22090@debbugs.gnu.org>; Fri, 04 Dec 2015 08:21:05 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:reply-to:sender:in-reply-to:references:date:message-id :subject:from:to:cc:content-type; bh=AE4rAzvZjxHFoyvF5fJ4ygalYHc5x31TnuP7Jo9d/LI=; b=KBSJEcUeK0iLJJ/0Lv05fKJ077NdF6Ge7w3Ox9h5Z/lw0fIRrq9ZeVDLM8leCqjAXa dyPvw0Id5FfGj7NNYueJ9PUIoLDclQOKi96jHCtDDfWhV0p9V+NRkI4v3WfBE1pcsEn0 q+s2XIadSCRlgXve9jT806RGqbOPkc2QvA7t0dSeQeYbKAZ4SKWN8sWKaUW7tAFg7Xt4 /u+++tBUs/FtA6WwC05ewZY2zZM6ChAYY7H7gfbsCg2w/QVfYxpLO7GCZbKUmLPvcfgH chcW0meB0uH5zEuzt+batGvnT+NoMeKsof0gKL7t32Jy3qd2IIiZRtDMc4YWDsxUpFTA ffOw== MIME-Version: 1.0 X-Received: by 10.25.19.69 with SMTP id j66mr8465581lfi.25.1449246065168; Fri, 04 Dec 2015 08:21:05 -0800 (PST) Received: by 10.112.202.99 with HTTP; Fri, 4 Dec 2015 08:21:04 -0800 (PST) Received: by 10.112.202.99 with HTTP; Fri, 4 Dec 2015 08:21:04 -0800 (PST) In-Reply-To: References: <834mfyin34.fsf@gnu.org> <20151204042052.GA1965@acm.fritz.box> Date: Fri, 4 Dec 2015 16:21:04 +0000 X-Google-Sender-Auth: 5QsDIw0z5_uU5n9rHVs2LGcOFJM Message-ID: Subject: Re: bug#22090: Isearch is sluggish and eventually refuses further service with "[Too many words]". 
From: Artur Malabarba To: Random832 Content-Type: multipart/alternative; boundary=001a114069d4e81831052614e41c X-Spam-Score: -0.7 (/) X-Debbugs-Envelope-To: 22090 Cc: 22090@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list Reply-To: bruce.connor.am@gmail.com List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.7 (/) --001a114069d4e81831052614e41c Content-Type: text/plain; charset=UTF-8 On 4 Dec 2015 3:49 pm, "Random832" wrote: > So has any thought been given to implementing folding searches > via matching a simple regexp against a projected version of the > buffer rather than the current mechanism of creating a regexp > that will always match when it should? There were suggestions of projecting both the buffer and the search string (which is what case folding does) but nobody has offered to do it. What do you mean by "simple regexp"? --001a114069d4e81831052614e41c Content-Type: text/html; charset=UTF-8


--001a114069d4e81831052614e41c-- From debbugs-submit-bounces@debbugs.gnu.org Fri Dec 04 11:28:04 2015 Received: (at 22090) by debbugs.gnu.org; 4 Dec 2015 16:28:04 +0000 Received: from localhost ([127.0.0.1]:39189 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a4tDI-0000xG-AJ for submit@debbugs.gnu.org; Fri, 04 Dec 2015 11:28:04 -0500 Received: from mtaout29.012.net.il ([80.179.55.185]:50753) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a4tDG-0000ww-86 for 22090@debbugs.gnu.org; Fri, 04 Dec 2015 11:28:03 -0500 Received: from conversion-daemon.mtaout29.012.net.il by mtaout29.012.net.il (HyperSendmail v2007.08) id <0NYU00F00DJSBU00@mtaout29.012.net.il> for 22090@debbugs.gnu.org; Fri, 04 Dec 2015 18:27:52 +0200 (IST) Received: from HOME-C4E4A596F7 ([84.94.185.246]) by mtaout29.012.net.il (HyperSendmail v2007.08) with ESMTPA id <0NYU0077KDQFZLB0@mtaout29.012.net.il>; Fri, 04 Dec 2015 18:27:52 +0200 (IST) Date: Fri, 04 Dec 2015 18:27:48 +0200 From: Eli Zaretskii Subject: Re: bug#22090: Isearch is sluggish and eventually refuses further service with "[Too many words]". In-reply-to: X-012-Sender: halo1@inter.net.il To: bruce.connor.am@gmail.com Message-id: <83bna6govv.fsf@gnu.org> MIME-version: 1.0 Content-type: text/plain; charset=utf-8 Content-transfer-encoding: 8BIT References: <20151204042052.GA1965@acm.fritz.box> <834mfyin34.fsf@gnu.org> <83h9jygruu.fsf@gnu.org> X-Spam-Score: 0.9 (/) X-Debbugs-Envelope-To: 22090 Cc: 22090@debbugs.gnu.org, acm@muc.de X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list Reply-To: Eli Zaretskii List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 0.9 (/) > Date: Fri, 4 Dec 2015 16:06:40 +0000 > From: Artur Malabarba > Cc: Alan Mackenzie , 22090@debbugs.gnu.org > > > Thanks. 
Is there any reasonably simple way of describing the > > resulting limitations (i.e. what will NOT match) on the user manual > > level? > > Basically, 'a' will match similar characters (like '𝑎' and 'á') and > their upper-case equivalents (like 'Á'). 'a' will NOT match characters > similar to 'A' that don't have a lower-case equivalent (like '𝔸') in > the unicode standard. What about ligatures, or symbols like ℻? Also, by "lower-case equivalent" do you mean a case mapping defined by the UCD, or just by visual appearance? An example of the latter would be Ⓐ vs ⓐ. From debbugs-submit-bounces@debbugs.gnu.org Fri Dec 04 11:37:12 2015 Received: (at 22090) by debbugs.gnu.org; 4 Dec 2015 16:37:12 +0000 Received: from localhost ([127.0.0.1]:39228 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a4tM7-0001Dh-Ug for submit@debbugs.gnu.org; Fri, 04 Dec 2015 11:37:12 -0500 Received: from mail-lf0-f48.google.com ([209.85.215.48]:34391) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a4tM5-0001DZ-SV for 22090@debbugs.gnu.org; Fri, 04 Dec 2015 11:37:10 -0500 Received: by lffu14 with SMTP id u14so115695842lff.1 for <22090@debbugs.gnu.org>; Fri, 04 Dec 2015 08:37:09 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:reply-to:sender:in-reply-to:references:date:message-id :subject:from:to:cc:content-type; bh=BQlof53iN/JGi4THk/Hox7St/MmXpfdsEvkvgzTAPpY=; b=iT1iqV2iZohA1DWAX2aDBUnYaMT6I6kB2MhZoWZE8HfKFYweyTlQCj61sAIPLucYVz Ewse8WGeIpc0bLiZg0q2xMZth5k3I0z2MDUUuxY1w9FWIo3Xqjd+mCLFkgEnN8bBm0kQ KfcwNZ2fyjNUTsDNh+Sr6tJOPqIQjawuPlhMp1LN/R6CZjjCzzboBtE3p9xbmbQOaYUl Sw32OIQUEklBVuYvsww42e9MLasF582p29lLNQS6riUwXibAIsH02GxSDnAvhpj06ZNF 4GtlE7LMhDicDQE3LLwL6d4GRTiFC9qLsQJwPN2UfKGiuDex1wDknrL7oE4SHrH1zpeu 7Fbw== MIME-Version: 1.0 X-Received: by 10.25.18.92 with SMTP id h89mr8879718lfi.54.1449247029052; Fri, 04 Dec 2015 08:37:09 -0800 (PST) Received: by 10.112.202.99 with HTTP; Fri, 4 
Dec 2015 08:37:08 -0800 (PST) Received: by 10.112.202.99 with HTTP; Fri, 4 Dec 2015 08:37:08 -0800 (PST) In-Reply-To: <83bna6govv.fsf@gnu.org> References: <20151204042052.GA1965@acm.fritz.box> <834mfyin34.fsf@gnu.org> <83h9jygruu.fsf@gnu.org> <83bna6govv.fsf@gnu.org> Date: Fri, 4 Dec 2015 16:37:08 +0000 X-Google-Sender-Auth: omCTDkRgGeLzZPmBonGz-pyYWO8 Message-ID: Subject: Re: bug#22090: Isearch is sluggish and eventually refuses further service with "[Too many words]". From: Artur Malabarba To: Eli Zaretskii Content-Type: multipart/alternative; boundary=001a113fb2a45bcd750526151e1f X-Spam-Score: -0.7 (/) X-Debbugs-Envelope-To: 22090 Cc: 22090@debbugs.gnu.org, Alan Mackenzie X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list Reply-To: bruce.connor.am@gmail.com List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.7 (/) --001a113fb2a45bcd750526151e1f Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable On 4 Dec 2015 4:27 pm, "Eli Zaretskii" wrote: > > > Date: Fri, 4 Dec 2015 16:06:40 +0000 > > From: Artur Malabarba > > Cc: Alan Mackenzie , 22090@debbugs.gnu.org > > > > > Thanks. Is there any reasonably simple way of describing the > > > resulting limitations (i.e. what will NOT match) on the user manual > > > level? > > > > Basically, 'a' will match similar characters (like '𝑎' and 'á') and > > their upper-case equivalents (like 'Á'). 'a' will NOT match characters > > similar to 'A' that don't have a lower-case equivalent (like '𝔸') in > > the unicode standard. > > What about ligatures, or symbols like ℻? Won't match cross-case. > Also, by "lower-case equivalent" do you mean a case mapping defined by > the UCD Yes. Visual appearance is irrelevant. Strictly speaking, to match a general character, you need to search for its decomposition. 
If this character also has a case "equivalent" (as per current-case-table), you can also search for the decomposition of this equivalent. --001a113fb2a45bcd750526151e1f Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable



--001a113fb2a45bcd750526151e1f-- From debbugs-submit-bounces@debbugs.gnu.org Fri Dec 04 11:38:23 2015 Received: (at submit) by debbugs.gnu.org; 4 Dec 2015 16:38:23 +0000 Received: from localhost ([127.0.0.1]:39234 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a4tNH-0001FT-IQ for submit@debbugs.gnu.org; Fri, 04 Dec 2015 11:38:23 -0500 Received: from eggs.gnu.org ([208.118.235.92]:54440) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a4tNG-0001FL-0I for submit@debbugs.gnu.org; Fri, 04 Dec 2015 11:38:22 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1a4tNE-0005Kw-RE for submit@debbugs.gnu.org; Fri, 04 Dec 2015 11:38:21 -0500 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=0.8 required=5.0 tests=BAYES_50,FREEMAIL_FROM autolearn=disabled version=3.3.2 Received: from lists.gnu.org ([2001:4830:134:3::11]:33703) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1a4tNE-0005Ks-P1 for submit@debbugs.gnu.org; Fri, 04 Dec 2015 11:38:20 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:36197) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1a4tNE-0003NP-0T for bug-gnu-emacs@gnu.org; Fri, 04 Dec 2015 11:38:20 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1a4tN9-0005IO-Bx for bug-gnu-emacs@gnu.org; Fri, 04 Dec 2015 11:38:19 -0500 Received: from plane.gmane.org ([80.91.229.3]:45605) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1a4tN9-0005HE-5R for bug-gnu-emacs@gnu.org; Fri, 04 Dec 2015 11:38:15 -0500 Received: from list by plane.gmane.org with local (Exim 4.69) (envelope-from ) id 1a4tMz-0007C2-Pb for bug-gnu-emacs@gnu.org; Fri, 04 Dec 2015 17:38:05 +0100 Received: from c-68-39-146-59.hsd1.in.comcast.net ([68.39.146.59]) by main.gmane.org with esmtp (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Fri, 04 Dec 2015 
17:38:05 +0100 Received: from random832 by c-68-39-146-59.hsd1.in.comcast.net with local (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Fri, 04 Dec 2015 17:38:05 +0100 X-Injected-Via-Gmane: http://gmane.org/ To: bug-gnu-emacs@gnu.org From: Random832 Subject: Re: bug#22090: Isearch is sluggish and eventually refuses further service with "[Too many words]". Date: Fri, 4 Dec 2015 16:37:59 +0000 (UTC) Lines: 9 Message-ID: References: <834mfyin34.fsf@gnu.org> <20151204042052.GA1965@acm.fritz.box> X-Complaints-To: usenet@ger.gmane.org X-Gmane-NNTP-Posting-Host: c-68-39-146-59.hsd1.in.comcast.net User-Agent: slrn/pre1.0.3-7 (Linux) X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6.x X-Received-From: 2001:4830:134:3::11 X-Spam-Score: -4.1 (----) X-Debbugs-Envelope-To: submit X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -4.1 (----) On 2015-12-04, Artur Malabarba wrote: > There were suggestions of projecting both the buffer and the search string > (which is what case folding does) but nobody has offered to do it. > What do you mean by "simple regexp"? As in none of the huge character classes for folding, just the characters the user types (normalized to all-lowercase or all-uppercase for case-folding searches) but maybe e.g. "\s+" for lax whitespace. 
From debbugs-submit-bounces@debbugs.gnu.org Fri Dec 04 11:51:30 2015 Received: (at 22090) by debbugs.gnu.org; 4 Dec 2015 16:51:30 +0000 Received: from localhost ([127.0.0.1]:39259 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a4tZx-0001bL-On for submit@debbugs.gnu.org; Fri, 04 Dec 2015 11:51:29 -0500 Received: from mail-lf0-f44.google.com ([209.85.215.44]:36630) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a4tZv-0001bC-Sm for 22090@debbugs.gnu.org; Fri, 04 Dec 2015 11:51:28 -0500 Received: by lfs39 with SMTP id 39so112557199lfs.3 for <22090@debbugs.gnu.org>; Fri, 04 Dec 2015 08:51:27 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:reply-to:sender:in-reply-to:references:date:message-id :subject:from:to:cc:content-type; bh=MP6W68vMvYunfktPT5PxF0dPsjgYafMDB/WP/S1w5Lo=; b=MAU43COLbnP0XJ0vtGgJsTjubNJOhpzcFSvbnfPXDzxVPoaHcr7AuRjmF/XnkD+Fui 2SqAky7AhmrsENY0E3upbLjuTYe8EgYg9tuqRw9DQlbflu5tyzT+ZxC5eQX4XnUYGRWv RlgRJowLLWxSUtMMR09KSIZDYJnXf3Ns2+6c/j+dVHMmUhdCutheu8dYvMIlq4SMhPMd 3inTuBvZHyaDt1OEhoPFH8UtdIZbMrlwKxXjWE4QwJPiV0HNzdyH5ZnZuGKZsgldVtSb JlIYZxBGNw+RVgYWxNfLKqzEoGjk5ubse8ZMq+09w1+s8cjYKDVX0lSo+Tj3YfSuwO67 YsHg== MIME-Version: 1.0 X-Received: by 10.25.19.69 with SMTP id j66mr8525328lfi.25.1449247887001; Fri, 04 Dec 2015 08:51:27 -0800 (PST) Received: by 10.112.202.99 with HTTP; Fri, 4 Dec 2015 08:51:26 -0800 (PST) Received: by 10.112.202.99 with HTTP; Fri, 4 Dec 2015 08:51:26 -0800 (PST) In-Reply-To: References: <834mfyin34.fsf@gnu.org> <20151204042052.GA1965@acm.fritz.box> Date: Fri, 4 Dec 2015 16:51:26 +0000 X-Google-Sender-Auth: GzUlNjbj514LT1C_oZKAHFGkPro Message-ID: Subject: Re: bug#22090: Isearch is sluggish and eventually refuses further service with "[Too many words]". 
From: Artur Malabarba To: Random832 Content-Type: multipart/alternative; boundary=001a114069d47f11af05261551f9 X-Spam-Score: -0.7 (/) X-Debbugs-Envelope-To: 22090 Cc: 22090@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list Reply-To: bruce.connor.am@gmail.com List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.7 (/) --001a114069d47f11af05261551f9 Content-Type: text/plain; charset=UTF-8 On 4 Dec 2015 4:37 pm, "Random832" wrote: > > On 2015-12-04, Artur Malabarba wrote: > > There were suggestions of projecting both the buffer and the search string > > (which is what case folding does) but nobody has offered to do it. > > What do you mean by "simple regexp"? > > As in... I see. Then the answer is the same. Nobody has offered to write the C code necessary to compare only a "projection" of the buffer with the search string. --001a114069d47f11af05261551f9 Content-Type: text/html; charset=UTF-8


--001a114069d47f11af05261551f9-- From debbugs-submit-bounces@debbugs.gnu.org Fri Dec 04 12:01:04 2015 Received: (at 22090-done) by debbugs.gnu.org; 4 Dec 2015 17:01:04 +0000 Received: from localhost ([127.0.0.1]:39277 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a4tjE-0001v0-1J for submit@debbugs.gnu.org; Fri, 04 Dec 2015 12:01:04 -0500 Received: from mail.muc.de ([193.149.48.3]:58994) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a4tjC-0001uR-2v for 22090-done@debbugs.gnu.org; Fri, 04 Dec 2015 12:01:02 -0500 Received: (qmail 18253 invoked by uid 3782); 4 Dec 2015 17:01:00 -0000 Date: 4 Dec 2015 17:01:00 -0000 Message-ID: <20151204170100.18252.qmail@mail.muc.de> From: Alan Mackenzie To: 22090-done@debbugs.gnu.org Subject: Re: bug#22090: Isearch is sluggish and eventually refuses further service with "[Too many words]". Organization: muc.de e.V. In-Reply-To: X-Newsgroups: gnu.emacs.bug User-Agent: tin/2.3.1-20141224 ("Tallant") (UNIX) (FreeBSD/10.1-RELEASE-p16 (amd64)) X-Spam-Score: -0.0 (/) X-Debbugs-Envelope-To: 22090-done Cc: Eli Zaretskii , bruce.connor.am@gmail.com X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.0 (/) Hello, Artur. In article you wrote: > 2015-12-04 9:23 GMT+00:00 Eli Zaretskii : >>> Date: Fri, 4 Dec 2015 04:20:52 +0000 >>> From: Alan Mackenzie >>> >>> With a recent emacs-25 (last update >>> eaa1fd6dbff8346eb38485de5ebf0fbfacf374d9 from Thursday 2015-12-03): >>> >>> emacs -Q >>> C-c C-f src/xdisp.c >>> Move point to L30 (paragraph beginning "Updating the display is triggered >>> by the Lisp interpreter ...") >>> >>> C-s >>> C-w repeatedly, to yank words onto the search string. 
>>> >>> After ~29 words have been yanked, the response becomes sluggish, pausing >>> for between 0.5s and 1s before highlighting the "for" at the end of L31. > Thanks for the report. The source for this (and for a similar bug > mentioned on a thread in emacs-devel) was the code I had added for > special case-folding support. > For now, I've just removed the code. I can think of a way of solving > this, but it adds some complexity to isearch, which I don't wanna do > (and I don't think this feature was that important anyway). Here's a > full copy of the commit message explaining why the bug happens. Thanks for reacting to this so quickly. I confirm that both symptoms of the bug have been resolved. So I'm closing this bug. [ .... ] -- Alan Mackenzie (Nuremberg, Germany). From debbugs-submit-bounces@debbugs.gnu.org Fri Dec 04 13:25:31 2015 Received: (at 22090) by debbugs.gnu.org; 4 Dec 2015 18:25:31 +0000 Received: from localhost ([127.0.0.1]:39350 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a4v2w-0003yf-0a for submit@debbugs.gnu.org; Fri, 04 Dec 2015 13:25:30 -0500 Received: from mtaout28.012.net.il ([80.179.55.184]:53837) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a4v2a-0003y7-QE for 22090@debbugs.gnu.org; Fri, 04 Dec 2015 13:25:27 -0500 Received: from conversion-daemon.mtaout28.012.net.il by mtaout28.012.net.il (HyperSendmail v2007.08) id <0NYU00200J0ER600@mtaout28.012.net.il> for 22090@debbugs.gnu.org; Fri, 04 Dec 2015 20:24:02 +0200 (IST) Received: from HOME-C4E4A596F7 ([84.94.185.246]) by mtaout28.012.net.il (HyperSendmail v2007.08) with ESMTPA id <0NYU00M0XJ42Y180@mtaout28.012.net.il>; Fri, 04 Dec 2015 20:24:02 +0200 (IST) Date: Fri, 04 Dec 2015 20:24:45 +0200 From: Eli Zaretskii Subject: Re: bug#22090: Isearch is sluggish and eventually refuses further service with "[Too many words]". 
In-reply-to: X-012-Sender: halo1@inter.net.il To: Random832 Message-id: <838u5agjgy.fsf@gnu.org> References: <834mfyin34.fsf@gnu.org> <20151204042052.GA1965@acm.fritz.box> X-Spam-Score: 0.9 (/) X-Debbugs-Envelope-To: 22090 Cc: 22090@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list Reply-To: Eli Zaretskii List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 0.9 (/) > From: Random832 > Date: Fri, 4 Dec 2015 16:37:59 +0000 (UTC) > > On 2015-12-04, Artur Malabarba wrote: > > There were suggestions of projecting both the buffer and the search string > > (which is what case folding does) but nobody has offered to do it. > > What do you mean by "simple regexp"? > > As in none of the huge character classes for folding, just the > characters the user types (normalized to all-lowercase or > all-uppercase for case-folding searches) but maybe e.g. "\s+" > for lax whitespace. You need to normalize the buffer text as well, so this must be done on the C level. 
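The "projection" idea discussed above, where the search string and the buffer text are both normalized and then compared plainly, can be sketched outside Emacs. What follows is only a Python illustration, not Emacs code: `project` and `lax_search` are hypothetical names, Python's `unicodedata` tables stand in for whatever folding data Emacs would use, and folding case before decomposing is a choice made for this sketch.

```python
import re
import unicodedata


def project(text):
    """Project text onto a folded form: fold case, decompose, drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", text.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def lax_search(needle, haystack):
    """Search the projected haystack for the projected needle.

    Runs of whitespace in the needle match any run of whitespace, in the
    spirit of the "\\s+" suggestion for lax whitespace in the thread.
    """
    words = project(needle).split()
    pattern = r"\s+".join(re.escape(word) for word in words)
    return re.search(pattern, project(haystack)) is not None
```

The pattern stays as short as the typed string, which is the attraction of this approach. The sketch ignores the hard part: match positions are offsets into the projected text, and a real implementation would have to map them back to buffer positions.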
From debbugs-submit-bounces@debbugs.gnu.org Fri Dec 04 13:48:52 2015 Received: (at 22090) by debbugs.gnu.org; 4 Dec 2015 18:48:52 +0000 Received: from localhost ([127.0.0.1]:39375 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a4vPY-0004ZS-At for submit@debbugs.gnu.org; Fri, 04 Dec 2015 13:48:52 -0500 Received: from mtaout27.012.net.il ([80.179.55.183]:56428) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a4vPV-0004ZJ-Mo for 22090@debbugs.gnu.org; Fri, 04 Dec 2015 13:48:50 -0500 Received: from conversion-daemon.mtaout27.012.net.il by mtaout27.012.net.il (HyperSendmail v2007.08) id <0NYU00800JZ2V800@mtaout27.012.net.il> for 22090@debbugs.gnu.org; Fri, 04 Dec 2015 20:43:41 +0200 (IST) Received: from HOME-C4E4A596F7 ([84.94.185.246]) by mtaout27.012.net.il (HyperSendmail v2007.08) with ESMTPA id <0NYU001GPK0TO6A0@mtaout27.012.net.il>; Fri, 04 Dec 2015 20:43:41 +0200 (IST) Date: Fri, 04 Dec 2015 20:48:23 +0200 From: Eli Zaretskii Subject: Re: bug#22090: Isearch is sluggish and eventually refuses further service with "[Too many words]". In-reply-to: X-012-Sender: halo1@inter.net.il To: bruce.connor.am@gmail.com Message-id: <8337vigidk.fsf@gnu.org> MIME-version: 1.0 Content-type: text/plain; charset=utf-8 Content-transfer-encoding: 8BIT References: <20151204042052.GA1965@acm.fritz.box> <834mfyin34.fsf@gnu.org> <83h9jygruu.fsf@gnu.org> <83bna6govv.fsf@gnu.org> X-Spam-Score: 0.9 (/) X-Debbugs-Envelope-To: 22090 Cc: 22090@debbugs.gnu.org, acm@muc.de X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list Reply-To: Eli Zaretskii List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 0.9 (/) > Date: Fri, 4 Dec 2015 16:37:08 +0000 > From: Artur Malabarba > Cc: 22090@debbugs.gnu.org, Alan Mackenzie > > > What about ligatures, or symbols like ℻? > > Won't match cross-case. 
So f will match ffi but not ℻, and F will match ℻ but not ffi, is that right? > > Also, by "lower-case equivalent" do you mean a case mapping defined by > > the UCD > > Yes. Visual appearance is irrelevant. > > Strictly speaking, to match a general character, you need to search for its > decomposition. If this character also has a case "equivalent" (as per > current-case-table), you can also search for the decomposition of this > equivalent. OK, thanks. I will try to think if this needs to be explained in the manual. From debbugs-submit-bounces@debbugs.gnu.org Fri Dec 04 14:21:49 2015 Received: (at 22090) by debbugs.gnu.org; 4 Dec 2015 19:21:49 +0000 Received: from localhost ([127.0.0.1]:39384 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a4vvQ-0005MG-DP for submit@debbugs.gnu.org; Fri, 04 Dec 2015 14:21:48 -0500 Received: from mail.muc.de ([193.149.48.3]:24206) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a4vv5-0005Ln-H9 for 22090@debbugs.gnu.org; Fri, 04 Dec 2015 14:21:46 -0500 Received: (qmail 73200 invoked by uid 3782); 4 Dec 2015 19:21:26 -0000 Date: 4 Dec 2015 19:21:26 -0000 Message-ID: <20151204192126.73199.qmail@mail.muc.de> From: Alan Mackenzie To: bruce.connor.am@gmail.com Subject: Re: bug#22090: Isearch is sluggish and eventually refuses further service with "[Too many words]". Organization: muc.de e.V. In-Reply-To: X-Newsgroups: gnu.emacs.bug User-Agent: tin/2.3.1-20141224 ("Tallant") (UNIX) (FreeBSD/10.1-RELEASE-p16 (amd64)) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-Spam-Score: -0.0 (/) X-Debbugs-Envelope-To: 22090 Cc: 22090@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.0 (/) Hello, Artur. 
In article you wrote: > 2015-12-04 9:23 GMT+00:00 Eli Zaretskii : >>> Date: Fri, 4 Dec 2015 04:20:52 +0000 >>> From: Alan Mackenzie >>> >>> With a recent emacs-25 (last update >>> eaa1fd6dbff8346eb38485de5ebf0fbfacf374d9 from Thursday 2015-12-03): >>> >>> emacs -Q >>> C-c C-f src/xdisp.c >>> Move point to L30 (paragraph beginning "Updating the display is triggered >>> by the Lisp interpreter ...") >>> >>> C-s >>> C-w repeatedly, to yank words onto the search string. >>> >>> After ~29 words have been yanked, the response becomes sluggish, pausing >>> for between 0.5s and 1s before highlighting the "for" at the end of L31. > Thanks for the report. The source for this (and for a similar bug > mentioned on a thread in emacs-devel) was the code I had added for > special case-folding support. > For now, I've just removed the code. I can think of a way of solving > this, but it adds some complexity to isearch, which I don't wanna do > (and I don't think this feature was that important anyway). Here's a > full copy of the commit message explaining why the bug happens. > 30f3432 * lisp/character-fold.el: Remove special case-folding support > (character-fold-to-regexp): Remove special code for > case-folding. Char-fold search still respects the > `case-fold-search' variable (i.e., f matches F). This only > removes the code that was added to ensure that f also matched > all chars that F matched. For instance, after this commit, f > no longer matches 𝔽. > This was necessary because the logic created a regexp with > 2^(length of the string) redundant paths. So, when a very > long string "almost" matched, Emacs took a very long time to > figure out that it didn't. This became particularly relevant > because isearch's lazy-highlight does a search bounded by (1- > match-end) (which, in most circumstances, is a search that > almost matches). A recipe for this can be found in bug#22090. Would you like any help to sort out these regexps? 
I have some expertise in doing this, having half-written fix-re.el, a program which analyses and corrects just the sort of thing you're talking about. -- Alan Mackenzie (Nuremberg, Germany). From debbugs-submit-bounces@debbugs.gnu.org Fri Dec 04 15:00:21 2015 Received: (at 22090) by debbugs.gnu.org; 4 Dec 2015 20:00:21 +0000 Received: from localhost ([127.0.0.1]:39417 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a4wWi-0006Ie-Sl for submit@debbugs.gnu.org; Fri, 04 Dec 2015 15:00:21 -0500 Received: from mail-lb0-f177.google.com ([209.85.217.177]:35464) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a4wWO-0006GZ-7r for 22090@debbugs.gnu.org; Fri, 04 Dec 2015 15:00:19 -0500 Received: by lbbed20 with SMTP id ed20so12383476lbb.2 for <22090@debbugs.gnu.org>; Fri, 04 Dec 2015 11:59:59 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:reply-to:sender:in-reply-to:references:date:message-id :subject:from:to:cc:content-type:content-transfer-encoding; bh=H6DDB32d0M+qxLM4xWPyv543qvZIOCFYBbe7hTeZP4c=; b=P1dyinI6pLjoN3dVvjAsDrGKq49xEdYcSqOBXM8yBPn9SoU79Ms2mS6TD7wuCBxQy2 xMunjI721sjx4ZFZXlzwV4P1KW2gszJ9XdPWc+Q7kGEW8w5ONpHvzi3dvgvA0dYAwfok iOKfVmn314fG7uNrQwNnbd4WDJMM8z1+Qlxxmzm1B7TScewhXFyPNdDa4iGQGusJ6OM+ 0iKgYffGm1/W9JWSyyeKWMp0rb1b70YaYbktTRIYmtfNKgBa9/JiVNeV/TDh4cxaOWJB vXPHk8QGeJrg5I9Jq1AdJVPAv9NlYG/lKEUJXrMtTyx/eNklIr1CtDHgSPCkFlKj/twd DEOA== MIME-Version: 1.0 X-Received: by 10.112.170.7 with SMTP id ai7mr9015460lbc.102.1449259199272; Fri, 04 Dec 2015 11:59:59 -0800 (PST) Received: by 10.112.202.99 with HTTP; Fri, 4 Dec 2015 11:59:59 -0800 (PST) In-Reply-To: <8337vigidk.fsf@gnu.org> References: <20151204042052.GA1965@acm.fritz.box> <834mfyin34.fsf@gnu.org> <83h9jygruu.fsf@gnu.org> <83bna6govv.fsf@gnu.org> <8337vigidk.fsf@gnu.org> Date: Fri, 4 Dec 2015 19:59:59 +0000 X-Google-Sender-Auth: bHeyVa9dIO66bFnstQybJTmaH40 Message-ID: Subject: Re: bug#22090: Isearch is 
sluggish and eventually refuses further service with "[Too many words]". From: Artur Malabarba To: Eli Zaretskii Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-Spam-Score: -0.7 (/) X-Debbugs-Envelope-To: 22090 Cc: 22090@debbugs.gnu.org, Alan Mackenzie X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list Reply-To: bruce.connor.am@gmail.com List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.7 (/) 2015-12-04 18:48 GMT+00:00 Eli Zaretskii : >> > What about ligatures, or symbols like ℻? >> >> Won't match cross-case. > > So f will match ffi but not ℻, and F will match ℻ but not ffi, is that > right? No. ffi will match ffi, but FFI won't. And FAX will match ℻, but fax won't. f shouldn't match ffi anymore. From debbugs-submit-bounces@debbugs.gnu.org Fri Dec 04 15:09:04 2015 Received: (at 22090) by debbugs.gnu.org; 4 Dec 2015 20:09:04 +0000 Received: from localhost ([127.0.0.1]:39421 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a4wf9-0006UT-Qr for submit@debbugs.gnu.org; Fri, 04 Dec 2015 15:09:04 -0500 Received: from mtaout28.012.net.il ([80.179.55.184]:49526) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a4wf7-0006U3-Ma for 22090@debbugs.gnu.org; Fri, 04 Dec 2015 15:09:02 -0500 Received: from conversion-daemon.mtaout28.012.net.il by mtaout28.012.net.il (HyperSendmail v2007.08) id <0NYU00M00NOON000@mtaout28.012.net.il> for 22090@debbugs.gnu.org; Fri, 04 Dec 2015 22:08:05 +0200 (IST) Received: from HOME-C4E4A596F7 ([84.94.185.246]) by mtaout28.012.net.il (HyperSendmail v2007.08) with ESMTPA id <0NYU00BGPNXH9RF0@mtaout28.012.net.il>; Fri, 04 Dec 2015 22:08:05 +0200 (IST) Date: Fri, 04 Dec 2015 22:08:48 +0200 From: Eli Zaretskii Subject: Re: bug#22090: Isearch is 
sluggish and eventually refuses further service with "[Too many words]". In-reply-to: <20151204192126.73199.qmail@mail.muc.de> X-012-Sender: halo1@inter.net.il To: Alan Mackenzie Message-id: <83wpsuf033.fsf@gnu.org> MIME-version: 1.0 Content-type: text/plain; charset=utf-8 Content-transfer-encoding: 8BIT References: <20151204042052.GA1965@acm.fritz.box> <20151204192126.73199.qmail@mail.muc.de> X-Spam-Score: 0.9 (/) X-Debbugs-Envelope-To: 22090 Cc: 22090@debbugs.gnu.org, bruce.connor.am@gmail.com X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list Reply-To: Eli Zaretskii List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 0.9 (/) > Date: 4 Dec 2015 19:21:26 -0000 > From: Alan Mackenzie > Cc: 22090@debbugs.gnu.org > > > (character-fold-to-regexp): Remove special code for > > case-folding. Char-fold search still respects the > > `case-fold-search' variable (i.e., f matches F). This only > > removes the code that was added to ensure that f also matched > > all chars that F matched. For instance, after this commit, f > > no longer matches 𝔽. > > > This was necessary because the logic created a regexp with > > 2^(length of the string) redundant paths. So, when a very > > long string "almost" matched, Emacs took a very long time to > > figure out that it didn't. This became particularly relevant > > because isearch's lazy-highlight does a search bounded by (1- > > match-end) (which, in most circumstances, is a search that > > almost matches). A recipe for this can be found in bug#22090. > > Would you like any help to sort out these regexps? I'm not sure the use cases related to case folding should be working in principle. That's because normalization under case folding means first downcase, then decompose; it is not allowed to downcase the decomposition. 
So if this issue is only about those, I don't think there's anything to sort out here, thanks. From debbugs-submit-bounces@debbugs.gnu.org Fri Dec 04 15:50:07 2015 Received: (at 22090) by debbugs.gnu.org; 4 Dec 2015 20:50:07 +0000 Received: from localhost ([127.0.0.1]:39438 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a4xIr-0007SE-TO for submit@debbugs.gnu.org; Fri, 04 Dec 2015 15:50:06 -0500 Received: from mail-lf0-f44.google.com ([209.85.215.44]:36376) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a4xIW-0007RN-7L for 22090@debbugs.gnu.org; Fri, 04 Dec 2015 15:50:03 -0500 Received: by lfs39 with SMTP id 39so116585215lfs.3 for <22090@debbugs.gnu.org>; Fri, 04 Dec 2015 12:49:43 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:reply-to:sender:in-reply-to:references:date:message-id :subject:from:to:cc:content-type:content-transfer-encoding; bh=SQ56wkNJsxr0R42Tv4wXbBGANbxnoRqFhm8xORavPOQ=; b=Xu/a83uvCumIcvmGKuP/iSvwIA5Trgnpy0QeSiiiqpdfdQq3vrc9KZx7do++2iv/pQ k60pHotVU+MHMxpUUwY1zHCB4TNw0WuJSifSwsEYkm/Sb+swbQJiIQ29Whrj6+32sT79 3Eef+8S//4tNITZSvuQSMIgbU2OFz13hOQCtyCv4azt4XmJxPzoBxCESAqW/7n30G7P2 tWjNUfh59dNy5FML1lSrcq63y1H/VhuRClyr/8d71etKJBIRspFFgPAghHzXkqa0SHjb 6QCBLjX2HcUIvuObSGfjiot/ZfVKxfNTZyKRFnRHpJHFsQ6YpBvbi3YtI6zGf2WkHfgx bwoQ== MIME-Version: 1.0 X-Received: by 10.25.18.92 with SMTP id h89mr9353472lfi.54.1449262183026; Fri, 04 Dec 2015 12:49:43 -0800 (PST) Received: by 10.112.202.99 with HTTP; Fri, 4 Dec 2015 12:49:42 -0800 (PST) In-Reply-To: <20151204192126.73199.qmail@mail.muc.de> References: <20151204192126.73199.qmail@mail.muc.de> Date: Fri, 4 Dec 2015 20:49:42 +0000 X-Google-Sender-Auth: 23OWBYE1jMcj3USL1ZwaugSARiM Message-ID: Subject: Re: bug#22090: Isearch is sluggish and eventually refuses further service with "[Too many words]". 
From: Artur Malabarba To: Alan Mackenzie Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-Spam-Score: -0.7 (/) X-Debbugs-Envelope-To: 22090 Cc: 22090@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list Reply-To: bruce.connor.am@gmail.com List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.7 (/) 2015-12-04 19:21 GMT+00:00 Alan Mackenzie : > Would you like any help to sort out these regexps? I have some expertise > in doing this, having half-written fix-re.el, a program which analyses > and corrects just the sort of thing you're talking about. Maybe you can help then. The situation is actually quite simple. We have a regexp for matching anything that 'a' should match (for instance, that might look like "\\(a[´`]?\\|[áà𝑎]\\)"), and we have another for matching anything that A could match (e.g. "\\(A[`´]?\\|[ÁÀ]\\)"). When case-fold-search is on the previous code would simply join these regexps with "\\(\\(a[´`]?\\|[áà𝑎]\\)\\|\\(A[`´]?\\|[ÁÀ]\\)\\)". The problem is that (when case-fold-search is on) this creates a lot of redundancy. There are two paths in that regexp that match "a", there are two paths that match "à" and so on (but it's not full redundancy, for instance, only one path matches 𝑎). 
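The redundancy described in the message above can be made concrete by counting, for each typed character, how many branches of the joined regexp accept it. The sketch below is Python, not Emacs Lisp: the two fragments are the illustrative ones quoted in the message, transcribed into Python `re` syntax, and `matching_branches` and `redundant_paths` are hypothetical helper names. It counts the ways a match can succeed; it does not model how the Emacs regexp engine actually backtracks.

```python
import re

# Illustrative fragments from the message, in Python syntax.
# \u00b4 = acute accent, \u00e1/\u00e0 = a with acute/grave,
# \U0001d44e = mathematical italic small a, \u00c1/\u00c0 = A with acute/grave.
LOWER_A = "(?:a[\u00b4`]?|[\u00e1\u00e0\U0001d44e])"
UPPER_A = "(?:A[`\u00b4]?|[\u00c1\u00c0])"


def matching_branches(text):
    """Count how many of the two joined branches accept `text` when case is ignored."""
    return sum(
        1
        for branch in (LOWER_A, UPPER_A)
        if re.fullmatch(branch, text, re.IGNORECASE)
    )


def redundant_paths(search_string):
    """Multiply the per-character branch counts over the whole search string."""
    total = 1
    for ch in search_string:
        total *= matching_branches(ch)
    return total
```

For a search string of n plain letters the product is 2^n, which is the growth the commit message refers to; a character such as the mathematical italic a contributes a factor of only 1.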
From debbugs-submit-bounces@debbugs.gnu.org Fri Dec 04 17:57:56 2015 Received: (at 22090) by debbugs.gnu.org; 4 Dec 2015 22:57:56 +0000 Received: from localhost ([127.0.0.1]:39475 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a4zIZ-00023Z-Fu for submit@debbugs.gnu.org; Fri, 04 Dec 2015 17:57:55 -0500 Received: from mail.muc.de ([193.149.48.3]:14026) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a4zIX-00023Q-2Z for 22090@debbugs.gnu.org; Fri, 04 Dec 2015 17:57:54 -0500 Received: (qmail 59353 invoked by uid 3782); 4 Dec 2015 22:57:51 -0000 Received: from acm.muc.de (p579E9292.dip0.t-ipconnect.de [87.158.146.146]) by colin.muc.de (tmda-ofmipd) with ESMTP; Fri, 04 Dec 2015 23:57:50 +0100 Received: (qmail 26686 invoked by uid 1000); 4 Dec 2015 23:00:00 -0000 Date: Fri, 4 Dec 2015 23:00:00 +0000 To: Artur Malabarba Subject: Re: bug#22090: Isearch is sluggish and eventually refuses further service with "[Too many words]". Message-ID: <20151204230000.GC6070@acm.fritz.box> References: <20151204192126.73199.qmail@mail.muc.de> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: User-Agent: Mutt/1.5.23 (2014-03-12) X-Delivery-Agent: TMDA/1.1.12 (Macallan) From: Alan Mackenzie X-Primary-Address: acm@muc.de X-Spam-Score: -0.0 (/) X-Debbugs-Envelope-To: 22090 Cc: 22090@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.0 (/) Hello, Artur. On Fri, Dec 04, 2015 at 08:49:42PM +0000, Artur Malabarba wrote: > 2015-12-04 19:21 GMT+00:00 Alan Mackenzie : > > Would you like any help to sort out these regexps? 
I have some expertise > > in doing this, having half-written fix-re.el, a program which analyses > > and corrects just the sort of thing you're talking about. > Maybe you can help then. The situation is actually quite simple. > We have a regexp for matching anything that 'a' should match (for > instance, that might look like "\\(a[´`]?\\|[áà𝑎]\\)"), and we have > another for matching anything that A could match (e.g. > "\\(A[`´]?\\|[ÁÀ]\\)"). Each of these regexps looks intrinsically blameless. > When case-fold-search is on the previous code would simply join these > regexps with "\\(\\(a[´`]?\\|[áà𝑎]\\)\\|\\(A[`´]?\\|[ÁÀ]\\)\\)". Quick question: _why_ do you need to join them? Given that case-fold-search is enabled, couldn't you just use, say, the lower case version? > The problem is that (when case-fold-search is on) this creates a lot > of redundancy. There are two paths in that regexp that match "a", > there are two paths that match "à" and so on (but it's not full > redundancy, for instance, only one path matches 𝑎). Yes. This is the killer danger in regexps (at least with the sort of regexp engine we've got). But it looks to me that this redundancy would be quite easy to eliminate - you just need three regexp fragments for the letter "a" - a lower case one, an upper case one and a case-fold-search one. The other thing is that for that single character "a" a 39 character regexp fragment is being generated. Might this have something to do with the "[Too many words]" error I got last night (which comes from the regexp engine returning a "too long regexp" error)? Even if you can reduce that to, say, 19 characters, that's only winning a factor of 2 in the slide towards a too long regexp. It might well be that for a very long regexp, you might have to divide it into shorter sections (a typical long RE will be a sequence of sub expressions, rather than lots of alternatives inside \(...\|........\)). -- Alan Mackenzie (Nuremberg, Germany). 
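The question of whether the lower-case fragment alone would do under case folding can be explored with a small experiment. This is a Python sketch under stated assumptions, not the Emacs implementation: the fragments are the illustrative ones from the thread in Python `re` syntax, the upper-case fragment is extended here with U+1D538 (double-struck A) purely as an assumed stand-in for characters that have no lower-case mapping, and `accepts` and `lost_by_lower_only` are hypothetical names.

```python
import re

LOWER_A = "(?:a[\u00b4`]?|[\u00e1\u00e0\U0001d44e])"
# Assumption for this sketch: U+1D538 is added to the upper-case fragment
# to represent a character similar to 'A' with no lower-case equivalent.
UPPER_A = "(?:A[`\u00b4]?|[\u00c1\u00c0\U0001d538])"
JOINED_A = "(?:" + LOWER_A + "|" + UPPER_A + ")"


def accepts(pattern, text):
    """True if `pattern` matches all of `text`, ignoring case."""
    return re.fullmatch(pattern, text, re.IGNORECASE) is not None


def lost_by_lower_only(samples):
    """Return the samples the joined regexp accepts but the lower-case fragment alone does not."""
    return [s for s in samples if accepts(JOINED_A, s) and not accepts(LOWER_A, s)]
```

On inputs such as "a", "A", the accented forms and the mathematical italic a, the lower-case fragment alone accepts everything the joined one does. Only the character without a case mapping drops out, which corresponds to the behaviour the commit message reports after the special case-folding code was removed (f no longer matching 𝔽).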
From debbugs-submit-bounces@debbugs.gnu.org Sat Dec 05 04:19:34 2015 Received: (at 22090) by debbugs.gnu.org; 5 Dec 2015 09:19:34 +0000 Received: from localhost ([127.0.0.1]:39583 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a590A-0001a8-HB for submit@debbugs.gnu.org; Sat, 05 Dec 2015 04:19:34 -0500 Received: from mtaout21.012.net.il ([80.179.55.169]:41698) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a58zp-0001Zd-RC for 22090@debbugs.gnu.org; Sat, 05 Dec 2015 04:19:32 -0500 Received: from conversion-daemon.a-mtaout21.012.net.il by a-mtaout21.012.net.il (HyperSendmail v2007.08) id <0NYV00M00NZ1K000@a-mtaout21.012.net.il> for 22090@debbugs.gnu.org; Sat, 05 Dec 2015 11:19:12 +0200 (IST) Received: from HOME-C4E4A596F7 ([84.94.185.246]) by a-mtaout21.012.net.il (HyperSendmail v2007.08) with ESMTPA id <0NYV00MPYOK0GGC0@a-mtaout21.012.net.il>; Sat, 05 Dec 2015 11:19:12 +0200 (IST) Date: Sat, 05 Dec 2015 11:19:01 +0200 From: Eli Zaretskii Subject: Re: bug#22090: Isearch is sluggish and eventually refuses further service with "[Too many words]". 
In-reply-to: X-012-Sender: halo1@inter.net.il To: bruce.connor.am@gmail.com Message-id: <83k2otfe2i.fsf@gnu.org> MIME-version: 1.0 Content-type: text/plain; charset=utf-8 Content-transfer-encoding: 8BIT References: <20151204042052.GA1965@acm.fritz.box> <834mfyin34.fsf@gnu.org> <83h9jygruu.fsf@gnu.org> <83bna6govv.fsf@gnu.org> <8337vigidk.fsf@gnu.org> X-Spam-Score: 0.9 (/) X-Debbugs-Envelope-To: 22090 Cc: 22090@debbugs.gnu.org, acm@muc.de X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list Reply-To: Eli Zaretskii List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 0.9 (/) > Date: Fri, 4 Dec 2015 19:59:59 +0000 > From: Artur Malabarba > Cc: 22090@debbugs.gnu.org, Alan Mackenzie > > 2015-12-04 18:48 GMT+00:00 Eli Zaretskii : > >> > What about ligatures, or symbols like ℻? > >> > >> Won't match cross-case. > > > > So f will match ffi but not ℻, and F will match ℻ but not ffi, is that > > right? > > No. ffi will match ffi, but FFI won't. And FAX will match ℻, but fax won't. > f shouldn't match ffi anymore. OK, thanks. It seems like the text in the manual already complies with the above.
From debbugs-submit-bounces@debbugs.gnu.org Sat Dec 05 12:23:56 2015 Received: (at 22090) by debbugs.gnu.org; 5 Dec 2015 17:23:56 +0000 Received: from localhost ([127.0.0.1]:40044 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a5GYu-0000BV-BM for submit@debbugs.gnu.org; Sat, 05 Dec 2015 12:23:56 -0500 Received: from mail-lb0-f175.google.com ([209.85.217.175]:36431) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a5GYs-0000BN-B4 for 22090@debbugs.gnu.org; Sat, 05 Dec 2015 12:23:55 -0500 Received: by lbblt2 with SMTP id lt2so36517527lbb.3 for <22090@debbugs.gnu.org>; Sat, 05 Dec 2015 09:23:53 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:reply-to:sender:in-reply-to:references:date:message-id :subject:from:to:cc:content-type:content-transfer-encoding; bh=68Ase23Gf9dCza8G+nxR6lnUjlEurqSOhn4NpnO3H4U=; b=c/p/f25WJlAhOD6cpIr8OvRfXS80aiEEWWJKxOiN/VehnLASucaGaHy+yNrpN3C17b 0CN2N1VyazkmqOp9YHVOTJ/AZnqhkVyKVHw9Pw3qdBhYBDanPnhnOiUeA/i3vGbqDCKX BjxNeQ6oa46kLU1vKyNva9vnfChbbTMEKO6eGLOMcfSBegfVHRCXr0adkEOpJU9zPhZD P1foY0qFfZtRI+0OIfRNnaIiVFNXOhFnbr7cf57I8P/fFTH2f8Y+9bHadUHgtHWUjZU9 kV62VBtebeSqHW7/+X9mChzNeCvdtszYXtFJ4rDdLyl2eU5Ws1FUqR7zvCc7aUChQBbf X7CA== MIME-Version: 1.0 X-Received: by 10.112.242.167 with SMTP id wr7mr9142395lbc.69.1449336233464; Sat, 05 Dec 2015 09:23:53 -0800 (PST) Received: by 10.112.202.99 with HTTP; Sat, 5 Dec 2015 09:23:53 -0800 (PST) In-Reply-To: <20151204230000.GC6070@acm.fritz.box> References: <20151204192126.73199.qmail@mail.muc.de> <20151204230000.GC6070@acm.fritz.box> Date: Sat, 5 Dec 2015 17:23:53 +0000 X-Google-Sender-Auth: UgEnmVy8hJExEwm-vHntK2KhD7I Message-ID: Subject: Re: bug#22090: Isearch is sluggish and eventually refuses further service with "[Too many words]". 
From: Artur Malabarba To: Alan Mackenzie Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-Spam-Score: -0.7 (/) X-Debbugs-Envelope-To: 22090 Cc: 22090@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list Reply-To: bruce.connor.am@gmail.com List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.7 (/) nn2015-12-04 23:00 GMT+00:00 Alan Mackenzie : >> When case-fold-search is on the previous code would simply join these >> regexps with "\\(\\(a[´`]?\\|[áà𝑎]\\)\\|\\(A[`´]?\\|[ÁÀ]\\)\\)". > > Quick question: _why_ do you need to join them? Given that > case-fold-search is enabled, couldn't you just use, say, the lower case > version? Because there are some characters in each regexp that don't have lower/upper-case equivalents. For instance, if I use the "\\(\\(a[´`]?\\|[áà𝑎]\\)" regexp, that's enough to match A or À, but it's not enough to match a variety of other chars (𝔸𝕬𝖠𝗔𝘈𝘼𝙰🄰). > it looks to me that this redundancy would > be quite easy to eliminate - you just need three regexp fragments for > the letter "a" - a lower case one, an upper case one and a > case-fold-search one. Yes, we could go that route. It's just going to add complexity to the code that generates the char-fold-table (which is already quite dense) and I wonder if it's worth such a corner-case. Like I said, 'a' already matches A and À, how much do we want to support this extra case-folding? > The other thing is that for that single character "a" a 39 character > regexp fragment is being generated.
Might this have something to do > with the "[Too many words]" error I got last night (which comes from the > regexp engine returning a "too long regexp" error)? yes > Even if you can reduce that to, say 19 characters, that's only winning a > factor of 2 in the slide towards a too long regexp. It might well be > that for a very long regexp, you might have to divide it into shorter > sections (a typical long RE will be a sequence of sub expressions, > rather than lots of alternatives inside \(...\|........\)). I don't understand what you mean. Could you elaborate? From debbugs-submit-bounces@debbugs.gnu.org Sat Dec 05 12:32:49 2015 Received: (at 22090) by debbugs.gnu.org; 5 Dec 2015 17:32:49 +0000 Received: from localhost ([127.0.0.1]:40048 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a5GhU-0000Pr-MB for submit@debbugs.gnu.org; Sat, 05 Dec 2015 12:32:48 -0500 Received: from mtaout25.012.net.il ([80.179.55.181]:33240) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a5GhA-0000PJ-8j for 22090@debbugs.gnu.org; Sat, 05 Dec 2015 12:32:47 -0500 Received: from conversion-daemon.mtaout25.012.net.il by mtaout25.012.net.il (HyperSendmail v2007.08) id <0NYW00J00AMNBN00@mtaout25.012.net.il> for 22090@debbugs.gnu.org; Sat, 05 Dec 2015 19:29:29 +0200 (IST) Received: from HOME-C4E4A596F7 ([84.94.185.246]) by mtaout25.012.net.il (HyperSendmail v2007.08) with ESMTPA id <0NYW00AIEB955OF0@mtaout25.012.net.il>; Sat, 05 Dec 2015 19:29:29 +0200 (IST) Date: Sat, 05 Dec 2015 19:32:17 +0200 From: Eli Zaretskii Subject: Re: bug#22090: Isearch is sluggish and eventually refuses further service with "[Too many words]".
In-reply-to: X-012-Sender: halo1@inter.net.il To: bruce.connor.am@gmail.com Message-id: <8337vgg5su.fsf@gnu.org> MIME-version: 1.0 Content-type: text/plain; charset=utf-8 Content-transfer-encoding: 8BIT References: <20151204192126.73199.qmail@mail.muc.de> <20151204230000.GC6070@acm.fritz.box> X-Spam-Score: 0.9 (/) X-Debbugs-Envelope-To: 22090 Cc: 22090@debbugs.gnu.org, acm@muc.de X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list Reply-To: Eli Zaretskii List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 0.9 (/) > Date: Sat, 5 Dec 2015 17:23:53 +0000 > From: Artur Malabarba > Cc: 22090@debbugs.gnu.org > > nn2015-12-04 23:00 GMT+00:00 Alan Mackenzie : > >> When case-fold-search is on the previous code would simply join these > >> regexps with "\\(\\(a[´`]?\\|[áà𝑎]\\)\\|\\(A[`´]?\\|[ÁÀ]\\)\\)". > > > > Quick question: _why_ do you need to join them? Given that > > case-fold-search is enabled, couldn't you just use, say, the lower case > > version? > > Because there are some characters in each regexp that don't have > lower/upper-case equivalents. For instance, if I use the > "\\(\\(a[´`]?\\|[áà𝑎]\\)" regexp, that's enough to match A or À, but > it's not enough to match a variety of other chars (𝔸𝕬𝖠𝗔𝘈𝘼𝙰🄰). You don't need to match the latter set. Character folding is applied _after_ case folding, not before. So characters that don't have a lower-case variant simply shouldn't match a lower-case a -- and they won't, if you just let case-insensitive regexp matching do its job.
From debbugs-submit-bounces@debbugs.gnu.org Sat Dec 05 13:13:14 2015 Received: (at 22090) by debbugs.gnu.org; 5 Dec 2015 18:13:14 +0000 Received: from localhost ([127.0.0.1]:40057 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a5HKb-0001OU-IQ for submit@debbugs.gnu.org; Sat, 05 Dec 2015 13:13:13 -0500 Received: from mail-lb0-f179.google.com ([209.85.217.179]:36545) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a5HKH-0001Nj-Mx for 22090@debbugs.gnu.org; Sat, 05 Dec 2015 13:13:12 -0500 Received: by lbblt2 with SMTP id lt2so36843426lbb.3 for <22090@debbugs.gnu.org>; Sat, 05 Dec 2015 10:12:53 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:reply-to:sender:in-reply-to:references:date:message-id :subject:from:to:cc:content-type:content-transfer-encoding; bh=Cy7a9+V7v3bQFIUzzE7ru3/zaMgUrqFmB5mOcSmWlyQ=; b=G+PJgj4TwI8ckgR/Ccyv0ZfcwyQKLPiBbF8g7kipnmXbeMlogokDdlG5l4stl3Je1m 6l1FuR4f1HJyD5yljHc1hVFURH0VtSCP1EySGoTMY/60NJ9AY7lM17E2Xrb9HTRpBs3e bDNWiQuq6cbbU4G+BuxVFpuowmcRsxIoAiZKvgezPuKbPA2xDQKmh+ZdFRecslUGiIUN fSywynoHx92UPdgwFDZBzBWiBGQquiLbfGLQxm95XMKSmnfM9ck1UAvalFLmaea6LVGT 8KOvaM4mVHa5h80J71d8AqruQs+0JZO3r1+jLSfL+ads0snz+lGV9mnWME6Bek76DDx2 HPtA== MIME-Version: 1.0 X-Received: by 10.112.242.167 with SMTP id wr7mr9191970lbc.69.1449339172867; Sat, 05 Dec 2015 10:12:52 -0800 (PST) Received: by 10.112.202.99 with HTTP; Sat, 5 Dec 2015 10:12:52 -0800 (PST) In-Reply-To: <8337vgg5su.fsf@gnu.org> References: <20151204192126.73199.qmail@mail.muc.de> <20151204230000.GC6070@acm.fritz.box> <8337vgg5su.fsf@gnu.org> Date: Sat, 5 Dec 2015 18:12:52 +0000 X-Google-Sender-Auth: ch6xjQlcxoaYXSgtpVZ0OKXI2bM Message-ID: Subject: Re: bug#22090: Isearch is sluggish and eventually refuses further service with "[Too many words]". 
From: Artur Malabarba To: Eli Zaretskii Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-Spam-Score: -0.7 (/) X-Debbugs-Envelope-To: 22090 Cc: 22090@debbugs.gnu.org, Alan Mackenzie X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list Reply-To: bruce.connor.am@gmail.com List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.7 (/) 2015-12-05 17:32 GMT+00:00 Eli Zaretskii : >> Because there are some characters in each regexp that don't have >> lower/upper-case equivalents. For instance, if I use the >> "\\(\\(a[´`]?\\|[áà𝑎]\\)" regexp, that's enough to match A or À, but >> it's not enough to match a variety of other chars (𝔸𝕬𝖠𝗔𝘈𝘼𝙰🄰). > > You don't need to match the latter set. Character folding is applied > _after_ case folding, not before. So characters that don't have a > lower-case variant simply shouldn't match a lower-case a -- and they > won't, if you just let case-insensitive regexp matching do its job. Given that char-folding is a new feature, how it combines with case-folding is entirely up to us, and I have really no idea what would be TRT. However, if that is your opinion, I'm more than happy to accept that the current situation ('a' doesn't match '𝔸𝕬𝖠𝗔𝘈𝘼𝙰🄰') is TRT, given that it has the simplest implementation.
:-) From debbugs-submit-bounces@debbugs.gnu.org Sat Dec 05 13:34:47 2015 Received: (at 22090) by debbugs.gnu.org; 5 Dec 2015 18:34:47 +0000 Received: from localhost ([127.0.0.1]:40072 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a5HfT-0001vX-34 for submit@debbugs.gnu.org; Sat, 05 Dec 2015 13:34:47 -0500 Received: from mtaout29.012.net.il ([80.179.55.185]:58246) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a5HfQ-0001vO-5H for 22090@debbugs.gnu.org; Sat, 05 Dec 2015 13:34:45 -0500 Received: from conversion-daemon.mtaout29.012.net.il by mtaout29.012.net.il (HyperSendmail v2007.08) id <0NYW00B00E7O7R00@mtaout29.012.net.il> for 22090@debbugs.gnu.org; Sat, 05 Dec 2015 20:34:37 +0200 (IST) Received: from HOME-C4E4A596F7 ([84.94.185.246]) by mtaout29.012.net.il (HyperSendmail v2007.08) with ESMTPA id <0NYW009M2E9OCN20@mtaout29.012.net.il>; Sat, 05 Dec 2015 20:34:37 +0200 (IST) Date: Sat, 05 Dec 2015 20:34:33 +0200 From: Eli Zaretskii Subject: Re: bug#22090: Isearch is sluggish and eventually refuses further service with "[Too many words]". 
In-reply-to: X-012-Sender: halo1@inter.net.il To: bruce.connor.am@gmail.com Message-id: <831tb0g2x2.fsf@gnu.org> MIME-version: 1.0 Content-type: text/plain; charset=utf-8 Content-transfer-encoding: 8BIT References: <20151204192126.73199.qmail@mail.muc.de> <20151204230000.GC6070@acm.fritz.box> <8337vgg5su.fsf@gnu.org> X-Spam-Score: 0.9 (/) X-Debbugs-Envelope-To: 22090 Cc: 22090@debbugs.gnu.org, acm@muc.de X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list Reply-To: Eli Zaretskii List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 0.9 (/) > Date: Sat, 5 Dec 2015 18:12:52 +0000 > From: Artur Malabarba > Cc: Alan Mackenzie , 22090@debbugs.gnu.org > > 2015-12-05 17:32 GMT+00:00 Eli Zaretskii : > >> Because there are some characters in each regexp that don't have > >> lower/upper-case equivalents. For instance, if I use the > >> "\\(\\(a[´`]?\\|[áà𝑎]\\)" regexp, that's enough to match A or À, but > >> it's not enough to match a variety of other chars (𝔸𝕬𝖠𝗔𝘈𝘼𝙰🄰). > > > > You don't need to match the latter set. Character folding is applied > > _after_ case folding, not before. So characters that don't have a > > lower-case variant simply shouldn't match a lower-case a -- and they > > won't, if you just let case-insensitive regexp matching do its job. > > Given that char-folding is a new feature, how it combines with > case-folding is entirely up to us, and I have really no idea what > would be TRT. I don't think there's any reasonable alternative, because for characters that have a decomposition, you wouldn't downcase the result of the decomposition, would you? The Unicode Standard also says this much (p.158): In principle, normalization needs to be done after case folding, because case folding does not preserve the normalized form of strings in all instances.
(There are a couple of examples there showing why the reverse order could cause incorrect results.) So if this is true for normalization, it should also be true for the case in point. > However, if that is your opinion, I'm more than happy to accept that > the current situation ('a' doesn't match '𝔸𝕬𝖠𝗔𝘈𝘼𝙰🄰') is TRT, > given that it has the simplest implementation. :-) Yes, I think it's TRT. From debbugs-submit-bounces@debbugs.gnu.org Sat Dec 05 13:50:35 2015 Received: (at 22090) by debbugs.gnu.org; 5 Dec 2015 18:50:35 +0000 Received: from localhost ([127.0.0.1]:40076 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a5Huk-0002LF-TL for submit@debbugs.gnu.org; Sat, 05 Dec 2015 13:50:35 -0500 Received: from mail.muc.de ([193.149.48.3]:56514) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a5HuQ-0002Ki-J1 for 22090@debbugs.gnu.org; Sat, 05 Dec 2015 13:50:33 -0500 Received: (qmail 6765 invoked by uid 3782); 5 Dec 2015 18:50:13 -0000 Received: from acm.muc.de (p548A4450.dip0.t-ipconnect.de [84.138.68.80]) by colin.muc.de (tmda-ofmipd) with ESMTP; Sat, 05 Dec 2015 19:50:12 +0100 Received: (qmail 4907 invoked by uid 1000); 5 Dec 2015 18:52:20 -0000 Date: Sat, 5 Dec 2015 18:52:20 +0000 To: Artur Malabarba Subject: Re: bug#22090: Isearch is sluggish and eventually refuses further service with "[Too many words]".
Message-ID: <20151205185220.GF2698@acm.fritz.box> References: <20151204192126.73199.qmail@mail.muc.de> <20151204230000.GC6070@acm.fritz.box> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: User-Agent: Mutt/1.5.23 (2014-03-12) X-Delivery-Agent: TMDA/1.1.12 (Macallan) From: Alan Mackenzie X-Primary-Address: acm@muc.de X-Spam-Score: -0.0 (/) X-Debbugs-Envelope-To: 22090 Cc: 22090@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.0 (/) Hello, Artur. On Sat, Dec 05, 2015 at 05:23:53PM +0000, Artur Malabarba wrote: > nn2015-12-04 23:00 GMT+00:00 Alan Mackenzie : > >> When case-fold-search is on the previous code would simply join these > >> regexps with "\\(\\(a[´`]?\\|[áà𝑎]\\)\\|\\(A[`´]?\\|[ÁÀ]\\)\\)". > > Quick question: _why_ do you need to join them? Given that > > case-fold-search is enabled, couldn't you just use, say, the lower case > > version? > Because there are some characters in each regexp that don't have > lower/upper-case equivalents. For instance, if I use the > "\\(\\(a[´`]?\\|[áà𝑎]\\)" regexp, that's enough to match A or À, but > it's not enough to match a variety of other chars (𝔸𝕬𝖠𝗔𝘈𝘼𝙰🄰). OK, thanks. > > it looks to me that this redundancy would > > be quite easy to eliminate - you just need three regexp fragments for > > the letter "a" - a lower case one, an upper case one and a > > case-fold-search one. > Yes, we could go that route. It's just going to add complexity to the > code that generates the char-fold-table (which is already quite dense) > and I wonder if it's worth such a corner-case. Like I said, 'a' > already matches A and À, how much do we want to support this extra > case-folding?
But it seems the complexity (and it can't honestly be that much, surely?) is intrinsic to the task being carried out. Sticking a "\\|" between the upper case and lower case versions clearly doesn't work. Seriously, how difficult can it be to generate "\\([Aa][´`]?\\|[áà𝑎ÁÀ]\\)" , which is a blameless regexp, given where you've already got to? > > The other thing is that for that single character "a" a 39 character > > regexp fragment is being generated. Might this have something to do > > with the "[Too many words]" error I got last night (which comes from the > > regexp engine returning a "too long regexp" error)? > yes I was afraid of that. > > Even if you can reduce that to, say 19 characters, that's only winning a > > factor of 2 in the slide towards a too long regexp. It might well be > > that for a very long regexp, you might have to divide it into shorter > > sections (a typical long RE will be a sequence of sub expressions, > > rather than lots of alternatives inside \(...\|........\)). > I don't understand what you mean. Could you elaborate? Once you've generated the long regexp, if it's too long, you can split it up into, say, 3 pieces A, B, C, such that (equal re (concat A B C)). Then you can do something like: (and (search-forward-regexp A bound noerror) (search-forward-regexp (concat "\\=" B) bound noerror) (search-forward-regexp (concat "\\=" C) bound noerror)) . Though, thinking about it, it might be less painful to enhance the regexp engine to take longer regexps. -- Alan Mackenzie (Nuremberg, Germany).
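Editorial sketch of the piecewise search proposed in the message above. The function name my-search-forward-pieces is hypothetical and not part of Emacs, and the BOUND argument from the original snippet is left out for brevity. Unlike the plain (and ...) form, this version retries from later positions when a later piece fails, and it restores point when nothing matches. It is still not equivalent to searching for (concat A B C) as one regexp: once the first piece has matched, the matcher never backtracks into it to try a shorter or longer match, so the split points would have to be chosen with care.

```elisp
;; Hypothetical helper, not part of Emacs.
(defun my-search-forward-pieces (pieces)
  "Search forward for the regexps in PIECES matching one after another.
PIECES is a non-empty list of regexps.  Return the position after
the last piece and leave point there, or return nil and leave
point unchanged if there is no match."
  (let ((origin (point))
        (found nil)
        (done nil))
    (while (and (not found) (not done))
      (if (not (re-search-forward (car pieces) nil t))
          (setq done t)
        (let ((start (match-beginning 0))
              (ok t))
          ;; Each remaining piece must match exactly where the
          ;; previous one ended, like the "\\=" anchor in the thread.
          (dolist (piece (cdr pieces))
            (when ok
              (if (looking-at piece)
                  (goto-char (match-end 0))
                (setq ok nil))))
          (cond (ok (setq found (point)))
                ;; Retry one character past the failed attempt.
                ((< start (point-max)) (goto-char (1+ start)))
                (t (setq done t))))))
    (unless found (goto-char origin))
    found))
```

A call such as (my-search-forward-pieces (list A B C)) would stand in for a single re-search-forward on the concatenated regexp, subject to the backtracking caveat above.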
From debbugs-submit-bounces@debbugs.gnu.org Sun Dec 06 07:50:29 2015 Received: (at 22090) by debbugs.gnu.org; 6 Dec 2015 12:50:29 +0000 Received: from localhost ([127.0.0.1]:40349 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a5Ylo-00068f-9Z for submit@debbugs.gnu.org; Sun, 06 Dec 2015 07:50:28 -0500 Received: from mail-lf0-f43.google.com ([209.85.215.43]:33355) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1a5Yll-00068X-Tf for 22090@debbugs.gnu.org; Sun, 06 Dec 2015 07:50:26 -0500 Received: by lfaz4 with SMTP id z4so137276173lfa.0 for <22090@debbugs.gnu.org>; Sun, 06 Dec 2015 04:50:24 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:reply-to:sender:in-reply-to:references:date:message-id :subject:from:to:cc:content-type:content-transfer-encoding; bh=vKj5wmpo5/yQiPlV0WSgZaeIhOTaDC8KDq7sgXOgUUE=; b=mlpKx9beVnP7ulM2txzY42d6gDQdIV+x8PthTU+BEP1RZIxlwtmWfYmyU4r9LaGAo+ oLOn0ku7v/adz0XcxAIabojTpo57JRLDAiXgx9wKIAoUHJ8lvszLiyhqtd/OIMOrOf70 DJAREKoc1mv9QidOsZPvI4NEF4lR3wARA82xbjP11cjXReyWm3JZEcyoDGffF++sqCeh 4/xMQvCpnucj2cIYwB5XvwJ/+02IjmMQibtMTVZwdlgogc9ZsDnOqqIG444uXPttShgp 9Px5n+xisjsvqH9B1xuXi2VVNxqdRvVsv6KJ+BdyMnGznceHwfdGfj+O56vWK2a5+Uc0 t5NQ== MIME-Version: 1.0 X-Received: by 10.25.137.7 with SMTP id l7mr9721635lfd.63.1449406224811; Sun, 06 Dec 2015 04:50:24 -0800 (PST) Received: by 10.112.202.99 with HTTP; Sun, 6 Dec 2015 04:50:24 -0800 (PST) In-Reply-To: <20151205185220.GF2698@acm.fritz.box> References: <20151204192126.73199.qmail@mail.muc.de> <20151204230000.GC6070@acm.fritz.box> <20151205185220.GF2698@acm.fritz.box> Date: Sun, 6 Dec 2015 12:50:24 +0000 X-Google-Sender-Auth: _dLy06VRMD5XfriLk-ZEstpSk2M Message-ID: Subject: Re: bug#22090: Isearch is sluggish and eventually refuses further service with "[Too many words]". 
From: Artur Malabarba To: Alan Mackenzie Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-Spam-Score: -0.7 (/) X-Debbugs-Envelope-To: 22090 Cc: 22090@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list Reply-To: bruce.connor.am@gmail.com List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.7 (/) 2015-12-05 18:52 GMT+00:00 Alan Mackenzie : > But it seems the complexity (and it can't honestly be that much, > surely?) is intrinsic to the task being carried out. Sticking a "\\|" > between the upper case and lower case versions clearly doesn't work. > > Seriously, how difficult can it be to generate > > "\\([Aa][´`]?\\|[áà𝑎ÁÀ]\\)" > > , which is a blameless regexp, given where you've already got to? Oh. I see. I thought you were talking about mutually exclusive regexps. Indeed a regexp like that would be trivial to generate. But is it really blameless? I mean, if "\\(A\\|a\\)" can lead to extremely slow searches, doesn't the same happen with "[Aa]"? Anyway, at this point I'm just asking for future knowledge/reference. According to Eli, the current implementation is in accordance with the Unicode Standard. So it's probably best to keep it this way at least for the first release of the feature. > Once you've generated the long regexp, if it's too long, you can split > it up into, say, 3 pieces A, B, C, such that (equal re (concat A B C)). > > Then you can do something like: > > (and (search-forward-regexp A bound noerror) > (search-forward-regexp (concat "\\=" B) bound noerror) > (search-forward-regexp (concat "\\=" C) bound noerror)) > > . Though, thinking about it, it might be less painful to enhance the > regexp engine to take longer regexps. Besides.
Char-folding is supposed to turn strings into regexps usable anywhere, and this wouldn't work with that. I've added a clause to the function so that it won't do any charfolding if the resulting regexp would be longer than 5k chars (instead it will just regexp-quote). That will at least prevent the too-many words error in isearch. (I already had this clause in there before, but it was using 10k, which apparently is not enough). From unknown Sun Jun 22 07:29:24 2025 Received: (at fakecontrol) by fakecontrolmessage; To: internal_control@debbugs.gnu.org From: Debbugs Internal Request Subject: Internal Control Message-Id: bug archived. Date: Mon, 04 Jan 2016 12:24:04 +0000 User-Agent: Fakemail v42.6.9 # This is a fake control message. # # The action: # bug archived. thanks # This fakemail brought to you by your local debbugs # administrator
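Editorial sketch of the length cap described in the final message of the thread: fold each character into a regexp fragment, and fall back to regexp-quote when the folded result would be too long. The names my-fold-max-length, my-toy-fold-char and my-fold-to-regexp are hypothetical, the toy folding function only knows about the letter "a", and the value 5000 is taken from the "5k chars" figure in the message. This is not the actual Emacs implementation.

```elisp
;; Hypothetical names throughout; not the code that went into Emacs.
(defvar my-fold-max-length 5000
  "Longest folded regexp to accept, per the \"5k chars\" in the thread.")

(defun my-toy-fold-char (char)
  "Toy stand-in for a char-fold table lookup; it only folds ?a."
  (if (eq char ?a)
      "\\(?:a[`']?\\)"
    (regexp-quote (string char))))

(defun my-fold-to-regexp (string fold-char)
  "Return a regexp for STRING built by applying FOLD-CHAR to each character.
FOLD-CHAR maps a character to a regexp fragment.  If the result
would be longer than `my-fold-max-length', skip folding and
return (regexp-quote STRING) instead."
  (let ((folded (mapconcat fold-char string "")))
    (if (> (length folded) my-fold-max-length)
        (regexp-quote string)
      folded)))
```

The fallback gives up folding for the whole string rather than for part of it, which keeps the result a single ordinary regexp that can be used anywhere a regexp is expected.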