#22090 - Isearch is sluggish and eventually refuses further service with "[Too many words]".

GNU bug report logs - #22090
Isearch is sluggish and eventually refuses further service with "[Too many words]".

Package: emacs;

Reported by: Alan Mackenzie <acm <at> muc.de>

Date: Fri, 4 Dec 2015 04:26:01 UTC

Severity: normal

Done: Alan Mackenzie <acm <at> muc.de>

Bug is archived. No further changes may be made.


From: Alan Mackenzie <acm <at> muc.de>
To: Artur Malabarba <bruce.connor.am <at> gmail.com>
Cc: 22090 <at> debbugs.gnu.org
Subject: bug#22090: Isearch is sluggish and eventually refuses further service with "[Too many words]".
Date: Sat, 5 Dec 2015 18:52:20 +0000

Hello, Artur.

On Sat, Dec 05, 2015 at 05:23:53PM +0000, Artur Malabarba wrote:
> 2015-12-04 23:00 GMT+00:00 Alan Mackenzie <acm <at> muc.de>:
> >> When case-fold-search is on the previous code would simply join these
> >> regexps with "\\(\\(a[´`]?\\|[áà𝑎]\\)\\|\\(A[`´]?\\|[ÁÀ]\\)\\)".

> > Quick question: _why_ do you need to join them?  Given that
> > case-fold-search is enabled, couldn't you just use, say, the lower case
> > version?

> Because there are some characters in each regexp that don't have
> lower/upper-case equivalents. For instance, if I use the
> "\\(\\(a[´`]?\\|[áà𝑎]\\)" regexp, that's enough to match A or À, but
> it's not enough to match a variety of other chars (𝔸𝕬𝖠𝗔𝘈𝘼𝙰🄰).

OK, thanks.

> > it looks to me that this redundancy would
> > be quite easy to eliminate - you just need three regexp fragments for
> > the letter "a" - a lower case one, an upper case one and a
> > case-fold-search one.

> Yes, we could go that route. It's just going to add complexity to the
> code that generates the char-fold-table (which is already quite dense)
> and I wonder if it's worth such a corner-case. Like I said, 'a'
> already matches A and À, how much do we want to support this extra
> case-folding?

But it seems the complexity (and it can't honestly be that much, surely?)
is intrinsic to the task being carried out.  Sticking a "\\|" between the
upper case and lower case versions clearly doesn't work.  Seriously, how
difficult can it be to generate

    "\\([Aa][´`]?\\|[áà𝑎ÁÀ]\\)"

, which is a blameless regexp, given where you've already got to?

> > The other thing is that for that single character "a" a 39 character
> > regexp fragment is being generated.  Might this have something to do
> > with the "[Too many words]" error I got last night (which comes from the
> > regexp engine returning a "too long regexp" error)?

> yes I was afraid of that.

> > Even if you can reduce that to, say 19 characters, that's only winning a
> > factor of 2 in the slide towards a too long regexp.  It might well be
> > that for a very long regexp, you might have to divide it into shorter
> > sections (a typical long RE will be a sequence of sub expressions,
> > rather than lots of alternatives inside \(...\|........\)).

> I don't understand what you mean. Could you elaborate?

Once you've generated the long regexp, if it's too long, you can split it
up into, say, 3 pieces A, B, C, such that (equal re (concat A B C)).  Then
you can do something like:

    (and (search-forward-regexp A bound noerror)
         (search-forward-regexp (concat "\\=" B) bound noerror)
         (search-forward-regexp (concat "\\=" C) bound noerror))

.  Though, thinking about it, it might be less painful to enhance the
regexp engine to take longer regexps.

--
Alan Mackenzie (Nuremberg, Germany).
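
Editorial sketch (not part of the original message): the piecewise search proposed above can be fleshed out roughly as follows. Everything here is an assumption made for illustration. The function name `my-search-forward-regexp-pieces' is hypothetical and exists nowhere in Emacs; the retry loop and the shy-group wrapper are additions that the message does not mention. `re-search-forward' is the primary name of `search-forward-regexp'. The wrapper "\\(?:...\\)" is there because "\\|" binds more loosely than "\\=", so a bare (concat "\\=" B) would anchor only the first alternative of a piece containing a top-level "\\|".

```elisp
;;; -*- lexical-binding: t -*-
;; Hypothetical helper, not part of Emacs.  It treats the list PIECES
;; as one long regexp that has been cut into independently valid parts.
(defun my-search-forward-regexp-pieces (pieces &optional bound)
  "Search forward for the regexps in PIECES matched back to back.
On success, leave point at the end of the whole match and return it.
On failure, leave point where it started and return nil."
  (let ((start (point))
        (limit (or bound (point-max)))
        (found nil)
        (done nil))
    (while (not (or found done))
      (if (not (re-search-forward (car pieces) limit t))
          (setq done t)                 ; first piece occurs nowhere else
        (let ((first-beg (match-beginning 0))
              (ok t))
          ;; Each later piece must match exactly at point ("\\=").
          (dolist (piece (cdr pieces))
            (setq ok (and ok
                          (re-search-forward
                           (concat "\\=\\(?:" piece "\\)") limit t))))
          (cond (ok (setq found (point)))
                ((>= first-beg limit) (setq done t))
                ;; Assumed retry policy: restart one char further on.
                (t (goto-char (1+ first-beg)))))))
    (unless found (goto-char start))
    found))
```

For example, (my-search-forward-regexp-pieces '("foo" "[0-9]+" "bar")) would find "foo42bar". This is not equivalent to searching for (concat A B C) in general: there is no backtracking across piece boundaries, so a greedy earlier piece can consume text that a later piece needed; the cut points must not fall inside a group or character class; back-references across pieces stop working; and afterwards the match data describes only the last piece.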

This bug report was last modified 9 years and 220 days ago.

