From unknown Tue Jun 24 01:42:16 2025
Subject: bug#13084: boyer_moore crashes with certain characters in the case table
From: Juri Linkov
To: 13084@debbugs.gnu.org
Date: Wed, 05 Dec 2012 02:34:39 +0200
Message-ID: <87txs1l4kg.fsf@mail.jurta.org>

The minimal reproducible recipe for crashes in boyer_moore
noticed in bug#13041:

1. emacs -Q

2. Eval in *scratch*:

(let ((table (standard-case-table)) canon)
  (setq canon (copy-sequence table))
  (aset canon #xff59 ?y)
  (set-char-table-extra-slot table 1 canon)
  (set-char-table-extra-slot table 2 nil)
  (set-standard-case-table table))

3. Start an activity that includes a search, e.g. `C-x 8 RET TAB'

The crash in boyer_moore is caused by fullwidth characters like #xff59
whose Unicode properties are:

  name: FULLWIDTH LATIN SMALL LETTER Y
  decomposition: (wide 121) (wide 'y')

However, the crash doesn't occur when the same fullwidth characters
are set to their downcase counterparts in lisp/international/characters.el:

  ;; Fullwidth Latin
  (setq c #xff21)
  (while (<= c #xff3a)
    (set-case-syntax-pair c (+ c #x20) tbl)
    (modify-category-entry c ?l)
    (modify-category-entry (+ c #x20) ?l)
    (setq c (1+ c)))

From unknown Tue Jun 24 01:42:16 2025
Subject: bug#13084: boyer_moore crashes with certain characters in the case table
From: Eli Zaretskii
To: Juri Linkov , Kenichi Handa
Cc: 13084@debbugs.gnu.org
Date: Tue, 11 Dec 2012 17:37:09 +0200
In-reply-to: <87txs1l4kg.fsf@mail.jurta.org>
Message-id: <831uewa9cq.fsf@gnu.org>

> From: Juri Linkov
> Date: Wed, 05 Dec 2012 02:34:39 +0200
>
> The minimal reproducible recipe for crashes in boyer_moore noticed in bug#13041:
>
> 1. emacs -Q
>
> 2. Eval in *scratch*:
>
> (let ((table (standard-case-table)) canon)
>   (setq canon (copy-sequence table))
>   (aset canon #xff59 ?y)
>   (set-char-table-extra-slot table 1 canon)
>   (set-char-table-extra-slot table 2 nil)
>   (set-standard-case-table table))
>
> 3. Start an activity that includes a search, e.g. `C-x 8 RET TAB'

Thanks. I think I fixed this (revision 111021 on the emacs-24 branch),
please test.

In addition, I'd suggest that Handa-san (or someone else) takes a good
look at the code that sets up the simple_translate table in
boyer_moore, because the constants there, like 0200 and 0x3F, and all
the talk about characters that belong "to the same charset and row"
smell of pre-Unicode (a.k.a. "MULE") representation of characters.

For now, I disabled boyer_moore for unibyte characters beyond 160,
because my reading of the code is that simple_translate and the
supporting code cannot handle that. Maybe I'm wrong.

From unknown Tue Jun 24 01:42:16 2025
Subject: bug#13084: boyer_moore crashes with certain characters in the case table
From: Juri Linkov
To: Eli Zaretskii
Cc: Kenichi Handa , 13084@debbugs.gnu.org
Date: Wed, 12 Dec 2012 01:17:04 +0200
In-Reply-To: <831uewa9cq.fsf@gnu.org> (Eli Zaretskii's message of "Tue, 11 Dec 2012 17:37:09 +0200")
Message-ID: <87txrsw55b.fsf@mail.jurta.org>

> I think I fixed this (revision 111021 on the emacs-24 branch),
> please test.

Thanks, there are no more crashes when using code from
http://debbugs.gnu.org/13041#41

Does this mean there are no more obstacles to filling a translation table
for ignoring equivalence with all character mappings according to the
`decomposition' property? This would be the first step in this direction.

From unknown Tue Jun 24 01:42:16 2025
Subject: bug#13084: boyer_moore crashes with certain characters in the case table
From: Eli Zaretskii
To: Juri Linkov
Cc: handa@gnu.org, 13084@debbugs.gnu.org
Date: Wed, 12 Dec 2012 05:55:04 +0200
In-reply-to: <87txrsw55b.fsf@mail.jurta.org>
Message-id: <83lid39b6v.fsf@gnu.org>

> From: Juri Linkov
> Cc: Kenichi Handa , 13084@debbugs.gnu.org
> Date: Wed, 12 Dec 2012 01:17:04 +0200
>
> > I think I fixed this (revision 111021 on the emacs-24 branch),
> > please test.
>
> Thanks, there are no more crashes when using code from
> http://debbugs.gnu.org/13041#41
>
> Does this mean there are no more obstacles to filling a translation table
> for ignoring equivalence with all character mappings according to the
> `decomposition' property? This would be the first step in this direction.

I'm not sure I understand what you are asking. Please show more details.

From unknown Tue Jun 24 01:42:16 2025
Subject: bug#13084: boyer_moore crashes with certain characters in the case table
From: Juri Linkov
To: Eli Zaretskii
Cc: handa@gnu.org, 13084@debbugs.gnu.org
Date: Wed, 12 Dec 2012 11:27:50 +0200
In-Reply-To: <83lid39b6v.fsf@gnu.org> (Eli Zaretskii's message of "Wed, 12 Dec 2012 05:55:04 +0200")
Message-ID: <871uevsknt.fsf@mail.jurta.org>

>> Does this mean there are no more obstacles to filling a translation table
>> for ignoring equivalence with all character mappings according to the
>> `decomposition' property? This would be the first step in this direction.
>
> I'm not sure I understand what you are asking. Please show more details.

There is confusion with the word `equivalence'. Currently there
exists the case equivalence table in the case table (`case_eqv_table').
Implementing a diacritic search in bug#13041 requires adding a new
similar table. I don't know what would be a good name:
`decomposition_eqv_table' or `normalization_eqv_table' or something better.

I'm unfamiliar with the details of `search_buffer', but in principle
using two tables in the macro `TRANSLATE' could implement a diacritic
search where at the first step the character will be translated using
`decomposition_eqv_table', and after that the resulting character
will be translated using `case_eqv_table'.

So the dataflow to get the canonical character will be Á -> A -> a.
If `case-fold-search' is nil, then Á -> A. If a new variable
`decomposition-search' (or `normalized-search') is nil then Á -> á.

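The two-step dataflow proposed here can be sketched in C roughly as
follows. This is only an illustration under stated assumptions: the
functions decomposition_eqv, case_eqv and translate are hypothetical
stand-ins that cover a few Latin-1 code points, not the real Emacs
tables or the TRANSLATE macro from search.c.

```c
/* Hypothetical stand-ins for the tables discussed in the thread; they
   cover only a few Latin-1 code points and are not Emacs data structures.  */

/* Stage 1: map a character to its base letter (strip the diacritic).  */
static int decomposition_eqv(int c)
{
    switch (c) {
    case 0xC1: return 'A';   /* Á -> A */
    case 0xE1: return 'a';   /* á -> a */
    default:   return c;
    }
}

/* Stage 2: map a character to its lower-case form.  */
static int case_eqv(int c)
{
    if (c >= 'A' && c <= 'Z')
        return c + ('a' - 'A');
    if (c == 0xC1)
        return 0xE1;         /* Á -> á */
    return c;
}

/* Apply the enabled stages in order: decomposition first, then case.
   The two flags play the role of the proposed `decomposition-search'
   variable and of `case-fold-search'.  */
static int translate(int c, int decomposition_search, int case_fold_search)
{
    if (decomposition_search)
        c = decomposition_eqv(c);
    if (case_fold_search)
        c = case_eqv(c);
    return c;
}
```

With both flags set, translate(0xC1, 1, 1) follows Á -> A -> a; with
case folding off it stops at A; with decomposition off it gives á (0xE1).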
From unknown Tue Jun 24 01:42:16 2025
Subject: bug#13084: boyer_moore crashes with certain characters in the case table
From: martin rudalics
To: Juri Linkov
Cc: Eli Zaretskii , 13084@debbugs.gnu.org
Date: Wed, 12 Dec 2012 11:21:36 +0100
In-Reply-To: <871uevsknt.fsf@mail.jurta.org>
Message-ID: <50C85AB0.30009@gmx.at>

> So the dataflow to get the canonical character will be Á -> A -> a.
> If `case-fold-search' is nil, then Á -> A. If a new variable
> `decomposition-search' (or `normalized-search') is nil then Á -> á.

Any such table should allow handling asymmetric searches: That is,
searching for "ába" should match "ába" "ábà" and "ábá" but not "aba" or
"àbá". Can we do that?

martin

From unknown Tue Jun 24 01:42:16 2025
Subject: bug#13084: boyer_moore crashes with certain characters in the case table
From: Juri Linkov
To: martin rudalics
Cc: Eli Zaretskii , 13084@debbugs.gnu.org
Date: Wed, 12 Dec 2012 12:31:57 +0200
In-Reply-To: <50C85AB0.30009@gmx.at> (martin rudalics's message of "Wed, 12 Dec 2012 11:21:36 +0100")
Message-ID: <87y5h3mu9i.fsf@mail.jurta.org>

>> So the dataflow to get the canonical character will be Á -> A -> a.
>> If `case-fold-search' is nil, then Á -> A. If a new variable
>> `decomposition-search' (or `normalized-search') is nil then Á -> á.
>
> Any such table should allow handling asymmetric searches: That is,
> searching for "ába" should match "ába" "ábà" and "ábá" but not "aba" or
> "àbá". Can we do that?

IIUC what you mean is something like `search-upper-case'
where upper case chars disable case fold searching,
so "Aba" should match "Aba" and "AbA" but not "aba".

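One possible reading of the asymmetric search described here, sketched
in C: a pattern character that carries a diacritic must match exactly,
while a plain pattern character matches any text character with the
same base letter. The helper names (base_letter, char_matches,
asymmetric_match) and the tiny Latin-1 mapping are assumptions made for
this illustration; nothing here is taken from search.c.

```c
#include <stddef.h>

/* Hypothetical base-letter lookup for a handful of Latin-1 code points;
   anything else maps to itself.  Not an Emacs table.  */
static int base_letter(unsigned char c)
{
    switch (c) {
    case 0xE0: case 0xE1: return 'a';   /* à, á -> a */
    case 0xC0: case 0xC1: return 'A';   /* À, Á -> A */
    default:              return c;
    }
}

/* A pattern character with a diacritic must match exactly; a plain
   pattern character matches any text character whose base letter
   equals it.  */
static int char_matches(unsigned char pat, unsigned char txt)
{
    if (base_letter(pat) != pat)
        return pat == txt;
    return base_letter(txt) == pat;
}

/* Compare a pattern against a text of the same length (Latin-1 bytes).
   Returns 1 on a match, 0 otherwise.  */
static int asymmetric_match(const unsigned char *pat,
                            const unsigned char *txt, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (!char_matches(pat[i], txt[i]))
            return 0;
    return 1;
}
```

Under these assumptions the pattern "ába" (bytes 0xE1, 'b', 'a')
matches "ába", "ábà" and "ábá", and rejects "aba" and "àbá", which is
the behaviour asked for in the quoted question.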
From unknown Tue Jun 24 01:42:16 2025 X-Loop: help-debbugs@gnu.org Subject: bug#13084: boyer_moore crashes with certain characters in the case table Resent-From: martin rudalics Original-Sender: debbugs-submit-bounces@debbugs.gnu.org Resent-CC: bug-gnu-emacs@gnu.org Resent-Date: Wed, 12 Dec 2012 12:45:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 13084 X-GNU-PR-Package: emacs X-GNU-PR-Keywords: To: Juri Linkov Cc: Eli Zaretskii , 13084@debbugs.gnu.org Received: via spool by 13084-submit@debbugs.gnu.org id=B13084.13553162602520 (code B ref 13084); Wed, 12 Dec 2012 12:45:02 +0000 Received: (at 13084) by debbugs.gnu.org; 12 Dec 2012 12:44:20 +0000 Received: from localhost ([127.0.0.1]:38362 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1Tilff-0000ea-HG for submit@debbugs.gnu.org; Wed, 12 Dec 2012 07:44:19 -0500 Received: from mailout-de.gmx.net ([213.165.64.22]:36185) by debbugs.gnu.org with smtp (Exim 4.72) (envelope-from ) id 1Tilfc-0000eS-RY for 13084@debbugs.gnu.org; Wed, 12 Dec 2012 07:44:17 -0500 Received: (qmail invoked by alias); 12 Dec 2012 12:43:29 -0000 Received: from 62-47-36-179.adsl.highway.telekom.at (EHLO [62.47.36.179]) [62.47.36.179] by mail.gmx.net (mp020) with SMTP; 12 Dec 2012 13:43:29 +0100 X-Authenticated: #14592706 X-Provags-ID: V01U2FsdGVkX191nGQIRMV1BESh4jgGtEZJFkjonQYhgFnUVu622V NCMU2a6eDolepa Message-ID: <50C87BED.8080608@gmx.at> Date: Wed, 12 Dec 2012 13:43:25 +0100 From: martin rudalics MIME-Version: 1.0 References: <87txs1l4kg.fsf@mail.jurta.org> <831uewa9cq.fsf@gnu.org> <87txrsw55b.fsf@mail.jurta.org> <83lid39b6v.fsf@gnu.org> <871uevsknt.fsf@mail.jurta.org> <50C85AB0.30009@gmx.at> <87y5h3mu9i.fsf@mail.jurta.org> In-Reply-To: <87y5h3mu9i.fsf@mail.jurta.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Y-GMX-Trusted: 0 X-Spam-Score: 0.8 (/) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 
Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: 0.8 (/) > IIUC what you mean is something like `search-upper-case' > where upper case chars disable case fold searching, > so "Aba" should match "Aba" and "AbA" but not "aba". Yes. I think that's a very good explanation in Emacs terms. martin From unknown Tue Jun 24 01:42:16 2025 X-Loop: help-debbugs@gnu.org Subject: bug#13084: boyer_moore crashes with certain characters in the case table Resent-From: Eli Zaretskii Original-Sender: debbugs-submit-bounces@debbugs.gnu.org Resent-CC: bug-gnu-emacs@gnu.org Resent-Date: Wed, 12 Dec 2012 16:49:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 13084 X-GNU-PR-Package: emacs X-GNU-PR-Keywords: To: Juri Linkov Cc: handa@gnu.org, 13084@debbugs.gnu.org Reply-To: Eli Zaretskii Received: via spool by 13084-submit@debbugs.gnu.org id=B13084.135533092825640 (code B ref 13084); Wed, 12 Dec 2012 16:49:01 +0000 Received: (at 13084) by debbugs.gnu.org; 12 Dec 2012 16:48:48 +0000 Received: from localhost ([127.0.0.1]:39264 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1TipUF-0006fU-M3 for submit@debbugs.gnu.org; Wed, 12 Dec 2012 11:48:47 -0500 Received: from mtaout20.012.net.il ([80.179.55.166]:37058) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1TipUD-0006fM-Er for 13084@debbugs.gnu.org; Wed, 12 Dec 2012 11:48:46 -0500 Received: from conversion-daemon.a-mtaout20.012.net.il by a-mtaout20.012.net.il (HyperSendmail v2007.08) id <0MEX00H00FXG2Y00@a-mtaout20.012.net.il> for 13084@debbugs.gnu.org; Wed, 12 Dec 2012 18:47:19 +0200 (IST) Received: from HOME-C4E4A596F7 ([87.69.4.28]) by a-mtaout20.012.net.il (HyperSendmail v2007.08) with ESMTPA id <0MEX00HXJFYU2810@a-mtaout20.012.net.il>; Wed, 12 Dec 2012 18:47:19 +0200 (IST) Date: Wed, 12 Dec 
2012 18:47:16 +0200 From: Eli Zaretskii In-reply-to: <871uevsknt.fsf@mail.jurta.org> Message-id: <83a9tj8bfv.fsf@gnu.org> MIME-version: 1.0 Content-type: text/plain; charset=iso-8859-1 Content-transfer-encoding: QUOTED-PRINTABLE X-012-Sender: halo1@inter.net.il References: <87txs1l4kg.fsf@mail.jurta.org> <831uewa9cq.fsf@gnu.org> <87txrsw55b.fsf@mail.jurta.org> <83lid39b6v.fsf@gnu.org> <871uevsknt.fsf@mail.jurta.org> X-Spam-Score: 1.5 (+) X-Spam-Report: Spam detection software, running on the system "debbugs.gnu.org", has identified this incoming email as possible spam. The original message has been attached to this so you can view it (if it isn't spam) or label similar future email. If you have any questions, see the administrator of that system for details. Content preview: > From: Juri Linkov > Cc: handa@gnu.org, 13084@debbugs.gnu.org > Date: Wed, 12 Dec 2012 11:27:50 +0200 > > >> Does this mean there are no more obstacles to filling a translation table > >> for ignoring equivalence with all character mappings according to the > >> `decomposition' property? This would be the first step in this direction. > > > > I'm not sure I understand what you are asking. Please show more details. > > There is confusion with the word `equivalence'. Currently there > exists the case equivalence table in the case table (`case_eqv_table'). > Implementing a diacritic search in bug#13041 requires adding a new > similar table. I don't know what would be a good name: > `decomposition_eqv_table' or `normalization_eqv_table' or something better. > > I'm unfamiliar with the details of `search_buffer', but in principle > using two tables in the macro `TRANSLATE' could implement a diacritic > search where at the first step the character will be translated using > `decomposition_eqv_table', and after that the resulting character > will be translated using `case_eqv_table'. > > So the dataflow to get the canonical character will be =?UTF-8?Q?=C1?= -> A -> a. 
> If `case-fold-search' is nil, then =?UTF-8?Q?=C1?= -> A. If a new variable > `decomposition-search' (or `normalized-search') is nil then =?UTF-8?Q?=C1?= -> =?UTF-8?Q?=E1.?= [...] Content analysis details: (1.5 points, 10.0 required) pts rule name description ---- ---------------------- -------------------------------------------------- -0.0 RCVD_IN_DNSWL_NONE RBL: Sender listed at http://www.dnswl.org/, no trust [80.179.55.166 listed in list.dnswl.org] 0.7 SPF_SOFTFAIL SPF: sender does not match SPF record (softfail) 0.8 BAYES_50 BODY: Bayes spam probability is 40 to 60% [score: 0.5000] X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: 1.5 (+) X-Spam-Report: Spam detection software, running on the system "debbugs.gnu.org", has identified this incoming email as possible spam. The original message has been attached to this so you can view it (if it isn't spam) or label similar future email. If you have any questions, see the administrator of that system for details. Content preview: > From: Juri Linkov > Cc: handa@gnu.org, 13084@debbugs.gnu.org > Date: Wed, 12 Dec 2012 11:27:50 +0200 > > >> Does this mean there are no more obstacles to filling a translation table > >> for ignoring equivalence with all character mappings according to the > >> `decomposition' property? This would be the first step in this direction. > > > > I'm not sure I understand what you are asking. Please show more details. > > There is confusion with the word `equivalence'. Currently there > exists the case equivalence table in the case table (`case_eqv_table'). > Implementing a diacritic search in bug#13041 requires adding a new > similar table. I don't know what would be a good name: > `decomposition_eqv_table' or `normalization_eqv_table' or something better. 
> > I'm unfamiliar with the details of `search_buffer', but in principle > using two tables in the macro `TRANSLATE' could implement a diacritic > search where at the first step the character will be translated using > `decomposition_eqv_table', and after that the resulting character > will be translated using `case_eqv_table'. > > So the dataflow to get the canonical character will be =?UTF-8?Q?=C1?= -> A -> a. > If `case-fold-search' is nil, then =?UTF-8?Q?=C1?= -> A. If a new variable > `decomposition-search' (or `normalized-search') is nil then =?UTF-8?Q?=C1?= -> =?UTF-8?Q?=E1.?= [...] Content analysis details: (1.5 points, 10.0 required) pts rule name description ---- ---------------------- -------------------------------------------------- -0.0 RCVD_IN_DNSWL_NONE RBL: Sender listed at http://www.dnswl.org/, no trust [80.179.55.166 listed in list.dnswl.org] 0.0 SINGLE_HEADER_2K A single header contains 2K-3K characters 0.7 SPF_SOFTFAIL SPF: sender does not match SPF record (softfail) 0.8 BAYES_50 BODY: Bayes spam probability is 40 to 60% [score: 0.4993] > From: Juri Linkov > Cc: handa@gnu.org, 13084@debbugs.gnu.org > Date: Wed, 12 Dec 2012 11:27:50 +0200 >=20 > >> Does this mean there are no more obstacles to filling a translat= ion table > >> for ignoring equivalence with all character mappings according t= o the > >> `decomposition' property? This would be the first step in this = direction. > > > > I'm not sure I understand what you are asking. Please show more = details. >=20 > There is confusion with the word `equivalence'. Currently there > exists the case equivalence table in the case table (`case_eqv_tabl= e'). > Implementing a diacritic search in bug#13041 requires adding a new > similar table. I don't know what would be a good name: > `decomposition_eqv_table' or `normalization_eqv_table' or something= better. 
>
> I'm unfamiliar with the details of `search_buffer', but in principle
> using two tables in the macro `TRANSLATE' could implement a diacritic
> search where at the first step the character will be translated using
> `decomposition_eqv_table', and after that the resulting character
> will be translated using `case_eqv_table'.
>
> So the dataflow to get the canonical character will be Á -> A -> a.
> If `case-fold-search' is nil, then Á -> A. If a new variable
> `decomposition-search' (or `normalized-search') is nil then Á -> á.

OK, all this is now clear and agreed.

So what did you mean by "no more obstacles" above? The obstacles I
see are that case tables aren't up to the job because they don't
support ignoring of characters, and the code in search.c cannot handle
ignoring even if the table did support that. These obstacles still
stand.

From unknown Tue Jun 24 01:42:16 2025
MIME-Version: 1.0
X-Mailer: MIME-tools 5.428 (Entity 5.428)
X-Loop: help-debbugs@gnu.org
From: help-debbugs@gnu.org (GNU bug Tracking System)
To: Juri Linkov
Subject: bug#13084: closed (Re: bug#13084: boyer_moore crashes with certain characters in the case table)
Message-ID:
References: <876246swgq.fsf@mail.jurta.org> <87txs1l4kg.fsf@mail.jurta.org>
X-Gnu-PR-Message: they-closed 13084
X-Gnu-PR-Package: emacs
Reply-To: 13084@debbugs.gnu.org
Date: Wed, 12 Dec 2012 23:11:01 +0000
Content-Type: multipart/mixed; boundary="----------=_1355353861-19949-1"

This is a multi-part message in MIME format...

------------=_1355353861-19949-1
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Your bug report

#13084: boyer_moore crashes with certain characters in the case table

which was filed against the emacs package, has been closed.

The explanation is attached below, along with your original report.
If you require more details, please reply to 13084@debbugs.gnu.org.
-- 
13084: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=13084
GNU Bug Tracking System
Contact help-debbugs@gnu.org with problems

------------=_1355353861-19949-1
Content-Type: message/rfc822
Content-Disposition: inline
Content-Transfer-Encoding: 7bit

Received: (at 13084-done) by debbugs.gnu.org; 12 Dec 2012 23:10:35 +0000
Received: from localhost ([127.0.0.1]:39670 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1TivRi-0005B4-EP for submit@debbugs.gnu.org; Wed, 12 Dec 2012 18:10:34 -0500
Received: from ps18281.dreamhost.com ([69.163.218.105]:50137 helo=ps18281.dreamhostps.com) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1TivRg-0005Ax-O6 for 13084-done@debbugs.gnu.org; Wed, 12 Dec 2012 18:10:33 -0500
Received: from localhost (ps18281.dreamhostps.com [69.163.218.105]) by ps18281.dreamhostps.com (Postfix) with ESMTP id A382E451E1B6; Wed, 12 Dec 2012 15:09:41 -0800 (PST)
From: Juri Linkov
To: Eli Zaretskii
Subject: Re: bug#13084: boyer_moore crashes with certain characters in the case table
Organization: JURTA
References: <87txs1l4kg.fsf@mail.jurta.org> <831uewa9cq.fsf@gnu.org> <87txrsw55b.fsf@mail.jurta.org> <83lid39b6v.fsf@gnu.org> <871uevsknt.fsf@mail.jurta.org> <83a9tj8bfv.fsf@gnu.org>
Date: Thu, 13 Dec 2012 01:05:09 +0200
In-Reply-To: <83a9tj8bfv.fsf@gnu.org> (Eli Zaretskii's message of "Wed, 12 Dec 2012 18:47:16 +0200")
Message-ID: <876246swgq.fsf@mail.jurta.org>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3.50 (x86_64-pc-linux-gnu)
MIME-Version: 1.0
Content-Type: text/plain
X-Spam-Score: 0.8 (/)
X-Debbugs-Envelope-To: 13084-done
Cc: handa@gnu.org, 13084-done@debbugs.gnu.org
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.13
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Sender: debbugs-submit-bounces@debbugs.gnu.org
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
X-Spam-Score: 0.8 (/)

> So what did you mean by "no more obstacles"
above? By obstacles I meant crashes that you fixed. Thanks for that. I'm closing this bug. ------------=_1355353861-19949-1 Content-Type: message/rfc822 Content-Disposition: inline Content-Transfer-Encoding: 7bit Received: (at submit) by debbugs.gnu.org; 5 Dec 2012 00:36:47 +0000 Received: from localhost ([127.0.0.1]:53851 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1Tg2yl-00012a-4c for submit@debbugs.gnu.org; Tue, 04 Dec 2012 19:36:47 -0500 Received: from eggs.gnu.org ([208.118.235.92]:43098) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1Tg2yi-00012T-Oq for submit@debbugs.gnu.org; Tue, 04 Dec 2012 19:36:45 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1Tg2yd-0005jc-FY for submit@debbugs.gnu.org; Tue, 04 Dec 2012 19:36:40 -0500 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=-101.9 required=5.0 tests=BAYES_00, USER_IN_WHITELIST autolearn=unavailable version=3.3.2 Received: from lists.gnu.org ([208.118.235.17]:44832) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Tg2yd-0005jY-CX for submit@debbugs.gnu.org; Tue, 04 Dec 2012 19:36:39 -0500 Received: from eggs.gnu.org ([208.118.235.92]:54460) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Tg2yb-0004LU-Tw for bug-gnu-emacs@gnu.org; Tue, 04 Dec 2012 19:36:39 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1Tg2ya-0005jG-UF for bug-gnu-emacs@gnu.org; Tue, 04 Dec 2012 19:36:37 -0500 Received: from ps18281.dreamhost.com ([69.163.218.105]:35584 helo=ps18281.dreamhostps.com) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Tg2ya-0005jB-P0 for bug-gnu-emacs@gnu.org; Tue, 04 Dec 2012 19:36:36 -0500 Received: from localhost (ps18281.dreamhostps.com [69.163.218.105]) by ps18281.dreamhostps.com (Postfix) with ESMTP id 6E2CB46FA014 for ; Tue, 4 Dec 2012 16:36:34 -0800 (PST) 
From: Juri Linkov
To: bug-gnu-emacs@gnu.org
Subject: boyer_moore crashes with certain characters in the case table
Organization: JURTA
Date: Wed, 05 Dec 2012 02:34:39 +0200
Message-ID: <87txs1l4kg.fsf@mail.jurta.org>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3.50 (x86_64-pc-linux-gnu)
MIME-Version: 1.0
Content-Type: text/plain
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x (no timestamps) [generic]
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6.x
X-Received-From: 208.118.235.17
X-Spam-Score: -4.2 (----)
X-Debbugs-Envelope-To: submit
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.13
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Sender: debbugs-submit-bounces@debbugs.gnu.org
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
X-Spam-Score: -5.0 (-----)

The minimal reproducible recipe for crashes in boyer_moore
noticed in bug#13041:

1. emacs -Q

2. Eval in *scratch*:

(let ((table (standard-case-table))
      canon)
  (setq canon (copy-sequence table))
  (aset canon #xff59 ?y)
  (set-char-table-extra-slot table 1 canon)
  (set-char-table-extra-slot table 2 nil)
  (set-standard-case-table table))

3. Start an activity that includes a search, e.g.
`C-x 8 RET TAB'

The crash in boyer_moore is caused by fullwidth characters like #xff59
whose Unicode properties are:

  name: FULLWIDTH LATIN SMALL LETTER Y
  decomposition: (wide 121) (wide 'y')

However, the crash doesn't occur when the same fullwidth characters
are set to their downcase counterparts in lisp/international/characters.el:

  ;; Fullwidth Latin
  (setq c #xff21)
  (while (<= c #xff3a)
    (set-case-syntax-pair c (+ c #x20) tbl)
    (modify-category-entry c ?l)
    (modify-category-entry (+ c #x20) ?l)
    (setq c (1+ c)))

------------=_1355353861-19949-1--

From unknown Tue Jun 24 01:42:16 2025
X-Loop: help-debbugs@gnu.org
Subject: bug#13084: boyer_moore crashes with certain characters in the case table
References: <87txs1l4kg.fsf@mail.jurta.org>
Resent-From: Kenichi Handa
Original-Sender: debbugs-submit-bounces@debbugs.gnu.org
Resent-CC: bug-gnu-emacs@gnu.org
Resent-Date: Thu, 13 Dec 2012 13:43:02 +0000
Resent-Message-ID:
Resent-Sender: help-debbugs@gnu.org
X-GNU-PR-Message: followup 13084
X-GNU-PR-Package: emacs
X-GNU-PR-Keywords:
To: Eli Zaretskii
Cc: juri@jurta.org, 13084@debbugs.gnu.org
Received: via spool by 13084-submit@debbugs.gnu.org id=B13084.135540617929114 (code B ref 13084); Thu, 13 Dec 2012 13:43:02 +0000
Received: (at 13084) by debbugs.gnu.org; 13 Dec 2012 13:42:59 +0000
Received: from localhost ([127.0.0.1]:40357 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1Tj93z-0007ZX-7H for submit@debbugs.gnu.org; Thu, 13 Dec 2012 08:42:59 -0500
Received: from fencepost.gnu.org ([208.118.235.10]:41131) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1Tj93x-0007ZQ-MK for 13084@debbugs.gnu.org; Thu, 13 Dec 2012 08:42:58 -0500
Received: from 253.240.accsnet.ne.jp ([202.220.240.253]:54722 helo=mongkok) by fencepost.gnu.org with esmtpsa (TLS1.0:DHE_RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1Tj936-0002L7-0M; Thu, 13 Dec 2012 08:42:04 -0500
From: Kenichi Handa
In-Reply-To: <831uewa9cq.fsf@gnu.org> (message from Eli
Zaretskii on Tue, 11 Dec 2012 17:37:09 +0200)
Date: Thu, 13 Dec 2012 22:39:29 +0900
Message-ID: <87r4mu6pgu.fsf@gnu.org>
MIME-Version: 1.0
Content-Type: text/plain
X-Spam-Score: -4.2 (----)
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.13
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Sender: debbugs-submit-bounces@debbugs.gnu.org
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
X-Spam-Score: -5.0 (-----)

In article <831uewa9cq.fsf@gnu.org>, Eli Zaretskii writes:

> In addition, I'd suggest that Handa-san (or someone else) takes a good
> look at the code that sets up the simple_translate table in
> boyer_moore, because the constants there, like 0200 and 0x3F, and all
> the talk about characters that belong "to the same charset and row"
> smell of pre-Unicode (a.k.a. "MULE") representation of characters.
> For now, I disabled boyer_moore for unibyte characters beyond 160,
> because my reading of the code is that simple_translate and the
> supporting code cannot handle that. Maybe I'm wrong.

I have not yet checked the code, but what I remember is that
search_buffer checks the search string and decides which to
use; boyer_moore or simple_search. If all equivalent
characters of all non-ASCII characters in the search string
are in the same character group, we can use boyer_moore.

Here, A and B belongs to the same character group iff A and
B has the same multibyte sequence except for the last byte.
In this condition, we should be able to use the table
simple_translate.
--- Kenichi Handa handa@gnu.org From unknown Tue Jun 24 01:42:16 2025 X-Loop: help-debbugs@gnu.org Subject: bug#13084: boyer_moore crashes with certain characters in the case table Resent-From: Eli Zaretskii Original-Sender: debbugs-submit-bounces@debbugs.gnu.org Resent-CC: bug-gnu-emacs@gnu.org Resent-Date: Thu, 13 Dec 2012 17:34:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 13084 X-GNU-PR-Package: emacs X-GNU-PR-Keywords: To: Kenichi Handa Cc: juri@jurta.org, 13084@debbugs.gnu.org Reply-To: Eli Zaretskii Received: via spool by 13084-submit@debbugs.gnu.org id=B13084.135541998725030 (code B ref 13084); Thu, 13 Dec 2012 17:34:02 +0000 Received: (at 13084) by debbugs.gnu.org; 13 Dec 2012 17:33:07 +0000 Received: from localhost ([127.0.0.1]:41213 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1TjCeh-0006Vf-6o for submit@debbugs.gnu.org; Thu, 13 Dec 2012 12:33:07 -0500 Received: from mtaout21.012.net.il ([80.179.55.169]:53351) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1TjCee-0006VT-Rg for 13084@debbugs.gnu.org; Thu, 13 Dec 2012 12:33:05 -0500 Received: from conversion-daemon.a-mtaout21.012.net.il by a-mtaout21.012.net.il (HyperSendmail v2007.08) id <0MEZ00400CK2OO00@a-mtaout21.012.net.il> for 13084@debbugs.gnu.org; Thu, 13 Dec 2012 19:32:10 +0200 (IST) Received: from HOME-C4E4A596F7 ([87.69.4.28]) by a-mtaout21.012.net.il (HyperSendmail v2007.08) with ESMTPA id <0MEZ004AMCPLE6C0@a-mtaout21.012.net.il>; Thu, 13 Dec 2012 19:32:10 +0200 (IST) Date: Thu, 13 Dec 2012 19:32:08 +0200 From: Eli Zaretskii In-reply-to: <87r4mu6pgu.fsf@gnu.org> X-012-Sender: halo1@inter.net.il Message-id: <83obhxoo2v.fsf@gnu.org> References: <87r4mu6pgu.fsf@gnu.org> X-Spam-Score: 1.5 (+) X-Spam-Report: Spam detection software, running on the system "debbugs.gnu.org", has identified this incoming email as possible spam. 
> From: Kenichi Handa
> Cc: juri@jurta.org, 13084@debbugs.gnu.org
> Date: Thu, 13 Dec 2012 22:39:29 +0900
>
> I have not yet checked the code, but what I remember is that
> search_buffer checks the search string and decides which to
> use; boyer_moore or simple_search. If all equivalent
> characters of all non-ASCII characters in the search string
> are in the same character group, we can use boyer_moore.

Yes, that's my reading of the code as well.

> Here, A and B belongs to the same character group iff A and
> B has the same multibyte sequence except for the last byte.
> In this condition, we should be able to use the table
> simple_translate.

OK, then maybe just the comments need to be fixed. They shouldn't
talk about "charset" and "row", which are undefined in Unicode Emacs.
They should instead use terminology that correspond to UTF-8 multibyte
representation of characters we use today.
From unknown Tue Jun 24 01:42:16 2025 X-Loop: help-debbugs@gnu.org Subject: bug#13084: boyer_moore crashes with certain characters in the case table References: <87txs1l4kg.fsf@mail.jurta.org> Resent-From: Kenichi Handa Original-Sender: debbugs-submit-bounces@debbugs.gnu.org Resent-CC: bug-gnu-emacs@gnu.org Resent-Date: Sat, 15 Dec 2012 13:22:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 13084 X-GNU-PR-Package: emacs X-GNU-PR-Keywords: To: Eli Zaretskii Cc: juri@jurta.org, 13084@debbugs.gnu.org Received: via spool by 13084-submit@debbugs.gnu.org id=B13084.135557766917730 (code B ref 13084); Sat, 15 Dec 2012 13:22:01 +0000 Received: (at 13084) by debbugs.gnu.org; 15 Dec 2012 13:21:09 +0000 Received: from localhost ([127.0.0.1]:43485 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1Tjrfw-0004bu-1H for submit@debbugs.gnu.org; Sat, 15 Dec 2012 08:21:08 -0500 Received: from fencepost.gnu.org ([208.118.235.10]:37692) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1Tjrfs-0004bm-QQ for 13084@debbugs.gnu.org; Sat, 15 Dec 2012 08:21:06 -0500 Received: from 253.240.accsnet.ne.jp ([202.220.240.253]:59484 helo=mongkok) by fencepost.gnu.org with esmtpsa (TLS1.0:DHE_RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1Tjrep-0002cf-Cm; Sat, 15 Dec 2012 08:20:00 -0500 From: Kenichi Handa In-Reply-To: <83obhxoo2v.fsf@gnu.org> (message from Eli Zaretskii on Thu, 13 Dec 2012 19:32:08 +0200) Date: Sat, 15 Dec 2012 22:17:17 +0900 Message-ID: <87ehir78v6.fsf@gnu.org> MIME-Version: 1.0 Content-Type: text/plain X-Spam-Score: -4.2 (----) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -4.2 (----) In article <83obhxoo2v.fsf@gnu.org>, Eli Zaretskii writes: > > 
Here, A and B belongs to the same character group iff A and
> > B has the same multibyte sequence except for the last byte.
> > In this condition, we should be able to use the table
> > simple_translate.

> OK, then maybe just the comments need to be fixed. They shouldn't
> talk about "charset" and "row", which are undefined in Unicode Emacs.
> They should instead use terminology that correspond to UTF-8 multibyte
> representation of characters we use today.

I've just committed this change. How is it?

=== modified file 'src/search.c'
--- src/search.c 2012-10-10 20:09:47 +0000
+++ src/search.c 2012-12-15 13:04:46 +0000
@@ -1313,8 +1313,11 @@
      non-nil, we can use boyer-moore search only if TRT can be
      represented by the byte array of 256 elements. For that,
      all non-ASCII case-equivalents of all case-sensitive
-     characters in STRING must belong to the same charset and
-     row. */
+     characters in STRING must belong to the same character
+     group (two characters belong to the same group iff their
+     multibyte forms are the same except for the last byte;
+     i.e. every 64 characters form a group; U+0000..U+003F,
+     U+0040..U+007F, U+0080..U+00BF, ...). */
 
   while (--len >= 0)
     {

---
Kenichi Handa
handa@gnu.org

From unknown Tue Jun 24 01:42:16 2025
X-Loop: help-debbugs@gnu.org
Subject: bug#13084: boyer_moore crashes with certain characters in the case table
Resent-From: Eli Zaretskii
Original-Sender: debbugs-submit-bounces@debbugs.gnu.org
Resent-CC: bug-gnu-emacs@gnu.org
Resent-Date: Sat, 15 Dec 2012 13:58:01 +0000
Resent-Message-ID:
Resent-Sender: help-debbugs@gnu.org
X-GNU-PR-Message: followup 13084
X-GNU-PR-Package: emacs
X-GNU-PR-Keywords:
To: Kenichi Handa
Cc: juri@jurta.org, 13084@debbugs.gnu.org
Reply-To: Eli Zaretskii
Received: via spool by 13084-submit@debbugs.gnu.org id=B13084.135557983621157 (code B ref 13084); Sat, 15 Dec 2012 13:58:01 +0000
Received: (at 13084) by debbugs.gnu.org; 15 Dec 2012 13:57:16 +0000
Received: from localhost ([127.0.0.1]:43523 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1TjsEt-0005VC-9T for submit@debbugs.gnu.org; Sat, 15 Dec 2012 08:57:15 -0500
Received: from mtaout21.012.net.il ([80.179.55.169]:50695) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1TjsEp-0005V2-V9 for 13084@debbugs.gnu.org; Sat, 15 Dec 2012 08:57:13 -0500
Received: from conversion-daemon.a-mtaout21.012.net.il by a-mtaout21.012.net.il (HyperSendmail v2007.08) id <0MF200F00RMOZ100@a-mtaout21.012.net.il> for 13084@debbugs.gnu.org; Sat, 15 Dec 2012 15:55:01 +0200 (IST)
Received: from HOME-C4E4A596F7 ([87.69.4.28]) by a-mtaout21.012.net.il (HyperSendmail v2007.08) with ESMTPA id <0MF200F8ARZPZ610@a-mtaout21.012.net.il>; Sat, 15 Dec 2012 15:55:01 +0200 (IST)
Date: Sat, 15 Dec 2012 15:55:06 +0200
From: Eli Zaretskii
In-reply-to: <87ehir78v6.fsf@gnu.org>
X-012-Sender: halo1@inter.net.il
Message-id: <83d2ybmnd1.fsf@gnu.org>
References: <87ehir78v6.fsf@gnu.org>
X-Spam-Score: 1.5 (+)
X-Spam-Report: Spam detection software, running on the system "debbugs.gnu.org", has identified this incoming email as possible spam.
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.13
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Sender: debbugs-submit-bounces@debbugs.gnu.org
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
X-Spam-Score: 0.7 (/)

> From: Kenichi Handa
> Cc: juri@jurta.org, 13084@debbugs.gnu.org
> Date: Sat, 15 Dec 2012 22:17:17 +0900
>
> In article <83obhxoo2v.fsf@gnu.org>, Eli Zaretskii writes:
>
> > > Here, A and B belongs to the same character group iff A and
> > > B has the same multibyte sequence except for the last byte.
> > > In this condition, we should be able to use the table
> > > simple_translate.
>
> > OK, then maybe just the comments need to be fixed. They shouldn't
> > talk about "charset" and "row", which are undefined in Unicode Emacs.
> > They should instead use terminology that correspond to UTF-8 multibyte
> > representation of characters we use today.
>
> I've just committed this change. How is it?

Clear, thanks.
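
The "character group" rule in the revised comment can be illustrated with a small C sketch. This is not the code in src/search.c: it assumes standard UTF-8 rather than Emacs's own multibyte representation, and the names `utf8_encode` and `same_char_group` are hypothetical, introduced only for this example. Two code points are treated as belonging to the same group when their encodings have the same length and agree on every byte except the last. For single-byte (ASCII) characters there are no leading bytes to compare, so this sketch puts all of ASCII in one group, which is coarser than the 64-character ranges listed in the comment.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Encode code point C as standard UTF-8 into BUF (room for 4 bytes).
   Return the number of bytes written, or 0 if C is out of range.
   Hypothetical helper, not part of Emacs.  */
static size_t
utf8_encode (unsigned int c, unsigned char *buf)
{
  if (c < 0x80)
    {
      buf[0] = (unsigned char) c;
      return 1;
    }
  if (c < 0x800)
    {
      buf[0] = (unsigned char) (0xC0 | (c >> 6));
      buf[1] = (unsigned char) (0x80 | (c & 0x3F));
      return 2;
    }
  if (c < 0x10000)
    {
      buf[0] = (unsigned char) (0xE0 | (c >> 12));
      buf[1] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
      buf[2] = (unsigned char) (0x80 | (c & 0x3F));
      return 3;
    }
  if (c < 0x110000)
    {
      buf[0] = (unsigned char) (0xF0 | (c >> 18));
      buf[1] = (unsigned char) (0x80 | ((c >> 12) & 0x3F));
      buf[2] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
      buf[3] = (unsigned char) (0x80 | (c & 0x3F));
      return 4;
    }
  return 0;
}

/* Return true if A and B have encodings of equal length that differ
   at most in the last byte.  Hypothetical helper, not part of Emacs.  */
static bool
same_char_group (unsigned int a, unsigned int b)
{
  unsigned char ba[4], bb[4];
  size_t la = utf8_encode (a, ba);
  size_t lb = utf8_encode (b, bb);

  if (la == 0 || la != lb)
    return false;
  /* Compare all bytes except the last one.  */
  return memcmp (ba, bb, la - 1) == 0;
}
```

Under these assumptions, `same_char_group (0xC1, 0xE1)` is true (both encodings start with the byte 0xC3), while `same_char_group (0xFF59, 'y')` is false, because U+FF59 takes three bytes and `y` takes one. The second pair is the mapping used in the recipe from the original report.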