Subject: bug#24405: 24.5; Possibly ``forward-word`` doesn't respect ``word-combining-categories`` for word boundaries on changing between latin/phonetic scripts.
From: Oleksandr Gavenko
Date: Sat, 10 Sep 2016 11:33:45 +0300
To: bug-gnu-emacs@gnu.org

Evaluate the following form with C-x C-e:

  (let ((word-combining-categories '((?l . ?y) (?y . ?l) (?l . ?l)))
        (word-separating-categories nil))
    (forward-word))

HelloПривLLжɪəʊheləʊaiɪa

Point stopped between ʊ and h.

I have:

  (aref char-script-table ?ʊ)                       phonetic
  (aref char-script-table ?h)                       latin
  (aref char-script-table ?ж)                       cyrillic

  (category-set-mnemonics (char-category-set ?ʊ))   ".Ljl"
  (category-set-mnemonics (char-category-set ?h))   ".Lalr"

  (category-docstring ?y)                           "Cyrillic"
  (category-docstring ?l)                           "Latin"

I expected point to move to the last character before the newline.

It seems that

  (?l . ?y) (?y . ?l)

have an effect, because point moved across the Cyrillic/Latin and Cyrillic/Phonetic script boundaries, but it refused to move across the Latin/Phonetic boundary.

If this is the intended behavior, how can I make Emacs move across Latin/Phonetic script boundaries?
See also:
http://emacs.stackexchange.com/questions/21131/does-word-syntax-take-script-into-account

In GNU Emacs 24.5.1 (x86_64-pc-linux-gnu, GTK+ Version 3.18.6)
 of 2016-01-22 on binet, modified by Debian
Windowing system distributor `The X.Org Foundation', version 11.0.11803000
System Description: Debian GNU/Linux testing (stretch)

-- 
http://defun.work/

From: Eli Zaretskii
Date: Sat, 10 Sep 2016 13:05:09 +0300

tags 24405 + notabug
thanks

> From: Oleksandr Gavenko
> Date: Sat, 10 Sep 2016 11:33:45 +0300
>
> [...]
> (category-set-mnemonics (char-category-set ?ʊ)) ".Ljl"
> (category-set-mnemonics (char-category-set ?h)) ".Lalr"
> [...]
> If it is intended behavior how will I make Emacs to move across Latin/Phonetic
> scripts?

You can't do this for 2 characters that belong to different scripts but have the same category in their category sets. Those two characters both have the 'l' (Latin) category in their sets, so you cannot force Emacs to stop treating the position between them as a word boundary. For the same reason, including a cons cell whose members are identical, such as (?l . ?l), has no effect.

This is the intended behavior, yes. The word-combining-categories feature is designed to support specific rare situations with mixing the Far Eastern scripts (e.g., use of Kanji characters in Japanese text), not for arbitrary games with Latin and European scripts.

May I ask why you need to consider the above a single word? In what situation(s) does that make sense?

Thanks.
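The rule described above can be restated as a small predicate. This is a rough Lisp model, not the C implementation in Emacs: it assumes, as the explanation implies, that a pair (CAT1 . CAT2) in word-combining-categories only joins C1 and C2 when C1 has CAT1 and C2 does not, and C2 has CAT2 and C1 does not. It models only the different-script case and ignores word-separating-categories; the names my/has-category-p and my/word-boundary-p are hypothetical.

```elisp
;;; Rough model of the word-boundary rule; hypothetical helper names.
(require 'cl-lib)

(defun my/has-category-p (ch cat)
  "Return non-nil if character CH has category CAT."
  ;; `char-category-set' returns a bool-vector indexed by category.
  (aref (char-category-set ch) cat))

(defun my/word-boundary-p (c1 c2 combining)
  "Return non-nil if C1 followed by C2 would be a word boundary.
COMBINING is a list shaped like `word-combining-categories'.
Characters of the same script are treated as never separated,
because this model ignores `word-separating-categories'."
  (if (eq (aref char-script-table c1) (aref char-script-table c2))
      nil
    ;; Different scripts: a boundary unless some pair applies, and a
    ;; pair (CAT1 . CAT2) applies only if each category belongs to
    ;; exactly one of the two characters.
    (not (cl-some (lambda (pair)
                    (let ((cat1 (car pair))
                          (cat2 (cdr pair)))
                      (and (my/has-category-p c1 cat1)
                           (not (my/has-category-p c2 cat1))
                           (my/has-category-p c2 cat2)
                           (not (my/has-category-p c1 cat2)))))
                  combining))))
```

Under this model a pair such as (?l . ?l) can never apply. With the category sets reported above (ʊ has ".Ljl" and h has ".Lalr", so both carry l), none of (?l . ?y), (?y . ?l) or (?l . ?l) applies to ʊ followed by h. The same reasoning would explain why the category-based workaround later in this thread removes ?l from the phonetic characters instead of only adding a new ?p category.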

From: Oleksandr Gavenko
Date: Sat, 10 Sep 2016 20:12:57 +0300

On 2016-09-10, Eli Zaretskii wrote:
> This is the intended behavior, yes. The word-combining-categories
> feature is designed to support specific rare situations with mixing
> the Far Eastern scripts (e.g., use of Kanji characters in Japanese
> text), not for arbitrary games with Latin and European scripts.
>
> May I ask why do you need to consider the above a single word? In
> what situation(s) does that make sense?

I work on a dictionary. Dictionary articles and supplementary texts use IPA symbols for word pronunciation. I would like to select a pronunciation with a single move in text like:

  leap   [liːp]
  lip    [lɪp]
  wheel  [wiːl]
  will   [wɪl]
  seek   [siːk]
  sick   [sɪk]

It's annoying to move across long mixed words with C-Left, C-Right or C-S-Left, C-S-Right; try moving across:

  international [ˌɪntərˈnæʃənəl]

I also found that some IPA characters are marked as Latin script:

  (aref char-script-table ?æ)  latin

But that is debatable, since it is an ordinary letter in some languages.

As a workaround, should I modify char-script-table?
Like:

  (mapc (lambda (ch)
          (aset char-script-table ch 'latin)
          (modify-syntax-entry ch "w"))
        '(?ʌ ?ə ?ɜ ?ɒ ?ɛ ?θ ?ʊ ?ɪ ?ɔ ?ɑ ?ʃ ?ʧ ?ː ?ˈ ?ˌ ?ʒ ?ŋ))

This brings the desired behavior, but it is unclear whether it is safe.

Another solution is to invent my own category:

  (define-category ?p "Phonetic")

and to add it to the IPA characters:

  (mapc (lambda (ch) (modify-category-entry ch ?p))
        '(?ʌ ?ə ?ɜ ?ɒ ?ɛ ?θ ?ʊ ?ɪ ?ɔ ?ɑ ?ʃ ?ʧ ?ː ?ˈ ?ˌ ?ʒ ?ŋ))

so that it becomes possible to use:

  (add-to-list 'word-combining-categories '(?p . ?l))
  (add-to-list 'word-combining-categories '(?l . ?p))

-- 
http://defun.work/
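On the question above of whether modifying char-script-table is safe: one way to avoid changing the global, Unicode-derived table is to bind a modified copy only while a motion command runs. This is a sketch under assumptions. The names my/ipa-as-latin-chars, my/ipa-script-table and my/forward-word-ipa are hypothetical, and it relies on forward-word consulting the dynamically bound value of char-script-table and the current syntax table when it checks word boundaries.

```elisp
;;; Hypothetical sketch: shadow `char-script-table' instead of editing it.
(defconst my/ipa-as-latin-chars
  (string-to-list "ˈˌːǁʲθðŋɡʒʃʧəɜɛʌɒɔɑæʊɪ")
  "IPA symbols to treat as Latin script during word motion.")

(defconst my/ipa-script-table
  ;; `copy-sequence' copies the char-table, so the global one is untouched.
  (let ((table (copy-sequence char-script-table)))
    (dolist (ch my/ipa-as-latin-chars)
      (aset table ch 'latin))
    table)
  "Private copy of `char-script-table' with IPA symbols set to `latin'.")

(defun my/forward-word-ipa (&optional arg)
  "Like `forward-word', but IPA symbols do not start a new word.
The global `char-script-table' and the buffer's syntax table are
left as they are; modified copies are in effect only during the call."
  (interactive "^p")
  (let ((char-script-table my/ipa-script-table)
        ;; Inherit from the buffer's table, marking IPA symbols as words.
        (table (make-syntax-table (syntax-table))))
    (dolist (ch my/ipa-as-latin-chars)
      (modify-syntax-entry ch "w" table))
    (with-syntax-table table
      (forward-word arg))))
```

Because char-script-table is a special variable, the `let' binding is dynamic, so other code running at the same time sees the copy only for the duration of the call; everything else keeps the Unicode-derived table.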

From: Eli Zaretskii
Date: Sat, 10 Sep 2016 20:23:25 +0300

> From: Oleksandr Gavenko
> Cc: 24405@debbugs.gnu.org
> Date: Sat, 10 Sep 2016 20:12:57 +0300
>
> As a workaround should I modify char-script-table?

I'd suggest writing your own word-motion commands. It's not complicated; you can use regular expressions (which understand categories, if you need that).
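A custom motion command built on regular expressions, as suggested above, might look roughly like the following. This is a minimal sketch, not an existing Emacs command: my/ipa-extra-word-chars, my/ipa-word-char-regexp, my/forward-ipa-word and my/backward-ipa-word are hypothetical names, and the IPA list is an assumption to adjust. A word here is any run of characters that either have word syntax (\sw) or appear in the list, so script changes inside the run are simply not checked.

```elisp
;;; Hypothetical regexp-based word motion; names are illustrative only.
(defconst my/ipa-extra-word-chars "ˈˌːǁʲθðŋɡʒʃʧəɜɛʌɒɔɑæʊɪ"
  "IPA symbols to treat as word constituents, whatever their syntax.")

(defconst my/ipa-word-char-regexp
  (concat "\\(?:\\sw\\|[" my/ipa-extra-word-chars "]\\)")
  "Regexp matching one character of a mixed Latin/IPA word.")

(defun my/forward-ipa-word (&optional arg)
  "Move forward over ARG runs of `my/ipa-word-char-regexp' characters.
Unlike `forward-word', a change of script inside a run is not a
word boundary.  With negative ARG, move backward."
  (interactive "^p")
  (setq arg (or arg 1))
  (dotimes (_ (abs arg))
    (if (> arg 0)
        (progn
          ;; Skip separators, then the run of word characters.
          (while (and (not (eobp))
                      (not (looking-at my/ipa-word-char-regexp)))
            (forward-char 1))
          (while (looking-at my/ipa-word-char-regexp)
            (forward-char 1)))
      ;; Same two steps, looking at the character before point.
      (while (and (not (bobp))
                  (not (looking-back my/ipa-word-char-regexp (1- (point)))))
        (backward-char 1))
      (while (and (not (bobp))
                  (looking-back my/ipa-word-char-regexp (1- (point))))
        (backward-char 1)))))

(defun my/backward-ipa-word (&optional arg)
  "Move backward over ARG mixed Latin/IPA words."
  (interactive "^p")
  (my/forward-ipa-word (- (or arg 1))))
```

The commands could be bound in place of the defaults, e.g. (global-set-key (kbd "C-<right>") #'my/forward-ipa-word); the "^p" interactive spec keeps shift-selection with C-S-<right> working. If plain word syntax is too broad, the \sw alternative could be narrowed with category escapes such as \cl (Latin) or \cy (Cyrillic).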
> Another solution is to invent own:
>
>   (define-category ?p "Phonetic")
>
> and to add it to IPA characters:
>
>   (mapc (lambda (ch) (modify-category-entry ch "p"))
>         '(?ʌ ?ə ?ɜ ?ɒ ?ɛ ?θ ?ʊ ?ɪ ?ɔ ?ɑ ?ʃ ?ʧ ?ː ?ˈ ?ˌ ?ʒ ?ŋ))
>
> so it becomes possible to use:
>
>   (add-to-list 'word-combining-categories '(?p . ?l))
>   (add-to-list 'word-combining-categories '(?l . ?p))

That'd be my second-best advice. But I think regular expressions should provide a better and easier solution.

From: Oleksandr Gavenko
Date: Sun, 11 Sep 2016 14:57:33 +0300

On 2016-09-10, Eli Zaretskii wrote:
>> Another solution is to invent own:
>>
>>   (define-category ?p "Phonetic")
>>
>> and to add it to IPA characters:
>>
>>   (mapc (lambda (ch) (modify-category-entry ch "p"))
>>         '(?ʌ ?ə ?ɜ ?ɒ ?ɛ ?θ ?ʊ ?ɪ ?ɔ ?ɑ ?ʃ ?ʧ ?ː ?ˈ ?ˌ ?ʒ ?ŋ))
>>
>> so it becomes possible to use:
>>
>>   (add-to-list 'word-combining-categories '(?p . ?l))
>>   (add-to-list 'word-combining-categories '(?l . ?p))
>
> That'd be my second best advice. But I think regular expressions
> should provide a better and easier solution.

This works for me:

  (defconst my/ipa-chars
    (list ?ˈ ?ˌ ?ː ?ǁ ?ʲ ?θ ?ð ?ŋ ?ɡ ?ʒ ?ʃ ?ʧ ?ə ?ɜ ?ɛ ?ʌ ?ɒ ?ɔ ?ɑ ?æ ?ʊ ?ɪ))

  (define-category ?p "Phonetic")

  (mapc (lambda (ch)
          (cond
           ((eq (aref char-script-table ch) 'phonetic)
            ;; Tag as phonetic and remove the shared Latin category.
            (modify-category-entry ch ?p)
            (modify-category-entry ch ?l nil t))
           ((eq (aref char-script-table ch) 'latin)
            ;; (aref char-script-table ?ˌ) is 'latin but (char-category-set ?ˌ) is ".j"
            (modify-category-entry ch ?l))))
        my/ipa-chars)

  (add-to-list 'word-combining-categories '(?p . ?l))
  (add-to-list 'word-combining-categories '(?l . ?p))

But adding and removing categories looks too low-level. It also requires a custom (define-category ?p "Phonetic") that Emacs itself does not define.
This looks easier to me:

  (mapc (lambda (ch)
          (aset char-script-table ch 'latin)
          (modify-syntax-entry ch "w"))
        my/ipa-chars)

But ``char-script-table`` is derived from Unicode, and some code may depend on that database...

-- 
http://defun.work/

From: help-debbugs@gnu.org (GNU bug Tracking System)
To: Oleksandr Gavenko
Date: Sun, 29 Sep 2019 04:35:02 +0000

Your bug report

#24405: 24.5; Possibly ``forward-word`` doesn't respect ``word-combining-categories`` for word boundaries on changing between latin/phonetic scripts.

which was filed against the emacs package, has been closed.

The explanation is attached below, along with your original report. If you require more details, please reply to 24405@debbugs.gnu.org.
-- 
24405: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=24405
GNU Bug Tracking System
Contact help-debbugs@gnu.org with problems

From: Stefan Kangas
Date: Sun, 29 Sep 2019 06:33:45 +0200
To: Eli Zaretskii
Cc: 24405-done@debbugs.gnu.org, Oleksandr Gavenko

Eli Zaretskii writes:

> tags 24405 + notabug
> thanks

[...]

> This is the intended behavior, yes. The word-combining-categories
> feature is designed to support specific rare situations with mixing
> the Far Eastern scripts (e.g., use of Kanji characters in Japanese
> text), not for arbitrary games with Latin and European scripts.

This was already tagged notabug, and I can see nothing more to do here. I'm therefore closing this now.

Best regards,
Stefan Kangas
with esmtp (Exim 4.71) (envelope-from ) id 1bidjb-0005Dt-3B for submit@debbugs.gnu.org; Sat, 10 Sep 2016 04:33:59 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:36908) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bidjY-0003xF-Vj for bug-gnu-emacs@gnu.org; Sat, 10 Sep 2016 04:33:57 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1bidjT-0005CT-Pu for bug-gnu-emacs@gnu.org; Sat, 10 Sep 2016 04:33:55 -0400 Received: from mail-lf0-x236.google.com ([2a00:1450:4010:c07::236]:35988) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bidjT-0005CP-Ig for bug-gnu-emacs@gnu.org; Sat, 10 Sep 2016 04:33:51 -0400 Received: by mail-lf0-x236.google.com with SMTP id g62so58297470lfe.3 for ; Sat, 10 Sep 2016 01:33:50 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=from:to:subject:date:message-id:mime-version :content-transfer-encoding; bh=kfiiHCFb2s2jGKfkofEKs6Iv1g1K6v+loeE8dtLdwgI=; b=yTZ33QH7C9knKnFIoQBoMAKhiQ+dEUAIL2XrMes8TiuCE65kwV7sfldrU7w1eGG8IM ZZc541y7eU4F8xT0OV+3G035qN0uaY1NWEAmPALyDyWUfLinowe5lvKOohV+pT4KdzCN 8E18Wmnnvc4RL4SZmtKaR5iFTcG2nuS9k0eVqMLNzYGgAWleOPhH83M8q9fRp4YumB1P zpQmNQVF+bIQYzOPx2toM83xm/K4KFe5rPKm8dQkCIzDgHxv/rzCHo+B7sI8HD+eA+j4 5mBEbHtvf73ce3OrywUnPIBc18nHOdFvb0nOcwrlED7vA2mupqpT1x/5AxHUX6k0Kx4A bw/A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:from:to:subject:date:message-id:mime-version :content-transfer-encoding; bh=kfiiHCFb2s2jGKfkofEKs6Iv1g1K6v+loeE8dtLdwgI=; b=RVxuK5Yz5eov1OXDxtxmoEl0WIbX9lySdUrb2WYei5BHyGtvSvaeo3mURhTPrCuu9Q LjSkcoQ/gHgdQdV4tHg0vRN5KA9qji191MyAJynTFIro1Q42XLNoOKa9xRyqfSW/c5K7 ANThy8CqqHGDN47urd4o2ZBXc0k5HUZdUDCyjlE/SUHkFoMklXQ3q2/qXC4tJP3JeUWw p1EICoShgByGi2msFSB7VDk2Ny8CYn6sw5FABUXJbL+LkdrNo3WAw/z6e8hdA5VjGMVy uh28bjD1yed4Vd1Suo+x9benJzs1+gE2KX/An8Qxh24/K41TB9xnrvVrdwJGwrmEAtXn khNg== X-Gm-Message-State: 
AE9vXwMpT/qy5EFKZpBmjV/eGxN9BjhyViuCh0Aw7PNrvLqSGOLyY5CQS6deauuhwm4zwQ== X-Received: by 10.46.32.227 with SMTP id g96mr415569lji.30.1473496429693; Sat, 10 Sep 2016 01:33:49 -0700 (PDT) Received: from desktop ([46.185.21.165]) by smtp.gmail.com with ESMTPSA id b71sm1301099lfb.42.2016.09.10.01.33.48 for (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Sat, 10 Sep 2016 01:33:48 -0700 (PDT) From: Oleksandr Gavenko To: bug-gnu-emacs@gnu.org Subject: 24.5; Possibly ``forward-word`` doesn't respect ``word-combining-categories`` for word boundaries on changing between latin/phonetic scripts. Date: Sat, 10 Sep 2016 11:33:45 +0300 Message-ID: <87mvjgupau.fsf@gavenkoa.example.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6.x X-Received-From: 2001:4830:134:3::11 X-Spam-Score: -4.0 (----) X-Debbugs-Envelope-To: submit X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -4.0 (----) Evaluate following form by C-x C-e: (let ((word-combining-categories '((?l . ?y) (?y . ?l) (?l . ?l))) (word-separating-categories nil)) (forward-word)) Hello=D0=9F=D1=80=D0=B8=D0=B2LL=D0=B6=C9=AA=C9=99=CA=8Ahel=C9=99=CA=8Aai= =C9=AAa My pointer stopped between =CA=8Ah. I have: (aref char-script-table ?=CA=8A) phonetic (aref char-script-table ?h) latin (aref char-script-table ?=D0=B6) cyrillic (category-set-mnemonics (char-category-set ?=CA=8A)) ".Ljl" (category-set-mnemonics (char-category-set ?h)) ".Lalr" (category-docstring ?y) "Cyrillic" (category-docstring ?l) "Latin" I expect that point moved to last character before new line. Seems that: (?l . ?y) (?y . 
?l) has effect because pointer moved across Cyrillic/Latin and Cyrillic/Phonetic scripts but refused to move through Latin/Phonetic scripts. If it is intended behavior how will I make Emacs to move across Latin/Phone= tic scripts? See also: http://emacs.stackexchange.com/questions/21131/does-word-syntax-take-scri= pt-into-account In GNU Emacs 24.5.1 (x86_64-pc-linux-gnu, GTK+ Version 3.18.6) of 2016-01-22 on binet, modified by Debian Windowing system distributor `The X.Org Foundation', version 11.0.11803000 System Description: Debian GNU/Linux testing (stretch) --=20 http://defun.work/ ------------=_1569731702-31697-1--