From unknown Mon Aug 18 21:39:47 2025
From: bug#20140 <20140@debbugs.gnu.org>
To: bug#20140 <20140@debbugs.gnu.org>
Subject: Status: 24.4; M17n shaper output rejected
Reply-To: bug#20140 <20140@debbugs.gnu.org>
Date: Tue, 19 Aug 2025 04:39:47 +0000

retitle 20140 24.4; M17n shaper output rejected
reassign 20140 emacs
submitter 20140 Richard Wordingham
severity 20140 normal
tag 20140 moreinfo
thanks

From debbugs-submit-bounces@debbugs.gnu.org Wed Mar 18 18:20:55 2015
Date: Wed, 18 Mar 2015 22:20:40 +0000
From: Richard Wordingham
To: bug-gnu-emacs@gnu.org
Subject: 24.4; M17n shaper output rejected
Message-ID: <20150318222040.4066e6e9@JRWUBU2>

I am running Emacs 24.4 in an Ubuntu 12.04 Precise Pangolin
installation, for which the version of libm17n-0 is 1.6.3-1.  I am
attempting to induce Emacs to render the Tai Tham script.  There
appears to be a bug/feature in Emacs which makes this unnecessarily
difficult.

To achieve Tai Tham rendering, I added the following in a new, loaded
file tai-tham.el:

(defvar tai-tham-composable-pattern
  (let ((table
         ;; C is letters, independent vowels, digits, punctuation and symbols.
         '(("C" . "[\u1A20-\u1A54\u1A80-\u1A89\u1A90-\u1A99\u1AA0-\u1AAD]")
           ("M" . "[\u1A55-\u1A5E\u1A61-\u1A7C\u1A7F]") ; Mark
           ("S" . "[\u1A75-\u1A7C]") ; Marks commuting with sakot
           ("H" . "\u1A60") ; sakot
           ("N" . "\u1A58"))) ; mai kang lai - also included in M.
        ;; Which orthographic syllable mai kang lai belongs to can depend on the font!
        (regexp "C\\(M\\|HS*C?\\)*\\(NC\\(M\\|HS*C?\\)*\\)*N?"))
    (let ((case-fold-search nil))
      (dolist (elt table)
        (setq regexp (replace-regexp-in-string (car elt) (cdr elt)
                                               regexp t t))))
    regexp))

(let ((elt (list (vector tai-tham-composable-pattern 0 'font-shape-gstring)
                 (vector "." 0 'font-shape-gstring)
                 )))
  (set-char-table-range composition-function-table '(#x1A20 . #x1AAD) elt))

I added the following (cut-down) file LANA-OTF.flt to the m17n
database:

(font layouter lana-otf nil
      (font (nil nil unicode-bmp :otf=lana)))

(category
 ;; H: SAKOT
 ;; N: Other character with non-zero canonical combining class
 ;; Z: Character with ccc=0 or other with ccc=9
 (0x0000 0x1A5F ?Z)
 (0x1A60 ?H)
 (0x1A61 0x1A74 ?Z)
 (0x1A75 0x1A7C ?N)
 (0x1A7D 0xFFFF ?Z)
 )

(generator
 (0
  (cond
   ("(H)(N+)" (2 = *) (1 =))
   ("." =)
   ) *
  )
 )

(category
 ;; C: Consonant and non-mark (lenient processing)
 ;; H: SAKOT
 ;; P: Preposed vowel
 ;; R: Medial RA (preposed dependent consonant)
 ;; M: Mark
 (0x1A20 0x1A54 ?C)
 (0x1A55 0x1A55 ?R)
 (0x1A56 0x1A5E ?M)
 (0x1A5F ?C) ; Unassigned
 (0x1A60 ?H)
 (0x1A61 0x1A6D ?M)
 (0x1A6E 0x1A72 ?P)
 (0x1A73 0x1A7C ?M)
 (0x1A7D 0x1A7E ?C) ; Unassigned
 (0x1A7F ?M)
 (0x1A80 0x1A89 ?C)
 (0x1A8A 0x1A8F ?C) ; Unassigned
 (0x1A90 0x1A99 ?C)
 (0x1A9A 0x1A9F ?C) ; Unassigned
 (0x1AA0 0x1AAC ?C) ; Punctuation
 (0x1AAD ?C) ; Can take a vowel!
 (0x1AAE 0x1AAF ?C) ; Unassigned
 )

(generator
 (0
  (cond
   ("(C)(R|P)" (2 =) (1 =) )
   ("." =)
   )*
  )
 )

(generator (0 otf:lana))

However, much Tai Tham text failed to render properly.
To determine what was wrong, I added some monitoring code to ftfont.c:

*** ftfont.c.orig	2014-03-21 05:34:40.000000000 +0000
--- ftfont.c	2015-03-18 19:47:30.032718995 +0000
***************
*** 2516,2522 ****
--- 2516,2553 ----
    flt = mflt_get (msymbol ("combining"));
    for (i = 0; i < 3; i++)
      {
+       int k;
+       fprintf(stdout, "mflt_run(");
+       if (gstring.glyphs[0].encoded) {
+         for (k = 0; k < len; k++) {
+           fprintf(stdout, " %d", gstring.glyphs[k].code);
+         }
+       } else {
+         for (k = 0; k < len; k++) {
+           fprintf(stdout, " %4.4X", gstring.glyphs[k].c);
+         }
+       }
        int result = mflt_run (&gstring, 0, len, &flt_font_ft.flt_font, flt);
+       if (-1 == result) {
+         fprintf(stdout, ") failed.\n");
+       } else if (result >= 0) {
+         fprintf(stdout, ") produced (");
+         for (k = 0; k < result; k++) {
+ #if 0
+           fprintf(stdout, " %d", gstring.glyphs[k].code);
+ #else
+           fprintf(stdout, " %4.4X>%d:%d:%d",
+                   gstring.glyphs[k].c, gstring.glyphs[k].code,
+                   gstring.glyphs[k].from, gstring.glyphs[k].to);
+ #endif
+         }
+         fprintf(stdout, ")\n");
+         if (result != gstring.used) {
+           fprintf(stdout, "Anomalously, gstring.used = %d\n",
+                   (int) gstring.used);
+         }
+         fflush(0);
+       }
        if (result != -2)
          break;
        if (INT_MAX / 2 < gstring.allocated)

The sample Tai Tham text was:

;; ᩈᩣᩴᩁᩢ᩠ᨷᨽᩣᩈᩣᩃ᩶ᩣ᩠ᨶᨶᩣ / ᨣᩣᩴᨾᩮᩬᩥᨦ - ᩈᩢᨬ᩠ᨬᩣ ᨠ᩠᩵ᨷ ᩃ᩠᩶ᨯ ᨮ᩠ ᨳᩫ᩠᩵ᨶ ᨠᩢ᩠᩵ᨷᨠᩫ᩠᩶ᨯᨿᩥ᩠ᨷᨶᩦ᩠᩵ᨷ
;; ᨣᩕ ᨲᩱ

I extract and analyse what was rendered as shaped ('accepted') and what
was not ('rejected'), quoting the monitoring output.  I suspect the
problem is the strict testing of the from and to fields in Lisp function
font-shape-gstring, which is defined in file font.c.

The shaping of the following was accepted:

mflt_run( 1A48 1A63 1A74) produced ( 1A48>820:0:0 1A63>858:1:1 1A74>878:2:2)
mflt_run( 1A41 1A62 1A60 1A37) produced ( 1A41>813:0:1 1A62>853:0:1 0000>953:2:3)
mflt_run( 1A3D 1A63) produced ( 1A3D>808:0:0 1A63>858:1:1)
mflt_run( 1A48 1A63) produced ( 1A48>820:0:0 1A63>858:1:1)
mflt_run( 1A43 1A76 1A63 1A60 1A36) produced ( 1A43>815:0:1 1A76>890:0:1 1A63>858:2:4 0000>952:2:4)
mflt_run( 1A36 1A63) produced ( 1A36>800:0:0 1A63>858:1:1)
mflt_run( 1A23 1A63 1A74) produced ( 1A23>777:0:0 0000>859:1:2)
mflt_run( 1A26) produced ( 1A26>780:0:0)
mflt_run( 1A48 1A62) produced ( 1A48>820:0:1 1A62>853:0:1)
mflt_run( 1A2C 1A60 1A2C 1A63) produced ( 0000>789:0:2 1A63>858:3:3)
mflt_run( 1A43 1A60 1A76 1A2F) produced ( 1A43>815:0:3 1A76>890:0:3 0000>941:0:3)
mflt_run( 1A2E 1A60) produced ( 1A2E>792:0:1 1A60>851:0:1)
mflt_run( 1A33 1A6B 1A60 1A75 1A36) produced ( 1A33>797:0:4 1A6B>868:0:4 1A75>889:0:4 0000>952:0:4)
mflt_run( 1A20 1A6B 1A76 1A60 1A2F) produced ( 1A20>774:0:4 1A6B>868:0:4 1A76>890:0:4 0000>941:0:4)
mflt_run( 1A3F 1A65 1A60 1A37) produced ( 1A3F>811:0:1 1A65>862:0:1 0000>953:2:3)

The shaping of the following, with vowels or MEDIAL RA that should be
rendered before the consonant, was rejected:

mflt_run( 1A3E 1A6E 1A6C 1A65) produced ( 1A6E>872:1:1 1A3E>810:0:3 1A6C>869:0:3 1A65>862:0:3)
mflt_run( 1A23 1A55) produced ( 1A55>835:1:1 1A23>777:0:0)
mflt_run( 1A32 1A71) produced ( 1A71>875:1:1 1A32>796:0:0)

The problem is that the first glyph does not derive from the first
character.

The shaping of the following was rejected:

mflt_run( 1A20 1A60 1A75 1A37) produced ( 1A20>774:0:2 1A75>889:0:2 0000>953:1:3)

In this case, character 2 is stacked below character 0, and characters
1 and 3 combine to form a spacing glyph.

mflt_run( 1A20 1A62 1A60 1A75 1A37) produced ( 1A20>774:0:1 1A62>853:0:3 1A75>889:0:3 0000>953:2:4)

Character 1 is mounted on character 0, and character 3 on character 1.
Characters 2 and 4 combine to form a spacing glyph.

mflt_run( 1A36 1A66 1A75 1A60 1A37) produced ( 1A36>800:0:1 1A66>863:0:2 1A75>889:0:2 0000>953:3:4)

Character 1 is mounted on character 0, and character 2 on character 1.
Characters 3 and 4 form a spacing glyph.

There does appear to be a workaround, which is to have m17n declare
the orthographic syllables it receives to be 'grapheme clusters'.  It
solves at least some of the problems above.  However, it then makes
editing of the 'clusters' more difficult.  Note that there are examples
above with 5 characters in a cluster, and this is by no means the
limit.

Richard.

From debbugs-submit-bounces@debbugs.gnu.org Wed Mar 18 23:43:45 2015
Date: Thu, 19 Mar 2015 05:43:34 +0200
From: Eli Zaretskii
Subject: Re: bug#20140: 24.4; M17n shaper output rejected
In-reply-to: <20150318222040.4066e6e9@JRWUBU2>
To: Richard Wordingham, Kenichi Handa
Cc: 20140@debbugs.gnu.org
Message-id: <83oanpwt21.fsf@gnu.org>
References: <20150318222040.4066e6e9@JRWUBU2>
Reply-To: Eli Zaretskii

> Date: Wed, 18 Mar 2015 22:20:40 +0000
> From: Richard Wordingham
>
> I am running Emacs 24.4 in an Ubuntu 12.04 Precise Pangolin
> installation, for which the version of libm17n-0 is 1.6.3-1.  I am
> attempting to induce Emacs to render the Tai Tham script.  There
> appears to be a bug/feature in Emacs which makes this unnecessarily
> difficult.

Thanks for the report.  I hope Handa-san (CC'ed) could look into it.

From debbugs-submit-bounces@debbugs.gnu.org Sat Mar 21 04:33:34 2015
From: handa@gnu.org (K. Handa)
To: Richard Wordingham
Cc: 20140@debbugs.gnu.org
Subject: Re: bug#20140: 24.4; M17n shaper output rejected
In-Reply-To: <20150318222040.4066e6e9@JRWUBU2> (message from Richard
	Wordingham on Wed, 18 Mar 2015 22:20:40 +0000)
Date: Sat, 21 Mar 2015 17:33:17 +0900
Message-ID: <87pp8292cy.fsf@gnu.org>

In article <20150318222040.4066e6e9@JRWUBU2>, Richard Wordingham writes:

[...]
> I extract and analyse what was rendered as shaped ('accepted') and what
> was not ('rejected'), quoting the monitoring output.  I suspect the
> problem is the strict testing of the from and to fields in Lisp function
> font-shape-gstring, which is defined in file font.c.
[...]
> The shaping of the following, with vowels or MEDIAL RA that should be
> rendered before the consonant, was rejected:

> mflt_run( 1A3E 1A6E 1A6C 1A65) produced ( 1A6E>872:1:1 1A3E>810:0:3 1A6C>869:0:3 1A65>862:0:3)

If U+1A6E is displayed before U+1A3E, and they are in different
grapheme clusters, when you move point forward one step at a time,
the cursor must move back and forth as below (cursor is indicated
by dashes):

display: SPC 1A6E 1A3E+1A6C+1A65 SPC
step 1:  ---
step 2:           --------------
step 3:      ----
step 4:                          ---

Is that what you want?

At least, the support for all Indic scripts (they have
characters in logical order as your example of Tai Tham
text) treats re-ordered glyphs as one grapheme cluster.
That is not only Emacs but also gtk (pango) applications.
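A minimal C sketch of a check that is consistent with the accepted and
rejected mflt_run outputs quoted in the original report may help here.
It is only a rule inferred from those examples, not the actual test in
font-shape-gstring; the names glyph_span and clusters_in_logical_order
are hypothetical.  The rule assumed is: the first glyph must start at
character 0, and every later glyph must either share the previous
glyph's from:to range or start exactly one character after it.

```c
#include <stdbool.h>
#include <stddef.h>

/* from/to character indices of one output glyph, as printed after the
   glyph code in the monitoring output (e.g. "1A6E>872:1:1").
   Hypothetical type, used only for this illustration.  */
struct glyph_span {
    int from;
    int to;
};

/* Return true if the glyph spans form clusters in logical order.
   This is an assumed rule that fits the examples in this report; it
   is not copied from Emacs.  */
static bool clusters_in_logical_order(const struct glyph_span *g, size_t n)
{
    if (n == 0)
        return true;
    /* The first glyph has to derive from the first character.  */
    if (g[0].from != 0 || g[0].to < g[0].from)
        return false;
    for (size_t i = 1; i < n; i++) {
        /* Either the glyph belongs to the same cluster as its
           predecessor ...  */
        bool same_cluster = g[i].from == g[i - 1].from
                            && g[i].to == g[i - 1].to;
        /* ... or it opens the cluster that starts right after it.  */
        bool next_cluster = g[i].from == g[i - 1].to + 1
                            && g[i].to >= g[i].from;
        if (!same_cluster && !next_cluster)
            return false;
    }
    return true;
}
```

Under this rule the output 1:1 0:3 0:3 0:3 for 1A3E 1A6E 1A6C 1A65 fails
on its first glyph, while the same four glyphs all declared as 0:3 (one
grapheme cluster for the whole syllable) pass.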
Please try to move cursor over this Devanagari text "हिंदी" on
Emacs, gedit, and, for instance, firefox.  They all treat
that text as 2 grapheme clusters "हिं" and "दी".  The first
one corresponds to the character sequence U+939 U+93F, and
U+93F (vowel I) is displayed before U+939 (base consonant).

[...]

> There does appear to be a work around, which is to have m17n declare
> the orthographic syllables it receives to be 'grapheme clusters'.

I think that's the right solution; i.e. make all combined
and out-of-order glyphs one cluster.

> It solves at least some of the problems above.

Which one is not solved by it?

> However, it then makes editing of the 'clusters' more
> difficult.  Note that there are examples above with 5
> characters in a cluster, and this is by no means the
> limit.

But, it seems that the current behavior is accepted, at
least, by Indic people.

By the way, I long ago proposed these commands which enable
you to move point into a grapheme cluster (by suppressing
composing of a cluster temporarily).  It worked in old Emacs (I
don't remember the version), but not in the latest Emacs.

(defun forward-char-intrusive ()
  (interactive)
  (setq disable-point-adjustment t)
  (forward-char 1))

(defun backward-char-intrusive ()
  (interactive)
  (setq disable-point-adjustment t)
  (forward-char -1))

(global-set-key (kbd "C-S-f") 'forward-char-intrusive)
(global-set-key (kbd "C-S-b") 'backward-char-intrusive)

---
K. Handa
handa@gnu.org

From debbugs-submit-bounces@debbugs.gnu.org Sat Mar 21 13:25:14 2015
From: Wolfgang Jenkner
To: handa@gnu.org (K. Handa)
Cc: 20140@debbugs.gnu.org, Richard Wordingham
Subject: Re: bug#20140: 24.4; M17n shaper output rejected
Date: Sat, 21 Mar 2015 18:20:26 +0100
References: <20150318222040.4066e6e9@JRWUBU2> <87pp8292cy.fsf@gnu.org>
Message-ID: <85zj76qn4g.fsf@iznogoud.viz>

On Sat, Mar 21 2015, K. Handa wrote:

> By the way, I long ago proposed these commands which enables
> you to move point into a grapheme cluster (by suppressing
> composing of a cluster temporarily).  It worked in old Emacs (I
> don't remember the version), but not in the latest Emacs.
>
> (defun forward-char-intrusive ()
>   (interactive)
>   (setq disable-point-adjustment t)
>   (forward-char 1))

It actually works in trunk emacs, I think.  If we start with point at
the beginning of the word (I use the itrans transcription for clarity)

-!-hiMdI

then calling the function once /appears/ to leave point here

hiM-!-dI

but C-x = shows that it is really here

h-!-iMdI

And so on.

From debbugs-submit-bounces@debbugs.gnu.org Sat Mar 21 13:58:34 2015
Date: Sat, 21 Mar 2015 17:58:18 +0000
From: Richard Wordingham
To: handa@gnu.org (K. Handa)
Cc: 20140@debbugs.gnu.org
Subject: Re: bug#20140: 24.4; M17n shaper output rejected
Message-ID: <20150321175818.1b125eba@JRWUBU2>
In-Reply-To: <87pp8292cy.fsf@gnu.org>
References: <20150318222040.4066e6e9@JRWUBU2> <87pp8292cy.fsf@gnu.org>

On Sat, 21 Mar 2015 17:33:17 +0900
handa@gnu.org (K. Handa) wrote:

> In article <20150318222040.4066e6e9@JRWUBU2>, Richard Wordingham
> writes:
> [...]
> > I extract and analyse what was rendered as shaped ('accepted') and
> > what was not ('rejected'), quoting the monitoring output.  I
> > suspect the problem is the strict testing of the from and to fields
> > in Lisp function font-shape-gstring, which is defined in file
> > font.c.
> [...]
> > The shaping of the following, with vowels or MEDIAL RA that should
> > be rendered before the consonant, was rejected:
>
> > mflt_run( 1A3E 1A6E 1A6C 1A65) produced ( 1A6E>872:1:1 1A3E>810:0:3 1A6C>869:0:3 1A65>862:0:3)
>
> If U+1A6E is displayed before U+1A3E, and they are in
> different grapheme cluster, when you move point forward one
> step by one, the cursor must move back and forth as below
> (cursor is indicated by dashes):
>
> display: SPC 1A6E 1A3E+1A6C+1A65 SPC
> step 1:  ---
> step 2:           --------------
> step 3:      ----
> step 4:                          ---
>
> Is that what you want?

It gives me more control for editing in Emacs.  Another implementation
could choose to move in visual order.
The policing function could choose to merge the 'out of order'
clusters - that is what new HarfBuzz does, though I think that should
only be done if the client requests it.

What I ought to want is SIL's split cursor scheme, which indicated the
next ('point') and previous characters, even in bidirectional text.
Unfortunately, that's not compatible with m17n, which seems to assume
that cursor position will be a single number.  The Emacs functions
forward-char-intrusive and backward-char-intrusive provided a pleasant,
more intuitive, alternative, and I am sad to hear they are gone.
Perhaps I'll have to start using toggle-auto-composition.

The one consolation in Emacs is that delete-forward-char deletes a
single character, rather than a whole cluster.  That greatly reduces
the disadvantage of having clusters.  Also, search still works by
characters rather than by clusters.  If I want to search for a
character in LibreOffice, I have to go into the special regular
expression find and replace menu.  That is unpleasant.

> At least, the support for all Indic scripts (they have
> characters in logical order as your example of Tai Tham
> text) treats re-ordered glyphs as one grapheme cluster.
> That is not only Emacs but also gtk (pango) applications.

That's a nasty fault with HarfBuzz.

> Please try to move cursor over this Devanagri text "हिंदी" on
> Emacs, gedit, and, for instance, firefox.  They all treat
> that text as 2 grapheme clusters "हिं" and "दी".  The first
> one corresponds to character the sequence U+935 U+93F, and
> U+93F (vowel I) is displayed before U+935 (base cosonant).

Note that those clusters are only 3 and 2 characters long.  Retyping
them is tolerable.  Now consider the Sanskrit Devanagari text स्त्री,
which contains two consonant-combining viramas.
Emacs moves across it in 1 step, but Claws e-mail (GTK-based, I
believe) and LibreOffice (HarfBuzz-based, at least for Linux) both take
3 steps to move across it.  Claws and LibreOffice use different
algorithms to position the cursor.  That of LibreOffice seems more
reasonable, but that of Claws works better!  The reason is that Unicode
did not declare virama as forming grapheme clusters.

> [...]
>
> > There does appear to be a work around, which is to have m17n declare
> > the orthographic syllables it receives to be 'grapheme clusters'.
>
> I think that's the right solution; i.e. make all combined
> and out-of-ordered glyphs as one cluster.
>
> > It solves at least some of the problems above.
>
> Which one is not solved by it?

It seems to have solved all of them.  When I reported the bug, I was
having problems with my font because libotf was silently ignoring half
the lookups in my font.

I thought I might have problems with U+1A58 TAI THAM SIGN MAI KANG LAI,
which in Lao visually groups (usually) with the following base
consonant and in Tai Khuen groups with the preceding base consonant.  My
clustering in Emacs follows the Tai Khuen scheme.  (I compose two
orthographic clusters together in Emacs, but declare two grapheme
clusters in the FLT processing.)  However, my font follows a major
Northern Thai dictionary and places it on the following base consonant
if there is nothing above it, but otherwise places it on the preceding
base consonant.  However, my implementation is too dirty to cause
problems - the second cluster is not reported as deriving from the mai
kang lai character.

I wonder, though, what will happen if I manage to implement the
Universal Shaping Engine's (USE) rphf feature.  The author of a
Lao-style Tai Tham font wanted this feature in HarfBuzz.  The desired
effect seems easy to achieve in m17n-flt, but placing it under font
control is more difficult.  I'm studying MLM2-OTF.flt to see how to do
it.

> > However, it then makes editing of the 'clusters' more
> > difficult.  Note that there are examples above with 5
> > characters in a cluster, and this is by no means the
> > limit.
>
> But, it seems that the current behavior is accepted, at
> least, by Indic people.

Who do you mean by 'Indic people'?  I can see at least three groups:

1) Indian speakers of Indic languages who use Indic scripts, thus
including users of Hindi, Gujarati and Bengali.  See my comments above.

2) Indian users of Indic scripts, thus also including speakers of
Malayalam and Tamil.  In Tamil, a phonetically CVCCV word will normally
naturally split into clusters as CV.C+virama.CV.  I must admit I am
surprised that they have accepted CV.CCV - or do Tamils not use Emacs
for Tamil?  Tamils are notorious for regarding their writing system as
a syllabary rather than as an abugida.  I haven't studied the Malayalam
script - that does seem a fairly complicated Indian script, as one
might expect when Dravidians use a script tailored to Middle Indic and
stretched to cover Old Indic.

3) Users of Indic scripts, thus also including the Burmese, Thai,
Cambodians and Lao as well as the users of the Tai Tham script.
Rebellion is rampant.  The original Unicode encoding of Thai followed
the phonetic order (allegedly - it was probably the collation order
instead).  This was rapidly thrown out as incompatible with the
current, working encoding.  Unicode responded with the derogatory
property of 'logical order exception'.  Around Unicode 5.1, the
preposed vowels of Thai and Lao were suddenly included in grapheme
clusters with the base consonant.  As the consequences started to
appear in applications, there were howls of rage from Thais, and the
characters were restored to their original status as fully independent
characters.  It doesn't seem so long ago that the Cambodian government
imposed Unicode on Cambodia.  You'd have thought that access to
applications would have made Unicode the obvious choice.
New Tai Lue is an interesting case.  Microsoft delayed support for this
simple Indic script for so long that most apparently Unicode-encoded
New Tai Lue text was actually encoded in visual order.  With Unicode
8.0, New Tai Lue is changing from phonetic order to visual order, and
it will no longer need any clusters at all!  Emacs 23.3 (which is what
is in long-term support Ubuntu 12.04) offers no support for New Tai
Lue, so I am not sure that there is yet a New Tai Lue view on
composition in Emacs.

Richard.

From debbugs-submit-bounces@debbugs.gnu.org Sat Mar 21 14:26:41 2015
Date: Sat, 21 Mar 2015 20:26:20 +0200
From: Eli Zaretskii
Subject: Re: bug#20140: 24.4; M17n shaper output rejected
In-reply-to: <20150321175818.1b125eba@JRWUBU2>
To: Richard Wordingham
Cc: 20140@debbugs.gnu.org, handa@gnu.org
Message-id: <83384ytdf7.fsf@gnu.org>
References: <20150318222040.4066e6e9@JRWUBU2> <87pp8292cy.fsf@gnu.org>
	<20150321175818.1b125eba@JRWUBU2>
Reply-To: Eli Zaretskii

> Date: Sat, 21 Mar 2015 17:58:18 +0000
> From: Richard Wordingham
> Cc: 20140@debbugs.gnu.org
>
> Another implementation could choose to move in visual order.

Emacs 24.4 does have visual-order cursor movement.  Customize the
variable visual-order-cursor-movement to get that.

From debbugs-submit-bounces@debbugs.gnu.org Wed Mar 25 10:26:00 2015
From: handa@gnu.org (K. Handa)
To: Richard Wordingham
Cc: 20140@debbugs.gnu.org
Subject: Re: bug#20140: 24.4; M17n shaper output rejected
In-Reply-To: <20150321175818.1b125eba@JRWUBU2> (message from Richard
	Wordingham on Sat, 21 Mar 2015 17:58:18 +0000)
Date: Wed, 25 Mar 2015 23:25:54 +0900
Message-ID: <87mw31887h.fsf@gnu.org>

Hi, thank you for the detailed explanation.

In article <20150321175818.1b125eba@JRWUBU2>, Richard Wordingham writes:

> What I ought to want is SIL's split cursor scheme, which indicated the
> next ('point') and previous characters, even in bidirectional text.
> Unfortunately, that's not compatible with m17n, which seems to assume
> that cursor position will be a single number.  The Emacs functions
> forward-char-intrusive and backward-char-intrusive provided a pleasant,
> more intuitive, alternative, and I am sad to hear they are gone.
> Perhaps I'll have to start using toggle-auto-composition.

Those Emacs functions are just my idea for improving Emacs
for CTL users, and have never been included in the official
Emacs version.

I checked the code and found two problems:

(1) When the command sets disable-point-adjustment to t,
command_loop_1 should force updating the display if point is
within a grapheme cluster.
So we need this patch:

diff --git a/src/keyboard.c b/src/keyboard.c
index bf65df1..13125c1 100644
--- a/src/keyboard.c
+++ b/src/keyboard.c
@@ -1636,6 +1636,16 @@ command_loop_1 (void)
             adjust_point_for_property (last_point_position,
                                        MODIFF != prev_modiff);
         }
+      else if (current_buffer == prev_buffer
+               && last_point_position != PT)
+        {
+          if (PT > BEGV && PT < ZV
+              && (composition_adjust_point (last_point_position, PT) != PT))
+            /* Now point is within a grapheme cluster.  We must update
+               the display so that this cluster is decomposed on the
+               screen and the cursor is correctly placed at point.  */
+            windows_or_buffers_changed = 22;
+        }
 
       /* Install chars successfully executed in kbd macro.  */
 

(2) We should break a grapheme cluster at point. So we need this patch.

diff --git a/src/xdisp.c b/src/xdisp.c
index a17f5a9..0c56395 100644
--- a/src/xdisp.c
+++ b/src/xdisp.c
@@ -3408,6 +3408,9 @@ compute_stop_pos (struct it *it)
       pos = next_overlay_change (charpos);
       if (pos < it->stop_charpos)
         it->stop_charpos = pos;
+      /* If point is in front of the current stop pos, stop there.  */
+      if (charpos < PT && PT < it->stop_charpos)
+        it->stop_charpos = PT;
 
       /* Set up variables for computing the stop position from text
          property changes.  */
@@ -8166,7 +8169,12 @@ next_element_from_buffer (struct it *it)
           && IT_CHARPOS (*it) >= it->redisplay_end_trigger_charpos)
         run_redisplay_end_trigger_hook (it);
 
-      stop = it->bidi_it.scan_dir < 0 ? -1 : it->end_charpos;
+      /* Set stop position considering the bidi direction and point.  */
+      if (it->bidi_it.scan_dir < 0)
+        stop = (PT < IT_CHARPOS (*it)) ? PT : -1;
+      else
+        stop = ((IT_CHARPOS (*it) < PT && PT < it->end_charpos)
+                ? PT : it->end_charpos);
       if (CHAR_COMPOSED_P (it, IT_CHARPOS (*it), IT_BYTEPOS (*it),
                            stop)
           && next_element_from_composition (it))

Could you try these patches and test the usability of forward-char-intrusive and backward-char-intrusive?
> > Please try to move cursor over this Devanagari text "हिंदी" on > > Emacs, gedit, and, for instance, firefox. They all treat > > that text as 2 grapheme clusters "हिं" and "दी". The first > > one corresponds to the character sequence U+939 U+93F U+902, and > > U+93F (vowel I) is displayed before U+939 (base consonant). > Note that those clusters are only 3 and 2 characters long. Retyping > them is tolerable. Now consider the Sanskrit Devanagari text स्त्री, > which contains two consonant-combining viramas. Emacs moves across it > in 1 step, but Claws e-mail (GTK-based, I believe) and LibreOffice > (HarfBuzz-based, at least for linux) both take 3 steps to move across > it. Claws and LibreOffice use different algorithms to position the > cursor. That of LibreOffice seems more reasonable, but that of > Claws works better! The reason is that Unicode did not declare virama > as forming grapheme clusters. Ah, hmmm, that's a problem of DEVA-OTF.flt and DEV2-OTF.flt of the m17n library. I'll try to fix them. > It seems to have solved all of them. When I reported the bug, I was > having problems with my font because libotf was silently ignoring half > the lookups in my font. Could you please send me (not on this list) an appropriate bug/problem report if libotf should be fixed? > I thought I might have problems with U+1A58 TAI THAM SIGN MAI KANG LAI, > which in Lao visually groups (usually) with the following base > consonant and in Tai Khuen groups with the preceding base consonant. My > clustering in Emacs follows the Tai Khuen scheme. (I compose two > orthographic clusters together in Emacs, but declare two grapheme > clusters in the FLT processing.) However, my font follows a major > Northern Thai dictionary and places it on the following base consonant > if there is nothing above it, but otherwise places it on the preceding > base consonant.
However, my implementation is too dirty to cause > problems - the second cluster is not reported as deriving from the > mai kang lai character. > I wonder, though, what will happen if I manage to implement the > Universal Shaping Engine's (USE) rphf feature. The author of a Lao-style > Tai Tham font wanted this feature in HarfBuzz. The desired effect seems > easy to achieve in m17n-flt, but placing it under font control is more > difficult. I'm studying MLM2-OTF.flt to see how to do it. I've just started to study the Universal Shaping Engine. It seems that we can implement it by a proper FLT file. > > > However, it then makes editing of the 'clusters' more > > > difficult. Note that there are examples above with 5 > > > characters in a cluster, and this is by no means the > > > limit. > > > > But, it seems that the current behavior is accepted, at > > least, by Indic people. > Who do you mean by 'Indic people'? I just mean that I have not heard any complaints about that "too long cluster problem" of Emacs. No one is using Emacs for Indic scripts? > New Tai Lue is an interesting case. Microsoft delayed support for this > simple Indic script for so long that most apparently Unicode-encoded > New Tai Lue text was actually encoded in visual order. With Unicode > 8.0, New Tai Lue is changing from phonetic order to visual order, and > it will no longer need any clusters at all! Wow, I didn't know that. > Emacs 23.3 (which is what is in long-term support Ubuntu > 12.04) offers no support for New Tai Lue, so I am not sure > that there is yet a New Tai Lue view on composition in > Emacs. We may be able to provide support for new scripts in ELPA. --- K.
Handa handa@gnu.org From debbugs-submit-bounces@debbugs.gnu.org Wed Mar 25 17:45:18 2015 Received: (at 20140) by debbugs.gnu.org; 25 Mar 2015 21:45:18 +0000 Received: from localhost ([127.0.0.1]:36664 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1Yat6z-0002JF-DC for submit@debbugs.gnu.org; Wed, 25 Mar 2015 17:45:18 -0400 Received: from know-smtprelay-omc-9.server.virginmedia.net ([80.0.253.73]:56507) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1Yat6x-0002Io-Bl for 20140@debbugs.gnu.org; Wed, 25 Mar 2015 17:45:16 -0400 Received: from JRWUBU2 ([81.103.224.4]) by know-smtprelay-9-imp with bizsmtp id 7xl91q00n06JmVd01xl9he; Wed, 25 Mar 2015 21:45:09 +0000 X-Originating-IP: [81.103.224.4] X-Spam: 0 X-Authority: v=2.1 cv=dJgomYpb c=1 sm=1 tr=0 a=pLuj3OkTrmEUIJBpyvkqVg==:117 a=pLuj3OkTrmEUIJBpyvkqVg==:17 a=kj9zAlcOel0A:10 a=NLZqzBF-AAAA:8 a=mDV3o1hIAAAA:8 a=LsZXWq4u80ueHZGQUpMA:9 a=CjuIK1q_8ugA:10 Date: Wed, 25 Mar 2015 21:45:07 +0000 From: Richard Wordingham To: 20140@debbugs.gnu.org Subject: Re: bug#20140: 24.4; M17n shaper output rejected Message-ID: <20150325214507.3dd335de@JRWUBU2> In-Reply-To: <87mw31887h.fsf@gnu.org> References: <20150321175818.1b125eba@JRWUBU2> <87mw31887h.fsf@gnu.org> X-Mailer: Claws Mail 3.8.0 (GTK+ 2.24.10; i686-pc-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-Spam-Score: -0.0 (/) X-Debbugs-Envelope-To: 20140 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.0 (/) On Wed, 25 Mar 2015 23:25:54 +0900 handa@gnu.org (K. Handa) wrote: > Hi, thank you for the detailed explanation. 
> > In article <20150321175818.1b125eba@JRWUBU2>, Richard Wordingham > writes: > > > What I ought to want is SIL's split cursor scheme, which indicated > > the next ('point') and previous characters, even in bidirectional > > text. Unfortunately, that's not compatible with m17n, which seems > > to assume that cursor position will be a single number. > > The Emacs > > functions forward-char-intrusive and backward-char-intrusive > > provided a pleasant, more intuitive, alternative, and I am sad to > > hear they are gone. Perhaps I'll have to start using > > toggle-auto-composition. > > Those Emacs functions are just my idea for improving Emacs > for CTL users, and have never been included in the official > Emacs version. I think I must have confused them with the behaviour of Emacs 22.1 on Windows XP. I didn't do anything to enable the visual decomposition of the clusters - it just happened when moving with the arrow keys. Indeed, it is conceivable that the characters weren't decomposed, but were simply being rendered by Windows without any need for composition. I haven't had time to try out the experimental code yet. Richard.
From debbugs-submit-bounces@debbugs.gnu.org Sun Apr 05 15:48:34 2015 Received: (at 20140) by debbugs.gnu.org; 5 Apr 2015 19:48:34 +0000 Received: from localhost ([127.0.0.1]:45687 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1YeqX3-0001XM-7I for submit@debbugs.gnu.org; Sun, 05 Apr 2015 15:48:33 -0400 Received: from know-smtprelay-omc-9.server.virginmedia.net ([80.0.253.73]:36228) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1YeqX1-0001X2-Aw for 20140@debbugs.gnu.org; Sun, 05 Apr 2015 15:48:32 -0400 Received: from JRWUBU2 ([81.103.224.4]) by know-smtprelay-9-imp with bizsmtp id CKoR1q00M06JmVd01KoRl8; Sun, 05 Apr 2015 20:48:25 +0100 X-Originating-IP: [81.103.224.4] X-Spam: 0 X-Authority: v=2.1 cv=dJgomYpb c=1 sm=1 tr=0 a=pLuj3OkTrmEUIJBpyvkqVg==:117 a=pLuj3OkTrmEUIJBpyvkqVg==:17 a=NLZqzBF-AAAA:8 a=mDV3o1hIAAAA:8 a=k8Yw-NZnoM8DthYqnh4A:9 a=CjuIK1q_8ugA:10 a=pdTaYS8AUncam0ZpjhcA:9 a=oAEvv6KhJA_pUhrM:18 a=HXjIzolwW10A:10 Date: Sun, 5 Apr 2015 20:48:24 +0100 From: Richard Wordingham To: handa@gnu.org (K. Handa) Subject: Re: bug#20140: 24.4; M17n shaper output rejected Message-ID: <20150405204824.35d870b1@JRWUBU2> In-Reply-To: <87mw31887h.fsf@gnu.org> References: <20150321175818.1b125eba@JRWUBU2> <87mw31887h.fsf@gnu.org> X-Mailer: Claws Mail 3.8.0 (GTK+ 2.24.10; i686-pc-linux-gnu) Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="MP_/1FbX5iSymwVXmjko80irIQH" X-Spam-Score: -0.0 (/) X-Debbugs-Envelope-To: 20140 Cc: 20140@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.0 (/) --MP_/1FbX5iSymwVXmjko80irIQH Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Content-Disposition: inline On Wed, 25 Mar 2015 23:25:54 +0900 handa@gnu.org (K. 
Handa) wrote: > Could you try these patches and test the usability of > forward-char-intrusive and backward-char-intrusive? The results weren't quite what I'd hoped for, but the results are usable. Thank you. The text I principally tried the commands out on was Tai Tham text . I used corrections for the bugs that had been affecting its rendering. It seems that the commands prevent shaping across the cursor, but do not inhibit shaping within the former cluster. I was only doing shaping on complete orthographic syllables, so entering a cluster chiefly had the effect of losing all positioning of marks and making the text unreadable. However, the behaviour may make a good teaching aid! I then tried the command on Thai, and there the commands worked well. I therefore added to LANA-OTF.flt, for marks not in complete syllables, the command: ("(M)" [ (1 = ) ] ) ; For stepping through. I attach the results of not stepping through (labelled 0), and stepping through by 1 to 7 characters (labelled 1 to 7). The result is not so good at 3 steps - I think because the extra rendering command does not handle SAKOT. Aggressive use of dotted circles might improve the display. I don't know why, at the end, there is a delay in TONE-2 rising to its proper height. C-S-f is not a good key sequence for me. C-S is one of my X-keyboard switching combinations - I chose it for compatibility with the Xming X server. Richard. 
--MP_/1FbX5iSymwVXmjko80irIQH Content-Type: image/png Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename=emacs_entry.png
[Base64-encoded PNG data omitted: emacs_entry.png, the screenshots labelled 0 to 7 that the message above describes.]
--MP_/1FbX5iSymwVXmjko80irIQH-- From debbugs-submit-bounces@debbugs.gnu.org Thu Feb 03 16:21:39 2022 Received: (at 20140) by debbugs.gnu.org; 3 Feb 2022 21:21:39 +0000 Received: from localhost ([127.0.0.1]:58109 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1nFjXv-0007sy-7d for submit@debbugs.gnu.org; Thu, 03 Feb 2022 16:21:39 -0500 Received: from quimby.gnus.org ([95.216.78.240]:55538) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1nFjXt-0007sg-N8 for 20140@debbugs.gnu.org; Thu, 03 Feb 2022 16:21:38 -0500 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnus.org; s=20200322; h=Content-Type:MIME-Version:Message-ID:In-Reply-To:Date: References:Subject:Cc:To:From:Sender:Reply-To:Content-Transfer-Encoding: Content-ID:Content-Description:Resent-Date:Resent-From:Resent-Sender: Resent-To:Resent-Cc:Resent-Message-ID:List-Id:List-Help:List-Unsubscribe: List-Subscribe:List-Post:List-Owner:List-Archive; bh=seK1kjRBIb9DQwHE6KW9pBJfiufV4XYagMebmghMth8=; b=Eeb+CsztVrBjMlwk6AqAFYRtxo k9L09Y6AKnydiMuzHOWCFGu4vI2AojmKShqZ/8IOtTon0tcILx3jAw9EJPmhpJcfYEqMS6XqM3Bmg SoyGPJ9AeoPpRvS/HOO5YDtlxUvJpz2OOPv+qRQt8IIFLXDpire+hZE0N2ujGA6+QtNI=; Received: from [84.212.220.105] (helo=giant) by quimby.gnus.org with esmtpsa (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1nFjXl-0004fX-CU; Thu, 03 Feb 2022 22:21:31 +0100 From: Lars Ingebrigtsen To: Richard Wordingham Subject: Re: bug#20140: 24.4; M17n shaper output rejected References: <20150318222040.4066e6e9@JRWUBU2> X-Now-Playing: New Fads's _Love It All_: "Saxophone" Date: Thu, 03 Feb 2022 22:21:28 +0100 In-Reply-To: <20150318222040.4066e6e9@JRWUBU2> (Richard Wordingham's message of "Wed, 18 Mar 2015 22:20:40 +0000") Message-ID: <87r18jk5nr.fsf@gnus.org>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/29.0.50 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-Spam-Report: Spam detection software, running on the system "quimby.gnus.org", has NOT identified this incoming email as spam. The original message has been attached to this so you can view it or label similar future email. If you have any questions, see @@CONTACT_ADDRESS@@ for details. Content preview: Richard Wordingham writes: > I am running Emacs 24.4 in a Ubuntu 12.04 Precise Pangolin > installation, for which the version of libm17n-0 is 1.6.3-1. I am > attempting to induce Emacs to render the Tai Tham script. There > app [...] Content analysis details: (-2.9 points, 5.0 required) pts rule name description ---- ---------------------- -------------------------------------------------- -1.0 ALL_TRUSTED Passed through trusted hosts only via SMTP -1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1% [score: 0.0000] X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 20140 Cc: 20140@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) Richard Wordingham writes: > I am running Emacs 24.4 in a Ubuntu 12.04 Precise Pangolin > installation, for which the version of libm17n-0 is 1.6.3-1. I am > attempting to induce Emacs to render the Tai Tham script. There > appears to be a bug/feature in Emacs which makes this unnecessarily > difficult. (I'm going through old bug reports that unfortunately weren't resolved at the time.) I vaguely remember there having been some fixes in this area since this bug report was opened -- does this work better for you in more recent versions of Emacs? -- (domestic pets only, the antidote for overdose, milk.) 
bloggy blog: http://lars.ingebrigtsen.no From debbugs-submit-bounces@debbugs.gnu.org Thu Feb 03 16:21:50 2022 Received: (at control) by debbugs.gnu.org; 3 Feb 2022 21:21:50 +0000 Received: from localhost ([127.0.0.1]:58112 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1nFjY2-0007tM-EN for submit@debbugs.gnu.org; Thu, 03 Feb 2022 16:21:50 -0500 Received: from quimby.gnus.org ([95.216.78.240]:55554) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1nFjY0-0007t0-Gc for control@debbugs.gnu.org; Thu, 03 Feb 2022 16:21:44 -0500 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnus.org; s=20200322; h=Subject:From:To:Message-Id:Date:Sender:Reply-To:Cc: MIME-Version:Content-Type:Content-Transfer-Encoding:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:In-Reply-To:References:List-Id:List-Help:List-Unsubscribe: List-Subscribe:List-Post:List-Owner:List-Archive; bh=3BADYOa8E9PctBZnG7h87geQQQfpQBOxK1k8YEHoHxA=; b=oa3/BgaPezaAYsKWpvdNmKUye4 x66d19wp4EWpCGfB/tVjYqSwqWkb/OzQLU/hfzUATmZtikpA2uhxaV1nD9X3RMy0SdZBuvC5oFS6Y Pdn41RIKn/2G0slIk3dX0QYTNslvCIT/vZeUW5OUJpFHfwxWuRpPJL9TV6h7WyjlGYyw=; Received: from [84.212.220.105] (helo=giant) by quimby.gnus.org with esmtpsa (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1nFjXs-0004fk-DD for control@debbugs.gnu.org; Thu, 03 Feb 2022 22:21:38 +0100 Date: Thu, 03 Feb 2022 22:21:34 +0100 Message-Id: <87pmo3k5nl.fsf@gnus.org> To: control@debbugs.gnu.org From: Lars Ingebrigtsen Subject: control message for bug #20140 X-Spam-Report: Spam detection software, running on the system "quimby.gnus.org", has NOT identified this incoming email as spam. The original message has been attached to this so you can view it or label similar future email. If you have any questions, see @@CONTACT_ADDRESS@@ for details. 
Content preview: tags 20140 + moreinfo quit Content analysis details: (-2.9 points, 5.0 required) pts rule name description ---- ---------------------- -------------------------------------------------- -1.0 ALL_TRUSTED Passed through trusted hosts only via SMTP -1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1% [score: 0.0000] X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: control X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) tags 20140 + moreinfo quit From debbugs-submit-bounces@debbugs.gnu.org Fri Feb 04 02:37:09 2022 Received: (at 20140) by debbugs.gnu.org; 4 Feb 2022 07:37:09 +0000 Received: from localhost ([127.0.0.1]:58704 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1nFt9Z-0002gt-0r for submit@debbugs.gnu.org; Fri, 04 Feb 2022 02:37:09 -0500 Received: from eggs.gnu.org ([209.51.188.92]:33114) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1nFt9W-0002gU-Jp for 20140@debbugs.gnu.org; Fri, 04 Feb 2022 02:37:07 -0500 Received: from [2001:470:142:3::e] (port=43794 helo=fencepost.gnu.org) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1nFt9P-0004Uy-Lm; Fri, 04 Feb 2022 02:37:00 -0500 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org; s=fencepost-gnu-org; h=References:Subject:In-Reply-To:To:From:Date: mime-version; bh=o1AGfkKD+Kz0BWvO/l1fk+KTAEwGl/XyZX3JEipUN+g=; b=kbnHgF6Bz0XM cY3VAdDN2zf4oA6DBjOrmuW563wa3Ah82y0pmx+JzXWqwWqAgJAZjqGrk6LXkzMLGgGwmseXis0cY BKN/DKsoG7ejHbOaF+emILzT7hcqXZ8504J/vFWFrjTGP8Toka1t6DS1Cu4uXrh+lDs8ZmWoVOqL9 iSVpt8BEwyQYYo08XN6xKdegrCXzFOnBMvYlLgY+p139ss66aGdOTfSS6byjbk+Hrc+OE/A/snPzb u6VqyKZDbNI1g6aF0QoGNFUEwTYITJYi3E5dyRcA6y+o881hZ5Q/vtusD5IT1hqZfunD0AsZBQlAf 
3/A8Z4vIIrlVX5FYYLE5Jg==; Received: from [87.69.77.57] (port=3002 helo=home-c4e4a596f7) by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1nFt9M-00010q-5S; Fri, 04 Feb 2022 02:36:57 -0500 Date: Fri, 04 Feb 2022 09:37:03 +0200 Message-Id: <83v8xv2icg.fsf@gnu.org> From: Eli Zaretskii To: Lars Ingebrigtsen In-Reply-To: <87r18jk5nr.fsf@gnus.org> (message from Lars Ingebrigtsen on Thu, 03 Feb 2022 22:21:28 +0100) Subject: Re: bug#20140: 24.4; M17n shaper output rejected References: <20150318222040.4066e6e9@JRWUBU2> <87r18jk5nr.fsf@gnus.org> X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 20140 Cc: 20140@debbugs.gnu.org, richard.wordingham@ntlworld.com X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) > From: Lars Ingebrigtsen > Date: Thu, 03 Feb 2022 22:21:28 +0100 > Cc: 20140@debbugs.gnu.org > > Richard Wordingham writes: > > > I am running Emacs 24.4 in a Ubuntu 12.04 Precise Pangolin > > installation, for which the version of libm17n-0 is 1.6.3-1. I am > > attempting to induce Emacs to render the Tai Tham script. There > > appears to be a bug/feature in Emacs which makes this unnecessarily > > difficult. > > (I'm going through old bug reports that unfortunately weren't resolved > at the time.) > > I vaguely remember there having been some fixes in this area since this > bug report was opened -- does this work better for you in more recent > versions of Emacs? The most important change is that we now use HarfBuzz by default. Richard didn't contribute the Tai Tham composition rules to us (AFAIR), so I cannot test what happens now in Emacs with HarfBuzz. 
Maybe we should revisit this issue, but first I hope Richard could tell whether the issue still exists, and if so, what composition rules he uses or suggests to use for Tai Tham. From debbugs-submit-bounces@debbugs.gnu.org Sat Feb 05 17:53:01 2022 Received: (at 20140) by debbugs.gnu.org; 5 Feb 2022 22:53:01 +0000 Received: from localhost ([127.0.0.1]:35842 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1nGTvQ-0004DP-Lo for submit@debbugs.gnu.org; Sat, 05 Feb 2022 17:53:01 -0500 Received: from smtpq1.tb.ukmail.iss.as9143.net ([212.54.57.96]:38948) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1nGTvO-0004DD-OM for 20140@debbugs.gnu.org; Sat, 05 Feb 2022 17:52:59 -0500 Received: from [212.54.57.106] (helo=csmtp2.tb.ukmail.iss.as9143.net) by smtpq1.tb.ukmail.iss.as9143.net with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1nGTvI-0006wz-N7 for 20140@debbugs.gnu.org; Sat, 05 Feb 2022 23:52:52 +0100 Received: from JRWUBU2 ([82.27.122.109]) by cmsmtp with ESMTP id GTvIn2JD4YDyuGTvIn3ESm; Sat, 05 Feb 2022 23:52:52 +0100 X-SourceIP: 82.27.122.109 X-Authenticated-Sender: X-Spam: 0 X-Authority: v=2.4 cv=eu3Mc6lX c=1 sm=1 tr=0 ts=61feffc4 cx=a_exe a=lZfnwhydZ+7bl6OdZ0zTBw==:117 a=lZfnwhydZ+7bl6OdZ0zTBw==:17 a=oGFeUVbbRNcA:10 a=mDV3o1hIAAAA:8 a=OocQHUDgAAAA:8 a=NLZqzBF-AAAA:8 a=KxoD5NdYuB1iYnENJgQA:9 a=QEXdDO2ut3YA:10 a=M_eVecF_RifbRHp7dpgA:9 a=_FVE-zBwftR9WsbkzFJk:22 a=xUZTl98r3Qw_uB5NK3jt:22 a=wW_WBVUImv98JQXhvVPZ:22 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ntlworld.com; s=meg.feb2017; t=1644101572; bh=6T9m1uuehi08aaE6MuAgNOI7SnwPXLltdxcEN2CJL2U=; h=Date:From:To:Cc:Subject:In-Reply-To:References; b=tkMfw8GhG5ReBG+zBm9xyk3dqGIAwFpq/sRLiSw1t8glppucRIgi44avdW5nesMUK jOBfyDNRrPQjQTyu6UsUKsmVG4/i3hW35Ynz9ZBvN7brttNTuKXjbmmDZOybY28cEd 4OPnJGq1KnBJHDqjUO3fAN8DcI+pliioTmlA61YQxDETRxgi5fFMgu2Hjd+P7wrOa8 NuTP7wpggeiNZ3MFzFeZbpOxF7ndq5nEztChAlB3f1WoW7tGxTbkqE1nvqdIWS2Ckk 
iUngPnOYvYqbFPgWGL6ZWyL6MW3L2grognoJPlJ6hrkzDxyvHqQhumoflv27OaGLm4 MxRhGIn2oqC/w== Date: Sat, 5 Feb 2022 22:52:51 +0000 From: Richard Wordingham To: Eli Zaretskii Subject: Re: bug#20140: 24.4; M17n shaper output rejected Message-ID: <20220205225251.08a0faab@JRWUBU2> In-Reply-To: <83v8xv2icg.fsf@gnu.org> References: <20150318222040.4066e6e9@JRWUBU2> <87r18jk5nr.fsf@gnus.org> <83v8xv2icg.fsf@gnu.org> X-Mailer: Claws Mail 3.17.5 (GTK+ 2.24.32; x86_64-pc-linux-gnu) MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="MP_/Imvf5sngy_WuL_y0QW2hf.K" X-CMAE-Envelope: MS4xfGwskZ0ax6CiUm0fMk7xt5TVQYp3y5Zw7CczaZh7ir+KckZt1j85NbjnYjX+Sp9AufrgvBzTMhrr041GYcJvFHvWRiNAp5NnAAOcwx6Tg3DwMg8gbU1m LpgwjW0YNeC0nJUVOUOgDh82HqZW28MeB//1j50vSfsFJXUPc+hmhxyqebnlyhtxp6gTQmE31Mo/SNQ+z7gY6IfBGhapRRIy/J9ibND+SgrPuTvriZ96Tg8T ZqtQMFodMWBWW+1j+ntvZw== X-Spam-Score: -0.0 (/) X-Debbugs-Envelope-To: 20140 Cc: 20140@debbugs.gnu.org, Lars Ingebrigtsen X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-) --MP_/Imvf5sngy_WuL_y0QW2hf.K Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Content-Disposition: inline On Fri, 04 Feb 2022 09:37:03 +0200 Eli Zaretskii wrote: > > From: Lars Ingebrigtsen > > Date: Thu, 03 Feb 2022 22:21:28 +0100 > > Cc: 20140@debbugs.gnu.org > >=20 > > Richard Wordingham writes: > > =20 > > > I am running Emacs 24.4 in a Ubuntu 12.04 Precise Pangolin > > > installation, for which the version of libm17n-0 is 1.6.3-1. I am > > > attempting to induce Emacs to render the Tai Tham script. There > > > appears to be a bug/feature in Emacs which makes this > > > unnecessarily difficult. =20 > >=20 > > (I'm going through old bug reports that unfortunately weren't > > resolved at the time.) 
> >
> > I vaguely remember there having been some fixes in this area since
> > this bug report was opened -- does this work better for you in more
> > recent versions of Emacs?

I'm currently using the vanilla emacs on Ubuntu Focal, which is described as 'GNU Emacs 26.3 (build 2, x86_64-pc-linux-gnu, GTK+ Version 3.24.14) of 2020-03-26, modified by Debian'. The key good news is that the commands forward-char-intrusive and backward-char-intrusive are now standard, so I can position the cursor by dead-reckoning. You can reasonably mark the issue as solved.

> The most important change is that we now use HarfBuzz by default.

Isn't that only true for Emacs 27.1 and above?

> Richard didn't contribute the Tai Tham composition rules to us
> (AFAIR), so I cannot test what happens now in Emacs with HarfBuzz.
> Maybe we should revisit this issue, but first I hope Richard could
> tell whether the issue still exists, and if so, what composition rules
> he uses or suggests to use for Tai Tham.

Sad to see that Khaled Hosny's suggestion not to use composition rules seems not to have been taken.

You're welcome to include my composition rules. They're complicated by the facts that the 'regular expressions' are not interpreted as regular expressions and they are not interpreted as closed under canonical equivalence. I therefore calculate the regular expression. My composition rules are attached as tai-tham.el, which was last modified on 20 March 2015. (It would need reformatting to paste into this email.)

There are some deficiencies; I've a feeling there may be a problem with adding ZWNJ and CGJ as marks; ZWJ should also be added for completeness. I need ZWNJ to write 4-column ᨴᩣᩴᨶ᩠ᩅ‌ᩣ᩠ᨿ as opposed to 3-column ᨴᩣᩴᨶ᩠ᩅᩣ᩠ᨿ, and even with my font, HarfBuzz will need CGJ for the suppression of jack-booted dotted circles.
Additionally, for didactic text, what can I do for U+25CC for explicit display of marks and their equivalents on a dotted circle, and for that matter, for display on NBSP?

Richard.

--MP_/Imvf5sngy_WuL_y0QW2hf.K
Content-Type: text/x-emacs-lisp
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment; filename=tai-tham.el

;;; tai-tham.el --- support for Tai Tham -*- coding: utf-8 -*-

;; Copyright (C) 2008, 2009, 2010, 2011
;;   National Institute of Advanced Industrial Science and Technology (AIST)
;;   Registration Number H13PRO009

;; Keywords: multilingual, Tai Tham, i18n

;; This file is part of GNU Emacs.

;; GNU Emacs is free software: you can redistribute it and/or modify
;; it under the terms of the GNU General Public License as published by
;; the Free Software Foundation, either version 3 of the License, or
;; (at your option) any later version.

;; GNU Emacs is distributed in the hope that it will be useful,
;; but WITHOUT ANY WARRANTY; without even the implied warranty of
;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
;; GNU General Public License for more details.

;; You should have received a copy of the GNU General Public License
;; along with GNU Emacs. If not, see .

;;; Code:

;; (set-language-info-alist
;;  "Northern Thai" '((charset unicode)
;;                    (coding-system utf-8)
;;                    (coding-priority utf-8)
;;                    (sample-text .
;;                     "Northern Thai (ᨣᩣᩴᨾᩮᩬᩥᨦ / ᨽᩣᩈᩣᩃ᩶ᩣ᩠ᨶᨶᩣ) ᩈ᩠ᩅᩢᩔ᩠ᨯᩦᨣᩕᩢ᩠ᨸ")
;;                    (documentation . t)))

;; To load:
;; (load-file "~/tham/tai-tham.el") tai-tham-composable-pattern
;;
(defvar tai-tham-composable-pattern
  (let ((table
         ;; C is letters, independent vowels, digits, punctuation and symbols.
         '(("C" . "[\u1A20-\u1A54\u1A80-\u1A89\u1A90-\u1A99\u1AA0-\u1AAD]")
           ("M" . "[\u1A55-\u1A57\u1A59-\u1A5E\u1A61-\u1A7C\u1A7F]"); Mark
           ("H" . "\u1A60") ; sakot
           ("S" . "[\u1A75-\u1A7C]") ; Marks commuting with sakot
           ("N" . "\u1A58"))) ; mai kang lai
        ;; The definition of a sequence of interacting Tai Tham characters is
        ;; surprisingly complicated. The basic syllable structure should just be:
        ;;
        ;; C(M|HC)*
        ;;
        ;; There are three complications:
        ;;
        ;; 1. Emacs uses a backtracking regular expression engine, but it only
        ;;    backtracks if the characters accepted so far don't only match the regular
        ;;    expression. Thus if M includes sakot, CHC will be parsed as CH and then
        ;;    C - there is no cause to backtrack! On the other hand, missing consonants
        ;;    should not disrupt display - the glyph for sakot will normally alert the
        ;;    user that text entry is incomplete.
        ;;
        ;; 2. Some characters can be swapped round with sakot without changing the
        ;;    signification of the sequence of characters. The regular expression
        ;;    works with strings of characters rather than traces of fully decomposed
        ;;    characters subject to Unicode's canonical equivalence.
        ;;
        ;; 3. Which syllable mai kang lai belongs to depends on the font. Again, if
        ;;    M included mai kang lai, CNC would be parsed as CN and C. The word
        ;;    ᨴᩘ᩠ᩃᩣ᩠ᨿ has mai kang lai in the middle of an orthographic syllable.
        ; (basic_syllable "C\\(N*\\(M\\|HS*C?\\)\\)*")
        (basic_syllable "C\\(N*\\(M\\|HS*C\\)\\)*")
        (regexp "X\\(N\\(X\\)?\\)*H?")) ; X is basic syllable
    (let ((case-fold-search nil))
      (setq regexp (replace-regexp-in-string "X" basic_syllable regexp t t))
      (dolist (elt table)
        (setq regexp (replace-regexp-in-string (car elt) (cdr elt)
                                               regexp t t))))
    regexp))

; Failed attempt to get proper composition for incomplete word ᨴᩘ᩠ᩃᩣ᩠.
;(let ((elt (list (vector tai-tham-composable-pattern 3 'font-shape-gstring)
;                 (vector tai-tham-composable-pattern 2 'font-shape-gstring)
;                 (vector tai-tham-composable-pattern 1 'font-shape-gstring)
;                 (vector tai-tham-composable-pattern 0 'font-shape-gstring)
;                 (vector "." 0 'font-shape-gstring)
;                 )))
;  (set-char-table-range composition-function-table '(#x1A20 . #x1AAD) elt))
(let ((elt (list (vector tai-tham-composable-pattern 0 'font-shape-gstring)
                 (vector "." 0 'font-shape-gstring)
                 )))
  (set-char-table-range composition-function-table '(#x1A20 . #x1AAD) elt))

--MP_/Imvf5sngy_WuL_y0QW2hf.K--

From debbugs-submit-bounces@debbugs.gnu.org Sun Feb 06 03:11:35 2022
Received: (at 20140) by debbugs.gnu.org; 6 Feb 2022 08:11:35 +0000
Received: from localhost ([127.0.0.1]:36321 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1nGcdz-0006w2-9g for submit@debbugs.gnu.org; Sun, 06 Feb 2022 03:11:35 -0500
Received: from eggs.gnu.org ([209.51.188.92]:55318) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1nGcdy-0006vp-8t for 20140@debbugs.gnu.org; Sun, 06 Feb 2022 03:11:34 -0500
Received: from [2001:470:142:3::e] (port=52672 helo=fencepost.gnu.org) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1nGcds-0000vA-M5; Sun, 06 Feb 2022 03:11:28 -0500
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org; s=fencepost-gnu-org; h=MIME-version:References:Subject:In-Reply-To:To:From: Date; bh=WkOuKVyvfBHM5/vSvQAgqGUTHXZKHWL7XO3/XEwWJo4=; b=pU0eTolgqUPb+2JL01Qw
 0+fgBA+aA+ejUGxPXDxMK9uoIBa8YZLSTyAlrJDYoYFd9v2Mcmci6dQ4ja7nMvB2WC0WRJF6Wd83R
 8y+7tLj+hs3dgruGbr9jg2DDBMJFsfr3yiFb0iueoB+CVxrFUagA/QBrcYufEFOSQVtLZ1h3JmBCj
 gyxXbsuiGjPv/qYdq7qy4WZ/aNA6Yl0Z+b2ElclaMrS4qvhgp73WNpoHPcwIWosfCJmYayK0N6Dnb
 zb8sLb5QRMwa9+b6Um1Zi/U2IlE0v33hZxZJMycysyQfxX3FzblSRje1olEXijoc9BSETaqhbQdf7
 j4t7PCLssWmpfQ==;
Received: from [87.69.77.57] (port=2960 helo=home-c4e4a596f7) by
fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1nGcdp-0001Uj-2e; Sun, 06 Feb 2022 03:11:27 -0500 Date: Sun, 06 Feb 2022 10:11:08 +0200 Message-Id: <83y22oza77.fsf@gnu.org> From: Eli Zaretskii To: Richard Wordingham In-Reply-To: <20220205225251.08a0faab@JRWUBU2> (message from Richard Wordingham on Sat, 5 Feb 2022 22:52:51 +0000) Subject: Re: bug#20140: 24.4; M17n shaper output rejected References: <20150318222040.4066e6e9@JRWUBU2> <87r18jk5nr.fsf@gnus.org> <83v8xv2icg.fsf@gnu.org> <20220205225251.08a0faab@JRWUBU2> MIME-version: 1.0 Content-type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 20140 Cc: 20140@debbugs.gnu.org, Kenichi Handa , larsi@gnus.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) > Date: Sat, 5 Feb 2022 22:52:51 +0000 > From: Richard Wordingham > Cc: Lars Ingebrigtsen , 20140@debbugs.gnu.org > > I'm currently using the vanilla emacs on Ubuntu Focal, which is > described as 'GNU Emacs 26.3 (build 2, x86_64-pc-linux-gnu, GTK+ > Version 3.24.14) of 2020-03-26, modified by Debian'. The key good news > is that the commands forward-char-intrusive and backward-char-intrusive > are now standard, so I can position the cursor by dead-reckoning. You > can reasonably mark the issue as solved. I don't see the commands forward-char-intrusive and backward-char-intrusive anywhere in Emacs, so I guess they are your local changes, based on the code posted by Handa-san in this discussion? > > The most important change is that we now use HarfBuzz by default. > > Isn't that only true for Emacs 27.1 and above? That's true, but Emacs 26 is ancient history; Emacs 28.1 is about to be released. 
So from our perspective, HarfBuzz is the default shaping engine, and since it's available on all the supported platforms we care about, we are phasing out m17n-flt shapers. > > Richard didn't contribute the Tai Tham composition rules to us > > (AFAIR), so I cannot test what happens now in Emacs with HarfBuzz. > > Maybe we should revisit this issue, but first I hope Richard could > > tell whether the issue still exists, and if so, what composition rules > > he uses or suggests to use for Tai Tham. > > Sad to see that Khaled Hosny's suggestion not to use composition rules > seems not to have been taken. You mean, to pass all the text via HarfBuzz instead? That makes the Emacs redisplay painfully slow, and would require a complete redesign of how we render text to be bearable. So as long as such a redesign is not available, we cannot use that advice. > You're welcome to include my composition rules. Thanks. > They're complicated by the facts that the 'regular expressions' are > not interpreted as regular expressions and they are not interpreted > as closed under canonical equivalence. I therefore calculate the > regular expression. I'm not sure I understand the issue: what you do seems to be very similar to what we do for the Indic scripts in indian.el, so what kind of complications are you talking about here? Also, your rules seem to follow the description in the "Structuring Tai Tham Unicode" document (Revision 7), a.k.a. "L2/19-365", dated Oct 2019, is that right? Is that document the latest word on shaping Tai Tham, or are there any additional sources? > There are some deficiencies; I've a feeling there may be a problem with > adding ZWNJ and CGJ as marks; ZWJ should also be added for > completeness. These are barely mentioned in the L2/19-365 document, and not mentioned at all in the Tai Tham section of the Unicode Standard. Does it mean they are not very important in contemporary Tai Tham texts? 
> I need ZWNJ to write 4-column ᨴᩣᩴᨶ᩠ᩅ‌ᩣ᩠ᨿ as opposed to > 3-column ᨴᩣᩴᨶ᩠ᩅᩣ᩠ᨿ, and even with my font, HarfBuzz will need CGJ for > the suppression of jack-booted dotted circles. Additionally, for > didactic text, what can I do for U+25CC for explicit display of marks > and their equivalents on a dotted circle, and for that matter, for > display on NBSP? At least for the dotted circle case, Emacs has a general composition rule; see compose-gstring-for-dotted-circle and the corresponding rule in composite.c. So I'm not sure we need anything specific to Tai Tham there. Can you recommend good fonts for Tai Tham? Are they free fonts? Thanks. From debbugs-submit-bounces@debbugs.gnu.org Sun Feb 06 17:10:12 2022 Received: (at 20140) by debbugs.gnu.org; 6 Feb 2022 22:10:12 +0000 Received: from localhost ([127.0.0.1]:40051 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1nGpjU-00074C-S9 for submit@debbugs.gnu.org; Sun, 06 Feb 2022 17:10:12 -0500 Received: from smtpq1.tb.ukmail.iss.as9143.net ([212.54.57.96]:44330) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1nGpjS-00073S-1o for 20140@debbugs.gnu.org; Sun, 06 Feb 2022 17:10:07 -0500 Received: from [212.54.57.108] (helo=csmtp4.tb.ukmail.iss.as9143.net) by smtpq1.tb.ukmail.iss.as9143.net with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1nGpjM-0007B5-0a for 20140@debbugs.gnu.org; Sun, 06 Feb 2022 23:10:00 +0100 Received: from JRWUBU2 ([82.27.122.109]) by cmsmtp with ESMTP id GpjLnVGOGpODMGpjLnip3d; Sun, 06 Feb 2022 23:10:00 +0100 X-SourceIP: 82.27.122.109 X-Authenticated-Sender: X-Spam: 0 X-Authority: v=2.4 cv=GcsEICbL c=1 sm=1 tr=0 ts=62004738 cx=a_exe a=lZfnwhydZ+7bl6OdZ0zTBw==:117 a=lZfnwhydZ+7bl6OdZ0zTBw==:17 a=IkcTkHD0fZMA:10 a=oGFeUVbbRNcA:10 a=mDV3o1hIAAAA:8 a=NLZqzBF-AAAA:8 a=OocQHUDgAAAA:8 a=AZnJWaAPAAAA:8 a=bfS6WyRxXP7FMp168e0A:9 a=QEXdDO2ut3YA:10 a=RUIEsB1ujRkA:10 a=cET8LZuHwC8A:10 a=_FVE-zBwftR9WsbkzFJk:22 
a=wW_WBVUImv98JQXhvVPZ:22 a=xUZTl98r3Qw_uB5NK3jt:22 a=T2rBzvJ0ivks0o3LBaDr:22 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ntlworld.com; s=meg.feb2017; t=1644185400; bh=RodmaYOecXkbLFG2MTMlEByqQOmm4UbhXjCYCc4khiM=; h=Date:From:To:Cc:Subject:In-Reply-To:References; b=S1CnfnbbyKRctsT6JAGcgA5+XcB08U0GekdnclVf/SgSy3c2gxkeqm5f3Pvarnyo3 f+JMN2JD+rMg5+E7BYLIUnpgPr3II8YfUv9NDoKLYvjfy3tEESGCBa9GJ84qhgpRsw SC+YOtwYCaViQxmZKb3NJ/9m658+v7qfnhKyRmAFeHxvaQAiUK6fBmFoAHCafAagLz zXiDDhwMmSmmZub+xG9L11rECiLVPgyy9RM9j8fl2Av1d8TASvMRKHy9mTvbBHN5aT Fwy8NCBWiI/FPyD7Q64Vjimi+wp7h14zb1763XnXSgtx8UFxb1JHzIzVoapAG23JIj nRYSqXLFfyocw== Date: Sun, 6 Feb 2022 22:09:58 +0000 From: Richard Wordingham To: Eli Zaretskii Subject: Re: bug#20140: 24.4; M17n shaper output rejected Message-ID: <20220206220958.5a4d8ffe@JRWUBU2> In-Reply-To: <83y22oza77.fsf@gnu.org> References: <20150318222040.4066e6e9@JRWUBU2> <87r18jk5nr.fsf@gnus.org> <83v8xv2icg.fsf@gnu.org> <20220205225251.08a0faab@JRWUBU2> <83y22oza77.fsf@gnu.org> X-Mailer: Claws Mail 3.17.5 (GTK+ 2.24.32; x86_64-pc-linux-gnu) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-CMAE-Envelope: MS4xfGKQyosGxB5CB9ez+bqsKVZMT5KXkvZg0Dv104sryRTqwbUdrnrGPBAav/Nh+NKl57/B/pzR9QLjPGGqWlqEPYjDdymXepHpdlrLPMgl6DAL/cdtRjuR TN2flDJ8cOn3mlWw+9w2FjWOmuke4X7Xy3bOnjlfmra/WbmH2PA+8ppL62Opd85kb21m3osonW0q9uam0I/6RfL3EQ5q2peE+iq3SHeq5ku2DqvU2aBf9dXH b4dSYbmmmVfe7R7RS5qHK7XS31CP/lLakxXC2kUVSGI= X-Spam-Score: -0.0 (/) X-Debbugs-Envelope-To: 20140 Cc: 20140@debbugs.gnu.org, Kenichi Handa , larsi@gnus.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-) On Sun, 06 Feb 2022 10:11:08 +0200 Eli Zaretskii wrote: > > Date: Sat, 5 Feb 2022 22:52:51 +0000 > > From: Richard Wordingham > > Cc: Lars 
Ingebrigtsen , 20140@debbugs.gnu.org
> >
> > I'm currently using the vanilla emacs on Ubuntu Focal, which is
> > described as 'GNU Emacs 26.3 (build 2, x86_64-pc-linux-gnu, GTK+
> > Version 3.24.14) of 2020-03-26, modified by Debian'. The key good
> > news is that the commands forward-char-intrusive and
> > backward-char-intrusive are now standard, so I can position the
> > cursor by dead-reckoning. You can reasonably mark the issue as
> > solved.
>
> I don't see the commands forward-char-intrusive and
> backward-char-intrusive anywhere in Emacs, so I guess they are your
> local changes, based on the code posted by Handa-san in this
> discussion?
>
> > > The most important change is that we now use HarfBuzz by default.
> > >
> >
> > Isn't that only true for Emacs 27.1 and above?
>
> That's true, but Emacs 26 is ancient history; Emacs 28.1 is about to
> be released. So from our perspective, HarfBuzz is the default shaping
> engine, and since it's available on all the supported platforms we
> care about, we are phasing out m17n-flt shapers.
>
> > > Richard didn't contribute the Tai Tham composition rules to us
> > > (AFAIR), so I cannot test what happens now in Emacs with HarfBuzz.
> > > Maybe we should revisit this issue, but first I hope Richard could
> > > tell whether the issue still exists, and if so, what composition
> > > rules he uses or suggests to use for Tai Tham.
> >
> > Sad to see that Khaled Hosny's suggestion not to use composition
> > rules seems not to have been taken.
>
> You mean, to pass all the text via HarfBuzz instead? That makes the
> Emacs redisplay painfully slow, and would require a complete redesign
> of how we render text to be bearable. So as long as such a redesign
> is not available, we cannot use that advice.

Except for Malayalam! (Subexpression XX* in indian.el at the moment.)

> > You're welcome to include my composition rules.
>
> Thanks.
>
> > They're complicated by the facts that the 'regular expressions' are
> > not interpreted as regular expressions and they are not interpreted
> > as closed under canonical equivalence. I therefore calculate the
> > regular expression.
>
> I'm not sure I understand the issue: what you do seems to be very
> similar to what we do for the Indic scripts in indian.el, so what kind
> of complications are you talking about here?

Well, those rules themselves are a bit odd. Why are you composing single clusters? Why are you breaking clusters where Microsoft imitators are likely to insert dotted circles?

The basic structure for most Indic scripts is

R*C(M|HC)*(M|H)*

where R is miscellaneous prefixed forms (e.g. dot reph, visarga variants), C is consonants (and things that can act like them), H is the conjoining operator, and M is miscellaneous marks, including ZWJ and ZWNJ. "(M|H)*" accounts for explicit viramas and isolated half-forms. Jackboots are then applied on the ground that spell checkers cannot be relied upon.

The first problem for Tai Tham is that marks with non-zero canonical combining class (ccc) greater than 9 (note that script-specific nuktas generally have ccc=7) do not mix with conjoining operators with ccc=9; the conjoining operator (as opposed to visible virama) should not be separated from the following consonant. Mark Davis ignored this requirement from the proposals, so unless your 'regular' expression is acting on traces under canonical equivalence rather than mere strings, one has to complicate the expressions to cope.

The second issue is that the behaviour of U+1A58 TAI THAM SIGN MAI KANG LAI is a stylistic variable. It can act like a dot reph or a phonetic-syllable-final mark. My composition rules therefore have to treat it as gluing orthographic syllables together.

The third issue, which is less visible, is that I had a problem with back-tracking.
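
A minimal sketch of the points above, not taken from tai-tham.el or indian.el: the generic structure R*C(M|HC)*(M|H)* is expanded into an Emacs regexp by substituting class letters, the same substitution technique the attached tai-tham.el uses. All `demo-' names are hypothetical, and the character classes are ASCII stand-ins (k, g, t for consonants; a, i, u for marks; "-" for the conjoining operator; r for a prefixed form) rather than real script ranges. The second call illustrates the back-tracking problem: once the conjoining operator is also listed among the marks, the first alternative consumes it and the match ends before the following consonant.

```elisp
;; Illustrative sketch only; ASCII stand-ins replace real script classes.
;; Every name starting with `demo-' is hypothetical.

(defun demo-expand-classes (template table)
  "Return TEMPLATE with each class letter replaced as specified by TABLE.
TABLE is an alist of (LETTER . REGEXP-FRAGMENT)."
  (let ((case-fold-search nil)
        (regexp template))
    (dolist (elt table regexp)
      (setq regexp
            (replace-regexp-in-string (car elt) (cdr elt) regexp t t)))))

(defconst demo-classes
  '(("R" . "r") ("C" . "[kgt]") ("M" . "[aiu]") ("H" . "-"))
  "Stand-in character classes (assumption: lower-case ASCII only).")

;; R*C(M|HC)*(M|H)* written in Emacs regexp syntax, then expanded.
(defconst demo-cluster
  (demo-expand-classes "R*C\\(?:M\\|HC\\)*\\(?:M\\|H\\)*" demo-classes))

(defun demo-cluster-length (regexp string)
  "Return the length of the match of REGEXP at the start of STRING, or nil."
  (let ((case-fold-search nil))
    (when (string-match (concat "\\`\\(?:" regexp "\\)") string)
      (match-end 0))))

;; "k-ta" is C H C M: the whole string is one cluster.
(demo-cluster-length demo-cluster "k-ta")                       ; => 4

;; With "-" also accepted as a mark, the cluster stops after "k-":
;; the match has already succeeded, so nothing forces a retry with "-t".
(demo-cluster-length "k\\(?:[aiu-]\\|-[kgt]\\)*" "k-ta")        ; => 2
```

Keeping the conjoining operator out of the mark class, and only accepting it when a consonant follows, is what keeps the first result a single cluster.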
> Also, your rules seem to follow the description in the "Structuring
> Tai Tham Unicode" document (Revision 7), a.k.a. "L2/19-365", dated Oct
> 2019, is that right? Is that document the latest word on shaping Tai
> Tham, or are there any additional sources?

No, the document's a crime. I tried to minimise its destructiveness, which is why I got an acknowledgement in it. I advocate sticking to phonetic order, as in Khmer and Brahmi. That scheme needs a couple of formally unproposed characters to make some distinctions.

The best sources are the regular expressions in the proposals, but they missed out the combination of tone mark and final consonant signs.

What do you mean by 'shaping'? For Tai Tham, the only positive service provided by rendering engines is the movement of preposed vowels and MEDIAL RA to the start of the glyph sequence; all the other resequencing has to be done by the fonts themselves.

> > There are some deficiencies; I've a feeling there may be a problem
> > with adding ZWNJ and CGJ as marks; ZWJ should also be added for
> > completeness.
>
> These are barely mentioned in the L2/19-365 document, and not
> mentioned at all in the Tai Tham section of the Unicode Standard.
> Does it mean they are not very important in contemporary Tai Tham
> texts?

The Tai Tham section is based on information before grammar nazification disabled Tai Tham texts, or at least, those that were to be rendered using restrictive shapers based on alleged knowledge of the languages. ZWNJ is a standard mechanism for disabling ligatures in non-cursive scripts, though I'm not sure of the balance of ZWJ and ZWNJ in Fraktur, e.g. the different renderings of the two meanings of Antiqua German Wachstube. CGJ is needed where there is no other character to mark the boundary of two chained syllables and concatenating the vowel and tone marks of the two together violates the ordering rules for a single syllable.
It would also be needed to mark other differences relevant to collation, e.g. if syllable-initial BA were sorted according to its pronunciation, as in one major dictionary. Automating an inconsistent hand-sort is hard, slow work, especially as the CLDR tools choke on an easy Lao sort. (By contrast, the official Thai sort is very machine-friendly.)

> > I need ZWNJ to write 4-column ᨴᩣᩴᨶ᩠ᩅ‌ᩣ᩠ᨿ as opposed to
> > 3-column ᨴᩣᩴᨶ᩠ᩅᩣ᩠ᨿ, and even with my font, HarfBuzz will need CGJ
> > for the suppression of jack-booted dotted circles. Additionally, for
> > didactic text, what can I do for U+25CC for explicit display of
> > marks and their equivalents on a dotted circle, and for that
> > matter, for display on NBSP?

This, the main use of ZWNJ, was unknown to the authors of the Tai Tham proposals. In Lao texts of the 1930s, non-ligation seems to mark an enthusiasm for the spelling reforms, which one normally thinks of as only applying to the Lao script.

Having looked at indian.el, it seems that it will be easy to add these controls (CGJ, ZWJ and ZWNJ) to the composition tables.

> At least for the dotted circle case, Emacs has a general composition
> rule; see compose-gstring-for-dotted-circle and the corresponding rule
> in composite.c. So I'm not sure we need anything specific to Tai Tham
> there.

Does the 3-character Khmer sequence "◌្ក" work in Version 28? It doesn't in Version 26.3. It should look like a dotted circle with the lower part of ក្ក below it. In Version 26.3, I don't even get the consonant U+1780 subscripted!

With HarfBuzz, if you don't compose U+25CC with the following mark, you are very likely to get two dotted circles - are you deliberately deleting one? Doing so wouldn't be a reliable process.
Possibly I could fix the rendering problem by also composing sequences starting with marks - to be investigated. If it works, it might work with NBSP, though it wouldn't help with my plan for to render as just the spacing mark.

> Can you recommend good fonts for Tai Tham? Are they free fonts?

Almost all Tai Tham fonts have problems. Probably the best is the one used for the New Testament, which relies on the SIL Graphite renderer. I'll dig into that one.

The nicest OTL shaper-based one for most words is Lamphun, which is based on Hariphunchai. Unfortunately, not even Lamphun distinguished subscript HIGH RATHA from the subscript , and it is rather limited for interacting marks - Hariphunchai lacks mark-to-mark positioning. The commoner combinations of marks are handled by glyph substitution, and Lamphun has made a start on mark-to-mark positioning. Hariphunchai and Lamphun are available under the SIL Open Font licence.

For Lao and Pali, Khottabun is a nice font, but there are some idiosyncrasies in its encoding of words. (Unicode appears only to define character encoding, and is largely silent on the encoding of Tai Tham words.) It is available under the SIL Open Font licence, so I can and perhaps ought to add it to my renderer (https://wrdingham.co.uk/renderer_test.htm) and font (https://wrdingham.co.uk/font_test.htm) tests. Unfortunately, it only supports characters used for Lao or Pali. It appears to evade the jackboots of the HarfBuzz implementation of the Universal Shaping Engine (USE) by not having a glyph for U+25CC - cunning! I don't know whether this trick works with the Windows renderers.

There's a clutch of Tai Khuen fonts released under the SIL Open Font licence that are aesthetically satisfying, but have a tendency to rely on Tai Khuen orthographic rules to avoid clashing glyphs, and don't extend to supporting somewhat exceptional words like Pali _indriya_.
The fonts are:

A Tai Tham KH
A Tai Tham KH New
A Tai Tham KH New V3

They are unlikely to work with Uniscribe or DirectWrite, as they rely on the ccmp or liga feature being enabled for the default script; I'm not sure whether that's a problem for those using emacs on Windows.

If you don't mind the reactionary square nature of the glyphs, there is also my Da Lekh family, with full coverage of the encoded character set, and some support for language-specific glyphs that are very different between the languages. (Generally the glyphs aim to be an 'international' compromise.) Features may be used instead of language environment - I don't set out to punish Windows victims. There are four fonts:

Da Lekh
Da Lekh Si
Da Lekh Seri
Da Lekh Si Seri

The ones with Seri in the name have the same freedoms as the Deja Vu fonts and none of the restrictions. (I drew all their glyphs.) The others have the same freedoms and, necessarily, restrictions as Deja Vu Sans. The Seri (meaning 'untrammelled') fonts were created for unconstrained use by the Unicode Consortium and deliberately have no defence against the jackboots of the Universal Shaping Engine. They should work fine with the M17n renderer. Unfortunately, for its Latin glyphs, one only gets what one pays for.

The ones with 'Si' colour conjoined syllables red so that one can see how words are spelt. This capability was added for use with spell checkers, and I use it successfully for spell-checking in Firefox and LibreOffice.

Richard.
From debbugs-submit-bounces@debbugs.gnu.org Mon Feb 07 09:04:58 2022 Received: (at 20140) by debbugs.gnu.org; 7 Feb 2022 14:04:58 +0000 Received: from localhost ([127.0.0.1]:41308 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1nH4dV-0006jw-UQ for submit@debbugs.gnu.org; Mon, 07 Feb 2022 09:04:58 -0500 Received: from eggs.gnu.org ([209.51.188.92]:45426) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1nH4dU-0006jj-0m for 20140@debbugs.gnu.org; Mon, 07 Feb 2022 09:04:56 -0500 Received: from [2001:470:142:3::e] (port=52040 helo=fencepost.gnu.org) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1nH4dN-0001wB-UO; Mon, 07 Feb 2022 09:04:49 -0500 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org; s=fencepost-gnu-org; h=MIME-version:References:Subject:In-Reply-To:To:From: Date; bh=FJenTatY7pObevwM9sc/jpnBNfEWzrXrtZwtl+UV6z4=; b=VeZA73XWsh+NIkwhIMEi xc5OutpFPZ/bw8CEMm7XiMG5cf42jJPYqxjs7lNbzZTCQeveaIc8VU728C4eGC5rUfnnvPORABypc P4NHArkMSoxLxHt9IAOpImn4fD3vB6av8L4budZlIJKmWAIeHZ2/2ioAZyrqyXxJvrP/NEPBvu2Rl vgFzzbcGYAkdtGVk/Rn8ASXG94rK7g/VKw17pIxsq/MY+iaGAdZD9PyK/+D+yGK9Yx2wIpE1NITsT zBUcE8ZrrQCXAswVX1rGxLHgYAkkMYfjtZqLnoFYVZEOY9CGnSnijUOkBwy7mkXI9+wCrxlBeVUTS mVJrFpy/vUxW3A==; Received: from [87.69.77.57] (port=1433 helo=home-c4e4a596f7) by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1nH4dN-000190-AN; Mon, 07 Feb 2022 09:04:49 -0500 Date: Mon, 07 Feb 2022 16:04:35 +0200 Message-Id: <83czjyydqk.fsf@gnu.org> From: Eli Zaretskii To: Richard Wordingham In-Reply-To: <20220206220958.5a4d8ffe@JRWUBU2> (message from Richard Wordingham on Sun, 6 Feb 2022 22:09:58 +0000) Subject: Re: bug#20140: 24.4; M17n shaper output rejected References: <20150318222040.4066e6e9@JRWUBU2> <87r18jk5nr.fsf@gnus.org> <83v8xv2icg.fsf@gnu.org> <20220205225251.08a0faab@JRWUBU2> <83y22oza77.fsf@gnu.org> 
<20220206220958.5a4d8ffe@JRWUBU2> MIME-version: 1.0 Content-type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 20140 Cc: 20140@debbugs.gnu.org, handa@gnu.org, larsi@gnus.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) > Date: Sun, 6 Feb 2022 22:09:58 +0000 > From: Richard Wordingham > Cc: larsi@gnus.org, 20140@debbugs.gnu.org, Kenichi Handa > > > > Sad to see that Khaled Hosny's suggestion not to use composition > > > rules seems not to have been taken. > > > > You mean, to pass all the text via HarfBuzz instead? That makes the > > Emacs redisplay painfully slow, and would require a complete redesign > > of how we render text to be bearable. So as long as such a redesign > > is not available, we cannot use that advice. > > Except for Malayalam! (Subexpression XX* in indian.el at the moment.) (That was changed lately. But it is a tangent.) > > > They're complicated by the facts that the 'regular expressions' are > > > not interpreted as regular expressions and they are not interpreted > > > as closed under canonical equivalence. I therefore calculate the > > > regular expression. > > > > I'm not sure I understand the issue: what you do seems to be very > > similar to what we do for the Indic scripts in indian.el, so what kind > > of complications are you talking about here? > > Well, those rules themselves are a bit odd. Why are you composing > single clusters? Why are you breaking clusters where Microsoft > imitators are likely to insert dotted circles? I'm not sure this is what I asked. I asked why you think this way of defining patterns for composition rules is in any way exceptional. It seems pretty much boilerplate to me. 
> The best sources are the regular expressions in the proposals, but they > missed out the combination of tone mark and final consonant signs. Can you be more specific about those proposals? Any specific pointers? Also, does this mean there's currently no widely accepted agreement regarding Tai Tham shaping? What do native readers of that script expect? > What do you mean by 'shaping'? Whatever is needed to produce correct display from a sequence of codepoints in a given script. > > At least for the dotted circle case, Emacs has a general composition > > rule; see compose-gstring-for-dotted-circle and the corresponding rule > > in composite.c. So I'm not sure we need anything specific to Tai Tham > > there. > > Does the 3-character Khmer sequence "◌្ក" work > in Version 28? It doesn't in Version 26.3. It should look like a > dotted circle with the lower part of ក្ក below it. In Version 26.3, I > don't even get the consonant U+1780 subscripted! No, it doesn't produce what you want (though the 2nd and the 3rd characters do combine), but that's not surprising: the general rules for U+25CC that we have cover only a single combining mark after it: (aset composition-function-table #x25CC `([,(purecopy ".\\c^") 0 compose-gstring-for-dotted-circle])) So a sequence of more than one character after U+25CC needs an explicit rule to work. What is the rule in this case? (And what does Khmer have to do with the question I asked, which is about Tai Tham?) > With HarfBuzz, if you don't compose U+25CC with the following mark, you > are very likely to get two dotted circles - are you deliberately > deleting one? No. And I don't get 2 dotted circles with the above in Emacs 28 with HarfBuzz. Anyway, Khmer is a separate issue. 
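
As a rough illustration of why the general U+25CC rule quoted above covers only one mark: the pattern ".\\c^" matches exactly two characters, any character followed by a single character of category ^ (combining diacritic or mark). The sketch below uses two Latin combining diacritics after the dotted circle instead of the Khmer sequence, on the assumption that U+0301 and U+0302 carry category ^; `demo-rule-span' is a hypothetical helper, not part of Emacs, and the "+" variant is shown only for contrast, not as a proposed rule.

```elisp
;; Sketch only: `demo-rule-span' is a hypothetical helper, not an Emacs API.
(defun demo-rule-span (pattern string)
  "Return the number of characters PATTERN matches at the start of STRING.
Return nil if PATTERN does not match there."
  (when (string-match (concat "\\`\\(?:" pattern "\\)") string)
    (match-end 0)))

;; U+25CC DOTTED CIRCLE followed by two combining diacritics
;; (U+0301 COMBINING ACUTE ACCENT, U+0302 COMBINING CIRCUMFLEX ACCENT).
(defconst demo-circle-with-two-marks (string #x25CC #x0301 #x0302))

;; The pattern from the general rule stops after the first mark.
(demo-rule-span ".\\c^" demo-circle-with-two-marks)    ; => 2

;; A pattern with a repeated mark class would cover the whole sequence.
(demo-rule-span ".\\c^+" demo-circle-with-two-marks)   ; => 3
```

Whether a longer pattern is appropriate for a given script depends on the script's own cluster rules, which is the question raised in the message above.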
From debbugs-submit-bounces@debbugs.gnu.org Mon Feb 07 18:38:18 2022
Date: Mon, 7 Feb 2022 23:38:08 +0000
From: Richard Wordingham
To:
Eli Zaretskii
Subject: Re: bug#20140: 24.4; M17n shaper output rejected
Message-ID: <20220207233808.1a84d8ec@JRWUBU2>
In-Reply-To: <83czjyydqk.fsf@gnu.org>
References: <20150318222040.4066e6e9@JRWUBU2> <87r18jk5nr.fsf@gnus.org> <83v8xv2icg.fsf@gnu.org> <20220205225251.08a0faab@JRWUBU2> <83y22oza77.fsf@gnu.org> <20220206220958.5a4d8ffe@JRWUBU2> <83czjyydqk.fsf@gnu.org>
Cc: 20140@debbugs.gnu.org, handa@gnu.org, larsi@gnus.org

On Mon, 07 Feb 2022 16:04:35 +0200
Eli Zaretskii wrote:

> > Date: Sun, 6 Feb 2022 22:09:58 +0000
> > From: Richard Wordingham
> > Cc: larsi@gnus.org, 20140@debbugs.gnu.org, Kenichi Handa
> >
> > > > Sad to see that Khaled Hosny's suggestion not to use composition
> > > > rules seems not to have been taken.
> > >
> > > You mean, to pass all the text via HarfBuzz instead? That makes
> > > the Emacs redisplay painfully slow, and would require a complete
> > > redesign of how we render text to be bearable. So as long as
> > > such a redesign is not available, we cannot use that advice.
> >
> > Except for Malayalam! (Subexpression XX* in indian.el at the
> > moment.)
>
> (That was changed lately. But it is a tangent.)
>
> > > > They're complicated by the facts that the 'regular expressions'
> > > > are not interpreted as regular expressions and they are not
> > > > interpreted as closed under canonical equivalence. I therefore
> > > > calculate the regular expression.
> > >
> > > I'm not sure I understand the issue: what you do seems to be very
> > > similar to what we do for the Indic scripts in indian.el, so what
> > > kind of complications are you talking about here?
> >
> > Well, those rules themselves are a bit odd. Why are you composing
> > single clusters? Why are you breaking clusters where Microsoft
> > imitators are likely to insert dotted circles?
>
> I'm not sure this is what I asked. I asked why you think this way of
> defining patterns for composition rules is in any way exceptional. It
> seems pretty much boilerplate to me.

Your 'boilerplate' rules look like a straightforward derivation from
the DirectWrite rules for valid subsequences - I haven't checked for
repair work. That seems unlikely to handle prohibited dittograms
nicely. It also wouldn't work well when 'well-formed' adjacent clusters
need to interact, as with virama-terminated clusters in Kharoshthi and
some styles of Brahmi. I haven't hunted for their definitions - I
should probably download a recent tarball.

The exceptional features were the calculation of the regular
expression, especially the expression

  (replace-regexp-in-string "X" basic_syllable regexp t t))

> > The best sources are the regular expressions in the proposals, but
> > they missed out the combination of tone mark and final consonant
> > signs.
>
> Can you be more specific about those proposals? Any specific
> pointers?
>
> Also, does this mean there's currently no widely accepted agreement
> regarding Tai Tham shaping? What do native readers of that script
> expect?
>
> > What do you mean by 'shaping'?
>
> Whatever is needed to produce correct display from a sequence of
> codepoints in a given script.

The main shaper writers refused to maintain such a service for Tai Tham,
though HarfBuzz did briefly provide such a service with its South East
Asian Shaper. Windows still confesses its inability to render the full
range of orthographic syllables. To work, fonts have to engage in
dotted circle removal by some means or other.

It seems that native readers expect a font encoding, where the key
sequence for a mark (or subscript consonant) specifies its position and
shape. I was badly shocked when I found the backing store for the
Tai Tham Northern Thai New Testament. I found examples of marks above
entered in the reverse-order to what Unicode-savvy people would expect,
and the complete opposite to what one would type for Thai, for which
input systems generally enforce the rule of typing from base character
outwards.

The general pointer would be to look at the English Wikipedia entry for
_(Unicode_block). In this case, that becomes
https://www.unicode.org/L2/L2007/07007r-n3207r-lanna.pdf Section 13.
The codepoints have changed since then, but the names (apart from the
script name) and representative glyphs have been pretty stable. The
relationship between the outermost subexpression and syllables needs
updating, and 'H' needs to be updated to include other subscript
consonants, but formally the expression as a whole still stands.

> > > At least for the dotted circle case, Emacs has a general
> > > composition rule; see compose-gstring-for-dotted-circle and the
> > > corresponding rule in composite.c. So I'm not sure we need
> > > anything specific to Tai Tham there.
> >
> > Does the 3-character Khmer sequence "◌្ក"
> > work in Version 28? It doesn't in Version 26.3. It should look
> > like a dotted circle with the lower part of ក្ក below it.
> > In Version 26.3, I don't even get the consonant U+1780 subscripted!
>
> No, it doesn't produce what you want (though the 2nd and the 3rd
> characters do combine), but that's not surprising: the general rules
> for U+25CC that we have cover only a single combining mark after it:
>
>   (aset composition-function-table #x25CC
>         `([,(purecopy ".\\c^") 0 compose-gstring-for-dotted-circle]))
>
> So a sequence of more than one character after U+25CC needs an
> explicit rule to work. What is the rule in this case? (And what does
> Khmer have to do with the question I asked, which is about Tai Tham?)

You asked if there were any Tai Tham specific requirements. The
requirement is general, but the need for Khmer is the most obvious.

The rule for Brahmi, Kharoshthi and their descendants is fairly close to
'take any existing composition, and substitute dotted circle for the
first letter (Lo)'. For the important cases, it is:

(i) Dotted circle plus any sequence of marks (Let the shaper worry
about validity);

(ii) Dotted circle, conjoining operator, consonant, VS?; and

(iii) Dotted circle, conjoining operator, consonant, VS?, any sequence
of marks.

(iv) (i)-(iii) preceded by anything repha-like.

'Conjoining operator' is a virama or pure stacker optionally preceded
or followed by ZWJ or ZWNJ. VS is a variation selector. 'Repha-like'
includes U+0D4E MALAYALAM LETTER DOT REPH, the Mymr script kinzi
sequences, and the prototypical .

The entire sequence would be best handled in the renderer, though you
may have problems with selecting the font and script.

Richard.
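
Rules (i) to (iii) above can be sketched as an Emacs regexp. This is only a minimal, Khmer-only illustration and not a rule taken from Emacs: the name my-khmer-dotted-circle-pattern and the character ranges chosen for 'consonant' and 'mark' are assumptions, rule (iv) is left out, and ZWJ/ZWNJ around the conjoining operator are not handled.

```elisp
;; Hypothetical sketch of rules (i)-(iii) for Khmer only.
;; The ranges below are assumptions made for illustration.
(defconst my-khmer-dotted-circle-pattern
  (let ((dc "\u25CC")                 ; DOTTED CIRCLE
        (coeng "\u17D2")              ; KHMER SIGN COENG, the conjoining operator
        (consonant "[\u1780-\u17A2]") ; Khmer consonants
        (vs "[\uFE00-\uFE0F]?")       ; optional variation selector
        (marks "[\u17B6-\u17D1\u17D3\u17DD]")) ; dependent vowels and signs
    (concat dc
            "\\(?:"
            coeng consonant vs marks "*" ; rules (ii) and (iii)
            "\\|"
            marks "+"                    ; rule (i)
            "\\)"))
  "Regexp for a dotted circle followed by a Khmer subscript and/or marks.")

;; The 3-character sequence discussed above: U+25CC U+17D2 U+1780.
(string-match-p my-khmer-dotted-circle-pattern "\u25CC\u17D2\u1780")
```

Whether such a pattern would also select a suitable font once installed in composition-function-table is a separate question, as noted above.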
From debbugs-submit-bounces@debbugs.gnu.org Tue Feb 08 17:13:19 2022
Date: Tue, 8 Feb 2022 22:13:10 +0000
From: Richard Wordingham
To: Eli Zaretskii
Subject: Re: bug#20140: 24.4;
M17n shaper output rejected
Message-ID: <20220208221310.734f8d75@JRWUBU2>
In-Reply-To: <83y22oza77.fsf@gnu.org>
References: <20150318222040.4066e6e9@JRWUBU2> <87r18jk5nr.fsf@gnus.org> <83v8xv2icg.fsf@gnu.org> <20220205225251.08a0faab@JRWUBU2> <83y22oza77.fsf@gnu.org>
Cc: 20140@debbugs.gnu.org, Kenichi Handa, larsi@gnus.org

On Sun, 06 Feb 2022 10:11:08 +0200
Eli Zaretskii wrote:

> > Date: Sat, 5 Feb 2022 22:52:51 +0000
> > From: Richard Wordingham
> > Cc: Lars Ingebrigtsen, 20140@debbugs.gnu.org
> >
> > I'm currently using the vanilla emacs on Ubuntu Focal, which is
> > described as 'GNU Emacs 26.3 (build 2, x86_64-pc-linux-gnu, GTK+
> > Version 3.24.14) of 2020-03-26, modified by Debian'. The key good
> > news is that the commands forward-char-intrusive and
> > backward-char-intrusive are now standard, so I can position the
> > cursor by dead-reckoning. You can reasonably mark the issue as
> > solved.
>
> I don't see the commands forward-char-intrusive and
> backward-char-intrusive anywhere in Emacs, so I guess they are your
> local changes, based on the code posted by Handa-san in this
> discussion?

That's a shame; they are indeed local, sitting in my initialisation
file (.emacs).
(I future-proofed myself too well.) They are well worth
adding to the general store of emacs commands, and mentioning in
documentation next to forward-char and backward-char.

Richard.

From debbugs-submit-bounces@debbugs.gnu.org Sat Feb 12 13:54:35 2022
Date: Sat, 12 Feb 2022 20:54:20 +0200
Message-Id: <83zgmvrk4j.fsf@gnu.org>
From: Eli Zaretskii
To: Richard Wordingham
In-Reply-To: <20220208221310.734f8d75@JRWUBU2> (message from Richard Wordingham on Tue, 8 Feb 2022 22:13:10 +0000)
Subject: Re: bug#20140: 24.4; M17n
shaper output rejected
References: <20150318222040.4066e6e9@JRWUBU2> <87r18jk5nr.fsf@gnus.org> <83v8xv2icg.fsf@gnu.org> <20220205225251.08a0faab@JRWUBU2> <83y22oza77.fsf@gnu.org> <20220208221310.734f8d75@JRWUBU2>
Cc: 20140@debbugs.gnu.org, handa@gnu.org, larsi@gnus.org

> Date: Tue, 8 Feb 2022 22:13:10 +0000
> From: Richard Wordingham
> Cc: larsi@gnus.org, 20140@debbugs.gnu.org, Kenichi Handa
>
> On Sun, 06 Feb 2022 10:11:08 +0200
> Eli Zaretskii wrote:
>
> > I don't see the commands forward-char-intrusive and
> > backward-char-intrusive anywhere in Emacs, so I guess they are your
> > local changes, based on the code posted by Handa-san in this
> > discussion?
>
> That's a shame; they are indeed local, sitting in my initialisation
> file (.emacs). (I future-proofed myself too well.) They are well worth
> adding to the general store of emacs commands, and mentioning in
> documentation next to forward-char and backward-char.

I've now added a similar feature to what will become Emacs 29 at some
future point. The code is based on that old post by Handa-san, but I
decided to change its user-facing aspects: instead of new commands, I
added a new user option, which, if set non-nil, disables
auto-composition at point, and thus allows point to "enter" the
composed sequence. I think this is better for 2 reasons:

  . no need to introduce new cursor motion commands, for which it will
    be hard to find a convenient key binding (using C-S-f/C-S-b will
    conflict with the shift-selection feature, for example);
  . the user option affects cursor motion by any means, so it's more
    general, and thus I hope it will be more convenient.
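
A minimal sketch of turning this on from an init file. The message above does not name the option; the sketch assumes it is the one that shipped in Emacs 29 as composition-break-at-point.

```elisp
;; Assumption: the user option described above is
;; `composition-break-at-point' (Emacs 29 and later).
;; The `boundp' guard leaves older Emacs versions untouched.
(when (boundp 'composition-break-at-point)
  ;; Non-nil disables auto-composition at point, so point can
  ;; "enter" a composed sequence with any cursor motion command.
  (setq composition-break-at-point t))
```

Because it is an ordinary variable, it can be switched back to nil at any time to restore normal cluster-wise cursor motion.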
From debbugs-submit-bounces@debbugs.gnu.org Sun Feb 13 11:04:26 2022
Date: Sun, 13 Feb 2022 18:04:11 +0200
Message-Id: <831r06rbwk.fsf@gnu.org>
From: Eli Zaretskii
To: Richard Wordingham
In-Reply-To: <20220205225251.08a0faab@JRWUBU2> (message from Richard Wordingham on Sat, 5 Feb 2022 22:52:51 +0000)
Subject: Re: bug#20140: 24.4; M17n shaper output rejected
References: <20150318222040.4066e6e9@JRWUBU2> <87r18jk5nr.fsf@gnus.org> <83v8xv2icg.fsf@gnu.org> <20220205225251.08a0faab@JRWUBU2>
Cc: 20140@debbugs.gnu.org, larsi@gnus.org

> Date: Sat, 5 Feb 2022 22:52:51 +0000
> From: Richard Wordingham
> Cc: Lars Ingebrigtsen, 20140@debbugs.gnu.org
>
> You're welcome to include my composition rules.

Thanks. I started with your code:

> (defvar tai-tham-composable-pattern
>   (let ((table
>          ;; C is letters, independent vowels, digits, punctuation and symbols.
>          '(("C" . "[\u1A20-\u1A54\u1A80-\u1A89\u1A90-\u1A99\u1AA0-\u1AAD]")
>            ("M" . "[\u1A55-\u1A57\u1A59-\u1A5E\u1A61-\u1A7C\u1A7F]"); Mark
>            ("H" . "\u1A60") ; sakot
>            ("S" . "[\u1A75-\u1A7C]") ; Marks commuting with sakot
>            ("N" . "\u1A58"))) ; mai kang lai
>         (basic_syllable "C\\(N*\\(M\\|HS*C\\)\\)*")
>         (regexp "X\\(N\\(X\\)?\\)*H?")) ; X is basic syllable
>     (let ((case-fold-search nil))
>       (setq regexp (replace-regexp-in-string "X" basic_syllable regexp t t))
>       (dolist (elt table)
>         (setq regexp (replace-regexp-in-string (car elt) (cdr elt)
>                                                regexp t t))))
>     regexp))
>
> (let ((elt (list (vector tai-tham-composable-pattern 0 'font-shape-gstring)
>                  (vector "." 0 'font-shape-gstring)
>                  )))
>   (set-char-table-range composition-function-table '(#x1A20 . #x1AAD) elt))

But that didn't seem to work well enough: e.g., some marks in your
"sample text" didn't combine with letters, as I think they should.
Then I tried this simplistic setting:

  (set-char-table-range composition-function-table
                        '(#x1a20 . #x1aaf)
                        (list (vector "[\u1a20-\u1aaf]+" 0 'font-shape-gstring)))

and it worked much better, including passing a small number of the
tests from your renderer test page that I threw on Emacs.
This is on MS-Windows with Emacs 29 and HarfBuzz 2.4.0 (which is not even
the latest release of HarfBuzz), and with the A Tai Tham KH New V3 font.
Any reason not to use the above simple setup for Tai Tham text
composition?

I needed a couple more additions to Emacs to make Tai Tham support
work OOTB: for example, script-representative-chars lacked an entry
for Tai Tham, and the default fontset needed an addition. (And on
MS-Windows, one needs to run the w32-find-non-USB-fonts magic once, to
notice the newly installed Tai Tham font.)

Other than that, assuming the above setting of
composition-function-table is okay, we are ready to officially add Tai
Tham to scripts supported by Emacs.

Btw, is there a way to get all the examples from your
https://wrdingham.co.uk/lanna/renderer_test.htm as a UTF-8 encoded
text file? I'd like to test the Emacs rendering with all of the
examples, but copy-pasting each example separately from the browser is
not my idea of useful time investment. So if you could provide the
examples as a downloadable text file, I'd appreciate it.
From debbugs-submit-bounces@debbugs.gnu.org Sun Feb 13 14:49:17 2022
Date: Sun, 13 Feb 2022 21:49:04 +0200
Message-Id: <83sfsmpmxb.fsf@gnu.org>
From: Eli Zaretskii
To: Richard Wordingham
In-Reply-To: <20220205225251.08a0faab@JRWUBU2> (message from Richard Wordingham on Sat, 5 Feb 2022 22:52:51 +0000)
Subject: Re: bug#20140: 24.4; M17n shaper output rejected
References: <20150318222040.4066e6e9@JRWUBU2> <87r18jk5nr.fsf@gnus.org> <83v8xv2icg.fsf@gnu.org> <20220205225251.08a0faab@JRWUBU2>
Cc: 20140@debbugs.gnu.org, larsi@gnus.org

> Date: Sat, 5 Feb 2022 22:52:51 +0000
> From: Richard Wordingham
> Cc: Lars Ingebrigtsen, 20140@debbugs.gnu.org
>
> Sad to see that Khaled Hosny's suggestion not to use composition rules
> seems not to have been taken.

Btw, the _only_ reason Handa-san and now myself were able to implement
something like the forward/backward-char-intrusive commands is that we
DO control which parts of text are composed and which aren't. If we
were to follow HarfBuzz developers' advice, and were to hand all the
text to HarfBuzz for shaping, we would need the HarfBuzz cooperation
to implement such features in the editor.

From debbugs-submit-bounces@debbugs.gnu.org Sun Feb 13 15:53:23 2022
Date: Sun, 13 Feb 2022 20:53:10 +0000
From: Richard Wordingham
To: Eli Zaretskii
Subject: Re: bug#20140: 24.4; M17n shaper output rejected
Message-ID: <20220213205310.0b8a715c@JRWUBU2>
In-Reply-To: <831r06rbwk.fsf@gnu.org>
References: <20150318222040.4066e6e9@JRWUBU2> <87r18jk5nr.fsf@gnus.org> <83v8xv2icg.fsf@gnu.org> <20220205225251.08a0faab@JRWUBU2> <831r06rbwk.fsf@gnu.org>
Cc: 20140@debbugs.gnu.org, larsi@gnus.org

On Sun, 13 Feb 2022 18:04:11 +0200
Eli Zaretskii wrote:

> > Date: Sat, 5 Feb 2022 22:52:51 +0000
> > From: Richard Wordingham
> > Cc: Lars Ingebrigtsen, 20140@debbugs.gnu.org
> >
> > You're welcome to include my composition rules.
>
> Thanks. I started with your code:
>
> > (defvar tai-tham-composable-pattern
> >   (let ((table
> >          ;; C is letters, independent vowels, digits, punctuation and symbols.
> >          '(("C" . "[\u1A20-\u1A54\u1A80-\u1A89\u1A90-\u1A99\u1AA0-\u1AAD]")
> >            ("M" . "[\u1A55-\u1A57\u1A59-\u1A5E\u1A61-\u1A7C\u1A7F]"); Mark
> >            ("H" . "\u1A60") ; sakot
> >            ("S" . "[\u1A75-\u1A7C]") ; Marks commuting with sakot
> >            ("N" . "\u1A58"))) ; mai kang lai
> >         (basic_syllable "C\\(N*\\(M\\|HS*C\\)\\)*")
> >         (regexp "X\\(N\\(X\\)?\\)*H?")) ; X is basic syllable
> >     (let ((case-fold-search nil))
> >       (setq regexp (replace-regexp-in-string "X" basic_syllable
> >                                              regexp t t))
> >       (dolist (elt table)
> >         (setq regexp (replace-regexp-in-string (car elt) (cdr elt)
> >                                                regexp t t))))
> >     regexp))
> >
> > (let ((elt (list (vector tai-tham-composable-pattern 0
> >                          'font-shape-gstring)
> >                  (vector "." 0 'font-shape-gstring)
> >                  )))
> >   (set-char-table-range composition-function-table
> >                         '(#x1A20 . #x1AAD) elt))
>
> But that didn't seem to work well enough: e.g., some marks in your
> "sample text" didn't combine with letters, as I think they should.

Which ones? Are you sure they didn't combine at the Emacs level? I did
suspect the problem was writing '\u1A7C' instead of '\u1a7c', but I'm no
longer so sure. (The 'C' might get expanded, but I'm beginning to think
not.)

> Then I tried this simplistic setting:
>
>   (set-char-table-range composition-function-table
>                         '(#x1a20 . #x1aaf)
>                         (list (vector "[\u1a20-\u1aaf]+" 0
>                                       'font-shape-gstring)))
>
> and it worked much better, including passing a small number of the
> tests from your renderer test page that I threw on Emacs.
> This is on MS-Windows with Emacs 29 and HarfBuzz 2.4.0 (which is not
> even the latest release of HarfBuzz), and with the A Tai Tham KH New
> V3 font. Any reason not to use the above simple setup for Tai Tham
> text composition?

Mostly only that you would have to edit the text with "autocomposition
at point disabled" or mark word boundaries, e.g. with U+200B ZERO WIDTH
SPACE. The Tai languages that use Tai Tham use scriptio continua. While
modern Pali does separate words with visible white space, its words
tend to be polysyllabic; with discerning composition, it would be about
as tolerable as editing Hindi in Devanagari with autocomposition
enabled. (Quite a few people edit Devanagari in transliteration to
Latin!)

You should also add CGJ and ZWNJ, and some people may appreciate ZWJ -
the Khottabun font has ligatures involving ZWJ, though it may just be
an experimental feature - and ultimately WJ, for when someone writes a
Tai Tham word breaker. Oh, and Thai and Lao mai t(r)i and mai
chat(t)awa and U+0324 COMBINING DIAERESIS BELOW turn up occasionally -
U+0324 is supported in Thep's Khottabun font, and my Da Lekh series
supports Thai mai tri and mai chattawa. These characters seem to work
with HarfBuzz.

If using the native Windows renderer is an option with Emacs, then 'A
Tai Tham KH New' works better than 'A Tai Tham KH New V3'. I've created
https://wrdingham.co.uk/lanna/font_test.htm to do _font_ comparisons.
I'd delayed because I've only recently satisfied myself that it is
lawful, at least under English law. (The qualms were with the samples
taken from books.) It's still very much a work in progress.

> I needed a couple more additions to Emacs to make Tai Tham support
> work OOTB: for example, script-representative-chars lacked an entry
> for Tai Tham, and the default fontset needed an addition. (And on
> MS-Windows, one needs to run the w32-find-non-USB-fonts magic once, to
> notice the newly installed Tai Tham font.)
> Other than that, assuming the above setting of > composition-function-table is okay, we are ready to officially add Tai > Tham to scripts supported by Emacs. > Btw, is there a way to get all the examples from your > https://wrdingham.co.uk/lanna/renderer_test.htm as a UTF-8 encoded > text file? I'd like to test the Emacs rendering with all of the > examples, but copy-pasting each example separately from the browser is > not my idea of useful time investment. So if you could provide the > examples as a downloadable text file, I'd appreciate. As buried (you're not the only one to have overlooked it) in the penultimate paragraph of 'Content and Layout' section, "The test words may, in principle, be extracted quite simply from this web page. Each test 'word' is the content of the first cell in each row whose class is tst1. For convenience*, I have extracted the first two cells in such rows, along with titles, to a CSV file." The file is rt.csv in the same directory. I included the meaning and pronunciation as those who don't know the script may find it easier to refer to the words by translation or transcription. You may prefer to use the file more or less as it is, but one can easily knock up an Emacs macro sequence to delete the first comma and the rest of the line. I left the section titles in for easier navigation to the renderer test file. *Some people claim to find XML files easy to use, they should then be able to analyse a file conforming to HTML4 syntax. Dodgy spellings go in pink rows whose class is 'tst2'. The alternative encodings demanded by the USE go in orange rows whose class is 'tst3'. I have not extracted these. Richard. 
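
The "delete the first comma and the rest of the line" step could also be done with a small command rather than a keyboard macro. This is only a sketch: `my-rt-keep-first-field` is a made-up name, and it assumes that no test word in rt.csv is quoted or contains a comma itself, which has not been checked against the real file.

```elisp
;; Hypothetical helper, not part of the thread or of Emacs.
(defun my-rt-keep-first-field ()
  "Delete the first comma and everything after it on each line.
Lines without a comma (for example bare section titles) are left
untouched.  Assumes the first CSV field is never quoted."
  (interactive)
  (save-excursion
    (goto-char (point-min))
    ;; "." does not match a newline, so each match stays on one line.
    (while (re-search-forward ",.*$" nil t)
      (replace-match ""))))
```

Visit a copy of rt.csv and run `M-x my-rt-keep-first-field`; working on a copy keeps the meanings and transcriptions available for reference.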
From debbugs-submit-bounces@debbugs.gnu.org Sun Feb 13 16:12:03 2022
Date: Sun, 13 Feb 2022 21:11:52 +0000
From: Richard Wordingham
To: Eli Zaretskii
Subject: Re: bug#20140: 24.4;
 M17n shaper output rejected
Message-ID: <20220213211152.03e2990a@JRWUBU2>
In-Reply-To: <83sfsmpmxb.fsf@gnu.org>
References: <20150318222040.4066e6e9@JRWUBU2> <87r18jk5nr.fsf@gnus.org> <83v8xv2icg.fsf@gnu.org> <20220205225251.08a0faab@JRWUBU2> <83sfsmpmxb.fsf@gnu.org>
Cc: 20140@debbugs.gnu.org, larsi@gnus.org

On Sun, 13 Feb 2022 21:49:04 +0200
Eli Zaretskii wrote:

> > Date: Sat, 5 Feb 2022 22:52:51 +0000
> > From: Richard Wordingham
> > Cc: Lars Ingebrigtsen , 20140@debbugs.gnu.org
> >
> > Sad to see that Khaled Hosny's suggestion not to use composition
> > rules seems not to have been taken.
>
> Btw, the _only_ reason Handa-san and now myself were able to implement
> something like the forward/backward-char-intrusive commands is that we
> DO control which parts of text are composed and which aren't. If we
> were to follow HarfBuzz developers' advice, and were to hand all the
> text to HarfBuzz for shaping, we would need the HarfBuzz cooperation
> to implement such features in the editor.

You mean the more sophisticated mechanisms which position the cursor
intelligently. Those two commands you named work by completely
ignoring the composition mechanism.
Correct me if I am wrong, but for Arabic, is not Emacs restricted to
typewriter-like fonts?

There would be a similar problem with the use of Tai Khuen or other
tunnelling fonts for Northern Thai if you used the current mechanism
for advancing character by character. Tunnelling fonts write parts of
one cluster under the next. The Tai Khuen fonts I've seen do this by
relying on characteristics of Tai Khuen spelling. The rules don't hold
for Northern Thai, and consequently the subscript portions of
successive orthographic syllables can overwrite one another. A
sophisticated font could check for clashes, but that needs the
orthographic syllables to be passed to the shaper together.

Richard.

From debbugs-submit-bounces@debbugs.gnu.org Mon Feb 14 08:19:45 2022
Date: Mon, 14 Feb 2022 15:19:36 +0200
Message-Id: <83mtitpouv.fsf@gnu.org>
From: Eli Zaretskii
To: Richard Wordingham
In-Reply-To: <20220213205310.0b8a715c@JRWUBU2> (message from Richard Wordingham on Sun, 13 Feb 2022 20:53:10 +0000)
Subject: Re: bug#20140: 24.4; M17n shaper output rejected
References: <20150318222040.4066e6e9@JRWUBU2> <87r18jk5nr.fsf@gnus.org> <83v8xv2icg.fsf@gnu.org> <20220205225251.08a0faab@JRWUBU2> <831r06rbwk.fsf@gnu.org> <20220213205310.0b8a715c@JRWUBU2>
Cc: 20140@debbugs.gnu.org, larsi@gnus.org

> Date: Sun, 13 Feb 2022 20:53:10 +0000
> From: Richard Wordingham
> Cc: larsi@gnus.org, 20140@debbugs.gnu.org
>
> On Sun, 13 Feb 2022 18:04:11 +0200
> Eli Zaretskii wrote:
>
> > But that didn't seem to work well enough: e.g., some marks in your
> > "sample text" didn't combine with letters, as I think they should.
>
> Which ones?

Sorry, that was my faulty testing: I tested a half-baked change. Your
rules do work correctly, AFAICT.

But I have 2 questions:

1) Why do we need this part of the composition rules:

     (vector "." 0 'font-shape-gstring)

   This matches just one character, so what do we want to accomplish
   by this rule? A single character cannot "self-compose", can it?
2) Since tai-tham-composable-pattern always starts with what you
   denote as "C", how about setting up only entries of
   composition-function-table that correspond to those characters,
   i.e.:

  (let ((elt (list (vector tai-tham-composable-pattern 0 'font-shape-gstring))))
    (set-char-table-range composition-function-table '(#x1A20 . #x1A54) elt)
    (set-char-table-range composition-function-table '(#x1A80 . #x1A89) elt)
    (set-char-table-range composition-function-table '(#x1A90 . #x1A99) elt)
    (set-char-table-range composition-function-table '(#x1AA0 . #x1AAD) elt))

   Do you see any problems with that?

> I did suspect the problem was writing '\u1A7C' instead of
> '\u1a7c', but I'm no longer so sure.

No, that's not a problem.

> You should also add CGJ and ZWNJ, and some people may appreciate ZWJ -
> the Khottabun font has ligatures involving ZWJ, though it may just be
> an experimental feature - and ultimately WJ, for when someone writes a
> Tai Tham word breaker.

How should I add CGJ and ZWNJ? What are the rules?

> Oh, and Thai and Lao mai t(r)i and mai chat(t)awa and U+0324
> COMBINING DIAERESIS BELOW turn up occasionally - U+0324 is supported
> in Thep's Khottabun font, and my Da Lekh series supports Thai mai
> tri and mai chattawa. These characters seem to work with HarfBuzz.

Not sure I understand: what patterns/rules should be added for these?

> If using the native Windows renderer is an option with Emacs, then 'A
> Tai Tham KH New' works better than 'A Tai Tham KH New V3'.

We still support Uniscribe, but prefer HarfBuzz, because MS deprecated
Uniscribe. We cannot support DirectWrite, because its APIs are
C++-only, and no one has shown whether and how to call them from C.

> > Btw, is there a way to get all the examples from your
> > https://wrdingham.co.uk/lanna/renderer_test.htm as a UTF-8 encoded
> > text file?
> > I'd like to test the Emacs rendering with all of the
> > examples, but copy-pasting each example separately from the browser is
> > not my idea of useful time investment. So if you could provide the
> > examples as a downloadable text file, I'd appreciate.
>
> As buried (you're not the only one to have overlooked it) in the
> penultimate paragraph of 'Content and Layout' section, "The test words
> may, in principle, be extracted quite simply from this web page. Each
> test 'word' is the content of the first cell in each row whose class is
> tst1. For convenience*, I have extracted the first two cells in such
> rows, along with titles, to a CSV file." The file is rt.csv in the
> same directory.

Thanks, I will use that.

From debbugs-submit-bounces@debbugs.gnu.org Mon Feb 14 08:26:17 2022
Date: Mon, 14 Feb 2022 15:26:07 +0200
Message-Id: <83leydpok0.fsf@gnu.org>
From: Eli Zaretskii
To: Richard Wordingham
In-Reply-To: <20220213211152.03e2990a@JRWUBU2> (message from Richard Wordingham on Sun, 13 Feb 2022 21:11:52 +0000)
Subject: Re: bug#20140: 24.4; M17n shaper output rejected
References: <20150318222040.4066e6e9@JRWUBU2> <87r18jk5nr.fsf@gnus.org> <83v8xv2icg.fsf@gnu.org> <20220205225251.08a0faab@JRWUBU2> <83sfsmpmxb.fsf@gnu.org> <20220213211152.03e2990a@JRWUBU2>
Cc: 20140@debbugs.gnu.org, larsi@gnus.org

> Date: Sun, 13 Feb 2022 21:11:52 +0000
> From: Richard Wordingham
> Cc: larsi@gnus.org, 20140@debbugs.gnu.org
>
> > Btw, the _only_ reason Handa-san and now myself were able to implement
> > something like the forward/backward-char-intrusive commands is that we
> > DO control which parts of text are composed and which aren't. If we
> > were to follow HarfBuzz developers' advice, and were to hand all the
> > text to HarfBuzz for shaping, we would need the HarfBuzz cooperation
> > to implement such features in the editor.
>
> You mean the more sophisticated mechanisms which position the cursor
> intelligently. Those two commands you named work by completely
> ignoring the composition mechanism.

Yes.
And the reason we can ignore compositions in certain portions of
the text is that we have control over what is passed to HarfBuzz.

> Correct me if I am wrong, but for Arabic, is not Emacs restricted to
> typewriter-like fonts?

No, that's not true. I'm not aware of any such limitation; AFAIK
Arabic shaping works correctly in Emacs, certainly with HarfBuzz and
Emacs 27 or later.

Or maybe I misunderstand what you mean by "typewriter-like" fonts?
Can you give an example of a non-typewriter-like font for Arabic that
I can find on MS-Windows and try?

> There would be a similar problem with the use of Tai Khuen or other
> tunnelling fonts for Northern Thai if you used the current mechanism
> for advancing character by character. Tunnelling fonts write parts of
> one cluster under the next. The Tai Khuen fonts I've seen do this by
> relying on characteristics of Tai Khuen spelling. The rules don't hold
> for Northern Thai, and consequently the subscript portions of
> successive orthographic syllables can overwrite one another. A
> sophisticated font could check for clashes, but that needs the
> orthographic syllables to be passed to the shaper together.

I'm not sure I understand. Does HarfBuzz know about these advancement
features? We rely on HarfBuzz to give us back as many grapheme
clusters as it sees fit for a given chunk of text, and we expect each
grapheme cluster to include glyphs with relative offsets as needed by
the script and the font.

IOW, this job is delegated to the shaping engine, such as HarfBuzz;
Emacs just takes the glyphs and offsets HarfBuzz gives us and blindly
obeys them.
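
To see which rules decide the chunk of text that gets offered to the shaper for a given character, one can look at that character's entry in composition-function-table. The following is a minimal sketch; `my-composition-rules-for` is a hypothetical helper name, and what it returns depends entirely on the Emacs version and on any rules installed by hand.

```elisp
;; Hypothetical helper, not part of Emacs.
(defun my-composition-rules-for (char)
  "Return the composition rules registered for CHAR, or nil if none.
Each rule is a vector [PATTERN PREV-CHARS FUNC]: text around CHAR
that matches the regexp PATTERN is what FUNC gets to shape."
  (aref composition-function-table char))

;; Rules for U+1A20 TAI THAM LETTER HIGH KA, if any are installed.
(my-composition-rules-for #x1A20)
```

An empty result means Emacs will not try to compose text starting at that character at all, whatever the font could do with it.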
From debbugs-submit-bounces@debbugs.gnu.org Mon Feb 14 17:14:37 2022
Date: Mon, 14 Feb 2022 22:14:27 +0000
From: Richard Wordingham
To: Eli Zaretskii
Subject: Re: bug#20140: 24.4;
 M17n shaper output rejected
Message-ID: <20220214221427.35231794@JRWUBU2>
In-Reply-To: <83mtitpouv.fsf@gnu.org>
References: <20150318222040.4066e6e9@JRWUBU2> <87r18jk5nr.fsf@gnus.org> <83v8xv2icg.fsf@gnu.org> <20220205225251.08a0faab@JRWUBU2> <831r06rbwk.fsf@gnu.org> <20220213205310.0b8a715c@JRWUBU2> <83mtitpouv.fsf@gnu.org>
Cc: 20140@debbugs.gnu.org, larsi@gnus.org

On Mon, 14 Feb 2022 15:19:36 +0200
Eli Zaretskii wrote:

> > Date: Sun, 13 Feb 2022 20:53:10 +0000
> > From: Richard Wordingham
> > Cc: larsi@gnus.org, 20140@debbugs.gnu.org
> >
> > On Sun, 13 Feb 2022 18:04:11 +0200
> > Eli Zaretskii wrote:
> >
> > > But that didn't seem to work well enough: e.g., some marks in your
> > > "sample text" didn't combine with letters, as I think they
> > > should.
> >
> > Which ones?
>
> Sorry, that was my faulty testing: I tested a half-baked change. Your
> rules do work correctly, AFAICT.
>
> But I have 2 questions:
>
> 1) Why do we need this part of the composition rules:
>
>      (vector "." 0 'font-shape-gstring)
>
>    This matches just one character, so what do we want to accomplish
>    by this rule? A single character cannot "self-compose", can it?

No, but in general it may need shaping, e.g.
to take advantage of the locl feature. If that's not needed for
shaping to happen, then dispense with it - unless it was needed for
general consistency.

> 2) Since tai-tham-composable-pattern always starts with what you
>    denote as "C", how about setting up only entries of
>    composition-function-table that correspond to those characters,
>    i.e.:
>
>   (let ((elt (list (vector tai-tham-composable-pattern 0
>                            'font-shape-gstring))))
>     (set-char-table-range composition-function-table
>                           '(#x1A20 . #x1A54) elt)
>     (set-char-table-range composition-function-table
>                           '(#x1A80 . #x1A89) elt)
>     (set-char-table-range composition-function-table
>                           '(#x1A90 . #x1A99) elt)
>     (set-char-table-range composition-function-table
>                           '(#x1AA0 . #x1AAD) elt))
>
>    Do you see any problems with that?

It may affect the rendering of isolated marks, particularly the
preposed ones like U+1A55 TAI THAM CONSONANT SIGN MEDIAL RA and U+1A6E
TAI THAM VOWEL SIGN E. I'll have to investigate HarfBuzz-using Emacs.
I can't think of any other possible problems. My first thought is
that it is unnecessarily complicated, and sets up work for when (if?)
TAI THAM LAO LOW HA gets added.

> > You should also add CGJ and ZWNJ, and some people may appreciate
> > ZWJ - the Khottabun font has ligatures involving ZWJ, though it may
> > just be an experimental feature - and ultimately WJ, for when
> > someone writes a Tai Tham word breaker.
>
> How should I add CGJ and ZWNJ? What are the rules?
>
> > Oh, and Thai and Lao mai t(r)i and mai chat(t)awa and U+0324
> > COMBINING DIAERESIS BELOW turn up occasionally - U+0324 is supported
> > in Thep's Khottabun font, and my Da Lekh series supports Thai mai
> > tri and mai chattawa. These characters seem to work with HarfBuzz.
>
> Not sure I understand: what patterns/rules should be added for these?

Add them all to "M" in the definition of tai-tham-composable-pattern.
Strictly, U+0324 should also be added to "S", but I'd be surprised to
see it in a genuine spelling.

Richard.
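
One way to keep the per-range setup from becoming a maintenance burden is to hold the "C" ranges in a single list and loop over it, so that a newly encoded letter only needs one new entry. This is a sketch under assumptions: `my-tai-tham-base-ranges` and `my-tai-tham-install-rules` are made-up names, and the optional TABLE argument exists only so the code can be tried on a scratch char-table instead of the live composition-function-table.

```elisp
;; Hypothetical names; the ranges are the ones matched by "C" in the thread.
(defvar my-tai-tham-base-ranges
  '((#x1A20 . #x1A54) (#x1A80 . #x1A89) (#x1A90 . #x1A99) (#x1AA0 . #x1AAD))
  "Character ranges that can start a Tai Tham composition.")

(defun my-tai-tham-install-rules (pattern &optional table)
  "Register PATTERN for every range in `my-tai-tham-base-ranges'.
TABLE defaults to `composition-function-table'."
  (let ((rules (list (vector pattern 0 'font-shape-gstring))))
    (dolist (range my-tai-tham-base-ranges)
      (set-char-table-range (or table composition-function-table)
                            range rules))))
```

With the pattern from the thread loaded, `(my-tai-tham-install-rules tai-tham-composable-pattern)` would have the same effect as the four explicit calls quoted above. Marks such as U+1A55 get no entry of their own, so the concern about isolated marks applies to this form too.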
From debbugs-submit-bounces@debbugs.gnu.org Mon Feb 14 18:26:35 2022
Date: Mon, 14 Feb 2022 23:26:23 +0000
From:
 Richard Wordingham
To: Eli Zaretskii
Subject: Re: bug#20140: 24.4; M17n shaper output rejected
Message-ID: <20220214232623.30534d5a@JRWUBU2>
In-Reply-To: <83leydpok0.fsf@gnu.org>
References: <20150318222040.4066e6e9@JRWUBU2> <87r18jk5nr.fsf@gnus.org> <83v8xv2icg.fsf@gnu.org> <20220205225251.08a0faab@JRWUBU2> <83sfsmpmxb.fsf@gnu.org> <20220213211152.03e2990a@JRWUBU2> <83leydpok0.fsf@gnu.org>
Cc: 20140@debbugs.gnu.org, larsi@gnus.org

On Mon, 14 Feb 2022 15:26:07 +0200
Eli Zaretskii wrote:

> > Date: Sun, 13 Feb 2022 21:11:52 +0000
> > From: Richard Wordingham
> > Cc: larsi@gnus.org, 20140@debbugs.gnu.org

> No, that's not true. I'm not aware of any such limitation; AFAIK
> Arabic shaping works correctly in Emacs, certainly with HarfBuzz and
> Emacs 27 or later.
>
> Or maybe I misunderstand what you mean by "typewriter-like" fonts?
> Can you give an example of a non-typewriter-like font for Arabic that
> I can find on MS-Windows and try?

Not off the top of my head, but compare لحج with the presentation form
‎ﳊ U+FCCA ARABIC LIGATURE LAM WITH HAH INITIAL FORM for the first two
letters.
The lam part is a vertical line in the middle of the glyph;
the 'hah' part forms the lower part of the glyph.

> > There would be a similar problem with the use of Tai Khuen or other
> > tunnelling fonts for Northern Thai if you used the current mechanism
> > for advancing character by character. Tunnelling fonts write parts
> > of one cluster under the next. The Tai Khuen fonts I've seen do
> > this by relying on characteristics of Tai Khuen spelling. The
> > rules don't hold for Northern Thai, and consequently the subscript
> > portions of successive orthographic syllables can overwrite one
> > another. A sophisticated font could check for clashes, but that
> > needs the orthographic syllables to be passed to the shaper
> > together.
>
> I'm not sure I understand. Does HarfBuzz know about these advancement
> features? We rely on HarfBuzz to give us back as many grapheme
> clusters as it sees fit for a given chunk of text, and we expect each
> grapheme cluster to include glyphs with relative offsets as needed by
> the script and the font.

No, the fonts rely on the grammar of Tai Khuen. If an orthographic
syllable contains U+1A6C TAI THAM VOWEL SIGN OA BELOW, there will be a
following orthographic syllable in the same phonetic syllable, and it
will consist of a single consonant with no tail and possibly some marks
above. The font designers therefore do not worry about the effect on
the advance width; there will be room for U+1A6C below the next
orthographic syllable.

If you want to see details now, enter

  ᩉ᩠ᨾᩬᩁ ᩉ᩠ᨾᩳᨶᩥ᩠ᨯ ᩉ᩠ᨾᩬᩴᨶᩥ᩠ᨯ

in the 'Play Area' text box of
https://wrdingham.co.uk/lanna/renderer_test.htm. The first word is
spelt the same in Northern Thai and Tai Khuen.
As you switch the font from Lamphun to A Tai Tham KH (with ccmp
enabled if you are using IE 11), the glyphs at the bottom of the word
spread out to use the available space.

The next two words are 'Dr Nit' written in Tai Khuen and Northern
Thai. The word for 'Dr', /mɔː/, is spelt quite differently in the two
languages, though the consonants are the same. Both have a vowel
above, but the Northern Thai also has U+1A6C below, as in the first
word. When A Tai Tham KH is selected as the font, it clashes badly
with the bottom of the second syllable, 'Nit'. This phenomenon of a
vowel below expanding below the next consonant also occurs in Northern
Thai, but I don't know of any Northern Thai font that is clever enough
to do this, because checking for space below the next consonant is
fiddly.

> IOW, this job is delegated to the shaping engine, such as HarfBuzz;
> Emacs just takes the glyphs and offsets HarfBuzz gives us and blindly
> obeys them.

The first problem is that font writers tend to make assumptions about
the language their font will be used for. The second is that with a
good tunnelling font, HarfBuzz needs to know what comes in the next
syllable. At present, using a tunnelling font for Tai Tham risks
clashes when used with Emacs. The Tai Khuen fonts look good, but are
not suitable for writing Northern Thai.

Richard.
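
A quick way to see which of the three 'Play Area' words carry the tunnelling vowel is to test them for U+1A6C. The sketch below spells the words as code points; those sequences were decoded from the quoted-printable text of this message, so treat them as a transcription rather than an authoritative spelling, and both names are made up for the example.

```elisp
;; Hypothetical names; code points transcribed from the message above.
(defvar my-tunnelling-test-words
  '("\u1A49\u1A60\u1A3E\u1A6C\u1A41"                    ; first word
    "\u1A49\u1A60\u1A3E\u1A73\u1A36\u1A65\u1A60\u1A2F"  ; 'Dr Nit', Tai Khuen
    "\u1A49\u1A60\u1A3E\u1A6C\u1A74\u1A36\u1A65\u1A60\u1A2F") ; Northern Thai
  "The three test words from the message, in order.")

(defun my-has-oa-below-p (word)
  "Return t if WORD contains U+1A6C TAI THAM VOWEL SIGN OA BELOW."
  (and (string-match-p "\u1A6C" word) t))

(mapcar #'my-has-oa-below-p my-tunnelling-test-words)
;; → (t nil t)
```

The first and third words contain U+1A6C, which matches the description: the Tai Khuen spelling of 'Dr' has only the vowel above, while the Northern Thai spelling also has the vowel below that can collide with the next syllable.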
From debbugs-submit-bounces@debbugs.gnu.org Mon Feb 14 20:27:45 2022
Date: Tue, 15 Feb 2022 01:27:34 +0000
From: Richard Wordingham
To:
Eli Zaretskii Subject: Re: bug#20140: 24.4; M17n shaper output rejected Message-ID: <20220215012734.41fb4aaf@JRWUBU2> In-Reply-To: <20220214221427.35231794@JRWUBU2> References: <20150318222040.4066e6e9@JRWUBU2> <87r18jk5nr.fsf@gnus.org> <83v8xv2icg.fsf@gnu.org> <20220205225251.08a0faab@JRWUBU2> <831r06rbwk.fsf@gnu.org> <20220213205310.0b8a715c@JRWUBU2> <83mtitpouv.fsf@gnu.org> <20220214221427.35231794@JRWUBU2> X-Mailer: Claws Mail 3.17.5 (GTK+ 2.24.32; x86_64-pc-linux-gnu) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-CMAE-Envelope: MS4xfO21pHFa2Y3LZhriZFcP+vmkdsOEuAELsWD8xGyjmaKVBRJjcvpmmMiromiYdcBujgDikRw9UfdkwI7WBBTJMaG5YvoI6MD28L8U6TjiDW1ji+IdI2zW /3UT9etZAkC2iL0oN5maNCPz69QZ6d7IsUeJDyOt9I2vvcDoTN3ptA8Lw8Gx8IbwblmJazJFYXTSW1GiWlVQ8vi5u8hlTLptVZ6di+UqAlvUYWOmTdOneQtW CtTb76jJ7AReUoi9Gcejzw== X-Spam-Score: -0.0 (/) X-Debbugs-Envelope-To: 20140 Cc: 20140@debbugs.gnu.org, larsi@gnus.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-) On Mon, 14 Feb 2022 22:14:27 +0000 Richard Wordingham wrote: > On Mon, 14 Feb 2022 15:19:36 +0200 > Eli Zaretskii wrote: > > > > Date: Sun, 13 Feb 2022 20:53:10 +0000 > > > From: Richard Wordingham > > > Cc: larsi@gnus.org, 20140@debbugs.gnu.org > > > You should also add CGJ and ZWNJ, and some people may appreciate > > > ZWJ - the Khottabun font has ligatures involving ZWJ, though it > > > may just be an experimental feature - and ultimately WJ, for when > > > someone writes a Tai Tham word breaker. > > > > How should I add CGJ and ZWNJ? What are the rules? 
> >
> > > Oh, and Thai and Lao mai t(r)i and mai chat(t)awa and U+0324
> > > COMBINING DIAERESIS BELOW turn up occasionally - U+0324 is
> > > supported in Thep's Khottabun font, and my Da Lekh series
> > > supports Thai mai tri and mai chattawa. These characters seem to
> > > work with HarfBuzz.
> >
> > Not sure I understand: what patterns/rules should be added for
> > these?
>
> Add them all to "M" in the definition of tai-tham-composable-pattern.
> Strictly, U+0324 should also be added to "S", but I'd be surprised to
> see it in a genuine spelling.

In view of Wyn Owen's report (A Description and Linguistic Analysis of
the Tai Khuen Writing System, JSEALS 10.1 (2017)
https://evols.library.manoa.hawaii.edu/bitstream/10524/52403/1/09_Owen2017description.pdf)
on Tai Khuen spelling, one should also add U+0E49 THAI CHARACTER MAI
THO to "M". And, of course, as all 5 non-Tai Tham tone marks used with
the Tai Tham script have canonical combining class greater than 9, they
should be added to "S" - i.e. add U+0E49 to U+0E4B and U+0EC9 and
U+0ECB to "S".

Richard.
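
An illustrative sketch of the kind of change proposed above, not the code that was installed in Emacs. It assumes that each class in the table behind tai-tham-composable-pattern is a regexp bracket expression held in a string; the helper name, the variable names and the placeholder class value are all hypothetical.

```elisp
;; Hypothetical helper, for illustration only; not part of Emacs.
(defun my-add-to-char-class (class chars)
  "Return bracket expression CLASS with CHARS added before its final `]'.
CLASS is assumed to be a string of the form \"[...]\"."
  (concat (substring class 0 -1)
          (apply #'string chars)
          "]"))

;; Placeholder value covering U+1A75..U+1A79 only; the real "S" class
;; in the report may well contain other characters.
(defvar my-s-class "[\u1A75-\u1A79]")

;; Code points named in the message above for "S":
;; U+0E49..U+0E4B plus U+0EC9 and U+0ECB.
(defvar my-s-class-extended
  (my-add-to-char-class my-s-class
                        '(#x0E49 #x0E4A #x0E4B #x0EC9 #x0ECB)))

;; Non-nil when the extended class matches THAI CHARACTER MAI THO.
(string-match-p my-s-class-extended "\u0E49")
```

The same helper could be applied to the "M" class with whatever list of code points is finally agreed on; building the new string rather than editing the literal keeps the original class visible for comparison.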
From debbugs-submit-bounces@debbugs.gnu.org Tue Feb 15 09:40:19 2022 Received: (at 20140) by debbugs.gnu.org; 15 Feb 2022 14:40:19 +0000 Received: from localhost ([127.0.0.1]:43943 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1nJz03-0005j2-KH for submit@debbugs.gnu.org; Tue, 15 Feb 2022 09:40:19 -0500 Received: from eggs.gnu.org ([209.51.188.92]:44280) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1nJz01-0005il-HB for 20140@debbugs.gnu.org; Tue, 15 Feb 2022 09:40:14 -0500 Received: from [2001:470:142:3::e] (port=58040 helo=fencepost.gnu.org) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1nJyzv-0001Gh-C9; Tue, 15 Feb 2022 09:40:07 -0500 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org; s=fencepost-gnu-org; h=MIME-version:References:Subject:In-Reply-To:To:From: Date; bh=yDlsfD+ljssArNG2A6yanwmUnfYIggcBv9VCrvGLSAE=; b=Nu2tEHYqao26NLgJlKA/ k9hDGERFhjnTPF5La8/Nz4jurcz7bE/XhONe1cw2tYKDhUGmv+uDTGVIH8P/Y0tlDn0tpEmmvCBHb dmtsO7jp3baIdqgcL7+3+1dBmIcuJ/QVc2KxcOoXIgfLxjAyj/fABFWC7IIWGXAi6YzOhk5JPFM1n c4oVtlYRp+mQQl3V77pP2IaMdC1haSfO5TB3KFgIEKJYM+tEXovEjOwDPTkb86F2f3GhzRB4bwUFj k05NiIFjAuYHXuex8nUgXcXPlJ5C9zSA/fIlzMN3QGkoo44smiQzXMe7GenZuzgw+lE5/EO43WnAV AAJBDSLr61qwWA==; Received: from [87.69.77.57] (port=2801 helo=home-c4e4a596f7) by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1nJyzu-00054y-Sy; Tue, 15 Feb 2022 09:40:07 -0500 Date: Tue, 15 Feb 2022 16:40:09 +0200 Message-Id: <83wnhw2nxy.fsf@gnu.org> From: Eli Zaretskii To: Richard Wordingham In-Reply-To: <20220214232623.30534d5a@JRWUBU2> (message from Richard Wordingham on Mon, 14 Feb 2022 23:26:23 +0000) Subject: Re: bug#20140: 24.4; M17n shaper output rejected References: <20150318222040.4066e6e9@JRWUBU2> <87r18jk5nr.fsf@gnus.org> <83v8xv2icg.fsf@gnu.org> <20220205225251.08a0faab@JRWUBU2> <83sfsmpmxb.fsf@gnu.org> 
<20220213211152.03e2990a@JRWUBU2> <83leydpok0.fsf@gnu.org> <20220214232623.30534d5a@JRWUBU2> MIME-version: 1.0 Content-type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 20140 Cc: 20140@debbugs.gnu.org, larsi@gnus.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---)

> Date: Mon, 14 Feb 2022 23:26:23 +0000
> From: Richard Wordingham
> Cc: larsi@gnus.org, 20140@debbugs.gnu.org
>
> > No, that's not true. I'm not aware of any such limitation; AFAIK
> > Arabic shaping works correctly in Emacs, certainly with HarfBuzz and
> > Emacs 27 or later.
> >
> > Or maybe I misunderstand what you mean by "typewriter-like" fonts?
> > Can you give an example of a non-typewriter-like font for Arabic that
> > I can find on MS-Windows and try?
>
> Not off the top of my head, but compare لحج with the presentation form
> ﳊ U+FCCA ARABIC LIGATURE LAM WITH HAH INITIAL FORM for the first two
> letters. The lam part is a vertical line in the middle of the glyph;
> the 'hah' part forms the lower part of the glyph.

They look identical here (using the default Courier New font). With
what font did you think they will look wrong?
From debbugs-submit-bounces@debbugs.gnu.org Tue Feb 15 16:06:15 2022 Received: (at 20140) by debbugs.gnu.org; 15 Feb 2022 21:06:15 +0000 Received: from localhost ([127.0.0.1]:46599 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1nK51a-00076e-TY for submit@debbugs.gnu.org; Tue, 15 Feb 2022 16:06:15 -0500 Received: from smtpq2.tb.ukmail.iss.as9143.net ([212.54.57.97]:46180) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1nK51Z-00076S-BQ for 20140@debbugs.gnu.org; Tue, 15 Feb 2022 16:06:13 -0500 Received: from [212.54.57.111] (helo=csmtp7.tb.ukmail.iss.as9143.net) by smtpq2.tb.ukmail.iss.as9143.net with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1nK51T-0001Du-DO for 20140@debbugs.gnu.org; Tue, 15 Feb 2022 22:06:07 +0100 Received: from JRWUBU2 ([82.27.122.109]) by cmsmtp with ESMTP id K51SnEWdXufb4K51Snkrl7; Tue, 15 Feb 2022 22:06:07 +0100 X-SourceIP: 82.27.122.109 X-Authenticated-Sender: X-Spam: 0 X-Authority: v=2.4 cv=FOAIesks c=1 sm=1 tr=0 ts=620c15bf cx=a_exe a=lZfnwhydZ+7bl6OdZ0zTBw==:117 a=lZfnwhydZ+7bl6OdZ0zTBw==:17 a=IkcTkHD0fZMA:10 a=oGFeUVbbRNcA:10 a=mDV3o1hIAAAA:8 a=NLZqzBF-AAAA:8 a=OocQHUDgAAAA:8 a=F6aBO-N1z0GnKmalpNMA:9 a=QEXdDO2ut3YA:10 a=_FVE-zBwftR9WsbkzFJk:22 a=wW_WBVUImv98JQXhvVPZ:22 a=xUZTl98r3Qw_uB5NK3jt:22 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ntlworld.com; s=meg.feb2017; t=1644959167; bh=nqdSHxWYJLZuCEfhas2ZkqhEAERCfw5x0qktK+zkU2M=; h=Date:From:To:Cc:Subject:In-Reply-To:References; b=gLiOhUOFHDgeT2cEZl8LXwRdvIBKhlySERD4LxD8I1UbEwTt86o2NaWqVwwdH5kJA wep0xUl0o4o6zNReByZgx/w66CWsnX/DBVEq3ofQMJzfkeoDqMQX/DKNinyVkFT8/m Gk3qAJevW/MoINfuBir+Hk61DE4d3MQAI736FuGTylqHvxN7zo3j/GW8I0I8VoIzva dDVP6LcKfeFxYBF8kJ6xOv6oe6Ttsx77/JfMdBGJh/Q6tmcoDoGZovpBjRqtZTOAJK SYXMfSC3F59ylN3HyfLo4ULPhFKaS2V3fRGvym7k2Zi58l8ocS8ZgNtBVamTiMfpXu 61z2l5Hr2JzBw== Date: Tue, 15 Feb 2022 21:06:05 +0000 From: Richard Wordingham To: Eli Zaretskii Subject: Re: bug#20140: 24.4; 
M17n shaper output rejected Message-ID: <20220215210605.1c41c1b2@JRWUBU2> In-Reply-To: <83wnhw2nxy.fsf@gnu.org> References: <20150318222040.4066e6e9@JRWUBU2> <87r18jk5nr.fsf@gnus.org> <83v8xv2icg.fsf@gnu.org> <20220205225251.08a0faab@JRWUBU2> <83sfsmpmxb.fsf@gnu.org> <20220213211152.03e2990a@JRWUBU2> <83leydpok0.fsf@gnu.org> <20220214232623.30534d5a@JRWUBU2> <83wnhw2nxy.fsf@gnu.org> X-Mailer: Claws Mail 3.17.5 (GTK+ 2.24.32; x86_64-pc-linux-gnu) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-CMAE-Envelope: MS4xfJYJ3gOzcmtoCwHITNCWdv+uWpjxb8kCxvpWx4g6njgsOB1xX3xua355bbpeSMrvrnXsB79ZezdiK4gxwndrpJHJXqd69nDokOxqi+3UBkkkUZ3EiA7Q nMX2xjOpjrO1efD2J14bB82AvwFQJPnTfXZ9dlx4iXMx8HWgUP9s0YoQkSPMuu6aqPhZZAgDvkJSGr/5f32BCwhdl9LEAUbDIrgZw7e8XB3YV+e2bI0ZtH6P mVt+3BDP3mYdLy/eO2ZkkQ== X-Spam-Score: -0.0 (/) X-Debbugs-Envelope-To: 20140 Cc: 20140@debbugs.gnu.org, larsi@gnus.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-)

On Tue, 15 Feb 2022 16:40:09 +0200
Eli Zaretskii wrote:

> > Date: Mon, 14 Feb 2022 23:26:23 +0000
> > From: Richard Wordingham
> > Cc: larsi@gnus.org, 20140@debbugs.gnu.org
> >
> > > No, that's not true. I'm not aware of any such limitation; AFAIK
> > > Arabic shaping works correctly in Emacs, certainly with HarfBuzz
> > > and Emacs 27 or later.
> > >
> > > Or maybe I misunderstand what you mean by "typewriter-like" fonts?
> > > Can you give an example of a non-typewriter-like font for Arabic
> > > that I can find on MS-Windows and try?
> >
> > Not off the top of my head, but compare لحج with the presentation
> > form ﳊ U+FCCA ARABIC LIGATURE LAM WITH HAH INITIAL FORM for the
> > first two letters. The lam part is a vertical line in the middle
> > of the glyph; the 'hah' part forms the lower part of the glyph.
>
> They look identical here (using the default Courier New font). With
> what font did you think they will look wrong?

In the Courier New font in Windows 10 of 2017 (+ automatic updates),
U+FCCA looks like the image in the Unicode code chart, and bears little
resemblance to the righthand two thirds of <U+0644, U+062D, U+062C>.
In keeping with its Latin part, the sequence of three characters looks
as one would expect from a typewriter when one enters text letter by
letter.

I must admit I'm having trouble laying my hand on a font which
does these ligatures. I wanted to find a font that would render the
three characters to look the same as ﳊﺞ . (Sticking them together isn't
working in the email client I'm using, but does work in some fallback
font.)

Richard.

From debbugs-submit-bounces@debbugs.gnu.org Wed Feb 16 08:15:50 2022 Received: (at 20140) by debbugs.gnu.org; 16 Feb 2022 13:15:50 +0000 Received: from localhost ([127.0.0.1]:47761 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1nKK9u-0007Q7-Ce for submit@debbugs.gnu.org; Wed, 16 Feb 2022 08:15:50 -0500 Received: from eggs.gnu.org ([209.51.188.92]:35694) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1nKK9s-0007Pv-E3 for 20140@debbugs.gnu.org; Wed, 16 Feb 2022 08:15:49 -0500 Received: from [2001:470:142:3::e] (port=52196 helo=fencepost.gnu.org) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1nKK9l-00009g-Um; Wed, 16 Feb 2022 08:15:42 -0500 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org; s=fencepost-gnu-org; h=MIME-version:References:Subject:In-Reply-To:To:From: Date; bh=6Lzpu3Xbo1Z2Y1qEQeNA+FtzZZew0IbN+/X3uILvxu8=; b=hetOrLyEkN17EixFePob tibY+vhZoYMLCVzHNOP+Pt5a8X7PLpz4heGauz1zc4QG8x72M4kMNYBOQ78pBx00/B9hFnUv+n/PF
7dpmVRzpbWNTKpu+840uD7HA6Iqq8qDIig950wjQLAovuUIbY4dbMgI8hBtONXOnOQD4WVDCwP0El 2RAbTvwvaFp1oF+Zc0AfHbjjgHrrj2hGLEBs6vTKauniVzeM8p66p682BtNH29t4yOK0v1tVh1dpP KK3fW2z4AJEbq5HhyZWlipFkiu0hmZ9PQ5IosO00zaYQwSrrr38QuEAFuHQqxg28yFgGglyLRvv0k +xxrZttJBCNUOA==; Received: from [87.69.77.57] (port=2876 helo=home-c4e4a596f7) by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1nKK9l-0008Tj-Bg; Wed, 16 Feb 2022 08:15:41 -0500 Date: Wed, 16 Feb 2022 15:15:46 +0200 Message-Id: <83a6er2br1.fsf@gnu.org> From: Eli Zaretskii To: Richard Wordingham In-Reply-To: <20220215210605.1c41c1b2@JRWUBU2> (message from Richard Wordingham on Tue, 15 Feb 2022 21:06:05 +0000) Subject: Re: bug#20140: 24.4; M17n shaper output rejected References: <20150318222040.4066e6e9@JRWUBU2> <87r18jk5nr.fsf@gnus.org> <83v8xv2icg.fsf@gnu.org> <20220205225251.08a0faab@JRWUBU2> <83sfsmpmxb.fsf@gnu.org> <20220213211152.03e2990a@JRWUBU2> <83leydpok0.fsf@gnu.org> <20220214232623.30534d5a@JRWUBU2> <83wnhw2nxy.fsf@gnu.org> <20220215210605.1c41c1b2@JRWUBU2> MIME-version: 1.0 Content-type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 20140 Cc: 20140@debbugs.gnu.org, larsi@gnus.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) > Date: Tue, 15 Feb 2022 21:06:05 +0000 > From: Richard Wordingham > Cc: larsi@gnus.org, 20140@debbugs.gnu.org > > > > Not off the top of my head, but compare لحج with the presentation > > > form ‎ﳊ U+FCCA ARABIC LIGATURE LAM WITH HAH INITIAL FORM for the > > > first two letters. The lam part is a vertical line in the middle > > > of the glyph; the 'hah' part forms the lower part of the glyph. > > > > They look identical here (using the default Courier New font). 
With
> > what font did you think they will look wrong?
>
> In the Courier New font in Windows 10 of 2017 (+ automatic updates),
> U+FCCA looks like the image in the Unicode code chart, and bears little
> resemblance to the righthand two thirds of <U+0644, U+062D, U+062C>.
> In keeping with its Latin part, the sequence of three characters looks
> as one would expect from a typewriter when one enters text letter by
> letter.

It sounds like Courier New in Windows 10 was "improved" by removing
the capability of ligating those 2 characters. On Windows XP, their
standard Courier New shows the first 2 characters ligate into a single
glyph, which looks just like U+FCCA, but on Windows 10 they don't
ligate. I don't know why is that; perhaps Arabic typesetting experts
decided these should not ligate?

> I must admit I'm having trouble laying my hand on a font which
> does these ligatures.

Try the Arabic Typesetting font, there I see on Windows 10 that the
first 2 characters look like U+FCCA.

IOW, this is a font issue, not an Emacs or HarfBuzz issue.
From debbugs-submit-bounces@debbugs.gnu.org Wed Feb 16 10:11:56 2022 Received: (at 20140) by debbugs.gnu.org; 16 Feb 2022 15:11:56 +0000 Received: from localhost ([127.0.0.1]:49224 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1nKLyG-0006rt-Hp for submit@debbugs.gnu.org; Wed, 16 Feb 2022 10:11:56 -0500 Received: from eggs.gnu.org ([209.51.188.92]:34456) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1nKLyE-0006rg-VP for 20140@debbugs.gnu.org; Wed, 16 Feb 2022 10:11:55 -0500 Received: from [2001:470:142:3::e] (port=54948 helo=fencepost.gnu.org) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1nKLy8-0004H0-So; Wed, 16 Feb 2022 10:11:48 -0500 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org; s=fencepost-gnu-org; h=References:Subject:In-Reply-To:To:From:Date: mime-version; bh=gjwPXumBwHXO05XVY+oRtyF21duSaVFOTeOVi5M4Tt4=; b=nKraRe7/DU3+ lBq6MXWhLPK1UCTutvPMMxzPC4SqLpf74e58RJMUPVFC+sEtQ9pUYOAaWhMI80WLcH+cMm8nJU/AQ fdRlU0Bbm30ZJ+ayaY21xB/J+WLH1InYShueWEKB/2XvR7w5asgKWhfqTc7TwH2Xb2rnNPeWELzPe TRHispnoKJYW0gNEG95EsADsqsNSn9eAh2nSzoB44T7u7J8ZAj2O4h7JvI6unyFHpINZ4vYrVvgid MwcJX5KgwU3KHGffTm3IAcIvIBjV1IBcr3p87cUADKDy5p8ZLGhz5RXH2CyZqeL0bNVqNBFmBUWsF OaUQmCsX2oRR3Fk1fEb7dQ==; Received: from [87.69.77.57] (port=2391 helo=home-c4e4a596f7) by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1nKLy8-0004Sq-CV; Wed, 16 Feb 2022 10:11:48 -0500 Date: Wed, 16 Feb 2022 17:11:52 +0200 Message-Id: <831r023kxz.fsf@gnu.org> From: Eli Zaretskii To: Richard Wordingham In-Reply-To: <20220213205310.0b8a715c@JRWUBU2> (message from Richard Wordingham on Sun, 13 Feb 2022 20:53:10 +0000) Subject: Re: bug#20140: 24.4; M17n shaper output rejected References: <20150318222040.4066e6e9@JRWUBU2> <87r18jk5nr.fsf@gnus.org> <83v8xv2icg.fsf@gnu.org> <20220205225251.08a0faab@JRWUBU2> <831r06rbwk.fsf@gnu.org> 
<20220213205310.0b8a715c@JRWUBU2> X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 20140 Cc: 20140@debbugs.gnu.org, larsi@gnus.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---)

> Date: Sun, 13 Feb 2022 20:53:10 +0000
> From: Richard Wordingham
> Cc: larsi@gnus.org, 20140@debbugs.gnu.org
>
> > Btw, is there a way to get all the examples from your
> > https://wrdingham.co.uk/lanna/renderer_test.htm as a UTF-8 encoded
> > text file? I'd like to test the Emacs rendering with all of the
> > examples, but copy-pasting each example separately from the browser is
> > not my idea of useful time investment. So if you could provide the
> > examples as a downloadable text file, I'd appreciate.
>
> As buried (you're not the only one to have overlooked it) in the
> penultimate paragraph of 'Content and Layout' section, "The test words
> may, in principle, be extracted quite simply from this web page. Each
> test 'word' is the content of the first cell in each row whose class is
> tst1. For convenience*, I have extracted the first two cells in such
> rows, along with titles, to a CSV file." The file is rt.csv in the
> same directory. I included the meaning and pronunciation as those who
> don't know the script may find it easier to refer to the words by
> translation or transcription. You may prefer to use the file more or
> less as it is, but one can easily knock up an Emacs macro sequence to
> delete the first comma and the rest of the line. I left the
> section titles in for easier navigation to the renderer test file.

Thanks, I've reviewed the results of rendering that file, and it looks
reasonably well: some examples don't show correctly, but most do.
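
The "delete the first comma and the rest of the line" step quoted above could also be done with a small command instead of a keyboard macro. The following is only a sketch and was not posted in this thread: the command name is hypothetical, and it assumes that no test word is quoted or itself contains a comma.

```elisp
;; Hypothetical command, for illustration only.
(defun my-rt-csv-keep-first-field ()
  "On every line, delete the first comma and the rest of the line.
Lines that contain no comma are left unchanged."
  (interactive)
  (save-excursion
    (goto-char (point-min))
    (while (not (eobp))
      ;; Look for a comma on the current line only.
      (when (search-forward "," (line-end-position) t)
        ;; Point is just after the comma; delete from the comma itself.
        (delete-region (1- (point)) (line-end-position)))
      (forward-line 1))))
```

Run in a buffer visiting a copy of rt.csv, this leaves only the first field of each line, which is the test word according to the description quoted above.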
From debbugs-submit-bounces@debbugs.gnu.org Wed Feb 16 10:12:49 2022 Received: (at 20140) by debbugs.gnu.org; 16 Feb 2022 15:12:49 +0000 Received: from localhost ([127.0.0.1]:49235 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1nKLz7-0006tl-JL for submit@debbugs.gnu.org; Wed, 16 Feb 2022 10:12:49 -0500 Received: from eggs.gnu.org ([209.51.188.92]:34738) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1nKLz6-0006ta-3v for 20140@debbugs.gnu.org; Wed, 16 Feb 2022 10:12:48 -0500 Received: from [2001:470:142:3::e] (port=55020 helo=fencepost.gnu.org) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1nKLyy-0004P1-Fc; Wed, 16 Feb 2022 10:12:42 -0500 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org; s=fencepost-gnu-org; h=References:Subject:In-Reply-To:To:From:Date: mime-version; bh=1qBFSefufpgPm4jxrM3gravAnxnjQmO9/pyzeJuvEHk=; b=AjUng28io2Oz f8aS327yZ09xqjfDrjDaBHWGgUp7TdZeRe6pPSyKA8LrEWFolEoXXj7IabneJl2FrLPcr5vAWYuOs QEqRbeGpSb1S92o0Mp4NZZK0m9QL743zJ/Y+nu1gDN+fclHRGzCGC/mE3XTbsR92GaCE+eeDdBO9H +PtXAO0J3eOrwa9fh3jz+BFVJLlcUrtoAvUFDHUdM6V/ScARhNMyDPHFudKb0kaPycyW/lSsuAHEw okM8Wka471uuz1aWbtoAQmaHvGtrO9+MaNCGyP4t8mg/Dp8gExB9DFC0AvwvlWFyTKc0Yp0/Jz7LQ fDEd2AdrYDW/umGfw0vnzw==; Received: from [87.69.77.57] (port=2443 helo=home-c4e4a596f7) by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1nKLyv-0004b9-2U; Wed, 16 Feb 2022 10:12:38 -0500 Date: Wed, 16 Feb 2022 17:12:43 +0200 Message-Id: <83zgmq26c4.fsf@gnu.org> From: Eli Zaretskii To: Richard Wordingham In-Reply-To: <20220214221427.35231794@JRWUBU2> (message from Richard Wordingham on Mon, 14 Feb 2022 22:14:27 +0000) Subject: Re: bug#20140: 24.4; M17n shaper output rejected References: <20150318222040.4066e6e9@JRWUBU2> <87r18jk5nr.fsf@gnus.org> <83v8xv2icg.fsf@gnu.org> <20220205225251.08a0faab@JRWUBU2> <831r06rbwk.fsf@gnu.org> 
<20220213205310.0b8a715c@JRWUBU2> <83mtitpouv.fsf@gnu.org> <20220214221427.35231794@JRWUBU2> X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 20140 Cc: 20140@debbugs.gnu.org, larsi@gnus.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---)

> Date: Mon, 14 Feb 2022 22:14:27 +0000
> From: Richard Wordingham
> Cc: 20140@debbugs.gnu.org, larsi@gnus.org
>
> > > You should also add CGJ and ZWNJ, and some people may appreciate
> > > ZWJ - the Khottabun font has ligatures involving ZWJ, though it may
> > > just be an experimental feature - and ultimately WJ, for when
> > > someone writes a Tai Tham word breaker.
> >
> > How should I add CGJ and ZWNJ? What are the rules?
> >
> > > Oh, and Thai and Lao mai t(r)i and mai chat(t)awa and U+0324
> > > COMBINING DIAERESIS BELOW turn up occasionally - U+0324 is supported
> > > in Thep's Khottabun font, and my Da Lekh series supports Thai mai
> > > tri and mai chattawa. These characters seem to work with HarfBuzz.
> >
> > Not sure I understand: what patterns/rules should be added for these?
>
> Add them all to "M" in the definition of tai-tham-composable-pattern.
> Strictly, U+0324 should also be added to "S", but I'd be surprised to
> see it in a genuine spelling.

Thanks, done.
From debbugs-submit-bounces@debbugs.gnu.org Wed Feb 16 10:13:59 2022 Received: (at 20140-done) by debbugs.gnu.org; 16 Feb 2022 15:13:59 +0000 Received: from localhost ([127.0.0.1]:49242 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1nKM0E-0006vr-Tl for submit@debbugs.gnu.org; Wed, 16 Feb 2022 10:13:59 -0500 Received: from eggs.gnu.org ([209.51.188.92]:35000) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1nKM0C-0006vX-8d for 20140-done@debbugs.gnu.org; Wed, 16 Feb 2022 10:13:57 -0500 Received: from [2001:470:142:3::e] (port=55048 helo=fencepost.gnu.org) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1nKM06-0004Y6-Ms; Wed, 16 Feb 2022 10:13:51 -0500 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org; s=fencepost-gnu-org; h=References:Subject:In-Reply-To:To:From:Date: mime-version; bh=hAd5c07s1QOGYgGTa/h0QHExenMPdm9BKXCNtXl7Mlk=; b=UsMuvaxHhXNm /qcc/YbzEOIyof/Vb2Vg3FRtKdmHcX/oIwLh5QFdEz0i7crbH1DoL3mvH8QaRoWqIMSuiZYQ4CW2Y RM66qHLNGNe4xr78BqmtS5+OIyCe/55CFlUXw7FGk0GCGn9ON4T2FlOCHAqJfvvnDs3AFK3mPBjGq +SJAZwSjH92YpDHgRzVET0n4TF5T0pt9YTKM8iX37SiFkgYktQxNaWW4VQoazc28TkMONNESgiPpl +/9hhz3EVi42vM0t5XxNHfAC12ryduiB/X3Y1l4p8AhNL34AyxD8MZKuqE/HJlOgqgI81H63llBqO GtnjvzJ9sXdkc6Dp2uYi1Q==; Received: from [87.69.77.57] (port=2517 helo=home-c4e4a596f7) by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1nKM06-0004oI-27; Wed, 16 Feb 2022 10:13:50 -0500 Date: Wed, 16 Feb 2022 17:13:56 +0200 Message-Id: <83y22a26a3.fsf@gnu.org> From: Eli Zaretskii To: Richard Wordingham In-Reply-To: <20220215012734.41fb4aaf@JRWUBU2> (message from Richard Wordingham on Tue, 15 Feb 2022 01:27:34 +0000) Subject: Re: bug#20140: 24.4; M17n shaper output rejected References: <20150318222040.4066e6e9@JRWUBU2> <87r18jk5nr.fsf@gnus.org> <83v8xv2icg.fsf@gnu.org> <20220205225251.08a0faab@JRWUBU2> 
<831r06rbwk.fsf@gnu.org> <20220213205310.0b8a715c@JRWUBU2> <83mtitpouv.fsf@gnu.org> <20220214221427.35231794@JRWUBU2> <20220215012734.41fb4aaf@JRWUBU2> X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 20140-done Cc: 20140-done@debbugs.gnu.org, larsi@gnus.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) > Date: Tue, 15 Feb 2022 01:27:34 +0000 > From: Richard Wordingham > Cc: 20140@debbugs.gnu.org, larsi@gnus.org > > On Mon, 14 Feb 2022 22:14:27 +0000 > Richard Wordingham wrote: > > > On Mon, 14 Feb 2022 15:19:36 +0200 > > Eli Zaretskii wrote: > > > > > > Date: Sun, 13 Feb 2022 20:53:10 +0000 > > > > From: Richard Wordingham > > > > Cc: larsi@gnus.org, 20140@debbugs.gnu.org > > > > > You should also add CGJ and ZWNJ, and some people may appreciate > > > > ZWJ - the Khottabun font has ligatures involving ZWJ, though it > > > > may just be an experimental feature - and ultimately WJ, for when > > > > someone writes a Tai Tham word breaker. > > > > > > How should I add CGJ and ZWNJ? What are the rules? > > > > > > > Oh, and Thai and Lao mai t(r)i and mai chat(t)awa and U+0324 > > > > COMBINING DIAERESIS BELOW turn up occasionally - U+0324 is > > > > supported in Thep's Khottabun font, and my Da Lekh series > > > > supports Thai mai tri and mai chattawa. These characters seem to > > > > work with HarfBuzz. > > > > > > Not sure I understand: what patterns/rules should be added for > > > these? > > > > Add them all to "M" in the definition of tai-tham-composable-pattern. > > Strictly, U+0324 should also be added to "S", but I'd be surprised to > > see it in a genuine spelling. 
> > In view of Wyn Owen's report (A Description and Linguistic Analysis of > the Tai Khuen Writing System, JSEALS 10.1 (2017) > https://evols.library.manoa.hawaii.edu/bitstream/10524/52403/1/09_Owen2017description.pdf) > on Tai Khuen spelling, one should also add U+0E49 THAI CHARACTER MAI > THO to "M". And, of course, as all 5 non-Tai Tham tone marks used with > the Tai Tham script have canonical combining class greater than 9, they > should be added to "S" - i.e. add U+0E49 to U+0E4B and U+0EC9 and > U+0ECB to "S". Thanks, done that as well, and installed the changes for Emacs 29. And with that, I'm closing this bug report. Thanks a lot for your code and helpful discussions. From debbugs-submit-bounces@debbugs.gnu.org Wed Feb 16 14:01:24 2022 Received: (at 20140) by debbugs.gnu.org; 16 Feb 2022 19:01:24 +0000 Received: from localhost ([127.0.0.1]:49471 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1nKPYJ-0006XG-Nm for submit@debbugs.gnu.org; Wed, 16 Feb 2022 14:01:24 -0500 Received: from smtpq1.tb.ukmail.iss.as9143.net ([212.54.57.96]:47314) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1nKPYG-0006Wz-QZ for 20140@debbugs.gnu.org; Wed, 16 Feb 2022 14:01:21 -0500 Received: from [212.54.57.105] (helo=csmtp1.tb.ukmail.iss.as9143.net) by smtpq1.tb.ukmail.iss.as9143.net with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1nKPYA-0005v8-EC for 20140@debbugs.gnu.org; Wed, 16 Feb 2022 20:01:14 +0100 Received: from JRWUBU2 ([82.27.122.109]) by cmsmtp with ESMTP id KPY9nqEAdI8uBKPY9nhTSK; Wed, 16 Feb 2022 20:01:14 +0100 X-SourceIP: 82.27.122.109 X-Authenticated-Sender: X-Spam: 0 X-Authority: v=2.4 cv=Oupcdgzt c=1 sm=1 tr=0 ts=620d49fa cx=a_exe a=lZfnwhydZ+7bl6OdZ0zTBw==:117 a=lZfnwhydZ+7bl6OdZ0zTBw==:17 a=IkcTkHD0fZMA:10 a=oGFeUVbbRNcA:10 a=mDV3o1hIAAAA:8 a=NLZqzBF-AAAA:8 a=OocQHUDgAAAA:8 a=zD6b-fzW1TWd-ClxL08A:9 a=QEXdDO2ut3YA:10 a=_FVE-zBwftR9WsbkzFJk:22 a=wW_WBVUImv98JQXhvVPZ:22 
a=xUZTl98r3Qw_uB5NK3jt:22 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ntlworld.com; s=meg.feb2017; t=1645038074; bh=vdtLC+VygBUefJNcYlWu+XoZW5Y6peo+qBc8VKFNt6M=; h=Date:From:To:Cc:Subject:In-Reply-To:References; b=NK6nTajur4orK6klQmbYApI0C2QK9sxqEQX8L46WyBzG9li4i6WnjVLMXxie++nzc 0puYPpcNF5LRM4FqvdzmPflqDtr0FAmXl7e/r9bf8yJUgnsRJxOy5mDciBqRqiHS8K 7AdAGSTEs+XOHUlMKpm0BD/REzsS+1rRoFoPYUeg1gadI8WYnQAI6xoVZ/uqdkGKqY R5pc76heHvkUFGSOAl3jT0By16Yw+m+9moQ0+Sv+WrCL6q+pr+1y2Rv1mKxQCyrc8F mnRunzxucZEyvxHH65hHWMmSbjy3ACb+nKBuYASupzcS9XL57iOGE92B2QvlnDDS7w Uu2WacgbNk69g== Date: Wed, 16 Feb 2022 19:01:12 +0000 From: Richard Wordingham To: Eli Zaretskii Subject: Re: bug#20140: 24.4; M17n shaper output rejected Message-ID: <20220216190112.3ee79598@JRWUBU2> In-Reply-To: <83a6er2br1.fsf@gnu.org> References: <20150318222040.4066e6e9@JRWUBU2> <87r18jk5nr.fsf@gnus.org> <83v8xv2icg.fsf@gnu.org> <20220205225251.08a0faab@JRWUBU2> <83sfsmpmxb.fsf@gnu.org> <20220213211152.03e2990a@JRWUBU2> <83leydpok0.fsf@gnu.org> <20220214232623.30534d5a@JRWUBU2> <83wnhw2nxy.fsf@gnu.org> <20220215210605.1c41c1b2@JRWUBU2> <83a6er2br1.fsf@gnu.org> X-Mailer: Claws Mail 3.17.5 (GTK+ 2.24.32; x86_64-pc-linux-gnu) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-CMAE-Envelope: MS4xfLXIV/RbV1n//oEFOs/YIoKntbAeLlDFuobYIcLKpyrBxu49nnY2j8YRsy2552cke2JrJNdQvVQna23zIf7nntHmMui8mFbgpm5LM2P5xX954HLaUnY/ ofSaQrt9GbXqg71srEpV4BL//LFEMbrl39AVCIkPzQxDemKSL6lphLaVsMoK+dQAxcsUbRkZa0G1zkKLIlNHbitc/PNQeVeGzKRvx08Xe7URdFwdeOFBbMYN r0fFycaPoZUFmiSxby/zag== X-Spam-Score: -0.0 (/) X-Debbugs-Envelope-To: 20140 Cc: 20140@debbugs.gnu.org, larsi@gnus.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-) On Wed, 16 Feb 2022 15:15:46 +0200 Eli Zaretskii wrote: 
> > Date: Tue, 15 Feb 2022 21:06:05 +0000
> > From: Richard Wordingham
> > Cc: larsi@gnus.org, 20140@debbugs.gnu.org
> >
> > > > Not off the top of my head, but compare لحج with the
> > > > presentation form ﳊ U+FCCA ARABIC LIGATURE LAM WITH HAH
> > > > INITIAL FORM for the first two letters. The lam part is a
> > > > vertical line in the middle of the glyph; the 'hah' part forms
> > > > the lower part of the glyph.
> > >
> > > They look identical here (using the default Courier New font).
> > > With what font did you think they will look wrong?
> >
> > In the Courier New font in Windows 10 of 2017 (+ automatic updates),
> > U+FCCA looks like the image in the Unicode code chart, and bears
> > little resemblance to the righthand two thirds of <U+0644, U+062D,
> > U+062C>. In keeping with its Latin part, the sequence of three
> > characters looks as one would expect from a typewriter when one
> > enters text letter by letter.
>
> It sounds like Courier New in Windows 10 was "improved" by removing
> the capability of ligating those 2 characters. On Windows XP, their
> standard Courier New shows the first 2 characters ligate into a single
> glyph, which looks just like U+FCCA, but on Windows 10 they don't
> ligate. I don't know why is that; perhaps Arabic typesetting experts
> decided these should not ligate?
>
> > I must admit I'm having trouble laying my hand on a font which
> > does these ligatures.
>
> Try the Arabic Typesetting font, there I see on Windows 10 that the
> first 2 characters look like U+FCCA.
>
> IOW, this is a font issue, not an Emacs or HarfBuzz issue.

Arabic Typesetting seems not to come in an evaluation copy of Windows
10.

And yes, the issue is that some fonts probably don't work well with
Emacs. Irritating, but mostly not a big problem.

Richard.
From debbugs-submit-bounces@debbugs.gnu.org Wed Feb 16 14:20:07 2022
Received: (at 20140) by debbugs.gnu.org; 16 Feb 2022 19:20:08 +0000
Date: Wed, 16 Feb 2022 21:20:02 +0200
Message-Id: <83sfsi1uvx.fsf@gnu.org>
From: Eli Zaretskii
To: Richard Wordingham
In-Reply-To: <20220216190112.3ee79598@JRWUBU2> (message from Richard Wordingham on Wed, 16 Feb 2022 19:01:12 +0000)
Subject: Re: bug#20140: 24.4; M17n shaper output rejected
References: <20150318222040.4066e6e9@JRWUBU2> <87r18jk5nr.fsf@gnus.org> <83v8xv2icg.fsf@gnu.org> <20220205225251.08a0faab@JRWUBU2> <83sfsmpmxb.fsf@gnu.org> <20220213211152.03e2990a@JRWUBU2> <83leydpok0.fsf@gnu.org> <20220214232623.30534d5a@JRWUBU2> <83wnhw2nxy.fsf@gnu.org> <20220215210605.1c41c1b2@JRWUBU2> <83a6er2br1.fsf@gnu.org> <20220216190112.3ee79598@JRWUBU2>
X-Debbugs-Envelope-To: 20140
Cc: 20140@debbugs.gnu.org, larsi@gnus.org

> Date: Wed, 16 Feb 2022 19:01:12 +0000
> From: Richard Wordingham
> Cc: larsi@gnus.org, 20140@debbugs.gnu.org
>
> > It sounds like Courier New in Windows 10 was "improved" by removing
> > the capability of ligating those 2 characters. On Windows XP, their
> > standard Courier New shows the first 2 characters ligate into a single
> > glyph, which looks just like U+FCCA, but on Windows 10 they don't
> > ligate. I don't know why is that; perhaps Arabic typesetting experts
> > decided these should not ligate?
> >
> > > I must admit I'm having trouble laying my hand on a font which
> > > does these ligatures.
> >
> > Try the Arabic Typesetting font, there I see on Windows 10 that the
> > first 2 characters look like U+FCCA.
> >
> > IOW, this is a font issue, not an Emacs or HarfBuzz issue.
>
> Arabic Typesetting seems not to come in an evaluation copy of Windows
> 10.

You can easily install it from the Internet. I did.

> And yes, the issue is that some fonts probably don't work well with
> Emacs.

??? These issues with fonts have nothing to do with Emacs. HarfBuzz
will produce the same results outside of Emacs; e.g., try hb-view. Or
view your message with those characters in a Web browser (by pointing
it to the bug-gnu-emacs archives) -- you will see the same results.
AFAIU, the fonts simply don't want to produce a ligature from those
two characters.
Arabic Typesetting does, so the result is what you expect, in Emacs and
elsewhere.

From unknown Mon Aug 18 21:39:47 2025
Received: (at fakecontrol) by fakecontrolmessage;
To: internal_control@debbugs.gnu.org
From: Debbugs Internal Request
Subject: Internal Control
Message-Id: bug archived.
Date: Thu, 17 Mar 2022 11:24:07 +0000
User-Agent: Fakemail v42.6.9

# This is a fake control message.
#
# The action:
# bug archived.
thanks
# This fakemail brought to you by your local debbugs
# administrator