Package: emacs;
Reported by: Visuwesh <visuweshm <at> gmail.com>
Date: Thu, 30 Jun 2022 12:14:02 UTC
Severity: wishlist
Tags: patch
Found in version 29.0.50
Done: Eli Zaretskii <eliz <at> gnu.org>
Bug is archived. No further changes may be made.
bug-gnu-emacs <at> gnu.org
:bug#56323
; Package emacs
.
(Thu, 30 Jun 2022 12:14:02 GMT)
Visuwesh <visuweshm <at> gmail.com>
:bug-gnu-emacs <at> gnu.org
.
(Thu, 30 Jun 2022 12:14:02 GMT) Message #5 received at submit <at> debbugs.gnu.org:
From: Visuwesh <visuweshm <at> gmail.com> To: bug-gnu-emacs <at> gnu.org Subject: 29.0.50; Add new customisable phonetic Tamil input method Date: Thu, 30 Jun 2022 17:43:21 +0530
[Message part 1 (text/plain, inline)]
Tags: patch

The attached patchset adds a new customisable phonetic Tamil input method. I tried to reuse as much of the existing itrans input method code as I could, since it greatly simplifies the creation of an Indic input method (see `indian-make-hash').

The first patch fixes a fallout from bug#50143 asking to add TAMIL OM ௐ to the itrans table, and this means that one can insert the TAMIL OM character using the tamil-itrans input methods as well. I'd prefer it if this patch could be pushed quickly.

The second patch actually adds the new phonetic input method. I will leave the rationale for making it a _customisable_ input method in footnote [1]. To reuse the existing code that calculates the various tables for the tamil-itrans IM, I turned the code in defvars into defuns. However, the definition of the almighty quail-tamil-itrans-syllable-table is still huge since I needed to do a whole lot to convert the indian-tml-base-table to a format that will be accepted by the new defun `quail-tamil-itrans-compute-syllable-table'.

The current quail rules are inspired by the ones in https://github.com/rnchzn/tamil-phonetic/raw/main/tamil-phonetic.el and the comments in https://emacsnotes.wordpress.com/2022/03/07/tamil-phonetic-input-method-in-emacs-emacs-%E0%AE%87%E0%AE%B2%E0%AF%8D-%E0%AE%A4%E0%AE%AE%E0%AE%BF%E0%AE%B4%E0%AF%8D-%E0%AE%83%E0%AE%AA%E0%AF%8A%E0%AE%A9%E0%AF%86%E0%AE%9F%E0%AE%BF%E0%AE%95%E0%AF%8D/.

Avid readers might notice that I went for a nil SIMPLE argument despite my recent complaint in emacs-devel. The reason is that we need a way to end the ongoing translation (C-SPC). E.g., if one decides to transliterate ல் as "l" and ள் as "ll", then to type ல்ல the key sequence will be l C-SPC la; without the C-SPC, "lla" would be translated to ள. The better way forward would be to present _both_ ல்ல and ள் for the sequence "lla" but I have no idea how to do it. Any pointers would be _highly_ appreciated.

I plan to modify indian--puthash-char to have one-to-many translations, i.e., "l" would translate to both ல் and ள் and then the user could decide which one to insert. This, combined with the DETERMINISTIC argument to quail-define-package, would make it an attractive option, I think. But I'm leaving it out right now since I want the current patch to be reviewed first.

I think adding an optional NAME argument to tamil--update-quail-rules might be more flexible, since then a user could let-bind the relevant defcustoms to define other Tamil input methods without hassle (like the tamil99 layout, which I plan to get to at Some Point™). WDYT?

The code for tamil--update-quail-rules is sort of convoluted because of the conversion mentioned above. tamil--make-trans-table is also kind of complicated because:

1. I couldn't make the tamil-vowel-translation (and consonant, and misc) alists have a character key since the Customize interface shows those characters as numbers!! I really do not want to dig into the Customize UI code, sorry. :(

2. indian-tml-base-table has the character க in it but the defcustom tamil-consonant-translation has the character க் in it, because the latter makes more sense to a native speaker and also because of (1) above. More explanation as to why in footnote [2].

There are some FIXMEs scattered in the code but I will get to them in a later revision. I also don't have a :set function for the defcustoms since I'm not sure if something along the following is the only way to automagically recalculate the quail rules:

    (defun tamil--set-variable (sym val)
      (set-default sym val)
      (when (and (boundp 'tamil-vowel-translation)
                 (boundp 'tamil-consonant-translation)
                 (boundp 'tamil-misc-translation)
                 (boundp 'tamil-native-digits))
        (tamil--update-quail-rules)))

Comments on this, and general code review, would be much appreciated. I don't think I have missed anything, and if you want me to add more comments on some of the stuff, please do tell. Thanks.

If Tamil speakers are reading this bug report, shout at me if you want something else and if you have other general comments. Or if I made an embarrassing typo somewhere. Thanks!
[0001-Fix-fallout-from-bug-50143.patch (text/x-diff, attachment)]
[0002-Add-new-customizable-phonetic-Tamil-input-method.patch (text/x-diff, attachment)]
[Message part 4 (text/plain, inline)]
--- Footnotes:

1. The itrans input method is absolutely horrible for Tamil since, unlike the other Indic languages, it doesn't have a lot of consonants; HOWEVER, the consonant sound _changes_ depending on where it ends up. So ideally, the Tamil input method should allow multiple _ways_ to insert a single character. As an example, consider the following words

     தும்பிக்கை - thumbikai (tusk)
     படம் - padam (photograph/image)

   The consonant of interest is "ப". The letter "பி" is pronounced in the first word as "bi" as in "bicycle"; however, the letter "ப" is pronounced as "pa" as in "party". This is just one of many examples. There are also pairs of very similar sounding consonants, and when transliterated (when you type in "Tanglish" for example), all the characters in the pair use the same letter. E.g., such a pair is the ல/ள family; when one casually chats in "Tanglish", we just type "lXX" as the transliteration for that family. Obviously, when one is typing in _Tamil_, he/she needs to distinguish between these two characters. Leaving the choice of input sequence to transliterate these characters to the writer is much better. For more, please read the wordpress article I linked, thanks.

2. Opting to not go for a character key in tamil-consonant-translation because of the Customize interface is only part of the reason. Having the key be TAMIL LETTER XXX + TAMIL SIGN VIRAMA is much more intuitive for the native speaker. Take பு for example: the way you break it down into consonant and vowel is

     ப் + உ = பு (ippu + u = pu)

   and NOT

     ப + உ = பு (pa + u = pu)
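Footnote 2 implies a conversion between the virama-terminated form used as the defcustom key (க்) and the bare letter stored in indian-tml-base-table (க). A minimal sketch of that one step follows; the names `my-tamil-virama` and `my-tamil-strip-virama` are hypothetical and are not taken from the patch, which may well do this differently.

```elisp
;; Hypothetical helper for illustration only; not part of the patch.
(defconst my-tamil-virama #x0BCD
  "Codepoint of TAMIL SIGN VIRAMA (the pulli).")

(defun my-tamil-strip-virama (consonant)
  "Return CONSONANT without a trailing TAMIL SIGN VIRAMA, if it has one.
Strings that do not end in a virama are returned unchanged."
  (let ((len (length consonant)))
    (if (and (> len 0)
             (eq (aref consonant (1- len)) my-tamil-virama))
        (substring consonant 0 -1)
      consonant)))

;; "ப்" is TAMIL LETTER PA followed by the virama.
(my-tamil-strip-virama "ப்")
```

Only the final virama is removed, so a conjunct such as க்ஷ் keeps its inner virama.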
(Thu, 30 Jun 2022 14:10:01 GMT) Message #8 received at 56323 <at> debbugs.gnu.org:
From: Visuwesh <visuweshm <at> gmail.com> To: 56323 <at> debbugs.gnu.org Subject: Re: bug#56323: 29.0.50; Add new customisable phonetic Tamil input method Date: Thu, 30 Jun 2022 19:38:49 +0530
[Message part 1 (text/plain, inline)]
[Thursday June 30, 2022] Visuwesh wrote:

> Tags: patch
>
> The attached patchset adds a new customisable phonetic Tamil input
> method. I tried to reuse as much of the existing itrans input method
> code since it greatly simplifies the creation of an Indic input method
> (see `indian-make-hash').
>
> The first patch fixes a fallout from bug#50143 asking to add TAMIL OM ௐ
> to the itrans table, and this means that one can insert the TAMIL OM
> character using the tamil-itrans input methods as well. I'd prefer it
> if this patch can be pushed quickly.

This should be better:
[0001-Fix-fallout-from-bug-50143.patch (text/x-diff, attachment)]
[Message part 3 (text/plain, inline)]
[ Ref. https://www.aczoom.com/itrans/online/; insert "sh" and compare the character that shows up in the Sanskrit panel and the Tamil panel (you have to change the language in another panel). ]
(Thu, 30 Jun 2022 15:55:01 GMT) Message #11 received at 56323 <at> debbugs.gnu.org:
From: Visuwesh <visuweshm <at> gmail.com> To: 56323 <at> debbugs.gnu.org Subject: Re: bug#56323: 29.0.50; Add new customisable phonetic Tamil input method Date: Thu, 30 Jun 2022 21:23:48 +0530
[Thursday June 30, 2022] Visuwesh wrote:

> 1. The itrans input method is absolutely horrible for Tamil since unlike
>    the other Indic languages, it doesn't have a lot of consonants
>    HOWEVER, the consonant sound _changes_ depending on where it ends up.
>    So ideally, the Tamil input method show allow multiple _ways_ to
>    insert a single character. As an example, consider the following
>    words
>
>      தும்பிக்கை - thumbikai (tusk)
                               ^^^^^
I meant trunk, ofc. As is usual, I keep messing up translations.
Stefan Kangas <stefan <at> marxist.se>
to control <at> debbugs.gnu.org
.
(Thu, 30 Jun 2022 20:53:02 GMT)
(Fri, 01 Jul 2022 13:00:02 GMT) Message #16 received at 56323 <at> debbugs.gnu.org:
From: Visuwesh <visuweshm <at> gmail.com> To: 56323 <at> debbugs.gnu.org Subject: Re: bug#56323: 29.0.50; [v2] Add new customisable phonetic Tamil input method Date: Fri, 01 Jul 2022 18:29:00 +0530
[Message part 1 (text/plain, inline)]
[Thursday June 30, 2022] Visuwesh wrote:

> The second patch actually adds the new phonetic input method. I will
> leave the rationale for making it a _customisable_ input method in
> footnote [1]. To reuse the existing code that calculates the various
> tables for the tamil-itrans IM, I turned the code in defvars to defuns.
> However, the definition of the almighty
> quail-tamil-itrans-syllable-table is still huge since I needed to do a
> whole lot to convert the indian-tml-base-table to a format that will
> accepted by the new defun `quail-tamil-itrans-compute-syllable-table'.
> [blah blah blah...]

Here's a second revision of the second patch.
[0001-Add-new-customizable-phonetic-Tamil-input-method.patch (text/x-diff, attachment)]
[Message part 3 (text/plain, inline)]
I still haven't added a :set function since I'm not sure if there's a way to avoid the chain of boundp checks.

In this revision, I simplified the code a tiny bit wrt calculating the translation table: I no longer use the indian-make-hash function but call the functions it calls directly in tamil--update-quail-rules, which greatly reduces the amount of massaging that needs to be done.

Also, can someone guide me to write a sort function for quail-tamil-itrans-compute-syllable-table please? The ideal order of consonants should be the same as the one in the default value of tamil-consonant-translation, same for tamil-vowel-translation. I tried the following

    (sort (reverse (mapcar #'car tamil-consonant-translation))
          (lambda (x y) (let ((lx (length x))
                              (ly (length y)))
                          (if (= lx ly) (string-lessp x y) (< lx ly)))))

but that definitely doesn't do what I want. The idea was to sort the list so that the basic consonants (க் ங் ச் etc.) come first, then the composite ones (க்ஷ் க்ஷ் etc.), but `string-lessp' does not even sort the basic consonants in the right order (the right order being the order in the default value of `tamil-consonant-translation').

Can I use the min-width property in buffer text? I'm not sure if it is usable yet, since I remember some discussion saying that it wasn't quite finished. I would like to try to use it for the syllable table and friends.
(Fri, 01 Jul 2022 13:03:02 GMT) Message #19 received at 56323 <at> debbugs.gnu.org:
From: Visuwesh <visuweshm <at> gmail.com> To: 56323 <at> debbugs.gnu.org Subject: Re: bug#56323: 29.0.50; [v2] Add new customisable phonetic Tamil input method Date: Fri, 01 Jul 2022 18:31:50 +0530
[Message part 1 (text/plain, inline)]
[Friday July 01, 2022] Visuwesh wrote:

> [Thursday June 30, 2022] Visuwesh wrote:
>
>> The second patch actually adds the new phonetic input method. I will
>> leave the rationale for making it a _customisable_ input method in
>> footnote [1]. To reuse the existing code that calculates the various
>> tables for the tamil-itrans IM, I turned the code in defvars to defuns.
>> However, the definition of the almighty
>> quail-tamil-itrans-syllable-table is still huge since I needed to do a
>> whole lot to convert the indian-tml-base-table to a format that will
>> accepted by the new defun `quail-tamil-itrans-compute-syllable-table'.
>> [blah blah blah...]
>
> Here's a second revision of the second patch.
>

Here's a corrected patch with a really silly oversight fixed:
[0001-Add-new-customizable-phonetic-Tamil-input-method.patch (text/x-diff, attachment)]
[Message part 3 (text/plain, inline)]
Sorry for the noise.
(Fri, 01 Jul 2022 13:23:01 GMT) Message #22 received at 56323 <at> debbugs.gnu.org:
From: Eli Zaretskii <eliz <at> gnu.org> To: Visuwesh <visuweshm <at> gmail.com> Cc: 56323 <at> debbugs.gnu.org Subject: Re: bug#56323: 29.0.50; [v2] Add new customisable phonetic Tamil input method Date: Fri, 01 Jul 2022 16:22:36 +0300
> From: Visuwesh <visuweshm <at> gmail.com>
> Date: Fri, 01 Jul 2022 18:29:00 +0530
>
> Also, can someone guide me to write a sort function for
> quail-tamil-itrans-compute-syllable-table please? The ideal order of
> consonants should be the same as the one in the default value of
> tamil-consonant-translation, same for tamil-vowel-translation. I tried
> the following
>
>     (sort (reverse (mapcar #'car tamil-consonant-translation))
>           (lambda (x y) (let ((lx (length x))
>                               (ly (length y)))
>                           (if (= lx ly) (string-lessp x y) (< lx ly)))))
>
>
> but that definitely doesn't do what I want. The idea was to sort the
> list so that the basic consonants (க் ங் ச் etc.) first then the composite
> ones (க்ஷ் க்ஷ் etc.) but `string-lessp' does not even sort the basic
> consonants in the right order (the right order being the order in the
> default value of `tamil-consonant-translation').

Then you'll need to write your own comparison function and use it instead of string-lessp.

> Can I use the min-width property in buffer text?

Why do you need that? Please tell more about what you want to accomplish.
(Fri, 01 Jul 2022 13:48:02 GMT) Message #25 received at 56323 <at> debbugs.gnu.org:
From: Visuwesh <visuweshm <at> gmail.com> To: Eli Zaretskii <eliz <at> gnu.org> Cc: 56323 <at> debbugs.gnu.org Subject: Re: bug#56323: 29.0.50; [v2] Add new customisable phonetic Tamil input method Date: Fri, 01 Jul 2022 19:17:18 +0530
[Message part 1 (text/plain, inline)]
[Friday July 01, 2022] Eli Zaretskii wrote:

>> From: Visuwesh <visuweshm <at> gmail.com>
>> Date: Fri, 01 Jul 2022 18:29:00 +0530
>>
>> Also, can someone guide me to write a sort function for
>> quail-tamil-itrans-compute-syllable-table please? The ideal order of
>> consonants should be the same as the one in the default value of
>> tamil-consonant-translation, same for tamil-vowel-translation. I tried
>> the following
>>
>>     (sort (reverse (mapcar #'car tamil-consonant-translation))
>>           (lambda (x y) (let ((lx (length x))
>>                               (ly (length y)))
>>                           (if (= lx ly) (string-lessp x y) (< lx ly)))))
>>
>>
>> but that definitely doesn't do what I want. The idea was to sort the
>> list so that the basic consonants (க் ங் ச் etc.) first then the composite
>> ones (க்ஷ் க்ஷ் etc.) but `string-lessp' does not even sort the basic
>> consonants in the right order (the right order being the order in the
>> default value of `tamil-consonant-translation').
>
> Then you'll need to write your own comparison function and use it
> instead string-lessp.
>

I suppose so. How does the following look?

    (sort
     '("க்" "ங்" "ச்" "ஞ்" "ட்" "ண்" "ற்ற்" "ந்" "ப்" "ய்"
       "ம்" "த்" "ர்" "ல்" "வ்" "ள்" "ற்" "ழ்" "ன்"
       "ஸ்" "ஜ்" "க்ஷ்" "ஷ்" "ஹ்" "க்ஷ்" "ஶ்")
     (lambda (x y)
       (let* ((cp '(("க்" . 0) ("ங்" . 1) ("ச்" . 2) ("ஞ்" . 3) ("ட்" . 4) ("ண்" . 5)
                    ("த்" . 6) ("ந்" . 7) ("ப்" . 8) ("ம்" . 9) ("ய்" . 10) ("ர்" . 11)
                    ("ல்" . 12) ("வ்" . 13) ("ழ்" . 14) ("ள்" . 15) ("ற்" . 16) ("ன்" . 17)
                    ("ஜ்" . 18) ("ஸ்" . 19) ("ஷ்" . 20) ("ஹ்" . 21) ("க்ஷ்" . 22)
                    ("க்ஷ்" . 23) ("ஶ்" . 24)))
              (xp (or (assoc-default x cp nil) 10000))
              (yp (or (assoc-default y cp nil) 10000)))
         (< xp yp))))

[ I won't have the unnecessary let in the final version. ]

>> Can I use the min-width property in buffer text?
>
> Why do you need that? Please tell more about what you want to
> accomplish.

Currently we don't try too hard to ensure that pieces of text don't bump into each other in the tables we calculate. If you are unlucky, the table will be incomprehensible, so I thought about putting a reasonable min-width value on the text in the signs table at least. Of course, finding a reasonable value is a headache in and of itself; the better solution would probably be pulling in the vtable library but I'm not too sure about that.

I also attached a screenshot comparing my running Emacs session and emacs -Q (yellow window is my current Emacs session) to get the point across better.
[screenshot_202207011914.png (image/png, attachment)]
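The weighted comparison proposed in this message can also be written so that the rank table is built once from a reference list rather than spelled out by hand. The following is a sketch under assumptions: `my-sort-by-reference` is a hypothetical name, and the reference list is taken to be something like the keys of the default value of tamil-consonant-translation.

```elisp
;; Hypothetical helper for illustration only; not part of the patch.
(defun my-sort-by-reference (items reference)
  "Return a sorted copy of ITEMS, ordered by their position in REFERENCE.
Items missing from REFERENCE sort after all known items.  If REFERENCE
contains duplicates, the first occurrence decides the rank."
  (let ((rank (make-hash-table :test #'equal))
        (i 0))
    ;; Build the lookup table once instead of once per comparison.
    (dolist (key reference)
      (unless (gethash key rank)
        (puthash key i rank))
      (setq i (1+ i)))
    (sort (copy-sequence items)
          (lambda (x y)
            (< (gethash x rank most-positive-fixnum)
               (gethash y rank most-positive-fixnum))))))

;; Possible use with the patch's defcustom (assumed to be an alist
;; whose keys are already in the desired order):
;;   (my-sort-by-reference some-consonants
;;                         (mapcar #'car tamil-consonant-translation))
```

A hash table avoids walking the alist twice per comparison, and `copy-sequence' keeps the destructive `sort' away from the caller's list.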
(Fri, 01 Jul 2022 14:07:01 GMT) Message #28 received at 56323 <at> debbugs.gnu.org:
From: Eli Zaretskii <eliz <at> gnu.org> To: Visuwesh <visuweshm <at> gmail.com> Cc: 56323 <at> debbugs.gnu.org Subject: Re: bug#56323: 29.0.50; [v2] Add new customisable phonetic Tamil input method Date: Fri, 01 Jul 2022 17:06:36 +0300
> From: Visuwesh <visuweshm <at> gmail.com>
> Cc: 56323 <at> debbugs.gnu.org
> Date: Fri, 01 Jul 2022 19:17:18 +0530
>
> > Then you'll need to write your own comparison function and use it
> > instead string-lessp.
> >
>
> I suppose so. How does the following look?
>
>     (sort
>      '("க்" "ங்" "ச்" "ஞ்" "ட்" "ண்" "ற்ற்" "ந்" "ப்" "ய்"
>        "ம்" "த்" "ர்" "ல்" "வ்" "ள்" "ற்" "ழ்" "ன்"
>        "ஸ்" "ஜ்" "க்ஷ்" "ஷ்" "ஹ்" "க்ஷ்" "ஶ்")
>      (lambda (x y)
>        (let* ((cp '(("க்" . 0) ("ங்" . 1) ("ச்" . 2) ("ஞ்" . 3) ("ட்" . 4) ("ண்" . 5)
>                     ("த்" . 6) ("ந்" . 7) ("ப்" . 8) ("ம்" . 9) ("ய்" . 10) ("ர்" . 11)
>                     ("ல்" . 12) ("வ்" . 13) ("ழ்" . 14) ("ள்" . 15) ("ற்" . 16) ("ன்" . 17)
>                     ("ஜ்" . 18) ("ஸ்" . 19) ("ஷ்" . 20) ("ஹ்" . 21) ("க்ஷ்" . 22)
>                     ("க்ஷ்" . 23) ("ஶ்" . 24)))
>               (xp (or (assoc-default x cp nil) 10000))
>               (yp (or (assoc-default y cp nil) 10000)))
>          (< xp yp))))

I don't think I understand what you want to achieve, and I don't read Tamil in the first place, so I cannot tell you whether this is correct or not, sorry.

> >> Can I use the min-width property in buffer text?
> >
> > Why do you need that? Please tell more about what you want to
> > accomplish.
>
> Currently we don't try too hard to ensure that text don't bump into each
> other in the tables we calculate. If you are unlucky, then the table
> will be incomprehensible so I thought about putting a reasonable
> min-width value on the text in signs table at least. Of course, finding
> a reasonable value is a headache in of itself; the better solution would
> be probably pulling in the vtable library but I'm not too sure about
> that.

I think it would be better to be more accurate in alignment of table cells. We do have string-width and string-pixel-width, let alone window-text-pixel-size.

> I also attached a screenshot comparing my running Emacs session and
> emacs -Q (yellow window is my current Emacs session) to get the point
> across better.

Looks like simple misalignment to me, which should be cured by using pixel-resolution alignment features.
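As a rough illustration of width-aware cell alignment, here is a sketch that pads each cell using `string-width', one of the functions named above. It works in columns rather than pixels, so it can only approximate alignment for composed or variable-width glyphs; the names `my-pad-to-width` and `my-format-row` are hypothetical and do not come from the patch.

```elisp
;; Hypothetical helpers for illustration only; not part of the patch.
(defun my-pad-to-width (text width)
  "Return TEXT padded with spaces so it occupies at least WIDTH columns.
The width is measured with `string-width', i.e. in display columns."
  (let ((missing (- width (string-width text))))
    (if (> missing 0)
        (concat text (make-string missing ?\s))
      text)))

(defun my-format-row (cells width)
  "Concatenate CELLS, padding each one to WIDTH columns."
  (mapconcat (lambda (cell) (my-pad-to-width cell width))
             cells ""))

(my-format-row '("a" "bc") 4)
```

True pixel-resolution alignment would instead measure each cell with `string-pixel-width' or `window-text-pixel-size', as suggested in the message above; that is left out here because the result depends on the fonts in use.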
(Fri, 01 Jul 2022 14:31:02 GMT) Message #31 received at 56323 <at> debbugs.gnu.org:
From: Visuwesh <visuweshm <at> gmail.com> To: Eli Zaretskii <eliz <at> gnu.org> Cc: 56323 <at> debbugs.gnu.org Subject: Re: bug#56323: 29.0.50; [v2] Add new customisable phonetic Tamil input method Date: Fri, 01 Jul 2022 20:00:03 +0530
[Friday July 01, 2022] Eli Zaretskii wrote:

>> From: Visuwesh <visuweshm <at> gmail.com>
>> Cc: 56323 <at> debbugs.gnu.org
>> Date: Fri, 01 Jul 2022 19:17:18 +0530
>>
>> > Then you'll need to write your own comparison function and use it
>> > instead string-lessp.
>> >
>>
>> I suppose so. How does the following look?
>>
>>     (sort
>>      '("க்" "ங்" "ச்" "ஞ்" "ட்" "ண்" "ற்ற்" "ந்" "ப்" "ய்"
>>        "ம்" "த்" "ர்" "ல்" "வ்" "ள்" "ற்" "ழ்" "ன்"
>>        "ஸ்" "ஜ்" "க்ஷ்" "ஷ்" "ஹ்" "க்ஷ்" "ஶ்")
>>      (lambda (x y)
>>        (let* ((cp '(("க்" . 0) ("ங்" . 1) ("ச்" . 2) ("ஞ்" . 3) ("ட்" . 4) ("ண்" . 5)
>>                     ("த்" . 6) ("ந்" . 7) ("ப்" . 8) ("ம்" . 9) ("ய்" . 10) ("ர்" . 11)
>>                     ("ல்" . 12) ("வ்" . 13) ("ழ்" . 14) ("ள்" . 15) ("ற்" . 16) ("ன்" . 17)
>>                     ("ஜ்" . 18) ("ஸ்" . 19) ("ஷ்" . 20) ("ஹ்" . 21) ("க்ஷ்" . 22)
>>                     ("க்ஷ்" . 23) ("ஶ்" . 24)))
>>               (xp (or (assoc-default x cp nil) 10000))
>>               (yp (or (assoc-default y cp nil) 10000)))
>>          (< xp yp))))
>
> I don't think I understand what you want to achieve, and don't read
> Tamil in the first place, to tell you whether this is correct or not,
> sorry.
>

I mostly meant to ask if the weighted approach was good but I wasn't clear enough, sorry. Let me try to explain it better:

Let's suppose, for the sake of discussion, that string-lessp does not work for English. The task is to sort a jumbled list of English letters into alphabetical order. What I'm currently doing is creating an alist where the key is the letter and the value is the letter's position in the alphabet (so a will be 0, b will be 1, etc.). Then in the sort function, I look up this position. If the string is not in the alist, then I fall back to a large number.

So the code above would look like this if it were in English,

    (sort '("b" "z" "c" "n" "a" "aa" "p")
          (lambda (x y)
            (let ((cp '(("a" . 0) ("b" . 1) ("c" . 2) ("d" . 3) ("e" . 4)
                        ("f" . 5) ("g" . 6) ("h" . 7) ("i" . 8) ("j" . 9)
                        ("k" . 10) ("l" . 11) ("m" . 12) ("n" . 13) ("o" . 14)
                        ("p" . 15) ("q" . 16) ("r" . 17) ("s" . 18) ("t" . 19)
                        ("u" . 20) ("v" . 21) ("w" . 22) ("x" . 23) ("y" . 24)
                        ("z" . 25))))
              (< (or (assoc-default x cp) 10000)
                 (or (assoc-default y cp) 10000)))))

and the sorted list comes out as ("a" "b" "c" "n" "p" "z" "aa") which is exactly what I desire. I hope this is clear enough. Obviously, I don't have much programming experience, so I'm unsure if there's a better way to sort.

>> >> Can I use the min-width property in buffer text?
>> >
>> > Why do you need that? Please tell more about what you want to
>> > accomplish.
>>
>> Currently we don't try too hard to ensure that text don't bump into each
>> other in the tables we calculate. If you are unlucky, then the table
>> will be incomprehensible so I thought about putting a reasonable
>> min-width value on the text in signs table at least. Of course, finding
>> a reasonable value is a headache in of itself; the better solution would
>> be probably pulling in the vtable library but I'm not too sure about
>> that.
>
> I think it would be better to be more accurate in alignment of table
> cells. We do have string-width and string-pixel-width, let alone
> window-text-pixel-size.
>
>> I also attached a screenshot comparing my running Emacs session and
>> emacs -Q (yellow window is my current Emacs session) to get the point
>> across better.
>
> Looks like simple misalignment to me, which should be cured by using
> pixel-resolution alignment features.

Yep, it is misalignment. I could try to use those pixel-resolution alignment features but I really don't think I can do a good enough job. It is something I tried in the past but gave up on since it was too complex for me. The current code produces a Good Enough™ table and I think I will just leave it unless Someone™ complains since, after all, the current situation is much better than what we have in Emacs 28 (the docfix that happened as part of bug#50143 isn't in Emacs 28).

Maybe someday, I will be annoyed enough at the misalignment to come back and fix it. But until that day, I will just leave the code as is.

BTW, do you have any other code/documentation review? And what about the patch I posted in https://lists.gnu.org/archive/html/bug-gnu-emacs/2022-06/msg02256.html? No rush but I would like to know if it can go in since it only addresses fallouts from the previous bug in this area. Thanks.
(Fri, 01 Jul 2022 16:10:02 GMT) Message #34 received at 56323 <at> debbugs.gnu.org:
From: Eli Zaretskii <eliz <at> gnu.org> To: Visuwesh <visuweshm <at> gmail.com> Cc: 56323 <at> debbugs.gnu.org Subject: Re: bug#56323: 29.0.50; [v2] Add new customisable phonetic Tamil input method Date: Fri, 01 Jul 2022 19:09:36 +0300
> From: Visuwesh <visuweshm <at> gmail.com>
> Cc: 56323 <at> debbugs.gnu.org
> Date: Fri, 01 Jul 2022 20:00:03 +0530
>
> > I don't think I understand what you want to achieve, and don't read
> > Tamil in the first place, to tell you whether this is correct or not,
> > sorry.
> >
>
> I mostly meant to ask if the weighted approach was good but I wasn't
> clear enough, sorry. Let me try to explain it better:
>
> Let's suppose that string-lessp does not work for English for the
> discussion here. The task is to sort a list of jumbled English
> alphabets in alphabetical order. What I'm currently doing is creating
> an alist where the key is the alphabet and the value is the alphabet's
> order (so a will be 1, b will be 2, etc.). Then in the sort function, I
> look for this order. If the alphabet is not in this list, then I fall
> back to a large number.
>
> So the code above would look like this if it were in English,
>
>     (sort '("b" "z" "c" "n" "a" "aa" "p")
>           (lambda (x y)
>             (let ((cp '(("a" . 0) ("b" . 1) ("c" . 2) ("d" . 3) ("e" . 4)
>                         ("f" . 5) ("g" . 6) ("h" . 7) ("i" . 8) ("j" . 9)
>                         ("k" . 10) ("l" . 11) ("m" . 12) ("n" . 13) ("o" . 14)
>                         ("p" . 15) ("q" . 16) ("r" . 17) ("s" . 18) ("t" . 19)
>                         ("u" . 20) ("v" . 21) ("w" . 22) ("x" . 23) ("y" . 24)
>                         ("z" . 25))))
>               (< (or (assoc-default x cp) 10000)
>                  (or (assoc-default y cp) 10000)))))
>
> and the sorted list comes out as ("a" "b" "c" "n" "p" "z" "aa")
> which is exactly what I desire. I hope this is clear enough.

The above just gives each letter its order in the alphabet. But if that is what you wanted, string-lessp (or even just direct comparison of characters) would have worked for you. So there's still something important missing from your description, I think.

> > Looks like simple misalignment to me, which should be cured by using
> > pixel-resolution alignment features.
>
> Yep, it is misalignment. I could try to use those pixel-resolution
> alignment features but I really don't think I can do a good enough job.
> It is something I tried in the past but gave up since it was too complex
> for me. The current code produces a Good Enough™ table and I think I
> will just leave it unless Someone™ complains since after all, the
> current situation is much better than what we have in Emacs 28 (the
> docfix that happened as part of bug#50143 isn't in Emacs 28).

I thought vtable.el was about solving such problems?

> BTW, do you have any other code/documentation review? And what about
> the patch I posted in https://lists.gnu.org/archive/html/bug-gnu-emacs/2022-06/msg02256.html?
> No rush but I would like to know if it can go in since it only addresses
> fallouts from the previous bug in this area. Thanks.

It sounded to me like you are still working on the code, so I didn't see a need to review it. If you have specific parts that you'd like me to review nonetheless, please tell which parts are those.
(Fri, 01 Jul 2022 16:38:02 GMT) Message #37 received at 56323 <at> debbugs.gnu.org:
From: Visuwesh <visuweshm <at> gmail.com> To: Eli Zaretskii <eliz <at> gnu.org> Cc: 56323 <at> debbugs.gnu.org Subject: Re: bug#56323: 29.0.50; [v2] Add new customisable phonetic Tamil input method Date: Fri, 01 Jul 2022 22:07:38 +0530
[வெள்ளி ஜூலை 01, 2022] Eli Zaretskii wrote: >> I mostly meant to ask if the weighted approach was good but I wasn't >> clear enough, sorry. Let me try to explain it better: >> >> Let's suppose that string-lessp does not work for English for the >> discussion here. The task is to sort a list of jumbled English >> alphabets in alphabetical order. What I'm currently doing is creating >> an alist where the key is the alphabet and the value is the alphabet's >> order (so a will be 1, b will be 2, etc.). Then in the sort function, I >> look for this order. If the alphabet is not in this list, then I fall >> back to a large number. >> >> So the code above would look like this if it were in English, >> >> (sort '("b" "z" "c" "n" "a" "aa" "p") >> (lambda (x y) >> (let ((cp '(("a" . 0) ("b" . 1) ("c" . 2) ("d" . 3) ("e" . 4) >> ("f" . 5) ("g" . 6) ("h" . 7) ("i" . 8) ("j" . 9) >> ("k" . 10) ("l" . 11) ("m" . 12) ("n" . 13) ("o" . 14) >> ("p" . 15) ("q" . 16) ("r" . 17) ("s" . 18) ("t" . 19) >> ("u" . 20) ("v" . 21) ("w" . 22) ("x" . 23) ("y" . 24) >> ("z" . 25)))) >> (< (or (assoc-default x cp) 10000) >> (or (assoc-default y cp) 10000))))) >> >> and the sorted list comes out as ("a" "b" "c" "n" "p" "z" "aa") >> which is exactly what I desire. I hope this is clear enough. > > The above just gives each letter its order in the alphabet. But if > that is what you wanted, string-lessp (or even just direct comparison > of characters) would have worked for you. So there's still something > important missing from your description, I think. > Unfortunately, string-lessp does not do the job. (string-lessp "ஞ" "ஜ") should return t but it returns nil probably because ஞ's codepoint is 2974 and ஜ's codepoint is 2972. But ஜ is not even part of the "core" Tamil characters and hence should come at last. This is why I went with defining an alist with the _actual_ order of the characters. I hope this is clear: to demonstrate this using English, it would be something like... 
c's codepoint is 29 and d's codepoint is 27. Clearly, c comes before d but since string-lessp seems to rely on the Unicode codepoint, when we do the sorting with string-lessp, we get "... d c ..." in the list instead of the desired "... c d ...". I hope this is clear. >> Yep, it is misalignment. I could try to use those pixel-resolution >> alignment features but I really don't think I can do a good enough job. >> It is something I tried in the past but gave up since it was too complex >> for me. The current code produces a Good Enough™ table and I think I >> will just leave it unless Someone™ complains since after all, the >> current situation is much better than what we have in Emacs 28 (the >> docfix that happened as part of bug#50143 isn't in Emacs 28). > > I thought vtable.el was about solving such problems? Okay then, I will use that. I was mostly unsure if using vtable would be alright especially since it puts keymap properties and the entire vtable object as a text property -- it seemed too excessive for a docstring. Maybe some of this can be addressed? >> BTW, do you have any other code/documentation review? And what about >> the patch I posted in https://lists.gnu.org/archive/html/bug-gnu-emacs/2022-06/msg02256.html? >> No rush but I would like to know if it can go in since it only addresses >> fallouts from the previous bug in this area. Thanks. > > It sounded to me like you are still working on the code, so I didn't > see a need to review it. If you have specific parts that you'd like > me to review nonetheless, please tell which parts are those. Thanks. The patch I posted in https://lists.gnu.org/archive/html/bug-gnu-emacs/2022-06/msg02256.html is done, and can be pushed to master if you see no problems. All it does is address a few fallouts that were accidentally left out when fixing bug#50143. Specifically, it adds an entry for the TAMIL OM character, and adds two more Sanskrit consonants to the Tamil itrans table. 
Also, I would like to know if there's a better way to write the :set
function for the defcustoms tamil-vowel-translation,
tamil-consonant-translation, tamil-misc-translation, tamil-native-digits
without the boundp check chain below,

    (defun tamil--set-variable (sym val)
      (set-default sym val)
      (when (and (boundp 'tamil-vowel-translation)
                 (boundp 'tamil-consonant-translation)
                 (boundp 'tamil-misc-translation)
                 (boundp 'tamil-native-digits))
        (tamil--update-quail-rules)))

I'm also doubtful about the current group being used for these
defcustoms.  Should I go ahead and make a new 'tamil' group and make it
a subgroup of leim or i18n?  And is the prefix tamil- okay or should I
change it to something else?

Finally, I'm unsure if "List of input sequences to translate to ..." is
clear.  I think it sounds like a mouthful and there should be a better
way to put it.  I think "translation rules" is quite nice but I'm afraid
that it is too Quail-specific and might not be well understood.
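A minimal sketch of why a shared :set function can see unbound variables while the file loads: by default, defcustom initializes each option by calling its :set function, so the setter for the first option runs before the later options exist. All names here (demo-option-a, demo-option-b, demo--set-variable, demo-setter-log) are hypothetical and are not taken from the patch.

```elisp
;;; -*- lexical-binding: t -*-
;; Hypothetical names throughout; this is not the code from the patch.

(defgroup demo nil
  "Hypothetical group used only for this sketch."
  :group 'i18n)

(defvar demo-setter-log nil
  "List of (SYMBOL A-BOUND B-BOUND) entries, newest first.")

(defun demo--set-variable (sym val)
  "Set SYM to VAL and record which demo options are bound right now."
  (set-default sym val)
  (push (list sym (boundp 'demo-option-a) (boundp 'demo-option-b))
        demo-setter-log))

;; With the default :initialize behaviour, the :set function is called
;; when each defcustom form is evaluated.
(defcustom demo-option-a '("a")
  "First hypothetical option."
  :type '(repeat string)
  :group 'demo
  :set #'demo--set-variable)

(defcustom demo-option-b '("b")
  "Second hypothetical option."
  :type '(repeat string)
  :group 'demo
  :set #'demo--set-variable)

(reverse demo-setter-log)
;; => ((demo-option-a t nil) (demo-option-b t t))
```

The first log entry shows demo-option-b still unbound when the setter ran for demo-option-a, which is the situation the boundp chain guards against.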
(Fri, 01 Jul 2022 18:17:02 GMT) Full text and rfc822 format available.Message #40 received at 56323 <at> debbugs.gnu.org (full text, mbox):
From: Eli Zaretskii <eliz <at> gnu.org> To: Visuwesh <visuweshm <at> gmail.com> Cc: 56323 <at> debbugs.gnu.org Subject: Re: bug#56323: 29.0.50; [v2] Add new customisable phonetic Tamil input method Date: Fri, 01 Jul 2022 21:16:13 +0300
> From: Visuwesh <visuweshm <at> gmail.com>
> Cc: 56323 <at> debbugs.gnu.org
> Date: Fri, 01 Jul 2022 22:07:38 +0530
>
> >> (sort '("b" "z" "c" "n" "a" "aa" "p")
> >>       (lambda (x y)
> >>         (let ((cp '(("a" . 0) ("b" . 1) ("c" . 2) ("d" . 3) ("e" . 4)
> >>                     ("f" . 5) ("g" . 6) ("h" . 7) ("i" . 8) ("j" . 9)
> >>                     ("k" . 10) ("l" . 11) ("m" . 12) ("n" . 13) ("o" . 14)
> >>                     ("p" . 15) ("q" . 16) ("r" . 17) ("s" . 18) ("t" . 19)
> >>                     ("u" . 20) ("v" . 21) ("w" . 22) ("x" . 23) ("y" . 24)
> >>                     ("z" . 25))))
> >>           (< (or (assoc-default x cp) 10000)
> >>              (or (assoc-default y cp) 10000)))))
> >>
> >> and the sorted list comes out as ("a" "b" "c" "n" "p" "z" "aa")
> >> which is exactly what I desire.  I hope this is clear enough.
> >
> > The above just gives each letter its order in the alphabet.  But if
> > that is what you wanted, string-lessp (or even just direct comparison
> > of characters) would have worked for you.  So there's still something
> > important missing from your description, I think.
> >
>
> Unfortunately, string-lessp does not do the job.  (string-lessp "ஞ" "ஜ")
> should return t but it returns nil probably because ஞ's codepoint is
> 2974 and ஜ's codepoint is 2972.  But ஜ is not even part of the "core"
> Tamil characters and hence should come at last.  This is why I went with
> defining an alist with the _actual_ order of the characters.

Please tell what is the actual order of the characters.  That is,
where is that order defined, and by what criteria?

I'll look into the other issues later.
(Sat, 02 Jul 2022 04:03:02 GMT) Full text and rfc822 format available.Message #43 received at 56323 <at> debbugs.gnu.org (full text, mbox):
From: Visuwesh <visuweshm <at> gmail.com> To: Eli Zaretskii <eliz <at> gnu.org> Cc: 56323 <at> debbugs.gnu.org Subject: Re: bug#56323: 29.0.50; [v2] Add new customisable phonetic Tamil input method Date: Sat, 02 Jul 2022 09:32:34 +0530
[Friday, July 01, 2022] Eli Zaretskii wrote:

>> Unfortunately, string-lessp does not do the job.  (string-lessp "ஞ" "ஜ")
>> should return t but it returns nil probably because ஞ's codepoint is
>> 2974 and ஜ's codepoint is 2972.  But ஜ is not even part of the "core"
>> Tamil characters and hence should come at last.  This is why I went with
>> defining an alist with the _actual_ order of the characters.
>
> Please tell what is the actual order of the characters.  That is,
> where is that order defined, and by what criteria?

I'm not sure what you mean by "where is that order defined"; I don't
think there is a definition per se, it just happens to be so.

There are two "classes" of consonants: those that are part of Tamil
(let's call them "core") and those borrowed from Sanskrit.  When one
writes the consonants in order, the core consonants come first, then the
Sanskrit ones.  You can find the order of the core consonants on
Wikipedia, in the table titled "Tamil consonants":
https://en.wikipedia.org/wiki/Tamil_script#Letters

We need not worry too much about the order of the Sanskrit consonants;
we just need to ensure that they come after the core consonants.  You
can find these Sanskrit consonants in the table titled "Grantha
consonants in Tamil" at the same link.

I hope this is clear.

As for the criteria, it is simply "Tamil consonants, then the Sanskrit
consonants."

> I'll look into the other issues later.

Thanks.
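The ordering described in this message can be sketched as a comparison by position in an explicit reference list, with unknown strings ranked last.  This is an illustration only: the names demo-tamil-consonant-order, demo-tamil-rank and demo-tamil-lessp are hypothetical, and the four Grantha letters appended after the core consonants are an illustrative subset in an arbitrary order, since the thread says their relative order does not matter much.

```elisp
;;; -*- lexical-binding: t -*-
(require 'seq)

;; Hypothetical reference list: the 18 core consonants in traditional
;; order, followed by a few Grantha consonants (illustrative subset).
(defconst demo-tamil-consonant-order
  '("க" "ங" "ச" "ஞ" "ட" "ண" "த" "ந" "ப" "ம" "ய" "ர" "ல" "வ" "ழ" "ள" "ற" "ன"
    "ஜ" "ஷ" "ஸ" "ஹ")
  "Consonants in the order they should sort (hypothetical helper).")

(defun demo-tamil-rank (letter)
  "Return the position of LETTER in the reference list.
Strings that are not in the list get a very large rank."
  (or (seq-position demo-tamil-consonant-order letter)
      most-positive-fixnum))

(defun demo-tamil-lessp (a b)
  "Return non-nil if A sorts before B according to the reference list."
  (< (demo-tamil-rank a) (demo-tamil-rank b)))

;; `sort' is destructive on lists, so build a fresh list with `list'.
(sort (list "ஜ" "ன" "ஞ" "க") #'demo-tamil-lessp)
;; => ("க" "ஞ" "ன" "ஜ")

;; For comparison, plain codepoint order puts ஜ (2972) before ஞ (2974).
(sort (list "ஜ" "ன" "ஞ" "க") #'string-lessp)
;; => ("க" "ஜ" "ஞ" "ன")
```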
(Sat, 02 Jul 2022 06:37:02 GMT) Full text and rfc822 format available.Message #46 received at 56323 <at> debbugs.gnu.org (full text, mbox):
From: Eli Zaretskii <eliz <at> gnu.org> To: Visuwesh <visuweshm <at> gmail.com> Cc: 56323 <at> debbugs.gnu.org Subject: Re: bug#56323: 29.0.50; [v2] Add new customisable phonetic Tamil input method Date: Sat, 02 Jul 2022 09:35:56 +0300
> From: Visuwesh <visuweshm <at> gmail.com>
> Cc: 56323 <at> debbugs.gnu.org
> Date: Sat, 02 Jul 2022 09:32:34 +0530
>
> > Please tell what is the actual order of the characters.  That is,
> > where is that order defined, and by what criteria?
>
> I'm not sure what you mean "where is that order defined," I don't think
> there is a definition per se, it just happens to be so.
>
> There are two "classes" of consonants: those that are part of Tamil
> (let's call them "core") and those borrowed from Sanskrit.  When one
> writes the consonants in order, the core consonants come first then the
> Sanskrit ones.  You can find the order of the core consonants in
> wikipedia here in the table titled "Tamil consonants":
> https://en.wikipedia.org/wiki/Tamil_script#Letters
>
> We need not worry too much about the order of Sanskrit consonants, we
> just need to ensure that they come after the core consonants.  You can
> find these Sanskrit consonants in the table titled "Grantha consonants
> in Tamil" in the same link.
>
> I hope this is clear.
>
> As for the criteria, it is simply "Tamil consonants then the Sanskrit
> consonants."

Then your comparison function should first see whether a character is
in the former or the latter group, and use string-lessp or character
codepoint comparison within each group, right?  But that's not what you
did, so I wonder whether my understanding is correct.
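The group-first comparison suggested here might look like the sketch below.  The names demo-grantha-consonants, demo-grantha-p and demo-group-lessp are hypothetical, and the Grantha list is an illustrative subset.  The second call shows the limitation raised elsewhere in the thread: inside the core group, codepoint order does not match the traditional order, in which ன comes last.

```elisp
;;; -*- lexical-binding: t -*-

;; Hypothetical, illustrative subset of the Grantha consonants.
(defconst demo-grantha-consonants '("ஜ" "ஷ" "ஸ" "ஹ")
  "Consonants that should sort after every core consonant.")

(defun demo-grantha-p (letter)
  "Return non-nil if LETTER is one of the listed Grantha consonants."
  (member letter demo-grantha-consonants))

(defun demo-group-lessp (a b)
  "Compare A and B by group first, then by codepoint inside a group."
  (let ((ga (demo-grantha-p a))
        (gb (demo-grantha-p b)))
    (cond ((and ga (not gb)) nil)       ; Grantha never precedes core
          ((and gb (not ga)) t)         ; core always precedes Grantha
          (t (string-lessp a b)))))     ; same group: codepoint order

;; The two groups are separated correctly.
(sort (list "ஜ" "ன" "ஞ" "க") #'demo-group-lessp)
;; => ("க" "ஞ" "ன" "ஜ")

;; Inside the core group the result follows codepoints, so ன (2985)
;; lands between ந (2984) and ப (2986) instead of after ப.
(sort (list "ன" "ப" "ந") #'demo-group-lessp)
;; => ("ந" "ன" "ப")
```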
(Sat, 02 Jul 2022 06:55:01 GMT) Full text and rfc822 format available.Message #49 received at 56323 <at> debbugs.gnu.org (full text, mbox):
From: Visuwesh <visuweshm <at> gmail.com> To: Eli Zaretskii <eliz <at> gnu.org> Cc: 56323 <at> debbugs.gnu.org Subject: Re: bug#56323: 29.0.50; [v2] Add new customisable phonetic Tamil input method Date: Sat, 02 Jul 2022 12:24:39 +0530
[சனி ஜூலை 02, 2022] Eli Zaretskii wrote: >> From: Visuwesh <visuweshm <at> gmail.com> >> Cc: 56323 <at> debbugs.gnu.org >> Date: Sat, 02 Jul 2022 09:32:34 +0530 >> >> > Please tell what is the actual order of the characters. That is, >> > where is that order defined, and by what criteria? >> >> I'm not sure what you mean "where is that order defined," I don't think >> there is a definition per se, it just happens to be so. >> >> There are two "classes" of consonants: those that are part of Tamil >> (let's call them "core") and those borrowed from Sanskrit. When one >> writes the consonants in order, the core consonants come first then the >> Sanskrit ones. You can find the order of the core consonants in >> wikipedia here in the table titled "Tamil consonants": >> https://en.wikipedia.org/wiki/Tamil_script#Letters >> >> We need not worry too much about the order of Sanskrit consonants, we >> just need to ensure that they come after the core consonants. You can >> find these Sanskrit consonants in the table titled "Grantha consonants >> in Tamil" in the same link. >> >> I hope this is clear. >> >> As for the criteria, it is simply "Tamil consonants then the Sanskrit >> consonants." > > Then your comparison function should first see whether a character is > in the former or the latter group, and use string-lessp or character > codepoint comparison with each group, right? But that's not what you > did, so I wonder whether my understanding is correct. It didn't occur to me to do it this way so I tried it out but then I noticed, string-lessp even within a group won't work. When you evaluate the following sexp, you don't get a list of increasing numbers... 
    (let ((core-consonants '("க" "ங" "ச" "ஞ" "ட" "ண" "த"
                             "ந" "ப" "ம" "ய" "ர" "ல"
                             "வ" "ழ" "ள" "ற" "ன")))
      (mapcar (lambda (c) (string-to-char c)) core-consonants))

    ;; => (2965 2969 2970 2974 2975 2979 2980 2984 2986 2990 2991 2992
    ;;     2994 2997 2996 2995 2993 2985)

and sure enough when you do (sort core-consonants #'string-lessp) the
list is jumbled up instead of retaining the order.
[ core-consonants, as declared, is in the right order but sort jumbles
  it up. ]

But string-lessp works for vowels.  It is the consonants that are the
problem.
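To pinpoint where the traditional order and the codepoint order disagree, one could list the adjacent pairs whose codepoints decrease.  This is a small sketch with a hypothetical helper name, demo-codepoint-descents; it only inspects the list quoted above.

```elisp
;;; -*- lexical-binding: t -*-

(defun demo-codepoint-descents (letters)
  "Return adjacent pairs (A . B) in LETTERS where B's codepoint is below A's.
LETTERS is a list of one-character strings (hypothetical helper)."
  (let (result)
    (while (cdr letters)
      (let ((a (car letters))
            (b (cadr letters)))
        (when (> (string-to-char a) (string-to-char b))
          (push (cons a b) result)))
      (setq letters (cdr letters)))
    (nreverse result)))

(demo-codepoint-descents
 '("க" "ங" "ச" "ஞ" "ட" "ண" "த" "ந" "ப" "ம" "ய" "ர" "ல" "வ" "ழ" "ள" "ற" "ன"))
;; => (("வ" . "ழ") ("ழ" . "ள") ("ள" . "ற") ("ற" . "ன"))
```

Only the last five letters are involved, which matches the tail 2997 2996 2995 2993 2985 in the codepoint list above.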
(Sat, 02 Jul 2022 06:59:01 GMT) Full text and rfc822 format available.Message #52 received at 56323 <at> debbugs.gnu.org (full text, mbox):
From: Eli Zaretskii <eliz <at> gnu.org> To: Visuwesh <visuweshm <at> gmail.com> Cc: 56323 <at> debbugs.gnu.org Subject: Re: bug#56323: 29.0.50; [v2] Add new customisable phonetic Tamil input method Date: Sat, 02 Jul 2022 09:58:17 +0300
> From: Visuwesh <visuweshm <at> gmail.com> > Cc: 56323 <at> debbugs.gnu.org > Date: Fri, 01 Jul 2022 22:07:38 +0530 > > >> BTW, do you have any other code/documentation review? And what about > >> the patch I posted in https://lists.gnu.org/archive/html/bug-gnu-emacs/2022-06/msg02256.html? > >> No rush but I would like to know if it can go in since it only addresses > >> fallouts from the previous bug in this area. Thanks. > > > > It sounded to me like you are still working on the code, so I didn't > > see a need to review it. If you have specific parts that you'd like > > me to review nonetheless, please tell which parts are those. > > Thanks. The patch I posted in > https://lists.gnu.org/archive/html/bug-gnu-emacs/2022-06/msg02256.html > is done, and can be pushed to master if you see no problems. I installed it, thanks. > Also, I would like to know if there's a better to write the :set > function for the defcustoms tamil-vowel-translation, > tamil-consonant-translation, tamil-misc-translation, tamil-native-digits > without the boundp check chain below, > > (defun tamil--set-variable (sym val) > (set-default sym val) > (when (and (boundp 'tamil-vowel-translation) > (boundp 'tamil-consonant-translation) > (boundp 'tamil-misc-translation) > (boundp 'tamil-native-digits)) > (tamil--update-quail-rules))) Why do you need a single function for all of them? Would a separate setter function for each defcustom do the job? I also don't understand the need for the boundp tests -- the function will live on the same indian.el file as the defcustoms, so if the function is defined, the defcustoms are also bound, no? > I'm also doubtful about the current group being used for these > defcustoms. Should I go ahead and make a new 'tamil' group and make it > a subgroup of leim or i18n? It's okay to have a separate group, but what would be the subject of this group? If it's just about input methods, the name had better reflected that, and just "tamil" is too general for that. 
> And is the prefix tamil- okay or should I change it to something > else? I see no problem with 'tamil-'. > Finally, I'm unsure if "List of input sequences to translate to ..." is > clear. I think it sounds a mouthful and there should be a better way to > put it. I think "translation rules" is quite nice but I'm afraid that > it is too Quail specific and might not be well understood. I have no problem with that wording, but I wonder whether we should have these defcustoms in the first place. What are the chances that some user will want to change the sequences, and why would they want that? P.S. Please in the future don't modify the Subject of the messages in the same bug report: that makes it harder to find related messages at least when using Rmail.
(Sat, 02 Jul 2022 07:19:01 GMT) Full text and rfc822 format available.Message #55 received at 56323 <at> debbugs.gnu.org (full text, mbox):
From: Eli Zaretskii <eliz <at> gnu.org> To: Visuwesh <visuweshm <at> gmail.com> Cc: 56323 <at> debbugs.gnu.org Subject: Re: bug#56323: 29.0.50; [v2] Add new customisable phonetic Tamil input method Date: Sat, 02 Jul 2022 10:17:56 +0300
> From: Visuwesh <visuweshm <at> gmail.com> > Cc: 56323 <at> debbugs.gnu.org > Date: Sat, 02 Jul 2022 12:24:39 +0530 > > [சனி ஜூலை 02, 2022] Eli Zaretskii wrote: > > >> There are two "classes" of consonants: those that are part of Tamil > >> (let's call them "core") and those borrowed from Sanskrit. When one > >> writes the consonants in order, the core consonants come first then the > >> Sanskrit ones. You can find the order of the core consonants in > >> wikipedia here in the table titled "Tamil consonants": > >> https://en.wikipedia.org/wiki/Tamil_script#Letters > >> > >> We need not worry too much about the order of Sanskrit consonants, we > >> just need to ensure that they come after the core consonants. You can > >> find these Sanskrit consonants in the table titled "Grantha consonants > >> in Tamil" in the same link. > >> > >> I hope this is clear. > >> > >> As for the criteria, it is simply "Tamil consonants then the Sanskrit > >> consonants." > > > > Then your comparison function should first see whether a character is > > in the former or the latter group, and use string-lessp or character > > codepoint comparison with each group, right? But that's not what you > > did, so I wonder whether my understanding is correct. > > It didn't occur to me to do it this way so I tried it out but then I > noticed, string-lessp even within a group won't work. When you evaluate > the following sexp, you don't get a list of increasing numbers... > > (let ((core-consonants '("க" "ங" "ச" "ஞ" "ட" "ண" "த" > "ந" "ப" "ம" "ய" "ர" "ல" > "வ" "ழ" "ள" "ற" "ன"))) > (mapcar (lambda (c) (string-to-char c)) core-consonants)) > > ;; => (2965 2969 2970 2974 2975 2979 2980 2984 2986 2990 2991 2992 > 2994 2997 2996 2995 2993 2985) > > and sure enough when you do (sort core-consonants #'string-lessp) the > list is jumbled up instead of retaining the order. > [ core-consonants, as declared, is in the right order but sort jumbles > it up. ] > > But string-lessp works for vowels. 
It is the consonants that is the > problem. Sorry, I don't understand what you are saying here. How is the above code related to the issue at hand, which is how to sort characters in the order you want them to be sorted? (And please keep in mind that I don't even know which of those characters are consonants and which are vowels -- if you want me to say something intelligent about that.)
(Sat, 02 Jul 2022 07:36:02 GMT) Full text and rfc822 format available.Message #58 received at 56323 <at> debbugs.gnu.org (full text, mbox):
From: Eli Zaretskii <eliz <at> gnu.org> To: visuweshm <at> gmail.com Cc: 56323 <at> debbugs.gnu.org Subject: Re: bug#56323: 29.0.50; [v2] Add new customisable phonetic Tamil input method Date: Sat, 02 Jul 2022 10:35:18 +0300
> Cc: 56323 <at> debbugs.gnu.org
> Date: Sat, 02 Jul 2022 10:17:56 +0300
> From: Eli Zaretskii <eliz <at> gnu.org>
>
> > (let ((core-consonants '("க" "ங" "ச" "ஞ" "ட" "ண" "த"
> >                          "ந" "ப" "ம" "ய" "ர" "ல"
> >                          "வ" "ழ" "ள" "ற" "ன")))
> >   (mapcar (lambda (c) (string-to-char c)) core-consonants))
> >
> > ;; => (2965 2969 2970 2974 2975 2979 2980 2984 2986 2990 2991 2992
> >        2994 2997 2996 2995 2993 2985)
> >
> > and sure enough when you do (sort core-consonants #'string-lessp) the
> > list is jumbled up instead of retaining the order.
> > [ core-consonants, as declared, is in the right order but sort jumbles
> >   it up. ]
> >
> > But string-lessp works for vowels.  It is the consonants that is the
> > problem.
>
> Sorry, I don't understand what you are saying here.  How is the above
> code related to the issue at hand, which is how to sort characters in
> the order you want them to be sorted?  (And please keep in mind that I
> don't even know which of those characters are consonants and which are
> vowels -- if you want me to say something intelligent about that.)

Or maybe my guess below will be lucky.  You probably want this:

  (defun sort-by-codepoint (c1 c2)
    (< (string-to-char c1) (string-to-char c2)))

  (let ((core-consonants '("க" "ங" "ச" "ஞ" "ட" "ண" "த"
                           "ந" "ப" "ம" "ய" "ர" "ல"
                           "வ" "ழ" "ள" "ற" "ன")))
    (sort core-consonants 'sort-by-codepoint))
  => ("க" "ங" "ச" "ஞ" "ட" "ண" "த" "ந" "ன" "ப" "ம" "ய" "ர" "ற" "ல" "ள" "ழ" "வ")

(To understand why, read the doc string of 'sort' carefully, where it
explains what is expected from PREDICATE.)
(Sat, 02 Jul 2022 07:47:02 GMT) Full text and rfc822 format available.Message #61 received at 56323 <at> debbugs.gnu.org (full text, mbox):
From: Eli Zaretskii <eliz <at> gnu.org> To: visuweshm <at> gmail.com Cc: 56323 <at> debbugs.gnu.org Subject: Re: bug#56323: 29.0.50; [v2] Add new customisable phonetic Tamil input method Date: Sat, 02 Jul 2022 10:46:00 +0300
> Cc: 56323 <at> debbugs.gnu.org
> Date: Sat, 02 Jul 2022 10:35:18 +0300
> From: Eli Zaretskii <eliz <at> gnu.org>
>
>   (defun sort-by-codepoint (c1 c2)
>     (< (string-to-char c1) (string-to-char c2)))
>
>   (let ((core-consonants '("க" "ங" "ச" "ஞ" "ட" "ண" "த"
>                            "ந" "ப" "ம" "ய" "ர" "ல"
>                            "வ" "ழ" "ள" "ற" "ன")))
>
>     (sort core-consonants 'sort-by-codepoint))
>   => ("க" "ங" "ச" "ஞ" "ட" "ண" "த" "ந" "ன" "ப" "ம" "ய" "ர" "ற" "ல" "ள" "ழ" "வ")
>
> (To understand why, read the doc string of 'sort' carefully, where it
> explains what is expected from PREDICATE.)

Hmm... but if I use string-lessp instead of sort-by-codepoint, I get
the same result, as I'd expect.  Which probably means I'm still
missing something.
(Sat, 02 Jul 2022 07:59:01 GMT) Full text and rfc822 format available.Message #64 received at 56323 <at> debbugs.gnu.org (full text, mbox):
From: Visuwesh <visuweshm <at> gmail.com> To: Eli Zaretskii <eliz <at> gnu.org> Cc: 56323 <at> debbugs.gnu.org Subject: Re: bug#56323: 29.0.50; [v2] Add new customisable phonetic Tamil input method Date: Sat, 02 Jul 2022 13:28:29 +0530
[சனி ஜூலை 02, 2022] Eli Zaretskii wrote: >> From: Visuwesh <visuweshm <at> gmail.com> >> Cc: 56323 <at> debbugs.gnu.org >> Date: Fri, 01 Jul 2022 22:07:38 +0530 >> >> >> BTW, do you have any other code/documentation review? And what about >> >> the patch I posted in https://lists.gnu.org/archive/html/bug-gnu-emacs/2022-06/msg02256.html? >> >> No rush but I would like to know if it can go in since it only addresses >> >> fallouts from the previous bug in this area. Thanks. >> > >> > It sounded to me like you are still working on the code, so I didn't >> > see a need to review it. If you have specific parts that you'd like >> > me to review nonetheless, please tell which parts are those. >> >> Thanks. The patch I posted in >> https://lists.gnu.org/archive/html/bug-gnu-emacs/2022-06/msg02256.html >> is done, and can be pushed to master if you see no problems. > > I installed it, thanks. > Thanks. >> Also, I would like to know if there's a better to write the :set >> function for the defcustoms tamil-vowel-translation, >> tamil-consonant-translation, tamil-misc-translation, tamil-native-digits >> without the boundp check chain below, >> >> (defun tamil--set-variable (sym val) >> (set-default sym val) >> (when (and (boundp 'tamil-vowel-translation) >> (boundp 'tamil-consonant-translation) >> (boundp 'tamil-misc-translation) >> (boundp 'tamil-native-digits)) >> (tamil--update-quail-rules))) > > Why do you need a single function for all of them? Would a separate > setter function for each defcustom do the job? > Because it is harder to clear the old translation rules and add the new translation rules than clearing ALL translation rules and starting over again. When the user changes tamil-vowel-translation, then not only does the translation rule for the vowels change, we also need to change the translation rules for consonant+vowel pairs so that means we need to check if the consonant var is bound. 
(The translation rules for consonant+vowel pairs are auto-generated based on the rules for vowels and consonants.) Similarly, when the consonant defcustom changes, we need to change both the consonant and the consonant+vowel pair translation rules. Moreover, if the user decides to delete an extra consonant translation, then we need to smartly detect that and delete it from the current quail map. Instead of all this, a simple clear ALL+start over approach is much simpler. And since this approach doesn't take too much time, I don't think implementing the smarter approach would be worth it. Besides, even if this smart approach is easy to implement, quail-map structure is just too hard to manipulate by hand... > I also don't understand the need for the boundp tests -- the function > will live on the same indian.el file as the defcustoms, so if the > function is defined, the defcustoms are also bound, no? > IIUC, when we load indian.el, first, the vowel defcustom will be bound, then the consonant defcustom and so on. So this boundp test is needed, I think? See above for why the defcustoms have a "dependency" on each other. When the vowel defcustom is loaded, then its job _sometimes_ depends on the consonant defcustom being bound as well. I say sometimes because when we initially load the vowel defcustom, having a separate setter should be fine but when we change it after loading _all_ the other defcustoms (example in the Customize interface), we also need to access the consonant translation values and update the translation rules for consonant+vowel pairs. A big fat setter function that does everything at the cost of boundp checks is simpler AFAIU. >> I'm also doubtful about the current group being used for these >> defcustoms. Should I go ahead and make a new 'tamil' group and make it >> a subgroup of leim or i18n? > > It's okay to have a separate group, but what would be the subject of > this group? 
If it's just about input methods, the name had better > reflected that, and just "tamil" is too general for that. > I thought the subject could be "Translation rules for the Tamil input method." If you think the group name is too general, then "tamil-im" could work? >> And is the prefix tamil- okay or should I change it to something >> else? > > I see no problem with 'tamil-'. > Okay, thanks. >> Finally, I'm unsure if "List of input sequences to translate to ..." is >> clear. I think it sounds a mouthful and there should be a better way to >> put it. I think "translation rules" is quite nice but I'm afraid that >> it is too Quail specific and might not be well understood. > > I have no problem with that wording, but I wonder whether we should > have these defcustoms in the first place. What are the chances that > some user will want to change the sequences, and why would they want > that? I think the chances are quite high. As I tried to explain in the first mail, there are too many ambiguities when transliterating Tamil and sometimes there is no perfect transliteration for a character/consonant family. For example, the user in the wordpress article I linked chooses to translate ல் as 'l' ள் as 'll' and take the penalty of having to type C-SPC at the right time: to write ல்ல the sequence would l C-SPC la since lla would translate to ள. That user can take this penalty but I would rather translate ள் as L instead and not worry about C-SPC at all. Bottom line, there is no one size fits all. These small annoyances can be dealt with when one writes Tamil rarely but for frequent writing, the flexibility this input method offers will be welcome IMO. The users _can_ update the quail-map themselves by hand but that becomes tricky and a REAL chore for a language like Tamil. [ FWIW, I add new translations and modify existing translations for the compose input method by setf-ing its quail map. 
That is hard enough already, and I definitely wouldn't wish someone to do it for the Tamil input method. Offering a defcustom is the least we can do to ease the pain of tweaking the translation rules. ] > P.S. Please in the future don't modify the Subject of the messages in > the same bug report: that makes it harder to find related messages at > least when using Rmail. Oops, sorry about that. I thought it would be easier to track the progress but I guess it misfired.
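The "clear everything and rebuild" approach described in this message can be sketched with a plain hash table standing in for the Quail map.  Nothing here uses the Quail API or the patch's data format: demo-vowels, demo-consonants, demo-build-rules and demo-rules are hypothetical, and the tiny tables exist only to show how the consonant+vowel rules are derived from the other two and why "lla" is ambiguous when ல் is "l" and ள் is "ll".

```elisp
;;; -*- lexical-binding: t -*-

;; Hypothetical tables.  Each vowel entry is (KEY INDEPENDENT-FORM SIGN);
;; the inherent vowel "a" has an empty sign.
(defvar demo-vowels
  (list (list "a" "அ" "")
        (list "aa" "ஆ" (string #x0BBE))  ; TAMIL VOWEL SIGN AA
        (list "i" "இ" (string #x0BBF))) ; TAMIL VOWEL SIGN I
  "Vowel translations (hypothetical format).")

(defvar demo-consonants
  '(("k" . "க") ("l" . "ல") ("ll" . "ள"))
  "Consonant translations as (KEY . BASE-LETTER) (hypothetical format).")

(defun demo-build-rules (vowels consonants)
  "Build a fresh rule table from VOWELS and CONSONANTS.
Every call starts from an empty table, so stale rules cannot survive."
  (let ((rules (make-hash-table :test #'equal))
        (pulli (string #x0BCD)))        ; TAMIL SIGN VIRAMA
    (dolist (v vowels)
      (puthash (nth 0 v) (nth 1 v) rules))
    (dolist (c consonants)
      ;; Bare consonant: base letter followed by the pulli.
      (puthash (car c) (concat (cdr c) pulli) rules)
      ;; Consonant+vowel pairs are derived from both tables.
      (dolist (v vowels)
        (puthash (concat (car c) (nth 0 v))
                 (concat (cdr c) (nth 2 v))
                 rules)))
    rules))

(defvar demo-rules (demo-build-rules demo-vowels demo-consonants)
  "Rule table rebuilt from scratch (hypothetical).")

;; "lla" is claimed by the "ll" consonant, so it yields ள, not ல்ல.
(gethash "lla" demo-rules)
```

Because the pair rules depend on both tables, changing either table means regenerating the pairs as well, which is the reason given above for a single rebuild step.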
(Sat, 02 Jul 2022 08:12:02 GMT) Full text and rfc822 format available.Message #67 received at 56323 <at> debbugs.gnu.org (full text, mbox):
From: Visuwesh <visuweshm <at> gmail.com> To: Eli Zaretskii <eliz <at> gnu.org> Cc: 56323 <at> debbugs.gnu.org Subject: Re: bug#56323: 29.0.50; [v2] Add new customisable phonetic Tamil input method Date: Sat, 02 Jul 2022 13:41:17 +0530
[சனி ஜூலை 02, 2022] Eli Zaretskii wrote: >> From: Visuwesh <visuweshm <at> gmail.com> >> Cc: 56323 <at> debbugs.gnu.org >> Date: Sat, 02 Jul 2022 12:24:39 +0530 >> >> [சனி ஜூலை 02, 2022] Eli Zaretskii wrote: >> >> >> There are two "classes" of consonants: those that are part of Tamil >> >> (let's call them "core") and those borrowed from Sanskrit. When one >> >> writes the consonants in order, the core consonants come first then the >> >> Sanskrit ones. You can find the order of the core consonants in >> >> wikipedia here in the table titled "Tamil consonants": >> >> https://en.wikipedia.org/wiki/Tamil_script#Letters >> >> >> >> We need not worry too much about the order of Sanskrit consonants, we >> >> just need to ensure that they come after the core consonants. You can >> >> find these Sanskrit consonants in the table titled "Grantha consonants >> >> in Tamil" in the same link. >> >> >> >> I hope this is clear. >> >> >> >> As for the criteria, it is simply "Tamil consonants then the Sanskrit >> >> consonants." >> > >> > Then your comparison function should first see whether a character is >> > in the former or the latter group, and use string-lessp or character >> > codepoint comparison with each group, right? But that's not what you >> > did, so I wonder whether my understanding is correct. >> >> It didn't occur to me to do it this way so I tried it out but then I >> noticed, string-lessp even within a group won't work. When you evaluate >> the following sexp, you don't get a list of increasing numbers... >> >> (let ((core-consonants '("க" "ங" "ச" "ஞ" "ட" "ண" "த" >> "ந" "ப" "ம" "ய" "ர" "ல" >> "வ" "ழ" "ள" "ற" "ன"))) >> (mapcar (lambda (c) (string-to-char c)) core-consonants)) >> >> ;; => (2965 2969 2970 2974 2975 2979 2980 2984 2986 2990 2991 2992 >> 2994 2997 2996 2995 2993 2985) >> >> and sure enough when you do (sort core-consonants #'string-lessp) the >> list is jumbled up instead of retaining the order. 
>> [ core-consonants, as declared, is in the right order but sort jumbles >> it up. ] >> >> But string-lessp works for vowels. It is the consonants that is the >> problem. > > Sorry, I don't understand what you are saying here. How is the above > code related to the issue at hand, which is how to sort characters in > the order you want them to be sorted? (And please keep in mind that I > don't even know which of those characters are consonants and which are > vowels -- if you want me to say something intelligent about that.) I'm trying to explain the behaviour of string-lessp which seems to sort the characters by their Unicode codepoints. But the order these characters appear in Unicode and their actual order is not the same so string-lessp does not do the job we want it to. [சனி ஜூலை 02, 2022] Eli Zaretskii wrote: > > Or maybe my guess below will be lucky. You probably want this: > > (defun sort-by-codepoint (c1 c2) > (< (string-to-char c1) (string-to-char c2))) > > (let ((core-consonants '("க" "ங" "ச" "ஞ" "ட" "ண" "த" > "ந" "ப" "ம" "ய" "ர" "ல" > "வ" "ழ" "ள" "ற" "ன"))) > > (sort core-consonants 'sort-by-codepoint)) > => ("க" "ங" "ச" "ஞ" "ட" "ண" "த" "ந" "ன" "ப" "ம" "ய" "ர" "ற" "ல" "ள" "ழ" "வ") > > (To understand why, read the doc string of 'sort' carefully, where it > explains what is expected from PREDICATE.) Unfortunately not, since it jumbles up the list. The desired outcome is the same list.
(Sat, 02 Jul 2022 08:31:01 GMT) Full text and rfc822 format available.Message #70 received at 56323 <at> debbugs.gnu.org (full text, mbox):
From: Eli Zaretskii <eliz <at> gnu.org> To: Visuwesh <visuweshm <at> gmail.com> Cc: 56323 <at> debbugs.gnu.org Subject: Re: bug#56323: 29.0.50; [v2] Add new customisable phonetic Tamil input method Date: Sat, 02 Jul 2022 11:29:55 +0300
> From: Visuwesh <visuweshm <at> gmail.com>
> Cc: 56323 <at> debbugs.gnu.org
> Date: Sat, 02 Jul 2022 13:41:17 +0530
>
> > (defun sort-by-codepoint (c1 c2)
> >   (< (string-to-char c1) (string-to-char c2)))
> >
> > (let ((core-consonants '("க" "ங" "ச" "ஞ" "ட" "ண" "த"
> >                          "ந" "ப" "ம" "ய" "ர" "ல"
> >                          "வ" "ழ" "ள" "ற" "ன")))
> >
> >   (sort core-consonants 'sort-by-codepoint))
> > => ("க" "ங" "ச" "ஞ" "ட" "ண" "த" "ந" "ன" "ப" "ம" "ய" "ர" "ற" "ல" "ள" "ழ" "வ")
> >
> > (To understand why, read the doc string of 'sort' carefully, where it
> > explains what is expected from PREDICATE.)
>
> Unfortunately not, since it jumbles up the list.  The desired outcome is
> the same list.

But we already established that you need to break the list in two, and
always sort any member of one of the two sub-lists before any member
of the other sub-list.  I then suggested to use string-lessp _within_
each sub-list, but you said it still yielded a wrong order for some
reason.

So when you now return to the issue of splitting the list in two, and
show how sorting the full list doesn't work, you make a step back: we
already established the list cannot be sorted as a single list.

The only remaining issue, AFAIU, is why string-lessp is not good
enough for sorting within each sub-list.
(Sat, 02 Jul 2022 08:41:01 GMT) Full text and rfc822 format available.Message #73 received at 56323 <at> debbugs.gnu.org (full text, mbox):
From: Eli Zaretskii <eliz <at> gnu.org> To: Visuwesh <visuweshm <at> gmail.com> Cc: 56323 <at> debbugs.gnu.org Subject: Re: bug#56323: 29.0.50; [v2] Add new customisable phonetic Tamil input method Date: Sat, 02 Jul 2022 11:39:47 +0300
> From: Visuwesh <visuweshm <at> gmail.com>
> Cc: 56323 <at> debbugs.gnu.org
> Date: Sat, 02 Jul 2022 13:28:29 +0530
>
> >> Also, I would like to know if there's a better way to write the :set
> >> function for the defcustoms tamil-vowel-translation,
> >> tamil-consonant-translation, tamil-misc-translation, tamil-native-digits
> >> without the boundp check chain below,
> >>
> >> (defun tamil--set-variable (sym val)
> >>   (set-default sym val)
> >>   (when (and (boundp 'tamil-vowel-translation)
> >>              (boundp 'tamil-consonant-translation)
> >>              (boundp 'tamil-misc-translation)
> >>              (boundp 'tamil-native-digits))
> >>     (tamil--update-quail-rules)))
> >
> > Why do you need a single function for all of them? Would a separate
> > setter function for each defcustom do the job?
> >
>
> Because it is harder to clear the old translation rules and add the new
> translation rules than clearing ALL translation rules and starting over
> again. When the user changes tamil-vowel-translation, then not only
> does the translation rule for the vowels change, we also need to change
> the translation rules for consonant+vowel pairs so that means we need to
> check if the consonant var is bound. (The translation rules for
> consonant+vowel pairs are auto-generated based on the rules for vowels
> and consonants.)

If the rules are generated based on both defcustom's, then shouldn't
we have just one defcustom for both? IOW, what is the purpose of
having two separate defcustom's here?

> > I also don't understand the need for the boundp tests -- the function
> > will live on the same indian.el file as the defcustoms, so if the
> > function is defined, the defcustoms are also bound, no?
> >
>
> IIUC, when we load indian.el, first, the vowel defcustom will be bound,
> then the consonant defcustom and so on. So this boundp test is needed,
> I think?

Wouldn't that be fixed by having the setter function defined before
the defcustom's?

> See above for why the defcustoms have a "dependency" on each
> other. When the vowel defcustom is loaded, then its job _sometimes_
> depends on the consonant defcustom being bound as well.

Since the defcustom's have their default value, I don't think I see
the problem. Did you actually see any problems, and if so, in which
scenario, and what were the error messages?

> I thought the subject could be "Translation rules for the Tamil input
> method." If you think the group name is too general, then "tamil-im"
> could work?

tamil-input, perhaps?
(Sat, 02 Jul 2022 08:41:02 GMT) Full text and rfc822 format available.Message #76 received at 56323 <at> debbugs.gnu.org (full text, mbox):
From: Visuwesh <visuweshm <at> gmail.com> To: Eli Zaretskii <eliz <at> gnu.org> Cc: 56323 <at> debbugs.gnu.org Subject: Re: bug#56323: 29.0.50; [v2] Add new customisable phonetic Tamil input method Date: Sat, 02 Jul 2022 14:10:07 +0530
[Sat Jul 02, 2022] Eli Zaretskii wrote:

>> From: Visuwesh <visuweshm <at> gmail.com>
>> Cc: 56323 <at> debbugs.gnu.org
>> Date: Sat, 02 Jul 2022 13:41:17 +0530
>>
>> > (defun sort-by-codepoint (c1 c2)
>> >   (< (string-to-char c1) (string-to-char c2)))
>> >
>> > (let ((core-consonants '("க" "ங" "ச" "ஞ" "ட" "ண" "த"
>> >                          "ந" "ப" "ம" "ய" "ர" "ல"
>> >                          "வ" "ழ" "ள" "ற" "ன")))
>> >
>> >   (sort core-consonants 'sort-by-codepoint))
>> > => ("க" "ங" "ச" "ஞ" "ட" "ண" "த" "ந" "ன" "ப" "ம" "ய" "ர" "ற" "ல" "ள" "ழ" "வ")
>> >
>> > (To understand why, read the doc string of 'sort' carefully, where it
>> > explains what is expected from PREDICATE.)
>>
>> Unfortunately not, since it jumbles up the list. The desired outcome is
>> the same list.
>
> But we already established that you need to break the list in two, and
> always sort any member of one of the two sub-lists before any member
> of the other sub-list. I then suggested to use string-lessp _within_
> each sub-list, but you said it still yielded a wrong order for some
> reason.
>

Yes, I hope I made my point clear below.

> So when you now return to the issue of splitting the list in two, and
> show how sorting the full list doesn't work, you make a step back: we
> already established the list cannot be sorted as a single list.

I think I might not have made my point clear: the sort function above
sorts one of the sub-lists.

> The only remaining issue, AFAIU, is why string-lessp is not good
> enough for sorting within each sub-list.

It is not good enough for each sub-list for the same reason: the order
produced by string-lessp is not the same as the actual order.

I will try to explain the situation using the regular English alphabets
and the extra letter þ (which was used in place of "th" AFAIU).

The core English alphabets are a-z then we have some extra alphabets
like the þ above. When we have a list containing _both_ a-z and þ, the
order produced by string-lessp is wrong. To work around this issue, we
decided to break the list into two. I think we were on the same page
till here.

When I did as you suggested and broke the list into two -- a-z and þ --
and sorted the sub-list that only contained a-z with string-lessp, the
sorted sub-list was not in the right alphabetical order i.e., instead of
"a b c d ..." it was "a c b d ..."

I hope the above makes the situation clear.
(Sat, 02 Jul 2022 08:55:02 GMT) Full text and rfc822 format available.Message #79 received at 56323 <at> debbugs.gnu.org (full text, mbox):
From: Eli Zaretskii <eliz <at> gnu.org> To: Visuwesh <visuweshm <at> gmail.com> Cc: 56323 <at> debbugs.gnu.org Subject: Re: bug#56323: 29.0.50; [v2] Add new customisable phonetic Tamil input method Date: Sat, 02 Jul 2022 11:54:01 +0300
> From: Visuwesh <visuweshm <at> gmail.com>
> Cc: 56323 <at> debbugs.gnu.org
> Date: Sat, 02 Jul 2022 14:10:07 +0530
>
> > The only remaining issue, AFAIU, is why string-lessp is not good
> > enough for sorting within each sub-list.
>
> It is not good enough for each sub-list for the same reason: the order
> produced by string-lessp is not the same as the actual order.

So, then please explain what should be the "correct" order within each
sub-list. Is the correct order within each sub-list in the ascending
order of the codepoint? If not, what is the correct order?

> I will try to explain the situation using the regular English alphabets
> and the extra letter þ (which was used in place of "th" AFAIU).
>
> The core English alphabets are a-z then we have some extra alphabets
> like the þ above. When we have a list containing _both_ a-z and þ, the
> order produced by string-lessp is wrong.
>
> When I did as you suggested and broke the list into two -- a-z and þ --
> and sorted the sub-list that only contained a-z with string-lessp, the
> sorted sub-list was not in the right alphabetical order i.e., instead of
> "a b c d ..." it was "a c b d ..."

That's not what I see:

  (let ((letters '("a" "b" "r" "x" "z")))
    (sort letters 'string-lessp))
  => ("a" "b" "r" "x" "z")

Please show an example where characters a-z are sorted by string-lessp
in the wrong order.
(Sat, 02 Jul 2022 09:29:01 GMT) Full text and rfc822 format available.Message #82 received at 56323 <at> debbugs.gnu.org (full text, mbox):
From: Visuwesh <visuweshm <at> gmail.com> To: Eli Zaretskii <eliz <at> gnu.org> Cc: 56323 <at> debbugs.gnu.org Subject: Re: bug#56323: 29.0.50; [v2] Add new customisable phonetic Tamil input method Date: Sat, 02 Jul 2022 14:58:32 +0530
[Sat Jul 02, 2022] Eli Zaretskii wrote:

>> > Why do you need a single function for all of them? Would a separate
>> > setter function for each defcustom do the job?
>> >
>>
>> Because it is harder to clear the old translation rules and add the new
>> translation rules than clearing ALL translation rules and starting over
>> again. When the user changes tamil-vowel-translation, then not only
>> does the translation rule for the vowels change, we also need to change
>> the translation rules for consonant+vowel pairs so that means we need to
>> check if the consonant var is bound. (The translation rules for
>> consonant+vowel pairs are auto-generated based on the rules for vowels
>> and consonants.)
>
> If the rules are generated based on both defcustom's, then shouldn't
> we have just one defcustom for both? IOW, what is the purpose of
> having two separate defcustom's here?
>

It simply seemed natural to me to separate consonants and vowels. I
combined the three defcustoms (vowels, consonants and misc) as you
suggested, but the native digits defcustom is still a problem... hmm.
I can just leave it to the user to add the native digit translations to
the defcustom if they want.

>> > I also don't understand the need for the boundp tests -- the function
>> > will live on the same indian.el file as the defcustoms, so if the
>> > function is defined, the defcustoms are also bound, no?
>> >
>>
>> IIUC, when we load indian.el, first, the vowel defcustom will be bound,
>> then the consonant defcustom and so on. So this boundp test is needed,
>> I think?
>
> Wouldn't that be fixed by having the setter function defined before
> the defcustom's?
>
>> See above for why the defcustoms have a "dependency" on each
>> other. When the vowel defcustom is loaded, then its job _sometimes_
>> depends on the consonant defcustom being bound as well.
>
> Since the defcustom's have their default value, I don't think I see
> the problem. Did you actually see any problems, and if so, in which
> scenario, and what were the error messages?
>

I was mostly worried about the tamil-native-digits defcustom but that
can be easily avoided.

>> I thought the subject could be "Translation rules for the Tamil input
>> method." If you think the group name is too general, then "tamil-im"
>> could work?
>
> tamil-input, perhaps?

Okay, then. That looks better to me as well.

I will post an updated patch later when I clean up the comments, and
docstrings. Thanks.
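A minimal sketch of the arrangement discussed above, under stated assumptions: one defcustom holds every translation, and its setter is defined before it and only reads the value it is handed, so no boundp chain is needed. All names (my-tamil-*) are hypothetical and are not taken from the patch, and the rule list here is a plain variable standing in for the real quail rules.

```elisp
;; All names below (my-tamil-*) are hypothetical, not taken from the patch.

(defvar my-tamil--rules nil
  "List of (INPUT-SEQUENCE . CHARACTER) pairs, rebuilt by the setter.")

(defun my-tamil--update-rules (alist)
  "Throw away the old rules and rebuild them from ALIST.
ALIST holds elements of the form (CHARACTER INPUT-SEQUENCES...)."
  (setq my-tamil--rules
        (mapcan (lambda (entry)
                  (mapcar (lambda (seq) (cons seq (car entry)))
                          (cdr entry)))
                alist)))

;; Defined before the defcustom that uses it.  It reads only VAL, the
;; value being installed, so no `boundp' checks are needed.
(defun my-tamil--setter (sym val)
  "Set SYM to VAL and regenerate every rule."
  (set-default sym val)
  (my-tamil--update-rules val))

(defcustom my-tamil-translation-rules
  '(("க்" "k" "g") ("ம்" "m"))
  "Alist of (CHARACTER INPUT-SEQUENCES...)."
  :type '(alist :key-type string :value-type (repeat string))
  :set #'my-tamil--setter
  :group 'i18n)
```

Evaluating the defcustom already runs the setter once with the default value; a later customize-set-variable call on my-tamil-translation-rules runs it again and rebuilds the rules from scratch.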
(Sat, 02 Jul 2022 09:34:02 GMT) Full text and rfc822 format available.Message #85 received at 56323 <at> debbugs.gnu.org (full text, mbox):
From: Visuwesh <visuweshm <at> gmail.com> To: Eli Zaretskii <eliz <at> gnu.org> Cc: 56323 <at> debbugs.gnu.org Subject: Re: bug#56323: 29.0.50; [v2] Add new customisable phonetic Tamil input method Date: Sat, 02 Jul 2022 15:03:42 +0530
[Sat Jul 02, 2022] Eli Zaretskii wrote:

>> From: Visuwesh <visuweshm <at> gmail.com>
>> Cc: 56323 <at> debbugs.gnu.org
>> Date: Sat, 02 Jul 2022 14:10:07 +0530
>>
>> > The only remaining issue, AFAIU, is why string-lessp is not good
>> > enough for sorting within each sub-list.
>>
>> It is not good enough for each sub-list for the same reason: the order
>> produced by string-lessp is not the same as the actual order.
>
> So, then please explain what should be the "correct" order within each
> sub-list. Is the correct order within each sub-list in the ascending
> order of the codepoint? If not, what is the correct order?
>

The correct order is not the ascending order of the codepoint, the
correct order is

    க ங ச ஞ ட ண த ந ப ம ய ர ல வ ழ ள ற ன

and their respective codepoints are

    2965 2969 2970 2974 2975 2979 2980 2984 2986 2990 2991 2992 2994 2997 2996 2995 2993 2985

>> I will try to explain the situation using the regular English alphabets
>> and the extra letter þ (which was used in place of "th" AFAIU).
>>
>> The core English alphabets are a-z then we have some extra alphabets
>> like the þ above. When we have a list containing _both_ a-z and þ, the
>> order produced by string-lessp is wrong.
>>
>> When I did as you suggested and broke the list into two -- a-z and þ --
>> and sorted the sub-list that only contained a-z with string-lessp, the
>> sorted sub-list was not in the right alphabetical order i.e., instead of
>> "a b c d ..." it was "a c b d ..."
>
> That's not what I see:
>
>   (let ((letters '("a" "b" "r" "x" "z")))
>     (sort letters 'string-lessp))
>   => ("a" "b" "r" "x" "z")
>
> Please show an example where characters a-z are sorted by string-lessp
> in the wrong order.

I didn't mean literally that string-lessp produced the wrong list for
a-z, I tried to draw an analogy with a hypothetical scenario where a-z
sorting did not work with string-lessp. This hypothetical scenario is
the actual situation in the case of the Tamil consonants.
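The mismatch described above can be checked directly. The sketch below is only an illustration: the constant name is hypothetical, and the list is the traditional order exactly as given in the message. It sorts a copy of that list with string-lessp and compares the result with the original.

```elisp
;; Hypothetical constant; the list is the traditional order quoted above.
(defconst my-tamil-traditional-consonants
  '("க" "ங" "ச" "ஞ" "ட" "ண" "த" "ந" "ப"
    "ம" "ய" "ர" "ல" "வ" "ழ" "ள" "ற" "ன")
  "The 18 core Tamil consonants in traditional order.")

(defun my-tamil-codepoint-sorted ()
  "Return the core consonants sorted with `string-lessp'.
`sort' is destructive, so a copy of the list is sorted."
  (sort (copy-sequence my-tamil-traditional-consonants) #'string-lessp))

;; The codepoints are not ascending in traditional order, so this is nil.
(equal (my-tamil-codepoint-sorted) my-tamil-traditional-consonants)
```

For single-character strings string-lessp compares character codes, which is why it cannot reproduce the traditional order here, whether the list is split or not.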
(Sat, 02 Jul 2022 09:40:01 GMT) Full text and rfc822 format available.Message #88 received at 56323 <at> debbugs.gnu.org (full text, mbox):
From: Eli Zaretskii <eliz <at> gnu.org> To: Visuwesh <visuweshm <at> gmail.com> Cc: 56323 <at> debbugs.gnu.org Subject: Re: bug#56323: 29.0.50; [v2] Add new customisable phonetic Tamil input method Date: Sat, 02 Jul 2022 12:38:55 +0300
> From: Visuwesh <visuweshm <at> gmail.com>
> Cc: 56323 <at> debbugs.gnu.org
> Date: Sat, 02 Jul 2022 15:03:42 +0530
>
> > So, then please explain what should be the "correct" order within each
> > sub-list. Is the correct order within each sub-list in the ascending
> > order of the codepoint? If not, what is the correct order?
> >
>
> The correct order is not the ascending order of the codepoint, the
> correct order is
>
>     க ங ச ஞ ட ண த ந ப ம ய ர ல வ ழ ள ற ன
>
> and their respective codepoints are
>
>     2965 2969 2970 2974 2975 2979 2980 2984 2986 2990 2991 2992 2994 2997 2996 2995 2993 2985

Why is this the correct order? Does it have any definition based on
some principles, not just on the above list?
(Sat, 02 Jul 2022 10:32:02 GMT) Full text and rfc822 format available.Message #91 received at 56323 <at> debbugs.gnu.org (full text, mbox):
From: Visuwesh <visuweshm <at> gmail.com> To: Eli Zaretskii <eliz <at> gnu.org> Cc: 56323 <at> debbugs.gnu.org Subject: Re: bug#56323: 29.0.50; [v2] Add new customisable phonetic Tamil input method Date: Sat, 02 Jul 2022 16:01:14 +0530
[Sat Jul 02, 2022] Eli Zaretskii wrote:

>> > So, then please explain what should be the "correct" order within each
>> > sub-list. Is the correct order within each sub-list in the ascending
>> > order of the codepoint? If not, what is the correct order?
>> >
>>
>> The correct order is not the ascending order of the codepoint, the
>> correct order is
>>
>>     க ங ச ஞ ட ண த ந ப ம ய ர ல வ ழ ள ற ன
>>
>> and their respective codepoints are
>>
>>     2965 2969 2970 2974 2975 2979 2980 2984 2986 2990 2991 2992 2994 2997 2996 2995 2993 2985
>
> Why is this the correct order? Does it have any definition based on
> some principles, not just on the above list?

I'm not sure if there is a principle behind it. Is there a principle
behind why a comes before b? Same thing, I suppose. But it does
raise my brow when I see them out of order which is why I'm bothering to
sort them.
(Sat, 02 Jul 2022 10:48:01 GMT) Full text and rfc822 format available.Message #94 received at 56323 <at> debbugs.gnu.org (full text, mbox):
From: Eli Zaretskii <eliz <at> gnu.org> To: Visuwesh <visuweshm <at> gmail.com> Cc: 56323 <at> debbugs.gnu.org Subject: Re: bug#56323: 29.0.50; [v2] Add new customisable phonetic Tamil input method Date: Sat, 02 Jul 2022 13:46:55 +0300
> From: Visuwesh <visuweshm <at> gmail.com>
> Cc: 56323 <at> debbugs.gnu.org
> Date: Sat, 02 Jul 2022 16:01:14 +0530
>
> [Sat Jul 02, 2022] Eli Zaretskii wrote:
>
> >> > So, then please explain what should be the "correct" order within each
> >> > sub-list. Is the correct order within each sub-list in the ascending
> >> > order of the codepoint? If not, what is the correct order?
> >> >
> >>
> >> The correct order is not the ascending order of the codepoint, the
> >> correct order is
> >>
> >>     க ங ச ஞ ட ண த ந ப ம ய ர ல வ ழ ள ற ன
> >>
> >> and their respective codepoints are
> >>
> >>     2965 2969 2970 2974 2975 2979 2980 2984 2986 2990 2991 2992 2994 2997 2996 2995 2993 2985
> >
> > Why is this the correct order? Does it have any definition based on
> > some principles, not just on the above list?
>
> I'm not sure if there is a principle behind it. Is there a principle
> behind why a comes before b?

Yes: the codepoint order.

There's no question about ordering when it's according to the
codepoints. If you want some other order, then you need to define the
rules for the order you want.

Is the order in which you want to sort the characters for Tamil
accepted somewhere, or is it your own preference? If the former, where
can one read about that order?

There was also another part to your original question about sorting,
AFAIR: you wanted to sort syllables, not just single characters.
Assuming the sorting order of the single characters is established in
some way, what is left to determine how to order syllables?
(Sat, 02 Jul 2022 11:07:02 GMT) Full text and rfc822 format available.Message #97 received at 56323 <at> debbugs.gnu.org (full text, mbox):
From: समीर सिंह Sameer Singh <lumarzeli30 <at> gmail.com> To: Visuwesh <visuweshm <at> gmail.com> Cc: Eli Zaretskii <eliz <at> gnu.org>, 56323 <at> debbugs.gnu.org Subject: Re: bug#56323: 29.0.50; [v2] Add new customisable phonetic Tamil input method Date: Sat, 2 Jul 2022 16:35:51 +0530
There is indeed a principle behind the ordering of letters in Indian
languages taken from Sanskrit, and AFAICT Tamil also follows it.

க் ங்
ச் ஞ்
ட் ண்
த் ந்
ப் ம்

If we look at it rowwise, the first row is the velar consonants, then the
palatal then retroflex then dental then labial. If you notice here, we are
gradually moving from the back of the mouth to the front!

If we look at it columnwise the first column consists of unvoiced/voiced
consonants and the second column consists of nasals.

Then come the semivowels
ய் ர் ல் வ் ழ் ள்

After that
ற் ன்

On Sat, 2 Jul 2022, 4:02 pm, Visuwesh <visuweshm <at> gmail.com> wrote:

> [Sat Jul 02, 2022] Eli Zaretskii wrote:
>
> >> > So, then please explain what should be the "correct" order within each
> >> > sub-list. Is the correct order within each sub-list in the ascending
> >> > order of the codepoint? If not, what is the correct order?
> >> >
> >>
> >> The correct order is not the ascending order of the codepoint, the
> >> correct order is
> >>
> >>     க ங ச ஞ ட ண த ந ப ம ய ர ல வ ழ ள ற ன
> >>
> >> and their respective codepoints are
> >>
> >>     2965 2969 2970 2974 2975 2979 2980 2984 2986 2990 2991 2992 2994 2997 2996 2995 2993 2985
> >
> > Why is this the correct order? Does it have any definition based on
> > some principles, not just on the above list?
>
> I'm not sure if there is a principle behind it. Is there a principle
> behind why a comes before b? Same thing, I suppose. But it does
> raise my brow when I see them out of order which is why I'm bothering to
> sort them.
[Screenshot_20220702-163431_Twitter.png (image/png, attachment)]
(Sat, 02 Jul 2022 12:20:02 GMT) Full text and rfc822 format available.Message #100 received at 56323 <at> debbugs.gnu.org (full text, mbox):
From: Visuwesh <visuweshm <at> gmail.com> To: समीर सिंह Sameer Singh <lumarzeli30 <at> gmail.com> Cc: Eli Zaretskii <eliz <at> gnu.org>, 56323 <at> debbugs.gnu.org Subject: Re: bug#56323: 29.0.50; [v2] Add new customisable phonetic Tamil input method Date: Sat, 02 Jul 2022 17:34:05 +0530
[Sat Jul 02, 2022] समीर सिंह Sameer Singh wrote:

> There is indeed a principle behind the ordering of letters in Indian
> languages taken from Sanskrit, and AFAICT Tamil also follows it.
>
> க் ங்
> ச் ஞ்
> ட் ண்
> த் ந்
> ப் ம்
>
> If we look at it rowwise, the first row is the velar consonants, then the
> palatal then retroflex then dental then labial. If you notice here, we are
> gradually moving from the back of the mouth to the front!
>

Aha! I never noticed this, thanks for this interesting info. It was
just an order for me just like A B C D ... etc.

> If we look at it columnwise the first column consists of unvoiced/voiced
> consonants and the second column consists of nasals.
>
> Then come the semivowels
> ய் ர் ல் வ் ழ் ள்
>
> After that
> ற் ன்
>
(Sat, 02 Jul 2022 12:20:02 GMT) Full text and rfc822 format available.Message #103 received at 56323 <at> debbugs.gnu.org (full text, mbox):
From: Visuwesh <visuweshm <at> gmail.com> To: Eli Zaretskii <eliz <at> gnu.org> Cc: 56323 <at> debbugs.gnu.org Subject: Re: bug#56323: 29.0.50; [v2] Add new customisable phonetic Tamil input method Date: Sat, 02 Jul 2022 17:45:45 +0530
[Fri Jul 01, 2022] Eli Zaretskii wrote:

>> > Looks like simple misalignment to me, which should be cured by using
>> > pixel-resolution alignment features.
>>
>> Yep, it is misalignment. I could try to use those pixel-resolution
>> alignment features but I really don't think I can do a good enough job.
>> It is something I tried in the past but gave up since it was too complex
>> for me. The current code produces a Good Enough™ table and I think I
>> will just leave it unless Someone™ complains since after all, the
>> current situation is much better than what we have in Emacs 28 (the
>> docfix that happened as part of bug#50143 isn't in Emacs 28).
>
> I thought vtable.el was about solving such problems?

I tried to use vtable.el to produce the syllable table. There are two
problems:

    . all the calculation done by vtable is slow (perhaps to no one's
      surprise).
    . the buffer becomes noticeably slow to scroll after the table is
      inserted.

I've attached an elisp file of my current progress.
[table.el (application/emacs-lisp, attachment)]
When I commented out the make-vtable call and benchmarked it, it was
fast so it is not the creation of the table data structure that is the
bottleneck.
(Sat, 02 Jul 2022 12:20:03 GMT) Full text and rfc822 format available.Message #106 received at 56323 <at> debbugs.gnu.org (full text, mbox):
From: Visuwesh <visuweshm <at> gmail.com> To: Eli Zaretskii <eliz <at> gnu.org> Cc: 56323 <at> debbugs.gnu.org Subject: Re: bug#56323: 29.0.50; [v2] Add new customisable phonetic Tamil input method Date: Sat, 02 Jul 2022 17:38:11 +0530
[Sat Jul 02, 2022] Eli Zaretskii wrote:

>> I'm not sure if there is a principle behind it. Is there a principle
>> behind why a comes before b?
>
> Yes: the codepoint order.

I meant the order in the English language, not the codepoints.

> There's no question about ordering when it's according to the
> codepoints. If you want some other order, then you need to define the
> rules for the order you want.
>
> Is the order in which you want to sort the characters for Tamil
> accepted somewhere, or is it your own preference? If the former, where
> can one read about that order?
>

It is the order followed by everyone. See the table titled "Tamil
consonants" in this wikipedia article
https://en.wikipedia.org/wiki/Tamil_script#Letters. If you want details
about the order, they have probably not been translated into English. I
also skimmed through the Tamil wikipedia and found nothing there.

> There was also another part to your original question about sorting,
> AFAIR: you wanted to sort syllables, not just single characters.
> Assuming the sorting order of the single characters is established in
> some way, what is left to determine how to order syllables?

The order of the syllables falls into place once we sort the consonants
and the vowels. Vowels can be sorted by using string-lessp so once we
sort the consonants, it is a simple matter of concatenation to produce
the table. (See my other email also.)
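The "simple matter of concatenation" can be sketched as follows. This is only an illustration, not the code from the patch: the function name is hypothetical, and it assumes each syllable is the base consonant followed by a dependent vowel sign, with the empty string standing for the inherent "a" vowel.

```elisp
;; Hypothetical helper, not the function from the patch.
(defun my-tamil-syllable-rows (consonants vowel-signs)
  "Return one row per element of CONSONANTS.
Each row is the consonant concatenated with every element of
VOWEL-SIGNS, so the rows follow the order of CONSONANTS and the
columns follow the order of VOWEL-SIGNS."
  (mapcar (lambda (consonant)
            (mapcar (lambda (sign) (concat consonant sign))
                    vowel-signs))
          consonants))

;; Two consonants and three vowel signs ("" for the inherent vowel).
(my-tamil-syllable-rows '("க" "ச") '("" "ா" "ி"))
;; => (("க" "கா" "கி") ("ச" "சா" "சி"))
```

Because the rows simply follow the order of the input lists, sorting the consonants and the vowels beforehand is all that is needed to get the table in the expected order.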
(Sat, 02 Jul 2022 12:25:02 GMT) Full text and rfc822 format available.Message #109 received at 56323 <at> debbugs.gnu.org (full text, mbox):
From: Eli Zaretskii <eliz <at> gnu.org> To: समीर सिंह Sameer Singh <lumarzeli30 <at> gmail.com> Cc: 56323 <at> debbugs.gnu.org, visuweshm <at> gmail.com Subject: Re: bug#56323: 29.0.50; [v2] Add new customisable phonetic Tamil input method Date: Sat, 02 Jul 2022 15:23:56 +0300
> From: समीर सिंह Sameer Singh <lumarzeli30 <at> gmail.com>
> Date: Sat, 2 Jul 2022 16:35:51 +0530
> Cc: Eli Zaretskii <eliz <at> gnu.org>, 56323 <at> debbugs.gnu.org
>
> There is indeed a principle behind the ordering of letters in Indian
> languages taken from Sanskrit, and AFAICT Tamil also follows it.
>
> க் ங்
> ச் ஞ்
> ட் ண்
> த் ந்
> ப் ம்
>
> If we look at it rowwise, the first row is the velar consonants, then the
> palatal then retroflex then dental then labial. If you notice here, we are
> gradually moving from the back of the mouth to the front!
>
> If we look at it columnwise the first column consists of unvoiced/voiced
> consonants and the second column consists of nasals.
>
> Then come the semivowels
> ய் ர் ல் வ் ழ் ள்
>
> After that
> ற் ன்

Thanks. If there's no existing property of characters that we could
use to produce this order, I guess we will need an alist of characters
and their ordinal numbers, and use that. Or, if the codepoints of
these characters are contiguous, we could have just the ordinal
numbers in the order of the codepoints, and use that in the function
passed as the PREDICATE argument to 'sort'.
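One way to read the first suggestion, as a sketch with hypothetical names (my-tamil-*): build an alist of (CONSONANT . ORDINAL) from a list written in traditional order, and compare ordinals in the PREDICATE passed to sort. Placing unknown strings last is an assumption of this sketch, not something the thread specifies.

```elisp
;; Hypothetical names; the alist is derived from the traditional order.
(defconst my-tamil-consonant-ranks
  (let ((rank -1))
    (mapcar (lambda (c) (cons c (setq rank (1+ rank))))
            '("க" "ங" "ச" "ஞ" "ட" "ண" "த" "ந" "ப"
              "ம" "ய" "ர" "ல" "வ" "ழ" "ள" "ற" "ன")))
  "Alist of (CONSONANT . ORDINAL) in traditional order.")

(defun my-tamil-consonant-lessp (c1 c2)
  "Return non-nil if consonant C1 sorts before C2 in traditional order.
Strings missing from `my-tamil-consonant-ranks' sort last."
  (let ((unknown (length my-tamil-consonant-ranks)))
    (< (or (cdr (assoc c1 my-tamil-consonant-ranks)) unknown)
       (or (cdr (assoc c2 my-tamil-consonant-ranks)) unknown))))

(sort (list "ன" "வ" "க" "ற" "ல") #'my-tamil-consonant-lessp)
;; => ("க" "ல" "வ" "ற" "ன")
```

The predicate is a strict "less than" on the ordinals, which is what sort expects from PREDICATE.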
(Sun, 03 Jul 2022 03:59:02 GMT) Full text and rfc822 format available.Message #112 received at 56323 <at> debbugs.gnu.org (full text, mbox):
From: Visuwesh <visuweshm <at> gmail.com> To: Eli Zaretskii <eliz <at> gnu.org> Cc: 56323 <at> debbugs.gnu.org Subject: Re: bug#56323: 29.0.50; [v2] Add new customisable phonetic Tamil input method Date: Sun, 03 Jul 2022 09:27:55 +0530
[Sat Jul 02, 2022] Visuwesh wrote:

> [Fri Jul 01, 2022] Eli Zaretskii wrote:
>
>>> > Looks like simple misalignment to me, which should be cured by using
>>> > pixel-resolution alignment features.
>>>
>>> Yep, it is misalignment. I could try to use those pixel-resolution
>>> alignment features but I really don't think I can do a good enough job.
>>> It is something I tried in the past but gave up since it was too complex
>>> for me. The current code produces a Good Enough™ table and I think I
>>> will just leave it unless Someone™ complains since after all, the
>>> current situation is much better than what we have in Emacs 28 (the
>>> docfix that happened as part of bug#50143 isn't in Emacs 28).
>>
>> I thought vtable.el was about solving such problems?
>
> I tried to use vtable.el to produce the syllable table. There are two
> problems:
>
>     . all the calculation done by vtable is slow (perhaps to no one's
>       surprise).
>     . the buffer becomes noticeably slow to scroll after the table is
>       inserted.

Stripping the text-properties keymap, vtable, vtable-column and
vtable-object from the buffer text improved the performance of scrolling
substantially but it is still kind of sluggish.

> I've attached an elisp file of my current progress.
>
> When I commented out the make-vtable call and benchmarked it, it was
> fast so it is not the table data structure that is the bottleneck.
(Sun, 10 Jul 2022 03:57:02 GMT) Full text and rfc822 format available.Message #115 received at 56323 <at> debbugs.gnu.org (full text, mbox):
From: Visuwesh <visuweshm <at> gmail.com> To: Eli Zaretskii <eliz <at> gnu.org> Cc: 56323 <at> debbugs.gnu.org Subject: Re: bug#56323: 29.0.50; [v2] Add new customisable phonetic Tamil input method Date: Sun, 10 Jul 2022 09:26:39 +0530
[Sat Jul 02, 2022] Visuwesh wrote:

> I will post an updated patch later when I clean up the comments, and
> docstrings. Thanks.

Here's an updated patch.
[0001-Add-new-customisable-phonetic-Tamil-input-method.patch (text/x-diff, attachment)]
I don't use vtable since it is too slow. :(

[ Also, I don't see the customization group until I load
  lisp/leim/quail/indian.el? But AFAICT, that's not the case for other
  custom groups. ]
(Sun, 10 Jul 2022 05:35:02 GMT) Full text and rfc822 format available.Message #118 received at 56323 <at> debbugs.gnu.org (full text, mbox):
From: Eli Zaretskii <eliz <at> gnu.org> To: Visuwesh <visuweshm <at> gmail.com> Cc: 56323 <at> debbugs.gnu.org Subject: Re: bug#56323: 29.0.50; [v2] Add new customisable phonetic Tamil input method Date: Sun, 10 Jul 2022 08:34:12 +0300
> From: Visuwesh <visuweshm <at> gmail.com>
> Cc: 56323 <at> debbugs.gnu.org
> Date: Sun, 10 Jul 2022 09:26:39 +0530
>
> > I will post an updated patch later when I clean up the comments, and
> > docstrings. Thanks.
>
> Here's an updated patch.

Thanks.

> +---
> +*** New default phonetic input method for the Tamil language environment.
> +The default input method for the Tamil language environment is now
> +"tamil" which is a customizable phonetic input method. To change the
> +input method's translation rules, customize the user option
> +'tamil-translation-rules'.
> +
>
>  * Changes in Specialized Modes and Packages in Emacs 29.1
>
> diff --git a/lisp/language/indian.el b/lisp/language/indian.el
> index 2887d410ad..91ad818533 100644
> --- a/lisp/language/indian.el
> +++ b/lisp/language/indian.el
> @@ -109,7 +109,7 @@ 'devanagari
>   "Tamil" '((charset unicode)
>             (coding-system utf-8)
>             (coding-priority utf-8)
> -           (input-method . "tamil-itrans")
> +           (input-method . "tamil")
>             (sample-text . "Tamil (தமிழ்) வணக்கம்")
>             (documentation . "\

Please name the new input method "tamil-phonetic", not just "tamil",
so that users who type "C-u C-\ tamil TAB" could have some means of
making the decision which one to choose.

> +;; This is needed since the Unicode codepoint order does not reflect
> +;; the actual order in the Tamil language.
> +(defvar quail-tamil-itrans--consonant-order
> +  '(("க" . 0) ("ங" . 1) ("ச" . 2) ("ஞ" . 3) ("ட" . 4) ("ண" . 5)
> +    ("த" . 6) ("ந" . 7) ("ப" . 8) ("ம" . 9) ("ய" . 10) ("ர" . 11)
> +    ("ல" . 12) ("வ" . 13) ("ழ" . 14) ("ள" . 15) ("ற" . 16) ("ன" . 17)
> +    ("ஜ" . 18) ("ஸ" . 19) ("ஷ" . 20) ("ஹ" . 21) ("க்ஷ" . 22)
> +    ("க்ஷ" . 23) ("ஶ" . 24)))

Since the characters are ordered in the correct order, I wonder why we
need the explicit ordinal numbers here: they are determined by the
index of the character in the list.

> +(defun quail-tamil-itrans-compute-syllable-table (vowels consonants)
> +  "Return the syllable table for the input method as a string.
> +VOWELS is a list of (VOWEL SIGN TRANS) where VOWEL is a string or
> +character representing the Tamil vowel character, SIGN is the
   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
What does it mean "character representing ... character"? Can you
clarify this confusing part of the doc string?

> +vowel sign corresponding to VOWEL or nil for none,

Likewise here: "vowel corresponding to VOWEL"?

>                                                     and TRANS is
> +the input sequence to insert VOWEL.

The input sequence is generally a sequence of ASCII characters, is
that right? If so, I think telling that would make the documentation
more clear.

Also, TRANS is a peculiar name for something described as "input
sequence", so maybe rename it to INPUT-SEQ?

> +CONSONANTS is a list of (CONSONANT TRANS...) where CONSONANT is
> +the Tamil consonant character, and TRANS is one or more strings
> +that describe how to insert CONSONANT."

Same here regarding TRANS and its description.

> +  (setq vowels (sort vowels (lambda (x y) (string-lessp (car x) (car y))))
> +        consonants (sort consonants
> +                         (lambda (x y)
> +                           (< (or (assoc-default (car x) quail-tamil-itrans--consonant-order) 10000)
> +                              (or (assoc-default (car y) quail-tamil-itrans--consonant-order) 10000)))))

Can you wrap these long lines, so that they would be easier to read?

> +  (let ((digits "௦௧௨௩௪௫௬௭௮௯")
>          (width 6) clm)
>      (with-temp-buffer
> -      (insert "\n" (make-string 18 ?-) "+")
> -      (when digitp (insert (make-string 60 ?-)))
> +      (insert "\n" (make-string 18 ?-))
> +      (when digitp
> +        (insert "+" (make-string 60 ?-)))
>        (insert "\n")
>        (insert
>         (propertize "\t" 'display '(space :align-to 5)) "various"
> -       (propertize "\t" 'display '(space :align-to 18)) "|")
> +       (propertize "\t" 'display '(space :align-to 18)))
>        (when digitp
>          (insert
> -         (propertize "\t" 'display '(space :align-to 45)) "digits"))
> -      (insert "\n" (make-string 18 ?-) "+")
> +         "|" (propertize "\t" 'display '(space :align-to 45)) "digits"))
> +      (insert "\n" (make-string 18 ?-))

Did you test those :align-to specs when display-line-numbers is in
use?

> +;;;
> +;;; Tamil phonetic input method
> +;;;
> +
> +;; Define the input method straightaway.
> +(quail-define-package "tamil" "Tamil" "ழ" t
> + "Customisable Tamil phonetic input method.

See above regarding the name of the input method.

> +    ;; Consonants.
> +    ("க்" "k" "g") ("ங்" "ng") ("ச்" "ch" "s") ("ஞ்" "nj") ("ட்" "t" "d")
> +    ("ண்" "N") ("த்" "th" "dh") ("ந்" "nh") ("ப்" "p" "b") ("ம்" "m")
> +    ("ய்" "y") ("ர்" "r") ("ல்" "l") ("வ்" "v") ("ழ்" "z" "zh")
> +    ("ள்" "L") ("ற்" "rh") ("ன்" "n")
> +    ;; Sanskrit.
> +    ("ஜ்" "j") ("ஸ்" "S") ("ஷ்" "sh") ("ஹ்" "h")
> +    ("க்ஷ்" "ksh") ("க்ஷ்" "ksH") ("ஶ்" "Z")
> +
> +    ;; Misc. ஃ is neither a consonant nor a vowel.
> +    ("ஃ" "F" "q")
> +    ("ௐ" "OM"))
> +  "List of input sequences to translate to Tamil characters.
> +Each element should be (CHARACTER . TRANSLATIONS) where CHARACTER

The (CHARACTER . TRANSLATIONS) form seems to imply the elements are
cons cells, but the value itself uses lists. Suggest to say instead

  Each element should be (CHARACTER TRANSLATIONS...)

> +is the Tamil character, and TRANSLATIONS is a list of input
> +sequences to translate to that character.
   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
"sequences which produce that character" is better. And I suggest to
use INPUT-SEQUENCES here, not TRANSLATIONS, for the reason explained
above.

> +CHARACTER is considered as a consonant (மெய் எழுத்து) if it ends
> +with a pulli.

What is a "pulli"? It is not a character name AFAICT.

> +CHARACTER is that is neither a vowel nor a consonant are
   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Typo and/or redundant words here.

> +considered as \"miscellaneous\" characters and are inserted as
> +is.

Not sure what this wants to say: the fact that characters are inserted
in some way seems to be unrelated to the description of the value.
What is this about?

> +The input sequence for consonant+vowel pairs (உயிர்மெய் எழுத்துக்கள்)
> +is the input sequence for the consonant followed by the
> +corresponding vowel."

Isn't that obvious? If not, the non-obvious part(s) should be
mentioned explicitly.

> +  :group 'tamil-input
> +  :type '(alist :key-type string :value-type (repeat string))
> +  :set #'tamil--setter
> +  :options

This defcustom lacks the :version tag.

> [ Also, I don't see the customization group until I load
>   lisp/leim/quail/indian.el? But AFAICT, that's not the case for other
>   custom groups. ]

There are no defcustoms in leim/quail/ files. How about moving the
defcustom to lisp/language/indian.el?
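On the (CHARACTER . TRANSLATIONS) versus (CHARACTER TRANSLATIONS...) remark: in Lisp both notations describe the same object, since a list is a cons cell whose cdr is the rest of the list, so the second spelling is clearer for readers rather than a different value. A small sketch with a hypothetical helper name:

```elisp
;; Hypothetical helper, used only to illustrate the two notations.
(defun my-tamil-entry-parts (entry)
  "Split ENTRY, an element (CHARACTER INPUT-SEQUENCES...), in two.
Return a list (CHARACTER INPUT-SEQUENCES), where INPUT-SEQUENCES is
itself a list of strings."
  (list (car entry) (cdr entry)))

;; The list notation and the dotted notation read as the same object.
(equal '("க்" "k" "g") '("க்" . ("k" "g")))
;; => t

(my-tamil-entry-parts '("க்" "k" "g"))
;; => ("க்" ("k" "g"))
```

This is also why a custom type of the form (alist :key-type string :value-type (repeat string)) can describe such elements: the key is the car and the repeated strings are the cdr.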
Message #121 received at 56323 <at> debbugs.gnu.org (Sun, 10 Jul 2022 06:44:01 GMT):
From: Visuwesh <visuweshm <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 56323 <at> debbugs.gnu.org
Subject: Re: bug#56323: 29.0.50; [v2] Add new customisable phonetic Tamil input method
Date: Sun, 10 Jul 2022 12:12:47 +0530
[Sunday July 10, 2022] Eli Zaretskii wrote:

> Please name the new input method "tamil-phonetic", not just "tamil",
> so that users who type "C-u C-\ tamil TAB" could have some means of
> making the decision which one to choose.

Done.

>> +;; This is needed since the Unicode codepoint order does not reflect
>> +;; the actual order in the Tamil language.
>> +(defvar quail-tamil-itrans--consonant-order
>> + '(("க" . 0) ("ங" . 1) ("ச" . 2) ("ஞ" . 3) ("ட" . 4) ("ண" . 5)
>> + ("த" . 6) ("ந" . 7) ("ப" . 8) ("ம" . 9) ("ய" . 10) ("ர" . 11)
>> + ("ல" . 12) ("வ" . 13) ("ழ" . 14) ("ள" . 15) ("ற" . 16) ("ன" . 17)
>> + ("ஜ" . 18) ("ஸ" . 19) ("ஷ" . 20) ("ஹ" . 21) ("க்ஷ" . 22)
>> + ("க்ஷ" . 23) ("ஶ" . 24)))
>
> Since the characters are ordered in the correct order, I wonder why we
> need the explicit ordinal numbers here: they are determined by the
> index of the character in the list.

Ah yes, we could use seq-position, I forgot about that.  Now done.

>> +(defun quail-tamil-itrans-compute-syllable-table (vowels consonants)
>> + "Return the syllable table for the input method as a string.
>> +VOWELS is a list of (VOWEL SIGN TRANS) where VOWEL is a string or
>> +character representing the Tamil vowel character, SIGN is the
>    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> What does it mean "character representing ... character"?  Can you
> clarify this confusing part of the doc string?

I mean to say that VOWEL can be the datatypes string or character.  But
now, I cut that part out since I say no such thing for CONSONANT as
well.

>> +vowel sign corresponding to VOWEL or nil for none,
>
> Likewise here: "vowel corresponding to VOWEL"?

It should be vowel sign corresponding to VOWEL.  I'm not sure how to
phrase it better, I borrowed the term "vowel sign" from the Unicode name
(e.g., name of ு a.k.a. #x0bc1).

>> and TRANS is
>> +the input sequence to insert VOWEL.
>
> The input sequence is generally a sequence of ASCII characters, is
> that right?  If so, I think telling that would make the documentation
> more clear.  Also, TRANS is a peculiar name for something described as
> "input sequence", so maybe rename it to INPUT-SEQ?
>
>> +CONSONANTS is a list of (CONSONANT TRANS...) where CONSONANT is
>> +the Tamil consonant character, and TRANS is one or more strings
>> +that describe how to insert CONSONANT."
>
> Same here regarding TRANS and its description.

Now done.

>> + (setq vowels (sort vowels (lambda (x y) (string-lessp (car x) (car y))))
>> + consonants (sort consonants
>> + (lambda (x y)
>> + (< (or (assoc-default (car x) quail-tamil-itrans--consonant-order) 10000)
>> + (or (assoc-default (car y) quail-tamil-itrans--consonant-order) 10000)))))
>
> Can you wrap these long lines, so that they would be easier to read?

I hope it is better now.

>> + (let ((digits "௦௧௨௩௪௫௬௭௮௯")
>> (width 6) clm)
>> (with-temp-buffer
>> - (insert "\n" (make-string 18 ?-) "+")
>> - (when digitp (insert (make-string 60 ?-)))
>> + (insert "\n" (make-string 18 ?-))
>> + (when digitp
>> + (insert "+" (make-string 60 ?-)))
>> (insert "\n")
>> (insert
>> (propertize "\t" 'display '(space :align-to 5)) "various"
>> - (propertize "\t" 'display '(space :align-to 18)) "|")
>> + (propertize "\t" 'display '(space :align-to 18)))
>> (when digitp
>> (insert
>> - (propertize "\t" 'display '(space :align-to 45)) "digits"))
>> - (insert "\n" (make-string 18 ?-) "+")
>> + "|" (propertize "\t" 'display '(space :align-to 45)) "digits"))
>> + (insert "\n" (make-string 18 ?-))
>
> Did you test those :align-to specs when display-line-numbers is in
> use?

Seems to work fine from a short test on my side.

>> +;;;
>> +;;; Tamil phonetic input method
>> +;;;
>> +
>> +;; Define the input method straightaway.
>> +(quail-define-package "tamil" "Tamil" "ழ" t
>> + "Customisable Tamil phonetic input method.
>
> See above regarding the name of the input method.

Done.

>> + ;; Consonants.
>> + ("க்" "k" "g") ("ங்" "ng") ("ச்" "ch" "s") ("ஞ்" "nj") ("ட்" "t" "d")
>> + ("ண்" "N") ("த்" "th" "dh") ("ந்" "nh") ("ப்" "p" "b") ("ம்" "m")
>> + ("ய்" "y") ("ர்" "r") ("ல்" "l") ("வ்" "v") ("ழ்" "z" "zh")
>> + ("ள்" "L") ("ற்" "rh") ("ன்" "n")
>> + ;; Sanskrit.
>> + ("ஜ்" "j") ("ஸ்" "S") ("ஷ்" "sh") ("ஹ்" "h")
>> + ("க்ஷ்" "ksh") ("க்ஷ்" "ksH") ("ஶ்" "Z")
>> +
>> + ;; Misc. ஃ is neither a consonant nor a vowel.
>> + ("ஃ" "F" "q")
>> + ("ௐ" "OM"))
>> + "List of input sequences to translate to Tamil characters.
>> +Each element should be (CHARACTER . TRANSLATIONS) where CHARACTER
>
> The (CHARACTER . TRANSLATIONS) form seems to imply the elements are
> cons cells, but the value itself uses lists.  Suggest to say instead
>
> Each element should be (CHARACTER TRANSLATIONS...)
>

Done.

>> +is the Tamil character, and TRANSLATIONS is a list of input
>> +sequences to translate to that character.
>    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> "sequences which produce that character" is better.  And I suggest to
> use INPUT-SEQUENCES here, not TRANSLATIONS, for the reason explained
> above.
>

Done.

>> +CHARACTER is considered as a consonant (மெய் எழுத்து) if it ends
>> +with a pulli.
>
> What is a "pulli"?  It is not a character name AFAICT.
>

It is the Tamil name for virama.  I use pulli over virama since I don't
think any Tamil reader would know it.  But I put virama in brackets now
for future maintainers.

>> +CHARACTER is that is neither a vowel nor a consonant are
>    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> Typo and/or redundant words here.
>

Fixed, thanks.

>> +considered as \"miscellaneous\" characters and are inserted as
>> +is.
>
> Not sure what this wants to say: the fact that characters are inserted
> in some way seems to be unrelated to the description of the value.
> What is this about?

I tried to allude to the miscellaneous section in the docstring but I
don't think it is really necessary.  Now removed.

>> +The input sequence for consonant+vowel pairs (உயிர்மெய் எழுத்துக்கள்)
>> +is the input sequence for the consonant followed by the
>> +corresponding vowel."
>
> Isn't that obvious?  If not, the non-obvious part(s) should be
> mentioned explicitly.

Thinking twice, yes, it should be obvious.  I removed this part.

>> + :group 'tamil-input
>> + :type '(alist :key-type string :value-type (repeat string))
>> + :set #'tamil--setter
>> + :options
>
> This defcustom lacks the :version tag.
>

Oops, now fixed.

Updated patch attached.
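
The ordering change discussed in this message (dropping the explicit ordinal numbers and relying on list position via `seq-position`) can be sketched roughly as follows. This is a minimal illustration, not the code from the attached patch: the names `my-consonant-order` and `my-sort-consonants` are hypothetical, the consonant list is shortened, and the fallback rank 10000 for unlisted entries is carried over from the quoted v1 code.

```elisp
(require 'seq)

;; Hypothetical, shortened stand-in for the consonant order list:
;; a plain list whose element positions define the sort order.
(defvar my-consonant-order '("க" "ங" "ச" "ஞ" "ட" "ண")
  "Consonants listed in their traditional order.")

(defun my-sort-consonants (consonants)
  "Return CONSONANTS sorted by position in `my-consonant-order'.
CONSONANTS is a list of (CONSONANT INPUT-SEQ...).  Entries whose
CONSONANT is not listed sort after all listed ones."
  ;; `sort' is destructive, so work on a copy of the list.
  (sort (copy-sequence consonants)
        (lambda (x y)
          ;; `seq-position' returns the index of the first `equal'
          ;; element, or nil when the element is absent.
          (< (or (seq-position my-consonant-order (car x)) 10000)
             (or (seq-position my-consonant-order (car y)) 10000)))))

(my-sort-consonants '(("ச" "ch" "s") ("க" "k" "g") ("ங" "ng")))
```

With this arrangement the order is maintained in one place: reordering or extending the list changes the sort without renumbering anything.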
[0001-Add-new-customisable-phonetic-Tamil-input-method.patch (text/x-diff, attachment)]
>> [ Also, I don't see the customization group until I load
>> lisp/leim/quail/indian.el?  But AFAICT, that's not the case for other
>> custom groups. ]
>
> There are no defcustoms in leim/quail/ files.  How about moving the
> defcustom to lisp/language/indian.el?

Hmm, moving it to lisp/language/indian.el brings in warnings about
undefined vars and functions, and an error when dumping.

In toplevel form:
language/indian.el:147:31: Warning: reference to free variable ‘tamil--vowel-signs’
language/indian.el:151:32: Warning: reference to free variable ‘indian-tml-base-table’
language/indian.el:154:41: Warning: reference to free variable ‘indian-tml-base-digits-table’

In end of data:
language/indian.el:143:10: Warning: the function ‘tamil--setter’ is not known to be defined.
rm -f emacs && cp -f temacs emacs
LC_ALL=C ./temacs -batch -l loadup --temacs=pdump \
--bin-dest /usr/local/bin/ --eln-dest /usr/local/lib/emacs/29.0.50/
Loading loadup.el (source)...
Dump mode: pdump
Using load-path (/home/viz/lib/ports/emacs/lisp)
Loading emacs-lisp/debug-early...
Loading emacs-lisp/byte-run...
Loading emacs-lisp/backquote...
Loading subr...
Loading keymap...
Loading version...
Loading widget...
Loading custom...
Loading emacs-lisp/map-ynp...
Loading international/mule...
Loading international/mule-conf...
Loading env...
Loading format...
Loading bindings...
Loading window...
Loading files...
Loading emacs-lisp/macroexp...
Loading cus-face...
Loading faces...
Loading loaddefs.el (source)...
Loading button...
Loading emacs-lisp/cl-preloaded...
Loading emacs-lisp/oclosure...
Loading obarray...
Loading abbrev...
Loading help...
Loading jka-cmpr-hook...
Loading epa-hook...
Loading international/mule-cmds...
Loading case-table...
Loading international/charprop.el (source)...
Loading international/characters...
Loading international/charscript...
Loading international/emoji-zwj...
Loading composite...
Loading language/chinese...
Loading language/cyrillic...
Loading language/indian...
Error: void-variable (tamil--vowel-signs) (require cl-print) while preparing to dump
make[1]: *** [Makefile:639: emacs.pdmp] Error 255
make[1]: Leaving directory '/home/viz/lib/ports/emacs/src'
make: *** [Makefile:469: src] Error 2

Should I stick in defvar's and declare-function's?
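
For readers unfamiliar with the two forms asked about here, a minimal sketch of such declarations follows. It is an illustration only, not what was eventually installed: the file name "quail/indian" and the argument list of `tamil--setter` are assumptions. Note that these forms only inform the byte compiler. A value-less `defvar` does not bind the variable, so on its own it would not be expected to cure the `void-variable` error seen while dumping.

```elisp
;; Sketch only.  The file name "quail/indian" and the (symbol value)
;; argument list are assumptions, not taken from the installed patch.

;; A `defvar' with no value tells the byte compiler that the variable
;; is special and defined elsewhere.  It does not give it a value.
(defvar tamil--vowel-signs)
(defvar indian-tml-base-table)
(defvar indian-tml-base-digits-table)

;; `declare-function' tells the byte compiler which file defines the
;; function, silencing the "not known to be defined" warning.
(declare-function tamil--setter "quail/indian" (symbol value))
```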
Message #124 received at 56323 <at> debbugs.gnu.org (Sun, 10 Jul 2022 07:33:01 GMT):
From: Visuwesh <visuweshm <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 56323 <at> debbugs.gnu.org
Subject: Re: bug#56323: 29.0.50; [v2] Add new customisable phonetic Tamil input method
Date: Sun, 10 Jul 2022 13:02:11 +0530
[Sunday July 10, 2022] Visuwesh wrote:

> [Sunday July 10, 2022] Eli Zaretskii wrote:
>
> Updated patch attached.
>

I managed to miss a comment, sorry about that.  Now fixed in attached
patch.
[0001-Add-new-customisable-phonetic-Tamil-input-method.patch (text/x-diff, attachment)]
Message #129 received at 56323-done <at> debbugs.gnu.org:
From: Eli Zaretskii <eliz <at> gnu.org>
To: Visuwesh <visuweshm <at> gmail.com>
Cc: 56323-done <at> debbugs.gnu.org
Subject: Re: bug#56323: 29.0.50; [v2] Add new customisable phonetic Tamil input method
Date: Thu, 14 Jul 2022 09:34:14 +0300
> From: Visuwesh <visuweshm <at> gmail.com>
> Cc: 56323 <at> debbugs.gnu.org
> Date: Sun, 10 Jul 2022 13:02:11 +0530
>
> > Updated patch attached.
> >
>
> I managed to miss a comment, sorry about that.  Now fixed in attached
> patch.

Thanks, installed.
Message #132 received at 56323 <at> debbugs.gnu.org (Thu, 14 Jul 2022 07:13:02 GMT):
From: Visuwesh <visuweshm <at> gmail.com>
To: 56323 <at> debbugs.gnu.org
Cc: eliz <at> gnu.org
Subject: Re: bug#56323: 29.0.50; [v2] Add new customisable phonetic Tamil input method
Date: Thu, 14 Jul 2022 12:41:58 +0530
[Thursday July 14, 2022] Eli Zaretskii wrote:

>> From: Visuwesh <visuweshm <at> gmail.com>
>> Cc: 56323 <at> debbugs.gnu.org
>> Date: Sun, 10 Jul 2022 13:02:11 +0530
>>
>> > Updated patch attached.
>> >
>>
>> I managed to miss a comment, sorry about that.  Now fixed in attached
>> patch.
>
> Thanks, installed.

Thanks!
Debbugs Internal Request <help-debbugs <at> gnu.org>
to internal_control <at> debbugs.gnu.org
.
(Thu, 11 Aug 2022 11:24:06 GMT) Full text and rfc822 format available.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.