GNU bug report logs - #56323
29.0.50; Add new customisable phonetic Tamil input method

Package: emacs;

Reported by: Visuwesh <visuweshm <at> gmail.com>

Date: Thu, 30 Jun 2022 12:14:02 UTC

Severity: wishlist

Tags: patch

Found in version 29.0.50

Done: Eli Zaretskii <eliz <at> gnu.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 56323 in the body.
You can then email your comments to 56323 AT debbugs.gnu.org in the normal way.

Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#56323; Package emacs. (Thu, 30 Jun 2022 12:14:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Visuwesh <visuweshm <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Thu, 30 Jun 2022 12:14:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Visuwesh <visuweshm <at> gmail.com>
To: bug-gnu-emacs <at> gnu.org
Subject: 29.0.50; Add new customisable phonetic Tamil input method
Date: Thu, 30 Jun 2022 17:43:21 +0530

[Message part 1 (text/plain, inline)]

Tags: patch

The attached patchset adds a new customisable phonetic Tamil input
method. I tried to reuse as much of the existing itrans input method
code since it greatly simplifies the creation of an Indic input method
(see `indian-make-hash').

The first patch fixes a fallout from bug#50143 asking to add TAMIL OM ௐ
to the itrans table, and this means that one can insert the TAMIL OM
character using the tamil-itrans input methods as well. I'd prefer it
if this patch can be pushed quickly.

The second patch actually adds the new phonetic input method. I will
leave the rationale for making it a _customisable_ input method in
footnote [1]. To reuse the existing code that calculates the various
tables for the tamil-itrans IM, I turned the code in defvars to defuns.
However, the definition of the almighty
quail-tamil-itrans-syllable-table is still huge since I needed to do a
whole lot to convert the indian-tml-base-table to a format that will
be accepted by the new defun `quail-tamil-itrans-compute-syllable-table'.

The current quail rules are inspired by the ones in
https://github.com/rnchzn/tamil-phonetic/raw/main/tamil-phonetic.el and
the comments in
https://emacsnotes.wordpress.com/2022/03/07/tamil-phonetic-input-method-in-emacs-emacs-%E0%AE%87%E0%AE%B2%E0%AF%8D-%E0%AE%A4%E0%AE%AE%E0%AE%BF%E0%AE%B4%E0%AF%8D-%E0%AE%83%E0%AE%AA%E0%AF%8A%E0%AE%A9%E0%AF%86%E0%AE%9F%E0%AE%BF%E0%AE%95%E0%AF%8D/.

Avid readers might notice that I went for a nil SIMPLE argument despite
my recent complaint in emacs-devel. The reason for that is because we
need a way to end the ongoing translation (C-SPC). E.g., if one decides
to transliterate ல் as "l" and ள் as "ll", then to type ல்ல the key
sequence will be

l C-SPC la

without the C-SPC, "lla" would be translated to ள. The better way
forward would be to present _both_ ல்ல and ள் for the sequence "lla" but I
have no idea how to do it. Any pointers would be _highly_ appreciated.
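
A minimal sketch of the ambiguity described above, using a throwaway
package (the name `tamil-ambiguity-sketch', its title and its docstring
are hypothetical and not part of the patch).  Multi-character
translations are written as vectors because quail reads a plain string
translation as a list of single-character candidates:

```elisp
(require 'quail)

;; Hypothetical demo package; only the four rules below matter.
(quail-define-package
 "tamil-ambiguity-sketch" "Tamil" "sketch" t
 "Throwaway package illustrating the l/ll ambiguity.")

(quail-define-rules
 ("l"   ["ல்"])
 ("la"  ["ல"])
 ("ll"  ["ள்"])
 ("lla" ["ள"]))
```

With rules of this shape, a second "l" extends the pending key sequence
instead of starting a new one, which is why the translation has to be
ended explicitly before the second "l" is typed.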

I plan to modify indian--puthash-char to have one to many translations
i.e., "l" would translate to both ல் and ள் and then the user could decide
which one to insert. This combined with the DETERMINISTIC argument to
quail-define-package would make it an attractive option, I think. But
I'm leaving it out right now since I want the current patch to be
reviewed first.
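
For illustration only, quail rules can already carry several candidates
when the translation is a vector, so a one-to-many rule might look like
the sketch below.  The package name is hypothetical, and nothing here
shows how indian--puthash-char would generate such rules or how the
DETERMINISTIC argument would interact with them:

```elisp
(require 'quail)

;; Hypothetical demo package: one key sequence, several candidates.
(quail-define-package
 "tamil-candidates-sketch" "Tamil" "sketch" t
 "Throwaway package illustrating one-to-many translations.")

(quail-define-rules
 ;; Each element of the vector is one candidate for the key sequence.
 ("l"  ["ல்" "ள்"])
 ("la" ["ல" "ள"]))
```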

I think adding an optional NAME argument to tamil--update-quail-rules
might be more flexible since then a user could let-bind the relevant
defcustoms to define other Tamil input methods without hassle (like the
tamil99 layout, which I plan to get to at Some Point™). WDYT?

The code for tamil--update-quail-rules is sort of convoluted because of
the conversion mentioned above. tamil--make-trans-table is also kind of
complicated because,

1. I couldn't make the tamil-vowel-translation (and consonant, and
   misc) alist have a character key since the Customize interface
   shows those characters as numbers!!  I really do not want to dig
   into the Customize UI code, sorry. :(

2. indian-tml-base-table has the character க in it but the defcustom
   tamil-consonant-translation has the character க் in it because the
   latter makes more sense to a native speaker and also because of
   (1) above.  More explanation as to why in footnote [2].

There are some FIXMEs spattered in the code but I will get to them in a
later revision. I also don't have a :set function for the defcustoms
since I'm not sure if something along the following is the only way to
automagically recalculate the quail rules:

    (defun tamil--set-variable (sym val)
      (set-default sym val)
      (when (and (boundp 'tamil-vowel-translation)
                 (boundp 'tamil-consonant-translation)
                 (boundp 'tamil-misc-translation)
                 (boundp 'tamil-native-digits))
        (tamil--update-quail-rules)))
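
A minimal sketch of how a setter of this shape might be attached to the
options.  Every `demo-tamil-' name is a hypothetical stand-in for the
corresponding name in the patch, and the rebuild function only counts
its calls so the flow is visible.  The boundp guard is needed because,
by default, defcustom calls the :set function while each option is
being initialised, that is, before the later options exist:

```elisp
(defgroup demo-tamil nil
  "Hypothetical group for this sketch."
  :group 'i18n)

(defvar demo-tamil--rebuilds 0
  "Number of times the stand-in rules were rebuilt.")

(defun demo-tamil--update-rules ()
  "Stand-in for the real rule update; only counts its calls."
  (setq demo-tamil--rebuilds (1+ demo-tamil--rebuilds)))

(defun demo-tamil--set-variable (sym val)
  "Set SYM to VAL, then rebuild once every option is defined."
  (set-default sym val)
  (when (and (boundp 'demo-tamil-vowel-translation)
             (boundp 'demo-tamil-consonant-translation))
    (demo-tamil--update-rules)))

(defcustom demo-tamil-vowel-translation '(("அ" "a"))
  "Hypothetical alist mapping a vowel to its key sequences."
  :type '(alist :key-type string :value-type (repeat string))
  :set #'demo-tamil--set-variable
  :group 'demo-tamil)

(defcustom demo-tamil-consonant-translation '(("க்" "k"))
  "Hypothetical alist mapping a consonant to its key sequences."
  :type '(alist :key-type string :value-type (repeat string))
  :set #'demo-tamil--set-variable
  :group 'demo-tamil)
```

In this sketch the rebuild runs once while the options are being
defined, and again whenever one of them is changed through Customize or
`customize-set-variable'.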

Comments on this, and general code review would be much appreciated.
I don't think I have missed anything and if you want me to add more
comments on some of the stuff, please do tell. Thanks.

If Tamil speakers are reading this bug report, shout at me if you want
something else and if you have other general comments. Or if I made an
embarrassing typo somewhere. Thanks!

[0001-Fix-fallout-from-bug-50143.patch (text/x-diff, attachment)]

[0002-Add-new-customizable-phonetic-Tamil-input-method.patch (text/x-diff, attachment)]

[Message part 4 (text/plain, inline)]


---

Footnotes:

1. The itrans input method is absolutely horrible for Tamil since, unlike
   the other Indic languages, Tamil doesn't have a lot of consonants.
   HOWEVER, the sound of a consonant _changes_ depending on where it
   ends up in a word.  So ideally, the Tamil input method should allow
   multiple _ways_ to insert a single character.  As an example,
   consider the following words

        தும்பிக்கை - thumbikai            (tusk)
        படம் - padam                      (photograph/image)

    The consonant of interest is "ப".  In the first word, the letter
    "பி" is pronounced "bi" as in "bicycle"; in the second word,
    however, the letter "ப" is pronounced "pa" as in "party".  This is
    just one of many examples.

    There are also pairs of very similar sounding consonants and when
    transliterated (when you type in "Tanglish" for example), all the
    characters in the pair use the same letter.  E.g., such a pair is
    the ல/ள family; when one casually chats in "Tanglish", we just type
    "lXX" as the transliteration for that family.  Obviously, when one
    is typing in _Tamil_, he/she needs to distinguish between these two
    characters.  Leaving the choice of input sequence to transliterate
    these characters to the writer is much better.  For more, please
    read the wordpress article I linked, thanks.

2. Opting to not go for character key in tamil-consonant-translation
   because of the Customize interface is only part of the reason.

   Having the key be TAMIL LETTER XXX + TAMIL SIGN VIRAMA is much more
   intuitive for the native speaker.  Take பு for example, the way you
   break it down into consonant and vowel is

        ப் + உ = பு
        (ippu + u = pu)

   and NOT

        ப + உ = பு
        (pa + u = pu)
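
   The breakdown above can be sketched in code.  This is only an
   illustration with hypothetical `demo-tamil-' helpers; how the patch
   itself converts between the two forms is not shown in this message.
   It assumes the consonant is written as letter + virama and the vowel
   is supplied as its dependent sign:

```elisp
(defconst demo-tamil-virama #x0BCD
  "Code point of TAMIL SIGN VIRAMA.")

(defun demo-tamil-strip-virama (consonant)
  "Return CONSONANT without a trailing virama, if it has one."
  (if (and (> (length consonant) 1)
           (= (aref consonant (1- (length consonant)))
              demo-tamil-virama))
      (substring consonant 0 -1)
    consonant))

(defun demo-tamil-join (consonant vowel-sign)
  "Join CONSONANT (letter + virama) with the dependent VOWEL-SIGN."
  (concat (demo-tamil-strip-virama consonant) vowel-sign))

;; ப் joined with TAMIL VOWEL SIGN U (U+0BC1, the dependent form of உ).
(demo-tamil-join "ப்" (string #x0BC1))
```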

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#56323; Package emacs. (Thu, 30 Jun 2022 14:10:01 GMT) Full text and rfc822 format available.

Message #8 received at 56323 <at> debbugs.gnu.org (full text, mbox):

From: Visuwesh <visuweshm <at> gmail.com>
To: 56323 <at> debbugs.gnu.org
Subject: Re: bug#56323: 29.0.50; Add new customisable phonetic Tamil input
 method
Date: Thu, 30 Jun 2022 19:38:49 +0530

[Message part 1 (text/plain, inline)]

[வியாழன் ஜூன் 30, 2022] Visuwesh wrote:

> Tags: patch
>
> The attached patchset adds a new customisable phonetic Tamil input
> method.  I tried to reuse as much of the existing itrans input method
> code since it greatly simplifies the creation of an Indic input method
> (see `indian-make-hash').
>
> The first patch fixes a fallout from bug#50143 asking to add TAMIL OM ௐ
> to the itrans table, and this means that one can insert the TAMIL OM
> character using the tamil-itrans input methods as well.  I'd prefer it
> if this patch can be pushed quickly.

This should be better:

[0001-Fix-fallout-from-bug-50143.patch (text/x-diff, attachment)]

[Message part 3 (text/plain, inline)]

[ Ref. https://www.aczoom.com/itrans/online/; insert "sh" and compare
  the character that shows up in the Sanskrit panel and the Tamil panel
  (you have to change the language in another panel).  ]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#56323; Package emacs. (Thu, 30 Jun 2022 15:55:01 GMT) Full text and rfc822 format available.

Message #11 received at 56323 <at> debbugs.gnu.org (full text, mbox):

From: Visuwesh <visuweshm <at> gmail.com>
To: 56323 <at> debbugs.gnu.org
Subject: Re: bug#56323: 29.0.50; Add new customisable phonetic Tamil input
 method
Date: Thu, 30 Jun 2022 21:23:48 +0530

[வியாழன் ஜூன் 30, 2022] Visuwesh wrote:

> 1. The itrans input method is absolutely horrible for Tamil since unlike
>    the other Indic languages, it doesn't have a lot of consonants
>    HOWEVER, the consonant sound _changes_ depending on where it ends up.
>    So ideally, the Tamil input method should allow multiple _ways_ to
>    insert a single character.  As an example, consider the following
>    words
>
>         தும்பிக்கை - thumbikai            (tusk)
                                              ^^^^^
                                              I meant trunk, ofc.
As is usual, I keep messing up translations.

Severity set to 'wishlist' from 'normal' Request was from Stefan Kangas <stefan <at> marxist.se> to control <at> debbugs.gnu.org. (Thu, 30 Jun 2022 20:53:02 GMT) Full text and rfc822 format available.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#56323; Package emacs. (Fri, 01 Jul 2022 13:00:02 GMT) Full text and rfc822 format available.

Message #16 received at 56323 <at> debbugs.gnu.org (full text, mbox):

From: Visuwesh <visuweshm <at> gmail.com>
To: 56323 <at> debbugs.gnu.org
Subject: Re: bug#56323: 29.0.50; [v2] Add new customisable phonetic Tamil
 input method
Date: Fri, 01 Jul 2022 18:29:00 +0530

[Message part 1 (text/plain, inline)]

[வியாழன் ஜூன் 30, 2022] Visuwesh wrote:

> The second patch actually adds the new phonetic input method.  I will
> leave the rationale for making it a _customisable_ input method in
> footnote [1].  To reuse the existing code that calculates the various
> tables for the tamil-itrans IM, I turned the code in defvars to defuns.
> However, the definition of the almighty
> quail-tamil-itrans-syllable-table is still huge since I needed to do a
> whole lot to convert the indian-tml-base-table to a format that will
> be accepted by the new defun `quail-tamil-itrans-compute-syllable-table'.
> [blah blah blah...]

Here's a second revision of the second patch.

[0001-Add-new-customizable-phonetic-Tamil-input-method.patch (text/x-diff, attachment)]

[Message part 3 (text/plain, inline)]

I still haven't added a :set function yet since I'm not sure if there's
a way to avoid the chain of boundp checks.

In this revision, I simplified the code a tiny bit wrt calculating the
translation table since I no longer use the indian-make-hash function
but call whatever functions it calls directly in
tamil--update-quail-rules: this greatly reduces the amount of massaging
that needs to be done.

Also, can someone guide me to write a sort function for
quail-tamil-itrans-compute-syllable-table please?  The ideal order of
consonants should be the same as the one in the default value of
tamil-consonant-translation, same for tamil-vowel-translation.  I tried
the following

    (sort (reverse (mapcar #'car tamil-consonant-translation))
          (lambda (x y) (let ((lx (length x))
                              (ly (length y)))
                           (if (= lx ly) (string-lessp x y) (< lx ly)))))


but that definitely doesn't do what I want.  The idea was to sort the
list so that the basic consonants (க் ங் ச் etc.) come first, then the composite
ones (க்‌ஷ் க்ஷ் etc.) but `string-lessp' does not even sort the basic
consonants in the right order (the right order being the order in the
default value of `tamil-consonant-translation').

Can I use the min-width property in buffer text?  I'm not sure if it was
finished since I remember some discussion surrounding that it wasn't
quite finished yet.  I would like to try to use it for syllable table
and friends.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#56323; Package emacs. (Fri, 01 Jul 2022 13:03:02 GMT) Full text and rfc822 format available.

Message #19 received at 56323 <at> debbugs.gnu.org (full text, mbox):

From: Visuwesh <visuweshm <at> gmail.com>
To: 56323 <at> debbugs.gnu.org
Subject: Re: bug#56323: 29.0.50; [v2] Add new customisable phonetic Tamil
 input method
Date: Fri, 01 Jul 2022 18:31:50 +0530

[Message part 1 (text/plain, inline)]

[வெள்ளி ஜூலை 01, 2022] Visuwesh wrote:

> [வியாழன் ஜூன் 30, 2022] Visuwesh wrote:
>
>> The second patch actually adds the new phonetic input method.  I will
>> leave the rationale for making it a _customisable_ input method in
>> footnote [1].  To reuse the existing code that calculates the various
>> tables for the tamil-itrans IM, I turned the code in defvars to defuns.
>> However, the definition of the almighty
>> quail-tamil-itrans-syllable-table is still huge since I needed to do a
>> whole lot to convert the indian-tml-base-table to a format that will
>> be accepted by the new defun `quail-tamil-itrans-compute-syllable-table'.
>> [blah blah blah...]
>
> Here's a second revision of the second patch.
>
Here's a corrected patch with a really silly oversight fixed:

[0001-Add-new-customizable-phonetic-Tamil-input-method.patch (text/x-diff, attachment)]

[Message part 3 (text/plain, inline)]

Sorry for the noise.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#56323; Package emacs. (Fri, 01 Jul 2022 13:23:01 GMT) Full text and rfc822 format available.

Message #22 received at 56323 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Visuwesh <visuweshm <at> gmail.com>
Cc: 56323 <at> debbugs.gnu.org
Subject: Re: bug#56323: 29.0.50;
 [v2] Add new customisable phonetic Tamil input method
Date: Fri, 01 Jul 2022 16:22:36 +0300

> From: Visuwesh <visuweshm <at> gmail.com>
> Date: Fri, 01 Jul 2022 18:29:00 +0530
> 
> Also, can someone guide me to write a sort function for
> quail-tamil-itrans-compute-syllable-table please?  The ideal order of
> consonants should be the same as the one in the default value of
> tamil-consonant-translation, same for tamil-vowel-translation.  I tried
> the following
> 
>     (sort (reverse (mapcar #'car tamil-consonant-translation))
>           (lambda (x y) (let ((lx (length x))
>                               (ly (length y)))
>                            (if (= lx ly) (string-lessp x y) (< lx ly)))))
> 
> 
> but that definitely doesn't do what I want.  The idea was to sort the
> list so that the basic consonants (க் ங் ச் etc.) first then the composite
> ones (க்‌ஷ் க்ஷ் etc.) but `string-lessp' does not even sort the basic
> consonants in the right order (the right order being the order in the
> default value of `tamil-consonant-translation').

Then you'll need to write your own comparison function and use it
instead of string-lessp.

> Can I use the min-width property in buffer text?

Why do you need that?  Please tell more about what you want to
accomplish.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#56323; Package emacs. (Fri, 01 Jul 2022 13:48:02 GMT) Full text and rfc822 format available.

Message #25 received at 56323 <at> debbugs.gnu.org (full text, mbox):

From: Visuwesh <visuweshm <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 56323 <at> debbugs.gnu.org
Subject: Re: bug#56323: 29.0.50; [v2] Add new customisable phonetic Tamil
 input method
Date: Fri, 01 Jul 2022 19:17:18 +0530

[Message part 1 (text/plain, inline)]

[வெள்ளி ஜூலை 01, 2022] Eli Zaretskii wrote:

>> From: Visuwesh <visuweshm <at> gmail.com>
>> Date: Fri, 01 Jul 2022 18:29:00 +0530
>> 
>> Also, can someone guide me to write a sort function for
>> quail-tamil-itrans-compute-syllable-table please?  The ideal order of
>> consonants should be the same as the one in the default value of
>> tamil-consonant-translation, same for tamil-vowel-translation.  I tried
>> the following
>> 
>>     (sort (reverse (mapcar #'car tamil-consonant-translation))
>>           (lambda (x y) (let ((lx (length x))
>>                               (ly (length y)))
>>                            (if (= lx ly) (string-lessp x y) (< lx ly)))))
>> 
>> 
>> but that definitely doesn't do what I want.  The idea was to sort the
>> list so that the basic consonants (க் ங் ச் etc.) first then the composite
>> ones (க்‌ஷ் க்ஷ் etc.) but `string-lessp' does not even sort the basic
>> consonants in the right order (the right order being the order in the
>> default value of `tamil-consonant-translation').
>
> Then you'll need to write your own comparison function and use it
> instead of string-lessp.
>

I suppose so.  How does the following look?

    (sort
     '("க்" "ங்" "ச்" "ஞ்" "ட்" "ண்" "ற்ற்" "ந்" "ப்" "ய்"
       "ம்" "த்" "ர்" "ல்" "வ்" "ள்" "ற்" "ழ்" "ன்"
       "ஸ்" "ஜ்" "க்ஷ்" "ஷ்" "ஹ்" "க்‌ஷ்" "ஶ்")
     (lambda (x y)
       (let* ((cp '(("க்" . 0) ("ங்" . 1) ("ச்" . 2) ("ஞ்" . 3) ("ட்" . 4) ("ண்" . 5)
                    ("த்" . 6) ("ந்" . 7) ("ப்" . 8) ("ம்" . 9) ("ய்" . 10) ("ர்" . 11)
                    ("ல்" . 12) ("வ்" . 13) ("ழ்" . 14) ("ள்" . 15) ("ற்" . 16) ("ன்" . 17)
                    ("ஜ்" . 18) ("ஸ்" . 19) ("ஷ்" . 20) ("ஹ்" . 21) ("க்ஷ்" . 22)
                    ("க்‌ஷ்" . 23) ("ஶ்" . 24)))
              (xp (or (assoc-default x cp nil) 10000))
              (yp (or (assoc-default y cp nil) 10000)))
         (< xp yp))))

[ I won't have the unnecessary let in the final version.  ]
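
A possible variant of the same weighted approach, as a sketch only: the
ranks are derived from a reference list instead of being typed by hand,
and the lookup table is built once rather than inside the predicate.
`demo-sort-by-reference' is a hypothetical helper name, not something in
the patch:

```elisp
(defun demo-sort-by-reference (items reference)
  "Return a sorted copy of ITEMS following the order of REFERENCE.
Items missing from REFERENCE keep their relative order at the end."
  (let ((rank (make-hash-table :test #'equal))
        (i 0))
    ;; Record the position of every reference element once.
    (dolist (r reference)
      (puthash r i rank)
      (setq i (1+ i)))
    ;; `sort' is stable, so unknown items stay in their input order.
    (sort (copy-sequence items)
          (lambda (x y)
            (< (gethash x rank most-positive-fixnum)
               (gethash y rank most-positive-fixnum))))))

(demo-sort-by-reference '("ஜ்" "ஞ்" "ற்ற்" "க்")
                        '("க்" "ங்" "ச்" "ஞ்" "ஜ்"))
```

Since the desired order is the one in the default value of
tamil-consonant-translation, the reference list could presumably be
taken from the keys of that alist.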

>> Can I use the min-width property in buffer text?
>
> Why do you need that?  Please tell more about what you want to
> accomplish.

Currently we don't try too hard to ensure that pieces of text don't bump
into each other in the tables we calculate.  If you are unlucky, then the
table will be incomprehensible so I thought about putting a reasonable
min-width value on the text in the signs table at least.  Of course,
finding a reasonable value is a headache in and of itself; the better
solution would probably be pulling in the vtable library but I'm not too
sure about that.

I also attached a screenshot comparing my running Emacs session and
emacs -Q (yellow window is my current Emacs session) to get the point
across better.

[screenshot_202207011914.png (image/png, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#56323; Package emacs. (Fri, 01 Jul 2022 14:07:01 GMT) Full text and rfc822 format available.

Message #28 received at 56323 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Visuwesh <visuweshm <at> gmail.com>
Cc: 56323 <at> debbugs.gnu.org
Subject: Re: bug#56323: 29.0.50; [v2] Add new customisable phonetic Tamil
 input method
Date: Fri, 01 Jul 2022 17:06:36 +0300

> From: Visuwesh <visuweshm <at> gmail.com>
> Cc: 56323 <at> debbugs.gnu.org
> Date: Fri, 01 Jul 2022 19:17:18 +0530
> 
> > Then you'll need to write your own comparison function and use it
> > instead of string-lessp.
> >
> 
> I suppose so.  How does the following look?
> 
>     (sort
>      '("க்" "ங்" "ச்" "ஞ்" "ட்" "ண்" "ற்ற்" "ந்" "ப்" "ய்"
>        "ம்" "த்" "ர்" "ல்" "வ்" "ள்" "ற்" "ழ்" "ன்"
>        "ஸ்" "ஜ்" "க்ஷ்" "ஷ்" "ஹ்" "க்‌ஷ்" "ஶ்")
>      (lambda (x y)
>        (let* ((cp '(("க்" . 0) ("ங்" . 1) ("ச்" . 2) ("ஞ்" . 3) ("ட்" . 4) ("ண்" . 5)
>                     ("த்" . 6) ("ந்" . 7) ("ப்" . 8) ("ம்" . 9) ("ய்" . 10) ("ர்" . 11)
>                     ("ல்" . 12) ("வ்" . 13) ("ழ்" . 14) ("ள்" . 15) ("ற்" . 16) ("ன்" . 17)
>                     ("ஜ்" . 18) ("ஸ்" . 19) ("ஷ்" . 20) ("ஹ்" . 21) ("க்ஷ்" . 22)
>                     ("க்‌ஷ்" . 23) ("ஶ்" . 24)))
>               (xp (or (assoc-default x cp nil) 10000))
>               (yp (or (assoc-default y cp nil) 10000)))
>          (< xp yp))))

I don't think I understand what you want to achieve, and don't read
Tamil in the first place, to tell you whether this is correct or not,
sorry.

> >> Can I use the min-width property in buffer text?
> >
> > Why do you need that?  Please tell more about what you want to
> > accomplish.
> 
> Currently we don't try too hard to ensure that text don't bump into each
> other in the tables we calculate.  If you are unlucky, then the table
> will be incomprehensible so I thought about putting a reasonable
> min-width value on the text in signs table at least.  Of course, finding
> a reasonable value is a headache in of itself; the better solution would
> be probably pulling in the vtable library but I'm not too sure about
> that.

I think it would be better to be more accurate in alignment of table
cells.  We do have string-width and string-pixel-width, let alone
window-text-pixel-size.

> I also attached a screenshot comparing my running Emacs session and
> emacs -Q (yellow window is my current Emacs session) to get the point
> across better.

Looks like simple misalignment to me, which should be cured by using
pixel-resolution alignment features.
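
As an illustration of what such alignment could look like (one possible
reading of the suggestion, with hypothetical `demo-' helper names): cells
can be padded by display width with string-width, or followed by a
stretch of space that ends at a fixed column via the `:align-to' display
specification, which redisplay resolves regardless of how wide the
preceding glyphs turn out to be.

```elisp
(defun demo-pad-to-width (string width)
  "Pad STRING with spaces until it occupies WIDTH columns."
  (concat string
          (make-string (max 0 (- width (string-width string))) ?\s)))

(defun demo-align-to (column)
  "Return a stretch of space that ends at COLUMN when displayed."
  (propertize " " 'display `(space :align-to ,column)))

(defun demo-table-row (cells column-width)
  "Return CELLS as one string, each cell starting COLUMN-WIDTH apart."
  (let ((col 0)
        (parts nil))
    (dolist (cell cells)
      (setq col (+ col column-width))
      (push cell parts)
      (push (demo-align-to col) parts))
    (apply #'concat (nreverse parts))))

(demo-table-row '("க்" "ங்" "ச்") 8)
```

The first helper depends on the width tables being right for the font in
use; the second leaves the measuring to the display engine.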

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#56323; Package emacs. (Fri, 01 Jul 2022 14:31:02 GMT) Full text and rfc822 format available.

Message #31 received at 56323 <at> debbugs.gnu.org (full text, mbox):

From: Visuwesh <visuweshm <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 56323 <at> debbugs.gnu.org
Subject: Re: bug#56323: 29.0.50; [v2] Add new customisable phonetic Tamil
 input method
Date: Fri, 01 Jul 2022 20:00:03 +0530

[வெள்ளி ஜூலை 01, 2022] Eli Zaretskii wrote:

>> From: Visuwesh <visuweshm <at> gmail.com>
>> Cc: 56323 <at> debbugs.gnu.org
>> Date: Fri, 01 Jul 2022 19:17:18 +0530
>> 
>> > Then you'll need to write your own comparison function and use it
>> > instead of string-lessp.
>> >
>> 
>> I suppose so.  How does the following look?
>> 
>>     (sort
>>      '("க்" "ங்" "ச்" "ஞ்" "ட்" "ண்" "ற்ற்" "ந்" "ப்" "ய்"
>>        "ம்" "த்" "ர்" "ல்" "வ்" "ள்" "ற்" "ழ்" "ன்"
>>        "ஸ்" "ஜ்" "க்ஷ்" "ஷ்" "ஹ்" "க்‌ஷ்" "ஶ்")
>>      (lambda (x y)
>>        (let* ((cp '(("க்" . 0) ("ங்" . 1) ("ச்" . 2) ("ஞ்" . 3) ("ட்" . 4) ("ண்" . 5)
>>                     ("த்" . 6) ("ந்" . 7) ("ப்" . 8) ("ம்" . 9) ("ய்" . 10) ("ர்" . 11)
>>                     ("ல்" . 12) ("வ்" . 13) ("ழ்" . 14) ("ள்" . 15) ("ற்" . 16) ("ன்" . 17)
>>                     ("ஜ்" . 18) ("ஸ்" . 19) ("ஷ்" . 20) ("ஹ்" . 21) ("க்ஷ்" . 22)
>>                     ("க்‌ஷ்" . 23) ("ஶ்" . 24)))
>>               (xp (or (assoc-default x cp nil) 10000))
>>               (yp (or (assoc-default y cp nil) 10000)))
>>          (< xp yp))))
>
> I don't think I understand what you want to achieve, and don't read
> Tamil in the first place, to tell you whether this is correct or not,
> sorry.
>

I mostly meant to ask if the weighted approach was good but I wasn't
clear enough, sorry.  Let me try to explain it better:

Let's suppose that string-lessp does not work for English for the
discussion here.  The task is to sort a jumbled list of English
letters in alphabetical order.  What I'm currently doing is creating
an alist where the key is the letter and the value is the letter's
position in the alphabet (so a will be 0, b will be 1, etc.).  Then in
the sort function, I look up this position.  If the letter is not in
the alist, then I fall back to a large number.

So the code above would look like this if it were in English,

    (sort '("b" "z" "c" "n" "a" "aa" "p")
          (lambda (x y)
            (let ((cp '(("a" . 0) ("b" . 1) ("c" . 2) ("d" . 3) ("e" . 4)
                        ("f" . 5) ("g" . 6) ("h" . 7) ("i" . 8) ("j" . 9)
                        ("k" . 10) ("l" . 11) ("m" . 12) ("n" . 13) ("o" . 14)
                        ("p" . 15) ("q" . 16) ("r" . 17) ("s" . 18) ("t" . 19)
                        ("u" . 20) ("v" . 21) ("w" . 22) ("x" . 23) ("y" . 24)
                        ("z" . 25))))
              (< (or (assoc-default x cp) 10000)
                 (or (assoc-default y cp) 10000)))))

and the sorted list comes out as ("a" "b" "c" "n" "p" "z" "aa")
which is exactly what I desire.  I hope this is clear enough.

Obviously, I don't have much programming experience, so I'm unsure if
there's a better way to sort.

>> >> Can I use the min-width property in buffer text?
>> >
>> > Why do you need that?  Please tell more about what you want to
>> > accomplish.
>> 
>> Currently we don't try too hard to ensure that text don't bump into each
>> other in the tables we calculate.  If you are unlucky, then the table
>> will be incomprehensible so I thought about putting a reasonable
>> min-width value on the text in signs table at least.  Of course, finding
>> a reasonable value is a headache in of itself; the better solution would
>> be probably pulling in the vtable library but I'm not too sure about
>> that.
>
> I think it would be better to be more accurate in alignment of table
> cells.  We do have string-width and string-pixel-width, let alone
> window-text-pixel-size.
>
>> I also attached a screenshot comparing my running Emacs session and
>> emacs -Q (yellow window is my current Emacs session) to get the point
>> across better.
>
> Looks like simple misalignment to me, which should be cured by using
> pixel-resolution alignment features.

Yep, it is misalignment.  I could try to use those pixel-resolution
alignment features but I really don't think I can do a good enough job.
It is something I tried in the past but gave up since it was too complex
for me.  The current code produces a Good Enough™ table and I think I
will just leave it unless Someone™ complains since after all, the
current situation is much better than what we have in Emacs 28 (the
docfix that happened as part of bug#50143 isn't in Emacs 28).

Maybe someday, I will be annoyed enough at the misalignment to come back
and fix it.  But until that day, I will just leave the code as is.

BTW, do you have any other code/documentation review?  And what about
the patch I posted in https://lists.gnu.org/archive/html/bug-gnu-emacs/2022-06/msg02256.html?
No rush but I would like to know if it can go in since it only addresses
fallouts from the previous bug in this area.  Thanks.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#56323; Package emacs. (Fri, 01 Jul 2022 16:10:02 GMT) Full text and rfc822 format available.

Message #34 received at 56323 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Visuwesh <visuweshm <at> gmail.com>
Cc: 56323 <at> debbugs.gnu.org
Subject: Re: bug#56323: 29.0.50; [v2] Add new customisable phonetic Tamil
 input method
Date: Fri, 01 Jul 2022 19:09:36 +0300

> From: Visuwesh <visuweshm <at> gmail.com>
> Cc: 56323 <at> debbugs.gnu.org
> Date: Fri, 01 Jul 2022 20:00:03 +0530
> 
> > I don't think I understand what you want to achieve, and don't read
> > Tamil in the first place, to tell you whether this is correct or not,
> > sorry.
> >
> 
> I mostly meant to ask if the weighted approach was good but I wasn't
> clear enough, sorry.  Let me try to explain it better:
> 
> Let's suppose that string-lessp does not work for English for the
> discussion here.  The task is to sort a list of jumbled English
> alphabets in alphabetical order.  What I'm currently doing is creating
> an alist where the key is the alphabet and the value is the alphabet's
> order (so a will be 1, b will be 2, etc.).  Then in the sort function, I
> look for this order.  If the alphabet is not in this list, then I fall
> back to a large number.
> 
> So the code above would look like this if it were in English,
> 
>     (sort '("b" "z" "c" "n" "a" "aa" "p")
>           (lambda (x y)
>             (let ((cp '(("a" . 0) ("b" . 1) ("c" . 2) ("d" . 3) ("e" . 4)
>                         ("f" . 5) ("g" . 6) ("h" . 7) ("i" . 8) ("j" . 9)
>                         ("k" . 10) ("l" . 11) ("m" . 12) ("n" . 13) ("o" . 14)
>                         ("p" . 15) ("q" . 16) ("r" . 17) ("s" . 18) ("t" . 19)
>                         ("u" . 20) ("v" . 21) ("w" . 22) ("x" . 23) ("y" . 24)
>                         ("z" . 25))))
>               (< (or (assoc-default x cp) 10000)
>                  (or (assoc-default y cp) 10000)))))
> 
> and the sorted list comes out as ("a" "b" "c" "n" "p" "z" "aa")
> which is exactly what I desire.  I hope this is clear enough.

The above just gives each letter its order in the alphabet.  But if
that is what you wanted, string-lessp (or even just direct comparison
of characters) would have worked for you.  So there's still something
important missing from your description, I think.

> > Looks like simple misalignment to me, which should be cured by using
> > pixel-resolution alignment features.
> 
> Yep, it is misalignment.  I could try to use those pixel-resolution
> alignment features but I really don't think I can do a good enough job.
> It is something I tried in the past but gave up since it was too complex
> for me.  The current code produces a Good Enough™ table and I think I
> will just leave it unless Someone™ complains since after all, the
> current situation is much better than what we have in Emacs 28 (the
> docfix that happened as part of bug#50143 isn't in Emacs 28).

I thought vtable.el was about solving such problems?

> BTW, do you have any other code/documentation review?  And what about
> the patch I posted in https://lists.gnu.org/archive/html/bug-gnu-emacs/2022-06/msg02256.html?
> No rush but I would like to know if it can go in since it only addresses
> fallouts from the previous bug in this area.  Thanks.

It sounded to me like you are still working on the code, so I didn't
see a need to review it.  If you have specific parts that you'd like
me to review nonetheless, please tell which parts are those.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#56323; Package emacs. (Fri, 01 Jul 2022 16:38:02 GMT) Full text and rfc822 format available.

Message #37 received at 56323 <at> debbugs.gnu.org (full text, mbox):

From: Visuwesh <visuweshm <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 56323 <at> debbugs.gnu.org
Subject: Re: bug#56323: 29.0.50; [v2] Add new customisable phonetic Tamil
 input method
Date: Fri, 01 Jul 2022 22:07:38 +0530

[வெள்ளி ஜூலை 01, 2022] Eli Zaretskii wrote:

>> I mostly meant to ask if the weighted approach was good but I wasn't
>> clear enough, sorry.  Let me try to explain it better:
>> 
>> Let's suppose that string-lessp does not work for English for the
>> discussion here.  The task is to sort a list of jumbled English
>> alphabets in alphabetical order.  What I'm currently doing is creating
>> an alist where the key is the alphabet and the value is the alphabet's
>> order (so a will be 1, b will be 2, etc.).  Then in the sort function, I
>> look for this order.  If the alphabet is not in this list, then I fall
>> back to a large number.
>> 
>> So the code above would look like this if it were in English,
>> 
>>     (sort '("b" "z" "c" "n" "a" "aa" "p")
>>           (lambda (x y)
>>             (let ((cp '(("a" . 0) ("b" . 1) ("c" . 2) ("d" . 3) ("e" . 4)
>>                         ("f" . 5) ("g" . 6) ("h" . 7) ("i" . 8) ("j" . 9)
>>                         ("k" . 10) ("l" . 11) ("m" . 12) ("n" . 13) ("o" . 14)
>>                         ("p" . 15) ("q" . 16) ("r" . 17) ("s" . 18) ("t" . 19)
>>                         ("u" . 20) ("v" . 21) ("w" . 22) ("x" . 23) ("y" . 24)
>>                         ("z" . 25))))
>>               (< (or (assoc-default x cp) 10000)
>>                  (or (assoc-default y cp) 10000)))))
>> 
>> and the sorted list comes out as ("a" "b" "c" "n" "p" "z" "aa")
>> which is exactly what I desire.  I hope this is clear enough.
>
> The above just gives each letter its order in the alphabet.  But if
> that is what you wanted, string-lessp (or even just direct comparison
> of characters) would have worked for you.  So there's still something
> important missing from your description, I think.
>

Unfortunately, string-lessp does not do the job.  (string-lessp "ஞ" "ஜ")
should return t but it returns nil, probably because ஞ's codepoint is
2974 and ஜ's codepoint is 2972.  But ஜ is not even one of the "core"
Tamil characters and hence should come last.  This is why I went with
defining an alist with the _actual_ order of the characters.  To
demonstrate this using English, it would be something like...

    c's codepoint is 29 and d's codepoint is 27.  Clearly, c comes
    before d but since string-lessp seems to rely on the Unicode
    codepoint, when we do the sorting with string-lessp, we get 
    "... d c ..." in the list instead of the desired "... c d ...".

I hope this is clear.
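
As a minimal sketch of the weighted approach applied to Tamil itself
(the names my-tamil-core-consonants, my-tamil-rank and my-tamil-lessp
are hypothetical and not part of the patch), a letter's weight can be
taken as its position in a hand-written list of the core consonants,
with anything else, such as ஜ, falling back to a large number:

```elisp
(require 'seq)

;; Hypothetical names, for illustration only.
(defconst my-tamil-core-consonants
  '("க" "ங" "ச" "ஞ" "ட" "ண" "த" "ந" "ப"
    "ம" "ய" "ர" "ல" "வ" "ழ" "ள" "ற" "ன")
  "Core Tamil consonants in their traditional order.")

(defun my-tamil-rank (letter)
  "Return LETTER's position in the core list, or a large number."
  (or (seq-position my-tamil-core-consonants letter)
      most-positive-fixnum))

(defun my-tamil-lessp (a b)
  "Return non-nil if A should sort before B."
  (< (my-tamil-rank a) (my-tamil-rank b)))

(sort (list "ஜ" "ன" "ஞ" "க") #'my-tamil-lessp)
;; => ("க" "ஞ" "ன" "ஜ")
```

Unlike string-lessp, this predicate puts ஞ before ஜ, because the order
comes from the list rather than from the codepoints.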

>> Yep, it is misalignment.  I could try to use those pixel-resolution
>> alignment features but I really don't think I can do a good enough job.
>> It is something I tried in the past but gave up since it was too complex
>> for me.  The current code produces a Good Enough™ table and I think I
>> will just leave it unless Someone™ complains since after all, the
>> current situation is much better than what we have in Emacs 28 (the
>> docfix that happened as part of bug#50143 isn't in Emacs 28).
>
> I thought vtable.el was about solving such problems?

Okay then, I will use that.  I was mostly unsure if using vtable would
be alright, especially since it puts keymap properties and the entire
vtable object as a text property -- it seemed excessive for a
docstring.  Maybe some of this can be addressed?

>> BTW, do you have any other code/documentation review?  And what about
>> the patch I posted in https://lists.gnu.org/archive/html/bug-gnu-emacs/2022-06/msg02256.html?
>> No rush but I would like to know if it can go in since it only addresses
>> fallouts from the previous bug in this area.  Thanks.
>
> It sounded to me like you are still working on the code, so I didn't
> see a need to review it.  If you have specific parts that you'd like
> me to review nonetheless, please tell which parts are those.

Thanks.  The patch I posted in
https://lists.gnu.org/archive/html/bug-gnu-emacs/2022-06/msg02256.html
is done, and can be pushed to master if you see no problems.  All it
does is address a few fallouts that were accidentally left out when
fixing bug#50143.  Specifically, it adds an entry for the TAMIL OM
character, and adds two more Sanskrit consonants to the Tamil itrans
table.

Also, I would like to know if there's a better way to write the :set
function for the defcustoms tamil-vowel-translation,
tamil-consonant-translation, tamil-misc-translation, tamil-native-digits
without the boundp check chain below,

    (defun tamil--set-variable (sym val)
      (set-default sym val)
      (when (and (boundp 'tamil-vowel-translation)
                 (boundp 'tamil-consonant-translation)
                 (boundp 'tamil-misc-translation)
                 (boundp 'tamil-native-digits))
        (tamil--update-quail-rules)))

I'm also doubtful about the current group being used for these
defcustoms.  Should I go ahead and make a new 'tamil' group and make it
a subgroup of leim or i18n?  And is the prefix tamil- okay or should I
change it to something else?

Finally, I'm unsure if "List of input sequences to translate to ..." is
clear.  I think it sounds like a mouthful and there should be a better way
to put it.  I think "translation rules" is quite nice but I'm afraid that
it is too Quail-specific and might not be well understood.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#56323; Package emacs. (Fri, 01 Jul 2022 18:17:02 GMT) Full text and rfc822 format available.

Message #40 received at 56323 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Visuwesh <visuweshm <at> gmail.com>
Cc: 56323 <at> debbugs.gnu.org
Subject: Re: bug#56323: 29.0.50; [v2] Add new customisable phonetic Tamil
 input method
Date: Fri, 01 Jul 2022 21:16:13 +0300

> From: Visuwesh <visuweshm <at> gmail.com>
> Cc: 56323 <at> debbugs.gnu.org
> Date: Fri, 01 Jul 2022 22:07:38 +0530
> 
> >>     (sort '("b" "z" "c" "n" "a" "aa" "p")
> >>           (lambda (x y)
> >>             (let ((cp '(("a" . 0) ("b" . 1) ("c" . 2) ("d" . 3) ("e" . 4)
> >>                         ("f" . 5) ("g" . 6) ("h" . 7) ("i" . 8) ("j" . 9)
> >>                         ("k" . 10) ("l" . 11) ("m" . 12) ("n" . 13) ("o" . 14)
> >>                         ("p" . 15) ("q" . 16) ("r" . 17) ("s" . 18) ("t" . 19)
> >>                         ("u" . 20) ("v" . 21) ("w" . 22) ("x" . 23) ("y" . 24)
> >>                         ("z" . 25))))
> >>               (< (or (assoc-default x cp) 10000)
> >>                  (or (assoc-default y cp) 10000)))))
> >> 
> >> and the sorted list comes out as ("a" "b" "c" "n" "p" "z" "aa")
> >> which is exactly what I desire.  I hope this is clear enough.
> >
> > The above just gives each letter its order in the alphabet.  But if
> > that is what you wanted, string-lessp (or even just direct comparison
> > of characters) would have worked for you.  So there's still something
> > important missing from your description, I think.
> >
> 
> Unfortunately, string-lessp does not do the job.  (string-lessp "ஞ" "ஜ")
> should return t but it returns nil probably because ஞ's codepoint is
> 2974 and ஜ's codepoint is 2972.  But ஜ is not even part of the "core"
> Tamil characters and hence should come at last.  This is why I went with
> defining an alist with the _actual_ order of the characters.

Please tell what is the actual order of the characters.  That is,
where is that order defined, and by what criteria?

I'll look into the other issues later.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#56323; Package emacs. (Sat, 02 Jul 2022 04:03:02 GMT) Full text and rfc822 format available.

Message #43 received at 56323 <at> debbugs.gnu.org (full text, mbox):

From: Visuwesh <visuweshm <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 56323 <at> debbugs.gnu.org
Subject: Re: bug#56323: 29.0.50; [v2] Add new customisable phonetic Tamil
 input method
Date: Sat, 02 Jul 2022 09:32:34 +0530

[வெள்ளி ஜூலை 01, 2022] Eli Zaretskii wrote:

>> Unfortunately, string-lessp does not do the job.  (string-lessp "ஞ" "ஜ")
>> should return t but it returns nil probably because ஞ's codepoint is
>> 2974 and ஜ's codepoint is 2972.  But ஜ is not even part of the "core"
>> Tamil characters and hence should come at last.  This is why I went with
>> defining an alist with the _actual_ order of the characters.
>
> Please tell what is the actual order of the characters.  That is,
> where is that order defined, and by what criteria?

I'm not sure what you mean by "where is that order defined"; I don't
think there is a definition per se, it just happens to be so.

There are two "classes" of consonants: those that are part of Tamil
(let's call them "core") and those borrowed from Sanskrit.  When one
writes the consonants in order, the core consonants come first then the
Sanskrit ones.  You can find the order of the core consonants on
Wikipedia, in the table titled "Tamil consonants":
https://en.wikipedia.org/wiki/Tamil_script#Letters

We need not worry too much about the order of Sanskrit consonants, we
just need to ensure that they come after the core consonants.  You can
find these Sanskrit consonants in the table titled "Grantha consonants
in Tamil" in the same link.

I hope this is clear.

As for the criteria, it is simply "Tamil consonants then the Sanskrit
consonants."

> I'll look into the other issues later.

Thanks.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#56323; Package emacs. (Sat, 02 Jul 2022 06:37:02 GMT) Full text and rfc822 format available.

Message #46 received at 56323 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Visuwesh <visuweshm <at> gmail.com>
Cc: 56323 <at> debbugs.gnu.org
Subject: Re: bug#56323: 29.0.50; [v2] Add new customisable phonetic Tamil
 input method
Date: Sat, 02 Jul 2022 09:35:56 +0300

> From: Visuwesh <visuweshm <at> gmail.com>
> Cc: 56323 <at> debbugs.gnu.org
> Date: Sat, 02 Jul 2022 09:32:34 +0530
> 
> > Please tell what is the actual order of the characters.  That is,
> > where is that order defined, and by what criteria?
> 
> I'm not sure what you mean "where is that order defined," I don't think
> there is a definition per se, it just happens to be so.
> 
> There are two "classes" of consonants: those that are part of Tamil
> (let's call them "core") and those borrowed from Sanskrit.  When one
> writes the consonants in order, the core consonants come first then the
> Sanskrit ones.  You can find the order of the core consonants in
> wikipedia here in the table titled "Tamil consonants":
> https://en.wikipedia.org/wiki/Tamil_script#Letters
> 
> We need not worry too much about the order of Sanskrit consonants, we
> just need to ensure that they come after the core consonants.  You can
> find these Sanskrit consonants in the table titled "Grantha consonants
> in Tamil" in the same link.
> 
> I hope this is clear.
> 
> As for the criteria, it is simply "Tamil consonants then the Sanskrit
> consonants."

Then your comparison function should first see whether a character is
in the former or the latter group, and use string-lessp or character
codepoint comparison with each group, right?  But that's not what you
did, so I wonder whether my understanding is correct.
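
One way to read this suggestion, as a minimal sketch: the predicate
first decides which group each letter belongs to, and only falls back
to string-lessp when both letters are in the same group.  The names
my-grantha-consonants and my-grouped-lessp are hypothetical, and the
list of borrowed consonants is an assumption made for illustration.

```elisp
;; Hypothetical names; the Grantha list below is assumed, not taken
;; from the patch.
(defconst my-grantha-consonants '("ஜ" "ஶ" "ஷ" "ஸ" "ஹ")
  "Consonants borrowed from Sanskrit (assumed list).")

(defun my-grouped-lessp (a b)
  "Return non-nil if A sorts before B: core letters first, then Grantha."
  (let ((a-grantha (member a my-grantha-consonants))
        (b-grantha (member b my-grantha-consonants)))
    (cond
     ((and a-grantha (not b-grantha)) nil) ; A is Grantha, B is core
     ((and b-grantha (not a-grantha)) t)   ; A is core, B is Grantha
     (t (string-lessp a b)))))             ; same group: codepoint order

(sort (list "ஹ" "ஜ" "ன" "க") #'my-grouped-lessp)
;; => ("க" "ன" "ஜ" "ஹ")
```

Within each group this still relies on codepoint order, so it gives the
desired result only if that order matches the traditional one.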

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#56323; Package emacs. (Sat, 02 Jul 2022 06:55:01 GMT) Full text and rfc822 format available.

Message #49 received at 56323 <at> debbugs.gnu.org (full text, mbox):

From: Visuwesh <visuweshm <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 56323 <at> debbugs.gnu.org
Subject: Re: bug#56323: 29.0.50; [v2] Add new customisable phonetic Tamil
 input method
Date: Sat, 02 Jul 2022 12:24:39 +0530

[சனி ஜூலை 02, 2022] Eli Zaretskii wrote:

>> From: Visuwesh <visuweshm <at> gmail.com>
>> Cc: 56323 <at> debbugs.gnu.org
>> Date: Sat, 02 Jul 2022 09:32:34 +0530
>> 
>> > Please tell what is the actual order of the characters.  That is,
>> > where is that order defined, and by what criteria?
>> 
>> I'm not sure what you mean "where is that order defined," I don't think
>> there is a definition per se, it just happens to be so.
>> 
>> There are two "classes" of consonants: those that are part of Tamil
>> (let's call them "core") and those borrowed from Sanskrit.  When one
>> writes the consonants in order, the core consonants come first then the
>> Sanskrit ones.  You can find the order of the core consonants in
>> wikipedia here in the table titled "Tamil consonants":
>> https://en.wikipedia.org/wiki/Tamil_script#Letters
>> 
>> We need not worry too much about the order of Sanskrit consonants, we
>> just need to ensure that they come after the core consonants.  You can
>> find these Sanskrit consonants in the table titled "Grantha consonants
>> in Tamil" in the same link.
>> 
>> I hope this is clear.
>> 
>> As for the criteria, it is simply "Tamil consonants then the Sanskrit
>> consonants."
>
> Then your comparison function should first see whether a character is
> in the former or the latter group, and use string-lessp or character
> codepoint comparison with each group, right?  But that's not what you
> did, so I wonder whether my understanding is correct.

It didn't occur to me to do it this way, so I tried it out, but then I
noticed that string-lessp won't work even within a group.  When you
evaluate the following sexp, you don't get a list of increasing numbers...

    (let ((core-consonants '("க" "ங" "ச" "ஞ" "ட" "ண" "த"
                             "ந" "ப" "ம" "ய" "ர" "ல"
                             "வ" "ழ" "ள" "ற" "ன")))
      (mapcar (lambda (c) (string-to-char c)) core-consonants))

      ;; => (2965 2969 2970 2974 2975 2979 2980 2984 2986 2990 2991 2992
      ;;     2994 2997 2996 2995 2993 2985)

and sure enough when you do (sort core-consonants #'string-lessp) the
list is jumbled up instead of retaining the order.
[ core-consonants, as declared, is in the right order but sort jumbles
  it up.  ]

But string-lessp works for vowels.  It is the consonants that are the
problem.
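
The difference between the two cases can be checked directly.  This is
a minimal sketch; the helper name my-codepoint-ordered-p is
hypothetical, and the vowel list is written out here as an assumption
for illustration.

```elisp
;; Hypothetical helper, for illustration only.
(defun my-codepoint-ordered-p (letters)
  "Return non-nil if LETTERS are in strictly increasing codepoint order.
LETTERS is a list of single-character strings."
  (apply #'< (mapcar #'string-to-char letters)))

;; Vowels: the traditional order agrees with the codepoint order.
(my-codepoint-ordered-p
 '("அ" "ஆ" "இ" "ஈ" "உ" "ஊ" "எ" "ஏ" "ஐ" "ஒ" "ஓ" "ஔ"))
;; => t

;; Core consonants: the tail of the list is out of codepoint order.
(my-codepoint-ordered-p
 '("க" "ங" "ச" "ஞ" "ட" "ண" "த" "ந" "ப"
   "ம" "ய" "ர" "ல" "வ" "ழ" "ள" "ற" "ன"))
;; => nil
```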

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#56323; Package emacs. (Sat, 02 Jul 2022 06:59:01 GMT) Full text and rfc822 format available.

Message #52 received at 56323 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Visuwesh <visuweshm <at> gmail.com>
Cc: 56323 <at> debbugs.gnu.org
Subject: Re: bug#56323: 29.0.50; [v2] Add new customisable phonetic Tamil
 input method
Date: Sat, 02 Jul 2022 09:58:17 +0300

> From: Visuwesh <visuweshm <at> gmail.com>
> Cc: 56323 <at> debbugs.gnu.org
> Date: Fri, 01 Jul 2022 22:07:38 +0530
> 
> >> BTW, do you have any other code/documentation review?  And what about
> >> the patch I posted in https://lists.gnu.org/archive/html/bug-gnu-emacs/2022-06/msg02256.html?
> >> No rush but I would like to know if it can go in since it only addresses
> >> fallouts from the previous bug in this area.  Thanks.
> >
> > It sounded to me like you are still working on the code, so I didn't
> > see a need to review it.  If you have specific parts that you'd like
> > me to review nonetheless, please tell which parts are those.
> 
> Thanks.  The patch I posted in
> https://lists.gnu.org/archive/html/bug-gnu-emacs/2022-06/msg02256.html
> is done, and can be pushed to master if you see no problems.

I installed it, thanks.

> Also, I would like to know if there's a better way to write the :set
> function for the defcustoms tamil-vowel-translation,
> tamil-consonant-translation, tamil-misc-translation, tamil-native-digits
> without the boundp check chain below,
> 
>     (defun tamil--set-variable (sym val)
>       (set-default sym val)
>       (when (and (boundp 'tamil-vowel-translation)
>                  (boundp 'tamil-consonant-translation)
>                  (boundp 'tamil-misc-translation)
>                  (boundp 'tamil-native-digits))
>         (tamil--update-quail-rules)))

Why do you need a single function for all of them?  Would a separate
setter function for each defcustom do the job?

I also don't understand the need for the boundp tests -- the function
will live in the same indian.el file as the defcustoms, so if the
function is defined, the defcustoms are also bound, no?

> I'm also doubtful about the current group being used for these
> defcustoms.  Should I go ahead and make a new 'tamil' group and make it
> a subgroup of leim or i18n?

It's okay to have a separate group, but what would be the subject of
this group?  If it's just about input methods, the name had better
reflect that, and just "tamil" is too general for that.

> And is the prefix tamil- okay or should I change it to something
> else?

I see no problem with 'tamil-'.

> Finally, I'm unsure if "List of input sequences to translate to ..." is
> clear.  I think it sounds a mouthful and there should be a better way to
> put it.  I think "translation rules" is quite nice but I'm afraid that
> it is too Quail specific and might not be well understood.

I have no problem with that wording, but I wonder whether we should
have these defcustoms in the first place.  What are the chances that
some user will want to change the sequences, and why would they want
that?

P.S. Please in the future don't modify the Subject of the messages in
the same bug report: that makes it harder to find related messages at
least when using Rmail.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#56323; Package emacs. (Sat, 02 Jul 2022 07:19:01 GMT) Full text and rfc822 format available.

Message #55 received at 56323 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Visuwesh <visuweshm <at> gmail.com>
Cc: 56323 <at> debbugs.gnu.org
Subject: Re: bug#56323: 29.0.50; [v2] Add new customisable phonetic Tamil
 input method
Date: Sat, 02 Jul 2022 10:17:56 +0300

> From: Visuwesh <visuweshm <at> gmail.com>
> Cc: 56323 <at> debbugs.gnu.org
> Date: Sat, 02 Jul 2022 12:24:39 +0530
> 
> [சனி ஜூலை 02, 2022] Eli Zaretskii wrote:
> 
> >> There are two "classes" of consonants: those that are part of Tamil
> >> (let's call them "core") and those borrowed from Sanskrit.  When one
> >> writes the consonants in order, the core consonants come first then the
> >> Sanskrit ones.  You can find the order of the core consonants in
> >> wikipedia here in the table titled "Tamil consonants":
> >> https://en.wikipedia.org/wiki/Tamil_script#Letters
> >> 
> >> We need not worry too much about the order of Sanskrit consonants, we
> >> just need to ensure that they come after the core consonants.  You can
> >> find these Sanskrit consonants in the table titled "Grantha consonants
> >> in Tamil" in the same link.
> >> 
> >> I hope this is clear.
> >> 
> >> As for the criteria, it is simply "Tamil consonants then the Sanskrit
> >> consonants."
> >
> > Then your comparison function should first see whether a character is
> > in the former or the latter group, and use string-lessp or character
> > codepoint comparison with each group, right?  But that's not what you
> > did, so I wonder whether my understanding is correct.
> 
> It didn't occur to me to do it this way so I tried it out but then I
> noticed, string-lessp even within a group won't work.  When you evaluate
> the following sexp, you don't get a list of increasing numbers...
> 
>     (let ((core-consonants '("க" "ங" "ச" "ஞ" "ட" "ண" "த"
>                              "ந" "ப" "ம" "ய" "ர" "ல"
>                              "வ" "ழ" "ள" "ற" "ன")))
>       (mapcar (lambda (c) (string-to-char c)) core-consonants))
> 
>       ;; => (2965 2969 2970 2974 2975 2979 2980 2984 2986 2990 2991 2992
>              2994 2997 2996 2995 2993 2985)
> 
> and sure enough when you do (sort core-consonants #'string-lessp) the
> list is jumbled up instead of retaining the order.
> [ core-consonants, as declared, is in the right order but sort jumbles
>   it up.  ]
> 
> But string-lessp works for vowels.  It is the consonants that is the
> problem.

Sorry, I don't understand what you are saying here.  How is the above
code related to the issue at hand, which is how to sort characters in
the order you want them to be sorted?  (And please keep in mind that I
don't even know which of those characters are consonants and which are
vowels -- if you want me to say something intelligent about that.)

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#56323; Package emacs. (Sat, 02 Jul 2022 07:36:02 GMT) Full text and rfc822 format available.

Message #58 received at 56323 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: visuweshm <at> gmail.com
Cc: 56323 <at> debbugs.gnu.org
Subject: Re: bug#56323: 29.0.50;
 [v2] Add new customisable phonetic Tamil input method
Date: Sat, 02 Jul 2022 10:35:18 +0300

> Cc: 56323 <at> debbugs.gnu.org
> Date: Sat, 02 Jul 2022 10:17:56 +0300
> From: Eli Zaretskii <eliz <at> gnu.org>
> 
> >     (let ((core-consonants '("க" "ங" "ச" "ஞ" "ட" "ண" "த"
> >                              "ந" "ப" "ம" "ய" "ர" "ல"
> >                              "வ" "ழ" "ள" "ற" "ன")))
> >       (mapcar (lambda (c) (string-to-char c)) core-consonants))
> > 
> >       ;; => (2965 2969 2970 2974 2975 2979 2980 2984 2986 2990 2991 2992
> >              2994 2997 2996 2995 2993 2985)
> > 
> > and sure enough when you do (sort core-consonants #'string-lessp) the
> > list is jumbled up instead of retaining the order.
> > [ core-consonants, as declared, is in the right order but sort jumbles
> >   it up.  ]
> > 
> > But string-lessp works for vowels.  It is the consonants that is the
> > problem.
> 
> Sorry, I don't understand what you are saying here.  How is the above
> code related to the issue at hand, which is how to sort characters in
> the order you want them to be sorted?  (And please keep in mind that I
> don't even know which of those characters are consonants and which are
> vowels -- if you want me to say something intelligent about that.)

Or maybe my guess below will be lucky.  You probably want this:

  (defun sort-by-codepoint (c1 c2)
    (< (string-to-char c1) (string-to-char c2)))

  (let ((core-consonants '("க" "ங" "ச" "ஞ" "ட" "ண" "த"
                           "ந" "ப" "ம" "ய" "ர" "ல"
                           "வ" "ழ" "ள" "ற" "ன")))
    (sort core-consonants 'sort-by-codepoint))
  => ("க" "ங" "ச" "ஞ" "ட" "ண" "த" "ந" "ன" "ப" "ம" "ய" "ர" "ற" "ல" "ள" "ழ" "வ")

(To understand why, read the doc string of 'sort' carefully, where it
explains what is expected from PREDICATE.)

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#56323; Package emacs. (Sat, 02 Jul 2022 07:47:02 GMT) Full text and rfc822 format available.

Message #61 received at 56323 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: visuweshm <at> gmail.com
Cc: 56323 <at> debbugs.gnu.org
Subject: Re: bug#56323: 29.0.50;
 [v2] Add new customisable phonetic Tamil input method
Date: Sat, 02 Jul 2022 10:46:00 +0300

> Cc: 56323 <at> debbugs.gnu.org
> Date: Sat, 02 Jul 2022 10:35:18 +0300
> From: Eli Zaretskii <eliz <at> gnu.org>
> 
>   (defun sort-by-codepoint (c1 c2)
>     (< (string-to-char c1) (string-to-char c2)))
> 
>   (let ((core-consonants '("க" "ங" "ச" "ஞ" "ட" "ண" "த"
> 			   "ந" "ப" "ம" "ய" "ர" "ல"
> 			   "வ" "ழ" "ள" "ற" "ன")))
> 
>  (sort core-consonants 'sort-by-codepoint))
>   => ("க" "ங" "ச" "ஞ" "ட" "ண" "த" "ந" "ன" "ப" "ம" "ய" "ர" "ற" "ல" "ள" "ழ" "வ")
> 
> (To understand why, read the doc string of 'sort' carefully, where it
> explains what is expected from PREDICATE.)

Hmm... but if I use string-lessp instead of sort-by-codepoint, I get
the same result, as I'd expect.  Which probably means I'm still
missing something.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#56323; Package emacs. (Sat, 02 Jul 2022 07:59:01 GMT) Full text and rfc822 format available.

Message #64 received at 56323 <at> debbugs.gnu.org (full text, mbox):

From: Visuwesh <visuweshm <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 56323 <at> debbugs.gnu.org
Subject: Re: bug#56323: 29.0.50; [v2] Add new customisable phonetic Tamil
 input method
Date: Sat, 02 Jul 2022 13:28:29 +0530

[சனி ஜூலை 02, 2022] Eli Zaretskii wrote:

>> From: Visuwesh <visuweshm <at> gmail.com>
>> Cc: 56323 <at> debbugs.gnu.org
>> Date: Fri, 01 Jul 2022 22:07:38 +0530
>> 
>> >> BTW, do you have any other code/documentation review?  And what about
>> >> the patch I posted in https://lists.gnu.org/archive/html/bug-gnu-emacs/2022-06/msg02256.html?
>> >> No rush but I would like to know if it can go in since it only addresses
>> >> fallouts from the previous bug in this area.  Thanks.
>> >
>> > It sounded to me like you are still working on the code, so I didn't
>> > see a need to review it.  If you have specific parts that you'd like
>> > me to review nonetheless, please tell which parts are those.
>> 
>> Thanks.  The patch I posted in
>> https://lists.gnu.org/archive/html/bug-gnu-emacs/2022-06/msg02256.html
>> is done, and can be pushed to master if you see no problems.
>
> I installed it, thanks.
>

Thanks.

>> Also, I would like to know if there's a better way to write the :set
>> function for the defcustoms tamil-vowel-translation,
>> tamil-consonant-translation, tamil-misc-translation, tamil-native-digits
>> without the boundp check chain below,
>> 
>>     (defun tamil--set-variable (sym val)
>>       (set-default sym val)
>>       (when (and (boundp 'tamil-vowel-translation)
>>                  (boundp 'tamil-consonant-translation)
>>                  (boundp 'tamil-misc-translation)
>>                  (boundp 'tamil-native-digits))
>>         (tamil--update-quail-rules)))
>
> Why do you need a single function for all of them?  Would a separate
> setter function for each defcustom do the job?
>

Because it is harder to clear the old translation rules and add the new
translation rules than clearing ALL translation rules and starting over
again.  When the user changes tamil-vowel-translation, then not only
does the translation rule for the vowels change, we also need to change
the translation rules for consonant+vowel pairs so that means we need to
check if the consonant var is bound.  (The translation rules for
consonant+vowel pairs are auto-generated based on the rules for vowels
and consonants.)

Similarly, when the consonant defcustom changes, we need to change both
the consonant and the consonant+vowel pair translation rules.  Moreover,
if the user decides to delete an extra consonant translation, then we
need to smartly detect that and delete it from the current quail map.

Instead of all this, a simple clear ALL+start over approach is much
simpler.  And since this approach doesn't take too much time, I don't
think implementing the smarter approach would be worth it.

Besides, even if this smart approach were easy to implement, the
quail-map structure is just too hard to manipulate by hand...

> I also don't understand the need for the boundp tests -- the function
> will live on the same indian.el file as the defcustoms, so if the
> function is defined, the defcustoms are also bound, no?
>

IIUC, when we load indian.el, first, the vowel defcustom will be bound,
then the consonant defcustom and so on.  So this boundp test is needed,
I think?  See above for why the defcustoms have a "dependency" on each
other.  When the vowel defcustom is loaded, then its job _sometimes_
depends on the consonant defcustom being bound as well.

I say sometimes because when we initially load the vowel defcustom,
having a separate setter should be fine, but when we change it after
loading _all_ the other defcustoms (for example, in the Customize
interface), we also need to access the consonant translation values and
update the translation rules for consonant+vowel pairs.  A big fat setter
function that does everything at the cost of boundp checks is simpler
AFAIU.
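
The load-order effect can be seen in a minimal, self-contained sketch.
All my-im- names here are hypothetical, and my-im--rebuild-rules is
only a counter standing in for the real regeneration of the Quail
rules.  It relies on defcustom calling the :set function when it
initializes the variable, which is the default behaviour.

```elisp
;; Hypothetical names throughout; the rebuild function is a stand-in.
(defvar my-im-rebuild-count 0
  "How many times the rules were rebuilt (illustration only).")

(defun my-im--rebuild-rules ()
  "Stand-in for clearing and regenerating every translation rule."
  (setq my-im-rebuild-count (1+ my-im-rebuild-count)))

(defun my-im--set-variable (sym val)
  "Set SYM to VAL, then rebuild once all the options are bound."
  (set-default sym val)
  (when (and (boundp 'my-im-vowels)
             (boundp 'my-im-consonants))
    (my-im--rebuild-rules)))

;; First option: my-im-consonants is not bound yet, so no rebuild.
(defcustom my-im-vowels '(("a" . "அ"))
  "Vowel translations (illustration only)."
  :type '(alist :key-type string :value-type string)
  :set #'my-im--set-variable
  :group 'i18n)

;; Second option: both are bound now, so the setter rebuilds once.
(defcustom my-im-consonants '(("k" . "க"))
  "Consonant translations (illustration only)."
  :type '(alist :key-type string :value-type string)
  :set #'my-im--set-variable
  :group 'i18n)

;; A later change to either option triggers one more full rebuild.
(customize-set-variable 'my-im-vowels '(("aa" . "ஆ")))
;; my-im-rebuild-count is now 2
```

Without the boundp guard, the setter for the first option would try to
rebuild while the second option is still unbound.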

>> I'm also doubtful about the current group being used for these
>> defcustoms.  Should I go ahead and make a new 'tamil' group and make it
>> a subgroup of leim or i18n?
>
> It's okay to have a separate group, but what would be the subject of
> this group?  If it's just about input methods, the name had better
> reflect that, and just "tamil" is too general for that.
>

I thought the subject could be "Translation rules for the Tamil input
method."  If you think the group name is too general, then "tamil-im"
could work?

>> And is the prefix tamil- okay or should I change it to something
>> else?
>
> I see no problem with 'tamil-'.
>

Okay, thanks.

>> Finally, I'm unsure if "List of input sequences to translate to ..." is
>> clear.  I think it sounds a mouthful and there should be a better way to
>> put it.  I think "translation rules" is quite nice but I'm afraid that
>> it is too Quail specific and might not be well understood.
>
> I have no problem with that wording, but I wonder whether we should
> have these defcustoms in the first place.  What are the chances that
> some user will want to change the sequences, and why would they want
> that?

I think the chances are quite high.  As I tried to explain in the first
mail, there are too many ambiguities when transliterating Tamil and
sometimes there is no perfect transliteration for a character/consonant
family.

For example, the user in the WordPress article I linked chooses to
translate ல் as 'l' and ள் as 'll', and takes the penalty of having to
type C-SPC at the right time: to write ல்ல the sequence would be
l C-SPC la, since lla would translate to ள.

That user can take this penalty but I would rather translate ள் as L
instead and not worry about C-SPC at all.  

Bottom line, there is no one-size-fits-all mapping.  These small
annoyances can be dealt with when one writes Tamil rarely, but for
frequent writing, the flexibility this input method offers will be
welcome IMO.

The users _can_ update the quail-map themselves by hand but that becomes
tricky and a REAL chore for a language like Tamil.

[ FWIW, I add new translations and modify existing translations for the
  compose input method by setf-ing its quail map.  That is hard enough
  already, and I definitely wouldn't wish someone to do it for the Tamil
  input method.  Offering a defcustom is the least we can do to ease the
  pain of tweaking the translation rules.  ]

> P.S. Please in the future don't modify the Subject of the messages in
> the same bug report: that makes it harder to find related messages at
> least when using Rmail.

Oops, sorry about that.  I thought it would be easier to track the
progress but I guess it misfired.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#56323; Package emacs. (Sat, 02 Jul 2022 08:12:02 GMT) Full text and rfc822 format available.

Message #67 received at 56323 <at> debbugs.gnu.org (full text, mbox):

From: Visuwesh <visuweshm <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 56323 <at> debbugs.gnu.org
Subject: Re: bug#56323: 29.0.50; [v2] Add new customisable phonetic Tamil
 input method
Date: Sat, 02 Jul 2022 13:41:17 +0530

[சனி ஜூலை 02, 2022] Eli Zaretskii wrote:

>> From: Visuwesh <visuweshm <at> gmail.com>
>> Cc: 56323 <at> debbugs.gnu.org
>> Date: Sat, 02 Jul 2022 12:24:39 +0530
>> 
>> [சனி ஜூலை 02, 2022] Eli Zaretskii wrote:
>> 
>> >> There are two "classes" of consonants: those that are part of Tamil
>> >> (let's call them "core") and those borrowed from Sanskrit.  When one
>> >> writes the consonants in order, the core consonants come first then the
>> >> Sanskrit ones.  You can find the order of the core consonants in
>> >> wikipedia here in the table titled "Tamil consonants":
>> >> https://en.wikipedia.org/wiki/Tamil_script#Letters
>> >> 
>> >> We need not worry too much about the order of Sanskrit consonants, we
>> >> just need to ensure that they come after the core consonants.  You can
>> >> find these Sanskrit consonants in the table titled "Grantha consonants
>> >> in Tamil" in the same link.
>> >> 
>> >> I hope this is clear.
>> >> 
>> >> As for the criteria, it is simply "Tamil consonants then the Sanskrit
>> >> consonants."
>> >
>> > Then your comparison function should first see whether a character is
>> > in the former or the latter group, and use string-lessp or character
>> > codepoint comparison with each group, right?  But that's not what you
>> > did, so I wonder whether my understanding is correct.
>> 
>> It didn't occur to me to do it this way so I tried it out but then I
>> noticed, string-lessp even within a group won't work.  When you evaluate
>> the following sexp, you don't get a list of increasing numbers...
>> 
>>     (let ((core-consonants '("க" "ங" "ச" "ஞ" "ட" "ண" "த"
>>                              "ந" "ப" "ம" "ய" "ர" "ல"
>>                              "வ" "ழ" "ள" "ற" "ன")))
>>       (mapcar (lambda (c) (string-to-char c)) core-consonants))
>> 
>>       ;; => (2965 2969 2970 2974 2975 2979 2980 2984 2986 2990 2991 2992
>>              2994 2997 2996 2995 2993 2985)
>> 
>> and sure enough when you do (sort core-consonants #'string-lessp) the
>> list is jumbled up instead of retaining the order.
>> [ core-consonants, as declared, is in the right order but sort jumbles
>>   it up.  ]
>> 
>> But string-lessp works for vowels.  It is the consonants that is the
>> problem.
>
> Sorry, I don't understand what you are saying here.  How is the above
> code related to the issue at hand, which is how to sort characters in
> the order you want them to be sorted?  (And please keep in mind that I
> don't even know which of those characters are consonants and which are
> vowels -- if you want me to say something intelligent about that.)

I'm trying to explain the behaviour of string-lessp, which seems to sort
the characters by their Unicode codepoints.  But the order in which these
characters appear in Unicode and their actual order are not the same, so
string-lessp does not do the job we want it to.

[Sat Jul 02, 2022] Eli Zaretskii wrote:

>
> Or maybe my guess below will be lucky.  You probably want this:
>
>   (defun sort-by-codepoint (c1 c2)
>     (< (string-to-char c1) (string-to-char c2)))
>
>   (let ((core-consonants '("க" "ங" "ச" "ஞ" "ட" "ண" "த"
>                            "ந" "ப" "ம" "ய" "ர" "ல"
>                            "வ" "ழ" "ள" "ற" "ன")))
>     (sort core-consonants 'sort-by-codepoint))
>   => ("க" "ங" "ச" "ஞ" "ட" "ண" "த" "ந" "ன" "ப" "ம" "ய" "ர" "ற" "ல" "ள" "ழ" "வ")
>
> (To understand why, read the doc string of 'sort' carefully, where it
> explains what is expected from PREDICATE.)

Unfortunately not, since it jumbles up the list.  The desired outcome is
the same list.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#56323; Package emacs. (Sat, 02 Jul 2022 08:31:01 GMT) Full text and rfc822 format available.

Message #70 received at 56323 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Visuwesh <visuweshm <at> gmail.com>
Cc: 56323 <at> debbugs.gnu.org
Subject: Re: bug#56323: 29.0.50; [v2] Add new customisable phonetic Tamil
 input method
Date: Sat, 02 Jul 2022 11:29:55 +0300

> From: Visuwesh <visuweshm <at> gmail.com>
> Cc: 56323 <at> debbugs.gnu.org
> Date: Sat, 02 Jul 2022 13:41:17 +0530
> 
> >   (defun sort-by-codepoint (c1 c2)
> >     (< (string-to-char c1) (string-to-char c2)))
> >
> >   (let ((core-consonants '("க" "ங" "ச" "ஞ" "ட" "ண" "த"
> >                            "ந" "ப" "ம" "ய" "ர" "ல"
> >                            "வ" "ழ" "ள" "ற" "ன")))
> >     (sort core-consonants 'sort-by-codepoint))
> >   => ("க" "ங" "ச" "ஞ" "ட" "ண" "த" "ந" "ன" "ப" "ம" "ய" "ர" "ற" "ல" "ள" "ழ" "வ")
> >
> > (To understand why, read the doc string of 'sort' carefully, where it
> > explains what is expected from PREDICATE.)
> 
> Unfortunately not, since it jumbles up the list.  The desired outcome is
> the same list.

But we already established that you need to break the list in two, and
always sort any member of one of the two sub-lists before any member
of the other sub-list.  I then suggested to use string-lessp _within_
each sub-list, but you said it still yielded a wrong order for some
reason.

So when you now return to the issue of splitting the list in two, and
show how sorting the full list doesn't work, you make a step back: we
already established the list cannot be sorted as a single list.  The
only remaining issue, AFAIU, is why string-lessp is not good enough
for sorting within each sub-list.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#56323; Package emacs. (Sat, 02 Jul 2022 08:41:01 GMT) Full text and rfc822 format available.

Message #73 received at 56323 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Visuwesh <visuweshm <at> gmail.com>
Cc: 56323 <at> debbugs.gnu.org
Subject: Re: bug#56323: 29.0.50; [v2] Add new customisable phonetic Tamil
 input method
Date: Sat, 02 Jul 2022 11:39:47 +0300

> From: Visuwesh <visuweshm <at> gmail.com>
> Cc: 56323 <at> debbugs.gnu.org
> Date: Sat, 02 Jul 2022 13:28:29 +0530
> 
> >> Also, I would like to know if there's a better way to write the :set
> >> function for the defcustoms tamil-vowel-translation,
> >> tamil-consonant-translation, tamil-misc-translation, tamil-native-digits
> >> without the boundp check chain below,
> >> 
> >>     (defun tamil--set-variable (sym val)
> >>       (set-default sym val)
> >>       (when (and (boundp 'tamil-vowel-translation)
> >>                  (boundp 'tamil-consonant-translation)
> >>                  (boundp 'tamil-misc-translation)
> >>                  (boundp 'tamil-native-digits))
> >>         (tamil--update-quail-rules)))
> >
> > Why do you need a single function for all of them?  Would a separate
> > setter function for each defcustom do the job?
> >
> 
> Because it is harder to clear the old translation rules and add the new
> translation rules than clearing ALL translation rules and starting over
> again.  When the user changes tamil-vowel-translation, then not only
> does the translation rule for the vowels change, we also need to change
> the translation rules for consonant+vowel pairs so that means we need to
> check if the consonant var is bound.  (The translation rules for
> consonant+vowel pairs are auto-generated based on the rules for vowels
> and consonants.)

If the rules are generated based on both defcustom's, then shouldn't
we have just one defcustom for both?  IOW, what is the purpose of
having two separate defcustom's here?

> > I also don't understand the need for the boundp tests -- the function
> > will live in the same indian.el file as the defcustoms, so if the
> > function is defined, the defcustoms are also bound, no?
> >
> 
> IIUC, when we load indian.el, first, the vowel defcustom will be bound,
> then the consonant defcustom and so on.  So this boundp test is needed,
> I think?

Wouldn't that be fixed by having the setter function defined before
the defcustom's?

> See above for why the defcustoms have a "dependency" on each
> other.  When the vowel defcustom is loaded, then its job _sometimes_
> depends on the consonant defcustom being bound as well.

Since the defcustom's have their default value, I don't think I see
the problem.  Did you actually see any problems, and if so, in which
scenario, and what were the error messages?
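
A minimal sketch of that arrangement, assuming a single user option
holds all the rules.  Every name below (my-tamil-input,
my-tamil--rules, my-tamil--set-rules, my-tamil-translation-rules) is
hypothetical and not taken from the patch.  Because the setter is
defined first and defcustom calls it with the default value when the
option is initialised, no boundp check is needed:

```elisp
;; Hypothetical names throughout; this is not the code from the patch.
(defgroup my-tamil-input nil
  "Illustrative group for a customisable input method."
  :group 'leim)

(defvar my-tamil--rules (make-hash-table :test #'equal)
  "Translation table rebuilt from `my-tamil-translation-rules'.")

;; Defined before the defcustom, so it already exists when the option
;; is initialised with its default value.
(defun my-tamil--set-rules (sym val)
  "Set SYM to VAL, then rebuild every translation rule from scratch."
  (set-default sym val)
  (clrhash my-tamil--rules)
  (dolist (rule val)
    (puthash (car rule) (cdr rule) my-tamil--rules)))

(defcustom my-tamil-translation-rules
  '(("a" . "அ") ("k" . "க்"))
  "Alist of (KEY-SEQUENCE . TRANSLATION) pairs."
  :type '(alist :key-type string :value-type string)
  :set #'my-tamil--set-rules
  :group 'my-tamil-input)
```

Changing the option through customize-set-variable (or the Customize
UI) then goes through the same setter, which clears all the rules and
starts over.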

> I thought the subject could be "Translation rules for the Tamil input
> method."  If you think the group name is too general, then "tamil-im"
> could work?

tamil-input, perhaps?

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#56323; Package emacs. (Sat, 02 Jul 2022 08:41:02 GMT) Full text and rfc822 format available.

Message #76 received at 56323 <at> debbugs.gnu.org (full text, mbox):

From: Visuwesh <visuweshm <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 56323 <at> debbugs.gnu.org
Subject: Re: bug#56323: 29.0.50; [v2] Add new customisable phonetic Tamil
 input method
Date: Sat, 02 Jul 2022 14:10:07 +0530

[Sat Jul 02, 2022] Eli Zaretskii wrote:

>> From: Visuwesh <visuweshm <at> gmail.com>
>> Cc: 56323 <at> debbugs.gnu.org
>> Date: Sat, 02 Jul 2022 13:41:17 +0530
>> 
>> >   (defun sort-by-codepoint (c1 c2)
>> >     (< (string-to-char c1) (string-to-char c2)))
>> >
>> >   (let ((core-consonants '("க" "ங" "ச" "ஞ" "ட" "ண" "த"
>> >                            "ந" "ப" "ம" "ய" "ர" "ல"
>> >                            "வ" "ழ" "ள" "ற" "ன")))
>> >     (sort core-consonants 'sort-by-codepoint))
>> >   => ("க" "ங" "ச" "ஞ" "ட" "ண" "த" "ந" "ன" "ப" "ம" "ய" "ர" "ற" "ல" "ள" "ழ" "வ")
>> >
>> > (To understand why, read the doc string of 'sort' carefully, where it
>> > explains what is expected from PREDICATE.)
>> 
>> Unfortunately not, since it jumbles up the list.  The desired outcome is
>> the same list.
>
> But we already established that you need to break the list in two, and
> always sort any member of one of the two sub-lists before any member
> of the other sub-list.  I then suggested to use string-lessp _within_
> each sub-list, but you said it still yielded a wrong order for some
> reason.
>

Yes, I hope I made my point clear below.

> So when you now return to the issue of splitting the list in two, and
> show how sorting the full list doesn't work, you make a step back: we
> already established the list cannot be sorted as a single list.

I think I might not have made my point clear: the sort function above
sorts one of the sub-lists.

> The only remaining issue, AFAIU, is why string-lessp is not good
> enough for sorting within each sub-list.

It is not good enough for each sub-list for the same reason: the order
produced by string-lessp is not the same as the actual order.

I will try to explain the situation using the regular English letters
and the extra letter þ (which was used in place of "th" AFAIU).

The core English letters are a-z, and then we have some extra letters
like the þ above.  When we have a list containing _both_ a-z and þ, the
order produced by string-lessp is wrong.  To work around this issue, we
decided to break the list into two.  I think we were on the same page
till here.

When I did as you suggested and broke the list into two -- a-z and þ --
and sorted the sub-list that only contained a-z with string-lessp, the
sorted sub-list was not in the right alphabetical order i.e., instead of
"a b c d ..." it was "a c b d ..."

I hope the above makes the situation clear.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#56323; Package emacs. (Sat, 02 Jul 2022 08:55:02 GMT) Full text and rfc822 format available.

Message #79 received at 56323 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Visuwesh <visuweshm <at> gmail.com>
Cc: 56323 <at> debbugs.gnu.org
Subject: Re: bug#56323: 29.0.50; [v2] Add new customisable phonetic Tamil
 input method
Date: Sat, 02 Jul 2022 11:54:01 +0300

> From: Visuwesh <visuweshm <at> gmail.com>
> Cc: 56323 <at> debbugs.gnu.org
> Date: Sat, 02 Jul 2022 14:10:07 +0530
> 
> > The only remaining issue, AFAIU, is why string-lessp is not good
> > enough for sorting within each sub-list.
> 
> It is not good enough for each sub-list for the same reason: the order
> produced by string-lessp is not the same as the actual order.

So, then please explain what should be the "correct" order within each
sub-list.  Is the correct order within each sub-list in the ascending
order of the codepoint?  If not, what is the correct order?

> I will try to explain the situation using the regular English alphabets
> and the extra letter þ (which was used in place of "th" AFAIU).
> 
> The core English alphabets are a-z then we have some extra alphabets
> like the þ above.  When we have a list containing _both_ a-z and þ, the
> order produced by string-lessp is wrong.
> 
> When I did as you suggested and broke the list into two -- a-z and þ --
> and sorted the sub-list that only contained a-z with string-lessp, the
> sorted sub-list was not in the right alphabetical order i.e., instead of
> "a b c d ..." it was "a c b d ..."

That's not what I see:

  (let ((letters '("a" "b" "r" "x" "z")))
    (sort letters 'string-lessp))
   => ("a" "b" "r" "x" "z")

Please show an example where characters a-z are sorted by string-lessp
in the wrong order.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#56323; Package emacs. (Sat, 02 Jul 2022 09:29:01 GMT) Full text and rfc822 format available.

Message #82 received at 56323 <at> debbugs.gnu.org (full text, mbox):

From: Visuwesh <visuweshm <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 56323 <at> debbugs.gnu.org
Subject: Re: bug#56323: 29.0.50; [v2] Add new customisable phonetic Tamil
 input method
Date: Sat, 02 Jul 2022 14:58:32 +0530

[Sat Jul 02, 2022] Eli Zaretskii wrote:

>> > Why do you need a single function for all of them?  Would a separate
>> > setter function for each defcustom do the job?
>> >
>> 
>> Because it is harder to clear the old translation rules and add the new
>> translation rules than clearing ALL translation rules and starting over
>> again.  When the user changes tamil-vowel-translation, then not only
>> does the translation rule for the vowels change, we also need to change
>> the translation rules for consonant+vowel pairs so that means we need to
>> check if the consonant var is bound.  (The translation rules for
>> consonant+vowel pairs are auto-generated based on the rules for vowels
>> and consonants.)
>
> If the rules are generated based on both defcustom's, then shouldn't
> we have just one defcustom for both?  IOW, what is the purpose of
> having two separate defcustom's here?
>

It simply seemed natural to me to separate consonants and vowels.  I
combined the three defcustoms (vowels, consonants and misc) as you
suggested, but the native digits defcustom is still a problem... hmm.  I
can just leave it to the user to add the native digit translations to
the defcustom if they want.

>> > I also don't understand the need for the boundp tests -- the function
>> > will live in the same indian.el file as the defcustoms, so if the
>> > function is defined, the defcustoms are also bound, no?
>> >
>> 
>> IIUC, when we load indian.el, first, the vowel defcustom will be bound,
>> then the consonant defcustom and so on.  So this boundp test is needed,
>> I think?
>
> Wouldn't that be fixed by having the setter function defined before
> the defcustom's?
>
>> See above for why the defcustoms have a "dependency" on each
>> other.  When the vowel defcustom is loaded, then its job _sometimes_
>> depends on the consonant defcustom being bound as well.
>
> Since the defcustom's have their default value, I don't think I see
> the problem.  Did you actually see any problems, and if so, in which
> scenario, and what were the error messages?
>

I was mostly worried about the tamil-native-digits defcustom but that
can be easily avoided.    

>> I thought the subject could be "Translation rules for the Tamil input
>> method."  If you think the group name is too general, then "tamil-im"
>> could work?
>
> tamil-input, perhaps?

Okay, then.  That looks better to me as well.

I will post an updated patch later when I clean up the comments, and
docstrings.  Thanks.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#56323; Package emacs. (Sat, 02 Jul 2022 09:34:02 GMT) Full text and rfc822 format available.

Message #85 received at 56323 <at> debbugs.gnu.org (full text, mbox):

From: Visuwesh <visuweshm <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 56323 <at> debbugs.gnu.org
Subject: Re: bug#56323: 29.0.50; [v2] Add new customisable phonetic Tamil
 input method
Date: Sat, 02 Jul 2022 15:03:42 +0530

[Sat Jul 02, 2022] Eli Zaretskii wrote:

>> From: Visuwesh <visuweshm <at> gmail.com>
>> Cc: 56323 <at> debbugs.gnu.org
>> Date: Sat, 02 Jul 2022 14:10:07 +0530
>> 
>> > The only remaining issue, AFAIU, is why string-lessp is not good
>> > enough for sorting within each sub-list.
>> 
>> It is not good enough for each sub-list for the same reason: the order
>> produced by string-lessp is not the same as the actual order.
>
> So, then please explain what should be the "correct" order within each
> sub-list.  Is the correct order within each sub-list in the ascending
> order of the codepoint?  If not, what is the correct order?
>

The correct order is not the ascending order of the codepoints; the
correct order is

க ங ச ஞ ட ண த ந ப ம ய ர ல வ ழ ள ற ன

and their respective codepoints are

2965 2969 2970 2974 2975 2979 2980 2984 2986 2990 2991 2992 2994 2997 2996 2995 2993 2985

>> I will try to explain the situation using the regular English alphabets
>> and the extra letter þ (which was used in place of "th" AFAIU).
>> 
>> The core English alphabets are a-z then we have some extra alphabets
>> like the þ above.  When we have a list containing _both_ a-z and þ, the
>> order produced by string-lessp is wrong.
>> 
>> When I did as you suggested and broke the list into two -- a-z and þ --
>> and sorted the sub-list that only contained a-z with string-lessp, the
>> sorted sub-list was not in the right alphabetical order i.e., instead of
>> "a b c d ..." it was "a c b d ..."
>
> That's not what I see:
>
>   (let ((letters '("a" "b" "r" "x" "z")))
>     (sort letters 'string-lessp))
>    => ("a" "b" "r" "x" "z")
>
> Please show an example where characters a-z are sorted by string-lessp
> in the wrong order.

I didn't mean literally that string-lessp produced the wrong list for
a-z; I tried to draw an analogy with a hypothetical scenario where a-z
sorting did not work with string-lessp.  This hypothetical scenario is
the actual situation in the case of the Tamil consonants.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#56323; Package emacs. (Sat, 02 Jul 2022 09:40:01 GMT) Full text and rfc822 format available.

Message #88 received at 56323 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Visuwesh <visuweshm <at> gmail.com>
Cc: 56323 <at> debbugs.gnu.org
Subject: Re: bug#56323: 29.0.50; [v2] Add new customisable phonetic Tamil
 input method
Date: Sat, 02 Jul 2022 12:38:55 +0300

> From: Visuwesh <visuweshm <at> gmail.com>
> Cc: 56323 <at> debbugs.gnu.org
> Date: Sat, 02 Jul 2022 15:03:42 +0530
> 
> > So, then please explain what should be the "correct" order within each
> > sub-list.  Is the correct order within each sub-list in the ascending
> > order of the codepoint?  If not, what is the correct order?
> >
> 
> The correct order is not the ascending order of the codepoint, the
> correct order is
> 
> க ங ச ஞ ட ண த ந ப ம ய ர ல வ ழ ள ற ன
> 
> and their respective codepoints are
> 
> 2965 2969 2970 2974 2975 2979 2980 2984 2986 2990 2991 2992 2994 2997 2996 2995 2993 2985

Why is this the correct order?  Does it have any definition based on
some principles, not just on the above list?

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#56323; Package emacs. (Sat, 02 Jul 2022 10:32:02 GMT) Full text and rfc822 format available.

Message #91 received at 56323 <at> debbugs.gnu.org (full text, mbox):

From: Visuwesh <visuweshm <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 56323 <at> debbugs.gnu.org
Subject: Re: bug#56323: 29.0.50; [v2] Add new customisable phonetic Tamil
 input method
Date: Sat, 02 Jul 2022 16:01:14 +0530

[Sat Jul 02, 2022] Eli Zaretskii wrote:

>> > So, then please explain what should be the "correct" order within each
>> > sub-list.  Is the correct order within each sub-list in the ascending
>> > order of the codepoint?  If not, what is the correct order?
>> >
>> 
>> The correct order is not the ascending order of the codepoint, the
>> correct order is
>> 
>> க ங ச ஞ ட ண த ந ப ம ய ர ல வ ழ ள ற ன
>> 
>> and their respective codepoints are
>> 
>> 2965 2969 2970 2974 2975 2979 2980 2984 2986 2990 2991 2992 2994 2997 2996 2995 2993 2985
>
> Why is this the correct order?  Does it have any definition based on
> some principles, not just on the above list?

I'm not sure if there is a principle behind it.  Is there a principle
behind why a comes before b?  Same thing, I suppose.  But it does
raise my brow when I see them out of order, which is why I'm bothering to
sort them.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#56323; Package emacs. (Sat, 02 Jul 2022 10:48:01 GMT) Full text and rfc822 format available.

Message #94 received at 56323 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Visuwesh <visuweshm <at> gmail.com>
Cc: 56323 <at> debbugs.gnu.org
Subject: Re: bug#56323: 29.0.50; [v2] Add new customisable phonetic Tamil
 input method
Date: Sat, 02 Jul 2022 13:46:55 +0300

> From: Visuwesh <visuweshm <at> gmail.com>
> Cc: 56323 <at> debbugs.gnu.org
> Date: Sat, 02 Jul 2022 16:01:14 +0530
> 
> [Sat Jul 02, 2022] Eli Zaretskii wrote:
> 
> >> > So, then please explain what should be the "correct" order within each
> >> > sub-list.  Is the correct order within each sub-list in the ascending
> >> > order of the codepoint?  If not, what is the correct order?
> >> >
> >> 
> >> The correct order is not the ascending order of the codepoint, the
> >> correct order is
> >> 
> >> க ங ச ஞ ட ண த ந ப ம ய ர ல வ ழ ள ற ன
> >> 
> >> and their respective codepoints are
> >> 
> >> 2965 2969 2970 2974 2975 2979 2980 2984 2986 2990 2991 2992 2994 2997 2996 2995 2993 2985
> >
> > Why is this the correct order?  Does it have any definition based on
> > some principles, not just on the above list?
> 
> I'm not sure if there is a principle behind it.  Is there a principle
> behind why a comes before b?

Yes: the codepoint order.  There's no question about ordering when
it's according to the codepoints.  If you want some other order, then
you need to define the rules for the order you want.

Is the order in which you want to sort the characters for Tamil
accepted somewhere, or is it your own preference?  If the former,
where can one read about that order?

There was also another part to your original question about sorting,
AFAIR: you wanted to sort syllables, not just single characters.
Assuming the sorting order of the single characters is established in
some way, what is left to determine how to order syllables?

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#56323; Package emacs. (Sat, 02 Jul 2022 11:07:02 GMT) Full text and rfc822 format available.

Message #97 received at 56323 <at> debbugs.gnu.org (full text, mbox):

From: समीर सिंह Sameer Singh
 <lumarzeli30 <at> gmail.com>
To: Visuwesh <visuweshm <at> gmail.com>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 56323 <at> debbugs.gnu.org
Subject: Re: bug#56323: 29.0.50; [v2] Add new customisable phonetic Tamil
 input method
Date: Sat, 2 Jul 2022 16:35:51 +0530

[Message part 1 (text/plain, inline)]

There is indeed a principle behind the ordering of letters in Indian
languages taken from Sanskrit, and AFAICT Tamil also follows it.

க் ங்
ச் ஞ்
ட் ண்
த் ந்
ப் ம்

If we look at it rowwise, the first row is the velar consonants, then the
palatal then retroflex then dental then labial. If you notice here, we are
gradually moving from the back of the mouth to the front!

If we look at it columnwise the first column consists of unvoiced/voiced
consonants and the second column consists of nasals.

Then come the semivowels
ய் ர் ல் வ் ழ் ள்

After that
ற் ன்


On Sat, 2 Jul 2022, 4:02 pm, Visuwesh <visuweshm <at> gmail.com> wrote:

> [Sat Jul 02, 2022] Eli Zaretskii wrote:
>
> >> > So, then please explain what should be the "correct" order within each
> >> > sub-list.  Is the correct order within each sub-list in the ascending
> >> > order of the codepoint?  If not, what is the correct order?
> >> >
> >>
> >> The correct order is not the ascending order of the codepoint, the
> >> correct order is
> >>
> >> க ங ச ஞ ட ண த ந ப ம ய ர ல வ ழ ள ற ன
> >>
> >> and their respective codepoints are
> >>
> >> 2965 2969 2970 2974 2975 2979 2980 2984 2986 2990 2991 2992 2994 2997 2996 2995 2993 2985
> >
> > Why is this the correct order?  Does it have any definition based on
> > some principles, not just on the above list?
>
> I'm not sure if there is a principle behind it.  Is there a principle
> behind why a comes before b?  Same thing, I suppose.  But it does
> raise my brow when I see them out of order which is why I'm bothering to
> sort them.
>
>
>
>

[Message part 2 (text/html, inline)]

[Screenshot_20220702-163431_Twitter.png (image/png, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#56323; Package emacs. (Sat, 02 Jul 2022 12:20:02 GMT) Full text and rfc822 format available.

Message #100 received at 56323 <at> debbugs.gnu.org (full text, mbox):

From: Visuwesh <visuweshm <at> gmail.com>
To: समीर सिंह Sameer Singh
 <lumarzeli30 <at> gmail.com>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 56323 <at> debbugs.gnu.org
Subject: Re: bug#56323: 29.0.50; [v2] Add new customisable phonetic Tamil
 input method
Date: Sat, 02 Jul 2022 17:34:05 +0530

[Sat Jul 02, 2022] समीर सिंह Sameer Singh wrote:

> There is indeed a principle behind the ordering of letters in Indian
> languages taken from Sanskrit, and AFAICT Tamil also follows it.
>
> க் ங்
> ச் ஞ்
> ட் ண்
> த் ந்
> ப் ம்
>
> If we look at it rowwise, the first row is the velar consonants, then the
> palatal then retroflex then dental then labial. If you notice here, we are
> gradually moving from the back of the mouth to the front!
>

Aha!  I never noticed this, thanks for this interesting info.  It was
just an order for me just like A B C D ... etc.

> If we look at it columnwise the first column consists of unvoiced/voiced
> consonants and the second column consists of nasals.
>
> Then come the semivowels
> ய் ர் ல் வ் ழ் ள்
>
> After that
> ற் ன்
>

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#56323; Package emacs. (Sat, 02 Jul 2022 12:20:02 GMT) Full text and rfc822 format available.

Message #103 received at 56323 <at> debbugs.gnu.org (full text, mbox):

From: Visuwesh <visuweshm <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 56323 <at> debbugs.gnu.org
Subject: Re: bug#56323: 29.0.50; [v2] Add new customisable phonetic Tamil
 input method
Date: Sat, 02 Jul 2022 17:45:45 +0530

[Message part 1 (text/plain, inline)]

[Fri Jul 01, 2022] Eli Zaretskii wrote:

>> > Looks like simple misalignment to me, which should be cured by using
>> > pixel-resolution alignment features.
>> 
>> Yep, it is misalignment.  I could try to use those pixel-resolution
>> alignment features but I really don't think I can do a good enough job.
>> It is something I tried in the past but gave up since it was too complex
>> for me.  The current code produces a Good Enough™ table and I think I
>> will just leave it unless Someone™ complains since after all, the
>> current situation is much better than what we have in Emacs 28 (the
>> docfix that happened as part of bug#50143 isn't in Emacs 28).
>
> I thought vtable.el was about solving such problems?

I tried to use vtable.el to produce the syllable table.  There are two
problems:

    . all the calculation done by vtable is slow (perhaps to no one's
      surprise).
    . the buffer becomes noticeably slow to scroll after the table is
      inserted.

I've attached an elisp file of my current progress.

[table.el (application/emacs-lisp, attachment)]

[Message part 3 (text/plain, inline)]

When I commented out the make-vtable call and benchmarked it, it was
fast, so it is not the creation of the table data structure that is the
bottleneck.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#56323; Package emacs. (Sat, 02 Jul 2022 12:20:03 GMT) Full text and rfc822 format available.

Message #106 received at 56323 <at> debbugs.gnu.org (full text, mbox):

From: Visuwesh <visuweshm <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 56323 <at> debbugs.gnu.org
Subject: Re: bug#56323: 29.0.50; [v2] Add new customisable phonetic Tamil
 input method
Date: Sat, 02 Jul 2022 17:38:11 +0530

[Sat Jul 02, 2022] Eli Zaretskii wrote:

>> I'm not sure if there is a principle behind it.  Is there a principle
>> behind why a comes before b?
>
> Yes: the codepoint order.  

I meant the order in the English language, not the codepoints.

> There's no question about ordering when it's according to the
> codepoints.  If you want some other order, then you need to define the
> rules for the order you want.
>
> Is the order in which you want to sort the characters for Tamil
> accepted somewhere, or is it your own preference? If the former, where
> can one read about that order?
>

It is the order followed by everyone.  See the table titled "Tamil
consonants" in this Wikipedia article:
https://en.wikipedia.org/wiki/Tamil_script#Letters.  If you want details
about the order, they have probably not been translated into English.  I
also skimmed through the Tamil Wikipedia and found nothing there.

> There was also another part to your original question about sorting,
> AFAIR: you wanted to sort syllables, not just single characters.
> Assuming the sorting order of the single characters is established in
> some way, what is left to determine how to order syllables?

The order of the syllables falls into place once we sort the consonants
and the vowels.  Vowels can be sorted using string-lessp, so once we
sort the consonants, it is a simple matter of concatenation to produce
the table.  (See my other email also.)
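
As a rough illustration of the concatenation step, assuming the
consonants and the dependent vowel signs are already in the desired
order, something like the following would produce one table row per
consonant.  The function name my-tamil-syllable-rows is hypothetical
and only three vowel signs are shown:

```elisp
;; Hypothetical helper, not the code from the patch.
(defun my-tamil-syllable-rows (consonants vowel-signs)
  "Return one row per consonant in CONSONANTS.
Each row is the consonant combined with each of VOWEL-SIGNS, in order."
  (mapcar (lambda (c)
            (mapcar (lambda (v) (concat c v)) vowel-signs))
          consonants))

;; The empty string stands for the inherent vowel; U+0BBE and U+0BBF
;; are the TAMIL VOWEL SIGN AA and TAMIL VOWEL SIGN I characters.
(my-tamil-syllable-rows '("க" "ங")
                        (list "" (string #x0BBE) (string #x0BBF)))
;; => (("க" "கா" "கி") ("ங" "ஙா" "ஙி"))
```

The rows come out in whatever order the two input lists are in, so the
only real work left is getting the consonant list sorted.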

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#56323; Package emacs. (Sat, 02 Jul 2022 12:25:02 GMT) Full text and rfc822 format available.

Message #109 received at 56323 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: समीर सिंह Sameer Singh
 <lumarzeli30 <at> gmail.com>
Cc: 56323 <at> debbugs.gnu.org, visuweshm <at> gmail.com
Subject: Re: bug#56323: 29.0.50; [v2] Add new customisable phonetic Tamil
 input method
Date: Sat, 02 Jul 2022 15:23:56 +0300

> From: समीर सिंह Sameer Singh <lumarzeli30 <at> gmail.com>
> Date: Sat, 2 Jul 2022 16:35:51 +0530
> Cc: Eli Zaretskii <eliz <at> gnu.org>, 56323 <at> debbugs.gnu.org
> 
> There is indeed a principle behind the ordering of letters in Indian languages taken from Sanskrit, and AFAICT
> Tamil also follows it.
> 
> க் ங்
> ச் ஞ்
> ட் ண்
> த் ந்
> ப் ம்
> 
> If we look at it rowwise, the first row is the velar consonants, then the palatal then retroflex then dental then
> labial. If you notice here, we are gradually moving from the back of the mouth to the front!
> 
> If we look at it columnwise the first column consists of unvoiced/voiced consonants and the second column
> consists of nasals.
> 
> Then come the semivowels
> ய் ர் ல் வ் ழ் ள்
> 
> After that 
> ற் ன்

Thanks.  If there's no existing property of characters that we could
use to produce this order, I guess we will need an alist of characters
and their ordinal numbers, and use that.  Or, if the codepoints of
these characters are contiguous, we could have just the ordinal
numbers in the order of the codepoints, and use that in the function
passed as the PREDICATE argument to 'sort'.
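
A minimal sketch of the alist variant, assuming the core consonant
order listed earlier in this thread.  The names
my-tamil-consonant-order and my-tamil-consonant-lessp are hypothetical.
Strings missing from the alist (for example the Grantha consonants) are
placed after all core consonants and compared among themselves with
string-lessp:

```elisp
;; Hypothetical names; the ordering is the one given in this thread.
(defvar my-tamil-consonant-order
  (let ((i -1))
    (mapcar (lambda (c) (cons c (setq i (1+ i))))
            '("க" "ங" "ச" "ஞ" "ட" "ண" "த" "ந" "ப" "ம"
              "ய" "ர" "ல" "வ" "ழ" "ள" "ற" "ன")))
  "Alist mapping each core consonant to its ordinal number.")

(defun my-tamil-consonant-lessp (c1 c2)
  "Return non-nil if C1 sorts before C2 in the traditional order.
Consonants without an ordinal sort after all core consonants."
  (let ((o1 (cdr (assoc c1 my-tamil-consonant-order)))
        (o2 (cdr (assoc c2 my-tamil-consonant-order))))
    (cond ((and o1 o2) (< o1 o2))     ; both core: compare ordinals
          (o1 t)                      ; core before non-core
          (o2 nil)                    ; non-core after core
          (t (string-lessp c1 c2))))) ; both non-core: codepoint order

(sort (list "ன" "ஜ" "க" "ழ" "வ") #'my-tamil-consonant-lessp)
;; => ("க" "வ" "ழ" "ன" "ஜ")
```

Sorting the already-ordered core list with this predicate returns it
unchanged, which is the outcome asked for above.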

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#56323; Package emacs. (Sun, 03 Jul 2022 03:59:02 GMT) Full text and rfc822 format available.

Message #112 received at 56323 <at> debbugs.gnu.org (full text, mbox):

From: Visuwesh <visuweshm <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 56323 <at> debbugs.gnu.org
Subject: Re: bug#56323: 29.0.50; [v2] Add new customisable phonetic Tamil
 input method
Date: Sun, 03 Jul 2022 09:27:55 +0530

[Sat Jul 02, 2022] Visuwesh wrote:

> [Fri Jul 01, 2022] Eli Zaretskii wrote:
>
>>> > Looks like simple misalignment to me, which should be cured by using
>>> > pixel-resolution alignment features.
>>> 
>>> Yep, it is misalignment.  I could try to use those pixel-resolution
>>> alignment features but I really don't think I can do a good enough job.
>>> It is something I tried in the past but gave up since it was too complex
>>> for me.  The current code produces a Good Enough™ table and I think I
>>> will just leave it unless Someone™ complains since after all, the
>>> current situation is much better than what we have in Emacs 28 (the
>>> docfix that happened as part of bug#50143 isn't in Emacs 28).
>>
>> I thought vtable.el was about solving such problems?
>
> I tried to use vtable.el to produce the syllable table.  There are two
> problems:
>
>     . all the calculation done by vtable is slow (perhaps to no one's
>       surprise).
>     . the buffer becomes noticeably slow to scroll after the table is
>       inserted.

Stripping the text-properties keymap, vtable, vtable-column and
vtable-object from the buffer text improved the performance of scrolling
substantially but it is still kind of sluggish.

> I've attached an elisp file of my current progress.
>
> When I commented out the make-vtable call and benchmarked it, it was
> fast so it is not the table data structure that is the bottleneck.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#56323; Package emacs. (Sun, 10 Jul 2022 03:57:02 GMT) Full text and rfc822 format available.

Message #115 received at 56323 <at> debbugs.gnu.org (full text, mbox):

From: Visuwesh <visuweshm <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 56323 <at> debbugs.gnu.org
Subject: Re: bug#56323: 29.0.50; [v2] Add new customisable phonetic Tamil
 input method
Date: Sun, 10 Jul 2022 09:26:39 +0530

[Message part 1 (text/plain, inline)]

[Sat Jul 02, 2022] Visuwesh wrote:

> I will post an updated patch later when I clean up the comments, and
> docstrings.  Thanks.

Here's an updated patch.

[0001-Add-new-customisable-phonetic-Tamil-input-method.patch (text/x-diff, attachment)]

[Message part 3 (text/plain, inline)]

I don't use vtable since it is too slow.  :(

[ Also, I don't see the customization group until I load
  lisp/leim/quail/indian.el?  But AFAICT, that's not the case for other
  custom groups.  ]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#56323; Package emacs. (Sun, 10 Jul 2022 05:35:02 GMT)

Message #118 received at 56323 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Visuwesh <visuweshm <at> gmail.com>
Cc: 56323 <at> debbugs.gnu.org
Subject: Re: bug#56323: 29.0.50; [v2] Add new customisable phonetic Tamil
 input method
Date: Sun, 10 Jul 2022 08:34:12 +0300

> From: Visuwesh <visuweshm <at> gmail.com>
> Cc: 56323 <at> debbugs.gnu.org
> Date: Sun, 10 Jul 2022 09:26:39 +0530
> 
> > I will post an updated patch later when I clean up the comments, and
> > docstrings.  Thanks.
> 
> Here's an updated patch.

Thanks.

> +---
> +*** New default phonetic input method for the Tamil language environment.
> +The default input method for the Tamil language environment is now
> +"tamil" which is a customizable phonetic input method.  To change the
> +input method's translation rules, customize the user option
> +'tamil-translation-rules'.
> +
> 
>  * Changes in Specialized Modes and Packages in Emacs 29.1
> 
> diff --git a/lisp/language/indian.el b/lisp/language/indian.el
> index 2887d410ad..91ad818533 100644
> --- a/lisp/language/indian.el
> +++ b/lisp/language/indian.el
> @@ -109,7 +109,7 @@ 'devanagari
>   "Tamil" '((charset unicode)
>  	   (coding-system utf-8)
>  	   (coding-priority utf-8)
> -	   (input-method . "tamil-itrans")
> +	   (input-method . "tamil")
>             (sample-text . "Tamil (தமிழ்)	வணக்கம்")
>  	   (documentation . "\

Please name the new input method "tamil-phonetic", not just "tamil",
so that users who type "C-u C-\ tamil TAB" could have some means of
making the decision which one to choose.

> +;; This is needed since the Unicode codepoint order does not reflect
> +;; the actual order in the Tamil language.
> +(defvar quail-tamil-itrans--consonant-order
> +  '(("க" . 0) ("ங" . 1) ("ச" . 2) ("ஞ" . 3) ("ட" . 4) ("ண" . 5)
> +    ("த" . 6) ("ந" . 7) ("ப" . 8) ("ம" . 9) ("ய" . 10) ("ர" . 11)
> +    ("ல" . 12) ("வ" . 13) ("ழ" . 14) ("ள" . 15) ("ற" . 16) ("ன" . 17)
> +    ("ஜ" . 18) ("ஸ" . 19) ("ஷ" . 20) ("ஹ" . 21) ("க்ஷ" . 22)
> +    ("க்‌ஷ" . 23) ("ஶ" . 24)))

Since the characters are ordered in the correct order, I wonder why we
need the explicit ordinal numbers here: they are determined by the
index of the character in the list.

> +(defun quail-tamil-itrans-compute-syllable-table (vowels consonants)
> +  "Return the syllable table for the input method as a string.
> +VOWELS is a list of (VOWEL SIGN TRANS) where VOWEL is a string or
> +character representing the Tamil vowel character, SIGN is the
   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
What does it mean "character representing ... character"?  Can you
clarify this confusing part of the doc string?

> +vowel sign corresponding to VOWEL or nil for none,

Likewise here: "vowel corresponding to VOWEL"?

>                                                    and TRANS is
> +the input sequence to insert VOWEL.

The input sequence is generally a sequence of ASCII characters, is
that right?  If so, I think telling that would make the documentation
more clear.  Also, TRANS is a peculiar name for something described as
"input sequence", so maybe rename it to INPUT-SEQ?

> +CONSONANTS is a list of (CONSONANT TRANS...) where CONSONANT is
> +the Tamil consonant character, and TRANS is one or more strings
> +that describe how to insert CONSONANT."

Same here regarding TRANS and its description.

> +  (setq vowels (sort vowels (lambda (x y) (string-lessp (car x) (car y))))
> +        consonants (sort consonants
> +                         (lambda (x y)
> +                           (< (or (assoc-default (car x) quail-tamil-itrans--consonant-order) 10000)
> +                              (or (assoc-default (car y) quail-tamil-itrans--consonant-order) 10000)))))

Can you wrap these long lines, so that they would be easier to read?

> +  (let ((digits "௦௧௨௩௪௫௬௭௮௯")
>  	(width 6) clm)
>      (with-temp-buffer
> -      (insert "\n" (make-string 18 ?-) "+")
> -      (when digitp (insert (make-string 60 ?-)))
> +      (insert "\n" (make-string 18 ?-))
> +      (when digitp
> +        (insert "+" (make-string 60 ?-)))
>        (insert "\n")
>        (insert
>         (propertize "\t" 'display '(space :align-to 5)) "various"
> -       (propertize "\t" 'display '(space :align-to 18)) "|")
> +       (propertize "\t" 'display '(space :align-to 18)))
>        (when digitp
>          (insert
> -         (propertize "\t" 'display '(space :align-to 45)) "digits"))
> -      (insert "\n" (make-string 18 ?-) "+")
> +          "|" (propertize "\t" 'display '(space :align-to 45)) "digits"))
> +      (insert "\n" (make-string 18 ?-))

Did you test those :align-to specs when display-line-numbers is in
use?

> +;;;
> +;;; Tamil phonetic input method
> +;;;
> +
> +;; Define the input method straightaway.
> +(quail-define-package "tamil" "Tamil" "ழ" t
> + "Customisable Tamil phonetic input method.

See above regarding the name of the input method.

> +    ;; Consonants.
> +    ("க்" "k" "g") ("ங்" "ng") ("ச்" "ch" "s") ("ஞ்" "nj") ("ட்" "t" "d")
> +    ("ண்" "N") ("த்" "th" "dh") ("ந்" "nh") ("ப்" "p" "b") ("ம்" "m")
> +    ("ய்" "y") ("ர்" "r") ("ல்" "l") ("வ்" "v") ("ழ்" "z" "zh")
> +    ("ள்" "L") ("ற்" "rh") ("ன்" "n")
> +    ;; Sanskrit.
> +    ("ஜ்" "j") ("ஸ்" "S") ("ஷ்" "sh") ("ஹ்" "h")
> +    ("க்‌ஷ்" "ksh") ("க்ஷ்" "ksH") ("ஶ்" "Z")
> +
> +    ;; Misc.  ஃ is neither a consonant nor a vowel.
> +    ("ஃ" "F" "q")
> +    ("ௐ" "OM"))
> +  "List of input sequences to translate to Tamil characters.
> +Each element should be (CHARACTER . TRANSLATIONS) where CHARACTER

The (CHARACTER . TRANSLATIONS) form seems to imply the elements are
cons cells, but the value itself uses lists.  Suggest to say instead

  Each element should be (CHARACTER TRANSLATIONS...)
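
A small sketch of how one element of the value is shaped, assuming the element format shown in the patch. The helper `my/rule-parts` is hypothetical. A proper list such as `("க்" "k" "g")` is structurally the same object as `("க்" . ("k" "g"))`, which is why both notations describe the data; the `(CHARACTER INPUT-SEQUENCES...)` form simply matches how the value is written out.

```elisp
(defun my/rule-parts (element)
  "Split ELEMENT, a (CHARACTER INPUT-SEQUENCES...) list, into a plist.
Hypothetical helper for illustration only."
  (list :character (car element)          ; the Tamil character, a string
        :input-sequences (cdr element)))  ; list of ASCII input strings
```

For the element `("க்" "k" "g")` the character is `"க்"` and the input sequences are `("k" "g")`.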

> +is the Tamil character, and TRANSLATIONS is a list of input
> +sequences to translate to that character.
   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
"sequences which produce that character" is better.  And I suggest to
use INPUT-SEQUENCES here, not TRANSLATIONS, for the reason explained
above.

> +CHARACTER is considered as a consonant (மெய் எழுத்து) if it ends
> +with a pulli.

What is a "pulli"?  It is not a character name AFAICT.

> +CHARACTER is that is neither a vowel nor a consonant are
   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Typo and/or redundant words here.

> +considered as \"miscellaneous\" characters and are inserted as
> +is.

Not sure what this wants to say: the fact that characters are inserted
in some way seems to be unrelated to the description of the value.
What is this about?

> +The input sequence for consonant+vowel pairs (உயிர்மெய் எழுத்துக்கள்)
> +is the input sequence for the consonant followed by the
> +corresponding vowel."

Isn't that obvious?  If not, the non-obvious part(s) should be
mentioned explicitly.

> +  :group 'tamil-input
> +  :type '(alist :key-type string :value-type (repeat string))
> +  :set #'tamil--setter
> +  :options

This defcustom lacks the :version tag.
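
For reference, a minimal sketch of a user option carrying the `:version` tag. The names `my-tamil-input` and `my-tamil-rules` and the abridged default value are assumptions made for this example; only the `:type` is taken from the quoted patch, and "29.1" is the release named in the NEWS hunk above.

```elisp
(defgroup my-tamil-input nil
  "Hypothetical customization group used only in this sketch."
  :group 'i18n)

(defcustom my-tamil-rules
  '(("க்" "k" "g") ("ம்" "m"))
  "Abridged, illustrative list of (CHARACTER INPUT-SEQUENCES...)."
  :group 'my-tamil-input
  :type '(alist :key-type string :value-type (repeat string))
  ;; Records the Emacs release in which the option first appeared.
  :version "29.1")
```

The tag lets `M-x customize-changed` report the option as new in that release.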

> [ Also, I don't see the customization group until I load
>   lisp/leim/quail/indian.el?  But AFAICT, that's not the case for other
>   custom groups.  ]

There are no defcustoms in leim/quail/ files.  How about moving the
defcustom to lisp/language/indian.el?

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#56323; Package emacs. (Sun, 10 Jul 2022 06:44:01 GMT)

Message #121 received at 56323 <at> debbugs.gnu.org:

From: Visuwesh <visuweshm <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 56323 <at> debbugs.gnu.org
Subject: Re: bug#56323: 29.0.50; [v2] Add new customisable phonetic Tamil
 input method
Date: Sun, 10 Jul 2022 12:12:47 +0530

[Message part 1 (text/plain, inline)]

[ஞாயிறு ஜூலை 10, 2022] Eli Zaretskii wrote:

> Please name the new input method "tamil-phonetic", not just "tamil",
> so that users who type "C-u C-\ tamil TAB" could have some means of
> making the decision which one to choose.

Done.

>> +;; This is needed since the Unicode codepoint order does not reflect
>> +;; the actual order in the Tamil language.
>> +(defvar quail-tamil-itrans--consonant-order
>> +  '(("க" . 0) ("ங" . 1) ("ச" . 2) ("ஞ" . 3) ("ட" . 4) ("ண" . 5)
>> +    ("த" . 6) ("ந" . 7) ("ப" . 8) ("ம" . 9) ("ய" . 10) ("ர" . 11)
>> +    ("ல" . 12) ("வ" . 13) ("ழ" . 14) ("ள" . 15) ("ற" . 16) ("ன" . 17)
>> +    ("ஜ" . 18) ("ஸ" . 19) ("ஷ" . 20) ("ஹ" . 21) ("க்ஷ" . 22)
>> +    ("க்‌ஷ" . 23) ("ஶ" . 24)))
>
> Since the characters are ordered in the correct order, I wonder why we
> need the explicit ordinal numbers here: they are determined by the
> index of the character in the list.

Ah yes, we could use seq-position, I forgot about that.  Now done.
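
A sketch of what sorting by list position can look like, assuming the order variable becomes a plain list of strings. The names prefixed with `my/` are hypothetical and the order list is abridged; this is not the code from the attached patch. It also keeps the lines short, in the spirit of the line-wrapping request.

```elisp
(require 'seq)

;; Abridged for illustration; the patch lists every consonant.
(defvar my/consonant-order '("க" "ங" "ச" "ஞ" "ட" "ண")
  "Tamil consonants in their traditional order (hypothetical, abridged).")

(defun my/consonant-rank (consonant)
  "Return the index of CONSONANT in `my/consonant-order'.
Consonants missing from the list sort after all known ones."
  (or (seq-position my/consonant-order consonant)
      most-positive-fixnum))

(defun my/sort-consonants (consonants)
  "Return a sorted copy of CONSONANTS.
CONSONANTS is a list of (CONSONANT INPUT-SEQ...) elements."
  (sort (copy-sequence consonants)
        (lambda (x y)
          (< (my/consonant-rank (car x))
             (my/consonant-rank (car y))))))
```

`seq-position` compares with `equal` by default, so it works for string elements without an explicit test function.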

>> +(defun quail-tamil-itrans-compute-syllable-table (vowels consonants)
>> +  "Return the syllable table for the input method as a string.
>> +VOWELS is a list of (VOWEL SIGN TRANS) where VOWEL is a string or
>> +character representing the Tamil vowel character, SIGN is the
>    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> What does it mean "character representing ... character"?  Can you
> clarify this confusing part of the doc string?

I mean to say that VOWEL can be the datatypes string or character.  But
now, I cut that part out since I say no such thing for CONSONANT as
well.

>> +vowel sign corresponding to VOWEL or nil for none,
>
> Likewise here: "vowel corresponding to VOWEL"?

It should be vowel sign corresponding to VOWEL.  I'm not sure how to
phrase it better, I borrowed the term "vowel sign" from the Unicode name
(e.g., name of ு a.k.a. #x0bc1).
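
To make the term concrete, here is a minimal sketch of how a vowel sign combines with a consonant, assuming the consonant string ends with a pulli as in the translation rules. The function name `my/combine-syllable` is hypothetical and does not come from the patch.

```elisp
(defun my/combine-syllable (consonant vowel-sign)
  "Return CONSONANT with its trailing pulli replaced by VOWEL-SIGN.
CONSONANT is a string ending in U+0BCD.  VOWEL-SIGN is a string
holding a vowel sign such as U+0BC1, or nil for a vowel that has
no sign.  Hypothetical helper for illustration only."
  ;; Drop the final character (the pulli), then append the sign.
  ;; `concat' treats nil as an empty sequence.
  (concat (substring consonant 0 -1) vowel-sign))
```

With `"க்"` and the sign U+0BC1 this gives `"கு"`; with a nil sign it gives the bare consonant `"க"`.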

>>                                                    and TRANS is
>> +the input sequence to insert VOWEL.
>
> The input sequence is generally a sequence of ASCII characters, is
> that right?  If so, I think telling that would make the documentation
> more clear.  Also, TRANS is a peculiar name for something described as
> "input sequence", so maybe rename it to INPUT-SEQ?
>
>> +CONSONANTS is a list of (CONSONANT TRANS...) where CONSONANT is
>> +the Tamil consonant character, and TRANS is one or more strings
>> +that describe how to insert CONSONANT."
>
> Same here regarding TRANS and its description.

Now done.

>> +  (setq vowels (sort vowels (lambda (x y) (string-lessp (car x) (car y))))
>> +        consonants (sort consonants
>> +                         (lambda (x y)
>> +                           (< (or (assoc-default (car x) quail-tamil-itrans--consonant-order) 10000)
>> +                              (or (assoc-default (car y) quail-tamil-itrans--consonant-order) 10000)))))
>
> Can you wrap these long lines, so that they would be easier to read?

I hope it is better now.

>> +  (let ((digits "௦௧௨௩௪௫௬௭௮௯")
>>  	(width 6) clm)
>>      (with-temp-buffer
>> -      (insert "\n" (make-string 18 ?-) "+")
>> -      (when digitp (insert (make-string 60 ?-)))
>> +      (insert "\n" (make-string 18 ?-))
>> +      (when digitp
>> +        (insert "+" (make-string 60 ?-)))
>>        (insert "\n")
>>        (insert
>>         (propertize "\t" 'display '(space :align-to 5)) "various"
>> -       (propertize "\t" 'display '(space :align-to 18)) "|")
>> +       (propertize "\t" 'display '(space :align-to 18)))
>>        (when digitp
>>          (insert
>> -         (propertize "\t" 'display '(space :align-to 45)) "digits"))
>> -      (insert "\n" (make-string 18 ?-) "+")
>> +          "|" (propertize "\t" 'display '(space :align-to 45)) "digits"))
>> +      (insert "\n" (make-string 18 ?-))
>
> Did you test those :align-to specs when display-line-numbers is in
> use?

Seems to work fine from a short test on my side.

>> +;;;
>> +;;; Tamil phonetic input method
>> +;;;
>> +
>> +;; Define the input method straightaway.
>> +(quail-define-package "tamil" "Tamil" "ழ" t
>> + "Customisable Tamil phonetic input method.
>
> See above regarding the name of the input method.

Done.

>> +    ;; Consonants.
>> +    ("க்" "k" "g") ("ங்" "ng") ("ச்" "ch" "s") ("ஞ்" "nj") ("ட்" "t" "d")
>> +    ("ண்" "N") ("த்" "th" "dh") ("ந்" "nh") ("ப்" "p" "b") ("ம்" "m")
>> +    ("ய்" "y") ("ர்" "r") ("ல்" "l") ("வ்" "v") ("ழ்" "z" "zh")
>> +    ("ள்" "L") ("ற்" "rh") ("ன்" "n")
>> +    ;; Sanskrit.
>> +    ("ஜ்" "j") ("ஸ்" "S") ("ஷ்" "sh") ("ஹ்" "h")
>> +    ("க்‌ஷ்" "ksh") ("க்ஷ்" "ksH") ("ஶ்" "Z")
>> +
>> +    ;; Misc.  ஃ is neither a consonant nor a vowel.
>> +    ("ஃ" "F" "q")
>> +    ("ௐ" "OM"))
>> +  "List of input sequences to translate to Tamil characters.
>> +Each element should be (CHARACTER . TRANSLATIONS) where CHARACTER
>
> The (CHARACTER . TRANSLATIONS) form seems to imply the elements are
> cons cells, but the value itself uses lists.  Suggest to say instead
>
>   Each element should be (CHARACTER TRANSLATIONS...)
>

Done.

>> +is the Tamil character, and TRANSLATIONS is a list of input
>> +sequences to translate to that character.
>    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> "sequences which produce that character" is better.  And I suggest to
> use INPUT-SEQUENCES here, not TRANSLATIONS, for the reason explained
> above.
>

Done.

>> +CHARACTER is considered as a consonant (மெய் எழுத்து) if it ends
>> +with a pulli.
>
> What is a "pulli"?  It is not a character name AFAICT.
>

It is the Tamil name for virama.  I use pulli rather than virama since I
don't think any Tamil reader would know the term virama.  But I put
virama in brackets now for future maintainers.
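
A minimal sketch of the consonant test that the docstring describes, assuming "ends with a pulli" means the last character of the string is U+0BCD (TAMIL SIGN VIRAMA). The names prefixed with `my/` are hypothetical.

```elisp
(defconst my/tamil-pulli #x0bcd
  "Code point of TAMIL SIGN VIRAMA, called pulli in Tamil.")

(defun my/ends-with-pulli-p (string)
  "Return non-nil if STRING ends with a pulli.
Hypothetical helper for illustration only."
  (and (> (length string) 0)
       (eq (aref string (1- (length string))) my/tamil-pulli)))
```

Under this test `"க்"` counts as a consonant, while a vowel such as `"அ"` or a miscellaneous character such as `"ஃ"` does not.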

>> +CHARACTER is that is neither a vowel nor a consonant are
>    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> Typo and/or redundant words here.
>

Fixed, thanks.

>> +considered as \"miscellaneous\" characters and are inserted as
>> +is.
>
> Not sure what this wants to say: the fact that characters are inserted
> in some way seems to be unrelated to the description of the value.
> What is this about?

I tried to allude to the miscellaneous section in the docstring but I
don't think it is really necessary.  Now removed.

>> +The input sequence for consonant+vowel pairs (உயிர்மெய் எழுத்துக்கள்)
>> +is the input sequence for the consonant followed by the
>> +corresponding vowel."
>
> Isn't that obvious?  If not, the non-obvious part(s) should be
> mentioned explicitly.

Thinking twice, yes, it should be obvious.  I removed this part.

>> +  :group 'tamil-input
>> +  :type '(alist :key-type string :value-type (repeat string))
>> +  :set #'tamil--setter
>> +  :options
>
> This defcustom lacks the :version tag.
>

Oops, now fixed.

Updated patch attached.

[0001-Add-new-customisable-phonetic-Tamil-input-method.patch (text/x-diff, attachment)]

[Message part 3 (text/plain, inline)]

>> [ Also, I don't see the customization group until I load
>>   lisp/leim/quail/indian.el?  But AFAICT, that's not the case for other
>>   custom groups.  ]
>
> There are no defcustoms in leim/quail/ files.  How about moving the
> defcustom to lisp/language/indian.el?

Hmm, moving it to lisp/language/indian.el brings in warnings about
undefined vars and functions, and an error when dumping.

    In toplevel form:
    language/indian.el:147:31: Warning: reference to free variable ‘tamil--vowel-signs’
    language/indian.el:151:32: Warning: reference to free variable ‘indian-tml-base-table’
    language/indian.el:154:41: Warning: reference to free variable ‘indian-tml-base-digits-table’

    In end of data:
    language/indian.el:143:10: Warning: the function ‘tamil--setter’ is not known to be defined.
    rm -f emacs && cp -f temacs emacs
    LC_ALL=C ./temacs -batch  -l loadup --temacs=pdump \
        --bin-dest /usr/local/bin/ --eln-dest /usr/local/lib/emacs/29.0.50/
    Loading loadup.el (source)...
    Dump mode: pdump
    Using load-path (/home/viz/lib/ports/emacs/lisp)
    Loading emacs-lisp/debug-early...
    Loading emacs-lisp/byte-run...
    Loading emacs-lisp/backquote...
    Loading subr...
    Loading keymap...
    Loading version...
    Loading widget...
    Loading custom...
    Loading emacs-lisp/map-ynp...
    Loading international/mule...
    Loading international/mule-conf...
    Loading env...
    Loading format...
    Loading bindings...
    Loading window...
    Loading files...
    Loading emacs-lisp/macroexp...
    Loading cus-face...
    Loading faces...
    Loading loaddefs.el (source)...
    Loading button...
    Loading emacs-lisp/cl-preloaded...
    Loading emacs-lisp/oclosure...
    Loading obarray...
    Loading abbrev...
    Loading help...
    Loading jka-cmpr-hook...
    Loading epa-hook...
    Loading international/mule-cmds...
    Loading case-table...
    Loading international/charprop.el (source)...
    Loading international/characters...
    Loading international/charscript...
    Loading international/emoji-zwj...
    Loading composite...
    Loading language/chinese...
    Loading language/cyrillic...
    Loading language/indian...

    Error: void-variable (tamil--vowel-signs)
    (require cl-print) while preparing to dump
    make[1]: *** [Makefile:639: emacs.pdmp] Error 255
    make[1]: Leaving directory '/home/viz/lib/ports/emacs/src'
    make: *** [Makefile:469: src] Error 2

Should I stick in defvar's and declare-function's?
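
For context, a sketch of what such declarations would look like. The variable and function names are taken from the warnings above; the file name passed to `declare-function` is an assumption, and `my/declared-only` is a hypothetical variable added for demonstration. Note that a `defvar` without a value only informs the byte compiler and does not bind the variable, so declarations of this kind address the compile-time warnings but would not by themselves cure a void-variable error raised while the file is being loaded.

```elisp
;; Compile-time declarations only: none of these assign a value.
(defvar tamil--vowel-signs)
(defvar indian-tml-base-table)
(defvar indian-tml-base-digits-table)

;; The defining file is an assumption made for this sketch.
(declare-function tamil--setter "quail/indian")

;; Hypothetical variable showing that a value-less `defvar'
;; leaves the symbol unbound at run time.
(defvar my/declared-only)

(defun my/declared-only-bound-p ()
  "Return non-nil if `my/declared-only' has a value."
  (boundp 'my/declared-only))
```

Calling `(my/declared-only-bound-p)` right after evaluating the block returns nil, which mirrors the situation in the dump log.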

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#56323; Package emacs. (Sun, 10 Jul 2022 07:33:01 GMT)

Message #124 received at 56323 <at> debbugs.gnu.org:

From: Visuwesh <visuweshm <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 56323 <at> debbugs.gnu.org
Subject: Re: bug#56323: 29.0.50; [v2] Add new customisable phonetic Tamil
 input method
Date: Sun, 10 Jul 2022 13:02:11 +0530

[Message part 1 (text/plain, inline)]

[ஞாயிறு ஜூலை 10, 2022] Visuwesh wrote:

> [ஞாயிறு ஜூலை 10, 2022] Eli Zaretskii wrote:
>
> Updated patch attached.
>

I managed to miss a comment, sorry about that.  Now fixed in attached
patch.

[0001-Add-new-customisable-phonetic-Tamil-input-method.patch (text/x-diff, attachment)]

Reply sent to Eli Zaretskii <eliz <at> gnu.org>:
You have taken responsibility. (Thu, 14 Jul 2022 06:35:01 GMT)

Notification sent to Visuwesh <visuweshm <at> gmail.com>:
bug acknowledged by developer. (Thu, 14 Jul 2022 06:35:01 GMT)

Message #129 received at 56323-done <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Visuwesh <visuweshm <at> gmail.com>
Cc: 56323-done <at> debbugs.gnu.org
Subject: Re: bug#56323: 29.0.50; [v2] Add new customisable phonetic Tamil
 input method
Date: Thu, 14 Jul 2022 09:34:14 +0300

> From: Visuwesh <visuweshm <at> gmail.com>
> Cc: 56323 <at> debbugs.gnu.org
> Date: Sun, 10 Jul 2022 13:02:11 +0530
> 
> > Updated patch attached.
> >
> 
> I managed to miss a comment, sorry about that.  Now fixed in attached
> patch.

Thanks, installed.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#56323; Package emacs. (Thu, 14 Jul 2022 07:13:02 GMT)

Message #132 received at 56323 <at> debbugs.gnu.org:

From: Visuwesh <visuweshm <at> gmail.com>
To: 56323 <at> debbugs.gnu.org
Cc: eliz <at> gnu.org
Subject: Re: bug#56323: 29.0.50; [v2] Add new customisable phonetic Tamil
 input method
Date: Thu, 14 Jul 2022 12:41:58 +0530

[வியாழன் ஜூலை 14, 2022] Eli Zaretskii wrote:

>> From: Visuwesh <visuweshm <at> gmail.com>
>> Cc: 56323 <at> debbugs.gnu.org
>> Date: Sun, 10 Jul 2022 13:02:11 +0530
>> 
>> > Updated patch attached.
>> >
>> 
>> I managed to miss a comment, sorry about that.  Now fixed in attached
>> patch.
>
> Thanks, installed.

Thanks!

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 11 Aug 2022 11:24:06 GMT)

This bug report was last modified 2 years and 364 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.