GNU bug report logs - #34215
27.0.50; Provide elisp access to Chinese pinyin-to-character mapping

Package: emacs;

Reported by: Eric Abrahamsen <eric <at> ericabrahamsen.net>

Date: Sun, 27 Jan 2019 05:44:01 UTC

Severity: wishlist

Found in version 27.0.50

Done: Eric Abrahamsen <eric <at> ericabrahamsen.net>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 34215 in the body.
You can then email your comments to 34215 AT debbugs.gnu.org in the normal way.

Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#34215; Package emacs. (Sun, 27 Jan 2019 05:44:01 GMT) Full text and rfc822 format available.

Acknowledgement sent to Eric Abrahamsen <eric <at> ericabrahamsen.net>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Sun, 27 Jan 2019 05:44:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Eric Abrahamsen <eric <at> ericabrahamsen.net>
To: bug-gnu-emacs <at> gnu.org
Subject: 27.0.50; Provide elisp access to Chinese pinyin-to-character mapping
Date: Sat, 26 Jan 2019 21:34:39 -0800
This bug report follows up on this[1] emacs-devel thread.

The basic idea is that in the Emacs sources there's a file containing a
mapping between pinyin -- the most common Chinese romanization system --
and Chinese characters themselves. The mapping lives in
leim/MISC-DIC/pinyin.map, and is converted into a quail input method by
the `py-converter' function in titdic-cnv.el, which is part of the
"make" process.

I want this mapping to be available to elisp code in general, because
it's useful for all kinds of other language utilities (searching Chinese
characters using ascii letters, etc).

pinyin.map is a plain text file, each line consisting of a romanized
syllable, a TAB, and a string of the possible corresponding Chinese
characters. `py-converter' parses this and feeds it to
`quail-define-rules'.
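
As a rough illustration of that line format, a single line could be
parsed as in the sketch below. This is not the code in the attached
patch: the names `my-pinyin-parse-line' and `my-pinyin-parse-text' are
hypothetical, and the sketch simply skips any line that is not a
lowercase syllable, a TAB, and some characters.

```elisp
;; Hypothetical helpers; not part of titdic-cnv.el or the patch.

(defun my-pinyin-parse-line (line)
  "Parse LINE of the form \"SYLLABLE<TAB>CHARACTERS\".
Return (SYLLABLE CHARACTERS), or nil if LINE has another shape."
  (let ((case-fold-search nil))
    (when (string-match "\\`\\([a-z]+\\)\t\\(.+\\)\\'" line)
      (list (match-string 1 line) (match-string 2 line)))))

(defun my-pinyin-parse-text (text)
  "Parse every line of TEXT, skipping lines that do not match."
  (delq nil (mapcar #'my-pinyin-parse-line
                    (split-string text "\n" t))))
```

Calling `my-pinyin-parse-text' on the contents of a buffer would then
give an alist keyed by syllable.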

My first thought was to add an intermediate step, where `py-converter'
first composes an alist, then feeds that alist to `quail-define-rules',
which would also allow us access to the alist.

The more I looked at it, the more hacky and awkward that approach
seemed, and it's not like it would save any memory: you still end up
with the data both in a quail method and in a separate alist.

So this proposed patch simply parses the same file in the same way, but
in a different location. I've put it in china-util.el, but chinese.el
would also be a reasonable spot. Both those files are concerned with
encoding, but at least "china-util" gives the impression that it could
be a grab-bag.

I'm not sure this use of `source-directory' is particularly robust, but
I don't know how else to handle it.

Hope this will be considered!

Eric

[1]: https://lists.gnu.org/archive/html/emacs-devel/2019-01/msg00306.html
[0001-New-constant-chinese-pinyin-character-map.patch (text/x-patch, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#34215; Package emacs. (Sun, 27 Jan 2019 15:43:01 GMT) Full text and rfc822 format available.

Message #8 received at 34215 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Eric Abrahamsen <eric <at> ericabrahamsen.net>
Cc: 34215 <at> debbugs.gnu.org
Subject: Re: bug#34215: 27.0.50;
 Provide elisp access to Chinese pinyin-to-character mapping
Date: Sun, 27 Jan 2019 17:41:50 +0200
> From: Eric Abrahamsen <eric <at> ericabrahamsen.net>
> Date: Sat, 26 Jan 2019 21:34:39 -0800
> 
> So this proposed patch simply parses the same file in the same way, but
> in a different location. I've put it in china-util.el, but chinese.el
> would also be a reasonable spot. Both those files are concerned with
> encoding, but at least "china-util" gives the impression that it could
> be a grab-bag.

How much does this add to Emacs memory footprint when loaded?  Since
this will be required only rarely, I doubt that it would be a good
idea to force every user of Chinese language to pay the price, if it
is significant.  It would be better to have this as a separate file
with autoloaded variable or function, IMO.

> I'm not sure this use of `source-directory' is particularly robust, but
> I don't know how else to handle it.

source-directory might not exist in a given installation.

Maybe we should have the data copied into that separate file I
mentioned above.

Thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#34215; Package emacs. (Sun, 27 Jan 2019 18:04:01 GMT) Full text and rfc822 format available.

Message #11 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Eric Abrahamsen <eric <at> ericabrahamsen.net>
To: bug-gnu-emacs <at> gnu.org
Subject: Re: bug#34215: 27.0.50;
 Provide elisp access to Chinese pinyin-to-character mapping
Date: Sun, 27 Jan 2019 10:02:48 -0800
Eli Zaretskii <eliz <at> gnu.org> writes:

>> From: Eric Abrahamsen <eric <at> ericabrahamsen.net>
>> Date: Sat, 26 Jan 2019 21:34:39 -0800
>> 
>> So this proposed patch simply parses the same file in the same way, but
>> in a different location. I've put it in china-util.el, but chinese.el
>> would also be a reasonable spot. Both those files are concerned with
>> encoding, but at least "china-util" gives the impression that it could
>> be a grab-bag.
>
> How much does this add to Emacs memory footprint when loaded?  

I actually don't know how to measure the memory taken up by the contents
of a variable, but I imagine it's fairly significant. Or maybe I could
do a "before and after" measurement of all of Emacs.

> Since this will be required only rarely, I doubt that it would be a
> good idea to force every user of Chinese language to pay the price, if
> it is significant. It would be better to have this as a separate file
> with autoloaded variable or function, IMO.

That sounds fine to me. I agree the data shouldn't be loaded unless it's
been explicitly requested.

>> I'm not sure this use of `source-directory' is particularly robust, but
>> I don't know how else to handle it.
>
> source-directory might not exist in a given installation.
>
> Maybe we should have the data copied into that separate file I
> mentioned above.

I can imagine a few ways of doing that:

1. Just manually copy the data into a new file and add it to the repo
   (pinyin.map hasn't been updated in years).
2. Do the copy at build time. I'm not quite sure where that function
   would live, or how it would get called.
3. Use an `eval-and-compile' form as in the patch I provided. Is working
   back from `load-file-name' more reliable than using
   `source-directory'?

Autoloading a variable seems to copy the value of the variable into the
loaddefs file, so there's no point to that. I figure we can just ask
people who want this value to require the library.
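
For a consumer, that might look roughly like the sketch below. It
assumes the data ends up as an alist whose entries have the shape
(SYLLABLE CHARACTERS); `my-pinyin-sample-map' is a hypothetical
two-entry stand-in for the real data, and the `my-' function names are
illustrative only. Real code would `require' whatever library ends up
holding the map instead of defining sample data.

```elisp
;; Hypothetical stand-in for the real data; the entry shape is an assumption.
(defvar my-pinyin-sample-map
  '(("a" "阿啊")
    ("ai" "爱哀"))
  "Sample alist mapping pinyin syllables to strings of characters.")

(defun my-pinyin-characters (syllable &optional map)
  "Return the string of characters for SYLLABLE in MAP, or nil.
MAP defaults to `my-pinyin-sample-map'."
  (cadr (assoc syllable (or map my-pinyin-sample-map))))

(defun my-pinyin-regexp (syllable &optional map)
  "Return a regexp matching any character written as SYLLABLE, or nil.
The characters are placed in a bracket expression as-is, which is
only safe because they are not regexp-special (no ], ^ or -)."
  (let ((chars (my-pinyin-characters syllable map)))
    (when chars
      (concat "[" chars "]"))))
```

With that in place, something like (re-search-forward
(my-pinyin-regexp "ai") nil t) would search for Chinese characters
using ASCII letters.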

Thanks,
Eric

PS: pinyin.map is ancient and is missing a lot of good correspondences.
Google's pinyin input method uses a much larger map, licensed under
Apache v2.0. This[1] seems to indicate that Apache 2.0 is okay for GNU
projects; maybe we could consider switching to that map?

Footnotes:
[1]  https://www.gnu.org/licenses/license-list.en.html#apache2






Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#34215; Package emacs. (Sun, 27 Jan 2019 18:15:01 GMT) Full text and rfc822 format available.

Message #14 received at 34215 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Eric Abrahamsen <eric <at> ericabrahamsen.net>
Cc: 34215 <at> debbugs.gnu.org
Subject: Re: bug#34215: 27.0.50;
 Provide elisp access to Chinese pinyin-to-character mapping
Date: Sun, 27 Jan 2019 20:14:23 +0200
> From: Eric Abrahamsen <eric <at> ericabrahamsen.net>
> Date: Sun, 27 Jan 2019 10:02:48 -0800
> 
> >> I'm not sure this use of `source-directory' is particularly robust, but
> >> I don't know how else to handle it.
> >
> > source-directory might not exist in a given installation.
> >
> > Maybe we should have the data copied into that separate file I
> > mentioned above.
> 
> I can imagine a few ways of doing that:
> 
> 1. Just manually copy the data into a new file and add it to the repo
>    (pinyin.map hasn't been updated in years).
> 2. Do the copy at build time. I'm not quite sure where that function
>    would live, or how it would get called.
> 3. Use an `eval-and-compile' form as in the patch I provided. Is working
>    back from `load-file-name' more reliable than using
>    `source-directory'?

2 is what I had in mind.  I don't think it matters where the code
lives, it's small enough to not matter.  It would be called like the
various *-convert functions we invoke at build time to build the
dictionaries needed for CJK input methods, see the files in the leim/
directory.

> Autoloading a variable seems to copy the value of the variable into the
> loaddefs file, so there's no point to that. I figure we can just ask
> people who want this value to require the library.

Right.

> PS: pinyin.map is ancient and is missing a lot of good correspondences.
> Google's pinyin input method uses a much larger map, licensed with
> Apache v2.0. This[1] seems to indicate that Apache 2.0 is okay for Gnu
> projects, maybe we could consider switching to that map?

Maybe.  Unfortunately, I don't know enough about these input methods
to tell whether replacing the file is a good idea.  I wonder who we
can ask about this.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#34215; Package emacs. (Sun, 27 Jan 2019 19:19:02 GMT) Full text and rfc822 format available.

Message #17 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Eric Abrahamsen <eric <at> ericabrahamsen.net>
To: bug-gnu-emacs <at> gnu.org
Subject: Re: bug#34215: 27.0.50;
 Provide elisp access to Chinese pinyin-to-character mapping
Date: Sun, 27 Jan 2019 11:18:29 -0800
Eli Zaretskii <eliz <at> gnu.org> writes:

>> From: Eric Abrahamsen <eric <at> ericabrahamsen.net>
>> Date: Sun, 27 Jan 2019 10:02:48 -0800
>> 
>> >> I'm not sure this use of `source-directory' is particularly robust, but
>> >> I don't know how else to handle it.
>> >
>> > source-directory might not exist in a given installation.
>> >
>> > Maybe we should have the data copied into that separate file I
>> > mentioned above.
>> 
>> I can imagine a few ways of doing that:
>> 
>> 1. Just manually copy the data into a new file and add it to the repo
>>    (pinyin.map hasn't been updated in years).
>> 2. Do the copy at build time. I'm not quite sure where that function
>>    would live, or how it would get called.
>> 3. Use an `eval-and-compile' form as in the patch I provided. Is working
>>    back from `load-file-name' more reliable than using
>>    `source-directory'?
>
> 2 is what I had in mind.  I don't think it matters where the code
> lives, it's small enough to not matter.  It would be called like the
> various *-convert functions we invoke at build time to build the
> dictionaries needed for CJK input methods, see the files in the leim/
> directory.

Okay, I'll put that together and add it to one of the Makefiles. I
suppose it could go in leim/Makefile.in, though it technically isn't
part of leim, and I was expecting the resulting file to go to
lisp/language/. But it would be convenient to put the generation
function in titdic-cnv.el.

>> Autoloading a variable seems to copy the value of the variable into the
>> loaddefs file, so there's no point to that. I figure we can just ask
>> people who want this value to require the library.
>
> Right.
>
>> PS: pinyin.map is ancient and is missing a lot of good correspondences.
>> Google's pinyin input method uses a much larger map, licensed with
>> Apache v2.0. This[1] seems to indicate that Apache 2.0 is okay for Gnu
>> projects, maybe we could consider switching to that map?
>
> Maybe.  Unfortunately, I don't know enough about these input methods
> to tell whether replacing the file is a good idea.  I wonder who can
> we ask about this.

It's more or less a drop-in replacement -- the format of the data would
be the same, only a bit more of it. I'm not sure who is "in charge" of
these files, though.

Eric





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#34215; Package emacs. (Sun, 27 Jan 2019 19:49:02 GMT) Full text and rfc822 format available.

Message #20 received at 34215 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Eric Abrahamsen <eric <at> ericabrahamsen.net>
Cc: 34215 <at> debbugs.gnu.org
Subject: Re: bug#34215: 27.0.50;
 Provide elisp access to Chinese pinyin-to-character mapping
Date: Sun, 27 Jan 2019 21:48:10 +0200
> From: Eric Abrahamsen <eric <at> ericabrahamsen.net>
> Date: Sun, 27 Jan 2019 11:18:29 -0800
> 
> >> PS: pinyin.map is ancient and is missing a lot of good correspondences.
> >> Google's pinyin input method uses a much larger map, licensed with
> >> Apache v2.0. This[1] seems to indicate that Apache 2.0 is okay for Gnu
> >> projects, maybe we could consider switching to that map?
> >
> > Maybe.  Unfortunately, I don't know enough about these input methods
> > to tell whether replacing the file is a good idea.  I wonder who can
> > we ask about this.
> 
> It's more or less a drop-in replacement -- the format of the data would
> be the same, only a bit more of it.

I understand, but I wonder if someone could try that for a while and
see if it makes better input method(s), before we decide to import it.

> I'm not sure who is "in charge" of these files, though.

No one, I'm afraid.  Not these days.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#34215; Package emacs. (Tue, 29 Jan 2019 17:50:01 GMT) Full text and rfc822 format available.

Message #23 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Eric Abrahamsen <eric <at> ericabrahamsen.net>
To: bug-gnu-emacs <at> gnu.org
Subject: Re: bug#34215: 27.0.50;
 Provide elisp access to Chinese pinyin-to-character mapping
Date: Tue, 29 Jan 2019 09:48:30 -0800
Eli Zaretskii <eliz <at> gnu.org> writes:

>> From: Eric Abrahamsen <eric <at> ericabrahamsen.net>
>> Date: Sun, 27 Jan 2019 11:18:29 -0800

I've attached a diff adding the conversion function itself, but I'm not
familiar with makefiles and so far haven't been able to figure out how
to call it. It looks like the invocation I want will look like:

$(AM_V_GEN)${RUN_EMACS} -l titdic-cnv -f pinyin-convert \
  ${srcdir}/MISC-DIC/pinyin.map ${srcdir}/../lisp/language/pinyin.el

Where ${srcdir} is the leim directory, but I don't actually know how to
get this code called by make...

Additionally, I could factor the common code in py-converter and
pinyin-convert out into a separate defsubst.

>> >> PS: pinyin.map is ancient and is missing a lot of good correspondences.
>> >> Google's pinyin input method uses a much larger map, licensed with
>> >> Apache v2.0. This[1] seems to indicate that Apache 2.0 is okay for Gnu
>> >> projects, maybe we could consider switching to that map?
>> >
>> > Maybe.  Unfortunately, I don't know enough about these input methods
>> > to tell whether replacing the file is a good idea.  I wonder who can
>> > we ask about this.
>> 
>> It's more or less a drop-in replacement -- the format of the data would
>> be the same, only a bit more of it.
>
> I understand, but I wonder if someone could try that for a while and
> see if it makes better input method(s), before we decide to import it.

FWIW, that mapping is used by the pyim package, which I believe is the
most popular pinyin-based Chinese input method out there. I also use it
via the system-wide input framework fcitx, and it works very well.

>> I'm not sure who is "in charge" of these files, though.
>
> No one, I'm afraid.  Not these days.

That's too bad.

Eric

[pinyinconvert.diff (text/x-patch, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#34215; Package emacs. (Wed, 30 Jan 2019 17:11:02 GMT) Full text and rfc822 format available.

Message #26 received at 34215 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Eric Abrahamsen <eric <at> ericabrahamsen.net>
Cc: 34215 <at> debbugs.gnu.org
Subject: Re: bug#34215: 27.0.50;
 Provide elisp access to Chinese pinyin-to-character mapping
Date: Wed, 30 Jan 2019 19:09:47 +0200
> From: Eric Abrahamsen <eric <at> ericabrahamsen.net>
> Date: Tue, 29 Jan 2019 09:48:30 -0800
> 
> I've attached a diff adding the conversion function itself, but I'm not
> familiar with makefiles and so far haven't been able to figure out how
> to call it. It looks like the invocation I want will look like:
> 
> $(AM_V_GEN)${RUN_EMACS} -l titdic-cnv -f pinyin-convert \
>   ${srcdir}/MISC-DIC/pinyin.map ${srcdir}/../lisp/language/pinyin.el
> 
> Where ${srcdir} is the leim directory, but I don't actually know how to
> get this code called by make...

Add a target that is the file produced by this command, then make the
above command the recipe of that target.  Similar to the
${leimdir}/ja-dic/ja-dic.el target.

But if the above doesn't help, someone else could do this part for
you.

> > I understand, but I wonder if someone could try that for a while and
> > see if it makes better input method(s), before we decide to import it.
> 
> FWIW, that mapping is used by the pyim package, which I believe is the
> most popular pinyin-based Chinese input method out there. I also use it
> via the system-wide input framework fcitx, and it works very well.

Then I guess we will be fine importing the new version.

> +(defun pinyin-convert ()
> +  "Convert text file pinyin.map into an elisp library.
> +The library is named pinyin.el, and contains the constant
> +`pinyin-character-map'."

This writes out a .el file, but does it encode that file in UTF-8,
even if the locale's codeset is something other than UTF-8?  If not,
you need to bind coding-system-for-write to UTF-8.

> +      (insert ";; This file is automatically generated from pinyin.map,\
> + by the function pinyin-convert.")

This line is too long, suggest to break it in two.

> +      (insert ")\n\"An alist holding correspondences between pinyin syllables\
> + and Chinese characters.\")\n")

Likewise here.

Thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#34215; Package emacs. (Wed, 30 Jan 2019 20:35:02 GMT) Full text and rfc822 format available.

Message #29 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Eric Abrahamsen <eric <at> ericabrahamsen.net>
To: bug-gnu-emacs <at> gnu.org
Subject: Re: bug#34215: 27.0.50;
 Provide elisp access to Chinese pinyin-to-character mapping
Date: Wed, 30 Jan 2019 12:33:56 -0800
Eli Zaretskii <eliz <at> gnu.org> writes:

>> From: Eric Abrahamsen <eric <at> ericabrahamsen.net>
>> Date: Tue, 29 Jan 2019 09:48:30 -0800
>> 
>> I've attached a diff adding the conversion function itself, but I'm not
>> familiar with makefiles and so far haven't been able to figure out how
>> to call it. It looks like the invocation I want will look like:
>> 
>> $(AM_V_GEN)${RUN_EMACS} -l titdic-cnv -f pinyin-convert \
>>   ${srcdir}/MISC-DIC/pinyin.map ${srcdir}/../lisp/language/pinyin.el
>> 
>> Where ${srcdir} is the leim directory, but I don't actually know how to
>> get this code called by make...
>
> Add a target that is the file produced by this command, then make the
> above command the recipe of that target.  Similar to the
> ${leimdir}/ja-dic/ja-dic.el target.
>
> But if the above doesn't help, someone else could do this part for
> you.

I've attached this as a commit patch -- it seems to work fine but I
would appreciate it if you'd check it.

>> > I understand, but I wonder if someone could try that for a while and
>> > see if it makes better input method(s), before we decide to import it.
>> 
>> FWIW, that mapping is used by the pyim package, which I believe is the
>> most popular pinyin-based Chinese input method out there. I also use it
>> via the system-wide input framework fcitx, and it works very well.
>
> Then I guess we will be fine importing the new version.

Cool -- I'll file another report for this in a bit.

>> +(defun pinyin-convert ()
>> +  "Convert text file pinyin.map into an elisp library.
>> +The library is named pinyin.el, and contains the constant
>> +`pinyin-character-map'."
>
> This writes out a .el file, but does it encode that file in UTF-8,
> even if the locale's codeset is something other than UTF-8?  If not,
> you need to bind coding-system-for-write to UTF-8.
>
>> +      (insert ";; This file is automatically generated from pinyin.map,\
>> + by the function pinyin-convert.")
>
> This line is too long, suggest to break it in two.
>
>> +      (insert ")\n\"An alist holding correspondences between pinyin syllables\
>> + and Chinese characters.\")\n")
>
> Likewise here.

Okay, I've fixed all of the above. Thanks for the pointers.

Eric
[0001-Make-pinyin-to-Chinese-character-mapping-available-t.patch (text/x-patch, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#34215; Package emacs. (Wed, 30 Jan 2019 20:49:02 GMT) Full text and rfc822 format available.

Message #32 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Eric Abrahamsen <eric <at> ericabrahamsen.net>
To: bug-gnu-emacs <at> gnu.org
Subject: Re: bug#34215: 27.0.50;
 Provide elisp access to Chinese pinyin-to-character mapping
Date: Wed, 30 Jan 2019 12:48:06 -0800
Eric Abrahamsen <eric <at> ericabrahamsen.net> writes:

> Eli Zaretskii <eliz <at> gnu.org> writes:
>
>>> From: Eric Abrahamsen <eric <at> ericabrahamsen.net>
>>> Date: Tue, 29 Jan 2019 09:48:30 -0800
>>> 
>>> I've attached a diff adding the conversion function itself, but I'm not
>>> familiar with makefiles and so far haven't been able to figure out how
>>> to call it. It looks like the invocation I want will look like:
>>> 
>>> $(AM_V_GEN)${RUN_EMACS} -l titdic-cnv -f pinyin-convert \
>>>   ${srcdir}/MISC-DIC/pinyin.map ${srcdir}/../lisp/language/pinyin.el
>>> 
>>> Where ${srcdir} is the leim directory, but I don't actually know how to
>>> get this code called by make...
>>
>> Add a target that is the file produced by this command, then make the
>> above command the recipe of that target.  Similar to the
>> ${leimdir}/ja-dic/ja-dic.el target.
>>
>> But if the above doesn't help, someone else could do this part for
>> you.
>
> I've attached this as a commit patch -- it seems to work fine but I
> would appreciate it if you'd check it.

Oh, after reading a couple of "make" tutorials, I see maybe the make
rule could be simplified to:

${leimdir}/../lisp/language/pinyin.el: ${srcdir}/MISC-DIC/pinyin.map
  $(AM_V_GEN)${RUN_EMACS} -l titdic-cnv -f pinyin-convert $< $0





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#34215; Package emacs. (Thu, 31 Jan 2019 08:52:02 GMT) Full text and rfc822 format available.

Message #35 received at 34215 <at> debbugs.gnu.org (full text, mbox):

From: Robert Pluim <rpluim <at> gmail.com>
To: Eric Abrahamsen <eric <at> ericabrahamsen.net>
Cc: 34215 <at> debbugs.gnu.org
Subject: Re: bug#34215: 27.0.50;
 Provide elisp access to Chinese pinyin-to-character mapping
Date: Thu, 31 Jan 2019 09:50:54 +0100
Eric Abrahamsen <eric <at> ericabrahamsen.net> writes:

>
> Oh, after reading a couple of "make" tutorials, I see maybe the make
> rule could be simplified to:
>
> ${leimdir}/../lisp/language/pinyin.el: ${srcdir}/MISC-DIC/pinyin.map
>   $(AM_V_GEN)${RUN_EMACS} -l titdic-cnv -f pinyin-convert $< $0

$@ , I think.

Robert




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#34215; Package emacs. (Thu, 31 Jan 2019 19:36:02 GMT) Full text and rfc822 format available.

Message #38 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Eric Abrahamsen <eric <at> ericabrahamsen.net>
To: bug-gnu-emacs <at> gnu.org
Subject: Re: bug#34215: 27.0.50;
 Provide elisp access to Chinese pinyin-to-character mapping
Date: Thu, 31 Jan 2019 11:35:32 -0800
Robert Pluim <rpluim <at> gmail.com> writes:

> Eric Abrahamsen <eric <at> ericabrahamsen.net> writes:
>
>>
>> Oh, after reading a couple of "make" tutorials, I see maybe the make
>> rule could be simplified to:
>>
>> ${leimdir}/../lisp/language/pinyin.el: ${srcdir}/MISC-DIC/pinyin.map
>>   $(AM_V_GEN)${RUN_EMACS} -l titdic-cnv -f pinyin-convert $< $0
>
> $@ , I think.

Ah, right you are, thanks. I was wondering why that wasn't working. This
version should do the trick; it also gitignores the generated file.

Eric
[0001-Make-pinyin-to-Chinese-character-mapping-available-t.patch (text/x-patch, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#34215; Package emacs. (Fri, 01 Feb 2019 09:50:02 GMT) Full text and rfc822 format available.

Message #41 received at 34215 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Eric Abrahamsen <eric <at> ericabrahamsen.net>
Cc: 34215 <at> debbugs.gnu.org
Subject: Re: bug#34215: 27.0.50;
 Provide elisp access to Chinese pinyin-to-character mapping
Date: Fri, 01 Feb 2019 11:48:52 +0200
> From: Eric Abrahamsen <eric <at> ericabrahamsen.net>
> Date: Thu, 31 Jan 2019 11:35:32 -0800
> 
> +(defun pinyin-convert ()
> +  "Convert text file pinyin.map into an elisp library.
> +The library is named pinyin.el, and contains the constant
> +`pinyin-character-map'."
> +  (let ((src-file (car command-line-args-left))
> +        (dst-file (cadr command-line-args-left))
> +        (coding-system-for-write 'utf-8-emacs))

This should be 'utf-8-unix.  There's no reason to write out stuff in
our internal encoding, as the file is not supposed to have any
characters not representable in UTF-8.
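
To make the pattern under review concrete, here is a stripped-down
sketch of a batch function that takes its source and destination from
`command-line-args-left' and writes with an explicit `utf-8-unix'
encoding. It is not the patch itself: `my-batch-convert' and
`my-sample-map' are hypothetical names, the output layout is invented
for illustration, and reading the source file as UTF-8 is an
assumption.

```elisp
;; Hypothetical sketch; the names and the output layout are illustrative.
(defun my-batch-convert ()
  "Convert the file named by the first command-line argument.
Write an alist of (SYLLABLE CHARACTERS) entries, wrapped in a
`defconst' of `my-sample-map', to the file named by the second."
  (let ((src-file (car command-line-args-left))
        (dst-file (cadr command-line-args-left))
        ;; Assumption: the source file is UTF-8.
        (coding-system-for-read 'utf-8)
        ;; Explicit encoding and line endings, independent of the locale.
        (coding-system-for-write 'utf-8-unix)
        (case-fold-search nil))
    ;; Consume both arguments so Emacs does not try to visit them later.
    (setq command-line-args-left (cddr command-line-args-left))
    (with-temp-file dst-file
      (insert ";; Generated file; do not edit.\n"
              "(defconst my-sample-map\n  '(")
      (dolist (line (with-temp-buffer
                      (insert-file-contents src-file)
                      (split-string (buffer-string) "\n" t)))
        ;; Keep only lines shaped like SYLLABLE<TAB>CHARACTERS.
        (when (string-match "\\`\\([a-z]+\\)\t\\(.+\\)\\'" line)
          (prin1 (list (match-string 1 line) (match-string 2 line))
                 (current-buffer))
          (insert "\n    ")))
      (insert ")\n  \"Sample alist of syllables and characters.\")\n"))))
```

It would be invoked along the lines of "emacs --batch -l FILE -f
my-batch-convert SRC DST", in the same way as the recipe discussed
earlier in this thread.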

Otherwise, this LGTM.  Let's wait for a few days for more comments,
and then push.

Thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#34215; Package emacs. (Fri, 01 Feb 2019 16:28:01 GMT) Full text and rfc822 format available.

Message #44 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Eric Abrahamsen <eric <at> ericabrahamsen.net>
To: bug-gnu-emacs <at> gnu.org
Subject: Re: bug#34215: 27.0.50;
 Provide elisp access to Chinese pinyin-to-character mapping
Date: Fri, 01 Feb 2019 08:27:08 -0800
Eli Zaretskii <eliz <at> gnu.org> writes:

>> From: Eric Abrahamsen <eric <at> ericabrahamsen.net>
>> Date: Thu, 31 Jan 2019 11:35:32 -0800
>> 
>> +(defun pinyin-convert ()
>> +  "Convert text file pinyin.map into an elisp library.
>> +The library is named pinyin.el, and contains the constant
>> +`pinyin-character-map'."
>> +  (let ((src-file (car command-line-args-left))
>> +        (dst-file (cadr command-line-args-left))
>> +        (coding-system-for-write 'utf-8-emacs))
>
> This should be 'utf-8-unix.  There's no reason to write out stuff in
> our internal encoding, as the file is not supposed to have any
> characters not representable in UTF-8.

Oh, okay. For my information -- is that not platform-dependent? I
noticed titdic-cnv.el has a utf-8-emacs encoding cookie at the top.

> Otherwise, this LGTM.  Let's wait for a few days for more comments,
> and then push.

Sure thing.





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#34215; Package emacs. (Fri, 01 Feb 2019 18:55:02 GMT) Full text and rfc822 format available.

Message #47 received at 34215 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Eric Abrahamsen <eric <at> ericabrahamsen.net>
Cc: 34215 <at> debbugs.gnu.org
Subject: Re: bug#34215: 27.0.50;
 Provide elisp access to Chinese pinyin-to-character mapping
Date: Fri, 01 Feb 2019 20:53:39 +0200
> From: Eric Abrahamsen <eric <at> ericabrahamsen.net>
> Date: Fri, 01 Feb 2019 08:27:08 -0800
> 
> >> +        (coding-system-for-write 'utf-8-emacs))
> >
> > This should be 'utf-8-unix.  There's no reason to write out stuff in
> > our internal encoding, as the file is not supposed to have any
> > characters not representable in UTF-8.
> 
> Oh, okay. For my information -- is that not platform-dependent?

No, the defaults are platform-dependent.  utf-8-unix is an explicit
specification of an encoding, so it leaves nothing to the platform.

> I noticed titdic-cnv.el has a utf-8-emacs encoding cookie at the
> top.

utf-8-emacs is the internal representation of characters used by
Emacs, it should only be used when some of the characters might not be
expressible in UTF-8 (i.e. they are beyond the Unicode codespace).
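
The difference between a bare coding system name and an explicit one
can be checked from Lisp. The sketch below relies on the standard
functions `coding-system-eol-type' and `encode-coding-string';
`my-eol-kind' is a hypothetical helper name used only for this
illustration.

```elisp
(defun my-eol-kind (coding-system)
  "Return the end-of-line convention fixed by CODING-SYSTEM.
The value is `unix', `dos' or `mac', or `undecided' when the
coding system leaves the choice open."
  (let ((eol (coding-system-eol-type coding-system)))
    (cond ((vectorp eol) 'undecided)
          ((eq eol 0) 'unix)
          ((eq eol 1) 'dos)
          ((eq eol 2) 'mac))))

;; `utf-8' alone leaves line endings open; `utf-8-unix' pins them to LF.
(my-eol-kind 'utf-8)        ; undecided
(my-eol-kind 'utf-8-unix)   ; unix

;; The same text always encodes to the same bytes under `utf-8-unix'.
(encode-coding-string "a\n" 'utf-8-unix)   ; "a\n"
(encode-coding-string "a\n" 'utf-8-dos)    ; "a\r\n"
```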




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#34215; Package emacs. (Fri, 01 Feb 2019 19:16:02 GMT) Full text and rfc822 format available.

Message #50 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Eric Abrahamsen <eric <at> ericabrahamsen.net>
To: bug-gnu-emacs <at> gnu.org
Subject: Re: bug#34215: 27.0.50;
 Provide elisp access to Chinese pinyin-to-character mapping
Date: Fri, 01 Feb 2019 11:15:08 -0800
Eli Zaretskii <eliz <at> gnu.org> writes:

>> From: Eric Abrahamsen <eric <at> ericabrahamsen.net>
>> Date: Fri, 01 Feb 2019 08:27:08 -0800
>> 
>> >> +        (coding-system-for-write 'utf-8-emacs))
>> >
>> > This should be 'utf-8-unix.  There's no reason to write out stuff in
>> > our internal encoding, as the file is not supposed to have any
>> > characters not representable in UTF-8.
>> 
>> Oh, okay. For my information -- is that not platform-dependent?
>
> No, the defaults are platform-dependent.  utf-8-unix is an explicit
> specification of an encoding, so it leaves nothing to the platform.
>
>> I noticed titdic-cnv.el has a utf-8-emacs encoding cookie at the
>> top.
>
> utf-8-emacs is the internal representation of characters used by
> Emacs, it should only be used when some of the characters might not be
> expressible in UTF-8 (i.e. they are beyond the Unicode codespace).

Interesting, thank you for this background.





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#34215; Package emacs. (Sun, 24 Feb 2019 05:37:02 GMT) Full text and rfc822 format available.

Message #53 received at 34215 <at> debbugs.gnu.org (full text, mbox):

From: Eric Abrahamsen <eric <at> ericabrahamsen.net>
To: 34215 <at> debbugs.gnu.org
Subject: Re: bug#34215: 27.0.50;
 Provide elisp access to Chinese pinyin-to-character mapping
Date: Sat, 23 Feb 2019 21:36:10 -0800
On 02/01/19 11:48 AM, Eli Zaretskii wrote:
>> From: Eric Abrahamsen <eric <at> ericabrahamsen.net>
>> Date: Thu, 31 Jan 2019 11:35:32 -0800
>> 
>> +(defun pinyin-convert ()
>> +  "Convert text file pinyin.map into an elisp library.
>> +The library is named pinyin.el, and contains the constant
>> +`pinyin-character-map'."
>> +  (let ((src-file (car command-line-args-left))
>> +        (dst-file (cadr command-line-args-left))
>> +        (coding-system-for-write 'utf-8-emacs))
>
> This should be 'utf-8-unix.  There's no reason to write out stuff in
> our internal encoding, as the file is not supposed to have any
> characters not representable in UTF-8.
>
> Otherwise, this LGTM.  Let's wait for a few days for more comments,
> and then push.

Doesn't look like anything more is forthcoming, shall I push to master?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#34215; Package emacs. (Sun, 24 Feb 2019 16:07:02 GMT) Full text and rfc822 format available.

Message #56 received at 34215 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Eric Abrahamsen <eric <at> ericabrahamsen.net>
Cc: 34215 <at> debbugs.gnu.org
Subject: Re: bug#34215: 27.0.50;
 Provide elisp access to Chinese pinyin-to-character mapping
Date: Sun, 24 Feb 2019 18:06:13 +0200
> From: Eric Abrahamsen <eric <at> ericabrahamsen.net>
> Date: Sat, 23 Feb 2019 21:36:10 -0800
> 
> > Otherwise, this LGTM.  Let's wait for a few days for more comments,
> > and then push.
> 
> Doesn't look like anything more is forthcoming, shall I push to master?

Yes, please.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#34215; Package emacs. (Sun, 24 Feb 2019 18:55:01 GMT) Full text and rfc822 format available.

Message #59 received at 34215 <at> debbugs.gnu.org (full text, mbox):

From: Eric Abrahamsen <eric <at> ericabrahamsen.net>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 34215 <at> debbugs.gnu.org
Subject: Re: bug#34215: 27.0.50;
 Provide elisp access to Chinese pinyin-to-character mapping
Date: Sun, 24 Feb 2019 10:53:57 -0800
On 02/24/19 18:06 PM, Eli Zaretskii wrote:
>> From: Eric Abrahamsen <eric <at> ericabrahamsen.net>
>> Date: Sat, 23 Feb 2019 21:36:10 -0800
>> 
>> > Otherwise, this LGTM.  Let's wait for a few days for more comments,
>> > and then push.
>> 
>> Doesn't look like anything more is forthcoming, shall I push to master?
>
> Yes, please.

Done, thanks.




Reply sent to Eric Abrahamsen <eric <at> ericabrahamsen.net>:
You have taken responsibility. (Sun, 24 Feb 2019 19:13:03 GMT) Full text and rfc822 format available.

Notification sent to Eric Abrahamsen <eric <at> ericabrahamsen.net>:
bug acknowledged by developer. (Sun, 24 Feb 2019 19:13:04 GMT) Full text and rfc822 format available.

Message #64 received at 34215-done <at> debbugs.gnu.org (full text, mbox):

From: Eric Abrahamsen <eric <at> ericabrahamsen.net>
To: 34215-done <at> debbugs.gnu.org
Date: Sun, 24 Feb 2019 11:12:16 -0800



bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Mon, 25 Mar 2019 11:24:04 GMT) Full text and rfc822 format available.

This bug report was last modified 6 years and 81 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.