GNU bug report logs - #52918
29.0.50; to make use of ucd/Unihan_Readings.txt for kDefinition entry

Package: emacs;

Reported by: Van Ly <van.ly <at> sdf.org>

Date: Fri, 31 Dec 2021 17:56:01 UTC

Severity: wishlist

Found in version 29.0.50

To reply to this bug, email your comments to 52918 AT debbugs.gnu.org.


Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#52918; Package emacs. (Fri, 31 Dec 2021 17:56:01 GMT) Full text and rfc822 format available.

Acknowledgement sent to Van Ly <van.ly <at> sdf.org>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Fri, 31 Dec 2021 17:56:01 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Van Ly <van.ly <at> sdf.org>
To: bug-gnu-emacs <at> gnu.org
Subject: 29.0.50; to make use of ucd/Unihan_Readings.txt for kDefinition entry
Date: Fri, 31 Dec 2021 17:55:01 +0000 (UTC)
Hello,

I was looking in the master's emacs/admin/notes subdirectory and 
found the unicode file.  It has a list of files from the ucd and has 
left out:

  . Unihan_Readings.txt

Like how quail-show-key helps by showing in the minibuffer the input 
sequence needed to type a character for a specific input method, can 
there be a function called quail-show-unihan that exposes in the 
minibuffer the kDefinition entry associated with the East Asian 
character from ucd/Unihan_Readings.txt?

-- 
vl
[bug-gnu-emacs-29.text (text/plain, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#52918; Package emacs. (Mon, 03 Jan 2022 13:55:01 GMT) Full text and rfc822 format available.

Message #8 received at 52918 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Van Ly <van.ly <at> sdf.org>
Cc: 52918 <at> debbugs.gnu.org
Subject: Re: bug#52918: 29.0.50;
 to make use of ucd/Unihan_Readings.txt for kDefinition entry
Date: Mon, 03 Jan 2022 15:54:33 +0200
> Date: Fri, 31 Dec 2021 17:55:01 +0000 (UTC)
> From: Van Ly <van.ly <at> sdf.org>
> 
> I was looking in the master's emacs/admin/notes subdirectory and 
> found the unicode file.  It has a list of files from the ucd and has 
> left out:
> 
>    . Unihan_Readings.txt
> 
> Like how quail-show-key helps by showing in the minibuffer the input 
> sequence needed to type a character for a specific input method, can 
> there be a function called quail-show-unihan that exposes in the 
> minibuffer the kDefinition entry associated with the East Asian 
> character from ucd/Unihan_Readings.txt?

Yes, this could be added to Emacs, and IMO would be a useful feature.

Suggested implementation:

  . import the Unihan_Readings.txt file into Emacs
  . add Makefile rules to produce a uni-unihan-readings.el file from
    Unihan_Readings.txt, which defines a char-table where each
    character has its kDefinition property value
  . code a minor mode which will show in the echo area the value of
    the kDefinition property, if any, of the character at point

Patches welcome.
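
The lookup that such a minor mode would perform can be tried from the command line before any Emacs code exists. The sketch below is only an illustration under stated assumptions: the function name `unihan_definition` is hypothetical, the data file is assumed to use the tab-separated `U+XXXX<TAB>kDefinition<TAB>text` layout, and `iconv` and `od` are assumed to be installed. It is not the proposed Emacs feature itself.

```shell
#!/bin/sh
# unihan_definition FILE CHAR  (hypothetical helper, not part of Emacs)
# Print the kDefinition value for the single character CHAR, looked up
# in a Unihan_Readings.txt-style FILE.
unihan_definition() {
    file=$1
    char=$2
    # Convert the UTF-8 character to its code point as 8 hex digits;
    # only the first character of CHAR is considered.
    hex=$(printf '%s' "$char" | iconv -f UTF-8 -t UTF-32BE |
          od -An -tx1 | tr -d ' \n' | cut -c1-8)
    # Reformat as the U+XXXX key used in the first field of the file.
    key=$(printf 'U+%04X' "0x$hex")
    # Fields are tab-separated: code point, property name, value.
    awk -F'\t' -v key="$key" \
        '$1 == key && $2 == "kDefinition" { print $3 }' "$file"
}
```

Called with the path to Unihan_Readings.txt and one character, it prints that character's definition, or nothing when the character has no kDefinition entry.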




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#52918; Package emacs. (Tue, 04 Jan 2022 15:14:02 GMT) Full text and rfc822 format available.

Message #11 received at 52918 <at> debbugs.gnu.org (full text, mbox):

From: Van Ly <van.ly <at> sdf.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 52918 <at> debbugs.gnu.org
Subject: Re: bug#52918: 29.0.50; to make use of ucd/Unihan_Readings.txt for
 kDefinition entry
Date: Tue, 4 Jan 2022 15:13:17 +0000 (UTC)
On Mon, 3 Jan 2022, Eli Zaretskii wrote:

>
> Suggested implementation:
>
>  . import the Unihan_Readings.txt file into Emacs
>
> Patches welcome.
>

Attached is the diff listing for admin/unidata/README to source 
Unihan_Readings.txt from

=> https://www.unicode.org/Public/UCD/latest/ucd/Unihan.zip

The version specific path alternatives are

=> https://www.unicode.org/Public/14.0.0/ucd/Unihan.zip
=> https://www.unicode.org/Public/15.0.0/ucd/Unihan-15.0.0d1.zip

-- 
vl
[admin-unidata-README-diff.text (text/plain, attachment)]

Severity set to 'wishlist' from 'normal'. Request was from Stefan Kangas <stefan <at> marxist.se> to control <at> debbugs.gnu.org. (Sun, 09 Jan 2022 15:46:02 GMT) Full text and rfc822 format available.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#52918; Package emacs. (Mon, 17 Jan 2022 18:26:02 GMT) Full text and rfc822 format available.

Message #16 received at 52918 <at> debbugs.gnu.org (full text, mbox):

From: Van Ly <van.ly <at> sdf.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 52918 <at> debbugs.gnu.org
Subject: Re: bug#52918: 29.0.50; to make use of ucd/Unihan_Readings.txt for
 kDefinition entry
Date: Mon, 17 Jan 2022 18:25:26 +0000 (UTC)
On Mon, 3 Jan 2022, Eli Zaretskii wrote:

>
> Suggested implementation:
>
>  . add Makefile rules to produce a uni-unihan-readings.el file from
>    Unihan_Readings.txt, which defines a char-table where each
>    character has its kDefinition property value
>

A candidate for the Makefile rule to produce uni-unihan-readings.el 
is

'''
#!/bin/sh
# Local copy of the Unihan readings data.
X='/usr/X/Projects/emacs-28.0.91/admin/unidata/Unihan_Readings.txt'
# Keep kDefinition lines, drop comments, rewrite the leading "U+" as "#x",
# and take the first 3 entries as a sample.
grep -F 'kDefinition' "$X" | sed -e '/^#/d' -e 's/^../#x/' | head -n 3 |
awk -F'\t' 'BEGIN {printf("(defvar readings-table\n\t(make-char-table '\''readings-table nil)\n\t\"Char table of definitions for East Asian characters.\")\n")}
{printf("(aset readings-table %s \"%s\")\n", $1, $3)}'
 '''

The result is

'''
(defvar readings-table
	(make-char-table 'readings-table nil)
	"Char table of definitions for East Asian characters.")
(aset readings-table #x3400 "(same as U+4E18 丘) hillock or mound")
(aset readings-table #x3401 "to lick; to taste, a mat, bamboo bark")
(aset readings-table #x3402 "(J) non-standard form of U+559C 喜, to like, love, enjoy; a joyful thing")
'''

-- 
vl
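
The candidate pipeline copies each definition verbatim into a Lisp string. Should a definition ever contain a double quote or a backslash, the generated `aset` form would no longer read correctly. Below is a minimal sketch of one way to guard against that, assuming the same tab-separated input. The function name `emit_readings` is hypothetical, and whether the real data actually needs this escaping has not been checked here.

```shell
#!/bin/sh
# emit_readings: read Unihan_Readings.txt-style lines on stdin and write
# Emacs Lisp on stdout.  Hypothetical helper; a sketch, not the patch.
# (quote readings-table) is used instead of 'readings-table so that the
# awk program can sit inside a single-quoted shell string.
emit_readings() {
    awk -F'\t' '
    BEGIN {
        print "(defvar readings-table"
        print "\t(make-char-table (quote readings-table) nil)"
        print "\t\"Char table of definitions for East Asian characters.\")"
    }
    /^#/ { next }                        # skip comment lines
    $2 == "kDefinition" {
        cp = $1
        sub(/^U[+]/, "#x", cp)           # U+3400 -> #x3400
        def = $3
        gsub(/["\\]/, "\\\\&", def)      # prefix each " or \ with a backslash
        printf("(aset readings-table %s \"%s\")\n", cp, def)
    }'
}
```

Unlike the pipeline, this variant matches `kDefinition` only in the second field, so a definition whose text happens to mention the word does not produce a stray entry.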

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#52918; Package emacs. (Tue, 18 Jan 2022 11:31:01 GMT) Full text and rfc822 format available.

Message #19 received at 52918 <at> debbugs.gnu.org (full text, mbox):

From: Van Ly <van.ly <at> sdf.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 52918 <at> debbugs.gnu.org
Subject: Re: bug#52918: 29.0.50; to make use of ucd/Unihan_Readings.txt for
 kDefinition entry
Date: Tue, 18 Jan 2022 11:30:28 +0000 (UTC)
The diff below places an entry in the etc/TODO file so that the
suggested implementation here gets done.

'''
diff -u --label /usr/X/Projects/emacs-28.0.91/etc/TODO --label \#\<buffer\ TODO\> /usr/X/Projects/emacs-28.0.91/etc/TODO /dev/shm/buffer-content-Q1ArDD
--- /usr/X/Projects/emacs-28.0.91/etc/TODO
+++ #<buffer TODO>
@@ -747,6 +747,9 @@

 ** Add definitions for symbol properties, for documentation purposes

+** Make use of char-table for reading definitions from ucd/Unihan_Readings.txt
+See bug#52918.
+
 ** Temporarily remove scroll bars when they are not needed
 Typically when a buffer can be fully displayed in its window.


Diff finished.  Tue Jan 18 22:22:52 2022

'''

-- 
vl





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#52918; Package emacs. (Sun, 23 Jan 2022 03:51:02 GMT) Full text and rfc822 format available.

Message #22 received at 52918 <at> debbugs.gnu.org (full text, mbox):

From: Van Ly <van.ly <at> sdf.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 52918 <at> debbugs.gnu.org
Subject: Re: bug#52918: 29.0.50; to make use of ucd/Unihan_Readings.txt for
 kDefinition entry
Date: Sun, 23 Jan 2022 02:15:06 +0000 (UTC)
On Mon, 3 Jan 2022, Eli Zaretskii wrote:

>>    . Unihan_Readings.txt
>>
>> Like how quail-show-key helps by showing in the minibuffer the input
>> sequence needed to type a character for a specific input method, can
>> there be a function called quail-show-unihan that exposes in the
>> minibuffer the kDefinition entry associated with the East Asian
>> character from ucd/Unihan_Readings.txt?
>
> Yes, this could be added to Emacs, and IMO would be a useful feature.
>
> Suggested implementation:
>
>  . import the Unihan_Readings.txt file into Emacs
>  . add Makefile rules to produce a uni-unihan-readings.el file from
>    Unihan_Readings.txt, which defines a char-table where each
>    character has its kDefinition property value
>  . code a minor mode which will show in the echo area the value of
>    the kDefinition property, if any, of the character at point
>
> Patches welcome.
>

See patch attached.

Two of the three implementation steps suggested are done.

-- 
vl
[0029-bug-52918-generate-East-Asian-readings-char-table.patch (text/plain, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#52918; Package emacs. (Sun, 23 Jan 2022 06:01:01 GMT) Full text and rfc822 format available.

Message #25 received at 52918 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Van Ly <van.ly <at> sdf.org>
Cc: 52918 <at> debbugs.gnu.org
Subject: Re: bug#52918: 29.0.50; to make use of ucd/Unihan_Readings.txt for
 kDefinition entry
Date: Sun, 23 Jan 2022 08:00:01 +0200
> Date: Sun, 23 Jan 2022 02:15:06 +0000 (UTC)
> From: Van Ly <van.ly <at> sdf.org>
> cc: 52918 <at> debbugs.gnu.org
> 
> On Mon, 3 Jan 2022, Eli Zaretskii wrote:
> 
> >>    . Unihan_Readings.txt
> >>
> >> Like how quail-show-key helps by showing in the minibuffer the input
> >> sequence needed to type a character for a specific input method, can
> >> there be a function called quail-show-unihan that exposes in the
> >> minibuffer the kDefinition entry associated with the East Asian
> >> character from ucd/Unihan_Readings.txt?
> >
> > Yes, this could be added to Emacs, and IMO would be a useful feature.
> >
> > Suggested implementation:
> >
> >  . import the Unihan_Readings.txt file into Emacs
> >  . add Makefile rules to produce a uni-unihan-readings.el file from
> >    Unihan_Readings.txt, which defines a char-table where each
> >    character has its kDefinition property value
> >  . code a minor mode which will show in the echo area the value of
> >    the kDefinition property, if any, of the character at point
> >
> > Patches welcome.
> >
> 
> See patch attached.
> 
> Two of the three implementation steps suggested are done.

Thanks.

You don't seem to have copyright assignment on file, and without that
we cannot accept such large contributions.  Would you like to start
your legal paperwork now?  If so, I will send you the form and the
instructions.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#52918; Package emacs. (Sun, 23 Jan 2022 11:23:01 GMT) Full text and rfc822 format available.

Message #28 received at 52918 <at> debbugs.gnu.org (full text, mbox):

From: Van Ly <van.ly <at> sdf.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 52918 <at> debbugs.gnu.org
Subject: Re: bug#52918: 29.0.50; to make use of ucd/Unihan_Readings.txt for
 kDefinition entry
Date: Sun, 23 Jan 2022 11:22:03 +0000 (UTC)
On Sun, 23 Jan 2022, Eli Zaretskii wrote:

>>>
>>> Patches welcome.
>>>
>>
>> See patch attached.
>>
>> Two of the three implementation steps suggested are done.
>
> Thanks.
>
> You don't seem to have copyright assignment on file, and without that
> we cannot accept such large contributions.  Would you like to start
> your legal paperwork now?  If so, I will send you the form and the
> instructions.
>

I sent an email to assign <at> gnu.org in the 24hr before this patch was 
submitted.  I was hoping this patch would fall below the 15 line 
limit and not need the formality of the legal paperwork.  The minor 
mode contribution would climb above the limit, which was why I sent 
the request to assign copyright.  Best case is a 2 week wait.

That generated uni-unihan-readings.el will need a line as follows:

diff --git a/admin/unidata/Unihan_Readings.awk b/admin/unidata/Unihan_Readings.awk
index cf319449e59..f01c75b88f9 100644
--- a/admin/unidata/Unihan_Readings.awk
+++ b/admin/unidata/Unihan_Readings.awk
@@ -1,5 +1,6 @@
 BEGIN {
     FS="	"
+    printf(";; -*-no-byte-compile: t; -*-\n")
     printf("(defvar readings-table\n\
 	(make-char-table 'readings-table nil)\n\
 	\"Char table of definitions for East Asian characters.\")\n")


-- 
vl





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#52918; Package emacs. (Sun, 23 Jan 2022 11:41:02 GMT) Full text and rfc822 format available.

Message #31 received at 52918 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Van Ly <van.ly <at> sdf.org>
Cc: 52918 <at> debbugs.gnu.org
Subject: Re: bug#52918: 29.0.50; to make use of ucd/Unihan_Readings.txt for
 kDefinition entry
Date: Sun, 23 Jan 2022 13:40:23 +0200
> Date: Sun, 23 Jan 2022 11:22:03 +0000 (UTC)
> From: Van Ly <van.ly <at> sdf.org>
> cc: 52918 <at> debbugs.gnu.org
> 
> > You don't seem to have copyright assignment on file, and without that
> > we cannot accept such large contributions.  Would you like to start
> > your legal paperwork now?  If so, I will send you the form and the
> > instructions.
> >
> 
> I sent an email to assign <at> gnu.org in the 24hr before this patch was 
> submitted.  I was hoping this patch would fall below the 15 line 
> limit and not need the formality of the legal paperwork.  The minor 
> mode contribution would climb above the limit, which was why I sent 
> the request to assign copyright.  Best case is a 2 week wait.
> 
> That generated uni-unihan-readings.el will need a line as follows:

Thanks, I prefer to wait until your assignment is in place, and you
can then submit the final pieces to make this feature complete.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#52918; Package emacs. (Tue, 25 Jul 2023 15:44:01 GMT) Full text and rfc822 format available.

Message #34 received at 52918 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: van.ly <at> sdf.org
Cc: 52918 <at> debbugs.gnu.org
Subject: Re: bug#52918: 29.0.50; to make use of ucd/Unihan_Readings.txt for
 kDefinition entry
Date: Tue, 25 Jul 2023 18:44:01 +0300
> Date: Sun, 23 Jan 2022 13:40:23 +0200
> From: Eli Zaretskii <eliz <at> gnu.org>
> Cc: 52918 <at> debbugs.gnu.org
> 
> > Date: Sun, 23 Jan 2022 11:22:03 +0000 (UTC)
> > From: Van Ly <van.ly <at> sdf.org>
> > cc: 52918 <at> debbugs.gnu.org
> > 
> > > You don't seem to have copyright assignment on file, and without that
> > > we cannot accept such large contributions.  Would you like to start
> > > your legal paperwork now?  If so, I will send you the form and the
> > > instructions.
> > >
> > 
> > I sent an email to assign <at> gnu.org in the 24hr before this patch was 
> > submitted.  I was hoping this patch would fall below the 15 line 
> > limit and not need the formality of the legal paperwork.  The minor 
> > mode contribution would climb above the limit, which was why I sent 
> > the request to assign copyright.  Best case is a 2 week wait.
> > 
> > That generated uni-unihan-readings.el will need a line as follows:
> 
> Thanks, I prefer to wait until your assignment is in place, and you
> can then submit the final pieces to make this feature complete.

<Time passes...>

> Date: Tue, 25 Jul 2023 14:47:52 GMT
> From: Van Ly <van.ly <at> sdf.org>
> 
> More than 18 months ago I left hanging in one of the bug report
> threads the suggestion to include a readings table for CJKV characters
> from Unicode.
> 
> At the time I hadn't done the paperwork, and I posted the awk
> transformer script, which was fewer than 16 lines and generated the
> 21346-line readings table.  See attached.
> 
> I have since done the paperwork and was prompted to get this done or
> close the bug report seeing the configure script for 29.1 on line
> 2761 has the option to generate a smaller sized Japanese dictionary.
> 
> The awk script I have since misplaced but it should be somewhere in
> the bug report if details have not been purged beyond 12 months.

Details were not purged, but please look at the past discussions of
this bug and tell us where in it we should look for the Awk script.

I forward below the attachments you sent to me in private email;
please continue discussing this issue in this thread, not separately
and not in private email to me.

Thanks.

[Unihan_Readings.el (application/emacs-lisp, attachment)]
[create-readings-table.sh (application/x-sh, attachment)]
[example-configuration.el (application/emacs-lisp, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#52918; Package emacs. (Tue, 25 Jul 2023 18:19:01 GMT) Full text and rfc822 format available.

Message #37 received at 52918 <at> debbugs.gnu.org (full text, mbox):

From: Van Ly <van.ly <at> sdf.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 52918 <at> debbugs.gnu.org
Subject: Re: bug#52918: 29.0.50; to make use of ucd/Unihan_Readings.txt for
 kDefinition entry
Date: Tue, 25 Jul 2023 18:18:20 GMT
> Date: Tue, 25 Jul 2023 18:44:01 +0300
> From: Eli Zaretskii <eliz <at> gnu.org>
> 
> > Date: Sun, 23 Jan 2022 13:40:23 +0200
> > From: Eli Zaretskii <eliz <at> gnu.org>
> 

 <Time passes...>

> 
> > Date: Tue, 25 Jul 2023 14:47:52 GMT
> > From: Van Ly <van.ly <at> sdf.org>
> > 
> > I have since done the paperwork and was prompted to get this done or
> > close the bug report seeing the configure script for 29.1 on line
> > 2761 has the option to generate a smaller sized Japanese dictionary.
> > 
> > The awk script I have since misplaced but it should be somewhere in
> > the bug report if details have not been purged beyond 12 months.
> 
> Details were not purged, but please look at the past discussions of
> this bug and tell us where in it we should look for the Awk script.
> 

The patch is located at X (link below), and the Awk script in it looks
as follows:

BEGIN {
    FS = "\t"    # Unihan fields are tab-separated
    printf("(defvar readings-table\n\
	(make-char-table 'readings-table nil)\n\
	\"Char table of definitions for East Asian characters.\")\n")
}
/^#/ { next }    # skip comment lines
/kDefinition/ {
    sub(/^../, "#x", $1)    # U+3400 -> #x3400
    printf("(aset readings-table %s \"%s\")\n", $1, $3)
}

# Local Variables:
# indent-tabs-mode: t
# tab-width: 8
# End:

 X
    https://lists.gnu.org/archive/html/bug-gnu-emacs/2022-01/msg01393.html
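
Once the script is wired into a Makefile rule, a quick consistency check could catch truncated or mismatched output. The following is a sketch under assumptions, not part of the patch: the function name `check_readings` is hypothetical, and it assumes every generated entry starts a line with `(aset readings-table `. It counts `kDefinition` by second field, which is slightly stricter than the `/kDefinition/` pattern in the Awk script.

```shell
#!/bin/sh
# check_readings SRC EL  (hypothetical helper)
# Succeed when EL holds one aset form per kDefinition line in SRC.
check_readings() {
    src=$1
    el=$2
    # Count data lines whose second tab-separated field is kDefinition.
    want=$(awk -F'\t' \
        '!/^#/ && $2 == "kDefinition" { n++ } END { print n + 0 }' "$src")
    # Count generated forms; grep -c exits non-zero on a zero count,
    # so its status is ignored here.
    have=$(grep -c '^(aset readings-table ' "$el" || true)
    if [ "$want" -eq "$have" ]; then
        echo "ok: $have entries"
    else
        echo "mismatch: $want definitions, $have aset forms" >&2
        return 1
    fi
}
```

A typical call would pass Unihan_Readings.txt as the first argument and the generated uni-unihan-readings.el as the second.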




This bug report was last modified 2 years and 17 days ago.
