GNU bug report logs - #24405
24.5; Possibly ``forward-word`` doesn't respect ``word-combining-categories`` for word boundaries on changing between latin/phonetic scripts.


Package: emacs;

Reported by: Oleksandr Gavenko <gavenkoa <at> gmail.com>

Date: Sat, 10 Sep 2016 08:35:01 UTC

Severity: normal

Tags: notabug

Found in version 24.5

Done: Stefan Kangas <stefan <at> marxist.se>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 24405 in the body.
You can then email your comments to 24405 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#24405; Package emacs. (Sat, 10 Sep 2016 08:35:01 GMT)

Acknowledgement sent to Oleksandr Gavenko <gavenkoa <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Sat, 10 Sep 2016 08:35:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Oleksandr Gavenko <gavenkoa <at> gmail.com>
To: bug-gnu-emacs <at> gnu.org
Subject: 24.5; Possibly ``forward-word`` doesn't respect
 ``word-combining-categories`` for word boundaries on changing between
 latin/phonetic scripts.
Date: Sat, 10 Sep 2016 11:33:45 +0300
Evaluate the following form with C-x C-e:

  (let ((word-combining-categories '((?l . ?y) (?y . ?l) (?l . ?l)))
        (word-separating-categories nil))
    (forward-word))

  HelloПривLLжɪəʊheləʊaiɪa

Point stopped between ʊ and h.

I have:

  (aref char-script-table ?ʊ) phonetic
  (aref char-script-table ?h) latin
  (aref char-script-table ?ж) cyrillic

  (category-set-mnemonics (char-category-set ?ʊ)) ".Ljl"
  (category-set-mnemonics (char-category-set ?h)) ".Lalr"

  (category-docstring ?y) "Cyrillic"
  (category-docstring ?l) "Latin"

I expected point to move to the last character before the newline.

It seems that:

  (?l . ?y) (?y . ?l)

has an effect, because point moved across the Cyrillic/Latin and
Cyrillic/Phonetic boundaries but refused to move across the Latin/Phonetic
boundary.

If this is the intended behavior, how can I make Emacs move across
Latin/Phonetic boundaries?

See also:

  http://emacs.stackexchange.com/questions/21131/does-word-syntax-take-script-into-account

In GNU Emacs 24.5.1 (x86_64-pc-linux-gnu, GTK+ Version 3.18.6)
 of 2016-01-22 on binet, modified by Debian
Windowing system distributor `The X.Org Foundation', version 11.0.11803000
System Description:	Debian GNU/Linux testing (stretch)

-- 
http://defun.work/




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#24405; Package emacs. (Sat, 10 Sep 2016 10:06:02 GMT)

Message #8 received at 24405 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Oleksandr Gavenko <gavenkoa <at> gmail.com>
Cc: 24405 <at> debbugs.gnu.org
Subject: Re: bug#24405: 24.5; Possibly ``forward-word`` doesn't respect
 ``word-combining-categories`` for word boundaries on changing
 between latin/phonetic scripts.
Date: Sat, 10 Sep 2016 13:05:09 +0300
tags 24405 + notabug
thanks

> From: Oleksandr Gavenko <gavenkoa <at> gmail.com>
> Date: Sat, 10 Sep 2016 11:33:45 +0300
> 
> Evaluate the following form with C-x C-e:
> 
>   (let ((word-combining-categories '((?l . ?y) (?y . ?l) (?l . ?l)))
>         (word-separating-categories nil))
>     (forward-word))
> 
>   HelloПривLLжɪəʊheləʊaiɪa
> 
> Point stopped between ʊ and h.
> 
> I have:
> 
>   (aref char-script-table ?ʊ) phonetic
>   (aref char-script-table ?h) latin
>   (aref char-script-table ?ж) cyrillic
> 
>   (category-set-mnemonics (char-category-set ?ʊ)) ".Ljl"
>   (category-set-mnemonics (char-category-set ?h)) ".Lalr"
> 
>   (category-docstring ?y) "Cyrillic"
>   (category-docstring ?l) "Latin"
> 
> I expected point to move to the last character before the newline.
> 
> It seems that:
> 
>   (?l . ?y) (?y . ?l)
> 
> has an effect, because point moved across the Cyrillic/Latin and
> Cyrillic/Phonetic boundaries but refused to move across the Latin/Phonetic
> boundary.
> 
> If this is the intended behavior, how can I make Emacs move across
> Latin/Phonetic boundaries?

You can't do this for two characters that belong to different scripts
but share the relevant category in their category sets.  Those two
characters both have the 'l' (Latin) category in their sets, so you
cannot make Emacs stop treating the position between them as a word
boundary.

For the same reason, including a cons cell whose members are
identical, such as (?l . ?l), has no effect.

This is the intended behavior, yes.  The word-combining-categories
feature is designed to support specific rare situations with mixing
the Far Eastern scripts (e.g., use of Kanji characters in Japanese
text), not for arbitrary games with Latin and European scripts.

May I ask why you need to consider the above a single word?  In
what situation(s) does that make sense?

Thanks.
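
The rule described in this reply can be checked from Lisp with a small
predicate.  This is only a sketch under a stated assumption: that a pair
(CAT1 . CAT2) applies when the first character has CAT1 and the second
does not, and the second has CAT2 and the first does not.  The name
my/pair-combines-p is hypothetical and is not part of Emacs; only
char-category-set and aref on its result are real calls.

```elisp
;; Hypothetical helper, not part of Emacs.  It models the rule described
;; above; it is not the actual word-boundary code.
(defun my/pair-combines-p (pair c1 c2)
  "Return non-nil if PAIR, a cons (CAT1 . CAT2), would combine C1 and C2.
Assumption: CAT1 must be in C1's category set only, and CAT2 in C2's only."
  (let ((set1 (char-category-set c1))
        (set2 (char-category-set c2))
        (cat1 (car pair))
        (cat2 (cdr pair)))
    (and (aref set1 cat1) (not (aref set2 cat1))
         (aref set2 cat2) (not (aref set1 cat2)))))

;; Both characters carry the `l' category, so no pair naming `l' on one
;; side can distinguish them.
(my/pair-combines-p '(?l . ?l) ?ʊ ?h)
```

Under that assumption a pair whose two members are identical can never
match, which agrees with the remark about (?l . ?l) above.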




Added tag(s) notabug. Request was from Eli Zaretskii <eliz <at> gnu.org> to control <at> debbugs.gnu.org. (Sat, 10 Sep 2016 10:06:03 GMT)

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#24405; Package emacs. (Sat, 10 Sep 2016 17:14:01 GMT)

Message #13 received at 24405 <at> debbugs.gnu.org:

From: Oleksandr Gavenko <gavenkoa <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 24405 <at> debbugs.gnu.org
Subject: Re: bug#24405: 24.5; Possibly ``forward-word`` doesn't respect
 ``word-combining-categories`` for word boundaries on changing between
 latin/phonetic scripts.
Date: Sat, 10 Sep 2016 20:12:57 +0300
On 2016-09-10, Eli Zaretskii wrote:

> This is the intended behavior, yes.  The word-combining-categories
> feature is designed to support specific rare situations with mixing
> the Far Eastern scripts (e.g., use of Kanji characters in Japanese
> text), not for arbitrary games with Latin and European scripts.
>
> May I ask why you need to consider the above a single word?  In
> what situation(s) does that make sense?

I work on a dictionary. Dictionary articles and supplementary texts use IPA
symbols for word pronunciation.

I would like to select a pronunciation with a single motion in text like:

  leap [liːp]        lip [lɪp]
  wheel [wiːl]       will [wɪl]
  seek [siːk]        sick [sɪk]

It is annoying to move across long mixed words with C-Left, C-Right or
C-S-Left, C-S-Right. For example, try moving across:

  international [ˌɪntərˈnæʃənəl]

I also found that some IPA characters are marked as Latin script:

  (aref char-script-table ?æ)  latin

But that is debatable, because it is an ordinary letter in some languages.

As a workaround should I modify char-script-table?

Like:

  (mapc (lambda (ch) (aset char-script-table ch 'latin) (modify-syntax-entry ch "w"))
        '(?ʌ ?ə ?ɜ ?ɒ ?ɛ ?θ ?ʊ ?ɪ ?ɔ ?ɑ ?ʃ ?ʧ ?ː ?ˈ ?ˌ ?ʒ ?ŋ))

This gives the desired behavior, but it is unclear whether it is safe.

Another solution is to define my own category:

  (define-category ?p "Phonetic")

and to add it to IPA characters:

  (mapc (lambda (ch) (modify-category-entry ch ?p))
        '(?ʌ ?ə ?ɜ ?ɒ ?ɛ ?θ ?ʊ ?ɪ ?ɔ ?ɑ ?ʃ ?ʧ ?ː ?ˈ ?ˌ ?ʒ ?ŋ))

so it becomes possible to use:

  (add-to-list 'word-combining-categories '(?p . ?l))
  (add-to-list 'word-combining-categories '(?l . ?p))

-- 
http://defun.work/




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#24405; Package emacs. (Sat, 10 Sep 2016 17:24:01 GMT)

Message #16 received at 24405 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Oleksandr Gavenko <gavenkoa <at> gmail.com>
Cc: 24405 <at> debbugs.gnu.org
Subject: Re: bug#24405: 24.5; Possibly ``forward-word`` doesn't respect
 ``word-combining-categories`` for word boundaries on changing between
 latin/phonetic scripts.
Date: Sat, 10 Sep 2016 20:23:25 +0300
> From: Oleksandr Gavenko <gavenkoa <at> gmail.com>
> Cc: 24405 <at> debbugs.gnu.org
> Date: Sat, 10 Sep 2016 20:12:57 +0300
> 
> As a workaround should I modify char-script-table?

I'd suggest writing your own word-motion commands.  It's not
complicated; you can use regular expressions (which understand
categories, if you need that).

> Another solution is to define my own category:
> 
>   (define-category ?p "Phonetic")
> 
> and to add it to IPA characters:
> 
>   (mapc (lambda (ch) (modify-category-entry ch ?p))
>         '(?ʌ ?ə ?ɜ ?ɒ ?ɛ ?θ ?ʊ ?ɪ ?ɔ ?ɑ ?ʃ ?ʧ ?ː ?ˈ ?ˌ ?ʒ ?ŋ))
> 
> so it becomes possible to use:
> 
>   (add-to-list 'word-combining-categories '(?p . ?l))
>   (add-to-list 'word-combining-categories '(?l . ?p))

That'd be my second best advice.  But I think regular expressions
should provide a better and easier solution.
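
A regular-expression based motion command along the lines suggested here
might look like the following.  It is a minimal sketch, not a tested
replacement for forward-word: all the my/ names are hypothetical, the
choice of extra characters is an assumption, and it matches on the word
syntax class (\sw) rather than on categories, so script changes inside a
run are ignored.

```elisp
;; Hypothetical names throughout; nothing here is part of Emacs.
(defconst my/ipa-extra-chars "ˈˌː"
  "IPA stress and length marks to accept even if their syntax is not \"w\".")

(defconst my/mixed-word-regexp
  (concat "\\(?:\\sw\\|[" my/ipa-extra-chars "]\\)+")
  "Regexp matching a run of word characters, whatever their script.")

(defun my/forward-mixed-word (&optional n)
  "Move point forward past N runs matching `my/mixed-word-regexp'.
Only forward motion is handled; a negative N does nothing."
  (interactive "p")
  (dotimes (_i (or n 1))
    ;; With NOERROR set to `move', point goes to the end of the buffer
    ;; when no further run is found.
    (re-search-forward my/mixed-word-regexp nil 'move)))
```

A backward counterpart and a binding such as C-Right would be needed for
everyday use; both are left out to keep the sketch short.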




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#24405; Package emacs. (Sun, 11 Sep 2016 11:58:02 GMT)

Message #19 received at 24405 <at> debbugs.gnu.org:

From: Oleksandr Gavenko <gavenkoa <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 24405 <at> debbugs.gnu.org
Subject: Re: bug#24405: 24.5; Possibly ``forward-word`` doesn't respect
 ``word-combining-categories`` for word boundaries on changing between
 latin/phonetic scripts.
Date: Sun, 11 Sep 2016 14:57:33 +0300
On 2016-09-10, Eli Zaretskii wrote:

>> Another solution is to define my own category:
>> 
>>   (define-category ?p "Phonetic")
>> 
>> and to add it to IPA characters:
>> 
>>   (mapc (lambda (ch) (modify-category-entry ch ?p))
>>         '(?ʌ ?ə ?ɜ ?ɒ ?ɛ ?θ ?ʊ ?ɪ ?ɔ ?ɑ ?ʃ ?ʧ ?ː ?ˈ ?ˌ ?ʒ ?ŋ))
>> 
>> so it becomes possible to use:
>> 
>>   (add-to-list 'word-combining-categories '(?p . ?l))
>>   (add-to-list 'word-combining-categories '(?l . ?p))
>
> That'd be my second best advice.  But I think regular expressions
> should provide a better and easier solution.

This works for me:

  (defconst my/ipa-chars
    (list ?ˈ ?ˌ ?ː ?ǁ ?ʲ ?θ ?ð ?ŋ ?ɡ ?ʒ ?ʃ ?ʧ ?ə ?ɜ ?ɛ ?ʌ ?ɒ ?ɔ ?ɑ ?æ ?ʊ ?ɪ))
  (define-category ?p "Phonetic")
  (mapc (lambda (ch)
          (cond
           ;; Phonetic script: add category p and remove category l.
           ((eq (aref char-script-table ch) 'phonetic)
            (modify-category-entry ch ?p)
            (modify-category-entry ch ?l nil t))
           ;; Latin script: make sure category l is present, because
           ;; (aref char-script-table ?ˌ) is 'latin but
           ;; (char-category-set ?ˌ) is ".j".
           ((eq (aref char-script-table ch) 'latin)
            (modify-category-entry ch ?l))))
        my/ipa-chars)
  (add-to-list 'word-combining-categories '(?p . ?l))
  (add-to-list 'word-combining-categories '(?l . ?p))

But adding and removing categories feels too low-level, and it requires a
category, (define-category ?p "Phonetic"), that Emacs itself does not define.

This looks easier to me:

  (mapc (lambda (ch)
          (aset char-script-table ch 'latin)
          (modify-syntax-entry ch "w"))
        my/ipa-chars)

But ``char-script-table`` is derived from Unicode, and some code may depend on
this database...

-- 
http://defun.work/




Reply sent to Stefan Kangas <stefan <at> marxist.se>:
You have taken responsibility. (Sun, 29 Sep 2019 04:35:02 GMT)

Notification sent to Oleksandr Gavenko <gavenkoa <at> gmail.com>:
bug acknowledged by developer. (Sun, 29 Sep 2019 04:35:02 GMT)

Message #24 received at 24405-done <at> debbugs.gnu.org:

From: Stefan Kangas <stefan <at> marxist.se>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 24405-done <at> debbugs.gnu.org, Oleksandr Gavenko <gavenkoa <at> gmail.com>
Subject: Re: bug#24405: 24.5; Possibly ``forward-word`` doesn't respect
 ``word-combining-categories`` for word boundaries on changing between
 latin/phonetic scripts.
Date: Sun, 29 Sep 2019 06:33:45 +0200
Eli Zaretskii <eliz <at> gnu.org> writes:

> tags 24405 + notabug
> thanks
[...]
> This is the intended behavior, yes.  The word-combining-categories
> feature is designed to support specific rare situations with mixing
> the Far Eastern scripts (e.g., use of Kanji characters in Japanese
> text), not for arbitrary games with Latin and European scripts.

This was already tagged notabug, and I can see nothing more to do here.
I'm therefore closing this now.

Best regards,
Stefan Kangas




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sun, 27 Oct 2019 11:24:04 GMT)

This bug report was last modified 5 years and 293 days ago.
