GNU bug report logs - #24405
24.5; Possibly ``forward-word`` doesn't respect ``word-combining-categories`` for word boundaries on changing between latin/phonetic scripts.

Package: emacs;

Reported by: Oleksandr Gavenko <gavenkoa <at> gmail.com>

Date: Sat, 10 Sep 2016 08:35:01 UTC

Severity: normal

Tags: notabug

Found in version 24.5

Done: Stefan Kangas <stefan <at> marxist.se>

Bug is archived. No further changes may be made.


From: Oleksandr Gavenko <gavenkoa <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 24405 <at> debbugs.gnu.org
Subject: bug#24405: 24.5; Possibly ``forward-word`` doesn't respect ``word-combining-categories`` for word boundaries on changing between latin/phonetic scripts.
Date: Sat, 10 Sep 2016 20:12:57 +0300
On 2016-09-10, Eli Zaretskii wrote:

> This is the intended behavior, yes.  The word-combining-categories
> feature is designed to support specific rare situations with mixing
> the Far Eastern scripts (e.g., use of Kanji characters in Japanese
> text), not for arbitrary games with Latin and European scripts.
>
> May I ask why do you need to consider the above a single word?  In
> what situation(s) does that make sense?

I work on a dictionary. Dictionary articles and supplementary texts use IPA
symbols for word pronunciation.

I would like to select a pronunciation with a single movement in text like:

  leap [liːp]        lip [lɪp]
  wheel [wiːl]       will [wɪl]
  seek [siːk]        sick [sɪk]

It's annoying to move across long mixed words with C-Left, C-Right or
C-S-Left, C-S-Right. For example, try moving across:

  international [ˌɪntərˈnæʃənəl]
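
To see exactly where the motion commands stop in such text, a small helper can collect every position at which `forward-word` halts. This is only a sketch: the name `my-word-stops` is hypothetical, and it runs in a temporary buffer with the default syntax and category tables, so results may differ in a buffer whose major mode changes them.

```elisp
;; Hypothetical helper, not part of Emacs.
(defun my-word-stops (text)
  "Return the positions where repeated `forward-word' stops in TEXT."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (let (stops)
      ;; `forward-word' returns nil once it hits the end of the buffer
      ;; without finding another word, which ends the loop.
      (while (forward-word 1)
        (push (point) stops))
      (nreverse stops))))

;; Evaluate this before and after a workaround to compare the stops.
(my-word-stops "international [ˌɪntərˈnæʃənəl]")
```

Fewer positions in the returned list means fewer C-Right presses to cross the same text.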

I also found that some IPA characters are marked as Latin script:

  (aref char-script-table ?æ)  ; => latin

But that is debatable, because it is an ordinary letter in some languages.
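
The same check can be run over a whole transcription to see which characters Emacs assigns to which script, category set, and syntax class. The following is a minimal sketch; `my-describe-chars` is a hypothetical name, and it reads the tables of the current buffer.

```elisp
;; Hypothetical helper, not part of Emacs.
(defun my-describe-chars (string)
  "Return a list of (CHAR SCRIPT CATEGORIES SYNTAX) for each char in STRING.
CHAR is a one-character string, SCRIPT comes from `char-script-table',
CATEGORIES is the string of category mnemonics, and SYNTAX is the
syntax class designator as a one-character string."
  (mapcar (lambda (ch)
            (list (string ch)
                  (aref char-script-table ch)
                  (category-set-mnemonics (char-category-set ch))
                  (string (char-syntax ch))))
          (string-to-list string)))

;; Inspect one of the pronunciations from the examples above.
(my-describe-chars "liːp")
```

A change of script between two adjacent word-constituent characters is where a word boundary may appear.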

As a workaround, should I modify char-script-table?

For example:

  (mapc (lambda (ch)
          (aset char-script-table ch 'latin)   ; treat CH as Latin script
          (modify-syntax-entry ch "w"))        ; and as a word constituent
        '(?ʌ ?ə ?ɜ ?ɒ ?ɛ ?θ ?ʊ ?ɪ ?ɔ ?ɑ ?ʃ ?ʧ ?ː ?ˈ ?ˌ ?ʒ ?ŋ))

This gives the desired behavior, but it is unclear whether it is safe to do.

Another solution is to define my own category:

  (define-category ?p "Phonetic")

and add it to the IPA characters:

  (mapc (lambda (ch) (modify-category-entry ch ?p))
        '(?ʌ ?ə ?ɜ ?ɒ ?ɛ ?θ ?ʊ ?ɪ ?ɔ ?ɑ ?ʃ ?ʧ ?ː ?ˈ ?ˌ ?ʒ ?ŋ))

so it becomes possible to use:

  (add-to-list 'word-combining-categories '(?p . ?l))
  (add-to-list 'word-combining-categories '(?l . ?p))
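
Put together, the category approach might look like the sketch below, written so it can be re-evaluated without errors. The names `my-ipa-chars` and `my-ipa-setup-word-combining` are hypothetical, and the code assumes the category letter `?p` is not already used for something else in the running Emacs. It has not been verified here that this alone removes every boundary, since that may depend on which categories and scripts the characters already have.

```elisp
;; Hypothetical names; the character list is the one from this message.
(defvar my-ipa-chars
  '(?ʌ ?ə ?ɜ ?ɒ ?ɛ ?θ ?ʊ ?ɪ ?ɔ ?ɑ ?ʃ ?ʧ ?ː ?ˈ ?ˌ ?ʒ ?ŋ)
  "IPA characters to tag with the phonetic category.")

(defun my-ipa-setup-word-combining ()
  "Tag `my-ipa-chars' with category ?p and pair it with Latin (?l)."
  ;; `define-category' signals an error when the category already
  ;; exists, so define it only if it has no docstring yet.
  ;; Assumption: ?p is otherwise unused.
  (unless (category-docstring ?p)
    (define-category ?p "Phonetic"))
  ;; The category argument is a character, not a string.
  (dolist (ch my-ipa-chars)
    (modify-category-entry ch ?p))
  ;; `add-to-list' compares with `equal', so repeated calls do not
  ;; add duplicate pairs.
  (add-to-list 'word-combining-categories '(?p . ?l))
  (add-to-list 'word-combining-categories '(?l . ?p)))

(my-ipa-setup-word-combining)
```

Unlike the char-script-table workaround, this leaves the script table and syntax entries untouched and only extends the category table and `word-combining-categories`.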

-- 
http://defun.work/




This bug report was last modified 5 years and 294 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.