GNU bug report logs - #24405
24.5; Possibly ``forward-word`` doesn't respect ``word-combining-categories`` for word boundaries on changing between latin/phonetic scripts.

Previous Next

Package: emacs;

Reported by: Oleksandr Gavenko <gavenkoa <at> gmail.com>

Date: Sat, 10 Sep 2016 08:35:01 UTC

Severity: normal

Tags: notabug

Found in version 24.5

Done: Stefan Kangas <stefan <at> marxist.se>

Bug is archived. No further changes may be made.

Full log


View this message in rfc822 format

From: Eli Zaretskii <eliz <at> gnu.org>
To: Oleksandr Gavenko <gavenkoa <at> gmail.com>
Cc: 24405 <at> debbugs.gnu.org
Subject: bug#24405: 24.5; Possibly ``forward-word`` doesn't respect ``word-combining-categories`` for word boundaries on changing between latin/phonetic scripts.
Date: Sat, 10 Sep 2016 13:05:09 +0300
tags 24405 + notabug
thanks

> From: Oleksandr Gavenko <gavenkoa <at> gmail.com>
> Date: Sat, 10 Sep 2016 11:33:45 +0300
> 
> Evaluate following form by C-x C-e:
> 
>   (let ((word-combining-categories '((?l . ?y) (?y . ?l) (?l . ?l)))
>         (word-separating-categories nil))
>     (forward-word))
> 
>   HelloПривLLжɪəʊheləʊaiɪa
> 
> My pointer stopped between ʊh.
> 
> I have:
> 
>   (aref char-script-table ?ʊ) phonetic
>   (aref char-script-table ?h) latin
>   (aref char-script-table ?ж) cyrillic
> 
>   (category-set-mnemonics (char-category-set ?ʊ)) ".Ljl"
>   (category-set-mnemonics (char-category-set ?h)) ".Lalr"
> 
>   (category-docstring ?y) "Cyrillic"
>   (category-docstring ?l) "Latin"
> 
> I expect that point moved to last character before new line.
> 
> Seems that:
> 
>   (?l . ?y) (?y . ?l)
> 
> has effect because pointer moved across Cyrillic/Latin and Cyrillic/Phonetic
> scripts but refused to move through Latin/Phonetic scripts.
> 
> If it is intended behavior how will I make Emacs to move across Latin/Phonetic
> scripts?

You can't do this for 2 characters that belong to different scripts,
but have the same categories in their category sets.  Those two
characters both have the 'l' (Latin) category in their sets, so you
cannot force Emacs to consider them not as word boundary.

For the same reason, including a cons cell whose members are
identical, such as (?l . ?l), has no effect.

This is the intended behavior, yes.  The word-combining-categories
feature is designed to support specific rare situations with mixing
the Far Eastern scripts (e.g., use of Kanji characters in Japanese
text), not for arbitrary games with Latin and European scripts.

May I ask why do you need to consider the above a single word?  In
what situation(s) does that make sense?

Thanks.




This bug report was last modified 5 years and 293 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.