Package: emacs
Reported by: Richard Wordingham <richard.wordingham <at> ntlworld.com>
Date: Wed, 18 Mar 2015 22:21:02 UTC
Severity: normal
Tags: moreinfo
Found in version 24.4
Done: Eli Zaretskii <eliz <at> gnu.org>
Bug is archived. No further changes may be made.
Message #23 received at 20140 <at> debbugs.gnu.org:
From: handa <at> gnu.org (K. Handa)
To: Richard Wordingham <richard.wordingham <at> ntlworld.com>
Cc: 20140 <at> debbugs.gnu.org
Subject: Re: bug#20140: 24.4; M17n shaper output rejected
Date: Wed, 25 Mar 2015 23:25:54 +0900

Hi, thank you for the detailed explanation.

In article <20150321175818.1b125eba <at> JRWUBU2>, Richard Wordingham <richard.wordingham <at> ntlworld.com> writes:

> What I ought to want is SIL's split cursor scheme, which indicated the
> next ('point') and previous characters, even in bidirectional text.
> Unfortunately, that's not compatible with m17n, which seems to assume
> that cursor position will be a single number. The Emacs functions
> forward-char-intrusive and backward-char-intrusive provided a pleasant,
> more intuitive, alternative, and I am sad to hear they are gone.
> Perhaps I'll have to start using toggle-auto-composition.

Those Emacs functions are just my idea for improving Emacs for CTL
users, and have never been included in the official Emacs version.
I checked the code and found two problems:

(1) When the command sets disable-point-adjustment to t,
    command_loop_1 should force updating the display if point is
    within a grapheme cluster. So we need this patch:

diff --git a/src/keyboard.c b/src/keyboard.c
index bf65df1..13125c1 100644
--- a/src/keyboard.c
+++ b/src/keyboard.c
@@ -1636,6 +1636,16 @@ command_loop_1 (void)
                 adjust_point_for_property (last_point_position,
                                            MODIFF != prev_modiff);
             }
+          else if (current_buffer == prev_buffer
+                   && last_point_position != PT)
+            {
+              if (PT > BEGV && PT < ZV
+                  && (composition_adjust_point (last_point_position, PT) != PT))
+                /* Now point is within a grapheme cluster.  We must update
+                   the display so that this cluster is decomposed on the
+                   screen and the cursor is correctly placed at point.  */
+                windows_or_buffers_changed = 22;
+            }
 
       /* Install chars successfully executed in kbd macro.  */

(2) We should break a grapheme cluster at point. So we need this patch:

diff --git a/src/xdisp.c b/src/xdisp.c
index a17f5a9..0c56395 100644
--- a/src/xdisp.c
+++ b/src/xdisp.c
@@ -3408,6 +3408,9 @@ compute_stop_pos (struct it *it)
       pos = next_overlay_change (charpos);
       if (pos < it->stop_charpos)
         it->stop_charpos = pos;
+      /* If point is in front of the current stop pos, stop there.  */
+      if (charpos < PT && PT < it->stop_charpos)
+        it->stop_charpos = PT;
 
       /* Set up variables for computing the stop position from text
          property changes.  */
@@ -8166,7 +8169,12 @@ next_element_from_buffer (struct it *it)
           && IT_CHARPOS (*it) >= it->redisplay_end_trigger_charpos)
         run_redisplay_end_trigger_hook (it);
 
-      stop = it->bidi_it.scan_dir < 0 ? -1 : it->end_charpos;
+      /* Set stop position considering the bidi direction and point.  */
+      if (it->bidi_it.scan_dir < 0)
+        stop = (PT < IT_CHARPOS (*it)) ? PT : -1;
+      else
+        stop = ((IT_CHARPOS (*it) < PT && PT < it->end_charpos)
+                ? PT : it->end_charpos);
       if (CHAR_COMPOSED_P (it, IT_CHARPOS (*it), IT_BYTEPOS (*it),
                            stop)
           && next_element_from_composition (it))

Could you try these patches and test the usability of
forward-char-intrusive and backward-char-intrusive?

> > Please try to move cursor over this Devanagari text "हिंदी" on
> > Emacs, gedit, and, for instance, Firefox. They all treat
> > that text as 2 grapheme clusters "हिं" and "दी". The first
> > one corresponds to the character sequence U+0939 U+093F U+0902, and
> > U+093F (vowel sign I) is displayed before U+0939 (base consonant).

> Note that those clusters are only 3 and 2 characters long. Retyping
> them is tolerable. Now consider the Sanskrit Devanagari text स्त्री,
> which contains two consonant-combining viramas. Emacs moves across it
> in 1 step, but Claws e-mail (GTK-based, I believe) and LibreOffice
> (HarfBuzz-based, at least for Linux) both take 3 steps to move across
> it. Claws and LibreOffice use different algorithms to position the
> cursor. That of LibreOffice seems more reasonable, but that of
> Claws works better!
> The reason is that Unicode did not declare virama
> as forming grapheme clusters.

Ah, hmmm, that's a problem of DEVA-OTF.flt and DEV2-OTF.flt of the
m17n library. I'll try to fix them.

> It seems to have solved all of them. When I reported the bug, I was
> having problems with my font because libotf was silently ignoring half
> the lookups in my font.

Could you please send me (not on this list) an appropriate
bug/problem report if libotf should be fixed?

> I thought I might have problems with U+1A58 TAI THAM SIGN MAI KANG LAI,
> which in Lao visually groups (usually) with the following base
> consonant and in Tai Khuen groups with the preceding base consonant. My
> clustering in Emacs follows the Tai Khuen scheme. (I compose two
> orthographic clusters together in Emacs, but declare two grapheme
> clusters in the FLT processing.) However, my font follows a major
> Northern Thai dictionary and places it on the following base consonant
> if there is nothing above it, but otherwise places it on the preceding
> base consonant. However, my implementation is too dirty to cause
> problems - the second cluster is not reported as deriving from the
> mai kang lai character.

> I wonder, though, what will happen if I manage to implement the
> Universal Shaping Engine's (USE) rphf feature. The author of a Lao-style
> Tai Tham font wanted this feature in HarfBuzz. The desired effect seems
> easy to achieve in m17n-flt, but placing it under font control is more
> difficult. I'm studying MLM2-OTF.flt to see how to do it.

I've just started to study the Universal Shaping Engine. It seems
that we can implement it with a proper FLT file.

> > > However, it then makes editing of the 'clusters' more
> > > difficult. Note that there are examples above with 5
> > > characters in a cluster, and this is by no means the
> > > limit.
> >
> > But, it seems that the current behavior is accepted, at
> > least, by Indic people.

> Who do you mean by 'Indic people'?

I just mean that I have not heard any complaints about that "too
long cluster problem" of Emacs. Is no one using Emacs for Indic
scripts?

> New Tai Lue is an interesting case. Microsoft delayed support for this
> simple Indic script for so long that most apparently Unicode-encoded
> New Tai Lue text was actually encoded in visual order. With Unicode
> 8.0, New Tai Lue is changing from phonetic order to visual order, and
> it will no longer need any clusters at all!

Wow, I didn't know that.

> Emacs 23.3 (which is what is in long-term support Ubuntu
> 12.04) offers no support for New Tai Lue, so I am not sure
> that there is yet a New Tai Lue view on composition in
> Emacs.

We may be able to provide support for new scripts in ELPA.

---
K. Handa
handa <at> gnu.org
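
The stop-position rule in the next_element_from_buffer hunk of the
xdisp.c patch can be looked at in isolation. What follows is only a
minimal sketch, not Emacs code: compute_stop and its plain long
parameters are hypothetical stand-ins for it->bidi_it.scan_dir,
IT_CHARPOS (*it), PT and it->end_charpos, and it assumes nothing about
the rest of the display iterator. It shows that point becomes the stop
position only when it lies strictly ahead of the iterator in the
current scan direction.

```c
#include <assert.h>

/* Hypothetical model of the patched stop computation.
   scan_dir < 0 models a backward bidi scan; anything else is forward.
   Returns the position at which composition lookup should stop,
   or -1 for "no limit" on a backward scan, as in the patch.  */
long
compute_stop (int scan_dir, long charpos, long point, long end_charpos)
{
  if (scan_dir < 0)
    /* Backward scan: stop at point only if point is behind charpos.  */
    return (point < charpos) ? point : -1;

  /* Forward scan: stop at point only if it lies strictly between the
     current position and the end of the iterated region.  */
  return (charpos < point && point < end_charpos) ? point : end_charpos;
}

/* A few illustrative cases (positions are made-up numbers).  */
void
compute_stop_examples (void)
{
  assert (compute_stop (1, 10, 12, 20) == 12);   /* point ahead: stop there */
  assert (compute_stop (1, 10, 5, 20) == 20);    /* point behind: ignore it */
  assert (compute_stop (1, 10, 10, 20) == 20);   /* point at charpos: ignore */
  assert (compute_stop (-1, 10, 7, 20) == 7);    /* backward, point behind */
  assert (compute_stop (-1, 10, 15, 20) == -1);  /* backward, point ahead */
}
```

In this model, a composition starting at charpos cannot extend past
point when point falls inside it, which is the "break a grapheme
cluster at point" behavior described in (2).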