GNU bug report logs - #20140
24.4; M17n shaper output rejected


Package: emacs;

Reported by: Richard Wordingham <richard.wordingham <at> ntlworld.com>

Date: Wed, 18 Mar 2015 22:21:02 UTC

Severity: normal

Tags: moreinfo

Found in version 24.4

Done: Eli Zaretskii <eliz <at> gnu.org>

Bug is archived. No further changes may be made.



Message #23 received at 20140 <at> debbugs.gnu.org (full text, mbox):

From: handa <at> gnu.org (K. Handa)
To: Richard Wordingham <richard.wordingham <at> ntlworld.com>
Cc: 20140 <at> debbugs.gnu.org
Subject: Re: bug#20140: 24.4; M17n shaper output rejected
Date: Wed, 25 Mar 2015 23:25:54 +0900
Hi, thank you for the detailed explanation.

In article <20150321175818.1b125eba <at> JRWUBU2>, Richard Wordingham <richard.wordingham <at> ntlworld.com> writes:

> What I ought to want is SIL's split cursor scheme, which indicated the
> next ('point') and previous characters, even in bidirectional text.
> Unfortunately, that's not compatible with m17n, which seems to assume
> that cursor position will be a single number.  The Emacs functions
> forward-char-intrusive and backward-char-intrusive provided a pleasant,
> more intuitive, alternative, and I am sad to hear they are gone.
> Perhaps I'll have to start using toggle-auto-composition.

Those Emacs functions were just my idea for improving Emacs
for CTL users, and have never been included in an official
Emacs version.  I checked the code and found two problems:

(1) When the command sets disable-point-adjustment to t,
command_loop_1 should force updating the display if point is
within a grapheme cluster.  So we need this patch:

diff --git a/src/keyboard.c b/src/keyboard.c
index bf65df1..13125c1 100644
--- a/src/keyboard.c
+++ b/src/keyboard.c
@@ -1636,6 +1636,16 @@ command_loop_1 (void)
 	    adjust_point_for_property (last_point_position,
 				       MODIFF != prev_modiff);
 	}
+      else if (current_buffer == prev_buffer
+	       && last_point_position != PT)
+	{
+	  if (PT > BEGV && PT < ZV
+	      && (composition_adjust_point (last_point_position, PT) != PT))
+	    /* Now point is within a grapheme cluster.  We must update
+	       the display so that this cluster is decomposed on the
+	       screen and the cursor is correctly placed at point.  */
+	    windows_or_buffers_changed = 22;
+	}
 
       /* Install chars successfully executed in kbd macro.  */
 
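The test added by this hunk can be tried outside Emacs with a small
model.  The following is only a sketch, not Emacs code: toy_adjust_point
is a hypothetical stand-in for composition_adjust_point, and the sorted
array of cluster boundaries is an assumption made for illustration.

```c
#include <stddef.h>

/* Toy stand-in for composition_adjust_point.  BOUNDS holds the sorted
   cluster boundaries of a hypothetical buffer.  If NEW_PT lies strictly
   inside a cluster, snap it in the direction of motion; otherwise
   return it unchanged.  */
long
toy_adjust_point (const long *bounds, size_t n, long last_pt, long new_pt)
{
  for (size_t i = 0; i + 1 < n; i++)
    if (bounds[i] < new_pt && new_pt < bounds[i + 1])
      return last_pt < new_pt ? bounds[i + 1] : bounds[i];
  return new_pt;
}

/* Mirror of the condition in the hunk: point moved, it is strictly
   between BEGV and ZV, and adjusting it would move it again, i.e. it
   now sits inside a cluster.  */
int
toy_needs_redisplay (const long *bounds, size_t n, long begv, long zv,
		     long last_pt, long pt)
{
  return (last_pt != pt
	  && pt > begv && pt < zv
	  && toy_adjust_point (bounds, n, last_pt, pt) != pt);
}
```

With boundaries {1, 4, 6} (a 3-character cluster followed by a
2-character one), moving point from 1 to 2 asks for a redisplay, while
moving from 1 to 4 does not.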
(2) We should break a grapheme cluster at point.  So we need
this patch.

diff --git a/src/xdisp.c b/src/xdisp.c
index a17f5a9..0c56395 100644
--- a/src/xdisp.c
+++ b/src/xdisp.c
@@ -3408,6 +3408,9 @@ compute_stop_pos (struct it *it)
       pos = next_overlay_change (charpos);
       if (pos < it->stop_charpos)
 	it->stop_charpos = pos;
+      /* If point is in front of the current stop pos, stop there.  */
+      if (charpos < PT && PT < it->stop_charpos)
+	it->stop_charpos = PT;
 
       /* Set up variables for computing the stop position from text
          property changes.  */
@@ -8166,7 +8169,12 @@ next_element_from_buffer (struct it *it)
 	  && IT_CHARPOS (*it) >= it->redisplay_end_trigger_charpos)
 	run_redisplay_end_trigger_hook (it);
 
-      stop = it->bidi_it.scan_dir < 0 ? -1 : it->end_charpos;
+      /* Set stop position considering the bidi direction and point.  */
+      if (it->bidi_it.scan_dir < 0)
+	stop = (PT < IT_CHARPOS (*it)) ? PT : -1;
+      else
+	stop = ((IT_CHARPOS (*it) < PT && PT < it->end_charpos)
+		? PT : it->end_charpos);
       if (CHAR_COMPOSED_P (it, IT_CHARPOS (*it), IT_BYTEPOS (*it),
 			   stop)
 	  && next_element_from_composition (it))
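
The stop-position arithmetic of the two xdisp.c hunks can be checked in
isolation.  This is a minimal sketch with hypothetical function names;
plain long values stand in for the iterator fields and for PT, and -1
means "no stop", as in the original code.

```c
/* Mirrors the compute_stop_pos hunk: if point lies strictly between
   CHARPOS and the current stop position, stop at point instead.  */
long
toy_forward_stop (long charpos, long stop, long pt)
{
  return (charpos < pt && pt < stop) ? pt : stop;
}

/* Mirrors the next_element_from_buffer hunk.  SCAN_DIR < 0 models a
   backward bidi scan, where point limits the composition only if it
   lies before CHARPOS.  */
long
toy_composition_stop (int scan_dir, long charpos, long end_charpos, long pt)
{
  if (scan_dir < 0)
    return pt < charpos ? pt : -1;
  return (charpos < pt && pt < end_charpos) ? pt : end_charpos;
}
```

For example, toy_composition_stop (1, 1, 6, 3) yields 3, so a
composition starting at position 1 cannot extend past point.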

Could you try these patches and test the usability of
forward-char-intrusive and backward-char-intrusive?

> > Please try to move the cursor over this Devanagari text "हिंदी" on
> > Emacs, gedit, and, for instance, firefox.  They all treat
> > that text as 2 grapheme clusters "हिं" and "दी".  The first
> > one contains the character sequence U+939 U+93F, and
> > U+93F (vowel I) is displayed before U+939 (base consonant).

> Note that those clusters are only 3 and 2 characters long.  Retyping
> them is tolerable.  Now consider the Sanskrit Devanagari text स्त्री,
> which contains two consonant-combining viramas.  Emacs moves across it
> in 1 step, but Claws e-mail (GTK-based, I believe) and LibreOffice
> (HarfBuzz-based, at least for linux) both take 3 steps to move across
> it.  Claws and LibreOffice use different algorithms to position the
> cursor.  That of LibreOffice seems more reasonable, but that of
> Claws works better!  The reason is that Unicode did not declare virama
> as forming grapheme clusters.
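
The difference between 1 step and 3 steps can be reproduced with a toy
cluster counter.  This is a sketch under stated assumptions, not the
algorithm of Emacs, GTK or HarfBuzz, and not an implementation of the
Unicode segmentation rules: is_mark is a rough, hypothetical test
covering only some Devanagari combining signs, and the
join_across_virama flag is invented for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define VIRAMA 0x094D

/* Rough test for Devanagari combining signs: U+0900..U+0903 and
   U+093A..U+094F, excluding avagraha (U+093D).  A simplification.  */
int
is_mark (uint32_t c)
{
  return ((c >= 0x0900 && c <= 0x0903)
	  || (c >= 0x093A && c <= 0x094F && c != 0x093D));
}

/* Count the cursor stops needed to cross TEXT.  A combining sign always
   continues the current cluster.  If JOIN_ACROSS_VIRAMA is nonzero, a
   character directly after a virama continues the cluster as well.  */
size_t
count_steps (const uint32_t *text, size_t len, int join_across_virama)
{
  size_t steps = 0;
  for (size_t i = 0; i < len; i++)
    {
      int continues = (i > 0
		       && (is_mark (text[i])
			   || (join_across_virama
			       && text[i - 1] == VIRAMA)));
      if (!continues)
	steps++;
    }
  return steps;
}
```

For स्त्री (U+0938 U+094D U+0924 U+094D U+0930 U+0940) the counter gives 3
steps without joining and 1 step with joining.  For हिंदी it gives 2 steps
either way.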

Ah, hmmm, that is a problem with DEVA-OTF.flt and DEV2-OTF.flt of
the m17n library.  I'll try to fix them.

> It seems to have solved all of them.  When I reported the bug, I was
> having problems with my font because libotf was silently ignoring half
> the lookups in my font.

Could you please send me (not on this list) an appropriate
bug/problem report if libotf should be fixed?

> I thought I might have problems with U+1A58 TAI THAM SIGN MAI KANG LAI,
> which in Lao visually groups (usually) with the following base
> consonant and in Tai Khuen groups with the preceding base consonant. My
> clustering in Emacs follows the Tai Khuen scheme.  (I compose two
> orthographic clusters together in Emacs, but declare two grapheme
> clusters in the FLT processing.)  However, my font follows a major
> Northern Thai dictionary and places it on the following base consonant
> if there is nothing above it, but otherwise places it on the preceding
> base consonant.  However, my implementation is too dirty to cause
> problems - the second cluster is not reported as deriving from the
> mai kang lai character.

> I wonder, though, what will happen if I manage to implement the
> Universal Shaping Engine's (USE) rphf feature. The author of a Lao-style
> Tai Tham font wanted this feature in HarfBuzz.  The desired effect seems
> easy to achieve in m17n-flt, but placing it under font control is more
> difficult.  I'm studying MLM2-OTF.flt to see how to do it.

I've just started to study the Universal Shaping Engine.  It
seems that we can implement it with a suitable FLT file.

> > > However, it then makes editing of the 'clusters' more
> > > difficult.  Note that there are examples above with 5
> > > characters in a cluster, and this is by no means the
> > > limit.
> > 
> > But, it seems that the current behavior is accepted, at
> > least, by Indic people.

> Who do you mean by 'Indic people'?

I just mean that I have not heard any complaints about that
"too long cluster problem" of Emacs.  Is no one using Emacs
for Indic scripts?

> New Tai Lue is an interesting case.  Microsoft delayed support for this
> simple Indic script for so long that most apparently Unicode-encoded
> New Tai Lue text was actually encoded in visual order.  With Unicode
> 8.0, New Tai Lue is changing from phonetic order to visual order, and
> it will no longer need any clusters at all!  

Wow, I didn't know that.

> Emacs 23.3 (which is what is in long-term support Ubuntu
> 12.04) offers no support for New Tai Lue, so I am not sure
> that there is yet a New Tai Lue view on composition in
> Emacs.

We may be able to provide support for new scripts in ELPA.

---
K. Handa
handa <at> gnu.org




This bug report was last modified 3 years and 155 days ago.
