GNU bug report logs - #49066
26.3; Segmentation fault on specific utf8 string

Package: emacs;

Reported by: "Miguel V. S. Frasson" <mvsfrasson <at> gmail.com>

Date: Wed, 16 Jun 2021 21:08:02 UTC

Severity: normal

Tags: patch

Found in version 26.3

Fixed in version 28.1

Done: Lars Ingebrigtsen <larsi <at> gnus.org>

Bug is archived. No further changes may be made.

From: handa <handa <at> gnu.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 49066 <at> debbugs.gnu.org, rpluim <at> gmail.com, eggert <at> cs.ucla.edu, larsi <at> gnus.org, mvsfrasson <at> gmail.com
Subject: bug#49066: 26.3; Segmentation fault on specific utf8 string
Date: Sat, 03 Jul 2021 11:05:05 +0900
[Message part 1 (text/plain, inline)]
In article <83bl7qp52q.fsf <at> gnu.org>, Eli Zaretskii <eliz <at> gnu.org> writes:
> > With the patch it still crashes for me in emacs-master with harfbuzz disabled:

> Too bad.
> Kenichi, any suggestions?

I checked the code again and found that the fault lies in m17n-lib:
it is not robust enough to handle an OTF table that differs from what
the library expects, and in that case mflt_run may return glyphs whose
g->g.to is not a valid index into lgstring.

Here is a revised patch to handle such a case.  Could you please try it?

------------------------------------------------------------
diff --git a/src/ftfont.c b/src/ftfont.c
index 0603dd9ce6..12d0d72d27 100644
--- a/src/ftfont.c
+++ b/src/ftfont.c
@@ -2798,10 +2798,31 @@ ftfont_shape_by_flt (Lisp_Object lgstring, struct font *font,
 
   if (gstring.used > LGSTRING_GLYPH_LEN (lgstring))
     return Qnil;
+
+  /* mflt_run may fail to set g->g.to (which must be a valid index
+     into lgstring) correctly if the font has an OTF table that is
+     different from what the m17n library expects. */
   for (i = 0; i < gstring.used; i++)
     {
       MFLTGlyphFT *g = (MFLTGlyphFT *) (gstring.glyphs) + i;
+      if (g->g.to >= len)
+	{
+	  /* Invalid g->g.to. */
+	  g->g.to = len - 1;
+	  int from = g->g.from;
+	  /* Fix remaining glyphs. */
+	  for (++i; i < gstring.used; i++)
+	    {
+	      g = (MFLTGlyphFT *) (gstring.glyphs) + i;
+	      g->g.from = from;
+	      g->g.to = len - 1;
+	    }
+	}
+    }
 
+  for (i = 0; i < gstring.used; i++)
+    {
+      MFLTGlyphFT *g = (MFLTGlyphFT *) (gstring.glyphs) + i;
       g->g.from = LGLYPH_FROM (LGSTRING_GLYPH (lgstring, g->g.from));
       g->g.to = LGLYPH_TO (LGSTRING_GLYPH (lgstring, g->g.to));
     }
------------------------------------------------------------

> Btw, I think there's a bug in those patterns: ZWJ and ZWNJ shouldn't
> compose unless they are followed by a character.  See section 12.2 in
> the Unicode Standard.

Even if they should not be composed with the preceding character, we
must include them in the string to be shaped, because their presence
may change the glyph of the preceding character.  A shaper (m17n-lib
or HarfBuzz) must then return a glyph string in which a ZWJ/ZWNJ at
the end forms an independent grapheme cluster.

When m17n-lib was developed, that rule was not yet clear.  To conform
to it, please put the attached BNG2-OTF.flt in the directory
~/.m17n.d/.

---
K. Handa
handa <at> gnu.org

[BNG2-OTF.flt (application/octet-stream, attachment)]

