GNU bug report logs - #15984
24.3; Problem with combining characters in attachment filename


Package: emacs;

Reported by: nisse <at> lysator.liu.se (Niels Möller)

Date: Thu, 28 Nov 2013 08:33:01 UTC

Severity: normal

Found in version 24.3

Fixed in version 24.4

Done: Glenn Morris <rgm <at> gnu.org>

Bug is archived. No further changes may be made.

From: Eli Zaretskii <eliz <at> gnu.org>
To: Kenichi Handa <handa <at> gnu.org>
Cc: 15984 <at> debbugs.gnu.org, nisse <at> lysator.liu.se
Subject: bug#15984: 24.3; Problem with combining characters in attachment filename
Date: Sat, 30 Nov 2013 15:20:13 +0200
> From: Eli Zaretskii <eliz <at> gnu.org>
> Cc: 15984 <at> debbugs.gnu.org
> 
> > From: nisse <at> lysator.liu.se (Niels Möller)
> > Cc: 15984 <at> debbugs.gnu.org
> > Date: Fri, 29 Nov 2013 13:41:01 +0100
> > 
> >   $ rm -rf ~/tmp/home/ && mkdir ~/tmp/home/ && HOME=$HOME/tmp/home emacs -nw -Q -l bug.el
> > 
> > where bug.el contains
> > 
> >   (setq gnus-init-file nil)
> >   (setq gnus-nntp-server nil)
> >   (gnus-no-server)
> > 
> > Then create the group with G d, pointing out the spool-like directory,
> > enter the group (RET), view the message (RET), try to write out the
> > attachment ("o" on the attachment button). Still crashes for me.
> 
> It crashes in the current development trunk as well, but only if the
> locale is set to Latin-1, like yours.
> 
> I'm looking at this.

There's something strange going on here; I'm CC'ing Handa-san, because
the problem is related to processing character compositions on a TTY.

The reason for the crash is simple: the following code from
indent.c:scan_for_column

      /* Check composition sequence.  */
      if (cmp_it.id >= 0
	  || (scan == cmp_it.stop_pos
	      && composition_reseat_it (&cmp_it, scan, scan_byte, end,
					w, NULL, Qnil)))
	composition_update_it (&cmp_it, scan, scan_byte, Qnil);
      if (cmp_it.id >= 0)
	{
	  scan += cmp_it.nchars;
	  scan_byte += cmp_it.nbytes;
	  if (scan <= end)
	    col += cmp_it.width;
	  if (cmp_it.to == cmp_it.nglyphs)
	    {
	      cmp_it.id = -1;
	      composition_compute_stop_pos (&cmp_it, scan, scan_byte, end,
					    Qnil);
	    }
	  else
	    cmp_it.from = cmp_it.to;
	  continue;
	}

incorrectly steps into the middle of the multibyte sequence #xCC #x88
that encodes U+0308, the Combining Diaeresis, because
cmp_it.nbytes is computed as 1 instead of 2.  The question is why it
does so.

From stepping through composition_reseat_it and composition_update_it,
it looks like the code contradicts itself: it thinks that 'a' and the
combining diaeresis should be composed, but then acts as if no
composition should happen.  As a result, this code in
composition_update_it:

      glyph = LGSTRING_GLYPH (gstring, cmp_it->from);
      cmp_it->nchars = LGLYPH_TO (glyph) + 1 - from;
      cmp_it->nbytes = 0;
      cmp_it->width = 0;
      for (i = cmp_it->nchars - 1; i >= 0; i--)
	{
	  c = XINT (LGSTRING_CHAR (gstring, i));
	  cmp_it->nbytes += CHAR_BYTES (c);
	  cmp_it->width += CHAR_WIDTH (c);
	}

always considers only 'a', never the diaeresis, and so cmp_it->nbytes
is always computed as 1.  So scan_for_column advances only 1 byte,
instead of 2, and finds itself in the middle of a multibyte sequence.
From there, it's a sure way to a crash.
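
The byte arithmetic described above can be illustrated with a minimal
standalone sketch.  It is not Emacs code: utf8_char_bytes,
is_continuation_byte and span_bytes are hypothetical helpers written
for this illustration, and plain UTF-8 is assumed as the encoding
(which matches the #xCC #x88 bytes quoted for U+0308).

```c
#include <stddef.h>

/* Hypothetical helper: number of bytes in the UTF-8 encoding of code
   point C.  A simplified stand-in for a CHAR_BYTES-style macro.  */
static int
utf8_char_bytes (unsigned int c)
{
  if (c < 0x80)
    return 1;
  if (c < 0x800)
    return 2;
  if (c < 0x10000)
    return 3;
  return 4;
}

/* Nonzero if B is a UTF-8 continuation byte (bit pattern 10xxxxxx),
   i.e. a byte that can only appear inside a multibyte sequence.  */
static int
is_continuation_byte (unsigned char b)
{
  return (b & 0xC0) == 0x80;
}

/* Sum the encoded lengths of CHARS[FROM] .. CHARS[FROM + N - 1].
   Summing over the wrong characters yields the wrong byte count.  */
static size_t
span_bytes (const unsigned int *chars, size_t from, size_t n)
{
  size_t total = 0;
  for (size_t i = 0; i < n; i++)
    total += (size_t) utf8_char_bytes (chars[from + i]);
  return total;
}
```

Take the buffer bytes 'a' #xCC #x88 'b', with the diaeresis starting at
byte offset 1.  span_bytes over 'a' alone gives 1, and advancing by
that amount from offset 1 lands on #x88, a continuation byte.
span_bytes over the diaeresis gives 2, and advancing by that amount
lands on 'b', a character boundary.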

I hope Handa-san will be able to find the problem.  The crash is 100%
reproducible with the steps described above and a mail message that
Niels can send you off-list.

TIA




This bug report was last modified 11 years and 102 days ago.
