GNU bug report logs - #13399
24.3.50; Word-wrap can't wrap at zero-width space U-200B

Package: emacs

Reported by: martin rudalics <rudalics <at> gmx.at>

Date: Thu, 10 Jan 2013 08:31:02 UTC

Severity: wishlist

Found in version 24.3.50

Done: Lars Ingebrigtsen <larsi <at> gnus.org>

Bug is archived. No further changes may be made.


From: martin rudalics <rudalics <at> gmx.at>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 13399 <at> debbugs.gnu.org
Subject: bug#13399: 24.3.50; Word-wrap can't wrap at zero-width space U-200B
Date: Sun, 03 Feb 2013 19:57:31 +0100
Just to recap the initial problem and your proposal:

>> With emacs -Q evaluate
>>
>> (with-current-buffer (get-buffer-create "*foo*")
>>    (dotimes (i 1000)
>>      (insert "1234​")) ; U-200B
>>    (setq word-wrap t)
>>    (display-buffer "*foo*"))
>>
>> where the character after 1234 is a zero-width space character with
>> unicode code point U-200B.  As can be seen in the window showing *foo*,
>> lines are not regularly wrapped at that character.
>
> You mean, not wrapped at all.  Witness the continuation bitmaps in the
> fringes, which shouldn't appear when a line is wrapped.
>
>> Doing
>>
>> (with-current-buffer (get-buffer-create "*foo*")
>>    (dotimes (i 1000)
>>      (insert "1234 "))
>>    (setq word-wrap t)
>>    (display-buffer "*foo*"))
>>
>> instead wraps lines as expected.
>
> If anything, this is a missing feature, since word-wrap is explicitly
> coded to break lines only on SPC and TAB characters.  See the
> IT_DISPLAYING_WHITESPACE macro in xdisp.c.
>
> If we want to add more characters to the set, we should probably
> arrange a special char-table for this, and have it exposed to Lisp, so
> it could be customized.  Patches are welcome.

I now rewrote IT_DISPLAYING_WHITESPACE as

#define IT_DISPLAYING_WHITESPACE(it)					\
  ((it->what == IT_CHARACTER						\
    && !NILP (CHAR_TABLE_REF (Vword_wrap_chars, it->c)))		\
   || ((STRINGP (it->string)						\
	&& !NILP (CHAR_TABLE_REF					\
		   (Vword_wrap_chars,					\
		      SREF (it->string, IT_STRING_BYTEPOS (*it)))))	\
       || (it->s && !NILP (CHAR_TABLE_REF				\
			    (Vword_wrap_chars,				\
			       it->s[IT_BYTEPOS (*it)])))		\
       || (IT_BYTEPOS (*it) < ZV_BYTE					\
	   && !NILP (CHAR_TABLE_REF					\
		      (Vword_wrap_chars,				\
			 (*BYTE_POS_ADDR (IT_BYTEPOS (*it))))))))


and have a char-table called `word-wrap-chars' such that
(aref word-wrap-chars ?\u200B) returns t, but lines still don't wrap
at a U+200B character.  Is there some additional wrinkle, like a
hardcoded space/tab check elsewhere in the word-wrap code, that I have
to observe?  Or is my code wrong?

Thanks, martin





This bug report was last modified 4 years and 245 days ago.