Thanks for the very fast replies and the suggestions!

> I think this is okay, but maybe the macro could be converted into an
> inline function, and then fetching the character from the various
> objects separated from looking up the char-table for that character?

I've made the conversion; it's now slightly less messy.  Regarding
the separation, I think that the most that can be done is to have the
look-up in a separate function.  Regrettably, first obtaining the
character, for example via a set of if-else clauses, and then looking
it up, which would be cleaner, can't really work since the cases (in
particular the first and fourth) are not disjoint.

> Well, since it's a char-table, users will probably want to control
> which characters cause word-wrap.  One idea would be to have a minor
> mode or some such, providing users an ability to include or exclude
> different groups of related whitespace characters as a whole?  This
> could be in follow-up patches, though.

Customisability was the idea. :)  I'm not sure how best to expose it in
a reasonably user-friendly way, though.  For the time being, allowing
control directly via the char-table might suffice.
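
To illustrate what "control directly via the char-table" could look
like from the user's side, here is a minimal sketch.  The variable
`my-wrap-chars' and its subtype are hypothetical stand-ins, not the
names used in the patch; only the char-table primitives themselves
are existing Emacs functions.

```elisp
;; Hypothetical stand-in for the patch's char-table; the real variable
;; name and subtype may differ.
(defvar my-wrap-chars (make-char-table 'my-wrap-chars)
  "Sketch: a non-nil entry marks a character that allows wrapping.")

;; The traditional behaviour: wrap on SPC and TAB only.
(aset my-wrap-chars ?\s t)
(aset my-wrap-chars ?\t t)

;; Opt in to a whole range at once (U+2000 .. U+200B).
(set-char-table-range my-wrap-chars '(#x2000 . #x200B) t)

;; Opt out of a single character again, here U+2007 FIGURE SPACE.
(aset my-wrap-chars #x2007 nil)
```

A look-up is then just `(aref my-wrap-chars CHAR)', which is
presumably also all that a friendlier interface would need to wrap.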

> We could also look at LineBreak.txt in the Unicode database for
> inspiration and ideas.

The three main customisation options that I'm considering are:

i) Unicode whitespace (U+2000 to U+200B),
ii) Vim's `breakat' characters (default " ^I!@*-+;:,./?"), since
presumably some thought went into that choice,
iii) the characters in LineBreak.txt (parsing the file shouldn't be
hard, if there aren't copyright issues).
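
As a rough sketch of how options ii) and iii) could be turned into
char-tables: the function names and the subtype `my-wrap-chars' below
are hypothetical, and the parser only assumes the "CODE;CLASS" and
"FIRST..LAST;CLASS" line layout of LineBreak.txt (with an optional
"# comment").  Which line-break classes should count as wrap
characters is left to the caller.

```elisp
(defun my-wrap-chars-from-string (chars)
  "Return a char-table marking every character in the string CHARS."
  (let ((table (make-char-table 'my-wrap-chars)))
    (dolist (c (string-to-list chars) table)
      (aset table c t))))

(defun my-wrap-chars-from-line-break-data (text classes)
  "Return a char-table marking characters whose class is in CLASSES.
TEXT holds lines in the LineBreak.txt layout; CLASSES is a list of
class names such as \"SP\" or \"BA\".  Lines that do not start with a
code point (comments, blank lines) are skipped."
  (let ((table (make-char-table 'my-wrap-chars)))
    (dolist (line (split-string text "\n" t) table)
      (when (string-match
             "\\`\\([0-9A-F]+\\)\\(?:\\.\\.\\([0-9A-F]+\\)\\)?[ \t]*;[ \t]*\\([A-Z0-9]+\\)"
             line)
        (let* ((from (string-to-number (match-string 1 line) 16))
               (to (if (match-string 2 line)
                       (string-to-number (match-string 2 line) 16)
                     from))           ; a single code point is a 1-char range
               (class (match-string 3 line)))
          (when (member class classes)
            (set-char-table-range table (cons from to) t)))))))
```

For example, `(my-wrap-chars-from-string " \t!@*-+;:,./?")' would give
the Vim default set, with Vim's ^I written as a TAB.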

> But I do think that the default should be only TAB and SPC, as Emacs
> always did, and the rest should be optional, and probably in Lisp, not
> C.

> And also a couple of tests (the ones you used would be a good start).

These would presumably have to be in test/manual, since the position
of the word-wrap depends on too many variables (width of window, font
type, font size)?

> I will send the forms off-list, thanks.

Thanks!

> One other thought: since TAB and SPC are single-byte characters,
> whereas the other "whitespace" characters are not, supporting the
> non-ASCII whitespace will be associated with some performance hit in
> the display engine, because it requires a char-table look up and
> fetching multibyte characters.  So perhaps we should allow the
> word-wrap-chars char-table to be nil (and make that the default), and
> in that case support only TAB and SPC as word-wrap characters.  This
> would let the default configuration work as fast as it does now,
> imposing the performance penalty only on those who want to support
> more whitespace characters.

> WDYT?

That seems sensible.  The old behaviour will now be the default, and
look-up using the char-table will only be enabled by the global minor
mode `word-wrap-char-table-mode' (suggestions for a catchier name very
welcome).  For the time being, its definition is in a new file,
`lisp/word-wrap.el'.  Also temporarily, for ease of testing, it allows
wrapping on the Unicode whitespace characters.
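
For readers without the attachment, the nil-by-default arrangement can
be pictured roughly as follows.  This is only a Lisp-level sketch under
my own assumptions: every `my-' name is hypothetical, and the actual
look-up happens in the C display code, not in a Lisp predicate.

```elisp
(defvar my-word-wrap-chars nil
  "Hypothetical stand-in: nil means wrap on SPC and TAB only.")

(defun my-word-wrap-char-p (char)
  "Return non-nil if wrapping is allowed after CHAR."
  (if my-word-wrap-chars
      (aref my-word-wrap-chars char)   ; slower path: char-table look-up
    (memq char '(?\s ?\t))))           ; fast path: the old behaviour

(define-minor-mode my-word-wrap-char-table-mode
  "Sketch: allow word-wrap on U+2000..U+200B as well as SPC and TAB."
  :global t
  (setq my-word-wrap-chars
        (when my-word-wrap-char-table-mode
          (let ((table (make-char-table 'my-word-wrap-chars)))
            (aset table ?\s t)
            (aset table ?\t t)
            (set-char-table-range table '(#x2000 . #x200B) t)
            table))))
```

With the mode off, the variable stays nil and only the fast path is
taken, so users who don't opt in pay nothing for the char-table.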


The current iteration is attached.  Until they've found a proper home,
the slightly updated tests are below.

(require 'word-wrap)

;; Test 1: wrap on U+200B, with whitespace-mode making it visible.
(with-current-buffer (get-buffer-create "*bar*")
  (dotimes (_ 1000)
    (insert "1234\u200b"))              ; "1234" followed by U+200B
  (setq word-wrap t)
  ;; Show SPC (32), NBSP (160) and U+200B (8203) as visible glyphs.
  (setq whitespace-display-mappings
        '((space-mark 32 [183] [46])
          (space-mark 160 [164] [95])
          (space-mark 8203 [164] [95])
          (newline-mark 10 [36 10])
          (tab-mark 9 [187 9] [92 9])))
  (whitespace-mode)
  (word-wrap-char-table-mode)
  (display-buffer "*bar*"))

;; Test 2: the same text without whitespace-mode.
(with-current-buffer (get-buffer-create "*foo*")
  (dotimes (_ 1000)
    (insert "1234\u200b"))              ; "1234" followed by U+200B
  (setq word-wrap t)
  (word-wrap-char-table-mode)
  (display-buffer "*foo*"))