On Mon, 15 Dec 2014 09:43:54 -0800
Paul Eggert wrote:

> Can't we improve this when using_utf8 () is true?  In that case, every
> ASCII character is always single byte.  Also, the bytes 0xc0, 0xc1,
> and 0xf5 through 0xff can be added to the table: they are not
> single-byte characters but they are always encoding errors so they will
> be a character boundary as far as skip_remains_mb is concerned.  This
> suggests that the table 'always_single_byte' should be renamed to
> something like 'always_character_boundary'.
>
> > wint_t wc = WEOF;
> > + if (always_single_byte[*p])
> > + return p;

Thanks for the review and suggestion.  If using_utf8 () is true, we can
set always_character_boundary to true for every byte value except
0x80-0xbf (the UTF-8 continuation bytes).

> This won't assign anything to *WCP, contrary to the documented API
> for skip_remains_mb.  This is OK (as callers don't care) but the API
> documentation should be changed to reflect the actual behavior.

Oh!  If WCP is needed, we must go through the buffer step by step,
since the wide character before P is stored in *WCP.  I fixed it and
updated the API documentation.
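
For reference, here is a minimal sketch of how the table could be filled in the UTF-8 case described above.  It is not the code from the patch: the helper name init_utf8_boundary_table and the table's type are assumptions made for illustration, and only the name always_character_boundary comes from the discussion.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical table, named after the suggestion in this thread.
   Its type and size are assumptions, not taken from the patch.  */
static bool always_character_boundary[UCHAR_MAX + 1];

/* Hypothetical helper: fill the table for a UTF-8 locale.
   A byte in 0x80-0xbf is a continuation byte and may sit in the
   middle of a character.  Every other byte is an ASCII character,
   a lead byte, or an encoding error (0xc0, 0xc1, 0xf5-0xff), so a
   character boundary always precedes it.  */
static void
init_utf8_boundary_table (void)
{
  for (int i = 0; i <= UCHAR_MAX; i++)
    always_character_boundary[i] = ! (0x80 <= i && i <= 0xbf);
}
```

With a table like this, the early return on always_character_boundary[*p] is only safe when the caller does not need *WCP, which is why the step-by-step path is kept for that case.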