GNU bug report logs - #26396
25.1; char-displayable-p on a latin1 tty
On 04/14/2017 05:37 AM, Eli Zaretskii wrote:
>> This should not be a problem, as the Linux console has only
>> single-width characters.
> Are you sure? AFAIU, the Linux console supports the BMP, and some of
> the characters in the BMP are double-width (a.k.a. "full-width"), for
> example U+1100, U+231A, U+2B1B, and others. What does the Linux
> console do when these characters are sent to the screen driver?
I haven't experimented with it, so I'm not 100% sure. However, as I
understand the implementation, the console driver can support at most
512 simultaneously-displayable characters, as this is a property of the
classic IBM VGA design that is the greatest common denominator of
current or recent (post-1990) PC graphics hardware. The user can specify
what each character looks like down to the pixel level, but cannot alter
character sizes on a character-by-character basis. In theory one could
display double-wide characters by splitting them into halves and
displaying each half separately, but I don't know of anyone who does
that (it would not be practical due to that 512 limit).
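As a rough illustration of the width question, Unicode's East Asian Width property can be queried from Python's standard `unicodedata` module. This is only a sketch: the `slots_needed` helper and its two-slots-per-wide-character cost are hypothetical, modelling the split-into-halves idea rather than anything the console driver is known to do.

```python
import unicodedata

# Upper bound on simultaneously loaded glyphs, as stated in the message.
CONSOLE_GLYPH_SLOTS = 512


def is_double_width(ch):
    """Return True if Unicode classifies ch as Wide (W) or Fullwidth (F)."""
    return unicodedata.east_asian_width(ch) in ("W", "F")


def slots_needed(ch):
    """Hypothetical cost model: a wide character split into halves uses two slots."""
    return 2 if is_double_width(ch) else 1


if __name__ == "__main__":
    for ch in ("A", "\u1100", "\u231a", "\u2b1b"):
        print(f"U+{ord(ch):04X} wide={is_double_width(ch)} slots={slots_needed(ch)}")
    # If every loaded glyph were one half of a wide character:
    print("wide characters that would fit:", CONSOLE_GLYPH_SLOTS // 2)
```

Under that assumed cost model, a font made up entirely of half-glyphs would hold at most 256 double-width characters, which is in line with the point that the approach is impractical.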
>
>>> And what does "display as-is" means in practice? Should we send to
>>> the console the glyph codes corresponding to Unicode points, or should
>>> we send UTF-8 encoded characters?
>> It depends on whether the console is in UTF-8 mode. If so, send UTF-8;
>> if not, send a byte that is transformed according to the current mapping
>> table into a Unicode value. I hope we don't need to bother with the
>> latter possibility.
> What software puts the console in UTF-8 mode? Is that the locale
> setting?
It's done at boot time. The escape sequences ESC % G (or ESC % 8) and
ESC % @ get you into and out of UTF-8 mode; see
<http://man7.org/linux/man-pages/man4/console_codes.4.html>. Common
practice is to stay in UTF-8 mode as the alternative is worse (it has
only 256 simultaneously-displayable characters).
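For illustration, the two escape sequences can be produced as follows. This is a minimal sketch: the function names are made up for this example, and the sequences are only meaningful when written to a Linux console; other terminals may ignore or misinterpret them.

```python
ESC = b"\x1b"

# Sequences documented in console_codes(4).
ENTER_UTF8 = ESC + b"%G"  # ESC % G: select UTF-8 (ESC % 8 is the obsolete form)
LEAVE_UTF8 = ESC + b"%@"  # ESC % @: select the default character set


def utf8_mode_sequence(enable):
    """Return the bytes that switch the console into or out of UTF-8 mode."""
    return ENTER_UTF8 if enable else LEAVE_UTF8


def write_utf8_mode(stream, enable):
    """Write the mode switch to a binary stream, e.g. an opened console device."""
    stream.write(utf8_mode_sequence(enable))
    stream.flush()
```

A caller would pass a binary stream attached to the console, for example `sys.stdout.buffer` when running on a virtual terminal.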
> http://www.tldp.org/LDP/LG/issue91/loozzr.html
> http://man7.org/linux/man-pages/man4/console_codes.4.html
> that seems to be just the tip of an iceberg. Or maybe the
> issue is easier than I envisioned.
Both, I hope. :-)
> Suppose we only wanted to use this feature for UTF-8 locales.
> Assuming that the OS takes care of putting the console in UTF-8 mode,
> we don't need any changes in Emacs, since Emacs already sends UTF-8
> sequences to the screen driver. As the Linux console only supports
> the BMP, we could then simply amend the code of char-displayable-p to
> check whether a character is within the BMP, when the terminal is the
> Linux console. Do you agree with this conclusion?
No, because a character is displayable only if it's in that set of
at-most-512 characters.
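The difference between the two tests can be sketched like this. The `LOADED` set below is a made-up stand-in for whatever font happens to be loaded (here, printable Latin-1), not data read from a real console.

```python
def in_bmp(ch):
    """The proposed check: is the character in the Basic Multilingual Plane?"""
    return ord(ch) <= 0xFFFF


def is_displayable(ch, loaded):
    """The stricter check: is the character in the currently loaded set?"""
    return ord(ch) in loaded


# Hypothetical loaded set: printable ASCII plus the Latin-1 supplement.
LOADED = frozenset(range(0x20, 0x7F)) | frozenset(range(0xA0, 0x100))

if __name__ == "__main__":
    for ch in ("\u00e9", "\u231a"):
        print(f"U+{ord(ch):04X} in_bmp={in_bmp(ch)} "
              f"displayable={is_displayable(ch, LOADED)}")
```

With this assumed set, U+231A passes the BMP test but is not displayable, so a BMP check alone would give a false positive.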
> OTOH, now I'm not sure I understand the need for terminal_glyph_code.
> What does it do that a simple check for a Linux console and UTF-8
> terminal encoding, plus a character being inside a BMP, don't?
terminal_glyph_code gets the current set of at-most-512 displayable
characters from the kernel.
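One kernel interface that exposes the console's Unicode-to-font mapping is the GIO_UNIMAP ioctl declared in <linux/kd.h>. The sketch below queries it from Python via `ctypes`. It is an assumption here that this is the kind of call terminal_glyph_code would rely on. The `read_console_unimap` name is invented for this example, and the call only succeeds on a file descriptor that refers to a Linux virtual console.

```python
import ctypes
import fcntl

GIO_UNIMAP = 0x4B66   # from <linux/kd.h>: get the Unicode-to-font mapping
MAX_ENTRIES = 0xFFFF  # largest count an unsigned short entry_ct can hold


class Unipair(ctypes.Structure):
    """Mirror of struct unipair: one code point and its font position."""
    _fields_ = [("unicode", ctypes.c_ushort),
                ("fontpos", ctypes.c_ushort)]


class Unimapdesc(ctypes.Structure):
    """Mirror of struct unimapdesc: entry count and pointer to the pairs."""
    _fields_ = [("entry_ct", ctypes.c_ushort),
                ("entries", ctypes.POINTER(Unipair))]


def read_console_unimap(fd):
    """Return {code point: font position} for the console open on fd.

    Raises OSError if fd is not a Linux console.
    """
    buf = (Unipair * MAX_ENTRIES)()
    desc = Unimapdesc(MAX_ENTRIES, ctypes.cast(buf, ctypes.POINTER(Unipair)))
    fcntl.ioctl(fd, GIO_UNIMAP, desc)  # kernel fills buf and updates entry_ct
    return {buf[i].unicode: buf[i].fontpos for i in range(desc.entry_ct)}
```

A displayability test could then be a membership check against the returned mapping, for example `ord(ch) in read_console_unimap(0)` when standard input is the console.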
This bug report was last modified 8 years and 66 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.