GNU bug report logs - #11082
24.0.94; u.glyphless member in struct glyph does not fit in 32 bits
Message #8 received at 11082 <at> debbugs.gnu.org:
> Date: Sat, 24 Mar 2012 14:23:28 +0900
> From: YAMAMOTO Mitsuharu <mituharu <at> math.s.chiba-u.ac.jp>
>
> In dispextern.h:
>
> struct glyph
> {
> (snip)
>   /* A union of sub-structures for different glyph types.  */
>   union
>   {
> (snip)
>     /* Sub-structure for type == GLYPHLESS_GLYPH.  */
>     struct
>     {
>       /* Value is an enum of the type glyphless_display_method.  */
>       unsigned method : 2;
>       /* 1 iff this glyph is for a character of no font.  */
>       unsigned for_no_font : 1;
>       /* Length of acronym or hexadecimal code string (at most 8).  */
>       unsigned len : 4;
>       /* Character to display.  Actually we need only 22 bits.  */
>       unsigned ch : 26;
>     } glyphless;
>
>     /* Used to compare all bit-fields above in one step.  */
>     unsigned val;
>   } u;
> };
>
> The member `u.glyphless' above requires at least 33 bits and does not
> fit in the size (32 bits) of `u.val' in many environments.  As a
> result, equality of the `u.val' member (e.g., as used in
> GLYPH_EQUAL_P) does not necessarily imply equality of glyphless
> glyphs.
?? Isn't the size of a union defined by its widest member? If so, we
just end up wasting some storage here, but we should never truncate a
bit field. Do you have an actual test case that shows that kind of
bug?
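
A minimal standalone test case can be sketched (an editorial
illustration, not from the thread; the name `u_demo' is made up, and
bit-field layout is implementation-defined, so this assumes a typical
ABI with 32-bit `unsigned int' in which bit-fields do not straddle
allocation units):

  #include <stdio.h>

  /* Mimics the glyphless part of struct glyph's union.  */
  union u_demo
  {
    struct
    {
      unsigned method : 2;
      unsigned for_no_font : 1;
      unsigned len : 4;
      unsigned ch : 26;     /* 2 + 1 + 4 + 26 = 33 bits in total.  */
    } glyphless;
    unsigned val;           /* Covers only the first 32 bits.  */
  };

  int
  main (void)
  {
    union u_demo a = {{0}}, b = {{0}};

    /* ch does not fit in the 25 bits left in the first 32-bit unit,
       so (with GCC-style layout) it is allocated entirely in a second
       unit that val never sees.  */
    a.glyphless.ch = 0x2000000;

    printf ("sizeof (union u_demo) = %d\n", (int) sizeof (union u_demo));
    printf ("val equal: %d, ch equal: %d\n",
            a.val == b.val, a.glyphless.ch == b.glyphless.ch);
    return 0;
  }

On such a platform this prints a size of 8 and "val equal: 1, ch
equal: 0": the union is indeed as wide as its widest member, so no
bit field is truncated, but `val' covers only the first word and a
comparison through it misses `ch' entirely.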
> According to the comment above, it seems to be OK to shorten the
> `u.glyphless.ch' member from 26 to 25 bits. Could someone
> confirm this?
Confirmed. From the ELisp manual:
To support this multitude of characters and scripts, Emacs closely
follows the "Unicode Standard". The Unicode Standard assigns a unique
number, called a "codepoint", to each and every character. The range
of codepoints defined by Unicode, or the Unicode "codespace", is
`0..#x10FFFF' (in hexadecimal notation), inclusive. Emacs extends this
range with codepoints in the range `#x110000..#x3FFFFF', which it uses
for representing characters that are not unified with Unicode and "raw
8-bit bytes" that cannot be interpreted as characters. Thus, a
character codepoint in Emacs is a 22-bit integer number.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
I would actually suggest using 22 bits for this field, to avoid
confusion in the future.
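
As a sketch of what that fix might look like (editorial, with a
made-up union name; 2 + 1 + 4 + 22 = 29 bits, and since 2^22 - 1 =
#x3FFFFF, 22 bits exactly cover the Emacs codespace quoted above):

  /* Proposed layout: ch narrowed to 22 bits, the width of an Emacs
     character codepoint, so all bit-fields share one 32-bit word
     with val.  */
  union glyphless_u
  {
    struct
    {
      unsigned method : 2;
      unsigned for_no_font : 1;
      unsigned len : 4;
      unsigned ch : 22;
    } glyphless;

    /* Used to compare all bit-fields above in one step.  */
    unsigned val;
  };

  /* Compile-time guard against the bit-fields outgrowing val again
     (C11 syntax; Emacs's `verify' macro would serve the same
     purpose).  */
  _Static_assert (sizeof (union glyphless_u) == sizeof (unsigned),
                  "glyphless bit-fields must fit in val");

With this layout, a comparison of `val' (as in GLYPH_EQUAL_P) sees
every bit of the glyphless sub-structure.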