Package: emacs;
Reported by: Pip Cet <pipcet <at> protonmail.com>
Date: Fri, 31 Jan 2025 09:41:02 UTC
Severity: wishlist
Message #209 received at 75964 <at> debbugs.gnu.org (full text, mbox):
From: Eli Zaretskii <eliz <at> gnu.org> To: Paul Eggert <eggert <at> cs.ucla.edu> Cc: pipcet <at> protonmail.com, 75964 <at> debbugs.gnu.org, stefankangas <at> gmail.com Subject: Re: bug#75964: Switching the Emacs build to -Wswitch-enum in src/ Date: Wed, 05 Feb 2025 15:05:31 +0200
> Date: Tue, 4 Feb 2025 13:23:25 -0800 > From: Paul Eggert <eggert <at> cs.ucla.edu> > Cc: pipcet <at> protonmail.com, 75964 <at> debbugs.gnu.org, stefankangas <at> gmail.com > > On 2/3/25 04:08, Eli Zaretskii wrote: > >> From: Paul Eggert <eggert <at> cs.ucla.edu> > >> The main point is to catch more potential inadvertent omissions of "case > >> X:" in a "switch (E)" where E's type is an enum containing X. etags.c, > >> like the rest of Emacs, currently uses one style to do this; the > >> proposed style would be better at catching these mistakes. > > > > And how do the proposed changes achieve that goal? > > With an enum expression E and code like this: > > switch (E) { case A: ...; case B: ...; } > > there is currently no compile-time diagnostic for mismatches between the > type and the code when the enum type also has a value C, either because > the programmer forgot to handle case C when the switch was written, or > because the enum type was changed and the programmer forgot to check all > uses of the enum. > > There *is* a compile-time diagnostic for similar mismatches between type > and code, e.g., if the enum type lacks a B (perhaps because B was > formerly present but has since been removed). But there is no > compile-time diagnostic for the mismatch in question. > > Mismatches like these are all too common and can lead to user-visible > bugs. It's a win to detect these mismatches statically. There's no way to know, just by looking at the code, whether the omission is a mistake or deliberate and required. Enlisting the compiler to flag these cases is as likely to yield false positives as it is likely to point out a real mistake. Ideally, the code authors should have either explicitly written all the values, or added assertions or comments to explain the intent if they don't. But in practice, this seldom if ever happens, and when someone else reads the code they didn't write, they cannot always know which is which, and neither can the compiler. So I very much doubt these techniques will make our situation significantly better, if at all. > >> For these switch statements, use "switch (+E)" instead of "switch (E)". > >> This pacifies GCC and clearly signals to the reader that the switch's > >> cases are not intended to exhaust the enum. A "switch (E)" must list all > >> the enum values; a "switch (+E)" need not do so. A reasonable guideline > > > > The above seems to say that "+E" is different from "E" in ways that > > the compiler knows about. So I'm asking where is this special meaning > > documented > > It is more than merely documented; it's been common practice for years. > For example, given enum W {X,Y,Z} the type of an expression like X+1 has > been int (not enum W) ever since enums were standardized in the C89 > standard. So basically the plus sign converts the enum to an int, and thus shuts up the compiler warnings? If so, why not use an explicit cast -- it should at least be more clear to more people. > For documentation chapter and verse, please see C23 §6.5.4.3 ¶2, which > says that the type of +E is like that of -E: the type is E's type after > integer promotions. And which doesn't mention "enum" even once. > Also please see C23 §6.3.1.1 ¶1–2, which says that > int is the type of these Emacs enums after integer promotion. Which also don't mention "enum". > So, when E is one of these enums, the type of +E (and of -E) is int. How many people you envision to be aware of the meaning of this trickery? When I see a unary plus sign in code I consider it strange formatting at best, a mistake at worst. I wouldn't want our code to use this if it can be avoided, definitely not without comments. > > I don't think I see how these changes improve checking and make the > > code easier to read. For me, the "+E" thing is an obstacle to > > negotiate; I'm sure others will also stumble on that. > > If necessary we can package "+E" inside an inline identity function. Wouldn't a cast (preferably followed by a comment) do the job? If so, I prefer to use an explicit cast. > > If you really consider this long list of values more readable than the > > original code, then I guess we disagree on what is and isn't readable. > > Oh, that long list was in some sense a misfire, as the patch did two > things where I suppose it should have done one. First, the patch > improved the runtime checkcing for invalid values, something that can > and should be done independently of what we do about -Wswitch-enum. > Second, the patch pacified -Wswitch-enum. There are no invalid values in that place, by definition. The code checks for invalid bidi types elsewhere, where it needs that, by using bidi_check_type. > I fixed this misfire by splitting the patch into two parts, which I'm > attaching. The first attached patch improves the runtime checking > without changing the -Wswitch-enum style, and I installed that on > master. The second attached patch, which I have not installed, is the > -Wswitch-enum change proper. > > In this example, pacifying -Wswitch-enum helped find code where Emacs's > internal runtime checking was missing some invalid values. Which code was that? > diff --git a/src/bidi.c b/src/bidi.c > index d8754e2db73..fd0bebb85e0 100644 > --- a/src/bidi.c > +++ b/src/bidi.c > @@ -282,12 +282,6 @@ bidi_get_type (int ch, bidi_dir_t override) > emacs_abort (); > > default_type = (bidi_type_t) XFIXNUM (CHAR_TABLE_REF (bidi_type_table, ch)); > - /* Every valid character code, even those that are unassigned by the > - UCD, have some bidi-class property, according to > - DerivedBidiClass.txt file. Therefore, if we ever get UNKNOWN_BT > - (= zero) code from CHAR_TABLE_REF, that's a bug. */ > - if (default_type == UNKNOWN_BT) > - emacs_abort (); > > switch (default_type) > { > @@ -303,13 +297,26 @@ bidi_get_type (int ch, bidi_dir_t override) > case FSI: > case PDI: > return default_type; > - default: > + > + case STRONG_L: case STRONG_R: > + case WEAK_EN: case WEAK_AN: > + case STRONG_AL: > + case WEAK_ES: case WEAK_ET: case WEAK_CS: case WEAK_NSM: > + case NEUTRAL_S: case NEUTRAL_WS: case NEUTRAL_ON: > if (override == L2R) > return STRONG_L; > else if (override == R2L) > return STRONG_R; > else > return default_type; > + > + case UNKNOWN_BT: > + default: > + /* Every valid character code, even those unassigned by the UCD, > + have some bidi-class property, according to DerivedBidiClass.txt. > + Therefore, if we ever get UNKNOWN_BT (= zero) or some unknown > + code from CHAR_TABLE_REF, that's a bug. */ > + emacs_abort (); > } > } This change is for the worse, sorry. The test for values outside of bidi_type_t is done elsewhere, as I explained, and the explicit list of types whose directionality is overridden makes the code less future-proof. So I've reverted that commit. > * src/bidi.c (characters, bidi_find_bracket_pairs) > (bidi_resolve_brackets, bidi_resolve_neutral): > Use ‘switch (+E)’ to indicate that it’s intended that we not > enumerate all the enum values. I don't understand why we need to potentially obfuscate our code just to indicate that not all the enum values are explicitly mentioned. Would a comment to that effect do the same? In fact, there are already comments there, like this one: > - switch (prev_type) > + switch (+prev_type) > { > case RLI: /* X5a */ "X5a" is a reference to a section in the UBA description, where the interested reader can find the details which will explain why only some values are mentioned. Why is that worse than the sign trick? > Remove ‘default: break;’ cases that are not now needed to pacify GCC. They were there not for pacifying GCC, they were an important part of the code. For example, this: > @@ -2204,9 +2204,6 @@ bidi_resolve_explicit (struct bidi_it *bidi_it) > bidi_check_type (bidi_it->type_after_wn); > type = WEAK_BN; /* X9/Retaining */ > break; > - default: > - /* Nothing. */ > - break; is there because no other characters neither start nor stop explicit embeddings (see the UBA sections to which the comments refer). I didn't write the "Nothing" comment there because I didn't trust the reader to know what "default: break;" means. Why would you assume that this default case could be removed without any adverse effects? So all in all, I think these changes in the particular case of bidi.c don't improve our code. I realize that it is easier for you (as it is for every one of us) to read the code you yourself wrote than to read the code of someone else, but that doesn't mean changing the code to match your preferences is necessarily a change for the better. E.g., it is the opposite for me. I'm not saying that there are no places in Emacs where similar changes could help, but a general rule of using the cryptic "+E" everywhere where not all the enum values are mentioned is not something I agree to.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.