Package: emacs;
Reported by: Pip Cet <pipcet <at> protonmail.com>
Date: Fri, 31 Jan 2025 09:41:02 UTC
Severity: wishlist
View this message in rfc822 format
From: Pip Cet <pipcet <at> protonmail.com> To: Eli Zaretskii <eliz <at> gnu.org> Cc: Andrea Corallo <acorallo <at> gnu.org>, eggert <at> cs.ucla.edu, 75964 <at> debbugs.gnu.org, stefankangas <at> gmail.com, Stefan Monnier <monnier <at> iro.umontreal.ca> Subject: bug#75964: Switching the Emacs build to -Wswitch-enum in src/ Date: Sun, 02 Feb 2025 12:50:15 +0000
"Eli Zaretskii" <eliz <at> gnu.org> writes: >> Date: Sun, 02 Feb 2025 11:48:45 +0000 >> From: Pip Cet <pipcet <at> protonmail.com> >> Cc: eggert <at> cs.ucla.edu, 75964 <at> debbugs.gnu.org, stefankangas <at> gmail.com >> >> So while I'd like us to keep those options in mind, I've removed them >> from the following diff. The idea is this would give an overview over >> which code segments would become hard to read as a result of such a >> change, and whether it's bad enough that we need to look at >> alternatives. >> >> I think this concerns bidi.c and keyboard.c in particular. >> >> I've marked some places where I'm not sure the old behavior was okay, >> but I haven't checked them in detail. I haven't found any "there was a >> bug here and we only caught it because of this new warning" smoking >> guns. > > Hmm... like I envisioned, where the list of enum values hiding behind > 'default' is longer than half a dozen or so, the result looks really > problematic: a long and hard-to-read list of values without any > promise that it's exhaustive. (I realize that the compiler will flag > any missing values, but when I read code, I prefer not to compile it, > and besides, it could not be in a compilable shape at that point). > > Humans are not good when they need to deal with long lists, which is > why when the list gets long, it is easier to list values not in the > list. > > So I wonder if we could find a better way, I haven't found one that's easy to do. > or perhaps decide that > where we need such long lists, we could do it the old way. Absolutely. There are plenty of ways to do that, but the easiest is, for now, to decide which files this applies to and either put them in a list in Makefile.in or add a #pragma. Looking at bidi.c, I find bidi_get_type, which is essentially a single switch statement, much easier to read than the new long switch statement in in bidi_find_bracket_pairs: switch (bidi_it->type) { case STRONG_L: flag = ((embedding_level & 1) == 0 ? FLAG_EMBEDDING_INSIDE : FLAG_OPPOSITE_INSIDE); l2r_seen = true; break; case STRONG_R: case WEAK_EN: case WEAK_AN: flag = ((embedding_level & 1) == 1 ? FLAG_EMBEDDING_INSIDE : FLAG_OPPOSITE_INSIDE); r2l_seen = true; break; case UNKNOWN_BT: case WEAK_BN: case NEUTRAL_B: case STRONG_AL: case LRE: case LRO: case RLE: case RLO: case PDF: case LRI: case RLI: case FSI: case PDI: case WEAK_ES: case WEAK_ET: case WEAK_CS: case WEAK_NSM: case NEUTRAL_S: case NEUTRAL_WS: case NEUTRAL_ON: default: break; } I think that could be rewritten to be more readable in a number of ways, including, of course, putting a #pragma in. (I'm focusing on bidi.c because it uses a very large externally-mandated enum. Splitting up the specpdl cases is something we can do, but touching the unicode classification tables is clearly not a good idea.) I'd probably try to avoid switches in deeply-nested contexts, but that's an aesthetic preference. And, of course, code like switch (prev_type) { <...> case LRO: case RLE: case RLO: case FSI: case PDI: case WEAK_ES: case WEAK_ET: case WEAK_CS: case WEAK_NSM: case NEUTRAL_S: case NEUTRAL_WS: case NEUTRAL_ON: default: eassert (prev_type != FSI); /* Nothing. */ break; } simply isn't readable, and that one's my fault. My current impression is this will eventually help us find rare bugs, and produce better C code, but where it hurts readability we should use #pragma and leave the code as it is. When we really want to, we can rewrite predicates such as the first example to be inline functions, and then bidi_find_brackt_pairs could be at least a bit shorter and perhaps more readable; but, again, that's an aesthetic choice. Stefan Kangas, are there files where we currently use switch statements that you'd like to clean up most? I think it's most important for pdumper.c (subtle code) and where external library unions are in use (GTK, RSVG, tree sitter). I'm not sure whether there's a good way to invert the switch without hiding things in Makefile.in where no one ever looks. Maybe just add #include "exhaustive.h" in the relevant C files, which does whatever pragma magic we need? But I'm open to other suggestions. > Btw, does eassume compile to emacs_abort in a production build (i.e. a > build without --enable-checking)? It does not, no. This is sometimes necessary for speed, but most of that effect can be achieved by using emacs_abort: it's a cold function and GCC will penalize code paths that lead to this function call and optimize those which don't. That's what I liked about the eunreachable proposal: in some, very rare cases, a full eassume () is what we want, but usually we're okay with "crash if this happens, but optimize the non-crashing case statically". If someone wants to define eunreachable to eassume (0) they can, but on x86 I'd prefer a single-insn trap. > If not, then perhaps this replacement is not always a good idea? Sorry to point to someone else's code as an example, but Paul's changes meant that if the mmap protection flag enum in pdumper.c was out of bounds, we'd read something from outside the array and pass it to mmap as its protection argument. I think finding such a bug, even if it happens just with production builds, is hard enough to justify a bounds check or a ud2. > We (should) use emacs_abort when the code has no way of dealing with > some situation, so letting the execution continue past that point will > sometimes crash or corrupt data in ways that are much harder to debug > than an abort. We often use eassert for that, and I don't know how many people realize this subtle difference between emacs_abort () and eassert (0). I think there should be an even subtler difference between eunreachable () and eassume (0): initially, they'd do the same both in production builds and development builds, but we reserve the right to weaken eunreachable to an abort if we think it helps us find bugs, and eassume is only to be used where the performance gains are significantly larger than an enum check would be. I still think eassume and eassert shouldn't evaluate expressions with side effects. I'd like to enforce that in GCC but no one liked my patch. Pip
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.