Package: emacs;
Reported by: Pip Cet <pipcet <at> protonmail.com>
Date: Fri, 31 Jan 2025 09:41:02 UTC
Severity: wishlist
View this message in rfc822 format
From: Eli Zaretskii <eliz <at> gnu.org> To: Pip Cet <pipcet <at> protonmail.com> Cc: eggert <at> cs.ucla.edu, 75964 <at> debbugs.gnu.org, stefankangas <at> gmail.com Subject: bug#75964: Switching the Emacs build to -Wswitch-enum in src/ Date: Mon, 03 Feb 2025 15:13:49 +0200
> Date: Mon, 03 Feb 2025 12:24:18 +0000 > From: Pip Cet <pipcet <at> protonmail.com> > Cc: Paul Eggert <eggert <at> cs.ucla.edu>, 75964 <at> debbugs.gnu.org, stefankangas <at> gmail.com > > "Eli Zaretskii" <eliz <at> gnu.org> writes: > > >> + case NEUTRAL_ON: case NEUTRAL_S: case NEUTRAL_WS: > >> + case STRONG_AL: case STRONG_L: case STRONG_R: > >> + case WEAK_AN: case WEAK_CS: case WEAK_EN: case WEAK_ES: case WEAK_ET: case WEAK_NSM: > >> if (override == L2R) > >> return STRONG_L; > >> else if (override == R2L) > > > > If you really consider this long list of values more readable than the > > original code, then I guess we disagree on what is and isn't readable. > > As I mentioned up-thread, such a long list doesn't convince me that > > all the values were mentioned. > > The compiler, with -Wswitch-enum, would have warned about that. Yes, but when I read code, I don't want to run the compiler, I want to be able to understand the code and reason about it by myself. > The > point is you would not have to convince yourself that the "default" > doesn't catch a case you should have explicitly handled otherwise. I don't need to convince myself: the code says so. If the algorithm is in error, that's a different issue. > As an example: why is a European Number, which is always written L2R > even in R2L context, treated the same as a strongly R2L character in the > specific context of bidi_find_bracket_pairs? That's confusing, because > I don't understand the reason, and I should look at the Unicode > algorithm to answer that question. Adding more case labels doesn't help > the confusion at all; in fact, it distracts from this important > exception by hiding it in a long list of no-op cases. Like I said: bidi.c must be read with the UBA at hand. I've explicitly added pointers to specific parts of the UBA description for that very reason. I've also made the code resemble the UBA text as much as reasonably possible. > > So for me this is a net loss, because the original code left no doubt > > that all the types get the directionality-override handling. > > I'm very tempted to suggest rewriting this code to make the exceptional > handling of WEAK_EN more obvious Why does it have to be more obvious? First, it isn't just WEAK_EN, it's also WEAK_AN. Here's what the UBA says about that: For each bracket-pair element in the list of pairs of text positions a. Inspect the bidirectional types of the characters enclosed within the bracket pair. b. If any strong type (either L or R) matching the embedding direction is found, set the type for both brackets in the pair to match the embedding direction. Note that EN and AN should be treated as a strong R type when searching within the brackets. Note that the isolating run sequence may not be contiguous. Implementations should take care to ignore characters not contained in the isolating run sequence when processing neutral or weak characters. o [ e ] o → o e e e o o [ o e ] → o e o e e o [ NI e ] → o e NI e e c. Otherwise, if there is a strong type it must be opposite the embedding direction. Therefore, test for an established context with a preceding strong type by checking backwards before the opening paired bracket until the first strong type (L or R) is found, using the value of sos if there is none. Note that EN and AN should be treated as a strong R type when searching for established context. 1. If the preceding strong type is also opposite the embedding direction, context is established, so set the type for both brackets in the pair to that direction. o [ o ] e → o o o o e o [ o NI ] o → o o o NI o o 2. Otherwise set the type for both brackets in the pair to the embedding direction. e [ o ] o → e e o e o e [ o ] e → e e o e e Note that taken together the two steps in item 2 are guaranteed to set the type for both brackets to the preceding strong type, as there are only two possible values (L and R). d. Otherwise, there are no strong types within the bracket pair. Therefore, do not set the type for that bracket pair. e ( NI ) o → e ( NI ) o Note that if the enclosed text contains no strong types the bracket pairs will both resolve to the same level when resolved individually using rules N1 and N2. As you see, the UBA description explicitly mentions EN and AN, so having that in the code makes it easier (at least IMO) to read and validate the code. > Let's look at the potential bugs this uncovered in detail, improve > comments where we can (because most likely, they're not bugs), and close > this bug. Agreed.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.