GNU bug report logs - #75964
Switching the Emacs build to -Wswitch-enum in src/

Previous Next

Package: emacs;

Reported by: Pip Cet <pipcet <at> protonmail.com>

Date: Fri, 31 Jan 2025 09:41:02 UTC

Severity: wishlist

Full log


View this message in rfc822 format

From: Eli Zaretskii <eliz <at> gnu.org>
To: Pip Cet <pipcet <at> protonmail.com>
Cc: eggert <at> cs.ucla.edu, 75964 <at> debbugs.gnu.org, stefankangas <at> gmail.com
Subject: bug#75964: Switching the Emacs build to -Wswitch-enum in src/
Date: Mon, 03 Feb 2025 15:13:49 +0200
> Date: Mon, 03 Feb 2025 12:24:18 +0000
> From: Pip Cet <pipcet <at> protonmail.com>
> Cc: Paul Eggert <eggert <at> cs.ucla.edu>, 75964 <at> debbugs.gnu.org, stefankangas <at> gmail.com
> 
> "Eli Zaretskii" <eliz <at> gnu.org> writes:
> 
> >> +      case NEUTRAL_ON: case NEUTRAL_S: case NEUTRAL_WS:
> >> +      case STRONG_AL: case STRONG_L: case STRONG_R:
> >> +      case WEAK_AN: case WEAK_CS: case WEAK_EN: case WEAK_ES: case WEAK_ET: case WEAK_NSM:
> >>  	if (override == L2R)
> >>  	  return STRONG_L;
> >>  	else if (override == R2L)
> >
> > If you really consider this long list of values more readable than the
> > original code, then I guess we disagree on what is and isn't readable.
> > As I mentioned up-thread, such a long list doesn't convince me that
> > all the values were mentioned.
> 
> The compiler, with -Wswitch-enum, would have warned about that.

Yes, but when I read code, I don't want to run the compiler, I want to
be able to understand the code and reason about it by myself.

> The
> point is you would not have to convince yourself that the "default"
> doesn't catch a case you should have explicitly handled otherwise.

I don't need to convince myself: the code says so.  If the algorithm
is in error, that's a different issue.

> As an example: why is a European Number, which is always written L2R
> even in R2L context, treated the same as a strongly R2L character in the
> specific context of bidi_find_bracket_pairs?  That's confusing, because
> I don't understand the reason, and I should look at the Unicode
> algorithm to answer that question.  Adding more case labels doesn't help
> the confusion at all; in fact, it distracts from this important
> exception by hiding it in a long list of no-op cases.

Like I said: bidi.c must be read with the UBA at hand.  I've
explicitly added pointers to specific parts of the UBA description for
that very reason.  I've also made the code resemble the UBA text as
much as reasonably possible.

> > So for me this is a net loss, because the original code left no doubt
> > that all the types get the directionality-override handling.
> 
> I'm very tempted to suggest rewriting this code to make the exceptional
> handling of WEAK_EN more obvious

Why does it have to be more obvious?

First, it isn't just WEAK_EN, it's also WEAK_AN.

Here's what the UBA says about that:

  For each bracket-pair element in the list of pairs of text positions

    a. Inspect the bidirectional types of the characters enclosed
       within the bracket pair.
    b. If any strong type (either L or R) matching the embedding
       direction is found, set the type for both brackets in the pair
       to match the embedding direction.

	  Note that EN and AN should be treated as a strong R type
	  when searching within the brackets.

	  Note that the isolating run sequence may not be
	  contiguous. Implementations should take care to ignore
	  characters not contained in the isolating run sequence when
	  processing neutral or weak characters.

	  o [ e ] o → o e e e o

	  o [ o e ] → o e o e e

	  o [ NI e ] → o e NI e e

    c. Otherwise, if there is a strong type it must be opposite the
       embedding direction. Therefore, test for an established context
       with a preceding strong type by checking backwards before the
       opening paired bracket until the first strong type (L or R) is
       found, using the value of sos if there is none.

          Note that EN and AN should be treated as a strong R type
	  when searching for established context.

	  1. If the preceding strong type is also opposite the
	     embedding direction, context is established, so set the
	     type for both brackets in the pair to that direction.

	      o [ o ] e → o o o o e

	      o [ o NI ] o → o o o NI o o

	  2. Otherwise set the type for both brackets in the pair to the
	     embedding direction.

	      e [ o ] o → e e o e o

	      e [ o ] e → e e o e e

	    Note that taken together the two steps in item 2 are
            guaranteed to set the type for both brackets to the
            preceding strong type, as there are only two possible
            values (L and R).

    d. Otherwise, there are no strong types within the bracket
       pair. Therefore, do not set the type for that bracket pair.

	  e ( NI ) o → e ( NI ) o

       Note that if the enclosed text contains no strong types the
       bracket pairs will both resolve to the same level when resolved
       individually using rules N1 and N2.

As you see, the UBA description explicitly mentions EN and AN, so
having that in the code makes it easier (at least IMO) to read and
validate the code.

> Let's look at the potential bugs this uncovered in detail, improve
> comments where we can (because most likely, they're not bugs), and close
> this bug.

Agreed.




This bug report was last modified 127 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.