GNU bug report logs - #16581
suggested code simplification in dfa.c

Package: grep;

Reported by: Aharon Robbins <arnold <at> skeeve.com>

Date: Tue, 28 Jan 2014 20:12:01 UTC

Severity: normal

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

Message #26 received at 16581 <at> debbugs.gnu.org:

From: Aaron Crane <grep <at> aaroncrane.co.uk>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: Aharon Robbins <arnold <at> skeeve.com>, 16581 <at> debbugs.gnu.org
Subject: Re: bug#16581: suggested code simplification in dfa.c
Date: Wed, 29 Jan 2014 14:20:10 +0000

Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> +/* The following functions exploit the commutativity and associativity of ^,
> +   and the fact that X ^ X is zero.  POSIX requires that C equals
> +   either tolower (C) or toupper (C); if the former, then C ^ tolower (C)
> +   is zero so C ^ xor_other (C) equals toupper (C), and similarly
> +   for the latter.  */
> +
> +/* Return the exclusive-OR of C and C's other case, or zero if C is
> +   not a letter that changes case.  */
> +
> +static wint_t
> +xor_wother (wint_t c)
> +{
> +  return towlower (c) ^ towupper (c);
> +}
[…]
> +      if (case_fold)
>          {
> +          wchar_t xor = xor_wother (wc);
> +          if (xor)
> +            {
> +              addtok_wc (wc ^ xor);
> +              addtok (OR);
> +            }

I don't think this works for the wide-character case. For example, in
a suitable locale, I'd expect towupper() to map U+01C8 LATIN CAPITAL
LETTER L WITH SMALL LETTER J ("Lj", roughly) to U+01C7 LATIN CAPITAL
LETTER LJ ("LJ"), and towlower() to map it to U+01C9 LATIN SMALL
LETTER LJ ("lj"). This matches the behaviour I can observe with a
simple test program under the en_GB.UTF-8 locale on both Linux and
Mac OS.

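Since 0x1c7 ^ 0x1c9 == 14, and 0x1c8 ^ 14 == 0x1c6, this means we'd
call addtok_wc(0x1c6). U+01C6 is LATIN SMALL LETTER DZ WITH CARON,
which is not a case variant of U+01C8 at all. The XOR trick assumes
that wc equals either towlower(wc) or towupper(wc), and a titlecase
character such as U+01C8 equals neither.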
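
The arithmetic can be checked independently of any locale with a small
sketch like the one below. The helper names xor_of_cases and xor_folded
are hypothetical and are not part of dfa.c. They take the lower- and
upper-case mappings as explicit arguments rather than calling
towlower() and towupper(), so the result does not depend on the locale
in effect.

```c
#include <wchar.h>

/* Hypothetical stand-in for the patch's xor_wother(): the caller passes
   in the lower- and upper-case mappings instead of having them looked
   up via towlower()/towupper().  */
static wint_t
xor_of_cases (wint_t lower, wint_t upper)
{
  return lower ^ upper;
}

/* The value the patched code would hand to addtok_wc(), i.e. wc ^ xor.
   This is the "other case" of wc only when wc equals lower or upper.  */
static wint_t
xor_folded (wint_t wc, wint_t lower, wint_t upper)
{
  return wc ^ xor_of_cases (lower, upper);
}
```

With the mappings described above, xor_folded (0x1c8, 0x1c9, 0x1c7)
yields 0x1c6, whereas for an ordinary two-case letter such as 'a' (with
lower 'a' and upper 'A') it yields 'A' as intended.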

-- 
Aaron Crane ** http://aaroncrane.co.uk/

This bug report was last modified 11 years and 128 days ago.
