GNU bug report logs - #16581
suggested code simplification in dfa.c


Package: grep

Reported by: Aharon Robbins <arnold <at> skeeve.com>

Date: Tue, 28 Jan 2014 20:12:01 UTC

Severity: normal

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.



Message #17 received at 16581 <at> debbugs.gnu.org (full text, mbox):

From: Eric Blake <eblake <at> redhat.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>, Aharon Robbins <arnold <at> skeeve.com>,
 16581 <at> debbugs.gnu.org
Subject: Re: bug#16581: suggested code simplification in dfa.c
Date: Wed, 29 Jan 2014 06:42:10 -0700
On 01/28/2014 11:48 PM, Paul Eggert wrote:

>  
> +/* The following functions exploit the commutativity and associativity of ^,
> +   and the fact that X ^ X is zero.  POSIX requires that C equals
> +   either tolower (C) or toupper (C);

Unfortunately, while this is true, I'm not sure if it accurately covers
all possible case-folded comparisons outside of the C locale.

http://www.unicode.org/faq/casemap_charprop.html

Consider the Greek locale, el_GR.UTF-8, which has two lower-case sigmas,
L'\x3c3' and L'\x3c2', but only one upper-case sigma, L'\x3a3'.  As a result,
all three wchar_t values must compare case-insensitively to one another.

Or consider titlecase characters, such as Unicode L'\x1c8' (Lj), which
has both an uppercase mapping L'\x1c7' (LJ) and lowercase mapping
L'\x1c9' (lj) - again, all three wchar_t values must compare
case-insensitively to one another.

Your hack is great at finding characters that have a case mapping, but
not necessarily at finding all such characters that map to the same
result when passed through towlower(towupper(c)).
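
A minimal sketch of the comparison described above, assuming a C library
whose wchar_t values are Unicode code points and whose towupper/towlower
follow the current LC_CTYPE locale.  The names fold, same_fold and
fold_class are hypothetical and are not taken from dfa.c; the brute-force
scan is only for illustration, not a proposal for how grep should do it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: fold C by upper-casing and then lower-casing it,
   using the current LC_CTYPE locale.  */
static wint_t
fold (wint_t c)
{
  return towlower (towupper (c));
}

/* Hypothetical helper: true if A and B fold to the same value.  */
static bool
same_fold (wint_t a, wint_t b)
{
  return fold (a) == fold (b);
}

/* Hypothetical helper: store in OUT up to CAP characters in LO..HI that
   fold like C, and return how many were stored.  Meant for small ranges
   only; HI must be less than WINT_MAX.  */
static size_t
fold_class (wint_t c, wint_t lo, wint_t hi, wint_t *out, size_t cap)
{
  size_t n = 0;
  assert (out != NULL || cap == 0);
  for (wint_t wc = lo; wc <= hi && n < cap; wc++)
    if (same_fold (c, wc))
      out[n++] = wc;
  return n;
}
```

After setlocale (LC_ALL, "el_GR.UTF-8") on a system whose locale data has
the mappings described above, fold_class (L'\x3c3', 0x370, 0x3ff, out, 8)
would be expected to report all three sigmas, whereas a lookup that only
toggles between C, towlower (C) and towupper (C) never reaches L'\x3c2'
when it starts from L'\x3c3'.  The actual result depends on the installed
locale data.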

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org




