GNU bug report logs - #16581: suggested code simplification in dfa.c
Reported by: Aharon Robbins <arnold <at> skeeve.com>
Date: Tue, 28 Jan 2014 20:12:01 UTC
Severity: normal
Done: Paul Eggert <eggert <at> cs.ucla.edu>
Bug is archived. No further changes may be made.
Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> Aaron Crane wrote:
> > I'd expect U+01C8 LATIN CAPITAL LETTER L WITH SMALL
> > LETTER J ("Lj", roughly) to be U+01C7 LATIN CAPITAL LETTER LJ ("LJ")
> > under towupper(), and U+01C9 LATIN SMALL LETTER LJ ("lj") under
> > towlower().
>
> Ouch, thanks, I hadn't considered that. So my idea was all wrong. But
> this means the current code is all wrong too. I'll take a look at it. I
> hope I don't regret picking up this thread....
This seems to be a weird (and very much corner) case: wc != towlower(wc)
and wc != towupper(wc). It can only be an issue if doing case folding,
and there are only a few spots in the code that deal with case folding
when compiling the dfa.
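
To make the corner case concrete, here is a minimal sketch in C. The helper names differs_from_both_cases and case_variants are hypothetical and are not taken from dfa.c; they only illustrate the test described above and how a case-folding pass might need to collect up to three forms of one character instead of two. Whether U+01C8 actually triggers the condition depends on the active locale and on the C library's case tables, so no particular output is claimed here.

```c
#include <stdbool.h>
#include <wctype.h>

/* Hypothetical helper (not from dfa.c): true when WC is neither its own
   lowercase form nor its own uppercase form in the current locale.
   A titlecase letter such as U+01C8 is the expected candidate, provided
   the locale's tables map it in both directions.  */
static bool
differs_from_both_cases (wint_t wc)
{
  return wc != towlower (wc) && wc != towupper (wc);
}

/* Hypothetical helper: store WC plus its distinct lowercase and uppercase
   counterparts in OUT and return how many were stored (1 to 3).
   Code that assumes "a character has at most one other case" would
   handle the first two results and miss the third.  */
static int
case_variants (wint_t wc, wint_t out[3])
{
  int n = 0;
  wint_t lo = towlower (wc);
  wint_t up = towupper (wc);

  out[n++] = wc;
  if (lo != wc)
    out[n++] = lo;
  if (up != wc && up != lo)
    out[n++] = up;
  return n;
}
```

A caller would typically call setlocale (LC_ALL, "") first and then pass a wide character such as (wint_t) 0x01C8; a return value of 3 from case_variants means the character is one of the problematic ones.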
I suggest starting with the XOR changes for unibyte locales - they seem
(to me) to be good no matter what. And then separately try to deal with
the multibyte case.
And just to increase the need for Aspirin, any idea how regex handles
this case? I would not be surprised if the code there also doesn't
catch this. Wheeeeeeeee! :-)
Arnold