GNU bug report logs - #16581
suggested code simplification in dfa.c


Package: grep

Reported by: Aharon Robbins <arnold <at> skeeve.com>

Date: Tue, 28 Jan 2014 20:12:01 UTC

Severity: normal

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.


From: arnold <at> skeeve.com
To: grep <at> aaroncrane.co.uk, eggert <at> cs.ucla.edu
Cc: arnold <at> skeeve.com, 16581 <at> debbugs.gnu.org
Subject: bug#16581: suggested code simplification in dfa.c
Date: Thu, 30 Jan 2014 08:51:34 -0700
Paul Eggert <eggert <at> cs.ucla.edu> wrote:

> Aaron Crane wrote:
> > I'd expect U+01C8 LATIN CAPITAL LETTER L WITH SMALL
> > LETTER J ("Lj", roughly) to be U+01C7 LATIN CAPITAL LETTER LJ ("LJ")
> > under towupper(), and U+01C9 LATIN SMALL LETTER LJ ("lj") under
> > towlower().
>
> Ouch, thanks, I hadn't considered that.  So my idea was all wrong.  But 
> this means the current code is all wrong too.  I'll take a look at it. I 
> hope I don't regret picking up this thread....

This seems to be a weird (and very much corner) case: wc != towlower(wc)
and wc != towupper(wc).  It can only be an issue if doing case folding,
and there are only a few spots in the code that deal with case folding
when compiling the dfa.
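
To make the corner case concrete, here is a minimal sketch in C of the test described above. It is not code from dfa.c: the helper names differs_from_both_cases and case_variants are hypothetical, and what towlower() and towupper() return for a character such as U+01C8 depends on the current locale and C library.

```c
#include <stdbool.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper, not taken from dfa.c: true when WC is neither
   its own lowercase nor its own uppercase mapping, i.e. the case
   "wc != towlower(wc) and wc != towupper(wc)".  */
static bool
differs_from_both_cases (wint_t wc)
{
  return wc != towlower (wc) && wc != towupper (wc);
}

/* Hypothetical helper: store in FOLDED the distinct characters among
   WC, towlower (WC) and towupper (WC), and return how many were
   stored (between 1 and 3).  A case-folding matcher would have to
   accept all of them.  */
static int
case_variants (wint_t wc, wint_t folded[3])
{
  int n = 0;
  wint_t lo = towlower (wc);
  wint_t up = towupper (wc);

  folded[n++] = wc;
  if (lo != wc)
    folded[n++] = lo;
  if (up != wc && up != lo)
    folded[n++] = up;
  return n;
}
```

For an ordinary letter such as 'a', case_variants stores two characters. Three are stored only when differs_from_both_cases is true and the two mappings differ from each other, which is what Aaron's U+01C8 example would produce in a locale whose tables map it to both U+01C9 and U+01C7.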

I suggest starting with the XOR changes for unibyte locales - they seem
(to me) to be good no matter what. And then separately try to deal with
the multibyte case.

And just to increase the need for Aspirin, any idea how regex handles
this case?  I would not be surprised if the code there also doesn't
catch this.  Wheeeeeeeee!  :-)

Arnold




This bug report was last modified 11 years and 78 days ago.


