#16895 - [PATCH] grep: fix multiple bugs with bracket expressions

GNU bug report logs - #16895
[PATCH] grep: fix multiple bugs with bracket expressions

Package: grep;

Reported by: Paul Eggert <eggert <at> cs.ucla.edu>

Date: Thu, 27 Feb 2014 17:35:01 UTC

Severity: normal

Tags: fixed, patch

Fixed in versions 16232, 16777

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.


From: Aharon Robbins <arnold <at> skeeve.com>
To: eggert <at> cs.ucla.edu, 16895 <at> debbugs.gnu.org
Subject: bug#16895: [PATCH] grep: fix multiple bugs with bracket expressions
Date: Thu, 27 Feb 2014 22:31:14 +0200

Hi Paul.

> Subject: bug#16895: [PATCH] grep: fix multiple bugs with bracket expressions
> To: 16895 <at> debbugs.gnu.org
> Date: Thu, 27 Feb 2014 09:34:33 -0800
> From: Paul Eggert <eggert <at> cs.ucla.edu>
>
> I'm afraid there are several problems in the dfa code. I still don't
> have a handle on all of them, but here's my first patch to deal with the
> first major one I found. Patterns like [a-[.z.]], which caused 'grep'
> to dump core until recently, still aren't being handled correctly, and
> there are several closely related bugs here. I've taken the liberty of
> pushing the attached patch.

Thanks. This looks promising. A few comments / questions.

> +/* Return true if the current locale is known to be a unibyte locale
> +   without multicharacter collating sequences and where range
> +   comparisons simply use the native encoding. These locales can be
> +   processed more efficiently. */
> +
> +static bool
> +using_simple_locale (void)
> +{
> +  /* True if the native character set is known to be compatible with
> +     the C locale. The following test isn't perfect, but it's good
> +     enough in practice, as only ASCII and EBCDIC are in common use
> +     and this test correctly accepts ASCII and rejects EBCDIC. */
> +  enum { native_c_charset =
> +    ('\b' == 8 && '\t' == 9 && '\n' == 10 && '\v' == 11 && '\f' == 12
> +     && '\r' == 13 && ' ' == 32 && '!' == 33 && '"' == 34 && '#' == 35
> +     && '%' == 37 && '&' == 38 && '\'' == 39 && '(' == 40 && ')' == 41
> +     && '*' == 42 && '+' == 43 && ',' == 44 && '-' == 45 && '.' == 46
> +     && '/' == 47 && '0' == 48 && '9' == 57 && ':' == 58 && ';' == 59
> +     && '<' == 60 && '=' == 61 && '>' == 62 && '?' == 63 && 'A' == 65
> +     && 'Z' == 90 && '[' == 91 && '\\' == 92 && ']' == 93 && '^' == 94
> +     && '_' == 95 && 'a' == 97 && 'z' == 122 && '{' == 123 && '|' == 124
> +     && '}' == 125 && '~' == 126)
> +  };

What a mouthful! Is all that really necessary?

> +  if ((c1 == ':' && syntax_bits & RE_CHAR_CLASSES)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

I'd suggest parentheses around the bit with the bitwise operator, both
for readability and to match the rest of the code.

> @@ -1000,7 +1043,10 @@ parse_bracket_exp (void)
>    /* Fetch bracket. */
>    FETCH_WC (c, wc, _("unbalanced ["));
>    if (c1 == ':')
> -    /* build character class. */
> +    /* Build character class. POSIX allows character
> +       classes to match multicharacter collating elements,
> +       but the regex code does not support that, so do not
> +       worry about that possibility. */

I thought GLIBC did support them?

I will try this out in gawk sometime in the next few days and let
you know how it goes.

Thanks for the work!

Arnold
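Editorial note on the parenthesization comment above: in C, binary '&' binds more tightly than '&&', so the quoted condition already groups as c1 == ':' && (syntax_bits & RE_CHAR_CLASSES). The suggested parentheses therefore affect readability only, not behavior. The following is a minimal sketch of that equivalence, not code from the patch. DEMO_CHAR_CLASSES and the three function names are hypothetical stand-ins; the actual value of RE_CHAR_CLASSES is not assumed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a syntax flag.  The real RE_CHAR_CLASSES
   comes from the regex headers; its value is not assumed here.  */
enum { DEMO_CHAR_CLASSES = 1 << 2 };

/* As written in the quoted hunk: relies on '&' binding tighter
   than '&&'.  */
static bool
is_class_start_implicit (int c1, unsigned long syntax_bits)
{
  return c1 == ':' && syntax_bits & DEMO_CHAR_CLASSES;
}

/* As suggested in the review: the same grouping, spelled out.  */
static bool
is_class_start_explicit (int c1, unsigned long syntax_bits)
{
  return c1 == ':' && (syntax_bits & DEMO_CHAR_CLASSES);
}

/* Compare both forms over the four combinations of
   (colon or not) x (flag set or not).  */
static void
check_forms_agree (void)
{
  int chars[] = { ':', '.' };
  unsigned long bits[] = { 0, DEMO_CHAR_CLASSES };

  for (int i = 0; i < 2; i++)
    for (int j = 0; j < 2; j++)
      assert (is_class_start_implicit (chars[i], bits[j])
              == is_class_start_explicit (chars[i], bits[j]));
}
```

Calling check_forms_agree () from a test driver exercises both spellings; the two functions return the same result for every input, which is why the explicit form is a safe stylistic change.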

This bug report was last modified 11 years and 84 days ago.

