GNU bug report logs - #20638
BUG: standard & extended RE's don't find NUL's :-(
Reported by: "L. A. Walsh" <gnu <at> tlinx.org>
Date: Sun, 24 May 2015 00:06:02 UTC
Severity: normal
Tags: notabug
Done: Paul Eggert <eggert <at> cs.ucla.edu>
Bug is archived. No further changes may be made.
Paul Eggert wrote:
> Linda Walsh wrote:
>
>> Perhaps you want to tell me where the documentation on the
>> standard and/or extended RE's is that you use?
----
Here is another:
*POSIX Extended Regular Expression Syntax:
(http://www.boost.org/doc/libs/1_43_0/libs/regex/doc/html/boost_regex/syntax/basic_extended.html)
Escapes
The POSIX standard defines no escape sequences for POSIX-Extended
regular expressions, except that:
* Any special character preceded by an escape shall match itself.
* The effect of any ordinary character being preceded by an escape
is undefined.
* An escape inside a character class declaration shall match itself:
in other words the escape character is not "special" inside a character
class declaration; so [\^] will match either a literal '\' or a '^'.
However, that's rather restrictive, so the following standard-compatible
extensions are also supported by Boost.Regex:
Escapes matching a specific character
The following escape sequences are all synonyms for single characters:
Escape      Character
\a          '\a'
\e          0x1B
\f          \f
\n          \n
\r          \r
\t          \t
\v          \v
\b          \b (but only inside a character class declaration).
\cX         An ASCII escape sequence - the character whose code point
            is X % 32
\xdd        A hexadecimal escape sequence - matches the single character
            whose code point is 0xdd.
\x{dddd}    A hexadecimal escape sequence - matches the single character
            whose code point is 0xdddd.
\0ddd       An octal escape sequence - matches the single character
            whose code point is 0ddd.
\N{Name}    Matches the single character which has the symbolic name
            name. For example \\N{newline} matches the single
            character \n.
*
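To make the escapes in the table above concrete, here is a minimal sketch using Python's `re` module as a stand-in engine. This is an assumption made purely for illustration: it is neither Boost.Regex nor grep, and it says nothing about what grep itself accepts. The sample byte strings are made up.

```python
import re

# Hypothetical sample data containing a NUL byte and a tab.
data = b"abc\x00def\tghi"

# Raw bytes patterns: the backslash escapes reach the regex engine
# unchanged and are interpreted by `re` itself, not by the Python
# string-literal parser.
nul_hex = re.search(rb"\x00", data)   # hexadecimal escape for NUL
nul_oct = re.search(rb"\000", data)   # octal escape for NUL
tab = re.search(rb"\t", data)         # \t matches a tab character

# Inside a character class, \b means backspace (0x08) in this engine too.
backspace = re.search(rb"[\b]", b"x\x08y")

print(nul_hex.start(), nul_oct.start(), tab.start(), backspace.start())
```

Both the hex and the octal form locate the same NUL byte in this engine; whether another tool accepts either form depends on that tool's own documentation.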
>
> We're talking about grep, so the relevant documentation is the grep
> manual, not the awk manual or other random stuff you might find on the
> Internet. Type 'info grep'. Or if you're in Emacs, type 'C-h i m
> grep RET'.
-----
Again, another example of \000 octal and \x hex.
Most descriptions of the chars grep takes say it was designed so that
awk, sed, tr -- any core Linux util that takes regexes -- would be *the
same*, so people didn't have to learn a different syntax for each tool.
This bug report was last modified 9 years and 363 days ago.