GNU bug report logs - #20638
BUG: standard & extended RE's don't find NUL's :-(

Package: grep;

Reported by: "L. A. Walsh" <gnu <at> tlinx.org>

Date: Sun, 24 May 2015 00:06:02 UTC

Severity: normal

Tags: notabug

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.


Message #35 received at 20638 <at> debbugs.gnu.org (full text, mbox):

From: Linda Walsh <gnu <at> tlinx.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 20638 <at> debbugs.gnu.org, Eric Blake <eblake <at> redhat.com>
Subject: Re: bug#20638: BUG: standard & extended RE's don't find NUL's :-(
Date: Mon, 25 May 2015 19:13:06 -0700

Paul Eggert wrote:
> Linda Walsh wrote:
>
>> Perhaps you want to tell me where the documentation on the
>> standard and/or extended RE's is that you use?
----
Here is another:
*POSIX Extended Regular Expression Syntax: 
(http://www.boost.org/doc/libs/1_43_0/libs/regex/doc/html/boost_regex/syntax/basic_extended.html)


Escapes

The POSIX standard defines no escape sequences for POSIX-Extended 
regular expressions, except that:

   * Any special character preceded by an escape shall match itself.
   * The effect of any ordinary character being preceded by an escape 
is undefined.
   * An escape inside a character class declaration shall match itself: 
in other words the escape character is not "special" inside a character 
class declaration; so [\^] will match either a literal '\' or a '^'.

However, that's rather restrictive, so the following standard-compatible 
extensions are also supported by Boost.Regex:

Escapes matching a specific character

The following escape sequences are all synonyms for single characters:

Escape      Character

\a          '\a'
\e          0x1B
\f          \f
\n          \n
\r          \r
\t          \t
\v          \v
\b          \b (but only inside a character class declaration).
\cX         An ASCII escape sequence - the character whose code point
            is X % 32
\xdd        A hexadecimal escape sequence - matches the single
            character whose code point is 0xdd.
\x{dddd}    A hexadecimal escape sequence - matches the single
            character whose code point is 0xdddd.
\0ddd       An octal escape sequence - matches the single character
            whose code point is 0ddd.
\N{Name}    Matches the single character which has the symbolic name
            name. For example \\N{newline} matches the single
            character \n.

*
>
> We're talking about grep, so the relevant documentation is the grep 
> manual, not the awk manual or other random stuff you might find on the 
> Internet.  Type 'info grep'.  Or if you're in Emacs, type 'C-h i m 
> grep RET'.
-----
Again another example of \000 octal and \x hex.
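
To make the octal and hex forms concrete, here is a minimal sketch using
Python's re module. It is an illustration only: it is not grep and not
Boost.Regex, and says nothing about what either of those accepts. The
sample data and the helper name ctrl are made up for the example. The
helper just evaluates the "X % 32" rule quoted above for \cX.

```python
import re

# Hypothetical sample input containing one NUL byte at offset 3.
data = b"abc\x00def"

# In Python's re syntax, both the hex escape and the octal escape
# denote the NUL byte.
hex_match = re.search(rb"\x00", data)
oct_match = re.search(rb"\000", data)

print(hex_match.start(), oct_match.start())


def ctrl(letter: str) -> int:
    # Code point given by the quoted "X % 32" rule for a control escape.
    return ord(letter) % 32


print(ctrl("A"), ctrl("@"))
```

Both searches find the NUL at offset 3, and the arithmetic gives 1 for
'A' and 0 for '@'.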

Most descriptions of the chars grep takes say it was designed so that
awk, sed, tr -- any core Linux util that takes regexes -- would be *the
same*, so people didn't have to learn a different syntax for each tool.
 





This bug report was last modified 9 years and 363 days ago.

