GNU bug report logs - #20638
BUG: standard & extended RE's don't find NUL's :-(


Package: grep;

Reported by: "L. A. Walsh" <gnu <at> tlinx.org>

Date: Sun, 24 May 2015 00:06:02 UTC

Severity: normal

Tags: notabug

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.



Message #23 received at 20638 <at> debbugs.gnu.org (full text, mbox):

From: Linda Walsh <gnu <at> tlinx.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 20638 <at> debbugs.gnu.org, Eric Blake <eblake <at> redhat.com>
Subject: Re: bug#20638: BUG: standard & extended RE's don't find NUL's :-(
Date: Mon, 25 May 2015 15:22:30 -0700

Paul Eggert wrote:
> Linda Walsh wrote:
>> it is documented, that '\ddd' or '\xHH' can be used
>> to match a single character of the value specified.
>
> I don't see where it's documented to behave that way.  Perhaps you're 
> looking at the wrong documentation?
Perhaps you want to tell me where the documentation on the
standard and/or extended RE's is that you use?
I think I was referred to a number of different manpages...
it's the first reference under "See Also" at the bottom of the
grep page: awk.  From the awk manpage:

  String Constants
      String constants in AWK are sequences of characters enclosed between
      double quotes (like "value").  Within strings, certain escape sequences
      are recognized, as in C.  These are:

      \\   A literal backslash.
      \a   The "alert" character; usually the ASCII BEL character.
      \b   Backspace.
      \f   Form-feed.
      \n   Newline.
      \r   Carriage return.
      \t   Horizontal tab.
      \v   Vertical tab.
      \xhex digits
           The character represented by the string of hexadecimal digits
           following the \x.  As in ISO C, all following hexadecimal digits
           are considered part of the escape sequence.  (This feature should
           tell us something about language design by committee.)  E.g.,
           "\x1B" is the ASCII ESC (escape) character.
      \ddd The character represented by the 1-, 2-, or 3-digit sequence of
           octal digits.  E.g., "\033" is the ASCII ESC (escape) character.
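
A minimal sketch of the octal escape described in the awk manpage excerpt
above. It exercises awk string constants only and says nothing about how
grep treats the same sequence in a pattern. It assumes a POSIX shell with
awk, od, and tr available; the function name awk_esc_byte is hypothetical.

```shell
# Hypothetical helper: print the decimal value of the single byte that awk
# emits for the octal escape "\033" inside a string constant.
awk_esc_byte() {
    # The single quotes keep the shell from touching the backslash, so awk
    # itself interprets "\033"; od renders the byte as an unsigned decimal
    # and tr strips od's padding and trailing newline.
    awk 'BEGIN { printf "%s", "\033" }' | od -An -tu1 | tr -d ' \n'
}

awk_esc_byte; echo    # prints 27, the ASCII ESC character
```

The "\x1B" form from the same excerpt is left out of the sketch because
support for \x escapes varies between awk implementations.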




>>  The argument was that
>> a NUL in a file made it non-text -- therefore it wouldn't be a "line".
>
> Obviously -z changes the definition of a line.  -z is explicitly 
> designed to operate on files containing NUL bytes.  So that argument 
> was not coherent.
---
That is my opinion also, but nevertheless, the claim that '\000' implies
binary was made early in this bug discussion -- I was refuting that.  The
other thing that trips up some tools is a missing terminating LF at the end
of a page of text (i.e. some editors will alter text-based files by adding
an extra LF at the end, which can cause problems with config files in some
cases).
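
The point quoted above, that -z changes what grep treats as a line, can be
sketched as follows. This assumes a grep that supports -z (GNU grep's
--null-data option); the function name nul_records_matching and the sample
records are hypothetical.

```shell
# Hypothetical helper: feed two NUL-terminated records to grep -z and show
# the matching records one per line.
nul_records_matching() {
    # With -z, grep uses NUL rather than newline as the line terminator on
    # both input and output, so each record is matched on its own.
    # tr turns the output terminators back into newlines for display.
    printf 'alpha\0beta\0' | grep -z "$1" | tr '\0' '\n'
}

nul_records_matching beta    # prints: beta
```

The sketch only shows NUL acting as a record terminator; it does not show
a pattern that matches a NUL byte inside a record, which is the question
the bug report itself raises.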


>
>>> I'm afraid you've gone off the deep end here.
>> I didn't bring up POSIX, Eric did.
>
> Eric's comments didn't incorporate conspiracy theories about corporate 
> payoffs; yours did.
---
   I am stating facts.  The ones who had the most influence on posix in the
past were the largest "gold sponsors".  Now, it's fewer of them and more
'silver'.... but they historically have had the most influence on such
standards organizations.

   I will remind you that POSIX described its initial mission statement as
"descriptive" -- not "prescriptive".  That changed ~ 2003 or so when they
started telling implementors what they had to remove to be posix compliant.
The worst violation I can think of is removing the ability for rm to be used
easily and safely to remove everything under a specific directory:
"rm -fr --one-file-system ." -- It might be good to have a 1 char name for
that.  For some reason I remember "-x" being a reasonable choice.

"rm" was always described to do a depth-first traversal, which means it 
shouldn't even look at top-level paths except to descend into them.  That
was changed, making coreutils rm's that follow that standard unreliable for
removing dir contents (w/o removing the dir).

   I have good reasons -- not conspiracy, but capitalistic reasons for
what I say, and if you don't believe money and capitalism run this country,
I'd have to say it was you, who had gone off the deep end.

But if you had -- I can probably welcome you -- I think I live in the
deep end... ;-)

linda




This bug report was last modified 9 years and 363 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.