GNU bug report logs - #32704
Can grep search for a line feed and a null character at the same time?


Package: grep;

Reported by: 21naown <at> gmail.com

Date: Tue, 11 Sep 2018 16:27:01 UTC

Severity: wishlist


From: Assaf Gordon <assafgordon <at> gmail.com>
To: 21naown <at> gmail.com, 32704 <at> debbugs.gnu.org, Eric Blake <eblake <at> redhat.com>,  Paul Eggert <eggert <at> cs.ucla.edu>
Subject: bug#32704: Can grep search for a line feed and a null character at the same time?
Date: Sat, 15 Sep 2018 14:20:40 -0600
Hello,

On 15/09/18 11:57 AM, 21naown <at> gmail.com wrote:
> On 15/09/2018 at 19:06, Eric Blake wrote:
>> On 9/15/18 11:43 AM, 21naown <at> gmail.com wrote:
>>> But is it at least possible to find “\x0A\x00” with grep?
>>
>> If you bend the rules by throwing -P into the mix, yes :)
>>
> So it is possible to find “\x0A\x00” alone, but for example 
> “\x74\x00\x0D\x00\x0A\x00\x74\x00\x65\x00” is impossible to find with the 
> “-P” option?

If I may suggest a different tool, GNU sed can handle such regexes more 
easily than grep.
The 'trick' is to accumulate multiple lines into memory, then run the
regex on the entire buffer.

1.
If your input is small enough to fit in memory,
you can load the entire file into memory
and run the regex on the buffer:

$ printf '\xFF\xFE\x0D\x00\x0A\x00\x74\x00\x65\x00\x73\x00\x74\x00\x0D\x00\x0A\x00\x74\x00\x65\x00\x73\x00\x74\x00\x5F\x00\x74\x00\x77\x00\x6F\x00\x0D\x00\x0A\x00' \
     | LC_ALL=C sed -n 'H;$!d ; x ; /\x0a\x00/q0 ; q1' \
           && echo MATCH || echo NO-MATCH

The "H;$!d" commands accumulate lines into the hold buffer.
The "x" command exchanges the hold and pattern buffers, which brings
the accumulated lines into the pattern buffer.
Then the regex "/\x0a\x00/" searches in the buffer.
If there was a match, sed quits with exit code 0 ("q0").
Otherwise, sed quits with exit code 1 ("q1").
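
To make the exit-code behaviour easier to reuse, the same command can be
wrapped in a small shell function. This is only a sketch: the name
"has_lf_nul" is made up for illustration, and it assumes GNU sed (the
\xHH escapes and "q" with an exit code are GNU extensions). The printf
arguments use octal escapes because not every shell's printf understands
\xHH. The last command uses "l" to show what ends up in the pattern
buffer after "x".

```shell
# has_lf_nul is a hypothetical helper name, not part of grep or sed.
# It reads stdin and returns 0 if byte 0x0A is directly followed by
# byte 0x00 anywhere in the input, and 1 otherwise.
# Assumes GNU sed; the whole input is held in memory.
has_lf_nul() {
    LC_ALL=C sed -n 'H;$!d ; x ; /\x0a\x00/q0 ; q1'
}

# UTF-16LE "t", CR, LF, "e": the 0x0A byte is followed by 0x00.
printf 't\000\015\000\012\000e\000' | has_lf_nul \
    && echo MATCH || echo NO-MATCH        # prints MATCH

# Plain ASCII text: no 0x0A byte is followed by 0x00.
printf 'abc\ndef\n' | has_lf_nul \
    && echo MATCH || echo NO-MATCH        # prints NO-MATCH

# Inspect the buffer the regex runs on: "H" puts a newline before
# each accumulated line, so the buffer starts with a newline.
printf 'a\nb\n' | sed -n 'H;$!d ; x ; l'  # prints \na\nb$
```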


2.
If the file is too big to fit in memory,
you can scan it with a sliding window of two lines, like so:

$ printf '\xFF\xFE\x0D\x00\x0A\x00\x74\x00\x65\x00\x73\x00\x74\x00\x0D\x00\x0A\x00\x74\x00\x65\x00\x73\x00\x74\x00\x5F\x00\x74\x00\x77\x00\x6F\x00\x0D\x00\x0A\x00' \
     | LC_ALL=C sed -n 'N;/\x0a\x00/q0;$q1;D;' \
             && echo MATCH || echo NO-MATCH

The N and D commands work in tandem: N appends the next line to the
buffer, and D deletes the oldest (first) line from it (think FIFO).
The regex then operates on a buffer that always holds the two most
recent lines, which is enough for a pattern containing one line feed.
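
The same two-line window should also cover the longer sequence asked
about above, since it contains only one \x0A. The following is a sketch
under a few assumptions: "find_seq" is a made-up name, GNU sed is
required, and "$!N" is used instead of plain "N" so that a one-line
input falls through to "$q1" rather than letting N end the script early.
A sequence with more than one line feed would need a larger window.

```shell
# find_seq is a hypothetical helper name. It reads stdin and returns 0
# if the bytes 74 00 0D 00 0A 00 74 00 65 00 occur (UTF-16LE for
# "t", CR, LF, "t", "e"), and 1 otherwise. Assumes GNU sed.
find_seq() {
    # $!N : append the next line unless we are already on the last one
    # D   : drop the oldest line and restart without reading new input
    LC_ALL=C sed -n \
        '$!N;/\x74\x00\x0d\x00\x0a\x00\x74\x00\x65\x00/q0;$q1;D'
}

# Same bytes as the sample input above, written with octal escapes:
# BOM, CR LF, "test", CR LF, "test_two", CR LF.
printf '\377\376\015\000\012\000t\000e\000s\000t\000\015\000\012\000t\000e\000s\000t\000_\000t\000w\000o\000\015\000\012\000' \
    | find_seq && echo MATCH || echo NO-MATCH   # prints MATCH

# "test", CR LF only: nothing follows the line feed, so no match.
printf 't\000e\000s\000t\000\015\000\012\000' \
    | find_seq && echo MATCH || echo NO-MATCH   # prints NO-MATCH
```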



More details are in the manual:
 https://www.gnu.org/software/sed/manual/sed.html#Multiline-techniques
https://www.gnu.org/software/sed/manual/sed.html#Text-search-across-multiple-lines



regards,
 - assaf





This bug report was last modified 4 years and 328 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.