GNU bug report logs - #70511
Option to grep into compressed files

Package: grep

Reported by: Mary <marycada <at> proton.me>

Date: Mon, 22 Apr 2024 06:52:04 UTC

Severity: normal

To reply to this bug, email your comments to 70511 AT debbugs.gnu.org.


Report forwarded to bug-grep <at> gnu.org:
bug#70511; Package grep. (Mon, 22 Apr 2024 06:52:04 GMT) Full text and rfc822 format available.

Acknowledgement sent to Mary <marycada <at> proton.me>:
New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Mon, 22 Apr 2024 06:52:05 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Mary <marycada <at> proton.me>
To: "bug-grep <at> gnu.org" <bug-grep <at> gnu.org>
Subject: Option to grep into compressed files
Date: Sun, 21 Apr 2024 16:46:17 +0000
Hello,

I added an option to grep that filters files through a specified program. The main purpose for that is to uncompress files using the zcat (or `gzip -d`) command, or an equivalent for another compression format.

It works like this:

    grep -j zcat pattern textfile.gz [textfile2.gz...]

(I chose `-j` for no particular reason. Any unused letter could go there.)

This will spawn a shell and execute the given command (zcat), which will receive each file through stdin and its stdout will be used in lieu of the file. Any valid shell command can be used instead of zcat.
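
As a rough illustration of the behaviour described above (this is not the patch itself), the same effect can be approximated with a per-file pipeline. The function name `grep_filtered` is hypothetical, and the sketch assumes GNU grep for `-H` and `--label`:

```shell
# Hypothetical helper, not part of grep: approximates the proposed
#   grep -j FILTER PATTERN FILE...
# by running FILTER through a shell once per file.
grep_filtered() {
    filter=$1
    pattern=$2
    shift 2
    status=1
    for f in "$@"; do
        # The filter reads the file on stdin; grep searches the filter's stdout.
        # -H and --label (GNU grep) make matches carry the real file name.
        if sh -c "$filter" < "$f" | grep -H --label="$f" -e "$pattern"; then
            status=0
        fi
    done
    # 0 if any file matched, 1 otherwise.
    return "$status"
}
```

For example, `grep_filtered zcat pattern textfile.gz textfile2.gz` would print matching lines prefixed with the name of the compressed file they came from. Unlike a built-in option, this wrapper does not give access to `-r` or grep's other file-handling options.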

This is better than the zgrep commands provided by the gzip, bzip2 and xz projects, because it supports all of the options, including `-r`.

It can also be used with arbitrary commands, like less popular compression algorithms or even commands unrelated to compression. I read at https://www.gnu.org/software/grep/devel.html that there were plans to add `-Z` and `-J` options for gzip and bzip2; my implementation can support any algorithm.

The problem I see with that though is that it would add a shell command option to grep. This is longer to type out than `-Z` or `-J`, and it also provides shell access to anybody who can control what options grep receives. In practice, I'm not sure how serious that is, but I thought it would be useful to point it out.

I have a patch that I can send. I believe my patch is trivial: the only part that's longer than 3 lines is a simple fork-exec pattern (40 lines including whitespace, but they're the same ones you've seen in plenty of other programs). I know there are specific requirements regarding copyright, and I don't want to cause problems about that (I can't sign the FSF's documents). May I send my patch?

-- Mary

PS- This is my first time using a mailing list, please let me know if I'm doing something wrong!




Information forwarded to bug-grep <at> gnu.org:
bug#70511; Package grep. (Mon, 22 Apr 2024 07:48:01 GMT) Full text and rfc822 format available.

Message #8 received at 70511 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Mary <marycada <at> proton.me>
Cc: 70511 <at> debbugs.gnu.org
Subject: Re: bug#70511: Option to grep into compressed files
Date: Mon, 22 Apr 2024 00:46:38 -0700
Thanks for the suggestion. You're right, this would be better than zgrep 
etc.

I have some qualms though, as the new option would increase the attack 
surface for 'grep', in that you could then execute arbitrary code by 
passing certain options to 'grep'. Is there some safer way to get what 
you want?




Information forwarded to bug-grep <at> gnu.org:
bug#70511; Package grep. (Mon, 22 Apr 2024 14:02:02 GMT) Full text and rfc822 format available.

Message #11 received at 70511 <at> debbugs.gnu.org (full text, mbox):

From: "David G. Pickett" <dgpickett <at> aol.com>
To: Mary <marycada <at> proton.me>, Paul Eggert <eggert <at> cs.ucla.edu>
Cc: "70511 <at> debbugs.gnu.org" <70511 <at> debbugs.gnu.org>
Subject: Re: bug#70511: Option to grep into compressed files
Date: Mon, 22 Apr 2024 14:00:32 +0000 (UTC)
One supposes that if the file extension is not trustworthy, one can taste the file the way the file command does, and use libraries like the gzip libraries to handle gzipped files as a stream.  There are so many others: zip files could be treated like directories, with every file in them that matches the glob searched, and then there is bzip2, 7zip, ....  It becomes a popularity contest!  One can do all this with shell scripting, and leave poor old grep out of it!

Information forwarded to bug-grep <at> gnu.org:
bug#70511; Package grep. (Tue, 23 Apr 2024 15:34:11 GMT) Full text and rfc822 format available.

Message #14 received at 70511 <at> debbugs.gnu.org (full text, mbox):

From: Mary <marycada <at> proton.me>
To: "David G. Pickett" <dgpickett <at> aol.com>,
 "eggert <at> cs.ucla.edu" <eggert <at> cs.ucla.edu>,
 "70511 <at> debbugs.gnu.org" <70511 <at> debbugs.gnu.org>
Subject: Re: bug#70511: Option to grep into compressed files
Date: Tue, 23 Apr 2024 15:21:15 +0000
> Thanks for the suggestion. You're right, this would be better than zgrep
> etc.
> 
> I have some qualms though, as the new option would increase the attack
> surface for 'grep', in that you could then execute arbitrary code by
> passing certain options to 'grep'. Is there some safer way to get what
> you want?


There is still the possibility of including the respective compression libraries directly in grep and using the `-Z` and `-J` options as proposed, but this wouldn't allow the use of less popular compression algorithms.

One possibility, but I'm not sure what it's worth, would be to give grep a special arg0 to enable shell commands, like `jgrep zcat pattern123 file.gz`. But I'm not sure if it's worth the trouble.


> One supposes that if the file extension is not trustworthy, one can taste the file the way the file command does, and use libraries like the gzip libraries to handle gzipped files as a stream.  There are so many others: zip files could be treated like directories, with every file in them that matches the glob searched, and then there is bzip2, 7zip, ....  It becomes a popularity contest!  One can do all this with shell scripting, and leave poor old grep out of it!


The reason why I wanted to do this in grep directly is because it's difficult to implement this with shell scripting. I noticed that neither zgrep, bzgrep nor xzgrep support the `-r` option, among others, presumably because it's too difficult to implement in a portable way.

I made my patch use a shell command specifically to provide maximum flexibility with minimum maintenance cost. But it does open the door to security risks, so I understand if it's not worth adding to grep.




Information forwarded to bug-grep <at> gnu.org:
bug#70511; Package grep. (Tue, 23 Apr 2024 17:08:10 GMT) Full text and rfc822 format available.

Message #17 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Dennis Clarke <dclarke <at> blastwave.org>
To: bug-grep <at> gnu.org
Subject: Re: bug#70511: Option to grep into compressed files
Date: Tue, 23 Apr 2024 13:07:10 -0400
On 4/23/24 11:21, Mary via Bug reports for GNU grep wrote:
>> Thanks for the suggestion. You're right, this would be better than zgrep
>> etc.

    What happened to the old UNIX concept of

        Do one thing.
        Do it well.
        Then stop.

    To grep a compressed stream of bits you just pass the decompressed
    bits along a pipe.

    Done.




-- 
--
Dennis Clarke
RISC-V/SPARC/PPC/ARM/CISC
UNIX and Linux spoken





Information forwarded to bug-grep <at> gnu.org:
bug#70511; Package grep. (Tue, 23 Apr 2024 20:50:13 GMT) Full text and rfc822 format available.

Message #20 received at 70511 <at> debbugs.gnu.org (full text, mbox):

From: jackson <at> fastmail.com
To: "Paul Eggert" <eggert <at> cs.ucla.edu>, Mary <marycada <at> proton.me>
Cc: 70511 <at> debbugs.gnu.org
Subject: Re: bug#70511: Option to grep into compressed files
Date: Tue, 23 Apr 2024 15:46:01 -0500
Paul Eggert wrote:
> I have some qualms though, as the new option would increase the attack 
> surface for 'grep',

Agreed.

Given the recent uproar involving liblzma being linked into ssh in systemd
builds, resulting in a potentially very dangerous ssh compromise ...

... I would think that minimizing the attack surface on common commands
by not linking in non-essential compression libraries would be a no brainer.

-- 
Paul Jackson
  jackson <at> fastmail.fm




Information forwarded to bug-grep <at> gnu.org:
bug#70511; Package grep. (Tue, 23 Apr 2024 22:53:03 GMT) Full text and rfc822 format available.

Message #23 received at 70511 <at> debbugs.gnu.org (full text, mbox):

From: "David G. Pickett" <dgpickett <at> aol.com>
To: "eggert <at> cs.ucla.edu" <eggert <at> cs.ucla.edu>, 
 "70511 <at> debbugs.gnu.org" <70511 <at> debbugs.gnu.org>, Mary <marycada <at> proton.me>
Subject: Re: bug#70511: Option to grep into compressed files
Date: Tue, 23 Apr 2024 22:51:45 +0000 (UTC)
Shell scripting can take file names in from a find or ls with 'while read', or by globbing with 'for f in pattern', and examine them one by one: run 'grep -q' to find out whether the file, or the uncompressed stream from that file, has a match, and if so 'echo' the file name out. If you want lines, it can 'while read l' the stream out of grep and prefix each line with the file name in an 'echo'.

It helps to juggle streams, not file names, and to create streams, not temp files that have to be cleaned up and create delay. In bash, 'while read' sometimes gets tricky because the variable(s) are local to the loop, so sometimes a parenthesis wrapper helps. Both ksh and bash also have the nice '<(command)' feature to turn streams of stdout into input file names, and '>(command)' to turn output streams into file names. Bash has so many nice tricks that I often google for them, such as pattern recognition in 'if'.

If you do not trust extensions, you can '$(file filename)' to find out what you have in hand:
$ echo $(file .profile)
.profile: ASCII text



Information forwarded to bug-grep <at> gnu.org:
bug#70511; Package grep. (Wed, 24 Apr 2024 06:20:13 GMT) Full text and rfc822 format available.

Message #26 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Mary <marycada <at> proton.me>
To: "bug-grep <at> gnu.org" <bug-grep <at> gnu.org>
Subject: Re: bug#70511: Option to grep into compressed files
Date: Wed, 24 Apr 2024 02:26:03 +0000
> Do you know zgrep from zutils?

TIL! My system does not come with those by default, and instead provides a `zgrep` that is a Bash script supporting only `gzip`.

Are those the generally recommended tools to use? (I'm not sure why `zgrep`/`bzgrep`/`xzgrep` would be provided by their respective projects, given the existence of this project.)

> What happened to the old UNIX concept of
> 
> Do one thing.
> Do it well.
> Then stop.
> 
> To grep a compressed stream of bits you just pass the decompressed
> bits along a pipe.
> 
> Done.

I'm not sure what the threshold for that principle is. GNU grep implements a certain number of options beyond the POSIX ones. I decided to send my proposal because I read here: https://www.gnu.org/software/grep/devel.html that GNU grep planned to implement the `-Z` and `-J` options, though I'm not sure if that page is still up-to-date.

As for the piping mechanism, it does work for simple cases, but it doesn't work well with `--recursive`, or `--with-filename` for example. There are ways to work around it with certain shells, but they tend to give long and complex strings. They are generally better suited for ad hoc uses, and it's difficult to make them portable.

> ... I would think that minimizing the attack surface on common commands
> by not linking in non-essential compression libraries would be a no brainer.

I agree with that. I only wanted to make life easier for the maintainers of compression libraries. Perhaps it would be better security-wise to provide the regular grep vanilla, but also provide on the side some "flavored" utilities like `zgrep`/`bzgrep`/`xzgrep` which would be compiled against the relevant libraries?




Information forwarded to bug-grep <at> gnu.org:
bug#70511; Package grep. (Thu, 25 Apr 2024 21:16:12 GMT) Full text and rfc822 format available.

Message #29 received at 70511 <at> debbugs.gnu.org (full text, mbox):

From: "Dale R. Worley" <worley <at> alum.mit.edu>
To: 70511 <at> debbugs.gnu.org
Subject: Re: bug#70511: Option to grep into compressed files
Date: Thu, 25 Apr 2024 17:15:19 -0400
People have mentioned that it's best for a utility "to do one thing
well".  It seems to me that even the existing grep options do three
things (in the complex use cases):

- select a set of files
- uncompress the files (if they're compressed)
- search within the file contents

I am ignoring the case of extracting members from archive files.

It seems to me that if one wanted to do a version of this that isn't
covered by the existing grep options, step 1 can best be done by
"find".  Step 3 can be done by running grep on each set of contents,
with --label=name to get output lines labeled with the original file
name.
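
A minimal sketch of that division of labour, assuming GNU grep (for `-H` and `--label`) and gzip-compressed files named `*.gz`; the function name `rgrep_gz` is hypothetical:

```shell
# Hypothetical helper: recursive search through *.gz files.
#   step 1: find selects the files
#   step 2: gzip -dc decompresses each one
#   step 3: grep searches the contents, labelled with the original name
rgrep_gz() {
    pattern=$1
    dir=$2
    find "$dir" -type f -name '*.gz' -exec sh -c '
        pattern=$1
        shift
        for f in "$@"; do
            gzip -dc -- "$f" | grep -H --label="$f" -e "$pattern"
        done
    ' sh "$pattern" {} +
}
```

Called as `rgrep_gz pattern some/directory`, it prints lines such as `some/directory/log.gz:matching line`. The sketch does not propagate a meaningful exit status, and it handles only one compression format.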

What doesn't seem to exist is something that does step 2 in a general
way.  The tool that is needed is something that reads the first few
bytes of a file, determines which compression signature is present if
any, then processes the contents through the correct decompressor.
Ideally, it would be programmable in something like the manner of "file"
so that additional compression formats could be fitted into the
framework, and it could use either a compiled-in decompression library
(like zlib) or call an external decompression program, as necessary.
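
To make the idea concrete, here is a small sketch of such a signature-based dispatcher that calls external decompressors. The name `autocat` is hypothetical, the list of formats is only a sample, and each branch assumes the corresponding program is installed:

```shell
# Hypothetical helper: write FILE to stdout, decompressing it first if its
# leading bytes match a known compression signature.
autocat() {
    # First six bytes as one lowercase hex string, e.g. "1f8b08000000".
    magic=$(od -An -tx1 -N6 < "$1" | tr -d ' \n')
    case $magic in
        1f8b*|1f9d*)    gzip -dc -- "$1" ;;    # gzip, or compress (.Z) via gzip
        425a68*)        bzip2 -dc -- "$1" ;;   # bzip2 ("BZh")
        fd377a585a00*)  xz -dc -- "$1" ;;      # xz
        28b52ffd*)      zstd -dcq -- "$1" ;;   # zstd
        *)              cat -- "$1" ;;         # no known signature: pass through
    esac
}
```

Its output can then be piped into grep, for example `autocat file | grep -H --label=file pattern` with GNU grep. Making the table of signatures configurable, in the manner of "file", is left out of this sketch.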

Actually, I'm asking whether anybody knows whether such a tool exists
already.  It seems like a "natural" facility that somebody would have
thought to write maybe fifteen years ago.

Dale




Information forwarded to bug-grep <at> gnu.org:
bug#70511; Package grep. (Fri, 26 Apr 2024 00:37:04 GMT) Full text and rfc822 format available.

Message #32 received at 70511 <at> debbugs.gnu.org (full text, mbox):

From: Seth David Schoen <schoen <at> loyalty.org>
To: "Dale R. Worley" <worley <at> alum.mit.edu>
Cc: 70511 <at> debbugs.gnu.org
Subject: Re: bug#70511: Option to grep into compressed files
Date: Thu, 25 Apr 2024 17:35:34 -0700
Dale R. Worley writes:

> What doesn't seem to exist is something that does step 2 in a general
> way.  The tool that is needed is something that reads the first few
> bytes of a file, determines which compression signature is present if
> any, then processes the contents through the correct decompressor.
> Ideally, it would be programmable in something like the manner of "file"
> so that additional compression formats could be fitted into the
> framework, and it could use either a compiled-in decompression library
> (like zlib) or call an external decompression program, as necessary.
> 
> Actually, I'm asking whether anybody knows whether such a tool exists
> already.  It seems like a "natural" facility that somebody would have
> thought to write maybe fifteen years ago.

For a while, new options were getting added to GNU tar frequently in order
to allow you to do things like

compress -dc | tar xf -
zcat | tar xf -
bzcat | tar xf -
lzcat | tar xf -

etc., but just using the single tar invocation without (explicitly
running) an external compression program.  The current ones are (in
alphabetical order in the man page, not historical order of when they
were added)

       -j, --bzip2
              Filter the archive through bzip2(1).

       -J, --xz
              Filter the archive through xz(1).

       --lzip Filter the archive through lzip(1).

       --lzma Filter the archive through lzma(1).

       --lzop Filter the archive through lzop(1).

       -z, --gzip, --gunzip, --ungzip
              Filter the archive through gzip(1).

       -Z, --compress, --uncompress
              Filter the archive through compress(1).

       --zstd Filter the archive through zstd(1).

Wow, _eight_ specific forms of compression!  But a newer functionality
in GNU tar is

       -a, --auto-compress
              Use archive suffix to determine the compression program.

and something like that (apparently also looking at the file header)
is now the default.

It's weird to me to imagine having all of that functionality in grep,
but maybe all of the functionality that was put into tar for this could
become a separate standalone program?




Information forwarded to bug-grep <at> gnu.org:
bug#70511; Package grep. (Fri, 26 Apr 2024 14:28:04 GMT) Full text and rfc822 format available.

Message #35 received at 70511 <at> debbugs.gnu.org (full text, mbox):

From: Antonio Diaz Diaz <antonio <at> gnu.org>
To: "Dale R. Worley" <worley <at> alum.mit.edu>
Cc: 70511 <at> debbugs.gnu.org
Subject: Re: bug#70511: Option to grep into compressed files
Date: Fri, 26 Apr 2024 16:27:05 +0200
Dale R. Worley wrote:
> What doesn't seem to exist is something that does step 2 in a general
> way.  The tool that is needed is something that reads the first few
> bytes of a file, determines which compression signature is present if
> any, then processes the contents through the correct decompressor.

Such a tool[1] has in fact existed since 2009. It is just not yet
widely known. :-)

[1] http://www.nongnu.org/zutils/manual/zutils_manual.html#Zgrep

Best regards,
Antonio.




Information forwarded to bug-grep <at> gnu.org:
bug#70511; Package grep. (Fri, 26 Apr 2024 21:15:17 GMT) Full text and rfc822 format available.

Message #38 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Mary <marycada <at> proton.me>
To: "bug-grep <at> gnu.org" <bug-grep <at> gnu.org>
Subject: Re: bug#70511: Option to grep into compressed files
Date: Fri, 26 Apr 2024 18:50:24 +0000
> For a while, new options were getting added to GNU tar frequently in order
> to allow you to do things like
> 
> compress -dc | tar xf -
> zcat | tar xf -
> bzcat | tar xf -
> lzcat | tar xf -
> 
> etc., but just using the single tar invocation without (explicitly
> running) an external compression program. The current ones are (in
> alphabetical order in the man page, not historical order of when they
> were added)
> 
> -j, --bzip2
> Filter the archive through bzip2(1).
> 
> -J, --xz
> Filter the archive through xz(1).
> 
> --lzip Filter the archive through lzip(1).
> 
> --lzma Filter the archive through lzma(1).
> 
> --lzop Filter the archive through lzop(1).
> 
> -z, --gzip, --gunzip, --ungzip
> Filter the archive through gzip(1).
> 
> -Z, --compress, --uncompress
> Filter the archive through compress(1).
> 
> --zstd Filter the archive through zstd(1).
> 
> Wow, eight specific forms of compression! But a newer functionality
> in GNU tar is
> 
> -a, --auto-compress
> Use archive suffix to determine the compression program.
> 
> and something like that (apparently also looking at the file header)
> is now the default.
> 
> It's weird to me to imagine having all of that functionality in grep,
> but maybe all of the functionality that was put into tar for this could
> become a separate standalone program?

GNU tar also supports `-I, --use-compress-program=PROG   filter through PROG (must accept -d)`, which is one of the reasons I thought it would be relevant to add a similar option to grep.




Information forwarded to bug-grep <at> gnu.org:
bug#70511; Package grep. (Sun, 28 Apr 2024 21:27:02 GMT) Full text and rfc822 format available.

Message #41 received at 70511 <at> debbugs.gnu.org (full text, mbox):

From: "Dale R. Worley" <worley <at> alum.mit.edu>
To: 70511 <at> debbugs.gnu.org
Subject: Re: bug#70511: Option to grep into compressed files
Date: Sun, 28 Apr 2024 17:25:36 -0400
Antonio Diaz Diaz <antonio <at> gnu.org> writes:
> Dale R. Worley wrote:
>> What doesn't seem to exist is something that does step 2 in a general
>> way.  The tool that is needed is something that reads the first few
>> bytes of a file, determines which compression signature is present if
>> any, then processes the contents through the correct decompressor.
>
> Such a tool[1] has in fact existed since 2009. It is just not yet
> widely known. :-)
>
> [1] http://www.nongnu.org/zutils/manual/zutils_manual.html#Zgrep

Looking at that page, I think you meant to point to #Zcat.  But yes, it
does seem to do that job.

Mary via Bug reports for GNU grep <bug-grep <at> gnu.org> writes:
> GNU tar also supports `-I, --use-compress-program=PROG   filter
> through PROG (must accept -d)`, which is one of the reasons I thought
> it would be relevant to add a similar option to grep.

So the construction I'm thinking of would be

    grep ... --use-compress-program=zcat ... pattern file ...

except it looks like zcat doesn't accept -d (which would need to be a
no-op for it).

Though it looks like zcat supports five compression techniques and gnu
tar handles eight, so zcat should be expanded there.

Dale




Information forwarded to bug-grep <at> gnu.org:
bug#70511; Package grep. (Tue, 30 Apr 2024 18:27:02 GMT) Full text and rfc822 format available.

Message #44 received at 70511 <at> debbugs.gnu.org (full text, mbox):

From: Antonio Diaz Diaz <antonio <at> gnu.org>
To: "Dale R. Worley" <worley <at> alum.mit.edu>
Cc: 70511 <at> debbugs.gnu.org
Subject: Re: bug#70511: Option to grep into compressed files
Date: Tue, 30 Apr 2024 20:26:57 +0200
Dale R. Worley wrote:
> So the construction I'm thinking of would be
>
>      grep ... --use-compress-program=zcat ... pattern file ...

Ah! interesting.

Zgrep duplicates some of the work of grep. For example it recurses through 
directories, feeds grep one file at a time, and prepends the file name to 
the output of grep if needed.

Delegating the decompression to zcat as you propose would allow the full use
of grep's features. If it is possible, it would probably be the best option.

> except it looks like zcat doesn't accept -d (which would need to be a
> no-op for it).

Zcat does indeed accept (and ignore) option -d for compatibility with gzip. 
Therefore all that is needed is to implement a way for grep to delegate 
decompression to zcat.

> Though it looks like zcat supports five compression techniques and gnu
> tar handles eight, so zcat should be expanded there.

Zcat also supports the (obsolete) compress format (.Z) through gzip.

Of the other two, lzma would be better removed from tar, and I do not
remember having seen any tarball compressed with lzop (maybe because it
compresses less than gzip).

Best regards,
Antonio.




Information forwarded to bug-grep <at> gnu.org:
bug#70511; Package grep. (Wed, 01 May 2024 19:21:02 GMT) Full text and rfc822 format available.

Message #47 received at 70511 <at> debbugs.gnu.org (full text, mbox):

From: "Dale R. Worley" <worley <at> alum.mit.edu>
To: Antonio Diaz Diaz <antonio <at> gnu.org>
Cc: 70511 <at> debbugs.gnu.org
Subject: Re: bug#70511: Option to grep into compressed files
Date: Wed, 01 May 2024 15:20:07 -0400
Antonio Diaz Diaz <antonio <at> gnu.org> writes:
> Dale R. Worley wrote:
>> So the construction I'm thinking of would be
>>
>>      grep ... --use-compress-program=zcat ... pattern file ...

> Zcat does indeed accept (and ignore) option -d for compatibility with gzip. 
> Therefore all that is needed is to implement a way for grep to delegate 
> decompression to zcat.

> Zcat also supports the (obsolete) compress format (.Z) through gzip.

I missed those facts.  I only skimmed the section of
http://www.nongnu.org/zutils/manual/zutils_manual.html about Zcat and
hadn't read the "Common options" section which makes those clear.  I'll
have to remember that zcat has this nice functionality.

I'm not interested enough in this to implement it, but I'll leave this
one note for anyone who is researching the possibility:  You might be
concerned that starting, e.g., a zcat process for every file to scan would
be excessively high overhead.  But many years ago, I modified "tar" to
be able to compress each file individually (rather than the entire
archive collectively).  Each file was processed by a separate gzip
process, and in my usage written to an Exabyte tape.  I was worried that
all these process invocations would slow down the backup, but even on a
low-speed 486, the processes were insignificant.  So I never improved
the implementation to use an internal compression library.  Apparently
once all the needed files are in the buffers, creating yet another
process from them is quick.

Dale




Information forwarded to bug-grep <at> gnu.org:
bug#70511; Package grep. (Sat, 04 May 2024 15:40:02 GMT) Full text and rfc822 format available.

Message #50 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Mary <marycada <at> proton.me>
To: "bug-grep <at> gnu.org" <bug-grep <at> gnu.org>
Subject: Re: bug#70511: Option to grep into compressed files
Date: Sat, 04 May 2024 15:38:20 +0000
Dale R. Worley worley <at> alum.mit.edu wrote:
> I missed those facts. I only skimmed the section of
> http://www.nongnu.org/zutils/manual/zutils_manual.html about Zcat and
> hadn't read the "Common options" section which makes those clear. I'll
> have to remember that zcat has this nice functionality.
> 
> I'm not interested enough in this to implement it, but I'll leave this
> one note for anyone who is researching the possibility: You might be
> concerned that starting, e.g., a zcat process for every file to scan would
> be excessively high overhead. But many years ago, I modified "tar" to
> be able to compress each file individually (rather than the entire
> archive collectively). Each file was processed by a separate gzip
> process, and in my usage written to an Exabyte tape. I was worried that
> all these process invocations would slow down the backup, but even on a
> low-speed 486, the processes were insignificant. So I never improved
> the implementation to use an internal compression library. Apparently
> once all the needed files are in the buffers, creating yet another
> process from them is quick.
> 
> Dale

I already have a patch that I believe is trivial enough to not cause copyright concerns, would you like me to send it?




Information forwarded to bug-grep <at> gnu.org:
bug#70511; Package grep. (Mon, 06 May 2024 17:58:01 GMT) Full text and rfc822 format available.

Message #53 received at 70511 <at> debbugs.gnu.org (full text, mbox):

From: "Dale R. Worley" <worley <at> alum.mit.edu>
To: Mary <marycada <at> proton.me>
Cc: 70511 <at> debbugs.gnu.org
Subject: Re: bug#70511: Option to grep into compressed files
Date: Mon, 06 May 2024 13:56:34 -0400
Mary via Bug reports for GNU grep <bug-grep <at> gnu.org> writes:
> Dale R. Worley worley <at> alum.mit.edu wrote:
>> [...]

> I already have a patch that I believe is trivial enough to not cause
> copyright concerns, would you like me to send it?

*I* am all in favor of it, but I'm not a grep maintainer!

Dale




This bug report was last modified 1 year and 39 days ago.
