GNU bug report logs - #49698
Search for URL containing certain word


Package: grep

Reported by: Julius Hamilton <julkhami <at> gmail.com>

Date: Thu, 22 Jul 2021 19:41:01 UTC

Severity: normal

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.


From: Seth David Schoen <schoen <at> loyalty.org>
To: Julius Hamilton <julkhami <at> gmail.com>
Cc: 49698 <at> debbugs.gnu.org
Subject: bug#49698: Search for URL containing certain word
Date: Thu, 22 Jul 2021 16:29:31 -0700
Julius Hamilton writes:

> Hey,
> 
> I'm new to grep so I'd love any tips on how to search for text in the
> following way.
> 
> I'd like to find a certain URL that is somewhere in a large text file. I
> would like to find it by specifying "a URL which contains word X somewhere
> within it", or even "a URL which is located within 3 lines of the word X".
> 
> I'd like to copy that URL and then write it to the top of the file.
> 
> I am considering doing this with Vim search commands, yet the underlying
> regex would be the same, so I think this would be a good place to ask.
> 
> How would you do this with grep? Or a similar tool?

Hi Julius,

I'm not sure this is quite what the grep bug interface is intended for. :-)

That said, for the "within 3 lines of the word X" case you could use

    grep -E -C 3 X largefile | grep -E -o "$URL_REGEX"

where URL_REGEX is a regular expression matching URLs with whatever
level of specificity you want (grep -E is the current spelling of the
obsolescent egrep), with a very simple case being something like

    https?://[^, ]+
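
To make both of the requested searches concrete, here is a minimal sketch. The function names urls_containing and urls_near, and the sample file contents, are made up for illustration; they are not part of grep. It assumes the simple regex above, which ends a URL at the first comma or space.

```shell
URL_REGEX='https?://[^, ]+'

# urls_containing WORD FILE: URLs that contain WORD within the URL itself.
# (Hypothetical helper: extract every URL first, then filter by the word.)
urls_containing() {
    grep -E -o "$URL_REGEX" "$2" | grep -F -e "$1"
}

# urls_near WORD FILE: URLs within 3 lines of a line matching WORD.
# (Hypothetical helper: take 3 lines of context, then extract URLs.)
urls_near() {
    grep -E -C 3 -e "$1" "$2" | grep -E -o "$URL_REGEX"
}

# Made-up sample file for illustration only.
sample=$(mktemp)
cat > "$sample" <<'EOF'
intro line
see https://example.org/docs/widget for details
filler 1
filler 2
filler 3
filler 4
filler 5
the gadget is described here
more at http://example.com/other, and elsewhere
EOF

urls_containing widget "$sample"
urls_near gadget "$sample"

rm -f "$sample"
```

The first call filters the extracted URLs themselves, so it only reports URLs with the word inside them; the second filters lines first, so the word may appear anywhere in the surrounding context.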

As we might have recently discussed on help-bash (?), Unix doesn't have
a super-nice built-in notion of "writing to the top of a file" and you
would normally need to write the matches, followed by the original file,
to a temporary file.  Something like

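    # With set -e the script stops if no URL is found (the last grep
    # exits non-zero), so largefile is left untouched in that case.
    set -e
    temp=$(mktemp)
    grep -E -C 3 X largefile | grep -E -o "$URL_REGEX" > "$temp"
    cat largefile >> "$temp"
    # Note: mv gives largefile the temporary file's permissions.
    mv "$temp" largefile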
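
The same idea could be wrapped in a small function. This is only a sketch under some assumptions: prepend_urls is a hypothetical name, the URL regex is the simple one suggested earlier, and the final step copies the result back over the original file instead of using mv, so that the file keeps its owner and permissions (at the cost of not being an atomic replacement).

```shell
# prepend_urls WORD FILE: hypothetical helper, not part of grep.
# Writes every URL found within 3 lines of WORD to the top of FILE.
prepend_urls() {
    word=$1
    file=$2
    url_regex='https?://[^, ]+'
    temp=$(mktemp) || return 1

    # Collect the matching URLs first; leave FILE untouched if there are none.
    if ! grep -E -C 3 -e "$word" "$file" | grep -E -o "$url_regex" > "$temp"; then
        rm -f "$temp"
        return 1
    fi

    # Append the original contents after the URLs, then copy the result
    # back into FILE in place (this keeps FILE's owner and permissions).
    status=0
    cat "$file" >> "$temp" || status=1
    if [ "$status" -eq 0 ]; then
        cat "$temp" > "$file" || status=1
    fi

    rm -f "$temp"
    return "$status"
}
```

For example, `prepend_urls X largefile` would put the matching URLs on the first lines of largefile, and would return a non-zero status without changing the file if nothing matched.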



