From debbugs-submit-bounces@debbugs.gnu.org Wed Nov 02 11:34:11 2016
To: bug-grep@gnu.org
From: Greta
Subject: URGENT: Question about grep
Message-ID: <878a8b88-63c9-ada2-bb6b-3dbc617503bb@gmail.com>
Date: Wed, 2 Nov 2016 15:53:45 +0100

Dear grep developer,

I am Greta Romano and I need your help as soon as possible.

I want to use grep command to search a string of 6 characters in every
line of a file (biological file with DNA nucleotide).

The problem is that I need to search these 6 characters in the first 30
characters of each line. I report you an example:

String to search: GTGTCA

File:

>HWI-ST740:1:C2GCJACXX:1:1101:1279:1825 1:N:0:
_/NGACGCTCTGACCTTGGGGCTGGTCGGGG/__A_TGCTGAGGAGACGGTGACCAGGGTTCCCTGGCCCCACANNNCCAAGCTTCCNNNNNNNNNNNNNNNNNNN

>HWI-ST740:1:C2GCJACXX:1:1101:1349:1847 1:N:0:
_/NTTAGATGAGGGAAACATCTGCATCAAGTT/__G_TTTATCTGTGACAACAAGTGTTGTTCCACTGCCAAAGAGTTTCTTATAATAAAACAATCGGGGTGGCACNNNNN


I want that the research is done only in the underline characters. So
what I have to add in grep command to put the limit of 30 characters?

Thank you very much

Best regards

Dr. Greta Romano


From debbugs-submit-bounces@debbugs.gnu.org Wed Nov 02 11:49:39 2016
Subject: Re: bug#24858: URGENT: Question about grep
To: Greta, 24858-done@debbugs.gnu.org
References: <878a8b88-63c9-ada2-bb6b-3dbc617503bb@gmail.com>
From: Eric Blake
Organization: Red Hat, Inc.
Message-ID: <8adbf123-5013-9c39-473e-e13eb307d7c5@redhat.com>
Date: Wed, 2 Nov 2016 10:49:30 -0500
In-Reply-To: <878a8b88-63c9-ada2-bb6b-3dbc617503bb@gmail.com>

tag 24858 notabug
thanks

On 11/02/2016 09:53 AM, Greta wrote:

> String to search: GTGTCA
>
> File:
>
>>HWI-ST740:1:C2GCJACXX:1:1101:1279:1825 1:N:0:
> _/NGACGCTCTGACCTTGGGGCTGGTCGGGG/__A_TGCTGAGGAGACGGTGACCAGGGTTCCCTGGCCCCACANNNCCAAGCTTCCNNNNNNNNNNNNNNNNNNN
>
>>HWI-ST740:1:C2GCJACXX:1:1101:1349:1847 1:N:0:
> _/NTTAGATGAGGGAAACATCTGCATCAAGTT/__G_TTTATCTGTGACAACAAGTGTTGTTCCACTGCCAAAGAGTTTCTTATAATAAAACAATCGGGGTGGCACNNNNN
>
>
> I want that the research is done only in the underline characters.

Underlining doesn't show up in plain text mail (and we prefer plain text
over html bloat on the mailing list).  But I think your point still made
it across.

> So
> what I have to add in grep command to put the limit of 30 characters?

You can't do it with grep.  But you can do it with sed or awk.  Use the
right tool for the job at hand :)

Let's strip your example down to a smaller test case: I want to search
for a one-byte string '1', but only in the first 3 bytes of each line.
With grep, it is not possible; the pattern matches anywhere in the line:

$ printf '012000001\n345000001\n' | grep 1
012000001
345000001

But with sed, we can copy the entire line to hold space, truncate the
line in pattern space, then do the search; if successful, print the line
stored in hold space:

$ printf '012000001\n345000001\n' | \
    sed -n 'h; s/^\(.\{3\}\).*/\1/; /1/ { x;p }'
012000001

And I'll leave the awk program as an exercise for the reader.

Therefore, I'm tagging this as not a bug.
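
For readers who want the awk variant left as an exercise, one possible
sketch follows.  It is not part of the original reply: match_prefix is a
hypothetical helper name, the search string is assumed to be a literal
(no regex syntax, no backslashes), and input is read from stdin.

```shell
# match_prefix PATTERN WIDTH
# Hypothetical helper: print each whole input line whose first WIDTH
# characters contain the literal string PATTERN.
match_prefix() {
    # substr() cuts the search window; index() does a literal search
    # and returns 0 when PATTERN is absent from that window.
    awk -v pat="$1" -v width="$2" 'index(substr($0, 1, width), pat) > 0'
}

# Same reduced test case as the sed example: look for '1' in the
# first 3 characters only.
printf '012000001\n345000001\n' | match_prefix 1 3
# prints 012000001
```

For the data in the report this would presumably be invoked as
"match_prefix GTGTCA 30 < filename", which prints the entire matching
line rather than only its first 30 characters.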

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

From debbugs-submit-bounces@debbugs.gnu.org Wed Nov 02 11:52:50 2016
Subject: Re: bug#24858: URGENT: Question about grep
To: Greta, 24858@debbugs.gnu.org
References: <878a8b88-63c9-ada2-bb6b-3dbc617503bb@gmail.com>
From: Bruce Dubbs
Message-ID: <581A0BC7.9060702@gmail.com>
Date: Wed, 2 Nov 2016 10:52:39 -0500
In-Reply-To: <878a8b88-63c9-ada2-bb6b-3dbc617503bb@gmail.com>

Greta wrote:
> Dear grep developer,
>
> I am Greta Romano and I need your help as soon as possible.
>
> I want to use grep command to search a string of 6 characters in every
> line of a file (biological file with DNA nucleotide).
>
> The problem is that I need to search these 6 characters in the first 30
> characters of each line. I report you an example:
>
> String to search: GTGTCA
>
> File:
>
> >HWI-ST740:1:C2GCJACXX:1:1101:1279:1825 1:N:0:
> _/NGACGCTCTGACCTTGGGGCTGGTCGGGG/__A_TGCTGAGGAGACGGTGACCAGGGTTCCCTGGCCCCACANNNCCAAGCTTCCNNNNNNNNNNNNNNNNNNN
>
> >HWI-ST740:1:C2GCJACXX:1:1101:1349:1847 1:N:0:
> _/NTTAGATGAGGGAAACATCTGCATCAAGTT/__G_TTTATCTGTGACAACAAGTGTTGTTCCACTGCCAAAGAGTTTCTTATAATAAAACAATCGGGGTGGCACNNNNN
>
>
> I want that the research is done only in the underline characters.  So
> what I have to add in grep command to put the limit of 30 characters?

cut -c 1-30 filename | grep ACGTAC

   -- Bruce

From debbugs-submit-bounces@debbugs.gnu.org Wed Nov 02 13:00:30 2016
Subject: Re: bug#24858: URGENT: Question about grep
To: Bruce Dubbs, Greta, 24858@debbugs.gnu.org
References: <878a8b88-63c9-ada2-bb6b-3dbc617503bb@gmail.com> <581A0BC7.9060702@gmail.com>
From: Eric Blake
Organization: Red Hat, Inc.
Message-ID: <05fb1793-694c-89ba-9245-8b2a4d473a23@redhat.com>
Date: Wed, 2 Nov 2016 12:00:21 -0500
In-Reply-To: <581A0BC7.9060702@gmail.com>

On 11/02/2016 10:52 AM, Bruce Dubbs wrote:

>>
>> I want that the research is done only in the underline characters.  So
>> what I have to add in grep command to put the limit of 30 characters?
>
> cut -c 1-30 filename | grep ACGTAC

That works if you are only interested in seeing the first 30 characters
of a given line, rather than printing the entire line when the match was
only within the first 30 characters.  If you need to map back to the
entire line, you can use some sort of decorate-search-undecorate
algorithm to keep the search portion still under grep, but at that
point, it's probably easier to just write it all in a language that can
do it in a single pass.

I guess I should also mention that if you know your lines are a fixed
width (say for example that every line is exactly 80 characters), then
you can exploit that using just grep to find a match only in the first
30 characters by explicitly spelling out the fixed-width remainder of
the line as an anchor:

grep 'ACGTAC.*.\{50\}$' filename

Sadly, the two example lines you printed were not the same length, so I
don't think it helps for your case.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

From debbugs-submit-bounces@debbugs.gnu.org Wed Nov 02 13:25:11 2016
Message-Id: <1478107498.2112544.775230577.1A950FE2@webmail.messagingengine.com>
From: Paul Jackson
To: bug-grep@gnu.org
Date: Wed, 02 Nov 2016 12:24:58 -0500
In-Reply-To: <8adbf123-5013-9c39-473e-e13eb307d7c5@redhat.com>
Subject: Re: bug#24858: URGENT: Question about grep
References: <878a8b88-63c9-ada2-bb6b-3dbc617503bb@gmail.com> <8adbf123-5013-9c39-473e-e13eb307d7c5@redhat.com>

Greta asked:
>> So what I have to add in grep command to put the limit of 30 characters?

Eric replied:
>> You can't do it with grep.

Bruce suggested:
>> cut -c 1-30 filename | grep ACGTAC

Using the following grep command seems to work for me, and is about
40% faster, in terms of user CPU time spent, on my system, using a large
dataset I have (some web server logs) than using cut and grep in a
pipeline, as the extra CPU cost of the more complex grep expression is
more than compensated for by the reduced copying of the datastream:

    grep -E '^.{0,30}GTGTCA'

===

A custom C program could make this dramatically faster, especially if:
it avoided using stdio or any other form of line buffering that copied
each line of data within the application, it used raw read(2) calls,
it used strchr(3) calls to scan to the end of the current line (hence
the start of the next line), and it used a mix of strchr and unaligned
word compares, say of the 4 bytes "ACGT", then the 2 bytes "AC", which
can be done on CPUs supporting unaligned word compares.

Finding a programmer who can code that might be difficult, and such
optimization would only make sense if you're burning lots of CPU time or
project time, on this particular scan.
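
As a rough sketch of the bounded-prefix idea, assuming the goal is for
the whole match to lie inside the first N characters: the prefix may
then skip at most N minus the pattern length characters, so with a
bound of {0,30} a 6-character match can still extend to character 36.
The helper name grep_prefix is hypothetical, and the pattern is assumed
to be a literal string with no regex metacharacters, shorter than N.

```shell
# grep_prefix PATTERN WIDTH
# Hypothetical helper: print input lines in which the literal PATTERN
# lies entirely within the first WIDTH characters (input on stdin).
grep_prefix() {
    pat=$1
    width=$2
    # The match may begin after at most WIDTH - length(PATTERN) characters.
    max_skip=$((width - ${#pat}))
    grep -E "^.{0,${max_skip}}${pat}"
}

# With an 8-character window, only the first line qualifies.
printf 'AAGTGTCAAA\nAAAAAAGTGTCA\n' | grep_prefix GTGTCA 8
# prints AAGTGTCAAA
```

For the 6-character string and 30-character window from the report,
the computed bound is 24.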

-- 
                Paul Jackson
                pj@usa.net

From debbugs-submit-bounces@debbugs.gnu.org Wed Nov 02 13:29:25 2016
Subject: Re: bug#24858: URGENT: Question about grep
To: Paul Jackson, 24858@debbugs.gnu.org
References: <878a8b88-63c9-ada2-bb6b-3dbc617503bb@gmail.com> <8adbf123-5013-9c39-473e-e13eb307d7c5@redhat.com> <1478107498.2112544.775230577.1A950FE2@webmail.messagingengine.com>
From: Eric Blake
Organization: Red Hat, Inc.
Date: Wed, 2 Nov 2016 12:29:15 -0500
In-Reply-To: <1478107498.2112544.775230577.1A950FE2@webmail.messagingengine.com>

On 11/02/2016 12:24 PM, Paul Jackson wrote:
> Greta asked:
>>> So what I have to add in grep command to put the limit of 30 characters?
>
> Eric replied:
>>> You can't do it with grep.
>
> Bruce suggested:
>>> cut -c 1-30 filename | grep ACGTAC
>
> Using the following grep command seems to work for me, and is about
> 40% faster, in terms of user CPU time spent, on my system, using a large
> dataset I have (some web server logs) than using cut and grep in a
> pipeline, as the extra CPU cost of the more complex grep expression is
> more than compensated for by the reduced copying of the datastream:
>
>     grep -E '^.{0,30}GTGTCA'

That searches up to 36 characters.  If you want to limit it to just the
first 30, you need '^.{0,24}GTGTCA', since the 6-character match must
start after at most 24 characters in order to end within the first 30.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

From unknown Sun Aug 10 16:47:31 2025
To: internal_control@debbugs.gnu.org
From: Debbugs Internal Request
Subject: Internal Control
Message-Id: bug archived.
Date: Thu, 01 Dec 2016 12:24:04 +0000

# This is a fake control message.
#
# The action:
# bug archived.
thanks
# This fakemail brought to you by your local debbugs
# administrator