From debbugs-submit-bounces@debbugs.gnu.org Sun Nov 13 21:41:58 2016
Date: Sun, 13 Nov 2016 17:56:32 -0800
From: Gary Johnson
To: bug-grep@gnu.org
Subject: Early termination bug in grep 2.26
Message-ID: <20161114015632.GC20504@phoenix>

There was some recent discussion on the vim_dev list of a failure to
update a Vim package which was found to be due to an update of grep
from 2.25 to 2.26.  The details of the grep behavior are discussed
here:

    https://www.linuxquestions.org/questions/slackware-14/pkgtools-grep-bug-in-slackware[64]-current-4175593054/

In short, it seems to be due to the "grep: /dev/null output
speedup" commit of 2016-05-01, af6af288eac28951b5eee1eaaf373e22b2193b7b.
When grep terminates early, it closes the pipe it's reading stdin
from, which terminates the program on the other side of that pipe
early, before that program has completed its task.

Oops.

Regards,
Gary

From debbugs-submit-bounces@debbugs.gnu.org Mon Nov 14 10:32:18 2016
From: Jim Meyering
Date: Mon, 14 Nov 2016 07:31:36 -0800
Subject: Re: bug#24941: Early termination bug in grep 2.26
To: bug-grep@gnu.org, Paul Eggert
Cc: 24941@debbugs.gnu.org
In-Reply-To: <20161114015632.GC20504@phoenix>

On Sun, Nov 13, 2016 at 5:56 PM, Gary Johnson wrote:
> There was some recent discussion on the vim_dev list of a failure to
> update a Vim package which was found to be due to an update of grep
> from 2.25 to 2.26. The details of the grep behavior are discussed
> here:
>
> https://www.linuxquestions.org/questions/slackware-14/pkgtools-grep-bug-in-slackware[64]-current-4175593054/
>
> In short, it seems to be due to the "grep: /dev/null output
> speedup" commit of 2016-05-01, af6af288eac28951b5eee1eaaf373e22b2193b7b.
> When grep terminates early, it closes the pipe it's reading stdin
> from, which terminates the program on the other side of that pipe
> early, before that program has completed its task.
>
> Oops.

Thank you for the report. Oops, indeed.
While I see nothing in the POSIX specification
(http://pubs.opengroup.org/onlinepubs/9699919799/utilities/grep.html)
that requires it to read all input in such a case, I think the current
behavior violates least-surprise.

Here is an example to demonstrate. I believe that POSIX does not
dictate what this code prints:

grep-2.25:
  $ (seq 10000000; echo $? 1>&2) | grep . > /dev/null
  0

grep-2.26:
  $ (seq 10000000; echo $? 1>&2) | grep . > /dev/null
  141

Paul, what do you think about making your heuristic apply only for
non-pipes?

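The 141 printed in the grep-2.26 run is the shell's way of reporting a process killed by a signal: 128 plus the signal number, and SIGPIPE is 13 on typical Linux systems. The same effect can be reproduced without grep. What follows is only a minimal sketch, assuming a POSIX system; the helper name writer_fate_when_reader_quits is hypothetical and none of this is taken from grep's source. The child plays the role of seq, the parent plays the role of a reader that stops early.

```c
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper, not grep code.  Fork a writer that tries to push
   far more data than a pipe can hold, read roughly BYTES_TO_READ bytes,
   then close the read end.  Return the signal that killed the writer,
   0 if it exited normally, or -1 on setup failure.  */
int
writer_fate_when_reader_quits (size_t bytes_to_read)
{
  assert (bytes_to_read > 0);

  int fds[2];
  if (pipe (fds) != 0)
    return -1;

  pid_t pid = fork ();
  if (pid < 0)
    {
      close (fds[0]);
      close (fds[1]);
      return -1;
    }

  if (pid == 0)
    {
      /* Writer, standing in for "seq 10000000".  */
      signal (SIGPIPE, SIG_DFL);   /* in case SIG_IGN was inherited */
      close (fds[0]);
      char line[64];
      memset (line, 'x', sizeof line);
      line[sizeof line - 1] = '\n';
      for (long i = 0; i < 10000000; i++)
        if (write (fds[1], line, sizeof line) < 0)
          _exit (1);               /* reached only if SIGPIPE is ignored */
      _exit (0);
    }

  /* Reader, standing in for a grep that stops before end of input.  */
  close (fds[1]);
  char buf[4096];
  size_t got = 0;
  while (got < bytes_to_read)
    {
      ssize_t n = read (fds[0], buf, sizeof buf);
      if (n <= 0)
        break;
      got += (size_t) n;
    }
  close (fds[0]);                  /* the writer's next write raises SIGPIPE */

  int status;
  if (waitpid (pid, &status, 0) < 0)
    return -1;
  return WIFSIGNALED (status) ? WTERMSIG (status) : 0;
}
```

Calling writer_fate_when_reader_quits (4096) is expected to return SIGPIPE, which is what the subshell in the example above reports as exit status 128 + SIGPIPE.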
From debbugs-submit-bounces@debbugs.gnu.org Tue Nov 15 03:17:22 2016
Subject: Re: bug#24941: Early termination bug in grep 2.26
To: Jim Meyering, bug-grep@gnu.org
Cc: Gary Johnson, 24941@debbugs.gnu.org
From: Paul Eggert
Organization: UCLA Computer Science Department
Message-ID: <5cbb8baa-4afa-7b6b-17b0-26090e980a1a@cs.ucla.edu>
Date: Tue, 15 Nov 2016 00:17:03 -0800
References: <20161114015632.GC20504@phoenix>

Jim Meyering wrote:
> Paul, what do you think about making your heuristic apply only for non-pipes?

I'm a bit dubious, as grep exits early for other reasons, and did so before the
patch in question. For example, here:

   seq 10000000000 | grep -q .

This is as of grep 2.19 (2014), due to a bug report where grep performed badly
by not exiting early . Which suggests that we'll get
bug reports in this area no matter what grep does....

Looking at other implementations, Solaris 11 grep is similar with -q. And
FreeBSD-current grep exits early for this case:

   seq 10000000000 | grep -f /dev/null

Come to think of it, perhaps GNU grep should do a similar optimization for -f
/dev/null, if only to keep up with FreeBSD.

All the above being said, I am sympathetic to the bug report. Perhaps we can
eliminate the optimization if there are no file arguments and only options in
the set -EFivx are specified. Something like that anyway. The idea is to catch
common cases in old (and really, broken) scripts, without hurting performance in
general.

From debbugs-submit-bounces@debbugs.gnu.org Tue Nov 15 14:35:48 2016
From: Jim Meyering
Date: Tue, 15 Nov 2016 11:35:15 -0800
Subject: Re: bug#24941: Early termination bug in grep 2.26
To: Paul Eggert
Cc: Gary Johnson, bug-grep@gnu.org, 24941@debbugs.gnu.org
In-Reply-To: <5cbb8baa-4afa-7b6b-17b0-26090e980a1a@cs.ucla.edu>
References: <20161114015632.GC20504@phoenix> <5cbb8baa-4afa-7b6b-17b0-26090e980a1a@cs.ucla.edu>

On Tue, Nov 15, 2016 at 12:17 AM, Paul Eggert wrote:
> Jim Meyering wrote:
>>
>> Paul, what do you think about making your heuristic apply only for
>> non-pipes?
>
> I'm a bit dubious, as grep exits early for other reasons, and did so before
> the patch in question. For example, here:
>
> seq 10000000000 | grep -q .
>
> This is as of grep 2.19 (2014), due to a bug report where grep performed
> badly by not exiting early . Which suggests that
> we'll get bug reports in this area no matter what grep does....

The issue of -q is separate, because anyone who has invoked grep with
-q has long been exposed to this behavior already. And behavior could
even be inferred based on the POSIX description. I think it is fine
for grep -q to terminate early in all cases.

My concern is when no "exit-early"-implying option (none of at least
-l, -q, -m) is specified, say within some script that has always
worked, yet grep-2.26 makes OP's example fail most surprisingly when
at a distance (i.e., the invocation of grep was hidden) someone
unwittingly redirects standard output to /dev/null.

> Looking at other implementations, Solaris 11 grep is similar with -q. And
> FreeBSD-current grep exits early for this case:
>
> seq 10000000000 | grep -f /dev/null
>
> Come to think of it, perhaps GNU grep should do a similar optimization for
> -f /dev/null, if only to keep up with FreeBSD.
>
> All the above being said, I am sympathetic to the bug report. Perhaps we can
> eliminate the optimization if there are no file arguments and only options
> in the set -EFivx are specified. Something like that anyway. The idea is to
> catch common cases in old (and really, broken) scripts, without hurting
> performance in general.

I suppose you mean in addition to the S_ISFIFO test? That sounds good.
We should retain the optimization when reading from stdin that is a
non-pipe.

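The S_ISFIFO test mentioned above can be pictured roughly as follows. This is a hedged sketch rather than grep's actual code: the function names fd_is_pipe and may_skip_remaining_input and the output_is_dev_null parameter are hypothetical, and the policy shown is only the "non-pipe" rule as proposed at this point in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: true if FD refers to a pipe or FIFO.  */
bool
fd_is_pipe (int fd)
{
  assert (fd >= 0);
  struct stat st;
  return fstat (fd, &st) == 0 && S_ISFIFO (st.st_mode);
}

/* Hypothetical policy: stopping before end of input is allowed only
   when the output is being discarded and the input is not a pipe, so
   that an upstream writer is never hit by SIGPIPE.  */
bool
may_skip_remaining_input (int input_fd, bool output_is_dev_null)
{
  return output_is_dev_null && !fd_is_pipe (input_fd);
}
```

Under this rule, a pipeline such as "seq 10000000 | grep . > /dev/null" would be read to the end, while "grep . file > /dev/null" could still stop early.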
From debbugs-submit-bounces@debbugs.gnu.org Tue Nov 15 17:13:21 2016
Date: Wed, 16 Nov 2016 07:13:14 +0900
From: Norihiro Tanaka
To: Jim Meyering
Cc: Paul Eggert, 24941@debbugs.gnu.org
Subject: Re: bug#24941: Early termination bug in grep 2.26
References: <5cbb8baa-4afa-7b6b-17b0-26090e980a1a@cs.ucla.edu>
Message-Id: <20161116071314.9E79.27F6AC2D@kcn.ne.jp>

On Tue, 15 Nov 2016 11:35:15 -0800
Jim Meyering wrote:

> I suppose you mean in addition to the S_ISFIFO test? That sounds good.
> We should retain the optimization when reading from stdin that is a
> non-pipe.

This can also happen in stdin. If we redirect stdout to /dev/null,
grep-2.26 exits immediately and prompt is returned.

- grep-2.25

$ grep . >/dev/null
a <
b <
c <

- grep-2.26

$ src/grep . >/dev/null
a <
$

From: Jim Meyering
Date: Tue, 15 Nov 2016 15:13:27 -0800
Subject: Re: bug#24941: Early termination bug in grep 2.26
To: Norihiro Tanaka
Cc: Paul Eggert, 24941@debbugs.gnu.org
In-Reply-To: <20161116071314.9E79.27F6AC2D@kcn.ne.jp>
References: <5cbb8baa-4afa-7b6b-17b0-26090e980a1a@cs.ucla.edu> <20161116071314.9E79.27F6AC2D@kcn.ne.jp>

On Tue, Nov 15, 2016 at 2:13 PM, Norihiro Tanaka wrote:
>
> On Tue, 15 Nov 2016 11:35:15 -0800
> Jim Meyering wrote:
>
>> I suppose you mean in addition to the S_ISFIFO test? That sounds good.
>> We should retain the optimization when reading from stdin that is a
>> non-pipe.
>
> This can also happen in stdin. If we redirect stdout to /dev/null,
> grep-2.26 exits immediately and prompt is returned.
>
> - grep-2.25
>
> $ grep . >/dev/null
> a <
> b <
> c <
>
> - grep-2.26
>
> $ src/grep . >/dev/null
> a <
> $

Good point. While I suspect that would be much less likely to cause
trouble in practice, I would now rephrase:

We should retain the optimization when reading from stdin that is
neither a pipe nor a tty.

From debbugs-submit-bounces@debbugs.gnu.org Tue Nov 15 18:33:23 2016
Subject: Re: bug#24941: Early termination bug in grep 2.26
To: Jim Meyering, Norihiro Tanaka
Cc: 24941@debbugs.gnu.org
References: <5cbb8baa-4afa-7b6b-17b0-26090e980a1a@cs.ucla.edu> <20161116071314.9E79.27F6AC2D@kcn.ne.jp>
From: Paul Eggert
Organization: UCLA Computer Science Department
Message-ID: <98d1b53c-00a8-c377-feee-b1fd16cb9ca3@cs.ucla.edu>
Date: Tue, 15 Nov 2016 15:33:13 -0800

On 11/15/2016 03:13 PM, Jim Meyering wrote:
> We should retain the optimization when reading from stdin that is
> neither a pipe nor a tty.

I am toying with the idea of retaining the optimization only if
lseek-to-EOF succeeds, a heuristic that is a bit more restrictive. This
arguably would conform better to the POSIX requirement that when grep
exits "the file offset in the open file description is properly
positioned just past the last byte processed by the utility." See the
INPUT FILES section of .

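The lseek-to-EOF idea can be sketched in a few lines of C. This is only an illustration of the heuristic as described in the message above, not the patch that was eventually installed; the helper name skip_to_eof is hypothetical. It assumes a system where lseek fails (typically with ESPIPE) on pipes, FIFOs, sockets and, on Linux, terminals; POSIX leaves lseek behavior on other non-seekable devices implementation-defined.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: try to "consume" the unread remainder of FD by
   seeking to its end.  On success the file offset is left at EOF, as
   if every byte had been read, and the caller may stop reading.  On
   failure the offset is untouched and the caller should keep reading
   the input normally.  */
bool
skip_to_eof (int fd)
{
  assert (fd >= 0);
  return lseek (fd, 0, SEEK_END) != (off_t) -1;
}
```

A caller would use it along the lines of "if (!skip_to_eof (fd)) read the rest of the input", which keeps the early exit for regular files while leaving pipes fully drained.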
From debbugs-submit-bounces@debbugs.gnu.org Tue Nov 15 19:10:20 2016
From: Jim Meyering
Date: Tue, 15 Nov 2016 16:09:52 -0800
Subject: Re: bug#24941: Early termination bug in grep 2.26
To: Paul Eggert
Cc: Norihiro Tanaka, 24941@debbugs.gnu.org
In-Reply-To: <98d1b53c-00a8-c377-feee-b1fd16cb9ca3@cs.ucla.edu>
References: <5cbb8baa-4afa-7b6b-17b0-26090e980a1a@cs.ucla.edu> <20161116071314.9E79.27F6AC2D@kcn.ne.jp> <98d1b53c-00a8-c377-feee-b1fd16cb9ca3@cs.ucla.edu>

On Tue, Nov 15, 2016 at 3:33 PM, Paul Eggert wrote:
> On 11/15/2016 03:13 PM, Jim Meyering wrote:
>>
>> We should retain the optimization when reading from stdin that is
>> neither a pipe nor a tty.
>
> I am toying with the idea of retaining the optimization only if lseek-to-EOF
> succeeds, a heuristic that is a bit more restrictive. This arguably would
> conform better to the POSIX requirement that when grep exits "the file
> offset in the open file description is properly positioned just past the
> last byte processed by the utility." See the INPUT FILES section of
> .

I like it. That would make the offset in the input file predictable in
those cases.

From debbugs-submit-bounces@debbugs.gnu.org Sat Nov 19 04:48:02 2016
Subject: Re: bug#24941: Early termination bug in grep 2.26
To: Jim Meyering
Cc: Gary Johnson, Norihiro Tanaka, 24941-done@debbugs.gnu.org
From: Paul Eggert
Organization: UCLA Computer Science Department
Date: Sat, 19 Nov 2016 01:47:48 -0800
Content-Type: multipart/mixed; boundary="------------75A9BB998CD3F248ACD5007A"

This is a multi-part message in MIME format.
--------------75A9BB998CD3F248ACD5007A
Content-Type: text/plain; charset=utf-8; format=flowed

This turned into more work than I expected, as I kept finding performance
glitches and/or correctness bugs in the neighborhood. I installed the attached
set of patches. Patch 03 is the crucial one. Patch 10 trivially fixes an earlier
test of mine and I'm too lazy to write a separate email for it.

This fixes the problem for me, so I'm taking the liberty of closing this bug report.

--------------75A9BB998CD3F248ACD5007A
Content-Type: text/x-diff; name="0001-grep-avoid-unnecessary-isatty-calls.patch"
Content-Disposition: attachment; filename="0001-grep-avoid-unnecessary-isatty-calls.patch"

From 8b24c7008aadf62bd6803778ab04fdd065d573d8 Mon Sep 17 00:00:00 2001
From: Paul Eggert
Date: Sat, 19 Nov 2016 01:06:01 -0800
Subject: [PATCH 01/10] grep: avoid unnecessary isatty calls

This fixes an inefficiency that was mistakenly introduced a while
back, when the macro SET_BINARY became defined on all platforms.
* src/grep.c (grepdesc, main): Do not unecessarily call isatty on
POSIXish platforms.
---
 src/grep.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/src/grep.c b/src/grep.c
index 1163eae..201b1d9 100644
--- a/src/grep.c
+++ b/src/grep.c
@@ -1834,12 +1834,10 @@ grepdesc (int desc, bool command_line)
       goto closeout;
     }
 
-#if defined SET_BINARY
   /* Set input to binary mode.  Pipes are simulated with files
      on DOS, so this includes the case of "foo | grep bar".  */
-  if (!isatty (desc))
-    SET_BINARY (desc);
-#endif
+  if (O_BINARY && !isatty (desc))
+    set_binary_mode (desc, O_BINARY);
 
   count = grep (desc, &st);
   if (count_matches)
@@ -2801,12 +2799,10 @@ main (int argc, char **argv)
   if ((argc - optind > 1 && !no_filenames) || with_filenames)
     out_file = 1;
 
-#ifdef SET_BINARY
   /* Output is set to binary mode because we shouldn't convert
      NL to CR-LF pairs, especially when grepping binary files.  */
-  if (!isatty (STDOUT_FILENO))
-    SET_BINARY (STDOUT_FILENO);
-#endif
+  if (O_BINARY && !isatty (STDOUT_FILENO))
+    set_binary_mode (STDOUT_FILENO, O_BINARY);
 
   if (max_count == 0)
     return EXIT_FAILURE;
-- 
2.7.4

--------------75A9BB998CD3F248ACD5007A
Content-Type: text/x-diff; name="0002-grep-improve-diagnostic-on-lseek-failure.patch"
Content-Disposition: attachment; filename="0002-grep-improve-diagnostic-on-lseek-failure.patch"

From c87daf6fdcaa116308663047c2e4fb8ff38011c7 Mon Sep 17 00:00:00 2001
From: Paul Eggert
Date: Sat, 19 Nov 2016 01:06:01 -0800
Subject: [PATCH 02/10] grep: improve diagnostic on lseek failure

* src/grep.c (reset): Mention the file name in the (unlikely)
chance of an lseek failure.
---
 src/grep.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/grep.c b/src/grep.c
index 201b1d9..cafa0a2 100644
--- a/src/grep.c
+++ b/src/grep.c
@@ -861,7 +861,7 @@ reset (int fd, struct stat const *st)
           bufoffset = lseek (fd, 0, SEEK_CUR);
           if (bufoffset < 0)
             {
-              suppressible_error (_("lseek failed"), errno);
+              suppressible_error (filename, errno);
               return false;
             }
         }
-- 
2.7.4

--------------75A9BB998CD3F248ACD5007A
Content-Type: text/x-diff; name="0003-grep-scale-back-dev-null-speedup.patch"
Content-Disposition: attachment; filename="0003-grep-scale-back-dev-null-speedup.patch"

From 80c97aa06e8f0320ff397a74a018eadc6a21f5fa Mon Sep 17 00:00:00 2001
From: Paul Eggert
Date: Thu, 17 Nov 2016 13:41:39 -0800
Subject: [PATCH 03/10] grep: scale back /dev/null speedup

The performance improvement when output is /dev/null (commit
af6af288eac28951b5eee1eaaf373e22b2193b7b dated 2016-05-01) breaks
scripts that run "PROGRAM | grep PATTERN >/dev/null" where PROGRAM
dies when writing into a broken pipe.  Suppress the improvement
if standard input is not seekable.
Problem reported by Gary Johnson (Bug#24941).
* NEWS: Document this.
* src/grep.c (seek_failed): New static var.
(seek_data_failed): Move decl earlier, to be next to seek_failed.
(file_must_have_nulls): Skip useless syscalls if seek_failed.
Lessen source-code nesting.
(reset): Set seek_failed and seek_data_failed.  Try lseek even
on non-regular files.
(grep): New arg INEOF.  All callers changed.  Do not clear
seek_data_failed here, since 'reset' now does this.
(finalize_input): New static function.
(grepdesc): Use it.
(main): Do not exit on first match merely because output is /dev/null.
* tests/grep-dev-null-out: Adjust to new behavior.
---
 NEWS                    |   6 +++
 src/grep.c              | 135 +++++++++++++++++++++++++++++-------------------
 tests/grep-dev-null-out |   3 +-
 3 files changed, 91 insertions(+), 53 deletions(-)

diff --git a/NEWS b/NEWS
index a95c875..a165377 100644
--- a/NEWS
+++ b/NEWS
@@ -4,6 +4,12 @@ GNU grep NEWS                                    -*- outline -*-
 
 ** Bug fixes
 
+  grep by default now reads all of standard input if it is a pipe,
+  even if this cannot affect grep's output or exit status.  This works
+  better with nonportable scripts that run "PROGRAM | grep PATTERN
+  >/dev/null" where PROGRAM dies when writing into a broken pipe.
+  [bug introduced in grep-2.26]
+
   grep -Pz no longer rejects patterns containing ^ and $, and is
   more cautious about special patterns like (?-m) and (*FAIL).
   [bug introduced in grep-2.23]
diff --git a/src/grep.c b/src/grep.c
index cafa0a2..5ce8f95 100644
--- a/src/grep.c
+++ b/src/grep.c
@@ -580,6 +580,10 @@ enum { SEEK_DATA = SEEK_SET };
 enum { SEEK_HOLE = SEEK_SET };
 #endif
 
+/* True if lseek with SEEK_CUR or SEEK_DATA failed on the current input.  */
+static bool seek_failed;
+static bool seek_data_failed;
+
 /* Functions we'll use to search. */
 typedef void (*compile_fp_t) (char const *, size_t);
 typedef size_t (*execute_fp_t) (char *, size_t, size_t *, char const *);
@@ -718,31 +722,26 @@ buf_has_nulls (char *buf, size_t size)
 static bool
 file_must_have_nulls (size_t size, int fd, struct stat const *st)
 {
-  if (usable_st_size (st))
+  /* If the file has holes, it must contain a null byte somewhere.  */
+  if (SEEK_HOLE != SEEK_SET && !seek_failed
+      && usable_st_size (st) && size < st->st_size)
     {
-      if (st->st_size <= size)
-        return false;
-
-      /* If the file has holes, it must contain a null byte somewhere.  */
-      if (SEEK_HOLE != SEEK_SET)
+      off_t cur = size;
+      if (O_BINARY || fd == STDIN_FILENO)
         {
-          off_t cur = size;
-          if (O_BINARY || fd == STDIN_FILENO)
-            {
-              cur = lseek (fd, 0, SEEK_CUR);
-              if (cur < 0)
-                return false;
-            }
+          cur = lseek (fd, 0, SEEK_CUR);
+          if (cur < 0)
+            return false;
+        }
 
-          /* Look for a hole after the current location.  */
-          off_t hole_start = lseek (fd, cur, SEEK_HOLE);
-          if (0 <= hole_start)
-            {
-              if (lseek (fd, cur, SEEK_SET) < 0)
-                suppressible_error (filename, errno);
-              if (hole_start < st->st_size)
-                return true;
-            }
+      /* Look for a hole after the current location.  */
+      off_t hole_start = lseek (fd, cur, SEEK_HOLE);
+      if (0 <= hole_start)
+        {
+          if (lseek (fd, cur, SEEK_SET) < 0)
+            suppressible_error (filename, errno);
+          if (hole_start < st->st_size)
+            return true;
         }
     }
 
@@ -806,13 +805,12 @@ static int bufdesc;             /* File descriptor. */
 static char *bufbeg;            /* Beginning of user-visible stuff. */
 static char *buflim;            /* Limit of user-visible stuff. */
 static size_t pagesize;         /* alignment of memory pages */
-static off_t bufoffset;         /* Read offset; defined on regular files.  */
+static off_t bufoffset;         /* Read offset.  */
 static off_t after_last_match;  /* Pointer after last matching line that
                                    would have been output if we were
                                    outputting characters. */
 static bool skip_nuls;          /* Skip '\0' in data.  */
 static bool skip_empty_lines;   /* Skip empty lines in data.  */
-static bool seek_data_failed;   /* lseek with SEEK_DATA failed.  */
 static uintmax_t totalnl;       /* Total newline count before lastnl. */
 
 /* Return VAL aligned to the next multiple of ALIGNMENT.  VAL can be
@@ -851,20 +849,20 @@ reset (int fd, struct stat const *st)
   bufbeg = buflim = ALIGN_TO (buffer + 1, pagesize);
   bufbeg[-1] = eolbyte;
   bufdesc = fd;
+  bufoffset = fd == STDIN_FILENO ? lseek (fd, 0, SEEK_CUR) : 0;
+  seek_failed = bufoffset < 0;
+
+  /* Assume SEEK_DATA fails if SEEK_CUR does.  */
+  seek_data_failed = seek_failed;
 
-  if (S_ISREG (st->st_mode))
+  if (seek_failed)
     {
-      if (fd != STDIN_FILENO)
-        bufoffset = 0;
-      else
+      if (errno != ESPIPE)
         {
-          bufoffset = lseek (fd, 0, SEEK_CUR);
-          if (bufoffset < 0)
-            {
-              suppressible_error (filename, errno);
-              return false;
-            }
+          suppressible_error (filename, errno);
+          return false;
         }
+      bufoffset = 0;
     }
   return true;
 }
@@ -1477,9 +1475,10 @@ grepbuf (char *beg, char const *lim)
   return outleft0 - outleft;
 }
 
-/* Search a given (non-directory) file.  Return a count of lines printed.  */
+/* Search a given (non-directory) file.  Return a count of lines printed.
+   Set *INEOF to true if end-of-file reached.  */
 static intmax_t
-grep (int fd, struct stat const *st)
+grep (int fd, struct stat const *st, bool *ineof)
 {
   intmax_t nlines, i;
   size_t residue, save;
@@ -1507,7 +1506,6 @@ grep (int fd, struct stat const *st)
   pending = 0;
   skip_nuls = skip_empty_lines && !eol;
   encoding_error_output = false;
-  seek_data_failed = false;
 
   nlines = 0;
   residue = 0;
@@ -1542,7 +1540,10 @@ grep (int fd, struct stat const *st)
 
       /* no more data to scan (eof) except for maybe a residue -> break */
       if (beg == buflim)
-        break;
+        {
+          *ineof = true;
+          break;
+        }
 
       zap_nuls (beg, buflim, nul_zapper);
 
@@ -1742,11 +1743,46 @@ grepfile (int dirdesc, char const *name, bool follow, bool command_line)
   return grepdesc (desc, command_line);
 }
 
+/* Finish reading from FD, with status ST and where end-of-file has
+   been seen if INEOF.  Typically this is a no-op, but when reading
+   from standard input this may adjust the file offset or drain a
+   pipe.  */
+
+static void
+finalize_input (int fd, struct stat const *st, bool ineof)
+{
+  if (fd != STDIN_FILENO)
+    return;
+
+  if (outleft)
+    {
+      if (ineof)
+        return;
+      if (seek_failed)
+        {
+          while (fillbuf (0, st))
+            if (bufbeg == buflim)
+              return;
+        }
+      else if (0 <= lseek (fd, 0, SEEK_END))
+        return;
+    }
+  else
+    {
+      if (seek_failed || bufoffset == after_last_match
+          || 0 <= lseek (fd, after_last_match, SEEK_SET))
+        return;
+    }
+
+  suppressible_error (filename, errno);
+}
+
 static bool
 grepdesc (int desc, bool command_line)
 {
   intmax_t count;
   bool status = true;
+  bool ineof = false;
   struct stat st;
 
   /* Get the file status, possibly for the second time.  This catches
@@ -1839,7 +1875,7 @@ grepdesc (int desc, bool command_line)
   if (O_BINARY && !isatty (desc))
     set_binary_mode (desc, O_BINARY);
 
-  count = grep (desc, &st);
+  count = grep (desc, &st, &ineof);
   if (count_matches)
     {
       if (out_file)
@@ -1856,7 +1892,10 @@ grepdesc (int desc, bool command_line)
     }
 
   status = !count;
-  if (list_files == (status ? LISTFILES_NONMATCHING : LISTFILES_MATCHING))
+
+  if (list_files == LISTFILES_NONE)
+    finalize_input (desc, &st, ineof);
+  else if (list_files == (status ? LISTFILES_NONMATCHING : LISTFILES_MATCHING))
     {
       print_filename ();
       putchar_errno ('\n' & filename_mask);
@@ -1864,15 +1903,6 @@ grepdesc (int desc, bool command_line)
         fflush_errno ();
     }
 
-  if (desc == STDIN_FILENO)
-    {
-      off_t required_offset = outleft ? bufoffset : after_last_match;
-      if (required_offset != bufoffset
-          && lseek (desc, required_offset, SEEK_SET) < 0
-          && S_ISREG (st.st_mode))
-        suppressible_error (filename, errno);
-    }
-
  closeout:
   if (desc != STDIN_FILENO && close (desc) != 0)
     suppressible_error (filename, errno);
@@ -2699,6 +2729,7 @@ main (int argc, char **argv)
   if (show_help)
     usage (EXIT_SUCCESS);
 
+  bool dev_null_output = false;
   bool possibly_tty = false;
   struct stat tmp_stat;
   if (! exit_on_match && fstat (STDOUT_FILENO, &tmp_stat) == 0)
@@ -2710,7 +2741,7 @@ main (int argc, char **argv)
           struct stat null_stat;
           if (stat ("/dev/null", &null_stat) == 0
               && SAME_INODE (tmp_stat, null_stat))
-            exit_on_match = true;
+            dev_null_output = true;
           else
             possibly_tty = true;
         }
@@ -2733,9 +2764,9 @@ main (int argc, char **argv)
 
   /* POSIX says -c, -l and -q are mutually exclusive.  In this
      implementation, -q overrides -l and -L, which in turn override -c.  */
-  if (exit_on_match)
+  if (exit_on_match | dev_null_output)
     list_files = LISTFILES_NONE;
-  if (exit_on_match || list_files != LISTFILES_NONE)
+  if ((exit_on_match | dev_null_output) || list_files != LISTFILES_NONE)
     {
       count_matches = false;
       done_on_match = true;
diff --git a/tests/grep-dev-null-out b/tests/grep-dev-null-out
index 7f0e1c5..16ddadf 100755
--- a/tests/grep-dev-null-out
+++ b/tests/grep-dev-null-out
@@ -6,6 +6,7 @@ require_timeout_
 
 ${AWK-awk} 'BEGIN {while (1) print "x"}' /dev/null || fail=1
+  timeout 1 grep x >/dev/null
+test $? -eq 124 || fail=1
 
 Exit $fail
-- 
2.7.4

--------------75A9BB998CD3F248ACD5007A
Content-Type: text/x-diff; name="0004-grep-drain-the-input-pipe-faster.patch"
Content-Disposition: attachment; filename="0004-grep-drain-the-input-pipe-faster.patch"

From 4fa6f48b573267e758650e114ec158d97916411e Mon Sep 17 00:00:00 2001
From: Paul Eggert
Date: Thu, 17 Nov 2016 15:11:35 -0800
Subject: [PATCH 04/10] grep: drain the input pipe faster

* src/grep.c (dev_null_output): Now static.
(drain_input): New function, using 'splice' if that makes sense.
(finalize_input): Use it.
(main): Omit now-unnecessary initialization.
---
 src/grep.c | 30 ++++++++++++++++++++++++++----
 1 file changed, 26 insertions(+), 4 deletions(-)

diff --git a/src/grep.c b/src/grep.c
index 5ce8f95..64c72ce 100644
--- a/src/grep.c
+++ b/src/grep.c
@@ -1031,6 +1031,7 @@ static intmax_t pending;        /* Pending lines of output.
                                    Always kept 0 if out_quiet is true.  */
 static bool done_on_match;      /* Stop scanning file on first match.  */
 static bool exit_on_match;      /* Exit on first match.  */
+static bool dev_null_output;    /* Stdout is known to be /dev/null.  */
 
 #include "dosbuf.c"
 
@@ -1743,6 +1744,29 @@ grepfile (int dirdesc, char const *name, bool follow, bool command_line)
   return grepdesc (desc, command_line);
 }
 
+/* Read all data from FD, with status ST.  Return true if successful,
+   false (setting errno) otherwise.  */
+static bool
+drain_input (int fd, struct stat const *st)
+{
+  ssize_t nbytes;
+  if (S_ISFIFO (st->st_mode) && dev_null_output)
+    {
+#ifdef SPLICE_F_MOVE
+      /* Should be faster, since it need not copy data to user space.  */
+      while ((nbytes = splice (fd, NULL, STDOUT_FILENO, NULL,
+                               INITIAL_BUFSIZE, SPLICE_F_MOVE)))
+        if (nbytes < 0)
+          return false;
+      return true;
+#endif
+    }
+  while ((nbytes = safe_read (fd, buffer, bufalloc)))
+    if (nbytes == SAFE_READ_ERROR)
+      return false;
+  return true;
+}
+
 /* Finish reading from FD, with status ST and where end-of-file has
    been seen if INEOF.  Typically this is a no-op, but when reading
    from standard input this may adjust the file offset or drain a
@@ -1760,9 +1784,8 @@ finalize_input (int fd, struct stat const *st, bool ineof)
         return;
       if (seek_failed)
         {
-          while (fillbuf (0, st))
-            if (bufbeg == buflim)
-              return;
+          if (drain_input (fd, st))
+            return;
         }
       else if (0 <= lseek (fd, 0, SEEK_END))
         return;
@@ -2729,7 +2752,6 @@ main (int argc, char **argv)
   if (show_help)
     usage (EXIT_SUCCESS);
 
-  bool dev_null_output = false;
   bool possibly_tty = false;
   struct stat tmp_stat;
   if (! exit_on_match && fstat (STDOUT_FILENO, &tmp_stat) == 0)
-- 
2.7.4

--------------75A9BB998CD3F248ACD5007A
Content-Type: text/x-diff; name="0005-grep-avoid-unnecessary-gettext-call.patch"
Content-Disposition: attachment; filename="0005-grep-avoid-unnecessary-gettext-call.patch"

From 2389e561ad0252ab5ea62ab53f19cae0a00ec794 Mon Sep 17 00:00:00 2001
From: Paul Eggert
Date: Thu, 17 Nov 2016 17:51:49 -0800
Subject: [PATCH 05/10] grep: avoid unnecessary gettext call

Translate "(standard input)" lazily.
* src/grep.c (input_filename): New function.
(suppressible_error): Remove 1st arg, since it is always
input_filename ().  All callers changed.
(suppressible_error, print_filename, grep, grepdesc): Use it.
(grep_command_line_arg): Set filename to NULL if standard input
has no label.  Often, this avoids all calls to gettext, which can
be a win as the first call can be expensive.
---
 src/grep.c | 51 ++++++++++++++++++++++++++++++---------------------
 1 file changed, 30 insertions(+), 21 deletions(-)

diff --git a/src/grep.c b/src/grep.c
index 64c72ce..e1b7937 100644
--- a/src/grep.c
+++ b/src/grep.c
@@ -502,7 +502,7 @@ char eolbyte;
 static char const *matcher;
 
 /* For error messages. */
-/* The input file name, or (if standard input) "-" or a --label argument.  */
+/* The input file name, or (if standard input) null or a --label argument.  */
 static char const *filename;
 /* Omit leading "./" from file names in diagnostics.  */
 static bool omit_dot_slash;
@@ -590,12 +590,20 @@ typedef size_t (*execute_fp_t) (char *, size_t, size_t *, char const *);
 static compile_fp_t compile;
 static execute_fp_t execute;
 
-/* Like error, but suppress the diagnostic if requested.  */
+static char const *
+input_filename (void)
+{
+  if (!filename)
+    filename = _("(standard input)");
+  return filename;
+}
+
+/* Unless requested, diagnose an error about the input file.  */
 static void
-suppressible_error (char const *mesg, int errnum)
+suppressible_error (int errnum)
 {
   if (! suppress_errors)
-    error (0, errnum, "%s", mesg);
+    error (0, errnum, "%s", input_filename ());
   errseen = true;
 }
 
@@ -739,7 +747,7 @@ file_must_have_nulls (size_t size, int fd, struct stat const *st)
       if (0 <= hole_start)
         {
           if (lseek (fd, cur, SEEK_SET) < 0)
-            suppressible_error (filename, errno);
+            suppressible_error (errno);
           if (hole_start < st->st_size)
             return true;
         }
@@ -859,7 +867,7 @@ reset (int fd, struct stat const *st)
     {
       if (errno != ESPIPE)
         {
-          suppressible_error (filename, errno);
+          suppressible_error (errno);
           return false;
         }
       bufoffset = 0;
@@ -1056,7 +1064,7 @@ static void
 print_filename (void)
 {
   pr_sgr_start_if (filename_color);
-  fputs_errno (filename);
+  fputs_errno (input_filename ());
   pr_sgr_end_if (filename_color);
 }
 
@@ -1514,7 +1522,7 @@ grep (int fd, struct stat const *st, bool *ineof)
 
   if (! fillbuf (save, st))
     {
-      suppressible_error (filename, errno);
+      suppressible_error (errno);
       return 0;
     }
 
@@ -1598,7 +1606,7 @@ grep (int fd, struct stat const *st, bool *ineof)
         nlscan (beg);
       if (! fillbuf (save, st))
         {
-          suppressible_error (filename, errno);
+          suppressible_error (errno);
           goto finish_grep;
         }
     }
@@ -1617,7 +1625,7 @@ grep (int fd, struct stat const *st, bool *ineof)
   if (!out_quiet && (encoding_error_output
                      || (0 <= nlines_first_null && nlines_first_null < nlines)))
     {
-      printf_errno (_("Binary file %s matches\n"), filename);
+      printf_errno (_("Binary file %s matches\n"), input_filename ());
       if (line_buffered)
         fflush_errno ();
     }
@@ -1672,7 +1680,7 @@ grepdirent (FTS *fts, FTSENT *ent, bool command_line)
     case FTS_DNR:
     case FTS_ERR:
     case FTS_NS:
-      suppressible_error (filename, ent->fts_errno);
+      suppressible_error (ent->fts_errno);
       return true;
 
     case FTS_DEFAULT:
@@ -1689,7 +1697,7 @@ grepdirent (FTS *fts, FTSENT *ent, bool command_line)
               int flag = follow ? 0 : AT_SYMLINK_NOFOLLOW;
               if (fstatat (fts->fts_cwd_fd, ent->fts_accpath, &st1, flag) != 0)
                 {
-                  suppressible_error (filename, errno);
+                  suppressible_error (errno);
                   return true;
                 }
               st = &st1;
@@ -1738,7 +1746,7 @@ grepfile (int dirdesc, char const *name, bool follow, bool command_line)
   if (desc < 0)
     {
       if (follow || ! open_symlink_nofollow_error (errno))
-        suppressible_error (filename, errno);
+        suppressible_error (errno);
       return true;
     }
   return grepdesc (desc, command_line);
@@ -1797,7 +1805,7 @@ finalize_input (int fd, struct stat const *st, bool ineof)
         return;
     }
 
-  suppressible_error (filename, errno);
+  suppressible_error (errno);
 }
 
 static bool
@@ -1816,7 +1824,7 @@ grepdesc (int desc, bool command_line)
      directory for a non-directory while 'grep' is running.  */
   if (fstat (desc, &st) != 0)
     {
-      suppressible_error (filename, errno);
+      suppressible_error (errno);
       goto closeout;
     }
 
@@ -1843,7 +1851,7 @@ grepdesc (int desc, bool command_line)
       /* Close DESC now, to conserve file descriptors if the race
          condition occurs many times in a deep recursion.  */
       if (close (desc) != 0)
-        suppressible_error (filename, errno);
+        suppressible_error (errno);
 
       fts_arg[0] = (char *) filename;
       fts_arg[1] = NULL;
@@ -1854,9 +1862,9 @@ grepdesc (int desc, bool command_line)
       while ((ent = fts_read (fts)))
         status &= grepdirent (fts, ent, command_line);
       if (errno)
-        suppressible_error (filename, errno);
+        suppressible_error (errno);
       if (fts_close (fts) != 0)
-        suppressible_error (filename, errno);
+        suppressible_error (errno);
       return status;
     }
   if (desc != STDIN_FILENO
@@ -1888,7 +1896,8 @@ grepdesc (int desc, bool command_line)
       && S_ISREG (st.st_mode) && SAME_INODE (st, out_stat))
     {
       if (! suppress_errors)
-        error (0, 0, _("input file %s is also the output"), quote (filename));
+        error (0, 0, _("input file %s is also the output"),
+               quote (input_filename ()));
       errseen = true;
       goto closeout;
     }
@@ -1928,7 +1937,7 @@ grepdesc (int desc, bool command_line)
 
  closeout:
   if (desc != STDIN_FILENO && close (desc) != 0)
-    suppressible_error (filename, errno);
+    suppressible_error (errno);
   return status;
 }
 
@@ -1937,7 +1946,7 @@ grep_command_line_arg (char const *arg)
 {
   if (STREQ (arg, "-"))
     {
-      filename = label ? label : _("(standard input)");
+      filename = label;
       return grepdesc (STDIN_FILENO, true);
     }
   else
-- 
2.7.4

--------------75A9BB998CD3F248ACD5007A
Content-Type: text/x-diff; name="0006-grep-avoid-O-N-2-buffer-reallocation.patch"
Content-Disposition: attachment; filename="0006-grep-avoid-O-N-2-buffer-reallocation.patch"

From 2a45b2fbe2ca6d2d6c161d1593511b92d9e4640e Mon Sep 17 00:00:00 2001
From: Paul Eggert
Date: Thu, 17 Nov 2016 18:25:19 -0800
Subject: [PATCH 06/10] grep: avoid O(N**2) buffer reallocation

* src/grep.c (main): Use x2realloc to avoid O(N**2) performance
as pattern buffers grow.
---
 src/grep.c | 32 ++++++++++++++++----------------
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/src/grep.c b/src/grep.c
index e1b7937..fe56e9c 100644
--- a/src/grep.c
+++ b/src/grep.c
@@ -2413,9 +2413,9 @@ fgrep_to_grep_pattern (char **keys_p, size_t *len_p)
 int
 main (int argc, char **argv)
 {
-  char *keys;
-  size_t keycc, oldcc, keyalloc;
-  bool with_filenames;
+  char *keys = NULL;
+  size_t keycc = 0, oldcc, keyalloc = 0;
+  bool with_filenames = false;
   size_t cc;
   int opt, prepended;
   int prev_optind, last_recursive;
@@ -2425,9 +2425,6 @@ main (int argc, char **argv)
   exit_failure = EXIT_TROUBLE;
   initialize_main (&argc, &argv);
 
-  keys = NULL;
-  keycc = 0;
-  with_filenames = false;
   eolbyte = '\n';
   filename_mask = ~0;
 
@@ -2556,8 +2553,12 @@ main (int argc, char **argv)
 
       case 'e':
         cc = strlen (optarg);
-        keys = xrealloc (keys, keycc + cc + 1);
-        strcpy (&keys[keycc], optarg);
+        if (keyalloc < keycc + cc + 1)
+          {
+            keyalloc = keycc + cc + 1;
+            keys = x2realloc (keys, &keyalloc);
+          }
+        memcpy (&keys[keycc], optarg, cc);
         keycc += cc;
         keys[keycc++] = '\n';
         fl_add (keys, keycc - cc - 1, keycc, "");
@@ -2567,15 +2568,14 @@ main (int argc, char **argv)
         fp = STREQ (optarg, "-") ? stdin : fopen (optarg, O_TEXT ? "rt" : "r");
         if (!fp)
           die (EXIT_TROUBLE, errno, "%s", optarg);
-        for (keyalloc = 1; keyalloc <= keycc + 1; keyalloc *= 2)
-          ;
-        keys = xrealloc (keys, keyalloc);
         oldcc = keycc;
-        while ((cc = fread (keys + keycc, 1, keyalloc - 1 - keycc, fp)) != 0)
+        for (;; keycc += cc)
           {
-            keycc += cc;
-            if (keycc == keyalloc - 1)
-              keys = x2nrealloc (keys, &keyalloc, sizeof *keys);
+            if (keyalloc <= keycc + 1)
+              keys = x2realloc (keys, &keyalloc);
+            cc = fread (keys + keycc, 1, keyalloc - (keycc + 1), fp);
+            if (cc == 0)
+              break;
           }
         fread_errno = errno;
         if (ferror (fp))
@@ -2823,7 +2823,7 @@ main (int argc, char **argv)
     }
   else if (optind < argc)
     {
-      /* A copy must be made in case of an xrealloc() or free() later.  */
+      /* Make a copy so that it can be reallocated or freed later.  */
       keycc = strlen (argv[optind]);
       keys = xmemdup (argv[optind++], keycc + 1);
       fl_add (keys, 0, keycc, "");
-- 
2.7.4

--------------75A9BB998CD3F248ACD5007A
Content-Type: text/x-diff; name="0007-grep-treat-f-dev-null-like-m0.patch"
Content-Disposition: attachment; filename="0007-grep-treat-f-dev-null-like-m0.patch"

From fc6fce9a16bc52aca11cd8f1ad2632943fe94201 Mon Sep 17 00:00:00 2001
From: Paul Eggert
Date: Thu, 17 Nov 2016 21:05:42 -0800
Subject: [PATCH 07/10] grep: treat -f /dev/null like -m0

* NEWS: Document this.
* src/grep.c (main): With -f /dev/null, don't bother to read the input.
This is what FreeBSD grep does.
* tests/Makefile.am (TESTS): Add skip-read.
* tests/skip-read: New file.
---
 NEWS              |  5 +++++
 src/grep.c        |  5 ++++-
 tests/Makefile.am |  1 +
 tests/skip-read   | 15 +++++++++++++++
 4 files changed, 25 insertions(+), 1 deletion(-)
 create mode 100755 tests/skip-read

diff --git a/NEWS b/NEWS
index a165377..06a186a 100644
--- a/NEWS
+++ b/NEWS
@@ -16,6 +16,11 @@ GNU grep NEWS                                    -*- outline -*-
 
   grep's use of getprogname no longer causes a build failure on HP-UX.
 
+** Improvements
+
+  grep no longer reads the input in a few more cases when it is easy
+  to see that matching cannot succeed, e.g., 'grep -f /dev/null'.
+
 
 * Noteworthy changes in release 2.26 (2016-10-02) [stable]
 
diff --git a/src/grep.c b/src/grep.c
index fe56e9c..317a0d5 100644
--- a/src/grep.c
+++ b/src/grep.c
@@ -2866,7 +2866,10 @@ main (int argc, char **argv)
   if (O_BINARY && !isatty (STDOUT_FILENO))
     set_binary_mode (STDOUT_FILENO, O_BINARY);
 
-  if (max_count == 0)
+  /* If it is easy to see that matching cannot succeed (e.g., 'grep -f
+     /dev/null'), fail without reading the input.  */
+  if (max_count == 0
+      || (keycc == 0 && out_invert && !match_lines && !match_words))
     return EXIT_FAILURE;
 
   /* Prefer sysconf for page size, as getpagesize typically returns int.  */
diff --git a/tests/Makefile.am b/tests/Makefile.am
index f4c82f4..b6f0df3 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -143,6 +143,7 @@ TESTS = \
   reversed-range-endpoints \
   sjis-mb \
   skip-device \
+  skip-read \
   spencer1 \
   spencer1-locale \
   status \
diff --git a/tests/skip-read b/tests/skip-read
new file mode 100755
index 0000000..627d362
--- /dev/null
+++ b/tests/skip-read
@@ -0,0 +1,15 @@
+#!/bin/sh
+# Check that grep skips reading in some cases.
+
+. "${srcdir=.}/init.sh"; path_prepend_ ../src
+
+fail=0
+
+for opts in '-m0 y' '-f /dev/null' '-v ""'; do
+  for matcher in '' -E -F; do
+    eval returns_ 1 grep $opts $matcher no-such-file > out || fail=1
+    compare /dev/null out || fail=1
+  done
+done
+
+Exit $fail
-- 
2.7.4

--------------75A9BB998CD3F248ACD5007A
Content-Type: text/x-diff; name="0008-grep-tune-f-dev-null.patch"
Content-Disposition: attachment; filename="0008-grep-tune-f-dev-null.patch"

From e25ea1223d0fb50e759907cdd88e8096b354cbab Mon Sep 17 00:00:00 2001
From: Paul Eggert
Date: Thu, 17 Nov 2016 21:11:30 -0800
Subject: [PATCH 08/10] grep: tune -f /dev/null

* src/grep.c (main): Do the -f /dev/null early-exit checks before
more-expensive tests that involve syscalls.
---
 src/grep.c | 80 +++++++++++++++++++++++++++++++-------------------------------
 1 file changed, 40 insertions(+), 40 deletions(-)

diff --git a/src/grep.c b/src/grep.c
index 317a0d5..066df8c 100644
--- a/src/grep.c
+++ b/src/grep.c
@@ -2761,6 +2761,28 @@ main (int argc, char **argv)
   if (show_help)
     usage (EXIT_SUCCESS);
 
+  if (keys)
+    {
+      if (keycc == 0)
+        {
+          /* No keys were specified (e.g. -f /dev/null).  Match nothing.  */
+          out_invert ^= true;
+          match_lines = match_words = false;
+        }
+      else
+        /* Strip trailing newline. */
+        --keycc;
+    }
+  else if (optind < argc)
+    {
+      /* Make a copy so that it can be reallocated or freed later.  */
+      keycc = strlen (argv[optind]);
+      keys = xmemdup (argv[optind++], keycc + 1);
+      fl_add (keys, 0, keycc, "");
+    }
+  else
+    usage (EXIT_TROUBLE);
+
   bool possibly_tty = false;
   struct stat tmp_stat;
   if (! exit_on_match && fstat (STDOUT_FILENO, &tmp_stat) == 0)
@@ -2778,21 +2800,6 @@ main (int argc, char **argv)
         }
     }
 
-  if (color_option == 2)
-    color_option = possibly_tty && should_colorize () && isatty (STDOUT_FILENO);
-  init_colorize ();
-
-  if (color_option)
-    {
-      /* Legacy.  */
-      char *userval = getenv ("GREP_COLOR");
-      if (userval != NULL && *userval != '\0')
-        selected_match_color = context_match_color = userval;
-
-      /* New GREP_COLORS has priority.  */
-      parse_grep_colors ();
-    }
-
   /* POSIX says -c, -l and -q are mutually exclusive.  In this
      implementation, -q overrides -l and -L, which in turn override -c.  */
   if (exit_on_match | dev_null_output)
@@ -2809,27 +2816,26 @@ main (int argc, char **argv)
   if (out_before < 0)
     out_before = default_context;
 
-  if (keys)
-    {
-      if (keycc == 0)
-        {
-          /* No keys were specified (e.g. -f /dev/null).  Match nothing.  */
-          out_invert ^= true;
-          match_lines = match_words = false;
-        }
-      else
-        /* Strip trailing newline. */
-        --keycc;
-    }
-  else if (optind < argc)
+  /* If it is easy to see that matching cannot succeed (e.g., 'grep -f
+     /dev/null'), fail without reading the input.  */
+  if (max_count == 0
+      || (keycc == 0 && out_invert && !match_lines && !match_words))
+    return EXIT_FAILURE;
+
+  if (color_option == 2)
+    color_option = possibly_tty && should_colorize () && isatty (STDOUT_FILENO);
+  init_colorize ();
+
+  if (color_option)
     {
-      /* Make a copy so that it can be reallocated or freed later.  */
-      keycc = strlen (argv[optind]);
-      keys = xmemdup (argv[optind++], keycc + 1);
-      fl_add (keys, 0, keycc, "");
+      /* Legacy.  */
+      char *userval = getenv ("GREP_COLOR");
+      if (userval != NULL && *userval != '\0')
+        selected_match_color = context_match_color = userval;
+
+      /* New GREP_COLORS has priority.  */
+      parse_grep_colors ();
     }
-  else
-    usage (EXIT_TROUBLE);
 
   initialize_unibyte_mask ();
 
@@ -2866,12 +2872,6 @@ main (int argc, char **argv)
   if (O_BINARY && !isatty (STDOUT_FILENO))
     set_binary_mode (STDOUT_FILENO, O_BINARY);
 
-  /* If it is easy to see that matching cannot succeed (e.g., 'grep -f
-     /dev/null'), fail without reading the input.  */
-  if (max_count == 0
-      || (keycc == 0 && out_invert && !match_lines && !match_words))
-    return EXIT_FAILURE;
-
   /* Prefer sysconf for page size, as getpagesize typically returns int.  */
 #ifdef _SC_PAGESIZE
   long psize = sysconf (_SC_PAGESIZE);
-- 
2.7.4

--------------75A9BB998CD3F248ACD5007A
Content-Type: text/x-diff; name="0009-grep-f-dev-null-L-PAT-FILE-outputs-FILE.patch"
Content-Disposition: attachment; filename="0009-grep-f-dev-null-L-PAT-FILE-outputs-FILE.patch"

From 5b8900267ff93688fad2a7ab0b25dc57b42ea9e7 Mon Sep 17 00:00:00 2001
From: Paul Eggert
Date: Sat, 19 Nov 2016 01:02:08 -0800
Subject: [PATCH 09/10] grep -f /dev/null -L PAT FILE outputs FILE

* NEWS: Document this.
* src/grep.c (main): Do not exit right away with -L.
* tests/skip-read: Test for the fix.
---
 NEWS            |  2 ++
 src/grep.c      |  5 +++--
 tests/skip-read | 13 +++++++++++--
 3 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/NEWS b/NEWS
index 06a186a..29a0e8d 100644
--- a/NEWS
+++ b/NEWS
@@ -14,6 +14,8 @@ GNU grep NEWS                                    -*- outline -*-
   more cautious about special patterns like (?-m) and (*FAIL).
   [bug introduced in grep-2.23]
 
+  grep -m0 -L PAT FILE now outputs "FILE".  [bug introduced in grep-2.5]
+
   grep's use of getprogname no longer causes a build failure on HP-UX.
 
 ** Improvements
diff --git a/src/grep.c b/src/grep.c
index 066df8c..a794af4 100644
--- a/src/grep.c
+++ b/src/grep.c
@@ -2818,8 +2818,9 @@ main (int argc, char **argv)
 
   /* If it is easy to see that matching cannot succeed (e.g., 'grep -f
      /dev/null'), fail without reading the input.  */
-  if (max_count == 0
-      || (keycc == 0 && out_invert && !match_lines && !match_words))
+  if ((max_count == 0
+       || (keycc == 0 && out_invert && !match_lines && !match_words))
+      && list_files != LISTFILES_NONMATCHING)
     return EXIT_FAILURE;
 
   if (color_option == 2)
diff --git a/tests/skip-read b/tests/skip-read
index 627d362..1eef87e 100755
--- a/tests/skip-read
+++ b/tests/skip-read
@@ -5,11 +5,20 @@
 
 fail=0
 
+echo /dev/null >exp || framework_failure_
+
 for opts in '-m0 y' '-f /dev/null' '-v ""'; do
   for matcher in '' -E -F; do
-    eval returns_ 1 grep $opts $matcher no-such-file > out || fail=1
-    compare /dev/null out || fail=1
+    for file in /dev/null no-such-file; do
+      eval returns_ 1 grep $opts $matcher no-such-file > out || fail=1
+      compare /dev/null out || fail=1
+      eval returns_ 1 grep -l $opts $matcher /dev/null > out || fail=1
+      compare /dev/null out || fail=1
+    done
+    eval returns_ 1 grep -L $opts $matcher /dev/null > out || fail=1
+    compare exp out || fail=1
   done
 done
 
+
 Exit $fail
-- 
2.7.4

--------------75A9BB998CD3F248ACD5007A
Content-Type: text/x-diff; name="0010-tests-use-returns_-rather-than.patch"
Content-Disposition: attachment; filename="0010-tests-use-returns_-rather-than.patch"

From f028b70e4eac297ce37b306fbdc18efbf09f2e4a Mon Sep 17 00:00:00 2001
From: Paul Eggert
Date: Sat, 19 Nov 2016 01:23:27 -0800
Subject: [PATCH 10/10] tests: use "returns_" rather than "$?"

* tests/grep-dev-null-out: Use "returns_ 124" rather than
testing $? = 124.
---
 tests/grep-dev-null-out | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/tests/grep-dev-null-out b/tests/grep-dev-null-out
index 16ddadf..13a4843 100755
--- a/tests/grep-dev-null-out
+++ b/tests/grep-dev-null-out
@@ -6,7 +6,6 @@ require_timeout_
 
 ${AWK-awk} 'BEGIN {while (1) print "x"}' /dev/null
-test $?
-eq 124 || fail=3D1 + returns_ 124 timeout 1 grep x >/dev/null || fail=3D1 =20 Exit $fail --=20 2.7.4 --------------75A9BB998CD3F248ACD5007A-- From debbugs-submit-bounces@debbugs.gnu.org Sat Nov 19 20:40:52 2016 Received: (at 24941-done) by debbugs.gnu.org; 20 Nov 2016 01:40:52 +0000 Received: from localhost ([127.0.0.1]:35442 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1c8H7k-0004ly-9Y for submit@debbugs.gnu.org; Sat, 19 Nov 2016 20:40:52 -0500 Received: from mail-io0-f195.google.com ([209.85.223.195]:34394) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1c8H7i-0004lj-3A for 24941-done@debbugs.gnu.org; Sat, 19 Nov 2016 20:40:50 -0500 Received: by mail-io0-f195.google.com with SMTP id n13so2437218ioe.1 for <24941-done@debbugs.gnu.org>; Sat, 19 Nov 2016 17:40:50 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:from:date:message-id :subject:to:cc; bh=hySc2MMoTWc/mCrwIbwie9pw7HhpwKx/Pnzp4iTfamk=; b=eANeFMsWkES5nrmeQGvyJ0fXxBL0FYae1ZyuTETpIy1+XZwdquqtJRPS3Axm46uOYv mvVwykrhVXe6gjvYHE8Bs/W3i6e2VKu124T9U4l2OUkPUMZza2eN7aCHPLQKniRgyUXM ZWz1NiqorFaV7PxxTnhMyqChvPpRDFCDtM0cEgcFUp/8U5Jd2MhHyZ7t+dtbkZYqi/5k U3gg1yi65BV39NEniOHZn3CfI/qvTmKfxLG6/+GqDHPZcSPL/FEuYLgG/Xsvfh1pIv+S wigwgVDBKO/OfyF6kVkk0kNDkv3NyNXAxqUvgo5M1+lM4jHx1NWf3UXQPs+g2sG2l3BD VCpw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:sender:in-reply-to:references:from :date:message-id:subject:to:cc; bh=hySc2MMoTWc/mCrwIbwie9pw7HhpwKx/Pnzp4iTfamk=; b=cyI0vaF1Dds0Duqc1fYtyqCH0lL2lRQl2QRP8+EoDNEkOlm+BBDz4XblsocIe+QGMF g96dpqhxVul7Ptslf34B2Z/gD73KKqcx1beVMzKe8/O2oMAjPBOkfheMDROMJh/rERhc v/dJ6GdfbdS9wug1CaAzs9FBAYHM3An1odsvnAe2/kbhHHkuQJzpkcuPrCdIFPoW9hEQ miATj/Bn0HkGDN8AJR93cdzoZbRTS4SP0tBO6HkS84gehvPwuakeTESoEFUO0P/4OtxR 9WzGDhjUH8Hqhk2C00BUO9VlmcgXRqpM+5iHoOjvXwHJNkFxUSz3bn6XoaHnKdTF8v7g 8aKg== 
X-Gm-Message-State: AKaTC01vlJ9bnwWq+WMvieo2E6lGd/fd2uurDqLju1/HIAH9GUsz6TcPXVi26bz8ht6a689pN4tH6G7UargtWA==
X-Received: by 10.107.149.144 with SMTP id x138mr6540747iod.23.1479606044449;
 Sat, 19 Nov 2016 17:40:44 -0800 (PST)
MIME-Version: 1.0
Received: by 10.107.141.195 with HTTP; Sat, 19 Nov 2016 17:40:23 -0800 (PST)
In-Reply-To:
References: <5cbb8baa-4afa-7b6b-17b0-26090e980a1a@cs.ucla.edu>
 <20161116071314.9E79.27F6AC2D@kcn.ne.jp>
 <98d1b53c-00a8-c377-feee-b1fd16cb9ca3@cs.ucla.edu>
From: Jim Meyering
Date: Sat, 19 Nov 2016 17:40:23 -0800
X-Google-Sender-Auth: dDdVL2WGvB7GP12RTCVNLBVwksU
Message-ID:
Subject: Re: bug#24941: Early termination bug in grep 2.26
To: Paul Eggert
Content-Type: text/plain; charset=UTF-8
X-Spam-Score: 0.5 (/)
X-Debbugs-Envelope-To: 24941-done
Cc: Gary Johnson, Norihiro Tanaka, 24941-done@debbugs.gnu.org
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit"
X-Spam-Score: 0.5 (/)

On Sat, Nov 19, 2016 at 1:47 AM, Paul Eggert wrote:
> This turned into more work than I expected, as I kept finding performance
> glitches and/or correctness bugs in the neighborhood. I installed the
> attached set of patches. Patch 03 is the crucial one. Patch 10 trivially
> fixes an earlier test of mine and I'm too lazy to write a separate email for
> it.
>
> This fixes the problem for me, so I'm taking the liberty of closing this bug
> report.

Impressive work. Thanks a lot!

From unknown Sat Sep 20 03:59:01 2025
Received: (at fakecontrol) by fakecontrolmessage;
To: internal_control@debbugs.gnu.org
From: Debbugs Internal Request
Subject: Internal Control
Message-Id: bug archived.
Date: Sun, 18 Dec 2016 12:24:03 +0000
User-Agent: Fakemail v42.6.9

# This is a fake control message.
#
# The action:
# bug archived.
thanks
# This fakemail brought to you by your local debbugs
# administrator
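
For readers following the thread, here is a minimal sketch of the early-exit test that patches 08 and 09 arrive at, pulled out of main into a standalone predicate. The variable names max_count, keycc, out_invert, match_lines, match_words, list_files and the constant LISTFILES_NONMATCHING come from the patches; the struct skip_opts wrapper, the function name can_skip_input, the other two enum constants and the use of long for max_count are assumptions made only for illustration, not grep's actual layout.

```c
#include <stdbool.h>
#include <stddef.h>

/* Only LISTFILES_NONMATCHING appears in the patch; the other two
   constants are assumed here so the enum is complete.  */
enum list_mode
{
  LISTFILES_NONE,
  LISTFILES_MATCHING,
  LISTFILES_NONMATCHING
};

/* Hypothetical bundle of the option state that the patched test reads.
   In grep these are separate file-scope variables.  */
struct skip_opts
{
  long max_count;           /* -m NUM; 0 means no line may be selected */
  size_t keycc;             /* byte count of the pattern text */
  bool out_invert;          /* selection is inverted */
  bool match_lines;         /* -x */
  bool match_words;         /* -w */
  enum list_mode list_files;
};

/* Return true if no line can be selected, so the input need not be read.
   With -L the input is still not interesting, but each file name must
   be printed, so the early exit is not taken (patch 09).  */
bool
can_skip_input (struct skip_opts const *o)
{
  bool no_match_possible
    = (o->max_count == 0
       || (o->keycc == 0 && o->out_invert
           && !o->match_lines && !o->match_words));
  return no_match_possible && o->list_files != LISTFILES_NONMATCHING;
}
```

With -m0, or with an empty pattern list that patch 08 turns into an inverted empty match, the predicate holds and main can return EXIT_FAILURE before any input is opened. Adding -L makes it false, which is what lets "grep -m0 -L PAT FILE" print FILE.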