From unknown Fri Jun 20 07:13:02 2025
From: bug#22059 <22059@debbugs.gnu.org>
To: bug#22059 <22059@debbugs.gnu.org>
Subject: Status: grep -E: unexpected behaviour
Reply-To: bug#22059 <22059@debbugs.gnu.org>
Date: Fri, 20 Jun 2025 14:13:02 +0000

retitle 22059 grep -E: unexpected behaviour
reassign 22059 grep
submitter 22059 Charles
severity 22059 wishlist
thanks

From debbugs-submit-bounces@debbugs.gnu.org Mon Nov 30 02:22:36 2015
Message-ID: <565BD753.7020507@charlesmatkinson.org>
Date: Mon, 30 Nov 2015 10:27:55 +0530
From: Charles
To: bug-grep@gnu.org
Subject: grep -E: unexpected behaviour

As expected:

# grep -E 'udisksd\[[[:digit:]]+\]: The string .* ' /var/log/syslog.1
Nov 30 07:16:38 CW8 udisksd[2650]: The string `TSSTcorp CDDVDW SHQeò? ±?¾MUæíE³èBãÄL' is not valid UTF-8. Invalid characters begins at `eò? ±?¾MUæíE³èBãÄL'
Nov 30 07:16:38 CW8 udisksd[2650]: The string `TSSTcorp CDDVDW SHQeò? ±?¾MUæíE³èBãÄL' is not valid UTF-8. Invalid characters begins at `eò? ±?¾MUæíE³èBãÄL'

But add the i to the pattern and the behaviour is unexpected:

# grep -E 'udisksd\[[[:digit:]]+\]: The string .* i' /var/log/syslog.1
[no output]

Apparently grep silently stops processing when it encounters the invalid UTF-8:

# grep -E --only-matching 'udisksd\[[[:digit:]]+\]: The string .* ' /var/log/syslog.1 | tail -1
udisksd[2650]: The string `TSSTcorp CDDVDW

In case the specific unusual characters are relevant, here they are in hex:

# grep -E 'udisksd\[[[:digit:]]+\]: The string .* ' /var/log/syslog.1 | head -1 | cut --delimiter=' ' --fields=10-11 | od -x
0000000 4853 8251 f265 88d0 b120 b8d3 4dbe e655
0000020 45ed e8b3 e342 4cc4 0a27
0000032

When the input has invalid characters so grep cannot process it, a message could be expected perhaps configurable by the -s/--no-messages option because the input is (sort of) unreadable.

Version: 2.20 from the Debian Jessie package 2.20-4.1

Charles

From debbugs-submit-bounces@debbugs.gnu.org Mon Nov 30 12:27:29 2015
Subject: Re: bug#22059: grep -E: unexpected behaviour
To: Charles, 22059@debbugs.gnu.org
References: <565BD753.7020507@charlesmatkinson.org>
From: Paul Eggert
Organization: UCLA Computer Science Department
Message-ID: <565C86FD.70909@cs.ucla.edu>
Date: Mon, 30 Nov 2015 09:27:25 -0800
In-Reply-To: <565BD753.7020507@charlesmatkinson.org>

On 11/29/2015 08:57 PM, Charles wrote:
> Apparently grep silently stops processing when it encounters the invalid UTF-8:

The regular expression "." matches a single character, and ".*" matches a string of characters. In your example, there is an encoding error, and encoding errors are not characters so "." and ".*" do not match them. I don't see any bug here.

> When the input has invalid characters so grep cannot process it, a message could be expected

That's a good suggestion, yes.
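
The explanation above can be checked against the hex dump in the report. What follows is a minimal Python sketch, not grep's implementation. It assumes the `od -x` output came from a little-endian host. It models "." as "any character except a newline or an undecodable byte", using Python's surrogateescape error handler. The sample line is a shortened, hypothetical one built from the first bytes of the dump. The helper names are made up for illustration.

```python
import re
import struct

# The `od -x` dump from the report, copied verbatim.
OD_DUMP = """\
0000000 4853 8251 f265 88d0 b120 b8d3 4dbe e655
0000020 45ed e8b3 e342 4cc4 0a27
0000032
"""


def od_x_to_bytes(dump: str) -> bytes:
    """Rebuild the byte stream from `od -x` output.

    Assumption: the dump was produced on a little-endian host, so each
    16-bit word is stored low byte first.
    """
    out = bytearray()
    for line in dump.splitlines():
        # The first field is the octal offset; the rest are 16-bit hex words.
        for word in line.split()[1:]:
            out += struct.pack("<H", int(word, 16))
    return bytes(out)


def first_encoding_error(data: bytes):
    """Return the offset of the first byte that is not valid UTF-8, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None


fields = od_x_to_bytes(OD_DUMP)
error_offset = first_encoding_error(fields)

# Hypothetical shortened log line: prefix from the report, the first five
# bytes of the dump, then the start of the message tail.
sample = (b"udisksd[2650]: The string `TSSTcorp CDDVDW "
          + fields[:5] + b" is not valid UTF-8.")

# surrogateescape maps each undecodable byte to a lone surrogate
# in the range U+DC80..U+DCFF instead of raising an error.
text = sample.decode("utf-8", errors="surrogateescape")

# Model of ".": any character except newline or an escaped invalid byte.
CHAR = "[^\\n\\udc80-\\udcff]"
PREFIX = r"udisksd\[[0-9]+\]: The string "

with_space = re.search(PREFIX + CHAR + "* ", text)
with_space_i = re.search(PREFIX + CHAR + "* i", text)

print("first encoding error at byte offset", error_offset)
print("'.* '  ->", repr(with_space.group()) if with_space else None)
print("'.* i' ->", repr(with_space_i.group()) if with_space_i else None)
```

Under those assumptions the first encoding error is the byte 0x82 right after "SHQ". A pattern ending in ".* " can still match up to the space before "SHQ", which agrees with the --only-matching output in the report. A pattern ending in ".* i" would need ".*" to cross the invalid byte to reach " is", so it finds nothing.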

From debbugs-submit-bounces@debbugs.gnu.org Thu Dec 31 03:55:23 2015
To: control@debbugs.gnu.org
From: Paul Eggert
Subject: grep bug maintenance
Organization: UCLA Computer Science Department
Message-ID: <5684ED73.6060403@cs.ucla.edu>
Date: Thu, 31 Dec 2015 00:55:15 -0800

severity 22059 wishlist
severity 21865 wishlist
close 22278
close 22279
close 21755
close 21700
tags 21554 wontfix
tags 21527 moreinfo