From unknown Mon Jun 23 04:14:41 2025 X-Loop: help-debbugs@gnu.org Subject: bug#16586: grep: infinite loop in grep -P on some files with invalid UTF-8 sequences Resent-From: Santiago Original-Sender: "Debbugs-submit" Resent-CC: bug-grep@gnu.org Resent-Date: Wed, 29 Jan 2014 09:46:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: report 16586 X-GNU-PR-Package: grep X-GNU-PR-Keywords: To: 16586@debbugs.gnu.org X-Debbugs-Original-To: submit@debbugs.gnu.org Received: via spool by submit@debbugs.gnu.org id=B.139098871531889 (code B ref -1); Wed, 29 Jan 2014 09:46:02 +0000 Received: (at submit) by debbugs.gnu.org; 29 Jan 2014 09:45:15 +0000 Received: from localhost ([127.0.0.1]:39811 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1W8Rhr-0008IH-2J for submit@debbugs.gnu.org; Wed, 29 Jan 2014 04:45:15 -0500 Received: from mx1.riseup.net ([198.252.153.129]:37253) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1W8Rhn-0008I7-Cp for submit@debbugs.gnu.org; Wed, 29 Jan 2014 04:45:12 -0500 Received: from fulvetta.riseup.net (fulvetta-pn.riseup.net [10.0.1.75]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client CN "*.riseup.net", Issuer "Gandi Standard SSL CA" (not verified)) by mx1.riseup.net (Postfix) with ESMTPS id CF2D5510A3 for ; Wed, 29 Jan 2014 01:45:10 -0800 (PST) Received: from [127.0.0.1] (localhost [127.0.0.1]) (Authenticated sender: santiagorr@fulvetta.riseup.net) with ESMTPSA id C6AFE365 Received: by holmon (sSMTP sendmail emulation); Wed, 29 Jan 2014 10:43:46 +0100 Date: Wed, 29 Jan 2014 10:43:46 +0100 From: Santiago Message-ID: <20140129094346.GA1910@holmon> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit User-Agent: Mutt/1.5.21 (2010-09-15) X-Virus-Scanned: clamav-milter 0.97.8 at mx1 X-Virus-Status: Clean X-Spam-Score: 0.0 (/) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 
Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 0.0 (/)

Package: grep
Version: 2.16
Severity: important

Hi there,

I forward this bug from debian's BTS. Last changes in -P brought another
problem. I've confirmed this behavior on last debian package:

----- Forwarded message from Vincent Lefevre -----

[snip]

grep -P loops on some files with invalid UTF-8 sequences, e.g.

$ /usr/bin/printf "\xe9\x65\n\xab\n" | grep -P '.e|.?z' | head
�e
�e
�e
�e
�e
�e
�e
�e
�e
�e

(the infinite loop is interrupted here by a broken pipe due to
the "head").

It seems that the fix of

  https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=730472

didn't solve all the problems.

-- System Information:
Debian Release: jessie/sid
  APT prefers unstable
  APT policy: (500, 'unstable'), (500, 'testing'), (500, 'stable'), (1, 'experimental')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 3.12-1-amd64 (SMP w/2 CPU cores)
Locale: LANG=POSIX, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages grep depends on:
ii  dpkg          1.17.6
ii  install-info  5.2.0.dfsg.1-2
ii  libc6         2.17-97
ii  libpcre3      1:8.31-2

grep recommends no packages.

grep suggests no packages.

-- no debconf information ----- End forwarded message ----- From unknown Mon Jun 23 04:14:41 2025 X-Loop: help-debbugs@gnu.org Subject: bug#16586: grep: infinite loop in grep -P on some files with invalid UTF-8 sequences Resent-From: Jim Meyering Original-Sender: "Debbugs-submit" Resent-CC: bug-grep@gnu.org Resent-Date: Mon, 03 Feb 2014 21:35:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 16586 X-GNU-PR-Package: grep X-GNU-PR-Keywords: To: Santiago Cc: 16586@debbugs.gnu.org Received: via spool by 16586-submit@debbugs.gnu.org id=B16586.139146327931103 (code B ref 16586); Mon, 03 Feb 2014 21:35:02 +0000 Received: (at 16586) by debbugs.gnu.org; 3 Feb 2014 21:34:39 +0000 Received: from localhost ([127.0.0.1]:48376 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WARA6-00085a-9H for submit@debbugs.gnu.org; Mon, 03 Feb 2014 16:34:38 -0500 Received: from mail-pd0-f173.google.com ([209.85.192.173]:36904) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WARA4-00085S-0f for 16586@debbugs.gnu.org; Mon, 03 Feb 2014 16:34:36 -0500 Received: by mail-pd0-f173.google.com with SMTP id y10so7324578pdj.18 for <16586@debbugs.gnu.org>; Mon, 03 Feb 2014 13:34:35 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:from:date:message-id :subject:to:cc:content-type:content-transfer-encoding; bh=D/T9glPNellobKsYU5VV15OMtuK/xLE+8Jvsf1Rgt2A=; b=PkVlK+R7iZ6IKUsIhuvfwDZP2D6iAbabf9FKV7I6PWhMwM7Y68MfHdN8dFOuqtq85f x2uJ08Msmaa3wOS9snt4Lm8QwUj0++d8X4VdBKIaopqNj+9+PMd1UpjOUHB1bcb1vXlM 7G32Bzr8hmBqX89jHcTsBx5s5sGNtIQCiw/ElbDaKoSTjK2jCrRmtxhDEK9/5nYXm+9q 1IcmlyyZRdRkqcS5xOg+Nlvrqn1o86kf/GFeIw27Ny+AOvwSIhhIv31U35oP8G1UaD7N bsPoHiayZfVnuTXcuZnB2a24dPG2calAQRg7qAGqPBGw9qcr7G2HrtfqGv+ZZOo1mxeS 1pkw== X-Received: by 10.66.138.40 with SMTP id qn8mr5555942pab.154.1391463274912; Mon, 03 Feb 2014 13:34:34 -0800 (PST) MIME-Version: 1.0 
Received: by 10.68.201.231 with HTTP; Mon, 3 Feb 2014 13:34:14 -0800 (PST) In-Reply-To: <20140129094346.GA1910@holmon> References: <20140129094346.GA1910@holmon> From: Jim Meyering Date: Mon, 3 Feb 2014 13:34:14 -0800 X-Google-Sender-Auth: EiNUKL5JHheSS_MDwhptddxqLrw Message-ID: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-Spam-Score: 0.0 (/) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 0.0 (/)

On Wed, Jan 29, 2014 at 1:43 AM, Santiago wrote:
> Package: grep
> Version: 2.16
> Severity: important
>
> Hi there,
>
> I forward this bug from debian's BTS. Last changes in -P brought another
> problem. I've confirmed this behavior on last debian package:
>
> ----- Forwarded message from Vincent Lefevre -----
>
> [snip]
>
>
> grep -P loops on some files with invalid UTF-8 sequences, e.g.
>
> $ /usr/bin/printf "\xe9\x65\n\xab\n" | grep -P '.e|.?z' | head
> �e
> �e
> �e
> �e
> �e
> �e
> �e
> �e
> �e
> �e
>
> (the infinite loop is interrupted here by a broken pipe due to
> the "head").
>
> It seems that the fix of
>
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=730472

Thanks for the heads-up.
That appears to be a problem with pcre.
I've just built grep (git head) against pcre (git head),
and adjusted your example slightly and built with gcc's address sanitizer mode.
Now, libpcre gets an internal segfault:

$ printf "\xe9\n\xab\n" > k; src/grep -P 'e|.?z' k
ASAN:SIGSEGV
=================================================================
==11821==ERROR: AddressSanitizer: SEGV on unknown address 0x62cfffffffff (pc 0x00\
00004f0743 sp 0x7fff6b32f4a0 bp 0x7fff6b32f760 T0)
    #0 0x4f0742 in match /w/co/pcre/pcre_exec.c:5943
    #1 0x4f26d5 in pcre_exec /w/co/pcre/pcre_exec.c:6941
    #2 0x46f421 in Pexecute /w/co/grep/src/pcresearch.c:178
    #3 0x4717a3 in do_execute /w/co/grep/src/main.c:1075
    #4 0x4717a3 in grepbuf /w/co/grep/src/main.c:1111
    #5 0x472249 in grep /w/co/grep/src/main.c:1222
    #6 0x472249 in grepdesc /w/co/grep/src/main.c:1476
    #7 0x4073ca in main /w/co/grep/src/main.c:2396
    #8 0x7f6f21a53cdc in __libc_start_main (/lib64/libc.so.6+0x1ecdc)
    #9 0x408a54 (/w/u/w/co/grep/src/grep+0x408a54)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV /w/co/pcre/pcre_exec.c:5943 match
==11821==ABORTING

Sorry, but I don't have time to debug further.
Quick glance suggests it is backing up too far:

(gdb) b __asan_report_error
Breakpoint 1 at 0x448c40: file ../../.././libsanitizer/asan/asan_report.cc, line 711.
(gdb) r
Starting program: /w/u/w/co/grep/src/grep -P e\|.\?z k
warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7ffff7ffa000
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

Program received signal SIGSEGV, Segmentation fault.
0x00000000004f0743 in match (eptr=0x62cfffffffff "", ecode=0x60700000df8a "\035zx", mstart=0x62d00000b002 "\253\n", '\276' ..., offset_top=2, md=0x7fffffffce30, eptrb=0x0, rdepth=0) at pcre_exec.c:5943
5943              BACKCHAR(eptr);
(gdb) l
5938              {
5939              if (eptr == pp) goto TAIL_RECURSE;
5940              RMATCH(eptr, ecode, offset_top, md, eptrb, RM46);
5941              if (rrc != MATCH_NOMATCH) RRETURN(rrc);
5942              eptr--;
5943              BACKCHAR(eptr);
5944              if (ctype == OP_ANYNL && eptr > pp && UCHAR21(eptr) == CHAR_NL &&
5945                  UCHAR21(eptr - 1) == CHAR_CR) eptr--;
5946              }
5947            }

From unknown Mon Jun 23 04:14:41 2025 X-Loop: help-debbugs@gnu.org Subject: bug#16586: grep: infinite loop in grep -P on some files with invalid UTF-8 sequences References: <20140129094346.GA1910@holmon> In-Reply-To: <20140129094346.GA1910@holmon> Resent-From: Paul Eggert Original-Sender: "Debbugs-submit" Resent-CC: bug-grep@gnu.org Resent-Date: Sat, 08 Mar 2014 23:08:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 16586 X-GNU-PR-Package: grep X-GNU-PR-Keywords: To: 16586@debbugs.gnu.org Received: via spool by 16586-submit@debbugs.gnu.org id=B16586.139432002521373 (code B ref 16586); Sat, 08 Mar 2014 23:08:02 +0000 Received: (at 16586) by debbugs.gnu.org; 8 Mar 2014 23:07:05 +0000 Received: from localhost ([127.0.0.1]:57089 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WMQKe-0005Yd-R2 for submit@debbugs.gnu.org; Sat, 08 Mar 2014 18:07:05 -0500 Received: from smtp.cs.ucla.edu ([131.179.128.62]:48581) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WMQKc-0005YH-JI for 16586@debbugs.gnu.org; Sat, 08 Mar 2014 18:07:03 -0500 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id BDD3739E8011 for <16586@debbugs.gnu.org>; Sat, 8 Mar 2014 15:07:01 -0800 (PST) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost
(smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id QCFkMsuwAqqw for <16586@debbugs.gnu.org>; Sat, 8 Mar 2014 15:07:01 -0800 (PST) Received: from [192.168.1.9] (pool-108-0-233-62.lsanca.fios.verizon.net [108.0.233.62]) by smtp.cs.ucla.edu (Postfix) with ESMTPSA id 6115839E8008 for <16586@debbugs.gnu.org>; Sat, 8 Mar 2014 15:07:01 -0800 (PST) Message-ID: <531BA294.1020601@cs.ucla.edu> Date: Sat, 08 Mar 2014 15:07:00 -0800 From: Paul Eggert Organization: UCLA Computer Science Department User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.3.0 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: -2.3 (--) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -2.3 (--)

For what it's worth I can't reproduce this bug on Fedora 20 x86-64,
even with valgrind and/or GCC -fsanitize=address.  I'm using Fedora
pcre-8.33-4.fc20.x86_64.

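[Editorial illustration, not part of the original thread.]
Jim's gdb session above stops at BACKCHAR(eptr), and he suspects the
matcher "is backing up too far". The C sketch below shows one way that
can happen with the bytes from his example. It rests on assumptions:
the helper names (is_continuation, back_unchecked, back_checked, demo)
are hypothetical, and back_unchecked only models the general idea of
stepping backwards over UTF-8 continuation bytes while trusting the
input to be valid. It is not PCRE's code.

```c
#include <assert.h>
#include <stddef.h>

/* True for UTF-8 continuation bytes, i.e. bytes of the form 10xxxxxx. */
static int is_continuation(unsigned char c)
{
  return (c & 0xC0) == 0x80;
}

/* Hypothetical unchecked step-back: assumes a lead byte precedes p,
   which only holds when the input is valid UTF-8. */
static const unsigned char *back_unchecked(const unsigned char *p)
{
  while (is_continuation(*p))
    p--;
  return p;
}

/* Hypothetical checked step-back: never moves before `start`. */
static const unsigned char *back_checked(const unsigned char *start,
                                         const unsigned char *p)
{
  assert(start <= p);
  while (p > start && is_continuation(*p))
    p--;
  return p;
}

/* Run both variants on the second line of the input from the report
   ("\xe9\n\xab\n") and store how far each one moved relative to the
   start of that line.  The line sits inside a larger buffer, so the
   unchecked scan stays inside the array in this demonstration. */
void demo(ptrdiff_t *unchecked, ptrdiff_t *checked)
{
  static const unsigned char buf[] = { 0xE9, '\n', 0xAB, '\n' };
  const unsigned char *line = buf + 2;   /* the line "\xab\n" */

  *unchecked = back_unchecked(line) - line;
  *checked = back_checked(line, line) - line;
}
```

Calling demo(&u, &c) leaves u at -1 and c at 0: the byte 0xAB looks like
a continuation byte with no lead byte in front of it, so the unchecked
scan ends one byte before the line it was handed, while the checked
variant stops at the line start.
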
From unknown Mon Jun 23 04:14:41 2025 X-Loop: help-debbugs@gnu.org Subject: bug#16586: grep: infinite loop in grep -P on some files with invalid UTF-8 sequences Resent-From: Santiago Original-Sender: "Debbugs-submit" Resent-CC: bug-grep@gnu.org Resent-Date: Tue, 15 Apr 2014 14:11:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 16586 X-GNU-PR-Package: grep X-GNU-PR-Keywords: To: 736919@bugs.debian.org, Paul Eggert Cc: 16586@debbugs.gnu.org Received: via spool by 16586-submit@debbugs.gnu.org id=B16586.139757102432441 (code B ref 16586); Tue, 15 Apr 2014 14:11:01 +0000 Received: (at 16586) by debbugs.gnu.org; 15 Apr 2014 14:10:24 +0000 Received: from localhost ([127.0.0.1]:48987 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1Wa447-0008RA-Kv for submit@debbugs.gnu.org; Tue, 15 Apr 2014 10:10:23 -0400 Received: from mx1.riseup.net ([198.252.153.129]:37423) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1Wa446-0008R0-1R for 16586@debbugs.gnu.org; Tue, 15 Apr 2014 10:10:22 -0400 Received: from fulvetta.riseup.net (fulvetta-pn.riseup.net [10.0.1.75]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client CN "*.riseup.net", Issuer "Gandi Standard SSL CA" (not verified)) by mx1.riseup.net (Postfix) with ESMTPS id 31ECD4C973; Tue, 15 Apr 2014 07:10:21 -0700 (PDT) Received: from [127.0.0.1] (localhost [127.0.0.1]) (Authenticated sender: santiagorr@fulvetta.riseup.net) with ESMTPSA id 266AE113 Received: by holmon (sSMTP sendmail emulation); Tue, 15 Apr 2014 16:10:29 +0200 Date: Tue, 15 Apr 2014 16:10:29 +0200 From: Santiago Message-ID: <20140415141029.GA1587@holmon> References: <20140129094346.GA1910@holmon> <531BA294.1020601@cs.ucla.edu> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <531BA294.1020601@cs.ucla.edu> User-Agent: Mutt/1.5.21 (2010-09-15) X-Virus-Scanned: clamav-milter 0.98.1 at mx1 X-Virus-Status: Clean 
X-Spam-Score: 0.0 (/) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 0.0 (/)

On Sat, Mar 08, 2014 at 03:07:00PM -0800, Paul Eggert wrote:
> For what it's worth I can't reproduce this bug on Fedora 20 x86-64,
> even with valgrind and/or GCC -fsanitize=address.  I'm using Fedora
> pcre-8.33-4.fc20.x86_64.
>

Indeed, it was a debian-pcre-specific bug. New pcre package (1:8.31-3)
enables JIT regex compilation and solves the issue.

I'm updating grep's dependencies to close this bug in debian.

Regards,

Santiago

From unknown Mon Jun 23 04:14:41 2025 MIME-Version: 1.0 X-Mailer: MIME-tools 5.503 (Entity 5.503) X-Loop: help-debbugs@gnu.org From: help-debbugs@gnu.org (GNU bug Tracking System) To: Santiago Subject: bug#16586: closed (Re: bug#16586: grep: infinite loop in grep -P on some files with invalid UTF-8 sequences) Message-ID: References: <534D46D3.7070804@cs.ucla.edu> <20140129094346.GA1910@holmon> X-Gnu-PR-Message: they-closed 16586 X-Gnu-PR-Package: grep Reply-To: 16586@debbugs.gnu.org Date: Tue, 15 Apr 2014 14:50:03 +0000 Content-Type: multipart/mixed; boundary="----------=_1397573403-4538-1"

This is a multi-part message in MIME format...

------------=_1397573403-4538-1
Content-Disposition: inline Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8"

Your bug report

#16586: grep: infinite loop in grep -P on some files with invalid UTF-8 sequences

which was filed against the grep package, has been closed.

The explanation is attached below, along with your original report.
If you require more details, please reply to 16586@debbugs.gnu.org.

-- 
16586: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=16586
GNU Bug Tracking System
Contact help-debbugs@gnu.org with problems

------------=_1397573403-4538-1
Content-Type: message/rfc822 Content-Disposition: inline Content-Transfer-Encoding: 7bit Received: (at 16586-done) by debbugs.gnu.org; 15 Apr 2014 14:49:16 +0000 Received: from localhost ([127.0.0.1]:49004 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1Wa4fk-0001A0-4z for submit@debbugs.gnu.org; Tue, 15 Apr 2014 10:49:16 -0400 Received: from smtp.cs.ucla.edu ([131.179.128.62]:47902) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1Wa4fg-00019g-CF for 16586-done@debbugs.gnu.org; Tue, 15 Apr 2014 10:49:13 -0400 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id 6185639E8012; Tue, 15 Apr 2014 07:49:06 -0700 (PDT) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id Mn090NA4s8NK; Tue, 15 Apr 2014 07:48:57 -0700 (PDT) Received: from [192.168.1.9] (pool-108-0-233-62.lsanca.fios.verizon.net [108.0.233.62]) by smtp.cs.ucla.edu (Postfix) with ESMTPSA id C6B0C39E8015; Tue, 15 Apr 2014 07:48:57 -0700 (PDT) Message-ID: <534D46D3.7070804@cs.ucla.edu> Date: Tue, 15 Apr 2014 07:48:51 -0700 From: Paul Eggert Organization: UCLA Computer Science Department User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.4.0 MIME-Version: 1.0 To: Santiago , 736919@bugs.debian.org Subject: Re: bug#16586: grep: infinite loop in grep -P on some files with invalid UTF-8 sequences References: <20140129094346.GA1910@holmon> <531BA294.1020601@cs.ucla.edu> <20140415141029.GA1587@holmon> In-Reply-To: <20140415141029.GA1587@holmon> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: -3.0 (---) X-Debbugs-Envelope-To: 16586-done Cc:
16586-done@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.0 (---) Santiago wrote: > it was a debian-pcre-specific bug. Thanks, closing the bug upstream. ------------=_1397573403-4538-1 Content-Type: message/rfc822 Content-Disposition: inline Content-Transfer-Encoding: 7bit Received: (at submit) by debbugs.gnu.org; 29 Jan 2014 09:45:15 +0000 Received: from localhost ([127.0.0.1]:39811 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1W8Rhr-0008IH-2J for submit@debbugs.gnu.org; Wed, 29 Jan 2014 04:45:15 -0500 Received: from mx1.riseup.net ([198.252.153.129]:37253) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1W8Rhn-0008I7-Cp for submit@debbugs.gnu.org; Wed, 29 Jan 2014 04:45:12 -0500 Received: from fulvetta.riseup.net (fulvetta-pn.riseup.net [10.0.1.75]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client CN "*.riseup.net", Issuer "Gandi Standard SSL CA" (not verified)) by mx1.riseup.net (Postfix) with ESMTPS id CF2D5510A3 for ; Wed, 29 Jan 2014 01:45:10 -0800 (PST) Received: from [127.0.0.1] (localhost [127.0.0.1]) (Authenticated sender: santiagorr@fulvetta.riseup.net) with ESMTPSA id C6AFE365 Received: by holmon (sSMTP sendmail emulation); Wed, 29 Jan 2014 10:43:46 +0100 Date: Wed, 29 Jan 2014 10:43:46 +0100 From: Santiago To: submit@debbugs.gnu.org Subject: grep: infinite loop in grep -P on some files with invalid UTF-8 sequences Message-ID: <20140129094346.GA1910@holmon> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit User-Agent: Mutt/1.5.21 (2010-09-15) X-Virus-Scanned: clamav-milter 0.97.8 at mx1 X-Virus-Status: Clean X-Spam-Score: 0.0 (/) X-Debbugs-Envelope-To: submit X-BeenThere: 
debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 0.0 (/) Package: grep Version: 2.16 Severity: important Hi there, I forward this bug from debian's BTS. Last changes in -P brought another problem. I've confirmed this behavior on last debian package: ----- Forwarded message from Vincent Lefevre ----- [snip] grep -P loops on some files with invalid UTF-8 sequences, e.g. $ /usr/bin/printf "\xe9\x65\n\xab\n" | grep -P '.e|.?z' | head �e �e �e �e �e �e �e �e �e �e (the infinite loop is interrupted here by a broken pipe due to the "head"). It seems that the fix of https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=730472 didn't solve all the problems. -- System Information: Debian Release: jessie/sid APT prefers unstable APT policy: (500, 'unstable'), (500, 'testing'), (500, 'stable'), (1, 'experimental') Architecture: amd64 (x86_64) Foreign Architectures: i386 Kernel: Linux 3.12-1-amd64 (SMP w/2 CPU cores) Locale: LANG=POSIX, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8) Shell: /bin/sh linked to /bin/dash Versions of packages grep depends on: ii dpkg 1.17.6 ii install-info 5.2.0.dfsg.1-2 ii libc6 2.17-97 ii libpcre3 1:8.31-2 grep recommends no packages. grep suggests no packages. 
-- no debconf information ----- End forwarded message ----- ------------=_1397573403-4538-1-- From unknown Mon Jun 23 04:14:41 2025 X-Loop: help-debbugs@gnu.org Subject: bug#16586: grep: infinite loop in grep -P on some files with invalid UTF-8 sequences Resent-From: Jim Meyering Original-Sender: "Debbugs-submit" Resent-CC: bug-grep@gnu.org Resent-Date: Wed, 16 Apr 2014 16:26:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 16586 X-GNU-PR-Package: grep X-GNU-PR-Keywords: To: 16586@debbugs.gnu.org, Paul Eggert , Santiago , Norihiro Tanaka Cc: 736919@bugs.debian.org Received: via spool by 16586-submit@debbugs.gnu.org id=B16586.139766550429618 (code B ref 16586); Wed, 16 Apr 2014 16:26:01 +0000 Received: (at 16586) by debbugs.gnu.org; 16 Apr 2014 16:25:04 +0000 Received: from localhost ([127.0.0.1]:49830 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WaSdz-0007hc-4g for submit@debbugs.gnu.org; Wed, 16 Apr 2014 12:25:03 -0400 Received: from mail-yh0-f44.google.com ([209.85.213.44]:38389) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WaSdv-0007h2-IV for 16586@debbugs.gnu.org; Wed, 16 Apr 2014 12:25:01 -0400 Received: by mail-yh0-f44.google.com with SMTP id f10so11014290yha.3 for <16586@debbugs.gnu.org>; Wed, 16 Apr 2014 09:24:54 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:from:date:message-id :subject:to:cc:content-type; bh=R+pfP6eFos15Cu+XR2WMNzbMtzJq3OzwS1QINFHk8rU=; b=v5kN3JLUmR0G8YYVOQ+cqrxXfOZIvFE888KQlGvrIFVQiL4ONKj+0Ep7qvWcsMhwW8 7eRWEyE6nPHDoLOV2KscNzi+5IJghaWbnGwgtVQCGQhwLigl1aSxILSbr3RrKRyP9//q i+Cy4B1+jsVf0QDHaGCP275UmzVZ/RFE+NYJQ3iwnFdTNWZys9gLe3r2KpyCGESsQk2M IY8KrvoWlYV9axvfNjmxW9OYudlagzMNf4VtZ8xvi9VdT79Ni0Ie+plmYXwxv2LuTjtb D+FcwitSrS5vKZX7+MhZ7wphEm7rA8FjV2JZxeCo0flrQt8E/NOsF0VTWE9ms+1cc39I rvgw== X-Received: by 10.236.30.230 with SMTP id 
k66mr14025956yha.57.1397665493932; Wed, 16 Apr 2014 09:24:53 -0700 (PDT) MIME-Version: 1.0 Received: by 10.170.149.193 with HTTP; Wed, 16 Apr 2014 09:24:33 -0700 (PDT) In-Reply-To: <534D46D3.7070804@cs.ucla.edu> References: <20140129094346.GA1910@holmon> <531BA294.1020601@cs.ucla.edu> <20140415141029.GA1587@holmon> <534D46D3.7070804@cs.ucla.edu> From: Jim Meyering Date: Wed, 16 Apr 2014 09:24:33 -0700 X-Google-Sender-Auth: rnSsPNGkgKSUT5xnrbLLaDo71gM Message-ID: Content-Type: text/plain; charset=ISO-8859-1 X-Spam-Score: -0.7 (/) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.7 (/) On Tue, Apr 15, 2014 at 7:48 AM, Paul Eggert wrote: > Santiago wrote: >> it was a debian-pcre-specific bug. > > Thanks, closing the bug upstream. This bug is still present in upstream libpcre version 8.35. 
I wrote a patch for it, posted at http://debbugs.gnu.org/17245#26 and Norihiro forwarded it on to the libpcre bug tracker here: http://bugs.exim.org/show_bug.cgi?id=1468 From debbugs-submit-bounces@debbugs.gnu.org Wed Apr 16 13:44:00 2014 Received: (at control) by debbugs.gnu.org; 16 Apr 2014 17:44:00 +0000 Received: from localhost ([127.0.0.1]:49871 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WaTsN-0001WI-Fq for submit@debbugs.gnu.org; Wed, 16 Apr 2014 13:43:59 -0400 Received: from smtp.cs.ucla.edu ([131.179.128.62]:38862) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WaTsJ-0001W0-Hu for control@debbugs.gnu.org; Wed, 16 Apr 2014 13:43:56 -0400 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id 790DFA6000F for ; Wed, 16 Apr 2014 10:43:49 -0700 (PDT) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id n22g5v3BJxAp for ; Wed, 16 Apr 2014 10:43:45 -0700 (PDT) Received: from [192.168.1.9] (pool-108-0-233-62.lsanca.fios.verizon.net [108.0.233.62]) by smtp.cs.ucla.edu (Postfix) with ESMTPSA id 144A1A60011 for ; Wed, 16 Apr 2014 10:43:45 -0700 (PDT) Message-ID: <534EC150.3090607@cs.ucla.edu> Date: Wed, 16 Apr 2014 10:43:44 -0700 From: Paul Eggert Organization: UCLA Computer Science Department User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.4.0 MIME-Version: 1.0 To: control@debbugs.gnu.org Subject: forwarding 16586 upstream Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: -3.0 (---) X-Debbugs-Envelope-To: control X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" 
X-Spam-Score: -3.0 (---) forwarded 16586 Philip Hazel thanks From unknown Mon Jun 23 04:14:41 2025 X-Loop: help-debbugs@gnu.org Subject: bug#16586: grep: infinite loop in grep -P on some files with invalid UTF-8 sequences Resent-From: Paul Eggert Original-Sender: "Debbugs-submit" Resent-CC: bug-grep@gnu.org Resent-Date: Wed, 16 Apr 2014 17:51:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 16586 X-GNU-PR-Package: grep X-GNU-PR-Keywords: To: Jim Meyering , 16586@debbugs.gnu.org, Santiago , Norihiro Tanaka Cc: 736919@bugs.debian.org Received: via spool by 16586-submit@debbugs.gnu.org id=B16586.139767063410968 (code B ref 16586); Wed, 16 Apr 2014 17:51:02 +0000 Received: (at 16586) by debbugs.gnu.org; 16 Apr 2014 17:50:34 +0000 Received: from localhost ([127.0.0.1]:49881 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WaTyj-0002qp-CT for submit@debbugs.gnu.org; Wed, 16 Apr 2014 13:50:33 -0400 Received: from smtp.cs.ucla.edu ([131.179.128.62]:39246) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WaTyh-0002qa-2C for 16586@debbugs.gnu.org; Wed, 16 Apr 2014 13:50:31 -0400 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id 14198A6000E; Wed, 16 Apr 2014 10:50:25 -0700 (PDT) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id dwRFeAEEZmON; Wed, 16 Apr 2014 10:50:21 -0700 (PDT) Received: from [192.168.1.9] (pool-108-0-233-62.lsanca.fios.verizon.net [108.0.233.62]) by smtp.cs.ucla.edu (Postfix) with ESMTPSA id D6EF039E8011; Wed, 16 Apr 2014 10:50:20 -0700 (PDT) Message-ID: <534EC2DC.90605@cs.ucla.edu> Date: Wed, 16 Apr 2014 10:50:20 -0700 From: Paul Eggert Organization: UCLA Computer Science Department User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.4.0 MIME-Version: 1.0 
References: <20140129094346.GA1910@holmon> <531BA294.1020601@cs.ucla.edu> <20140415141029.GA1587@holmon> <534D46D3.7070804@cs.ucla.edu> In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: -3.0 (---) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.0 (---)

Jim Meyering wrote:
> This bug is still present in upstream libpcre version 8.35.

Ah, sorry, I thought it was Debian-specific. I've reopened grep bug
16586 , and have forwarded it to Philip Hazel, who currently has the
PCRE bug assigned, according to .

From unknown Mon Jun 23 04:14:41 2025 MIME-Version: 1.0 X-Mailer: MIME-tools 5.503 (Entity 5.503) X-Loop: help-debbugs@gnu.org From: help-debbugs@gnu.org (GNU bug Tracking System) To: Santiago Subject: bug#16586: closed (Re: bug#17245: GREP BUG: grep -P and binary files) Message-ID: References: <53555D5E.3080806@cs.ucla.edu> <20140129094346.GA1910@holmon> X-Gnu-PR-Message: they-closed 16586 X-Gnu-PR-Package: grep Reply-To: 16586@debbugs.gnu.org Date: Mon, 21 Apr 2014 18:04:03 +0000 Content-Type: multipart/mixed; boundary="----------=_1398103443-20495-1"

This is a multi-part message in MIME format...

------------=_1398103443-20495-1
Content-Disposition: inline Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8"

Your bug report

#16586: grep: infinite loop in grep -P on some files with invalid UTF-8 sequences

which was filed against the grep package, has been closed.

The explanation is attached below, along with your original report.
If you require more details, please reply to 16586@debbugs.gnu.org.

-- 
16586: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=16586
GNU Bug Tracking System
Contact help-debbugs@gnu.org with problems

------------=_1398103443-20495-1
Content-Type: message/rfc822 Content-Disposition: inline Content-Transfer-Encoding: 7bit Received: (at 16586-done) by debbugs.gnu.org; 21 Apr 2014 18:03:27 +0000 Received: from localhost ([127.0.0.1]:54336 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WcIYw-0005JP-HC for submit@debbugs.gnu.org; Mon, 21 Apr 2014 14:03:27 -0400 Received: from smtp.cs.ucla.edu ([131.179.128.62]:41940) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WcIYt-0005JB-LV; Mon, 21 Apr 2014 14:03:24 -0400 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id 634C039E8012; Mon, 21 Apr 2014 11:03:22 -0700 (PDT) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id BX-nCqCKsPVT; Mon, 21 Apr 2014 11:03:18 -0700 (PDT) Received: from penguin.cs.ucla.edu (Penguin.CS.UCLA.EDU [131.179.64.200]) by smtp.cs.ucla.edu (Postfix) with ESMTPSA id 518C939E8008; Mon, 21 Apr 2014 11:03:18 -0700 (PDT) Message-ID: <53555D5E.3080806@cs.ucla.edu> Date: Mon, 21 Apr 2014 11:03:10 -0700 From: Paul Eggert Organization: UCLA Computer Science Department User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.4.0 MIME-Version: 1.0 To: Norihiro Tanaka , 17245-done@debbugs.gnu.org, 16586-done@debbugs.gnu.org Subject: Re: bug#17245: GREP BUG: grep -P and binary files References: <20140416084844.F895.27F6AC2D@kcn.ne.jp> <20140416211353.F7FD.27F6AC2D@kcn.ne.jp> In-Reply-To: <20140416211353.F7FD.27F6AC2D@kcn.ne.jp> Content-Type: multipart/mixed; boundary="------------090306020509090105090802" X-Spam-Score: -3.0 (---) X-Debbugs-Envelope-To: 16586-done X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version:
2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.0 (---)

This is a multi-part message in MIME format.
--------------090306020509090105090802
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

On 04/16/2014 05:13 AM, Norihiro Tanaka wrote:
> http://bugs.exim.org/show_bug.cgi?id=1468

Thanks. The response there makes it clear that if grep passes arbitrary
binary data to PCRE, and if grep uses PCRE_NO_UTF8_CHECK, undefined
behavior will result (maybe infinite loop, core dump, etc.).

We can't have undefined behavior in grep. A simple fix is to avoid using
PCRE_NO_UTF8_CHECK so I installed the attached patch to do that. Perhaps
we can think of a better way at some point. In the meantime I'm taking
the liberty of closing Bug#17245 and Bug#16586.

--------------090306020509090105090802
Content-Type: text/x-patch;
 name="0001-grep-P-now-rejects-invalid-input-sequences-in-UTF-8-.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename*0="0001-grep-P-now-rejects-invalid-input-sequences-in-UTF-8-.pa";
 filename*1="tch"

>From b9a691aa9b7aaa43e07841f11095d779b210448d Mon Sep 17 00:00:00 2001
From: Paul Eggert
Date: Mon, 21 Apr 2014 10:51:16 -0700
Subject: [PATCH] grep: -P now rejects invalid input sequences in UTF-8 locales

See and .
* NEWS: Document this.
* src/pcresearch.c (Pexecute): Do not use PCRE_NO_UTF8_CHECK,
as this leads to undefined behavior when the input is not UTF-8.
* tests/pcre-infloop, tests/pcre-invalid-utf8-input:
Exit status is now 2, not 1, when grep -P is given invalid UTF-8
data in a UTF-8 locale.
--- NEWS | 4 ++++ src/pcresearch.c | 17 ++++------------- tests/pcre-infloop | 2 +- tests/pcre-invalid-utf8-input | 5 ++--- 4 files changed, 11 insertions(+), 17 deletions(-) diff --git a/NEWS b/NEWS index fbb782b..2d3e12a 100644 --- a/NEWS +++ b/NEWS @@ -14,6 +14,10 @@ GNU grep NEWS -*- outline -*- grep -f no longer mishandles patterns containing NUL bytes. [bug introduced in grep-2.11] + grep -P now reports an error and exits when given invalid UTF-8 data. + Previously it was unreliable, and sometimes crashed or looped. + [bug introduced in grep-2.16] + grep -P now works with -w and -x and backreferences. Before, echo aa|grep -Pw '(.)\1' would fail to match, yet echo aa|grep -Pw '(.)\2' would match. diff --git a/src/pcresearch.c b/src/pcresearch.c index a5e953f..9f63f37 100644 --- a/src/pcresearch.c +++ b/src/pcresearch.c @@ -52,19 +52,14 @@ Pcompile (char const *pattern, size_t size) int e; char const *ep; char *re = xnmalloc (4, size + 7); - int flags = PCRE_MULTILINE | (match_icase ? PCRE_CASELESS : 0); + int flags = (PCRE_MULTILINE + | (match_icase ? PCRE_CASELESS : 0) + | (using_utf8 () ? PCRE_UTF8 : 0)); char const *patlim = pattern + size; char *n = re; char const *p; char const *pnul; - if (using_utf8 ()) - { - /* Enable PCRE's UTF-8 matching. Note also the use of - PCRE_NO_UTF8_CHECK when calling pcre_extra, below. */ - flags |= PCRE_UTF8; - } - /* FIXME: Remove these restrictions. */ if (memchr (pattern, '\n', size)) error (EXIT_TROUBLE, 0, _("the -P option only supports a single pattern")); @@ -154,10 +149,6 @@ Pexecute (char const *buf, size_t size, size_t *match_size, e == PCRE_ERROR_NOMATCH && line_next < buf + size; start_ofs -= line_next - line_buf) { - /* Disable the check that would make an invalid byte - seqence *in the input* trigger a failure. 
*/ - int options = PCRE_NO_UTF8_CHECK; - line_buf = line_next; line_end = memchr (line_buf, eolbyte, (buf + size) - line_buf); if (line_end == NULL) @@ -172,7 +163,7 @@ Pexecute (char const *buf, size_t size, size_t *match_size, error (EXIT_TROUBLE, 0, _("exceeded PCRE's line length limit")); e = pcre_exec (cre, extra, line_buf, line_end - line_buf, - start_ofs < 0 ? 0 : start_ofs, options, + start_ofs < 0 ? 0 : start_ofs, 0, sub, sizeof sub / sizeof *sub); } diff --git a/tests/pcre-infloop b/tests/pcre-infloop index 57b67ae..febf356 100755 --- a/tests/pcre-infloop +++ b/tests/pcre-infloop @@ -28,6 +28,6 @@ printf 'a\201b\r' > in || framework_failure_ fail=0 LC_ALL=en_US.utf8 timeout 3 grep -P 'a.?..b' in -test $? = 1 || fail_ "libpcre's match function appears to infloop" +test $? = 2 || fail_ "libpcre's match function appears to infloop" Exit $fail diff --git a/tests/pcre-invalid-utf8-input b/tests/pcre-invalid-utf8-input index ccf3caf..913e8ee 100755 --- a/tests/pcre-invalid-utf8-input +++ b/tests/pcre-invalid-utf8-input @@ -15,8 +15,7 @@ fail=0 printf 'j\202\nj\n' > in || framework_failure_ -LC_ALL=en_US.UTF-8 grep -P j in > out 2>&1 || fail=1 -compare in out || fail=1 -compare /dev/null err || fail=1 +LC_ALL=en_US.UTF-8 grep -P j in +test $? 
-eq 2 || fail=1 Exit $fail -- 1.9.0 --------------090306020509090105090802-- ------------=_1398103443-20495-1 Content-Type: message/rfc822 Content-Disposition: inline Content-Transfer-Encoding: 7bit Received: (at submit) by debbugs.gnu.org; 29 Jan 2014 09:45:15 +0000 Received: from localhost ([127.0.0.1]:39811 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1W8Rhr-0008IH-2J for submit@debbugs.gnu.org; Wed, 29 Jan 2014 04:45:15 -0500 Received: from mx1.riseup.net ([198.252.153.129]:37253) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1W8Rhn-0008I7-Cp for submit@debbugs.gnu.org; Wed, 29 Jan 2014 04:45:12 -0500 Received: from fulvetta.riseup.net (fulvetta-pn.riseup.net [10.0.1.75]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client CN "*.riseup.net", Issuer "Gandi Standard SSL CA" (not verified)) by mx1.riseup.net (Postfix) with ESMTPS id CF2D5510A3 for ; Wed, 29 Jan 2014 01:45:10 -0800 (PST) Received: from [127.0.0.1] (localhost [127.0.0.1]) (Authenticated sender: santiagorr@fulvetta.riseup.net) with ESMTPSA id C6AFE365 Received: by holmon (sSMTP sendmail emulation); Wed, 29 Jan 2014 10:43:46 +0100 Date: Wed, 29 Jan 2014 10:43:46 +0100 From: Santiago To: submit@debbugs.gnu.org Subject: grep: infinite loop in grep -P on some files with invalid UTF-8 sequences Message-ID: <20140129094346.GA1910@holmon> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit User-Agent: Mutt/1.5.21 (2010-09-15) X-Virus-Scanned: clamav-milter 0.97.8 at mx1 X-Virus-Status: Clean X-Spam-Score: 0.0 (/) X-Debbugs-Envelope-To: submit X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 0.0 (/) Package: grep Version: 2.16 Severity: important Hi there, I forward 
this bug from debian's BTS. Last changes in -P brought another problem. I've confirmed this behavior on last debian package: ----- Forwarded message from Vincent Lefevre ----- [snip] grep -P loops on some files with invalid UTF-8 sequences, e.g. $ /usr/bin/printf "\xe9\x65\n\xab\n" | grep -P '.e|.?z' | head �e �e �e �e �e �e �e �e �e �e (the infinite loop is interrupted here by a broken pipe due to the "head"). It seems that the fix of https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=730472 didn't solve all the problems. -- System Information: Debian Release: jessie/sid APT prefers unstable APT policy: (500, 'unstable'), (500, 'testing'), (500, 'stable'), (1, 'experimental') Architecture: amd64 (x86_64) Foreign Architectures: i386 Kernel: Linux 3.12-1-amd64 (SMP w/2 CPU cores) Locale: LANG=POSIX, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8) Shell: /bin/sh linked to /bin/dash Versions of packages grep depends on: ii dpkg 1.17.6 ii install-info 5.2.0.dfsg.1-2 ii libc6 2.17-97 ii libpcre3 1:8.31-2 grep recommends no packages. grep suggests no packages. 
-- no debconf information ----- End forwarded message ----- ------------=_1398103443-20495-1-- From unknown Mon Jun 23 04:14:41 2025 X-Loop: help-debbugs@gnu.org Subject: bug#16586: bug#17245: GREP BUG: grep -P and binary files Resent-From: Jim Meyering Original-Sender: "Debbugs-submit" Resent-CC: bug-grep@gnu.org Resent-Date: Thu, 24 Apr 2014 02:32:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 16586 X-GNU-PR-Package: grep X-GNU-PR-Keywords: To: 16586@debbugs.gnu.org, Paul Eggert , Santiago Cc: 17245-done@debbugs.gnu.org, Norihiro Tanaka , 16586-done@debbugs.gnu.org Received: via spool by 16586-submit@debbugs.gnu.org id=B16586.13983066759447 (code B ref 16586); Thu, 24 Apr 2014 02:32:01 +0000 Received: (at 16586) by debbugs.gnu.org; 24 Apr 2014 02:31:15 +0000 Received: from localhost ([127.0.0.1]:56729 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1Wd9RR-0002SD-Pf for submit@debbugs.gnu.org; Wed, 23 Apr 2014 22:31:14 -0400 Received: from mail-yk0-f171.google.com ([209.85.160.171]:46600) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1Wd9RL-0002Rh-Fh; Wed, 23 Apr 2014 22:31:09 -0400 Received: by mail-yk0-f171.google.com with SMTP id q9so1564535ykb.16 for ; Wed, 23 Apr 2014 19:31:06 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:from:date:message-id :subject:to:cc:content-type; bh=UxxjrM1X9NKSHJc1P3BGI39oNB8skqNdBbag9ISRBXo=; b=pVjEJ+rJUe6aJq33IEN904RJRopIZ33gGTgWI+APUhDCQxtpNW8eVE4hvTj51o+Qvn gnwl/DXMNGRnXdFyteKf6Jeyjz3QIaXvnZSEjGm6lc7JW7PjbPBX1eFUBPzJwP6IjWd6 yXL+kly55QzAKEYBCXdNdZf/Mbj+oPvWxdf/vnXqCPnexvPuStKYlD5iS8j/khkzFt0X j4NZ63Afm58JkRlho9KGniXObHB8EVOYV62zZ49U4/XGiafm1YrF0ApM+ARuStgftRKq 0ysa3eIHRQFtRm1IDsWdkEI9BIZdUOTnTuLaPxGXEogyOdleUIVFEfKOVNrSskkwmWCg hKYw== X-Received: by 10.236.177.100 with SMTP id c64mr75496363yhm.30.1398306666716; Wed, 23 Apr 2014 19:31:06 -0700 
(PDT) MIME-Version: 1.0 Received: by 10.170.149.193 with HTTP; Wed, 23 Apr 2014 19:30:46 -0700 (PDT) In-Reply-To: <53555D5E.3080806@cs.ucla.edu> References: <20140416084844.F895.27F6AC2D@kcn.ne.jp> <20140416211353.F7FD.27F6AC2D@kcn.ne.jp> <53555D5E.3080806@cs.ucla.edu> From: Jim Meyering Date: Wed, 23 Apr 2014 19:30:46 -0700 X-Google-Sender-Auth: ChLOJUgVBol-tF4swqeDxey1lx4 Message-ID: Content-Type: multipart/mixed; boundary=20cf303f656a2985db04f7c0a44a X-Spam-Score: -0.7 (/) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.7 (/) --20cf303f656a2985db04f7c0a44a Content-Type: text/plain; charset=ISO-8859-1 On Mon, Apr 21, 2014 at 11:03 AM, Paul Eggert wrote: > On 04/16/2014 05:13 AM, Norihiro Tanaka wrote: >> >> http://bugs.exim.org/show_bug.cgi?id=1468 > > > Thanks. The response there makes it clear that if grep passes arbitrary > binary data to PCRE, and if grep uses PCRE_NO_UTF8_CHECK, undefined behavior > will result (maybe infinite loop, core dump, etc.). We can't have undefined > behavior in grep. A simple fix is to avoid using PCRE_NO_UTF8_CHECK so I > installed the attached patch to do that. Perhaps we can think of a better > way at some point. In the meantime I'm taking the liberty of closing > Bug#17245 and Bug#16586. Thanks for the patch, but I'm not sure I like the consequences: that anyone using grep -P to search data that is even a tiny bit inconsistent with their UTF-8 locale will now get an exit status of 2 rather than the matches they used to get. I would prefer to test for working PCRE support and disable -P if it is deemed inadequate, but that may have to wait for the release of a new version of libpcre. 
In any case, I found that this additional change is required, at least on OS/X, to avoid a test failure: --20cf303f656a2985db04f7c0a44a Content-Type: text/plain; charset=US-ASCII; name="k.txt" Content-Disposition: attachment; filename="k.txt" Content-Transfer-Encoding: base64 X-Attachment-Id: f_hudfg1em1 RnJvbSBiODBhOTU2OTE0MThjZTE5YjQyYjU0YzcwNjYzM2VmOGJlMGJkOWVlIE1vbiBTZXAgMTcg MDA6MDA6MDAgMjAwMQpGcm9tOiBKaW0gTWV5ZXJpbmcgPG1leWVyaW5nQGZiLmNvbT4KRGF0ZTog V2VkLCAyMyBBcHIgMjAxNCAxOToyMToxMSAtMDcwMApTdWJqZWN0OiBbUEFUQ0hdIHRlc3RzOiB1 c2UgY29uc2lzdGVudCBzcGVsbGluZyBmb3IgbG9jYWxlIG5hbWUsIGVuX1VTLlVURi04CgoqIHRl c3RzL3BjcmUtaW5mbG9vcDogU3BlbGwgbG9jYWxlIG5hbWUsIGVuX1VTLlVURi04LCBjb25zaXN0 ZW50bHksCmNvbnZlcnRpbmcgdGhpcyBvbmUgdXNlIGZyb20gImVuX1VTLnV0ZjgiLCB3aGljaCB3 b3VsZCBwcm92b2tlIGEKdGVzdCBmYWlsdXJlIG9uIE9TL1guCi0tLQogdGVzdHMvcGNyZS1pbmZs b29wIHwgMiArLQogMSBmaWxlIGNoYW5nZWQsIDEgaW5zZXJ0aW9uKCspLCAxIGRlbGV0aW9uKC0p CgpkaWZmIC0tZ2l0IGEvdGVzdHMvcGNyZS1pbmZsb29wIGIvdGVzdHMvcGNyZS1pbmZsb29wCmlu ZGV4IGZlYmYzNTYuLjFiMzNlNzIgMTAwNzU1Ci0tLSBhL3Rlc3RzL3BjcmUtaW5mbG9vcAorKysg Yi90ZXN0cy9wY3JlLWluZmxvb3AKQEAgLTI3LDcgKzI3LDcgQEAgcHJpbnRmICdhXDIwMWJccicg PiBpbiB8fCBmcmFtZXdvcmtfZmFpbHVyZV8KCiBmYWlsPTAKCi1MQ19BTEw9ZW5fVVMudXRmOCB0 aW1lb3V0IDMgZ3JlcCAtUCAnYS4/Li5iJyBpbgorTENfQUxMPWVuX1VTLlVURi04IHRpbWVvdXQg MyBncmVwIC1QICdhLj8uLmInIGluCiB0ZXN0ICQ/ID0gMiB8fCBmYWlsXyAibGlicGNyZSdzIG1h dGNoIGZ1bmN0aW9uIGFwcGVhcnMgdG8gaW5mbG9vcCIKCiBFeGl0ICRmYWlsCi0tIAoxLjkuMi40 NTkuZzY4NzczYWMKCg== --20cf303f656a2985db04f7c0a44a-- From unknown Mon Jun 23 04:14:41 2025 X-Loop: help-debbugs@gnu.org Subject: bug#16586: bug#17245: GREP BUG: grep -P and binary files Resent-From: Paul Eggert Original-Sender: "Debbugs-submit" Resent-CC: bug-grep@gnu.org Resent-Date: Thu, 24 Apr 2014 05:40:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 16586 X-GNU-PR-Package: grep X-GNU-PR-Keywords: To: Jim Meyering , 16586@debbugs.gnu.org, Santiago Cc: 17245@debbugs.gnu.org, Norihiro 
Tanaka Received: via spool by 16586-submit@debbugs.gnu.org id=B16586.139831795832347 (code B ref 16586); Thu, 24 Apr 2014 05:40:02 +0000 Received: (at 16586) by debbugs.gnu.org; 24 Apr 2014 05:39:18 +0000 Received: from localhost ([127.0.0.1]:56780 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WdCNS-0008Pf-Al for submit@debbugs.gnu.org; Thu, 24 Apr 2014 01:39:18 -0400 Received: from smtp.cs.ucla.edu ([131.179.128.62]:55463) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WdCNQ-0008PS-8D; Thu, 24 Apr 2014 01:39:17 -0400 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id 3E580A60001; Wed, 23 Apr 2014 22:39:15 -0700 (PDT) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id AyxW2z3NiVhK; Wed, 23 Apr 2014 22:39:10 -0700 (PDT) Received: from [192.168.1.9] (pool-108-0-233-62.lsanca.fios.verizon.net [108.0.233.62]) by smtp.cs.ucla.edu (Postfix) with ESMTPSA id 72ABA39E8011; Wed, 23 Apr 2014 22:39:10 -0700 (PDT) Message-ID: <5358A37E.7000407@cs.ucla.edu> Date: Wed, 23 Apr 2014 22:39:10 -0700 From: Paul Eggert Organization: UCLA Computer Science Department User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.4.0 MIME-Version: 1.0 References: <20140416084844.F895.27F6AC2D@kcn.ne.jp> <20140416211353.F7FD.27F6AC2D@kcn.ne.jp> <53555D5E.3080806@cs.ucla.edu> In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: -3.0 (---) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.0 (---) Jim Meyering wrote: > anyone using grep -P to search data that is even a 
tiny bit > inconsistent with their UTF-8 locale will now get an exit status of > 2 rather than the matches they used to get. Yes, I don't like that either, but says libpcre intends to have undefined behavior here. If so, it wouldn't help to wait until the next libpcre release, which may well have a serious bug of this form in a different area, a bug that's not easy to test for. Perhaps somebody should modify grep -P to discard input lines containing non-UTF-8 data instead of presenting them to libpcre. That way, it would be safe for grep -P to use PCRE_NO_UTF8_CHECK. Although grep -P should report an error and exit with status 2 if it discards input due to encoding errors, it can also report matches in lines that do not contain encoding errors, so that users can see both the error messages and the matches. From unknown Mon Jun 23 04:14:41 2025 X-Loop: help-debbugs@gnu.org Subject: bug#16586: bug#17245: GREP BUG: grep -P and binary files Resent-From: Jim Meyering Original-Sender: "Debbugs-submit" Resent-CC: bug-grep@gnu.org Resent-Date: Thu, 24 Apr 2014 15:30:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 16586 X-GNU-PR-Package: grep X-GNU-PR-Keywords: To: Paul Eggert Cc: 17245 <17245@debbugs.gnu.org>, Santiago , 16586@debbugs.gnu.org, Norihiro Tanaka Received: via spool by 16586-submit@debbugs.gnu.org id=B16586.139835337112278 (code B ref 16586); Thu, 24 Apr 2014 15:30:02 +0000 Received: (at 16586) by debbugs.gnu.org; 24 Apr 2014 15:29:31 +0000 Received: from localhost ([127.0.0.1]:57439 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WdLad-0003Br-8x for submit@debbugs.gnu.org; Thu, 24 Apr 2014 11:29:31 -0400 Received: from mail-yk0-f172.google.com ([209.85.160.172]:42761) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WdLaa-0003Ba-0P; Thu, 24 Apr 2014 11:29:28 -0400 Received: by mail-yk0-f172.google.com with SMTP id q9so1433671ykb.3 for ; Thu, 24 Apr 2014 08:29:27
-0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:from:date:message-id :subject:to:cc:content-type; bh=AkSi9hh4YbDUxJPygwmwQU8iAWPto3CmkabQ9Mr9+GM=; b=Cx5KyIJdMrj8tyPGCuz4/VuMb/xhBOq3DEseygtzp2oeh6Pk/lanFj91FWgeqiXGFE 8BpfHpMGb5LCv4YBvTShE/LYC+PDQYnjfomqUJLsOLFsAIAR3F84hrO8YXFlHwJiXerd YhFcPwADyg2cc14z1RZxO2Ponist6witS02aRNXjJbBUra0GwoWtQJaNcDn6bITYQ24A BRC6LFyXwWbZy0BiWEa1kebACfXl+mM4dJMCBXIo5wJtXyvB4QOrlSWrYgqByw4z7vNy RwxUhAlQDkmmE451ZpsRSAMNw5mqbQJJ+txQrh9xzydZ966e9mOtPi4bFkcCMEM2v0U8 zRnQ== X-Received: by 10.236.125.12 with SMTP id y12mr3511089yhh.42.1398353367363; Thu, 24 Apr 2014 08:29:27 -0700 (PDT) MIME-Version: 1.0 Received: by 10.170.149.193 with HTTP; Thu, 24 Apr 2014 08:29:07 -0700 (PDT) In-Reply-To: <5358A37E.7000407@cs.ucla.edu> References: <20140416084844.F895.27F6AC2D@kcn.ne.jp> <20140416211353.F7FD.27F6AC2D@kcn.ne.jp> <53555D5E.3080806@cs.ucla.edu> <5358A37E.7000407@cs.ucla.edu> From: Jim Meyering Date: Thu, 24 Apr 2014 08:29:07 -0700 X-Google-Sender-Auth: PUdNlZsz10IAJjRLyDEQwcBl-Ww Message-ID: Content-Type: text/plain; charset=ISO-8859-1 X-Spam-Score: -0.7 (/) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.7 (/) On Wed, Apr 23, 2014 at 10:39 PM, Paul Eggert wrote: > Jim Meyering wrote: >> >> anyone using grep -P to search data that is even a tiny bit >> inconsistent with their UTF-8 locale will now get an exit status of >> 2 rather than the matches they used to get. > > > Yes, I don't like that either, but says libpcre Oh! I had not read that. That is disappointing. > intends to have undefined behavior here. 
If so, it wouldn't help to wait > until the next libpcre release, which may well have a serious bug of this > form in a different area, a bug that's not easy to test for. Indeed. > Perhaps somebody should modify grep -P to discard input lines containing > non-UTF-8 data instead of presenting them to libpcre. That way, it would be > safe for grep -P to use PCRE_NO_UTF8_CHECK. Although grep -P should report > an error and exit with status 2 if it discards input due to encoding errors, > it can also report matches in lines that do not contain encoding errors, so > that users can see both the error messages and the matches. That sounds reasonable, but I don't like the requirement that one make two passes over each subject text.
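
For illustration, the discard-invalid-lines idea proposed above could be sketched roughly as follows. This is not code from grep or from the installed patch: valid_utf8, line_matches, and search_valid_lines are hypothetical names, and line_matches is a plain substring search standing in for a call to pcre_exec with PCRE_NO_UTF8_CHECK. The sketch assumes newline-separated lines and the exit-status convention described in the thread (2 if any line was discarded, otherwise 0 on a match and 1 on no match).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if the N bytes at S are well-formed UTF-8 (RFC 3629):
   no overlong forms, no surrogates, nothing above U+10FFFF.  */
static bool
valid_utf8 (unsigned char const *s, size_t n)
{
  size_t i = 0;
  while (i < n)
    {
      unsigned char c = s[i];
      size_t len;
      /* Allowed range of the second byte; narrowed for some lead bytes.  */
      unsigned char lo = 0x80, hi = 0xBF;

      if (c < 0x80)
        {
          i++;                  /* ASCII */
          continue;
        }
      else if (0xC2 <= c && c <= 0xDF)
        len = 2;
      else if (0xE0 <= c && c <= 0xEF)
        {
          len = 3;
          if (c == 0xE0) lo = 0xA0;     /* reject overlong 3-byte forms */
          if (c == 0xED) hi = 0x9F;     /* reject UTF-16 surrogates */
        }
      else if (0xF0 <= c && c <= 0xF4)
        {
          len = 4;
          if (c == 0xF0) lo = 0x90;     /* reject overlong 4-byte forms */
          if (c == 0xF4) hi = 0x8F;     /* reject values above U+10FFFF */
        }
      else
        return false;   /* stray continuation byte, 0xC0, 0xC1, 0xF5..0xFF */

      if (n - i < len)
        return false;                   /* sequence truncated at end of line */
      if (s[i + 1] < lo || hi < s[i + 1])
        return false;
      for (size_t k = 2; k < len; k++)
        if ((s[i + k] & 0xC0) != 0x80)
          return false;
      i += len;
    }
  return true;
}

/* Hypothetical stand-in for the real matcher.  In grep this role would be
   played by pcre_exec; here it is only a substring search.  */
static bool
line_matches (char const *line, size_t len, char const *needle)
{
  size_t nlen = strlen (needle);
  for (size_t i = 0; i + nlen <= len; i++)
    if (memcmp (line + i, needle, nlen) == 0)
      return true;
  return false;
}

/* Scan BUF line by line.  Lines that are not valid UTF-8 are counted in
   *SKIPPED and never reach the matcher; matching valid lines are counted
   in *MATCHES.  Return 2 if any line was skipped, else 0 if something
   matched, else 1.  */
static int
search_valid_lines (char const *buf, size_t size, char const *needle,
                    size_t *matches, size_t *skipped)
{
  assert (needle != NULL);
  *matches = *skipped = 0;
  char const *p = buf;
  char const *lim = buf + size;
  while (p < lim)
    {
      char const *eol = memchr (p, '\n', (size_t) (lim - p));
      size_t len = eol ? (size_t) (eol - p) : (size_t) (lim - p);
      if (!valid_utf8 ((unsigned char const *) p, len))
        ++*skipped;                     /* first pass rejected the line */
      else if (line_matches (p, len, needle))
        ++*matches;                     /* second pass: the actual match */
      p += len + (eol != NULL);
    }
  return *skipped ? 2 : *matches ? 0 : 1;
}
```

With the input used by tests/pcre-invalid-utf8-input ('j\202\nj\n') and the needle "j", the first line is skipped and the second is counted as a match, so the function returns 2. Each valid line is still scanned twice, once to validate and once to match, which is the cost raised in the reply above.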