From unknown Sun Aug 10 09:14:47 2025
X-Loop: help-debbugs@gnu.org
Subject: bug#60621: grep -P does not set PCRE2_UCP
Resent-From: Karl Pettersson
Original-Sender: "Debbugs-submit"
Resent-CC: bug-grep@gnu.org
Resent-Date: Sat, 07 Jan 2023 07:38:03 +0000
Resent-Sender: help-debbugs@gnu.org
X-GNU-PR-Message: report 60621
X-GNU-PR-Package: grep
To: 60621@debbugs.gnu.org
X-Debbugs-Original-To: bug-grep
Date: Fri, 6 Jan 2023 21:41:53 +0100
From: Karl Pettersson
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit

Hi

Using grep -P for boundary matches yields incorrect results with
non-ASCII letters:

    $ echo 'Öst' | grep -P '\bs'
    Öst

The output should be nothing in this case, and the culprit seems to be
this line in pcresearch.c:

    flags |= PCRE2_UTF;

If the PCRE2_UCP flag is added as follows, the program behaves
correctly:

    flags |= PCRE2_UTF|PCRE2_UCP;

The pcre2grep test program in pcre2 has the same problem, and I filed an issue there
too: https://github.com/PCRE2Project/pcre2/issues/185

A Twitter discussion with more examples:
https://twitter.com/gro_tsen/status/1610972356972875777

Kind regards
--
Karl Pettersson
Uppsala, Sverige/Sweden
https://static-dust.klpn.se/

From unknown Sun Aug 10 09:14:47 2025
X-Loop: help-debbugs@gnu.org
Subject: bug#60621: Duplicate of #60618
Resent-From: Karl Pettersson
Original-Sender: "Debbugs-submit"
Resent-CC: bug-grep@gnu.org
Resent-Date: Sat, 07 Jan 2023 09:15:02 +0000
Resent-Sender: help-debbugs@gnu.org
X-GNU-PR-Message: followup 60621
X-GNU-PR-Package: grep
To: bug#60621 <60621@debbugs.gnu.org>
Date: Sat, 7 Jan 2023 10:14:22 +0100
From: Karl Pettersson
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

Hi

I first filed the original issue for pcre2grep after a Twitter
discussion, and then also sent it to the bug-grep list. Carlo Arenas
had already noticed it (although it had not yet been registered, from
what I could see), so this report is a duplicate of #60618.
--
Karl Pettersson
Uppsala, Sverige/Sweden
https://static-dust.klpn.se/

From debbugs-submit-bounces@debbugs.gnu.org Sat Jan 07 17:55:38 2023
Received: (at control) by debbugs.gnu.org; 7 Jan 2023 22:55:38 +0000
Date: Sat, 7 Jan 2023 14:54:53 -0800
MIME-Version: 1.0
To: control@debbugs.gnu.org
From: Paul Eggert
Subject: merge 60618 60621
Organization: UCLA Computer Science Department
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit

merge 60618 60621
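
The word-boundary behavior described in the original report can be
illustrated outside grep. What follows is only a minimal sketch using
Python's standard re module, which is a different engine from PCRE2 and
serves here purely as an analogy. The assumption is that re.ASCII, which
restricts \b and \w to ASCII word characters, behaves roughly like a
PCRE2 pattern compiled with PCRE2_UTF alone, and that Python's default
Unicode matching plays the role of PCRE2_UTF|PCRE2_UCP. The helper name
has_match is hypothetical and does not come from grep or pcre2.

```python
import re

SUBJECT = "Öst"     # the input from the report
PATTERN = r"\bs"    # an "s" that starts at a word boundary


def has_match(pattern: str, text: str, ascii_only: bool) -> bool:
    """Return True if pattern matches anywhere in text.

    ascii_only=True uses re.ASCII, so only [A-Za-z0-9_] count as word
    characters (assumed analogue of PCRE2_UTF without PCRE2_UCP).
    ascii_only=False uses Unicode word characters (assumed analogue of
    PCRE2_UTF|PCRE2_UCP).
    """
    flags = re.ASCII if ascii_only else 0
    return re.search(pattern, text, flags) is not None


# With an ASCII-only word class, "Ö" is not a word character, so a
# boundary appears between "Ö" and "s" and the pattern matches.
ascii_result = has_match(PATTERN, SUBJECT, ascii_only=True)

# With a Unicode-aware word class, "Ö" is a letter, so "s" sits inside
# the word and there is no boundary in front of it.
unicode_result = has_match(PATTERN, SUBJECT, ascii_only=False)

print("ASCII-only word class matches:", ascii_result)
print("Unicode-aware word class matches:", unicode_result)
```

In this sketch the ASCII-only case corresponds to the unexpected "Öst"
output in the report, and the Unicode-aware case corresponds to the
empty output the reporter expected.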