From unknown Wed Aug 20 02:46:45 2025
From: bug#29909 <29909@debbugs.gnu.org>
To: bug#29909 <29909@debbugs.gnu.org>
Subject: Status: non-greedy matching (RE2)
Reply-To: bug#29909 <29909@debbugs.gnu.org>
Date: Wed, 20 Aug 2025 09:46:45 +0000

retitle 29909 non-greedy matching (RE2)
reassign 29909 sed
submitter 29909 Shawn Landden
severity 29909 wishlist
thanks

From debbugs-submit-bounces@debbugs.gnu.org Sat Dec 30 12:17:21 2017
From: Shawn Landden
Date: Sat, 30 Dec 2017 05:01:12 -0800
Subject: non-greedy matching (RE2)
To: bug-sed@gnu.org

It is well known that sed lacks non-greedy regular expression matches.
This means that sed can only match a subset of regular languages[1].
Would a proper patch to add re2 support[2], so that sed implements ALL
regular languages correctly, in O(n) time, be considered?

Thanks,
Shawn Landden

[1] https://en.wikipedia.org/wiki/Regular_language#Location_in_the_Chomsky_hierarchy
And because that link isn't very good, 28c3: The Science of Insecurity
https://www.youtube.com/watch?v=3kEfedtQVOY
[2] https://github.com/google/re2

From debbugs-submit-bounces@debbugs.gnu.org Sat Dec 30 18:55:22 2017
Subject: Re: bug#29909: non-greedy matching (RE2)
To: Shawn Landden, 29909-done@debbugs.gnu.org
From: Assaf Gordon
Date: Sat, 30 Dec 2017 16:55:11 -0700

severity 29909 wishlist
stop

Hello Shawn,

On 2017-12-30 06:01 AM, Shawn Landden wrote:
> It is well known that sed lacks non-greedy regular expression matches.
> This means that sed can only match a subset of regular languages[1].
> Would a proper patch to add re2 support[2], so that sed implements ALL
> regular languages correctly, in O(n) time, be considered?
>
> [2] https://github.com/google/re2

First, a working patch is worth 1000 emails :)
If you already have something working, that will go a long way
towards considering this feature.

However, from a cursory look, I would say using RE2 in GNU sed is not
likely. RE2 is a C++ library, and while there is a C wrapper for it,
it will make compiling GNU sed much more complicated than it is today.

It could be added as an optional dependency, but GNU sed is included
in many "minimal" installations, and those will likely opt not to add
additional libraries to their minimal setup, so by default most users
won't benefit from RE2 at all.

There was an attempt to add PCRE support for GNU sed (which has been
shelved for now). PCRE is much more commonly available than RE2, and
if any effort is made in this direction, I would think focusing on
reviving the PCRE patch would be more effective.

As such, I'm marking this ticket as a "wishlist" item and closing it,
but discussion can continue by replying to this thread.

regards,
 - assaf

From unknown Wed Aug 20 02:46:45 2025
Received: (at fakecontrol) by fakecontrolmessage;
To: internal_control@debbugs.gnu.org
From: Debbugs Internal Request
Subject: Internal Control
Message-Id: bug archived.
Date: Sun, 28 Jan 2018 12:24:03 +0000
User-Agent: Fakemail v42.6.9

# This is a fake control message.
#
# The action:
# bug archived.
thanks
# This fakemail brought to you by your local debbugs
# administrator
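
The greedy versus non-greedy distinction raised in the original report
can be illustrated with a minimal sketch. Python is used here only for
illustration: its re module is a backtracking engine and is neither
RE2 nor the matcher used by sed. The sample string and variable names
are hypothetical. The third pattern uses a negated bracket expression,
a common way to get a shortest match up to a single-character
delimiter in POSIX syntax; a comparable sed command would be
sed 's/<[^>]*>//g'.

```python
import re

# Hypothetical sample input containing two short tags.
text = "<b>bold</b> and <i>italic</i>"

# Greedy: ".*" extends as far as possible, so a single match runs from
# the first "<" to the last ">" and the whole line is removed.
greedy = re.sub(r"<.*>", "", text)

# Non-greedy (Perl-style syntax, not part of POSIX BRE/ERE): ".*?"
# stops at the first ">" it can, so each tag is matched separately.
lazy = re.sub(r"<.*?>", "", text)

# Negated bracket expression: expressible in POSIX syntax and gives the
# same result as the non-greedy pattern for this input.
negated = re.sub(r"<[^>]*>", "", text)

print(repr(greedy))
print(repr(lazy))
print(repr(negated))
```

The negated-class form only covers single-character delimiters, which
may be part of why a non-greedy operator is requested; multi-character
delimiters need a different workaround.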