From debbugs-submit-bounces@debbugs.gnu.org Thu Oct 28 12:48:05 2021
From: Frances Wingerter
Date: Thu, 28 Oct 2021 15:25:42 +0000
Subject: sed bug: ASCII NUL not handled in simple pattern
To: bug-sed@gnu.org

I'm using sed 4.8 (`sed (GNU sed) 4.8` per `sed --version`) on x86_64
Arch Linux.

Compare the output of these two sed invocations:
```
$ echo -e 'a\nb\n\0\nc\n' | sed -e '/\0/,$d'
a
b

c

```
and
```
$ echo -e 'a\nb\n\v\nc\n' | sed -e '/\v/,$d'
a
b
```
The latter is the expected behavior, but when input and pattern use
`\0`, sed seems to miss the matches and never triggers.

Hopefully this should be an easy fix.

Thanks,
Frances

From debbugs-submit-bounces@debbugs.gnu.org Thu Oct 28 13:32:21 2021
Date: Thu, 28 Oct 2021 19:32:02 +0200
From: Davide Brini
To: bug-sed@gnu.org
Subject: Re: bug#51462: sed bug: ASCII NUL not handled in simple pattern
Message-ID: <20211028193202.7ff97b30@swedishchef>

On Thu, 28 Oct 2021 15:25:42 +0000, Frances Wingerter wrote:

> I'm using sed 4.8 (`sed (GNU sed) 4.8` per `sed --version`) on x86_64
> Arch Linux.
>
> Compare the output of these two sed invocations:
> ```
> $ echo -e 'a\nb\n\0\nc\n' | sed -e '/\0/,$d'
> a
> b
>
> c
>

This works

$ echo -ne 'a\nb\n\0\nc\n' | sed -e '/\d000/,$d'

(\o000, \x00 also work). All documented here:
https://www.gnu.org/software/sed/manual/sed.html#Escapes

Whether sed maintainers want to also allow the \0 syntax, up to them of
course.

-- 
D.

From debbugs-submit-bounces@debbugs.gnu.org Sat Oct 30 03:11:46 2021
Subject: Re: bug#51462: sed bug: ASCII NUL not handled in simple pattern
To: Davide Brini, 51462@debbugs.gnu.org, Frances Wingerter, Eric Blake
References: <20211028193202.7ff97b30@swedishchef>
From: Assaf Gordon
Message-ID: <967e3a27-0487-51c1-7eaf-8b9361373f90@gmail.com>
Date: Sat, 30 Oct 2021 01:11:35 -0600
In-Reply-To: <20211028193202.7ff97b30@swedishchef>

(Adding Eric Blake for POSIX opinion)

Hello,

On 2021-10-28 11:32 a.m., Davide Brini wrote:
> On Thu, 28 Oct 2021 15:25:42 +0000, Frances Wingerter
> wrote:
>>
>> Compare the output of these two sed invocations:
>> ```
>> $ echo -e 'a\nb\n\0\nc\n' | sed -e '/\0/,$d'
>>
> $ echo -ne 'a\nb\n\0\nc\n' | sed -e '/\d000/,$d'
>
> (\o000, \x00 also work). All documented here:
> https://www.gnu.org/software/sed/manual/sed.html#Escapes
>
> Whether sed maintainers want to also allow the \0 syntax, up to them of
> course.

Thanks Davide for the reply.

In GNU sed, "\0" in the replacement part acts identically to "&" -
referencing the whole matched portion.
This is the implemented behavior (though undocumented?) since GNU sed
version 3, released in December 1995 - so not likely to be changed.

For comparison, in BSDs "\0" acts as literal zero (ASCII 48).

Interestingly, POSIX defines a "BACKREF" as:
    [...]
    The character string consisting of a <backslash> character followed
    by a single-digit numeral, '1' to '9'.
( from:
https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_05
)

And so one could argue that this is a GNU extension that should be
disabled when used with "sed --posix".

I think we should keep "\0" undocumented to prevent proliferation of
this non-standard behavior.

regards,
  - assaf
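
For anyone who wants to try the two points from this thread, here is a
minimal sketch, assuming GNU sed (4.8 in the original report) and a
POSIX shell. The helper function `input` and the `out_*` variable names
are made up for illustration. It spells the NUL byte with the escapes
Davide lists (\x00, \o000, \d000) and then compares "&" with "\0" in a
replacement, as Assaf describes. Other sed implementations (e.g. the
BSDs) may well behave differently.

```shell
# Hypothetical helper: four lines, the third holding a single NUL byte.
input() { printf 'a\nb\n\0\nc\n'; }

# The documented GNU escapes put a NUL byte into the address regex, so
# the range /NUL/,$ deletes from the NUL line through the end of input.
out_hex=$(input | sed -e '/\x00/,$d')
out_oct=$(input | sed -e '/\o000/,$d')
out_dec=$(input | sed -e '/\d000/,$d')
printf '%s\n' "$out_hex"

# In the replacement part, GNU sed treats \0 like &: the whole match.
out_amp=$(echo 'hello' | sed -e 's/l*o/[&]/')
out_zero=$(echo 'hello' | sed -e 's/l*o/[\0]/')
printf '%s\n%s\n' "$out_amp" "$out_zero"
```

With GNU sed, the three address forms should all leave only the lines
"a" and "b", and both substitutions should print the same line.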