From debbugs-submit-bounces@debbugs.gnu.org Thu Mar 26 11:30:27 2020
From: Oğuz
Date: Thu, 26 Mar 2020 07:30:16 +0200
Subject: n as delimiter alias
To: bug-sed@gnu.org

$ sed --version
sed (GNU sed) 4.7
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Jay Fenlason, Tom Lord, Ken Pizzini,
Paolo Bonzini, Jim Meyering, and Assaf Gordon.
GNU sed home page: <https://www.gnu.org/software/sed/>.
General help using GNU software: <https://www.gnu.org/gethelp/>.
E-mail bug reports to: <bug-sed@gnu.org>.

While '\t' matches a literal 't' when 't' is the delimiter, '\n' does not
match 'n' when 'n' is the delimiter. See:

$ echo t | sed 'st\ttt' | xxd
00000000: 0a                                       .
$
$ echo n | sed 'sn\nnn' | xxd
00000000: 6e0a

Is this a bug or is there a sound logic behind this?

--
Oğuz

From debbugs-submit-bounces@debbugs.gnu.org Tue Mar 31 00:42:21 2020
Subject: Re: bug#40242: n as delimiter alias
To: Oğuz, 40242@debbugs.gnu.org
From: Assaf Gordon
Date: Mon, 30 Mar 2020 22:42:09 -0600

tags 40242 confirmed
stop

Hello,

On 2020-03-25 11:30 p.m., Oğuz wrote:
> While '\t' matches a literal 't' when 't' is the delimiter, '\n' does not
> match 'n' when 'n' is the delimiter. See:
>
> $ echo t | sed 'st\ttt' | xxd
> 00000000: 0a .
> $
> $ echo n | sed 'sn\nnn' | xxd
> 00000000: 6e0a
>
> Is this a bug or is there a sound logic behind this?

Thank you for finding this interesting edge-case.

I think it is a (very old) bug. I'm not sure about its origin,
perhaps Jim or Paolo can comment.

First,
let's start with what's expected (slightly modifying your examples):

The canonical usage, here "\t" becomes a TAB, and "t" is not replaced:

   $ printf t | sed 's/\t//' | od -a -An
      t

Then, using a different character "q" instead of "/", works the same:

   $ printf t | sed 'sq\tqq' | od -a -An
      t

The sed manual says (in section "3.3 The s command"):
      "
      The / characters may be uniformly replaced by any other single
      character within any given s command.

      The / character (or whatever other character is used in its
      stead) can appear in the regexp or replacement only if it is
      preceded by a \ character.
      "

This is the reason "\t" represents a regular "t" (not TAB)
*if* the substitute command's delimiter is "t" as well:

      $ printf t | sed 'st\ttt' | od -a -An
      [no output, as expected]

And similarly for other characters:

      printf x | sed 'sx\xxx' | od -a -An
      printf a | sed 'sa\aaa' | od -a -An
      printf z | sed 'sz\zzz' | od -a -An
      [no output, as expected]

---

Second,
the "\n" case behaves differently, regardless of which
separator is used. It is always treated as "\n" (new line),
never literal "n", even if the separator is "n":

These are correct, as expected:
    $ printf n | sed 's/\n//' | od -a -An
       n
    $ printf n | sed 's/\n//' | od -a -An
       n
    $ printf n | sed 'sx\nxx' | od -a -An
       n

Here, we'd expect "\n" to be treated as a literal "n" character,
not "\n", but it is not (as you've found):

    $ printf n | sed 'sn\nnn' | od -a -An
       n

----

In the code, the "match_slash" function [1] is used to find
the delimiters of the "s" command (typically "slashes").
Special handling happens if a slash is found [2],
and in lines 557-8 there's this conditional:

              else if (ch == 'n' && regex)
                ch = '\n';

Which forces any "\n" to be a new-line, regardless if the
delimiter itself was an "n".

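A minimal sketch of the scanning rule described above, assuming a simplified standalone helper. The name `scan_regex` and its signature are hypothetical, and the code is only a model: it leaves out everything else the real "match_slash" does.

```c
#include <stddef.h>

/* Hypothetical, simplified model of the delimiter scan; NOT sed's code.
   Copies the pattern text that precedes the first unescaped `delim`
   from `src` into `out` and returns how many characters of `src` were
   consumed (the closing delimiter is not counted).
   `out` must have room for strlen (src) + 1 bytes.  */
size_t
scan_regex (const char *src, char delim, char *out)
{
  size_t i = 0, o = 0;

  while (src[i] != '\0' && src[i] != delim)
    {
      char ch = src[i++];
      if (ch == '\\' && src[i] != '\0')
        {
          ch = src[i++];
          if (ch == 'n')
            ch = '\n';          /* special case: applies even when delim is 'n' */
          else if (ch != '\n' && ch != delim)
            out[o++] = '\\';    /* keep the backslash for the regex engine */
          /* otherwise: an escaped delimiter is stored as the bare character */
        }
      out[o++] = ch;
    }
  out[o] = '\0';
  return i;
}
```

Under this model, `scan_regex ("\\ttt", 't', out)` stores a plain `t`, while `scan_regex ("\\nnn", 'n', out)` stores a newline, which mirrors the two different results in the report.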
[1] https://git.savannah.gnu.org/cgit/sed.git/tree/sed/compile.c#n531
[2] https://git.savannah.gnu.org/cgit/sed.git/tree/sed/compile.c#n552

In older sed versions, these two lines were protected by
"#ifndef REG_PERL" [3] so perhaps it had something to do with regex
variants. But the origin of this line predates the git history.
Jim/Paolo - any ideas what this relates to?

[3] https://git.savannah.gnu.org/cgit/sed.git/tree/sed/compile.c?id=41a169a9a14b5bdc736313eb411f02bcbe1c046d#n551

---

Interestingly, removing these two lines does not cause
any test failures, so this might be easy to fix without causing
any regressions.


For now I'm leaving this item open until we decide how to deal with it.

regards,
 - assaf

From debbugs-submit-bounces@debbugs.gnu.org Tue Mar 31 00:47:12 2020
Subject: Re: bug#40239: Bug in how \cregexpc is handled
To: Enrico Maria De Angelis, 40239@debbugs.gnu.org
From: Assaf Gordon
Date: Mon, 30 Mar 2020 22:47:01 -0600

merge 40239 40242
stop

Hello,

On 2020-03-26 8:18 a.m., Enrico Maria De Angelis wrote:
[...]
> This means that using n in \nregexpn prevents the use of the literal n
> in the regexp.
>
> The issue has come to light in this StackOverflow
> question.

Thank you for the report.
The original poster (Oguz Ismail) sent a similar issue,
please see the reply there:
http://debbugs.gnu.org/40242

regards,
 - assaf

From debbugs-submit-bounces@debbugs.gnu.org Tue Mar 31 00:47:44 2020
To: control@debbugs.gnu.org
From: Assaf Gordon
Message-ID: <35b24d39-77b1-f352-cf07-03127079ae76@gmail.com>
Date: Mon, 30 Mar 2020 22:47:33 -0600

tag 40242 confirmed

From debbugs-submit-bounces@debbugs.gnu.org Tue Mar 31 03:36:44 2020
From: Oğuz
Date: Tue, 31 Mar 2020 10:00:02 +0300
Subject: Re: bug#40242: n as delimiter alias
To: Assaf Gordon
Cc: "40242@debbugs.gnu.org" <40242@debbugs.gnu.org>

Thanks for the reply. This might not be a bug though; I sent a similar mail
(https://www.mail-archive.com/austin-group-l@opengroup.org/msg05881.html)
to the Austin Group mailing list asking what's the expected behavior in this
case, and I was told (
https://www.mail-archive.com/austin-group-l@opengroup.org/msg05891.html)
both behaviors -yielding n or empty line- are correct and the standard should
*probably* be amended to explicitly state that this is unspecified. And
apparently (
https://www.mail-archive.com/austin-group-l@opengroup.org/msg05893.html)
some other UNIXes adopted the same practice as GNU sed (or vice versa, I
don't know which one is older).

Regards

On Tuesday, 31 March 2020, Assaf Gordon wrote:

> tags 40242 confirmed
> stop
>
> Hello,
>
> On 2020-03-25 11:30 p.m., Oğuz wrote:
>
>> While '\t' matches a literal 't' when 't' is the delimiter, '\n' does not
>> match 'n' when 'n' is the delimiter. See:
>>
>> $ echo t | sed 'st\ttt' | xxd
>> 00000000: 0a .
>> $
>> $ echo n | sed 'sn\nnn' | xxd
>> 00000000: 6e0a
>>
>> Is this a bug or is there a sound logic behind this?
>>
>
> Thank you for finding this interesting edge-case.
>
> I think it is a (very old) bug. I'm not sure about its origin,
> perhaps Jim or Paolo can comment.
>
> First,
> let's start with what's expected (slightly modifying your examples):
>
> The canonical usage, here "\t" becomes a TAB, and "t" is not replaced:
>
>    $ printf t | sed 's/\t//' | od -a -An
>       t
>
> Then, using a different character "q" instead of "/", works the same:
>
>    $ printf t | sed 'sq\tqq' | od -a -An
>       t
>
> The sed manual says (in section "3.3 The s command"):
>       "
>       The / characters may be uniformly replaced by any other single
>       character within any given s command.
>
>       The / character (or whatever other character is used in its
>       stead) can appear in the regexp or replacement only if it is
>       preceded by a \ character.
>       "
>
> This is the reason "\t" represents a regular "t" (not TAB)
> *if* the substitute command's delimiter is "t" as well:
>
>       $ printf t | sed 'st\ttt' | od -a -An
>       [no output, as expected]
>
> And similarly for other characters:
>
>       printf x | sed 'sx\xxx' | od -a -An
>       printf a | sed 'sa\aaa' | od -a -An
>       printf z | sed 'sz\zzz' | od -a -An
>       [no output, as expected]
>
> ---
>
> Second,
> the "\n" case behaves differently, regardless of which
> separator is used. It is always treated as "\n" (new line),
> never literal "n", even if the separator is "n":
>
> These are correct, as expected:
>     $ printf n | sed 's/\n//' | od -a -An
>        n
>     $ printf n | sed 's/\n//' | od -a -An
>        n
>     $ printf n | sed 'sx\nxx' | od -a -An
>        n
>
> Here, we'd expect "\n" to be treated as a literal "n" character,
> not "\n", but it is not (as you've found):
>
>     $ printf n | sed 'sn\nnn' | od -a -An
>        n
>
> ----
>
> In the code, the "match_slash" function [1] is used to find
> the delimiters of the "s" command (typically "slashes").
> Special handling happens if a slash is found [2],
> and in lines 557-8 there's this conditional:
>
>               else if (ch == 'n' && regex)
>                 ch = '\n';
>
> Which forces any "\n" to be a new-line, regardless if the
> delimiter itself was an "n".
>
> [1] https://git.savannah.gnu.org/cgit/sed.git/tree/sed/compile.c#n531
> [2] https://git.savannah.gnu.org/cgit/sed.git/tree/sed/compile.c#n552
>
> In older sed versions, these two lines were protected by
> "#ifndef REG_PERL" [3] so perhaps it had something to do with regex
> variants. But the origin of this line predates the git history.
> Jim/Paolo - any ideas what this relates to?
>
> [3] https://git.savannah.gnu.org/cgit/sed.git/tree/sed/compile.c?id=41a169a9a14b5bdc736313eb411f02bcbe1c046d#n551
>
> ---
>
> Interestingly, removing these two lines does not cause
> any test failures, so this might be easy to fix without causing
> any regressions.
>
>
> For now I'm leaving this item open until we decide how to deal with it.
>
> regards,
>  - assaf

--
Oğuz

--000000000000e6347b05a2211dea-- From debbugs-submit-bounces@debbugs.gnu.org Tue Mar 31 09:26:14 2020 Received: (at 40242) by debbugs.gnu.org; 31 Mar 2020 13:26:14 +0000 Received: from localhost ([127.0.0.1]:35757 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1jJGuE-0003Q3-9s for submit@debbugs.gnu.org; Tue, 31 Mar 2020 09:26:14 -0400 Received: from us-smtp-2.mimecast.com ([207.211.31.81]:40598 helo=us-smtp-delivery-1.mimecast.com) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1jJGuD-0003Pm-8R for 40242@debbugs.gnu.org; Tue, 31 Mar 2020 09:26:13 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1585661167; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=R991Df435mbXB17YJD8qM6Bynw1uHTY6CGemcOh5JIA=; b=C14cOE1PTeMu5+KEyA80AR+XA/hnCMjFEqDP0P3LVMNQTFuKJp7yFel9q88P3rKGzmIPKj lXWB4VZ3m7Wu+EDzva1D4I/VVFinKf93ABWlEKQksePISfNhJpS52Cl8lZmT6QF5MKvisN BnfNUDjMWPBh6Vvsrdc3m8elTsq+tT0= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-41-_quPDjL2MuO9D0Cn22rEEg-1; Tue, 31 Mar 2020 09:26:03 -0400 X-MC-Unique: _quPDjL2MuO9D0Cn22rEEg-1 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.phx2.redhat.com [10.5.11.16]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id A29238010F3; Tue, 31 Mar 2020 13:26:02 +0000 (UTC) Received: from [10.3.113.246] (ovpn-113-246.phx2.redhat.com [10.3.113.246]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 575CB5C1C5; Tue, 31 Mar 2020 13:26:02 +0000 (UTC) Subject: Re: bug#40242: n as delimiter alias To: =?UTF-8?B?T8SfdXo=?= , Assaf Gordon References: From: Eric Blake 
Organization: Red Hat, Inc. Message-ID: <7b29c654-5a69-f32b-5898-dc1bb7810c67@redhat.com> Date: Tue, 31 Mar 2020 08:26:01 -0500 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.6.0 MIME-Version: 1.0 In-Reply-To: Content-Language: en-US X-Scanned-By: MIMEDefang 2.79 on 10.5.11.16 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: quoted-printable X-Spam-Score: 0.0 (/) X-Debbugs-Envelope-To: 40242 Cc: "40242@debbugs.gnu.org" <40242@debbugs.gnu.org> X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-)

On 3/31/20 2:00 AM, Oğuz wrote:
> Thanks for the reply. This might not be a bug though; I sent a similar mail
> (https://www.mail-archive.com/austin-group-l@opengroup.org/msg05881.html)
> to Austin Group mailing list asking what's the expected behavior in this
> case, and I was told (
> https://www.mail-archive.com/austin-group-l@opengroup.org/msg05891.html)
> both behaviors -yielding n or empty line- are correct and standard should
> *probably* be amended to explicitly state that this is unspecified. And
> apparently (
> https://www.mail-archive.com/austin-group-l@opengroup.org/msg05893.html)
> some other UNIXes adopted the same practice as GNU sed (or vice versa, I
> don't know which one is older).

The POSIX folks will probably declare that use of a \X sequence (for
arbitrary X; 'n', 't', '1', and probably others all fit this category)
inside a regex delimited by X is unspecified behavior. But that still
doesn't stop us from fixing GNU sed to at least be consistent - we
should either blindly declare that \X represents the special meaning of
X when such a meaning is present regardless of X also being the regex
delimiter (our current \n behavior - no way to represent the delimiter
as a literal match), or that use of X as a delimiter renders the special
meaning of \X useless for that regex (our \t behavior - no way to
represent the special behavior as part of the match). My personal
preference is making things consistent to our \t behavior.

>> In the code, the "match_slash" function [1] is used to find
>> the delimiters of the "s" command (typically "slashes").
>> Special handling happens if a slash is found [2],
>> And in lines 557-8 there's this conditional:
>>
>>     else if (ch == 'n' && regex)
>>       ch = '\n';
>>
>> Which forces any "\n" to be a new-line, regardless if the
>> delimiter itself was an "n".
>>
>> Interestingly, removing these two lines does not cause
>> any test failures, so this might be easy to fix without causing
>> any regressions.
>>
>>
>> For now I'm leaving this item open until we decide how to deal with it.

I'm thus in favor of removing that special-case of 'n'.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
+1-919-301-3226 Virtualization: qemu.org | libvirt.org From debbugs-submit-bounces@debbugs.gnu.org Mon Oct 24 02:25:27 2022 Received: (at 40242-done) by debbugs.gnu.org; 24 Oct 2022 06:25:27 +0000 Received: from localhost ([127.0.0.1]:46842 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1omqtq-0003b2-S3 for submit@debbugs.gnu.org; Mon, 24 Oct 2022 02:25:27 -0400 Received: from mail-lf1-f50.google.com ([209.85.167.50]:34367) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1omqtn-0003aj-JO for 40242-done@debbugs.gnu.org; Mon, 24 Oct 2022 02:25:25 -0400 Received: by mail-lf1-f50.google.com with SMTP id a29so15175312lfo.1 for <40242-done@debbugs.gnu.org>; Sun, 23 Oct 2022 23:25:23 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=TpDuf14dyBq8GtAUWug7qWIx4biUl76bQaex+8UYc3Q=; b=aOgmCf2Onqwq3B3b+/DcRWTw5B9mlZJ83+U70txE3qVYGruYrW92yPKuWvCRlQCKjr xTsXsY4vnccTxodS1wyg2AogqTh1/LhcRrZxmNltU9DxKgJXdN4iMGK4MVBaHJjQfHsX 8eU1Z90LeyGUyJz1wagMfwLth52iZbYfs+QEXVC1YSaYKSuBjDn3/octGmAFcKYMrCfj WhnnnjLd/xaDN9bPFq3D8haimU2F0VU83zGVg9vfIVeOquj0Z0s20nkqoGlJauDSkRw7 f2z0EYci56p57RpuknFo+pwmZDTuFfcDrkFIUZEwjGlJ8wvKGAIJzzfUJU6WEmAMOv7V AuvQ== X-Gm-Message-State: ACrzQf2/Prtg8OlgRfx4YbuWFRwRKPJ/erhEQQ8Gj1/R9Yk4XJkPVCXI pUxThvBatxK3vQi1W7ATESNFxJxqk7ORzsrpLm8= X-Google-Smtp-Source: AMsMyM7HnD98YNZbyVQdFrgWRwQCEDXEEr6JmV37FPRsIqOng7D7sgmjiZ7QUUYrDI9zBMtbYWJi+KxwN5ACSNWCo7M= X-Received: by 2002:a05:6512:4002:b0:4a2:6243:8384 with SMTP id br2-20020a056512400200b004a262438384mr10813063lfb.29.1666592717545; Sun, 23 Oct 2022 23:25:17 -0700 (PDT) MIME-Version: 1.0 References: <7b29c654-5a69-f32b-5898-dc1bb7810c67@redhat.com> In-Reply-To: <7b29c654-5a69-f32b-5898-dc1bb7810c67@redhat.com> From: Jim Meyering Date: Sun, 23 Oct 2022 23:25:05 -0700 Message-ID: 
Subject: Re: bug#40242: n as delimiter alias To: Eric Blake Content-Type: multipart/mixed; boundary="000000000000eab9e105ebc1da9d" X-Spam-Score: 0.5 (/) X-Debbugs-Envelope-To: 40242-done Cc: 40242-done@debbugs.gnu.org, Assaf Gordon , =?UTF-8?B?T8SfdXo=?= X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.5 (/)

--000000000000eab9e105ebc1da9d Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable

On Tue, Mar 31, 2020 at 6:36 AM Eric Blake wrote:
> On 3/31/20 2:00 AM, Oğuz wrote:
> > Thanks for the reply. This might not be a bug though; I sent a similar mail
> > (https://www.mail-archive.com/austin-group-l@opengroup.org/msg05881.html)
> > to Austin Group mailing list asking what's the expected behavior in this
> > case, and I was told (
> > https://www.mail-archive.com/austin-group-l@opengroup.org/msg05891.html)
> > both behaviors -yielding n or empty line- are correct and standard should
> > *probably* be amended to explicitly state that this is unspecified. And
> > apparently (
> > https://www.mail-archive.com/austin-group-l@opengroup.org/msg05893.html)
> > some other UNIXes adopted the same practice as GNU sed (or vice versa, I
> > don't know which one is older).
>
> The POSIX folks will probably declare that use of a \X sequence (for
> arbitrary X; 'n', 't', '1', and probably others all fit this category)
> inside a regex delimited by X is unspecified behavior. But that still
> doesn't stop us from fixing GNU sed to at least be consistent - we
> should either blindly declare that \X represents the special meaning of
> X when such a meaning is present regardless of X also being the regex
> delimiter (our current \n behavior - no way to represent the delimiter
> as a literal match), or that use of X as a delimiter renders the special
> meaning of \X useless for that regex (our \t behavior - no way to
> represent the special behavior as part of the match). My personal
> preference is making things consistent to our \t behavior.
>
> >> In the code, the "match_slash" function [1] is used to find
> >> the delimiters of the "s" command (typically "slashes").
> >> Special handling happens if a slash is found [2],
> >> And in lines 557-8 there's this conditional:
> >>
> >>     else if (ch == 'n' && regex)
> >>       ch = '\n';
> >>
> >> Which forces any "\n" to be a new-line, regardless if the
> >> delimiter itself was an "n".
> >>
> >> Interestingly, removing these two lines does not cause
> >> any test failures, so this might be easy to fix without causing
> >> any regressions.
> >>
> >>
> >> For now I'm leaving this item open until we decide how to deal with it.
>
> I'm thus in favor of removing that special-case of 'n'.

Thank you all. Sorry it's taken so long.
I expect to push the following tomorrow.
--000000000000eab9e105ebc1da9d Content-Type: application/octet-stream; name="sed-tweak.diff" Content-Disposition: attachment; filename="sed-tweak.diff" Content-Transfer-Encoding: base64 Content-ID: X-Attachment-Id: f_l9me4ztv0 Y29tbWl0IDU5YzZmMTdhNzE5YjU0MjVmNGRlZmIzZDkyMTkyMTBjZmM2NzY3MDkKQXV0aG9yOiBK aW0gTWV5ZXJpbmcgPGppbUBtZXllcmluZy5uZXQ+CkRhdGU6ICAgU3VuIE9jdCAyMyAwOTo1ODo0 MCAyMDIyIC0wNzAwCgogICAgdGVzdDogYWRkIGEgdGVzdCBjYXNlIGFuZCBtZW50aW9uIHRoZSBj aGFuZ2UgaW4gTkVXUwogICAgCiAgICAqIHRlc3RzdWl0ZS9taXNjLnBsOiBBZGQgYSB0ZXN0IHRv IGV4ZXJjaXNlIHRoZSBwcmVjZWRpbmcgY2hhbmdlLgogICAgKiBORVdTIChDaGFuZ2VzIGluIGJl aGF2aW9yKTogTWVudGlvbiBpdC4KCmRpZmYgLS1naXQgYS9ORVdTIGIvTkVXUwppbmRleCBiZGFi YzY4Li44NDI4ZjMxIDEwMDY0NAotLS0gYS9ORVdTCisrKyBiL05FV1MKQEAgLTI5LDYgKzI5LDE0 IEBAIEdOVSBzZWQgTkVXUyAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgIC0qLSBv dXRsaW5lIC0qLQogICBUaGUgJ3InIGNvbW1hbmQgbm93IGFjY2VwdHMgYWRkcmVzcyAwLCBhbGxv d2luZyBpbnNlcnRpbmcgYSBmaWxlIGJlZm9yZQogICB0aGUgZmlyc3QgbGluZS4KCisqKiBDaGFu Z2VzIGluIGJlaGF2aW9yCisKKyAgIFNlZCBub3cgcHJpbnRzIHRoZSBsZXNzLXN1cnByaXNpbmcg dmFyaWFudCBpbiBhIGNvcm5lciBjYXNlIG9mCisgICBQT1NJWC11bnNwZWNpZmllZCBiZWhhdmlv ci4gIEJlZm9yZSwgdGhpcyB3b3VsZCBwcmludCAibiIuCisgICBOb3csIGl0IHByaW50cyAiWCI6 CisKKyAgICBwcmludGYgbiB8IHNlZCAnc25cbm5Ybic7IGVjaG8KKwoKICogTm90ZXdvcnRoeSBj aGFuZ2VzIGluIHJlbGVhc2UgNC44ICgyMDIwLTAxLTE0KSBbc3RhYmxlXQoKZGlmZiAtLWdpdCBh L3Rlc3RzdWl0ZS9taXNjLnBsIGIvdGVzdHN1aXRlL21pc2MucGwKaW5kZXggNjRhYzU3Yi4uZTYw YjgxMiAxMDA2NDQKLS0tIGEvdGVzdHN1aXRlL21pc2MucGwKKysrIGIvdGVzdHN1aXRlL21pc2Mu cGwKQEAgLTExOTcsNiArMTE5Nyw4IEBAIHMsLipbXlwvXSwsCiAgICAgIFsnYnVnMzA3OTRfMScs ICJzL3ovXFxcXHg1Y0EvIiwgIHtJTj0+J3onfSwge09VVCA9PiAiXFxBIn1dLAogICAgICBbJ2J1 ZzMwNzk0XzInLCAicy96L1xcXFx4NWMvIiwgICB7SU49Pid6J30sIHtPVVQgPT4gIlxcIn1dLAog ICAgICBbJ2J1ZzMwNzk0XzMnLCAicy96L1xcXFx4NWMxLyIsICB7SU49Pid6J30sIHtPVVQgPT4g IlxcMSJ9XSwKKworICAgICBbJ2J1ZzQwMjQyJywgcSgnc25cbm5YbicpLCAge0lOPT4nbid9LCB7 T1VUID0+ICdYJ31dLAogICAgICk7CgogbXkgJHNhdmVfdGVtcHMgPSAkRU5We1NBVkVfVEVNUFN9 
OwoKY29tbWl0IGY1YWJlYmE0OGU0YzkzZmM4ZGY0YzgxZTdhZTJhNjFmNDY5ZWQ1ZTEKQXV0aG9y OiBPxJ91eiA8b2d1emlzbWFpbHV5c2FsQGdtYWlsLmNvbT4KRGF0ZTogICBTdW4gT2N0IDIzIDA5 OjUxOjM3IDIwMjIgLTA3MDAKCiAgICBzZWQ6IGhhbmRsZSB0aGUgdW5zcGVjaWZpZWQgIm4gYXMg ZGVsaW1pdGVyIGFsaWFzIiBjYXNlIG1vcmUgc2Vuc2libHkKICAgIAogICAgUHJpbnQgdGhlIGxl c3Mtc3VycHJpc2luZyB2YXJpYW50IGluIGEgY29ybmVyIGNhc2Ugb2YgUE9TSVgtdW5zcGVjaWZp ZWQKICAgIGJlaGF2aW9yLiAgQmVmb3JlLCB0aGlzIHdvdWxkIHByaW50ICJuIi4gIE5vdywgaXQg cHJpbnRzICJYIjoKICAgICAgcHJpbnRmIG4gfCBzZWQgJ3NuXG5uWG4nOyBlY2hvCiAgICAqIHNl ZC9jb21waWxlLmMgKG1hdGNoX3NsYXNoKTogUmVtb3ZlIHNwZWNpYWwgaGFuZGxpbmcgb2YgJ24n LgogICAgUmVwb3J0ZWQgaW4gaHR0cHM6Ly9idWdzLmdudS5vcmcvNDAyNDIKCmRpZmYgLS1naXQg YS9zZWQvY29tcGlsZS5jIGIvc2VkL2NvbXBpbGUuYwppbmRleCAxOTQyZjNiLi5mOTZmYmNhIDEw MDY0NAotLS0gYS9zZWQvY29tcGlsZS5jCisrKyBiL3NlZC9jb21waWxlLmMKQEAgLTU1Nyw4ICs1 NTcsNiBAQCBtYXRjaF9zbGFzaCAoaW50IHNsYXNoLCBpbnQgcmVnZXgpCiAgICAgICAgICAgICAg IGNoID0gaW5jaGFyICgpOwogICAgICAgICAgICAgICBpZiAoY2ggPT0gRU9GKQogICAgICAgICAg ICAgICAgIGJyZWFrOwotICAgICAgICAgICAgICBlbHNlIGlmIChjaCA9PSAnbicgJiYgcmVnZXgp Ci0gICAgICAgICAgICAgICAgY2ggPSAnXG4nOwogICAgICAgICAgICAgICBlbHNlIGlmIChjaCAh PSAnXG4nICYmIChjaCAhPSBzbGFzaCB8fCAoIXJlZ2V4ICYmIGNoID09ICcmJykpKQogICAgICAg ICAgICAgICAgIGFkZDFfYnVmZmVyIChiLCAnXFwnKTsKICAgICAgICAgICAgIH0K --000000000000eab9e105ebc1da9d-- From unknown Thu Sep 11 04:20:45 2025 Received: (at fakecontrol) by fakecontrolmessage; To: internal_control@debbugs.gnu.org From: Debbugs Internal Request Subject: Internal Control Message-Id: bug archived. Date: Mon, 21 Nov 2022 12:24:09 +0000 User-Agent: Fakemail v42.6.9 # This is a fake control message. # # The action: # bug archived. thanks # This fakemail brought to you by your local debbugs # administrator