From debbugs-submit-bounces@debbugs.gnu.org Mon Apr 25 12:06:44 2022
Received: (at submit) by debbugs.gnu.org; 25 Apr 2022 16:06:44 +0000
Received: from localhost ([127.0.0.1]:35923 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces@debbugs.gnu.org>)
	id 1nj1EW-0007UH-VF
	for submit@debbugs.gnu.org; Mon, 25 Apr 2022 12:06:44 -0400
Received: from lists.gnu.org ([209.51.188.17]:49988)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <eblake@redhat.com>) id 1nj1EU-0007U8-HY
 for submit@debbugs.gnu.org; Mon, 25 Apr 2022 12:06:40 -0400
Received: from eggs.gnu.org ([2001:470:142:3::10]:37320)
 by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <eblake@redhat.com>) id 1nj1ES-0006qa-Oy
 for bug-sed@gnu.org; Mon, 25 Apr 2022 12:06:38 -0400
Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]:34837)
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <eblake@redhat.com>) id 1nj1EP-0005ZG-8z
 for bug-sed@gnu.org; Mon, 25 Apr 2022 12:06:35 -0400
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com;
 s=mimecast20190719; t=1650902791;
 h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
 to:to:cc:cc:mime-version:mime-version:content-type:content-type:
 in-reply-to:in-reply-to:references:references;
 bh=9XC6wpFB0uk7DT0HWtMkAkrBRIxrkmjlkaC1RyNMltc=;
 b=JkfKE9FhwbuFk1aBmB+1kkKuCHPsa8FDamfHJT1mVXTRVIPfI1ofOsa+YeJ1AR/k0OxRhP
 bne3R/wvBdZkETL3yCVOIJoCtuumPatIwjsxUbDmWkYV7Ayc8+r3Wed6G0kKwr4qouDaUy
 Ip837/jNrocCDbwWdt6kJsIlQREL0FE=
Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com
 [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS
 (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id
 us-mta-240-QgBcmoPFPOahA2e2TQ5btw-1; Mon, 25 Apr 2022 12:06:26 -0400
X-MC-Unique: QgBcmoPFPOahA2e2TQ5btw-1
Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.rdu2.redhat.com
 [10.11.54.1])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 3594C811E83;
 Mon, 25 Apr 2022 16:06:26 +0000 (UTC)
Received: from redhat.com (unknown [10.2.16.160])
 by smtp.corp.redhat.com (Postfix) with ESMTPS id B79BC40CF910;
 Mon, 25 Apr 2022 16:06:25 +0000 (UTC)
Date: Mon, 25 Apr 2022 11:06:23 -0500
From: Eric Blake <eblake@redhat.com>
To: Christoph Anton Mitterer <calestyo@scientia.org>
Subject: Re: [Issue 8 drafts 0001556]: clarify meaning of \n used in a
 bracket expression in a sed context address or s-command
Message-ID: <20220425160623.ame3mg3waibjhzpn@redhat.com>
References: <4acab9a9f9622d1235d17b84d4640e68@austingroupbugs.net>
 <03968d54a7a17e6734c90492f8027488faa01a3a.camel@scientia.org>
MIME-Version: 1.0
In-Reply-To: <03968d54a7a17e6734c90492f8027488faa01a3a.camel@scientia.org>
User-Agent: NeoMutt/20220415-26-c08bba
X-Scanned-By: MIMEDefang 2.84 on 10.11.54.1
Authentication-Results: relay.mimecast.com;
 auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=eblake@redhat.com
X-Mimecast-Spam-Score: 0
X-Mimecast-Originator: redhat.com
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Received-SPF: pass client-ip=170.10.133.124; envelope-from=eblake@redhat.com;
 helo=us-smtp-delivery-124.mimecast.com
X-Spam_score_int: -28
X-Spam_score: -2.9
X-Spam_bar: --
X-Spam_report: (-2.9 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.082,
 DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1,
 RCVD_IN_DNSWL_LOW=-0.7, SPF_HELO_NONE=0.001, SPF_PASS=-0.001,
 T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no
X-Spam_action: no action
X-Spam-Score: -1.4 (-)
X-Debbugs-Envelope-To: submit
Cc: Geoff Clare <gwc@opengroup.org>, bug-sed@gnu.org,
 austin-group-l@opengroup.org
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request@debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit@debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request@debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request@debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces@debbugs.gnu.org>
X-Spam-Score: -2.4 (--)

Adding bug-sed@gnu.org into this conversation.

On Mon, Apr 25, 2022 at 02:50:22AM +0200, Christoph Anton Mitterer via austin-group-l at The Open Group wrote:
> Hey.
> 
> Geoff, I haven't had time yet to look at your updated proposal of
> #1550, not sure whether I manage to do it this night or in the next
> days.
> But I'll definitely reply, so please be a bit more patient. :-)
> 
> 
> However, one thing came to my mind again, which I think needs further
> discussion...
> 
> 
> 
> The current "solution" to a number of previous problems is:
> 
> Inside a bracket expression there cannot be any escape sequences.
> Therefore, there cannot be any \n (in the sense of <newline>) nor any
> \c (in the sense of "un-delimitering" the delimiter character c).
> 
> 
> While this is per se perfectly valid (and solves numerous issues), it
> has one problem:
> 
> (at least) GNU sed breaks it already!
> 
> 
> 
> As you noted yourself in
> https://www.austingroupbugs.net/view.php?id=1556#c5621
> 
> it requires POSIXLY_CORRECT=1 to work as it should.
> 
> $ printf 'a\\b\n' | sed 's/a[\n]b/X/'
> a\b
> $ printf 'a\nb\n' | sed 's/a[\n]b/X/'
> a
> b
> $ printf 'a\nb\n' | sed -z 's/a[\n]b/X/'
> X
> $ printf 'anb\n' | sed 's/a[\n]b/X/'
> anb
> $ export POSIXLY_CORRECT=1
> $ printf 'a\\b\n' | sed 's/a[\n]b/X/'
> X
> $ printf 'a\nb\n' | sed 's/a[\n]b/X/'
> a
> b
> $ printf 'a\nb\n' | sed -z 's/a[\n]b/X/'
> a
> b
> $ printf 'anb\n' | sed 's/a[\n]b/X/'
> X
> $ 
> 
> 
> NOT so for GNU's extension of '\s':
> '\s'
>      Matches whitespace characters (spaces and tabs).  Newlines
>      embedded in the pattern/hold spaces will also match...
> (and I assume neither for any similar such extensions):
> 
> $ printf 'asb\n' | sed 's/a[\s]b/X/'
> X
> $ printf 'a\\b\n' | sed 's/a[\s]b/X/'
> X
> $ printf 'a b\n' | sed 's/a[\s]b/X/'
> a b
> $ export POSIXLY_CORRECT=1
> $ printf 'asb\n' | sed 's/a[\s]b/X/'
> X
> $ printf 'a\\b\n' | sed 's/a[\s]b/X/'
> X
> $ printf 'a b\n' | sed 's/a[\s]b/X/'
> a b
> $
> 
> 
> It also works as expected for escaped delimiter characters:
> $ printf 'aDb\n' | sed 'sDa[\D]bDXD'
> X
> $ printf 'a\\b\n' | sed 'sDa[\D]bDXD'
> X
> 
> even when the delimiter char has also special meaning when escaped (as
> with '\s'):
> $ printf 'asb\n' | sed 'ssa[\s]bsXs'
> X
> $ printf 'a\\b\n' | sed 'ssa[\s]bsXs'
> X
> $ printf 'a b\n' | sed 'ssa[\s]bsXs'
> a b
> 
> 
> (all the above with GNU sed 4.8).
> 
> 
> So the only problematic case seems to be '\n'.
> 
> 
> 
> I don't want to step on anyone's toes... but GNU sed is probably one of
> the (if not the) major implementation of sed, isn't it?
> 
> 
> And regardless of POSIXLY_CORRECT, the standard describes now a
> behaviour (namely that the bracket expression [\n] is the literal
> characters '\' or 'n' and *not* <newline>)... which is not shared by a
> major implementation, at least not with its default settings.
> 
> Anyone who reads the standard would assume that [\n] is not a
> <newline>. 
> And of course we could just say "well your implementation is not
> compliant" or "look at its documentation, where it says about
> POSIXLY_CORRECT" ... but that doesn't seem so good to me.
> 
> Usually, implementations extend POSIX rather gracefully, but this is a
> more serious deviation.
> 
> 
> I mean should we just leave it at that?
> 
> Or should we add some hint, e.g. indicating that portable applications
> should not use '\n' but rather 'n\' ... or perhaps even generally place
> '\' last in the bracket expression?
> 
> 
> The best would of course be to get GNU to change its behaviour, though I
> have no idea how likely that is ;-)
> 
> I had tried to reach out to GNU and BusyBox sed maintainers before, and
> while I got replies from BusyBox' I couldn't get in touch with GNU's.
> 
> Is there anyone who's in contact with these people?

The GNU sed developers can be reached at bug-sed@gnu.org (per the
output of 'sed --help', and as done in this email).

So if I'm restating your complaint correctly, you are worried that GNU
sed's non-POSIX behavior (what you get by default when POSIXLY_CORRECT
is not set) treats the four-byte sequence '[\n]' in an s-command regex
as a bracket expression for the single character of a literal newline
(that is, interpreting \n as an escape sequence even though it is
inside a bracket expression), instead of as a bracket expression for
either of a literal backslash or literal n; but concur that its
behavior when being POSIX-compliant matches the POSIX rules.

POSIX can't control what GNU sed does when in non-POSIX mode.  But it
can document a recommendation to spell the bracket expression intended
to match either a backslash or an n in the order [n\] to avoid any
potential confusion with [\n] being interpreted as an escape sequence.
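
As a minimal sketch of that spelling (assuming a POSIX shell and
whichever sed is first in PATH; the helper name try_sed is made up
here), both inputs can be run with and without POSIXLY_CORRECT.  With
the backslash last there is no \n pair left to be read as an escape,
so the two modes should agree:

```shell
# Hypothetical helper (not part of sed): feed one line to one sed script.
# $1 is a printf format string, $2 is the sed script.
try_sed() {
  printf "$1" | sed "$2"
}

# Default mode: backslash is the last list member, so no "\n" pair exists.
try_sed 'a\\b\n' 's/a[n\]b/X/'
try_sed 'anb\n'  's/a[n\]b/X/'

# Same two checks with POSIXLY_CORRECT set, in a subshell so the
# variable does not leak into the calling shell.
(
  export POSIXLY_CORRECT=1
  try_sed 'a\\b\n' 's/a[n\]b/X/'
  try_sed 'anb\n'  's/a[n\]b/X/'
)
```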

Or am I missing something else that you are proposing that either the
Austin Group should do in its documentation efforts, and/or which GNU
sed should do to comply with the recent Austin Group recommendations?

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


From debbugs-submit-bounces@debbugs.gnu.org Mon Apr 25 19:45:21 2022
Received: (at submit) by debbugs.gnu.org; 25 Apr 2022 23:45:21 +0000
Received: from localhost ([127.0.0.1]:36331 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces@debbugs.gnu.org>)
	id 1nj8OO-00040R-VJ
	for submit@debbugs.gnu.org; Mon, 25 Apr 2022 19:45:21 -0400
Received: from lists.gnu.org ([209.51.188.17]:37274)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <calestyo@scientia.org>) id 1nj8ON-00040J-0a
 for submit@debbugs.gnu.org; Mon, 25 Apr 2022 19:45:19 -0400
Received: from eggs.gnu.org ([2001:470:142:3::10]:44912)
 by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <calestyo@scientia.org>)
 id 1nj8OM-0007fX-MY
 for bug-sed@gnu.org; Mon, 25 Apr 2022 19:45:18 -0400
Received: from cyan.elm.relay.mailchannels.net ([23.83.212.47]:18461)
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <calestyo@scientia.org>)
 id 1nj8OK-00007A-GI
 for bug-sed@gnu.org; Mon, 25 Apr 2022 19:45:18 -0400
X-Sender-Id: instrampxe0y3a|x-authuser|calestyo@scientia.org
Received: from relay.mailchannels.net (localhost [127.0.0.1])
 by relay.mailchannels.net (Postfix) with ESMTP id 4A09E5A1477;
 Mon, 25 Apr 2022 23:44:30 +0000 (UTC)
Received: from cpanel-007-fra.hostingww.com (unknown [127.0.0.6])
 (Authenticated sender: instrampxe0y3a)
 by relay.mailchannels.net (Postfix) with ESMTPA id F18575A14EB;
 Mon, 25 Apr 2022 23:44:28 +0000 (UTC)
ARC-Seal: i=1; s=arc-2022; d=mailchannels.net; t=1650930269; a=rsa-sha256;
 cv=none;
 b=0FXK5dGqcpCzIuZ1XDVWQp5R/LccltRMq/YpOFVpJ/Th7uqxeIN5jKYHukcJvb+tSbqT4M
 8Zpl4kmL/24Wa841VbJ/7Ux0qStnt2Y02nHH0EecGUnr3j0sKilYEuVpN4d4n5vSMERtTG
 AM1Bnr+J2dr56TII0g6g5dWWFFeyrRB9pEqA0Av24FAidlKvA1mE1QL49KBgBlJx2ltbJu
 08Z0Cb5dRkLc1P5PAuHVmbpjmP0oBWhxRLb5nUErn7FODGWJXRfa2F1qjeEhrWpQJf7rDH
 YgkHjYHWR605oC3zOgCMK1gyd9MgfOe2CS3gmW7ghWJKmHCemluwnxbB1j/ytA==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed;
 d=mailchannels.net; s=arc-2022; t=1650930269;
 h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
 to:to:cc:cc:mime-version:mime-version:content-type:content-type:
 content-transfer-encoding:content-transfer-encoding:
 in-reply-to:in-reply-to:references:references;
 bh=UeohsSLhpaRpNuongNuNgbFzB6uzoOHha0172zY+ZdQ=;
 b=kaXJs1MRSH4lzk/29xtLzEPwrr4lqTKhshINs5mh9AJCCdeVvhsxsDDD+XCW1ndUrVV7yQ
 5LEqPb4C7hxWSQgEbN5mTT9OkFcCwEgo72MHMXHfM4+XJ3VOHHtqroVzjR0ZbTHa76gMXm
 uR+N++h5eGmUjsKfF9c977dUGASXDoboObNV+0ddNLLA83ufGbT4vCKKPKnLIMDP74l8pa
 TgCmMJUDeReujbQjvJonLIVyDcZLctETGtroVM13ZCsp35+CrNMQzOkTirAuONmGQAgA5K
 FRA/nol/PI5jdYrCIcwkTN5184oX9VBtCqLBAcoBmpqMMujnSjB6wuaY8P7NZQ==
ARC-Authentication-Results: i=1; rspamd-6dfbdcb948-27j8q;
 auth=pass smtp.auth=instrampxe0y3a smtp.mailfrom=calestyo@scientia.org
X-Sender-Id: instrampxe0y3a|x-authuser|calestyo@scientia.org
Received: from cpanel-007-fra.hostingww.com (cpanel-007-fra.hostingww.com
 [3.69.87.180])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384)
 by 100.115.45.43 (trex/6.7.1); Mon, 25 Apr 2022 23:44:30 +0000
X-MC-Relay: Neutral
X-MailChannels-SenderId: instrampxe0y3a|x-authuser|calestyo@scientia.org
X-MailChannels-Auth-Id: instrampxe0y3a
X-Broad-Bubble: 2a51c947232d49b4_1650930269954_3019630636
X-MC-Loop-Signature: 1650930269954:3857779380
X-MC-Ingress-Time: 1650930269954
Received: from ppp-46-244-247-121.dynamic.mnet-online.de
 ([46.244.247.121]:57328 helo=heisenberg.fritz.box)
 by cpanel-007-fra.hostingww.com with esmtpsa (TLS1.2) tls
 TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.95)
 (envelope-from <calestyo@scientia.org>) id 1nj8NT-0003z1-Qe;
 Mon, 25 Apr 2022 23:44:27 +0000
Message-ID: <4f7ed03332eaecca1c251cadaf03f6df094ecbb9.camel@scientia.org>
Subject: Re: [Issue 8 drafts 0001556]: clarify meaning of \n used in a
 bracket expression in a sed context address or s-command
From: Christoph Anton Mitterer <calestyo@scientia.org>
To: Eric Blake <eblake@redhat.com>
Date: Tue, 26 Apr 2022 01:44:21 +0200
In-Reply-To: <20220425160623.ame3mg3waibjhzpn@redhat.com>
References: <4acab9a9f9622d1235d17b84d4640e68@austingroupbugs.net>
 <03968d54a7a17e6734c90492f8027488faa01a3a.camel@scientia.org>
 <20220425160623.ame3mg3waibjhzpn@redhat.com>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
User-Agent: Evolution 3.44.1-2 
MIME-Version: 1.0
X-OutGoing-Spam-Status: No, score=-1.0
X-AuthUser: calestyo@scientia.org
Received-SPF: pass client-ip=23.83.212.47; envelope-from=calestyo@scientia.org;
 helo=cyan.elm.relay.mailchannels.net
X-Spam_score_int: -18
X-Spam_score: -1.9
X-Spam_bar: -
X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, RCVD_IN_DNSWL_NONE=-0.0001,
 SPF_HELO_NONE=0.001, SPF_PASS=-0.001,
 T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no
X-Spam_action: no action
X-Spam-Score: -1.3 (-)
X-Debbugs-Envelope-To: submit
Cc: Geoff Clare <gwc@opengroup.org>, bug-sed@gnu.org,
 austin-group-l@opengroup.org
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request@debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit@debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request@debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request@debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces@debbugs.gnu.org>
X-Spam-Score: -2.3 (--)

Hey Eric.

On Mon, 2022-04-25 at 11:06 -0500, Eric Blake wrote:
> The GNU sed developers can be reached at bug-sed@gnu.org (per the
> output of 'sed --help', and as done in this email).

Ah, I think I had written to sed-devel in January.


> So if I'm restating your complaint correctly

"complaint" is a bit harsh ;-) ... it's not my intention to step on
anyone's toes, just hoping to help with portability.


> you are worried that GNU
> sed's non-POSIX behavior (what you get by default when
> POSIXLY_CORRECT
> is not set)

Speaking of POSIXLY_CORRECT ... I'm not sure how much that really helps
in practice.

First, the reality probably is that most users won't read the info page
from top to bottom and even if they do, it's not for sure that they
really understand the implications of e.g. '[\n]' and that they'd need
to use POSIXLY_CORRECT.
Sure you can argue now that this is then the fault of the user, but I
don't think that this helps in practice.

Second, (sed) scripts may flow in both directions, i.e. from an
implementation that is (per default) POSIXly correct to GNU sed (which
per default is not) - and vice versa.
So when such a script comes from a non-GNU sed and uses '[\n]' in the
strict POSIX sense, it would likely just be used as is with GNU sed.
That it has different semantics there is possibly not immediately
visible, as there's no error or so, and thus people probably won't
realise that they'd need to set POSIXLY_CORRECT non-empty for such
"foreign" scripts.
And the same would likely happen in the other direction. The average GNU
sed user may perhaps never notice that '[\n]' being newline is a GNU
speciality unless they know the standard well. If that is then used on a
non-GNU sed, semantics change again.
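
Since nothing fails visibly, a script could at least probe which
reading the local sed uses. A minimal sketch, assuming a POSIX shell;
the function name sed_bracket_n_mode is made up for illustration and it
only tells "matches a backslash" apart from "does not":

```shell
# Hypothetical probe: report how the sed found in PATH reads '[\n]'.
# Prints "literal" if [\n] matches a backslash (the POSIX reading),
# or "escape" if it does not (e.g. because it was read as <newline>).
sed_bracket_n_mode() {
  if [ "$(printf 'a\\b\n' | sed 's/a[\n]b/X/')" = X ]; then
    echo literal
  else
    echo escape
  fi
}

sed_bracket_n_mode
```

A script relying on one reading could then bail out early when the
probe reports the other one.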


> treats the four-byte sequence '[\n]' in an s-command regex
> as a bracket expression for the single character of a literal newline
> (that is, interpreting \n as an escape sequence even though it is
> inside a bracket expression), instead of as a bracket expression for
> either of a literal backslash or literal n; but concur that its
> behavior when being POSIX-compliant matches the POSIX rules.

I guess it's at least quite unfortunate that it does so.
Especially because GNU seems to really do this only with sed, e.g. grep
(with POSIXLY_CORRECT UNset) seems to interpret '[\n]' POSIXly
correct...

$ printf 'a\nb' | grep -z '^a[\n]b$' ; echo

$ printf 'a\\b' | grep -z '^a[\n]b$' ; echo
a\b
$ printf 'anb' | grep -z '^a[\n]b$' ; echo
anb

... which I'd blindly guess is also not necessarily clear to the
average GNU grep/sed user.


> POSIX can't control what GNU sed does when in non-POSIX mode.

Sure... and even if it would do so in POSIX mode, there's no POSIX
police ;-)

Nevertheless... in practice most people will just assume that the
default mode is mostly POSIX compliant, except perhaps for "graceful"
extensions.

All these GNU extensions (like '\+' and friends for BREs... or '\s' and
friends for BREs and EREs) still work nicely with POSIX, cause POSIX
says that these produce undefined results, so if someone really wanted
to be portable, he didn't use it.

But this is different for the sed + '[\n]' case. Someone who restricted
himself to just POSIX would still get into trouble.

And sure, strictly speaking you're of course right, and only with
POSIXLY_CORRECT non-empty, GNU sed is guaranteed to behave so - but
again, I'd blindly guess that in practice that goes quite easily
unnoticed.
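
For "foreign" scripts, one way to avoid forgetting the variable is to
pin it per invocation. A minimal sketch, assuming a POSIX shell; the
wrapper name posix_sed is made up and not something GNU sed ships:

```shell
# Hypothetical wrapper: run sed with POSIXLY_CORRECT set for this one
# invocation only; the calling shell's environment is left untouched.
posix_sed() {
  POSIXLY_CORRECT=1 sed "$@"
}

# Same script as in the transcripts earlier in this thread, where
# GNU sed 4.8 printed X for both inputs once POSIXLY_CORRECT was set.
printf 'anb\n'    | posix_sed 's/a[\n]b/X/'
printf 'a\\b\n'   | posix_sed 's/a[\n]b/X/'
```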


> But it
> can document a recommendation to spell the bracket expression
> intended
> to match either a backslash or an n in the order [n\] to avoid any
> potential confusion with [\n] being interpreted as an escape
> sequence.

The problem remains of course for any scripts which are written and
tested with sed implementations that behave the other way and which are
then used with GNU sed.

The best (for portability) would probably be if GNU sed could change the
behaviour, but I see of course that unfortunately this is likely not
easily possible either.

I just searched the sed info page... and that seems to basically say:
> '[LIST]'
> '[^LIST]'
>     Matches any single character in LIST: for example, '[aeiou]'
>     matches all vowels.  A list may include sequences like
>     'CHAR1-CHAR2', which matches any character between (inclusive)
>     CHAR1 and CHAR2.  *Note Character Classes and Bracket
>     Expressions::.
...
a bit further down
...
> '\n'
>      Matches the newline character.

IMO, that's however "outside" of the part for bracket expressions,
because everything else that is described on the same level (like '\+'
or '\DIGIT') is clearly *not* intended to work inside GNU sed bracket
expression, right?

However later in "5.5 Character Classes and Bracket Expressions":
> Also, when not in 'POSIXLY_CORRECT' mode, special escapes like '\n'
> and '\t' are recognized within LIST.  *Note Escapes::.

So I guess at this point it's game over and GNU sed could never really
change behaviour without breaking a gazillion things.


btw: I'd hope that these \<char> escape sequences at least all produce
the literal <char> when <char> is also the delimiter.


> Or am I missing something else that you are proposing that either the
> Austin Group should do in its documentation efforts, and/or which GNU
> sed should do to comply with the recent Austin Group recommendations?

Well I guess the fact that GNU sed explicitly documents this behaviour
(for the non-'POSIXLY_CORRECT' mode) means that nothing can be done
other than documenting it as well as possible (on both sides).

Perhaps better to use '\\' for any literally meant <backslash>, than to
just put it at the end of the list, cause some implementations could
also think about giving special meaning to '\]'.


Really unfortunate though, especially that it's then not even
consistent across GNU (i.e. also in GNU sed).


Thanks,
Chris.