From: Fabian Röling
Date: Fri, 3 Dec 2021 00:53:40 +0100
To: bug-coreutils@gnu.org
Subject: bug#52252: Escape sequences treated as visible characters, long characters split
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="000000000000181fb805d2328285"

--000000000000181fb805d2328285
Content-Type: text/plain; charset="UTF-8"

"fold" counts the character '\e' and each of the following characters that make up a text-formatting escape sequence as if they were all visible.
Example:
echo -e "a\e[3mb\e[0mc" | fold -w1
Result:
a



b




c
It still formats correctly; it just breaks the line too early.
Multibyte characters can even be split in the middle.
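To make the difference concrete, here is a rough Python sketch. It is not based on fold's source; the regular expression and the helper names "naive_columns", "visible_columns" and "fold_visible" are hypothetical. It contrasts counting every character as one column, which matches the behaviour shown above, with counting only characters outside ANSI CSI escape sequences such as '\e[3m'. It assumes single-line input in which every visible character is one column wide.

```python
import re

# Simplified pattern for ANSI CSI escape sequences such as "\x1b[3m"
# (italic on) and "\x1b[0m" (reset): ESC, "[", parameter bytes,
# intermediate bytes, one final byte. Other escape types (OSC etc.)
# are deliberately not handled in this sketch.
CSI = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")


def naive_columns(text: str) -> int:
    """One column per character, escape sequences included."""
    return len(text)


def visible_columns(text: str) -> int:
    """Columns left after removing CSI escape sequences."""
    return len(CSI.sub("", text))


def fold_visible(text: str, width: int) -> list[str]:
    """Hypothetical fold that only counts visible characters.

    Assumes single-line input, width >= 1, and one column per visible
    character; escape sequences are copied through at zero width.
    """
    lines, current, col, pos = [], "", 0, 0
    while pos < len(text):
        match = CSI.match(text, pos)
        if match:  # zero-width escape sequence: keep it on the current line
            current += match.group()
            pos = match.end()
            continue
        if col == width:  # width reached: start a new output line
            lines.append(current)
            current, col = "", 0
        current += text[pos]
        col += 1
        pos += 1
    lines.append(current)
    return lines


sample = "a\x1b[3mb\x1b[0mc"
print(naive_columns(sample))    # 11
print(visible_columns(sample))  # 3
print(fold_visible(sample, 1))  # ['a\x1b[3m', 'b\x1b[0m', 'c']
```

With width 1 this produces three lines, "a", "b" and "c", and the formatting sequences stay attached to the text around them instead of occupying lines of their own.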
Example:
echo "君の名は" | fold -w1
Result:
=EF=BF= =BD
=EF=BF=BD
=EF=BF=BD
=EF=BF=BD
=EF=BF=BD
=EF=BF=BD
=EF= =BF=BD
=EF=BF=BD
=EF=BF=BD
=EF=BF=BD
=EF=BF=BD
=EF=BF=BD
In case this appears invisible or otherwise different: it's 12 lines with one U+FFFD replacement character (which may render as "tofu") each, i.e. one line per byte of the four 3-byte UTF-8 characters.
I would expect issues like this when the "-b" option is given, since that counts bytes, but it happens even without it.
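The byte-versus-character arithmetic behind those 12 lines can be checked with a short Python sketch. It only models splitting after every byte versus after every character; it makes no claim about how fold works internally.

```python
import unicodedata

text = "君の名は"
data = text.encode("utf-8")
print(len(text))  # 4 characters
print(len(data))  # 12 bytes, 3 per character

# Splitting after every byte: no single byte here is a complete UTF-8
# sequence, so a lenient decoder shows U+FFFD for each one.
per_byte = [data[i:i + 1].decode("utf-8", errors="replace") for i in range(len(data))]
print(per_byte.count("\ufffd"))  # 12

# Splitting after every character keeps each sequence intact.
per_char = list(text)
print(per_char)  # ['君', 'の', '名', 'は']

# These characters are East Asian "Wide", i.e. typically two terminal
# columns each, which a column-aware fold would also have to consider.
print([unicodedata.east_asian_width(c) for c in text])  # ['W', 'W', 'W', 'W']
```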

While trying to work around this, I noticed a similar issue with "wc":
echo -e "\e[3ma\e[0m" | wc -cm
     10      10
I have not investigated this further yet.
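For reference, the 10 in both columns is 4 for "\e[3m", 1 for "a", 4 for "\e[0m" and 1 for the newline added by echo. All of these are ASCII, so bytes and characters coincide, and "wc -m" counts characters rather than display columns, so the escape sequences are included. A small Python check of that arithmetic follows; the CSI pattern is a simplified assumption that does not cover every escape type.

```python
import re

# Assumed output of: echo -e "\e[3ma\e[0m"
# (italic-on sequence, the letter "a", reset sequence, trailing newline)
output = "\x1b[3m" + "a" + "\x1b[0m" + "\n"

byte_count = len(output.encode("utf-8"))  # like "wc -c"
char_count = len(output)                  # like "wc -m" in a UTF-8 locale
print(byte_count, char_count)  # 10 10

# Characters left once CSI sequences and the newline are removed.
visible = re.sub(r"\x1b\[[0-?]*[ -/]*[@-~]", "", output).rstrip("\n")
print(len(visible))  # 1
```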
--000000000000181fb805d2328285--