From unknown Mon Jun 23 22:03:37 2025
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
X-Mailer: MIME-tools 5.509 (Entity 5.509)
Content-Type: text/plain; charset=utf-8
From: bug#49925 <49925@debbugs.gnu.org>
To: bug#49925 <49925@debbugs.gnu.org>
Subject: Status: cat -E interprets sentinel newline at the end of buffer as an actual newline after a \r
Reply-To: bug#49925 <49925@debbugs.gnu.org>
Date: Tue, 24 Jun 2025 05:03:37 +0000

retitle 49925 cat -E interprets sentinel newline at the end of buffer as an actual newline after a \r
reassign 49925 coreutils
submitter 49925 Michael Debertol
severity 49925 normal
thanks

From debbugs-submit-bounces@debbugs.gnu.org Sat Aug 07 09:07:44 2021
Date: Sat, 7 Aug 2021 15:07:32 +0200
To: bug-coreutils@gnu.org
From: Michael Debertol
Subject: cat -E interprets sentinel newline at the end of buffer as an actual newline after a \r
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit

Hi,

after https://lists.gnu.org/archive/html/coreutils/2021-02/msg00003.html
(unreleased), the behavior of cat -E was changed so that it prints "^M$"
for "\r\n" line endings.

Whenever it sees a \r "cat -E" checks if the byte after is a \n, however
that \n might be the sentinel value that is inserted at the end of a buffer.

This is a problem in two cases:

- When a \r is at the end of the input. `printf "\r" | cat -E` will
print "^M", even though there is no "\n" after the "\r". FWIW,
tests/misc/cat-E.sh expects a "^M" for a trailing "\r", but I think
that's wrong.

- When the file is too big to fit into one buffer. If you try to "cat
-E" a big file (multiple megabytes) that consists of only "\r", cat will
print a few "^M" whenever it hits the end of a buffer in the middle of
the file and at the end.
Michael

From debbugs-submit-bounces@debbugs.gnu.org Sat Aug 07 14:29:17 2021
Subject: Re: bug#49925: cat -E interprets sentinel newline at the end of buffer as an actual newline after a \r
To: Michael Debertol, 49925-done@debbugs.gnu.org
From: =?UTF-8?Q?P=c3=a1draig_Brady?=
Message-ID: <14378fe4-0b51-fa6c-b060-2ad5bc5d719f@draigBrady.com>
Date: Sat, 7 Aug 2021 19:29:06 +0100
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="------------2DEDD20E03EF3A00381D5060"
Content-Language: en-US

This is a multi-part message in MIME format.
--------------2DEDD20E03EF3A00381D5060
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit

On 07/08/2021 14:07, Michael Debertol wrote:
> Hi,
>
> after https://lists.gnu.org/archive/html/coreutils/2021-02/msg00003.html
> (unreleased), the behavior of cat -E was changed so that it prints "^M$"
> for "\r\n" line endings.
>
> Whenever it sees a \r "cat -E" checks if the byte after is a \n, however
> that \n might be the sentinel value that is inserted at the end of a buffer.
>
> This is a problem in two cases:
>
> - When a \r is at the end of the input. `printf "\r" | cat -E` will
> print "^M", even though there is no "\n" after the "\r". FWIW,
> tests/misc/cat-E.sh expects a "^M" for a trailing "\r", but I think
> that's wrong.

This was intentional (as per the test) as I was thinking
we can provide more info here in the edge case that \r is the last char of a file.
However it's incorrect as you suggest, as cat can't treat files independently.

> - When the file is too big to fit into one buffer. If you try to "cat
> -E" a big file (multiple megabytes) that consists of only "\r", cat will
> print a few "^M" whenever it hits the end of a buffer in the middle of
> the file and at the end.

That indeed is a bug.
So we need to track handling of \r across buffer and file boundaries.
The attached does that, and I'll apply later.

marking this as done,
thanks!
Pádraig

--------------2DEDD20E03EF3A00381D5060
Content-Type: text/x-patch; charset=UTF-8; name="cat-E-trailing-CR.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="cat-E-trailing-CR.patch"

>From 2952343144654eaac158ef65292d885a61759e75 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?P=C3=A1draig=20Brady?=
Date: Sat, 7 Aug 2021 18:47:57 +0100
Subject: [PATCH] cat: with -E fix handling of \r\n spanning buffers

We must delay handling when \r is the last character
of the buffer being processed, as the next character
may or may not be \n.

* src/cat.c (pending_cr): A new global to record whether
the last character processed (in -E mode) is '\r'.
(cat): Honor pending_cr when processing the start of the buffer.
(main): Honor pending_cr if no more files to process.
* tests/misc/cat-E.sh: Add test cases.
Fixes https://bugs.gnu.org/49925
---
 src/cat.c           | 39 +++++++++++++++++++++++++++++++++------
 tests/misc/cat-E.sh | 16 ++++++++++++++--
 2 files changed, 47 insertions(+), 8 deletions(-)

diff --git a/src/cat.c b/src/cat.c
index 28f6e8e4e..17bc4fab9 100644
--- a/src/cat.c
+++ b/src/cat.c
@@ -78,6 +78,9 @@ static char *line_num_end = line_buf + LINE_COUNTER_BUF_LEN - 3;
 /* Preserves the 'cat' function's local 'newlines' between invocations. */
 static int newlines2 = 0;
 
+/* Whether there is a pending CR to process. */
+static bool pending_cr = false;
+
 void
 usage (int status)
 {
@@ -397,9 +400,16 @@ cat (
                 }
 
               /* Output a currency symbol if requested (-e). */
-
               if (show_ends)
-                *bpout++ = '$';
+                {
+                  if (pending_cr)
+                    {
+                      *bpout++ = '^';
+                      *bpout++ = 'M';
+                      pending_cr = false;
+                    }
+                  *bpout++ = '$';
+                }
 
               /* Output the newline. */
 
@@ -409,6 +419,14 @@ cat (
         }
       while (ch == '\n');
 
+      /* Here CH cannot contain a newline character. */
+
+      if (pending_cr)
+        {
+          *bpout++ = '\r';
+          pending_cr = false;
+        }
+
       /* Are we at the beginning of a line, and line numbers are requested? */
 
       if (newlines >= 0 && number)
@@ -417,8 +435,6 @@ cat (
           bpout = stpcpy (bpout, line_num_print);
         }
 
-      /* Here CH cannot contain a newline character. */
-
       /* The loops below continue until a newline character is found,
          which means that the buffer is empty or that a proper newline
          has been found. */
@@ -489,8 +505,13 @@ cat (
                 {
                   if (ch == '\r' && *bpin == '\n' && show_ends)
                     {
-                      *bpout++ = '^';
-                      *bpout++ = 'M';
+                      if (bpin == eob)
+                        pending_cr = true;
+                      else
+                        {
+                          *bpout++ = '^';
+                          *bpout++ = 'M';
+                        }
                     }
                   else
                     *bpout++ = ch;
@@ -768,6 +789,12 @@ main (int argc, char **argv)
     }
   while (++argind < argc);
 
+  if (pending_cr)
+    {
+      if (full_write (STDOUT_FILENO, "\r", 1) != 1)
+        die (EXIT_FAILURE, errno, _("write error"));
+    }
+
   if (have_read_stdin && close (STDIN_FILENO) < 0)
     die (EXIT_FAILURE, errno, _("closing standard input"));
 
diff --git a/tests/misc/cat-E.sh b/tests/misc/cat-E.sh
index 401b6d591..1131eb3a5 100755
--- a/tests/misc/cat-E.sh
+++ b/tests/misc/cat-E.sh
@@ -21,10 +21,22 @@ print_ver_ cat
 # Up to and including 8.32 the $ would have displayed at the start of the line
 # overwriting the first character
 printf 'a\rb\r\nc\n\r\nd\r' > 'in' || framework_failure_
-printf 'a\rb^M$\nc$\n^M$\nd^M' > 'exp' || framework_failure_
-
+printf 'a\rb^M$\nc$\n^M$\nd\r' > 'exp' || framework_failure_
 cat -E 'in' > out || fail=1
+compare exp out || fail=1
+
+# Ensure \r\n spanning files (or buffers) is handled
+printf '1\r' > in2 || framework_failure_
+printf '\n2\r\n' > in2b || framework_failure_
+printf '1^M$\n2^M$\n' > 'exp' || framework_failure_
+cat -E 'in2' 'in2b' > out || fail=1
+compare exp out || fail=1
 
+# Ensure \r at end of buffer is handled
+printf '1\r' > in2 || framework_failure_
+printf '2\r\n' > in2b || framework_failure_
+printf '1\r2^M$\n' > 'exp' || framework_failure_
+cat -E 'in2' 'in2b' > out || fail=1
 compare exp out || fail=1
 
 Exit $fail
-- 
2.26.2

--------------2DEDD20E03EF3A00381D5060--

From unknown Mon Jun 23 22:03:37 2025
Received: (at fakecontrol) by fakecontrolmessage;
To: internal_control@debbugs.gnu.org
From: Debbugs Internal Request
Subject: Internal Control
Message-Id: bug archived.
Date: Sun, 05 Sep 2021 11:24:05 +0000
User-Agent: Fakemail v42.6.9

# This is a fake control message.
#
# The action:
# bug archived.
thanks
# This fakemail brought to you by your local debbugs
# administrator
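
For readers following the thread, the pending_cr idea from the patch can be sketched outside of cat.c. The block below is a minimal, simplified model in C and not the coreutils code: it defers every \r until the next byte is known, whereas the patch only defers a \r that is the last byte of a buffer. The names show_ends_chunk, show_ends_finish and show_ends_all are hypothetical helpers introduced for illustration, and NUL-terminated strings stand in for buffers or files.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified model of "cat -E" (hypothetical helper, not coreutils code).
   Process one input chunk, carrying a trailing '\r' over to the next
   chunk in *pending_cr.  OUT must have room for 2 * n + 2 bytes.
   Returns the number of bytes written to OUT.  */
size_t
show_ends_chunk (const char *in, size_t n, char *out, bool *pending_cr)
{
  size_t o = 0;
  assert (in && out && pending_cr);

  for (size_t i = 0; i < n; i++)
    {
      char ch = in[i];

      if (*pending_cr)
        {
          /* The byte after the deferred '\r' is now known.  */
          *pending_cr = false;
          if (ch == '\n')
            {
              out[o++] = '^';
              out[o++] = 'M';
            }
          else
            out[o++] = '\r';
        }

      if (ch == '\r')
        *pending_cr = true;     /* Decide once the next byte is seen.  */
      else if (ch == '\n')
        {
          out[o++] = '$';
          out[o++] = '\n';
        }
      else
        out[o++] = ch;
    }
  return o;
}

/* After the last chunk of the last file: a deferred '\r' was not
   followed by '\n', so emit it unchanged.  Returns bytes written.  */
size_t
show_ends_finish (char *out, bool *pending_cr)
{
  if (!*pending_cr)
    return 0;
  *pending_cr = false;
  out[0] = '\r';
  return 1;
}

/* Convenience wrapper (also hypothetical): run NUL-terminated chunks
   through the model in order and NUL-terminate OUT.  */
void
show_ends_all (const char *const *chunks, size_t nchunks, char *out)
{
  bool pending_cr = false;
  size_t o = 0;

  for (size_t i = 0; i < nchunks; i++)
    o += show_ends_chunk (chunks[i], strlen (chunks[i]), out + o,
                          &pending_cr);
  o += show_ends_finish (out + o, &pending_cr);
  out[o] = '\0';
}
```

Feeding the two chunks "1\r" and "\n2\r\n" through show_ends_all mirrors the "spanning files" case added to tests/misc/cat-E.sh and yields "1^M$\n2^M$\n", while "1\r" followed by "2\r\n" leaves the first \r untouched. Keeping the state in a caller-owned flag rather than a global is a choice made for the sketch only; the patch uses a file-scope static in cat.c.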