From unknown Wed Jun 18 23:11:37 2025
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
From: bug#47702 <47702@debbugs.gnu.org>
To: bug#47702 <47702@debbugs.gnu.org>
Subject: Status: wc man page: first you are talking about bytes, then you are talking about characters
Reply-To: bug#47702 <47702@debbugs.gnu.org>
Date: Thu, 19 Jun 2025 06:11:37 +0000

retitle 47702 wc man page: first you are talking about bytes, then you are talking about characters
reassign 47702 coreutils
submitter 47702 積丹尼 Dan Jacobson
severity 47702 normal
thanks

From debbugs-submit-bounces@debbugs.gnu.org Sun Apr 11 01:42:48 2021
From: 積丹尼 Dan Jacobson
To: bug-coreutils@gnu.org
Subject: wc man page: first you are talking about bytes, then you are talking about characters
Date: Sun, 11 Apr 2021 09:42:57 +0800
Message-ID: <87fszx8ula.5.fsf@jidanni.org>
MIME-Version: 1.0
Content-Type: text/plain

Man wc says

    Print newline, word, and byte counts for each FILE, and a total line if
    more than one FILE is specified. A word is a non-zero-length sequence
    of characters delimited by white space.

first you are talking about bytes, then you are talking about
characters.

So for the latter, please say
characters (not bytes)
or
characters (same as bytes)
or just
bytes
Yes, even if explained in the INFO file.
Thanks.

From debbugs-submit-bounces@debbugs.gnu.org Sun Apr 11 11:50:45 2021
Subject: Re: bug#47702: wc man page: first you are talking about bytes, then you are talking about characters
To: 積丹尼 Dan Jacobson, 47702-done@debbugs.gnu.org
References: <87fszx8ula.5.fsf@jidanni.org>
From: Pádraig Brady
Date: Sun, 11 Apr 2021 16:50:35 +0100
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:84.0) Gecko/20100101 Thunderbird/84.0
MIME-Version: 1.0
In-Reply-To: <87fszx8ula.5.fsf@jidanni.org>
Content-Type: multipart/mixed; boundary="------------D6047575BFEC9DE941F6C56A"
Content-Language: en-US

This is a multi-part message in MIME format.
--------------D6047575BFEC9DE941F6C56A
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit

On 11/04/2021 02:42, 積丹尼 Dan Jacobson wrote:
> Man wc says
>
> Print newline, word, and byte counts for each FILE, and a total line if
> more than one FILE is specified. A word is a non-zero-length sequence
> of characters delimited by white space.
>
> first you are talking about bytes, then you are talking about
> characters.
>
> So for the latter, please say
> characters (not bytes)
> or
> characters (same as bytes)
> or just
> bytes
> Yes, even if explained in the INFO file.

You're right that this is under-specified,
in both the man page and the info file.

The above is really characters (not bytes).
In fact, as a GNU extension, it's printable characters.
POSIX does not specify this, but one can confirm it like so:

$ printf '\xc3 \xc3' | LC_ALL=C wc --words --chars --bytes
0 3 3
$ printf '\xc3 \xc3' | LC_ALL=C.utf8 wc --words --chars --bytes
0 1 3

In the C locale each byte counts as a character, but \xc3 is not
printable, so no word is counted. In the UTF-8 locale the two \xc3
bytes are invalid sequences, so only the space counts as a character.

The info file was really quite under-specified in this regard.
I'll apply the attached to clarify things.

Marking this as done.

thanks!
Pádraig

--------------D6047575BFEC9DE941F6C56A
Content-Type: text/x-patch; charset=UTF-8; name="wc-clarify-counts.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="wc-clarify-counts.patch"

>From c985544e68d1a1c9d231d2f2db03126f9af51ad6 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?P=C3=A1draig=20Brady?=
Date: Sun, 11 Apr 2021 16:24:07 +0100
Subject: [PATCH] doc: clarify what's counted by wc

* src/wc.c (usage): State that only printable characters are
considered when counting words. This also disambiguates
wether we're talking about bytes or characters in this context.
* doc/coreutils.texi (wc invocation): Likewise.
Also clarify that --characters counts valid locale aware characters,
and that --lines does not count a trailing "line" unless it
ends with a newline character.
Fixes https://bugs.gnu.org/47702
---
 doc/coreutils.texi | 17 +++++++++++------
 src/wc.c           |  2 +-
 2 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/doc/coreutils.texi b/doc/coreutils.texi
index e53c0de6e..cd10b0d4d 100644
--- a/doc/coreutils.texi
+++ b/doc/coreutils.texi
@@ -3754,9 +3754,10 @@ contents of files.
 @cindex word count
 @cindex line count
 
-@command{wc} counts the number of bytes, characters, whitespace-separated
-words, and newlines in each given @var{file}, or standard input if none
-are given or for a @var{file} of @samp{-}. Synopsis:
+@command{wc} counts the number of bytes, characters, words, and newlines
+in each given @var{file}, or standard input if none are given
+or for a @var{file} of @samp{-}. A word is a nonzero length
+sequence of printable characters delimited by white space. Synopsis:
 
 @example
 wc [@var{option}]@dots{} [@var{file}]@dots{}
@@ -3807,19 +3808,23 @@ Print only the byte counts.
 @itemx --chars
 @opindex -m
 @opindex --chars
-Print only the character counts.
+Print only the character counts, as per the current locale.
+Invalid characters are not counted.
 
 @item -w
 @itemx --words
 @opindex -w
 @opindex --words
-Print only the word counts.
+Print only the word counts. A word is a nonzero length
+sequence of printable characters separated by white space.
 
 @item -l
 @itemx --lines
 @opindex -l
 @opindex --lines
-Print only the newline counts.
+Print only the newline character counts.
+Note a file without a trailing newline character,
+will not have that last portion included in the line count.
 
 @item -L
 @itemx --max-line-length
diff --git a/src/wc.c b/src/wc.c
index 5216db189..263ba30e8 100644
--- a/src/wc.c
+++ b/src/wc.c
@@ -123,7 +123,7 @@ Usage: %s [OPTION]... [FILE]...\n\
       fputs (_("\
 Print newline, word, and byte counts for each FILE, and a total line if\n\
 more than one FILE is specified. A word is a non-zero-length sequence of\n\
-characters delimited by white space.\n\
+printable characters delimited by white space.\n\
 "), stdout);
 
       emit_stdin_note ();
-- 
2.26.2

--------------D6047575BFEC9DE941F6C56A--

From unknown Wed Jun 18 23:11:37 2025
Received: (at fakecontrol) by fakecontrolmessage;
To: internal_control@debbugs.gnu.org
From: Debbugs Internal Request
Subject: Internal Control
Message-Id: bug archived.
Date: Mon, 10 May 2021 11:24:04 +0000
User-Agent: Fakemail v42.6.9

# This is a fake control message.
#
# The action:
# bug archived.
thanks
# This fakemail brought to you by your local debbugs
# administrator
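
The counting rules discussed in this thread (bytes, locale-valid
characters, words made of printable characters, and newline counts) can
be illustrated with a minimal Python sketch. This is not GNU wc's
implementation. The function name wc_sketch and the utf8_locale flag are
hypothetical, the C locale is modelled by treating each byte as one
character, and only the C locale's white space set is recognised, whereas
a real UTF-8 locale may treat further characters as white space.

```python
import re

# White space as the C locale's isspace() defines it (assumption: other
# locale-specific space characters are ignored in this sketch).
_SPACE = re.compile(r"[ \t\n\v\f\r]+")


def wc_sketch(data, utf8_locale):
    """Return (newlines, words, chars, bytes) for a bytes object.

    Hypothetical helper illustrating the documented rules only.
    """
    if utf8_locale:
        # Invalid byte sequences are dropped, so they are not counted
        # as characters.
        text = data.decode("utf-8", errors="ignore")

        def printable(ch):
            return ch.isprintable()
    else:
        # Model the C locale: every byte is one character, but only
        # ASCII 0x20..0x7e is printable.
        text = data.decode("latin-1")

        def printable(ch):
            return 0x20 <= ord(ch) <= 0x7E

    # A word is a white-space-delimited run that holds at least one
    # printable character.
    words = sum(
        1 for run in _SPACE.split(text) if any(printable(ch) for ch in run)
    )
    return data.count(b"\n"), words, len(text), len(data)


sample = b"\xc3 \xc3"
print(wc_sketch(sample, utf8_locale=False))  # (0, 0, 3, 3)
print(wc_sketch(sample, utf8_locale=True))   # (0, 0, 1, 3)
```

The last three numbers of each tuple correspond to the word, character
and byte counts in the two shell commands from the reply above. Calling
wc_sketch(b"no newline", utf8_locale=True) returns a newline count of 0,
which matches the note in the patch that a final portion without a
trailing newline is not included in the line count.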