GNU bug report logs - #47702
wc man page: first you are talking about bytes, then you are talking about characters

Package: coreutils

Reported by: 積丹尼 Dan Jacobson <jidanni <at> jidanni.org>

Date: Sun, 11 Apr 2021 05:43:03 UTC

Severity: normal

Done: Pádraig Brady <P <at> draigBrady.com>

Bug is archived. No further changes may be made.

From: help-debbugs <at> gnu.org (GNU bug Tracking System)
To: 積丹尼 Dan Jacobson <jidanni <at> jidanni.org>
Subject: bug#47702: closed (Re: bug#47702: wc man page: first you are
 talking about bytes, then you are talking about characters)
Date: Sun, 11 Apr 2021 15:51:02 +0000
[Message part 1 (text/plain, inline)]
Your bug report

#47702: wc man page: first you are talking about bytes, then you are talking about characters

which was filed against the coreutils package, has been closed.

The explanation is attached below, along with your original report.
If you require more details, please reply to 47702 <at> debbugs.gnu.org.

-- 
47702: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=47702
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
From: Pádraig Brady <P <at> draigBrady.com>
To: 積丹尼 Dan Jacobson <jidanni <at> jidanni.org>,
 47702-done <at> debbugs.gnu.org
Subject: Re: bug#47702: wc man page: first you are talking about bytes, then
 you are talking about characters
Date: Sun, 11 Apr 2021 16:50:35 +0100
[Message part 3 (text/plain, inline)]
On 11/04/2021 02:42, 積丹尼 Dan Jacobson wrote:
> Man wc says
> 
>         Print newline, word, and byte counts for each FILE, and a total line if
>         more than one FILE is specified.  A word is a non-zero-length  sequence
>         of characters delimited by white space.
> 
> first you are talking about bytes, then you are talking about
> characters.
> 
> So for the latter, please say
> characters (not bytes)
> or
> characters (same as bytes)
> or just
> bytes
> Yes, even if explained in the INFO file.

You're right that this is under-specified,
in both the man page and the info file.
The above is really characters (not bytes).
In fact, as a GNU extension, only printable characters count toward a word.
POSIX does not specify this, but one can confirm the behavior like this:


$ printf '\xc3 \xc3' | LC_ALL=C wc --words --chars --bytes
      0       3       3
$ printf '\xc3 \xc3' | LC_ALL=C.utf8 wc --words --chars --bytes
      0       1       3
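
The commands above feed wc a byte (0xc3) that is not a complete character in UTF-8. As a complementary sketch, the same byte/character split also shows up with a valid multibyte character. The sample string "é a", the helper name `sample`, and the locale name C.UTF-8 are assumptions for illustration; if that locale is not installed, wc falls back to the C locale and the last two counts will differ.

```shell
# "é a" as UTF-8 bytes (é is 0xC3 0xA9), written with octal escapes so
# that any POSIX printf emits the same four bytes.
# `sample` is a hypothetical helper, not part of coreutils.
sample() { printf '\303\251 a'; }

# Bytes: é (2) + space (1) + a (1) = 4, independent of locale.
sample | LC_ALL=C wc -c

# C locale: every byte is one character, so -m agrees with -c (4).
sample | LC_ALL=C wc -m

# UTF-8 locale (assumed to be installed as C.UTF-8): é is decoded as a
# single character, so -m should report 3 while -c stays at 4.
sample | LC_ALL=C.UTF-8 wc -m

# Word count in the same assumed locale: "é" and "a" are separated by
# white space, so two words are expected.
sample | LC_ALL=C.UTF-8 wc -w
```

Only the first two counts are locale-independent. The last two depend on the locale being available, which is why the choice of locale matters whenever "characters" rather than "bytes" are being counted.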

The info file was really quite under-specified in this regard.
I'll apply the attached to clarify things.
Marking this as done.

thanks!
Pádraig
[wc-clarify-counts.patch (text/x-patch, attachment)]
[Message part 5 (message/rfc822, inline)]
From: 積丹尼 Dan Jacobson <jidanni <at> jidanni.org>
To: bug-coreutils <at> gnu.org
Subject: wc man page: first you are talking about bytes, then you are
 talking about characters
Date: Sun, 11 Apr 2021 09:42:57 +0800
Man wc says

       Print newline, word, and byte counts for each FILE, and a total line if
       more than one FILE is specified.  A word is a non-zero-length  sequence
       of characters delimited by white space.

first you are talking about bytes, then you are talking about
characters.

So for the latter, please say
characters (not bytes)
or
characters (same as bytes)
or just
bytes
Yes, even if explained in the INFO file.
Thanks.



This bug report was last modified 4 years and 39 days ago.
