GNU bug report logs - #47702
wc man page: first you are talking about bytes, then you are talking about characters

Package: coreutils;

Reported by: 積丹尼 Dan Jacobson <jidanni <at> jidanni.org>

Date: Sun, 11 Apr 2021 05:43:03 UTC

Severity: normal

Done: Pádraig Brady <P <at> draigBrady.com>

Bug is archived. No further changes may be made.


Message #10 received at 47702-done <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: 積丹尼 Dan Jacobson <jidanni <at> jidanni.org>,
 47702-done <at> debbugs.gnu.org
Subject: Re: bug#47702: wc man page: first you are talking about bytes, then
 you are talking about characters
Date: Sun, 11 Apr 2021 16:50:35 +0100
On 11/04/2021 02:42, 積丹尼 Dan Jacobson wrote:
> Man wc says
> 
>         Print newline, word, and byte counts for each FILE, and a total line if
>         more than one FILE is specified.  A word is a non-zero-length  sequence
>         of characters delimited by white space.
> 
> first you are talking about bytes, then you are talking about
> characters.
> 
> So for the latter, please say
> characters (not bytes)
> or
> characters (same as bytes)
> or just
> bytes
> Yes, even if explained in the INFO file.

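You're right that this is under-specified,
in both the man page and the info file.
The above really means characters (not bytes).
In fact, as a GNU extension, only printable characters make up a word.
POSIX does not specify this, but one can confirm it with a lone \xc3 byte,
which is not a printable character in either locale, so no words are counted: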


$ printf '\xc3 \xc3' | LC_ALL=C wc --words --chars --bytes
      0       3       3
$ printf '\xc3 \xc3' | LC_ALL=C.utf8 wc --words --chars --bytes
      0       1       3
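
For contrast, here is a minimal sketch of the same comparison with a valid UTF-8 sequence. It is not part of the patch. The helper name count_in_locale is hypothetical, and the C.UTF-8 locale is assumed to be installed (if it is not, wc falls back to the C locale). The input is "é é" written with octal escapes, since not every shell's printf understands \x. In a UTF-8 locale one would expect the character count (3) to be smaller than the byte count (5), while in the C locale the two counts are equal.

```shell
#!/bin/sh
# Hypothetical helper: run wc on a fixed 5-byte input under the locale in $1.
# Input is "é é": two 2-byte UTF-8 sequences (\303\251) separated by a space.
count_in_locale() {
    printf '\303\251 \303\251' | LC_ALL="$1" wc --words --chars --bytes
}

# wc always prints its columns in the order: words, characters, bytes.
count_in_locale C
count_in_locale C.UTF-8   # assumption: this locale exists on the system
```

The word column is deliberately left out of the expectations above, since it depends on which bytes the locale (and the wc version) treats as printable.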

The info file was really quite under-specified in this regard.
I'll apply the attached to clarify things.
Marking this as done.

thanks!
Pádraig
[wc-clarify-counts.patch (text/x-patch, attachment)]

This bug report was last modified 4 years and 39 days ago.

