GNU bug report logs - #20954
wc - linux


Package: coreutils;

Reported by: tele <swojskichlopak <at> wp.pl>

Date: Thu, 2 Jul 2015 00:46:03 UTC

Severity: normal

Tags: notabug

Done: Bob Proulx <bob <at> proulx.com>

Bug is archived. No further changes may be made.


From: Stephane Chazelas <stephane.chazelas <at> gmail.com>
To: Bob Proulx <bob <at> proulx.com>
Cc: 20954 <at> debbugs.gnu.org, tele <swojskichlopak <at> wp.pl>
Subject: bug#20954: wc - linux
Date: Thu, 2 Jul 2015 14:23:30 +0100

2015-07-01 19:41:00 -0600, Bob Proulx:
[...]
> > $ a="" ; echo $s | wc -l
> > 1
[...]
> No.  Should be 1.  You have forgotten about the newline at the end of
> the command.  The echo will terminate with a newline.
[...]

Leaving a variable unquoted will also cause the shell to apply
the split+glob operator on it. echo will also do some
transformations on the string (backslash and option processing).
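A minimal sketch of both effects, using hypothetical sample values (`'a   b'` and `'-n'`) that are not from the original report. It assumes the default IFS; the option-processing part describes the behaviour of the echo builtin in bash and dash, and other implementations may differ:

```shell
#!/bin/sh
# Hypothetical sample value containing a run of spaces.
var='a   b'

# Unquoted: the shell splits $var on whitespace, so echo receives two
# arguments ("a" and "b") and joins them with a single space.
unquoted=$(echo $var)

# Quoted and passed through printf %s: the value arrives unchanged.
quoted=$(printf %s "$var")

printf 'unquoted: [%s]\n' "$unquoted"
printf 'quoted:   [%s]\n' "$quoted"

# Option processing: in bash and dash, echo treats a leading -n as an
# option and prints nothing, while printf %s prints the two characters.
opt='-n'
printf 'echo:   [%s]\n' "$(echo "$opt")"
printf 'printf: [%s]\n' "$(printf %s "$opt")"
```

The first pair of lines should show `[a b]` against `[a   b]`, which is why a count taken from `echo $var` can differ from the length of the variable itself.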

To count the number of bytes in a variable, you can use:

printf %s "$var" | wc -c
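As a small illustration of why printf %s is preferred over echo here, the sketch below feeds an empty variable (as in the quoted report) through both and compares what wc sees. The variable names are only for illustration:

```shell
#!/bin/sh
var=""   # empty value, as in the quoted report

# echo appends a newline, so wc sees one byte and one line.
echo_bytes=$(echo "$var" | wc -c)
echo_lines=$(echo "$var" | wc -l)

# printf %s writes the value and nothing else.
printf_bytes=$(printf %s "$var" | wc -c)
printf_lines=$(printf %s "$var" | wc -l)

# $(( )) strips any padding some wc implementations add.
echo "echo:   $((echo_bytes)) byte(s), $((echo_lines)) line(s)"
echo "printf: $((printf_bytes)) byte(s), $((printf_lines)) line(s)"
```

This should report 1 byte and 1 line for echo, and 0 bytes and 0 lines for printf.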

For the number of characters, use "${#var}" or:

printf %s "$var" | wc -m

GNU wc -m will not count bytes that are not part of a valid
character, while GNU bash's ${#var} will count each such byte as
one character:

In a UTF-8 locale:

$ var=$'\x80X\x80\u00e9'
$ printf %s "$var" | hd
00000000  80 58 80 c3 a9                                    |.X...|
00000005
$ echo "${#var}"
4
$ printf %s "$var" | wc -c
5
$ printf %s "$var" | wc -m
2

Above, $var contains a 0x80 byte, which doesn't form a valid
character on its own, then "X" (0x58), then another 0x80, then
é (0xc3 0xa9).

wc -c counts the 5 bytes, wc -m counts only X and é (2), while
bash's ${#var} counts those plus the two 0x80 bytes (4).
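Since ${#var} depends on the locale, one way to get a byte count from the shell itself is to evaluate it under the C locale. This is only a sketch, not something from the message above: `byte_len` is a hypothetical helper name, and the approach assumes a shell such as bash that re-reads the locale when LC_ALL is assigned (dash counts bytes regardless).

```shell
#!/bin/sh
# byte_len is a hypothetical helper name, not part of coreutils or bash.
byte_len() {
    # The subshell confines the locale override. In the C locale every
    # byte is one character, so ${#1} becomes a byte count.
    ( LC_ALL=C; printf '%s\n' "${#1}" )
}

# "X" followed by é encoded as UTF-8 (0xc3 0xa9), written with octal escapes.
var=$(printf 'X\303\251')

len=$(byte_len "$var")
wc_bytes=$(printf %s "$var" | wc -c)

# $(( )) strips any padding some wc implementations add.
echo "byte_len: $len, wc -c: $((wc_bytes))"
```

Under bash or dash both numbers are expected to be 3, matching what wc -c reports for the same value.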

-- 
Stephane




This bug report was last modified 9 years and 327 days ago.
