GNU bug report logs -
#20954
wc - linux
Reported by: tele <swojskichlopak <at> wp.pl>
Date: Thu, 2 Jul 2015 00:46:03 UTC
Severity: normal
Tags: notabug
Done: Bob Proulx <bob <at> proulx.com>
Bug is archived. No further changes may be made.
Message #15 received at 20954 <at> debbugs.gnu.org (full text, mbox):
2015-07-01 19:41:00 -0600, Bob Proulx:
[...]
> > $ a="" ; echo $s | wc -l
> > 1
[...]
> No. Should be 1. You have forgotten about the newline at the end of
> the command. The echo will terminate with a newline.
[...]
Leaving a variable unquoted will also cause the shell to apply
the split+glob operator to it. echo will also do some
transformations on the string (backslash and option processing,
depending on the echo implementation).
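A minimal sketch of these pitfalls in plain sh. The values `empty` and `spaced` are hypothetical and do not come from the original thread. echo always appends a newline, so even an empty variable yields one line. An unquoted expansion is split into words, and echo rejoins those words with single spaces.

```shell
#!/bin/sh
# Hypothetical sample values, chosen only for illustration.
empty=''
spaced='a   b'   # "a", three spaces, "b": 5 bytes

# echo appends a newline, so wc -l sees one line even for an empty value;
# printf %s writes nothing at all.
echo_lines=$(echo "$empty" | wc -l)          # 1
printf_lines=$(printf %s "$empty" | wc -l)   # 0

# Deliberately unquoted: split+glob yields the words "a" and "b",
# which echo prints as "a b\n".
echo_bytes=$(echo $spaced | wc -c)           # 4
printf_bytes=$(printf %s "$spaced" | wc -c)  # 5

echo "lines: echo=$echo_lines printf=$printf_lines"
echo "bytes: echo=$echo_bytes printf=$printf_bytes"
```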
To count the number of bytes in a variable, you can use:
  printf %s "$var" | wc -c
For the number of characters, use "${#var}" or:
  printf %s "$var" | wc -m
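The two pipelines can be wrapped in small functions. `byte_count` and `char_count` are hypothetical names used only for this sketch. The sample value is plain ASCII, so bytes and characters agree in any ASCII-compatible locale. The leading `-e` and the `\t` pass through untouched because printf interprets escapes only in its format argument, not in the argument substituted for `%s`.

```shell
#!/bin/sh
# Hypothetical helpers wrapping the pipelines from the message.

byte_count() {
  # "%s" prints $1 verbatim: no trailing newline, no option or
  # backslash processing, and the quotes prevent split+glob.
  printf %s "$1" | wc -c
}

char_count() {
  # wc -m decodes characters according to the current locale (LC_CTYPE).
  printf %s "$1" | wc -m
}

sample='-e a\tb'       # 7 ASCII bytes, including a literal backslash and "t"
byte_count "$sample"   # 7
char_count "$sample"   # 7
```

For this ASCII value `${#sample}` is also 7. The three methods only diverge on multibyte or invalid input, as the rest of the message shows.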
GNU wc will not count the bytes that are not part of a valid
character, while GNU bash's ${#var} will count each of them as
one character:
In a UTF-8 locale:
  $ var=$'\x80X\x80\u00e9'
  $ printf %s "$var" | hd
  00000000 80 58 80 c3 a9 |.X...|
  00000005
  $ echo "${#var}"
  4
  $ printf %s "$var" | wc -c
  5
  $ printf %s "$var" | wc -m
  2
Above, $var contains a 0x80 byte, which doesn't form a valid
character, then "X" (0x58), then another 0x80, then é (0xc3 0xa9).
wc -c counts the 5 bytes and wc -m counts only X and é, while
bash's ${#var} counts those plus the two 0x80 bytes.
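The same five bytes can be produced without bash's `$'...'` quoting by using octal escapes in a printf format. This also avoids depending on how `\u00e9` is encoded in the current locale. The byte count in this sketch does not depend on the locale, but the character count does. `C.UTF-8` is an assumed locale name that may not be installed on every system, so the output of that line is not shown.

```shell
#!/bin/sh
# 0x80, 'X', 0x80, then 0xc3 0xa9 (UTF-8 for é), as octal escapes.
var=$(printf '\200X\200\303\251')

# Hex view of the bytes (od is POSIX; hd is not always installed).
printf %s "$var" | od -An -tx1

# Byte count: the same in every locale.
bytes=$(printf %s "$var" | wc -c)
echo "bytes: $bytes"   # 5

# Character count: locale-dependent. C.UTF-8 is an assumed locale name;
# in a working UTF-8 locale GNU wc skips the two stray 0x80 bytes.
chars=$(printf %s "$var" | LC_ALL=C.UTF-8 wc -m)
echo "chars: $chars"
```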
--
Stephane
This bug report was last modified 9 years and 327 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.