GNU bug report logs - #20954
wc - linux

Package: coreutils

Reported by: tele <swojskichlopak <at> wp.pl>

Date: Thu, 2 Jul 2015 00:46:03 UTC

Severity: normal

Tags: notabug

Done: Bob Proulx <bob <at> proulx.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 20954 in the body.
You can then email your comments to 20954 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-coreutils <at> gnu.org:
bug#20954; Package coreutils. (Thu, 02 Jul 2015 00:46:03 GMT)

Acknowledgement sent to tele <swojskichlopak <at> wp.pl>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Thu, 02 Jul 2015 00:46:03 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: tele <swojskichlopak <at> wp.pl>
To: bug-coreutils <at> gnu.org
Subject: wc - linux
Date: Thu, 2 Jul 2015 00:34:54 +0200
Hi!

From terminal:


$ a="" ; echo $s | wc -l
1

Should be 0 , yes ?




Information forwarded to bug-coreutils <at> gnu.org:
bug#20954; Package coreutils. (Thu, 02 Jul 2015 01:42:02 GMT)

Message #8 received at 20954 <at> debbugs.gnu.org:

From: Bob Proulx <bob <at> proulx.com>
To: tele <swojskichlopak <at> wp.pl>
Cc: 20954 <at> debbugs.gnu.org
Subject: Re: bug#20954: wc - linux
Date: Wed, 1 Jul 2015 19:41:00 -0600
tag 20954 + notabug
close 20954
thanks

tele wrote:
> Hi!

Hi! :-)

> From terminal:
> 
> $ a="" ; echo $s | wc -l
> 1

Do you mean $a instead of $s?  Either way is the same though assuming
$s is empty too.

> Should be 0 , yes ?

No.  Should be 1.  You have forgotten about the newline at the end of
the command.  The echo will terminate with a newline.  You can see
this with od.

  echo | od -tx1 -c
  0000000  0a
           \n

Since this appears to be a usage error I have closed the bug.  Please
feel free to follow up with more information.  We will read it.  And
we appreciate additional communication!  I am simply closing it to
keep the accounting straight.  :-)

Bob




Added tag(s) notabug. Request was from Bob Proulx <bob <at> proulx.com> to control <at> debbugs.gnu.org. (Thu, 02 Jul 2015 01:42:03 GMT)

bug closed, send any further explanations to 20954 <at> debbugs.gnu.org and tele <swojskichlopak <at> wp.pl> Request was from Bob Proulx <bob <at> proulx.com> to control <at> debbugs.gnu.org. (Thu, 02 Jul 2015 01:42:03 GMT)

Information forwarded to bug-coreutils <at> gnu.org:
bug#20954; Package coreutils. (Thu, 02 Jul 2015 13:24:02 GMT)

Message #15 received at 20954 <at> debbugs.gnu.org:

From: Stephane Chazelas <stephane.chazelas <at> gmail.com>
To: Bob Proulx <bob <at> proulx.com>
Cc: 20954 <at> debbugs.gnu.org, tele <swojskichlopak <at> wp.pl>
Subject: Re: bug#20954: wc - linux
Date: Thu, 2 Jul 2015 14:23:30 +0100
2015-07-01 19:41:00 -0600, Bob Proulx:
[...]
> > $ a="" ; echo $s | wc -l
> > 1
[...]
> No.  Should be 1.  You have forgotten about the newline at the end of
> the command.  The echo will terminate with a newline.
[...]

Leaving a variable unquoted will also cause the shell to apply
the split+glob operator on it. echo will also do some
transformations on the string (backslash and option processing).
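
A minimal sketch of the splitting effect mentioned above, assuming a POSIX shell with the default IFS. The variable names are illustrative only and do not come from the report.

```shell
# Assumes the default IFS (space, tab, newline).
var='a   b'                       # three spaces between a and b

# Unquoted: the shell splits $var into the words "a" and "b",
# and echo joins its arguments with a single space.
unquoted=$(echo $var)

# Quoted: the value is passed through as one unmodified argument.
quoted=$(printf '%s\n' "$var")

printf '[%s]\n' "$unquoted" "$quoted"
```

The first bracketed value has lost two of the three spaces; the second has not.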

To count the number of bytes in a variable, you can use:

printf %s "$var" | wc -c

Use "${#var}" or 

printf %s "$var" | wc -m

for the number of characters.
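
The two pipelines above could be wrapped as small functions. This is only a sketch: byte_len and char_len are hypothetical helper names, and the arithmetic expansion is there on the assumption that some wc implementations pad their output with spaces.

```shell
# byte_len and char_len are hypothetical helper names.
byte_len() {
    # printf %s writes the value with no trailing newline;
    # $(( )) drops any padding around the number printed by wc.
    printf '%d\n' "$(( $(printf '%s' "$1" | wc -c) ))"
}

char_len() {
    # wc -m counts characters according to the current locale.
    printf '%d\n' "$(( $(printf '%s' "$1" | wc -m) ))"
}

byte_len ""
byte_len "abc"
char_len "abc"
```

For an empty string both report 0, because printf %s emits nothing at all.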

GNU wc will not count the bytes that are not part of a valid
character, while GNU bash's ${#var} will count them as one
character:

In a UTF-8 locale:

$ var=$'\x80X\x80\u00e9'
$ printf %s "$var" | hd
00000000  80 58 80 c3 a9                                    |.X...|
00000005
$ echo "${#var}"
4
$ printf %s "$var" | wc -c
5
$ printf %s "$var" | wc -m
2

Above $var contains the 0x80 byte that doesn't form a valid
character, "X" (0x58), then another 0x80, then é (0xc3 0xa9).

wc -c counts the 5 bytes, wc -m counts X and é, while bash
${#var} counts those plus the 0x80s.

-- 
Stephane




Information forwarded to bug-coreutils <at> gnu.org:
bug#20954; Package coreutils. (Thu, 02 Jul 2015 15:30:19 GMT)

Message #18 received at submit <at> debbugs.gnu.org:

From: tele <swojskichlopak <at> wp.pl>
To: bug-coreutils <at> gnu.org
Subject: Re: wc - linux
Date: Thu, 2 Jul 2015 12:19:59 +0200
tag 20954 + notabug
close 20954
thanks

> tele wrote:
>
> > Hi!
>
> Hi!
>
> > From terminal:
> >
> > $ a="" ; echo $s | wc -l
> > 1
>
> Do you mean $a instead of $s?  Either way is the same though assuming
> $s is empty too.

- Yes, my mistake :-)

> > Should be 0 , yes ?
>
> No.  Should be 1.  You have forgotten about the newline at the end of
> the command.  The echo will terminate with a newline.  You can see
> this with od.
>
>   echo | od -tx1 -c
>   0000000  0a
>            \n
>
> Since this appears to be a usage error I have closed the bug.  Please
> feel free to follow up with more information.  We will read it.  And
> we appreciate additional communication!  I am simply closing it to
> keep the accounting straight.
>
> Bob

# "echo" gives in new line, "echo -n" subtracts 1 line, but "wc -l" can count only from new line,
# so if something exist inside first line "wc -l" can not count. :-(
# example:
#
#	$ a="j" ; echo  "$a"  |  wc -l
#	1
#
#	$ a="" ; echo  "$a"  |  wc -l
#	1
#
#	$ a="" ; echo -n "$a"  |  wc -l
#	0
#
#	$ a="j" ; echo -n "$a"  |  wc -l
#	0

So,

$ a="" ; echo  "$a"  |  sed '/^\s*$/d' | wc -l
0

$ a="3" ; echo  "$a"  |  sed '/^\s*$/d' | wc -l
1

Can be added option to "wc" to fix this problem without use sed in future ?
Thanks for helping :-)

Information forwarded to bug-coreutils <at> gnu.org:
bug#20954; Package coreutils. (Thu, 02 Jul 2015 23:34:02 GMT)

Message #21 received at 20954 <at> debbugs.gnu.org:

From: Bob Proulx <bob <at> proulx.com>
To: tele <swojskichlopak <at> wp.pl>
Cc: 20954 <at> debbugs.gnu.org
Subject: Re: bug#20954: wc - linux
Date: Thu, 2 Jul 2015 17:33:23 -0600
tele wrote:
> "echo" gives in new line,

Yes.

> "echo -n" subtracts 1 line,

echo -n is non-portable and shouldn't be used.

echo -n suppresses emitting a trailing newline.

Note that in both of these cases you are using the shell's internal
builtin echo and not the coreutils echo.  They behave the same.

> but "wc -l" can count only from new line, so if something exist
> inside first line "wc -l" can not count. :-(

"wc -l" counts newlines.  That is the task that it was constructed to
do.  That is exactly what it does.  No more and no less.

What is a text line?  A text line by definition ends with a newline.
This has been standardized to prevent different implementations from
implementing it differently and creating portability problems.
Therefore all standards compliant implementations must implement it in
the same way to prevent portability problems.
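
As a rough illustration of "wc -l counts newlines", the sketch below counts newline bytes directly and can be compared against wc -l. The function name count_newlines is made up for this example.

```shell
# count_newlines is a hypothetical helper name.
count_newlines() {
    # Delete every byte except newline, then count the bytes left over.
    tr -cd '\n' | wc -c
}

# Two pieces of text but only one newline: both commands report 1.
printf 'a\nb' | count_newlines
printf 'a\nb' | wc -l
```

The trailing "b" has no newline after it, so neither command counts it as a line.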

> example:
>
>	$ a="j" ; echo  "$a"  |  wc -l
>	1

I have been wondering.  Why are you using a variable here?  Using the
variable as you are doing is no different than not using the variable.

  echo "j" | od -tx1 -c
  0000000  6a  0a
            j  \n

There is one newline.  That counts as one text line.

>	$ a="" ; echo  "$a"  |  wc -l
>	1

  echo "" | od -tx1 -c
  0000000  0a
           \n

There is one newline.  That counts as one text line.

>	$ a="" ; echo -n "$a"  |  wc -l
>	0

  echo -n "" | od -tx1 -c
  0000000

Nothing was emitted.  No newlines.  Counts as zero lines.  But nothing
was emitted.  Zero characters.

  od -tx1 -c < /dev/null
  0000000

>	$ a="j" ; echo -n "$a"  |  wc -l
>	0

  echo -n "j" | od -tx1 -c
  0000000  6a
            j

That emits one character, the 'j' character.  It emits no newlines.
Without any newlines at all that is not and cannot be a "text" line.
Without a newline that can only be interpreted as binary data.  In any
case there were no newlines to count and "wc -l" counted and reported
zero newlines.
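
If a final unterminated fragment should nevertheless be counted, that has to be programmed explicitly. One possible sketch, which swaps in awk rather than wc and is not something proposed in this thread, relies on awk treating unterminated trailing data as a record. The function name is hypothetical.

```shell
# count_lines_lenient is a hypothetical helper name.
count_lines_lenient() {
    # NR in the END block is the number of records read; trailing data
    # without a final newline still forms a record.
    awk 'END { print NR }'
}

printf 'j'   | count_lines_lenient   # unterminated fragment
printf 'j\n' | count_lines_lenient   # proper text line
printf ''    | count_lines_lenient   # no input at all
```

This is a different question from the one "wc -l" answers, which is why the two can disagree on the first input.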

Instead of echo -n it would be better and more portable to use
printf.

  printf "j" | od -tx1 -c
  0000000  6a
            j

Same action in a portable way using printf.  Avoid using echo with
options.
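
When the text comes from a variable rather than a literal, a fixed '%s' format keeps printf from interpreting the data itself. A small sketch, with emit_raw as a hypothetical helper name:

```shell
# emit_raw is a hypothetical helper name.
emit_raw() {
    # A fixed format string means % and \ in the data are not
    # treated as printf directives or escapes.
    printf '%s' "$1"
}

emit_raw 'j' | od -tx1 -c
emit_raw '100%\n' | wc -c    # six bytes: 1 0 0 % \ n
```

No newline is added, so "wc -l" on either output would report zero.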

> So,
> 
> $ a="" ; echo  "$a"  |  sed '/^\s*$/d' | wc -l
> 0

  echo "" | sed '/^\s*$/d' | od -tx1 -c
  0000000

As we saw previously, the echo will emit one newline character.
This is piped to the sed program which will delete that line.
Deleting the line is what the sed 'd' action does.  Therefore sed does
not emit the newline.  The text line is deleted.

> $ a="3" ; echo  "$a"  |  sed '/^\s*$/d' | wc -l
> 1

  echo "3" | sed '/^\s*$/d' | od -tx1 -c
  0000000  33  0a
            3  \n

Here the echo emitted two characters, a '3' and a newline.  The sed
program did not match and therefore did not delete the line.  Since it
did not delete the line it passed the one text line to wc and "wc -l"
counted the one newline and reported one text line.
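
The same pipeline can be wrapped in a function. In this sketch \s is replaced by [[:space:]], on the assumption that a sed without the \s extension may be in use (\s is not specified by POSIX). The function name is hypothetical.

```shell
# count_nonblank_lines is a hypothetical helper name.
count_nonblank_lines() {
    # Delete lines that are empty or contain only whitespace,
    # then count the newlines that remain.
    sed '/^[[:space:]]*$/d' | wc -l
}

printf '\n'            | count_nonblank_lines   # only a blank line
printf '3\n'           | count_nonblank_lines   # one non-blank line
printf 'a\n\n \t\nb\n' | count_nonblank_lines   # two non-blank lines
```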

> Can be added option to "wc" to fix this problem without use sed in future ?
> Thanks for helping :-)

There is no problem to be fixed.  And therefore this isn't something
that can be "fixed" in wc.

Bob




Information forwarded to bug-coreutils <at> gnu.org:
bug#20954; Package coreutils. (Fri, 03 Jul 2015 14:19:02 GMT)

Message #24 received at submit <at> debbugs.gnu.org:

From: tele <swojskichlopak <at> wp.pl>
To: bug-coreutils <at> gnu.org
Subject: Re: wc - linux
Date: Fri, 3 Jul 2015 10:49:10 +0200
tag 20954 + notabug
close 20954
thanks


Maybe we did not understand.
I don't want change old definitions but create new option for wc or echo,
because this above examples not make logic sense,
( and it I want fix, however with sed is also fixed )
however now I understand that they work correctly in accordance with
accepted principles.

#  What is a text line? A text line by definition ends with a newline.
#  This has been standardized to prevent different implementations from
#  implementing it differently and creating portability problems.
#  Therefore all standards compliant implementations must implement it
#  in the same way to prevent portability problems.

" wc -l " in most examples working correct,
because it " echo "  give's " \n " and "wc -l" count correct.
I mentioned about "wc", because for me build option "wc -a" for "echo"  
or "echo -m"
this is not important.
Maybe exist hope for example create option "m" to echo  , " echo -m "
which not will from new line, but first line if variable is empty
and from new line if is not empty  ?

example:

echo -m "" | wc -l
0

echo -m "e" | wc -l
1

Information forwarded to bug-coreutils <at> gnu.org:
bug#20954; Package coreutils. (Mon, 06 Jul 2015 02:41:02 GMT)

Message #27 received at 20954 <at> debbugs.gnu.org:

From: Bob Proulx <bob <at> proulx.com>
To: tele <swojskichlopak <at> wp.pl>
Cc: 20954 <at> debbugs.gnu.org
Subject: Re: bug#20954: wc - linux
Date: Sun, 5 Jul 2015 20:40:26 -0600
tele wrote:
> Maybe we did not understand.
> I don't want change old definitions but create new option for wc or echo,
> because this above examples not make logic sense,

What would such an option do?

> ( and it I want fix, however with sed is also fixed )

Your original message asked if "echo | wc -l" should count 0 lines
instead of 1 line.  But the echo is going to produce one line and
therefore it should be counted.

In a later message you wrote using sed to delete blank lines so that
only non-blank lines remained to be counted.

> $ a="" ; echo  "$a"  |  sed '/^\s*$/d' | wc -l
> 0
> 
> $ a="3" ; echo  "$a"  |  sed '/^\s*$/d' | wc -l
> 1
> 
> Can be added option to "wc" to fix this problem without use sed in future ?

This tells me that you have not understood things yet. :-(

I tried to explain this in more detail in my response to that message.
The sed command you pulled from stackoverflow.com deletes blank
lines.  That is a good way to avoid counting blank lines.

If I guess at what you are suggesting then it does not make sense to
add an option to wc to count only non-blank lines.  If you don't want
to count blank lines then delete them first.  There are an infinite
number of possible things to count.  There cannot be an infinite
number of options implemented.  And using sed to delete blank lines is
the Right Way To Do Things.

> however now I understand that they work correctly in accordance with
> accepted principles.

Yes.

> > What is a text line? A text line by definition ends with a
> > newline. This has been standardized to prevent different
> > implementations from implementing it differently and creating
> > portability problems. Therefore all standards compliant
> > implementations must implement it in the same way to prevent
> > portability problems.
> 
> " wc -l " in most examples working correct,

"most"?  No.  "wc -l" is working correctly in all examples. :-)

> because it " echo "  give's " \n " and "wc -l" count correct.

Yes.

> I mentioned about "wc", because for me build option "wc -a" for "echo"  or
> "echo -m"
> this is not important.
> Maybe exist hope for example create option "m" to echo  , " echo -m "
> which not will from new line, but first line if variable is empty
> and from new line if is not empty  ?
> 
> example:
> 
> echo -m "" | wc -l
> 0
> 
> echo -m "e" | wc -l
> 1

The shell is a programming language.  If not an infinite number, then
a very large number of possibilities may be implemented by programming
them in the shell.  All such possibilities should not be coded into
specific options.  Instead if you have a specific need it should be
programmed.  Simply write the code that says explicitly what you want
to do.  There are millions of lines of code written for various tasks.
All of those millions of lines should not be turned into specific
options.  If you want to delete blank lines then simply delete blank
lines.
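
For instance, the behaviour asked for as "echo -m" can be written as a short shell function. This is a sketch only: echo_m is a hypothetical name borrowed from the proposed option, and no such option exists in echo.

```shell
# echo_m is a hypothetical function, not an existing echo option.
echo_m() {
    # Emit the argument followed by a newline only when it is non-empty;
    # for an empty argument emit nothing at all.
    if [ -n "$1" ]; then
        printf '%s\n' "$1"
    fi
}

echo_m ""  | wc -l    # 0
echo_m "e" | wc -l    # 1
```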

This entire discussion feels like an XY problem.  Here is a collection
of explanations of the XY problem.

  http://www.perlmonks.org/?node_id=542341

The help-bash <at> gnu.org mailing list is the right place to follow up.
If you wrote there, said what you were trying to do, and asked how to
do it in the shell, people would try to help you.

Bob




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Mon, 03 Aug 2015 11:24:05 GMT)

This bug report was last modified 9 years and 327 days ago.
