GNU bug report logs - #37093
wc runs 100% cpu when in pipeline or tee >(wc)

Previous Next

Package: coreutils;

Reported by: Edward Huff <ejhuff <at> gmail.com>

Date: Tue, 20 Aug 2019 04:45:02 UTC

Severity: normal

Tags: notabug

Done: Assaf Gordon <assafgordon <at> gmail.com>

Bug is archived. No further changes may be made.

Full log


Message #15 received at control <at> debbugs.gnu.org (full text, mbox):

From: Assaf Gordon <assafgordon <at> gmail.com>
To: Edward Huff <ejhuff <at> gmail.com>, 37093 <at> debbugs.gnu.org
Subject: Re: bug#37093: wc runs 100% cpu when in pipeline or tee >(wc)
Date: Tue, 20 Aug 2019 00:41:15 -0600
tag 37093 notabug
close 37093
stop

Hello,

On 2019-08-19 10:44 p.m., Edward Huff wrote:
> In the demo below, dd uses 0.665s to write 1GiB of zeros.
> sha256sum uses 4.285s to calculate the sha256 of 1GiB of zeros.
> wc uses 32.160s to count 1GiB of zeros.

[...]

> baseline results:
> $ dd if=/dev/zero count=$((1024*1024)) bs=1024 | tee >(sha256sum>&2) | wc
> 1048576+0 records in
> 1048576+0 records out
> 1073741824 bytes (1.1 GB, 1.0 GiB) copied, 32.5007 s, 33.0 MB/s
> 49bc20df15e412a64472421e13fe86ff1c5165e18b2afccf160d4dc19fe68a14  -
>        0       0 1073741824
> $

First,
Try to avoid UTF8 locales (i.e., force a C/POSIX locale with LC_ALL=C)
which makes 'wc' much faster.

On my computer:

With UTF8 locale:

  $ dd if=/dev/zero count=$((1024*1024)) bs=1024 \
        | tee >(sha256sum>&2) | time --portability wc
  1048576+0 records in
  1048576+0 records out
  1073741824 bytes (1.1 GB, 1.0 GiB) copied, 46.5928 s, 23.0 MB/s
  49bc20df15e412a64472421e13fe86ff1c5165e18b2afccf160d4dc19fe68a14  -
        0       0 1073741824
  real 46.59
  user 46.37
  sys 0.19

With C locale:

  $ dd if=/dev/zero count=$((1024*1024)) bs=1024 \
       | tee >(sha256sum>&2) | LC_ALL=C time --portability wc
  1048576+0 records in
  1048576+0 records out
  1073741824 bytes (1.1 GB, 1.0 GiB) copied, 8.60285 s, 125 MB/s
  49bc20df15e412a64472421e13fe86ff1c5165e18b2afccf160d4dc19fe68a14  -
        0       0 1073741824
  real 8.60
  user 5.22
  sys 0.26


Second,
The "word counting" feature in 'wc' is the main cpu-hog.
If you avoid that (i.e. counting only lines, or only characters),
'wc' is even faster (and it automatically ignores UTF8 issues):

  $ dd if=/dev/zero count=$((1024*1024)) bs=1024 \
       | tee >(sha256sum>&2) \
       | \time --portability wc -c
  1048576+0 records in
  1048576+0 records out
  1073741824 bytes (1.1 GB, 1.0 GiB) copied, 7.59429 s, 141 MB/s
  49bc20df15e412a64472421e13fe86ff1c5165e18b2afccf160d4dc19fe68a14  -
  1073741824

  real 7.59
  user 0.10
  sys 0.71

Notice that the "real time" wasn't changed much (from 8.6s to 7.59s), 
but the actual work performed by 'wc' (measured in "user time") is down
drastically.


Third,
If you are comfortable with compiling Coreutils from source,
you can build it using optimized hashing function from OpenSSL, like so:

     ./configure --with-openssl
     make

Then, "sha256sum" will be faster (about 2x fast on my computer).

If you don't want to re-compile it, consider using "openssl" directly
to calculate the checksum, like so:

  dd if=/dev/zero count=1K bs=1M | tee >(openssl sha256>&2) | wc -c


Fourth,
To save few more microseconds, consider using dd with larger block size 
(bs=) and fewer blocks (count=), e.g.:

   $ time dd if=/dev/zero of=/dev/null count=1M bs=1K
   1048576+0 records in
   1048576+0 records out
   1073741824 bytes (1.1 GB, 1.0 GiB) copied, 0.865853 s, 1.2 GB/s

   real	0m0.868s
   user	0m0.288s
   sys	0m0.579s

   $ time dd if=/dev/zero of=/dev/null count=1K bs=1M
   1024+0 records in
   1024+0 records out
   1073741824 bytes (1.1 GB, 1.0 GiB) copied, 0.0998688 s, 10.8 GB/s

   real	0m0.102s
   user	0m0.000s
   sys	0m0.102s

This won't reduce the total time by much, but will result in
fewer sys-calls, and less CPU kernel time (at least by a tiny bit).
The effect is more noticeable when reading or writing to a physical disk.

----

Lastly,
If you use GNU time instead of the shell's built-in 'time' function,
you can specify custom output format,
and easily show the timing of each program in the pipeline.
Example:

    $ FMT="\n=== CMD: %C ===\nreal %e\tuser %U\tsys %S\n"
    $ \time -f "$FMT" dd if=/dev/zero count=1M bs=1K \
         | \time -f "$FMT" tee >(\time -f "$FMT" sha256sum>&2) \
         | \time -f "$FMT" wc -c
    1048576+0 records in
    1048576+0 records out
    1073741824 bytes (1.1 GB, 1.0 GiB) copied, 7.77339 s, 138 MB/s

    === CMD: dd if=/dev/zero count=1048576 bs=1024 ===
    real 7.77	user 0.36	sys 1.65


    === CMD: tee /dev/fd/63 ===
    real 7.77	user 0.10	sys 1.30

    49bc20df15e412a64472421e13fe86ff1c5165e18b2afccf160d4dc19fe68a14  -

    === CMD: sha256sum ===
    real 7.77	user 7.47	sys 0.27

    1073741824

    === CMD: wc -c ===
    real 7.77	user 0.05	sys 0.76


As such, I'm closing this as "not a bug",
but discussion can continue by replying to this thread.

regards,
 - assaf





This bug report was last modified 5 years and 334 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.