GNU bug report logs - #20029
'yes' surprisingly slow

Previous Next

Package: coreutils;

Reported by: Ole Tange <tange <at> gnu.org>

Date: Sat, 7 Mar 2015 11:50:02 UTC

Severity: normal

Done: Pádraig Brady <P <at> draigBrady.com>

Bug is archived. No further changes may be made.

Full log


Message #26 received at 20029 <at> debbugs.gnu.org (full text, mbox):

From: Giuseppe Scrivano <gscrivano <at> gnu.org>
To: Pádraig Brady <P <at> draigBrady.com>
Cc: 20029 <at> debbugs.gnu.org, Ole Tange <tange <at> gnu.org>
Subject: Re: bug#20029: 'yes' surprisingly slow
Date: Tue, 10 Mar 2015 01:31:36 +0100
Pádraig Brady <P <at> draigBrady.com> writes:

> The attached should make things more efficient here.
>
> thanks,
> Pádraig.
>
>
> From 7959bbf19307705e98f08cfa32a9dcf67672590c Mon Sep 17 00:00:00 2001
> From: =?UTF-8?q?P=C3=A1draig=20Brady?= <P <at> draigBrady.com>
> Date: Mon, 9 Mar 2015 19:27:32 +0000
> Subject: [PATCH] yes: output data more efficiently
>
> yes(1) may be used to generate repeating patterns of text
> for test inputs etc., so adjust to be more efficient.
>
> Profiling the case where yes(1) is outputting small items
> through stdio (which was the default case), shows the overhead
> continuously processing small items in main() and in stdio:
>
>     $ yes >/dev/null & perf top -p $!
>     31.02%  yes           [.] main
>     27.36%  libc-2.20.so  [.] _IO_file_xsputn@@GLIBC_2.2.5
>     14.51%  libc-2.20.so  [.] fputs_unlocked
>     13.50%  libc-2.20.so  [.] strlen
>     10.66%  libc-2.20.so  [.] __GI___mempcpy
>      1.98%  yes           [.] fputs_unlocked <at> plta
>
> Sending more data per stdio call improves the situation,
> but still, there is significant stdio overhead due to memory copies,
> and the repeated string length checking:
>
>     $ yes "`echo {1..1000}`" >/dev/null & perf top -p $!
>     42.26%  libc-2.20.so  [.] __GI___mempcpy
>     17.38%  libc-2.20.so  [.] strlen
>      5.21%  [kernel]      [k] __srcu_read_lock
>      4.58%  [kernel]      [k] __srcu_read_unlock
>      4.27%  libc-2.20.so  [.] _IO_file_xsputn@@GLIBC_2.2.5
>      2.50%  libc-2.20.so  [.] __GI___libc_write
>      2.45%  [kernel]      [k] system_call
>      2.40%  [kernel]      [k] system_call_after_swapgs
>      2.27%  [kernel]      [k] vfs_write
>      2.09%  libc-2.20.so  [.] _IO_do_write@@GLIBC_2.2.5
>      2.01%  [kernel]      [k] fsnotify
>      1.95%  libc-2.20.so  [.] _IO_file_write@@GLIBC_2.2.5
>      1.44%  yes           [.] main
>
> We can avoid all stdio overhead by building up the buffer
> _once_ and outputting that, and the profile below shows
> the bottleneck moved to the kernel:
>
>     $ src/yes >/dev/null & perf top -p $!
>     15.42%  [kernel]      [k] __srcu_read_lock
>     12.98%  [kernel]      [k] __srcu_read_unlock
>      9.41%  libc-2.20.so  [.] __GI___libc_write
>      9.11%  [kernel]      [k] vfs_write
>      8.35%  [kernel]      [k] fsnotify
>      8.02%  [kernel]      [k] system_call
>      5.84%  [kernel]      [k] system_call_after_swapgs
>      4.54%  [kernel]      [k] __fget_light
>      3.98%  [kernel]      [k] sys_write
>      3.65%  [kernel]      [k] selinux_file_permission
>      3.44%  [kernel]      [k] rw_verify_area
>      2.94%  [kernel]      [k] __fsnotify_parent
>      2.76%  [kernel]      [k] security_file_permission
>      2.39%  yes           [.] main
>      2.17%  [kernel]      [k] __fdget_pos
>      2.13%  [kernel]      [k] sysret_check
>      0.81%  [kernel]      [k] write_null
>      0.36%  yes           [.] write <at> plt
>
> Note this change also ensures that yes will only write complete lines
> for lines softer than BUFSIZ.
>
> * src/yes.c (main): Build up a BUFSIZ buffer of lines,
> and output that, rather than having stdio process each item.
> * tests/misc/yes.sh: Add a new test for various buffer sizes.
> * tests/local.mk: Reference the new test.
> Fixes http://bugs.gnu.org/20029
> ---
>  src/yes.c         | 43 +++++++++++++++++++++++++++++++++++++++++--
>  tests/local.mk    |  1 +
>  tests/misc/yes.sh | 28 ++++++++++++++++++++++++++++
>  3 files changed, 70 insertions(+), 2 deletions(-)
>  create mode 100755 tests/misc/yes.sh
>
> diff --git a/src/yes.c b/src/yes.c
> index b35b13f..91dea11 100644
> --- a/src/yes.c
> +++ b/src/yes.c
> @@ -58,6 +58,10 @@ Repeatedly output a line with all specified STRING(s), or 'y'.\n\
>  int
>  main (int argc, char **argv)
>  {
> +  char buf[BUFSIZ];
> +  char *pbuf = buf;
> +  int i;
> +
>    initialize_main (&argc, &argv);
>    set_program_name (argv[0]);
>    setlocale (LC_ALL, "");
> @@ -77,9 +81,44 @@ main (int argc, char **argv)
>        argv[argc++] = bad_cast ("y");
>      }
>  
> -  while (true)
> +  /* Buffer data locally once, rather than having the
> +     large overhead of stdio buffering each item.   */
> +  for (i = optind; i < argc; i++)
> +    {
> +      size_t len = strlen (argv[i]);
> +      if (BUFSIZ < len || BUFSIZ - len <= pbuf - buf)
> +        break;
> +      memcpy (pbuf, argv[i], len);
> +      pbuf += len;
> +      *pbuf++ = i == argc - 1 ? '\n' : ' ';
> +    }
> +  if (i < argc)
> +    pbuf = NULL;

since the buffer is partly filled, wouldn't be better to reuse it?

Something like this (barely tested):

diff --git a/src/yes.c b/src/yes.c
index 91dea11..ac690ce 100644
--- a/src/yes.c
+++ b/src/yes.c
@@ -92,9 +92,7 @@ main (int argc, char **argv)
       pbuf += len;
       *pbuf++ = i == argc - 1 ? '\n' : ' ';
     }
-  if (i < argc)
-    pbuf = NULL;
-  else
+  if (i == argc)
     {
       size_t line_len = pbuf - buf;
       size_t lines = BUFSIZ / line_len;
@@ -106,7 +104,7 @@ main (int argc, char **argv)
     }
 
   /* The normal case is to continuously output the local buffer.  */
-  while (pbuf)
+  while (i == argc)
     {
       if (write (STDOUT_FILENO, buf, pbuf - buf) == -1)
         {
@@ -117,11 +115,17 @@ main (int argc, char **argv)
 
   /* If the data doesn't fit in BUFSIZ then it's large
      and not too onerous to output using stdio in any case.  */
-  while (! pbuf)
+  while (i != argc)
     {
-      for (i = optind; i < argc; i++)
-        if (fputs (argv[i], stdout) == EOF
-            || putchar (i == argc - 1 ? '\n' : ' ') == EOF)
+      int j;
+      if ((pbuf - buf) && fwrite (buf, pbuf - buf, 1, stdout) == 0)
+        {
+          error (0, errno, _("standard output"));
+          return EXIT_FAILURE;
+        }
+      for (j = i; j < argc; j++)
+        if (fputs (argv[j], stdout) == EOF
+            || putchar (j == argc - 1 ? '\n' : ' ') == EOF)
           {
             error (0, errno, _("standard output"));
             return EXIT_FAILURE;


The difference seems to be (without the patch):

$ time src/yes `echo {1..2000}` | head -c 2000M | md5sum
55c293324aa6ecce14f0bf30da5a4686  -

real	0m7.994s
user	0m11.093s
sys	0m2.953s

versus (with the patch):

$ time src/yes `echo {1..2000}` | head -c 2000M | md5sum
55c293324aa6ecce14f0bf30da5a4686  -

real	0m3.534s
user	0m4.164s
sys	0m1.803s

Regards,
Giuseppe





This bug report was last modified 10 years and 80 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.