GNU bug report logs - #67490
[PATCH v2] tail: fix tailing sysfs files on large page kernels

Previous Next

Package: coreutils;

Reported by: dann frazier <dann.frazier <at> canonical.com>

Date: Mon, 27 Nov 2023 17:31:02 UTC

Severity: normal

Tags: patch

Done: Pádraig Brady <P <at> draigBrady.com>

Bug is archived. No further changes may be made.

Full log


View this message in rfc822 format

From: help-debbugs <at> gnu.org (GNU bug Tracking System)
To: Pádraig Brady <P <at> draigBrady.com>
Cc: tracker <at> debbugs.gnu.org
Subject: bug#67490: closed ([PATCH v2] tail: fix tailing sysfs files on
 large page kernels)
Date: Fri, 01 Dec 2023 23:24:02 +0000
[Message part 1 (text/plain, inline)]
Your message dated Fri, 1 Dec 2023 23:23:19 +0000
with message-id <52ff0042-1eea-acac-e355-7e7dd62f5956 <at> draigBrady.com>
and subject line Re: bug#67490: [PATCH v2] tail: fix tailing sysfs files on large page kernels
has caused the debbugs.gnu.org bug report #67490,
regarding [PATCH v2] tail: fix tailing sysfs files on large page kernels
to be marked as done.

(If you believe you have received this mail in error, please contact
help-debbugs <at> gnu.org.)


-- 
67490: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=67490
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
From: dann frazier <dann.frazier <at> canonical.com>
To: bug-coreutils <at> gnu.org
Subject: [PATCH] tail: fix following /proc and /sys files when using a 64K
 page size
Date: Mon, 27 Nov 2023 09:24:33 -0700
* src/tail.c (file_lines): Use fstat() to determine a file's block
size and dynamically allocate a buffer of that size for traversing
backwards.
---
 src/tail.c | 36 ++++++++++++++++++++++++++----------
 1 file changed, 26 insertions(+), 10 deletions(-)

diff --git a/src/tail.c b/src/tail.c
index c45f3b65a..437a38204 100644
--- a/src/tail.c
+++ b/src/tail.c
@@ -518,18 +518,30 @@ static bool
 file_lines (char const *pretty_filename, int fd, uintmax_t n_lines,
             off_t start_pos, off_t end_pos, uintmax_t *read_pos)
 {
-  char buffer[BUFSIZ];
+  char *buffer;
   size_t bytes_read;
+  blksize_t bufsize;
   off_t pos = end_pos;
+  bool ok = true;
+  struct stat stats;
 
   if (n_lines == 0)
     return true;
 
+  if (fstat (fd, &stats) != 0)
+    {
+      error (0, errno, _("cannot fstat %s"), quoteaf (pretty_filename));
+      return false;
+    }
+
+  bufsize = ST_BLKSIZE (stats);
+  buffer = xmalloc (bufsize);
+
   /* Set 'bytes_read' to the size of the last, probably partial, buffer;
      0 < 'bytes_read' <= 'BUFSIZ'.  */
-  bytes_read = (pos - start_pos) % BUFSIZ;
+  bytes_read = (pos - start_pos) % bufsize;
   if (bytes_read == 0)
-    bytes_read = BUFSIZ;
+    bytes_read = bufsize;
   /* Make 'pos' a multiple of 'BUFSIZ' (0 if the file is short), so that all
      reads will be on block boundaries, which might increase efficiency.  */
   pos -= bytes_read;
@@ -538,7 +550,8 @@ file_lines (char const *pretty_filename, int fd, uintmax_t n_lines,
   if (bytes_read == SAFE_READ_ERROR)
     {
       error (0, errno, _("error reading %s"), quoteaf (pretty_filename));
-      return false;
+      ok = false;
+      goto free_buffer;
     }
   *read_pos = pos + bytes_read;
 
@@ -565,7 +578,7 @@ file_lines (char const *pretty_filename, int fd, uintmax_t n_lines,
               xwrite_stdout (nl + 1, bytes_read - (n + 1));
               *read_pos += dump_remainder (false, pretty_filename, fd,
                                            end_pos - (pos + bytes_read));
-              return true;
+              goto free_buffer;
             }
         }
 
@@ -577,23 +590,26 @@ file_lines (char const *pretty_filename, int fd, uintmax_t n_lines,
           xlseek (fd, start_pos, SEEK_SET, pretty_filename);
           *read_pos = start_pos + dump_remainder (false, pretty_filename, fd,
                                                   end_pos);
-          return true;
+          goto free_buffer;
         }
-      pos -= BUFSIZ;
+      pos -= bufsize;
       xlseek (fd, pos, SEEK_SET, pretty_filename);
 
-      bytes_read = safe_read (fd, buffer, BUFSIZ);
+      bytes_read = safe_read (fd, buffer, bufsize);
       if (bytes_read == SAFE_READ_ERROR)
         {
           error (0, errno, _("error reading %s"), quoteaf (pretty_filename));
-          return false;
+          ok = false;
+          goto free_buffer;
         }
 
       *read_pos = pos + bytes_read;
     }
   while (bytes_read > 0);
 
-  return true;
+free_buffer:
+  free (buffer);
+  return ok;
 }
 
 /* Print the last N_LINES lines from the end of the standard input,
-- 
2.42.0



[Message part 3 (message/rfc822, inline)]
From: Pádraig Brady <P <at> draigBrady.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>,
 dann frazier <dann.frazier <at> canonical.com>, 67490-done <at> debbugs.gnu.org
Subject: Re: bug#67490: [PATCH v2] tail: fix tailing sysfs files on large page
 kernels
Date: Fri, 1 Dec 2023 23:23:19 +0000
On 01/12/2023 01:54, Paul Eggert wrote:
> On 11/30/23 12:11, Pádraig Brady wrote:
>> Though that will generally give 128K, which is good when processing all
>> of a file,
>> but perhaps overkill when processing just the last part of a file.
> 
> The 128 KiB number was computed as being better for apps like 'sed' that
> typically read all or most of the file. 'tail' sometimes behaves that
> way (e.g., 'tail -c +10') and so 'tail' should use 128 KiB in those
> cases. The simplest way to do that is for 'tail' to use 128 KiB all the
> time - that would cost little for uses like plain 'tail' and it could be
> a significant win for uses like 'tail -c +10'.

Yes I agree we should use io_blksize() in other routines in tail
where we may dump lots of a file. However in this (most common) case
the routine is dealing with the end of a regular file,
so it's probably best to somewhat minimize the amount of data read,
and more directly check the page_size which is issue at hand.
I've pushed the fix at https://github.com/coreutils/coreutils/commit/73d119f4f
where the adjustment (which also corresponds to what we do in wc) is:

  if (sb->st_size % page_size == 0)
    bufsize = MAX (BUFSIZ, page_size);

I'll follow up with another patch to address the performance aspect,
which uses io_blksize() where appropriate.

> (As an aside, the 128 KiB number was computed in 2014. These days 256
> KiB might be better if someone could take the time to measure....)

I periodically check with the documented script,
but I should update the comment when I do that.
I'll update the date in the comment now at least.
I quickly tested a few systems here,
which suggests 256KiB _may_ be a more appropriate default now.
More testing required.

Marking this as done.

thanks,
Pádraig


This bug report was last modified 1 year and 175 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.