GNU bug report logs - #18621
[BUG] wc -c incorrectly counts bytes in /sys

Package: coreutils;

Reported by: George Shuklin <george.shuklin <at> gmail.com>

Date: Fri, 3 Oct 2014 15:13:01 UTC

Severity: normal

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.


From: Pádraig Brady <P <at> draigBrady.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 18621 <at> debbugs.gnu.org, George Shuklin <george.shuklin <at> gmail.com>, Jim Meyering <jim <at> meyering.net>
Subject: bug#18621: [BUG] wc -c incorrectly counts bytes in /sys
Date: Fri, 03 Oct 2014 20:17:15 +0100

On 10/03/2014 07:47 PM, Paul Eggert wrote:
> On 10/03/2014 11:26 AM, Jim Meyering wrote:
>> That looks like a fine fix.
> 
> Unfortunately that fix would make 'wc -c' waaaaay slower for a file that consists entirely of a big hole.

True, which you could avoid by deferring to read() for empty files:

diff --git a/src/wc.c b/src/wc.c
index 1ff007d..f8176cc 100644
--- a/src/wc.c
+++ b/src/wc.c
@@ -235,6 +235,7 @@ wc (int fd, char const *file_x, struct fstatus *fstatus)
         fstatus->failed = fstat (fd, &fstatus->st);

       if (! fstatus->failed && S_ISREG (fstatus->st.st_mode)
+          && fstatus->st.st_blocks && fstatus->st.st_size
           && (current_pos = lseek (fd, 0, SEEK_CUR)) != -1
           && (end_pos = lseek (fd, 0, SEEK_END)) != -1)
         {
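
For readers without the wc.c source at hand, the patched condition can be sketched as a standalone function. This is only an illustration under assumptions: `count_bytes` and `count_bytes_in_file` are hypothetical names, not functions in coreutils, and the real wc code does more than this. The lseek() shortcut is attempted only for a regular file that reports both a nonzero st_blocks and a nonzero st_size; anything else is counted with read().

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not part of wc.c: count the bytes from FD's
   current position to end of file.  Returns -1 on a read error.  */
static off_t
count_bytes (int fd)
{
  struct stat sb;
  char buf[16384];
  off_t total = 0;

  /* Same gate as the patched condition: regular file, nonzero st_blocks,
     nonzero st_size.  Only then is the lseek shortcut attempted.  */
  if (fstat (fd, &sb) == 0 && S_ISREG (sb.st_mode)
      && sb.st_blocks != 0 && sb.st_size != 0)
    {
      off_t current_pos = lseek (fd, 0, SEEK_CUR);
      off_t end_pos = current_pos == -1 ? -1 : lseek (fd, 0, SEEK_END);
      if (end_pos != -1)
        return current_pos < end_pos ? end_pos - current_pos : 0;
    }

  /* Otherwise count what read() actually delivers.  */
  for (;;)
    {
      ssize_t n = read (fd, buf, sizeof buf);
      if (n == 0)
        return total;
      if (n < 0)
        {
          if (errno == EINTR)
            continue;           /* interrupted by a signal: retry */
          return -1;
        }
      total += n;
    }
}

/* Hypothetical convenience wrapper: open PATH read-only and count it.
   Returns -1 if the file cannot be opened or read.  */
static off_t
count_bytes_in_file (char const *path)
{
  int fd = open (path, O_RDONLY);
  if (fd < 0)
    return -1;
  off_t n = count_bytes (fd);
  close (fd);
  return n;
}
```

With this gate, a file whose stat data reports zero blocks or zero size is always read, so the reported count comes from the content rather than from st_size.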

> How about if we change usable_st_size to return false for these proc files, with a heuristic as tight as we can make it, and to have coreutils check usable_st_size in more places.  Something like this, perhaps:
> 
>   /* Return a boolean indicating whether SB->st_size is correct. */
>   static inline bool
>   usable_st_size (struct stat const *sb)
>   {
>     if (S_ISREG (sb->st_mode))
>       {
>         /* proc files like /sys/kernel/vmcoreinfo are weird: their
>            st_size values do not reflect what's actually in them.
>            The following heuristic attempts to catch proc files without
>            catching many regular files that just happen to have the same
>            signature.  */
>         return ! (sb->st_uid == 0 && sb->st_gid == 0 && sb->st_blocks == 0
>                   && sb->st_size == ST_BLKSIZE (*sb));
>       }
>     return (S_ISLNK (sb->st_mode) || S_TYPEISSHM (sb) || S_TYPEISTMO (sb));
>   }
>
> and then review every place where coreutils currently uses st_size and prepend a check for usable_st_size if needed.

That would be useful, and that check matches this case. However, many such files
are not distinguishable this way. Consider `stat /proc/$$/io`, which reports both
st_size and st_blocks as 0. Note that my adjusted patch above handles both cases.

thanks,
Pádraig




This bug report was last modified 10 years and 289 days ago.
