On 05/02/2023 18:27, Stephane Chazelas wrote:
> "wc -c" without filename arguments is meant to read stdin til
> EOF and report the number of bytes it has read.
>
> When stdin is on a regular file, GNU wc has that optimisation
> whereby it skips the reading, does a pos = lseek(0,0,SEEK_CUR)
> to find out its current position within the file, fstat(0) and
> reports st_size - pos (assuming st_size > pos).
>
> However, it does not move the position to the end of the file.
> That means for instance that:
>
> $ echo test > file
> $ { wc -c; wc -c; } < file
> 5
> 5
>
> Instead of 5, then 0:
>
> $ { wc -c; cat; } < file
> 5
> test
>
> So the optimisation is incomplete.
>
> It also reports the size of the file even if it could not possibly read it
> because it's not open in read mode:
>
> { wc -c; } 0>> file
> 5
>
> IMO, it should only do the optimisation if
> - fcntl(F_GETFL) to check that the file is opened in O_RDONLY or O_RDWR
> - current checks for /proc /sys-like filesystems
> - pos < st_size
> - lseek(0,st_size,SEEK_SET) is successful.
>
> (that leaves a race window above where it could move the cursor
> backward, but I would think that can be ignored as if something
> else reads at the same time, there's not much we can expect
> anyway).

Yes I agree.
Adjusting would also avoid the following inconsistencies:

  $ { wc -c; wc -c; } < file
  5
  5

  $ { wc -l; wc -l; } < file
  1
  0

  $ truncate -s $(getconf PAGESIZE) file
  $ { wc -c; wc -c; } < file
  4096
  0

Hopefully the attached addresses this.
Note it doesn't add the constraint on the input being readable,
which I'll think a bit more about.

cheers,
Pádraig
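
For illustration only, the checks listed in the quoted message could be
sketched in C roughly as below. This is not the attached patch and not
the code in GNU wc: the function name count_bytes_by_seeking and its
return convention are made up for the example, the /proc- and /sys-style
filesystem checks are left out, and the sketch assumes the shortcut is
taken only when the current position is below st_size, with the final
seek done via SEEK_SET.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not from GNU wc).  Returns 1 and stores the
   number of remaining bytes in *count if the seek shortcut was taken.
   Returns 0 if the caller should fall back to counting with read().  */
int
count_bytes_by_seeking (int fd, off_t *count)
{
  /* Only take the shortcut if the descriptor could have been read.  */
  int flags = fcntl (fd, F_GETFL);
  if (flags < 0)
    return 0;
  int access_mode = flags & O_ACCMODE;
  if (access_mode != O_RDONLY && access_mode != O_RDWR)
    return 0;

  /* st_size is only meaningful here for regular files.  The extra
     checks for /proc- and /sys-like filesystems are omitted.  */
  struct stat st;
  if (fstat (fd, &st) != 0 || !S_ISREG (st.st_mode))
    return 0;

  /* Current offset; fails on unseekable input such as pipes.  */
  off_t pos = lseek (fd, 0, SEEK_CUR);
  if (pos < 0 || pos >= st.st_size)
    return 0;

  /* Leave the offset at end of file, as a real read to EOF would.  */
  if (lseek (fd, st.st_size, SEEK_SET) < 0)
    return 0;

  *count = st.st_size - pos;
  return 1;
}
```

With something along these lines, a second reader sharing the same open
file description starts at end of file, and a descriptor opened
write-only (as with 0>> file) is rejected and left to the read() path,
where the read fails instead of a size being reported.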