The head command behaves differently with seekable and unseekable input sources.
Pipes are unseekable, so head's behavior when its input comes from a pipe differs from its behavior when its input comes from a regular file, which is seekable.
Therefore, a defect raised against regular files cannot be evaluated with examples that use pipes.
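One way to tell the two cases apart is to ask for the current file offset: lseek() succeeds on a regular file but fails with ESPIPE on a pipe. The minimal sketch below assumes a POSIX system and uses Python's os module, whose os.lseek() and os.pipe() are thin wrappers around the same calls. The helper name is_seekable() is hypothetical and is not taken from head's source.

```python
import errno
import os
import tempfile


def is_seekable(fd: int) -> bool:
    """Hypothetical helper: True if lseek() works on fd, False for a pipe/FIFO."""
    try:
        os.lseek(fd, 0, os.SEEK_CUR)   # ask for the offset without moving it
        return True
    except OSError as e:
        if e.errno == errno.ESPIPE:    # "Illegal seek": pipe, FIFO or socket
            return False
        raise                          # anything else is a real error


if __name__ == "__main__":
    r, w = os.pipe()
    with tempfile.TemporaryFile() as f:
        print("pipe:", is_seekable(r))                    # pipe: False
        print("regular file:", is_seekable(f.fileno()))   # regular file: True
    os.close(r)
    os.close(w)
```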
Here's why head behaves differently depending on whether the file allows lseek():
-------------------------------------------------------------------------------------------------
1. The head command first reads data into a buffer using the read() system call and then operates on that buffer.
2. The buffer size head passes to read() is 8192 bytes on my machine.
3. It is not an error if read() returns fewer bytes than requested; this can happen, for example, because fewer bytes were available in the pipe when read() was called, or because read() was interrupted by a signal.
4. Therefore, only the upper bound on the amount of data read into the buffer is fixed, not the lower bound.
5. Since read() returns as much data as is available (up to the buffer size), in most cases the buffer ends up holding more data than head's algorithm actually needs.
6. When head discovers that it does not need all the data in the buffer, it tries to give the extra data back by moving the file offset backwards with lseek().
7. However, the offset of an unseekable file cannot be moved back, so for unseekable files head has to discard the extra data. This makes it look as if head has eaten part of the input.
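The steps above can be sketched as follows. This is a simplified model of the strategy described, not head's actual source: head_lines() is a hypothetical name, the 8192-byte buffer is the size reported above, and Python's os.read()/os.lseek() stand in for the C read()/lseek() calls. It copies the first n lines and then tries to move the input offset back over the bytes it read but did not need.

```python
import os

BUFSIZE = 8192  # buffer size observed above; real implementations may differ


def head_lines(in_fd: int, out_fd: int, n: int) -> bool:
    """Copy the first n lines of in_fd to out_fd (simplified sketch).

    Returns False if bytes read past line n had to be discarded because
    in_fd is not seekable, True otherwise.
    """
    remaining = n
    while remaining > 0:
        buf = os.read(in_fd, BUFSIZE)      # may return fewer than BUFSIZE bytes
        if not buf:                        # end of input before n lines
            return True
        end = 0                            # bytes of buf that belong to output
        while remaining > 0:
            nl = buf.find(b"\n", end)
            if nl == -1:                   # no more newlines: take it all
                end = len(buf)
                break
            end = nl + 1
            remaining -= 1
        data = buf[:end]
        while data:                        # os.write may write partially
            data = data[os.write(out_fd, data):]
        excess = len(buf) - end            # bytes read but not needed
        if excess:
            try:
                os.lseek(in_fd, -excess, os.SEEK_CUR)  # give them back
            except OSError:                # e.g. ESPIPE on a pipe
                return False               # the excess is lost to later readers
    return True
```

On a regular file the next reader continues right after line n; on a pipe the function returns False and the excess bytes are gone, which matches the behavior seen in the pipe examples below.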
Head's problem with unseekable files: commands that run after head will never get the extra data that head read.
Bigger problem: the amount of data lost is not fixed, because the amount of data read() actually returns is not fixed (see point 4 above). It is even possible that no data is lost!
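Point 4 can be observed in isolation. In the sketch below (Python's os wrappers again; read_sizes() is a hypothetical helper), a producer writes into a pipe in small pieces, and each read() with an 8192-byte buffer returns only the bytes already in the pipe. In a real pipeline, how the producer's writes interleave with head's reads depends on scheduling, so the amount head consumes, and therefore the amount lost, cannot be predicted.

```python
import os


def read_sizes(chunks: list[bytes], bufsize: int = 8192) -> list[int]:
    """Hypothetical demo: write each chunk into a pipe, then issue one
    read(bufsize) and record how many bytes it actually returned.
    Chunks must fit in the pipe buffer, or os.write() would block."""
    r, w = os.pipe()
    sizes = []
    try:
        for chunk in chunks:
            os.write(w, chunk)                       # producer writes a little
            sizes.append(len(os.read(r, bufsize)))   # returns only what is there
    finally:
        os.close(r)
        os.close(w)
    return sizes


print(read_sizes([b"1\n", b"2\n3\n", b"4\n5\n6\n"]))  # [2, 4, 6]
```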
I tried the following example with a regular file, and it worked as expected:
$ seq 10 >p
$ ( head -n 2 ; echo xxx ; cat )<p
1
2
xxx
3
4
5
6
7
8
9
10
$
Anoop Sharma wrote:
> Head command does not position file pointer correctly for negative line
> count. Here is a demonstration of the problem.
The problem doesn't seem to be limited to negative
line counts. I replaced the 10 ABC lines with a number
sequence to demonstrate the issue more clearly.
$ seq 10 | ( head -n -2 ; echo xxx ; cat )
1
2
3
4
5
6
7
8
xxx
$ seq 10 | ( head -n 2 ; echo xxx ; cat )
1
2
xxx
So head eats all of the input. The info page is silent
about this.
Have a nice day,
Berny