GNU bug report logs -
#21460
Race condition in tests/tail-2/assert.sh
Previous Next
Reported by: ludo <at> gnu.org (Ludovic Courtès)
Date: Fri, 11 Sep 2015 16:24:02 UTC
Severity: normal
Merged with 21459
Done: Pádraig Brady <P <at> draigBrady.com>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 21460 in the body.
You can then email your comments to 21460 AT debbugs.gnu.org in the normal way.
Toggle the display of automated, internal messages from the tracker.
Report forwarded
to
bug-coreutils <at> gnu.org
:
bug#21460
; Package
coreutils
.
(Fri, 11 Sep 2015 16:24:03 GMT)
Full text and
rfc822 format available.
Acknowledgement sent
to
ludo <at> gnu.org (Ludovic Courtès)
:
New bug report received and forwarded. Copy sent to
bug-coreutils <at> gnu.org
.
(Fri, 11 Sep 2015 16:24:03 GMT)
Full text and
rfc822 format available.
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
Hello,
We have observed intermittent failures of tests/tail-2/assert.sh
(Coreutils 8.24, libc 2.22), especially showing up on relatively slow
machines (armhf and mips64el.)
The failure is with ‘tail --follow=name’, which, in inotify mode, would
fail to report that file ‘foo’ has been deleted.
The strace of a correct execution (x86_64) is like this:
--8<---------------cut here---------------start------------->8---
lstat("a", {st_mode=S_IFREG|0644, st_size=2, ...}) = 0
lstat("foo", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
inotify_init() = 5
write(1, "==> a <==\nx\n\n==> foo <==\n", 25) = 25
inotify_add_watch(5, ".", IN_ATTRIB|IN_MOVED_TO|IN_CREATE) = 1
inotify_add_watch(5, "a", IN_MODIFY|IN_ATTRIB|IN_DELETE_SELF|IN_MOVE_SELF) = 2
inotify_add_watch(5, ".", IN_ATTRIB|IN_MOVED_TO|IN_CREATE) = 1
inotify_add_watch(5, "foo", IN_MODIFY|IN_ATTRIB|IN_DELETE_SELF|IN_MOVE_SELF) = 3
open("a", O_RDONLY|O_NONBLOCK) = 6
lstat("a", {st_mode=S_IFREG|0644, st_size=2, ...}) = 0
fstat(6, {st_mode=S_IFREG|0644, st_size=2, ...}) = 0
fstatfs(6, {f_type="EXT2_SUPER_MAGIC", f_bsize=4096, f_blocks=16738210, f_bfree=1219533, f_bavail=367617, f_files=4259840, f_ffree=2406492, f_fsid={1622537548, 1497272261}, f_namelen=255, f_frsize=4096}) = 0
close(6) = 0
fstat(3, {st_mode=S_IFREG|0644, st_size=2, ...}) = 0
open("foo", O_RDONLY|O_NONBLOCK) = 6
lstat("foo", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
fstat(6, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
fstatfs(6, {f_type="EXT2_SUPER_MAGIC", f_bsize=4096, f_blocks=16738210, f_bfree=1219533, f_bavail=367617, f_files=4259840, f_ffree=2406492, f_fsid={1622537548, 1497272261}, f_namelen=255, f_frsize=4096}) = 0
close(6) = 0
fstat(4, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
read(5, "\3\0\0\0\4\0\0\0\0\0\0\0\0\0\0\0", 20) = 16
open("foo", O_RDONLY|O_NONBLOCK) = -1 ENOENT (No such file or directory)
lstat("foo", 0x7ffee174c3b0) = -1 ENOENT (No such file or directory)
write(2, "tail: ", 6) = 6
write(2, "foo", 3) = 3
write(2, ": No such file or directory", 27) = 27
write(2, "\n", 1) = 1
close(4) = 0
read(5, "\3\0\0\0\0\4\0\0\0\0\0\0\0\0\0\0", 20) = 16
inotify_rm_watch(5, 3) = -1 EINVAL (Invalid argument)
--8<---------------cut here---------------end--------------->8---
For a *failing* execution (armhf), we get:
--8<---------------cut here---------------start------------->8---
lstat64("a", {st_mode=S_IFREG|0644, st_size=2, ...}) = 0
lstat64("foo", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
inotify_init() = 5
write(1, "==> a <==\nx\n\n==> foo <==\n", 25) = 25
inotify_add_watch(5, ".", IN_ATTRIB|IN_MOVED_TO|IN_CREATE) = 1
inotify_add_watch(5, "a", IN_MODIFY|IN_ATTRIB|IN_DELETE_SELF|IN_MOVE_SELF) = 2
inotify_add_watch(5, ".", IN_ATTRIB|IN_MOVED_TO|IN_CREATE) = 1
inotify_add_watch(5, "foo", IN_MODIFY|IN_ATTRIB|IN_DELETE_SELF|IN_MOVE_SELF) = 3
open("a", O_RDONLY|O_NONBLOCK|O_LARGEFILE) = 6
lstat64("a", {st_mode=S_IFREG|0644, st_size=2, ...}) = 0
fstat64(6, {st_mode=S_IFREG|0644, st_size=2, ...}) = 0
fstatfs64(6, 88, {f_type="EXT2_SUPER_MAGIC", f_bsize=4096, f_blocks=37378337, f_bfree=36178942, f_bavail=34274471, f_files=9502720, f_ffree=9405759, f_fsid={1592050960, 1812457140}, f_namelen=255, f_frsize=4096, f_flags=4128}) = 0
close(6) = 0
fstat64(3, {st_mode=S_IFREG|0644, st_size=2, ...}) = 0
open("foo", O_RDONLY|O_NONBLOCK|O_LARGEFILE) = 6
lstat64("foo", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
fstat64(6, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
fstatfs64(6, 88, {f_type="EXT2_SUPER_MAGIC", f_bsize=4096, f_blocks=37378337, f_bfree=36178942, f_bavail=34274471, f_files=9502720, f_ffree=9405759, f_fsid={1592050960, 1812457140}, f_namelen=255, f_frsize=4096, f_flags=4128}) = 0
close(6) = 0
fstat64(4, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
read(5, "\3\0\0\0\4\0\0\0\0\0\0\0\0\0\0\0", 20) = 16
open("foo", O_RDONLY|O_NONBLOCK|O_LARGEFILE) = 6
lstat64("foo", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
fstat64(6, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
fstatfs64(6, 88, {f_type="EXT2_SUPER_MAGIC", f_bsize=4096, f_blocks=37378337, f_bfree=36178938, f_bavail=34274467, f_files=9502720, f_ffree=9405759, f_fsid={1592050960, 1812457140}, f_namelen=255, f_frsize=4096, f_flags=4128}) = 0
close(6) = 0
read(5,
--8<---------------cut here---------------end--------------->8---
In both cases, reading from the inotify file descriptor (number 5)
returns a notification for watch #3 (corresponding to ‘foo’), with mask
IN_ATTRIB (value 4).
However, the open("foo") call that immediately follows does *not* return
ENOENT in the failing case: The file is still there.
The kernel’s ‘vfs_unlink’ goes like this:
fsnotify_link_count(target); /* IN_ATTRIB */
d_delete(dentry); /* fsnotify_nameremove → IN_DELETE for “.” */
So, ‘tail’ first receives the IN_ATTRIB notification corresponding to
the link count decrease on ‘foo’; at that point, ‘foo’ is still
available. And then, ‘tail’ should receive the IN_DELETE_SELF
notification, at which point ‘foo’ would have been actually been
unlinked. But in practice we don’t seem to be receiving IN_DELETE_SELF,
even in the succeeding case.
I think the problem happens when ‘tail’ opens ‘foo’ right in between of
the two notifications: ‘foo’ is still there, and so ‘tail’ doesn’t
report anything.
Does that make sense?
Thanks,
Ludo’.
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#21460
; Package
coreutils
.
(Fri, 11 Sep 2015 17:19:02 GMT)
Full text and
rfc822 format available.
Message #8 received at submit <at> debbugs.gnu.org (full text, mbox):
Ludovic Courtès wrote:
> I think the problem happens when ‘tail’ opens ‘foo’ right in between of
> the two notifications: ‘foo’ is still there, and so ‘tail’ doesn’t
> report anything.
>
> Does that make sense?
Yes, though if the link count is indeed zero, I'm surprised that 'tail' can open
the file -- that sounds like a bug in the kernel.
If there is such a kernel bug and 'tail' can open a file with a link count of
zero, that would explain why 'tail' does not immediately receive an
IN_DELETE_SELF notification: after all, the file is open (by 'tail' itself) so
it should not be deleted even if it has a link count of zero. If so, it appears
that there's another kernel bug later: when 'tail' closes the file's last file
descriptor, the file should be deleted and an IN_DELETE_SELF notification should
be sent to 'tail'.
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#21460
; Package
coreutils
.
(Fri, 11 Sep 2015 17:19:02 GMT)
Full text and
rfc822 format available.
Forcibly Merged 21459 21460.
Request was from
Pádraig Brady <P <at> draigBrady.com>
to
control <at> debbugs.gnu.org
.
(Fri, 11 Sep 2015 18:20:02 GMT)
Full text and
rfc822 format available.
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#21460
; Package
coreutils
.
(Fri, 11 Sep 2015 20:56:01 GMT)
Full text and
rfc822 format available.
Message #16 received at submit <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Paul Eggert <eggert <at> cs.ucla.edu> skribis:
> Ludovic Courtès wrote:
>> I think the problem happens when ‘tail’ opens ‘foo’ right in between of
>> the two notifications: ‘foo’ is still there, and so ‘tail’ doesn’t
>> report anything.
>>
>> Does that make sense?
>
> Yes, though if the link count is indeed zero, I'm surprised that
> 'tail' can open the file -- that sounds like a bug in the kernel.
Attached is a reproducer; just run it in a loop for a couple of seconds:
--8<---------------cut here---------------start------------->8---
$ while ./a.out ; do : ; done
funny, errno = Success, nlink = 0
Aborted (core dumped)
--8<---------------cut here---------------end--------------->8---
I’m not sure if that’s a kernel bug. Strictly speaking, inotify works
as expected: we get a notification for nlink--, which doesn’t mean the
file has vanished.
The conclusion for ‘tail’ would be to wait for the IN_DELETE_SELF event
before considering the file to be gone. WDYT?
(That ‘inotify_rm_watch’ returns EINVAL *is* a bug IMO, but not
worrisome.)
Thanks,
Ludo’.
[inotify.c (text/plain, inline)]
#define _GNU_SOURCE 1
#include <stdlib.h>
#include <unistd.h>
#include <sys/inotify.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <errno.h>
#include <stdio.h>
#include <assert.h>
int
main ()
{
int file = creat ("foo", S_IRUSR | S_IWUSR);
assert_perror (errno);
close (file);
int notifications = inotify_init ();
assert_perror (errno);
int watch = inotify_add_watch (notifications, "foo",
IN_MODIFY | IN_ATTRIB
| IN_DELETE_SELF | IN_MOVE_SELF);
assert_perror (errno);
if (fork () == 0)
{
unlink ("foo");
assert_perror (errno);
exit (EXIT_SUCCESS);
}
struct inotify_event event;
ssize_t count = read (notifications, &event, sizeof event);
assert (count == sizeof event);
assert (event.mask == IN_ATTRIB);
struct stat st;
stat ("foo", &st);
if (errno != ENOENT)
{
printf ("funny, errno = %m, nlink = %li\n",
st.st_nlink);
abort ();
}
count = read (notifications, &event, sizeof event);
assert (count == sizeof event);
assert (event.mask == IN_DELETE_SELF);
/* Bug #2: this returns EINVAL for no good reason. */
/* inotify_rm_watch (notifications, watch); */
/* assert_perror (errno); */
return EXIT_SUCCESS;
}
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#21460
; Package
coreutils
.
(Fri, 11 Sep 2015 20:56:02 GMT)
Full text and
rfc822 format available.
Merged 21459 21460.
Request was from
ludo <at> gnu.org (Ludovic Courtès)
to
control <at> debbugs.gnu.org
.
(Fri, 11 Sep 2015 21:11:02 GMT)
Full text and
rfc822 format available.
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#21460
; Package
coreutils
.
(Fri, 11 Sep 2015 22:51:02 GMT)
Full text and
rfc822 format available.
Message #24 received at submit <at> debbugs.gnu.org (full text, mbox):
On 11/09/15 21:55, Ludovic Courtès wrote:
> Paul Eggert <eggert <at> cs.ucla.edu> skribis:
>
>> Ludovic Courtès wrote:
>>> I think the problem happens when ‘tail’ opens ‘foo’ right in between of
>>> the two notifications: ‘foo’ is still there, and so ‘tail’ doesn’t
>>> report anything.
>>>
>>> Does that make sense?
>>
>> Yes, though if the link count is indeed zero, I'm surprised that
>> 'tail' can open the file -- that sounds like a bug in the kernel.
>
> Attached is a reproducer; just run it in a loop for a couple of seconds:
>
> --8<---------------cut here---------------start------------->8---
> $ while ./a.out ; do : ; done
> funny, errno = Success, nlink = 0
> Aborted (core dumped)
> --8<---------------cut here---------------end--------------->8---
>
> I’m not sure if that’s a kernel bug. Strictly speaking, inotify works
> as expected: we get a notification for nlink--, which doesn’t mean the
> file has vanished.
Interesting. It does seem that the IN_ATTRIB is sent before the st_nlink--
takes effect? That could be a bug. Or it could be a dcache coherency
issue where the name still references the st_nlink==0 inode.
Note recheck() just open() and close() the file in this case,
but since it doesn't close() the original fd, then there will be
no IN_DELETE_SELF event.
If the above kernel behavior can be explained and is acceptable,
I suppose we could augment recheck() with something like:
diff --git a/src/tail.c b/src/tail.c
index f916d74..e9d5337 100644
--- a/src/tail.c
+++ b/src/tail.c
@@ -1046,6 +1046,18 @@ recheck (struct File_spec *f, bool blocking)
close_fd (f->fd, pretty_name (f));
}
+ else if (new_stats.st_nlink == 0) /* XXX: what about multi-linked files. */
+ {
+ /* It was seen on Linux that a file could be opened
+ even though unlinked as the directory entry (cache)
+ is updated after the IN_ATTRIB is sent for the nlink--. */
+
+ error (0, f->errnum, _("%s has become inaccessible"),
+ quote (pretty_name (f)));
+
+ close_fd (fd, pretty_name (f));
+ close_fd (f->fd, pretty_name (f));
+ f->fd = -1;
else
{
/* No changes detected, so close new fd. */
> The conclusion for ‘tail’ would be to wait for the IN_DELETE_SELF event
> before considering the file to be gone. WDYT?
As mentioned above, tail references the file until it can't open it,
so the IN_DELETE_SELF is only generated upon the close_fd(f->fd) above.
thanks,
Pádraig.
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#21460
; Package
coreutils
.
(Fri, 11 Sep 2015 22:51:02 GMT)
Full text and
rfc822 format available.
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#21460
; Package
coreutils
.
(Fri, 11 Sep 2015 23:49:01 GMT)
Full text and
rfc822 format available.
Message #30 received at submit <at> debbugs.gnu.org (full text, mbox):
On 11/09/15 23:49, Pádraig Brady wrote:
> On 11/09/15 21:55, Ludovic Courtès wrote:
>> Paul Eggert <eggert <at> cs.ucla.edu> skribis:
>>
>>> Ludovic Courtès wrote:
>>>> I think the problem happens when ‘tail’ opens ‘foo’ right in between of
>>>> the two notifications: ‘foo’ is still there, and so ‘tail’ doesn’t
>>>> report anything.
>>>>
>>>> Does that make sense?
>>>
>>> Yes, though if the link count is indeed zero, I'm surprised that
>>> 'tail' can open the file -- that sounds like a bug in the kernel.
>>
>> Attached is a reproducer; just run it in a loop for a couple of seconds:
>>
>> --8<---------------cut here---------------start------------->8---
>> $ while ./a.out ; do : ; done
>> funny, errno = Success, nlink = 0
>> Aborted (core dumped)
>> --8<---------------cut here---------------end--------------->8---
>>
>> I’m not sure if that’s a kernel bug. Strictly speaking, inotify works
>> as expected: we get a notification for nlink--, which doesn’t mean the
>> file has vanished.
>
> Interesting. It does seem that the IN_ATTRIB is sent before the st_nlink--
> takes effect? That could be a bug. Or it could be a dcache coherency
> issue where the name still references the st_nlink==0 inode.
>
> Note recheck() just open() and close() the file in this case,
> but since it doesn't close() the original fd, then there will be
> no IN_DELETE_SELF event.
>
> If the above kernel behavior can be explained and is acceptable,
> I suppose we could augment recheck() with something like:
>
> diff --git a/src/tail.c b/src/tail.c
> index f916d74..e9d5337 100644
> --- a/src/tail.c
> +++ b/src/tail.c
> @@ -1046,6 +1046,18 @@ recheck (struct File_spec *f, bool blocking)
> close_fd (f->fd, pretty_name (f));
>
> }
> + else if (new_stats.st_nlink == 0) /* XXX: what about multi-linked files. */
> + {
> + /* It was seen on Linux that a file could be opened
> + even though unlinked as the directory entry (cache)
> + is updated after the IN_ATTRIB is sent for the nlink--. */
> +
> + error (0, f->errnum, _("%s has become inaccessible"),
> + quote (pretty_name (f)));
> +
> + close_fd (fd, pretty_name (f));
> + close_fd (f->fd, pretty_name (f));
> + f->fd = -1;
> + }
> else
> {
>
>> The conclusion for ‘tail’ would be to wait for the IN_DELETE_SELF event
>> before considering the file to be gone. WDYT?
>
> As mentioned above, tail references the file until it can't open it,
> so the IN_DELETE_SELF is only generated upon the close_fd(f->fd) above.
Google reminded me of this!
https://lists.gnu.org/archive/html/coreutils/2015-07/msg00015.html
I.E. this is the same issue that Assaf noticed,
and that I though was restricted to older kernels.
That has an alternate fix attached.
cheers,
Pádraig.
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#21460
; Package
coreutils
.
(Fri, 11 Sep 2015 23:49:02 GMT)
Full text and
rfc822 format available.
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#21460
; Package
coreutils
.
(Sat, 12 Sep 2015 01:10:03 GMT)
Full text and
rfc822 format available.
Message #36 received at submit <at> debbugs.gnu.org (full text, mbox):
Pádraig Brady wrote:
> + else if (new_stats.st_nlink == 0) /* XXX: what about multi-linked files. */
That comment was my thought exactly. It appears to be a kernel bug that really
needs to get fixed in the kernel; there just doesn't seem to be a reliable
workaround in user space.
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#21460
; Package
coreutils
.
(Sat, 12 Sep 2015 01:10:04 GMT)
Full text and
rfc822 format available.
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#21460
; Package
coreutils
.
(Sat, 12 Sep 2015 02:23:02 GMT)
Full text and
rfc822 format available.
Message #42 received at submit <at> debbugs.gnu.org (full text, mbox):
On 12/09/15 02:09, Paul Eggert wrote:
> Pádraig Brady wrote:
>> + else if (new_stats.st_nlink == 0) /* XXX: what about multi-linked files. */
>
> That comment was my thought exactly. It appears to be a kernel bug that really
> needs to get fixed in the kernel; there just doesn't seem to be a reliable
> workaround in user space.
I double checked with kernel guys, and Al Viro
essentially said the inode and directory operations are independent.
https://lkml.org/lkml/2015/9/11/790
So we probably need to handle the IN_DELETE event
on the directory to cater for this race, as done in:
https://lists.gnu.org/archive/html/coreutils/2015-07/msg00015.html
cheers,
Pádraig.
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#21460
; Package
coreutils
.
(Sat, 12 Sep 2015 02:23:03 GMT)
Full text and
rfc822 format available.
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#21460
; Package
coreutils
.
(Fri, 02 Oct 2015 15:51:01 GMT)
Full text and
rfc822 format available.
Message #48 received at submit <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
On 12/09/15 03:22, Pádraig Brady wrote:
> On 12/09/15 02:09, Paul Eggert wrote:
>> Pádraig Brady wrote:
>>> + else if (new_stats.st_nlink == 0) /* XXX: what about multi-linked files. */
>>
>> That comment was my thought exactly. It appears to be a kernel bug that really
>> needs to get fixed in the kernel; there just doesn't seem to be a reliable
>> workaround in user space.
>
> I double checked with kernel guys, and Al Viro
> essentially said the inode and directory operations are independent.
> https://lkml.org/lkml/2015/9/11/790
>
> So we probably need to handle the IN_DELETE event
> on the directory to cater for this race, as done in:
> https://lists.gnu.org/archive/html/coreutils/2015-07/msg00015.html
I'll apply the attached later.
thanks,
Pádraig
[tail-unlink-notification-race.patch (text/x-patch, attachment)]
Reply sent
to
Pádraig Brady <P <at> draigBrady.com>
:
You have taken responsibility.
(Fri, 02 Oct 2015 15:51:02 GMT)
Full text and
rfc822 format available.
Notification sent
to
ludo <at> gnu.org (Ludovic Courtès)
:
bug acknowledged by developer.
(Fri, 02 Oct 2015 15:51:03 GMT)
Full text and
rfc822 format available.
Reply sent
to
Pádraig Brady <P <at> draigBrady.com>
:
You have taken responsibility.
(Fri, 02 Oct 2015 15:51:03 GMT)
Full text and
rfc822 format available.
Notification sent
to
ludo <at> gnu.org (Ludovic Courtès)
:
bug acknowledged by developer.
(Fri, 02 Oct 2015 15:51:03 GMT)
Full text and
rfc822 format available.
bug archived.
Request was from
Debbugs Internal Request <help-debbugs <at> gnu.org>
to
internal_control <at> debbugs.gnu.org
.
(Sat, 31 Oct 2015 11:24:04 GMT)
Full text and
rfc822 format available.
This bug report was last modified 9 years and 231 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.