GNU bug report logs - #9813
rm -rf calls rmdir() prior to close(), which can fail

Previous Next

Package: coreutils;

Reported by: Eric Blake <eblake <at> redhat.com>

Date: Thu, 20 Oct 2011 17:41:01 UTC

Severity: normal

Done: Jim Meyering <jim <at> meyering.net>

Bug is archived. No further changes may be made.

Full log


View this message in rfc822 format

From: help-debbugs <at> gnu.org (GNU bug Tracking System)
To: Eric Blake <eblake <at> redhat.com>
Subject: bug#9813: closed (Re: rm -rf calls rmdir() prior to close(),
 which can fail)
Date: Sun, 23 Oct 2011 20:56:02 +0000
[Message part 1 (text/plain, inline)]
Your bug report

#9813: rm -rf calls rmdir() prior to close(), which can fail

which was filed against the coreutils package, has been closed.

The explanation is attached below, along with your original report.
If you require more details, please reply to 9813 <at> debbugs.gnu.org.

-- 
9813: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=9813
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
From: Jim Meyering <jim <at> meyering.net>
To: Eric Blake <eblake <at> redhat.com>
Cc: cygwin <at> cygwin.com, bug-gnulib <bug-gnulib <at> gnu.org>,
	9813-done <at> debbugs.gnu.org
Subject: Re: rm -rf calls rmdir() prior to close(), which can fail
Date: Sun, 23 Oct 2011 22:53:34 +0200
Eric Blake wrote:
> POSIX is clear that attempts to rmdir() a directory that still has
> open descriptors may fail.  Of course, on Linux, this (rather
> limiting) restriction is not present, so we don't notice it; but on
> Cygwin, there are certain file systems where this is a real problem,
> such as in this thread:
> http://cygwin.com/ml/cygwin/2011-10/msg00365.html
>
> Looking at an strace on Linux reveals the problem (abbreviated to show
> highlights here):
>
> $ mkdir -p a/b
> $ strace rm -f a
> ...
> openat(AT_FDCWD, "a", O_RDONLY|O_NOCTTY|O_NONBLOCK|O_DIRECTORY|O_NOFOLLOW) = 3
> ...
> fcntl(3, F_DUPFD, 3)                    = 4
> ...
> close(3)                                = 0
> ...
> openat(4, "b", O_RDONLY|O_NOCTTY|O_NONBLOCK|O_DIRECTORY|O_NOFOLLOW) = 3
> ...
> fcntl(3, F_DUPFD, 3)                    = 5
> ...
> close(3)                                = 0
> close(5)                                = 0
> unlinkat(4, "b", AT_REMOVEDIR)          = 0
> unlinkat(AT_FDCWD, "a", AT_REMOVEDIR)   = 0
> close(4)                                = 0
>
> Notice that for subdirectories, we opened the directory, then used dup
> to have a handle for use in further *at calls, then do
> fdopendir/readdir/closedir on the DIR*, then close the duplicate fd,
> all before calling unlinkat (aka rmdir) on that subdirectory.  But for
> the top-level directory, the dup'd fd (4) is still open when we
> attempt the unlinkat.

Thanks for the analysis, Eric.
That was due to a rather subtle but easy/safe-to-fix bug.

While the rm from coreutils-8.14 worked as your strace above shows, the
fixed one does this: (note how the close(4) now precedes the removal of "a")

  $ mkdir -p a/b
  $ strace -e openat,close,unlinkat ./rm -rf a
  close(3)                                = 0
  close(3)                                = 0
  openat(AT_FDCWD, "a", O_RDONLY|O_NOCTTY|O_NONBLOCK|O_DIRECTORY|O_NOFOLLOW) = 3
  close(3)                                = 0
  openat(4, "b", O_RDONLY|O_NOCTTY|O_NONBLOCK|O_DIRECTORY|O_NOFOLLOW) = 3
  close(3)                                = 0
  close(5)                                = 0
  unlinkat(4, "b", AT_REMOVEDIR)          = 0
  close(4)                                = 0
  unlinkat(AT_FDCWD, "a", AT_REMOVEDIR)   = 0
  close(0)                                = 0
  close(1)                                = 0
  close(2)                                = 0

Here is the patch that I expect to push tomorrow:

From a11c49cd72a91c05a272e36ff5d3cd92675cbfb5 Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyering <at> redhat.com>
Date: Sun, 23 Oct 2011 22:42:25 +0200
Subject: [PATCH] fts: close parent dir FD before returning from
 post-traversal fts_read

The problem: the fts-using "rm -rf A/B/" would attempt to unlink A,
while a file descriptor open on A remained.  This is suboptimal
(holding a file descriptor open longer than needed) on Linux, but
otherwise not a problem.  However, on Cygwin with certain file system
types, (see http://cygwin.com/ml/cygwin/2011-10/msg00365.html), that
represents a real problem: it causes the removal of A to fail with
e.g., "rm: cannot remove `A': Device or resource busy"

fts visits each directory twice and keeps a cache (fts_fd_ring) of
directory file descriptors.  After completing the final, FTS_DP,
visit of a directory, RESTORE_INITIAL_CWD intended to clear the FD
cache, but then proceeded to add a new FD to it via the subsequent
FCHDIR (which calls cwd_advance_fd and i_ring_push).  Before, the
final file descriptor would be closed only via fts_close's call to
fd_ring_clear.  Now, it is usually closed earlier, via the final
FTS_DP-returning fts_read call.
* lib/fts.c (restore_initial_cwd): New function, converted from
the macro.  Call fd_ring_clear *after* FCHDIR, not before it.
Update callers.
Reported by Franz Sirl via the above URL, with analysis by Eric Blake
in http://thread.gmane.org/gmane.comp.lib.gnulib.bugs/28739
---
 ChangeLog |   25 +++++++++++++++++++++++++
 lib/fts.c |   23 +++++++++++++++--------
 2 files changed, 40 insertions(+), 8 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 93ee45e..3a2d2cc 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,28 @@
+2011-10-23  Jim Meyering  <meyering <at> redhat.com>
+
+	fts: close parent dir FD before returning from post-traversal fts_read
+	The problem: the fts-using "rm -rf A/B/" would attempt to unlink A,
+	while a file descriptor open on A remained.  This is suboptimal
+	(holding a file descriptor open longer than needed) on Linux, but
+	otherwise not a problem.  However, on Cygwin with certain file system
+	types, (see http://cygwin.com/ml/cygwin/2011-10/msg00365.html), that
+	represents a real problem: it causes the removal of A to fail with
+	e.g., "rm: cannot remove `A': Device or resource busy"
+
+	fts visits each directory twice and keeps a cache (fts_fd_ring) of
+	directory file descriptors.  After completing the final, FTS_DP,
+	visit of a directory, RESTORE_INITIAL_CWD intended to clear the FD
+	cache, but then proceeded to add a new FD to it via the subsequent
+	FCHDIR (which calls cwd_advance_fd and i_ring_push).  Before, the
+	final file descriptor would be closed only via fts_close's call to
+	fd_ring_clear.  Now, it is usually closed earlier, via the final
+	FTS_DP-returning fts_read call.
+	* lib/fts.c (restore_initial_cwd): New function, converted from
+	the macro.  Call fd_ring_clear *after* FCHDIR, not before it.
+	Update callers.
+	Reported by Franz Sirl via the above URL, with analysis by Eric Blake
+	in http://thread.gmane.org/gmane.comp.lib.gnulib.bugs/28739
+
 2011-10-23  Gary V. Vaughan  <gary <at> gnu.org>
 	    Bruno Haible  <bruno <at> clisp.org>
 	    Jim Meyering  <jim <at> meyering.net>
diff --git a/lib/fts.c b/lib/fts.c
index e3829f3..f61a91e 100644
--- a/lib/fts.c
+++ b/lib/fts.c
@@ -229,11 +229,6 @@ static int      fts_safe_changedir (FTS *, FTSENT *, int, const char *)
 #define ISSET(opt)      (sp->fts_options & (opt))
 #define SET(opt)        (sp->fts_options |= (opt))

-/* FIXME: make this a function */
-#define RESTORE_INITIAL_CWD(sp)                 \
-  (fd_ring_clear (&((sp)->fts_fd_ring)),        \
-   FCHDIR ((sp), (ISSET (FTS_CWDFD) ? AT_FDCWD : (sp)->fts_rfd)))
-
 /* FIXME: FTS_NOCHDIR is now misnamed.
    Call it FTS_USE_FULL_RELATIVE_FILE_NAMES instead. */
 #define FCHDIR(sp, fd)                                  \
@@ -349,6 +344,18 @@ cwd_advance_fd (FTS *sp, int fd, bool chdir_down_one)
   sp->fts_cwd_fd = fd;
 }

+/* Restore the initial, pre-traversal, "working directory".
+   In FTS_CWDFD mode, we merely call cwd_advance_fd, otherwise,
+   we may actually change the working directory.
+   Return 0 upon success. Upon failure, set errno and return nonzero.  */
+static int
+restore_initial_cwd (FTS *sp)
+{
+  int fail = FCHDIR (sp, ISSET (FTS_CWDFD) ? AT_FDCWD : sp->fts_rfd);
+  fd_ring_clear (&(sp->fts_fd_ring));
+  return fail;
+}
+
 /* Open the directory DIR if possible, and return a file
    descriptor.  Return -1 and set errno on failure.  It doesn't matter
    whether the file descriptor has read or write access.  */
@@ -948,7 +955,7 @@ next:   tmp = p;
                  * root.
                  */
                 if (p->fts_level == FTS_ROOTLEVEL) {
-                        if (RESTORE_INITIAL_CWD(sp)) {
+                        if (restore_initial_cwd(sp)) {
                                 SET(FTS_STOP);
                                 return (NULL);
                         }
@@ -1055,7 +1062,7 @@ cd_dot_dot:
          * one level, via "..".
          */
         if (p->fts_level == FTS_ROOTLEVEL) {
-                if (RESTORE_INITIAL_CWD(sp)) {
+                if (restore_initial_cwd(sp)) {
                         p->fts_errno = errno;
                         SET(FTS_STOP);
                 }
@@ -1579,7 +1586,7 @@ mem1:                           saved_errno = errno;
          */
         if (!continue_readdir && descend && (type == BCHILD || !nitems) &&
             (cur->fts_level == FTS_ROOTLEVEL
-             ? RESTORE_INITIAL_CWD(sp)
+             ? restore_initial_cwd(sp)
              : fts_safe_changedir(sp, cur->fts_parent, -1, ".."))) {
                 cur->fts_info = FTS_ERR;
                 SET(FTS_STOP);
--
1.7.7.419.g87009

[Message part 3 (message/rfc822, inline)]
From: Eric Blake <eblake <at> redhat.com>
To: bug-coreutils <bug-coreutils <at> gnu.org>, bug-gnulib <bug-gnulib <at> gnu.org>
Subject: rm -rf calls rmdir() prior to close(), which can fail
Date: Thu, 20 Oct 2011 11:38:36 -0600
POSIX is clear that attempts to rmdir() a directory that still has open 
descriptors may fail.  Of course, on Linux, this (rather limiting) 
restriction is not present, so we don't notice it; but on Cygwin, there 
are certain file systems where this is a real problem, such as in this 
thread:
http://cygwin.com/ml/cygwin/2011-10/msg00365.html

Looking at an strace on Linux reveals the problem (abbreviated to show 
highlights here):

$ mkdir -p a/b
$ strace rm -f a
...
openat(AT_FDCWD, "a", 
O_RDONLY|O_NOCTTY|O_NONBLOCK|O_DIRECTORY|O_NOFOLLOW) = 3
...
fcntl(3, F_DUPFD, 3)                    = 4
...
close(3)                                = 0
...
openat(4, "b", O_RDONLY|O_NOCTTY|O_NONBLOCK|O_DIRECTORY|O_NOFOLLOW) = 3
...
fcntl(3, F_DUPFD, 3)                    = 5
...
close(3)                                = 0
close(5)                                = 0
unlinkat(4, "b", AT_REMOVEDIR)          = 0
unlinkat(AT_FDCWD, "a", AT_REMOVEDIR)   = 0
close(4)                                = 0

Notice that for subdirectories, we opened the directory, then used dup 
to have a handle for use in further *at calls, then do 
fdopendir/readdir/closedir on the DIR*, then close the duplicate fd, all 
before calling unlinkat (aka rmdir) on that subdirectory.  But for the 
top-level directory, the dup'd fd (4) is still open when we attempt the 
unlinkat.

I'm still trying to investigate whether the fix needs to be in gnulib or 
just coreutils, but something needs to be done to swap the order so that 
the last handle to the directory is closed prior to the rmdir attempt.

-- 
Eric Blake   eblake <at> redhat.com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org



This bug report was last modified 13 years and 298 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.