Package: coreutils;
Reported by: ovasik <at> redhat.com
Date: Thu, 18 Sep 2014 10:54:02 UTC
Severity: normal
Done: Pádraig Brady <P <at> draigBrady.com>
Bug is archived. No further changes may be made.
View this message in rfc822 format
From: Pádraig Brady <P <at> draigBrady.com> To: Boris Ranto <branto <at> redhat.com> Cc: 18499 <at> debbugs.gnu.org, ovasik <at> redhat.com, Miklos Szeredi <miklos <at> szeredi.hu> Subject: bug#18499: Possible mv race for hardlinks (rhbz #1141368 ) Date: Fri, 14 Nov 2014 23:36:47 +0000
On 13/11/14 16:06, Boris Ranto wrote: > On Thu, 2014-11-13 at 15:41 +0000, Pádraig Brady wrote: >> On 13/11/14 13:58, Boris Ranto wrote: >>> On Wed, 2014-09-24 at 17:18 +0100, Pádraig Brady wrote: >>>> On 09/18/2014 11:52 AM, Ondrej Vasik wrote: >>>>> Hi, >>>>> as reported in https://bugzilla.redhat.com/show_bug.cgi?id=1141368 , >>>>> there is a possible race condition in mv in the case of hardlinks. >>>>> >>>>> Bug is reproducible even on ext4 filesystem (I guess it is filesystem >>>>> independent), even with the latest coreutils. There was already attempt >>>>> to fix the non-atomicity of mv for hardlinks >>>>> ( http://thread.gmane.org/gmane.comp.gnu.coreutils.bugs/12985 ), from >>>>> the current reproducer it looks like the fix is probably incomplete. >>>> >>>> The reason mv does the problematic unlink() is because it needs to simulate >>>> the rename, as rename() has this IMHO incorrect, though defined and documented behavior: >>>> >>>> "If oldpath and newpath are existing hard links referring to the same >>>> file, then rename() does nothing, and returns a success status." >>>> >>>> For arbitration of the rename between separate processes, >>>> the kernel needs to be involved. I noticed the very recent >>>> renameat2() system call added to Linux which supports flags. >>>> Miklos, do you think this is something that could be >>>> handled by renameat2() either implicitly or explicitly with a flag? >>>> >>>> thanks, >>>> Pádraig. >>>> >>>> >>>> >>> >>> Hi all, >>> >>> I've looked into this and I believe that the whole idea that 'mv a b' >>> where a and b are the same file (but different hard links) would unlink >>> one of the files is flawed -- I think we should return an error message >>> (something like we do for mv a a) and exit in this case. I'll add a >>> little more of my reasoning, here: >>> >>> As Pádraig stated in his comment (and as is stated in rename man page): >>> >>> "If oldpath and newpath are existing hard links referring to the same >>> file, then rename() does nothing, and returns a success status." >>> >>> This is probably due to this documented behaviour from rename man page: >>> >>> "rename() renames a file, moving it between directories if >>> required. Any other hard links to the file (as created using >>> link(2)) are unaffected." >>> >>> I.e. rename does not touch other hard links directly from its definition >>> (hence, it can't do anything if it gets the same hard linked file >>> twice). This actually means that (without a locking mechanism) there is >>> no way for us to do mv a b for hard linked files atomically -- the two >>> separate manual unlinks would race no matter what we do. >>> >>> The renameat2() function does not help here either (at least from what I >>> could find about it). It only allows us to do atomic switch of the files >>> which does not really help in this case. >> >> A flag could be added for this case though right? >> I.E. to handle the case of racing renames, >> one file would always be left in place. >> In other words to support this functionality we would need >> the kernel support from renameat(). >> > > Yes, technically it could. Maybe, it would be worth a try to propose it > for kernel. > >>> Hence, I think that the best thing for us to do here is to print the >>> "mv: 'a' and 'b' are the same hard linked file' and exit with an error >>> code if we detect this behaviour. At least if we want to preserve the >>> atomicity of the operation. >> >> We've three options that I see: >> >> 1. Simulate but be susceptible to data loss races (what we do now) >> 2. Ignore rename() issue and leave both files present (what FreeBSD does) >> 3. Fail with a "files are identical" error (what Solaris does) >> I see this was previously discussed in http://bugs.gnu.org/10686 and it was mentioned there that the 2008 POSIX standard explicitly states that any of the three approaches above is appropriate. > Personally, I like the Solaris way the best (if we can't get the system > call that would assure the atomicity in this case) -- i.e. let the user > know that we can't do that. It is also in tune with what the link man > page says for hard links: > > "This new name may be used exactly as the old one for any > operation;" > > As we error out on 'mv a a', we should probably also threat 'mv a b' > where a and b are the same file that way. > > As opposed to FreeBSD solution, the user will know that he needs to run > 'rm a' to accomplish what he actually wanted to do in the first place. > >> Now 1 is also an "issue" when the source and dest files are on separate file systems. >> I.E. mv will need to unlink() in that case. So from a high level, mv is foregoing the >> atomicity of the rename in this edge case to provide consistent behavior across >> all cases. The question boils down to whether atomicity in this edge case is required. >> Could you expand on a use case that requires this atomicity? >> > There actually is a real world use case. The original red hat bugzilla > (1141368) was filed because people were hitting this issue in a > distributed environment (via GlusterFS) -- i.e. two people were renaming > files at the same time from different locations. Following those bugs indicates it was an internal test case that was failing. I.E. it's an unlikely edge case. I'm in a bit of a quandary with this one. Seeing as this (consistent high level) behavior has been in use for the last 11 years: http://git.sv.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=fc6073d6 without a "real" customer complaint, I'm tending to leave as is, but push for better behavior from the kernel. I.E. add a RENAME_REPLACE flag to renameat2(), which will do the rename atomically for hardlinks, so we could do something like: #if !HAVE_RENAME_REPLACE if (same_file (a, b)) unlink (src); /* as we do now. */ #endif rename (a, b); thanks, Pádraig.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.