[PATCH] (2.5.4) death of ->i_zombie

author Alexander Viro <viro@math.psu.edu>

Mon, 11 Feb 2002 12:43:33 +0000 (04:43 -0800)

committer Linus Torvalds <torvalds@home.transmeta.com>

Mon, 11 Feb 2002 12:43:33 +0000 (04:43 -0800)
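
Not part of the original patch: a minimal user-space sketch of the ordering
decision made by lock_rename(), which this patch adds to fs/namei.c. It
assumes a toy tree in which each node has only a parent pointer and the root
is its own parent, as with d_parent. struct node, struct lock_plan and
plan_rename() are hypothetical names used for illustration. The sketch only
computes which directory would be locked first and which node would be
returned as the "trap"; it takes no real locks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct dentry: only the parent link matters. */
struct node {
	const char *name;
	struct node *parent;	/* the root points to itself */
};

/* Result of the ordering decision; no locks are actually taken. */
struct lock_plan {
	struct node *first;	/* directory whose lock is taken first */
	struct node *second;	/* taken second; NULL if both parents match */
	struct node *trap;	/* child of the ancestor on the path to the
				 * other parent, or NULL if unrelated */
	int fs_lock;		/* 1 if the per-filesystem lock is needed */
};

static struct lock_plan plan_rename(struct node *p1, struct node *p2)
{
	struct lock_plan plan = { p1, NULL, NULL, 0 };
	struct node *p;

	assert(p1 != NULL && p2 != NULL);

	/* Same directory: one per-directory lock, no filesystem lock. */
	if (p1 == p2)
		return plan;

	/* Cross-directory: the filesystem lock comes before any directory. */
	plan.fs_lock = 1;

	/* Walk up from p1; if we reach p2, then p2 is the ancestor. */
	for (p = p1; p->parent != p; p = p->parent) {
		if (p->parent == p2) {
			plan.first = p2;
			plan.second = p1;
			plan.trap = p;
			return plan;
		}
	}

	/* Walk up from p2; if we reach p1, then p1 is the ancestor. */
	for (p = p2; p->parent != p; p = p->parent) {
		if (p->parent == p1) {
			plan.first = p1;
			plan.second = p2;
			plan.trap = p;
			return plan;
		}
	}

	/* Neither is an ancestor of the other: argument order is used. */
	plan.first = p1;
	plan.second = p2;
	return plan;
}
```

In the patch itself, callers compare the looked-up source and target against
the returned trap: do_rename() fails with -EINVAL when the source is the trap
and with -ENOTEMPTY when the target is.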
diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index 3a5eec4d1be8f96051bd8470b92ea6f68af9151a..e37d003d6c3f024a24f86d56afb8228295fa36b8 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -48,28 +48,30 @@ prototypes:
  
  locking rules:
         all may block
-               BKL     i_sem(inode)    i_zombie(inode)
-lookup:                yes     yes             no
-create:                yes     yes             yes
-link:          yes     yes             yes
-mknod:         yes     yes             yes
-mkdir:         yes     yes             yes
-unlink:                yes     yes             yes
-rmdir:         yes     yes             yes             (see below)
-rename:                yes     yes (both)      yes (both)      (see below)
-readlink:      no      no              no
-follow_link:   no      no              no
-truncate:      yes     yes             no              (see below)
-setattr:       yes     if ATTR_SIZE    no
-permssion:     yes     no              no
-getattr:                                               (see below)
-revalidate:    no                                      (see below)
-       Additionally, ->rmdir() has i_zombie on victim and so does ->rename()
-in case when target exists and is a directory.
-       ->rename() on directories has (per-superblock) ->s_vfs_rename_sem.
+               BKL     i_sem(inode)
+lookup:                yes     yes
+create:                yes     yes
+link:          yes     yes
+mknod:         yes     yes
+mkdir:         yes     yes
+unlink:                yes     yes (both)
+rmdir:         yes     yes (both)      (see below)
+rename:                yes     yes (all)       (see below)
+readlink:      no      no
+follow_link:   no      no
+truncate:      yes     yes             (see below)
+setattr:       yes     if ATTR_SIZE
+permission:    yes     no
+getattr:                               (see below)
+revalidate:    no                      (see below)
+setxattr:      DOCUMENT_ME
+getxattr:      DOCUMENT_ME
+removexattr:   DOCUMENT_ME
+       Additionally, ->rmdir(), ->unlink() and ->rename() have ->i_sem on
+victim.
+       cross-directory ->rename() has (per-superblock) ->s_vfs_rename_sem.
         ->revalidate(), it may be called both with and without the i_sem
-on dentry->d_inode. VFS never calls it with i_zombie on dentry->d_inode,
-but watch for other methods directly calling this one...
+on dentry->d_inode.
         ->truncate() is never called directly - it's a callback, not a
  method. It's called by vmtruncate() - library function normally used by
  ->setattr(). Locking information above applies to that call (i.e. is
@@ -77,6 +79,9 @@ inherited from ->setattr() - vmtruncate() is used when ATTR_SIZE had been
  passed).
         ->getattr() is currently unused.
  
+See Documentation/filesystems/directory-locking for more detailed discussion
+of the locking scheme for directory operations.
+
  --------------------------- super_operations ---------------------------
  prototypes:
         void (*read_inode) (struct inode *);
diff --git a/Documentation/filesystems/directory-locking b/Documentation/filesystems/directory-locking
new file mode 100644
index 0000000..207a1c5
--- /dev/null
+++ b/Documentation/filesystems/directory-locking
@@ -0,0 +1,99 @@
+       Locking scheme used for directory operations is based on two
+kinds of locks - per-inode (->i_sem) and per-filesystem (->s_vfs_rename_sem).
+
+       For our purposes all operations fall in 5 classes:
+
+1) read access.  Locking rules: caller locks directory we are accessing.
+
+2) object creation.  Locking rules: same as above.
+
+3) object removal.  Locking rules: caller locks parent, finds victim,
+locks victim and calls the method.
+
+4) rename() that is _not_ cross-directory.  Locking rules: caller locks
+the parent, finds source and target, if target already exists - locks it
+and then calls the method.
+
+5) cross-directory rename.  The trickiest in the whole bunch.  Locking
+rules:
+       * lock the filesystem
+       * lock parents in "ancestors first" order.
+       * find source and target.
+       * if old parent is equal to or is a descendent of target
+               fail with -ENOTEMPTY
+       * if new parent is equal to or is a descendent of source
+               fail with -ELOOP
+       * if target exists - lock it.
+       * call the method.
+
+
+The rules above obviously guarantee that all directories that are going to be
+read, modified or removed by method will be locked by caller.
+
+
+If no directory is its own ancestor, the scheme above is deadlock-free.
+Proof:
+
+       First of all, at any moment we have a partial ordering of the
+objects - A < B iff A is an ancestor of B.
+
+       That ordering can change.  However, the following is true:
+
+(1) if operation different from cross-directory rename holds lock on A and
+    attempts to acquire lock on B, A will remain the parent of B until we
+    acquire the lock on B.  (Proof: only cross-directory rename can change
+    the parent of object and it would have to lock the parent).
+
+(2) if cross-directory rename holds the lock on filesystem, order will not
+    change until rename acquires all locks.  (Proof: other cross-directory
+    renames will be blocked on filesystem lock and we don't start changing
+    the order until we had acquired all locks).
+
+       Now consider the minimal deadlock.  Each process is blocked on
+attempt to acquire some lock and already holds at least one lock.  Let's
+consider the set of contended locks.  First of all, filesystem lock is
+not contended, since any process blocked on it is not holding any locks.
+Thus all processes are blocked on ->i_sem.
+
+       Any contended object is either held by cross-directory rename or
+has a child that is also contended.  Indeed, suppose that it is held by
+operation other than cross-directory rename.  Then the lock this operation
+is blocked on belongs to child of that object due to (1).
+
+       It means that one of the operations is cross-directory rename.
+Otherwise the set of contended objects would be infinite - each of them
+would have a contended child and we had assumed that no object is its
+own descendent.  Moreover, there is exactly one cross-directory rename
+(see above).
+
+       Consider the object blocking the cross-directory rename.  One of
+its descendents is locked by cross-directory rename (otherwise we would again
+have an infinite set of contended objects).  But that means
+that cross-directory rename is taking locks out of order.  Due to (2) the
+order hadn't changed since we had acquired filesystem lock.  But locking
+rules for cross-directory rename guarantee that we do not try to acquire
+lock on descendent before the lock on ancestor.  Contradiction.  I.e.
+deadlock is impossible.  Q.E.D.
+
+
+       These operations are guaranteed to avoid loop creation.  Indeed,
+the only operation that could introduce loops is cross-directory rename.
+Since the only new (parent, child) pair added by rename() is (new parent,
+source), such loop would have to contain these objects and the rest of it
+would have to exist before rename().  I.e. at the moment of loop creation
+rename() responsible for that would be holding filesystem lock and new parent
+would have to be equal to or a descendent of source.  But that means that
+new parent had been equal to or a descendent of source since the moment when
+we had acquired filesystem lock and rename() would fail with -ELOOP in that
+case.
+
+       While this locking scheme works for arbitrary DAGs, it relies on
+ability to check that directory is a descendent of another object.  Current
+implementation assumes that directory graph is a tree.  This assumption is
+also preserved by all operations (cross-directory rename on a tree that would
+not introduce a cycle will leave it a tree and link() fails for directories).
+
+       Notice that "directory" in the above == "anything that might have
+children", so if we are going to introduce hybrid objects we will need
+either to make sure that link(2) doesn't work for them or to make changes
+in is_subdir() that would make it work even in presence of such beasts.
diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c
index 7f371ea860663987876a928c02760be67fd0fcbb..888610506d61e9ec114cfe25329473e8e34ba799 100644
--- a/fs/binfmt_misc.c
+++ b/fs/binfmt_misc.c
@@ -472,11 +472,9 @@ static ssize_t bm_entry_write(struct file *file, const char *buffer,
                         break;
                 case 3: root = dget(file->f_vfsmnt->mnt_sb->s_root);
                         down(&root->d_inode->i_sem);
-                       down(&root->d_inode->i_zombie);
  
                         kill_node(e);
  
-                       up(&root->d_inode->i_zombie);
                         up(&root->d_inode->i_sem);
                         dput(root);
                         break;
@@ -516,8 +514,6 @@ static ssize_t bm_register_write(struct file *file, const char *buffer,
         if (IS_ERR(dentry))
                 goto out;
  
-       down(&root->d_inode->i_zombie);
-
         err = -EEXIST;
         if (dentry->d_inode)
                 goto out2;
@@ -556,7 +552,6 @@ static ssize_t bm_register_write(struct file *file, const char *buffer,
         mntput(mnt);
         err = 0;
  out2:
-       up(&root->d_inode->i_zombie);
         dput(dentry);
  out:
         up(&root->d_inode->i_sem);
@@ -605,12 +600,10 @@ static ssize_t bm_status_write(struct file * file, const char * buffer,
                 case 2: enabled = 1; break;
                 case 3: root = dget(file->f_vfsmnt->mnt_sb->s_root);
                         down(&root->d_inode->i_sem);
-                       down(&root->d_inode->i_zombie);
  
                         while (!list_empty(&entries))
                                 kill_node(list_entry(entries.next, Node, list));
  
-                       up(&root->d_inode->i_zombie);
                         up(&root->d_inode->i_sem);
                         dput(root);
                 default: return res;
diff --git a/fs/inode.c b/fs/inode.c
index c23323fbae70af34fef8788bcebff6a965d7fd6a..4b7a35f9cd4ffdd0458c3b46be5e7538fb738556 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -143,7 +143,6 @@ void inode_init_once(struct inode *inode)
         INIT_LIST_HEAD(&inode->i_dirty_data_buffers);
         INIT_LIST_HEAD(&inode->i_devices);
         sema_init(&inode->i_sem, 1);
-       sema_init(&inode->i_zombie, 1);
         spin_lock_init(&inode->i_data.i_shared_lock);
  }
  
diff --git a/fs/namei.c b/fs/namei.c
index a629546c6e6ddebc219793fbd7f24f37f6913d6a..13fec2512bedc125160db5b39ec6f0346ea1b01e 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -93,6 +93,11 @@
   * hopefully we will be able to get rid of that wart in 2.5. So far only
   * XEmacs seems to be relying on it...
   */
+/*
+ * [Sep 2001 AV] Single-semaphore locking scheme (kudos to David Holland)
+ * implemented.  Let's see if raised priority of ->s_vfs_rename_sem gives
+ * any extra contention...
+ */
  
  /* In order to reduce some races, while at the same time doing additional
   * checking and hopefully speeding things up, we copy filenames to the
@@ -931,28 +936,67 @@ static inline int lookup_flags(unsigned int f)
         return retval;
  }
  
-int vfs_create(struct inode *dir, struct dentry *dentry, int mode)
+/*
+ * p1 and p2 should be directories on the same fs.
+ */
+struct dentry *lock_rename(struct dentry *p1, struct dentry *p2)
  {
-       int error;
+       struct dentry *p;
  
-       mode &= S_IALLUGO;
-       mode |= S_IFREG;
+       if (p1 == p2) {
+               down(&p1->d_inode->i_sem);
+               return NULL;
+       }
+
+       down(&p1->d_inode->i_sb->s_vfs_rename_sem);
+
+       for (p = p1; p->d_parent != p; p = p->d_parent) {
+               if (p->d_parent == p2) {
+                       down(&p2->d_inode->i_sem);
+                       down(&p1->d_inode->i_sem);
+                       return p;
+               }
+       }
+
+       for (p = p2; p->d_parent != p; p = p->d_parent) {
+               if (p->d_parent == p1) {
+                       down(&p1->d_inode->i_sem);
+                       down(&p2->d_inode->i_sem);
+                       return p;
+               }
+       }
+
+       down(&p1->d_inode->i_sem);
+       down(&p2->d_inode->i_sem);
+       return NULL;
+}
+
+void unlock_rename(struct dentry *p1, struct dentry *p2)
+{
+       up(&p1->d_inode->i_sem);
+       if (p1 != p2) {
+               up(&p2->d_inode->i_sem);
+               up(&p1->d_inode->i_sb->s_vfs_rename_sem);
+       }
+}
+
+int vfs_create(struct inode *dir, struct dentry *dentry, int mode)
+{
+       int error = may_create(dir, dentry);
  
-       down(&dir->i_zombie);
-       error = may_create(dir, dentry);
         if (error)
-               goto exit_lock;
+               return error;
  
-       error = -EACCES;        /* shouldn't it be ENOSYS? */
         if (!dir->i_op || !dir->i_op->create)
-               goto exit_lock;
+               return -EACCES; /* shouldn't it be ENOSYS? */
  
         DQUOT_INIT(dir);
+
+       mode &= S_IALLUGO;
+       mode |= S_IFREG;
         lock_kernel();
         error = dir->i_op->create(dir, dentry, mode);
         unlock_kernel();
-exit_lock:
-       up(&dir->i_zombie);
         if (!error)
                 inode_dir_notify(dir, DN_CREATE);
         return error;
@@ -1212,26 +1256,21 @@ fail:
  
  int vfs_mknod(struct inode *dir, struct dentry *dentry, int mode, dev_t dev)
  {
-       int error = -EPERM;
+       int error = may_create(dir, dentry);
  
-       down(&dir->i_zombie);
-       if ((S_ISCHR(mode) || S_ISBLK(mode)) && !capable(CAP_MKNOD))
-               goto exit_lock;
-
-       error = may_create(dir, dentry);
         if (error)
-               goto exit_lock;
+               return error;
+
+       if ((S_ISCHR(mode) || S_ISBLK(mode)) && !capable(CAP_MKNOD))
+               return -EPERM;
  
-       error = -EPERM;
         if (!dir->i_op || !dir->i_op->mknod)
-               goto exit_lock;
+               return -EPERM;
  
         DQUOT_INIT(dir);
         lock_kernel();
         error = dir->i_op->mknod(dir, dentry, mode, dev);
         unlock_kernel();
-exit_lock:
-       up(&dir->i_zombie);
         if (!error)
                 inode_dir_notify(dir, DN_CREATE);
         return error;
@@ -1284,25 +1323,19 @@ out:
  
  int vfs_mkdir(struct inode *dir, struct dentry *dentry, int mode)
  {
-       int error;
+       int error = may_create(dir, dentry);
  
-       down(&dir->i_zombie);
-       error = may_create(dir, dentry);
         if (error)
-               goto exit_lock;
+               return error;
  
-       error = -EPERM;
         if (!dir->i_op || !dir->i_op->mkdir)
-               goto exit_lock;
+               return -EPERM;
  
         DQUOT_INIT(dir);
         mode &= (S_IRWXUGO|S_ISVTX);
         lock_kernel();
         error = dir->i_op->mkdir(dir, dentry, mode);
         unlock_kernel();
-
-exit_lock:
-       up(&dir->i_zombie);
         if (!error)
                 inode_dir_notify(dir, DN_CREATE);
         return error;
@@ -1369,9 +1402,8 @@ static void d_unhash(struct dentry *dentry)
  
  int vfs_rmdir(struct inode *dir, struct dentry *dentry)
  {
-       int error;
+       int error = may_delete(dir, dentry, 1);
  
-       error = may_delete(dir, dentry, 1);
         if (error)
                 return error;
  
@@ -1380,7 +1412,7 @@ int vfs_rmdir(struct inode *dir, struct dentry *dentry)
  
         DQUOT_INIT(dir);
  
-       double_down(&dir->i_zombie, &dentry->d_inode->i_zombie);
+       down(&dentry->d_inode->i_sem);
         d_unhash(dentry);
         if (IS_DEADDIR(dir))
                 error = -ENOENT;
@@ -1393,7 +1425,7 @@ int vfs_rmdir(struct inode *dir, struct dentry *dentry)
                 if (!error)
                         dentry->d_inode->i_flags |= S_DEAD;
         }
-       double_up(&dir->i_zombie, &dentry->d_inode->i_zombie);
+       up(&dentry->d_inode->i_sem);
         if (!error) {
                 inode_dir_notify(dir, DN_DELETE);
                 d_delete(dentry);
@@ -1447,28 +1479,33 @@ exit:
  
  int vfs_unlink(struct inode *dir, struct dentry *dentry)
  {
-       int error;
+       int error = may_delete(dir, dentry, 0);
  
-       down(&dir->i_zombie);
-       error = may_delete(dir, dentry, 0);
-       if (!error) {
-               error = -EPERM;
-               if (dir->i_op && dir->i_op->unlink) {
-                       DQUOT_INIT(dir);
-                       if (d_mountpoint(dentry))
-                               error = -EBUSY;
-                       else {
-                               lock_kernel();
-                               error = dir->i_op->unlink(dir, dentry);
-                               unlock_kernel();
-                               if (!error)
-                                       d_delete(dentry);
-                       }
-               }
+       if (error)
+               return error;
+
+       if (!dir->i_op || !dir->i_op->unlink)
+               return -EPERM;
+
+       DQUOT_INIT(dir);
+
+       dget(dentry);
+       down(&dentry->d_inode->i_sem);
+       if (d_mountpoint(dentry))
+               error = -EBUSY;
+       else {
+               lock_kernel();
+               error = dir->i_op->unlink(dir, dentry);
+               unlock_kernel();
+               if (!error)
+                       d_delete(dentry);
         }
-       up(&dir->i_zombie);
+       up(&dentry->d_inode->i_sem);
+       dput(dentry);
+
         if (!error)
                 inode_dir_notify(dir, DN_DELETE);
+
         return error;
  }
  
@@ -1517,24 +1554,18 @@ slashes:
  
  int vfs_symlink(struct inode *dir, struct dentry *dentry, const char *oldname)
  {
-       int error;
+       int error = may_create(dir, dentry);
  
-       down(&dir->i_zombie);
-       error = may_create(dir, dentry);
         if (error)
-               goto exit_lock;
+               return error;
  
-       error = -EPERM;
         if (!dir->i_op || !dir->i_op->symlink)
-               goto exit_lock;
+               return -EPERM;
  
         DQUOT_INIT(dir);
         lock_kernel();
         error = dir->i_op->symlink(dir, dentry, oldname);
         unlock_kernel();
-
-exit_lock:
-       up(&dir->i_zombie);
         if (!error)
                 inode_dir_notify(dir, DN_CREATE);
         return error;
@@ -1576,39 +1607,31 @@ out:
  
  int vfs_link(struct dentry *old_dentry, struct inode *dir, struct dentry *new_dentry)
  {
-       struct inode *inode;
+       struct inode *inode = old_dentry->d_inode;
         int error;
  
-       down(&dir->i_zombie);
-       error = -ENOENT;
-       inode = old_dentry->d_inode;
         if (!inode)
-               goto exit_lock;
+               return -ENOENT;
  
         error = may_create(dir, new_dentry);
         if (error)
-               goto exit_lock;
+               return error;
  
-       error = -EXDEV;
         if (dir->i_sb != inode->i_sb)
-               goto exit_lock;
+               return -EXDEV;
  
         /*
          * A link to an append-only or immutable file cannot be created.
          */
-       error = -EPERM;
         if (IS_APPEND(inode) || IS_IMMUTABLE(inode))
-               goto exit_lock;
+               return -EPERM;
         if (!dir->i_op || !dir->i_op->link)
-               goto exit_lock;
+               return -EPERM;
  
         DQUOT_INIT(dir);
         lock_kernel();
         error = dir->i_op->link(old_dentry, dir, new_dentry);
         unlock_kernel();
-
-exit_lock:
-       up(&dir->i_zombie);
         if (!error)
                 inode_dir_notify(dir, DN_CREATE);
         return error;
@@ -1680,17 +1703,23 @@ exit:
   *        story.
   *     c) we have to lock _three_ objects - parents and victim (if it exists).
   *        And that - after we got ->i_sem on parents (until then we don't know
- *        whether the target exists at all, let alone whether it is a directory
- *        or not). Solution: ->i_zombie. Taken only after ->i_sem. Always taken
- *        on link creation/removal of any kind. And taken (without ->i_sem) on
- *        directory that will be removed (both in rmdir() and here).
+ *        whether the target exists).  Solution: try to be smart with locking
+ *        order for inodes.  We rely on the fact that tree topology may change
+ *        only under ->s_vfs_rename_sem _and_ that parent of the object we
+ *        move will be locked.  Thus we can rank directories by the tree
+ *        (ancestors first) and rank all non-directories after them.
+ *        That works since everybody except rename does "lock parent, lookup,
+ *        lock child" and rename is under ->s_vfs_rename_sem.
+ *        HOWEVER, it relies on the assumption that any object with ->lookup()
+ *        has no more than 1 dentry.  If "hybrid" objects will ever appear,
+ *        we'd better make sure that there's no link(2) for them.
   *     d) some filesystems don't support opened-but-unlinked directories,
   *        either because of layout or because they are not ready to deal with
   *        all cases correctly. The latter will be fixed (taking this sort of
   *        stuff into VFS), but the former is not going away. Solution: the same
   *        trick as in rmdir().
   *     e) conversion from fhandle to dentry may come in the wrong moment - when
- *        we are removing the target. Solution: we will have to grab ->i_zombie
+ *        we are removing the target. Solution: we will have to grab ->i_sem
   *        in the fhandle_to_dentry code. [FIXME - current nfsfh.c relies on
   *        ->i_sem on parents, which works but leads to some truely excessive
   *        locking].
@@ -1698,131 +1727,96 @@ exit:
  int vfs_rename_dir(struct inode *old_dir, struct dentry *old_dentry,
                struct inode *new_dir, struct dentry *new_dentry)
  {
-       int error;
+       int error = 0;
         struct inode *target;
  
-       if (old_dentry->d_inode == new_dentry->d_inode)
-               return 0;
-
-       error = may_delete(old_dir, old_dentry, 1);
-       if (error)
-               return error;
-
-       if (new_dir->i_sb != old_dir->i_sb)
-               return -EXDEV;
-
-       if (!new_dentry->d_inode)
-               error = may_create(new_dir, new_dentry);
-       else
-               error = may_delete(new_dir, new_dentry, 1);
-       if (error)
-               return error;
-
-       if (!old_dir->i_op || !old_dir->i_op->rename)
-               return -EPERM;
-
         /*
          * If we are going to change the parent - check write permissions,
          * we'll need to flip '..'.
          */
-       if (new_dir != old_dir) {
+       if (new_dir != old_dir)
                 error = permission(old_dentry->d_inode, MAY_WRITE);
-       }
+
         if (error)
                 return error;
  
-       DQUOT_INIT(old_dir);
-       DQUOT_INIT(new_dir);
-       down(&old_dir->i_sb->s_vfs_rename_sem);
-       error = -EINVAL;
-       if (is_subdir(new_dentry, old_dentry))
-               goto out_unlock;
-       /* Don't eat your daddy, dear... */
-       /* This also avoids locking issues */
-       if (old_dentry->d_parent == new_dentry)
-               goto out_unlock;
         target = new_dentry->d_inode;
-       if (target) { /* Hastur! Hastur! Hastur! */
-               triple_down(&old_dir->i_zombie,
-                           &new_dir->i_zombie,
-                           &target->i_zombie);
+       if (target) {
+               down(&target->i_sem);
                 d_unhash(new_dentry);
-       } else
-               double_down(&old_dir->i_zombie,
-                           &new_dir->i_zombie);
-       if (IS_DEADDIR(old_dir)||IS_DEADDIR(new_dir))
-               error = -ENOENT;
-       else if (d_mountpoint(old_dentry)||d_mountpoint(new_dentry))
+       }
+       if (d_mountpoint(old_dentry)||d_mountpoint(new_dentry))
                 error = -EBUSY;
         else 
                 error = old_dir->i_op->rename(old_dir, old_dentry, new_dir, new_dentry);
         if (target) {
                 if (!error)
                         target->i_flags |= S_DEAD;
-               triple_up(&old_dir->i_zombie,
-                         &new_dir->i_zombie,
-                         &target->i_zombie);
+               up(&target->i_sem);
                 if (d_unhashed(new_dentry))
                         d_rehash(new_dentry);
                 dput(new_dentry);
-       } else
-               double_up(&old_dir->i_zombie,
-                         &new_dir->i_zombie);
-               
+       }
         if (!error)
                 d_move(old_dentry,new_dentry);
-out_unlock:
-       up(&old_dir->i_sb->s_vfs_rename_sem);
         return error;
  }
  
  int vfs_rename_other(struct inode *old_dir, struct dentry *old_dentry,
                struct inode *new_dir, struct dentry *new_dentry)
  {
+       struct inode *target;
         int error;
  
-       if (old_dentry->d_inode == new_dentry->d_inode)
-               return 0;
+       dget(new_dentry);
+       target = new_dentry->d_inode;
+       if (target)
+               down(&target->i_sem);
+       if (d_mountpoint(old_dentry)||d_mountpoint(new_dentry))
+               error = -EBUSY;
+       else
+               error = old_dir->i_op->rename(old_dir, old_dentry, new_dir, new_dentry);
+       if (!error) {
+               /* The following d_move() should become unconditional */
+               if (!(old_dir->i_sb->s_type->fs_flags & FS_ODD_RENAME)) {
+                       d_move(old_dentry, new_dentry);
+               }
+       }
+       if (target)
+               up(&target->i_sem);
+       dput(new_dentry);
+       return error;
+}
  
-       error = may_delete(old_dir, old_dentry, 0);
+int vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
+              struct inode *new_dir, struct dentry *new_dentry)
+{
+       int error;
+       int is_dir = S_ISDIR(old_dentry->d_inode->i_mode);
+
+       if (old_dentry->d_inode == new_dentry->d_inode)
+               return 0;
+ 
+       error = may_delete(old_dir, old_dentry, is_dir);
         if (error)
                 return error;
  
-       if (new_dir->i_sb != old_dir->i_sb)
-               return -EXDEV;
-
         if (!new_dentry->d_inode)
                 error = may_create(new_dir, new_dentry);
         else
-               error = may_delete(new_dir, new_dentry, 0);
+               error = may_delete(new_dir, new_dentry, is_dir);
         if (error)
                 return error;
  
         if (!old_dir->i_op || !old_dir->i_op->rename)
                 return -EPERM;
  
+       if (IS_DEADDIR(old_dir)||IS_DEADDIR(new_dir))
+               return -ENOENT;
         DQUOT_INIT(old_dir);
         DQUOT_INIT(new_dir);
-       double_down(&old_dir->i_zombie, &new_dir->i_zombie);
-       if (d_mountpoint(old_dentry)||d_mountpoint(new_dentry))
-               error = -EBUSY;
-       else
-               error = old_dir->i_op->rename(old_dir, old_dentry, new_dir, new_dentry);
-       double_up(&old_dir->i_zombie, &new_dir->i_zombie);
-       if (error)
-               return error;
-       /* The following d_move() should become unconditional */
-       if (!(old_dir->i_sb->s_type->fs_flags & FS_ODD_RENAME)) {
-               d_move(old_dentry, new_dentry);
-       }
-       return 0;
-}
  
-int vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
-              struct inode *new_dir, struct dentry *new_dentry)
-{
-       int error;
-       if (S_ISDIR(old_dentry->d_inode->i_mode))
+       if (is_dir)
                 error = vfs_rename_dir(old_dir,old_dentry,new_dir,new_dentry);
         else
                 error = vfs_rename_other(old_dir,old_dentry,new_dir,new_dentry);
@@ -1842,6 +1836,7 @@ static inline int do_rename(const char * oldname, const char * newname)
         int error = 0;
         struct dentry * old_dir, * new_dir;
         struct dentry * old_dentry, *new_dentry;
+       struct dentry * trap;
         struct nameidata oldnd, newnd;
  
         if (path_init(oldname, LOOKUP_PARENT, &oldnd))
@@ -1868,7 +1863,7 @@ static inline int do_rename(const char * oldname, const char * newname)
         if (newnd.last_type != LAST_NORM)
                 goto exit2;
  
-       double_lock(new_dir, old_dir);
+       trap = lock_rename(new_dir, old_dir);
  
         old_dentry = lookup_hash(&oldnd.last, old_dir);
         error = PTR_ERR(old_dentry);
@@ -1886,21 +1881,30 @@ static inline int do_rename(const char * oldname, const char * newname)
                 if (newnd.last.name[newnd.last.len])
                         goto exit4;
         }
+       /* source should not be ancestor of target */
+       error = -EINVAL;
+       if (old_dentry == trap)
+               goto exit4;
         new_dentry = lookup_hash(&newnd.last, new_dir);
         error = PTR_ERR(new_dentry);
         if (IS_ERR(new_dentry))
                 goto exit4;
+       /* target should not be an ancestor of source */
+       error = -ENOTEMPTY;
+       if (new_dentry == trap)
+               goto exit5;
  
         lock_kernel();
         error = vfs_rename(old_dir->d_inode, old_dentry,
                                    new_dir->d_inode, new_dentry);
         unlock_kernel();
  
+exit5:
         dput(new_dentry);
  exit4:
         dput(old_dentry);
  exit3:
-       double_up(&new_dir->d_inode->i_sem, &old_dir->d_inode->i_sem);
+       unlock_rename(new_dir, old_dir);
  exit2:
         path_release(&newnd);
  exit1:
diff --git a/fs/namespace.c b/fs/namespace.c
index 1419b179c74ce364ec512fe70848ccbe5817667b..982a0cfd1c70bd9dec945ec7bfc26a78879455d8 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -466,7 +466,7 @@ static int graft_tree(struct vfsmount *mnt, struct nameidata *nd)
                 return -ENOTDIR;
  
         err = -ENOENT;
-       down(&nd->dentry->d_inode->i_zombie);
+       down(&nd->dentry->d_inode->i_sem);
         if (IS_DEADDIR(nd->dentry->d_inode))
                 goto out_unlock;
  
@@ -481,7 +481,7 @@ static int graft_tree(struct vfsmount *mnt, struct nameidata *nd)
         }
         spin_unlock(&dcache_lock);
  out_unlock:
-       up(&nd->dentry->d_inode->i_zombie);
+       up(&nd->dentry->d_inode->i_sem);
         return err;
  }
  
@@ -577,7 +577,7 @@ static int do_move_mount(struct nameidata *nd, char *old_name)
                 goto out;
  
         err = -ENOENT;
-       down(&nd->dentry->d_inode->i_zombie);
+       down(&nd->dentry->d_inode->i_sem);
         if (IS_DEADDIR(nd->dentry->d_inode))
                 goto out1;
  
@@ -607,7 +607,7 @@ static int do_move_mount(struct nameidata *nd, char *old_name)
  out2:
         spin_unlock(&dcache_lock);
  out1:
-       up(&nd->dentry->d_inode->i_zombie);
+       up(&nd->dentry->d_inode->i_sem);
  out:
         up_write(&current->namespace->sem);
         if (!err)
@@ -949,7 +949,7 @@ asmlinkage long sys_pivot_root(const char *new_root, const char *put_old)
         user_nd.dentry = dget(current->fs->root);
         read_unlock(&current->fs->lock);
         down_write(&current->namespace->sem);
-       down(&old_nd.dentry->d_inode->i_zombie);
+       down(&old_nd.dentry->d_inode->i_sem);
         error = -EINVAL;
         if (!check_mnt(user_nd.mnt))
                 goto out2;
@@ -992,7 +992,7 @@ asmlinkage long sys_pivot_root(const char *new_root, const char *put_old)
         path_release(&root_parent);
         path_release(&parent_nd);
  out2:
-       up(&old_nd.dentry->d_inode->i_zombie);
+       up(&old_nd.dentry->d_inode->i_sem);
         up_write(&current->namespace->sem);
         path_release(&user_nd);
         path_release(&old_nd);
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 55a29adf3e0ec3008656ce7be5876053470d0eae..135395052535fbf10676ac5038213d3b59fda170 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1226,7 +1226,7 @@ int
  nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
                             struct svc_fh *tfhp, char *tname, int tlen)
  {
-       struct dentry   *fdentry, *tdentry, *odentry, *ndentry;
+       struct dentry   *fdentry, *tdentry, *odentry, *ndentry, *trap;
         struct inode    *fdir, *tdir;
         int             err;
  
@@ -1253,7 +1253,7 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
  
         /* cannot use fh_lock as we need deadlock protective ordering
          * so do it by hand */
-       double_down(&tdir->i_sem, &fdir->i_sem);
+       trap = lock_rename(tdentry, fdentry);
         ffhp->fh_locked = tfhp->fh_locked = 1;
         fill_pre_wcc(ffhp);
         fill_pre_wcc(tfhp);
@@ -1266,12 +1266,17 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
         err = -ENOENT;
         if (!odentry->d_inode)
                 goto out_dput_old;
+       err = -EINVAL;
+       if (odentry == trap)
+               goto out_dput_old;
  
         ndentry = lookup_one_len(tname, tdentry, tlen);
         err = PTR_ERR(ndentry);
         if (IS_ERR(ndentry))
                 goto out_dput_old;
-
+       err = -ENOTEMPTY;
+       if (ndentry == trap)
+               goto out_dput_new;
  
  #ifdef MSNFS
         if ((ffhp->fh_export->ex_flags & NFSEXP_MSNFS) &&
@@ -1287,6 +1292,8 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
         }
         dput(ndentry);
  
+ out_dput_new:
+       dput(ndentry);
   out_dput_old:
         dput(odentry);
   out_nfserr:
@@ -1299,9 +1306,9 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
          */
         fill_post_wcc(ffhp);
         fill_post_wcc(tfhp);
-       double_up(&tdir->i_sem, &fdir->i_sem);
+       unlock_rename(tdentry, fdentry);
         ffhp->fh_locked = tfhp->fh_locked = 0;
-       
+
  out:
         return err;
  }
diff --git a/fs/readdir.c b/fs/readdir.c
index 083165f37d855f99657f45dc806a6e49fa55e30e..8a09857a4c04d0e8c59aa741825a0c7ce63b949c 100644
--- a/fs/readdir.c
+++ b/fs/readdir.c
@@ -21,14 +21,12 @@ int vfs_readdir(struct file *file, filldir_t filler, void *buf)
         if (!file->f_op || !file->f_op->readdir)
                 goto out;
         down(&inode->i_sem);
-       down(&inode->i_zombie);
         res = -ENOENT;
         if (!IS_DEADDIR(inode)) {
                 lock_kernel();
                 res = file->f_op->readdir(file, buf, filler);
                 unlock_kernel();
         }
-       up(&inode->i_zombie);
         up(&inode->i_sem);
  out:
         return res;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index cc874fa9748b3dc44c4e366d2b62978f24119a79..ff3057143aae7ec28442806537d83f522005abe2 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -425,7 +425,6 @@ struct inode {
         unsigned long           i_blocks;
         unsigned long           i_version;
         struct semaphore        i_sem;
-       struct semaphore        i_zombie;
         struct inode_operations *i_op;
         struct file_operations  *i_fop; /* former ->i_op->default_file_ops */
         struct super_block      *i_sb;
@@ -759,6 +758,9 @@ extern int vfs_rmdir(struct inode *, struct dentry *);
  extern int vfs_unlink(struct inode *, struct dentry *);
  extern int vfs_rename(struct inode *, struct dentry *, struct inode *, struct dentry *);
  
+extern struct dentry *lock_rename(struct dentry *, struct dentry *);
+extern void unlock_rename(struct dentry *, struct dentry *);
+
  /*
   * File types
   */
@@ -1505,131 +1507,6 @@ extern int generic_osync_inode(struct inode *, int);
  extern int inode_change_ok(struct inode *, struct iattr *);
  extern int inode_setattr(struct inode *, struct iattr *);
  
-/*
- * Common dentry functions for inclusion in the VFS
- * or in other stackable file systems.  Some of these
- * functions were in linux/fs/ C (VFS) files.
- *
- */
-
-/*
- * Locking the parent is needed to:
- *  - serialize directory operations
- *  - make sure the parent doesn't change from
- *    under us in the middle of an operation.
- *
- * NOTE! Right now we'd rather use a "struct inode"
- * for this, but as I expect things to move toward
- * using dentries instead for most things it is
- * probably better to start with the conceptually
- * better interface of relying on a path of dentries.
- */
-static inline struct dentry *lock_parent(struct dentry *dentry)
-{
-       struct dentry *dir = dget(dentry->d_parent);
-
-       down(&dir->d_inode->i_sem);
-       return dir;
-}
-
-static inline struct dentry *get_parent(struct dentry *dentry)
-{
-       return dget(dentry->d_parent);
-}
-
-static inline void unlock_dir(struct dentry *dir)
-{
-       up(&dir->d_inode->i_sem);
-       dput(dir);
-}
-
-/*
- * Whee.. Deadlock country. Happily there are only two VFS
- * operations that does this..
- */
-static inline void double_down(struct semaphore *s1, struct semaphore *s2)
-{
-       if (s1 != s2) {
-               if ((unsigned long) s1 < (unsigned long) s2) {
-                       struct semaphore *tmp = s2;
-                       s2 = s1; s1 = tmp;
-               }
-               down(s1);
-       }
-       down(s2);
-}
-
-/*
- * Ewwwwwwww... _triple_ lock. We are guaranteed that the 3rd argument is
- * not equal to 1st and not equal to 2nd - the first case (target is parent of
- * source) would be already caught, the second is plain impossible (target is
- * its own parent and that case would be caught even earlier). Very messy.
- * I _think_ that it works, but no warranties - please, look it through.
- * Pox on bloody lusers who mandated overwriting rename() for directories...
- */
-
-static inline void triple_down(struct semaphore *s1,
-                              struct semaphore *s2,
-                              struct semaphore *s3)
-{
-       if (s1 != s2) {
-               if ((unsigned long) s1 < (unsigned long) s2) {
-                       if ((unsigned long) s1 < (unsigned long) s3) {
-                               struct semaphore *tmp = s3;
-                               s3 = s1; s1 = tmp;
-                       }
-                       if ((unsigned long) s1 < (unsigned long) s2) {
-                               struct semaphore *tmp = s2;
-                               s2 = s1; s1 = tmp;
-                       }
-               } else {
-                       if ((unsigned long) s1 < (unsigned long) s3) {
-                               struct semaphore *tmp = s3;
-                               s3 = s1; s1 = tmp;
-                       }
-                       if ((unsigned long) s2 < (unsigned long) s3) {
-                               struct semaphore *tmp = s3;
-                               s3 = s2; s2 = tmp;
-                       }
-               }
-               down(s1);
-       } else if ((unsigned long) s2 < (unsigned long) s3) {
-               struct semaphore *tmp = s3;
-               s3 = s2; s2 = tmp;
-       }
-       down(s2);
-       down(s3);
-}
-
-static inline void double_up(struct semaphore *s1, struct semaphore *s2)
-{
-       up(s1);
-       if (s1 != s2)
-               up(s2);
-}
-
-static inline void triple_up(struct semaphore *s1,
-                            struct semaphore *s2,
-                            struct semaphore *s3)
-{
-       up(s1);
-       if (s1 != s2)
-               up(s2);
-       up(s3);
-}
-
-static inline void double_lock(struct dentry *d1, struct dentry *d2)
-{
-       double_down(&d1->d_inode->i_sem, &d2->d_inode->i_sem);
-}
-
-static inline void double_unlock(struct dentry *d1, struct dentry *d2)
-{
-       double_up(&d1->d_inode->i_sem,&d2->d_inode->i_sem);
-       dput(d1);
-       dput(d2);
-}
-
  #endif /* __KERNEL__ */
  
  #endif /* _LINUX_FS_H */
diff --git a/kernel/ksyms.c b/kernel/ksyms.c
index 3f0b16ccf16b1a383ee091493355ea8e2e1ae135..cf21ba61d8cba67b023362d63677ee08d3590a2f 100644
--- a/kernel/ksyms.c
+++ b/kernel/ksyms.c
@@ -253,6 +253,8 @@ EXPORT_SYMBOL(vfs_statfs);
  EXPORT_SYMBOL(vfs_fstat);
  EXPORT_SYMBOL(vfs_stat);
  EXPORT_SYMBOL(vfs_lstat);
+EXPORT_SYMBOL(lock_rename);
+EXPORT_SYMBOL(unlock_rename);
  EXPORT_SYMBOL(generic_read_dir);
  EXPORT_SYMBOL(generic_file_llseek);
  EXPORT_SYMBOL(remote_llseek);
Documentation/filesystems/Locking
Documentation/filesystems/directory-locking	[new file with mode: 0644]
fs/binfmt_misc.c
fs/inode.c
fs/namei.c
fs/namespace.c
fs/nfsd/vfs.c
fs/readdir.c
include/linux/fs.h
kernel/ksyms.c
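
The double_down()/double_up() helpers removed from include/linux/fs.h avoided deadlock by always taking the semaphore with the higher address first. The following is a minimal userspace sketch of that ordering idea only. It assumes POSIX mutexes as a stand-in for kernel semaphores, and the names lock_pair() and unlock_pair() are hypothetical. It is not the lock_rename() that the patch switches callers to; that function lives in fs/namei.c and is not shown in this part of the diff.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/*
 * Hypothetical userspace analogue of the removed double_down():
 * take two locks in a fixed global order (higher address first, as
 * the kernel helper did), so two threads locking the same pair in
 * opposite argument order cannot deadlock against each other.
 * Returns the number of distinct locks taken: 1 if a == b, else 2.
 */
static int lock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
	assert(a && b);
	if (a == b) {
		pthread_mutex_lock(a);
		return 1;
	}
	if ((uintptr_t)a < (uintptr_t)b) {
		/* Swap so that 'a' is the higher address. */
		pthread_mutex_t *tmp = a;
		a = b;
		b = tmp;
	}
	pthread_mutex_lock(a);	/* higher address first */
	pthread_mutex_lock(b);
	return 2;
}

/* Analogue of the removed double_up(): release each lock exactly once. */
static void unlock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
	pthread_mutex_unlock(a);
	if (a != b)
		pthread_mutex_unlock(b);
}
```

A caller would pair the two, for example lock_pair(&m1, &m2) followed later by unlock_pair(&m1, &m2). The argument order at the call site does not matter, because the helper sorts the pair by address before locking.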