mpage_writepages() does a lock_page() on pages to be written back, even
when it is being used for page reclaim writeback.
This is normally OK because the page is unlocked quickly: pages are not
kept locked during writeback, and nobody should be performing __GFP_FS
allocations while holding a page lock.
But it has introduced a lock-ranking problem in ext3:

    generic_file_write
    ->lock_page
      ->ext3_prepare_write
        ->journal_start        (waits for a commit)

versus

    ext3_create()
    ->journal_start()
      ->ext3_new_inode(GFP_KERNEL)
        ->page reclaim
          ->mpage_writepages
            ->lock_page        (locks up, transaction is held open)

At some point I may have to turn mpage_writepages' lock_page into a
trylock if the caller is PF_MEMALLOC. But for now, let's make ext3's
inside-transaction allocations use GFP_NOFS. There is only one of them.
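
For illustration only, the trylock idea mentioned above could look roughly like the following userspace model. It is a sketch under stated assumptions, not kernel code and not part of this patch: `struct model_page`, the `model_*` helpers and the `in_reclaim` flag (standing in for a PF_MEMALLOC caller) are all hypothetical names. A reclaim caller skips a page it cannot lock immediately instead of blocking on it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a page; NOT the kernel's struct page. */
struct model_page {
	atomic_flag locked;	/* models the page lock bit */
	bool dirty;		/* models the page needing writeback */
};

static void model_page_init(struct model_page *page, bool dirty)
{
	atomic_flag_clear(&page->locked);
	page->dirty = dirty;
}

/* Non-blocking lock attempt: returns true if we took the lock. */
static bool model_trylock_page(struct model_page *page)
{
	return !atomic_flag_test_and_set(&page->locked);
}

/* Blocking lock: spins until the lock is free (the kernel would sleep). */
static void model_lock_page(struct model_page *page)
{
	while (atomic_flag_test_and_set(&page->locked))
		;
}

static void model_unlock_page(struct model_page *page)
{
	atomic_flag_clear(&page->locked);
}

/*
 * Write back one page.  in_reclaim models a caller running with
 * PF_MEMALLOC set.  Returns true if the page was processed, false if
 * it was skipped because a reclaim caller could not lock it at once.
 */
bool model_writepage(struct model_page *page, bool in_reclaim)
{
	assert(page != NULL);

	if (in_reclaim) {
		if (!model_trylock_page(page))
			return false;	/* skip rather than block */
	} else {
		model_lock_page(page);
	}

	page->dirty = false;		/* stands in for the actual I/O */
	model_unlock_page(page);
	return true;
}
```

In this model a reclaim caller never waits on a page lock, so it cannot end up waiting while it holds a transaction open. The cost is that a locked page is simply left dirty for a later pass.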
 	return;
 }
-static kmem_cache_t * ext3_inode_cachep;
+static kmem_cache_t *ext3_inode_cachep;
+/*
+ * Called inside transaction, so use GFP_NOFS
+ */
 static struct inode *ext3_alloc_inode(struct super_block *sb)
 {
 	struct ext3_inode_info *ei;
-	ei = (struct ext3_inode_info *)kmem_cache_alloc(ext3_inode_cachep, SLAB_KERNEL);
+
+	ei = kmem_cache_alloc(ext3_inode_cachep, SLAB_NOFS);
 	if (!ei)
 		return NULL;
 	return &ei->vfs_inode;