git.hungrycats.org Git - linux/commitdiff
zygo: btrfs: 'btrfs replace' hangs at end of replacing a device (v5.10.82)
author Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Tue, 30 Nov 2021 16:37:05 +0000 (11:37 -0500)
committer Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Thu, 2 Dec 2021 01:43:27 +0000 (20:43 -0500)
From: Nikolay Borisov <nborisov@suse.com>
Date: Tue, 30 Nov 2021 15:55:12 +0200

I have a working hypothesis about what might be going wrong, but without a
crash dump to investigate I can't confirm it. Basically, I think
btrfs_rm_dev_replace_blocked is not seeing the decrement (i.e. the store
to the running bios count) because btrfs_bio_counter_sub wakes it with
cond_wake_up_nomb. If I'm right, then the following should fix it:

@@ -1122,7 +1123,8 @@ void btrfs_bio_counter_inc_noblocked(struct btrfs_fs_info *fs_info)
 void btrfs_bio_counter_sub(struct btrfs_fs_info *fs_info, s64 amount)
 {
        percpu_counter_sub(&fs_info->dev_replace.bio_counter, amount);
-       cond_wake_up_nomb(&fs_info->dev_replace.replace_wait);
+       /* paired with the wait_event barrier in replace_blocked */
+       cond_wake_up(&fs_info->dev_replace.replace_wait);
 }

Can you apply it and see whether the hang still reproduces? I don't know
the incidence rate of this bug, so you will have to decide at what point
to consider it fixed. In any case this patch can't have any negative
functional impact; it just makes the ordering slightly stronger to ensure
the counter write is visible before we check for, and possibly wake,
someone on the queue.
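
The ordering argument above can be illustrated with a minimal userspace sketch. It assumes C11 atomics as stand-ins for the kernel primitives; struct replace_state, counter_sub and waiter_prepare are hypothetical names for illustration and are not part of btrfs. The waker stores to the counter and then checks for a waiter. The waiter announces itself and then checks the counter. Unless both sides have a full barrier between their store and their load, each may miss the other's store, and the waiter can sleep with nobody left to wake it.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins, not the kernel's data structures. */
struct replace_state {
	atomic_long bio_counter; /* models dev_replace.bio_counter */
	atomic_bool has_waiter;  /* models "replace_wait has a sleeper" */
};

/*
 * Waker side, modelled on btrfs_bio_counter_sub().
 * full_barrier == false models cond_wake_up_nomb(): nothing orders the
 * counter store before the waiter check.
 * full_barrier == true models cond_wake_up(): a full fence sits between
 * the counter store and the waiter check.
 * Returns true if a wakeup would be issued.
 */
static inline bool counter_sub(struct replace_state *s, long amount,
			       bool full_barrier)
{
	atomic_fetch_sub_explicit(&s->bio_counter, amount,
				  memory_order_relaxed);
	if (full_barrier)
		atomic_thread_fence(memory_order_seq_cst);
	return atomic_load_explicit(&s->has_waiter, memory_order_relaxed);
}

/*
 * Waiter side, modelled on the wait_event() in
 * btrfs_rm_dev_replace_blocked(): publish the waiter, issue a full
 * fence, then re-check the condition.
 * Returns true if the waiter would go to sleep.
 */
static inline bool waiter_prepare(struct replace_state *s)
{
	atomic_store_explicit(&s->has_waiter, true, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_load_explicit(&s->bio_counter,
				    memory_order_relaxed) != 0;
}
```

With full_barrier set on the waker, at least one side is guaranteed to observe the other's store: either the waiter sees the counter reach zero and does not sleep, or the waker sees the waiter and issues the wakeup. Without it, both loads may return stale values on a weakly ordered machine, which would match the hang described here.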

(cherry picked from commit 004d176cd42177999c24c25aaa09a7aa8b5ace02)

fs/btrfs/dev-replace.c

index d029be40ea6f0ac3e38748c6b4438c400137f90b..2d803e0f3f7cdbccf2ec5c49569ff97b6f5198a2 100644
@@ -1321,7 +1321,8 @@ void btrfs_bio_counter_inc_noblocked(struct btrfs_fs_info *fs_info)
 void btrfs_bio_counter_sub(struct btrfs_fs_info *fs_info, s64 amount)
 {
        percpu_counter_sub(&fs_info->dev_replace.bio_counter, amount);
-       cond_wake_up_nomb(&fs_info->dev_replace.replace_wait);
+       /* paired with the wait_event barrier in replace_blocked */
+       cond_wake_up(&fs_info->dev_replace.replace_wait);
 }
 
 void btrfs_bio_counter_inc_blocked(struct btrfs_fs_info *fs_info)