git.hungrycats.org Git - linux/commitdiff
zygo: btrfs: 'btrfs replace' hangs at end of replacing a device (v5.10.82)
author Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Tue, 30 Nov 2021 16:37:05 +0000 (11:37 -0500)
committer Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Sun, 5 Dec 2021 07:57:28 +0000 (02:57 -0500)
From: Nikolay Borisov <nborisov@suse.com>
Date: Tue, 30 Nov 2021 15:55:12 +0200

I have a working hypothesis about what might be going wrong, but without
a crash dump to investigate I can't really confirm it. Basically, I think
btrfs_rm_dev_replace_blocked is not seeing the decrement (the store to
the running bio count) because btrfs_bio_counter_sub uses
cond_wake_up_nomb. If I'm right, the following should fix it:

@@ -1122,7 +1123,8 @@ void btrfs_bio_counter_inc_noblocked(struct btrfs_fs_info *fs_info)
 void btrfs_bio_counter_sub(struct btrfs_fs_info *fs_info, s64 amount)
 {
        percpu_counter_sub(&fs_info->dev_replace.bio_counter, amount);
-       cond_wake_up_nomb(&fs_info->dev_replace.replace_wait);
+       /* paired with the wait_event barrier in replace_blocked */
+       cond_wake_up(&fs_info->dev_replace.replace_wait);
 }

Can you apply it and see whether the hang still reproduces? I don't know
the incidence rate of this bug, so you'll have to decide at what point to
consider it fixed. In any case, this patch can't have any negative
functional impact; it just makes the ordering slightly stronger to ensure
the write happens before possibly waking up someone on the queue.

(cherry picked from commit 004d176cd42177999c24c25aaa09a7aa8b5ace02)

fs/btrfs/dev-replace.c

index 02a68b04e43f9e0ec0c2469e1706a2e79d7cd5e6..755286af38031e3ba572a35e8ec27139a7a5136a 100644
@@ -1137,7 +1137,8 @@ void btrfs_bio_counter_inc_noblocked(struct btrfs_fs_info *fs_info)
 void btrfs_bio_counter_sub(struct btrfs_fs_info *fs_info, s64 amount)
 {
        percpu_counter_sub(&fs_info->dev_replace.bio_counter, amount);
-       cond_wake_up_nomb(&fs_info->dev_replace.replace_wait);
+       /* paired with the wait_event barrier in replace_blocked */
+       cond_wake_up(&fs_info->dev_replace.replace_wait);
 }
 
 void btrfs_bio_counter_inc_blocked(struct btrfs_fs_info *fs_info)