From 0df5c0b10b2c921f74745a3bd294cb41239c2f95 Mon Sep 17 00:00:00 2001
From: Zygo Blaxell
Date: Thu, 7 Sep 2023 11:59:34 -0400
Subject: [PATCH] zygo: btrfs: call find_free_dev_extent with the right
 num_bytes

The caller of find_free_dev_extent sets num_bytes to the product of
stripe_len and the number of devices.  This is a unit conformability
error: num_bytes is measured in physical bytes in the device address
space, while the product of stripe_len and a value in any other unit
is not.

The result is that the dev_extent allocator searches for chunk-sized
dev_extents (up to 10 GiB) to satisfy allocations, but then only
allocates the first 1 GiB of the space it finds on the device.

That results in some unfortunate behavior, e.g. a device with large
contiguous holes will dominate allocation over devices with smaller
contiguous holes, even when the smaller holes are of sufficient size
to form a chunk.

Consider a filesystem using raid1 profile with these free spaces in
dev_extent maps:

	Device 1: 1000x 1 GiB holes
	Device 2: 1x 1000 GiB hole
	Device 3: 10x 1.01 GiB holes

The first 10 block groups will be allocated in pairs of dev_extents
from devices 2 and 3, because the allocator selects the devices with
the largest holes even if those holes are larger than 1 GiB:

	Device 1: 1000x 1 GiB holes
	Device 2: 1x 990 GiB hole
	Device 3: 10x 0.01 GiB holes

As the filesystem fills up, this results in a 9.9 GiB shortfall.  990
chunks are created, pairing up all the 1 GiB holes on device 1 with
parts of the 990 GiB hole on device 2:

	Device 1: 10x 1 GiB holes
	Device 2: 0x 0 GiB hole (full)
	Device 3: 10x 0.01 GiB holes

Then the allocator fills up the largest holes it can still find:

	Device 1: 9x 1 GiB holes + 1x 0.9 GiB hole
	Device 2: 0x 0 GiB hole (full)
	Device 3: 0x 0.00 GiB holes (full)

Now the filesystem is out of space 9.9 GiB earlier than it should be.
Ideally, find_free_dev_extent would consider all holes of 1 GiB or
larger equal, so that allocation first drains devices 1 and 2 until
their free space matches device 3:

	Device 1: 10x 1 GiB holes
	Device 2: 1x 10 GiB hole
	Device 3: 10x 1.01 GiB holes

and then fills all three devices completely:

	Device 1: 0x 1 GiB holes (full)
	Device 2: 1x 0 GiB hole (full)
	Device 3: 0x 1.01 GiB holes (full)

It looks like this bug was originally introduced in 73c5de005153
("btrfs: quasi-round-robin for chunk allocation"), when the parameter
was changed from:

	max available space on the first device with space

to the product of:

	max available space on every device with space
	* number of devices with that amount of space

The problematic behavior didn't emerge until later, after changes
adding zoned support and the space_info max chunk size sysfs
parameter.  These changes affect the calculations of the
alloc_chunk_ctl members, which find_free_dev_extent then recombines
in surprising ways.

To fix this, pass only the max stripe length to find_free_dev_extent,
without multiplying it by any other value.

Fixes: 73c5de005153 ("btrfs: quasi-round-robin for chunk allocation")
Signed-off-by: Zygo Blaxell
(cherry picked from commit 9c014f0d339aa9445cfb74667ba2ff40bd7a24ba)
---
 fs/btrfs/volumes.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 8597dea08cda..66ea7a5b4a2d 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -5193,7 +5193,7 @@ static int gather_device_info(struct btrfs_fs_devices *fs_devices,
 	struct btrfs_fs_info *info = fs_devices->fs_info;
 	struct btrfs_device *device;
 	u64 total_avail;
-	u64 dev_extent_want = ctl->max_stripe_size * ctl->dev_stripes;
+	u64 dev_extent_want = ctl->max_stripe_size;
 	int ret;
 	int ndevs = 0;
 	u64 max_avail;
-- 
2.39.5
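The device-selection behavior described in the commit message can be sketched
with a toy model (illustrative Python, not kernel code). `allocate_raid1`,
`mkdevs`, and the capped-hole ranking are simplifying assumptions: devices are
ranked by their largest contiguous hole capped at the wanted size, ties broken
by total free space, and each raid1 chunk takes one fixed 1 GiB stripe from
each of the two top-ranked devices.

```python
# Toy model (not kernel code) of raid1 dev_extent selection, using the
# three-device example from the commit message.

def allocate_raid1(devices, want, stripe=1.0):
    """devices: list of per-device hole-size lists (GiB). Returns chunk count."""
    chunks = 0
    while True:
        # Devices that can still hold at least one stripe.
        usable = [d for d in devices if any(h >= stripe for h in d)]
        if len(usable) < 2:
            return chunks
        # Rank by largest hole capped at `want`, then by total free space.
        # With a chunk-sized `want`, a huge hole outranks a 1.01 GiB hole;
        # with a stripe-sized `want`, all stripe-capable holes rank equal.
        usable.sort(key=lambda d: (min(max(d), want), sum(d)), reverse=True)
        for dev in usable[:2]:
            i = dev.index(max(dev))
            dev[i] -= stripe           # carve one stripe out of the hole
        chunks += 1

def mkdevs():
    # Device 1: 1000x 1 GiB, Device 2: 1x 1000 GiB, Device 3: 10x 1.01 GiB
    return [[1.0] * 1000, [1000.0], [1.01] * 10]

buggy = allocate_raid1(mkdevs(), want=10.0)  # chunk-sized want (the bug)
fixed = allocate_raid1(mkdevs(), want=1.0)   # stripe-sized want (the fix)
print(buggy, fixed)  # buggy run ends several chunks short of the fixed run
```

Because the toy uses fixed 1 GiB stripes (the real allocator can shrink the
final stripe), its shortfall differs slightly from the commit message's
9.9 GiB figure, but it reproduces the shape of the bug: the chunk-sized want
pairs devices 2 and 3 first, then runs out of space while device 1 still has
unpaired 1 GiB holes.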