[Why & How]
Per HW team request, we're lowering the minimum Z8
residency time to 2000us. This enables Z8 support for additional
modes we were previously blocking, such as 2k above 60Hz.
Cc: stable@vger.kernel.org Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com> Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Leo Chen <sancchen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The skl+ scalers only sample 12 bits of PIPESRC so we can't
do any plane scaling at all when the pipe source size is >4k.
Make sure the pipe source size is also below the scaler's src
size limits. Might not be 100% accurate, but should at least be
safe. We can refine the limits later if we discover that recent
hw is less restricted.
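As a rough illustration of the check described above, the sketch below rejects plane scaling whenever the pipe source size exceeds the scaler's source size limits. The struct, the function name and the 4096 limit used in the usage note are assumptions made for the example, not the driver's actual code or limits.

```c
#include <stdbool.h>

/* Hypothetical container for the scaler's source size limits. */
struct scaler_src_limits {
	int max_w;
	int max_h;
};

/*
 * Return true only if the pipe source size fits within the scaler's
 * source size limits, i.e. plane scaling may be attempted at all.
 */
bool pipe_src_fits_scaler(int pipe_src_w, int pipe_src_h,
			  const struct scaler_src_limits *lim)
{
	return pipe_src_w <= lim->max_w && pipe_src_h <= lim->max_h;
}
```

With an assumed limit of 4096x4096 (12 sampled bits), a 3840x2160 pipe source passes and a 5120x2880 one is rejected.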
By default the indirect state sampler data (border colors) are stored
in the same heap as the SAMPLER_STATE structure. For userspace drivers
that can be 2 different heaps (dynamic state heap & bindless sampler
state heap). This means that border colors have to be copied to 2
different places so that the same SAMPLER_STATE structure finds the
right data.
This change is forcing the indirect state sampler data to only be in
the dynamic state pool (more convenient for userspace drivers, they
only have to have one copy of the border colors). This is reproducing
the behavior of the Windows drivers.
The adreno_load_gpu() path is guarded by an error check on
adreno_load_fw(). This function is responsible for loading
Qualcomm-only-signed binaries (e.g. SQE and GMU FW for A6XX), but it
does not take the vendor-signed ZAP blob into account.
By embedding the SQE (and GMU, if necessary) firmware into the
initrd/kernel, we can trigger an unfortunate path that would not bail
out early and proceed with gpu->hw_init(). That will fail, as the ZAP
loader path will not find the firmware and return back to
adreno_load_gpu().
This error path involves pm_runtime_put_sync() which then calls idle()
instead of suspend(). This is suboptimal, as it means that we're not
going through the clean shutdown sequence. With at least A619_holi, this
makes the GPU not wake up until it goes through at least one more
start-fail-stop cycle. The pm_runtime_put_sync that appears in the error
path actually does not guarantee that because of the earlier enabling of
runtime autosuspend.
Fix that by using pm_runtime_put_sync_suspend to force a clean shutdown.
Test cases:
1. All firmware baked into kernel
2. error loading ZAP fw in initrd -> load from rootfs at DE start
Both succeed on A619_holi (SM6375) and A630 (SDM845).
Fixes: 0d997f95b70f ("drm/msm/adreno: fix runtime PM imbalance at gpu load") Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org> Reviewed-by: Johan Hovold <johan+linaro@kernel.org>
Patchwork: https://patchwork.freedesktop.org/patch/530001/ Link: https://lore.kernel.org/r/20230330231517.2747024-1-konrad.dybcio@linaro.org Signed-off-by: Rob Clark <robdclark@chromium.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To fully utilize the BT polling/refresh rate, a few input events
are sent together to reduce event delay. This causes an issue with the
timestamps generated by input_sync, since all the events in the same
packet would have almost the same timestamp. This patch inserts
a time interval between the events by averaging the total time used
for sending the packet.
This decision was mainly based on observing the actual time interval
between each BT polling. The interval doesn't seem to be constant,
due to the network and system environment. So, using solutions other
than averaging doesn't end up with valid timestamps.
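One way to picture the averaging is the sketch below, which spreads the events of one packet evenly over the time elapsed since the previous packet. The function name, the microsecond unit and the choice to place the last event on the packet arrival time are assumptions for illustration; the driver's real bookkeeping may differ.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Assign evenly spaced timestamps to the n_events events of one packet.
 * prev_packet_us and cur_packet_us are the arrival times of the previous
 * and the current packet (hypothetical inputs, in microseconds).
 */
void spread_timestamps(uint64_t prev_packet_us, uint64_t cur_packet_us,
		       uint64_t *event_us, size_t n_events)
{
	uint64_t step;
	size_t i;

	if (n_events == 0 || cur_packet_us < prev_packet_us)
		return;

	/* Average interval per event within this packet. */
	step = (cur_packet_us - prev_packet_us) / n_events;

	/* The last event lands on the packet arrival time. */
	for (i = 0; i < n_events; i++)
		event_us[i] = cur_packet_us - (n_events - 1 - i) * step;
}
```

For a packet of three events arriving 300us after the previous packet, the events end up 100us apart instead of sharing one timestamp.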
All microcode runs a basic validation after it's been loaded. Each
IP block as part of init will run both.
Introduce a wrapper for request_firmware and amdgpu_ucode_validate.
This wrapper will also remap any error codes from request_firmware
to -ENODEV. This is so that early_init will fail if firmware couldn't
be loaded instead of the IP block being disabled.
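The shape of such a wrapper can be sketched in plain C as follows. This is a userspace model under stated assumptions: the callback types and stub functions are hypothetical stand-ins for request_firmware and amdgpu_ucode_validate, and only the error remapping mirrors what the text describes.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a loaded firmware image. */
struct fw_blob {
	const unsigned char *data;
	size_t size;
};

typedef int (*fw_request_fn)(struct fw_blob *fw, const char *name);
typedef int (*fw_validate_fn)(const struct fw_blob *fw);

/* Load, then validate; any load failure is reported as -ENODEV. */
int fw_request_and_validate(struct fw_blob *fw, const char *name,
			    fw_request_fn request, fw_validate_fn validate)
{
	int err = request(fw, name);

	if (err)
		return -ENODEV;	/* remap every loader error code */

	return validate(fw);
}

/* Stubs used only to exercise the wrapper. */
int stub_request_missing(struct fw_blob *fw, const char *name)
{
	(void)fw;
	(void)name;
	return -ENOENT;
}

int stub_request_ok(struct fw_blob *fw, const char *name)
{
	static const unsigned char image[] = { 0x01, 0x02 };

	(void)name;
	fw->data = image;
	fw->size = sizeof(image);
	return 0;
}

int stub_validate(const struct fw_blob *fw)
{
	return fw->size > 0 ? 0 : -EINVAL;
}
```

Callers then see a single error code for "firmware could not be loaded", which is what lets early init fail outright.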
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
sdma_v4_0_ip is shared by a few ASICs, but in sdma_v4_0_hw_fini the
driver unconditionally disables ecc_irq, which is only enabled on
those ASICs that enable SDMA ECC. This introduces a warning in the
suspend cycle on chips that have SDMA IP v4.0 but no SDMA ECC.
This patch corrects that.
amdgpu_dpm_is_overdrive_supported is a common API across all
asics, so we should cast pp_handle into correct structure
under different power frameworks.
v2: using return directly to simplify code
v3: SI asic does not carry od_enabled member in pp_handle, and update Fixes tag
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2541 Fixes: eb4900aa4c49 ("drm/amdgpu: Fix kernel NULL pointer dereference in dpm functions") Suggested-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[Description]
- Due to bandwidth / arbitration issues at 200Mhz DCFCLK,
we want to enforce minimum 60us of prefetch to avoid
intermittent underflow issues
- Since 60us prefetch is already enforced for UCLK DPM0,
and many DCFCLK's > 200Mhz are mapped to UCLK DPM1, in
theory there should not be any UCLK DPM regressions by
enforcing greater prefetch
Reviewed-by: Nevenko Stupar <Nevenko.Stupar@amd.com> Reviewed-by: Jun Lei <Jun.Lei@amd.com> Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Acked-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alvin Lee <Alvin.Lee2@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
v1: vmbo->shadow is used to back up the VRAM BO when VRAM is lost, so
shadow should be set to vmbo->shadow in order to recover vmbo->bo.
v2: Change "if (vmbo->shadow) shadow = vmbo->shadow" to
"if (!vmbo->shadow) continue;".
Fixes: e18aaea733da ("drm/amdgpu: move shadow_list to amdgpu_bo_vm") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Lin.Cao <lincao12@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
gfx9 cp_ecc_error_irq is only enabled when legacy gfx ras is asserted.
So in gfx_v9_0_hw_fini, the interrupt disablement for cp_ecc_error_irq
should be executed under the same condition; otherwise, an
amdgpu_irq_put call trace will occur.
The gmc.ecc_irq is enabled by firmware per IFWI setting,
and the host driver is not privileged to enable/disable
the interrupt. So, it is meaningless to use the amdgpu_irq_put
function in gmc_v11_0_hw_fini, which also leads to the call
trace.
As mentioned in commit 08c677cb0b43 ("drm/amdgpu: fix
amdgpu_irq_put call trace in gmc_v10_0_hw_fini") and commit 13af556104fa
("drm/amdgpu: fix amdgpu_irq_put call trace in gmc_v11_0_hw_fini"), it
is meaningless to call amdgpu_irq_put() for gmc.ecc_irq. So, remove it
from gmc_v9_0_hw_fini().
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2522 Fixes: 3029c855d79f ("drm/amdgpu: Fix desktop freezed after gpu-reset") Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The gmc.ecc_irq is enabled by firmware per IFWI setting,
and the host driver is not privileged to enable/disable
the interrupt. So, it is meaningless to use the amdgpu_irq_put
function in gmc_v10_0_hw_fini, which also leads to the call
trace.
When command submission fails due to userptr invalidation in
amdgpu_cs_submit, legacy code cleans up the scheduler job. However,
that is no longer needed, as an earlier commit integrated the job
cleanup into amdgpu_job_free. Keeping it causes a double free, which
leads to a NULL pointer dereference in this scenario.
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2457 Fixes: f7d66fb2ea43 ("drm/amdgpu: cleanup scheduler job initialization v2") Signed-off-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently, on a handful of ASICs, we allow the framebuffer for a given
plane to exist in either VRAM or GTT. However, if the plane's new
framebuffer is in a different memory domain than its previous
framebuffer, flipping between them can cause the screen to flicker. So,
to fix this, don't perform an immediate flip in the aforementioned case.
Cc: stable@vger.kernel.org Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2354 Reviewed-by: Roman Li <Roman.Li@amd.com> Fixes: 81d0bcf99009 ("drm/amdgpu: make display pinning more flexible (v2)") Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[Why]
The pipe_fuses value read from the register may have invalid bits set,
which may cause num_pipes to be computed incorrectly.
[How]
Add read_pipes_fuses() call and filter bits based on expected number
of pipes.
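The filtering step can be sketched as a simple mask, shown below. The function names are hypothetical, and the second helper assumes that each set fuse bit marks one pipe as fused off; neither is taken from the driver source.

```c
#include <stdint.h>

/* Keep only the fuse bits that belong to pipes the ASIC can have. */
uint32_t filter_pipe_fuses(uint32_t raw_fuses, unsigned int expected_pipes)
{
	uint32_t mask = expected_pipes >= 32 ?
			UINT32_MAX : ((UINT32_C(1) << expected_pipes) - 1);

	return raw_fuses & mask;
}

/* Assumption: a set fuse bit means the matching pipe is disabled. */
unsigned int usable_pipes(uint32_t raw_fuses, unsigned int expected_pipes)
{
	uint32_t fuses = filter_pipe_fuses(raw_fuses, expected_pipes);
	unsigned int disabled = 0;

	/* Count set bits, clearing the lowest one on each pass. */
	for (; fuses; fuses &= fuses - 1)
		disabled++;

	return expected_pipes - disabled;
}
```

With four expected pipes, a raw value of 0xF3 is reduced to 0x3, so the stray upper bits no longer lower the pipe count.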
Reviewed-by: Alvin Lee <Alvin.Lee2@amd.com> Acked-by: Alan Liu <HaoPing.Liu@amd.com> Signed-off-by: Samson Tam <Samson.Tam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.1.x Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[Why]
DPP Root clock optimization when combined with 4to1 MPC combine results
in the screen turning black.
This is because the DPPCLK is stopped during the middle of an
optimize_bandwidth sequence during commit_minimal_transition without
going through plane power down/power up.
[How]
The intent of a 0Hz DPP clock through update_clocks is to disable the
DTO. This differs from the behavior of stopping the DPPCLK entirely
(utilizing a 0Hz clock on some ASIC) so it's better to move this logic
to reside next to plane power up/power down where we gate the HUBP/DPP
DOMAIN.
The new sequence should be:
Power down: PG enabled -> RCO on
Power up: RCO off -> PG disabled
Rename power_on_plane to power_on_plane_resources to reflect the
actual operation that's occurring.
Cc: stable@vger.kernel.org Cc: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Jun Lei <Jun.Lei@amd.com> Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[Why]
While scanning the top_pipe connections we can run into a case where
the bottom pipe is still connected to a top_pipe but with a NULL
plane_state.
[How]
Treat a NULL plane_state the same as the plane being invisible for
pipe cursor disable logic.
Cc: stable@vger.kernel.org Cc: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Charlene Liu <Charlene.Liu@amd.com> Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This is the logical place to put the backlight device, and it also
fixes a kernel crash if the MIPI host is removed. Previously the
backlight device would be unregistered twice when this happened - once
as a child of the MIPI host through `mipi_dsi_host_unregister`, and
once when the panel device is destroyed.
Fixes: 12a6cbd4f3f1 ("drm/panel: otm8009a: Use new backlight API") Signed-off-by: James Cowgill <james.cowgill@blaize.com> Cc: stable@vger.kernel.org Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org> Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://patchwork.freedesktop.org/patch/msgid/20230412173450.199592-1-james.cowgill@blaize.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
With suspend/resume supported for loongson-eiointc, the syscore_ops
is registered twice on dual-bridge machines, where there are two
eiointc IRQ domains. Repeated registration of the same syscore_ops
breaks syscore_ops_list. Also, cpuhp_setup_state_nocalls only
needs to be called once. So the patch corrects both.
In eiointc_acpi_init(), an *eiointc* node is passed into
acpi_get_vec_parent() instead of the required *NUMA* node (on some chips
like 3C5000L, a *NUMA* node means an *eiointc* node, but on some chips
like 3C5000, a *NUMA* node contains 4 *eiointc* nodes), and node in
struct acpi_vector_group is essentially a *NUMA* node, which leads
to no parent being matched for the passed *eiointc* node. So the patch
adjusts the code to use the *NUMA* node for the node parameter of
acpi_set_vec_parent/acpi_get_vec_parent.
In pch_pic_parse_madt(), acpi_get_vec_parent() returns a NULL parent
pointer for the second pch-pic domain, which belongs to the second
bridge, when eiointc_acpi_init() is called the first time: the parent
has not been initialized yet and will only be initialized during the
second call to eiointc_acpi_init(). So it is reasonable to return zero
so that acpi_table_parse_madt() does not fail; otherwise
acpi_cascade_irqdomain_init() returns early and initialization of the
following pch_msi domain is skipped.
Although it does not matter when pch_msi_parse_madt() returns
-EINVAL if no valid parent is found, it is also reasonable to
return zero for that.
With suspend/resume supported for loongson-pch-pic, the syscore_ops
is registered twice on dual-bridge machines, where there are two
pch-pic IRQ domains. Repeated registration of the same syscore_ops
breaks syscore_ops_list, so the patch corrects it.
For the dual-bridge scenario, pch_pic_acpi_init() is called
via the following path:
cpuintc_acpi_init
  acpi_cascade_irqdomain_init (in cpuintc driver)
    acpi_table_parse_madt
      eiointc_parse_madt
        eiointc_acpi_init /* called twice, once for each of the
                             two eiointc entries in the MADT in
                             the dual-bridge scenario */
          acpi_cascade_irqdomain_init (in eiointc driver)
            acpi_table_parse_madt
              pch_pic_parse_madt
                pch_pic_acpi_init /* called once or twice per
                                     eiointc_acpi_init() call,
                                     depending on how many of the
                                     two pch-pic MADT entries have
                                     a valid parent IRQ domain
                                     handle */
During the first call to eiointc_acpi_init(),
pch_pic_acpi_init() is called just once, since only
one valid parent IRQ domain handle is found for the current
eiointc IRQ domain.
During the second call to eiointc_acpi_init(),
pch_pic_acpi_init() is called twice, since two valid
parent IRQ domain handles are found. So in pch_pic_acpi_init(),
we need a reasonable way to avoid creating the same pch_pic IRQ
domain a second time.
The patch matches gsi base information in created pch_pic IRQ
domains to check if the target domain has been created to avoid the
bug mentioned above.
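The duplicate check can be pictured with the small model below, which records the GSI base of every domain it creates and refuses to create a second one with the same base. The table size, the struct and the return values are assumptions chosen for the example; the real driver tracks its own per-domain private data.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PIC_DOMAINS 2	/* hypothetical: one per bridge */

struct pic_domain {
	bool created;
	uint32_t gsi_base;
};

static struct pic_domain pic_domains[MAX_PIC_DOMAINS];

/*
 * Returns 1 if a new domain was recorded, 0 if a domain with this
 * GSI base already exists, and -1 if the table is full.
 */
int pic_domain_create_once(uint32_t gsi_base)
{
	size_t free_slot = MAX_PIC_DOMAINS;
	size_t i;

	for (i = 0; i < MAX_PIC_DOMAINS; i++) {
		if (pic_domains[i].created &&
		    pic_domains[i].gsi_base == gsi_base)
			return 0;	/* already created: skip */
		if (!pic_domains[i].created && free_slot == MAX_PIC_DOMAINS)
			free_slot = i;
	}

	if (free_slot == MAX_PIC_DOMAINS)
		return -1;

	pic_domains[free_slot].created = true;
	pic_domains[free_slot].gsi_base = gsi_base;
	return 1;
}
```

Replaying the call pattern described above (one call, then two calls, the first of which repeats the earlier GSI base) creates exactly two domains.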
Let's reduce the complexity of mixed use of rb_tree in victim_entry from
extent_cache and discard_cmd.
This should fix arm32 memory alignment issue caused by shared rb_entry.
[struct victim_entry]              [struct rb_entry]
[0] struct rb_node rb_node;        [0] struct rb_node rb_node;
                                       union {
                                         struct {
                                           unsigned int ofs;
                                           unsigned int len;
                                         };
[16] unsigned long long mtime;     [12] unsigned long long key;
                                       } __packed;
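The offset mismatch can be reproduced outside the kernel with the sketch below. It models struct rb_node as three 32-bit words (12 bytes, as on a 32-bit target) and relies on the GCC/Clang packed attribute; the struct and function names are hypothetical, and the exact offset of the unpacked member depends on the target's alignment of unsigned long long.

```c
#include <stddef.h>
#include <stdint.h>

/* 12-byte stand-in for struct rb_node on a 32-bit target. */
struct rb_node32 {
	uint32_t parent_color;
	uint32_t right;
	uint32_t left;
};

/* The 64-bit member is naturally aligned, so padding may follow rb_node. */
struct victim_entry_model {
	struct rb_node32 rb_node;
	unsigned long long mtime;
};

/* The packed union removes the padding, so key starts right after rb_node. */
struct rb_entry_model {
	struct rb_node32 rb_node;
	union {
		struct {
			unsigned int ofs;
			unsigned int len;
		};
		unsigned long long key;
	} __attribute__((packed));
};

size_t victim_mtime_offset(void)
{
	return offsetof(struct victim_entry_model, mtime);
}

size_t rb_entry_key_offset(void)
{
	return offsetof(struct rb_entry_model, key);
}
```

Where unsigned long long is 8-byte aligned, the first offset is 16 and the second is 12, so code that treats one struct as the other reads the 64-bit value from the wrong place.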
Cc: <stable@vger.kernel.org> Fixes: 093749e296e2 ("f2fs: support age threshold based garbage collection") Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The intel_dsi_msleep() helper skips sleeping if the MIPI-sequences have
a version of 3 or newer and the panel is in vid-mode.
This is based on the big comment around line 730 which starts with
"Panel enable/disable sequences from the VBT spec.", where
the "v3 video mode seq" column does not have any wait t# entries.
Checking the Windows driver shows that it does always honor
the VBT delays independent of the version of the VBT sequences.
Commit 6fdb335f1c9c ("drm/i915/dsi: Use unconditional msleep for
the panel_on_delay when there is no reset-deassert MIPI-sequence")
switched to a direct msleep() instead of intel_dsi_msleep()
when there is no MIPI_SEQ_DEASSERT_RESET sequence, to fix
the panel on an Acer Aspire Switch 10 E SW3-016 not turning on.
And now testing on a Nextbook Ares 8A shows that panel_on_delay
must always be honored otherwise the panel will not turn on.
Instead of using regular msleep() only for panel_on_delay,
do as Windows does and use regular msleep() everywhere
intel_dsi_msleep() is used, and drop the intel_dsi_msleep()
helper.
Changes in v2:
- Replace all intel_dsi_msleep() calls instead of just
the intel_dsi_msleep(panel_on_delay) call
Cc: stable@vger.kernel.org Fixes: 6fdb335f1c9c ("drm/i915/dsi: Use unconditional msleep for the panel_on_delay when there is no reset-deassert MIPI-sequence") Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230425194441.68086-1-hdegoede@redhat.com
(cherry picked from commit fa83c12132f71302f7d4b02758dc0d46048d3f5f) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Make sure to destroy the workqueue also in case of early errors during
bind (e.g. a subcomponent failing to bind).
Since commit c3b790ea07a1 ("drm: Manage drm_mode_config_init with
drmm_") the mode config will be freed when the drm device is released
also when using the legacy interface, but add an explicit cleanup for
consistency and to facilitate backporting.
Fixes: 060530f1ea67 ("drm/msm: use componentised device support") Cc: stable@vger.kernel.org # 3.15 Cc: Rob Clark <robdclark@gmail.com> Signed-off-by: Johan Hovold <johan+linaro@kernel.org> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Patchwork: https://patchwork.freedesktop.org/patch/525093/ Link: https://lore.kernel.org/r/20230306100722.28485-9-johan+linaro@kernel.org Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In case of early initialisation errors and on platforms that do not use
the DPU controller, the deinitialisation code can be called with the kms
pointer set to NULL.
Fixes: f026e431cf86 ("drm/msm: Convert to Linux IRQ interfaces") Cc: stable@vger.kernel.org # 5.14 Cc: Thomas Zimmermann <tzimmermann@suse.de> Signed-off-by: Johan Hovold <johan+linaro@kernel.org> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Patchwork: https://patchwork.freedesktop.org/patch/525104/ Link: https://lore.kernel.org/r/20230306100722.28485-5-johan+linaro@kernel.org Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In case of early initialisation errors and on platforms that do not use
the DPU controller, the deinitialisation code can be called with the kms
pointer set to NULL.
Fixes: 98659487b845 ("drm/msm: add support to take dpu snapshot") Cc: stable@vger.kernel.org # 5.14 Cc: Abhinav Kumar <quic_abhinavk@quicinc.com> Signed-off-by: Johan Hovold <johan+linaro@kernel.org> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Patchwork: https://patchwork.freedesktop.org/patch/525099/ Link: https://lore.kernel.org/r/20230306100722.28485-4-johan+linaro@kernel.org Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The LT8912 DSI port supports only Non-Burst mode video operation with
Sync Events and a continuous clock on the clock lane. Correct the DSI
mode flags accordingly by removing the MIPI_DSI_MODE_VIDEO_BURST flag.
Cc: <stable@vger.kernel.org> Fixes: 30e2ae943c26 ("drm/bridge: Introduce LT8912B DSI to HDMI bridge") Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com> Reviewed-by: Robert Foss <rfoss@kernel.org> Signed-off-by: Robert Foss <rfoss@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20230330093131.424828-1-francesco@dolcini.it Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A recent commit moved enabling of runtime PM to GPU load time (first
open()) but failed to update the error paths so that runtime PM is
disabled if initialisation of the GPU fails. This would trigger a
warning about the unbalanced disable count on the next open() attempt.
Note that pm_runtime_put_noidle() is sufficient to balance the usage
count when pm_runtime_put_sync() fails (and is chosen over
pm_runtime_resume_and_get() for consistency reasons).
Fixes: 4b18299b3365 ("drm/msm/adreno: Defer enabling runpm until hw_init()") Cc: stable@vger.kernel.org # 6.0 Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Patchwork: https://patchwork.freedesktop.org/patch/524971/ Link: https://lore.kernel.org/r/20230303164807.13124-3-johan+linaro@kernel.org Signed-off-by: Rob Clark <robdclark@chromium.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
While I'm not aware of any problems that have occurred running these
at 100 MHz, the official word from ASRock is that 50 MHz is the
correct speed to use, so let's be safe and use that instead.
The relatively new docs I added, which hinted that the base directories
needed to be created beforehand, are wrong; remove that incorrect comment.
Eric has hinted at this twice already [0] [1], but I had not verified it
until now. Now that I have verified it, update the docs to relax the
context described.
Cc: stable@vger.kernel.org # v5.17 Cc: Christian Brauner <brauner@kernel.org> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Suggested-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Function of_phandle_iterator_next() calls of_node_put() on the last
device_node it iterated over, but when the loop exits prematurely it has
to be called explicitly.
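The reference-count rule can be modelled in a few lines of plain C, as sketched below. The node and iterator types are simplified, hypothetical stand-ins for device_node and of_phandle_iterator; only the ownership pattern (next() drops the previous reference, an early exit must drop the current one by hand) follows the text.

```c
#include <stddef.h>

struct node {
	int refcount;
};

struct node_iter {
	struct node **list;
	size_t count;
	size_t pos;
	struct node *node;	/* current node, holds one reference */
};

void node_get(struct node *n)
{
	if (n)
		n->refcount++;
}

void node_put(struct node *n)
{
	if (n)
		n->refcount--;
}

/* Drop the previous node, take the next. Returns 0 while nodes remain. */
int node_iter_next(struct node_iter *it)
{
	node_put(it->node);
	it->node = NULL;
	if (it->pos >= it->count)
		return -1;
	it->node = it->list[it->pos++];
	node_get(it->node);
	return 0;
}

/* Return the index of target, or -1, leaving all refcounts balanced. */
int find_node(struct node **list, size_t count, const struct node *target)
{
	struct node_iter it = { list, count, 0, NULL };
	int index = 0;

	while (node_iter_next(&it) == 0) {
		if (it.node == target) {
			/* Early exit: the iterator will not put this node. */
			node_put(it.node);
			return index;
		}
		index++;
	}
	return -1;
}
```

Dropping the explicit node_put() in the early-exit branch would leave the matched node with a refcount of 1, which is the leak the patch addresses.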
o that paths don't need to exist for the new API callers
o clarify that we *require* callers to keep the memory of
the table around during the lifetime of the sysctls
o annotate routines we are trying to deprecate and later remove
Cc: stable@vger.kernel.org # v5.17 Cc: Christian Brauner <brauner@kernel.org> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Update the docs for __register_sysctl_table() to make it clear no child
entries can be passed. When the child is true these are non-leaf entries
on the ctl table and sysctl treats these as directories. The point of
__register_sysctl_table() is to deal only with directories not part of
the ctl table where they may reside, to be simple and avoid recursion.
While at it, hint towards using long on extra1 and extra2 later.
Cc: stable@vger.kernel.org # v5.17 Cc: Christian Brauner <brauner@kernel.org> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
__setup() handlers should return 1 to obsolete_checksetup() in
init/main.c to indicate that the boot option has been handled.
A return of 0 causes the boot option/value to be listed as an Unknown
kernel parameter and added to init's (limited) argument or environment
strings. Also, error return codes don't mean anything to
obsolete_checksetup() -- only non-zero (usually 1) or zero.
So return 1 from nmi_debug_setup().
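The return-value convention can be illustrated with the simplified userspace model below. The table type, the dispatcher and both handlers are hypothetical stand-ins, not the code in init/main.c; they only mirror the rule that a non-zero return means "handled" and zero means "unknown".

```c
#include <stddef.h>
#include <string.h>

typedef int (*setup_fn)(const char *value);

struct setup_entry {
	const char *prefix;
	setup_fn fn;
};

/* Returns 1 if some handler claimed the option, 0 if it stays unknown. */
int checksetup_model(const struct setup_entry *tbl, size_t n,
		     const char *line)
{
	size_t i;

	for (i = 0; i < n; i++) {
		size_t len = strlen(tbl[i].prefix);

		if (strncmp(line, tbl[i].prefix, len) == 0 &&
		    tbl[i].fn(line + len))
			return 1;
	}
	return 0;
}

/* Handler that parses its option but wrongly reports "not handled". */
int handler_returns_0(const char *value)
{
	(void)value;
	return 0;
}

/* Handler following the convention described above. */
int handler_returns_1(const char *value)
{
	(void)value;
	return 1;
}
```

In this model a handler returning 0 leaves the option unclaimed even though it ran, which matches the "Unknown kernel parameter" symptom.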
Fixes: 1e1030dccb10 ("sh: nmi_debug support.") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: Igor Zhbanov <izh1979@gmail.com>
Link: lore.kernel.org/r/64644a2f-4a20-bab3-1e15-3b2cdd0defe3@omprussia.ru Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Rich Felker <dalias@libc.org> Cc: linux-sh@vger.kernel.org Cc: stable@vger.kernel.org Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Link: https://lore.kernel.org/r/20230306040037.20350-3-rdunlap@infradead.org Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When CONFIG_OF_EARLY_FLATTREE and CONFIG_SH_DEVICE_TREE are not set,
SH3 build fails with a call to early_init_dt_scan(), so in
arch/sh/kernel/setup.c and arch/sh/kernel/head_32.S, use
CONFIG_OF_EARLY_FLATTREE instead of CONFIG_OF_FLATTREE.
Fixes this build error:
../arch/sh/kernel/setup.c: In function 'sh_fdt_init':
../arch/sh/kernel/setup.c:262:26: error: implicit declaration of function 'early_init_dt_scan' [-Werror=implicit-function-declaration]
262 | if (!dt_virt || !early_init_dt_scan(dt_virt)) {
Fixes: 03767daa1387 ("sh: fix build regression with CONFIG_OF && !CONFIG_OF_FLATTREE") Fixes: eb6b6930a70f ("sh: fix memory corruption of unflattened device tree") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Suggested-by: Rob Herring <robh+dt@kernel.org> Cc: Frank Rowand <frowand.list@gmail.com> Cc: devicetree@vger.kernel.org Cc: Rich Felker <dalias@libc.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Geert Uytterhoeven <geert+renesas@glider.be> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: linux-sh@vger.kernel.org Cc: stable@vger.kernel.org Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Link: https://lore.kernel.org/r/20230306040037.20350-4-rdunlap@infradead.org Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix a build error in mcount.S when CONFIG_PRINTK is not enabled.
Fixes this build error:
sh2-linux-ld: arch/sh/lib/mcount.o: in function `stack_panic':
(.text+0xec): undefined reference to `dump_stack'
Fixes: e460ab27b6c3 ("sh: Fix up stack overflow check with ftrace disabled.") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Rich Felker <dalias@libc.org> Suggested-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: stable@vger.kernel.org Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Link: https://lore.kernel.org/r/20230306040037.20350-8-rdunlap@infradead.org Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix a warning that was reported by the kernel test robot:
In file included from ../include/math-emu/soft-fp.h:27,
from ../arch/sh/math-emu/math.c:22:
../arch/sh/include/asm/sfp-machine.h:17: warning: "__BYTE_ORDER" redefined
17 | #define __BYTE_ORDER __BIG_ENDIAN
In file included from ../arch/sh/math-emu/math.c:21:
../arch/sh/math-emu/sfp-util.h:71: note: this is the location of the previous definition
71 | #define __BYTE_ORDER __LITTLE_ENDIAN
Fixes: b929926f01f2 ("sh: define __BIG_ENDIAN for math-emu") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: kernel test robot <lkp@intel.com>
Link: lore.kernel.org/r/202111121827.6v6SXtVv-lkp@intel.com Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Rich Felker <dalias@libc.org> Cc: linux-sh@vger.kernel.org Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Cc: stable@vger.kernel.org Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Link: https://lore.kernel.org/r/20230306040037.20350-5-rdunlap@infradead.org Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In investigating a failure with xfstest generic/392 it
was noticed that mounts were reusing a superblock that should
already have been freed. This turned out to be related to
deferred close files keeping a reference count until the
closetimeo expired.
Currently the only way an fs knows that unmount is beginning is
when force unmount is called, but when this, i.e. umount_begin(),
is called, all deferred close files on the share (tree
connection) should be closed immediately (unless shared by
another mount) to avoid using excess resources on the server
and to avoid reusing a superblock which should already be freed.
In umount_begin, close all deferred close handles for that
share if this is the last mount using that share on this
client (ie send the SMB3 close request over the wire for those
that have been already closed by the app but that we have
kept a handle lease open for and have not sent closes to the
server for yet).
Reported-by: David Howells <dhowells@redhat.com> Acked-by: Bharath SM <bharathsm@microsoft.com> Cc: <stable@vger.kernel.org> Fixes: 78c09634f7dc ("Cifs: Fix kernel oops caused by deferred close for files.") Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
xfstests generic/392 showed a problem where even after a
shutdown call was made on a mount, we would still attempt
to use the (now inaccessible) superblock if another mount
was attempted for the same share.
Reported-by: David Howells <dhowells@redhat.com> Reviewed-by: David Howells <dhowells@redhat.com> Cc: <stable@vger.kernel.org> Fixes: 087f757b0129 ("cifs: add shutdown support") Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When inotify_freeing_mark() races with inotify_handle_inode_event() it
can happen that inotify_handle_inode_event() sees that i_mark->wd got
already reset to -1 and reports this value to userspace which can
confuse the inotify listener. Avoid the problem by validating that wd is
sensible (and pretend the mark got removed before the event got
generated otherwise).
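A single-threaded sketch of the validation is shown below. The struct and function names are hypothetical, and the model does not reproduce the race itself; it only shows the check of reading the watch descriptor once and dropping the event when it is no longer valid.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the inotify mark. */
struct mark_model {
	int wd;		/* reset to -1 when the mark is being freed */
};

/*
 * Store the watch descriptor in *out and return true if the event
 * should be reported; return false if the mark is already going away.
 */
bool event_wd_for_report(const struct mark_model *mark, int *out)
{
	int wd = mark->wd;	/* read once, then use only the local copy */

	if (wd < 0)
		return false;	/* pretend the mark was removed first */

	*out = wd;
	return true;
}
```

Userspace therefore never sees an event carrying wd == -1.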
CC: stable@vger.kernel.org Fixes: 7e790dd5fc93 ("inotify: fix error paths in inotify_update_watch")
Message-Id: <20230424163219.9250-1-jack@suse.cz> Reported-by: syzbot+4a06d4373fd52f0b2f9c@syzkaller.appspotmail.com Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There has been a lot of confusion around which platform profiles are
supported on various platforms and it would be useful to have a debug
method to be able to override the profile mode that is selected.
I don't expect this to be used in anything other than debugging in
conjunction with Lenovo engineers - but it does give a way to get a
system working whilst we wait for either FW fixes, or a driver fix
to land upstream, if something is wonky in the mode detection logic.
Signed-off-by: Mark Pearson <mpearson-lenovo@squebb.ca> Link: https://lore.kernel.org/r/20230505132523.214338-2-mpearson-lenovo@squebb.ca Cc: stable@vger.kernel.org Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
I had incorrectly thought that PSC profiles were not usable on Intel
platforms, so I had blocked them in the driver initialisation. This broke
platform profiles on the T490.
After discussion with the FW team, it turns out PSC does work on Intel
platforms and should be allowed.
Note - it's possible this may impact other platforms where it is advertised
but special driver support that only Windows has is needed. But if it does
then they will need fixing via quirks. Please report any issues to me so I
can get them addressed - but I haven't found any problems in testing...yet
Reviewed-by: David Howells <dhowells@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Change the type of pcchunk->Length from u32 to u64 to match the argument
type of smb2_copychunk_range. This fixes the problem where performing a
server-side copy with the CIFS_IOC_COPYCHUNK_FILE ioctl resulted in an
incomplete copy of large files while returning -EINVAL.
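As an illustration of why a 32-bit length field is a problem for large files, here is a small sketch of the truncation that occurs when a 64-bit length is stored in a 32-bit variable. The helper name `sketch_len_as_u32()` is hypothetical and only models the narrowing conversion, not the CIFS code path itself.

```c
#include <stdint.h>

/*
 * Hypothetical helper: models what a 32-bit length field retains when it
 * is assigned a 64-bit length. Only the low 32 bits survive.
 */
static uint32_t sketch_len_as_u32(uint64_t len)
{
    return (uint32_t)len;
}
```

For example, a 5 GiB request (`5ULL << 30`) passed through this helper comes back as 1 GiB, so a copy driven by the narrowed value would cover only part of the file.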
Fixes: 9bf0c9cd4314 ("CIFS: Fix SMB2/SMB3 Copy offload support (refcopy) for large files") Cc: <stable@vger.kernel.org> Signed-off-by: Pawel Witek <pawel.ireneusz.witek@gmail.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When using the logical to ino ioctl v2, if the flag to ignore offsets of
file extent items (BTRFS_LOGICAL_INO_ARGS_IGNORE_OFFSET) is given, the
backref walking code ends up not returning references for all file offsets
of an inode that point to the given logical bytenr. This happens since
kernel 6.2, commit 6ce6ba534418 ("btrfs: use a single argument for extent
offset in backref walking functions") because:
1) It mistakenly skipped the search for file extent items in a leaf that
point to the target extent if that flag is given. Instead it should
only skip the filtering done by check_extent_in_eb() - that is, it
should not avoid the calls to that function (or find_extent_in_eb(),
which uses it).
2) It was also not building a list of inode extent elements (struct
extent_inode_elem) if we have multiple inode references for an extent
when the ignore offset flag is given to the logical to ino ioctl - it
would leave a single element, only the last one that was found.
These stem from the confusing old interface for backref walking functions
where we had an extent item offset argument that was a pointer to a u64
and another boolean argument that indicated if the offset should be
ignored, but the pointer could be NULL. That NULL case is used by
relocation, qgroup extent accounting and fiemap, simply to avoid building
the inode extent list for each reference, as it's not necessary for those
use cases and therefore avoids memory allocations and some computations.
Fix this by adding a boolean argument to the backref walk context
structure to indicate that the inode extent list should not be built,
make relocation set that argument to true and fix the backref walking
logic to skip the calls to check_extent_in_eb() and find_extent_in_eb()
only if this new argument is true, instead of 'ignore_extent_item_pos'
being true.
A test case for fstests will be added soon, to provide coverage not only
for these cases but also for the logical to ino ioctl in general, as
currently we do not have a test case for it.
Reported-by: Vladimir Panteleev <git@vladimir.panteleev.md> Link: https://lore.kernel.org/linux-btrfs/CAHhfkvwo=nmzrJSqZ2qMfF-rZB-ab6ahHnCD_sq9h4o8v+M7QQ@mail.gmail.com/ Fixes: 6ce6ba534418 ("btrfs: use a single argument for extent offset in backref walking functions") CC: stable@vger.kernel.org # 6.2+ Tested-by: Vladimir Panteleev <git@vladimir.panteleev.md> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When both of the superblock zones are full, we need to check which
superblock is newer. The calculation of last superblock position is wrong
as it does not consider zone_capacity and uses the length.
Fixes: 9658b72ef300 ("btrfs: zoned: locate superblock position using zone capacity") CC: stable@vger.kernel.org # 6.1+ Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
For data block groups, we zone finish a zone (or, just deactivate it) when
seeing the last IO in btrfs_finish_ordered_io(). That is only called for
IOs using ZONE_APPEND, but we use a regular WRITE command for data
relocation IOs. Detect it and call btrfs_zone_finish_endio() properly.
Fixes: be1a1d7a5d24 ("btrfs: zoned: finish fully written block group") CC: stable@vger.kernel.org # 6.1+ Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When loading a free space cache from disk, at __load_free_space_cache(),
if we fail to insert a bitmap entry, we still increment the number of
total bitmaps in the btrfs_free_space_ctl structure, which is incorrect
since we failed to add the bitmap entry. On error we then empty the
cache by calling __btrfs_remove_free_space_cache(), which will result
in getting the total bitmaps counter set to 1.
A failure to load a free space cache is not critical, so if a failure
happens we just rebuild the cache by scanning the extent tree, which
happens at block-group.c:caching_thread(). Yet the failure will result
in the total bitmaps counter of the btrfs_free_space_ctl always being
bigger by 1 than the number of bitmap entries we have. So fix this by having
the total bitmaps counter be incremented only if we successfully added
the bitmap entry.
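The pattern behind the fix can be sketched as follows. This is a simplified userspace model with hypothetical names (`struct sketch_ctl`, `sketch_insert_bitmap()`, `sketch_load_bitmap()`), not the btrfs implementation: the counter is bumped only after the insertion has succeeded, so it never runs ahead of the number of stored entries.

```c
#include <stddef.h>

#define SKETCH_MAX_BITMAPS 4

/* Hypothetical stand-in for the free space control structure. */
struct sketch_ctl {
    int entries[SKETCH_MAX_BITMAPS];
    size_t nr_entries;    /* bitmap entries actually stored */
    size_t total_bitmaps; /* counter that must track nr_entries */
};

/* Insertion may fail; in this model it fails when the table is full. */
static int sketch_insert_bitmap(struct sketch_ctl *ctl, int entry)
{
    if (ctl->nr_entries >= SKETCH_MAX_BITMAPS)
        return -1;
    ctl->entries[ctl->nr_entries++] = entry;
    return 0;
}

/* Increment the counter only after the insertion has succeeded. */
static int sketch_load_bitmap(struct sketch_ctl *ctl, int entry)
{
    int ret = sketch_insert_bitmap(ctl, entry);

    if (ret)
        return ret; /* counter left untouched on failure */
    ctl->total_bitmaps++;
    return 0;
}
```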
Fixes: a67509c30079 ("Btrfs: add a io_ctl struct and helpers for dealing with the space cache") Reviewed-by: Anand Jain <anand.jain@oracle.com> CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Change nodesize to sectorsize in the alignment check in print_extent_item.
The comment already states this and it is correct; a similar check is done
elsewhere in the function.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: ea57788eb76d ("btrfs: require only sector size alignment for parent eb bytenr") CC: stable@vger.kernel.org # 4.14+ Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Anastasia Belova <abelova@astralinux.ru> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Previously clear_cache mount option would simply disable free-space-tree
feature temporarily then re-enable it to rebuild the whole free space
tree.
But this is problematic for block-group-tree feature, as we have an
artificial dependency on free-space-tree feature.
If we follow the existing method, then after clearing the free-space-tree
feature we would flip the filesystem to read-only mode, as we detect a
super block write with block-group-tree but no free-space-tree feature.
This patch changes the behavior by properly rebuilding the free
space tree without disabling this feature, thus allowing the clear_cache
mount option to work with block group tree.
Now we can mount a filesystem with the block-group-tree feature and the
clear_cache mount option:
$ mkfs.btrfs -O block-group-tree /dev/test/scratch1 -f
$ sudo mount /dev/test/scratch1 /mnt/btrfs -o clear_cache
$ sudo dmesg -t | head -n 5
BTRFS info (device dm-1): force clearing of disk cache
BTRFS info (device dm-1): using free space tree
BTRFS info (device dm-1): auto enabling async discard
BTRFS info (device dm-1): rebuilding free space tree
BTRFS info (device dm-1): checking UUID tree
btrfs_redirty_list_add zeroes the buffer data and sets the
EXTENT_BUFFER_NO_CHECK to make sure writeback is fine with a bogus
header. But it does that after already marking the buffer dirty, which
means that writeback could already be looking at the buffer.
Switch the order of operations around so that the buffer is only marked
dirty when we're ready to write it.
Fixes: d3575156f662 ("btrfs: zoned: redirty released extent buffers") CC: stable@vger.kernel.org # 5.15+ Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Boris noticed in his simple quotas testing that he was getting a leak
with Sweet Tea's change to subvol create that stopped doing a
transaction commit. This was just a side effect of that change.
In the delayed inode code we have an optimization that will free extra
reservations if we think we can pack a dir item into an already modified
leaf. Previously this wouldn't be triggered in the subvolume create
case because we'd commit the transaction; it was still possible but
much harder to trigger. It could actually be triggered if we did a
mkdir && subvol create with qgroups enabled.
This occurs because in btrfs_insert_delayed_dir_index(), which gets
called when we're adding the dir item, we do the following:
The temporary block rsv just has ->qgroup_rsv_reserved set, with
->qgroup_rsv_size == 0. The optimization in
btrfs_insert_delayed_dir_index() sets ->qgroup_rsv_reserved = 0. Then
later on we call btrfs_subvolume_release_metadata(), where
qgroup_to_release is set to 0, and we do not convert the reserved
metadata space.
The problem here is that the block rsv code has been unconditionally
messing with ->qgroup_rsv_reserved, because the main place this is used
is delalloc, and any time we call btrfs_block_rsv_release() we do it
with qgroup_to_release set, and thus do the proper accounting.
The subvolume code is the only other code that uses the qgroup
reservation stuff, but it's intermingled with the above optimization,
and thus was getting its reservation freed out from underneath it and
thus leaking the reserved space.
The solution is to simply not mess with the qgroup reservations if we
don't have qgroup_to_release set. This works with the existing code as
anything that messes with the delalloc reservations always has
qgroup_to_release set. This fixes the leak that Boris was observing.
Reviewed-by: Qu Wenruo <wqu@suse.com> CC: stable@vger.kernel.org # 5.4+ Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We have observed a btrfs filesystem corruption on workloads using
no-holes and encoded writes via send stream v2. The symptom is that a
file appears to be truncated to the end of its last aligned extent, even
though the final unaligned extent and even the file extent and otherwise
correctly updated inode item have been written.
So if we were writing out a 1MiB+X file via 8 128K extents and one
extent of length X, i_size would be set to 1MiB, but the ninth extent,
nbytes, etc. would all otherwise appear correct.
The source of the race is a narrow (one line of code) window in which a
no-holes fs has read in an updated i_size, but has not yet set a shared
disk_i_size variable to write. Therefore, if two ordered extents run in
parallel (par for the course for receive workloads), the following
sequence can play out: (following "threads" a bit loosely, since there
are callbacks involved for endio but extra threads aren't needed to
cause the issue)
ENC-WR1 (second to last)                 ENC-WR2 (last)
-------                                  -------
btrfs_do_encoded_write
  set i_size = 1M
  submit bio B1 ending at 1M
endio B1
btrfs_inode_safe_disk_i_size_write
  local i_size = 1M
  falls off a cliff for some reason
                                         btrfs_do_encoded_write
                                           set i_size = 1M+X
                                           submit bio B2 ending at 1M+X
                                         endio B2
                                         btrfs_inode_safe_disk_i_size_write
                                           local i_size = 1M+X
                                           disk_i_size = 1M+X
  disk_i_size = 1M
                                         btrfs_delayed_update_inode
  btrfs_delayed_update_inode
And the delayed inode ends up filled with nbytes=1M+X and isize=1M, and
writes respect i_size and present a corrupted file missing its last
extents.
Fix this by holding the inode lock in the no-holes case so that a thread
can't sneak in a write to disk_i_size that gets overwritten with an out
of date i_size.
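The interleaving can be replayed deterministically in a single thread. The sketch below is a simplified model with hypothetical names (`struct sketch_inode`, `sketch_replay_race()`, `sketch_replay_locked()`); it uses no real locking. The locked case is modelled by treating the read of i_size and the store to disk_i_size as one indivisible step.

```c
#include <stdint.h>

/* Hypothetical model of the two size fields involved in the race. */
struct sketch_inode {
    uint64_t i_size;      /* in-memory file size */
    uint64_t disk_i_size; /* size that ends up in the inode item */
};

/* Locked variant: read i_size and publish it as one indivisible step. */
static void sketch_update_locked(struct sketch_inode *inode)
{
    inode->disk_i_size = inode->i_size;
}

/* Replays the unlocked interleaving; returns the final disk_i_size. */
static uint64_t sketch_replay_race(uint64_t aligned, uint64_t tail)
{
    struct sketch_inode inode = { .i_size = aligned, .disk_i_size = 0 };
    uint64_t wr1_snap = inode.i_size;  /* WR1: local i_size = 1M, then stalls */

    inode.i_size = aligned + tail;     /* WR2: set i_size = 1M+X */
    inode.disk_i_size = inode.i_size;  /* WR2: disk_i_size = 1M+X */
    inode.disk_i_size = wr1_snap;      /* WR1 resumes with its stale value */
    return inode.disk_i_size;
}

/* Same two writers, but each read+store pair cannot be interrupted. */
static uint64_t sketch_replay_locked(uint64_t aligned, uint64_t tail)
{
    struct sketch_inode inode = { .i_size = aligned, .disk_i_size = 0 };

    sketch_update_locked(&inode);      /* WR1 completes before WR2 updates */
    inode.i_size = aligned + tail;     /* WR2: set i_size = 1M+X */
    sketch_update_locked(&inode);      /* WR2 publishes the current i_size */
    return inode.disk_i_size;
}
```

With `aligned = 1 MiB` and a nonzero `tail`, the first replay ends with disk_i_size at 1 MiB (the truncated size), while the second ends at 1 MiB plus the tail.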
Fixes: 41a2ee75aab0 ("btrfs: introduce per-inode file extent tree") CC: stable@vger.kernel.org # 5.10+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Boris Burkov <boris@bur.io> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Balance as exclusive state is compatible with paused balance and device
add, which makes some things more complicated. The assertion of valid
states when starting from paused balance needs to take into account two
more states. The combinations can be hit when several threads race to
start balance and device add. This won't typically happen when
the commands are started from the command line.
Scenario 1: With exclusive_operation state == BTRFS_EXCLOP_NONE.
When multiple devices are concurrently added to the same mount point and
btrfs_exclop_finish finishes executing before the assertion in
btrfs_exclop_balance, exclusive_operation is changed to the
BTRFS_EXCLOP_NONE state, which leads to an assertion failure:
Scenario 2: With exclusive_operation state == BTRFS_EXCLOP_BALANCE_PAUSED.
When multiple devices are concurrently added to the same mount point and
btrfs_exclop_balance finishes executing before the latter thread executes
the assertion in btrfs_exclop_balance, exclusive_operation is changed to
the BTRFS_EXCLOP_BALANCE_PAUSED state, which leads to an assertion failure:
[BUG]
With the block-group-tree feature enabled, mounting the filesystem with
clear_cache would cause the following transaction abort at mount or remount:
BTRFS info (device dm-4): force clearing of disk cache
BTRFS info (device dm-4): using free space tree
BTRFS info (device dm-4): auto enabling async discard
BTRFS info (device dm-4): clearing free space tree
BTRFS info (device dm-4): clearing compat-ro feature flag for FREE_SPACE_TREE (0x1)
BTRFS info (device dm-4): clearing compat-ro feature flag for FREE_SPACE_TREE_VALID (0x2)
BTRFS error (device dm-4): block-group-tree feature requires fres-space-tree and no-holes
BTRFS error (device dm-4): super block corruption detected before writing it to disk
BTRFS: error (device dm-4) in write_all_supers:4288: errno=-117 Filesystem corrupted (unexpected superblock corruption detected)
BTRFS warning (device dm-4: state E): Skipping commit of aborted transaction.
[CAUSE]
For block-group-tree feature, we have an artificial dependency on
free-space-tree.
This means if we detect block-group-tree without v2 cache, we consider
it a corruption and cause the problem.
For the clear_cache mount option, it would temporarily disable v2 cache,
then re-enable it.
But unfortunately, while v2 cache is temporarily disabled, we refuse
to write a superblock with only the bg tree flag set, which leads to the
above transaction abort.
[FIX]
For now, just reject the clear_cache and v1 cache mount options for block
group tree. So now we get a graceful rejection rather than a transaction
abort:
BTRFS info (device dm-4): force clearing of disk cache
BTRFS error (device dm-4): cannot disable free space tree with block-group-tree feature
BTRFS error (device dm-4): open_ctree failed
CC: stable@vger.kernel.org # 6.1+ Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
find_next_bit and find_next_zero_bit take @size as the second parameter and
@offset as the third parameter. They are passed in the opposite order in
btrfs_ensure_empty_zones(). Thanks to the later loop, it never failed to
detect the empty zones. Fix them and (maybe) return the result a bit
faster.
Note: the naming is a bit confusing, size has two meanings here: the bitmap
size and our range size.
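To make the parameter order concrete, here is a userspace stand-in that follows the same order as the kernel helper (bitmap, then size in bits, then starting offset). The function `sketch_find_next_bit()` is a hypothetical, unoptimized model written for illustration, not the kernel implementation.

```c
#include <limits.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Userspace stand-in using the same parameter order as the kernel helper:
 * bitmap, then bitmap size in bits, then the bit offset to start from.
 * Returns the index of the first set bit at or after offset, or size if
 * there is none.
 */
static unsigned long sketch_find_next_bit(const unsigned long *addr,
                                          unsigned long size,
                                          unsigned long offset)
{
    for (unsigned long i = offset; i < size; i++) {
        if (addr[i / SKETCH_BITS_PER_LONG] &
            (1UL << (i % SKETCH_BITS_PER_LONG)))
            return i;
    }
    return size;
}
```

With bit 5 set and a range of [2, 8), the call `sketch_find_next_bit(map, 8, 2)` finds bit 5. Swapping the last two arguments makes the offset exceed the size, so the search loop never runs and the call returns the (wrong) size value without examining the bitmap.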
A call to btrfs_prev_leaf() may end up returning a path that points to the
same item (key) again. This happens if, during btrfs_prev_leaf(), after we
release the path, a concurrent insertion moves items off from a sibling
into the front of the previous leaf, and an item with the computed
previous key does not exist.
For example, suppose we have the two following leaves:
If we call btrfs_prev_leaf(), from btrfs_previous_item() for example, with
a path pointing to leaf B and slot 0 and the following happens:
1) At btrfs_prev_leaf() we compute the previous key to search as:
(300 96 19), which is a key that does not exist in the tree;
2) Then we call btrfs_release_path() at btrfs_prev_leaf();
3) Some other task inserts a key at leaf A, that sorts before the key at
slot 20, for example it has an objectid of 299. In order to make room
for the new key, the key at slot 22 is moved to the front of leaf B.
This happens at push_leaf_right(), called from split_leaf().
4) At btrfs_prev_leaf() we call btrfs_search_slot() for the computed
previous key: (300 96 19). Since the key does not exist,
btrfs_search_slot() returns 1 and with a path pointing to leaf B
and slot 1, the item with key (300 96 20);
5) This makes btrfs_prev_leaf() return a path that points to slot 1 of
leaf B, the same key as before it was called, since the key at slot 0
of leaf B (300 96 16) is less than the computed previous key, which is
(300 96 19);
6) As a consequence btrfs_previous_item() returns a path that points again
to the item with key (300 96 20).
For some users of btrfs_prev_leaf() or btrfs_previous_item() this may not
be a functional problem, despite it not making sense to return a new path
pointing again to the same item/key. However for a caller such as
tree-log.c:log_dir_items(), this has a bad consequence, as it can result
in not logging some dir index deletions in case the directory is being
logged without holding the inode's VFS lock (logging triggered while
logging a child inode for example) - for the example scenario above, in
case the dir index keys 17, 18 and 19 were deleted in the current
transaction.
CC: stable@vger.kernel.org # 4.14+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If the user does not specify hook number and priority, then assume this is
a chain/flowtable update. Therefore, report ENOENT, which provides a
better hint than EINVAL. Set the extended netlink error report to refer
to the chain name.
Fixes: 5b6743fb2c2a ("netfilter: nf_tables: skip flowtable hooknum and priority on device updates") Fixes: 5efe72698a97 ("netfilter: nf_tables: support for adding new devices to an existing netdev chain") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
Flowtable and netdev chains are bound to one or several netdevices;
extend netlink error reporting to specify the netdevice that
triggers the error.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Stable-dep-of: 8509f62b0b07 ("netfilter: nf_tables: hit ENOENT on unexisting chain/flowtable update with missing attributes") Signed-off-by: Sasha Levin <sashal@kernel.org>
Now that a DFS tcon manages its own list of DFS referrals and
sessions, there is no point in having a single worker to refresh
referrals of all DFS tcons. Make it faster and less prone to race
conditions when having several mounts by queueing a worker per DFS
tcon that will take care of refreshing only the DFS referrals related
to it.
Cc: stable@vger.kernel.org # v6.2+ Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
We had a couple of checks for session in cifs_tree_connect
and cifs_mark_open_files_invalid, which were unnecessary.
And that was done with ses_lock. Changed that to tc_lock too.
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
Stable-dep-of: 6be2ea33a409 ("cifs: avoid potential races when handling multiple dfs tcons") Signed-off-by: Sasha Levin <sashal@kernel.org>
If we do get multiple notifications from firmware, then
we might have allocated 'notif', but don't free it. Fix
that by checking for duplicates before allocation.
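The ordering described above can be sketched as follows. This is a generic userspace model with hypothetical names (`struct sketch_notif`, `sketch_notif_add()`), not the driver's code: the duplicate lookup runs first, and memory is allocated only when no duplicate exists, so the duplicate path has nothing to free.

```c
#include <stdlib.h>

/* Hypothetical notification record kept on a singly linked list. */
struct sketch_notif {
    int id;
    struct sketch_notif *next;
};

/* Returns 1 if a notification with this id is already queued. */
static int sketch_notif_exists(const struct sketch_notif *head, int id)
{
    for (; head; head = head->next)
        if (head->id == id)
            return 1;
    return 0;
}

/*
 * Look for a duplicate first and allocate only when none exists.
 * Returns 1 if added, 0 if it was a duplicate, -1 on allocation failure.
 */
static int sketch_notif_add(struct sketch_notif **head, int id)
{
    struct sketch_notif *notif;

    if (sketch_notif_exists(*head, id))
        return 0; /* nothing was allocated, so nothing can leak */

    notif = malloc(sizeof(*notif));
    if (!notif)
        return -1;
    notif->id = id;
    notif->next = *head;
    *head = notif;
    return 1;
}
```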
When an smb client sends concurrent smb2 close and logoff requests
over a multichannel connection, it can cause a race. The logoff request
frees the tcon and can cause UAF issues in smb2 close. When receiving a
logoff request with multichannel, ksmbd should wait until all remaining
requests complete, as well as the ones in the current connection, and then
mark the session expired.
Cc: stable@vger.kernel.org Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-20796 ZDI-CAN-20595 Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>