A forced umount (umount -f) will attempt to kill all rpc_tasks, even though
the umount operation may ultimately fail if some files remain open.
Consequently, an operation that attempts to open a file can end up sending
two rpc_tasks to the NFS server.
  umount -f
    nfs_umount_begin
      rpc_killall_tasks
        rpc_signal_task
          rpc_task1 is woken up and returns -512

  _nfs4_do_open // while loop
      ...
      nfs4_run_open_task
          /* rpc_task2 */
          rpc_run_task
          rpc_wait_for_completion_task
While processing an open request, nfsd will first attempt to find or
allocate an nfs4_openowner. If it finds an nfs4_openowner that is not
marked as NFS4_OO_CONFIRMED, this nfs4_openowner will be released. Since
two rpc_tasks can attempt to open the same file simultaneously from the
client to the server, and because two instances of nfsd can run
concurrently, this situation can lead to a lot of memory leaks.
Additionally, when we echo 0 to /proc/fs/nfsd/threads, a warning will be
triggered.
  NFS SERVER
  nfsd1                nfsd2                echo 0 > /proc/fs/nfsd/threads
The function `e_show` was called with protection from RCU. This only
ensures that `exp` will not be freed. Therefore, the reference count for
`exp` can drop to zero, which will trigger a refcount use-after-free
warning when `exp_get` is called. To resolve this issue, use
`cache_get_rcu` to ensure that `exp` remains active.
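A minimal sketch of what such a helper could look like, assuming
kref_get_unless_zero() is the primitive used to take a reference only while
the count is still non-zero (the actual helper in the patch may differ):

  static inline struct cache_head *cache_get_rcu(struct cache_head *h)
  {
          /* Only take a reference if the refcount has not dropped to zero. */
          if (kref_get_unless_zero(&h->ref))
                  return h;

          return NULL;
  }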
------------[ cut here ]------------
refcount_t: addition on 0; use-after-free.
WARNING: CPU: 3 PID: 819 at lib/refcount.c:25
refcount_warn_saturate+0xb1/0x120
CPU: 3 UID: 0 PID: 819 Comm: cat Not tainted 6.12.0-rc3+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.16.1-2.fc37 04/01/2014
RIP: 0010:refcount_warn_saturate+0xb1/0x120
...
Call Trace:
<TASK>
e_show+0x20b/0x230 [nfsd]
seq_read_iter+0x589/0x770
seq_read+0x1e5/0x270
vfs_read+0x125/0x530
ksys_read+0xc1/0x160
do_syscall_64+0x5f/0x170
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Fixes: bf18f163e89c ("NFSD: Using exp_get for export getting") Cc: stable@vger.kernel.org # 4.20+ Signed-off-by: Yang Erkun <yangerkun@huawei.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The Rockchip PCIe endpoint controller handles PCIe transfer addresses
by masking the lower bits of the programmed PCI address and using the
same number of lower bits masked from the CPU address space used for the
mapping. For a PCI mapping of <size> bytes starting from <pci_addr>,
the number of bits masked is the number of address bits changing in the
address range [pci_addr..pci_addr + size - 1].
However, rockchip_pcie_prog_ep_ob_atu() calculates num_pass_bits only
using the size of the mapping, resulting in an incorrect number of mask
bits depending on the value of the PCI address to map.
Fix this by introducing the helper function
rockchip_pcie_ep_ob_atu_num_bits() to correctly calculate the number of
mask bits to use to program the address translation unit. The number of
mask bits is calculated depending on both the PCI address and size of
the mapping, and clamped between 8 and 20 using the macros
ROCKCHIP_PCIE_AT_MIN_NUM_BITS and ROCKCHIP_PCIE_AT_MAX_NUM_BITS. As
defined in the Rockchip RK3399 TRM V1.3 Part2, Sections 17.5.5.1.1 and
17.6.8.2.1, this clamping is necessary because:
1) The lower 8 bits of the PCI address to be mapped by the outbound
region are ignored. So a minimum of 8 address bits are needed and
imply that the PCI address must be aligned to 256.
2) The outbound memory regions are 1MB in size. So while we can specify
up to 63 bits for the PCI address (the num_bits field uses bits 0 to 5 of
the outbound address region 0 register), we must limit the number of
valid address bits to 20 to match the memory window maximum size (1
<< 20 = 1MB).
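A minimal sketch of such a helper, assuming fls64() is used to count the
changing address bits and that the two macros expand to 8 and 20 (the
in-tree implementation may differ in details):

  static unsigned int rockchip_pcie_ep_ob_atu_num_bits(u64 pci_addr, size_t size)
  {
          /* Highest bit that changes across [pci_addr, pci_addr + size - 1]. */
          unsigned int num_bits = fls64(pci_addr ^ (pci_addr + size - 1));

          return clamp(num_bits, ROCKCHIP_PCIE_AT_MIN_NUM_BITS,
                       ROCKCHIP_PCIE_AT_MAX_NUM_BITS);
  }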
Fixes: cf590b078391 ("PCI: rockchip: Add EP driver for Rockchip PCIe controller") Link: https://lore.kernel.org/r/20241017015849.190271-2-dlemoal@kernel.org Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When populating "ranges" property for a PCI bridge or endpoint,
of_pci_prop_ranges() incorrectly uses the CPU address of the resource. In
such PCI nodes, the window should instead be in PCI address space. Call
pci_bus_address() on the resource in order to obtain the PCI bus address.
The advertised resizable BAR size was fixed in commit 72e34b8593e0 ("PCI:
dwc: endpoint: Fix advertised resizable BAR size").
Commit 867ab111b242 ("PCI: dwc: ep: Add a generic dw_pcie_ep_linkdown()
API to handle Link Down event") was included shortly after this, and
moved the code to another function. When the code was moved, this fix
was mistakenly lost.
According to the spec, it is illegal to not have a bit set in
PCI_REBAR_CAP, and 1 MB is the smallest size allowed.
So, set bit 4 in PCI_REBAR_CAP, so that we actually advertise support
for a 1 MB BAR size.
Remove the redundant "i" at the beginning of the error message. This "i"
came from commit 1c1318866928 ("dm: prefer '"%s...", __func__'"), where
the "i" was accidentally left behind.
When compiling kernel source 'make -j $(nproc)' with the up-and-running
KASAN-enabled kernel on a 256-core machine, the following soft lockup is
shown:
1. The following ftrace log shows that the lockup CPU spends too much
time iterating vmap_nodes and flushing TLB when purging vm_area
structures. (Some info is trimmed).
The drain_vmap_area_work() spends over 23 seconds.
There are 2805 flush_tlb_kernel_range() calls in the ftrace log.
* One is called in __purge_vmap_area_lazy().
* Others are called by purge_vmap_node->kasan_release_vmalloc.
purge_vmap_node() iteratively releases kasan vmalloc
allocations and flushes TLB for each vmap_area.
- [Rough calculation] Each flush_tlb_kernel_range() runs
about 7.5ms.
-- 2804 * 7.5ms = 21.03 seconds.
-- That's why a soft lockup is triggered.
2. Extending the soft lockup time can work around the issue (For example,
# echo 60 > /proc/sys/kernel/watchdog_thresh). This confirms the
above-mentioned speculation: drain_vmap_area_work() spends too much
time.
If we combine all TLB flush operations of the KASAN shadow virtual
address into one operation in the call path
'purge_vmap_node()->kasan_release_vmalloc()', the running time of
drain_vmap_area_work() can be saved greatly. The idea is from the
flush_tlb_kernel_range() call in __purge_vmap_area_lazy(). And, the
soft lockup won't be triggered.
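Conceptually, the change amounts to something like the sketch below:
accumulate the shadow range covered by the purged areas and issue a single
flush_tlb_kernel_range() for it, instead of flushing per vmap_area (the
list name and surrounding structure are illustrative, not the exact kernel
code):

  unsigned long shadow_start = ULONG_MAX, shadow_end = 0;
  struct vmap_area *va;

  list_for_each_entry(va, &purge_list, list) {
          unsigned long s = (unsigned long)kasan_mem_to_shadow((void *)va->va_start);
          unsigned long e = (unsigned long)kasan_mem_to_shadow((void *)va->va_end);

          shadow_start = min(shadow_start, s);
          shadow_end = max(shadow_end, e);
          /* release the shadow pages here, but skip the per-area TLB flush */
  }

  if (shadow_end > shadow_start)
          flush_tlb_kernel_range(shadow_start, shadow_end);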
Here is the test result based on 6.10:
[6.10 wo/ the patch]
1. ftrace latency profiling (record a trace if the latency > 20s).
echo 20000000 > /sys/kernel/debug/tracing/tracing_thresh
echo drain_vmap_area_work > /sys/kernel/debug/tracing/set_graph_function
echo function_graph > /sys/kernel/debug/tracing/current_tracer
echo 1 > /sys/kernel/debug/tracing/tracing_on
2. Run `make -j $(nproc)` to compile the kernel source
3. Once the soft lockup is reproduced, check the ftrace log:
cat /sys/kernel/debug/tracing/trace
# tracer: function_graph
#
# CPU DURATION FUNCTION CALLS
# | | | | | | |
76) $ 50412985 us | } /* __purge_vmap_area_lazy */
76) $ 50412997 us | } /* drain_vmap_area_work */
76) $ 29165911 us | } /* __purge_vmap_area_lazy */
76) $ 29165926 us | } /* drain_vmap_area_work */
91) $ 53629423 us | } /* __purge_vmap_area_lazy */
91) $ 53629434 us | } /* drain_vmap_area_work */
91) $ 28121014 us | } /* __purge_vmap_area_lazy */
91) $ 28121026 us | } /* drain_vmap_area_work */
[6.10 w/ the patch]
1. Repeat step 1-2 in "[6.10 wo/ the patch]"
2. The soft lockup is not triggered and ftrace log is empty.
cat /sys/kernel/debug/tracing/trace
# tracer: function_graph
#
# CPU DURATION FUNCTION CALLS
# | | | | | | |
3. Setting 'tracing_thresh' to 10/5 seconds does not get any ftrace
log.
4. Setting 'tracing_thresh' to 1 second gets ftrace log.
cat /sys/kernel/debug/tracing/trace
# tracer: function_graph
#
# CPU DURATION FUNCTION CALLS
# | | | | | | |
23) $ 1074942 us | } /* __purge_vmap_area_lazy */
23) $ 1074950 us | } /* drain_vmap_area_work */
The worst execution time of drain_vmap_area_work() is about 1 second.
Link: https://lore.kernel.org/lkml/ZqFlawuVnOMY2k3E@pc638.lan/ Link: https://lkml.kernel.org/r/20240726165246.31326-1-ahuang12@lenovo.com Fixes: 282631cb2447 ("mm: vmalloc: remove global purge_vmap_area_root rb-tree") Signed-off-by: Adrian Huang <ahuang12@lenovo.com> Co-developed-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Tested-by: Jiwei Sun <sunjw10@lenovo.com> Reviewed-by: Baoquan He <bhe@redhat.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
dentry_open in ovl_security_fileattr fails for any file
larger than 2GB if open method of the underlying filesystem
calls generic_file_open (e.g. fusefs).
The issue can be reproduced using the following script:
(passthrough_ll is an example app from libfuse).
The device_for_each_child_node() macro requires explicit calls to
fwnode_handle_put() upon early exits to avoid memory leaks, and in
this case the error paths are handled after jumping to
'out_flash_realease', which misses that required call to decrement the
refcount of the child node.
A more elegant and robust solution is using the scoped variant of the
loop, which automatically handles such early exits.
Fix the child node refcounting in the error paths by using
device_for_each_child_node_scoped().
When the current_uuid attribute is set to the active policy UUID,
reading back the same attribute is returning "INVALID" instead of
the active policy UUID on some platforms before Ice Lake.
In platforms before Ice Lake, firmware provides a list of supported
thermal policies. In this case, user space can select any of the
supported thermal policies via a write to attribute "current_uuid".
In commit c7ff29763989 ("thermal: int340x: Update OS policy capability
handshake"), the OS policy handshake was updated to support Ice Lake
and later platforms and it treated priv->current_uuid_index=0 as
invalid. However, priv->current_uuid_index=0 is for the active policy,
only priv->current_uuid_index=-1 is invalid.
Fix this issue by updating the priv->current_uuid_index check.
Fixes: c7ff29763989 ("thermal: int340x: Update OS policy capability handshake") Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: 5.18+ <stable@vger.kernel.org> # 5.18+ Link: https://patch.msgid.link/20241114200213.422303-1-srinivas.pandruvada@linux.intel.com
[ rjw: Subject and changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If iov_iter_zero() succeeds after a failed copy_from_kernel_nofault(),
we need to reset the ret value to zero, otherwise it will be returned
as the final return value of read_kcore_iter().
This fixes objdump -d dump over /proc/kcore for me.
Cc: stable@vger.kernel.org Cc: Alexander Gordeev <agordeev@linux.ibm.com> Fixes: 3d5854d75e31 ("fs/proc/kcore.c: allow translation of physical memory addresses") Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20241121231118.3212000-1-jolsa@kernel.org Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The minimal alignment of an integral type may differ from its size,
hence it is not safe to assume that an arbitrary freeptr_t (which is
basically an unsigned long) is always aligned to 4 or 8 bytes.
As nothing seems to require the additional alignment, it is safe to fix
this by relaxing the check to the actual minimum alignment of freeptr_t.
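In other words, the check only needs to require alignment to
__alignof__(freeptr_t) rather than to its size; roughly (a sketch, the
offset name is illustrative):

  if (!IS_ALIGNED(freeptr_offset, __alignof__(freeptr_t)))
          return -EINVAL;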
In addition to a primary endpoint controller, an endpoint function may be
associated with a secondary endpoint controller, epf->sec_epc, to provide
NTB (non-transparent bridge) functionality.
Previously, pci_epc_remove_epf() incorrectly cleared epf->epc instead of
epf->sec_epc when removing from the secondary endpoint controller.
Extend the epc->list_lock coverage and clear either epf->epc or
epf->sec_epc as indicated.
Link: https://lore.kernel.org/r/20241107-epc_rfc-v2-2-da5b6a99a66f@quicinc.com Fixes: 63840ff53223 ("PCI: endpoint: Add support to associate secondary EPC with EPF") Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
[mani: reworded subject and description] Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
[bhelgaas: commit log] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
K2G forwards the error triggered by a link-down state (e.g., no connected
endpoint device) on the system bus for PCI configuration transactions;
these errors are reported as an SError at system level, which is fatal and
hangs the system.
So, apply a fix similar to how it was done in the DesignWare core driver in
commit 15b23906347c ("PCI: dwc: Add link up check in dw_child_pcie_ops.map_bus()").
Fixes: 10a797c6e54a ("PCI: dwc: keystone: Use pci_ops for config space accessors") Link: https://lore.kernel.org/r/20240524105714.191642-3-s-vadapalli@ti.com Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com> Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com>
[kwilczynski: commit log, added tag for stable releases] Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 23284ad677a9 ("PCI: keystone: Add support for PCIe EP in AM654x
Platforms") introduced configuring "enum dw_pcie_device_mode" as part of
device data ("struct ks_pcie_of_data"). However, it failed to set the
mode for the "ti,keystone-pcie" compatible.
Since the mode defaults to "DW_PCIE_UNKNOWN_TYPE", the following error
message is displayed for the v3.65a controller:
"INVALID device type 0"
Despite the driver probing successfully, the controller may not be
functional in the Root Complex mode of operation.
So, set the mode as Root Complex for "ti,keystone-pcie" compatible to
fix this.
Fixes: 23284ad677a9 ("PCI: keystone: Add support for PCIe EP in AM654x Platforms") Link: https://lore.kernel.org/r/20240524105714.191642-2-s-vadapalli@ti.com Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com> Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com>
[kwilczynski: commit log, added tag for stable releases] Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix issue where disabling IBI on one device disables the entire IBI
interrupt. Modify bit 7:0 of enabled_events to serve as an IBI enable
counter, ensuring that the system IBI interrupt is disabled only when all
I3C devices have IBI disabled.
Cc: stable@kernel.org Fixes: 7ff730ca458e ("i3c: master: svc: enable the interrupt in the enable ibi function") Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com> Signed-off-by: Frank Li <Frank.Li@nxp.com> Link: https://lore.kernel.org/r/20241101165002.2479794-1-Frank.Li@nxp.com Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  svc_i3c_master_do_daa() {
          ...
          for (i = 0; i < dev_nb; i++) {
                  ret = i3c_master_add_i3c_dev_locked(m, addrs[i]);
                  if (ret)
                          goto rpm_out;
          }
  }
If two devices (A and B) are detected in DAA and address 0xa is assigned to
device A and 0xb to device B, a failure in i3c_master_add_i3c_dev_locked()
for device A (addr: 0xa) could prevent device B (addr: 0xb) from being
registered on the bus. The I3C stack might still consider 0xb a free
address. If a subsequent Hotjoin occurs, 0xb might be assigned to Device A,
causing both devices A and B to use the same address 0xb, violating the I3C
specification.
The return value for i3c_master_add_i3c_dev_locked() should not be checked
because subsequent steps will scan the entire I3C bus, independent of
whether i3c_master_add_i3c_dev_locked() returns success.
If device A registration fails, there is still a chance to register device
B. i3c_master_add_i3c_dev_locked() can reset DAA if a failure occurs while
retrieving device information.
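The idea is roughly the sketch below: log the failure for the individual
device but keep registering the remaining addresses instead of bailing out
(error handling details in the actual driver may differ):

  for (i = 0; i < dev_nb; i++) {
          ret = i3c_master_add_i3c_dev_locked(m, addrs[i]);
          if (ret)
                  dev_err(&m->dev, "Unable to add i3c dev at addr 0x%x (%d)\n",
                          addrs[i], ret);
  }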
Cc: stable@kernel.org Fixes: 317bacf960a4 ("i3c: master: add enable(disable) hot join in sys entry") Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com> Signed-off-by: Frank Li <Frank.Li@nxp.com> Link: https://lore.kernel.org/r/20241002-svc-i3c-hj-v6-6-7e6e1d3569ae@nxp.com Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  if (dev->boardinfo && dev->boardinfo->init_dyn_addr)
                                        ^^^ here check "init_dyn_addr"
          i3c_bus_set_addr_slot_status(&master->bus, dev->info.dyn_addr, ...)
                                                                ^^^^
                                                                free "dyn_addr"
Fix copy/paste error "dyn_addr" by replacing it with "init_dyn_addr".
It is not valid to call pm_runtime_set_suspended() for devices
with runtime PM enabled because it returns -EAGAIN if it is enabled
already and working. So, call pm_runtime_disable() before to fix it.
v1 of the patch which introduced the ufshcd_vops_hibern8_notify()
callback used a bool instead of an enum. In v2 this was updated to an
enum based on the review feedback in [1].
ufs-exynos hibernate calls have always been broken upstream as it
follows the v1 bool implementation.
Move the EXYNOS_UFS_OPT_UFSPR_SECURE check inside
exynos_ufs_config_smu().
This way all call sites will benefit from the check. This fixes a bug
currently in the exynos_ufs_resume() path on gs101 as it calls
exynos_ufs_config_smu() and we end up accessing registers that can only
be accessed from the secure world, which results in an SError.
Fixes: d11e0a318df8 ("scsi: ufs: exynos: Add support for Tensor gs101 SoC") Signed-off-by: Peter Griffin <peter.griffin@linaro.org> Link: https://lore.kernel.org/r/20241031150033.3440894-5-peter.griffin@linaro.org Cc: stable@vger.kernel.org Reviewed-by: Tudor Ambarus <tudor.ambarus@linaro.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
arch_stack_walk_user_common() contains a return statement instead of a
break statement in case store_ip() fails while trying to store a callchain
entry of a user space process.
This may lead to a missing pagefault_enable() call.
If this happens any subsequent page fault of the process won't be resolved
by the page fault handler and this in turn will lead to the process being
killed.
Use a break instead of a return statement to fix this.
A bug was found in the find_closest() (find_closest_descending() is also
affected after some testing), where for certain values with small
progressions, the rounding (done by averaging 2 values) causes an
incorrect index to be returned. The rounding issues occur for
progressions of 1, 2 and 3. It goes away when the progression/interval
between two values is 4 or larger.
It's particularly bad for progressions of 1. For example if there's an
array of 'a = { 1, 2, 3 }', using 'find_closest(2, a ...)' would return 0
(the index of '1'), rather than returning 1 (the index of '2'). This
means that for exact values (with a progression of 1), find_closest() will
misbehave and return the index of the value smaller than the one we're
searching for.
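A small userspace reproduction of this case is sketched below; the
DIV_ROUND_CLOSEST definition and the loop mirror the shape of the pre-fix
macro, reduced to plain C for illustration:

  #include <stdio.h>

  #define DIV_ROUND_CLOSEST(x, divisor) (((x) + ((divisor) / 2)) / (divisor))

  static int old_find_closest(int x, const int *a, int n)
  {
          int i;

          for (i = 0; i < n - 1; i++) {
                  /* The midpoint of 1 and 2 rounds up to 2, so x == 2 already
                   * matches at i == 0 and index 0 is returned. */
                  if (x <= DIV_ROUND_CLOSEST(a[i] + a[i + 1], 2))
                          break;
          }
          return i;
  }

  int main(void)
  {
          const int a[] = { 1, 2, 3 };

          printf("find_closest(2) -> %d (expected 1)\n",
                 old_find_closest(2, a, 3));
          return 0;
  }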
For progressions of 2 and 3, the exact values are obtained correctly; but
values aren't approximated correctly (as one would expect). Starting with
progressions of 4, all seems to be good (one gets what one would expect).
While one could argue that 'find_closest()' should not be used for arrays
with progressions of 1 (i.e. '{1, 2, 3, ...}'), the macro should still
behave correctly.
The bug was found while testing the 'drivers/iio/adc/ad7606.c',
specifically the oversampling feature.
For reference, the oversampling values are listed as:
  static const unsigned int ad7606_oversampling_avail[7] = {
          1, 2, 4, 8, 16, 32, 64,
  };
When doing:
1. $ echo 1 > /sys/bus/iio/devices/iio\:device0/oversampling_ratio
$ cat /sys/bus/iio/devices/iio\:device0/oversampling_ratio
1 # this is fine
2. $ echo 2 > /sys/bus/iio/devices/iio\:device0/oversampling_ratio
$ cat /sys/bus/iio/devices/iio\:device0/oversampling_ratio
1 # this is wrong; 2 should be returned here
3. $ echo 3 > /sys/bus/iio/devices/iio\:device0/oversampling_ratio
$ cat /sys/bus/iio/devices/iio\:device0/oversampling_ratio
2 # this is fine
4. $ echo 4 > /sys/bus/iio/devices/iio\:device0/oversampling_ratio
$ cat /sys/bus/iio/devices/iio\:device0/oversampling_ratio
4 # this is fine
And from here-on, the values are as correct (one gets what one would
expect.)
While writing a kunit test for this bug, a peculiar issue was found for the
array in the 'drivers/hwmon/ina2xx.c' & 'drivers/iio/adc/ina2xx-adc.c'
drivers. While running the kunit test (for 'ina226_avg_tab' from these
drivers):
* idx = find_closest([-1 to 2], ina226_avg_tab, ARRAY_SIZE(ina226_avg_tab));
  This returns idx == 0, so value 1.
* idx = find_closest(3, ina226_avg_tab, ARRAY_SIZE(ina226_avg_tab));
  This returns idx == 0, value 1; and now one could argue whether 3 is
  closer to 4 or to 1. This quirk only appears for value '3' in this
  array, but it seems to be another rounding issue.
* And from 4 onwards the 'find_closest'() works fine (one gets what one
would expect).
This change reworks the find_closest() macros to also check the difference
between the left and right elements around 'x'. If the distance to the right
is smaller (than the distance to the left), the index is incremented by 1.
This also makes redundant the need for using the DIV_ROUND_CLOSEST() macro.
In order to accommodate for any mix of negative + positive values, the
internal variables '__fc_x', '__fc_mid_x', '__fc_left' & '__fc_right' are
forced to 'long' type. This also addresses any potential bugs/issues with
'x' being of an unsigned type. In those situations any comparison between
signed & unsigned would be promoted to a comparison between 2 unsigned
numbers; this is especially annoying when '__fc_left' & '__fc_right'
underflow.
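A sketch of what the reworked macro looks like with these changes
(simplified; the actual patch should be taken as authoritative):

  #define find_closest(x, a, as)                                          \
  ({                                                                      \
          typeof(as) __fc_i, __fc_as = (as) - 1;                          \
          long __fc_mid_x, __fc_x = (x);                                  \
          long __fc_left, __fc_right;                                     \
          typeof(*a) const *__fc_a = (a);                                 \
          for (__fc_i = 0; __fc_i < __fc_as; __fc_i++) {                  \
                  __fc_mid_x = (__fc_a[__fc_i] + __fc_a[__fc_i + 1]) / 2; \
                  if (__fc_x <= __fc_mid_x) {                             \
                          __fc_left = __fc_x - __fc_a[__fc_i];            \
                          __fc_right = __fc_a[__fc_i + 1] - __fc_x;       \
                          if (__fc_right < __fc_left)                     \
                                  __fc_i++;                               \
                          break;                                          \
                  }                                                       \
          }                                                               \
          (__fc_i);                                                       \
  })

With this shape, find_closest(2, {1, 2, 3}, 3) returns index 1, and
find_closest(3, ina226_avg_tab, ...) picks the index of 4 rather than 1.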
The find_closest_descending() macro was also reworked and duplicated from
the find_closest(), and it is being iterated in reverse. The main reason
for this is to get the same indices as 'find_closest()' (but in reverse).
The comparison for '__fc_right < __fc_left' favors going the array in
ascending order.
For example for array '{ 1024, 512, 256, 128, 64, 16, 4, 1 }' and x = 3, we
get:
__fc_mid_x = 2
__fc_left = -1
__fc_right = -2
Then '__fc_right < __fc_left' evaluates to true and '__fc_i++' becomes 7
which is not quite incorrect, but 3 is closer to 4 than to 1.
This change has been validated with the kunit from the next patch.
These four chips:
* W25N512GW
* W25N01GW
* W25N01JW
* W25N02JW
all require a single bit of ECC strength and thus feature an on-die
Hamming-like ECC engine. There is no point in filling a ->get_status()
callback for them because the main ECC status bytes are located in
standard places, and retrieving the number of bitflips in case of
corrected chunk is both useless and unsupported (if there are bitflips,
then there is 1 at most, so no need to query the chip for that).
Without this change, a kernel warning triggers every time a bit flips.
Fixes: 6a804fb72de5 ("mtd: spinand: winbond: add support for serial NAND flash") Cc: stable@vger.kernel.org Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Reviewed-by: Frieder Schrempf <frieder.schrempf@kontron.de> Link: https://lore.kernel.org/linux-mtd/20241009125002.191109-3-miquel.raynal@bootlin.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This eliminates a redundant get_current_cred() call, because
ceph_mds_check_access() has already obtained this pointer.
As a side effect, this also fixes a reference leak in
ceph_mds_auth_match(): by omitting the get_current_cred() call, no
additional cred reference is taken.
Previously, the "name" in the new device syntax "<name>@<fsid>.<fsname>"
was ignored because (presumably) tests were done using mount.ceph which
also passed the entity name using "-o name=foo". If mounting is done
without the mount.ceph helper, the new device id syntax fails to set
the name properly.
The root cause is that discard_max_bytes of the mounted lvm device is
updated to zero after a snapshot is created on that lvm device. Then
__submit_discard_cmd() passes the parameter @nr_sects with a zero value
to __blkdev_issue_discard(), which returns a NULL bio pointer, resulting
in a panic.
This patch makes the following changes to fix it:
1. Drop all remaining discards in f2fs_unfreeze() if a snapshot of the
   lvm device has been created.
2. Check discard_max_bytes before submitting a discard in
   __submit_discard_cmd(), as sketched below.
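For change 2, the check could look roughly like this (a sketch; the exact
placement and error handling inside __submit_discard_cmd() may differ):

  /* The device no longer advertises discard support, so do not build a bio. */
  if (!bdev_max_discard_sectors(bdev))
          return 0;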
If an allocated object fails in alloc_consistency_checks(), all objects of
the slab will be marked as used, and then the slab will be removed from
the partial list.
When an object belonging to the slab is freed later, the remove_full()
function is called. Because the slab is neither on the partial list nor
on the full list, this eventually leads to list corruption (actually a
list poison being detected).
So we need to mark and isolate the slab page with metadata corruption,
do not put it back in circulation.
Because the debug caches avoid all the fastpaths, reusing the frozen bit
to mark slab page with metadata corruption seems to be fine.
The suspend/resume functionality is currently broken on the i.MX6QDL
platform, as documented in the NXP errata (ERR005723):
https://www.nxp.com/docs/en/errata/IMX6DQCE.pdf
This patch addresses the issue by sharing most of the suspend/resume
sequences used by other i.MX devices, while avoiding modifications to
critical registers that disrupt the PCIe functionality. It targets the
same problem as the following downstream commit:
Unlike the downstream commit, this patch also resets the connected PCIe
device if possible. Without this reset, certain drivers, such as ath10k
or iwlwifi, will crash on resume. The device reset is also done by the
driver on other i.MX platforms, making this patch consistent with
existing practices.
Upon resuming, the kernel will hang and display an error. Here's an
example of the error encountered with the ath10k driver:
ath10k_pci 0000:01:00.0: Unable to change power state from D3hot to D0, device inaccessible
Unhandled fault: imprecise external abort (0x1406) at 0x0106f944
Without this patch, suspend/resume will fail on i.MX6QDL devices if a
PCIe device is connected.
Link: https://lore.kernel.org/r/20241030103250.83640-1-eichest@gmail.com Signed-off-by: Stefan Eichenberger <stefan.eichenberger@toradex.com>
[kwilczynski: commit log, added tag for stable releases] Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Acked-by: Richard Zhu <hongxing.zhu@nxp.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently, the rproc "atomic_t power" variable is incremented during:
a. WPSS rproc auto boot.
b. AHB power on for ath11k.
During AHB power off (rmmod ath11k_ahb.ko), rproc_shutdown fails
to unload the WPSS firmware because the rproc->power value is '2',
causing the atomic_dec_and_test(&rproc->power) condition to fail.
Consequently, during AHB power on (insmod ath11k_ahb.ko),
QMI_WLANFW_HOST_CAP_REQ_V01 fails due to the host and firmware QMI
states being out of sync.
Currently, pmu_metrics_table__find() will return NULL when perf stat is
run only on a sys metric. Then parse_groups() will never be called to parse
the sys metric_name, and the perf tool will exit directly. This should be a
common problem.
To fix the issue, keep the logic from before commit f20c15d13f01
("perf pmu-events: Remember the perf_events_map for a PMU") and return an
empty metric table rather than a NULL pointer.
This should be fine since the removed part just checks whether the table
matches the provided metric_name. Without that code, the code in
parse_groups() will also check the validity of metric_name.
Fixes: f20c15d13f017d4b ("perf pmu-events: Remember the perf_events_map for a PMU") Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Xu Yang <xu.yang_2@nxp.com> Tested-by: Xu Yang <xu.yang_2@nxp.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Benjamin Gray <bgray@linux.ibm.com> Cc: Ben Zong-You Xie <ben717@andestech.com> Cc: Bibo Mao <maobibo@loongson.cn> Cc: Clément Le Goffic <clement.legoffic@foss.st.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Will Deacon <will@kernel.org> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-riscv@lists.infradead.org Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20241107162035.52206-2-irogers@google.com Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently, the cfg_1_9_0 config which is used for X1E80100 doesn't disable
ASPM L0s. However, the hardware team recommends disabling L0s as the PHY init
sequence is not tuned to support L0s. Hence reuse cfg_sc8280xp for X1E80100.
Note that the config_sid() callback is not present in cfg_sc8280xp. This is
not a concern because the config_sid() callback is effectively a no-op
for X1E80100.
The unsigned variable `size_t len` is cast to the signed type `loff_t`
when passed to the function check_add_overflow(). This function considers
the type of the destination, which is of type loff_t (signed),
potentially leading to an overflow. This issue is similar to the one
described in the link below.
Remove the cast.
Note that even if check_add_overflow() is bypassed, by setting `len` to
a value that is greater than LONG_MAX (which is considered as a negative
value after the cast), the function copy_from_user(), invoked a few lines
later, will not perform any copy and return `len` as (len > INT_MAX)
causing qat_vf_resume_write() to fail with -EFAULT.
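The change boils down to dropping the cast so that check_add_overflow()
keeps len's unsigned type (a sketch; pos/end are the surrounding offset
variables of the write handler):

  /* Before: a very large size_t len cast to loff_t can wrap to a negative
   * value, so the sum no longer overflows the destination and the check is
   * defeated. */
  if (check_add_overflow(pos, (loff_t)len, &end))
          return -EOVERFLOW;

  /* After: len keeps its unsigned type and the overflow is detected. */
  if (check_add_overflow(pos, len, &end))
          return -EOVERFLOW;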
The code for syncing vmalloc memory PGD pointers is using
atomic_read() in pair with atomic_set_release() but the
proper pairing is atomic_read_acquire() paired with
atomic_set_release().
This is done to clearly instruct the compiler to not
reorder the memcpy() or similar calls inside the section
so that we do not observe changes to init_mm. memcpy()
calls should be identified by the compiler as having
unpredictable side effects, but let's try to be on the
safe side.
Cc: stable@vger.kernel.org Fixes: d31e23aff011 ("ARM: mm: make vmalloc_seq handling SMP safe") Suggested-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When switching task, in addition to a dummy read from the new
VMAP stack, also do a dummy read from the VMAP stack's
corresponding KASAN shadow memory to sync things up in
the new MM context.
Cc: stable@vger.kernel.org Fixes: a1c510d0adc6 ("ARM: implement support for vmap'ed stacks") Link: https://lore.kernel.org/linux-arm-kernel/a1a1d062-f3a2-4d05-9836-3b098de9db6d@foss.st.com/ Reported-by: Clement LE GOFFIC <clement.legoffic@foss.st.com> Suggested-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The stack depot filters out everything outside of the top interrupt
context as an uninteresting or irrelevant part of the stack traces. This
helps with stack trace de-duplication, avoiding an explosion of saved
stack traces that share the same IRQ context code path but originate
from different randomly interrupted points, eventually exhausting the
stack depot.
Filtering uses in_irqentry_text() to identify functions within the
.irqentry.text and .softirqentry.text sections, which then become the
last stack trace entries being saved.
While __do_softirq() is placed into the .softirqentry.text section by
common code, populating .irqentry.text is architecture-specific.
Currently, the .irqentry.text section on s390 is empty, which prevents
stack depot filtering and de-duplication and could result in warnings
like:
Fix this by moving the IO/EXT interrupt handlers from .kprobes.text into
the .irqentry.text section and updating the kprobes blacklist to include
the .irqentry.text section.
This is done only for asynchronous interrupts and explicitly not for
program checks, which are synchronous and where the context beyond the
program check is important to preserve. Despite machine checks being
somewhat in between, they are extremely rare, and preserving context
when possible is also of value.
SVCs and Restart Interrupts are not relevant, one being always at the
boundary to user space and the other being a one-time thing.
IRQ entries filtering is also optionally used in ftrace function graph,
where the same logic applies.
When sync:ing the VMALLOC area to other CPUs, make sure to also
sync the KASAN shadow memory for the VMALLOC area, so that we
don't get stale entries for the shadow memory in the top level PGD.
Since we are now copying PGDs in two instances, create a helper
function named memcpy_pgd() to do the actual copying, and
create a helper to map the addresses of VMALLOC_START and
VMALLOC_END into the corresponding shadow memory.
Co-developed-by: Melon Liu <melon1335@163.com> Cc: stable@vger.kernel.org Fixes: 565cbaad83d8 ("ARM: 9202/1: kasan: support CONFIG_KASAN_VMALLOC") Link: https://lore.kernel.org/linux-arm-kernel/a1a1d062-f3a2-4d05-9836-3b098de9db6d@foss.st.com/ Reported-by: Clement LE GOFFIC <clement.legoffic@foss.st.com> Suggested-by: Mark Rutland <mark.rutland@arm.com> Suggested-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This loop requires explicit calls to of_node_put() upon early exits
(break, goto, return) to decrement the child refcounter and avoid memory
leaks if the child is not required out of the loop.
A more robust solution is using the scoped variant of the macro, which
automatically calls of_node_put() when the child goes out of scope.
In attempting to optimize fw_devlink runtime, I introduced numerous cycle
detection bugs by foregoing cycle detection logic under specific
conditions. Each fix has further narrowed the conditions for optimization.
It's time to give up on these optimization attempts and just run the cycle
detection logic every time fw_devlink tries to create a device link.
The specific bug report that triggered this fix involved a supplier fwnode
that never gets a device created for it. Instead, the supplier fwnode is
represented by the device that corresponds to an ancestor fwnode.
In this case, fw_devlink didn't do any cycle detection because the cycle
detection logic is only run when a device link is created between the
devices that correspond to the actual consumer and supplier fwnodes.
With this change, fw_devlink will run cycle detection logic even when
creating SYNC_STATE_ONLY proxy device links from a device that is an
ancestor of a consumer fwnode.
The bin_attr_nvmem_write() must check the read_only flag and block
writes on read-only devices, now that a nvmem device can be switched
between read-write and read-only mode at runtime using the force_ro
attribute. Add the missing check.
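The missing check is essentially a one-liner at the top of the sysfs write
handler, along the lines of (sketch):

  if (nvmem->read_only)
          return -EPERM;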
In ad7780_write_raw(), val2 can be zero, which might lead to a
division-by-zero error in DIV_ROUND_CLOSEST(). ad7780_write_raw()
is based on iio_info's write_raw. While val is explicitly documented as
possibly being zero (in read mode), val2 is not specified to be non-zero.
The comment before the config of the GPLL3 PLL says that the
PLL should run at 930 MHz. Contrary to this, calculating
the frequency from the current configuration values by using
19.2 MHz as input frequency defined in 'qcs404.dtsi', it gives
921.6 MHz:
Under certain conditions, the 64-bit '-mstack-protector-guard' flags may
end up in the 32-bit vDSO flags, resulting in build failures due to the
structure of clang's argument parsing of the stack protector options,
which validates the arguments of the stack protector guard flags
unconditionally in the frontend, choking on the 64-bit values when
targeting 32-bit:
clang: error: invalid value 'r13' in 'mstack-protector-guard-reg=', expected one of: r2
clang: error: invalid value 'r13' in 'mstack-protector-guard-reg=', expected one of: r2
make[3]: *** [arch/powerpc/kernel/vdso/Makefile:85: arch/powerpc/kernel/vdso/vgettimeofday-32.o] Error 1
make[3]: *** [arch/powerpc/kernel/vdso/Makefile:87: arch/powerpc/kernel/vdso/vgetrandom-32.o] Error 1
Remove these flags by adding them to the CC32FLAGSREMOVE variable, which
already handles situations similar to this. Additionally, reformat and
align a comment better for the expanding CONFIG_CC_IS_CLANG block.
Patch series "mm/damon/vaddr: Fix issue in
damon_va_evenly_split_region()". v2.
According to the logic of damon_va_evenly_split_region(), currently
following split case would not meet the expectation:
Suppose DAMON_MIN_REGION=0x1000,
Case: Split [0x0, 0x3000) into 2 pieces, then the result would be
actually 3 regions:
[0x0, 0x1000), [0x1000, 0x2000), [0x2000, 0x3000)
but NOT the expected 2 regions:
[0x0, 0x1000), [0x1000, 0x3000) !!!
The root cause is that when calculating the size of each split piece in
damon_va_evenly_split_region():
both the division and the ALIGN_DOWN may cause a loss of precision, so
splitting one piece of size 'sz_piece' at a time from the original 'start'
to 'end' splits out more pieces than expected!!!
To fix it, count each split piece and make sure no more than 'nr_pieces'
are produced. In addition, add the above case to damon_test_split_evenly().
Also add a 'nr_piece == 1' check in damon_va_evenly_split_region() for better
code readability and add a corresponding kunit testcase.
This patch (of 2):
According to the logic of damon_va_evenly_split_region(), currently
following split case would not meet the expectation:
Suppose DAMON_MIN_REGION=0x1000,
Case: Split [0x0, 0x3000) into 2 pieces, then the result would be
actually 3 regions:
[0x0, 0x1000), [0x1000, 0x2000), [0x2000, 0x3000)
but NOT the expected 2 regions:
[0x0, 0x1000), [0x1000, 0x3000) !!!
The root cause is that when calculating the size of each split piece in
damon_va_evenly_split_region():
both the division and the ALIGN_DOWN may cause a loss of precision, so
splitting one piece of size 'sz_piece' at a time from the original 'start'
to 'end' splits out more pieces than expected!!!
To fix it, count each split piece and make sure no more than 'nr_pieces'
are produced. In addition, add the above case to damon_test_split_evenly().
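The arithmetic behind the example can be illustrated with the small
userspace sketch below, assuming sz_piece is derived as
ALIGN_DOWN((end - start) / nr_pieces, DAMON_MIN_REGION) and that splitting
simply walks forward by sz_piece (a simplification of the kernel code):

  #include <stdio.h>

  #define DAMON_MIN_REGION 0x1000UL
  #define ALIGN_DOWN(x, a) ((x) & ~((a) - 1))

  int main(void)
  {
          unsigned long start = 0x0, end = 0x3000, nr_pieces = 2;
          unsigned long sz_piece = ALIGN_DOWN((end - start) / nr_pieces,
                                              DAMON_MIN_REGION);
          unsigned long s, n = 0;

          /* Walking by sz_piece (0x1000 here) produces 3 regions, not 2. */
          for (s = start; s + sz_piece <= end; s += sz_piece)
                  printf("region %lu: [%#lx, %#lx)\n", n++, s, s + sz_piece);

          return 0;
  }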
Since commit 92a81562e695 ("leds: lp55xx: Add multicolor framework
support to lp55xx") there are two subsequent tests if the chan_nr
(reg property) is in valid range. One in the lp55xx_init_led()
function and one in the lp55xx_parse_common_child() function that
was added with the mentioned commit.
There are two issues with that.
The first is in the lp55xx_parse_common_child() function where the reg
property is tested right after it is read from the device tree.
The test for the upper range is not correct though. Valid reg values are
0 to (max_channel - 1) so it should be >=.
The second issue is that in case the parsed value is out of range
the probe just fails and no error message is shown, as the code never
reaches the second test that prints an error message.
Remove the test from the lp55xx_parse_common_child() function completely
and keep the one in the lp55xx_init_led() function to deal with it.
Fixes: 92a81562e695 ("leds: lp55xx: Add multicolor framework support to lp55xx") Cc: stable@vger.kernel.org Signed-off-by: Michal Vokáč <michal.vokac@ysoft.com> Link: https://lore.kernel.org/r/20241017150812.3563629-1-michal.vokac@ysoft.com Signed-off-by: Lee Jones <lee@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A null pointer dereference occurs due to a race between the smmu
driver probe and the client driver probe, when of_dma_configure()
for the client is called after the iommu_device_register() for the smmu
driver probe has executed but before the driver_bound() for the smmu
driver has been called.
ARM_LPAE_LVL_IDX() takes into account concatenated PGDs and can return
an index spanning multiple page-table pages given a sufficiently large
input address. However, when the resulting index is used to calculate
the number of remaining entries in the page, the possibility of
concatenation is ignored and we end up computing a negative upper bound:
On the map path, this results in a negative 'mapped' value being
returned but on the unmap path we can leak child tables if they are
skipped in __arm_lpae_free_pgtable().
Introduce an arm_lpae_max_entries() helper to convert a table index into
the remaining number of entries within a single page-table page.
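A sketch of such a helper, assuming ARM_LPAE_PTES_PER_TABLE() gives the
number of PTEs in a single page-table page:

  static inline int arm_lpae_max_entries(int i, struct arm_lpae_io_pgtable *data)
  {
          int ptes_per_table = ARM_LPAE_PTES_PER_TABLE(data);

          /* Entries left in the current page, even when 'i' spans
           * multiple concatenated page-table pages. */
          return ptes_per_table - (i & (ptes_per_table - 1));
  }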
Patch series "zram: IDLE flag handling fixes", v2.
zram can wrongly preserve ZRAM_IDLE flag on its entries which can result
in premature post-processing (writeback and recompression) of such
entries.
This patch (of 2):
Recompression should clear ZRAM_IDLE flag on the entries it has accessed,
because otherwise some entries, specifically those for which recompression
has failed, become immediate candidate entries for another post-processing
(e.g. writeback).
Consider the following case:
- recompression marks entries IDLE every 4 hours and attempts
to recompress them
- some entries are incompressible, so we keep them intact and
hence preserve IDLE flag
- writeback marks entries IDLE every 8 hours and writebacks
IDLE entries, however we have IDLE entries left from
recompression, so writeback prematurely writebacks those
entries.
Since 5.14-rc1, NUMA events will only be folded from per-CPU statistics to
per zone and global statistics when the user actually needs it.
Currently, the kernel performs the fold operation when reading
/proc/vmstat, but does not perform the fold operation in /proc/zoneinfo.
This can lead to inaccuracies in the following statistics in zoneinfo:
- numa_hit
- numa_miss
- numa_foreign
- numa_interleave
- numa_local
- numa_other
Therefore, before printing per-zone vm_numa_event when reading
/proc/zoneinfo, we should also perform the fold operation.
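Conceptually, the fix is a single call before the per-zone NUMA event
counters are printed, along the lines of (a sketch, assuming
fold_vm_numa_events() is the helper that performs the fold for the
/proc/vmstat path):

  #ifdef CONFIG_NUMA
          fold_vm_numa_events();
  #endif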
Link: https://lkml.kernel.org/r/1730433998-10461-1-git-send-email-mengensun@tencent.com Fixes: f19298b9516c ("mm/vmstat: convert NUMA statistics to basic NUMA counters") Signed-off-by: MengEn Sun <mengensun@tencent.com> Reviewed-by: JinLiang Zheng <alexjlzheng@tencent.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The current mod command causes a null pointer dereference. While commit 0f17976568b3f ("ftrace: Fix regression with module command in stack_trace_filter")
has addressed part of the issue, it left a corner case unhandled, which still
results in a kernel crash.
Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/20241120052750.275463-1-guoweikang.kernel@gmail.com Fixes: 04ec7bb642b77 ("tracing: Have the trace_array hold the list of registered func probes"); Signed-off-by: guoweikang <guoweikang.kernel@gmail.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently, when storing NULL on mas_store_root(), the behavior could be
improved.
Storing NULLs over the entire tree may result in a node being used to
store a single range. Further stores of NULL may cause the node and
tree to be corrupt and cause incorrect behaviour. Fixing the store to
the root null fixes the issue by ensuring that a range of 0 - ULONG_MAX
results in an empty tree.
Users of the tree may experience incorrect values returned if the tree
was expanded to store values, then overwritten by all NULLS, then
continued to store NULLs over the empty area.
For example, possible cases are:
* storing NULL at any range results in a new node
* storing NULL at range [m, n] where m > 0 to a single entry tree results
  in a new node with range [m, n] set to NULL
* storing NULL at range [m, n] where m > 0 to an empty tree results in
  consecutive NULL slots
* it allows for multiple NULL entries by expanding the root
  to store NULLs to an empty tree
This patch tries to improve on this by:
* being more memory efficient by setting an empty tree instead of using a node
* removing the possibility of consecutive NULL slots, which would prohibit
  extended nulls in later operations
Link: https://lkml.kernel.org/r/20241031231627.14316-5-richard.weiyang@gmail.com Fixes: 54a611b60590 ("Maple Tree: add new data structure") Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add a check to the ovl_dentry_weird() function to prevent the
processing of directory inodes that lack the lookup function.
This is important because such inodes can cause errors in overlayfs
when passed to the lowerstack.
In kunit_debugfs_create_suite(), if alloc_string_stream() fails in the
kunit_suite_for_each_test_case() loop, the "suite->log = stream"
assignment has already happened, and the error path only frees the
suite->log stream memory without setting it to NULL, so the later
string_stream_clear() of suite->log in kunit_init_suite() will cause the
UAF bug below.
Since the new fgraph requires fgraph_ops.ops.func_hash to be initialized
before calling register_ftrace_graph(), initialize it with the default
(tracing all functions) parameter.
Cc: stable@vger.kernel.org Fixes: 5fccc7552ccb ("ftrace: Add subops logic to allow one ops to manage many") Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Per UVC 1.1+ specification 3.7.2, units and terminals must have a non-zero
unique ID.
```
Each Unit and Terminal within the video function is assigned a unique
identification number, the Unit ID (UID) or Terminal ID (TID), contained in
the bUnitID or bTerminalID field of the descriptor. The value 0x00 is
reserved for undefined ID,
```
So, deny allocating an entity with ID 0 or an ID that belongs to a unit
that is already added to the list of entities.
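A sketch of the kind of check this implies in the entity allocation path
(uvc_entity_by_id() is the existing lookup helper; the surrounding function
and error style are paraphrased):

  if (id == 0 || uvc_entity_by_id(dev, id)) {
          dev_err(&dev->udev->dev,
                  "Rejecting entity with invalid or duplicate ID %u\n", id);
          return ERR_PTR(-EINVAL);
  }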
This also prevents some syzkaller reproducers from triggering warnings due
to a chain of entities referring to themselves. In one particular case, an
Output Unit is connected to an Input Unit, both with the same ID of 1. But
when looking up for the source ID of the Output Unit, that same entity is
found instead of the input entity, which leads to such warnings.
In another case, a backward chain was considered finished as the source ID
was 0. Later on, that entity was found, but its pads were not valid.
Here is a sample stack trace for one of those cases.
uvc_unregister_video() can be called asynchronously from
uvc_disconnect(). If the device is still streaming when that happens, a
plethora of race conditions can occur.
Make sure that the device has stopped streaming before exiting this
function.
If the user still holds handles to the driver's file descriptors, any
ioctl will return -ENODEV from the v4l2 core.
This change makes uvc more consistent with the rest of the v4l2 drivers
using the vb2_fop_* and vb2_ioctl_* helpers.
This driver (and many other usb drivers) always had this problem, but it
wasn't possible to easily fix this until the vb2_video_unregister_device()
helper was added. So the Fixes tag points to the creation of that helper.
It is necessary to account for I2C quirks in the burst mode path of this
driver. Not all I2C controllers can accept arbitrarily long writes and this
is represented in the quirks field of the adapter structure.
Prior to this patch the following error message is seen on a Qualcomm
X1E80100 CRD.
[ 38.773524] i2c i2c-2: adapter quirk: msg too long (addr 0x0036, size 290, write)
[ 38.781454] ov08x40 2-0036: Failed regs transferred: -95
[ 38.787076] ov08x40 2-0036: ov08x40_start_streaming failed to set regs
Fix the error by breaking up the write sequence into the advertised maximum
write size of the quirks field if the quirks field is populated.
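The shape of the fix is roughly the sketch below: cap each burst at the
adapter's advertised max_write_len when the quirks are populated (the
chunked write helper is named for illustration only):

  const struct i2c_adapter_quirks *quirks = client->adapter->quirks;
  size_t max_burst = (quirks && quirks->max_write_len) ?
                     quirks->max_write_len : len;

  while (len) {
          size_t chunk = min(len, max_burst);

          ret = ov08x40_write_burst(client, addr, data, chunk); /* illustrative */
          if (ret)
                  return ret;

          addr += chunk;
          data += chunk;
          len -= chunk;
  }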
In set_frame_rate(), select a rate in rate_0 or rate_1 by checking
sd->frame_rate >= r->fps in a loop, but the loop condition terminates when
the index reaches zero, which fails to check the last element in rate_0 or
rate_1.
Check for >= 0 so that the last one in rate_0 or rate_1 is also checked.
Fixes: 189d92af707e ("V4L/DVB (13422): gspca - ov534: ov772x changes from Richard Kaswy.") Cc: stable@vger.kernel.org Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It is not valid to call pm_runtime_set_suspended() for devices
with runtime PM enabled because it returns -EAGAIN if it is enabled
already and working. So, call pm_runtime_disable() before to fix it.
Cc: stable@vger.kernel.org Fixes: af2c3834c8ca ("[media] media: venus: adding core part and helper functions") Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Reviewed-by: Bryan O'Donoghue <bryan.odonoghue@linaro.org> Acked-by: Stanimir Varbanov <stanimir.k.varbanov@gmail.com> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It is not valid to call pm_runtime_set_suspended() for devices
with runtime PM enabled because it returns -EAGAIN if it is enabled
already and working. So, call pm_runtime_disable() before to fix it.
The DP83869 PHY transceiver supports converting from RGMII to 1000base-x.
In this operation mode, autonegotiation can be performed, as described in
IEEE802.3.
The DP83869 has a set of fiber-specific registers located at offset 0xc00.
When the transceiver is configured in RGMII-to-1000base-x mode, these
registers are mapped onto offset 0, which should make reading the
autonegotiation status transparent.
However, the fiber registers at offset 0xc04 and 0xc05 follow the bit
layout specified in Clause 37, and genphy_read_status() assumes a Clause 22
layout. Thus, genphy_read_status() doesn't properly read the capabilities
advertised by the link partner, resulting in incorrect link parameters.
In fimc_md_is_isp_available(), of_get_child_by_name() is called to check
if FIMC-IS is available. Current code does not decrement the refcount of
the returned device node, which causes an OF node reference leak. Fix it
by calling of_node_put() at the end of the variable scope.
Signed-off-by: Joe Hattori <joe@pf.is.s.u-tokyo.ac.jp> Fixes: e781bbe3fecf ("[media] exynos4-is: Add fimc-is subdevs registration") Cc: stable@vger.kernel.org Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
[hverkuil: added CC to stable] Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The cause of the problem is that when using sysfs to dynamically register
an i2c device, there is no platform data, but the probe process of ts2020
needs to use platform data, resulting in a null pointer being accessed.
Solve this problem by adding a check for the platform data.
The destination buffer flags are assigned twice but source is not set in
what looks like a copy+paste mistake. Assign the source queue flags so
the 32-bit DMA limitation is handled consistently.
Fixes: ec9ef8dda2a2 ("media: rockchip: rga: set dma mask to 32 bits") Cc: <stable@vger.kernel.org> Signed-off-by: John Keeping <jkeeping@inmusicbrands.com> Reviewed-by: Michael Tretter <m.tretter@pengutronix.de> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The power suppliers are always requested to suspend asynchronously, and
dev_pm_domain_detach() requires the caller to ensure proper
synchronization of this function with power management callbacks;
otherwise the detach may lead to a kernel panic, like below:
If an error occurs in the probe() function, we should remove the polling
timer that was armed earlier, otherwise the timer is called with
arguments that are already freed, which results in a crash.
Fixes: 4e66a52a2e4c ("[media] tc358743: Add support for platforms without IRQ line") Signed-off-by: Alexander Shiyan <eagle.alexander923@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It is not valid to call pm_runtime_set_suspended() and
pm_runtime_set_active() for devices with runtime PM enabled because it
returns -EAGAIN if it is enabled already and working. So, adjust the
order to fix it.
Cc: stable@vger.kernel.org Fixes: 5f9a089b6de3 ("dw9768: Enable low-power probe on ACPI") Suggested-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The workqueue should be destroyed in mtk_jpeg_core.c since commit 09aea13ecf6f ("media: mtk-jpeg: refactor some variables"), otherwise
the below calltrace can be easily triggered.
The video drvdata should be set before the video device is registered,
otherwise video_drvdata() may return NULL in the open() file ops, and lead
to an oops.
Fixes: 2db16c6ed72c ("media: imx-jpeg: Add V4L2 driver for i.MX8 JPEG Encoder/Decoder") Cc: <stable@vger.kernel.org> Signed-off-by: Ming Qian <ming.qian@nxp.com> Reviewed-by: TaoJiang <tao.jiang_2@nxp.com> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The video drvdata should be set before the video device is registered,
otherwise video_drvdata() may return NULL in the open() file ops, and lead
to an oops.
There is a chance to meet runtime issues during configuration of CAMSS
power domains, because on the error path dev_pm_domain_detach() is
unexpectedly called with NULL or error pointer.
One of the simplest ways to reproduce the problem is to probe CAMSS
driver before registration of CAMSS power domains, for instance if
a platform CAMCC driver is simply not built.
Warning backtrace example:
Unable to handle kernel NULL pointer dereference at virtual address 00000000000001a2
<snip>
pc : dev_pm_domain_detach+0x8/0x48
lr : camss_probe+0x374/0x9c0
The reset line of the IT6505 bridge chip is active low, not active high.
It was incorrectly inverted in the device tree as the implementation at
the time incorrectly inverted the polarity in its driver, due to a prior
device having an inline inverting level shifter.
Fix the polarity now while the external display pipeline is incomplete,
thereby avoiding any impact to running systems.
A matching fix for the driver should be included if this change is
backported.
Fixes: 8855d01fb81f ("arm64: dts: mediatek: Add MT8186 Krabby platform based Tentacruel / Tentacool") Cc: stable@vger.kernel.org Signed-off-by: Chen-Yu Tsai <wenst@chromium.org> Link: https://lore.kernel.org/r/20241029100226.660263-1-wenst@chromium.org Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The GPU SRAM supply is supposed to be always at least 0.1V higher than
the GPU supply. However when the DT was upstreamed, the spread was
incorrectly set to 0.01V.
Fixes: 8855d01fb81f ("arm64: dts: mediatek: Add MT8186 Krabby platform based Tentacruel / Tentacool") Cc: stable@vger.kernel.org Signed-off-by: Chen-Yu Tsai <wenst@chromium.org> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Link: https://lore.kernel.org/r/20241021140537.3049232-1-wenst@chromium.org Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The way the InvenSense MPU-6050 accelerometer is mounted on the
user-facing side of the Pine64 PinePhone mainboard, rotated 90 degrees
counter-clockwise, [1] requires the accelerometer's x- and y-axes to be
swapped and the direction of its y-axis to be inverted.
Rectify this by adding a mount-matrix to the accelerometer definition in the
Pine64 PinePhone dtsi file.
One customer reports a bug: raid5 is hung when changing thread cnt
while resync is running. The stripes are all in conf->handle_list
and new threads can't handle them.
Commit b39f35ebe86d ("md: don't quiesce in mddev_suspend()") removed
pers->quiesce from mddev_suspend/resume. Before that commit,
mddev_suspend waited for all io, including sync io, to finish. Now it
only waits for normal io.
Fix this by calling raid5_quiesce() directly from
raid5_store_group_thread_cnt() to wait for all sync requests to finish
before changing the group cnt.
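A minimal sketch of the fix in raid5_store_group_thread_cnt() (the
helper name below is hypothetical):

  raid5_quiesce(mddev, true);   /* waits for in-flight sync requests */
  err = update_group_thread_cnt(conf, new_cnt);  /* hypothetical helper */
  raid5_quiesce(mddev, false);  /* resume */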
Fixes: b39f35ebe86d ("md: don't quiesce in mddev_suspend()") Cc: stable@vger.kernel.org Signed-off-by: Xiao Ni <xni@redhat.com> Reviewed-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20241106095124.74577-1-xni@redhat.com Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jordan reported that when running Cilium with netkit in per-endpoint-routes
mode, network policy misclassifies traffic. In this direct routing mode
of Cilium which is used in case of GKE/EKS/AKS, the Pod's BPF program to
enforce policy sits on the netkit primary device's egress side.
The issue here is that netkit's netkit_prep_forward() clears metadata
such as skb->mark and skb->priority before executing the BPF program.
Thus, identity data stored in there from earlier BPF programs (e.g. from
tcx ingress on the physical device) gets cleared instead of being made
available for the primary's program to process. While this might be
desired for traffic egressing the Pod via the peer device, it is
different for the primary one, where, compared to tcx egress on the host
veth, this information would be available.
To address this, add a new parameter for the device orchestration to
allow control of skb->mark and skb->priority scrubbing, to make the two
accessible from BPF (and eventually leave it up to the program to scrub).
By default, the current behavior is retained. For netkit peer this also
enables the use case where applications could cooperate/signal intent to
the BPF program.
Note that struct netkit has a 4 byte hole between policy and bundle which
is used here, in other words, struct netkit's first cacheline content used
in fast-path does not get moved around.
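A rough sketch of the idea (names are made up for illustration, not the
actual uAPI): scrubbing of skb->mark and skb->priority becomes a
per-device policy instead of being hard-coded.

  static void netkit_prep_forward(struct sk_buff *skb, bool scrub_meta)
  {
          skb_scrub_packet(skb, false);
          if (scrub_meta) {          /* default keeps today's behaviour */
                  skb->mark = 0;
                  skb->priority = 0;
          }
          skb_reset_mac_header(skb);
  }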
Fixes: 35dfaad7188c ("netkit, bpf: Add bpf programmable net device") Reported-by: Jordan Rife <jrife@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Nikolay Aleksandrov <razor@blackwall.org> Link: https://github.com/cilium/cilium/issues/34042 Acked-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://lore.kernel.org/r/20241004101335.117711-1-daniel@iogearbox.net Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
At btrfs_ref_tree_mod(), after we successfully insert the new ref entry
(local variable 'ref') into the respective block entry's rbtree (local
variable 'be'), if we find an unexpected action of BTRFS_DROP_DELAYED_REF,
we error out and free the ref entry without removing it from the block
entry's rbtree. Then, in the error path of btrfs_ref_tree_mod(), we call
btrfs_free_ref_cache(), which iterates over all block entries and calls
free_block_entry() for each one. There we trigger a use-after-free when
processing the block entry to whose rbtree we added the freed ref entry,
since the rbtree still points to the freed ref entry, as we didn't remove
it from the rbtree before freeing it in the error path at
btrfs_ref_tree_mod(). Fix this by removing the new ref entry from the
rbtree before freeing it.
Syzbot reported this with the following stack traces:
The buggy address belongs to the object at ffff888042d1af00
which belongs to the cache kmalloc-64 of size 64
The buggy address is located 56 bytes inside of
freed 64-byte region [ffff888042d1af00, ffff888042d1af40)
Memory state around the buggy address:
 ffff888042d1ae00: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
 ffff888042d1ae80: 00 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc
>ffff888042d1af00: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
                                        ^
 ffff888042d1af80: 00 00 00 00 00 00 fc fc fc fc fc fc fc fc fc fc
 ffff888042d1b000: 00 00 00 00 00 fc fc 00 00 00 00 00 fc fc 00 00
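A minimal sketch of the fix in btrfs_ref_tree_mod()'s error branch
(structure member names as in the description above, details
illustrative):

  if (action == BTRFS_DROP_DELAYED_REF) {
          /* unlink the just-inserted entry so btrfs_free_ref_cache()
           * cannot later walk into freed memory */
          rb_erase(&ref->node, &be->refs);
          kfree(ref);
          ret = -EINVAL;   /* illustrative error handling */
          goto out_unlock;
  }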
Reported-by: syzbot+7325f164162e200000c1@syzkaller.appspotmail.com Link: https://lore.kernel.org/linux-btrfs/673723eb.050a0220.1324f8.00a8.GAE@google.com/T/#u Fixes: fd708b81d972 ("Btrfs: add a extent ref verify tool") CC: stable@vger.kernel.org # 4.19+ Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Syzbot reports a null-ptr-deref in btrfs_search_slot().
The reproducer uses rescue=ibadroots, and the extent tree root is
corrupted, thus the extent tree is NULL.
When scrub tries to search the extent tree to gather the needed extent
info, btrfs_search_slot() doesn't check whether the target root is NULL,
resulting in the null-ptr-deref.
Add a sanity check for the btrfs root before using it in
btrfs_search_slot().
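One possible shape of the sanity check, as a sketch at the top of
btrfs_search_slot() (the exact placement and error handling in the real
patch may differ):

  /* With rescue=ibadroots a corrupted tree root may be NULL; bail out
   * instead of dereferencing it. */
  if (unlikely(!root)) {
          ASSERT(0);
          return -EINVAL;
  }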
When checking for delayed refs when verifying if there are cross
references for a data extent, we stop if the path has nowait set and we
cannot trylock the delayed ref head's mutex, returning -EAGAIN with the
goal of making the write fall back to a blocking context. However we ignore
the -EAGAIN at btrfs_cross_ref_exist() when check_delayed_ref() returns
it, and keep looping instead of immediately returning the -EAGAIN to the
caller.
Fix this by not looping if we get -EAGAIN and we have a nowait path.
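A minimal sketch of the change in btrfs_cross_ref_exist()'s retry loop
(helper arguments abbreviated, details illustrative):

  do {
          ret = check_committed_ref(root, path, objectid, offset, bytenr);
          if (ret && ret != -ENOENT)
                  goto out;

          ret = check_delayed_ref(root, path, objectid, offset, bytenr);
          /* in nowait mode, return -EAGAIN to the caller instead of
           * looping, so the write can fall back to a blocking context */
  } while (ret == -EAGAIN && !path->nowait);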
Fixes: 26ce91144631 ("btrfs: make can_nocow_extent nowait compatible") CC: stable@vger.kernel.org # 6.1+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
The buggy address belongs to the object at ffff888106a83f00
which belongs to the cache kmalloc-rnd-07-96 of size 96
The buggy address is located 24 bytes inside of
freed 96-byte region [ffff888106a83f00, ffff888106a83f60)
Memory state around the buggy address:
 ffff888106a83e00: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
 ffff888106a83e80: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
>ffff888106a83f00: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
                            ^
 ffff888106a83f80: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
 ffff888106a84000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
==================================================================
Further analysis of the trace and the crash dump's vmcore file shows
that btrfs_encoded_read_endio() calls wake_up() on the wait_queue that is
in the private data passed to the end_io handler.
Commit 4ff47df40447 ("btrfs: move priv off stack in
btrfs_encoded_read_regular_fill_pages()") moved 'struct
btrfs_encoded_read_private' off the stack.
Before that commit one can see a corruption of the private data when
analyzing the vmcore after a crash:
Here we can see several indicators of in-memory data corruption, e.g. the
large negative atomic values of ->pending or
->wait->lock->rlock->raw_lock->val, as well as the bogus spinlock magic
0x1ff7ae32 (decimal 536325682 above) instead of 0xdead4ead or the bogus
pointer values for ->wait->head.
To fix the corruption, change atomic_dec_return() to
atomic_dec_and_test(): atomic_dec_return() is defined as two instructions
on x86_64, whereas atomic_dec_and_test() is defined as a single atomic
operation. The former can lead to a situation where the counter value is
already decremented but the if statement in btrfs_encoded_read_endio() is
not completely processed, i.e. the 0 test has not completed. If another
thread continues executing btrfs_encoded_read_regular_fill_pages(), the
atomic_dec_return() there can see an already updated ->pending counter
and continues by freeing the private data. Continuing in the endio
handler, the test for 0 then succeeds and the wait_queue is woken up,
resulting in a use-after-free.
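A minimal sketch of the fixed endio handler (field names follow the
description above, error handling simplified):

  static void btrfs_encoded_read_endio(struct btrfs_bio *bbio)
  {
          struct btrfs_encoded_read_private *priv = bbio->private;

          /* decrement and test in one atomic step, so the waiter cannot
           * observe pending == 0 and free 'priv' while we still use it */
          if (atomic_dec_and_test(&priv->pending))
                  wake_up(&priv->wait);
          bio_put(&bbio->bio);
  }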
Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com> Suggested-by: Damien Le Moal <Damien.LeMoal@wdc.com> Fixes: 1881fba89bd5 ("btrfs: add BTRFS_IOC_ENCODED_READ ioctl") CC: stable@vger.kernel.org # 6.1+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
Change btrfs_encoded_read_regular_fill_pages() so that the priv struct
is allocated rather than stored on the stack, in preparation for adding
an asynchronous mode to the function.
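A minimal sketch of the conversion (simplified):

  struct btrfs_encoded_read_private *priv;

  priv = kmalloc(sizeof(*priv), GFP_NOFS);
  if (!priv)
          return -ENOMEM;

  init_waitqueue_head(&priv->wait);
  atomic_set(&priv->pending, 1);
  /* ... submit the bios and wait for priv->pending to drop to zero ... */
  kfree(priv);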
Signed-off-by: Mark Harmstone <maharmstone@fb.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Stable-dep-of: 05b36b04d74a ("btrfs: fix use-after-free in btrfs_encoded_read_endio()") Signed-off-by: Sasha Levin <sashal@kernel.org>
Change the behaviour of btrfs_encoded_read() so that if it needs to read
an extent from disk, it leaves the extent and inode locked and returns
-EIOCBQUEUED. The caller is then responsible for doing the I/O via
btrfs_encoded_read_regular() and unlocking the extent and inode.
Signed-off-by: Mark Harmstone <maharmstone@fb.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Stable-dep-of: 05b36b04d74a ("btrfs: fix use-after-free in btrfs_encoded_read_endio()") Signed-off-by: Sasha Levin <sashal@kernel.org>
The file_offset parameter used to be passed to the encoded read struct
but was removed in commit b665affe93d8 ("btrfs: remove unused members
from struct btrfs_encoded_read_private").
Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
Stable-dep-of: 05b36b04d74a ("btrfs: fix use-after-free in btrfs_encoded_read_endio()") Signed-off-by: Sasha Levin <sashal@kernel.org>
Since we currently don't always flush the quota_release_work queue in
this path, we can end up with the following race:
1. dquot are added to releasing_dquots list during regular operations.
2. FS Freeze starts, however, this does not flush the quota_release_work queue.
3. Freeze completes.
4. The kernel eventually tries to flush the workqueue while the FS is
frozen, which hits a WARN_ON since a transaction gets started during the
frozen state.
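A minimal sketch of the idea, assuming quota_release_work is the
deferred work item named above (whether it is a plain or a delayed work
item is an implementation detail of fs/quota):

  /* run from the quota writeback / pre-freeze sync path, so no release
   * work is left pending while the filesystem is frozen */
  flush_delayed_work(&quota_release_work); /* or flush_work() for a plain work item */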
Compat features are new features that older kernels can safely ignore,
allowing read-write mounts without issues. The current sb write validation
implementation returns -EFSCORRUPTED for unknown compat features,
preventing filesystem write operations and contradicting the feature's
definition.
Additionally, if the mounted image is unclean, the log recovery may need
to write to the superblock. Returning an error for unknown compat features
during sb write validation can cause mount failures.
Although XFS currently does not use compat feature flags, this issue
affects current kernels' ability to mount images that may use compat
feature flags in the future.
Since superblock read validation already warns about unknown compat
features, it's unnecessary to repeat this warning during write validation.
Therefore, the relevant code in write validation is being removed.
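A sketch of the check being removed from the superblock write verifier
(reconstructed from the description; the exact code in
xfs_validate_sb_write() may differ):

  /* Removed: unknown compat features are, by definition, safe to ignore,
   * so they must not fail a read-write mount or log recovery. */
  if (xfs_sb_has_compat_feature(sbp, XFS_SB_FEAT_COMPAT_UNKNOWN)) {
          xfs_warn(mp,
  "Superblock has unknown compatible features (0x%x)!",
                  (sbp->sb_features_compat & XFS_SB_FEAT_COMPAT_UNKNOWN));
          return -EFSCORRUPTED;
  }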
Fixes: 9e037cb7972f ("xfs: check for unknown v5 feature bits in superblock write verifier") Cc: stable@vger.kernel.org # v4.19+ Signed-off-by: Long Li <leo.lilong@huawei.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>