Commit 1e77d0a1ed74 ("genirq: Sanitize spurious interrupt detection of
threaded irqs") made detection of spurious interrupts work for threaded
handlers by:
a) incrementing a counter every time the thread returns IRQ_HANDLED, and
b) checking whether that counter has increased every time the thread is
woken.
However for oneshot interrupts, the commit unmasks the interrupt before
incrementing the counter. If another interrupt occurs right after
unmasking but before the counter is incremented, that interrupt is
incorrectly considered spurious:
time
| irq_thread()
| irq_thread_fn()
| action->thread_fn()
| irq_finalize_oneshot()
| unmask_threaded_irq() /* interrupt is unmasked */
|
| /* interrupt fires, incorrectly deemed spurious */
|
| atomic_inc(&desc->threads_handled); /* counter is incremented */
v
This is observed with a hi3110 CAN controller receiving data at high volume
(from a separate machine sending with "cangen -g 0 -i -x"): The controller
signals a huge number of interrupts (hundreds of millions per day) and
every second there are about a dozen which are deemed spurious.
In theory with high CPU load and the presence of higher priority tasks, the
number of incorrectly detected spurious interrupts might increase beyond
the 99,900 threshold and cause disablement of the interrupt.
In practice it just increments the spurious interrupt count. But that can
cause people to waste time investigating it over and over.
Fix it by moving the accounting before the invocation of
irq_finalize_oneshot().
log_buf_len_setup does not check input argument before passing it to
simple_strtoull. The argument would be a NULL pointer if "log_buf_len",
without its value, is set in command line and thus causes the following
panic.
Some servers (e.g. Azure) do not include a spnego blob in the SMB3
negotiate protocol response, so on kerberos mounts ("sec=krb5")
we can fail, as we expected the server to list its supported
auth types (OIDs in the spnego blob in the negprot response).
Change this so that on krb5 mounts we default to trying krb5 if the
server doesn't list its supported protocol mechanisms.
Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> CC: Stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If backupuid mount option is sent, we can incorrectly retry
(on access denied on query info) with a cifs (FindFirst) operation
on an smb3 mount which causes the server to force the session close.
We set backup intent on open so no need for this fallback.
See kernel bugzilla 201435
Signed-off-by: Steve French <stfrench@microsoft.com> CC: Stable <stable@vger.kernel.org> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When channels are registered, the hardware channel number is not the
actual iio channel number.
This is because the driver is probed with a certain number of accessible
channels. Some pins are routed and some not, depending on the description of
the board in the DT.
Because of that, channels 0,1,2,3 can correspond to hardware channels
2,3,4,5 for example.
In the buffered triggered case, we need to do the translation accordingly.
Fixed the channel number to stop reading the wrong channel.
When doing simple conversions, the driver did not acknowledge the DRDY irq.
If this irq status is not acked, it will be left pending, and as soon as a
trigger is enabled, the irq handler will be called, it doesn't know why
this status has occurred because no channel is pending, and then it will go
int a irq loop and board will hang.
To avoid this situation, read the LCDR after a raw conversion is done.
The correct way to handle errors returned by regualtor_get() and friends is
to propagate the error since that means that an regulator was specified,
but something went wrong when requesting it.
For handling optional regulators, e.g. when the device has an internal
vref, regulator_get_optional() should be used to avoid getting the dummy
regulator that the regulator core otherwise provides.
Building any configuration with 'make W=1' produces a warning:
kernel/bounds.c:16:6: warning: no previous prototype for 'foo' [-Wmissing-prototypes]
When also passing -Werror, this prevents us from building any other files.
Nobody ever calls the function, but we can't make it 'static' either
since we want the compiler output.
Calling it 'main' instead however avoids the warning, because gcc
does not insist on having a declaration for main.
Some test systems were experiencing negative huge page reserve counts and
incorrect file block counts. This was traced to /proc/sys/vm/drop_caches
removing clean pages from hugetlbfs file pagecaches. When non-hugetlbfs
explicit code removes the pages, the appropriate accounting is not
performed.
This can be recreated as follows:
fallocate -l 2M /dev/hugepages/foo
echo 1 > /proc/sys/vm/drop_caches
fallocate -l 2M /dev/hugepages/foo
grep -i huge /proc/meminfo
AnonHugePages: 0 kB
ShmemHugePages: 0 kB
HugePages_Total: 2048
HugePages_Free: 2047
HugePages_Rsvd: 18446744073709551615
HugePages_Surp: 0
Hugepagesize: 2048 kB
Hugetlb: 4194304 kB
ls -lsh /dev/hugepages/foo
4.0M -rw-r--r--. 1 root root 2.0M Oct 17 20:05 /dev/hugepages/foo
To address this issue, dirty pages as they are added to pagecache. This
can easily be reproduced with fallocate as shown above. Read faulted
pages will eventually end up being marked dirty. But there is a window
where they are clean and could be impacted by code such as drop_caches.
So, just dirty them all as they are added to the pagecache.
Link: http://lkml.kernel.org/r/b5be45b8-5afe-56cd-9482-28384699a049@oracle.com Fixes: 6bda666a03f0 ("hugepages: fold find_or_alloc_pages into huge_no_page()") Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Acked-by: Mihcla Hocko <mhocko@suse.com> Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com> Cc: Hugh Dickins <hughd@google.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
ghash is a keyed hash algorithm, thus setkey needs to be called.
Otherwise the following error occurs:
$ modprobe tcrypt mode=318 sec=1
testing speed of async ghash-generic (ghash-generic)
tcrypt: test 0 ( 16 byte blocks, 16 bytes per update, 1 updates):
tcrypt: hashing failed ret=-126
When the LRW block counter overflows, the current implementation returns
128 as the index to the precomputed multiplication table, which has 128
entries. This patch fixes it to return the correct value (127).
Fixes: 64470f1b8510 ("[CRYPTO] lrw: Liskov Rivest Wagner, a tweakable narrow block cipher mode") Cc: <stable@vger.kernel.org> # 2.6.20+ Reported-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The genweq_add_file and genwqe_del_file by caching current without
using reference counting embed the assumption that a file descriptor
will never be passed from one process to another. It even embeds the
assumption that the the thread that opened the file will be in
existence when the process terminates. Neither of which are
guaranteed to be true.
Therefore replace caching the task_struct of the opener with
pid of the openers thread group id. All the knowledge of the
opener is used for is as the target of SIGKILL and a SIGKILL
will kill the entire process group.
Rename genwqe_force_sig to genwqe_terminate, remove it's unncessary
signal argument, update it's ownly caller, and use kill_pid
instead of force_sig.
The work force_sig does in changing signal handling state is not
relevant to SIGKILL sent as SEND_SIG_PRIV. The exact same processess
will be killed just with less work, and less confusion. The work done
by force_sig is really only needed for handling syncrhonous
exceptions.
It will still be possible to cause genwqe_device_remove to wait
8 seconds by passing a file descriptor to another process but
the possible user after free is fixed.
Fixes: eaf4722d4645 ("GenWQE Character device and DDCB queue") Cc: stable@vger.kernel.org Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Frank Haverkamp <haver@linux.vnet.ibm.com> Cc: Joerg-Stephan Vogt <jsvogt@de.ibm.com> Cc: Michael Jung <mijung@gmx.net> Cc: Michael Ruettger <michael@ibmra.de> Cc: Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com> Cc: Sebastian Ott <sebott@linux.vnet.ibm.com> Cc: Eberhard S. Amann <esa@linux.vnet.ibm.com> Cc: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com> Cc: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add Device IDs to the Intel GPU "spurious interrupt" quirk table.
For these devices, unplugging the VGA cable and plugging it in again causes
spurious interrupts from the IGD. Linux eventually disables the interrupt,
but of course that disables any other devices sharing the interrupt.
The theory is that this is a VGA BIOS defect: it should have disabled the
IGD interrupt but failed to do so.
See f67fd55fa96f ("PCI: Add quirk for still enabled interrupts on Intel
Sandy Bridge GPUs") and 7c82126a94e6 ("PCI: Add new ID for Intel GPU
"spurious interrupt" quirk") for some history.
[bhelgaas: See link below for discussion about how to fix this more
generically instead of adding device IDs for every new Intel GPU. I hope
this is the last patch to add device IDs.]
The code "lchan = (lchan << 1) | ~lchan" for logical channel
intermediate decoding is wrong. The wrong intermediate decoding
result is {0xffffffff, 0xfffffffe}.
Fix it by replacing '~' with '!'. The correct intermediate
decoding result is {0x1, 0x2}.
The count of errors is picked up from bits 52:38 of the machine check
bank status register. But this is the count of *corrected* errors. If an
uncorrected error is being logged, the h/w sets this field to 0. Which
means that when edac_mc_handle_error() is called, the EDAC core will
carefully add zero to the appropriate uncorrected error counts.
uref->usage_index can be indirectly controlled by userspace, hence leading
to a potential exploitation of the Spectre variant 1 vulnerability.
This field is used as an array index by the hiddev_ioctl_usage() function,
when 'cmd' is either HIDIOCGCOLLECTIONINDEX, HIDIOCGUSAGES or
HIDIOCSUSAGES.
For cmd == HIDIOCGCOLLECTIONINDEX case, uref->usage_index is compared to
field->maxusage and then used as an index to dereference field->usage
array. The same thing happens to the cmd == HIDIOC{G,S}USAGES cases, where
uref->usage_index is checked against an array maximum value and then it is
used as an index in an array.
This is a summary of the HIDIOCGCOLLECTIONINDEX case, which matches the
traditional Spectre V1 first load:
copy_from_user(uref, user_arg, sizeof(*uref))
if (uref->usage_index >= field->maxusage)
goto inval;
i = field->usage[uref->usage_index].collection_index;
return i;
This patch fixes this by sanitizing field uref->usage_index before using it
to index field->usage (HIDIOCGCOLLECTIONINDEX) or field->value in
HIDIOC{G,S}USAGES arrays, thus, avoiding speculation in the first load.
We return most failure of dquota_initialize() except
inode evict, this could make a bit sense, for example
we allow file removal even quota files are broken?
But it dosen't make sense to allow setting project
if quota files etc are broken.
Variable retries is not initialized in ext4_da_write_inline_data_begin()
which can lead to nondeterministic number of retries in case we hit
ENOSPC. Initialize retries to zero as we do everywhere else.
The code cleaning transaction's lists of checkpoint buffers has a bug
where it increases bh refcount only after releasing
journal->j_list_lock. Thus the following race is possible:
CPU0 CPU1
jbd2_log_do_checkpoint()
jbd2_journal_try_to_free_buffers()
__journal_try_to_free_buffer(bh)
...
while (transaction->t_checkpoint_io_list)
...
if (buffer_locked(bh)) {
Unlike asynchronous initialization in the core we have not yet associated
the device with the parent, and as such the device doesn't hold a reference
to the parent.
In order to resolve that we should be holding a reference on the parent
until the asynchronous initialization has completed.
Cc: <stable@vger.kernel.org> Fixes: 4d88a97aa9e8 ("libnvdimm: ...base ... infrastructure") Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 40413955ee26 ("Cipso: cipso_v4_optptr enter infinite loop") fixed
a possible infinite loop in the IP option parsing of CIPSO. The fix
assumes that ip_options_compile filtered out all zero length options and
that no other one-byte options beside IPOPT_END and IPOPT_NOOP exist.
While this assumption currently holds true, add explicit checks for zero
length and invalid length options to be safe for the future. Even though
ip_options_compile should have validated the options, the introduction of
new one-byte options can still confuse this code without the additional
checks.
Signed-off-by: Stefan Nuernberger <snu@amazon.com> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Simon Veith <sveith@amazon.de> Cc: stable@vger.kernel.org Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The rs_rate_from_ucode_rate() function may return -EINVAL if the rate
is invalid, but none of the callsites check for the error, potentially
making us access arrays with index IWL_RATE_INVALID, which is larger
than the arrays, causing an out-of-bounds access. This will trigger
KASAN warnings, such as the one reported in the bugzilla issue
mentioned below.
This fixes https://bugzilla.kernel.org/show_bug.cgi?id=200659
Cc: stable@vger.kernel.org Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
xen_qlock_wait() isn't safe for nested calls due to interrupts. A call
of xen_qlock_kick() might be ignored in case a deeper nesting level
was active right before the call of xen_poll_irq():
CPU 1: CPU 2:
spin_lock(lock1)
spin_lock(lock1)
-> xen_qlock_wait()
-> xen_clear_irq_pending()
Interrupt happens
spin_unlock(lock1)
-> xen_qlock_kick(CPU 2)
spin_lock_irqsave(lock2)
spin_lock_irqsave(lock2)
-> xen_qlock_wait()
-> xen_clear_irq_pending()
clears kick for lock1
-> xen_poll_irq()
spin_unlock_irq_restore(lock2)
-> xen_qlock_kick(CPU 2)
wakes up
spin_unlock_irq_restore(lock2)
IRET
resumes in xen_qlock_wait()
-> xen_poll_irq()
never wakes up
The solution is to disable interrupts in xen_qlock_wait() and not to
poll for the irq in case xen_qlock_wait() is called in nmi context.
In the following situation a vcpu waiting for a lock might not be
woken up from xen_poll_irq():
CPU 1: CPU 2: CPU 3:
takes a spinlock
tries to get lock
-> xen_qlock_wait()
frees the lock
-> xen_qlock_kick(cpu2)
-> xen_clear_irq_pending()
takes lock again
tries to get lock
-> *lock = _Q_SLOW_VAL
-> *lock == _Q_SLOW_VAL ?
-> xen_poll_irq()
frees the lock
-> xen_qlock_kick(cpu3)
And cpu 2 will sleep forever.
This can be avoided easily by modifying xen_qlock_wait() to call
xen_poll_irq() only if the related irq was not pending and to call
xen_clear_irq_pending() only if it was pending.
If a block device is hot-added when we are out of grants,
gnttab_grant_foreign_access fails with -ENOSPC (log message "28
granting access to ring page") in this code path:
Functionality of the xen-tpmfront driver was lost secondary to
the introduction of xenbus multi-page support in commit ccc9d90a9a8b
("xenbus_client: Extend interface to support multi-page ring").
In this commit pointer to location of where the shared page address
is stored was being passed to the xenbus_grant_ring() function rather
then the address of the shared page itself. This resulted in a situation
where the driver would attach to the vtpm-stubdom but any attempt
to send a command to the stub domain would timeout.
A diagnostic finding for this regression is the following error
message being generated when the xen-tpmfront driver probes for a
device:
<3>vtpm vtpm-0: tpm_transmit: tpm_send: error -62
<3>vtpm vtpm-0: A TPM error (-62) occurred attempting to determine
the timeouts
This fix is relevant to all kernels from 4.1 forward which is the
release in which multi-page xenbus support was introduced.
Daniel De Graaf formulated the fix by code inspection after the
regression point was located.
Fixes: ccc9d90a9a8b ("xenbus_client: Extend interface to support multi-page ring") Signed-off-by: Dr. Greg Wettstein <greg@enjellic.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[boris: Updated commit message, added Fixes tag] Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: stable@vger.kernel.org # v4.1+ Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
xen_swiotlb_{alloc,free}_coherent() allocate/free memory based on the
order of the pages and not size argument (bytes). This is inconsistent with
range_straddles_page_boundary and memset which use the 'size' value,
which may lead to not exchanging memory with Xen (range_straddles_page_boundary()
returned true). And then the call to xen_swiotlb_free_coherent() would
actually try to exchange the memory with Xen, leading to the kernel
hitting an BUG (as the hypercall returned an error).
This patch fixes it by making the 'size' variable be of the same size
as the amount of memory allocated.
CC: stable@vger.kernel.org Signed-off-by: Joe Jin <joe.jin@oracle.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Christoph Helwig <hch@lst.de> Cc: Dongli Zhang <dongli.zhang@oracle.com> Cc: John Sobecki <john.sobecki@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
1 GHz CPU OPP is the default boot value for the Exynos5250 SOC, so mark it
as suspend OPP. This fixes suspend/resume on Samsung Exynos5250 Snow
Chomebook, which was broken since switching to generic cpufreq-dt driver
in v4.3.
The cooling device properties, like "#cooling-cells" and
"dynamic-power-coefficient", should either be present for all the CPUs
of a cluster or none. If these are present only for a subset of CPUs of
a cluster then things will start falling apart as soon as the CPUs are
brought online in a different order. For example, this will happen
because the operating system looks for such properties in the CPU node
it is trying to bring up, so that it can register a cooling device.
Add such missing properties.
Fix other missing properties (clocks, OPP, clock latency) as well to
make it all work.
The "cooling-min-level" and "cooling-max-level" properties are not
parsed by any part of the kernel currently and the max cooling state of
a CPU cooling device is found by referring to the cpufreq table instead.
Introduce a new flag, uc_buffer, to indicate that the controller
requires the non-cached pages for stream buffers, either as a
chip-specific requirement or specified via snoop=0 option.
This improves the code-readability.
Also, this patch fixes the incorrect behavior for C-Media chip where
the stream buffers were never handled as non-cached due to the check
of driver_type even if you pass snoop=0 option.
The driver calls clk_get() with the clock name set to NULL, which means
that the driver could only work when probed from devicetree. From now
on, we explicitly require the driver to be probed from devicetree.
Instead of playing whack-a-mole and changing SEND_SIG_PRIV to
SEND_SIG_FORCED throughout the kernel to ensure a pid namespace init
gets signals sent by the kernel, stop allowing a pid namespace init to
ignore SIGKILL or SIGSTOP sent by the kernel. A pid namespace init is
only supposed to be able to ignore signals sent from itself and
children with SIG_DFL.
Fixes: 921cf9f63089 ("signals: protect cinit from unblocked SIG_DFL signals") Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When running an mds diagnostic that passes frames with the switch, soft
lockups are detected. The driver is in a CQE processing loop and has
sufficient amount of traffic that it never exits the ring processing routine,
thus the "lockup".
Cap the number of elements in the work processing routine to 64 elements. This
ensures that the cpu will be given up and the handler reschedule to process
additional items.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
According to the synchronization rule for the del_timer_sync() function,
the caller must not hold locks which would prevent completion of the
timer's handler.
The timer structure has its own lock that manages its synchronization.
Setting the IOAT_CHAN_DOWN bit should prevent other CPUs from
trying to use that device anyway, there is probably no need to call
del_timer_sync() while holding the prep_lock. So the del_timer_sync()
call is now moved outside of the prep_lock critical section to prevent
the circular lock dependency.
Signed-off-by: Waiman Long <longman@redhat.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The ChipIdea IRQ is disabled before scheduling the otg work and
re-enabled on otg work completion. However if the job is already
scheduled we have to undo the effect of disable_irq int order to
balance the IRQ disable-depth value.
Fixes: be6b0c1bd0be ("usb: chipidea: using one inline function to cover queue work operations") Signed-off-by: Loic Poulain <loic.poulain@linaro.org> Signed-off-by: Peter Chen <peter.chen@nxp.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
drivers/crypto/caam/regs.h:95:1: sparse: incorrect type in return expression (different base types) @@ expected unsigned int @@ got restricted __le32unsigned int @@
drivers/crypto/caam/regs.h:95:1: expected unsigned int
drivers/crypto/caam/regs.h:95:1: got restricted __le32 [usertype] <noident>
drivers/crypto/caam/regs.h:95:1: sparse: incorrect type in return expression (different base types) @@ expected unsigned int @@ got restricted __be32unsigned int @@
drivers/crypto/caam/regs.h:95:1: expected unsigned int
drivers/crypto/caam/regs.h:95:1: got restricted __be32 [usertype] <noident>
drivers/crypto/caam/regs.h:92:1: sparse: cast to restricted __le32
drivers/crypto/caam/regs.h:92:1: sparse: cast to restricted __be32
We want to keep the WARN_ON() and stack trace so the driver can be fixed,
but we can avoid the kernel panic by returning an error. We may still get
warnings like this:
If we change the number of array's device after device is removed from array,
then add the device back to array, we can see that device is added as active
role instead of spare which we expected.
Please see the below link for details:
https://marc.info/?l=linux-raid&m=153736982015076&w=2
This is caused by that we prefer to use device's previous role which is
recorded by saved_raid_disk, but we should respect the new number of
conf->raid_disks since it could be changed after device is removed.
Reported-by: Gioh Kim <gi-oh.kim@profitbricks.com> Tested-by: Gioh Kim <gi-oh.kim@profitbricks.com> Acked-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If the starting block number of either the source or destination file
exceeds the EOF, EXT4_IOC_MOVE_EXT should return EINVAL.
Also fixed the helper function mext_check_coverage() so that if the
logical block is beyond EOF, make it return immediately, instead of
looping until the block number wraps all the away around. This takes
long enough that if there are multiple threads trying to do pound on
an the same inode doing non-sensical things, it can end up triggering
the kernel's soft lockup detector.
When adding a VMCI resource, the check for an existing entry
would ignore that the new entry could be a wildcard. This could
result in multiple resource entries that would match a given
handle. One disastrous outcome of this is that the
refcounting used to ensure that delayed callbacks for VMCI
datagrams have run before the datagram is destroyed can be
wrong, since the refcount could be increased on the duplicate
entry. This in turn leads to a use after free bug. This issue
was discovered by Hangbin Liu using KASAN and syzkaller.
For TPM 1.2 chips the system setup utility allows to set the TPM device in
one of the following states:
* Active: Security chip is functional
* Inactive: Security chip is visible, but is not functional
* Disabled: Security chip is hidden and is not functional
When choosing the "Inactive" state, the TPM 1.2 device is enumerated and
registered, but sending TPM commands fail with either TPM_DEACTIVATED or
TPM_DISABLED depending if the firmware deactivated or disabled the TPM.
Since these TPM 1.2 error codes don't have special treatment, inactivating
the TPM leads to a very noisy kernel log buffer that shows messages like
the following:
tpm_tis 00:05: 1.2 TPM (device-id 0x0, rev-id 78)
tpm tpm0: A TPM error (6) occurred attempting to read a pcr value
tpm tpm0: TPM is disabled/deactivated (0x6)
tpm tpm0: A TPM error (6) occurred attempting get random
tpm tpm0: A TPM error (6) occurred attempting to read a pcr value
ima: No TPM chip found, activating TPM-bypass! (rc=6)
tpm tpm0: A TPM error (6) occurred attempting get random
tpm tpm0: A TPM error (6) occurred attempting get random
tpm tpm0: A TPM error (6) occurred attempting get random
tpm tpm0: A TPM error (6) occurred attempting get random
Let's just suppress error log messages for the TPM_{DEACTIVATED,DISABLED}
return codes, since this is expected when the TPM 1.2 is set to Inactive.
In that case the kernel log is cleaner and less confusing for users, i.e:
IPCB should be cleared before icmp_send, since it may contain data from
previous layers and the data could be misinterpreted as ip header options,
which later caused the ihl to be set to an invalid value and resulted in
the following stack corruption:
[ 1083.031512] ib0: packet len 57824 (> 2048) too long to send, dropping
[ 1083.031843] ib0: packet len 37904 (> 2048) too long to send, dropping
[ 1083.032004] ib0: packet len 4040 (> 2048) too long to send, dropping
[ 1083.032253] ib0: packet len 63800 (> 2048) too long to send, dropping
[ 1083.032481] ib0: packet len 23960 (> 2048) too long to send, dropping
[ 1083.033149] ib0: packet len 63800 (> 2048) too long to send, dropping
[ 1083.033439] ib0: packet len 63800 (> 2048) too long to send, dropping
[ 1083.033700] ib0: packet len 63800 (> 2048) too long to send, dropping
[ 1083.034124] ib0: packet len 63800 (> 2048) too long to send, dropping
[ 1083.034387] ==================================================================
[ 1083.034602] BUG: KASAN: stack-out-of-bounds in __ip_options_echo+0xf08/0x1310
[ 1083.034798] Write of size 4 at addr ffff880353457c5f by task kworker/u16:0/7
[ 1083.034990]
[ 1083.035104] CPU: 7 PID: 7 Comm: kworker/u16:0 Tainted: G O 4.19.0-rc5+ #1
[ 1083.035316] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu2 04/01/2014
[ 1083.035573] Workqueue: ipoib_wq ipoib_cm_skb_reap [ib_ipoib]
[ 1083.035750] Call Trace:
[ 1083.035888] dump_stack+0x9a/0xeb
[ 1083.036031] print_address_description+0xe3/0x2e0
[ 1083.036213] kasan_report+0x18a/0x2e0
[ 1083.036356] ? __ip_options_echo+0xf08/0x1310
[ 1083.036522] __ip_options_echo+0xf08/0x1310
[ 1083.036688] icmp_send+0x7b9/0x1cd0
[ 1083.036843] ? icmp_route_lookup.constprop.9+0x1070/0x1070
[ 1083.037018] ? netif_schedule_queue+0x5/0x200
[ 1083.037180] ? debug_show_all_locks+0x310/0x310
[ 1083.037341] ? rcu_dynticks_curr_cpu_in_eqs+0x85/0x120
[ 1083.037519] ? debug_locks_off+0x11/0x80
[ 1083.037673] ? debug_check_no_obj_freed+0x207/0x4c6
[ 1083.037841] ? check_flags.part.27+0x450/0x450
[ 1083.037995] ? debug_check_no_obj_freed+0xc3/0x4c6
[ 1083.038169] ? debug_locks_off+0x11/0x80
[ 1083.038318] ? skb_dequeue+0x10e/0x1a0
[ 1083.038476] ? ipoib_cm_skb_reap+0x2b5/0x650 [ib_ipoib]
[ 1083.038642] ? netif_schedule_queue+0xa8/0x200
[ 1083.038820] ? ipoib_cm_skb_reap+0x544/0x650 [ib_ipoib]
[ 1083.038996] ipoib_cm_skb_reap+0x544/0x650 [ib_ipoib]
[ 1083.039174] process_one_work+0x912/0x1830
[ 1083.039336] ? wq_pool_ids_show+0x310/0x310
[ 1083.039491] ? lock_acquire+0x145/0x3a0
[ 1083.042312] worker_thread+0x87/0xbb0
[ 1083.045099] ? process_one_work+0x1830/0x1830
[ 1083.047865] kthread+0x322/0x3e0
[ 1083.050624] ? kthread_create_worker_on_cpu+0xc0/0xc0
[ 1083.053354] ret_from_fork+0x3a/0x50
For instance __ip_options_echo is failing to proceed with invalid srr and
optlen passed from another layer via IPCB
If the provider driver (such as rdma_rxe) doesn't support pma counters,
avoid exposing its directory similar to optional hw_counters directory.
If core fails to read the PMA counter, return an error so that user can
retry later if needed.
In megasas_mgmt_compat_ioctl_fw(), to handle the structure
compat_megasas_iocpacket 'cioc', a user-space structure megasas_iocpacket
'ioc' is allocated before megasas_mgmt_ioctl_fw() is invoked to handle
the packet. Since the two data structures have different fields, the data
is copied from 'cioc' to 'ioc' field by field. In the copy process,
'sense_ptr' is prepared if the field 'sense_len' is not null, because it
will be used in megasas_mgmt_ioctl_fw(). To prepare 'sense_ptr', the
user-space data 'ioc->sense_off' and 'cioc->sense_off' are copied and
saved to kernel-space variables 'local_sense_off' and 'user_sense_off'
respectively. Given that 'ioc->sense_off' is also copied from
'cioc->sense_off', 'local_sense_off' and 'user_sense_off' should have the
same value. However, 'cioc' is in the user space and a malicious user can
race to change the value of 'cioc->sense_off' after it is copied to
'ioc->sense_off' but before it is copied to 'user_sense_off'. By doing
so, the attacker can inject different values into 'local_sense_off' and
'user_sense_off'. This can cause undefined behavior in the following
execution, because the two variables are supposed to be same.
This patch enforces a check on the two kernel variables 'local_sense_off'
and 'user_sense_off' to make sure they are the same after the copy. In
case they are not, an error code EINVAL will be returned.
Signed-off-by: Wenwen Wang <wang6495@umn.edu> Acked-by: Sumit Saxena <sumit.saxena@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If a target disconnects during a PIO data transfer the command may fail
when the target reconnects:
scsi host1: DMA length is zero!
scsi host1: cur adr[04380000] len[00000000]
The scsi bus is then reset. This happens because the residual reached
zero before the transfer was completed.
The usual residual calculation relies on the Transfer Count registers.
That works for DMA transfers but not for PIO transfers. Fix the problem
by storing the PIO transfer residual and using that to correctly
calculate bytes_sent.
Fixes: 6fe07aaffbf0 ("[SCSI] m68k: new mac_esp scsi driver") Tested-by: Stan Johnson <userm57@yahoo.com> Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Tested-by: Michael Schmitz <schmitzmic@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If a cgroup has many tasks with many open file descriptors then we would
end up in a large loop without any rescheduling point throught the
operation. Add cond_resched once per task.
When running in AP mode, ath10k sometimes suffers from TX credit
starvation. The issue is hard to reproduce and shows up once in a
few days, but has been repeatedly seen with QCA9882 and a large
range of firmwares, including 10.2.4.70.67.
Once the module is in this state, TX credits are never replenished,
which results in "SWBA overrun" errors, as no beacons can be sent.
Even worse, WMI commands run in a timeout while holding the conf
mutex for three seconds each, making any further operations slow
and the whole system unresponsive.
The firmware/driver never recovers from that state automatically,
and triggering TX flush or warm restarts won't work over WMI. So
issue a hardware restart if a WMI command times out due to missing
TX credits. This implies a connectivity outage of about 1.4s in AP
mode, but brings back the interface and the whole system to a usable
state. WMI command timeouts have not been seen in absent of this
specific issue, so taking such drastic actions seems legitimate.
Signed-off-by: Martin Willi <martin@strongswan.org> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
While VF2VF with RSS communication, RSS Type were wrongly recognized
and RSS hash was not calculated as it should be. Packets was
distributed on various queues by accident.
This commit fixes that behaviour and causes proper RSS Type recognition.
Signed-off-by: Sebastian Basierski <sebastianx.basierski@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If you look at "pinconf-groups" in debugfs for ssbi-gpio you'll notice
it looks like nonsense.
The problem is fairly well described in commit 1cf86bc21257 ("pinctrl:
qcom: spmi-gpio: Fix pmic_gpio_config_get() to be compliant") and
commit 05e0c828955c ("pinctrl: msm: Fix msm_config_group_get() to be
compliant"), but it was pointed out that ssbi-gpio has the same
problem. Let's fix it there too.
Fixes: b4c45fe974bc ("pinctrl: qcom: ssbi: Family A gpio & mpp drivers") Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Stephen Boyd <sboyd@kernel.org> Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If you look at "pinconf-groups" in debugfs for ssbi-mpp you'll notice
it looks like nonsense.
The problem is fairly well described in commit 1cf86bc21257 ("pinctrl:
qcom: spmi-gpio: Fix pmic_gpio_config_get() to be compliant") and
commit 05e0c828955c ("pinctrl: msm: Fix msm_config_group_get() to be
compliant"), but it was pointed out that ssbi-mpp has the same
problem. Let's fix it there too.
NOTE: in case it's helpful to someone reading this, the way to tell
whether to do the -EINVAL or not is to look at the PCONFDUMP for a
given attribute. If the last element (has_arg) is false then you need
to do the -EINVAL trick.
ALSO NOTE: it seems unlikely that the values returned when we try to
get PIN_CONFIG_BIAS_PULL_UP will actually be printed since "has_arg"
is false for that one, but I guess it's still fine to return different
values so I kept doing that. It seems like another driver (ssbi-gpio)
uses a custom attribute (PM8XXX_QCOM_PULL_UP_STRENGTH) for something
similar so maybe a future change should do that here too.
It looks like we parse the drive strength setting here, but never
actually write it into the hardware to update it. Parse the setting and
then write it at the end of the pinconf setting function so that it
actually sticks in the hardware.
Fixes: 0e948042c420 ("pinctrl: qcom: spmi-mpp: Implement support for sink mode") Cc: Doug Anderson <dianders@chromium.org> Signed-off-by: Stephen Boyd <swboyd@chromium.org> Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bay and Cherry Trail DSTDs represent a different set of devices depending
on which OS the device think it is booting. One set of decices for Windows
and another set of devices for Android which targets the Android-x86 Linux
kernel fork (which e.g. used to have its own display driver instead of
using the i915 driver).
Which set of devices we are actually going to get is out of our control,
this is controlled by the ACPI OSID variable, which gets either set through
an EFI setup option, or sometimes is autodetected. So we need to support
both.
This commit adds support for the 80862286 and 808622C0 ACPI HIDs which we
get for the first resp. second DMA controller on Cherry Trail devices when
OSID is set to Android.
Signed-off-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We currently align the end of the compressed image to a multiple of
16. However, the PE-COFF header included in the EFI stub says that
the file alignment is 32 bytes, and when adding an EFI signature to
the file it must first be padded to this alignment.
sbsigntool commands warn about this:
warning: file-aligned section .text extends beyond end of file
warning: checksum areas are greater than image size. Invalid section table?
Worse, pesign -at least when creating a detached signature- uses the
hash of the unpadded file, resulting in an invalid signature if
padding is required.
Avoid both these problems by increasing alignment to 32 bytes when
CONFIG_EFI_STUB is enabled.
This patch adds the device ID for the AMPAK AP6335 combo module used
in the 1st generation WeTek Hub Android/LibreELEC HTPC box. The WiFI
chip identifies itself as BCM4339, while Bluetooth identifies itself
as BCM4335 (rev C0):
We can not call dev_pm_opp_of_cpumask_remove_table() freely anymore
since the latest OPP core updates as that uses reference counting to
free resources. There are cases where no static OPPs are added (using
DT) for a platform and trying to remove the OPP table may end up
decrementing refcount which is already zero and hence generating
warnings.
Lets track if we were able to add static OPPs or not and then only
remove the table based on that. Some reshuffling of code is also done to
do that.
On OLPC XO-1, the RTC is discovered via device tree from the arch
initcall. Don't let the PC platform register another one from its device
initcall, it's not going to work:
sysfs: cannot create duplicate filename '/devices/platform/rtc_cmos'
CPU: 0 PID: 1 Comm: swapper Not tainted 4.19.0-rc6 #12
Hardware name: OLPC XO/XO, BIOS OLPC Ver 1.00.01 06/11/2014
Call Trace:
dump_stack+0x16/0x18
sysfs_warn_dup+0x46/0x58
sysfs_create_dir_ns+0x76/0x9b
kobject_add_internal+0xed/0x209
? __schedule+0x3fa/0x447
kobject_add+0x5b/0x66
device_add+0x298/0x535
? insert_resource_conflict+0x2a/0x3e
platform_device_add+0x14d/0x192
? io_delay_init+0x19/0x19
platform_device_register+0x1c/0x1f
add_rtc_cmos+0x16/0x31
do_one_initcall+0x78/0x14a
? do_early_param+0x75/0x75
kernel_init_freeable+0x152/0x1e0
? rest_init+0xa2/0xa2
kernel_init+0x8/0xd5
ret_from_fork+0x2e/0x38
kobject_add_internal failed for rtc_cmos with -EEXIST, don't try to
register things with the same name in the same directory.
platform rtc_cmos: registered platform RTC device (no PNP device found)
If all free RB queues are empty, the driver will never restock the
free RB queue. That's because the restocking happens in the Rx flow,
and if the free queue is empty there will be no Rx.
Although there's a background worker (a.k.a. allocator) allocating
memory for RBs so that the Rx handler can restock them, the worker may
run only after the free queue has become empty (and then it is too
late for restocking as explained above).
There is a solution for that called 'emergency': If the number of used
RB's reaches half the amount of all RB's, the Rx handler will not wait
for the allocator but immediately allocate memory for the used RB's
and restock the free queue.
But, since the used RB's is per queue, it may happen that the used
RB's are spread between the queues such that the emergency check will
fail for each of the queues
(and still run out of RBs, causing the above symptom).
To fix it, move to emergency mode if the sum of *all* used RBs (for
all Rx queues) reaches half the amount of all RB's
This device reports SDHCI_CLOCK_INT_STABLE even though it's not
ready to take SDHCI_CLOCK_CARD_EN. The symptom is that reading
SDHCI_CLOCK_CONTROL after enabling the clock shows absence of the
bit from the register (e.g. expecting 0x0000fa07 = 0x0000fa03 |
SDHCI_CLOCK_CARD_EN but only observed the first operand).
For each system in a given pevent, read_event_files() reads in a
temporary 'sys' string. Be sure to free this string before moving onto
to the next system and/or leaving read_event_files().
Fixes the following coverity complaints:
Error: RESOURCE_LEAK (CWE-772):
tools/perf/util/trace-event-read.c:343: overwrite_var: Overwriting
"sys" in "sys = read_string()" leaks the storage that "sys" points to.
tools/perf/util/trace-event-read.c:353: leaked_storage: Variable "sys"
going out of scope leaks the storage it points to.
Signed-off-by: Sanskriti Sharma <sansharm@redhat.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Joe Lawrence <joe.lawrence@redhat.com> Link: http://lkml.kernel.org/r/1538490554-8161-6-git-send-email-sansharm@redhat.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Technically this is not required because disabling the PWM should be
enough. However, when support for atomic operations was implemented in
the PWM subsystem, only actual changes to the PWM channel are applied
during pwm_config(), which means that during after resume from suspend
the old settings won't be applied.
One possible solution is for the PWM driver to implement its own PM
operations such that settings from before suspend get applied on resume.
This has the disadvantage of completely ignoring any particular ordering
requirements that PWM user drivers might have, so it is best to leave it
up to the user drivers to apply the settings that they want at the
appropriate time.
Another way to solve this would be to read back the current state of the
PWM at the time of resume. That way, in case the configuration was lost
during suspend, applying the old settings in PWM user drivers would
actually get them applied because they differ from the current settings.
However, not all PWM drivers support reading the hardware state, and not
all hardware may support it.
The best workaround at this point seems to be to let PWM user drivers
tell the PWM subsystem that the PWM is turned off by, in addition to
disabling it, also setting the duty cycle to 0. This causes the resume
operation to apply a configuration that is different from the current
configuration, resulting in the proper state from before suspend getting
restored.
When running as a level 3 guest with no host provided sthyi support
sclp_ocf_cpc_name_copy() will only return zeroes. Zeroes are not a
valid group name, so let's not indicate that the group name field is
valid.
Also the group name is not dependent on stsi, let's not return based
on stsi before setting it.
Configuring generic network device parameters on tun will fail in
presence of IFLA_INFO_KIND attribute in IFLA_LINKINFO nested attribute
since tun_validate() always return failure.
This can be visualized with following ip-link(8) command sequences:
# ip link set dev tun0 group 100
# ip link set dev tun0 group 100 type tun
RTNETLINK answers: Invalid argument
with contrast to dummy and veth drivers:
# ip link set dev dummy0 group 100
# ip link set dev dummy0 type dummy
# ip link set dev veth0 group 100
# ip link set dev veth0 group 100 type veth
Fix by returning zero in tun_validate() when @data is NULL that is
always in case since rtnl_link_ops->maxtype is zero in tun driver.
Fixes: f019a7a594d9 ("tun: Implement ip link del tunXXX") Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If we fail to allocate the request queue for a disk, we still need to
free that disk, not just the previous ones. Additionally, we need to
cleanup the previous request queues.
Move queue allocation next to disk allocation to fix a couple of issues:
- If add_disk() hasn't been called, we should clear disk->queue before
calling put_disk().
- If we fail to allocate a request queue, we still need to put all of
the disks, not just the ones that we allocated queues for.
It was found that when debug_locks was turned off because of a problem
found by the lockdep code, the system performance could drop quite
significantly when the lock_stat code was also configured into the
kernel. For instance, parallel kernel build time on a 4-socket x86-64
server nearly doubled.
Further analysis into the cause of the slowdown traced back to the
frequent call to debug_locks_off() from the __lock_acquired() function
probably due to some inconsistent lockdep states with debug_locks
off. The debug_locks_off() function did an unconditional atomic xchg
to write a 0 value into debug_locks which had already been set to 0.
This led to severe cacheline contention in the cacheline that held
debug_locks. As debug_locks is being referenced in quite a few different
places in the kernel, this greatly slow down the system performance.
To prevent that trashing of debug_locks cacheline, lock_acquired()
and lock_contended() now checks the state of debug_locks before
proceeding. The debug_locks_off() function is also modified to check
debug_locks before calling __debug_locks_off().
Signed-off-by: Waiman Long <longman@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Link: http://lkml.kernel.org/r/1539913518-15598-1-git-send-email-longman@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
drivers/net/ethernet/qlogic/qla3xxx.c:384:24: warning: signed shift
result (0xF00000000) requires 37 bits to represent, but 'int' only has
32 bits [-Wshift-overflow]
((ISP_NVRAM_MASK << 16) | qdev->eeprom_cmd_data));
~~~~~~~~~~~~~~ ^ ~~
1 warning generated.
The warning is certainly accurate since ISP_NVRAM_MASK is defined as
(0x000F << 16) which is then shifted by 16, resulting in 64424509440,
well above UINT_MAX.
Given that this is the only location in this driver where ISP_NVRAM_MASK
is shifted again, it seems likely that ISP_NVRAM_MASK was originally
defined without a shift and during the move of the shift to the
definition, this statement wasn't properly removed (since ISP_NVRAM_MASK
is used in the statenent right above this). Only the maintainers can
confirm this since this statment has been here since the driver was
first added to the kernel.
Link: https://github.com/ClangBuiltLinux/linux/issues/127 Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The size of the resulting cpu map can be smaller than a multiple of
sizeof(u64), resulting in SIGBUS on cpus like Sparc as the next event
will not be aligned properly.
Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@intel.com> Fixes: 6c872901af07 ("perf cpu_map: Add cpu_map event synthesize function") Link: http://lkml.kernel.org/r/20181011.224655.716771175766946817.davem@davemloft.net Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
So that when it is unset, ie. '-1', userspace can see it
properly.
Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It is important to clear the hw->state value for non-stopped events
when they are added into the PMU. Otherwise when the event is
scheduled out, we won't read the counter because HES_UPTODATE is still
set. This breaks 'perf stat' and similar use cases, causing all the
events to show zero.
This worked for multi-pcr because we make explicit sparc_pmu_start()
calls in calculate_multiple_pcrs(). calculate_single_pcr() doesn't do
this because the idea there is to accumulate all of the counter
settings into the single pcr value. So we have to add explicit
hw->state handling there.
Like x86, we use the PERF_HES_ARCH bit to track truly stopped events
so that we don't accidently start them on a reload.
Related to all of this, sparc_pmu_start() is missing a userpage update
so add it.
Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Michael reported an issue with oversized terms values assignment
and I noticed there was actually a misunderstanding of the max
value check in the past.
The above commit's changelog says:
If bit 21 is set, there is parsing issues as below.
$ perf stat -a -e uncore_qpi_0/event=0x200002,umask=0x8/
event syntax error: '..pi_0/event=0x200002,umask=0x8/'
\___ value too big for format, maximum is 511
But there's no issue there, because the event value is distributed
along the value defined by the format. Even if the format defines
separated bit, the value is treated as a continual number, which
should follow the format definition.
In above case it's 9-bit value with last bit separated:
$ cat uncore_qpi_0/format/event
config:0-7,21
Hence the value 0x200002 is correctly reported as format violation,
because it exceeds 9 bits. It should have been 0x102 instead, which
sets the 9th bit - the bit 21 of the format.
$ perf stat -vv -a -e uncore_qpi_0/event=0x102,umask=0x8/
Using CPUID GenuineIntel-6-2D
...
------------------------------------------------------------
perf_event_attr:
type 10
size 112
config 0x200802
sample_type IDENTIFIER
...
Reported-by: Michael Petlan <mpetlan@redhat.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Fixes: ac0e2cd55537 ("perf tools: Fix PMU term format max value calculation") Link: http://lkml.kernel.org/r/20181003072046.29276-1-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Future Intel processors will support "Enhanced IBRS" which is an "always
on" mode i.e. IBRS bit in SPEC_CTRL MSR is enabled once and never
disabled.
From the specification [1]:
"With enhanced IBRS, the predicted targets of indirect branches
executed cannot be controlled by software that was executed in a less
privileged predictor mode or on another logical processor. As a
result, software operating on a processor with enhanced IBRS need not
use WRMSR to set IA32_SPEC_CTRL.IBRS after every transition to a more
privileged predictor mode. Software can isolate predictor modes
effectively simply by setting the bit once. Software need not disable
enhanced IBRS prior to entering a sleep state such as MWAIT or HLT."
If Enhanced IBRS is supported by the processor then use it as the
preferred spectre v2 mitigation mechanism instead of Retpoline. Intel's
Retpoline white paper [2] states:
"Retpoline is known to be an effective branch target injection (Spectre
variant 2) mitigation on Intel processors belonging to family 6
(enumerated by the CPUID instruction) that do not have support for
enhanced IBRS. On processors that support enhanced IBRS, it should be
used for mitigation instead of retpoline."
The reason why Enhanced IBRS is the recommended mitigation on processors
which support it is that these processors also support CET which
provides a defense against ROP attacks. Retpoline is very similar to ROP
techniques and might trigger false positives in the CET defense.
If Enhanced IBRS is selected as the mitigation technique for spectre v2,
the IBRS bit in SPEC_CTRL MSR is set once at boot time and never
cleared. Kernel also has to make sure that IBRS bit remains set after
VMEXIT because the guest might have cleared the bit. This is already
covered by the existing x86_spec_ctrl_set_guest() and
x86_spec_ctrl_restore_host() speculation control functions.
Enhanced IBRS still requires IBPB for full mitigation.
[1] Speculative-Execution-Side-Channel-Mitigations.pdf
[2] Retpoline-A-Branch-Target-Injection-Mitigation.pdf
Both documents are available at:
https://bugzilla.kernel.org/show_bug.cgi?id=199511
Originally-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Tim C Chen <tim.c.chen@intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Ravi Shankar <ravi.v.shankar@intel.com> Link: https://lkml.kernel.org/r/1533148945-24095-1-git-send-email-sai.praneeth.prakhya@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
memory_corruption_check[{_period|_size}]()'s handlers do not check input
argument before passing it to kstrtoul() or simple_strtoull(). The argument
would be a NULL pointer if each of the kernel parameters, without its
value, is set in command line and thus cause the following panic.
STIBP is a feature provided by certain Intel ucodes / CPUs. This feature
(once enabled) prevents cross-hyperthread control of decisions made by
indirect branch predictors.
Enable this feature if
- the CPU is vulnerable to spectre v2
- the CPU supports SMT and has SMT siblings online
- spectre_v2 mitigation autoselection is enabled (default)
After some previous discussion, this leaves STIBP on all the time, as wrmsr
on crossing kernel boundary is a no-no. This could perhaps later be a bit
more optimized (like disabling it in NOHZ, experiment with disabling it in
idle, etc) if needed.
Note that the synchronization of the mask manipulation via newly added
spec_ctrl_mutex is currently not strictly needed, as the only updater is
already being serialized by cpu_add_remove_lock, but let's make this a
little bit more future-proof.
The Creative Audigy SE (SB0570) card currently exhibits an audible pop
whenever playback is stopped or resumed, or during silent periods of an
audio stream. Initialise the IZD bit to the 0 to eliminate these pops.
The Infinite Zero Detection (IZD) feature on the DAC causes the output
to be shunted to Vcap after 2048 samples of silence. This discharges the
AC coupling capacitor through the output and causes the aforementioned
pop/click noise.
The behaviour of the IZD bit is described on page 15 of the WM8768GEDS
datasheet: "With IZD=1, applying MUTE for 1024 consecutive input samples
will cause all outputs to be connected directly to VCAP. This also
happens if 2048 consecutive zero input samples are applied to all 6
channels, and IZD=0. It will be removed as soon as any channel receives
a non-zero input". I believe the second sentence might be referring to
IZD=1 instead of IZD=0 given the observed behaviour of the card.
This change should make the DAC initialisation consistent with
Creative's Windows driver, as this popping persists when initialising
the card in Linux and soft rebooting into Windows, but is not present on
a cold boot to Windows.
BIOS on ASUS G751 doesn't seem to map the headphone pin (NID 0x16)
correctly. Add a quirk to address it, as well as chaining to the
previous fix for the microphone.