The first time that we allocate from an uninitialized inode allocation
bitmap, if the block allocation bitmap is also uninitalized, we need
to get write access to the block group descriptor before we start
modifying the block group descriptor flags and updating the free block
count, etc. Otherwise, there is the potential of a bad journal
checksum (if journal checksums are enabled), and of the file system
becoming inconsistent if we crash at exactly the wrong time.
Ensure that cpu->cpu is set before writing MSR_IA32_PERF_CTL during CPU
initialization. Otherwise only cpu0 has its P-state set and all other
cores are left with their values unchanged.
In most cases, this is not too serious because the P-states will be set
correctly when the timer function is run. But when the default governor
is set to performance, the per-CPU current_pstate stays the same forever
and no attempts are made to write the MSRs again.
Signed-off-by: Vincent Minet <vincent@vincent-minet.net> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
There's a race condition between the atomic_dec_and_test(&io->count)
in dec_count() and the waking of the sync_io() thread. If the thread
is spuriously woken immediately after the decrement it may exit,
making the on stack io struct invalid, yet the dec_count could still
be using it.
Fix this race by using a completion in sync_io() and dec_count().
Reported-by: Minfei Huang <huangminfei@ucloud.cn> Signed-off-by: Joe Thornber <thornber@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Acked-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Starting with Win8, we have implemented several optimizations to improve the
scalability and performance of the VMBUS transport between the Host and the
Guest. Some of the non-performance critical services cannot leverage these
optimization since they only read and process one message at a time.
Make adjustments to the callback dispatch code to account for the way
non-performance critical drivers handle reading of the channel.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
After unbinding the driver memory was corrupted by double free of
clk_lookup structure. This lead to OOPS when re-binding the driver
again.
The driver allocated memory for 'clk_lookup' with devm_kzalloc. During
driver removal this memory was freed twice: once by clkdev_drop() and
second by devm code.
include/linux/sched.h implements TASK_SIZE_OF as TASK_SIZE if it
is not set by the architecture headers. TASK_SIZE uses the
current task to determine the size of the virtual address space.
On a 64-bit kernel this will cause reading /proc/pid/pagemap of a
64-bit process from a 32-bit process to return EOF when it reads
past 0xffffffff.
Implement TASK_SIZE_OF exactly the same as TASK_SIZE with
test_tsk_thread_flag instead of test_thread_flag.
Signed-off-by: Colin Cross <ccross@android.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Byte-to-bit-count computation is only partly converted to big-endian and is
mixing in CPU-endian values. Problem was noticed by sparce with warning:
CHECK arch/x86/crypto/sha512_ssse3_glue.c
arch/x86/crypto/sha512_ssse3_glue.c:144:19: warning: restricted __be64 degrades to integer
arch/x86/crypto/sha512_ssse3_glue.c:144:17: warning: incorrect type in assignment (different base types)
arch/x86/crypto/sha512_ssse3_glue.c:144:17: expected restricted __be64 <noident>
arch/x86/crypto/sha512_ssse3_glue.c:144:17: got unsigned long long
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi> Acked-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Since commtit 8a7b1227e303 (cpufreq: davinci: move cpufreq driver to
drivers/cpufreq) this added dependancy only for CONFIG_ARCH_DAVINCI_DA850
where as davinci_cpufreq_init() call is used by all davinci platform.
This patch fixes following build error:
arch/arm/mach-davinci/built-in.o: In function `davinci_init_late':
:(.init.text+0x928): undefined reference to `davinci_cpufreq_init'
make: *** [vmlinux] Error 1
Fixes: 8a7b1227e303 (cpufreq: davinci: move cpufreq driver to drivers/cpufreq) Signed-off-by: Lad, Prabhakar <prabhakar.csengg@gmail.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
On POWER8 when switching to a KVM guest we set bits in MMCR2 to freeze
the PMU counters. Aside from on boot they are then never reset,
resulting in stuck perf counters for any user in the guest or host.
We now set MMCR2 to 0 whenever enabling the PMU, which provides a sane
state for perf to use the PMU counters under either the guest or the
host.
This was manifesting as a bug with ppc64_cpu --frequency:
$ sudo ppc64_cpu --frequency
WARNING: couldn't run on cpu 0
WARNING: couldn't run on cpu 8
...
WARNING: couldn't run on cpu 144
WARNING: couldn't run on cpu 152
min: 18446744073.710 GHz (cpu -1)
max: 0.000 GHz (cpu -1)
avg: 0.000 GHz
The command uses a perf counter to measure CPU cycles over a fixed
amount of time, in order to approximate the frequency of the machine.
The counters were returning zero once a guest was started, regardless of
weather it was still running or had been shut down.
By dumping the value of MMCR2, it was observed that once a guest is
running MMCR2 is set to 1s - which stops counters from running:
This is done unconditionally in book3s_hv_interrupts.S upon entering the
guest, and the original value is only save/restored if the host has
indicated it was using the PMU. This is okay, however the user of the
PMU needs to ensure that it is in a defined state when it starts using
it.
Fixes: e05b9b9e5c10 ("powerpc/perf: Power8 PMU support") Signed-off-by: Joel Stanley <joel@jms.id.au> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Instead of separate bits for every POWER8 PMU feature, have a single one
for v2.07 of the architecture.
This saves us adding a MMCR2 define for a future patch.
Signed-off-by: Joel Stanley <joel@jms.id.au> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Looking closer, the active PMC is 0 at this point and we took a PMU
exception on the transition from negative to 0. Some versions of POWER8
have an issue where they edge detect and not level detect PMC overflows.
A number of places program the PMC with (0x80000000 - period_left),
where period_left can be negative. We can either fix all of these or
just ensure that period_left is always >= 1.
This patch takes the second option.
Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
There is a race condition in ec_transaction_completed().
When ec_transaction_completed() is called in the GPE handler, it could
return true because of (ec->curr == NULL). Then the wake_up() invocation
could complete the next command unexpectedly since there is no lock between
the 2 invocations. With the previous cleanup, the IBF=0 waiter race need
not be handled any more. It's now safe to return a flag from
advance_condition() to indicate the requirement of wakeup, the flag is
returned from a locked context.
The ec_transaction_completed() is now only invoked by the ec_poll() where
the ec->curr is ensured to be different from NULL.
After cleaning up, the EVT_SCI=1 check should be moved out of the wakeup
condition so that an EVT_SCI raised with (ec->curr == NULL) can trigger a
QR_SC command.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=70891 Link: https://bugzilla.kernel.org/show_bug.cgi?id=63931 Link: https://bugzilla.kernel.org/show_bug.cgi?id=59911 Reported-and-tested-by: Gareth Williams <gareth@garethwilliams.me.uk> Reported-and-tested-by: Hans de Goede <jwrdegoede@fedoraproject.org> Reported-by: Barton Xu <tank.xuhan@gmail.com> Tested-by: Steffen Weber <steffen.weber@gmail.com> Tested-by: Arthur Chen <axchen@nvidia.com> Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
After we've added the first command byte write into advance_transaction(),
the IBF=0 waiter is duplicated with the command completion waiter
implemented in the ec_poll() because:
If IBF=1 blocked the first command byte write invoked in the task
context ec_poll(), it would be kicked off upon IBF=0 interrupt or timed
out and retried again in the task context.
Remove this seperate and duplicate IBF=0 waiter. By doing so we can
reduce the overall number of times to access the EC_SC(R) status
register.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=70891 Link: https://bugzilla.kernel.org/show_bug.cgi?id=63931 Link: https://bugzilla.kernel.org/show_bug.cgi?id=59911 Reported-and-tested-by: Gareth Williams <gareth@garethwilliams.me.uk> Reported-and-tested-by: Hans de Goede <jwrdegoede@fedoraproject.org> Reported-by: Barton Xu <tank.xuhan@gmail.com> Tested-by: Steffen Weber <steffen.weber@gmail.com> Tested-by: Arthur Chen <axchen@nvidia.com> Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Move the first command byte write into advance_transaction() so that all
EC register accesses that can affect the command processing state machine
can happen in this asynchronous state machine advancement function.
The advance_transaction() function then can be a complete implementation
of an asyncrhonous transaction for a single command so that:
1. The first command byte can be written in the interrupt context;
2. The command completion waiter can also be used to wait the first command
byte's timeout;
3. In BURST mode, the follow-up command bytes can be written in the
interrupt context directly, so that it doesn't need to return to the
task context. Returning to the task context reduces the throughput of
the BURST mode and in the worst cases where the system workload is very
high, this leads to the hardware driven automatic BURST mode exit.
In order not to increase memory consumption, convert 'done' into 'flags'
to contain multiple indications:
1. ACPI_EC_COMMAND_COMPLETE: converting from original 'done' condition,
indicating the completion of the command transaction.
2. ACPI_EC_COMMAND_POLL: indicating the availability of writing the first
command byte. A new command can utilize this flag to compete for the
right of accessing the underlying hardware. There is a follow-up bug
fix that has utilized this new flag.
The 2 flags are important because it also reflects a key concept of IO
programs' design used in the system softwares. Normally an IO program
running in the kernel should first be implemented in the asynchronous way.
And the 2 flags are the most common way to implement its synchronous
operations on top of the asynchronous operations:
1. POLL: This flag can be used to block until the asynchronous operations
can happen.
2. COMPLETE: This flag can be used to block until the asynchronous
operations have completed.
By constructing code cleanly in this way, many difficult problems can be
solved smoothly.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=70891 Link: https://bugzilla.kernel.org/show_bug.cgi?id=63931 Link: https://bugzilla.kernel.org/show_bug.cgi?id=59911 Reported-and-tested-by: Gareth Williams <gareth@garethwilliams.me.uk> Reported-and-tested-by: Hans de Goede <jwrdegoede@fedoraproject.org> Reported-by: Barton Xu <tank.xuhan@gmail.com> Tested-by: Steffen Weber <steffen.weber@gmail.com> Tested-by: Arthur Chen <axchen@nvidia.com> Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
The advance_transaction() will be invoked from the IRQ context GPE handler
and the task context ec_poll(). The handling of this function is locked so
that the EC state machine are ensured to be advanced sequentially.
But there is a problem. Before invoking advance_transaction(), EC_SC(R) is
read. Then for advance_transaction(), there could be race condition around
the lock from both contexts. The first one reading the register could fail
this race and when it passes the stale register value to the state machine
advancement code, the hardware condition is totally different from when
the register is read. And the hardware accesses determined from the wrong
hardware status can break the EC state machine. And there could be cases
that the functionalities of the platform firmware are seriously affected.
For example:
1. When 2 EC_DATA(W) writes compete the IBF=0, the 2nd EC_DATA(W) write may
be invalid due to IBF=1 after the 1st EC_DATA(W) write. Then the
hardware will either refuse to respond a next EC_SC(W) write of the next
command or discard the current WR_EC command when it receives a EC_SC(W)
write of the next command.
2. When 1 EC_SC(W) write and 1 EC_DATA(W) write compete the IBF=0, the
EC_DATA(W) write may be invalid due to IBF=1 after the EC_SC(W) write.
The next EC_DATA(R) could never be responded by the hardware. This is
the root cause of the reported issue.
Fix this issue by moving the EC_SC(R) access into the lock so that we can
ensure that the state machine is advanced consistently.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=70891 Link: https://bugzilla.kernel.org/show_bug.cgi?id=63931 Link: https://bugzilla.kernel.org/show_bug.cgi?id=59911 Reported-and-tested-by: Gareth Williams <gareth@garethwilliams.me.uk> Reported-and-tested-by: Hans de Goede <jwrdegoede@fedoraproject.org> Reported-by: Barton Xu <tank.xuhan@gmail.com> Tested-by: Steffen Weber <steffen.weber@gmail.com> Tested-by: Arthur Chen <axchen@nvidia.com> Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
The module test script for the adm1021 driver exposes a cache problem
when writing temperature limits. temp_min and temp_max are expected
to be stored in milli-degrees C but are stored in degrees C.
Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Axel Lin <axel.lin@ingics.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Writing to fanX_div does not clear the cache. As a result, reading
from fanX_div may return the old value for up to two seconds
after writing a new value.
This patch ensures the fan_div cache is updated in set_fan_div().
Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Axel Lin <axel.lin@ingics.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Upper limit for write operations to temperature limit registers
was clamped to a fractional value. However, limit registers do
not support fractional values. As a result, upper limits of 127.5
degrees C or higher resulted in a rounded limit of 128 degrees C.
Since limit registers are signed, this was stored as -128 degrees C.
Clamp limits to (-55, +127) degrees C to solve the problem.
Value on writes to auto_temp[12]_min and auto_temp[12]_max were not
clamped at all, but masked. As a result, out-of-range writes resulted
in a more or less arbitrary limit. Clamp those attributes to (0, 127)
degrees C for more predictable results.
Cc: Axel Lin <axel.lin@ingics.com> Reviewed-by: Jean Delvare <jdelvare@suse.de> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
It is customary to clamp limits instead of bailing out with an error
if a configured limit is out of the range supported by the driver.
This simplifies limit configuration, since the user will not typically
know chip and/or driver specific limits.
On 05/21/2014 04:22 PM, Aaron Lu wrote:
> On 05/21/2014 01:57 PM, Kui Zhang wrote:
>> Hello,
>>
>> I get following error when rmmod thermal.
>>
>> rmmod thermal
>> Killed
While dealing with this problem, I found another problem that also
results in a kernel crash on thermal module removal:
From: Aaron Lu <aaron.lu@intel.com>
Date: Wed, 21 May 2014 16:05:38 +0800
Subject: thermal: hwmon: Make the check for critical temp valid consistent
We used the tz->ops->get_crit_temp && !tz->ops->get_crit_temp(tz, temp)
to decide if we need to create the temp_crit attribute file but we just
check if tz->ops->get_crit_temp exists to decide if we need to remove
that attribute file. Some ACPI thermal zone doesn't have a valid critical
trip point and that would result in removing a non-existent device file
on thermal module unload.
In my investigation, I found the root cause is wq_numa_possible_cpumask.
All entries of wq_numa_possible_cpumask is allocated by
alloc_cpumask_var_node(). And these entries are used without initializing.
So these entries have wrong value.
When hot-adding and onlining CPU, wq_update_unbound_numa() is called.
wq_update_unbound_numa() calls alloc_unbound_pwq(). And alloc_unbound_pwq()
calls get_unbound_pool(). In get_unbound_pool(), worker_pool->node is set
as follow:
3592 /* if cpumask is contained inside a NUMA node, we belong to that node */
3593 if (wq_numa_enabled) {
3594 for_each_node(node) {
3595 if (cpumask_subset(pool->attrs->cpumask,
3596 wq_numa_possible_cpumask[node])) {
3597 pool->node = node;
3598 break;
3599 }
3600 }
3601 }
But wq_numa_possible_cpumask[node] does not have correct cpumask. So, wrong
node is selected. As a result, kernel panic occurs.
By this patch, all entries of wq_numa_possible_cpumask are allocated by
zalloc_cpumask_var_node to initialize them. And the panic disappeared.
The cause is that cpuset_mems_allowed() try to take
mutex_lock(&callback_mutex) under the rcu_read_lock(which was hold in
__mpol_dup()). And in cpuset_mems_allowed(), the access to cpuset is
under rcu_read_lock, so in __mpol_dup, we can reduce the rcu_read_lock
protection region to protect the access to cpuset only in
current_cpuset_is_being_rebound(). So that we can avoid this bug.
This patch is a temporary solution that just addresses the bug
mentioned above, can not fix the long-standing issue about cpuset.mems
rebinding on fork():
"When the forker's task_struct is duplicated (which includes
->mems_allowed) and it races with an update to cpuset_being_rebound
in update_tasks_nodemask() then the task's mems_allowed doesn't get
updated. And the child task's mems_allowed can be wrong if the
cpuset's nodemask changes before the child has been added to the
cgroup's tasklist."
Since AI lines could be selected at will (linux-3.11) the sending
and receiving ends of the FIFO does not agree about what step is used
for a line. It only works if the last lines are used, like 5,6,7,
and fails if ie 2,4,6 is selected in DT.
Signed-off-by: Jan Kardell <jan.kardell@telliq.com> Tested-by: Zubair Lutfullah <zubair.lutfullah@gmail.com> Signed-off-by: Jonathan Cameron <jic23@kernel.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Corsair USB Dongles are shipped with Corsair AXi series PSUs.
These are cp210x serial usb devices, so make driver detect these.
I have a program, that can get information from these PSUs.
Tested with 2 different dongles shipped with Corsair AX860i and
AX1200i units.
In v2.6.34 commit 9d8cebd4bcd7 ("mm: fix mbind vma merge problem")
introduced vma merging to mbind(), but it should have also changed the
convention of passing start vma from queue_pages_range() (formerly
check_range()) to new_vma_page(): vma merging may have already freed
that structure, resulting in BUG at mm/mempolicy.c:1738 and probably
worse crashes.
This patch fixes I/O errors with the sym53c8xx_2 driver when the disk
returns QUEUE FULL status.
When the controller encounters an error (including QUEUE FULL or BUSY
status), it aborts all not yet submitted requests in the function
sym_dequeue_from_squeue.
This function aborts them with DID_SOFT_ERROR.
If the disk has full tag queue, the request that caused the overflow is
aborted with QUEUE FULL status (and the scsi midlayer properly retries
it until it is accepted by the disk), but the sym53c8xx_2 driver aborts
the following requests with DID_SOFT_ERROR --- for them, the midlayer
does just a few retries and then signals the error up to sd.
The result is that disk returning QUEUE FULL causes request failures.
The error was reproduced on 53c895 with COMPAQ BD03685A24 disk
(rebranded ST336607LC) with command queue 48 or 64 tags. The disk has
64 tags, but under some access patterns it return QUEUE FULL when there
are less than 64 pending tags. The SCSI specification allows returning
QUEUE FULL anytime and it is up to the host to retry.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: Matthew Wilcox <matthew@wil.cx> Cc: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Causes the ratelimiting to spam the kernel log with the "callbacks suppressed"
message below, even while the dev_dbg it is supposed to rate limit wouldn't
print anything because DEBUG is not defined for this device.
a27fbf2f067b0cd ("mmc: add ignorance case for CMD13 CRC error") produced
a cmd.flags unhandled in realtek pci host driver. This will make MMC
card fail to initialize, this patch is used to handle the new cmd.flags
condition and MMC card can be used.
If INPCK is not set, input parity detection should be disabled. This means
parity errors should not be received from the tty driver, and the data
received should be treated normally.
SUS v3, 11.2.2, General Terminal Interface - Input Modes, states:
"If INPCK is set, input parity checking shall be enabled. If INPCK is
not set, input parity checking shall be disabled, allowing output parity
generation without input parity errors. Note that whether input parity
checking is enabled or disabled is independent of whether parity detection
is enabled or disabled (see Control Modes). If parity detection is enabled
but input parity checking is disabled, the hardware to which the terminal
is connected shall recognize the parity bit, but the terminal special file
shall not check whether or not this bit is correctly set."
Ignore parity errors reported by the tty driver when INPCK is not set, and
handle the received data normally.
Fixes: Bugzilla #71681, 'Improvement of n_tty_receive_parity_error from n_tty.c' Reported-by: Ivan <athlon_@mail.ru> Signed-off-by: Peter Hurley <peter@hurleysoftware.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
If IGNBRK is set without either BRKINT or PARMRK set, some uart
drivers send a 0x00 byte for BREAK without the TTYBREAK flag to the
line discipline, when it should send either nothing or the TTYBREAK flag
set. This happens because the read_status_mask masks out the BI
condition, which uart_insert_char() then interprets as a normal 0x00 byte.
SUS v3 is clear regarding the meaning of IGNBRK; Section 11.2.2, General
Terminal Interface - Input Modes, states:
"If IGNBRK is set, a break condition detected on input shall be ignored;
that is, not put on the input queue and therefore not read by any
process."
Fix read_status_mask to include the BI bit if IGNBRK is set; the
lsr status retains the BI bit if a BREAK is recv'd, which is
subsequently ignored in uart_insert_char() when masked with the
ignore_status_mask.
With a kernel configured with ARM64_64K_PAGES && !TRANSPARENT_HUGEPAGE,
the following is triggered at early boot:
SMP: Total of 8 processors activated.
devtmpfs: initialized
Unable to handle kernel NULL pointer dereference at virtual address 00000008
pgd = fffffe0000050000
[00000008] *pgd=00000043fba00003, *pmd=00000043fba00003, *pte=00e0000078010407
Internal error: Oops: 96000006 [#1] SMP
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.15.0-rc864k+ #44
task: fffffe03bc040000 ti: fffffe03bc080000 task.ti: fffffe03bc080000
PC is at __list_add+0x10/0xd4
LR is at free_one_page+0x270/0x638
...
Call trace:
__list_add+0x10/0xd4
free_one_page+0x26c/0x638
__free_pages_ok.part.52+0x84/0xbc
__free_pages+0x74/0xbc
init_cma_reserved_pageblock+0xe8/0x104
cma_init_reserved_areas+0x190/0x1e4
do_one_initcall+0xc4/0x154
kernel_init_freeable+0x204/0x2a8
kernel_init+0xc/0xd4
This happens because init_cma_reserved_pageblock() calls
__free_one_page() with pageblock_order as page order but it is bigger
than MAX_ORDER. This in turn causes accesses past zone->free_list[].
Fix the problem by changing init_cma_reserved_pageblock() such that it
splits pageblock into individual MAX_ORDER pages if pageblock is bigger
than a MAX_ORDER page.
In cases where !CONFIG_HUGETLB_PAGE_SIZE_VARIABLE, which is all
architectures expect for ia64, powerpc and tile at the moment, the
â\80\9cpageblock_order > MAX_ORDERâ\80\9d condition will be optimised out since both
sides of the operator are constants. In cases where pageblock size is
variable, the performance degradation should not be significant anyway
since init_cma_reserved_pageblock() is called only at boot time at most
MAX_CMA_AREAS times which by default is eight.
Signed-off-by: Michal Nazarewicz <mina86@mina86.com> Reported-by: Mark Salter <msalter@redhat.com> Tested-by: Mark Salter <msalter@redhat.com> Tested-by: Christopher Covington <cov@codeaurora.org> Cc: Mel Gorman <mgorman@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
The ras3 block on spear320 claims to have 3 interrupts. In fact it has
one and 6 reserved interrupts. Account the 6 reserved to this block so
it has 7 interrupts total. That matches the datasheet and the device
tree entries.
Broken since commit 80515a5a(ARM: SPEAr3xx: shirq: simplify and move
the shared irq multiplexor to DT). Testing is overrated....
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20140619212712.872379208@linutronix.de Fixes: 80515a5a2e3c ('ARM: SPEAr3xx: shirq: simplify and move the shared irq multiplexor to DT') Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Jason Cooper <jason@lakedaemon.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
When we write to a degraded array which has a bitmap, we
make sure the relevant bit in the bitmap remains set when
the write completes (so a 're-add' can quickly rebuilt a
temporarily-missing device).
If, immediately after such a write starts, we incorporate a spare,
commence recovery, and skip over the region where the write is
happening (because the 'needs recovery' flag isn't set yet),
then that write will not get to the new device.
Once the recovery finishes the new device will be trusted, but will
have incorrect data, leading to possible corruption.
We cannot set the 'needs recovery' flag when we start the write as we
do not know easily if the write will be "degraded" or not. That
depends on details of the particular raid level and particular write
request.
This patch fixes a corruption issue of long standing and so it
suitable for any -stable kernel. It applied correctly to 3.0 at
least and will minor editing to earlier kernels.
Reported-by: Bill <billstuff2001@sbcglobal.net> Tested-by: Bill <billstuff2001@sbcglobal.net> Link: http://lkml.kernel.org/r/53A518BB.60709@sbcglobal.net Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Disabling reading and writing to the trace file should not be able to
disable all function tracing callbacks. There's other users today
(like kprobes and perf). Reading a trace file should not stop those
from happening.
It appears that no one ever run ffs-test on a big-endian machine,
since it used cpu-endianess for fs_count and hs_count fields which
should be in little-endian format. Fix by wrapping the numbers in
cpu_to_le32.
Signed-off-by: Michal Nazarewicz <mina86@mina86.com> Signed-off-by: Felipe Balbi <balbi@ti.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
An NFS operation that creates a new symlink includes the symlink data,
which is xdr-encoded as a length followed by the data plus 0 to 3 bytes
of zero-padding as required to reach a 4-byte boundary.
The vfs, on the other hand, wants null-terminated data.
The simple way to handle this would be by copying the data into a newly
allocated buffer with space for the final null.
The current nfsd_symlink code tries to be more clever by skipping that
step in the (likely) case where the byte following the string is already
0.
But that assumes that the byte following the string is ours to look at.
In fact, it might be the first byte of a page that we can't read, or of
some object that another task might modify.
Worse, the NFSv4 code tries to fix the problem by actually writing to
that byte.
In the NFSv2/v3 cases this actually appears to be safe:
- nfs3svc_decode_symlinkargs explicitly null-terminates the data
(after first checking its length and copying it to a new
page).
- NFSv2 limits symlinks to 1k. The buffer holding the rpc
request is always at least a page, and the link data (and
previous fields) have maximum lengths that prevent the request
from reaching the end of a page.
In the NFSv4 case the CREATE op is potentially just one part of a long
compound so can end up on the end of a page if you're unlucky.
The minimal fix here is to copy and null-terminate in the NFSv4 case.
The nfsd_symlink() interface here seems too fragile, though. It should
really either do the copy itself every time or just require a
null-terminated string.
Reported-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Currently in the inkern.c code for IIO framework, the function
of_iio_channel_get_by_name() will return a non-NULL pointer when
it cannot find a channel using of_iio_channel_get() and when it
tries to search for 'io-channel-ranges' property and fails. This
is incorrect behaviour as the function which calls this expects
a NULL pointer for failure. This patch rectifies the issue.
Signed-off-by: Adam Thomson <Adam.Thomson.Opensource@diasemi.com> Signed-off-by: Jonathan Cameron <jic23@kernel.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Recent Intel CPUs have 10 variable range MTRRs. Since operating systems
sometime make assumptions on CPUs while they ignore capability MSRs, it is
better for KVM to be consistent with recent CPUs. Reporting more MTRRs than
actually supported has no functional implications.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Hole punching code for files with indirect blocks wrongly computed
number of blocks which need to be cleared when traversing the indirect
block tree. That could result in punching more blocks than actually
requested and thus effectively cause a data loss. For example:
Error recovery in ext4_alloc_branch() calls ext4_forget() even for
buffer corresponding to indirect block it did not allocate. This leads
to brelse() being called twice for that buffer (once from ext4_forget()
and once from cleanup in ext4_ind_map_blocks()) leading to buffer use
count misaccounting. Eventually (but often much later because there
are other users of the buffer) we will see messages like:
VFS: brelse: Trying to free free buffer
Another manifestation of this problem is an error:
JBD2 unexpected failure: jbd2_journal_revoke: !buffer_revoked(bh);
inconsistent data on disk
The fix is easy - don't forget buffer we did not allocate. Also add an
explanatory comment because the indexing at ext4_alloc_branch() is
somewhat subtle.
Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
The fix converts blkg->refcnt from int to atomic_t. It does some
overhead but it should be minute compared to everything else which is
going on and the involved cacheline bouncing, so I think it's highly
unlikely to cause any noticeable difference. Also, the refcnt in
question should be converted to a perpcu_ref for blk-mq anyway, so the
atomic_t is likely to go away pretty soon anyway.
Thanks.
------- 8< -------
__blkg_release_rcu() may be invoked after the associated request_queue
is released with a RCU grace period inbetween. As such, the function
and callbacks invoked from it must not dereference the associated
request_queue. This is clearly indicated in the comment above the
function.
Unfortunately, while trying to fix a different issue, 2a4fd070ee85
("blkcg: move bulk of blkcg_gq release operations to the RCU
callback") ignored this and added [un]locking of @blkg->q->queue_lock
to __blkg_release_rcu(). This of course can cause oops as the
request_queue may be long gone by the time this code gets executed.
The request_queue locking was added because blkcg_gq->refcnt is an int
protected with the queue lock and __blkg_release_rcu() needs to put
the parent. Let's fix it by making blkcg_gq->refcnt an atomic_t and
dropping queue locking in the function.
Given the general heavy weight of the current request_queue and blkcg
operations, this is unlikely to cause any noticeable overhead.
Moreover, blkcg_gq->refcnt is likely to be converted to percpu_ref in
the near future, so whatever (most likely negligible) overhead it may
add is temporary.
When we SMB3 mounted with mapchars (to allow reserved characters : \ / > < * ?
via the Unicode Windows to POSIX remap range) empty paths
(eg when we open "" to query the root of the SMB3 directory on mount) were not
null terminated so we sent garbarge as a path name on empty paths which caused
SMB2/SMB2.1/SMB3 mounts to fail when mapchars was specified. mapchars is
particularly important since Unix Extensions for SMB3 are not supported (yet)
Signed-off-by: Steve French <smfrench@gmail.com> Reviewed-by: David Disseldorp <ddiss@suse.de> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Support for firmware rev 508+ was added years ago, but we never noticed
it reports channel in a different way for G-PHY devices. Instead of
offset from 2400 MHz it simply passes channel id (AKA hw_value).
So far it was (most probably) affecting monitor mode users only, but
the following recent commit made it noticeable for quite everybody:
If the mdio probe function fails in emac_open, the interrupt we just requested
isn't freed. If emac_open is called again, for example because we try to set up
the interface again, the kernel will oops because the interrupt wasn't properly
released.
Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
The value of ESR has been stored into x1, and should be directly pass to
do_sp_pc_abort function, "MOV x1, x25" is an extra operation and do_sp_pc_abort
will get the wrong value of ESR.
Fix a parser-bug in the omap2 muxing code where muxtable-entries will be
wrongly selected if the requested muxname is a *prefix* of their
m0-entry and they have a matching mN-entry. Fix by additionally checking
that the length of the m0_entry is equal.
For example muxing of "dss_data2.dss_data2" on omap32xx will fail
because the prefix "dss_data2" will match the mux-entries "dss_data2" as
well as "dss_data20", with the suffix "dss_data2" matching m0 (for
dss_data2) and m4 (for dss_data20). Thus both are recognized as signal
path candidates:
This will result in a failure to mux the pin at all:
_omap_mux_get_by_name: Multiple signal paths (2) for dss_data2.dss_data2
Patch should apply to linus' latest master down to rather old linux-2.6
trees.
Signed-off-by: David R. Piegdon <lkml@p23q.org> Cc: stable@vger.kernel.org
[tony@atomide.com: updated description to include full description] Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
The __sync_icache_dcache routine will only flush the dcache for the
first page of a compound page, potentially leading to stale icache
data residing further on in a hugetlb page.
This patch addresses this issue by taking into consideration the
order of the page when flushing the dcache.
Reported-by: Mark Brown <broonie@linaro.org> Tested-by: Mark Brown <broonie@linaro.org> Signed-off-by: Steve Capper <steve.capper@linaro.org> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
However, if the percpu_pagelist_fraction sysctl is set by the user, it
is also impossible to restore it to the kernel default since the user
cannot write 0 to the sysctl.
This patch allows the user to write 0 to restore the default behavior.
It still requires a fraction equal to or larger than 8, however, as
stated by the documentation for sanity. If a value in the range [1, 7]
is written, the sysctl will return EINVAL.
This successfully solves the divide by zero issue at the same time.
Signed-off-by: David Rientjes <rientjes@google.com> Reported-by: Oleg Drokin <green@linuxhacker.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Without this fix, freshly rebooted Linux creates a new IBSS
instead of joining an existing one. Only when jiffies counter
overflows after 5 minutes the IBSS can be successfully joined.
Signed-off-by: Krzysztof Hałasa <khalasa@piap.pl>
[edit commit message slightly] Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
All devices supported by ina2xx are bidirectional and report the
measured shunt voltage and power values as a signed 16 bit, but the
current driver implementation caches all registers as u16, leading
to an incorrect sign extension when reporting to userspace in
ina2xx_get_value().
This patch fixes the problem by casting the signed registers to s16.
Tested on an INA219.
allows the fall through to the non-layered write case even if both
parent_overlap and obj_request->img_offset belong to the same RADOS
object. This leads to data corruption, because the area to the left of
parent_overlap ends up unconditionally zero-filled instead of being
populated with parent data. Suppose we want to write 1M to offset 6M
of image bar, which is a clone of foo@snap; object_size is 4M,
parent_overlap is 5M:
rbd_data.<id>.0000000000000001
---------------------|----------------------|------------
| should be copyup'ed | should be zeroed out | write ...
---------------------|----------------------|------------
4M 5M 6M
parent_overlap obj_request->img_offset
4..5M should be copyup'ed from foo, yet it is zero-filled, just like
5..6M is.
Given that the only striping mode kernel client currently supports is
chunking (i.e. stripe_unit == object_size, stripe_count == 1), round
parent_overlap up to the next object boundary for the purposes of the
overlap check.
Each image request contains a reference count, but to date it has
not actually been used. (I think this was just an oversight.) A
recent report involving rbd failing an assertion shed light on why
and where we need to use these reference counts.
Every OSD request associated with an object request uses
rbd_osd_req_callback() as its callback function. That function will
call a helper function (dependent on the type of OSD request) that
will set the object request's "done" flag if the object request if
appropriate. If that "done" flag is set, the object request is
passed to rbd_obj_request_complete().
In rbd_obj_request_complete(), requests are processed in sequential
order. So if an object request completes before one of its
predecessors in the image request, the completion is deferred.
Otherwise, if it's a completing object's "turn" to be completed, it
is passed to rbd_img_obj_end_request(), which records the result of
the operation, accumulates transferred bytes, and so on. Next, the
successor to this request is checked and if it is marked "done",
(deferred) completion processing is performed on that request, and
so on. If the last object request in an image request is completed,
rbd_img_request_complete() is called, which (typically) destroys
the image request.
There is a race here, however. The instant an object request is
marked "done" it can be provided (by a thread handling completion of
one of its predecessor operations) to rbd_img_obj_end_request(),
which (for the last request) can then lead to the image request
getting torn down. And this can happen *before* that object has
itself entered rbd_img_obj_end_request(). As a result, once it
*does* enter that function, the image request (and even the object
request itself) may have been freed and become invalid.
All that's necessary to avoid this is to properly count references
to the image requests. We tear down an image request's object
requests all at once--only when the entire image request has
completed. So there's no need for an image request to count
references for its object requests. However, we don't want an
image request to go away until the last of its object requests
has passed through rbd_img_obj_callback(). In other words,
we don't want rbd_img_request_complete() to necessarily
result in the image request being destroyed, because it may
get called before we've finished processing on all of its
object requests.
So the fix is to add a reference to an image request for
each of its object requests. The reference can be viewed
as representing an object request that has not yet finished
its call to rbd_img_obj_callback(). That is emphasized by
getting the reference right after assigning that as the image
object's callback function. The corresponding release of that
reference is done at the end of rbd_img_obj_callback(), which
every image object request passes through exactly once.
DM thinp already checks whether the discard_granularity of the data
device is a factor of the thin-pool block size. But when using the
dm-thin-pool's discard passdown support, DM thinp was not selecting the
max of the underlying data device's discard_granularity and the
thin-pool's block size.
Update set_discard_limits() to set discard_granularity to the max of
these values. This enables blkdev_issue_discard() to properly align the
discards that are sent to the DM thin device on a full block boundary.
As such each discard will now cover an entire DM thin-pool block and the
block will be reclaimed.
The SMP code expects hdev to be unlocked since e.g. crypto functions
will try to (re)lock it. Therefore, we need to release the lock before
calling into smp.c from mgmt.c. Without this we risk a deadlock whenever
the smp_user_confirm_reply() function is called.
When inquiry is canceled through the HCI_Cancel_Inquiry command there is
no Inquiry Complete event generated. Instead, all we get is the command
complete for the HCI_Inquiry_Cancel command. This means that we must
call the hci_discovery_set_state() function from the respective command
complete handler in order to ensure that user space knows the correct
discovery state.
From the Bluetooth Core Specification 4.1 page 1958:
"if both devices have set the Authentication_Requirements parameter to
one of the MITM Protection Not Required options, authentication stage 1
shall function as if both devices set their IO capabilities to
DisplayOnly (e.g., Numeric comparison with automatic confirmation on
both devices)"
So far our implementation has done user confirmation for all just-works
cases regardless of the MITM requirements, however following the
specification to the word means that we should not be doing confirmation
when neither side has the MITM flag set.
Commit "drm/vmwgfx: correct fb_fix_screeninfo.line_length", while fixing a
vmwgfx fbdev bug, also writes the pitch to a supposedly read-only register:
SVGA_REG_BYTES_PER_LINE, while it should be (and also in fact is) written to
SVGA_REG_PITCHLOCK.
This patch is Cc'd stable because of the unknown effects writing to this
register might have, particularly on older device versions.
v2: Updated log message.
Cc: Christopher Friedt <chrisfriedt@gmail.com> Tested-by: Christopher Friedt <chrisfriedt@gmail.com> Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Jakob Bornecrantz <jakob@vmware.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Possibly also fixes:
https://bugzilla.kernel.org/show_bug.cgi?id=68571
Noticed-by: Jonathan Howard <jonathan@unbiased.name> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
In omap_elm_correct_data(), if bitflip_count in an erased-page is within the
correctable limit (< ecc.strength), then it is not indicated back to the caller
ecc->read_page().
This mis-guides upper layers like MTD and UBIFS layer to assume erased-page as
perfectly clean and use it for writing even if actual bitflip_count was
dangerously high (bitflip_count > mtd->bitflip_threshold).
This patch fixes this above issue, by returning 'stats' to caller
ecc->read_page() under all scenarios.
Reported-by: Brian Norris <computersforpeace@gmail.com> Signed-off-by: Pekon Gupta <pekon@ti.com> Signed-off-by: Brian Norris <computersforpeace@gmail.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
As subpage write is enabled by default for all drivers, nand_write_subpage_hwecc
causes a crash if the driver did not register ecc->hwctl or ecc->calculate.
This behavior was introduced in
commit 837a6ba4f3b6d23026674e6af6b6849a4634fff9
"mtd: nand: subpage write support for hardware based ECC schemes".
This fixes a crash by emulating subpage write support by padding sub-page data
with 0xff on either sides to make it full page compatible.
Reported-by: Helmut Schaa <helmut.schaa@googlemail.com> Tested-by: Helmut Schaa <helmut.schaa@googlemail.com> Signed-off-by: Pekon Gupta <pekon@ti.com> Reviewed-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Brian Norris <computersforpeace@gmail.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
As reported by Niels, starting rfkill polling during device probe
(commit e2bc7c5, generally sane change) broke rfkill on rt2500pci
device. I considered that bug as some initalization issue, which
should be fixed on rt2500pci specific code. But after several
attempts (see bug report for details) we fail to find working solution.
Hence I decided to revert to old behaviour on rt2500pci to fix
regression.
Additionally patch also unregister rfkill on device remove instead
of ifconfig down, what was another issue introduced by bad commit.
If the descriptors do not need any strings and user space sends empty
set of strings, the ffs->stringtabs field remains NULL. Thus
*ffs->stringtabs in functionfs_bind leads to a NULL pointer
dereferenece.
The bug was introduced by commit [fd7c9a007f: “use usb_string_ids_n()”].
While at it, remove double initialisation of lang local variable in
that function.
ffs->strings_count does not need to be checked in any way since in
the above scenario it will remain zero and usb_string_ids_n() is
a no-operation when colled with 0 argument.
Signed-off-by: Michal Nazarewicz <mina86@mina86.com> Signed-off-by: Felipe Balbi <balbi@ti.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
We need to delete un-finished td from current request's td list
at ep_dequeue API, otherwise, this non-user td will be remained
at td list before this request is freed. So if we do ep_queue->
ep_dequeue->ep_queue sequence, when the complete interrupt for
the second ep_queue comes, we search td list for this request,
the first td (added by the first ep_queue) will be handled, and
its status is still active, so we will consider the this transfer
still not be completed, but in fact, it has completed. It causes
the peripheral side considers it never receives current data for
this transfer.
We met this problem when do "Error Recovery Test - Device Configured"
test item for USBCV2 MSC test, the host has never received ACK for
the IN token for CSW due to peripheral considers it does not get this
CBW, the USBCV test log like belows:
--------------------------------------------------------------------------
INFO
Issuing BOT MSC Reset, reset should always succeed
INFO
Retrieving status on CBW endpoint
INFO
CBW endpoint status = 0x0
INFO
Retrieving status on CSW endpoint
INFO
CSW endpoint status = 0x0
INFO
Issuing required command (Test Unit Ready) to verify device has recovered
INFO
Issuing CBW (attempt #1):
INFO
|----- CBW LUN = 0x0
INFO
|----- CBW Flags = 0x0
INFO
|----- CBW Data Transfer Length = 0x0
INFO
|----- CBW CDB Length = 0x6
INFO
|----- CBW CDB-00 = 0x0
INFO
|----- CBW CDB-01 = 0x0
INFO
|----- CBW CDB-02 = 0x0
INFO
|----- CBW CDB-03 = 0x0
INFO
|----- CBW CDB-04 = 0x0
INFO
|----- CBW CDB-05 = 0x0
INFO
Issuing CSW : try 1
INFO
CSW Bulk Request timed out!
ERROR
Failed CSW phase : should have been success or stall
FAIL
(5.3.4) The CSW status value must be 0x00, 0x01, or 0x02.
ERROR
BOTCommonMSCRequest failed: error=80004000
Cc: Andrzej Pietrasiewicz <andrzej.p@samsung.com> Signed-off-by: Peter Chen <peter.chen@freescale.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
At probe time, the musb_am335x driver register its childs by
calling of_platform_populate(), which registers all childs in
the devicetree hierarchy recursively.
On the other side, the driver's remove() function uses of_device_unregister()
to remove each child of musb_am335x's.
However, when musb_dsps is loaded, its devices are attached to the musb_am335x
device as musb_am335x childs. Hence, musb_am335x remove() will attempt to
unregister the devices registered by musb_dsps, which produces a kernel panic.
In other words, the childs in the "struct device" hierarchy are not the same
as the childs in the "devicetree" hierarchy.
Ideally, we should enforce the removal of the devices registered by
musb_am335x *only*, instead of all its child devices. However, because of the
recursive nature of of_platform_populate, this doesn't seem possible.
Therefore, as the only solution at hand, this commit disables musb_am335x
driver removal capability, preventing it from being ever removed. This was
originally suggested by Sebastian Siewior:
Some TI chips raise the DMA complete interrupt before the actual
transfer has been completed. The code tries to busy wait for a few
microseconds and if that fails it arms an hrtimer to recheck. So far
so good, but that has the following issue:
CPU 0 CPU1
start_next_transfer(RQ1);
DMA interrupt
if (premature_irq(RQ1))
if (!hrtimer_active(timer))
hrtimer_start(timer);
DMA interrupt
if (premature_irq(RQ2))
if (!hrtimer_active(timer))
hrtimer_start(timer);
timer->state = INACTIVE;
The premature interrupt of request2 on CPU1 does not arm the timer and
therefor the request completion never happens because it checks for
!hrtimer_active(). hrtimer_active() evaluates:
timer->state != HRTIMER_STATE_INACTIVE
which of course evaluates to true in the above case as timer->state is
CALLBACK_RUNNING.
That's clearly documented:
* A timer is active, when it is enqueued into the rbtree or the
* callback function is running or it's in the state of being migrated
* to another cpu.
But that's not what the code wants to check. The code wants to check
whether the timer is queued, i.e. whether its armed and waiting for
expiry.
We have a helper function for this: hrtimer_is_queued(). This
evaluates:
timer->state & HRTIMER_STATE_QUEUED
So in the above case this evaluates to false and therefor forces the
DMA interrupt on CPU1 to call hrtimer_start().
Use hrtimer_is_queued() instead of hrtimer_active() and evrything is
good.
Reported-by: Torben Hohn <torbenh@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Felipe Balbi <balbi@ti.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
There is a regression in the upcoming v3.16-rc1, that is caused
by a problem that has been around for a while but now finally
hangs the system. The bootcrawl looks like this:
pinctrl-nomadik soc:pinctrl: pin GPIO256_AF28 already
requested by a03e0000.usb_per5; cannot claim for musb-hdrc.0.auto
pinctrl-nomadik soc:pinctrl: pin-256 (musb-hdrc.0.auto) status -22
pinctrl-nomadik soc:pinctrl: could not request pin 256
(GPIO256_AF28) from group usb_a_1 on device pinctrl-nomadik
musb-hdrc musb-hdrc.0.auto: Error applying setting, reverse
things back
HS USB OTG: no transceiver configured
musb-hdrc musb-hdrc.0.auto: musb_init_controller failed
with status -517
platform musb-hdrc.0.auto: Driver musb-hdrc requests
probe deferral
(...)
The ux500 MUSB driver propagates the OF node to the dynamically
created musb-hdrc device, which is incorrect as it makes the OF
core believe there are two devices spun from the very same
DT node, which confuses other parts of the device core, notably
the pin control subsystem, which will try to apply all the pin
control settings also to the HDRC device as it gets
instantiated. (The OMAP2430 for example, does not set the
of_node member.)
Cc: Arnd Bergmann <arnd@arndb.de> Acked-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Felipe Balbi <balbi@ti.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
This device vendor and product id is 1c9e:9800
It is working as serial interface with generic usbserial driver.
I thought it is more suitable to use usbserial option driver, which has
better capability distinguishing between modem serial interface and
micro sd storage interface.
[ johan: style changes ]
Signed-off-by: Oliver Neukum <oneukum@suse.de> Tested-by: Alif Mubarak Ahmad <alive4ever@live.com> Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
The system suspend flow as following:
1, Freeze all user processes and kenrel threads.
2, Try to suspend all devices.
2.1, If pci device is in RPM suspended state, then pci driver will try
to resume it to RPM active state in the prepare stage.
2.2, xhci_resume function calls usb_hcd_resume_root_hub to queue two
workqueue items to resume usb2&usb3 roothub devices.
2.3, Call suspend callbacks of devices.
2.3.1, All suspend callbacks of all hcd's children, including
roothub devices are called.
2.3.2, Finally, hcd_pci_suspend callback is called.
Due to workqueue threads were already frozen in step 1, the workqueue
items can't be scheduled, and the roothub devices can't be resumed in
this flow. The HCD_FLAG_WAKEUP_PENDING flag which is set in
usb_hcd_resume_root_hub won't be cleared. Finally,
hcd_pci_suspend will return -EBUSY, and system suspend fails.
The reason why this issue doesn't show up very often is due to that
choose_wakeup will be called in step 2.3.1. In step 2.3.1, if
udev->do_remote_wakeup is not equal to device_may_wakeup(&udev->dev), then
udev will resume to RPM active for changing the wakeup settings. This
has been a lucky hit which hides this issue.
For some special xHCI controllers which have no USB2 port, then roothub
will not match hub driver due to probe failed. Then its
do_remote_wakeup will be set to zero, and we won't be as lucky.
xhci driver doesn't need to resume roothub devices everytime like in
the above case. It's only needed when there are pending event TRBs.
This patch should be back-ported to kernels as old as 3.2, that
contains the commit f69e3120df82391a0ee8118e0a156239a06b2afb
"USB: XHCI: resume root hubs when the controller resumes"
The transfer burst count (TBC) field in xhci 1.0 hosts should be set
to the number of bursts needed to transfer all packets in a isoc TD.
Supported values are 0-2 (1 to 3 bursts per service interval).
Formula for TBC calculation is given in xhci spec section 4.11.2.3:
TBC = roundup( Transfer Descriptor Packet Count / Max Burst Size +1 ) - 1
This patch should be applied to stable kernels since 3.0 that contain
the commit 5cd43e33b9519143f06f507dd7cbee6b7a621885
"xhci 1.0: Set transfer burst count field."
Even though the virtio-scsi spec guarantees that all requests related
to the TMF will have been completed by the time the TMF itself completes,
the request queue's callback might not have run yet. This causes requests
to be completed more than once, and as a result triggers a variety of
BUGs or oopses.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Venkatesh Srinivas <venkateshs@google.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
After scsi_try_to_abort_cmd returns, the eh_abort_handler may have
already found that the command has completed in the device, causing
the host_byte to be nonzero (e.g. it could be DID_ABORT). When
this happens, ORing DID_TIME_OUT into the host byte will corrupt
the result field and initiate an unwanted command retry.
Signed-off-by: Ulrich Obergfell <uobergfe@redhat.com>
[Fix all instances according to review comments. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ewan D. Milne <emilne@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Calling the workqueue interface on uninitialized work items isn't a
good idea even if they're zeroed. It's not failing catastrophically only
through happy accidents.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Add a memory barrier prior to sending a new command to the VIOS
to ensure the VIOS does not receive stale data in the command buffer.
Also add a memory barrier when processing the CRQ for completed commands.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Acked-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
If a CRQ reset is triggered for some reason while in the middle
of performing VSCSI adapter initialization, we don't want to
call the done function for the initialization MAD commands as
this will only result in two threads attempting initialization
at the same time, resulting in failures.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Acked-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
When a USB-audio device is disconnected while PCM is still running, we
still see some race: the disconnect callback calls
snd_usb_endpoint_free() that calls release_urbs() and then kfree()
while a PCM stream would be closed at the same time and calls
stop_endpoints() that leads to wait_clear_urbs(). That is, the EP
object might be deallocated while a PCM stream is syncing with
wait_clear_urbs() with the same EP.
Basically calling multiple wait_clear_urbs() would work fine, also
calling wait_clear_urbs() and release_urbs() would work, too, as
wait_clear_urbs() just reads some fields in ep. The problem is the
succeeding kfree() in snd_pcm_endpoint_free().
This patch moves out the EP deallocation into the later point, the
destructor callback. At this stage, all PCMs must have been already
closed, so it's safe to free the objects.
syscall_regfunc() and syscall_unregfunc() should set/clear
TIF_SYSCALL_TRACEPOINT system-wide, but do_each_thread() can race
with copy_process() and miss the new child which was not added to
the process/thread lists yet.
Change copy_process() to update the child's TIF_SYSCALL_TRACEPOINT
under tasklist.
Link: http://lkml.kernel.org/p/20140413185854.GB20668@redhat.com Fixes: a871bd33a6c0 "tracing: Add syscall tracepoints" Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz>