dwc_otg: Don't issue traffic to LS devices in FS mode
Issuing low-speed packets when the root port is in full-speed mode
causes the root port to stop responding. Explicitly fail when
enqueuing URBs to a LS endpoint on a FS bus.
dwc_otg: Enable NAK holdoff for control split transactions
Certain low-speed devices take a very long time to complete a
data or status stage of a control transaction, producing NAK
responses until they complete internal processing - the USB2.0
spec limit is up to 500mS. This causes the same type of interrupt
storm as seen with USB-serial dongles prior to c8edb238.
In certain circumstances, usually while booting, this interrupt
storm could cause SD card timeouts.
P33M [Mon, 5 Aug 2013 10:47:12 +0000 (11:47 +0100)]
dwc_otg: prevent crashes on host port disconnects
Fix several issues resulting in crashes or inconsistent state
if a Model A root port was disconnected.
- Clean up queue heads properly in kill_urbs_in_qh_list by
removing the empty QHs from the schedule lists
- Set the halt status properly to prevent IRQ handlers from
using freed memory
- Add fiq_split related cleanup for saved registers
- Make microframe scheduling reclaim host channels if
active during a disconnect
- Abort URBs with -ESHUTDOWN status response, informing
device drivers so they respond in a more correct fashion
and don't try to resubmit URBs
- Prevent IRQ handlers from attempting to handle channel
interrupts if the associated URB was dequeued (and the
driver state was cleared)
P33M [Fri, 2 Aug 2013 09:04:18 +0000 (10:04 +0100)]
dwc_otg: fix potential sleep while atomic during urb enqueue
Fixes a regression introduced with eb1b482a. Kmalloc called from
dwc_otg_hcd_qtd_add / dwc_otg_hcd_qtd_create did not always have
the GPF_ATOMIC flag set. Force this flag when inside the larger
critical section.
Philip Taylor [Tue, 28 May 2013 16:20:49 +0000 (17:20 +0100)]
vchiq_util: Fix race condition in push/pop
The lack of memory barriers could (very rarely) result in
vchiu_queue_pop reading the next value before it had been written
(getting either NULL, or a value that had been popped once already).
Matthew Hails [Mon, 13 May 2013 13:22:48 +0000 (14:22 +0100)]
VCHIQ: Fix mem leak of USER_SERVICE_T objects.
The userdata for VCHIQ services created through the ioctl API is
a kmalloced structure. These objects were getting leaked, most
notably in vchiq_release(), where the service could be closed,
freed and removed from the service list before the wait-to-die
loop was entered.
This change adds a userdata termination callback, and implements
it in the case where USER_SERVICE_T is used for the service
userdata.
This fixes certain issues with split transaction scheduling.
- Isochronous multi-packet OUT transactions now hog the TT until
they are completed - this prevents hubs aborting transactions
if they get a periodic start-split out-of-order
- Don't perform TT allocation on non-periodic endpoints - this
allows simultaneous use of the TT's bulk/control and periodic
transaction buffers
This commit will mainly affect USB audio playback.
dwc_otg: make channel halts with unknown state less damaging
If the IRQ received a channel halt interrupt through the FIQ
with no other bits set, the IRQ would not release the host
channel and never complete the URB.
Add catchall handling to treat as a transaction error and retry.
dwc_otg: prevent BUG() in TT allocation if hub address is > 16
A fixed-size array is used to track TT allocation. This was
previously set to 16 which caused a crash because
dwc_otg_hcd_allocate_port would read past the end of the array.
This was hit if a hub was plugged in which enumerated as addr > 16,
due to previous device resets or unplugs.
Also add #ifdef FIQ_DEBUG around hcd->hub_port_alloc[], which grows
to a large size if 128 hub addresses are supported. This field is
for debug only for tracking which frame an allocate happened in.
The dwc_otg_urb_enqueue function is thread-unsafe. In particular the
access of urb->hcpriv, usb_hcd_link_urb_to_ep, dwc_otg_urb->qtd and
friends does not occur within a critical section and so if a device
was unplugged during activity there was a high chance that the
usbcore hub_thread would try to disable the endpoint with partially-
formed entries in the URB queue. This would result in BUG() or null
pointer dereferences.
Fix so that access of urb->hcpriv, enqueuing to the hardware and
adding to usbcore endpoint URB lists is contained within a single
critical section.
dwc_otg: fiq: prevent FIQ thrash and incorrect state passing to IRQ
In the case of a transaction to a device that had previously aborted
due to an error, several interrupts are enabled to reset the error
count when a device responds. This has the side-effect of making the
FIQ thrash because the hardware will generate multiple instances of
a NAK on an IN bulk/interrupt endpoint and multiple instances of ACK
on an OUT bulk/interrupt endpoint. Make the FIQ mask and clear the
associated interrupts.
Additionally, on non-split transactions make sure that only unmasked
interrupts are cleared. This caused a hard-to-trigger but serious
race condition when you had the combination of an endpoint awaiting
error recovery and a transaction completed on an endpoint - due to
the sequencing and timing of interrupts generated by the dwc_otg core,
it was possible to confuse the IRQ handler.
dwc_otg: mask correct interrupts after transaction error recovery
The dwc_otg driver will unmask certain interrupts on a transaction
that previously halted in the error state in order to reset the
QTD error count. The various fine-grained interrupt handlers do not
consider that other interrupts besides themselves were unmasked.
By disabling the two other interrupts only ever enabled in DMA mode
for this purpose, we can avoid unnecessary function calls in the
IRQ handler. This will also prevent an unneccesary FIQ interrupt
from being generated if the FIQ is enabled.
The current timeout is being hit with some cards that complete successfully with a longer timeout.
The timeout is not handled well, and is believed to be a code path that causes corruption. 872a8ff suggests that crappy cards can take up to 3 seconds to respond
Nicolas Pitre [Tue, 12 Mar 2013 12:00:42 +0000 (13:00 +0100)]
ARM: 7670/1: fix the memset fix
Commit 455bd4c430b0 ("ARM: 7668/1: fix memset-related crashes caused by
recent GCC (4.7.2) optimizations") attempted to fix a compliance issue
with the memset return value. However the memset itself became broken
by that patch for misaligned pointers.
This fixes the above by branching over the entry code from the
misaligned fixup code to avoid reloading the original pointer.
Also, because the function entry alignment is wrong in the Thumb mode
compilation, that fixup code is moved to the end.
While at it, the entry instructions are slightly reworked to help dual
issue pipelines.
Signed-off-by: Nicolas Pitre <nico@linaro.org> Tested-by: Alexander Holler <holler@ahsoftware.de> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Recent GCC versions (e.g. GCC-4.7.2) perform optimizations based on
assumptions about the implementation of memset and similar functions.
The current ARM optimized memset code does not return the value of
its first argument, as is usually expected from standard implementations.
GCC assumes memset returns the value of pointer 'waiter' in register r0; causing
register/memory corruptions.
This patch fixes the return value of the assembly version of memset.
It adds a 'mov' instruction and merges an additional load+store into
existing load/store instructions.
For ease of review, here is a breakdown of the patch into 4 simple steps:
Step 1
======
Perform the following substitutions:
ip -> r8, then
r0 -> ip,
and insert 'mov ip, r0' as the first statement of the function.
At this point, we have a memset() implementation returning the proper result,
but corrupting r8 on some paths (the ones that were using ip).
Step 2
======
Make sure r8 is saved and restored when (! CALGN(1)+0) == 1:
Step 4
======
Rewrite register list "r4-r7, r8" as "r4-r8".
Signed-off-by: Ivan Djelic <ivan.djelic@parrot.com> Reviewed-by: Nicolas Pitre <nico@linaro.org> Signed-off-by: Dirk Behme <dirk.behme@gmail.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Harm Hanemaaijer [Thu, 20 Jun 2013 18:21:39 +0000 (20:21 +0200)]
Speed up console framebuffer imageblit function
Especially on platforms with a slower CPU but a relatively high
framebuffer fill bandwidth, like current ARM devices, the existing
console monochrome imageblit function used to draw console text is
suboptimal for common pixel depths such as 16bpp and 32bpp. The existing
code is quite general and can deal with several pixel depths. By creating
special case functions for 16bpp and 32bpp, by far the most common pixel
formats used on modern systems, a significant speed-up is attained
which can be readily felt on ARM-based devices like the Raspberry Pi
and the Allwinner platform, but should help any platform using the
fb layer.
The special case functions allow constant folding, eliminating a number
of instructions including divide operations, and allow the use of an
unrolled loop, eliminating instructions with a variable shift size,
reducing source memory access instructions, and eliminating excessive
branching. These unrolled loops also allow much better code optimization
by the C compiler. The code that selects which optimized variant is used
is also simplified, eliminating integer divide instructions.
The speed-up, measured by timing 'cat file.txt' in the console, varies
between 40% and 70%, when testing on the Raspberry Pi and Allwinner
ARM-based platforms, depending on font size and the pixel depth, with
the greater benefit for 32bpp.
Mike Bradley [Mon, 17 Jun 2013 18:31:42 +0000 (11:31 -0700)]
dwc_otg: Call usb_hcd_unlink_urb_from_ep with lock held in completion handler
usb_hcd_unlink_urb_from_ep must be called with the HCD lock held. Calling it
asynchronously in the tasklet was not safe (regression in c4564d4a1a0a9b10d4419e48239f5d99e88d2667).
This change unlinks it from the endpoint prior to queueing it for handling in
the tasklet, and also adds a check to ensure the urb is OK to be unlinked
before doing so.
NULL pointer dereference kernel oopses had been observed in usb_hcd_giveback_urb
when a USB device was unplugged/replugged during data transfer. This effect
was reproduced using automated USB port power control, hundreds of replug
events were performed during active transfers to confirm that the problem was
eliminated.
popcornmix [Tue, 7 May 2013 15:10:45 +0000 (16:10 +0100)]
VCHIQ: Use unsigned integers for unsigned values
The VCHIQ interface used signed integers for values which are
inherently unsigned. Worse, the parameter validation code treated them
as if they were unsigned, checking for overflow but not underflow.
This patch converts those integers to unsigned integers.
ALSA: bcm2835: add memory-mapped I/O mode for audio stream
ALSA supports two transfers methods for PCM playback: Read/Write
transfer where samples are writtern to the device using standard
read and write functions and Direct Read/Write transfers where
samples can be written directly to a mapped memory area and the
driver is signaled once this has been done.
The bcm2835 driver only supported Read/Write transfer method so
this patch adds mmap support to the driver.
The ARM CPU is not able to directly address the audio device
hardware buffer so audio samples are sent to the device using
a message passing interface (vchiq).
Since hardware buffers can't be directly mapped to user-space
memory, an intermediate buffer (using the PCM indirect API) is
needed to store the audio samples and push them to the device
through videocore.
Signed-off-by: Javier Martinez Canillas <javier.martinez@collabora.co.uk>
ALSA: bcm2835: defer bcm2835 PCM write using the workqueue
Only PCM playback and stop are deferred using a workqueue on
the ALSA bcm2835 driver. The actual writing of PCM samples
is synchronous.
This works well for the read/write transfer method since
the snd_pcm_ops .copy function pointer runs on process context.
But the snd_pcm_ops .ack function pointer used to implement
direct read/write transfer using a memory-mapped I/O, runs
with interrupts disabled. Since the Kernel to VideoCore
interface driver has to be able to sleep, PCM write has to be
deferred to in preparation to add mmap support to the driver.
Signed-off-by: Javier Martinez Canillas <javier.martinez@collabora.co.uk>
ALSA: bcm2835: don't use magic numbers on audio start/stop workqueue
The bcm2835 driver uses a workqueue to defer PCM playback start
and stop. Commands are passed to the function as magic numbers
so is more readable to use macro constants.
Also, while being there change the generic name "x" to "cmd"
for the var used to pass the start/stop commands.
Signed-off-by: Javier Martinez Canillas <javier.martinez@collabora.co.uk>
ALSA: bcm2835: use a completion instead of a semaphore for sync
The kernel completion interface is a simple synchronization
mechanism that is preferable over semaphores to signal the
occurence of an event.
Semaphores are optimized for the non-contention case since
in most situations they will be available. But the wait for
completion is the opposite case since the code calling down
will wait on most cases.
So, is better to use the completion mechanism that has been
developed for this exact use case instead of semaphores.
Signed-off-by: Javier Martinez Canillas <javier.martinez@collabora.co.uk>
ALSA: bcm2835: show card and subdev creation msg only on debug
On driver probe a message showing that a card and its subdevices
have been created is shown. Probably this message is not needed
unless we have debug enabled on the driver. So, use the driver
audio_info() debug macro instead of just printk().
Also is better to use dev_err() and pr_err() instead KERN_ERR.
Signed-off-by: Javier Martinez Canillas <javier.martinez@collabora.co.uk>
dwc_otg: fix NAK holdoff and allow on split transactions only
This corrects a bug where if a single active non-periodic endpoint
had at least one transaction in its qh, on frnum == MAX_FRNUM the qh
would get skipped and never get queued again. This would result in
a silent device until error detection (automatic or otherwise) would
either reset the device or flush and requeue the URBs.
Additionally the NAK holdoff was enabled for all transactions - this
would potentially stall a HS endpoint for 1ms if a previous error state
enabled this interrupt and the next response was a NAK. Fix so that
only split transactions get held off.
Good job. I was too lazy to bisect for bad commit;)
Reading the code I found problematic kthread_should_stop call from netlink
connector which causes the oops. After applying a patch, I've been testing
owfs+w1 setup for nearly two days and it seems to work very reliable (no
hangs, no memleaks etc).
More detailed description and possible fix is given below:
Function w1_search can be called from either kthread or netlink callback.
While the former works fine, the latter causes oops due to kthread_should_stop
invocation.
This patch adds a check if w1_search is serving netlink command, skipping
kthread_should_stop invocation if so.
P33M [Thu, 21 Mar 2013 19:36:17 +0000 (19:36 +0000)]
dwc_otg: implement tasklet for returning URBs to usbcore hcd layer
The dwc_otg driver interrupt handler for transfer completion will spend
a very long time with interrupts disabled when a URB is completed -
this is because usb_hcd_giveback_urb is called from within the handler
which for a USB device driver with complicated processing (e.g. webcam)
will take an exorbitant amount of time to complete. This results in
missed completion interrupts for other USB packets which lead to them
being dropped due to microframe overruns.
This patch splits returning the URB to the usb hcd layer into a
high-priority tasklet. This will have most benefit for isochronous IN
transfers but will also have incidental benefit where multiple periodic
devices are active at once.
Recent GCC versions (e.g. GCC-4.7.2) perform optimizations based on
assumptions about the implementation of memset and similar functions.
The current ARM optimized memset code does not return the value of
its first argument, as is usually expected from standard implementations.
GCC assumes memset returns the value of pointer 'waiter' in register r0; causing
register/memory corruptions.
This patch fixes the return value of the assembly version of memset.
It adds a 'mov' instruction and merges an additional load+store into
existing load/store instructions.
For ease of review, here is a breakdown of the patch into 4 simple steps:
Step 1
======
Perform the following substitutions:
ip -> r8, then
r0 -> ip,
and insert 'mov ip, r0' as the first statement of the function.
At this point, we have a memset() implementation returning the proper result,
but corrupting r8 on some paths (the ones that were using ip).
Step 2
======
Make sure r8 is saved and restored when (! CALGN(1)+0) == 1:
Step 4
======
Rewrite register list "r4-r7, r8" as "r4-r8".
Signed-off-by: Ivan Djelic <ivan.djelic@parrot.com> Reviewed-by: Nicolas Pitre <nico@linaro.org> Signed-off-by: Dirk Behme <dirk.behme@gmail.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Josh Boyer [Thu, 14 Feb 2013 14:39:09 +0000 (09:39 -0500)]
USB: usb-storage: unusual_devs update for Super TOP SATA bridge
The current entry in unusual_cypress.h for the Super TOP SATA bridge devices
seems to be causing corruption on newer revisions of this device. This has
been reported in Arch Linux and Fedora. The original patch was tested on
devices with bcdDevice of 1.60, whereas the newer devices report bcdDevice
as 2.20. Limit the UNUSUAL_DEV entry to devices less than 2.20.
This fixes https://bugzilla.redhat.com/show_bug.cgi?id=909591
The Arch Forum post on this is here:
https://bbs.archlinux.org/viewtopic.php?id=152011
Reported-by: Carsten S. <carsteniq@yahoo.com> Tested-by: Carsten S. <carsteniq@yahoo.com> Cc: stable <stable@vger.kernel.org> Signed-off-by: Josh Boyer <jwboyer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
P33M [Sun, 3 Mar 2013 14:45:53 +0000 (14:45 +0000)]
dwc_otg: add handling of SPLIT transaction data toggle errors
Previously a data toggle error on packets from a USB1.1 device behind
a TT would result in the Pi locking up as the driver never handled
the associated interrupt. Patch adds basic retry mechanism and
interrupt acknowledgement to cater for either a chance toggle error or
for devices that have a broken initial toggle state (FT8U232/FT232BM).