Switching only one commit root during a transaction is wrong because it
leads the fs into an inconsistent state. All commit roots should be
switched at once, at transaction commit time, otherwise backref walking
can often miss important references that were only accessible through
the old commit root. Plus, the root item for the snapshot's root wasn't
getting updated and preventing the next transaction commit to do it.
This made several users get into random corruption issues after creation
of readonly snapshots.
A regression test for xfstests will follow soon.
Cc: stable@vger.kernel.org # 3.17 Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
(cherry picked from commit d37973082b453ba6b89ec07eb7b84305895d35e1)
Sage Weil [Fri, 26 Sep 2014 15:30:06 +0000 (08:30 -0700)]
Btrfs: fix race in WAIT_SYNC ioctl
We check whether transid is already committed via last_trans_committed and
then search through trans_list for pending transactions. If
last_trans_committed is updated by btrfs_commit_transaction after we check
it (there is no locking), we will fail to find the committed transaction
and return EINVAL to the caller. This has been observed occasionally by
ceph-osd (which uses this ioctl heavily).
Fix by rechecking whether the provided transid <= last_trans_committed
after the search fails, and if so return 0.
Josef Bacik [Fri, 19 Sep 2014 14:40:00 +0000 (10:40 -0400)]
Btrfs: cleanup error handling in build_backref_tree
When balance panics it tends to panic in the
BUG_ON(!upper->checked);
test, because it means it couldn't build the backref tree properly. This is
annoying to users and frankly a recoverable error, nothing in this function is
actually fatal since it is just an in-memory building of the backrefs for a
given bytenr. So go through and change all the BUG_ON()'s to ASSERT()'s, and
fix the BUG_ON(!upper->checked) thing to just return an error.
This patch also fixes the error handling so it tears down the work we've done
properly. This code was horribly broken since we always just panic'ed instead
of actually erroring out, so it needed to be completely re-worked. With this
patch my broken image no longer panics when I mount it. Thanks,
Josef Bacik [Fri, 19 Sep 2014 19:43:34 +0000 (15:43 -0400)]
Btrfs: fix build_backref_tree issue with multiple shared blocks
Marc Merlin sent me a broken fs image months ago where it would blow up in the
upper->checked BUG_ON() in build_backref_tree. This is because we had a
scenario like this
block a -- level 4 (not shared)
|
block b -- level 3 (reloc block, shared)
|
block c -- level 2 (not shared)
|
block d -- level 1 (shared)
|
block e -- level 0 (shared)
We go to build a backref tree for block e, we notice block d is shared and add
it to the list of blocks to lookup it's backrefs for. Now when we loop around
we will check edges for the block, so we will see we looked up block c last
time. So we lookup block d and then see that the block that points to it is
block c and we can just skip that edge since we've already been up this path.
The problem is because we clear need_check when we see block d (as it is shared)
we never add block b as needing to be checked. And because block c is in our
path already we bail out before we walk up to block b and add it to the backref
check list.
To fix this we need to reset need_check if we trip over a block that doesn't
need to be checked. This will make sure that any subsequent blocks in the path
as we're walking up afterwards are added to the list to be processed. With this
patch I can now mount Marc's fs image and it'll complete the balance without
panicing. Thanks,
Reported-by: Marc MERLIN <marc@merlins.org> Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>
(cherry picked from commit bbe9051441effce51c9a533d2c56440df64db2d7)
btrfs: Fix the wrong condition judgment about subset extent map
Previous commit: btrfs: Fix and enhance merge_extent_mapping() to insert
best fitted extent map
is using wrong condition to judgement whether the range is a subset of a
existing extent map.
This may cause bug in btrfs no-holes mode.
This patch will correct the judgment and fix the bug.
Josef Bacik [Thu, 18 Sep 2014 15:30:44 +0000 (11:30 -0400)]
Btrfs: try not to ENOSPC on log replay
When doing log replay we may have to update inodes, which traditionally goes
through our delayed inode stuff. This will try to move space over from the
trans handle, but we don't reserve space in our trans handle on replay since we
don't know how much we will need, so instead we try to flush. But because we
have a trans handle open we won't flush anything, so if we are out of reserve
space we will simply return ENOSPC. Since we know that if an operation made it
into the log then we definitely had space before the box bought the farm then we
don't need to worry about doing this space reservation. Use the
fs_info->log_root_recovering flag to skip the delayed inode stuff and update the
item directly. Thanks,
Josef Bacik [Thu, 18 Sep 2014 15:27:17 +0000 (11:27 -0400)]
Btrfs: don't do async reclaim during log replay
Trying to reproduce a log enospc bug I hit a panic in the async reclaim code
during log replay. This is because we use fs_info->fs_root as our root for
shrinking and such. Technically we can use whatever root we want, but let's
just not allow async reclaim while we're doing log replay. Thanks,
btrfs: Fix and enhance merge_extent_mapping() to insert best fitted extent map
The following commit enhanced the merge_extent_mapping() to reduce
fragment in extent map tree, but it can't handle case which existing
lies before map_start:
51f39 btrfs: Use right extent length when inserting overlap extent map.
[BUG]
When existing extent map's start is before map_start,
the em->len will be minus, which will corrupt the extent map and fail to
insert the new extent map.
This will happen when someone get a large extent map, but when it is
going to insert it into extent map tree, some one has already commit
some write and split the huge extent into small parts.
[REPRODUCER]
It is very easy to tiger using filebench with randomrw personality.
It is about 100% to reproduce when using 8G preallocated file in 60s
randonrw test.
[FIX]
This patch can now handle any existing extent position.
Since it does not directly use existing->start, now it will find the
previous and next extent around map_start.
So the old existing->start < map_start bug will never happen again.
[ENHANCE]
This patch will insert the best fitted extent map into extent map tree,
other than the oldest [map_start, map_start + sectorsize) or the
relatively newer but not perfect [map_start, existing->start).
The patch will first search existing extent that does not intersects with
the desired map range [map_start, map_start + len).
The existing extent will be either before or behind map_start, and based
on the existing extent, we can find out the previous and next extent
around map_start.
So the best fitted extent would be [prev->end, next->start).
For prev or next is not found, em->start would be prev->end and em->end
wold be next->start.
With this patch, the fragment in extent map tree should be reduced much
more than the 51f39 commit and reduce an unneeded extent map tree search.
Reported-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
(cherry picked from commit e6c4efd87ab04e5ead363f24e6ac35ed3506d401)
Liu Bo [Tue, 16 Sep 2014 09:49:30 +0000 (17:49 +0800)]
Btrfs: fix up bounds checking in lseek
An user reported this, it is because that lseek's SEEK_SET/SEEK_CUR/SEEK_END
allow a negative value for @offset, but btrfs's SEEK_DATA/SEEK_HOLE don't
prepare for that and convert the negative @offset into unsigned type,
so we get (end < start) warning.
This assumes that btrfs starts finding DATA/HOLE from the beginning of file
if the assigned @offset is negative.
Also we add alignment for lock_extent_bits 's range.
Reported-by: Toralf Förster <toralf.foerster@gmx.de> Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Chris Mason <clm@fb.com>
(cherry picked from commit 4d1a40c66bed0b3fa43b9da5fbd5cbe332e4eccf)
Filipe Manana [Thu, 11 Sep 2014 10:44:49 +0000 (11:44 +0100)]
Btrfs: add missing compression property remove in btrfs_ioctl_setflags
The behaviour of a 'chattr -c' consists of getting the current flags,
clearing the FS_COMPR_FL bit and then sending the result to the set
flags ioctl - this means the bit FS_NOCOMP_FL isn't set in the flags
passed to the ioctl. This results in the compression property not being
cleared from the inode - it was cleared only if the bit FS_NOCOMP_FL
was set in the received flags.
Reproducer:
$ mkfs.btrfs -f /dev/sdd
$ mount /dev/sdd /mnt && cd /mnt
$ mkdir a
$ chattr +c a
$ touch a/file
$ lsattr a/file
--------c------- a/file
$ chattr -c a
$ touch a/file2
$ lsattr a/file2
--------c------- a/file2
$ lsattr -d a
---------------- a
Reported-by: Andreas Schneider <asn@cryptomilk.org> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
(cherry picked from commit 78a017a2c92df9b571db0a55a016280f9019c65e)
btrfs:32612
[stack snip]
btrfs_dev_replace_start()
btrfs_dev_replace_lock()
mutex_lock(dev_replace->lock) ###hold mutex(B)
btrfs_dev_replace_finishing()
btrfs_rm_dev_replace_blocked()
wait until percpu_counter_sum == 0 ###wait on bio_counter(A)
This bug can be triggered quite easily by the following test script:
http://pastebin.com/MQmb37Cy
This patch will fix the ABBA problem by calling
btrfs_dev_replace_unlock() before btrfs_rm_dev_replace_blocked().
The consistency of btrfs devices list and their superblocks is protected
by device_list_mutex, not btrfs_dev_replace_lock/unlock().
So it is safe the move btrfs_dev_replace_unlock() before
btrfs_rm_dev_replace_blocked().
Reported-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Cc: Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: Chris Mason <clm@fb.com>
(cherry picked from commit 12b894cb288d57292b01cf158177b6d5c89a6272)
Mark Fasheh [Mon, 18 Aug 2014 21:01:17 +0000 (14:01 -0700)]
btrfs: don't go readonly on existing qgroup items
btrfs_drop_snapshot() leaves subvolume qgroup items on disk after
completion. This can cause problems with snapshot creation. If a new
snapshot tries to claim the deleted subvolumes id, btrfs will get -EEXIST
from add_qgroup_item() and go read-only. The following commands will
reproduce this problem (assume btrfs is on /dev/sda and is mounted at
/btrfs)
mkfs.btrfs -f /dev/sda
mount -t btrfs /dev/sda /btrfs/
btrfs quota enable /btrfs/
btrfs su sna /btrfs/ /btrfs/snap
btrfs su de /btrfs/snap
sleep 45
umount /btrfs/
mount -t btrfs /dev/sda /btrfs/
We can fix this by catching -EEXIST in add_qgroup_item() and
initializing the existing items. We have the problem of orphaned
relation items being on disk from an old snapshot but that is outside
the scope of this patch.
David Sterba [Wed, 23 Jul 2014 12:39:35 +0000 (14:39 +0200)]
btrfs: wake up transaction thread from SYNC_FS ioctl
The transaction thread may want to do more work, namely it pokes the
cleaner ktread that will start processing uncleaned subvols.
This can be triggered by user via the 'btrfs fi sync' command, otherwise
there was a delay up to 30 seconds before the cleaner started to clean
old snapshots.
ARM64 irq work self-IPI support depends on __smp_cross_call to point to
some relevant IRQ controller operations. This information should be
available after the call to init_IRQ().
lib/glob.c provides a new glob_match() function, with arguments in
(pattern, string) order. It replaced a private function with arguments
in (string, pattern) order, but I didn't swap the call site...
The result was the entire ATA blacklist was effectively disabled.
The lesson for today is "I f***ed up *how* badly *how* many months ago?",
er, I mean "Nobody Tests RC Kernels On Legacy Hardware".
This was not a subtle break, but it made it through an entire RC
cycle unreported, presumably because all the people doing testing
have full-featured hardware.
(FWIW, the reason for the argument swap was because fnmatch() does it that
way, and for a while implementing a full fnmatch() was being considered.)
Fixes: 428ac5fc056e0 (libata: Use glob_match from lib/glob.c) Reported-by: Steven Honeyman <stevenhoneyman@gmail.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=71371#c21 Signed-off-by: George Spelvin <linux@horizon.com> Tested-by: Steven Honeyman <stevenhoneyman@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Quark X1000 contains two designware derived 8250 serial ports.
Each port has a unique PCI configuration space consisting of
BAR0:UART BAR1:DMA respectively.
Unlike the standard 8250 the register width is 32 bits for RHR,IER etc
The Quark UART has a fundamental clock @ 44.2368 MHz allowing for a
bitrate of up to about 2.76 megabits per second.
Commit 92d585ef067d ("numa: fix NULL pointer access and memory
leak in unregister_one_node()") added kfree() of node struct in
unregister_one_node(). But node struct is freed by node_device_release()
which is called in unregister_node(). So by adding the kfree(),
node struct is freed two times.
While hot removing memory, the commit leads the following BUG_ON():
buf_0 and buf_1 in caam_hash_state are not next to each other.
Accessing buf_1 is incorrect from &buf_0 with an offset of only
size_of(buf_0). The same issue is also with buflen_0 and buflen_1
This full-speed USB device generates spurious remote wakeup event
as soon as USB_DEVICE_REMOTE_WAKEUP feature is set. As the result,
Linux can't enter system suspend and S0ix power saving modes once
this keyboard is used.
This patch tries to introduce USB_QUIRK_IGNORE_REMOTE_WAKEUP quirk.
With this quirk set, wakeup capability will be ignored during
device configure.
This patch could be back-ported to kernels as old as 2.6.39.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Acked-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
if we don't make sure to kill the timer, it could
expire after we have already gated our clocks.
That will trigger a Data Abort exception because
we would try to access register while clock is gated.
Fix that bug.
Fixes 869c597 (usb: musb: dsps: add support for suspend and resume) Tested-by: Dave Gerlach <d-gerlach@ti.com> Signed-off-by: Felipe Balbi <balbi@ti.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Added support for Ketra N1 wireless interface, which uses the
Silicon Labs' CP2104 USB to UART bridge with customized PID 8946.
Signed-off-by: Joe Savage <joe.savage@goketra.com> Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
That commit causes more problem than fixes. Firstly, kfree()
should be called after usb_ep_dequeue() and secondly, the way
things are, we will try to dequeue a request that has already
completed much more frequently than one which is pending.
Cc: Li Jun <b47624@freescale.com> Signed-off-by: Felipe Balbi <balbi@ti.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Besides the ASM1051 (*) needing sdev->no_report_opcodes = 1, it turns out that
the JMicron JMS567 also needs it to work properly with uas (usb-storage always
sets it). Since some of the scsi devs were not to keen on the idea to
outrightly set sdev->no_report_opcodes = 1 for all uas devices, so add a quirk
for this, and set it for the JMS567.
*) Which has become a non-issue since we've completely blacklisted uas on
the ASM1051 for other reasons
Reported-and-tested-by: Claudio Bizzarri <claudio.bizzarri@gmail.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When we warned about a timeout on a hotplug command, we previously printed
the time between calls to pcie_write_cmd(), without accounting for any time
spent actually waiting. Consider this sequence:
pcie_write_cmd
pcie_wait_cmd
now = jiffies # T2
wait_event_timeout # we may wait here
if (timeout)
ctrl_info("Timeout on command issued %u msec ago",
jiffies_to_msecs(now - cmd_started))
We previously printed (T2 - T1), but that doesn't include the time spent in
wait_event_timeout().
Fix this by using the current jiffies value, not the one cached before
calling wait_event_timeout().
[bhelgaas: changelog, use current jiffies instead of adding timeout] Fixes: 40b960831cfa ("PCI: pciehp: Compute timeout from hotplug command start time") Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
ARM irq work IPI support depends on SMP support. That information is
partly known at early boottime. Lets implement
arch_irq_work_has_interrupt() accordingly.
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russell King <linux@arm.linux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
x86 supports irq work self-IPIs when local apic is available. This is
partly known on runtime so lets implement arch_irq_work_has_interrupt()
accordingly.
This should be safely called after setup_arch().
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The nohz full kick, which restarts the tick when any resource depend
on it, can't be executed anywhere given the operation it does on timers.
If it is called from the scheduler or timers code, chances are that
we run into a deadlock.
This is why we run the nohz full kick from an irq work. That way we make
sure that the kick runs on a virgin context.
However if that's the case when irq work runs in its own dedicated
self-ipi, things are different for the big bunch of archs that don't
support the self triggered way. In order to support them, irq works are
also handled by the timer interrupt as fallback.
Now when irq works run on the timer interrupt, the context isn't blank.
More precisely, they can run in the context of the hrtimer that runs the
tick. But the nohz kick cancels and restarts this hrtimer and cancelling
an hrtimer from itself isn't allowed. This is why we run in an endless
loop:
To fix this we force non-lazy irq works to run on irq work self-IPIs
when available. That ability of the arch to trigger irq work self IPIs
is available with arch_irq_work_has_interrupt().
Reported-by: Catalin Iacob <iacobcatalin@gmail.com> Reported-by: Dave Jones <davej@redhat.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We need to copy exts->type when committing the change, otherwise
it would be always 0. This is a quick fix for -net and -stable,
for net-next tcf_exts will be removed.
Fixes: commit 33be627159913b094bb578e83 ("net_sched: act: use standard struct list_head") Reported-by: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In commit 6f2b6a3005b2c34c39f207a87667564f64f2f91a,
# 3c59x: Add dma error checking and recovery
the intent is to split out the mapping from the byte-swapping in order to
insert a dma_mapping_error() check.
Kinda this semantic patch:
// See http://coccinelle.lip6.fr/
//
// Beware, grouik-and-dirty!
@@
expression DEV, X, Y, Z;
@@
- cpu_to_le32(pci_map_single(DEV, X, Y, Z))
+ dma_addr_t addr = pci_map_single(DEV, X, Y, Z);
+ if (dma_mapping_error(&DEV->dev, addr))
+ /* snip */;
+ cpu_to_le32(addr)
However, the #else part (of the #if DO_ZEROCOPY test) is changed this way:
- cpu_to_le32(pci_map_single(DEV, X, Y, Z))
+ dma_addr_t addr = cpu_to_le32(pci_map_single(DEV, X, Y, Z));
// ^^^^^^^^^^^
// That mismatches the 3 other changes!
+ if (dma_mapping_error(&DEV->dev, addr))
+ /* snip */;
+ cpu_to_le32(addr)
Let's remove the leftover cpu_to_le32() for coherency.
v2: Better changelog.
v3: Add Acked-by
Fixes: 6f2b6a3005b2c34c39f207a87667564f64f2f91a
# 3c59x: Add dma error checking and recovery Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Sylvain "ythier" Hitier <sylvain.hitier@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently association restarts do not take into consideration the
state of the socket. When a restart happens, the current assocation
simply transitions into established state. This creates a condition
where a remote system, through a the restart procedure, may create a
local association that is no way reachable by user. The conditions
to trigger this are as follows:
1) Remote does not acknoledge some data causing data to remain
outstanding.
2) Local application calls close() on the socket. Since data
is still outstanding, the association is placed in SHUTDOWN_PENDING
state. However, the socket is closed.
3) The remote tries to create a new association, triggering a restart
on the local system. The association moves from SHUTDOWN_PENDING
to ESTABLISHED. At this point, it is no longer reachable by
any socket on the local system.
This patch addresses the above situation by moving the newly ESTABLISHED
association into SHUTDOWN-SENT state and bundling a SHUTDOWN after
the COOKIE-ACK chunk. This way, the restarted associate immidiately
enters the shutdown procedure and forces the termination of the
unreachable association.
Reported-by: David Laight <David.Laight@aculab.com> Signed-off-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
After the packet is successfully sent, we should not touch the packet
as it may have been freed. This patch is based on the work done by
Long Li <longli@microsoft.com>.
David, please queue this up for stable.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Reported-by: Sitsofe Wheeler <sitsofe@yahoo.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When team_notify_peers and team_mcast_rejoin are called, they both reset
their respective .count_pending atomic variable. Then when the actual
worker function is executed, the variable is atomically decremented.
This pattern introduces a potential race condition where the
.count_pending rolls over and the worker function keeps rescheduling
until .count_pending decrements to zero again:
Instead of assigning a new value to .count_pending, use atomic_add to
tack-on the additional desired worker function invocations.
Signed-off-by: Joe Lawrence <joe.lawrence@stratus.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Fixes: fc423ff00df3a19554414ee ("team: add peer notification") Fixes: 492b200efdd20b8fcfdac87 ("team: add support for sending multicast rejoins") Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Similar to commit bc23333ba11fb7f959b7e87e121122f5a0fbbca8 ("net:
bcmgenet: fix bcmgenet_put_tx_csum()"), we need to return the skb
pointer in case we had to reallocate the SKB headroom.
Fixes: 80105befdb4b8 ("net: systemport: add Broadcom SYSTEMPORT Ethernet MAC driver") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In xmit path, we build a flowi6 which will be used for the output route lookup.
We are sending a GRE packet, neither IPv4 nor IPv6 encapsulated packet, thus the
protocol should be IPPROTO_GRE.
Fixes: c12b395a4664 ("gre: Support GRE over IPv6") Reported-by: Matthieu Ternisien d'Ouville <matthieu.tdo@6wind.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Linus Torvalds [Sun, 5 Oct 2014 17:16:11 +0000 (10:16 -0700)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"This is a set of two small fixes, both to code which went in during
the merge window: cxgb4i has a scheduling in atomic bug in its new
ipv6 code and uas fails to work properly with the new scsi-mq code"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
[SCSI] uas: disable use of blk-mq I/O path
[SCSI] cxgb4i: avoid holding mutex in interrupt context
Linus Torvalds [Sat, 4 Oct 2014 16:32:47 +0000 (09:32 -0700)]
Merge tag 'tiny/kconfig-for-3.17' of https://git.kernel.org/pub/scm/linux/kernel/git/josh/linux
Pull kconfig fixes for tiny setups from Josh Triplett:
"Two Kconfig bugfixes for 3.17 related to tinification. These fixes
make the Kconfig "General Setup" menu much more usable"
* tag 'tiny/kconfig-for-3.17' of https://git.kernel.org/pub/scm/linux/kernel/git/josh/linux:
init/Kconfig: Fix HAVE_FUTEX_CMPXCHG to not break up the EXPERT menu
init/Kconfig: Hide printk log config if CONFIG_PRINTK=n
Josh Triplett [Fri, 3 Oct 2014 23:19:24 +0000 (16:19 -0700)]
init/Kconfig: Fix HAVE_FUTEX_CMPXCHG to not break up the EXPERT menu
commit 03b8c7b623c80af264c4c8d6111e5c6289933666 ("futex: Allow
architectures to skip futex_atomic_cmpxchg_inatomic() test") added the
HAVE_FUTEX_CMPXCHG symbol right below FUTEX. This placed it right in
the middle of the options for the EXPERT menu. However,
HAVE_FUTEX_CMPXCHG does not depend on EXPERT or FUTEX, so Kconfig stops
placing items in the EXPERT menu, and displays the remaining several
EXPERT items (starting with EPOLL) directly in the General Setup menu.
Since both users of HAVE_FUTEX_CMPXCHG only select it "if FUTEX", make
HAVE_FUTEX_CMPXCHG itself depend on FUTEX. With this change, the
subsequent items display as part of the EXPERT menu again; the EMBEDDED
menu now appears as the next top-level item in the General Setup menu,
which makes General Setup much shorter and more usable.
Linus Torvalds [Fri, 3 Oct 2014 20:31:57 +0000 (13:31 -0700)]
Merge tag 'trace-fixes-v3.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull trace ring buffer iterator fix from Steven Rostedt:
"While testing some new changes for 3.18, I kept hitting a bug every so
often in the ring buffer. At first I thought it had to do with some
of the changes I was working on, but then testing something else I
realized that the bug was in 3.17 itself. I ran several bisects as
the bug was not very reproducible, and finally came up with the commit
that I could reproduce easily within a few minutes, and without the
change I could run the tests over an hour without issue. The change
fit the bug and I figured out a fix. That bad commit was:
Commit 651e22f2701b "ring-buffer: Always reset iterator to reader page"
This commit fixed a bug, but in the process created another one. It
used the wrong value as the cached value that is used to see if things
changed while an iterator was in use. This made it look like a change
always happened, and could cause the iterator to go into an infinite
loop"
* tag 'trace-fixes-v3.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
ring-buffer: Fix infinite spin in reading buffer
Linus Torvalds [Fri, 3 Oct 2014 20:09:57 +0000 (13:09 -0700)]
Merge branch 'for-linus' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs/smb3 fixes from Steve French:
"Fix for CIFS/SMB3 oops on reconnect during readpages (3.17 regression)
and for incorrectly closing file handle in symlink error cases"
* 'for-linus' of git://git.samba.org/sfrench/cifs-2.6:
CIFS: Fix readpages retrying on reconnects
Fix problem recognizing symlinks
Linus Torvalds [Fri, 3 Oct 2014 15:31:14 +0000 (08:31 -0700)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"Nothing too major or scary.
One i915 regression fix, nouveau has a tmds regression fix, along with
a regression fix for the runtime pm code for optimus laptops not
restoring the display hw correctly"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm/nouveau: make sure display hardware is reinitialised on runtime resume
drm/nouveau: punt fbcon resume out to a workqueue
drm/nouveau: fix regression on original nv50 board
drm/nv50/disp: fix dpms regression on certain boards
drm/i915: Flush the PTEs after updating them before suspend
The uas driver uses the block layer tag for USB3 stream IDs. With
blk-mq we can get larger tag numbers that the queue depth, which breaks
this assumption. A fix is under way for 3.18, but sits on top of
large changes so can't easily be backported. Set the disable_blk_mq
path so that a uas device can't easily crash the system when using
blk-mq for SCSI.
Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Linus Torvalds [Fri, 3 Oct 2014 01:47:28 +0000 (18:47 -0700)]
Merge tag 'pm+acpi-3.17-final' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI and power management fixes from Rafael Wysocki:
"These are three regression fixes (cpufreq core, pcc-cpufreq, i915 /
ACPI) and one trivial fix for a callback return value mismatch in the
cpufreq integrator driver.
Specifics:
- A recent cpufreq core fix went too far and introduced a regression
in the system suspend code path. Fix from Viresh Kumar.
- An ACPI-related commit in the i915 driver that fixed backlight
problems for some Thinkpads inadvertently broke a Dell machine (in
3.16). Fix from Aaron Lu.
- The pcc-cpufreq driver was broken during the 3.15 cycle by a commit
that put wait_event() under a spinlock by mistake. Fix that
(Rafael J Wysocki).
- The return value type of integrator_cpufreq_remove() is void, but
should be int. Fix from Arnd Bergmann"
* tag 'pm+acpi-3.17-final' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: update 'cpufreq_suspended' after stopping governors
ACPI / i915: Update the condition to ignore firmware backlight change request
cpufreq: integrator: fix integrator_cpufreq_remove return type
cpufreq: pcc-cpufreq: Fix wait_event() under spinlock
Andy Gross [Mon, 29 Sep 2014 22:00:51 +0000 (17:00 -0500)]
i2c: qup: Fix order of runtime pm initialization
The runtime pm calls need to be done before populating the children via the
i2c_add_adapter call. If this is not done, a child can run into issues trying
to do i2c read/writes due to the pm_runtime_sync failing.
Signed-off-by: Andy Gross <agross@codeaurora.org> Reviewed-by: Felipe Balbi <balbi@ti.com> Acked-by: Bjorn Andersson <bjorn.andersson@sonymobile.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de> Cc: stable@kernel.org
i2cdetect -q was broken (everything was a false positive, and no transfers were
actually being sent over i2c). The way it works is by sending a 0 length write
request and checking for NACK. This patch fixes the 0 length writes and actually
sends them.
Reported-by: Doug Anderson <dianders@chromium.org> Signed-off-by: Alexandru M Stan <amstan@chromium.org> Tested-by: Doug Anderson <dianders@chromium.org> Tested-by: Max Schwarz <max.schwarz@online.de> Signed-off-by: Wolfram Sang <wsa@the-dreams.de> Cc: stable@kernel.org
Linus Torvalds [Thu, 2 Oct 2014 23:29:19 +0000 (16:29 -0700)]
Merge branch 'akpm' (fixes from Andrew Morton)
Merge fixes from Andrew Morton:
"5 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm: page_alloc: fix zone allocation fairness on UP
perf: fix perf bug in fork()
MAINTAINERS: change git URL for mpc5xxx tree
mm: memcontrol: do not iterate uninitialized memcgs
ocfs2/dlm: should put mle when goto kill in dlm_assert_master_handler
Johannes Weiner [Thu, 2 Oct 2014 23:21:10 +0000 (16:21 -0700)]
mm: page_alloc: fix zone allocation fairness on UP
The zone allocation batches can easily underflow due to higher-order
allocations or spills to remote nodes. On SMP that's fine, because
underflows are expected from concurrency and dealt with by returning 0.
But on UP, zone_page_state will just return a wrapped unsigned long,
which will get past the <= 0 check and then consider the zone eligible
until its watermarks are hit.
Commit 3a025760fc15 ("mm: page_alloc: spill to remote nodes before
waking kswapd") already made the counter-resetting use
atomic_long_read() to accomodate underflows from remote spills, but it
didn't go all the way with it.
Make it clear that these batches are expected to go negative regardless
of concurrency, and use atomic_long_read() everywhere.
Fixes: 81c0a2bb515f ("mm: page_alloc: fair zone allocator policy") Reported-by: Vlastimil Babka <vbabka@suse.cz> Reported-by: Leon Romanovsky <leon@leon.nu> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Mel Gorman <mgorman@suse.de> Cc: <stable@vger.kernel.org> [3.12+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Peter Zijlstra [Thu, 2 Oct 2014 23:17:02 +0000 (16:17 -0700)]
perf: fix perf bug in fork()
Oleg noticed that a cleanup by Sylvain actually uncovered a bug; by
calling perf_event_free_task() when failing sched_fork() we will not yet
have done the memset() on ->perf_event_ctxp[] and will therefore try and
'free' the inherited contexts, which are still in use by the parent
process. This is bad..
Johannes Weiner [Thu, 2 Oct 2014 23:16:57 +0000 (16:16 -0700)]
mm: memcontrol: do not iterate uninitialized memcgs
The cgroup iterators yield css objects that have not yet gone through
css_online(), but they are not complete memcgs at this point and so the
memcg iterators should not return them. Commit d8ad30559715 ("mm/memcg:
iteration skip memcgs not yet fully initialized") set out to implement
exactly this, but it uses CSS_ONLINE, a cgroup-internal flag that does
not meet the ordering requirements for memcg, and so the iterator may
skip over initialized groups, or return partially initialized memcgs.
The cgroup core can not reasonably provide a clear answer on whether the
object around the css has been fully initialized, as that depends on
controller-specific locking and lifetime rules. Thus, introduce a
memcg-specific flag that is set after the memcg has been initialized in
css_online(), and read before mem_cgroup_iter() callers access the memcg
members.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Tejun Heo <tj@kernel.org> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: Hugh Dickins <hughd@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: <stable@vger.kernel.org> [3.12+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Thu, 2 Oct 2014 23:10:38 +0000 (16:10 -0700)]
Merge tag 'media/v3.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media fix from Mauro Carvalho Chehab:
"One last time regression fix at em28xx. The removal of .reset_resume
broke suspend/resume on this driver for some devices.
There are more fixes to be done for em28xx suspend/resume to be better
handled, but I'm opting to let them to stay for a while at the media
devel tree, in order to get more tests. So, for now, let's just
revert this patch"
* tag 'media/v3.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
Revert "[media] media: em28xx - remove reset_resume interface"
Commit 651e22f2701b "ring-buffer: Always reset iterator to reader page"
fixed one bug but in the process caused another one. The reset is to
update the header page, but that fix also changed the way the cached
reads were updated. The cache reads are used to test if an iterator
needs to be updated or not.
A ring buffer iterator, when created, disables writes to the ring buffer
but does not stop other readers or consuming reads from happening.
Although all readers are synchronized via a lock, they are only
synchronized when in the ring buffer functions. Those functions may
be called by any number of readers. The iterator continues down when
its not interrupted by a consuming reader. If a consuming read
occurs, the iterator starts from the beginning of the buffer.
The way the iterator sees that a consuming read has happened since
its last read is by checking the reader "cache". The cache holds the
last counts of the read and the reader page itself.
Commit 651e22f2701b changed what was saved by the cache_read when
the rb_iter_reset() occurred, making the iterator never match the cache.
Then if the iterator calls rb_iter_reset(), it will go into an
infinite loop by checking if the cache doesn't match, doing the reset
and retrying, just to see that the cache still doesn't match! Which
should never happen as the reset is suppose to set the cache to the
current value and there's locks that keep a consuming reader from
having access to the data.
Fixes: 651e22f2701b "ring-buffer: Always reset iterator to reader page" Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Linus Torvalds [Thu, 2 Oct 2014 19:23:10 +0000 (12:23 -0700)]
Merge branch 'parisc-3.17-8' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc fix from Helge Deller:
"One late but trivial patch to fix the serial console on parisc
machines which got broken during the 3.17 release cycle"
* 'parisc-3.17-8' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Fix serial console for machines with serial port on superio chip
Pavel Shilovsky [Thu, 2 Oct 2014 16:13:35 +0000 (20:13 +0400)]
CIFS: Fix readpages retrying on reconnects
If we got a reconnect error from async readv we re-add pages back
to page_list and continue loop. That is wrong because these pages
have been already added to the pagecache but page_list has pages that
have not been added to the pagecache yet. This ends up with a general
protection fault in put_pages after readpages. Fix it by not retrying
the read of these pages and falling back to readpage instead.
Fixes debian bug 762306
Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <smfrench@gmail.com> Tested-by: Arthur Marsh <arthur.marsh@internode.on.net>
Steve French [Thu, 25 Sep 2014 06:26:55 +0000 (01:26 -0500)]
Fix problem recognizing symlinks
Changeset eb85d94bd introduced a problem where if a cifs open
fails during query info of a file we
will still try to close the file (happens with certain types
of reparse points) even though the file handle is not valid.
In addition for SMB2/SMB3 we were not mapping the return code returned
by Windows when trying to open a file (like a Windows NFS symlink)
which is a reparse point.
Signed-off-by: Steve French <smfrench@gmail.com> Reviewed-by: Pavel Shilovsky <pshilovsky@samba.org> CC: stable <stable@vger.kernel.org> #v3.13+
Linus Torvalds [Thu, 2 Oct 2014 18:57:52 +0000 (11:57 -0700)]
Merge branch 'numa-migration-fixes' (fixes from Mel Gorman)
Merge NUMA balancing related fixlets from Mel Gorman:
"There were a few minor changes so am resending just the two patches
that are mostly likely to affect the bug Dave and Sasha saw and marked
them for stable.
I'm less confident it will address Sasha's problem because while I
have not kept up to date, I believe he's also seeing memory corruption
issues in next from an unknown source. Still, it would be nice to see
how they affect trinity testing.
I'll send the MPOL_MF_LAZY patch separately because it's not urgent"
* emailed patches from Mel Gorman <mgorman@suse.de>:
mm: numa: Do not mark PTEs pte_numa when splitting huge pages
mm: migrate: Close race between migration completion and mprotect
Mel Gorman [Thu, 2 Oct 2014 18:47:42 +0000 (19:47 +0100)]
mm: numa: Do not mark PTEs pte_numa when splitting huge pages
This patch reverts 1ba6e0b50b ("mm: numa: split_huge_page: transfer the
NUMA type from the pmd to the pte"). If a huge page is being split due
a protection change and the tail will be in a PROT_NONE vma then NUMA
hinting PTEs are temporarily created in the protected VMA.
VM_RW|VM_PROTNONE
|-----------------|
^
split here
In the specific case above, it should get fixed up by change_pte_range()
but there is a window of opportunity for weirdness to happen. Similarly,
if a huge page is shrunk and split during a protection update but before
pmd_numa is cleared then a pte_numa can be left behind.
Instead of adding complexity trying to deal with the case, this patch
will not mark PTEs NUMA when splitting a huge page. NUMA hinting faults
will not be triggered which is marginal in comparison to the complexity
in dealing with the corner cases during THP split.
Cc: stable@vger.kernel.org Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mel Gorman [Thu, 2 Oct 2014 18:47:41 +0000 (19:47 +0100)]
mm: migrate: Close race between migration completion and mprotect
A migration entry is marked as write if pte_write was true at the time the
entry was created. The VMA protections are not double checked when migration
entries are being removed as mprotect marks write-migration-entries as
read. It means that potentially we take a spurious fault to mark PTEs write
again but it's straight-forward. However, there is a race between write
migrations being marked read and migrations finishing. This potentially
allows a PTE to be write that should have been read. Close this race by
double checking the VMA permissions using maybe_mkwrite when migration
completes.
[torvalds@linux-foundation.org: use maybe_mkwrite] Cc: stable@vger.kernel.org Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Thu, 2 Oct 2014 16:42:28 +0000 (09:42 -0700)]
Merge tag 'sound-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Just a few pending bits of random fixes in ASoC. Nothing exciting,
but would be nice to be merged in 3.17, as most of them are also for
stable kernels"
* tag 'sound-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ASoC: ssm2602: do not hardcode type to SSM2602
ASoC: core: fix possible ZERO_SIZE_PTR pointer dereferencing error.
MAINTAINERS: add atmel audio alsa driver maintainer entry
ASoC: rt286: Fix sync function
ASoC: rt286: Correct default value
ASoC: soc-compress: fix double unlock of fe card mutex
ASoC: fsl_ssi: fix kernel panic in probe function
Dave Airlie [Thu, 2 Oct 2014 04:48:20 +0000 (14:48 +1000)]
Merge branch 'linux-3.17' of git://anongit.freedesktop.org/git/nouveau/linux-2.6 into drm-fixes
A few regression fixes, the runpm ones dating back to 3.15. Also a fairly severe TMDS regression that effected a lot of GF8/9/GT2xx users.
* 'linux-3.17' of git://anongit.freedesktop.org/git/nouveau/linux-2.6:
drm/nouveau: make sure display hardware is reinitialised on runtime resume
drm/nouveau: punt fbcon resume out to a workqueue
drm/nouveau: fix regression on original nv50 board
drm/nv50/disp: fix dpms regression on certain boards
1) Don't halt the firmware in r8152 driver, from Hayes Wang.
2) Handle full sized 802.1ad frames in bnx2 and tg3 drivers properly,
from Vlad Yasevich.
3) Don't sleep while holding tx_clean_lock in netxen driver, fix from
Manish Chopra.
4) Certain kinds of ipv6 routes can end up endlessly failing the route
validation test, causing it to be re-looked up over and over again.
This particularly kills input route caching in TCP sockets. Fix
from Hannes Frederic Sowa.
5) netvsc_start_xmit() has a use-after-free access to skb->len, fix
from K Y Srinivasan.
6) Fix matching of inverted containers in ematch module, from Ignacy
Gawędzki.
7) Aggregation of GRO frames via SKB ->frag_list for linear skbs isn't
handled properly, regression fix from Eric Dumazet.
8) Don't test return value of ipv4_neigh_lookup(), which returns an
error pointer, against NULL. From WANG Cong.
9) Fix an old regression where we mistakenly allow a double add of the
same tunnel. Fixes from Steffen Klassert.
10) macvtap device delete and open can run in parallel and corrupt lists
etc., fix from Vlad Yasevich.
11) Fix build error with IPV6=m NETFILTER_XT_TARGET_TPROXY=y, from Pablo
Neira Ayuso.
12) rhashtable_destroy() triggers lockdep splats, fix also from Pablo.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (32 commits)
bna: Update Maintainer Email
r8152: disable power cut for RTL8153
r8152: remove clearing bp
bnx2: Correctly receive full sized 802.1ad fragmes
tg3: Allow for recieve of full-size 8021AD frames
r8152: fix setting RTL8152_UNPLUG
netxen: Fix bug in Tx completion path.
netxen: Fix BUG "sleeping function called from invalid context"
ipv6: remove rt6i_genid
hyperv: Fix a bug in netvsc_start_xmit()
net: stmmac: fix stmmac_pci_probe failed when CONFIG_HAVE_CLK is selected
ematch: Fix matching of inverted containers.
gro: fix aggregation for skb using frag_list
neigh: check error pointer instead of NULL for ipv4_neigh_lookup()
ip6_gre: Return an error when adding an existing tunnel.
ip6_vti: Return an error when adding an existing tunnel.
ip6_tunnel: Return an error when adding an existing tunnel.
ip6gre: add a rtnl link alias for ip6gretap
net/mlx4_core: Allow not to specify probe_vf in SRIOV IB mode
r8152: fix the carrier off when autoresuming
...
NeilBrown [Thu, 2 Oct 2014 03:45:00 +0000 (13:45 +1000)]
md/raid5: disable 'DISCARD' by default due to safety concerns.
It has come to my attention (thanks Martin) that 'discard_zeroes_data'
is only a hint. Some devices in some cases don't do what it
says on the label.
The use of DISCARD in RAID5 depends on reads from discarded regions
being predictably zero. If a write to a previously discarded region
performs a read-modify-write cycle it assumes that the parity block
was consistent with the data blocks. If all were zero, this would
be the case. If some are and some aren't this would not be the case.
This could lead to data corruption after a device failure when
data needs to be reconstructed from the parity.
As we cannot trust 'discard_zeroes_data', ignore it by default
and so disallow DISCARD on all raid4/5/6 arrays.
As many devices are trustworthy, and as there are benefits to using
DISCARD, add a module parameter to over-ride this caution and cause
DISCARD to work if discard_zeroes_data is set.
If a site want to enable DISCARD on some arrays but not on others they
should select DISCARD support at the filesystem level, and set the
raid456 module parameter.
raid456.devices_handle_discard_safely=Y
As this is a data-safety issue, I believe this patch is suitable for
-stable.
DISCARD support for RAID456 was added in 3.7
Cc: Shaohua Li <shli@kernel.org> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Mike Snitzer <snitzer@redhat.com> Cc: Heinz Mauelshagen <heinzm@redhat.com> Cc: stable@vger.kernel.org (3.7+) Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Mike Snitzer <snitzer@redhat.com> Fixes: 620125f2bf8ff0c4969b79653b54d7bcc9d40637 Signed-off-by: NeilBrown <neilb@suse.de>
Ben Skeggs [Thu, 2 Oct 2014 03:22:27 +0000 (13:22 +1000)]
drm/nouveau: make sure display hardware is reinitialised on runtime resume
Linus commit 05c63c2ff23a80b654d6c088ac3ba21628db0173 modified the
runtime suspend/resume paths to skip over display-related tasks to
avoid locking issues on resume.
Unfortunately, this resulted in the display hardware being left in
a partially initialised state, preventing subsequent modesets from
completing.
This commit unifies the (many) suspend/resume paths, bringing back
display (and fbcon) handling in the runtime paths.
Ben Skeggs [Wed, 1 Oct 2014 01:11:25 +0000 (11:11 +1000)]
drm/nouveau: punt fbcon resume out to a workqueue
Preparation for some runtime pm fixes. Currently we skip over fbcon
suspend/resume in the runtime path, which causes issues on resume if
fbcon tries to write to the framebuffer before the BAR subdev has
been resumed to restore the BAR1 VM setup.
As we might be woken up via a sysfs connector, we are unable to call
fb_set_suspend() in the resume path as it could make its way down to
a modeset and cause all sorts of locking hilarity.
To solve this, we'll just delay the fbcon resume to a workqueue.
Ben Skeggs [Wed, 1 Oct 2014 02:46:14 +0000 (12:46 +1000)]
drm/nouveau: fix regression on original nv50 board
Xorg (and any non-DRM client really) doesn't have permission to directly
touch VRAM on nv50 and up, which the fence code prior to g84 depends on.
It's less invasive to temporarily grant it premission to do so, as it
previously did, than it is to rework fencenv50 to use the VM. That
will come later on.
hayeswang [Wed, 1 Oct 2014 05:25:10 +0000 (13:25 +0800)]
r8152: remove clearing bp
The xxx_clear_bp() is used to halt the firmware. It only necessary
for updating the new firmware. Besides, depend on the version of
the current firmware, it may have problem to halt the firmware
directly. Finally, halt the firmware would let the firmware code
useless, and the bugs which are fixed by the firmware would occur.
Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
bnx2: Correctly receive full sized 802.1ad fragmes
This driver, similar to tg3, has a check that will
cause full sized 802.1ad frames to be dropped. The
frame will be larger then the standard mtu due to the
presense of vlan header that has not been stripped.
The driver should not drop this frame and should process
it just like it does for 802.1q.
CC: Sony Chacko <sony.chacko@qlogic.com> CC: Dept-HSGLinuxNICDev@qlogic.com Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
When receiving a vlan-tagged frame that still contains
a vlan header, the length of the packet will be greater
then MTU+ETH_HLEN since it will account of the extra
vlan header. TG3 checks this for the case for 802.1Q,
but not for 802.1ad. As a result, full sized 802.1ad
frames get dropped by the card.
Add a check for 802.1ad protocol when receving full
sized frames.
Suggested-by: Prashant Sreedharan <prashant@broadcom.com> CC: Prashant Sreedharan <prashant@broadcom.com> CC: Michael Chan <mchan@broadcom.com> Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Wed, 1 Oct 2014 20:22:00 +0000 (13:22 -0700)]
Merge branch 'for-3.17' of git://linux-nfs.org/~bfields/linux
Pull nfsd bugfix from Bruce Fields:
"This fixes a data corruption bug introduced by the v3.16 xdr encoding
rewrite. I haven't managed to reproduce it myself yet, but it's
apparently not hard to hit given the right workload"
* 'for-3.17' of git://linux-nfs.org/~bfields/linux:
nfsd4: fix corruption of NFSv4 read data
[SCSI] cxgb4i: avoid holding mutex in interrupt context
cxgbi_inet6addr_handler() can be called in interrupt context, so use rcu
protected list while finding netdev. This is observed as a scheduling in
atomic oops when running over ipv6.
Fixes: fc8d0590d914 ("libcxgbi: Add ipv6 api to driver") Fixes: 759a0cc5a3e1 ("cxgb4i: Add ipv6 code to driver, call into libcxgbi ipv6 api") Signed-off-by: Anish Bhatt <anish@chelsio.com> Signed-off-by: Karen Xie <kxie@chelsio.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Linus Torvalds [Wed, 1 Oct 2014 02:52:08 +0000 (19:52 -0700)]
Merge branch 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm
Pull ARM fixes from Russell King:
"Some further ARM fixes:
- another build fix for the kprobes test code
- a fix for no kuser helpers for the set_tls code, which oopsed on
noMMU hardware
- a fix for alignment handler with neon opcodes being misinterpreted
- turning off the hardware access support, which is not implemented
- a build fix for the v7 coherency exiting code, which can be built
in non-v7 environments (but still only executed on v7 CPUs)"
* 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm:
ARM: 8179/1: kprobes-test: Fix compile error "bad immediate value for offset"
ARM: 8178/1: fix set_tls for !CONFIG_KUSER_HELPERS
ARM: 8177/1: cacheflush: Fix v7_exit_coherency_flush exynos build breakage on ARMv6
ARM: 8165/1: alignment: don't break misaligned NEON load/store
ARM: 8164/1: mm: clear SCTLR.HA instead of setting it for LPAE
The flag of RTL8152_UNPLUG should only be set when the device is
unplugged, not each time the rtl8152_disconnect() is called.
Otherwise, the device wouldn't be stopped normally.
Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 30 Sep 2014 20:22:51 +0000 (16:22 -0400)]
Merge branch 'netxen'
Manish Chopra says:
====================
netxen: Bug fixes.
This series fixes some TX specific issues.
* Move spin_lock(tx_clean_lock) in down path to fix
atomic sleep bug (Reported by Mike Galbraith).
* Fix hang in interface down while running traffic.
Please consider applying this to 'net'.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
o Driver is not updating sw_consumer while processing Tx completion
when interface is going down. Due to this interface down path gets
stuck forever waiting for NAPI to complete.
Signed-off-by: Manish Chopra <manish.chopra@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
netxen: Fix BUG "sleeping function called from invalid context"
o __netxen_nic_down() function might sleep while holding spinlock_t(tx_clean_lock).
Acquire this lock for only releasing TX buffers instead of taking it
for whole down path.
Reported-by: Mike Galbraith <umgwanakikbuti@gmail.com> Signed-off-by: Manish Chopra <manish.chopra@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
J. Bruce Fields [Wed, 24 Sep 2014 20:32:34 +0000 (16:32 -0400)]
nfsd4: fix corruption of NFSv4 read data
The calculation of page_ptr here is wrong in the case the read doesn't
start at an offset that is a multiple of a page.
The result is that nfs4svc_encode_compoundres sets rq_next_page to a
value one too small, and then the loop in svc_free_res_pages may
incorrectly fail to clear a page pointer in rq_respages[].
Pages left in rq_respages[] are available for the next rpc request to
use, so xdr data may be written to that page, which may hold data still
waiting to be transmitted to the client or data in the page cache.
The observed result was silent data corruption seen on an NFSv4 client.
We tag this as "fixing" 05638dc73af2 because that commit exposed this
bug, though the incorrect calculation predates it.
Particular thanks to Andrea Arcangeli and David Gilbert for analysis and
testing.
Fixes: 05638dc73af2 "nfsd4: simplify server xdr->next_page use" Cc: stable@vger.kernel.org Reported-by: Andrea Arcangeli <aarcange@redhat.com> Tested-by: "Dr. David Alan Gilbert" <dgilbert@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
cpufreq: update 'cpufreq_suspended' after stopping governors
Commit 8e30444e1530 ("cpufreq: fix cpufreq suspend/resume for intel_pstate")
introduced a bug where the governors wouldn't be stopped anymore for
->target{_index}() drivers during suspend. This happens because
'cpufreq_suspended' is updated before stopping the governors during suspend
and due to this __cpufreq_governor() would return early due to this check:
/* Don't start any governor operations if we are entering suspend */
if (cpufreq_suspended)
return 0;
Fixes: 8e30444e1530 ("cpufreq: fix cpufreq suspend/resume for intel_pstate") Cc: 3.15+ <stable@vger.kernel.org> # 3.15+: 8e30444e1530 "cpufreq: fix cpufreq suspend/resume for intel_pstate" Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Eric Dumazet noticed that all no-nonexthop or no-gateway routes which
are already marked DST_HOST (e.g. input routes routes) will always be
invalidated during sk_dst_check. Thus per-socket dst caching absolutely
had no effect and early demuxing had no effect.
Thus this patch removes rt6i_genid: fn_sernum already gets modified during
add operations, so we only must ensure we mutate fn_sernum during ipv6
address remove operations. This is a fairly cost extensive operations,
but address removal should not happen that often. Also our mtu update
functions do the same and we heard no complains so far. xfrm policy
changes also cause a call into fib6_flush_trees. Also plug a hole in
rt6_info (no cacheline changes).
I verified via tracing that this change has effect.
Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: YOSHIFUJI Hideaki <hideaki@yoshifuji.org> Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com> Cc: Martin Lau <kafai@fb.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>