Zhao Lei [Tue, 20 Jan 2015 07:11:34 +0000 (15:11 +0800)]
Btrfs: add ref_count and free function for btrfs_bio
1: ref_count is simple than current RBIO_HOLD_BBIO_MAP_BIT flag
to keep btrfs_bio's memory in raid56 recovery implement.
2: free function for bbio will make code clean and flexible, plus
forced data type checking in compile.
Changelog v1->v2:
Rename following by David Sterba's suggestion:
put_btrfs_bio() -> btrfs_put_bio()
get_btrfs_bio() -> btrfs_get_bio()
bbio->ref_count -> bbio->refs
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
(cherry picked from commit 6e9606d2a2dce098c1739fb3cd82a1c34fd73d3a)
Zhao Lei [Tue, 20 Jan 2015 07:11:31 +0000 (15:11 +0800)]
Btrfs: fix a out-of-bound access of raid_map
We add the number of stripes on target devices into bbio->num_stripes
if we are under device replacement, and we just sort the raid_map of
those stripes that not on the target devices, so if when we need
real raid_map, we need skip the stripes on the target devices.
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
(cherry picked from commit e34c330d639177bbb345bf2bde16613b00cc6e6b)
Filipe Manana [Wed, 14 Jan 2015 01:52:25 +0000 (01:52 +0000)]
Btrfs: fix fsync log replay for inodes with a mix of regular refs and extrefs
If we have an inode with a large number of hard links, some of which may
be extrefs, turn a regular ref into an extref, fsync the inode and then
replay the fsync log (after a crash/reboot), we can endup with an fsync
log that makes the replay code always fail with -EOVERFLOW when processing
the inode's references.
This is easy to reproduce with the test case I made for xfstests. Its steps
are the following:
# Create a test file with 3001 hard links. This number is large enough to
# make btrfs start using extrefs at some point even if the fs has the maximum
# possible leaf/node size (64Kb).
echo "hello world" > $SCRATCH_MNT/foo
for i in `seq 1 3000`; do
ln $SCRATCH_MNT/foo $SCRATCH_MNT/foo_link_`printf "%04d" $i`
done
# Make sure all metadata and data are durably persisted.
sync
# Now remove one link, add a new one with a new name, add another new one with
# the same name as the one we just removed and fsync the inode.
rm -f $SCRATCH_MNT/foo_link_0001
ln $SCRATCH_MNT/foo $SCRATCH_MNT/foo_link_3001
ln $SCRATCH_MNT/foo $SCRATCH_MNT/foo_link_0001
rm -f $SCRATCH_MNT/foo_link_0002
ln $SCRATCH_MNT/foo $SCRATCH_MNT/foo_link_3002
ln $SCRATCH_MNT/foo $SCRATCH_MNT/foo_link_3003
$XFS_IO_PROG -c "fsync" $SCRATCH_MNT/foo
# Simulate a crash/power loss. This makes sure the next mount
# will see an fsync log and will replay that log.
# Check that the number of hard links is correct, we are able to remove all
# the hard links and read the file's data. This is just to verify we don't
# get stale file handle errors (due to dangling directory index entries that
# point to inodes that no longer exist).
echo "Link count: $(stat --format=%h $SCRATCH_MNT/foo)"
[ -f $SCRATCH_MNT/foo ] || echo "Link foo is missing"
for ((i = 1; i <= 3003; i++)); do
name=foo_link_`printf "%04d" $i`
if [ $i -eq 2 ]; then
[ -f $SCRATCH_MNT/$name ] && echo "Link $name found"
else
[ -f $SCRATCH_MNT/$name ] || echo "Link $name is missing"
fi
done
rm -f $SCRATCH_MNT/foo_link_*
cat $SCRATCH_MNT/foo
rm -f $SCRATCH_MNT/foo
status=0
exit
The fix is simply to correct the overflow condition when overwriting a
reference item because it was wrong, trying to increase the item in the
fs/subvol tree by an impossible amount. Also ensure that we don't insert
one normal ref and one ext ref for the same dentry - this happened because
processing a dir index entry from the parent in the log happened when
the normal ref item was full, which made the logic insert an extref and
later when the normal ref had enough room, it would be inserted again
when processing the ref item from the child inode in the log.
This issue has been present since the introduction of the extrefs feature
(2012).
A test case for xfstests follows soon. This test only passes if the previous
patch titled "Btrfs: fix fsync when extend references are added to an inode"
is applied too.
Filipe Manana [Tue, 13 Jan 2015 16:40:04 +0000 (16:40 +0000)]
Btrfs: fix fsync when extend references are added to an inode
If we added an extended reference to an inode and fsync'ed it, the log
replay code would make our inode have an incorrect link count, which
was lower then the expected/correct count.
This resulted in stale directory index entries after deleting some of
the hard links, and any access to the dangling directory entries resulted
in -ESTALE errors because the entries pointed to inode items that don't
exist anymore.
This is easy to reproduce with the test case I made for xfstests, and
the bulk of that test is:
# Create a test file with 3001 hard links. This number is large enough to
# make btrfs start using extrefs at some point even if the fs has the maximum
# possible leaf/node size (64Kb).
echo "hello world" > $SCRATCH_MNT/foo
for i in `seq 1 3000`; do
ln $SCRATCH_MNT/foo $SCRATCH_MNT/foo_link_`printf "%04d" $i`
done
# Make sure all metadata and data are durably persisted.
sync
# Add one more link to the inode that ends up being a btrfs extref and fsync
# the inode.
ln $SCRATCH_MNT/foo $SCRATCH_MNT/foo_link_3001
$XFS_IO_PROG -c "fsync" $SCRATCH_MNT/foo
# Simulate a crash/power loss. This makes sure the next mount
# will see an fsync log and will replay that log.
# Now after the fsync log replay btrfs left our inode with a wrong link count N,
# which was smaller than the correct link count M (N < M).
# So after removing N hard links, the remaining M - N directory entries were
# still visible to user space but it was impossible to do anything with them
# because they pointed to an inode that didn't exist anymore. This resulted in
# stale file handle errors (-ESTALE) when accessing those dentries for example.
#
# So remove all hard links except the first one and then attempt to read the
# file, to verify we don't get an -ESTALE error when accessing the inodel
#
# The btrfs fsck tool also detected the incorrect inode link count and it
# reported an error message like the following:
#
# root 5 inode 257 errors 2001, no inode item, link count wrong
# unresolved ref dir 256 index 2978 namelen 13 name foo_link_2976 filetype 1 errors 4, no inode ref
#
# The fstests framework automatically calls fsck after a test is run, so we
# don't need to call fsck explicitly here.
So make sure an fsync always flushes the delayed inode item, so that the
fsync log contains it (needed in order to trigger the link count fixup
code) and fix the extref counting function, which always return -ENOENT
to its caller (and made it assume there were always 0 extrefs).
This issue has been present since the introduction of the extrefs feature
(2012).
Filipe Manana [Sat, 10 Jan 2015 10:56:48 +0000 (10:56 +0000)]
Btrfs: fix directory inconsistency after fsync log replay
If we have an inode (file) with a link count greater than 1, remove
one of its hard links, fsync the inode, power fail/crash and then
replay the fsync log on the next mount, we end up getting the parent
directory's metadata inconsistent - its i_size still reflects the
deleted hard link and has dangling index entries (with no matching
inode reference entries). This prevents the directory from ever being
deletable, as its i_size can never decrease to BTRFS_EMPTY_DIR_SIZE
even if all of its children inodes are deleted, and the dangling index
entries can never be removed (as they point to an inode that does not
exist anymore).
This is easy to reproduce with the following excerpt from the test case
for xfstests that I just made:
_scratch_mkfs >> $seqres.full 2>&1
_init_flakey
_mount_flakey
# Create a test file with 2 hard links in the same directory.
mkdir -p $SCRATCH_MNT/a/b
echo "hello world" > $SCRATCH_MNT/a/b/foo
ln $SCRATCH_MNT/a/b/foo $SCRATCH_MNT/a/b/bar
# Make sure all metadata and data are durably persisted.
sync
# Now remove one of the hard links and fsync the inode.
rm -f $SCRATCH_MNT/a/b/bar
$XFS_IO_PROG -c "fsync" $SCRATCH_MNT/a/b/foo
# Simulate a crash/power loss. This makes sure the next mount
# will see an fsync log and will replay that log.
# Remove the last hard link of the file and attempt to remove its parent
# directory - this failed in btrfs because the fsync log and replay code
# didn't decrement the parent directory's i_size and left dangling directory
# index entries - this made the btrfs rmdir implementation always fail with
# the error -ENOTEMPTY.
#
# The dangling directory index entries were visible to user space, but it was
# impossible to do anything on them (unlink, open, read, write, stat, etc)
# because the inode they pointed to did not exist anymore.
#
# The parent directory's metadata inconsistency (stale index entries) was
# also detected by btrfs' fsck tool, which is run automatically by the fstests
# framework when the test finishes. The error message reported by fsck was:
#
# root 5 inode 259 errors 2001, no inode item, link count wrong
# unresolved ref dir 258 index 3 namelen 3 name bar filetype 1 errors 4, no inode ref
#
rm -f $SCRATCH_MNT/a/b/*
rmdir $SCRATCH_MNT/a/b
rmdir $SCRATCH_MNT/a
To fix this just make sure that after an unlink, if the inode is fsync'ed,
he parent inode is fully logged in the fsync log.
Filipe Manana [Tue, 6 Jan 2015 20:18:45 +0000 (20:18 +0000)]
Btrfs: lookup for block group only if needed when freeing a tree block
Very often our extent buffer's header generation doesn't match the current
transaction's id or it is also referenced by other trees (snapshots), so
we don't need the corresponding block group cache object. Therefore only
search for it if we are going to use it, so we avoid an unnecessary search
in the block groups rbtree (and acquiring and releasing its spinlock).
Freeing a tree block is performed when COWing or deleting a node/leaf,
which implies we are holding the node/leaf's parent node lock, therefore
reducing the amount of time spent when freeing a tree block helps reducing
the amount of time we are holding the parent node's lock.
For example, for a run of xfstests/generic/083, the block group cache
object was needed only 682 times for a total of 226691 calls to free
a tree block.
David Sterba [Wed, 14 Jan 2015 18:52:13 +0000 (19:52 +0100)]
btrfs: switch extent_state state to unsigned
Currently there's a 4B hole in the structure between refs and state and there
are only 16 bits used so we can make it unsigned. This will get a better
packing and may save some stack space for local variables.
The size of extent_state gets reduced by 8B and there are usually a lot
of slab objects.
Filipe Manana [Tue, 20 Jan 2015 12:40:53 +0000 (12:40 +0000)]
Btrfs: fix setup_leaf_for_split() to avoid leaf corruption
We were incorrectly detecting when the target key didn't exist anymore
after releasing the path and re-searching the tree. This could make
us split or duplicate (btrfs_split_item() and btrfs_duplicate_item()
are its only callers at the moment) an item when we should not.
For the case of duplicating an item, we currently only duplicate
checksum items (csum tree) and file extent items (fs/subvol trees).
For the checksum items we end up overriding the item completely,
but for file extent items we update only some of their fields in
the copy (done in __btrfs_drop_extents), which means we can end up
having a logical corruption for some values.
Also for the case where we duplicate a file extent item it will make
us produce a leaf with a wrong key order, as btrfs_duplicate_item()
advances us to the next slot and then its caller sets a smaller key
on the new item at that slot (like in __btrfs_drop_extents() e.g.).
Alternatively if the tree search in setup_leaf_for_split() leaves
with path->slots[0] == btrfs_header_nritems(path->nodes[0]), we end
up accessing beyond the leaf's end (when we check if the item's size
has changed) and make our caller insert an item at the invalid slot
btrfs_header_nritems(path->nodes[0]) + 1, causing an invalid memory
access if the leaf is full or nearly full.
This issue has been present since the introduction of this function
in 2009:
Josef Bacik [Mon, 17 Nov 2014 20:45:48 +0000 (15:45 -0500)]
Btrfs: track dirty block groups on their own list
Currently any time we try to update the block groups on disk we will walk _all_
block groups and check for the ->dirty flag to see if it is set. This function
can get called several times during a commit. So if you have several terabytes
of data you will be a very sad panda as we will loop through _all_ of the block
groups several times, which makes the commit take a while which slows down the
rest of the file system operations.
This patch introduces a dirty list for the block groups that we get added to
when we dirty the block group for the first time. Then we simply update any
block groups that have been dirtied since the last time we called
btrfs_write_dirty_block_groups. This allows us to clean up how we write the
free space cache out so it is much cleaner. Thanks,
Josef Bacik [Tue, 16 Dec 2014 16:54:43 +0000 (08:54 -0800)]
Btrfs: change how we track dirty roots
I've been overloading root->dirty_list to keep track of dirty roots and which
roots need to have their commit roots switched at transaction commit time. This
could cause us to lose an update to the root which could corrupt the file
system. To fix this use a state bit to know if the root is dirty, and if it
isn't set we go ahead and move the root to the dirty list. This way if we
re-dirty the root after adding it to the switch_commit list we make sure to
update it. This also makes it so that the extent root is always the last root
on the dirty list to try and keep the amount of churn down at this point in the
commit. Thanks,
fs: export inode_to_bdi and use it in favor of mapping->backing_dev_info
Now that we got rid of the bdi abuse on character devices we can always use
sb->s_bdi to get at the backing_dev_info for a file, except for the block
device special case. Export inode_to_bdi and replace uses of
mapping->backing_dev_info with it to prepare for the removal of
mapping->backing_dev_info.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Tejun Heo <tj@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit de1414a654e66b81b5348dbc5259ecf2fb61655e)
Pranith Kumar [Fri, 5 Dec 2014 16:24:45 +0000 (11:24 -0500)]
rcu: Make SRCU optional by using CONFIG_SRCU
SRCU is not necessary to be compiled by default in all cases. For tinification
efforts not compiling SRCU unless necessary is desirable.
The current patch tries to make compiling SRCU optional by introducing a new
Kconfig option CONFIG_SRCU which is selected when any of the components making
use of SRCU are selected.
If we do not select CONFIG_SRCU, srcu.o will not be compiled at all.
text data bss dec hex filename
2007 0 0 2007 7d7 kernel/rcu/srcu.o
Size of arch/powerpc/boot/zImage changes from
text data bss dec hex filename
831552 64180 23944 919676 e087c arch/powerpc/boot/zImage : before
829504 64180 23952 917636 e0084 arch/powerpc/boot/zImage : after
so the savings are about ~2000 bytes.
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com> CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com> CC: Josh Triplett <josh@joshtriplett.org> CC: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: resolve conflict due to removal of arch/ia64/kvm/Kconfig. ]
(cherry picked from commit 83fe27ea531161a655f02dc7732d14cfaa27fd5d)
We were missing a return statement in the PSL interrupt handler in the
case of an AFU error, which would trigger an "Unhandled CXL PSL IRQ"
warning. We do actually handle these type of errors (by notifying
userspace), so add the missing return IRQ_HANDLED so we don't throw
unecessary warnings.
Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We are missing a call to of_node_get(). pnv_pci_to_phb_node() should
call of_node_get() otherwise np's reference count isn't incremented and
it might go away. Rename pnv_pci_to_phb_node() to pnv_pci_get_phb_node()
so it's clear it calls of_node_get().
Signed-off-by: Ryan Grimm <grimm@linux.vnet.ibm.com> Acked-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In case CLK_GATE_HIWORD_MASK flag is passed to clk_register_gate(), the bit #
should be no higher than 15, however the corresponding check is obviously off-
by-one.
Fixes: 045779942c04 ("clk: gate: add CLK_GATE_HIWORD_MASK") Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: Michael Turquette <mturquette@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Because task_group() uses a cache of autogroup_task_group(), whose
output depends on sched_class, switching classes can generate
problems.
In particular, when started as fair, the cache points to the
autogroup, so when switching to RT the tg_rt_schedulable() test fails
for every cpu.rt_{runtime,period}_us change because now the autogroup
has tasks and no runtime.
Furthermore, going back to the previous semantics of varying
task_group() with sched_class has the down-side that the sched_debug
output varies as well, even though the task really is in the
autogroup.
Therefore add an autogroup exception to tg_has_rt_tasks() -- such that
both (all) task_group() usages in sched/core now have one. And remove
all the remnants of the variable task_group() output.
Reported-by: Zefan Li <lizefan@huawei.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Stefan Bader <stefan.bader@canonical.com> Fixes: 8323f26ce342 ("sched: Fix race in task_group()") Link: http://lkml.kernel.org/r/20150209112237.GR5029@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Vinayak Menon has reported that an excessive number of tasks was throttled
in the direct reclaim inside too_many_isolated() because NR_ISOLATED_FILE
was relatively high compared to NR_INACTIVE_FILE. However it turned out
that the real number of NR_ISOLATED_FILE was 0 and the per-cpu
vm_stat_diff wasn't transferred into the global counter.
vmstat_work which is responsible for the sync is defined as deferrable
delayed work which means that the defined timeout doesn't wake up an idle
CPU. A CPU might stay in an idle state for a long time and general effort
is to keep such a CPU in this state as long as possible which might lead
to all sorts of troubles for vmstat consumers as can be seen with the
excessive direct reclaim throttling.
This patch basically reverts 39bf6270f524 ("VM statistics: Make timer
deferrable") but it shouldn't cause any problems for idle CPUs because
only CPUs with an active per-cpu drift are woken up since 7cc36bbddde5
("vmstat: on-demand vmstat workers v8") and CPUs which are idle for a
longer time shouldn't have per-cpu drift.
Fixes: 39bf6270f524 (VM statistics: Make timer deferrable) Signed-off-by: Michal Hocko <mhocko@suse.cz> Reported-by: Vinayak Menon <vinmenon@codeaurora.org> Acked-by: Christoph Lameter <cl@linux.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Vladimir Davydov <vdavydov@parallels.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Minchan Kim <minchan@kernel.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The pin id for a given tuple listed in a fsl,pins property is calculated
by dividing the first entry (which is also a register offset) by 4.
As the first available register is at offset 0x8 and configures the pad
MX25_PAD_A10 the right id for this pin is 2. All other pins are off by
one, too.
This patch drops the definition MX25_PAD_RESERVE1 (together with its
only use) and decrements all following values by 1.
Sometimes while CPU have some load and ath5k doing the wireless
interface reset the whole WiSoC completely freezes. Set of tests shows
that using atomic delay function while we wait interface reset helps to
avoid such freezes.
The easiest way to reproduce this issue: create a station interface,
start continous scan with wpa_supplicant and load CPU by something. Or
just create multiple station interfaces and put them all in continous
scan.
This patch partially reverts the commit 1846ac3dbec0 ("ath5k: Use
usleep_range where possible"), which replaces initial udelay()
by usleep_range().
I do not know actual source of this issue, but all looks like that HW
freeze is caused by transaction on internal SoC bus, while wireless
block is in reset state.
Also I should note that I do not know how many chips are affected, but I
did not see this issue with chips, other than AR5312.
CC: Jiri Slaby <jirislaby@gmail.com> CC: Nick Kossifidis <mickflemm@gmail.com> CC: Luis R. Rodriguez <mcgrof@do-not-panic.com> Fixes: 1846ac3dbec0 ("ath5k: Use usleep_range where possible") Reported-by: Christophe Prevotaux <c.prevotaux@rural-networks.com> Tested-by: Christophe Prevotaux <c.prevotaux@rural-networks.com> Tested-by: Eric Bree <ebree@nltinc.com> Signed-off-by: Sergey Ryazanov <ryazanov.s.a@gmail.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In the function of_pci_get_host_bridge_resources() if the parsing of ranges
fails, previously allocated resources inclusive of bus_range are not freed
and are not expected to be freed by the function caller on error return.
This patch fixes the issues by adding code that properly frees resources
and bus_range before exiting the function with an error return value.
Fixes: cbe4097f8ae6 ("of/pci: Add support for parsing PCI host bridge resources from DT") Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Liviu Dudau <liviu.dudau@arm.com> CC: Arnd Bergmann <arnd@arndb.de> CC: Rob Herring <robh+dt@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The commit 177ef2a6315e ("sched/deadline: Fix a precision problem in
the microseconds range") forgot to change the UP version of
hrtick_start(), do so now.
Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Fixes: 177ef2a6315e ("sched/deadline: Fix a precision problem in the microseconds range")
[ Fixed the changelog. ] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@arm.com> Cc: Kirill Tkhai <ktkhai@parallels.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1416962647-76792-7-git-send-email-wanpeng.li@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The current organization of Documentation/stable_kernel_rules.txt
doesn't clearly differentiate the mutually exclusive options for
submission to the -stable review process. As I understand it, patches
are not actually required to be mailed directly to
stable@vger.kernel.org, but the instructions do not make this clear.
Also, there are some established processes that are not listed --
specifically, what I call Option 2 below.
This patch updates and reorganizes a bit, to make things clearer.
Signed-off-by: Brian Norris <computersforpeace@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
DMA_BIT_MASK of 64 is not valid dma address mask for OMAPs, it should be
set to 32.
The 64 was introduced by commit (in 2009): a152ff24b978 ASoC: OMAP: Make DMA 64 aligned
But the dma_mask and coherent_dma_mask can not be used to specify alignment.
Fixes: a152ff24b978 (ASoC: OMAP: Make DMA 64 aligned) Reported-by: Grygorii Strashko <Grygorii.Strashko@linaro.org> Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If we're traversing a directory which contains a submounted filesystem,
or one that has a referral, the NFS server that is processing the READDIR
request will often return information for the underlying (mounted-on)
directory. It may, or may not, also return filehandle information.
If this happens, and the lookup in nfs_prime_dcache() returns the
dentry for the submounted directory, the filehandle comparison will
fail, and we call d_invalidate(). Post-commit 8ed936b5671bf
("vfs: Lazily remove mounts on unlinked files and directories."), this
means the entire subtree is unmounted.
The following minimal patch addresses this problem by punting on
the invalidation if there is a submount.
Kudos to Neil Brown <neilb@suse.de> for having tracked down this
issue (see link).
Commit 7d78cbefaa (serial: 8250_dw: add ability to handle
the peripheral clock) introduces handling for a second clk
to 8250_dw.c which is the driver also for LPSS UART. The
second clk forces us to provide identifier (con_id) for the
clkdev we create.
This fixes an issue where 8250_dw.c is getting the same
handler for both clocks.
Fixes: 7d78cbefaa (serial: 8250_dw: add ability to handle the peripheral clock) Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
i915.ko depends upon the acpi/video.ko module and so refuses to load if
ACPI is disabled at runtime if for example the BIOS is broken beyond
repair. acpi/video provides an optional service for i915.ko and so we
should just allow the modules to load, but do no nothing in order to let
the machines boot correctly.
Reported-by: Bill Augur <bill-auger@programmer.net> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Jani Nikula <jani.nikula@intel.com> Acked-by: Aaron Lu <aaron.lu@intel.com>
[ rjw: Fixed up the new comment in acpi_video_init() ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
eCryptfs can't be aware of what to expect when after passing an
arbitrary ioctl command through to the lower filesystem. The ioctl
command may trigger an action in the lower filesystem that is
incompatible with eCryptfs.
One specific example is when one attempts to use the Btrfs clone
ioctl command when the source file is in the Btrfs filesystem that
eCryptfs is mounted on top of and the destination fd is from a new file
created in the eCryptfs mount. The ioctl syscall incorrectly returns
success because the command is passed down to Btrfs which thinks that it
was able to do the clone operation. However, the result is an empty
eCryptfs file.
This patch allows the trim, {g,s}etflags, and {g,s}etversion ioctl
commands through and then copies up the inode metadata from the lower
inode to the eCryptfs inode to catch any changes made to the lower
inode's metadata. Those five ioctl commands are mostly common across all
filesystems but the whitelist may need to be further pruned in the
future.
While adding support loading kernel and initrd above 4G to grub2 in legacy
mode, I was referring to efi_high_alloc().
That will allocate buffer for kernel and then initrd, and initrd will
use kernel buffer start as limit.
During testing found two buffers will be overlapped when initrd size is
very big like 400M.
It turns out efi_high_alloc() boundary checking is not right.
end - size will be the new start, and should not compare new
start with max, we need to make sure end is smaller than max.
[ Basically, with the current efi_high_alloc() code it's possible to
allocate memory above 'max', because efi_high_alloc() doesn't check
that the tail of the allocation is below 'max'.
If you have an EFI memory map with a single entry that looks like so,
[0xc0000000-0xc0004000]
And want to allocate 0x3000 bytes below 0xc0003000 the current code
will allocate [0xc0001000-0xc0004000], not [0xc0000000-0xc0003000]
like you would expect. - Matt ]
Signed-off-by: Yinghai Lu <yinghai@kernel.org> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Matt Fleming <matt.fleming@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2d4a532d385f ("nfsd: ensure that clp->cl_revoked list is
protected by clp->cl_lock") removed the use of the reaplist to
clean out clp->cl_revoked. It failed to change list_entry() to
walk clp->cl_revoked.next instead of reaplist.next
Fixes: 2d4a532d385f ("nfsd: ensure that clp->cl_revoked list is protected by clp->cl_lock") Reported-by: Eric Meddaugh <etmsys@rit.edu> Tested-by: Eric Meddaugh <etmsys@rit.edu> Signed-off-by: Andrew Elble <aweits@rit.edu> Reviewed-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When we takeover from the BIOS and install our interrupt handler, the
BIOS may have left us a few surprises in the form of spontaneous
interrupts. (This is especially likely on hardware like 965gm where
display fifo underruns are continuous and the GMCH cannot filter that
interrupt souce.) As we enable our IRQ early so that we can use it
during hardware probing, our interrupt handler must be prepared to
handle a few sources prior to being fully configured. As such, we need
to add a simple is-ready check prior to dereferencing our KMS state for
reporting underruns.
Reported-by: Rob Clark <rclark@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1193972 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
[Jani: dropped the extra !] Signed-off-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Atm, it's possible that the interrupt handler is called when the device
is in D3 or some other low-power state. It can be due to another device
that is still in D0 state and shares the interrupt line with i915, or on
some platforms there could be spurious interrupts even without sharing
the interrupt line. The latter case was reported by Klaus Ethgen using a
Lenovo x61p machine (gen 4). He noticed this issue via a system
suspend/resume hang and bisected it to the following commit:
drm/i915: use runtime irq suspend/resume in freeze/thaw
This is a problem, since in low-power states IIR will always read
0xffffffff resulting in an endless IRQ servicing loop.
Fix this by handling interrupts only when the driver explicitly enables
them and so it's guaranteed that the interrupt registers return a valid
value.
Note that this issue existed even before the above commit, since during
runtime suspend/resume we never unregistered the handler.
v2:
- clarify the purpose of smp_mb() vs. synchronize_irq() in the
code comment (Chris)
v3:
- no need for an explicit smp_mb(), we can assume that synchronize_irq()
and the mmio read/writes in the install hooks provide for this (Daniel)
- remove code comment as the remaining synchronize_irq() is self
explanatory (Daniel)
v4:
- drm_irq_uninstall() implies synchronize_irq(), so no need to call it
explicitly (Daniel)
Reference: https://lkml.org/lkml/2015/2/11/205 Reported-and-bisected-by: Klaus Ethgen <Klaus@Ethgen.ch> Signed-off-by: Imre Deak <imre.deak@intel.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When we walk the list of vma, or even for protecting against concurrent
framebuffer creation, we must hold the struct_mutex or else a second
thread can corrupt the list as we walk it.
References: https://bugs.freedesktop.org/show_bug.cgi?id=89085 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit b7bc596ebbe0 ("drm/radeon: disable native
backlight control on pre-r6xx asics (v2)") accidently
broke backlight control on old mac laptops that use the
on-GPU backlight controller.
It appears that the Cintiq Companion Hybrid does not send an ABS_MISC event to
userspace when any of its ExpressKeys are pressed. This is not strictly
necessary now that the pad exists on its own device, but should be fixed for
consistency's sake.
Traditionally both the stylus and pad shared the same device node, and
xf86-input-wacom would use ABS_MISC for disambiguation. Not sending this causes
the Hybrid to behave incorrectly with xf86-input-wacom beginning with its
8f44f3 commit.
Signed-off-by: Jason Gerecke <killertofu@gmail.com> Reviewed-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The ignore check that got added in 6ce901eb61 ("HID: input: fix confusion
on conflicting mappings") needs to properly check for VARIABLE reports
as well (ARRAY reports should be ignored), otherwise legitimate keyboards
might break.
Fixes: 6ce901eb61 ("HID: input: fix confusion on conflicting mappings") Reported-by: Fredrik Hallenberg <megahallon@gmail.com> Reported-by: David Herrmann <dh.herrmann@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(Note that the number of keys is the same, but key '5' is moved down and
the shape of key '4' is changed. Keys '1' to '3' are exactly the same.)
The keys 1-4 report the same scan-code in HID in both layouts, even though
the keysym they produce is usually different depending on the XKB-keymap
used by user-space.
However, key '5' (US 'backslash'/'pipe') reports 0x31 for the upper layout
and 0x32 for the lower layout, as defined by the HID spec. This is highly
confusing as the linux-input API uses a single keycode for both.
So far, this was never a problem as there never has been a keyboard with
both of those keys present at the same time. It would have to look
something like this:
HID can represent such a keyboard, but the linux-input API cannot.
Furthermore, any user-space mapping would be confused by this and,
luckily, no-one ever produced such hardware.
Now, the HID input layer fixed this mess by mapping both 0x31 and 0x32 to
the same keycode (KEY_BACKSLASH==0x2b). As only one of both physical keys
is present on a hardware, this works just fine.
Lets introduce hardware-vendors into this:
------------------------------------------
Unfortunately, it seems way to expensive to produce a different device for
American and European layouts. Therefore, hardware-vendors put both keys,
(0x31 and 0x32) on the same keyboard, but only one of them is hooked up
to the physical button, the other one is 'dead'.
This means, they can use the same hardware, with a different button-layout
and automatically produce the correct HID events for American *and*
European layouts. This is unproblematic for normal keyboards, as the
'dead' key will never report any KEY-DOWN events. But RollOver keyboards
send the whole matrix on each key-event, allowing n-key roll-over mode.
This means, we get a 0x31 and 0x32 event on each key-press. One of them
will always be 0, the other reports the real state. As we map both to the
same keycode, we will get spurious key-events, even though the real
key-state never changed.
The easiest way would be to blacklist 'dead' keys and never handle those.
We could simply read the 'country' tag of USB devices and blacklist either
key according to the layout. But... hardware vendors... want the same
device for all countries and thus many of them set 'country' to 0 for all
devices. Meh..
So we have to deal with this properly. As we cannot know which of the keys
is 'dead', we either need a heuristic and track those keys, or we simply
make use of our value-tracking for HID fields. We simply ignore HID events
for absolute data if the data didn't change. As HID tracks events on the
HID level, we haven't done the keycode translation, yet. Therefore, the
'dead' key is tracked independently of the real key, therefore, any events
on it will be ignored.
This patch simply discards any HID events for absolute data if it didn't
change compared to the last report. We need to ignore relative and
buffered-byte reports for obvious reasons. But those cannot be affected by
this bug, so we're fine.
Preferably, we'd do this filtering on the HID-core level. But this might
break a lot of custom drivers, if they do not follow the HID specs.
Therefore, we do this late in hid-input just before we inject it into the
input layer (which does the exact same filtering, but on the keycode
level).
If this turns out to break some devices, we might have to limit filtering
to EV_KEY events. But lets try to do the Right Thing first, and properly
filter any absolute data that didn't change.
This patch is tagged for 'stable' as it fixes a lot of n-key RollOver
hardware. We might wanna wait with backporting for a while, before we know
it doesn't break anything else, though.
Reported-by: Adam Goode <adam@spicenitz.org> Reported-by: Fredrik Hallenberg <megahallon@gmail.com> Tested-by: Fredrik Hallenberg <megahallon@gmail.com> Signed-off-by: David Herrmann <dh.herrmann@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The hardware range code values and list of valid ranges for the AI
subdevice is incorrect for several supported boards. The hardware range
code values for all boards except PCI-DAS4020/12 is determined by
calling `ai_range_bits_6xxx()` based on the maximum voltage of the range
and whether it is bipolar or unipolar, however it only returns the
correct hardware range code for the PCI-DAS60xx boards. For
PCI-DAS6402/16 (and /12) it returns the wrong code for the unipolar
ranges. For PCI-DAS64/Mx/16 it returns the wrong code for all the
ranges and the comedi range table is incorrect.
Change `ai_range_bits_6xxx()` to use a look-up table pointed to by new
member `ai_range_codes` of `struct pcidas64_board` to map the comedi
range table indices to the hardware range codes. Use a new comedi range
table for the PCI-DAS64/Mx/16 boards (and the commented out variants).
Signed-off-by: Ian Abbott <abbotti@mev.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The dmi-sysfs should create "End of Table" entry, that is type 127. But
after adding initial SMBIOS v3 support fc43026278b2 ("dmi: add support
for SMBIOS 3.0 64-bit entry point") the 127-0 entry is not handled any
more, as result it's not created in dmi sysfs for instance. This is
important because the size of whole DMI table must correspond to sum of
all DMI entry sizes.
So move the end-of-table check after it's handled by dmi_table.
Reviewed-by: Ard Biesheuvel <ard@linaro.org> Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Signed-off-by: Matt Fleming <matt.fleming@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
According to SMBIOSv3 specification the length of DMI table can be
up to 32bits wide. So use appropriate type to avoid overflow.
It's obvious that dmi_num theoretically can be more than u16 also,
so it's can be changed to u32 or at least it's better to use int
instead of u16, but on that moment I cannot imagine dmi structure
count more than 65535 and it can require changing type of vars that
work with it. So I didn't correct it.
Acked-by: Ard Biesheuvel <ard@linaro.org> Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Signed-off-by: Matt Fleming <matt.fleming@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When the snapshot target is unloaded, snapshot_dtr() waits until
pending_exceptions_count drops to zero. Then, it destroys the snapshot.
Therefore, the function that decrements pending_exceptions_count
should not touch the snapshot structure after the decrement.
pending_complete() calls free_pending_exception(), which decrements
pending_exceptions_count, and then it performs up_write(&s->lock) and it
calls retry_origin_bios() which dereferences s->origin. These two
memory accesses to the fields of the snapshot may touch the dm_snapshot
struture after it is freed.
This patch moves the call to free_pending_exception() to the end of
pending_complete(), so that the snapshot will not be destroyed while
pending_complete() is in progress.
The function dm_get_md finds a device mapper device with a given dev_t,
increases the reference count and returns the pointer.
dm_get_md calls dm_find_md, dm_find_md takes _minor_lock, finds the
device, tests that the device doesn't have DMF_DELETING or DMF_FREEING
flag, drops _minor_lock and returns pointer to the device. dm_get_md then
calls dm_get. dm_get calls BUG if the device has the DMF_FREEING flag,
otherwise it increments the reference count.
There is a possible race condition - after dm_find_md exits and before
dm_get is called, there are no locks held, so the device may disappear or
DMF_FREEING flag may be set, which results in BUG.
To fix this bug, we need to call dm_get while we hold _minor_lock. This
patch renames dm_find_md to dm_get_md and changes it so that it calls
dm_get while holding the lock.
I created a dm-raid1 device backed by a device that supports DISCARD
and another device that does NOT support DISCARD with the following
dm configuration:
Notice that the mirror device /dev/mapper/moo advertises DISCARD
support even though one of the mirror halves doesn't.
If I issue a DISCARD request (via fstrim, mount -o discard, or ioctl
BLKDISCARD) through the mirror, kmirrord gets stuck in an infinite
loop in do_region() when it tries to issue a DISCARD request to sdb.
The problem is that when we call do_region() against sdb, num_sectors
is set to zero because q->limits.max_discard_sectors is zero.
Therefore, "remaining" never decreases and the loop never terminates.
To fix this: before entering the loop, check for the combination of
REQ_DISCARD and no discard and return -EOPNOTSUPP to avoid hanging up
the mirror device.
This bug was found by the unfortunate coincidence of pvmove and a
discard operation in the RHEL 6.5 kernel; upstream is also affected.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Acked-by: "Martin K. Petersen" <martin.petersen@oracle.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It may be possible that a device claims discard support but it rejects
discards with -EOPNOTSUPP. It happens when using loopback on ext2/ext3
filesystem driven by the ext4 driver. It may also happen if the
underlying devices are moved from one disk on another.
If discard error happens, we reject the bio with -EOPNOTSUPP, but we do
not degrade the array.
This patch fixes failed test shell/lvconvert-repair-transient.sh in the
lvm2 testsuite if the testsuite is extracted on an ext2 or ext3
filesystem and it is being driven by the ext4 driver.
`do_cmd_ioctl()` in "comedi_fops.c" handles the `COMEDI_CMD` ioctl.
This returns `-EAGAIN` if it has copied a modified `struct comedi_cmd`
back to user-space. (This occurs when the low-level Comedi driver's
`do_cmdtest()` handler returns non-zero to indicate a problem with the
contents of the `struct comedi_cmd`, or when the `struct comedi_cmd` has
the `CMDF_BOGUS` flag set.)
`compat_cmd()` in "comedi_compat32.c" handles the 32-bit compatible
version of the `COMEDI_CMD` ioctl. Currently, it never copies a 32-bit
compatible version of `struct comedi_cmd` back to user-space, which is
at odds with the way the regular `COMEDI_CMD` ioctl is handled. To fix
it, change `compat_cmd()` to copy a 32-bit compatible version of the
`struct comedi_cmd` back to user-space when the main ioctl handler
returns `-EAGAIN`.
Signed-off-by: Ian Abbott <abbotti@mev.co.uk> Reviewed-by: H Hartley Sweeten <hsweeten@visionengravers.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
For all pll-s on sun6i n == 0 means use a multiplier of 1, rather then 0 as
it means on sun4i / sun5i / sun7i. n_start = 1 is already correctly set
for sun6i pll6, but was missing for pll1, this commit fixes this.
Cc: Chen-Yu Tsai <wens@csie.org> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Some of the clks can be registered & unregistered before the clk related debugfs
entries are initialized at late_initcall. In the unregister path checking for only
dentry before clk_debug_init() would lead dangling pointers in the debug clk list,
because the list is already populated in register path and the clk pointer freed in
unregister path.
The side effect of not removing it from the list is either a null pointer
dereference or if lucky to boot the system, the number of clk entries in
debugfs disappear.
We could add more checks like if (inited && !clk->dentry) but just removing
the check for dentry made more sense as debugfs_remove_recursive() seems to be
safe with null pointers. This will ensure that the unregistering clk would be
removed from the debug list in all the code paths.
The CPU_2X clock does not have a classical in-kernel user, but is,
amongst other things, required for OCM and debug access. Make sure this
clock is not mistakenly disabled during boot up by enabling it in the
platform's clock driver.
wd719x_template is missing the .module field, causing module refcount
not to work, allowing to rmmod the driver while in use (mounted filesystem),
causing an oops.
Set .module to THIS_MODULE to fix the problem.
Signed-off-by: Ondrej Zary <linux@rainbow-software.org> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Each inode of nilfs2 stores a root node of a b-tree, and it turned out to
have a memory overrun issue:
Each b-tree node of nilfs2 stores a set of key-value pairs and the number
of them (in "bn_nchildren" member of nilfs_btree_node struct), as well as
a few other "bn_*" members.
Since the value of "bn_nchildren" is used for operations on the key-values
within the b-tree node, it can cause memory access overrun if a large
number is incorrectly set to "bn_nchildren".
For instance, nilfs_btree_node_lookup() function determines the range of
binary search with it, and too large "bn_nchildren" leads
nilfs_btree_node_get_key() in that function to overrun.
As for intermediate b-tree nodes, this is prevented by a sanity check
performed when each node is read from a drive, however, no sanity check
has been done for root nodes stored in inodes.
This patch fixes the issue by adding missing sanity check against b-tree
root nodes so that it's called when on-memory inodes are read from ifile,
inode metadata file.
When the last on-demand paging MR is released the notifier count is
left non-zero so that concurrent page faults will have to abort. If a
new MR is then registered, the counter is reset. However, the decision
is made to put the new MR in the list waiting for the notifier count
to reach zero, before the counter is reset. An invalidation or another
MR registration can release the MR to handle page faults, but without
such an event the MR can wait forever.
The patch fixes this issue by adding a check whether the MR is the
first on-demand paging MR when deciding whether it is ready to handle
page faults. If it is the first MR, we know that there are no mmu
notifiers running in parallel to the registration.
Fixes: 882214e2b128 ("IB/core: Implement support for MMU notifiers regarding on demand paging regions") Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Shachar Raindel <raindel@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The deadlock occurs in __uverbs_modify_qp: we take a lock (idr_read_qp)
and in case of failure in ib_resolve_eth_l2_attrs we don't release
it (put_qp_read). Fix that.
Fixes: ed4c54e5b4ba ("IB/core: Resolve Ethernet L2 addresses when modifying QP") Signed-off-by: Moshe Lazer <moshel@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The MLX4_PROT_IB_IPV4 protocol should only be used with RoCEv2 and such.
Removing this wrong usage allows to run multicast applications over RoCE.
Fixes: d487ee77740c ("IB/mlx4: Use IBoE (RoCE) IP based GIDs in the port GID table") Reported-by: Carol Soto <clsoto@linux.vnet.ibm.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In case handle_eth_ud_smac_index fails, we need to free the allocated resources.
Fixes: 2f5bb473681b ("mlx4: Add ref counting to port MAC table for RoCE") Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When teardown process starts during live IO, we need to keep the
memory regions pool (frmr/fmr) until all in-flight tasks are properly
released, since each task may return a memory region to the pool. In
order to do this, we pass a destroy flag to iser_free_ib_conn_res to
indicate we can destroy the device and the memory regions
pool. iser_conn_release will pass it as true and also DEVICE_REMOVAL
event (we need to let the device to properly remove).
Also, Since we conditionally call iser_free_rx_descriptors,
remove the extra check on iser_conn->rx_descs.
Fixes: 5426b1711fd0 ("IB/iser: Collapse cleanup and disconnect handlers") Reported-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This changeset removes all the code that allows the driver to write to
the EEPROM and update the recorded error counters and power on hours.
These two stats are unused and writing them exposes a timing risk
which could leave the EEPROM in a bad state preventing further normal
operation of the HCA.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
With task_blocks_on_rt_mutex() returning early -EDEADLK we never
add the waiter to the waitqueue. Later, we try to remove it via
remove_waiter() and go boom in rt_mutex_top_waiter() because
rb_entry() gives a NULL pointer.
( Tested on v3.18-RT where rtmutex is used for regular mutex and I
tried to get one twice in a row. )
Not sure when this started but I guess 397335f004f4 ("rtmutex: Fix
deadlock detector for real") or commit 3d5c9340d194 ("rtmutex:
Handle deadlock detection smarter").
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1424187823-19600-1-git-send-email-bigeasy@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This is essentially a partial revert of the commit [b1920c21102a:
'ALSA: hda - Enable runtime PM on Panther Point']. There was a bug
report showing the HD-audio bus hang during runtime PM on HP Spectre
XT.
A part of these drivers, especially BeBoB driver, are programmed to wait
some events. Thus the drivers should not destroy any data in .remove()
context.
This commit moves some destructors from 'struct fw_driver.remove()' to
'struct snd_card.private_free()' to shutdown safely.
Currently stream destructor in each driver has a problem to be called in
a context in which sound card object is released, because the destructors
call amdtp_stream_pcm_abort() and touch PCM runtime data.
The PCM runtime data is destroyed in application's context with
snd_pcm_close(), on the other hand PCM substream data is destroyed after
sound card object is released, in most case after all of ALSA character
devices are released. When PCM runtime is destroyed and PCM substream is
remained, amdtp_stream_pcm_abort() touches PCM runtime data and causes
Null-pointer-dereference.
This commit changes stream destructors and allows each driver to call
it after releasing runtime.
AMDTP helper functions increment/decrement reference counter for an
instance of FireWire unit, while it's complicated for each driver to
process error state.
In previous commit, each driver has the role of reference counting. This
commit removes this role from the helper function.
Fireworks and Dice drivers try to touch instances of FireWire unit after
sound card object is released, while references to the unit is decremented
in .remove(). When unplugging during streaming, sound card object is
released after .remove(), thus Fireworks and Dice drivers causes GPF or
Null-pointer-dereferencing to application processes because an instance of
FireWire unit was already released.
This commit adds reference-counting for FireWire unit in drivers to allow
them to touch an instance of FireWire unit after .remove(). In most case,
any operations after .remove() may be failed safely.
When a PCM draining is performed to an empty stream that has been
already in PREPARED state, the current code just ignores and leaves as
it is, although the drain is supposed to set all such streams to SETUP
state. This patch covers that overlooked case.
The conditional RX-FIFO read seems to cause spurious interrupts and we
see just:
|serial8250: too much work for irq29
The previous behaviour was "default" for decades and Marvell's 88f6282 SoC
might not be the only that relies on it. Therefore the Omap fix is
reverted for now.
Fixes: 0aa525d11859 ("tty: serial: 8250_core: read only RX if there is
something in the FIFO") Reported-By: Nicolas Schichan <nschichan@freebox.fr> Debuged-By: Peter Hurley <peter@hurleysoftware.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
But it still misses one point. As John Paul correctly points out, we
do not care about setting date. If somebody ever changes wall
time backwards (by mistake for example), tty timestamps are never
updated until the original wall time passes.
So check the absolute difference of times and if it large than "8
seconds or so", always update the time. That means we will update
immediatelly when changing time. Ergo, CAP_SYS_TIME can foul the
check, but it was always that way.
Thanks John for serving me this so nicely debugged.
Signed-off-by: Jiri Slaby <jslaby@suse.cz> Reported-by: John Paul Perry <john_paul.perry@alcatel-lucent.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>