]> git.hungrycats.org Git - linux/log
linux
9 years agozygo: turn down the preemption a little, get a bonus i180 driver. Leave ATOMIC_SLEEP... zygo-4.0.8-zb64
Zygo Blaxell [Fri, 17 Jul 2015 15:39:57 +0000 (11:39 -0400)]
zygo: turn down the preemption a little, get a bonus i180 driver.  Leave ATOMIC_SLEEP debug enabled for now.

9 years agoRevert "Btrfs: fix list transaction->pending_ordered corruption"
Zygo Blaxell [Fri, 17 Jul 2015 15:21:53 +0000 (11:21 -0400)]
Revert "Btrfs: fix list transaction->pending_ordered corruption"

This reverts commit d3efe08400317888f559bbedf0e42cd31575d0ef.

9 years agoBtrfs: fix race between caching kthread and returning inode to inode cache
Filipe Manana [Sat, 13 Jun 2015 05:52:57 +0000 (06:52 +0100)]
Btrfs: fix race between caching kthread and returning inode to inode cache

While the inode cache caching kthread is calling btrfs_unpin_free_ino(),
we could have a concurrent call to btrfs_return_ino() that adds a new
entry to the root's free space cache of pinned inodes. This concurrent
call does not acquire the fs_info->commit_root_sem before adding a new
entry if the caching state is BTRFS_CACHE_FINISHED, which is a problem
because the caching kthread calls btrfs_unpin_free_ino() after setting
the caching state to BTRFS_CACHE_FINISHED and therefore races with
the task calling btrfs_return_ino(), which is adding a new entry, while
the former (caching kthread) is navigating the cache's rbtree, removing
and freeing nodes from the cache's rbtree without acquiring the spinlock
that protects the rbtree.

This race resulted in memory corruption due to double free of struct
btrfs_free_space objects because both tasks can end up doing freeing the
same objects. Note that adding a new entry can result in merging it with
other entries in the cache, in which case those entries are freed.
This is particularly important as btrfs_free_space structures are also
used for the block group free space caches.

This memory corruption can be detected by a debugging kernel, which
reports it with the following trace:

[132408.501148] slab error in verify_redzone_free(): cache `btrfs_free_space': double free detected
[132408.505075] CPU: 15 PID: 12248 Comm: btrfs-ino-cache Tainted: G        W       4.1.0-rc5-btrfs-next-10+ #1
[132408.505075] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014
[132408.505075]  ffff880023e7d320 ffff880163d73cd8 ffffffff8145eec7 ffffffff81095dce
[132408.505075]  ffff880009735d40 ffff880163d73ce8 ffffffff81154e1e ffff880163d73d68
[132408.505075]  ffffffff81155733 ffffffffa054a95a ffff8801b6099f00 ffffffffa0505b5f
[132408.505075] Call Trace:
[132408.505075]  [<ffffffff8145eec7>] dump_stack+0x4f/0x7b
[132408.505075]  [<ffffffff81095dce>] ? console_unlock+0x356/0x3a2
[132408.505075]  [<ffffffff81154e1e>] __slab_error.isra.28+0x25/0x36
[132408.505075]  [<ffffffff81155733>] __cache_free+0xe2/0x4b6
[132408.505075]  [<ffffffffa054a95a>] ? __btrfs_add_free_space+0x2f0/0x343 [btrfs]
[132408.505075]  [<ffffffffa0505b5f>] ? btrfs_unpin_free_ino+0x8e/0x99 [btrfs]
[132408.505075]  [<ffffffff810f3b30>] ? time_hardirqs_off+0x15/0x28
[132408.505075]  [<ffffffff81084d42>] ? trace_hardirqs_off+0xd/0xf
[132408.505075]  [<ffffffff811563a1>] ? kfree+0xb6/0x14e
[132408.505075]  [<ffffffff811563d0>] kfree+0xe5/0x14e
[132408.505075]  [<ffffffffa0505b5f>] btrfs_unpin_free_ino+0x8e/0x99 [btrfs]
[132408.505075]  [<ffffffffa0505e08>] caching_kthread+0x29e/0x2d9 [btrfs]
[132408.505075]  [<ffffffffa0505b6a>] ? btrfs_unpin_free_ino+0x99/0x99 [btrfs]
[132408.505075]  [<ffffffff8106698f>] kthread+0xef/0xf7
[132408.505075]  [<ffffffff810f3b08>] ? time_hardirqs_on+0x15/0x28
[132408.505075]  [<ffffffff810668a0>] ? __kthread_parkme+0xad/0xad
[132408.505075]  [<ffffffff814653d2>] ret_from_fork+0x42/0x70
[132408.505075]  [<ffffffff810668a0>] ? __kthread_parkme+0xad/0xad
[132408.505075] ffff880023e7d320: redzone 1:0x9f911029d74e35b, redzone 2:0x9f911029d74e35b.
[132409.501654] slab: double free detected in cache 'btrfs_free_space', objp ffff880023e7d320
[132409.503355] ------------[ cut here ]------------
[132409.504241] kernel BUG at mm/slab.c:2571!

Therefore fix this by having btrfs_unpin_free_ino() acquire the lock
that protects the rbtree while doing the searches and removing entries.

Fixes: 1c70d8fb4dfa ("Btrfs: fix inode caching vs tree log")
Cc: stable@vger.kernel.org
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
(cherry picked from commit ae9d8f17118551bedd797406a6768b87c2146234)

9 years agoBtrfs: add missing inode item update in fallocate()
Filipe Manana [Tue, 14 Jul 2015 13:06:47 +0000 (14:06 +0100)]
Btrfs: add missing inode item update in fallocate()
(bnc#938023).

suse-commit: 4f61a739e2bef56c25f9a883afff48bffcd47515
(cherry picked from commit dfe466f4e9ea672ee944c0493bdb0659afd2da7d)

9 years agoBtrfs: fix stale directory entries after fsync log replay
Filipe Manana [Wed, 15 Jul 2015 22:26:43 +0000 (23:26 +0100)]
Btrfs: fix stale directory entries after fsync log replay

We have another case where after an fsync log replay we get an inode with
a wrong link count (smaller than it should be) and a number of directory
entries greater than its link count. This happens when we add a new link
hard link to our inode A and then we fsync some other inode B that has
the side effect of logging the parent directory inode too. In this case
at log replay time we add the new hard link to our inode (the item with
key BTRFS_INODE_REF_KEY) when processing the parent directory but we
never adjust the link count of our inode A. As a result we get stale dir
entries for our inode A that can never be deleted and therefore it makes
it impossible to remove the parent directory (as its i_size can never
decrease back to 0).

A simple reproducer for fstests that triggers this issue:

  seq=`basename $0`
  seqres=$RESULT_DIR/$seq
  echo "QA output created by $seq"
  tmp=/tmp/$$
  status=1 # failure is the default!
  trap "_cleanup; exit \$status" 0 1 2 3 15

  _cleanup()
  {
      _cleanup_flakey
      rm -f $tmp.*
  }

  # get standard environment, filters and checks
  . ./common/rc
  . ./common/filter
  . ./common/dmflakey

  # real QA test starts here
  _need_to_be_root
  _supported_fs generic
  _supported_os Linux
  _require_scratch
  _require_dm_flakey
  _require_metadata_journaling $SCRATCH_DEV

  rm -f $seqres.full

  _scratch_mkfs >>$seqres.full 2>&1
  _init_flakey
  _mount_flakey

  # Create our test directory and files.
  mkdir $SCRATCH_MNT/testdir
  touch $SCRATCH_MNT/testdir/foo
  touch $SCRATCH_MNT/testdir/bar

  # Make sure everything done so far is durably persisted.
  sync

  # Create one hard link for file foo and another one for file bar. After
  # that fsync only the file bar.
  ln $SCRATCH_MNT/testdir/bar $SCRATCH_MNT/testdir/bar_link
  ln $SCRATCH_MNT/testdir/foo $SCRATCH_MNT/testdir/foo_link
  $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/testdir/bar

  # Silently drop all writes on scratch device to simulate power failure.
  _load_flakey_table $FLAKEY_DROP_WRITES
  _unmount_flakey

  # Allow writes again and mount the fs to trigger log/journal replay.
  _load_flakey_table $FLAKEY_ALLOW_WRITES
  _mount_flakey

  # Now verify both our files have a link count of 2.
  echo "Link count for file foo: $(stat --format=%h $SCRATCH_MNT/testdir/foo)"
  echo "Link count for file bar: $(stat --format=%h $SCRATCH_MNT/testdir/bar)"

  # We should be able to remove all the links of our files in testdir, and
  # after that the parent directory should become empty and therefore
  # possible to remove it.
  rm -f $SCRATCH_MNT/testdir/*
  rmdir $SCRATCH_MNT/testdir

  _unmount_flakey

  # The fstests framework will call fsck against our filesystem which will verify
  # that all metadata is in a consistent state.

  status=0
  exit

The test fails with:

 -Link count for file foo: 2
 +Link count for file foo: 1
  Link count for file bar: 2
 +rm: cannot remove '/home/fdmanana/btrfs-tests/scratch_1/testdir/foo_link': Stale file handle
 +rmdir: failed to remove '/home/fdmanana/btrfs-tests/scratch_1/testdir': Directory not empty
 (...)
 _check_btrfs_filesystem: filesystem on /dev/sdc is inconsistent

And fsck's output:

  (...)
  checking fs roots
  root 5 inode 258 errors 2001, no inode item, link count wrong
      unresolved ref dir 257 index 5 namelen 8 name foo_link filetype 1 errors 4, no inode ref
  Checking filesystem on /dev/sdc
  (...)

So fix this by marking inodes for link count fixup at log replay time
whenever a directory entry is replayed if the entry was created in the
transaction where the fsync was made and if it points to a non-directory
inode.

This isn't a new problem/regression, the issue exists for a long time,
possibly since the log tree feature was added (2008).

Signed-off-by: Filipe Manana <fdmanana@suse.com>
(cherry picked from commit 4e1102af88986f08410f0c83d090f4b2e2399a60)

9 years agoBtrfs: fix file corruption after cloning inline extents
Filipe Manana [Tue, 14 Jul 2015 15:09:39 +0000 (16:09 +0100)]
Btrfs: fix file corruption after cloning inline extents

Using the clone ioctl (or extent_same ioctl, which calls the same extent
cloning function as well) we end up allowing copy an inline extent from
the source file into a non-zero offset of the destination file. This is
something not expected and that the btrfs code is not prepared to deal
with - all inline extents must be at a file offset equals to 0.

For example, the following excerpt of a test case for fstests triggers
a crash/BUG_ON() on a write operation after an inline extent is cloned
into a non-zero offset:

  _scratch_mkfs >>$seqres.full 2>&1
  _scratch_mount

  # Create our test files. File foo has the same 2K of data at offset 4K
  # as file bar has at its offset 0.
  $XFS_IO_PROG -f -s -c "pwrite -S 0xaa 0 4K" \
      -c "pwrite -S 0xbb 4k 2K" \
      -c "pwrite -S 0xcc 8K 4K" \
      $SCRATCH_MNT/foo | _filter_xfs_io

  # File bar consists of a single inline extent (2K size).
  $XFS_IO_PROG -f -s -c "pwrite -S 0xbb 0 2K" \
     $SCRATCH_MNT/bar | _filter_xfs_io

  # Now call the clone ioctl to clone the extent of file bar into file
  # foo at its offset 4K. This made file foo have an inline extent at
  # offset 4K, something which the btrfs code can not deal with in future
  # IO operations because all inline extents are supposed to start at an
  # offset of 0, resulting in all sorts of chaos.
  # So here we validate that clone ioctl returns an EOPNOTSUPP, which is
  # what it returns for other cases dealing with inlined extents.
  $CLONER_PROG -s 0 -d $((4 * 1024)) -l $((2 * 1024)) \
      $SCRATCH_MNT/bar $SCRATCH_MNT/foo

  # Because of the inline extent at offset 4K, the following write made
  # the kernel crash with a BUG_ON().
  $XFS_IO_PROG -c "pwrite -S 0xdd 6K 2K" $SCRATCH_MNT/foo | _filter_xfs_io

  status=0
  exit

The stack trace of the BUG_ON() triggered by the last write is:

  [152154.035903] ------------[ cut here ]------------
  [152154.036424] kernel BUG at mm/page-writeback.c:2286!
  [152154.036424] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
  [152154.036424] Modules linked in: btrfs dm_flakey dm_mod crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop fuse parport_pc acpi_cpu$
  [152154.036424] CPU: 2 PID: 17873 Comm: xfs_io Tainted: G        W       4.1.0-rc6-btrfs-next-11+ #2
  [152154.036424] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014
  [152154.036424] task: ffff880429f70990 ti: ffff880429efc000 task.ti: ffff880429efc000
  [152154.036424] RIP: 0010:[<ffffffff8111a9d5>]  [<ffffffff8111a9d5>] clear_page_dirty_for_io+0x1e/0x90
  [152154.036424] RSP: 0018:ffff880429effc68  EFLAGS: 00010246
  [152154.036424] RAX: 0200000000000806 RBX: ffffea0006a6d8f0 RCX: 0000000000000001
  [152154.036424] RDX: 0000000000000000 RSI: ffffffff81155d1b RDI: ffffea0006a6d8f0
  [152154.036424] RBP: ffff880429effc78 R08: ffff8801ce389fe0 R09: 0000000000000001
  [152154.036424] R10: 0000000000002000 R11: ffffffffffffffff R12: ffff8800200dce68
  [152154.036424] R13: 0000000000000000 R14: ffff8800200dcc88 R15: ffff8803d5736d80
  [152154.036424] FS:  00007fbf119f6700(0000) GS:ffff88043d280000(0000) knlGS:0000000000000000
  [152154.036424] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [152154.036424] CR2: 0000000001bdc000 CR3: 00000003aa555000 CR4: 00000000000006e0
  [152154.036424] Stack:
  [152154.036424]  ffff8803d5736d80 0000000000000001 ffff880429effcd8 ffffffffa04e97c1
  [152154.036424]  ffff880429effd68 ffff880429effd60 0000000000000001 ffff8800200dc9c8
  [152154.036424]  0000000000000001 ffff8800200dcc88 0000000000000000 0000000000001000
  [152154.036424] Call Trace:
  [152154.036424]  [<ffffffffa04e97c1>] lock_and_cleanup_extent_if_need+0x147/0x18d [btrfs]
  [152154.036424]  [<ffffffffa04ea82c>] __btrfs_buffered_write+0x245/0x4c8 [btrfs]
  [152154.036424]  [<ffffffffa04ed14b>] ? btrfs_file_write_iter+0x150/0x3e0 [btrfs]
  [152154.036424]  [<ffffffffa04ed15a>] ? btrfs_file_write_iter+0x15f/0x3e0 [btrfs]
  [152154.036424]  [<ffffffffa04ed2c7>] btrfs_file_write_iter+0x2cc/0x3e0 [btrfs]
  [152154.036424]  [<ffffffff81165a4a>] __vfs_write+0x7c/0xa5
  [152154.036424]  [<ffffffff81165f89>] vfs_write+0xa0/0xe4
  [152154.036424]  [<ffffffff81166855>] SyS_pwrite64+0x64/0x82
  [152154.036424]  [<ffffffff81465197>] system_call_fastpath+0x12/0x6f
  [152154.036424] Code: 48 89 c7 e8 0f ff ff ff 5b 41 5c 5d c3 0f 1f 44 00 00 55 48 89 e5 41 54 53 48 89 fb e8 ae ef 00 00 49 89 c4 48 8b 03 a8 01 75 02 <0f> 0b 4d 85 e4 74 59 49 8b 3c 2$
  [152154.036424] RIP  [<ffffffff8111a9d5>] clear_page_dirty_for_io+0x1e/0x90
  [152154.036424]  RSP <ffff880429effc68>
  [152154.242621] ---[ end trace e3d3376b23a57041 ]---

Fix this by returning the error EOPNOTSUPP if an attempt to copy an
inline extent into a non-zero offset happens, just like what is done for
other scenarios that would require copying/splitting inline extents,
which were introduced by the following commits:

   00fdf13a2e9f ("Btrfs: fix a crash of clone with inline extents's split")
   3f9e3df8da3c ("btrfs: replace error code from btrfs_drop_extents")

Cc: stable@vger.kernel.org
Signed-off-by: Filipe Manana <fdmanana@suse.com>
(cherry picked from commit ed958762644b404654a6f5d23e869f496fe127c6)

9 years agoMerge tag 'v4.0.8' into zygo-4.0.8-zb64
Zygo Blaxell [Fri, 10 Jul 2015 22:11:33 +0000 (18:11 -0400)]
Merge tag 'v4.0.8' into zygo-4.0.8-zb64

This is the 4.0.8 stable release

# gpg: Signature made Fri Jul 10 12:46:22 2015 EDT using RSA key ID 6092693E
# gpg: Good signature from "Greg Kroah-Hartman (Linux kernel stable release signing key) <greg@kroah.com>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 647F 2865 4894 E3BD 4571  99BE 38DB BDC8 6092 693E

9 years agoLinux 4.0.8 v4.0.8
Greg Kroah-Hartman [Fri, 10 Jul 2015 16:46:16 +0000 (09:46 -0700)]
Linux 4.0.8

9 years agofs/ufs: restore s_lock mutex_init()
Fabian Frederick [Wed, 17 Jun 2015 16:15:45 +0000 (18:15 +0200)]
fs/ufs: restore s_lock mutex_init()

commit e4f95517f18271b1da36cfc5d700e46844396d6e upstream.

Add last missing line in commit "cdd9eefdf905"
("fs/ufs: restore s_lock mutex")

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 years agoufs: Fix possible deadlock when looking up directories
Jan Kara [Tue, 2 Jun 2015 09:26:34 +0000 (11:26 +0200)]
ufs: Fix possible deadlock when looking up directories

commit 514d748f69c97a51a2645eb198ac5c6218f22ff9 upstream.

Commit e4502c63f56aeca88 (ufs: deal with nfsd/iget races) made ufs
create inodes with I_NEW flag set. However ufs_mkdir() never cleared
this flag. Thus if someone ever tried to lookup the directory by inode
number, he would deadlock waiting for I_NEW to be cleared. Luckily this
mostly happens only if the filesystem is exported over NFS since
otherwise we have the inode attached to dentry and don't look it up by
inode number. In rare cases dentry can get freed without inode being
freed and then we'd hit the deadlock even without NFS export.

Fix the problem by clearing I_NEW before instantiating new directory
inode.

Fixes: e4502c63f56aeca887ced37f24e0def1ef11cec8
Reported-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 years agoufs: Fix warning from unlock_new_inode()
Jan Kara [Mon, 1 Jun 2015 12:52:04 +0000 (14:52 +0200)]
ufs: Fix warning from unlock_new_inode()

commit 12ecbb4b1d765a5076920999298d9625439dbe58 upstream.

Commit e4502c63f56aeca88 (ufs: deal with nfsd/iget races) introduced
unlock_new_inode() call into ufs_add_nondir(). However that function
gets called also from ufs_link() which hands it already initialized
inode and thus unlock_new_inode() complains. The problem is harmless but
annoying.

Fix the problem by opencoding necessary stuff in ufs_link()

Fixes: e4502c63f56aeca887ced37f24e0def1ef11cec8
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 years agovfs: Ignore unlocked mounts in fs_fully_visible
Eric W. Biederman [Wed, 7 Jan 2015 14:10:09 +0000 (08:10 -0600)]
vfs: Ignore unlocked mounts in fs_fully_visible

commit ceeb0e5d39fcdf4dca2c997bf225c7fc49200b37 upstream.

Limit the mounts fs_fully_visible considers to locked mounts.
Unlocked can always be unmounted so considering them adds hassle
but no security benefit.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 years agovfs: Remove incorrect debugging WARN in prepend_path
Eric W. Biederman [Sun, 24 May 2015 14:25:00 +0000 (09:25 -0500)]
vfs: Remove incorrect debugging WARN in prepend_path

commit 93e3bce6287e1fb3e60d3324ed08555b5bbafa89 upstream.

The warning message in prepend_path is unclear and outdated.  It was
added as a warning that the mechanism for generating names of pseudo
files had been removed from prepend_path and d_dname should be used
instead.  Unfortunately the warning reads like a general warning,
making it unclear what to do with it.

Remove the warning.  The transition it was added to warn about is long
over, and I added code several years ago which in rare cases causes
the warning to fire on legitimate code, and the warning is now firing
and scaring people for no good reason.

Reported-by: Ivan Delalande <colona@arista.com>
Reported-by: Omar Sandoval <osandov@osandov.com>
Fixes: f48cfddc6729e ("vfs: In d_path don't call d_dname on a mount point")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 years agofs/ufs: restore s_lock mutex
Fabian Frederick [Wed, 10 Jun 2015 00:09:32 +0000 (10:09 +1000)]
fs/ufs: restore s_lock mutex

commit cdd9eefdf905e92e7fc6cc393314efe68dc6ff66 upstream.

Commit 0244756edc4b98c ("ufs: sb mutex merge + mutex_destroy") generated
deadlocks in read/write mode on mkdir.

This patch partially reverts it keeping fixes by Andrew Morton and
mutex_destroy()

[AV: fixed a missing bit in ufs_remount()]

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Reported-by: Ian Campbell <ian.campbell@citrix.com>
Suggested-by: Jan Kara <jack@suse.cz>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Evgeniy Dushistov <dushistov@mail.ru>
Cc: Alexey Khoroshilov <khoroshilov@ispras.ru>
Cc: Roger Pau Monne <roger.pau@citrix.com>
Cc: Ian Jackson <Ian.Jackson@eu.citrix.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 years agofs/ufs: revert "ufs: fix deadlocks introduced by sb mutex merge"
Fabian Frederick [Wed, 10 Jun 2015 00:09:32 +0000 (10:09 +1000)]
fs/ufs: revert "ufs: fix deadlocks introduced by sb mutex merge"

commit 13b987ea275840d74d9df9a44326632fab1894da upstream.

This reverts commit 9ef7db7f38d0 ("ufs: fix deadlocks introduced by sb
mutex merge") That patch tried to solve commit 0244756edc4b98c ("ufs: sb
mutex merge + mutex_destroy") which is itself partially reverted due to
multiple deadlocks.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Suggested-by: Jan Kara <jack@suse.cz>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Evgeniy Dushistov <dushistov@mail.ru>
Cc: Alexey Khoroshilov <khoroshilov@ispras.ru>
Cc: Roger Pau Monne <roger.pau@citrix.com>
Cc: Ian Jackson <Ian.Jackson@eu.citrix.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 years agofs: Fix S_NOSEC handling
Jan Kara [Thu, 21 May 2015 14:05:52 +0000 (16:05 +0200)]
fs: Fix S_NOSEC handling

commit 2426f3910069ed47c0cc58559a6d088af7920201 upstream.

file_remove_suid() could mistakenly set S_NOSEC inode bit when root was
modifying the file. As a result following writes to the file by ordinary
user would avoid clearing suid or sgid bits.

Fix the bug by checking actual mode bits before setting S_NOSEC.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 years agoKVM: x86: make vapics_in_nmi_mode atomic
Radim Krčmář [Wed, 1 Jul 2015 13:31:49 +0000 (15:31 +0200)]
KVM: x86: make vapics_in_nmi_mode atomic

commit 42720138b06301cc8a7ee8a495a6d021c4b6a9bc upstream.

Writes were a bit racy, but hard to turn into a bug at the same time.
(Particularly because modern Linux doesn't use this feature anymore.)

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
[Actually the next patch makes it much, much easier to trigger the race
 so I'm including this one for stable@ as well. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 years agoKVM: x86: properly restore LVT0
Radim Krčmář [Tue, 30 Jun 2015 20:19:17 +0000 (22:19 +0200)]
KVM: x86: properly restore LVT0

commit db1385624c686fe99fe2d1b61a36e1537b915d08 upstream.

Legacy NMI watchdog didn't work after migration/resume, because
vapics_in_nmi_mode was left at 0.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 years agoKVM: s390: virtio-ccw: don't overwrite config space values
Cornelia Huck [Mon, 29 Jun 2015 14:44:01 +0000 (16:44 +0200)]
KVM: s390: virtio-ccw: don't overwrite config space values

commit 431dae778aea4eed31bd12e5ee82edc571cd4d70 upstream.

Eric noticed problems with vhost-scsi and virtio-ccw: vhost-scsi
complained about overwriting values in the config space, which
was triggered by a broken implementation of virtio-ccw's config
get/set routines. It was probably sheer luck that we did not hit
this before.

When writing a value to the config space, the WRITE_CONF ccw will
always write from the beginning of the config space up to and
including the value to be set. If the config space up to the value
has not yet been retrieved from the device, however, we'll end up
overwriting values. Keep track of the known config space and update
if needed to avoid this.

Moreover, READ_CONF will only read the number of bytes it has been
instructed to retrieve, so we must not copy more than that to the
buffer, or we might overwrite trailing values.

Reported-by: Eric Farman <farman@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Eric Farman <farman@linux.vnet.ibm.com>
Tested-by: Eric Farman <farman@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 years agos390/kdump: fix REGSET_VX_LOW vector register ELF notes
Michael Holzheu [Thu, 11 Jun 2015 17:59:04 +0000 (19:59 +0200)]
s390/kdump: fix REGSET_VX_LOW vector register ELF notes

commit 3c8e5105e759e7b2d88ea8a85b1285e535bc7500 upstream.

The REGSET_VX_LOW ELF notes should contain the lower 64 bit halfes of the
first sixteen 128 bit vector registers. Unfortunately currently we copy
the upper halfes.

Fix this and correctly copy the lower halfes.

Fixes: a62bc0739253 ("s390/kdump: add support for vector extension")
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 years agoKVM: s390: fix external call injection without sigp interpretation
David Hildenbrand [Thu, 30 Apr 2015 11:33:59 +0000 (13:33 +0200)]
KVM: s390: fix external call injection without sigp interpretation

commit b938eacea0b6881f2116a061e6da3ec840e75137 upstream.

Commit ea5f49692575 ("KVM: s390: only one external call may be pending
at a time") introduced a bug on machines that don't have SIGP
interpretation facility installed.
The injection of an external call will now always fail with -EBUSY
(if none is already pending).

This leads to the following symptoms:
- An external call will be injected but with the wrong "src cpu id",
  as this id will not be remembered.
- The target vcpu will not be woken up, therefore the guest will hang if
  it cannot deal with unexpected failures of the SIGP EXTERNAL CALL
  instruction.
- If an external call is already pending, -EBUSY will not be reported.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 years agoMIPS: Fix KVM guest fixmap address
James Hogan [Mon, 27 Apr 2015 14:07:16 +0000 (15:07 +0100)]
MIPS: Fix KVM guest fixmap address

commit 8e748c8d09a9314eedb5c6367d9acfaacddcdc88 upstream.

KVM guest kernels for trap & emulate run in user mode, with a modified
set of kernel memory segments. However the fixmap address is still in
the normal KSeg3 region at 0xfffe0000 regardless, causing problems when
cache alias handling makes use of them when handling copy on write.

Therefore define FIXADDR_TOP as 0x7ffe0000 in the guest kernel mapped
region when CONFIG_KVM_GUEST is defined.

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/9887/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 years agoKVM: mips: use id_to_memslot correctly
Paolo Bonzini [Mon, 18 May 2015 06:35:43 +0000 (08:35 +0200)]
KVM: mips: use id_to_memslot correctly

commit 69a1220060c1523fd0515216eaa29e22f133b894 upstream.

The argument to KVM_GET_DIRTY_LOG is a memslot id; it may not match the
position in the memslots array, which is sorted by gfn.

Reviewed-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 years agox86/PCI: Use host bridge _CRS info on Foxconn K8M890-8237A
Bjorn Helgaas [Tue, 9 Jun 2015 23:54:07 +0000 (18:54 -0500)]
x86/PCI: Use host bridge _CRS info on Foxconn K8M890-8237A

commit 1dace0116d0b05c967d94644fc4dfe96be2ecd3d upstream.

The Foxconn K8M890-8237A has two PCI host bridges, and we can't assign
resources correctly without the information from _CRS that tells us which
address ranges are claimed by which bridge.  In the bugs mentioned below,
we incorrectly assign a sound card address (this example is from 1033299):

  bus: 00 index 2 [mem 0x80000000-0xfcffffffff]
  ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-7f])
  pci_root PNP0A08:00: host bridge window [mem 0x80000000-0xbfefffff] (ignored)
  pci_root PNP0A08:00: host bridge window [mem 0xc0000000-0xdfffffff] (ignored)
  pci_root PNP0A08:00: host bridge window [mem 0xf0000000-0xfebfffff] (ignored)
  ACPI: PCI Root Bridge [PCI1] (domain 0000 [bus 80-ff])
  pci_root PNP0A08:01: host bridge window [mem 0xbff00000-0xbfffffff] (ignored)
  pci 0000:80:01.0: [1106:3288] type 0 class 0x000403
  pci 0000:80:01.0: reg 10: [mem 0xbfffc000-0xbfffffff 64bit]
  pci 0000:80:01.0: address space collision: [mem 0xbfffc000-0xbfffffff 64bit] conflicts with PCI Bus #00 [mem 0x80000000-0xfcffffffff]
  pci 0000:80:01.0: BAR 0: assigned [mem 0xfd00000000-0xfd00003fff 64bit]
  BUG: unable to handle kernel paging request at ffffc90000378000
  IP: [<ffffffffa0345f63>] azx_create+0x37c/0x822 [snd_hda_intel]

We assigned 0xfd_0000_0000, but that is not in any of the host bridge
windows, and the sound card doesn't work.

Turn on pci=use_crs automatically for this system.

Link: https://bugs.launchpad.net/ubuntu/+source/alsa-driver/+bug/931368
Link: https://bugs.launchpad.net/ubuntu/+source/alsa-driver/+bug/1033299
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 years agox86/PCI: Use host bridge _CRS info on systems with >32 bit addressing
Bjorn Helgaas [Tue, 9 Jun 2015 22:31:38 +0000 (17:31 -0500)]
x86/PCI: Use host bridge _CRS info on systems with >32 bit addressing

commit 3d9fecf6bfb8b12bc2f9a4c7109895a2a2bb9436 upstream.

We enable _CRS on all systems from 2008 and later.  On older systems, we
ignore _CRS and assume the whole physical address space (excluding RAM and
other devices) is available for PCI devices, but on systems that support
physical address spaces larger than 4GB, it's doubtful that the area above
4GB is really available for PCI.

After d56dbf5bab8c ("PCI: Allocate 64-bit BARs above 4G when possible"), we
try to use that space above 4GB *first*, so we're more likely to put a
device there.

On Juan's Toshiba Satellite Pro U200, BIOS left the graphics, sound, 1394,
and card reader devices unassigned (but only after Windows had been
booted).  Only the sound device had a 64-bit BAR, so it was the only device
placed above 4GB, and hence the only device that didn't work.

Keep _CRS enabled even on pre-2008 systems if they support physical address
space larger than 4GB.

Fixes: d56dbf5bab8c ("PCI: Allocate 64-bit BARs above 4G when possible")
Reported-and-tested-by: Juan Dayer <jdayer@outlook.com>
Reported-and-tested-by: Alan Horsfield <alan@hazelgarth.co.uk>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=99221
Link: https://bugzilla.opensuse.org/show_bug.cgi?id=907092
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 years agopowerpc/perf: Fix book3s kernel to userspace backtraces
Anton Blanchard [Tue, 26 May 2015 05:10:24 +0000 (15:10 +1000)]
powerpc/perf: Fix book3s kernel to userspace backtraces

commit 72e349f1124a114435e599479c9b8d14bfd1ebcd upstream.

When we take a PMU exception or a software event we call
perf_read_regs(). This overloads regs->result with a boolean that
describes if we should use the sampled instruction address register
(SIAR) or the regs.

If the exception is in kernel, we start with the kernel regs and
backtrace through the kernel stack. At this point we switch to the
userspace regs and backtrace the user stack with perf_callchain_user().

Unfortunately these regs have not got the perf_read_regs() treatment,
so regs->result could be anything. If it is non zero,
perf_instruction_pointer() decides to use the SIAR, and we get issues
like this:

0.11%  qemu-system-ppc  [kernel.kallsyms]        [k] _raw_spin_lock_irqsave
       |
       ---_raw_spin_lock_irqsave
          |
          |--52.35%-- 0
          |          |
          |          |--46.39%-- __hrtimer_start_range_ns
          |          |          kvmppc_run_core
          |          |          kvmppc_vcpu_run_hv
          |          |          kvmppc_vcpu_run
          |          |          kvm_arch_vcpu_ioctl_run
          |          |          kvm_vcpu_ioctl
          |          |          do_vfs_ioctl
          |          |          sys_ioctl
          |          |          system_call
          |          |          |
          |          |          |--67.08%-- _raw_spin_lock_irqsave <--- hi mum
          |          |          |          |
          |          |          |           --100.00%-- 0x7e714
          |          |          |                     0x7e714

Notice the bogus _raw_spin_irqsave when we transition from kernel
(system_call) to userspace (0x7e714). We inserted what was in the SIAR.

Add a check in regs_use_siar() to check that the regs in question
are from a PMU exception. With this fix the backtrace makes sense:

     0.47%  qemu-system-ppc  [kernel.vmlinux]         [k] _raw_spin_lock_irqsave
            |
            ---_raw_spin_lock_irqsave
               |
               |--53.83%-- 0
               |          |
               |          |--44.73%-- hrtimer_try_to_cancel
               |          |          kvmppc_start_thread
               |          |          kvmppc_run_core
               |          |          kvmppc_vcpu_run_hv
               |          |          kvmppc_vcpu_run
               |          |          kvm_arch_vcpu_ioctl_run
               |          |          kvm_vcpu_ioctl
               |          |          do_vfs_ioctl
               |          |          sys_ioctl
               |          |          system_call
               |          |          __ioctl
               |          |          0x7e714
               |          |          0x7e714

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 years agotick/idle/powerpc: Do not register idle states with CPUIDLE_FLAG_TIMER_STOP set in...
preeti [Wed, 24 Jun 2015 06:48:01 +0000 (01:48 -0500)]
tick/idle/powerpc: Do not register idle states with CPUIDLE_FLAG_TIMER_STOP set in periodic mode

commit cc5a2f7b8f39e7db559778f7913a2410257b3e50 upstream.

On some archs, the local clockevent device stops in deep cpuidle states.
The broadcast framework is used to wakeup cpus in these idle states, in
which either an external clockevent device is used to send wakeup ipis
or the hrtimer broadcast framework kicks in in the absence of such a
device. One cpu is nominated as the broadcast cpu and this cpu sends
wakeup ipis to sleeping cpus at the appropriate time. This is the
implementation in the oneshot mode of broadcast.

In periodic mode of broadcast however, the presence of such cpuidle
states results in the cpuidle driver calling tick_broadcast_enable()
which shuts down the local clockevent devices of all the cpus and
appoints the tick broadcast device as the clockevent device for each of
them. This works on those archs where the tick broadcast device is a
real clockevent device.  But on archs which depend on the hrtimer mode
of broadcast, the tick broadcast device hapens to be a pseudo device.
The consequence is that the local clockevent devices of all cpus are
shutdown and the kernel hangs at boot time in periodic mode.

Let us thus not register the cpuidle states which have
CPUIDLE_FLAG_TIMER_STOP flag set, on archs which depend on the hrtimer
mode of broadcast in periodic mode. This patch takes care of doing this
on powerpc. The cpus would not have entered into such deep cpuidle
states in periodic mode on powerpc anyway. So there is no loss here.

Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 years agoARM: mvebu: fix suspend to RAM on big-endian configurations
Thomas Petazzoni [Tue, 16 Jun 2015 12:12:57 +0000 (14:12 +0200)]
ARM: mvebu: fix suspend to RAM on big-endian configurations

commit 2f5bc307be2480ba89e4c5d118f406f04a4a7299 upstream.

The current Armada XP suspend to RAM implementation, as added in
commit 27432825ae19f ("ARM: mvebu: Armada XP GP specific
suspend/resume code") does not handle big-endian configurations
properly: the small bit of assembly code putting the DRAM in
self-refresh and toggling the GPIOs to turn off power forgets to
convert the values to little-endian.

This commit fixes that by making sure the two values we will write to
the DRAM controller register and GPIO register are already in
little-endian before entering the critical assembly code.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Fixes: 27432825ae19f ("ARM: mvebu: Armada XP GP specific suspend/resume code")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 years agoARM: tegra20: Store CPU "resettable" status in IRAM
Dmitry Osipenko [Thu, 15 Jan 2015 10:58:57 +0000 (13:58 +0300)]
ARM: tegra20: Store CPU "resettable" status in IRAM

commit 4d48edb3c3e1234d6b3fcdfb9ac24d7c6de449cb upstream.

Commit 7232398abc6a ("ARM: tegra: Convert PMC to a driver") changed tegra_resume()
location storing from late to early and, as a result, broke suspend on Tegra20.
PMC scratch register 41 is used by tegra LP1 resume code for retrieving stored
physical memory address of common resume function and in the same time used by
tegra20_cpu_shutdown() (shared by Tegra20 cpuidle driver and platform SMP code),
which is storing CPU1 "resettable" status. It implies strict order of scratch
register usage, otherwise resume function address is lost on Tegra20 after
disabling non-boot CPU's on suspend. Fix it by storing "resettable" status in
IRAM instead of PMC scratch register.

Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Fixes: 7232398abc6a (ARM: tegra: Convert PMC to a driver)
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 years agoARM: kvm: psci: fix handling of unimplemented functions
Lorenzo Pieralisi [Wed, 10 Jun 2015 14:19:24 +0000 (15:19 +0100)]
ARM: kvm: psci: fix handling of unimplemented functions

commit e2d997366dc5b6c9d14035867f73957f93e7578c upstream.

According to the PSCI specification and the SMC/HVC calling
convention, PSCI function_ids that are not implemented must
return NOT_SUPPORTED as return value.

Current KVM implementation takes an unhandled PSCI function_id
as an error and injects an undefined instruction into the guest
if PSCI implementation is called with a function_id that is not
handled by the resident PSCI version (ie it is not implemented),
which is not the behaviour expected by a guest when calling a
PSCI function_id that is not implemented.

This patch fixes this issue by returning NOT_SUPPORTED whenever
the kvm PSCI call is executed for a function_id that is not
implemented by the PSCI kvm layer.

Cc: Christoffer Dall <christoffer.dall@linaro.org>
Acked-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 years agoarm: KVM: force execution of HCPTR access on VM exit
Marc Zyngier [Mon, 16 Mar 2015 10:59:43 +0000 (10:59 +0000)]
arm: KVM: force execution of HCPTR access on VM exit

commit 85e84ba31039595995dae80b277378213602891b upstream.

On VM entry, we disable access to the VFP registers in order to
perform a lazy save/restore of these registers.

On VM exit, we restore access, test if we did enable them before,
and save/restore the guest/host registers if necessary. In this
sequence, the FPEXC register is always accessed, irrespective
of the trapping configuration.

If the guest didn't touch the VFP registers, then the HCPTR access
has now enabled such access, but we're missing a barrier to ensure
architectural execution of the new HCPTR configuration. If the HCPTR
access has been delayed/reordered, the subsequent access to FPEXC
will cause a trap, which we aren't prepared to handle at all.

The same condition exists when trapping to enable VFP for the guest.

The fix is to introduce a barrier after enabling VFP access. In the
vmexit case, it can be relaxed to only takes place if the guest hasn't
accessed its view of the VFP registers, making the access to FPEXC safe.

The set_hcptr macro is modified to deal with both vmenter/vmexit and
vmtrap operations, and now takes an optional label that is branched to
when the guest hasn't touched the VFP registers.

Reported-by: Vikram Sethi <vikrams@codeaurora.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 years agoselinux: fix setting of security labels on NFS
J. Bruce Fields [Thu, 4 Jun 2015 19:57:25 +0000 (15:57 -0400)]
selinux: fix setting of security labels on NFS

commit 9fc2b4b436cff7d8403034676014f1be9d534942 upstream.

Before calling into the filesystem, vfs_setxattr calls
security_inode_setxattr, which ends up calling selinux_inode_setxattr in
our case.  That returns -EOPNOTSUPP whenever SBLABEL_MNT is not set.
SBLABEL_MNT was supposed to be set by sb_finish_set_opts, which sets it
only if selinux_is_sblabel_mnt returns true.

The selinux_is_sblabel_mnt logic was broken by eadcabc697e9 "SELinux: do
all flags twiddling in one place", which didn't take into the account
the SECURITY_FS_USE_NATIVE behavior that had been introduced for nfs
with eb9ae686507b "SELinux: Add new labeling type native labels".

This caused setxattr's of security labels over NFSv4.2 to fail.

Cc: Eric Paris <eparis@redhat.com>
Cc: David Quigley <dpquigl@davequigley.com>
Reported-by: Richard Chan <rc556677@outlook.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
[PM: added the stable dependency]
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 years agointel_pstate: set BYT MSR with wrmsrl_on_cpu()
Joe Konno [Tue, 12 May 2015 14:59:42 +0000 (07:59 -0700)]
intel_pstate: set BYT MSR with wrmsrl_on_cpu()

commit 0dd23f94251f49da99a6cbfb22418b2d757d77d6 upstream.

Commit 007bea098b86 (intel_pstate: Add setting voltage value for
baytrail P states.) introduced byt_set_pstate() with the assumption that
it would always be run by the CPU whose MSR is to be written by it.  It
turns out, however, that is not always the case in practice, so modify
byt_set_pstate() to enforce the MSR write done by it to always happen on
the right CPU.

Fixes: 007bea098b86 (intel_pstate: Add setting voltage value for baytrail P states.)
Signed-off-by: Joe Konno <joe.konno@intel.com>
Acked-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 years agommc: sdhci: fix low memory corruption
Jiri Slaby [Fri, 12 Jun 2015 09:45:02 +0000 (11:45 +0200)]
mmc: sdhci: fix low memory corruption

commit 62a7f368ffbc13d9aedfdd7aeae711b177db69ac upstream.

When dma mapping (dma_map_sg) fails in sdhci_pre_dma_transfer, -EINVAL
is returned. There are 3 callers of sdhci_pre_dma_transfer:
* sdhci_pre_req and sdhci_adma_table_pre: handle negative return
* sdhci_prepare_data: handles 0 (error) and "else" (good) only

sdhci_prepare_data is therefore broken. When it receives -EINVAL from
sdhci_pre_dma_transfer, it assumes 1 sg mapping was mapped. Later,
this non-existent mapping with address 0 is kmap'ped and written to:
Corrupted low memory at ffff880000001000 (1000 phys) = 22b7d67df2f6d1cf
Corrupted low memory at ffff880000001008 (1008 phys) = 63848a5216b7dd95
Corrupted low memory at ffff880000001010 (1010 phys) = 330eb7ddef39e427
Corrupted low memory at ffff880000001018 (1018 phys) = 8017ac7295039bda
Corrupted low memory at ffff880000001020 (1020 phys) = 8ce039eac119074f
...

So teach sdhci_prepare_data to understand negative return values from
sdhci_pre_dma_transfer and disable DMA in that case, as well as for
zero.

It was introduced in 348487cb28e66b032bae1b38424d81bf5b444408 (mmc:
sdhci: use pipeline mmc requests to improve performance). The commit
seems to be suspicious also by assigning host->sg_count both in
sdhci_pre_dma_transfer and sdhci_adma_table_pre.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Fixes: 348487cb28e6
Cc: Ulf Hansson <ulf.hansson@linaro.org>
Cc: Haibo Chen <haibo.chen@freescale.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 years agoiommu/amd: Handle large pages correctly in free_pagetable
Joerg Roedel [Thu, 18 Jun 2015 08:48:34 +0000 (10:48 +0200)]
iommu/amd: Handle large pages correctly in free_pagetable

commit 0b3fff54bc01e8e6064d222a33e6fa7adabd94cd upstream.

Make sure that we are skipping over large PTEs while walking
the page-table tree.

Fixes: 5c34c403b723 ("iommu/amd: Fix memory leak in free_pagetable")
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 years agoiommu/arm-smmu: Fix broken ATOS check
Will Deacon [Mon, 29 Jun 2015 16:47:42 +0000 (17:47 +0100)]
iommu/arm-smmu: Fix broken ATOS check

commit d38f0ff9ab35414644995bae187d015c31aae19c upstream.

Commit 83a60ed8f0b5 ("iommu/arm-smmu: fix ARM_SMMU_FEAT_TRANS_OPS
condition") accidentally negated the ID0_ATOSNS predicate in the ATOS
feature check, causing the driver to attempt ATOS requests on SMMUv2
hardware without the ATOS feature implemented.

This patch restores the predicate to the correct value.

Reported-by: Varun Sethi <varun.sethi@freescale.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 years agoRevert "crypto: talitos - convert to use be16_add_cpu()"
Horia Geant? [Mon, 11 May 2015 17:04:49 +0000 (20:04 +0300)]
Revert "crypto: talitos - convert to use be16_add_cpu()"

commit 69d9cd8c592f1abce820dbce7181bbbf6812cfbd upstream.

This reverts commit 7291a932c6e27d9768e374e9d648086636daf61c.

The conversion to be16_add_cpu() is incorrect in case cryptlen is
negative due to premature (i.e. before addition / subtraction)
implicit conversion of cryptlen (int -> u16) leading to sign loss.

Cc: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Horia Geanta <horia.geanta@freescale.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 years agocrypto: talitos - avoid memleak in talitos_alg_alloc()
Horia Geant? [Mon, 11 May 2015 17:03:24 +0000 (20:03 +0300)]
crypto: talitos - avoid memleak in talitos_alg_alloc()

commit 5fa7dadc898567ce14d6d6d427e7bd8ce6eb5d39 upstream.

Fixes: 1d11911a8c57 ("crypto: talitos - fix warning: 'alg' may be used uninitialized in this function")
Signed-off-by: Horia Geanta <horia.geanta@freescale.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 years agousb: gadget: f_fs: add extra check before unregister_gadget_item
Rui Miguel Silva [Wed, 20 May 2015 13:52:40 +0000 (14:52 +0100)]
usb: gadget: f_fs: add extra check before unregister_gadget_item

commit f14e9ad17f46051b02bffffac2036486097de19e upstream.

ffs_closed can race with configfs_rmdir which will call config_item_release, so
add an extra check to avoid calling the unregister_gadget_item with an null
gadget item.

Signed-off-by: Rui Miguel Silva <rui.silva@linaro.org>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 years agousb: gadget: f_fs: fix check in read operation
Rui Miguel Silva [Wed, 20 May 2015 13:53:33 +0000 (14:53 +0100)]
usb: gadget: f_fs: fix check in read operation

commit 342f39a6c8d34d638a87b7d5f2156adc4db2585c upstream.

when copying to iter the size can be different then the iov count,
the check for full iov is wrong and make any read on request which
is not the exactly size of iov to return -EFAULT.

So, just check the success of the copy.

Signed-off-by: Rui Miguel Silva <rui.silva@linaro.org>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 years agonet: mvneta: disable IP checksum with jumbo frames for Armada 370
Simon Guinot [Tue, 30 Jun 2015 14:20:22 +0000 (16:20 +0200)]
net: mvneta: disable IP checksum with jumbo frames for Armada 370

[ Upstream commit b65657fc240ae6c1d2a1e62db9a0e61ac9631d7a ]

The Ethernet controller found in the Armada 370, 380 and 385 SoCs don't
support TCP/IP checksumming with frame sizes larger than 1600 bytes.

This patch fixes the issue by disabling the features NETIF_F_IP_CSUM and
NETIF_F_TSO for the Armada 370 and compatibles SoCs when the MTU is set
to a value greater than 1600 bytes.

Signed-off-by: Simon Guinot <simon.guinot@sequanux.org>
Fixes: c5aff18204da ("net: mvneta: driver for Marvell Armada 370/XP network unit")
Cc: <stable@vger.kernel.org> # v3.8+
Acked-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 years agoARM: mvebu: update Ethernet compatible string for Armada XP
Simon Guinot [Tue, 30 Jun 2015 14:20:21 +0000 (16:20 +0200)]
ARM: mvebu: update Ethernet compatible string for Armada XP

[ Upstream commit ea3b55fe83b5fcede82d183164b9d6831b26e33b ]

This patch updates the Ethernet DT nodes for Armada XP SoCs with the
compatible string "marvell,armada-xp-neta".

Signed-off-by: Simon Guinot <simon.guinot@sequanux.org>
Fixes: 77916519cba3 ("arm: mvebu: Armada XP MV78230 has only three Ethernet interfaces")
Cc: <stable@vger.kernel.org> # v3.8+
Acked-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Reviewed-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 years agonet: mvneta: introduce compatible string "marvell, armada-xp-neta"
Simon Guinot [Tue, 30 Jun 2015 14:20:20 +0000 (16:20 +0200)]
net: mvneta: introduce compatible string "marvell, armada-xp-neta"

[ Upstream commit f522a975a8101895a85354b9c143f41b8248e71a ]

The mvneta driver supports the Ethernet IP found in the Armada 370, XP,
380 and 385 SoCs. Since at least one more hardware feature is available
for the Armada XP SoCs then a way to identify them is needed.

This patch introduces a new compatible string "marvell,armada-xp-neta".

Signed-off-by: Simon Guinot <simon.guinot@sequanux.org>
Fixes: c5aff18204da ("net: mvneta: driver for Marvell Armada 370/XP network unit")
Cc: <stable@vger.kernel.org> # v3.8+
Acked-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Acked-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 years agoamd-xgbe: Add the __GFP_NOWARN flag to Rx buffer allocation
Tom Lendacky [Mon, 29 Jun 2015 16:22:12 +0000 (11:22 -0500)]
amd-xgbe: Add the __GFP_NOWARN flag to Rx buffer allocation

[ Upstream commit 472cfe7127760d68b819cf35a26e5a1b44b30f4e ]

When allocating Rx related buffers, alloc_pages is called using an order
number that is decreased until successful. A system under stress can
experience failures during this allocation process resulting in a warning
being issued. This message can be of concern to end users even though the
failure is not fatal. Since the failure is not fatal and can occur
multiple times, the driver should include the __GFP_NOWARN flag to
suppress the warning message from being issued.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 years agosctp: Fix race between OOTB responce and route removal
Alexander Sverdlin [Mon, 29 Jun 2015 08:41:03 +0000 (10:41 +0200)]
sctp: Fix race between OOTB responce and route removal

[ Upstream commit 29c4afc4e98f4dc0ea9df22c631841f9c220b944 ]

There is NULL pointer dereference possible during statistics update if the route
used for OOTB responce is removed at unfortunate time. If the route exists when
we receive OOTB packet and we finally jump into sctp_packet_transmit() to send
ABORT, but in the meantime route is removed under our feet, we take "no_route"
path and try to update stats with IP_INC_STATS(sock_net(asoc->base.sk), ...).

But sctp_ootb_pkt_new() used to prepare responce packet doesn't call
sctp_transport_set_owner() and therefore there is no asoc associated with this
packet. Probably temporary asoc just for OOTB responces is overkill, so just
introduce a check like in all other places in sctp_packet_transmit(), where
"asoc" is dereferenced.

To reproduce this, one needs to
0. ensure that sctp module is loaded (otherwise ABORT is not generated)
1. remove default route on the machine
2. while true; do
     ip route del [interface-specific route]
     ip route add [interface-specific route]
   done
3. send enough OOTB packets (i.e. HB REQs) from another host to trigger ABORT
   responce

On x86_64 the crash looks like this:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
IP: [<ffffffffa05ec9ac>] sctp_packet_transmit+0x63c/0x730 [sctp]
PGD 0
Oops: 0000 [#1] PREEMPT SMP
Modules linked in: ...
CPU: 0 PID: 0 Comm: swapper/0 Tainted: G           O    4.0.5-1-ARCH #1
Hardware name: ...
task: ffffffff818124c0 ti: ffffffff81800000 task.ti: ffffffff81800000
RIP: 0010:[<ffffffffa05ec9ac>]  [<ffffffffa05ec9ac>] sctp_packet_transmit+0x63c/0x730 [sctp]
RSP: 0018:ffff880127c037b8  EFLAGS: 00010296
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000015ff66b480
RDX: 00000015ff66b400 RSI: ffff880127c17200 RDI: ffff880123403700
RBP: ffff880127c03888 R08: 0000000000017200 R09: ffffffff814625af
R10: ffffea00047e4680 R11: 00000000ffffff80 R12: ffff8800b0d38a28
R13: ffff8800b0d38a28 R14: ffff8800b3e88000 R15: ffffffffa05f24e0
FS:  0000000000000000(0000) GS:ffff880127c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000020 CR3: 00000000c855b000 CR4: 00000000000007f0
Stack:
 ffff880127c03910 ffff8800b0d38a28 ffffffff8189d240 ffff88011f91b400
 ffff880127c03828 ffffffffa05c94c5 0000000000000000 ffff8800baa1c520
 0000000000000000 0000000000000001 0000000000000000 0000000000000000
Call Trace:
 <IRQ>
 [<ffffffffa05c94c5>] ? sctp_sf_tabort_8_4_8.isra.20+0x85/0x140 [sctp]
 [<ffffffffa05d6b42>] ? sctp_transport_put+0x52/0x80 [sctp]
 [<ffffffffa05d0bfc>] sctp_do_sm+0xb8c/0x19a0 [sctp]
 [<ffffffff810b0e00>] ? trigger_load_balance+0x90/0x210
 [<ffffffff810e0329>] ? update_process_times+0x59/0x60
 [<ffffffff812c7a40>] ? timerqueue_add+0x60/0xb0
 [<ffffffff810e0549>] ? enqueue_hrtimer+0x29/0xa0
 [<ffffffff8101f599>] ? read_tsc+0x9/0x10
 [<ffffffff8116d4b5>] ? put_page+0x55/0x60
 [<ffffffff810ee1ad>] ? clockevents_program_event+0x6d/0x100
 [<ffffffff81462b68>] ? skb_free_head+0x58/0x80
 [<ffffffffa029a10b>] ? chksum_update+0x1b/0x27 [crc32c_generic]
 [<ffffffff81283f3e>] ? crypto_shash_update+0xce/0xf0
 [<ffffffffa05d3993>] sctp_endpoint_bh_rcv+0x113/0x280 [sctp]
 [<ffffffffa05dd4e6>] sctp_inq_push+0x46/0x60 [sctp]
 [<ffffffffa05ed7a0>] sctp_rcv+0x880/0x910 [sctp]
 [<ffffffffa05ecb50>] ? sctp_packet_transmit_chunk+0xb0/0xb0 [sctp]
 [<ffffffffa05ecb70>] ? sctp_csum_update+0x20/0x20 [sctp]
 [<ffffffff814b05a5>] ? ip_route_input_noref+0x235/0xd30
 [<ffffffff81051d6b>] ? ack_ioapic_level+0x7b/0x150
 [<ffffffff814b27be>] ip_local_deliver_finish+0xae/0x210
 [<ffffffff814b2e15>] ip_local_deliver+0x35/0x90
 [<ffffffff814b2a15>] ip_rcv_finish+0xf5/0x370
 [<ffffffff814b3128>] ip_rcv+0x2b8/0x3a0
 [<ffffffff81474193>] __netif_receive_skb_core+0x763/0xa50
 [<ffffffff81476c28>] __netif_receive_skb+0x18/0x60
 [<ffffffff81476cb0>] netif_receive_skb_internal+0x40/0xd0
 [<ffffffff814776c8>] napi_gro_receive+0xe8/0x120
 [<ffffffffa03946aa>] rtl8169_poll+0x2da/0x660 [r8169]
 [<ffffffff8147896a>] net_rx_action+0x21a/0x360
 [<ffffffff81078dc1>] __do_softirq+0xe1/0x2d0
 [<ffffffff8107912d>] irq_exit+0xad/0xb0
 [<ffffffff8157d158>] do_IRQ+0x58/0xf0
 [<ffffffff8157b06d>] common_interrupt+0x6d/0x6d
 <EOI>
 [<ffffffff810e1218>] ? hrtimer_start+0x18/0x20
 [<ffffffffa05d65f9>] ? sctp_transport_destroy_rcu+0x29/0x30 [sctp]
 [<ffffffff81020c50>] ? mwait_idle+0x60/0xa0
 [<ffffffff810216ef>] arch_cpu_idle+0xf/0x20
 [<ffffffff810b731c>] cpu_startup_entry+0x3ec/0x480
 [<ffffffff8156b365>] rest_init+0x85/0x90
 [<ffffffff818eb035>] start_kernel+0x48b/0x4ac
 [<ffffffff818ea120>] ? early_idt_handlers+0x120/0x120
 [<ffffffff818ea339>] x86_64_start_reservations+0x2a/0x2c
 [<ffffffff818ea49c>] x86_64_start_kernel+0x161/0x184
Code: 90 48 8b 80 b8 00 00 00 48 89 85 70 ff ff ff 48 83 bd 70 ff ff ff 00 0f 85 cd fa ff ff 48 89 df 31 db e8 18 63 e7 e0 48 8b 45 80 <48> 8b 40 20 48 8b 40 30 48 8b 80 68 01 00 00 65 48 ff 40 78 e9
RIP  [<ffffffffa05ec9ac>] sctp_packet_transmit+0x63c/0x730 [sctp]
 RSP <ffff880127c037b8>
CR2: 0000000000000020
---[ end trace 5aec7fd2dc983574 ]---
Kernel panic - not syncing: Fatal exception in interrupt
Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
drm_kms_helper: panic occurred, switching back to text console
---[ end Kernel panic - not syncing: Fatal exception in interrupt

Signed-off-by: Alexander Sverdlin <alexander.sverdlin@nokia.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 years agobnx2x: fix lockdep splat
Eric Dumazet [Fri, 26 Jun 2015 05:32:29 +0000 (07:32 +0200)]
bnx2x: fix lockdep splat

[ Upstream commit d53c66a5b80698620f7c9ba2372fff4017e987b8 ]

Michel reported following lockdep splat

[   44.718117] INFO: trying to register non-static key.
[   44.723081] the code is fine but needs lockdep annotation.
[   44.728559] turning off the locking correctness validator.
[   44.734036] CPU: 8 PID: 5483 Comm: ethtool Not tainted 4.1.0
[   44.770289] Call Trace:
[   44.772741]  [<ffffffff816eb1cd>] dump_stack+0x4c/0x65
[   44.777879]  [<ffffffff8111d921>] ? console_unlock+0x1f1/0x510
[   44.783708]  [<ffffffff811121f5>] __lock_acquire+0x1d05/0x1f10
[   44.789538]  [<ffffffff8111370a>] ? mark_held_locks+0x6a/0x90
[   44.795276]  [<ffffffff81113835>] ? trace_hardirqs_on_caller+0x105/0x1d0
[   44.801967]  [<ffffffff8111390d>] ? trace_hardirqs_on+0xd/0x10
[   44.807793]  [<ffffffff811330fa>] ? hrtimer_try_to_cancel+0x4a/0x250
[   44.814142]  [<ffffffff81112ba6>] lock_acquire+0xb6/0x290
[   44.819537]  [<ffffffff810d6675>] ? flush_work+0x5/0x280
[   44.824844]  [<ffffffff810d66ad>] flush_work+0x3d/0x280
[   44.830061]  [<ffffffff810d6675>] ? flush_work+0x5/0x280
[   44.835366]  [<ffffffff816f3c43>] ? schedule_hrtimeout_range+0x13/0x20
[   44.841889]  [<ffffffff8112ec9b>] ? usleep_range+0x4b/0x50
[   44.847365]  [<ffffffff8111370a>] ? mark_held_locks+0x6a/0x90
[   44.853102]  [<ffffffff810d8585>] ? __cancel_work_timer+0x105/0x1c0
[   44.859359]  [<ffffffff81113835>] ? trace_hardirqs_on_caller+0x105/0x1d0
[   44.866045]  [<ffffffff810d851f>] __cancel_work_timer+0x9f/0x1c0
[   44.872048]  [<ffffffffa0010982>] ? bnx2x_func_stop+0x42/0x90 [bnx2x]
[   44.878481]  [<ffffffff810d8670>] cancel_work_sync+0x10/0x20
[   44.884134]  [<ffffffffa00259e5>] bnx2x_chip_cleanup+0x245/0x730 [bnx2x]
[   44.890829]  [<ffffffff8110ce02>] ? up+0x32/0x50
[   44.895439]  [<ffffffff811306b5>] ? del_timer_sync+0x5/0xd0
[   44.901005]  [<ffffffffa005596d>] bnx2x_nic_unload+0x20d/0x8e0 [bnx2x]
[   44.907527]  [<ffffffff811f1aef>] ? might_fault+0x5f/0xb0
[   44.912921]  [<ffffffffa005851c>] bnx2x_reload_if_running+0x2c/0x50 [bnx2x]
[   44.919879]  [<ffffffffa005a3c5>] bnx2x_set_ringparam+0x2b5/0x460 [bnx2x]
[   44.926664]  [<ffffffff815d498b>] dev_ethtool+0x55b/0x1c40
[   44.932148]  [<ffffffff815dfdc7>] ? rtnl_lock+0x17/0x20
[   44.937364]  [<ffffffff815e7f8b>] dev_ioctl+0x17b/0x630
[   44.942582]  [<ffffffff815abf8d>] sock_do_ioctl+0x5d/0x70
[   44.947972]  [<ffffffff815ac013>] sock_ioctl+0x73/0x280
[   44.953192]  [<ffffffff8124c1c8>] do_vfs_ioctl+0x88/0x5b0
[   44.958587]  [<ffffffff8110d0b3>] ? up_read+0x23/0x40
[   44.963631]  [<ffffffff812584cc>] ? __fget_light+0x6c/0xa0
[   44.969105]  [<ffffffff8124c781>] SyS_ioctl+0x91/0xb0
[   44.974149]  [<ffffffff816f4dd7>] system_call_fastpath+0x12/0x6f

As bnx2x_init_ptp() is only called if bp->flags contains PTP_SUPPORTED,
we also need to guard bnx2x_stop_ptp() with same condition, otherwise
ptp_task workqueue is not initialized and kernel barfs on
cancel_work_sync()

Fixes: eeed018cbfa30 ("bnx2x: Add timestamping and PTP hardware clock support")
Reported-by: Michel Lespinasse <walken@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Michal Kalderon <Michal.Kalderon@qlogic.com>
Cc: Ariel Elior <Ariel.Elior@qlogic.com>
Cc: Yuval Mintz <Yuval.Mintz@qlogic.com>
Cc: David Decotigny <decot@google.com>
Acked-by: Sony Chacko <sony.chacko@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 years agonet: phy: fix phy link up when limiting speed via device tree
Mugunthan V N [Thu, 25 Jun 2015 16:51:02 +0000 (22:21 +0530)]
net: phy: fix phy link up when limiting speed via device tree

[ Upstream commit eb686231fce3770299760f24fdcf5ad041f44153 ]

When limiting phy link speed using "max-speed" to 100mbps or less on a
giga bit phy, phy never completes auto negotiation and phy state
machine is held in PHY_AN. Fixing this issue by comparing the giga
bit advertise though phydev->supported doesn't have it but phy has
BMSR_ESTATEN set. So that auto negotiation is restarted as old and
new advertise are different and link comes up fine.

Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 years agomlx4: Disable HA for SRIOV PF RoCE devices
Or Gerlitz [Thu, 25 Jun 2015 08:29:44 +0000 (11:29 +0300)]
mlx4: Disable HA for SRIOV PF RoCE devices

[ Upstream commit 7254acffeeec3c0a75b9c5364c29a6eb00014930 ]

When in HA mode, the driver exposes an IB (RoCE) device instance with only
one port. Under SRIOV, the existing implementation doesn't go well with
the PF RoCE driver's role of Special QPs Para-Virtualization, etc.

As such, disable HA for the mlx4 PF RoCE device in SRIOV mode.

Fixes: a57500903093 ('IB/mlx4: Add port aggregation support')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 years agonet/mlx4_en: Fix wrong csum complete report when rxvlan offload is disabled
Ido Shamay [Thu, 25 Jun 2015 08:29:43 +0000 (11:29 +0300)]
net/mlx4_en: Fix wrong csum complete report when rxvlan offload is disabled

[ Upstream commit 79a258526ce1051cb9684018c25a89d51ac21be8 ]

The check_csum() function relied on hwtstamp_rx_filter to know if rxvlan
offload is disabled. This is wrong since rxvlan offload can be switched
on/off regardless of hwtstamp_rx_filter.

Also moved check_csum to query CQE information to identify VLAN packets
and removed the check of IP packets, since it has been validated before.

Fixes: f8c6455bb04b ('net/mlx4_en: Extend checksum offloading by CHECKSUM COMPLETE')
Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 years agonet/mlx4_en: Wake TX queues only when there's enough room
Ido Shamay [Thu, 25 Jun 2015 08:29:42 +0000 (11:29 +0300)]
net/mlx4_en: Wake TX queues only when there's enough room

[ Upstream commit 488a9b48e398b157703766e2cd91ea45ac6997c5 ]

Indication of a single completed packet, marked by txbbs_skipped
being bigger then zero, in not enough in order to wake up a
stopped TX queue. The completed packet may contain a single TXBB,
while next packet to be sent (after the wake up) may have multiple
TXBBs (LSO/TSO packets for example), causing overflow in queue followed
by WQE corruption and TX queue timeout.
Instead, wake the stopped queue only when there's enough room for the
worst case (maximum sized WQE) packet that we should need to handle after
the queue is opened again.

Also created an helper routine - mlx4_en_is_tx_ring_full, which checks
if the current TX ring is full or not. It provides better code readability
and removes code duplication.

Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 years agonet/mlx4_en: Release TX QP when destroying TX ring
Eran Ben Elisha [Thu, 25 Jun 2015 08:29:41 +0000 (11:29 +0300)]
net/mlx4_en: Release TX QP when destroying TX ring

[ Upstream commit 0eb08514fdbdcd16fd6870680cd638f203662e9d ]

TX ring QP wasn't released at mlx4_en_destroy_tx_ring. Instead, the code
used the deprecated base_tx_qpn field. Move TX QP release to
mlx4_en_destroy_tx_ring and remove the base_tx_qpn field.

Fixes: ddae0349fdb7 ('net/mlx4: Change QP allocation scheme')
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 years agoip: report the original address of ICMP messages
Julian Anastasov [Tue, 23 Jun 2015 05:34:39 +0000 (08:34 +0300)]
ip: report the original address of ICMP messages

[ Upstream commit 34b99df4e6256ddafb663c6de0711dceceddfe0e ]

ICMP messages can trigger ICMP and local errors. In this case
serr->port is 0 and starting from Linux 4.0 we do not return
the original target address to the error queue readers.
Add function to define which errors provide addr_offset.
With this fix my ping command is not silent anymore.

Fixes: c247f0534cc5 ("ip: fix error queue empty skb handling")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 years agotcp: Do not call tcp_fastopen_reset_cipher from interrupt context
Christoph Paasch [Thu, 18 Jun 2015 16:15:34 +0000 (09:15 -0700)]
tcp: Do not call tcp_fastopen_reset_cipher from interrupt context

[ Upstream commit dfea2aa654243f70dc53b8648d0bbdeec55a7df1 ]

tcp_fastopen_reset_cipher really cannot be called from interrupt
context. It allocates the tcp_fastopen_context with GFP_KERNEL and
calls crypto_alloc_cipher, which allocates all kind of stuff with
GFP_KERNEL.

Thus, we might sleep when the key-generation is triggered by an
incoming TFO cookie-request which would then happen in interrupt-
context, as shown by enabling CONFIG_DEBUG_ATOMIC_SLEEP:

[   36.001813] BUG: sleeping function called from invalid context at mm/slub.c:1266
[   36.003624] in_atomic(): 1, irqs_disabled(): 0, pid: 1016, name: packetdrill
[   36.004859] CPU: 1 PID: 1016 Comm: packetdrill Not tainted 4.1.0-rc7 #14
[   36.006085] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[   36.008250]  00000000000004f2 ffff88007f8838a8 ffffffff8171d53a ffff880075a084a8
[   36.009630]  ffff880075a08000 ffff88007f8838c8 ffffffff810967d3 ffff88007f883928
[   36.011076]  0000000000000000 ffff88007f8838f8 ffffffff81096892 ffff88007f89be00
[   36.012494] Call Trace:
[   36.012953]  <IRQ>  [<ffffffff8171d53a>] dump_stack+0x4f/0x6d
[   36.014085]  [<ffffffff810967d3>] ___might_sleep+0x103/0x170
[   36.015117]  [<ffffffff81096892>] __might_sleep+0x52/0x90
[   36.016117]  [<ffffffff8118e887>] kmem_cache_alloc_trace+0x47/0x190
[   36.017266]  [<ffffffff81680d82>] ? tcp_fastopen_reset_cipher+0x42/0x130
[   36.018485]  [<ffffffff81680d82>] tcp_fastopen_reset_cipher+0x42/0x130
[   36.019679]  [<ffffffff81680f01>] tcp_fastopen_init_key_once+0x61/0x70
[   36.020884]  [<ffffffff81680f2c>] __tcp_fastopen_cookie_gen+0x1c/0x60
[   36.022058]  [<ffffffff816814ff>] tcp_try_fastopen+0x58f/0x730
[   36.023118]  [<ffffffff81671788>] tcp_conn_request+0x3e8/0x7b0
[   36.024185]  [<ffffffff810e3872>] ? __module_text_address+0x12/0x60
[   36.025327]  [<ffffffff8167b2e1>] tcp_v4_conn_request+0x51/0x60
[   36.026410]  [<ffffffff816727e0>] tcp_rcv_state_process+0x190/0xda0
[   36.027556]  [<ffffffff81661f97>] ? __inet_lookup_established+0x47/0x170
[   36.028784]  [<ffffffff8167c2ad>] tcp_v4_do_rcv+0x16d/0x3d0
[   36.029832]  [<ffffffff812e6806>] ? security_sock_rcv_skb+0x16/0x20
[   36.030936]  [<ffffffff8167cc8a>] tcp_v4_rcv+0x77a/0x7b0
[   36.031875]  [<ffffffff816af8c3>] ? iptable_filter_hook+0x33/0x70
[   36.032953]  [<ffffffff81657d22>] ip_local_deliver_finish+0x92/0x1f0
[   36.034065]  [<ffffffff81657f1a>] ip_local_deliver+0x9a/0xb0
[   36.035069]  [<ffffffff81657c90>] ? ip_rcv+0x3d0/0x3d0
[   36.035963]  [<ffffffff81657569>] ip_rcv_finish+0x119/0x330
[   36.036950]  [<ffffffff81657ba7>] ip_rcv+0x2e7/0x3d0
[   36.037847]  [<ffffffff81610652>] __netif_receive_skb_core+0x552/0x930
[   36.038994]  [<ffffffff81610a57>] __netif_receive_skb+0x27/0x70
[   36.040033]  [<ffffffff81610b72>] process_backlog+0xd2/0x1f0
[   36.041025]  [<ffffffff81611482>] net_rx_action+0x122/0x310
[   36.042007]  [<ffffffff81076743>] __do_softirq+0x103/0x2f0
[   36.042978]  [<ffffffff81723e3c>] do_softirq_own_stack+0x1c/0x30

This patch moves the call to tcp_fastopen_init_key_once to the places
where a listener socket creates its TFO-state, which always happens in
user-context (either from the setsockopt, or implicitly during the
listen()-call)

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Fixes: 222e83d2e0ae ("tcp: switch tcp_fastopen key generation to net_get_random_once")
Signed-off-by: Christoph Paasch <cpaasch@apple.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 years agoneigh: do not modify unlinked entries
Julian Anastasov [Tue, 16 Jun 2015 19:56:39 +0000 (22:56 +0300)]
neigh: do not modify unlinked entries

[ Upstream commit 2c51a97f76d20ebf1f50fef908b986cb051fdff9 ]

The lockless lookups can return entry that is unlinked.
Sometimes they get reference before last neigh_cleanup_and_release,
sometimes they do not need reference. Later, any
modification attempts may result in the following problems:

1. entry is not destroyed immediately because neigh_update
can start the timer for dead entry, eg. on change to NUD_REACHABLE
state. As result, entry lives for some time but is invisible
and out of control.

2. __neigh_event_send can run in parallel with neigh_destroy
while refcnt=0 but if timer is started and expired refcnt can
reach 0 for second time leading to second neigh_destroy and
possible crash.

Thanks to Eric Dumazet and Ying Xue for their work and analyze
on the __neigh_event_send change.

Fixes: 767e97e1e0db ("neigh: RCU conversion of struct neighbour")
Fixes: a263b3093641 ("ipv4: Make neigh lookups directly in output packet path.")
Fixes: 6fd6ce2056de ("ipv6: Do not depend on rt->n in ip6_finish_output2().")
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 years agopacket: avoid out of bounds read in round robin fanout
Willem de Bruijn [Wed, 17 Jun 2015 19:59:34 +0000 (15:59 -0400)]
packet: avoid out of bounds read in round robin fanout

[ Upstream commit 468479e6043c84f5a65299cc07cb08a22a28c2b1 ]

PACKET_FANOUT_LB computes f->rr_cur such that it is modulo
f->num_members. It returns the old value unconditionally, but
f->num_members may have changed since the last store. Ensure
that the return value is always < num.

When modifying the logic, simplify it further by replacing the loop
with an unconditional atomic increment.

Fixes: dc99f600698d ("packet: Add fanout support.")
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 years agopacket: read num_members once in packet_rcv_fanout()
Eric Dumazet [Tue, 16 Jun 2015 14:59:11 +0000 (07:59 -0700)]
packet: read num_members once in packet_rcv_fanout()

[ Upstream commit f98f4514d07871da7a113dd9e3e330743fd70ae4 ]

We need to tell compiler it must not read f->num_members multiple
times. Otherwise testing if num is not zero is flaky, and we could
attempt an invalid divide by 0 in fanout_demux_cpu()

Note bug was present in packet_rcv_fanout_hash() and
packet_rcv_fanout_lb() but final 3.1 had a simple location
after commit 95ec3eb417115fb ("packet: Add 'cpu' fanout policy.")

Fixes: dc99f600698dc ("packet: Add fanout support.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 years agobridge: fix br_stp_set_bridge_priority race conditions
Nikolay Aleksandrov [Mon, 15 Jun 2015 17:28:51 +0000 (20:28 +0300)]
bridge: fix br_stp_set_bridge_priority race conditions

[ Upstream commit 2dab80a8b486f02222a69daca6859519e05781d9 ]

After the ->set() spinlocks were removed br_stp_set_bridge_priority
was left running without any protection when used via sysfs. It can
race with port add/del and could result in use-after-free cases and
corrupted lists. Tested by running port add/del in a loop with stp
enabled while setting priority in a loop, crashes are easily
reproducible.
The spinlocks around sysfs ->set() were removed in commit:
14f98f258f19 ("bridge: range check STP parameters")
There's also a race condition in the netlink priority support that is
fixed by this change, but it was introduced recently and the fixes tag
covers it, just in case it's needed the commit is:
af615762e972 ("bridge: add ageing_time, stp_state, priority over netlink")

Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Fixes: 14f98f258f19 ("bridge: range check STP parameters")
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 years agosctp: fix ASCONF list handling
Marcelo Ricardo Leitner [Fri, 12 Jun 2015 13:16:41 +0000 (10:16 -0300)]
sctp: fix ASCONF list handling

[ Upstream commit 2d45a02d0166caf2627fe91897c6ffc3b19514c4 ]

->auto_asconf_splist is per namespace and mangled by functions like
sctp_setsockopt_auto_asconf() which doesn't guarantee any serialization.

Also, the call to inet_sk_copy_descendant() was backuping
->auto_asconf_list through the copy but was not honoring
->do_auto_asconf, which could lead to list corruption if it was
different between both sockets.

This commit thus fixes the list handling by using ->addr_wq_lock
spinlock to protect the list. A special handling is done upon socket
creation and destruction for that. Error handlig on sctp_init_sock()
will never return an error after having initialized asconf, so
sctp_destroy_sock() can be called without addrq_wq_lock. The lock now
will be take on sctp_close_sock(), before locking the socket, so we
don't do it in inverse order compared to sctp_addr_wq_timeout_handler().

Instead of taking the lock on sctp_sock_migrate() for copying and
restoring the list values, it's preferred to avoid rewritting it by
implementing sctp_copy_descendant().

Issue was found with a test application that kept flipping sysctl
default_auto_asconf on and off, but one could trigger it by issuing
simultaneous setsockopt() calls on multiple sockets or by
creating/destroying sockets fast enough. This is only triggerable
locally.

Fixes: 9f7d653b67ae ("sctp: Add Auto-ASCONF support (core).")
Reported-by: Ji Jianwen <jiji@redhat.com>
Suggested-by: Neil Horman <nhorman@tuxdriver.com>
Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 years agonet: don't wait for order-3 page allocation
Shaohua Li [Thu, 11 Jun 2015 23:50:48 +0000 (16:50 -0700)]
net: don't wait for order-3 page allocation

[ Upstream commit fb05e7a89f500cfc06ae277bdc911b281928995d ]

We saw excessive direct memory compaction triggered by skb_page_frag_refill.
This causes performance issues and add latency. Commit 5640f7685831e0
introduces the order-3 allocation. According to the changelog, the order-3
allocation isn't a must-have but to improve performance. But direct memory
compaction has high overhead. The benefit of order-3 allocation can't
compensate the overhead of direct memory compaction.

This patch makes the order-3 page allocation atomic. If there is no memory
pressure and memory isn't fragmented, the alloction will still success, so we
don't sacrifice the order-3 benefit here. If the atomic allocation fails,
direct memory compaction will not be triggered, skb_page_frag_refill will
fallback to order-0 immediately, hence the direct memory compaction overhead is
avoided. In the allocation failure case, kswapd is waken up and doing
compaction, so chances are allocation could success next time.

alloc_skb_with_frags is the same.

The mellanox driver does similar thing, if this is accepted, we must fix
the driver too.

V3: fix the same issue in alloc_skb_with_frags as pointed out by Eric
V2: make the changelog clearer

Cc: Eric Dumazet <edumazet@google.com>
Cc: Chris Mason <clm@fb.com>
Cc: Debabrata Banerjee <dbavatar@gmail.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 years agonet: igb: fix the start time for periodic output signals
Richard Cochran [Thu, 11 Jun 2015 12:51:30 +0000 (14:51 +0200)]
net: igb: fix the start time for periodic output signals

[ Upstream commit 58c98be137830d34b79024cc5dc95ef54fcd7ffe ]

When programming the start of a periodic output, the code wrongly places
the seconds value into the "low" register and the nanoseconds into the
"high" register.  Even though this is backwards, it slipped through my
testing, because the re-arming code in the interrupt service routine is
correct, and the signal does appear starting with the second edge.

This patch fixes the issue by programming the registers correctly.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 years agobridge: fix multicast router rlist endless loop
Nikolay Aleksandrov [Tue, 9 Jun 2015 17:23:57 +0000 (10:23 -0700)]
bridge: fix multicast router rlist endless loop

[ Upstream commit 1a040eaca1a22f8da8285ceda6b5e4a2cb704867 ]

Since the addition of sysfs multicast router support if one set
multicast_router to "2" more than once, then the port would be added to
the hlist every time and could end up linking to itself and thus causing an
endless loop for rlist walkers.
So to reproduce just do:
echo 2 > multicast_router; echo 2 > multicast_router;
in a bridge port and let some igmp traffic flow, for me it hangs up
in br_multicast_flood().
Fix this by adding a check in br_multicast_add_router() if the port is
already linked.
The reason this didn't happen before the addition of multicast_router
sysfs entries is because there's a !hlist_unhashed check that prevents
it.

Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Fixes: 0909e11758bd ("bridge: Add multicast_router sysfs entries")
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 years agosparc: Use GFP_ATOMIC in ldc_alloc_exp_dring() as it can be called in softirq context
Sowmini Varadhan [Tue, 21 Apr 2015 14:30:41 +0000 (10:30 -0400)]
sparc: Use GFP_ATOMIC in ldc_alloc_exp_dring() as it can be called in softirq context

Upstream commit 671d773297969bebb1732e1cdc1ec03aa53c6be2

Since it is possible for vnet_event_napi to end up doing
vnet_control_pkt_engine -> ... -> vnet_send_attr ->
vnet_port_alloc_tx_ring -> ldc_alloc_exp_dring -> kzalloc()
(i.e., in softirq context), kzalloc() should be called with
GFP_ATOMIC from ldc_alloc_exp_dring.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 years agoKVM: nSVM: Check for NRIPS support before updating control field
Bandan Das [Thu, 11 Jun 2015 06:05:33 +0000 (02:05 -0400)]
KVM: nSVM: Check for NRIPS support before updating control field

commit f104765b4f81fd74d69e0eb161e89096deade2db upstream.

If hardware doesn't support DecodeAssist - a feature that provides
more information about the intercept in the VMCB, KVM decodes the
instruction and then updates the next_rip vmcb control field.
However, NRIP support itself depends on cpuid Fn8000_000A_EDX[NRIPS].
Since skip_emulated_instruction() doesn't verify nrip support
before accepting control.next_rip as valid, avoid writing this
field if support isn't present.

Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 years agoARM: clk-imx6q: refine sata's parent
Sebastien Szymanski [Wed, 20 May 2015 14:30:37 +0000 (16:30 +0200)]
ARM: clk-imx6q: refine sata's parent

commit da946aeaeadcd24ff0cda9984c6fb8ed2bfd462a upstream.

According to IMX6D/Q RM, table 18-3, sata clock's parent is ahb, not ipg.

Signed-off-by: Sebastien Szymanski <sebastien.szymanski@armadeus.com>
Reviewed-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
[dirk.behme: Adjust moved file]
Signed-off-by: Dirk Behme <dirk.behme@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 years agonetfilter: nft_rbtree: fix locking
Patrick McHardy [Sat, 21 Mar 2015 15:19:14 +0000 (15:19 +0000)]
netfilter: nft_rbtree: fix locking

commit 16c45eda96038aae848b6cfd42e2bf4b5e80f365 upstream.

Fix a race condition and unnecessary locking:

* the root rb_node must only be accessed under the lock in nft_rbtree_lookup()
* the lock is not needed in lookup functions in netlink context

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 years agoconfig: Enable NEED_DMA_MAP_STATE by default when SWIOTLB is selected
Konrad Rzeszutek Wilk [Fri, 17 Apr 2015 19:04:48 +0000 (15:04 -0400)]
config: Enable NEED_DMA_MAP_STATE by default when SWIOTLB is selected

commit a6dfa128ce5c414ab46b1d690f7a1b8decb8526d upstream.

A huge amount of NIC drivers use the DMA API, however if
compiled under 32-bit an very important part of the DMA API can
be ommitted leading to the drivers not working at all
(especially if used with 'swiotlb=force iommu=soft').

As Prashant Sreedharan explains it: "the driver [tg3] uses
DEFINE_DMA_UNMAP_ADDR(), dma_unmap_addr_set() to keep a copy of
the dma "mapping" and dma_unmap_addr() to get the "mapping"
value. On most of the platforms this is a no-op, but ... with
"iommu=soft and swiotlb=force" this house keeping is required,
... otherwise we pass 0 while calling pci_unmap_/pci_dma_sync_
instead of the DMA address."

As such enable this even when using 32-bit kernels.

Reported-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Prashant Sreedharan <prashant@broadcom.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Chan <mchan@broadcom.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: boris.ostrovsky@oracle.com
Cc: cascardo@linux.vnet.ibm.com
Cc: david.vrabel@citrix.com
Cc: sanjeevb@broadcom.com
Cc: siva.kallam@broadcom.com
Cc: vyasevich@gmail.com
Cc: xen-devel@lists.xensource.com
Link: http://lkml.kernel.org/r/20150417190448.GA9462@l.oracle.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoBtrfs: fix list transaction->pending_ordered corruption zygo-4.0.7-zb64
Filipe Manana [Fri, 3 Jul 2015 19:30:34 +0000 (20:30 +0100)]
Btrfs: fix list transaction->pending_ordered corruption

When we call btrfs_commit_transaction(), we splice the list "ordered"
of our transaction handle into the transaction's "pending_ordered"
list, but we don't re-initialize the "ordered" list of our transaction
handle, this means it still points to the same elements it used to
before the splice. Then we check if the current transaction's state is
>= TRANS_STATE_COMMIT_START and if it is we end up calling
btrfs_end_transaction() which simply splices again the "ordered" list
of our handle into the transaction's "pending_ordered" list, leaving
multiple pointers to the same ordered extents which results in list
corruption when we are iterating, removing and freeing ordered extents
at btrfs_wait_pending_ordered(), resulting in access to dangling
pointers / use-after-free issues.
Similarly, btrfs_end_transaction() can end up in some cases calling
btrfs_commit_transaction(), and both did a list splice of the transaction
handle's "ordered" list into the transaction's "pending_ordered" without
re-initializing the handle's "ordered" list, resulting in exactly the
same problem.

This produces the following warning on a kernel with linked list
debugging enabled:

[109749.265416] ------------[ cut here ]------------
[109749.266410] WARNING: CPU: 7 PID: 324 at lib/list_debug.c:59 __list_del_entry+0x5a/0x98()
[109749.267969] list_del corruption. prev->next should be ffff8800ba087e20, but was fffffff8c1f7c35d
(...)
[109749.287505] Call Trace:
[109749.288135]  [<ffffffff8145f077>] dump_stack+0x4f/0x7b
[109749.298080]  [<ffffffff81095de5>] ? console_unlock+0x356/0x3a2
[109749.331605]  [<ffffffff8104b3b0>] warn_slowpath_common+0xa1/0xbb
[109749.334849]  [<ffffffff81260642>] ? __list_del_entry+0x5a/0x98
[109749.337093]  [<ffffffff8104b410>] warn_slowpath_fmt+0x46/0x48
[109749.337847]  [<ffffffff81260642>] __list_del_entry+0x5a/0x98
[109749.338678]  [<ffffffffa053e8bf>] btrfs_wait_pending_ordered+0x46/0xdb [btrfs]
[109749.340145]  [<ffffffffa058a65f>] ? __btrfs_run_delayed_items+0x149/0x163 [btrfs]
[109749.348313]  [<ffffffffa054077d>] btrfs_commit_transaction+0x36b/0xa10 [btrfs]
[109749.349745]  [<ffffffff81087310>] ? trace_hardirqs_on+0xd/0xf
[109749.350819]  [<ffffffffa055370d>] btrfs_sync_file+0x36f/0x3fc [btrfs]
[109749.351976]  [<ffffffff8118ec98>] vfs_fsync_range+0x8f/0x9e
[109749.360341]  [<ffffffff8118ecc3>] vfs_fsync+0x1c/0x1e
[109749.368828]  [<ffffffff8118ee1d>] do_fsync+0x34/0x4e
[109749.369790]  [<ffffffff8118f045>] SyS_fsync+0x10/0x14
[109749.370925]  [<ffffffff81465197>] system_call_fastpath+0x12/0x6f
[109749.382274] ---[ end trace 48e0d07f7c03d95a ]---

On a non-debug kernel this leads to invalid memory accesses, causing a
crash. Fix this by using list_splice_init() instead of list_splice() in
btrfs_commit_transaction() and btrfs_end_transaction().

Cc: stable@vger.kernel.org
Fixes: 50d9aa99bd35 ("Btrfs: make sure logged extents complete in the current transaction V3"
Signed-off-by: Filipe Manana <fdmanana@suse.com>
(cherry picked from commit eab82377945d1d039982f568cc6583bb71ceb3f3)

10 years agoBtrfs: use kmem_cache_free when freeing entry in inode cache
Filipe Manana [Fri, 12 Jun 2015 08:35:35 +0000 (09:35 +0100)]
Btrfs: use kmem_cache_free when freeing entry in inode cache

The free space entries are allocated using kmem_cache_zalloc(),
through __btrfs_add_free_space(), therefore we should use
kmem_cache_free() and not kfree() to avoid any confusion and
any potential problem. Looking at the kfree() definition at
mm/slab.c it has the following comment:

  /*
   * (...)
   *
   * Don't free memory not originally allocated by kmalloc()
   * or you will run into trouble.
   */

So better be safe and use kmem_cache_free().

Cc: stable@vger.kernel.org
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
(cherry picked from commit af6bf76d1de143a38c919572462899e9f1fc477f)

10 years agoBtrfs: fix crash on close_ctree() if cleaner starts new transaction
Filipe Manana [Fri, 12 Jun 2015 14:18:18 +0000 (15:18 +0100)]
Btrfs: fix crash on close_ctree() if cleaner starts new transaction

Often when running fstests btrfs/079 I was running into the following
trace during umount on one of my qemu/kvm test vms:

[ 8245.682441] WARNING: CPU: 8 PID: 25064 at fs/btrfs/extent-tree.c:138 btrfs_put_block_group+0x51/0x69 [btrfs]()
[ 8245.685039] Modules linked in: btrfs dm_flakey dm_mod crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop fuse parport_pc i2c_piix4 acpi_cpufreq processor psmouse i2c_core thermal_sys parport evdev serio_raw button pcspkr microcode ext4 crc16 jbd2 mbcache sg sr_mod cdrom sd_mod ata_generic virtio_scsi ata_piix libata floppy virtio_pci virtio_ring scsi_mod virtio e1000 [last unloaded: btrfs]
[ 8245.693860] CPU: 8 PID: 25064 Comm: umount Tainted: G        W       4.1.0-rc5-btrfs-next-10+ #1
[ 8245.695081] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014
[ 8245.697583]  0000000000000009 ffff88020d047ce8 ffffffff8145eec7 ffffffff81095dce
[ 8245.699234]  0000000000000000 ffff88020d047d28 ffffffff8104b399 0000000000000028
[ 8245.700995]  ffffffffa04db07b ffff8801c6036c00 ffff8801c6036d68 ffff880202eb40b0
[ 8245.702510] Call Trace:
[ 8245.703006]  [<ffffffff8145eec7>] dump_stack+0x4f/0x7b
[ 8245.705393]  [<ffffffff81095dce>] ? console_unlock+0x356/0x3a2
[ 8245.706569]  [<ffffffff8104b399>] warn_slowpath_common+0xa1/0xbb
[ 8245.707747]  [<ffffffffa04db07b>] ? btrfs_put_block_group+0x51/0x69 [btrfs]
[ 8245.709101]  [<ffffffff8104b456>] warn_slowpath_null+0x1a/0x1c
[ 8245.710274]  [<ffffffffa04db07b>] btrfs_put_block_group+0x51/0x69 [btrfs]
[ 8245.711823]  [<ffffffffa04e3473>] btrfs_free_block_groups+0x145/0x322 [btrfs]
[ 8245.713251]  [<ffffffffa04ef31a>] close_ctree+0x1ef/0x325 [btrfs]
[ 8245.714448]  [<ffffffff8117d26e>] ? evict_inodes+0xdc/0xeb
[ 8245.715539]  [<ffffffffa04cb3ad>] btrfs_put_super+0x19/0x1b [btrfs]
[ 8245.716835]  [<ffffffff81167607>] generic_shutdown_super+0x73/0xef
[ 8245.718015]  [<ffffffff81167a3a>] kill_anon_super+0x13/0x1e
[ 8245.719101]  [<ffffffffa04cb1b6>] btrfs_kill_super+0x17/0x23 [btrfs]
[ 8245.720316]  [<ffffffff81167544>] deactivate_locked_super+0x3b/0x68
[ 8245.721517]  [<ffffffff81167dd6>] deactivate_super+0x3f/0x43
[ 8245.722581]  [<ffffffff8117fbb9>] cleanup_mnt+0x59/0x78
[ 8245.723538]  [<ffffffff8117fc18>] __cleanup_mnt+0x12/0x14
[ 8245.724572]  [<ffffffff81065371>] task_work_run+0x8f/0xbc
[ 8245.725598]  [<ffffffff810028fb>] do_notify_resume+0x45/0x53
[ 8245.726892]  [<ffffffff814651ac>] int_signal+0x12/0x17
[ 8245.737887] ---[ end trace a01d038397e99b92 ]---
[ 8245.769363] general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[ 8245.770737] Modules linked in: btrfs dm_flakey dm_mod crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop fuse parport_pc i2c_piix4 acpi_cpufreq processor psmouse i2c_core thermal_sys parport evdev serio_raw button pcspkr microcode ext4 crc16 jbd2 mbcache sg sr_mod cdrom sd_mod ata_generic virtio_scsi ata_piix libata floppy virtio_pci virtio_ring scsi_mod virtio e1000 [last unloaded: btrfs]
[ 8245.772641] CPU: 2 PID: 25064 Comm: umount Tainted: G        W       4.1.0-rc5-btrfs-next-10+ #1
[ 8245.772641] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014
[ 8245.772641] task: ffff880013005810 ti: ffff88020d044000 task.ti: ffff88020d044000
[ 8245.772641] RIP: 0010:[<ffffffffa051c8e6>]  [<ffffffffa051c8e6>] btrfs_queue_work+0x2c/0x14d [btrfs]
[ 8245.772641] RSP: 0018:ffff88020d0478b8  EFLAGS: 00010202
[ 8245.772641] RAX: 0000000000000004 RBX: 6b6b6b6b6b6b6b6b RCX: ffffffffa0581488
[ 8245.772641] RDX: 0000000000000000 RSI: ffff880194b7bf48 RDI: ffff880144b6a7a0
[ 8245.772641] RBP: ffff88020d0478d8 R08: 0000000000000000 R09: 000000000000ffff
[ 8245.772641] R10: 0000000000000004 R11: 0000000000000005 R12: ffff880194b7bf48
[ 8245.772641] R13: ffff880194b7bf48 R14: 0000000000000410 R15: 0000000000000000
[ 8245.772641] FS:  00007f991e77d840(0000) GS:ffff88023e280000(0000) knlGS:0000000000000000
[ 8245.772641] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 8245.772641] CR2: 00007fbbd325ee68 CR3: 000000021de8e000 CR4: 00000000000006e0
[ 8245.772641] Stack:
[ 8245.772641]  ffff880194b7bf00 ffff880202eb4000 ffff880194b7bf48 0000000000000410
[ 8245.772641]  ffff88020d047958 ffffffffa04ec6d5 ffff8801629b2ee8 0000000082987570
[ 8245.772641]  0000000000a5813f 0000000000000001 ffff880013006100 0000000000000002
[ 8245.772641] Call Trace:
[ 8245.772641]  [<ffffffffa04ec6d5>] btrfs_wq_submit_bio+0xe1/0x17b [btrfs]
[ 8245.772641]  [<ffffffff81086bff>] ? check_irq_usage+0x76/0x87
[ 8245.772641]  [<ffffffffa04ec825>] btree_submit_bio_hook+0xb6/0xd9 [btrfs]
[ 8245.772641]  [<ffffffffa04ebb7c>] ? btree_csum_one_bio+0xad/0xad [btrfs]
[ 8245.772641]  [<ffffffffa04eb1a6>] ? btree_io_failed_hook+0x5e/0x5e [btrfs]
[ 8245.772641]  [<ffffffffa050a6e7>] submit_one_bio+0x8c/0xc7 [btrfs]
[ 8245.772641]  [<ffffffffa050d75b>] submit_extent_page.isra.18+0x9d/0x186 [btrfs]
[ 8245.772641]  [<ffffffffa050d95b>] write_one_eb+0x117/0x1ae [btrfs]
[ 8245.772641]  [<ffffffffa050a79b>] ? end_extent_buffer_writeback+0x21/0x21 [btrfs]
[ 8245.772641]  [<ffffffffa0510510>] btree_write_cache_pages+0x2ab/0x385 [btrfs]
[ 8245.772641]  [<ffffffffa04eb2b8>] btree_writepages+0x23/0x5c [btrfs]
[ 8245.772641]  [<ffffffff8111c661>] do_writepages+0x23/0x2c
[ 8245.772641]  [<ffffffff81189cd4>] __writeback_single_inode+0xda/0x5bd
[ 8245.772641]  [<ffffffff8118aa60>] ? writeback_single_inode+0x2b/0x173
[ 8245.772641]  [<ffffffff8118aafd>] writeback_single_inode+0xc8/0x173
[ 8245.772641]  [<ffffffff8118ac95>] write_inode_now+0x8a/0x95
[ 8245.772641]  [<ffffffff81247bf0>] ? _atomic_dec_and_lock+0x30/0x4e
[ 8245.772641]  [<ffffffff8117cc5e>] iput+0x17d/0x26a
[ 8245.772641]  [<ffffffffa04ef355>] close_ctree+0x22a/0x325 [btrfs]
[ 8245.772641]  [<ffffffff8117d26e>] ? evict_inodes+0xdc/0xeb
[ 8245.772641]  [<ffffffffa04cb3ad>] btrfs_put_super+0x19/0x1b [btrfs]
[ 8245.772641]  [<ffffffff81167607>] generic_shutdown_super+0x73/0xef
[ 8245.772641]  [<ffffffff81167a3a>] kill_anon_super+0x13/0x1e
[ 8245.772641]  [<ffffffffa04cb1b6>] btrfs_kill_super+0x17/0x23 [btrfs]
[ 8245.772641]  [<ffffffff81167544>] deactivate_locked_super+0x3b/0x68
[ 8245.772641]  [<ffffffff81167dd6>] deactivate_super+0x3f/0x43
[ 8245.772641]  [<ffffffff8117fbb9>] cleanup_mnt+0x59/0x78
[ 8245.772641]  [<ffffffff8117fc18>] __cleanup_mnt+0x12/0x14
[ 8245.772641]  [<ffffffff81065371>] task_work_run+0x8f/0xbc
[ 8245.772641]  [<ffffffff810028fb>] do_notify_resume+0x45/0x53
[ 8245.772641]  [<ffffffff814651ac>] int_signal+0x12/0x17
[ 8245.772641] Code: 1f 44 00 00 55 48 89 e5 41 56 41 55 41 54 53 49 89 f4 48 8b 46 70 a8 04 74 09 48 8b 5f 08 48 85 db 75 03 48 8b 1f 49 89 5c 24 68 <83> 7b 5c ff 74 04 f0 ff 43 50 49 83 7c 24 08 00 74 2c 4c 8d 6b
[ 8245.772641] RIP  [<ffffffffa051c8e6>] btrfs_queue_work+0x2c/0x14d [btrfs]
[ 8245.772641]  RSP <ffff88020d0478b8>
[ 8245.845040] ---[ end trace a01d038397e99b93 ]---

For logical reasons such as the phase of the moon, this happened more
often with "-o inode_cache" than without any mount options.

After some debugging it turned out to be simple to understand what was
happening:

1) close_ctree() is called;

2) It then stops the transaction kthread, which commits the current
   transaction;

3) It asks the cleaner kthread to stop, which is currently running
   btrfs_delete_unused_bgs();

4) btrfs_delete_unused_bgs() finds an unused block group, starts a new
   transaction, deletes the block group, which implies COWing some
   tree nodes and leafs and dirtying their respective pages, and then
   finally it ends the transaction it started, without committing it;

5) The cleaner kthread stops;

6) close_ctree() releases (from memory) the block group objects, which
   produces the warning in the trace pasted above;

7) Then it invalidates all pages of the btree inode, by calling
   invalidate_inode_pages2(), which waits for any pages under writeback,
   and releases any non-dirty pages;

8) All work queues are destroyed (waiting first for their current tasks
   to finish execution);

9) A final iput() is called against the btree inode;

10) This iput triggers a writeback of the btree inode because it still
    has dirty pages;

11) This starts the whole chain of callbacks for the btree inode until
    it eventually reaches btrfs_wq_submit_bio() where it leads to a
    NULL pointer dereference because the work queues were already
    destroyed.

Fix this by making the cleaner commit any transaction that it started
after the transaction kthread was stopped.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
(cherry picked from commit 8f432c88587ccefac61066a1abfd01fb38f7717e)

10 years agoBtrfs: fix fsync data loss after append write
Filipe Manana [Wed, 17 Jun 2015 09:16:23 +0000 (10:16 +0100)]
Btrfs: fix fsync data loss after append write

If we do an append write to a file (which increases its inode's i_size)
that does not have the flag BTRFS_INODE_NEEDS_FULL_SYNC set in its inode,
and the previous transaction added a new hard link to the file, which sets
the flag BTRFS_INODE_COPY_EVERYTHING in the file's inode, and then fsync
the file, the inode's new i_size isn't logged. This has the consequence
that after the fsync log is replayed, the file size remains what it was
before the append write operation, which means users/applications will
not be able to read the data that was successsfully fsync'ed before.

This happens because neither the inode item nor the delayed inode get
their i_size updated when the append write is made - doing so would
require starting a transaction in the buffered write path, something that
we do not do intentionally for performance reasons.

Fix this by making sure that when the flag BTRFS_INODE_COPY_EVERYTHING is
set the inode is logged with its current i_size (log the in-memory inode
into the log tree).

This issue is not a recent regression and is easy to reproduce with the
following test case for fstests:

  seq=`basename $0`
  seqres=$RESULT_DIR/$seq
  echo "QA output created by $seq"

  here=`pwd`
  tmp=/tmp/$$
  status=1 # failure is the default!

  _cleanup()
  {
          _cleanup_flakey
          rm -f $tmp.*
  }
  trap "_cleanup; exit \$status" 0 1 2 3 15

  # get standard environment, filters and checks
  . ./common/rc
  . ./common/filter
  . ./common/dmflakey

  # real QA test starts here
  _supported_fs generic
  _supported_os Linux
  _need_to_be_root
  _require_scratch
  _require_dm_flakey
  _require_metadata_journaling $SCRATCH_DEV

  _crash_and_mount()
  {
          # Simulate a crash/power loss.
          _load_flakey_table $FLAKEY_DROP_WRITES
          _unmount_flakey
          # Allow writes again and mount. This makes the fs replay its fsync log.
          _load_flakey_table $FLAKEY_ALLOW_WRITES
          _mount_flakey
  }

  rm -f $seqres.full

  _scratch_mkfs >> $seqres.full 2>&1
  _init_flakey
  _mount_flakey

  # Create the test file with some initial data and then fsync it.
  # The fsync here is only needed to trigger the issue in btrfs, as it causes the
  # the flag BTRFS_INODE_NEEDS_FULL_SYNC to be removed from the btrfs inode.
  $XFS_IO_PROG -f -c "pwrite -S 0xaa 0 32k" \
                  -c "fsync" \
                  $SCRATCH_MNT/foo | _filter_xfs_io
  sync

  # Add a hard link to our file.
  # On btrfs this sets the flag BTRFS_INODE_COPY_EVERYTHING on the btrfs inode,
  # which is a necessary condition to trigger the issue.
  ln $SCRATCH_MNT/foo $SCRATCH_MNT/bar

  # Sync the filesystem to force a commit of the current btrfs transaction, this
  # is a necessary condition to trigger the bug on btrfs.
  sync

  # Now append more data to our file, increasing its size, and fsync the file.
  # In btrfs because the inode flag BTRFS_INODE_COPY_EVERYTHING was set and the
  # write path did not update the inode item in the btree nor the delayed inode
  # item (in memory struture) in the current transaction (created by the fsync
  # handler), the fsync did not record the inode's new i_size in the fsync
  # log/journal. This made the data unavailable after the fsync log/journal is
  # replayed.
  $XFS_IO_PROG -c "pwrite -S 0xbb 32K 32K" \
               -c "fsync" \
               $SCRATCH_MNT/foo | _filter_xfs_io

  echo "File content after fsync and before crash:"
  od -t x1 $SCRATCH_MNT/foo

  _crash_and_mount

  echo "File content after crash and log replay:"
  od -t x1 $SCRATCH_MNT/foo

  status=0
  exit

The expected file output before and after the crash/power failure expects the
appended data to be available, which is:

  0000000 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa
  *
  0100000 bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb
  *
  0200000

Cc: stable@vger.kernel.org
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
(cherry picked from commit 94e2d24a55e76e9e1c80ae0767a8a2fcf4cc8c80)

10 years agoBtrfs: fix hang when failing to submit bio of directIO
Liu Bo [Wed, 17 Jun 2015 08:59:57 +0000 (16:59 +0800)]
Btrfs: fix hang when failing to submit bio of directIO

The hang is uncoverd by generic/019.

btrfs_endio_direct_write() skips the "finish_ordered_fn" part when it hits
an error, thus those added ordered extents will never get processed, which
block processes that waiting for them via btrfs_start_ordered_extent().

This fixes the above, and meanwhile finish_ordered_fn will do the space
accounting work.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Tested-by: Filipe Manana <fdmanana@suse.com>
(cherry picked from commit 70466207b172c10c72f22bf9e233b898611f497c)

10 years agoBtrfs: fix warning of bytes_may_use
Liu Bo [Wed, 17 Jun 2015 08:59:58 +0000 (16:59 +0800)]
Btrfs: fix warning of bytes_may_use

While running generic/019, dmesg got several warnings from
btrfs_free_reserved_data_space().

Test generic/019 produces some disk failures so sumbit dio will get errors,
in which case, btrfs_direct_IO() goes to the error handling and free
bytes_may_use, but the problem is that bytes_may_use has been free'd
during get_block().

This adds a runtime flag to show if we've gone through get_block(), if so,
don't do the cleanup work.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Tested-by: Filipe Manana <fdmanana@suse.com>
(cherry picked from commit f1cfda4899915f7a09556de874e1a54bdaf1687b)

10 years agoBtrfs: fix shrinking truncate when the no_holes feature is enabled
Filipe Manana [Sat, 20 Jun 2015 17:20:09 +0000 (18:20 +0100)]
Btrfs: fix shrinking truncate when the no_holes feature is enabled

If the no_holes feature is enabled, we attempt to shrink a file to a size
that ends up in the middle of a hole and we don't have any file extent
items in the fs/subvol tree that go beyond the new file size (or any
ordered extents that will insert such file extent items), we end up not
updating the inode's disk_i_size, we only update the inode's i_size.

This means that after unmounting and mounting the filesystem, or after
the inode is evicted and reloaded, its i_size ends up being incorrect
(an inode's i_size is set to the disk_i_size field when an inode is
loaded). This happens when btrfs_truncate_inode_items() doesn't find
any file extent items to drop - in this case it never makes a call to
btrfs_ordered_update_i_size() in order to update the inode's disk_i_size.

Example reproducer:

  $ mkfs.btrfs -O no-holes -f /dev/sdd
  $ mount /dev/sdd /mnt

  # Create our test file with some data and durably persist it.
  $ xfs_io -f -c "pwrite -S 0xaa 0 128K" /mnt/foo
  $ sync

  # Append some data to the file, increasing its size, and leave a hole
  # between the old size and the start offset if the following write. So
  # our file gets a hole in the range [128Kb, 256Kb[.
  $ xfs_io -c "truncate 160K" /mnt/foo

  # We expect to see our file with a size of 160Kb, with the first 128Kb
  # of data all having the value 0xaa and the remaining 32Kb of data all
  # having the value 0x00.
  $ od -t x1 /mnt/foo
  0000000 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa
  *
  0400000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  *
  0500000

  # Now cleanly unmount and mount again the filesystem.
  $ umount /mnt
  $ mount /dev/sdd /mnt

  # We expect to get the same result as before, a file with a size of
  # 160Kb, with the first 128Kb of data all having the value 0xaa and the
  # remaining 32Kb of data all having the value 0x00.
  $ od -t x1 /mnt/foo
  0000000 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa
  *
  0400000

In the example above the file size/data do not match what they were before
the remount.

Fix this by always calling btrfs_ordered_update_i_size() with a size
matching the size the file was truncated to if btrfs_truncate_inode_items()
is not called for a log tree and no file extent items were dropped. This
ensures the same behaviour as when the no_holes feature is not enabled.

A test case for fstests follows soon.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
(cherry picked from commit 7b48795ee4920cd2283fb6f3effd61ed8064ebab)

10 years agoBtrfs: fix wrong check for btrfs_force_chunk_alloc()
Shilong Wang [Sun, 12 Apr 2015 06:35:20 +0000 (14:35 +0800)]
Btrfs: fix wrong check for btrfs_force_chunk_alloc()

btrfs_force_chunk_alloc() return 1 for allocation chunk successfully.
This problem exists since commit c87f08ca4.

With this patch, we might fix some enospc problems for balances.

Signed-off-by: Wang Shilong <wangshilong1991@gmail.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Tested-by: Filipe Manana <fdmanana@suse.com>
(cherry picked from commit 9ac2b7cb4755cb3311bb7d1ccf0eb51d0e006fba)

10 years agoBtrfs: fix memory corruption on failure to submit bio for direct IO
Filipe Manana [Mon, 29 Jun 2015 13:32:22 +0000 (14:32 +0100)]
Btrfs: fix memory corruption on failure to submit bio for direct IO

If we fail to submit a bio for a direct IO request, we were grabbing the
corresponding ordered extent and decrementing its reference count twice,
once for our lookup reference and once for the ordered tree reference.
This was a problem because it caused the ordered extent to be freed
without removing it from the ordered tree and any lists it might be
attached to, leaving dangling pointers to the ordered extent around.
Example trace with CONFIG_DEBUG_PAGEALLOC=y:

[161779.858707] BUG: unable to handle kernel paging request at 0000000087654330
[161779.859983] IP: [<ffffffff8124ca68>] rb_prev+0x22/0x3b
[161779.860636] PGD 34d818067 PUD 0
[161779.860636] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
(...)
[161779.860636] Call Trace:
[161779.860636]  [<ffffffffa06b36a6>] __tree_search+0xd9/0xf9 [btrfs]
[161779.860636]  [<ffffffffa06b3708>] tree_search+0x42/0x63 [btrfs]
[161779.860636]  [<ffffffffa06b4868>] ? btrfs_lookup_ordered_range+0x2d/0xa5 [btrfs]
[161779.860636]  [<ffffffffa06b4873>] btrfs_lookup_ordered_range+0x38/0xa5 [btrfs]
[161779.860636]  [<ffffffffa06aab8e>] btrfs_get_blocks_direct+0x11b/0x615 [btrfs]
[161779.860636]  [<ffffffff8119727f>] do_blockdev_direct_IO+0x5ff/0xb43
[161779.860636]  [<ffffffffa06aaa73>] ? btrfs_page_exists_in_range+0x1ad/0x1ad [btrfs]
[161779.860636]  [<ffffffffa06a2c9a>] ? btrfs_get_extent_fiemap+0x1bc/0x1bc [btrfs]
[161779.860636]  [<ffffffff811977f5>] __blockdev_direct_IO+0x32/0x34
[161779.860636]  [<ffffffffa06a2c9a>] ? btrfs_get_extent_fiemap+0x1bc/0x1bc [btrfs]
[161779.860636]  [<ffffffffa06a10ae>] btrfs_direct_IO+0x198/0x21f [btrfs]
[161779.860636]  [<ffffffffa06a2c9a>] ? btrfs_get_extent_fiemap+0x1bc/0x1bc [btrfs]
[161779.860636]  [<ffffffff81112ca1>] generic_file_direct_write+0xb3/0x128
[161779.860636]  [<ffffffffa06affaa>] ? btrfs_file_write_iter+0x15f/0x3e0 [btrfs]
[161779.860636]  [<ffffffffa06b004c>] btrfs_file_write_iter+0x201/0x3e0 [btrfs]
(...)

We were also not freeing the btrfs_dio_private we allocated previously,
which kmemleak reported with the following trace in its sysfs file:

unreferenced object 0xffff8803f553bf80 (size 96):
  comm "xfs_io", pid 4501, jiffies 4295039588 (age 173.936s)
  hex dump (first 32 bytes):
    88 6c 9b f5 02 88 ff ff 00 00 00 00 00 00 00 00  .l..............
    00 00 00 00 00 00 00 00 00 00 c4 00 00 00 00 00  ................
  backtrace:
    [<ffffffff81161ffe>] create_object+0x172/0x29a
    [<ffffffff8145870f>] kmemleak_alloc+0x25/0x41
    [<ffffffff81154e64>] kmemleak_alloc_recursive.constprop.40+0x16/0x18
    [<ffffffff811579ed>] kmem_cache_alloc_trace+0xfb/0x148
    [<ffffffffa03d8cff>] btrfs_submit_direct+0x65/0x16a [btrfs]
    [<ffffffff811968dc>] dio_bio_submit+0x62/0x8f
    [<ffffffff811975fe>] do_blockdev_direct_IO+0x97e/0xb43
    [<ffffffff811977f5>] __blockdev_direct_IO+0x32/0x34
    [<ffffffffa03d70ae>] btrfs_direct_IO+0x198/0x21f [btrfs]
    [<ffffffff81112ca1>] generic_file_direct_write+0xb3/0x128
    [<ffffffffa03e604d>] btrfs_file_write_iter+0x201/0x3e0 [btrfs]
    [<ffffffff8116586a>] __vfs_write+0x7c/0xa5
    [<ffffffff81165da9>] vfs_write+0xa0/0xe4
    [<ffffffff81166675>] SyS_pwrite64+0x64/0x82
    [<ffffffff81464fd7>] system_call_fastpath+0x12/0x6f
    [<ffffffffffffffff>] 0xffffffffffffffff

For read requests we weren't doing any cleanup either (none of the work
done by btrfs_endio_direct_read()), so a failure submitting a bio for a
read request would leave a range in the inode's io_tree locked forever,
blocking any future operations (both reads and writes) against that range.

So fix this by making sure we do the same cleanup that we do for the case
where the bio submission succeeds.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
(cherry picked from commit 09093a81aa27d9cb023642c5efa1188861fe2053)

10 years agoBtrfs: fix list transaction->pending_ordered corruption
Filipe Manana [Fri, 3 Jul 2015 19:30:34 +0000 (20:30 +0100)]
Btrfs: fix list transaction->pending_ordered corruption

When we call btrfs_commit_transaction(), we splice the list "ordered"
of our transaction handle into the transaction's "pending_ordered"
list, but we don't reinitialize the "ordered" list of our transaction
handle, this means it still points to the same elements it used to
before the splice. Then we check if the current transaction's state
is >= TRANS_STATE_COMMIT_START and if it is we end up calling
btrfs_end_transaction() which simply splices again the "ordered" list
of our handle into the transaction's "pending_ordered" list, leaving
multiple pointers to the same ordered extents which results in list
corruption when we are iterating, removing and freeing ordered extents
at btrfs_wait_pending_ordered(), resulting in access to dangling
pointers / use-after-free issues.

This produces the following warning on a kernel with linked list
debugging enabled:

[109749.265416] ------------[ cut here ]------------
[109749.266410] WARNING: CPU: 7 PID: 324 at lib/list_debug.c:59 __list_del_entry+0x5a/0x98()
[109749.267969] list_del corruption. prev->next should be ffff8800ba087e20, but was fffffff8c1f7c35d
[109749.269760] Modules linked in: btrfs crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop fuse acpi_cpufreq psmouse parport_pc proc$
[109749.277766] CPU: 7 PID: 324 Comm: fsstress Not tainted 4.1.0-rc6-btrfs-next-11+ #2
[109749.279313] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014
[109749.282868]  0000000000000009 ffff88020852fc68 ffffffff8145f077 ffffffff81095de5
[109749.284411]  ffff88020852fcb8 ffff88020852fca8 ffffffff8104b3b0 0000000000000000
[109749.285952]  ffffffff81260642 ffff8800ba087e20 ffff8802087711c0 ffff880105146d20
[109749.287505] Call Trace:
[109749.288135]  [<ffffffff8145f077>] dump_stack+0x4f/0x7b
[109749.298080]  [<ffffffff81095de5>] ? console_unlock+0x356/0x3a2
[109749.331605]  [<ffffffff8104b3b0>] warn_slowpath_common+0xa1/0xbb
[109749.334849]  [<ffffffff81260642>] ? __list_del_entry+0x5a/0x98
[109749.337093]  [<ffffffff8104b410>] warn_slowpath_fmt+0x46/0x48
[109749.337847]  [<ffffffff81260642>] __list_del_entry+0x5a/0x98
[109749.338678]  [<ffffffffa053e8bf>] btrfs_wait_pending_ordered+0x46/0xdb [btrfs]
[109749.340145]  [<ffffffffa058a65f>] ? __btrfs_run_delayed_items+0x149/0x163 [btrfs]
[109749.348313]  [<ffffffffa054077d>] btrfs_commit_transaction+0x36b/0xa10 [btrfs]
[109749.349745]  [<ffffffff81087310>] ? trace_hardirqs_on+0xd/0xf
[109749.350819]  [<ffffffffa055370d>] btrfs_sync_file+0x36f/0x3fc [btrfs]
[109749.351976]  [<ffffffff8118ec98>] vfs_fsync_range+0x8f/0x9e
[109749.360341]  [<ffffffff8118ecc3>] vfs_fsync+0x1c/0x1e
[109749.368828]  [<ffffffff8118ee1d>] do_fsync+0x34/0x4e
[109749.369790]  [<ffffffff8118f045>] SyS_fsync+0x10/0x14
[109749.370925]  [<ffffffff81465197>] system_call_fastpath+0x12/0x6f
[109749.382274] ---[ end trace 48e0d07f7c03d95a ]---

On a non-debug kernel this leads to invalid memory accesses, causing a
crash. Fix this by using list_splice_init() instead of list_splice() in
btrfs_commit_transaction().

Cc: stable@vger.kernel.org
Fixes: 50d9aa99bd35 ("Btrfs: make sure logged extents complete in the current transaction V3"
Signed-off-by: Filipe Manana <fdmanana@suse.com>
(cherry picked from commit c56d45d8d1d01d82b336fd67c6cff10d0ea097ee)

10 years agoBtrfs: fix memory leak in the extent_same ioctl
Filipe Manana [Fri, 3 Jul 2015 10:36:49 +0000 (11:36 +0100)]
Btrfs: fix memory leak in the extent_same ioctl

We were allocating memory with memdup_user() but we were never releasing
that memory. This affected pretty much every call to the ioctl, whether
it deduplicated extents or not.

This issue was reported on IRC by Julian Taylor and on the mailing list
by Marcel Ritter, credit goes to them for finding the issue.

Reported-by: Julian Taylor <jtaylor.debian@googlemail.com>
Reported-by: Marcel Ritter <ritter.marcel@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Filipe Manana <fdmanana@suse.com>
10 years agobtrfs: don't update mtime/ctime on deduped inodes
Mark Fasheh [Tue, 30 Jun 2015 21:42:08 +0000 (14:42 -0700)]
btrfs: don't update mtime/ctime on deduped inodes

One issue users have reported is that dedupe changes mtime on files,
resulting in tools like rsync thinking that their contents have changed when
in fact the data is exactly the same. We also skip the ctime update as no
user-visible metadata changes here and we want dedupe to be transparent to
the user.

Clone still wants time changes, so we special case this in the code.

This was tested with the btrfs-extent-same tool.

Signed-off-by: Mark Fasheh <mfasheh@suse.de>
10 years agoRevert "btrfs: don't update mtime on deduped inodes"
Zygo Blaxell [Thu, 2 Jul 2015 01:05:38 +0000 (21:05 -0400)]
Revert "btrfs: don't update mtime on deduped inodes"

This reverts commit 2fd3e6fc5df4213e7d2f0cde2bc0e33e08b8da4e.

10 years agoMerge tag 'v4.0.7' into zygo-4.0.7-zb64
Zygo Blaxell [Mon, 29 Jun 2015 21:15:22 +0000 (17:15 -0400)]
Merge tag 'v4.0.7' into zygo-4.0.7-zb64

This is the 4.0.7 stable release

# gpg: Signature made Mon Jun 29 15:29:37 2015 EDT using RSA key ID 6092693E
# gpg: Good signature from "Greg Kroah-Hartman (Linux kernel stable release signing key) <greg@kroah.com>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 647F 2865 4894 E3BD 4571  99BE 38DB BDC8 6092 693E

10 years agoLinux 4.0.7 v4.0.7
Greg Kroah-Hartman [Mon, 29 Jun 2015 19:29:29 +0000 (12:29 -0700)]
Linux 4.0.7

10 years agopowerpc/powernv: Restore non-volatile CRs after nap
Sam Bobroff [Fri, 1 May 2015 06:50:34 +0000 (16:50 +1000)]
powerpc/powernv: Restore non-volatile CRs after nap

commit 0aab3747091db309b8a484cfd382a41644552aa3 upstream.

Patches 7cba160ad "powernv/cpuidle: Redesign idle states management"
and 77b54e9f2 "powernv/powerpc: Add winkle support for offline cpus"
use non-volatile condition registers (cr2, cr3 and cr4) early in the system
reset interrupt handler (system_reset_pSeries()) before it has been determined
if state loss has occurred. If state loss has not occurred, control returns via
the power7_wakeup_noloss() path which does not restore those condition
registers, leaving them corrupted.

Fix this by restoring the condition registers in the power7_wakeup_noloss()
case.

This is apparent when running a KVM guest on hardware that does not
support winkle or sleep and the guest makes use of secondary threads. In
practice this means Power7 machines, though some early unreleased Power8
machines may also be susceptible.

The secondary CPUs are taken off line before the guest is started and
they call pnv_smp_cpu_kill_self(). This checks support for sleep
states (in this case there is no support) and power7_nap() is called.

When the CPU is woken, power7_nap() returns and because the CPU is
still off line, the main while loop executes again. The sleep states
support test is executed again, but because the tested values cannot
have changed, the compiler has optimized the test away and instead we
rely on the result of the first test, which has been left in cr3
and/or cr4. With the result overwritten, the wrong branch is taken and
power7_winkle() is called on a CPU that does not support it, leading
to it stalling.

Fixes: 7cba160ad789 ("powernv/cpuidle: Redesign idle states management")
Fixes: 77b54e9f213f ("powernv/powerpc: Add winkle support for offline cpus")
[mpe: Massage change log a bit more]
Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agodrm/i915: Avoid GPU hang when coming out of s3 or s4
Peter Antoine [Mon, 11 May 2015 07:50:45 +0000 (08:50 +0100)]
drm/i915: Avoid GPU hang when coming out of s3 or s4

commit 364aece01a2dd748fc36a1e8bf52ef639b0857bd upstream.

This patch fixes a timing issue that causes a GPU hang when the system
comes out of power saving.

During pm_resume, We are submitting batchbuffers before enabling
Interrupts this is causing us to miss the context switch interrupt,
and in consequence intel_execlists_handle_ctx_events is not triggered.

This patch is based on a patch from Deepak S <deepak.s@intel.com>
from another platform.

The patch fixes an issue introduced by:
  commit e7778be1eab918274f79603d7c17b3ec8be77386
  drm/i915: Fix startup failure in LRC mode after recent init changes

The above patch added a call to init_context() to fix an issue introduced
by a previous patch. But, it then opened up a small timing window for the
batches being added by the init_context (basically setting up the context)
to complete before the interrupts have been turned on, thus hanging the
GPU.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89600
Cc: stable@vger.kernel.org # 4.0+
Signed-off-by: Peter Antoine <peter.antoine@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
[Jani: fixed typo in subject, massaged the comments a bit]
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agodm: fix NULL pointer when clone_and_map_rq returns !DM_MAPIO_REMAPPED
Junichi Nomura [Wed, 27 May 2015 04:22:07 +0000 (04:22 +0000)]
dm: fix NULL pointer when clone_and_map_rq returns !DM_MAPIO_REMAPPED

commit 3a1407559a593d4360af12dd2df5296bf8eb0d28 upstream.

When stacking request-based DM on blk_mq device, request cloning and
remapping are done in a single call to target's clone_and_map_rq().
The clone is allocated and valid only if clone_and_map_rq() returns
DM_MAPIO_REMAPPED.

The "IS_ERR(clone)" check in map_request() does not cover all the
!DM_MAPIO_REMAPPED cases that are possible (E.g. if underlying devices
are not ready or unavailable, clone_and_map_rq() may return
DM_MAPIO_REQUEUE without ever having established an ERR_PTR).  Fix this
by explicitly checking for a return that is not DM_MAPIO_REMAPPED in
map_request().

Without this fix, DM core may call setup_clone() for a NULL clone
and oops like this:

   BUG: unable to handle kernel NULL pointer dereference at 0000000000000068
   IP: [<ffffffff81227525>] blk_rq_prep_clone+0x7d/0x137
   ...
   CPU: 2 PID: 5793 Comm: kdmwork-253:3 Not tainted 4.0.0-nm #1
   ...
   Call Trace:
    [<ffffffffa01d1c09>] map_tio_request+0xa9/0x258 [dm_mod]
    [<ffffffff81071de9>] kthread_worker_fn+0xfd/0x150
    [<ffffffff81071cec>] ? kthread_parkme+0x24/0x24
    [<ffffffff81071cec>] ? kthread_parkme+0x24/0x24
    [<ffffffff81071fdd>] kthread+0xe6/0xee
    [<ffffffff81093a59>] ? put_lock_stats+0xe/0x20
    [<ffffffff81071ef7>] ? __init_kthread_worker+0x5b/0x5b
    [<ffffffff814c2d98>] ret_from_fork+0x58/0x90
    [<ffffffff81071ef7>] ? __init_kthread_worker+0x5b/0x5b

Fixes: e5863d9ad ("dm: allocate requests in target when stacking on blk-mq devices")
Reported-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agokprobes/x86: Return correct length in __copy_instruction()
Eugene Shatokhin [Tue, 17 Mar 2015 10:09:18 +0000 (19:09 +0900)]
kprobes/x86: Return correct length in __copy_instruction()

commit c80e5c0c23ce2282476fdc64c4b5e3d3a40723fd upstream.

On x86-64, __copy_instruction() always returns 0 (error) if the
instruction uses %rip-relative addressing. This is because
kernel_insn_init() is called the second time for 'insn' instance
in such cases and sets all its fields to 0.

Because of this, trying to place a kprobe on such instruction
will fail, register_kprobe() will return -EINVAL.

This patch fixes the problem.

Signed-off-by: Eugene Shatokhin <eugene.shatokhin@rosalab.ru>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Link: http://lkml.kernel.org/r/20150317100918.28349.94654.stgit@localhost.localdomain
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoARM: EXYNOS: Fix failed second suspend on Exynos4
Krzysztof Kozlowski [Wed, 11 Mar 2015 10:13:57 +0000 (11:13 +0100)]
ARM: EXYNOS: Fix failed second suspend on Exynos4

commit 6f024978e74bda616b27183adee029b65eb27032 upstream.

On Exynos4412 boards (Trats2, Odroid U3) after enabling L2 cache in
56b60b8bce4a ("ARM: 8265/1: dts: exynos4: Add nodes for L2 cache
controller") the second suspend to RAM failed. First suspend worked fine
but the next one hang just after powering down of secondary CPUs (system
consumed energy as it would be running but was not responsive).

The issue was caused by enabling delayed reset assertion for CPU0 just
after issuing power down of cores. This was introduced for Exynos4 in
13cfa6c4f7fa ("ARM: EXYNOS: Fix CPU idle clock down after CPU off").

The whole behavior is not well documented but after checking with vendor
code this should be done like this (on Exynos4):
1. Enable delayed reset assertion when system is running (for all CPUs).
2. Disable delayed reset assertion before suspending the system.
   This can be done after powering off secondary CPUs.
3. Re-enable the delayed reset assertion when system is resumed.

Fixes: 13cfa6c4f7fa ("ARM: EXYNOS: Fix CPU idle clock down after CPU off")
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Tested-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Tested-by: Chanwoo Choi <cw00.choi@samsung.com>
Signed-off-by: Kukjin Kim <kgene@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agocdc-acm: Add support of ATOL FPrint fiscal printers
Alexey Sokolov [Tue, 2 Jun 2015 08:49:30 +0000 (11:49 +0300)]
cdc-acm: Add support of ATOL FPrint fiscal printers

commit 15bf722e6f6c0b884521a0363204532e849deb7f upstream.

ATOL FPrint fiscal printers require usb_clear_halt to be executed
to work properly. Add quirk to fix the issue.

Signed-off-by: Alexey Sokolov <sokolov@7pikes.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agob43: fix support for 14e4:4321 PCI dev with BCM4321 chipset
Rafał Miłecki [Sat, 6 Jun 2015 20:45:59 +0000 (22:45 +0200)]
b43: fix support for 14e4:4321 PCI dev with BCM4321 chipset

commit 90f91b129810c3f169e443252be30ed7c0130326 upstream.

It seems Broadcom released two devices with conflicting device id. There
are for sure 14e4:4321 PCI devices with BCM4321 (N-PHY) chipset, they
can be found in routers, e.g. Netgear WNR834Bv2. However, according to
Broadcom public sources 0x4321 is also used for 5 GHz BCM4306 (G-PHY).
It's unsure if they meant PCI device id, or "virtual" id (from SPROM).
To distinguish these devices lets check PHY type (G vs. N).

Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
Cc: <stable@vger.kernel.org> # 3.16+
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoath3k: add support of 13d3:3474 AR3012 device
Dmitry Tunin [Sat, 6 Jun 2015 17:29:25 +0000 (20:29 +0300)]
ath3k: add support of 13d3:3474 AR3012 device

commit 0d0cef6183aec0fb6d0c9f00a09ff51ee086bbe2 upstream.

BugLink: https://bugs.launchpad.net/bugs/1427680
This device requires new firmware files
 AthrBT_0x11020100.dfu and ramps_0x11020100_40.dfu added to
/lib/firmware/ar3k/ that are not included in linux-firmware yet.

T: Bus=01 Lev=01 Prnt=01 Port=04 Cnt=01 Dev#= 4 Spd=12 MxCh= 0
D: Ver= 1.10 Cls=e0(wlcon) Sub=01 Prot=01 MxPS=64 #Cfgs= 1
P: Vendor=13d3 ProdID=3474 Rev=00.01
C: #Ifs= 2 Cfg#= 1 Atr=e0 MxPwr=100mA
I: If#= 0 Alt= 0 #EPs= 3 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
I: If#= 1 Alt= 0 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb

Signed-off-by: Dmitry Tunin <hanipouspilot@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoath3k: Add support of 0489:e076 AR3012 device
Dmitry Tunin [Sat, 6 Jun 2015 17:25:40 +0000 (20:25 +0300)]
ath3k: Add support of 0489:e076 AR3012 device

commit 692c062e7c282164fd7cda68077f79dafd176eaf upstream.

BugLink: https://bugs.launchpad.net/bugs/1462614
This device requires new firmware files
 AthrBT_0x11020100.dfu and ramps_0x11020100_40.dfu added to
/lib/firmware/ar3k/ that are not included in linux-firmware yet.

T: Bus=03 Lev=01 Prnt=01 Port=09 Cnt=06 Dev#= 7 Spd=12 MxCh= 0
D: Ver= 1.10 Cls=e0(wlcon) Sub=01 Prot=01 MxPS=64 #Cfgs= 1
P: Vendor=0489 ProdID=e076 Rev= 0.01
C:* #Ifs= 2 Cfg#= 1 Atr=e0 MxPwr=100mA
I:* If#= 0 Alt= 0 #EPs= 3 Cls=e0(wlcon) Sub=01 Prot=01 Driver=(none)
E: Ad=81(I) Atr=03(Int.) MxPS= 16 Ivl=1ms
E: Ad=82(I) Atr=02(Bulk) MxPS= 64 Ivl=0ms
E: Ad=02(O) Atr=02(Bulk) MxPS= 64 Ivl=0ms
I:* If#= 1 Alt= 0 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=(none)
E: Ad=83(I) Atr=01(Isoc) MxPS= 0 Ivl=1ms
E: Ad=03(O) Atr=01(Isoc) MxPS= 0 Ivl=1ms
I: If#= 1 Alt= 1 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=(none)
E: Ad=83(I) Atr=01(Isoc) MxPS= 9 Ivl=1ms
E: Ad=03(O) Atr=01(Isoc) MxPS= 9 Ivl=1ms
I: If#= 1 Alt= 2 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=(none)
E: Ad=83(I) Atr=01(Isoc) MxPS= 17 Ivl=1ms
E: Ad=03(O) Atr=01(Isoc) MxPS= 17 Ivl=1ms
I: If#= 1 Alt= 3 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=(none)
E: Ad=83(I) Atr=01(Isoc) MxPS= 25 Ivl=1ms
E: Ad=03(O) Atr=01(Isoc) MxPS= 25 Ivl=1ms
I: If#= 1 Alt= 4 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=(none)
E: Ad=83(I) Atr=01(Isoc) MxPS= 33 Ivl=1ms
E: Ad=03(O) Atr=01(Isoc) MxPS= 33 Ivl=1ms
I: If#= 1 Alt= 5 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=(none)
E: Ad=83(I) Atr=01(Isoc) MxPS= 49 Ivl=1ms
E: Ad=03(O) Atr=01(Isoc) MxPS= 49 Ivl=1ms

Signed-off-by: Dmitry Tunin <hanipouspilot@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agodrm/radeon: Add RADEON_INFO_VA_UNMAP_WORKING query
Michel Dänzer [Tue, 16 Jun 2015 08:28:16 +0000 (17:28 +0900)]
drm/radeon: Add RADEON_INFO_VA_UNMAP_WORKING query

commit 3bc980bf19bb62007e923691fa2869ba113be895 upstream.

This tells userspace that it's safe to use the RADEON_VA_UNMAP operation
of the DRM_RADEON_GEM_VA ioctl.

Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoRevert "drm/i915: Don't skip request retirement if the active list is empty"
Jani Nikula [Mon, 15 Jun 2015 09:59:37 +0000 (12:59 +0300)]
Revert "drm/i915: Don't skip request retirement if the active list is empty"

commit 245ec9d85696c3e539b23e210f248698b478379c upstream.

This reverts commit 0aedb1626566efd72b369c01992ee7413c82a0c5.

I messed things up while applying [1] to drm-intel-fixes. Rectify.

[1] http://mid.gmane.org/1432827156-9605-1-git-send-email-ville.syrjala@linux.intel.com

Fixes: 0aedb1626566 ("drm/i915: Don't skip request retirement if the active list is empty")
Acked-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agodrm/i915: Always reset vma->ggtt_view.pages cache on unbinding
Chris Wilson [Thu, 11 Jun 2015 07:06:08 +0000 (08:06 +0100)]
drm/i915: Always reset vma->ggtt_view.pages cache on unbinding

commit 016a65a39170c3cdca09a6ac343ff4f124668b45 upstream.

With the introduction of multiple views of an obj in the same vm, each
vma was taught to cache its copy of the pages (so that different views
could have different page arrangements). However, this missed decoupling
those vma->ggtt_view.pages when the vma released its reference on the
obj->pages. As we don't always free the vma, this leads to a possible
scenario (e.g. execbuffer interrupted by the shrinker) where the vma
points to a stale obj->pages, and explodes.

Fixes regression from commit fe14d5f4e5468c5b80a24f1a64abcbe116143670
Author: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Date:   Wed Dec 10 17:27:58 2014 +0000

    drm/i915: Infrastructure for supporting different GGTT views per object

Tvrtko says, if someone else will be confused how this can happen, key
is the reservation execbuffer path. That puts the VMA on the exec_list
which prevents i915_vma_unbind and i915_gem_vma_destroy from fully
destroying the VMA. So the VMA is left existing as an empty object in
the list - unbound and disassociated with the backing store. Kind of a
cached memory object. And then re-using it needs to clear the cached
pages pointer which is fixed above.

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1227892
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
[Jani: Added Tvrtko's explanation to commit message.]
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agodrm/mgag200: Reject non-character-cell-aligned mode widths
Adam Jackson [Mon, 15 Jun 2015 20:16:15 +0000 (16:16 -0400)]
drm/mgag200: Reject non-character-cell-aligned mode widths

commit 25161084b1c1b0c29948f6f77266a35f302196b7 upstream.

Turns out 1366x768 does not in fact work on this hardware.

Signed-off-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoiser-target: Fix possible use-after-free
Sagi Grimberg [Thu, 4 Jun 2015 16:49:21 +0000 (19:49 +0300)]
iser-target: Fix possible use-after-free

commit 524630d5824c7a75aab568c6bd1423fd748cd3bb upstream.

iser connection termination process happens in 2 stages:
- isert_wait_conn:
  - resumes rdma disconnect
  - wait for session commands
  - wait for flush completions (post a marked wr to signal we are done)
  - wait for logout completion
  - queue work for connection cleanup (depends on disconnected/timewait
    events)
- isert_free_conn
  - last reference put on the connection

In case we are terminating during IOs, we might be posting send/recv
requests after we posted the last work request which might lead
to a use-after-free condition in isert_handle_wc.
After we posted the last wr in isert_wait_conn we are guaranteed that
no successful completions will follow (meaning no new work request posts
may happen) but other flush errors might still come. So before we
put the last reference on the connection, we repeat the process of
posting a marked work request (isert_wait4flush) in order to make sure all
pending completions were flushed.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Jenny Falkovich <jennyf@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoiser-target: Fix variable-length response error completion
Sagi Grimberg [Thu, 4 Jun 2015 16:49:19 +0000 (19:49 +0300)]
iser-target: Fix variable-length response error completion

commit 9253e667ab50fd4611a60e1cdd6a6e05a1d91cf1 upstream.

Since commit "2426bd456a6 target: Report correct response ..."
we might get a command with data_size that does not fit to
the number of allocated data sg elements. Given that we rely on
cmd t_data_nents which might be different than the data_size,
we sometimes receive local length error completion. The correct
approach would be to take the command data_size into account when
constructing the ib sg_list.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Jenny Falkovich <jennyf@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agotracing: Have filter check for balanced ops
Steven Rostedt [Mon, 15 Jun 2015 21:50:25 +0000 (17:50 -0400)]
tracing: Have filter check for balanced ops

commit 2cf30dc180cea808077f003c5116388183e54f9e upstream.

When the following filter is used it causes a warning to trigger:

 # cd /sys/kernel/debug/tracing
 # echo "((dev==1)blocks==2)" > events/ext4/ext4_truncate_exit/filter
-bash: echo: write error: Invalid argument
 # cat events/ext4/ext4_truncate_exit/filter
((dev==1)blocks==2)
^
parse_error: No error

 ------------[ cut here ]------------
 WARNING: CPU: 2 PID: 1223 at kernel/trace/trace_events_filter.c:1640 replace_preds+0x3c5/0x990()
 Modules linked in: bnep lockd grace bluetooth  ...
 CPU: 3 PID: 1223 Comm: bash Tainted: G        W       4.1.0-rc3-test+ #450
 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v02.05 05/07/2012
  0000000000000668 ffff8800c106bc98 ffffffff816ed4f9 ffff88011ead0cf0
  0000000000000000 ffff8800c106bcd8 ffffffff8107fb07 ffffffff8136b46c
  ffff8800c7d81d48 ffff8800d4c2bc00 ffff8800d4d4f920 00000000ffffffea
 Call Trace:
  [<ffffffff816ed4f9>] dump_stack+0x4c/0x6e
  [<ffffffff8107fb07>] warn_slowpath_common+0x97/0xe0
  [<ffffffff8136b46c>] ? _kstrtoull+0x2c/0x80
  [<ffffffff8107fb6a>] warn_slowpath_null+0x1a/0x20
  [<ffffffff81159065>] replace_preds+0x3c5/0x990
  [<ffffffff811596b2>] create_filter+0x82/0xb0
  [<ffffffff81159944>] apply_event_filter+0xd4/0x180
  [<ffffffff81152bbf>] event_filter_write+0x8f/0x120
  [<ffffffff811db2a8>] __vfs_write+0x28/0xe0
  [<ffffffff811dda43>] ? __sb_start_write+0x53/0xf0
  [<ffffffff812e51e0>] ? security_file_permission+0x30/0xc0
  [<ffffffff811dc408>] vfs_write+0xb8/0x1b0
  [<ffffffff811dc72f>] SyS_write+0x4f/0xb0
  [<ffffffff816f5217>] system_call_fastpath+0x12/0x6a
 ---[ end trace e11028bd95818dcd ]---

Worse yet, reading the error message (the filter again) it says that
there was no error, when there clearly was. The issue is that the
code that checks the input does not check for balanced ops. That is,
having an op between a closed parenthesis and the next token.

This would only cause a warning, and fail out before doing any real
harm, but it should still not caues a warning, and the error reported
should work:

 # cd /sys/kernel/debug/tracing
 # echo "((dev==1)blocks==2)" > events/ext4/ext4_truncate_exit/filter
-bash: echo: write error: Invalid argument
 # cat events/ext4/ext4_truncate_exit/filter
((dev==1)blocks==2)
^
parse_error: Meaningless filter expression

And give no kernel warning.

Link: http://lkml.kernel.org/r/20150615175025.7e809215@gandalf.local.home
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Tested-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoALSA: hda - adding a DAC/pin preference map for a HP Envy TS machine
Hui Wang [Mon, 15 Jun 2015 09:43:39 +0000 (17:43 +0800)]
ALSA: hda - adding a DAC/pin preference map for a HP Envy TS machine

commit 6ab42ff44864d26e8e498b8ac655d24ee389d267 upstream.

On a HP Envy TouchSmart laptop, there are 2 speakers (main speaker
and subwoofer speaker), 1 headphone and 2 DACs, without this fixup,
the headphone will be assigned to a DAC and the 2 speakers will be
assigned to another DAC, this assignment makes the surround-2.1
channels invalid.

To fix it, here using a DAC/pin preference map to bind the main
speaker to 1 DAC and the subwoofer speaker will be assigned to another
DAC.

Signed-off-by: Hui Wang <hui.wang@canonical.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoclk: at91: fix h32mx prototype inclusion in pmc header
Nicolas Ferre [Thu, 28 May 2015 13:07:21 +0000 (15:07 +0200)]
clk: at91: fix h32mx prototype inclusion in pmc header

commit 28df9c2fb6f896179fcffd5a3f5a86e2d1dff0a5 upstream.

Trivial fix that prevents to compile this pmc clock driver if h32mx clock is
present but smd clock isn't.

Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Acked-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Fixes: bcc5fd49a0fd ("clk: at91: add a driver for the h32mx clock")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoclk: at91: pll: fix input range validity check
Boris Brezillon [Fri, 27 Mar 2015 22:53:15 +0000 (23:53 +0100)]
clk: at91: pll: fix input range validity check

commit 6c7b03e1aef2e92176435f4fa562cc483422d20f upstream.

The PLL impose a certain input range to work correctly, but it appears that
this input range does not apply on the input clock (or parent clock) but
on the input clock after it has passed the PLL divisor.
Fix the implementation accordingly.

Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Reported-by: Jonas Andersson <jonas@microbit.se>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>