
[PATCH] improve preemption on SMP

SMP locking latencies are one of the last architectural problems that cause
millisecond-range scheduling delays.  CONFIG_PREEMPT tries to solve some of
the SMP issues, but many problems remain: spinlocks nested at multiple
levels, spinning with irqs turned off, and non-nested spinning with
preemption turned off permanently.

The nesting problem goes like this: if a piece of kernel code (e.g. the MM
or ext3's journalling code) does the following:

spin_lock(&spinlock_1);
...
spin_lock(&spinlock_2);
...

then even with CONFIG_PREEMPT enabled, current kernels may spin on
spinlock_2 indefinitely.  A number of critical sections break their long
paths by using cond_resched_lock(), but this does not break the path on
SMP: need_resched() is not set on the CPU that holds the lock (the demand
comes from *the other CPU*, which is spinning), so cond_resched_lock()
doesn't notice that it should drop the lock.

To solve this problem I've introduced a new spinlock field,
lock->break_lock, which signals to the holding CPU that another CPU is
requesting a spinlock-break.  This field is only set if a CPU is spinning
in a spinlock function [at any locking depth], so the default overhead is
zero.  I've extended cond_resched_lock() to check for this flag; when only
a lock-break is requested, the lock can be dropped and re-taken without a
full reschedule.  I've also added the lock_need_resched(lock) and
need_lockbreak(lock) methods to check whether a critical section needs to
be broken.
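A minimal user-space sketch of the break_lock handshake, assuming C11 atomics stand in for the arch spinlock code. model_spinlock_t, model_lock(), model_cond_resched_lock() and the model_resched_pending flag (a stand-in for the holder's need_resched()) are hypothetical; need_lockbreak() and lock_need_resched() reuse the names from this patch description but are not the kernel's definitions. A waiter sets break_lock while it spins, and the holder polls it and drops the lock without rescheduling when only a lock-break was asked for.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space model, not the kernel's spinlock_t. */
typedef struct {
    atomic_bool locked;      /* the lock word */
    atomic_bool break_lock;  /* set by a CPU spinning on this lock */
} model_spinlock_t;

static void model_spin_init(model_spinlock_t *l)
{
    atomic_init(&l->locked, false);
    atomic_init(&l->break_lock, false);
}

static bool model_trylock(model_spinlock_t *l)
{
    return !atomic_exchange(&l->locked, true);
}

/* Waiter side: while spinning, tell the holder that someone is waiting. */
static void model_lock(model_spinlock_t *l)
{
    while (!model_trylock(l)) {
        atomic_store(&l->break_lock, true);
        while (atomic_load(&l->locked))
            ;  /* cpu_relax() in the kernel */
    }
}

static void model_unlock(model_spinlock_t *l)
{
    atomic_store(&l->locked, false);
}

/* Hypothetical stand-in for the holding CPU's need_resched() flag. */
static atomic_bool model_resched_pending;

static bool need_lockbreak(model_spinlock_t *l)
{
    return atomic_load(&l->break_lock);
}

static bool lock_need_resched(model_spinlock_t *l)
{
    return need_lockbreak(l) || atomic_load(&model_resched_pending);
}

/* Holder side: returns 1 if the lock was dropped and re-acquired. */
static int model_cond_resched_lock(model_spinlock_t *l)
{
    if (atomic_exchange(&l->break_lock, false)) {
        /* Lock-break only: let the waiter in, no reschedule needed. */
        model_unlock(l);
        model_lock(l);
        return 1;
    }
    if (atomic_load(&model_resched_pending)) {
        model_unlock(l);
        /* the kernel would reschedule here */
        model_lock(l);
        return 1;
    }
    return 0;
}
```

In a single-threaded check the waiter can be faked by setting break_lock directly; on a real SMP system another thread calling model_lock() would set it while spinning.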

Another latency problem was that the stock kernel, even with CONFIG_PREEMPT
enabled, didn't have any spin-nicely preemption logic for the following
commonly used SMP locking primitives: read_lock(), spin_lock_irqsave(),
spin_lock_irq(), spin_lock_bh(), read_lock_irqsave(), read_lock_irq(),
read_lock_bh(), write_lock_irqsave(), write_lock_irq(), write_lock_bh().
Only spin_lock() and write_lock() [the two simplest cases] were covered.

In addition to the preemption latency problems, the _irq() variants in the
above list didn't re-enable IRQs while spinning, possibly resulting in
excessively long irqs-off sections of code!
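The spin-nicely pattern for an irqsave-style lock can be sketched in user space, assuming C11 atomics plus thread-local variables that merely simulate the CPU's IRQ and preemption state. nice_lock_t, nice_lock_irqsave(), sim_irq_save(), sim_preempt_disable() and the other sim_* helpers are hypothetical, not kernel APIs. Only the trylock attempt runs with IRQs off and preemption disabled; while waiting, the caller's IRQ state is restored, preemption is re-enabled and a lock-break is requested.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Simulated per-CPU state (hypothetical stand-ins, not kernel code). */
static _Thread_local bool irqs_on = true;
static _Thread_local int preempt_count;

static unsigned long sim_irq_save(void)
{
    unsigned long f = irqs_on;
    irqs_on = false;
    return f;
}
static void sim_irq_restore(unsigned long f) { irqs_on = (f != 0); }
static void sim_preempt_disable(void) { preempt_count++; }
static void sim_preempt_enable(void) { preempt_count--; }

typedef struct {
    atomic_bool locked;
    atomic_bool break_lock;
} nice_lock_t;

static bool nice_trylock(nice_lock_t *l)
{
    return !atomic_exchange(&l->locked, true);
}

static unsigned long nice_lock_irqsave(nice_lock_t *l)
{
    unsigned long flags;

    sim_preempt_disable();
    for (;;) {
        flags = sim_irq_save();
        if (nice_trylock(l))
            break;                   /* acquired: IRQs off, preempt off */
        sim_irq_restore(flags);      /* let interrupts in while waiting */
        sim_preempt_enable();        /* ...and allow preemption */
        atomic_store(&l->break_lock, true);  /* ask the holder to let go */
        while (atomic_load(&l->locked) && atomic_load(&l->break_lock))
            ;                        /* cpu_relax() in the kernel */
        sim_preempt_disable();
    }
    return flags;
}

static void nice_unlock_irqrestore(nice_lock_t *l, unsigned long flags)
{
    atomic_store(&l->locked, false);
    sim_irq_restore(flags);
    sim_preempt_enable();
}
```

This model only illustrates the ordering of the steps; it leaves out the _bh() handling and everything architecture-specific.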

preempt-smp.patch fixes all these latency problems by spinning irq-nicely
(if possible) and by requesting lock-breaks if needed.  Two
architecture-level changes were necessary for this: the addition of the
break_lock field to spinlock_t and rwlock_t, and the addition of the
_raw_read_trylock() function.
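A spin-nicely loop presumably needs a trylock for each lock type, including read locks. A minimal sketch of a read-trylock on a biased counter, assuming C11 atomics: the counter starts at a bias value, each reader subtracts 1 and a writer subtracts the whole bias, so a negative result after a reader's decrement means a writer owns the lock. MODEL_RW_BIAS, model_rwlock_t and the model_* functions are hypothetical; the scheme is loosely modelled on the counting idea i386 uses, but this is not the arch code.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define MODEL_RW_BIAS 0x01000000   /* hypothetical "unlocked" value */

typedef struct { atomic_int count; } model_rwlock_t;

static void model_rwlock_init(model_rwlock_t *rw)
{
    atomic_init(&rw->count, MODEL_RW_BIAS);
}

/* Optimistically take a reader slot; undo it if a writer holds the lock. */
static bool model_read_trylock(model_rwlock_t *rw)
{
    /* atomic_fetch_sub() returns the old value; old - 1 is the new count */
    if (atomic_fetch_sub(&rw->count, 1) - 1 >= 0)
        return true;
    atomic_fetch_add(&rw->count, 1);
    return false;
}

static void model_read_unlock(model_rwlock_t *rw)
{
    atomic_fetch_add(&rw->count, 1);
}

/* A writer succeeds only if nobody holds the lock at all. */
static bool model_write_trylock(model_rwlock_t *rw)
{
    int expected = MODEL_RW_BIAS;
    return atomic_compare_exchange_strong(&rw->count, &expected, 0);
}

static void model_write_unlock(model_rwlock_t *rw)
{
    atomic_fetch_add(&rw->count, MODEL_RW_BIAS);
}
```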

Testing done by Mark H Johnson and myself indicates SMP latencies comparable
to the UP kernel, whereas without this patch they were essentially
unbounded.

I successfully test-compiled and test-booted this patch on top of BK-curr
using the following .config combinations: SMP && PREEMPT, !SMP && PREEMPT,
SMP && !PREEMPT and !SMP && !PREEMPT on x86, and !SMP && !PREEMPT and SMP &&
PREEMPT on x86_64.  I also test-booted x86 with the generic_raw_read_trylock()
function to check that it works fine.  Essentially the same patch has been
in testing as part of the voluntary-preempt patches for some time already.

NOTE to architecture maintainers: generic_raw_read_trylock() is a crude
version that should be replaced with the proper arch-optimized version
ASAP.

From: Hugh Dickins <hugh@veritas.com>

The i386 and x86_64 _raw_read_trylock() implementations in preempt-smp.patch
are too successful: atomic_read() returns a signed integer.
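One way a read-trylock can end up too successful is a bound check that is only correct if the counter is unsigned. The sketch below is a hypothetical illustration, not the code from preempt-smp.patch: it assumes a biased-counter convention (MODEL_RW_BIAS is an invented name) in which a reader that finds a writer holding the lock sees -1 after its own decrement. Compared as a signed int, which is what atomic_read() returns, -1 < MODEL_RW_BIAS is true and the reader wrongly gets in; a signed-aware check tests for a negative count instead.

```c
#include <stdbool.h>

/* Hypothetical bias, same convention as a biased rwlock counter. */
#define MODEL_RW_BIAS 0x01000000

/*
 * 'c' is the counter a reader sees after its own decrement:
 * between 0 and MODEL_RW_BIAS - 1 when only readers hold the lock,
 * -1 when a writer holds it.
 */

/* Correct only if the counter were unsigned: -1 wraps to UINT_MAX. */
static bool reader_ok_if_unsigned(unsigned int c)
{
    return c < (unsigned int)MODEL_RW_BIAS;
}

/* Same test on a signed int: -1 < MODEL_RW_BIAS, so it wrongly passes. */
static bool reader_ok_signed_buggy(int c)
{
    return c < MODEL_RW_BIAS;
}

/* Signed-aware test: a negative count means a writer owns the lock. */
static bool reader_ok_fixed(int c)
{
    return c >= 0;
}
```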

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

author	Ingo Molnar <mingo@elte.hu>
	Sat, 8 Jan 2005 05:49:02 +0000 (21:49 -0800)
committer	Linus Torvalds <torvalds@evo.osdl.org>
	Sat, 8 Jan 2005 05:49:02 +0000 (21:49 -0800)
commit	38e387ee01e5a57cd3ed84062930997b87fa3896
tree	c3cbc19de0beeceb82408b03a27784e9e44ee701
parent	18f27594d0c5cd2da683252afc8d0933bd64a365

include/asm-alpha/spinlock.h
include/asm-arm/spinlock.h
include/asm-i386/spinlock.h
include/asm-ia64/spinlock.h
include/asm-mips/spinlock.h
include/asm-parisc/spinlock.h
include/asm-parisc/system.h
include/asm-ppc/spinlock.h
include/asm-ppc64/spinlock.h
include/asm-s390/spinlock.h
include/asm-sh/spinlock.h
include/asm-sparc/spinlock.h
include/asm-sparc64/spinlock.h
include/asm-x86_64/spinlock.h
include/linux/sched.h
include/linux/spinlock.h
kernel/sched.c
kernel/spinlock.c