In both preempt_schedule in sched.c and resume_kernel in entry.S, it is
possible to return with need_resched set, and thus with a preemption
pending, yet not service that preemption for some time.
Consider:
- return from schedule() to preempt_schedule
- interrupt occurs, sets need_resched
- we cannot preempt since preempt_count = PREEMPT_ACTIVE
- back in preempt_schedule, set preempt_count = 0

Now we can preempt again, but we do not. Instead we return and
continue executing. On the next interrupt, we redo the whole fiasco,
which is a waste since we could have reentered schedule while we were
there. Worse, if we acquire a lock before the next interrupt, we can
potentially delay the pending reschedule for a very long time. This is
not acceptable.

The solution is to check for and loop on need_resched in resume_kernel
and preempt_schedule, as schedule itself does.
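
The race and the fix can be modelled in user space. The following is
only a sketch, not kernel code: model_schedule(), model_interrupt(),
reset_model() and the two preempt_schedule_*() variants are
hypothetical stand-ins, and the PREEMPT_ACTIVE value is a placeholder.
It assumes one interrupt arrives after schedule() returns but before
preempt_count is cleared.

```c
#include <stdbool.h>

#define PREEMPT_ACTIVE 0x4000000	/* placeholder value for this model */

static int preempt_count;
static bool need_resched;
static int pending_interrupts;	/* interrupts due right after schedule() */
static int schedule_calls;

/* Stand-in for schedule(): services the pending reschedule. */
static void model_schedule(void)
{
	need_resched = false;
	schedule_calls++;
}

/*
 * Stand-in for an interrupt that fires while preempt_count is still
 * PREEMPT_ACTIVE: it can only set the flag, it cannot preempt.
 */
static void model_interrupt(void)
{
	if (pending_interrupts > 0) {
		pending_interrupts--;
		need_resched = true;
	}
}

static void reset_model(int interrupts)
{
	preempt_count = 0;
	need_resched = false;
	pending_interrupts = interrupts;
	schedule_calls = 0;
}

/* Old behaviour: one pass, then return even if the flag was set again. */
static void preempt_schedule_once(void)
{
	if (preempt_count)
		return;
	preempt_count = PREEMPT_ACTIVE;
	model_schedule();
	model_interrupt();	/* the race window */
	preempt_count = 0;
}

/* New behaviour: recheck need_resched and loop until it is clear. */
static void preempt_schedule_loop(void)
{
	if (preempt_count)
		return;
	do {
		preempt_count = PREEMPT_ACTIVE;
		model_schedule();
		model_interrupt();	/* the race window */
		preempt_count = 0;
	} while (need_resched);
}
```

After reset_model(1), preempt_schedule_once() returns with need_resched
still set, while preempt_schedule_loop() calls model_schedule() a
second time and returns with the flag clear.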
 ENTRY(resume_kernel)
 	cmpl $0,TI_PRE_COUNT(%ebx)	# non-zero preempt_count ?
 	jnz restore_all
+need_resched:
 	movl TI_FLAGS(%ebx), %ecx	# need_resched set ?
 	testb $_TIF_NEED_RESCHED, %cl
 	jz restore_all
 	sti
 	call schedule
 	movl $0,TI_PRE_COUNT(%ebx)
-	jmp restore_all
+	cli
+	jmp need_resched
 #endif
 
 # system call handler stub
 	if (unlikely(ti->preempt_count))
 		return;
 
+need_resched:
 	ti->preempt_count = PREEMPT_ACTIVE;
 	schedule();
 	ti->preempt_count = 0;
+
+	/* we can miss a preemption opportunity between schedule and now */
+	barrier();
+	if (unlikely(test_thread_flag(TIF_NEED_RESCHED)))
+		goto need_resched;
 }
 #endif /* CONFIG_PREEMPT */