According to Intel's recommendation, 'rep;nop; should be called before
testing if the lock variable was modified (i.e. rep nop;cmp;jcc). The
current implementation does it the wrong way around: first test, then
wait, then branch. I've asked Asit Mallik from Intel, and he recommended
to change it.
It should be at least consistent: Right now, spinlock uses
'cmp;rep nop;jcc', rwlock uses 'rep nop;cmp;jcc'
"js 2f\n" \
LOCK_SECTION_START("") \
"2:\t" \
- "cmpb $0,%0\n\t" \
"rep;nop\n\t" \
+ "cmpb $0,%0\n\t" \
"jle 2b\n\t" \
"jmp 1b\n" \
LOCK_SECTION_END