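Phaser is a synchronization utility added in Java 7. Compared with CyclicBarrier, CountDownLatch, and Semaphore, Phaser is more flexible and is reusable (CyclicBarrier is reusable too). Phaser supports the following operations:
register: adds parties to the Phaser. The total can later be reduced again by deregistering through arriveAndDeregister (CyclicBarrier and CountDownLatch, whose counts are fixed at construction, lack this flexibility).
arrive: marks one party as having arrived.
awaitAdvance: blocks the calling thread until all parties have arrived. Once the last party arrives, the waiting threads are woken, the phase number (the Phaser's "age") is incremented, the unarrived count is reset to the number of registered parties, and the Phaser can be reused.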
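As a minimal sketch of these operations, using only the public Phaser API: the class name PhaserBasics and the method runRounds are made up for illustration. Each worker is registered as one party, every party arrives and waits for a number of rounds, and the workers deregister when they are done.

```java
import java.util.concurrent.Phaser;

class PhaserBasics {
    // Runs `workers` threads through `rounds` phases and returns the phase number
    // observed by the coordinating thread afterwards.
    static int runRounds(final int workers, final int rounds) {
        // The calling thread is itself one party, so no phase completes without it.
        final Phaser phaser = new Phaser(1);
        for (int i = 0; i < workers; i++) {
            phaser.register(); // add one party per worker before starting it
            new Thread(() -> {
                for (int r = 0; r < rounds; r++) {
                    phaser.arriveAndAwaitAdvance(); // arrive, then wait for the others
                }
                phaser.arriveAndDeregister(); // leave: the party count shrinks by one
            }).start();
        }
        for (int r = 0; r < rounds; r++) {
            phaser.arriveAndAwaitAdvance();
        }
        // The caller is still registered and has not arrived again,
        // so the phase cannot move past `rounds` here.
        return phaser.getPhase();
    }

    public static void main(String[] args) {
        System.out.println("phase after 3 rounds: " + runRounds(4, 3));
    }
}
```

The same Phaser instance is reused for every round, which is the reuse the text refers to.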
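Phaser implements its synchronization logic through the state field. state is a 64-bit long that packs four pieces of information:
1. Bits 0-15: the number of parties that have not yet arrived. Each arriveXXX call decrements it by 1; each register call increments it by 1.
2. Bits 16-31: the total number of registered parties. register increments it by 1; deregistering (arriveAndDeregister) decrements it by 1.
3. Bits 32-62: the phase, i.e. the Phaser's age. When the unarrived count drops to 0 (all parties have arrived), the phase is incremented by 1 and the party count in bits 16-31 is copied into bits 0-15, so the Phaser can be reused.
4. Bit 63: whether the Phaser has terminated. 1 means terminated, 0 means not terminated.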
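The layout above can be made concrete with a small decoder. This is only a sketch: the constant names follow the ones that appear in the JDK source quoted later, with the values used by OpenJDK's Phaser, but the class StateLayout and its methods are hypothetical, and the sketch ignores the special EMPTY encoding the real class uses when no parties are registered.

```java
class StateLayout {
    static final int PARTIES_SHIFT = 16;
    static final int PHASE_SHIFT = 32;
    static final int UNARRIVED_MASK = 0xffff;
    static final long TERMINATION_BIT = 1L << 63;

    // Packs the three counters into one long, low bits first.
    static long pack(int phase, int parties, int unarrived) {
        return ((long) phase << PHASE_SHIFT)
             | ((long) parties << PARTIES_SHIFT)
             | unarrived;
    }

    // Bits 0-15: parties that have not arrived yet.
    static int unarrived(long s) { return (int) s & UNARRIVED_MASK; }

    // Bits 16-31: registered parties.
    static int parties(long s) { return (int) s >>> PARTIES_SHIFT; }

    // Bits 32-63 read as an int: negative once the termination bit is set.
    static int phase(long s) { return (int) (s >>> PHASE_SHIFT); }

    // Bit 63: termination flag.
    static boolean terminated(long s) { return (s & TERMINATION_BIT) != 0; }

    public static void main(String[] args) {
        long s = pack(7, 5, 2);
        System.out.println(phase(s) + " " + parties(s) + " " + unarrived(s));
    }
}
```

Because the termination flag is the sign bit, the quoted source can test for termination with a plain `phase < 0` check.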
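When a Phaser has many parties under high concurrency, contention on its state field becomes intense: the CAS operations issued by register, arrive and similar calls are likely to fail often and be retried, which adds CPU overhead. Contention can be spread out by building a tiered tree of Phasers. When a child Phaser registers its first party, the child itself is registered with its parent. When all of a child's parties have arrived, the child arrives at its parent (the child is deregistered from the parent only when its own registered party count drops to zero). When all children of the root have arrived, the phase of the whole tree advances. This way arrive and register calls operate on the child Phasers, and the root is only responsible for advancing the phase. Part of the reads and writes to state are thus moved onto the children, and spreading the contention points raises the Phaser's throughput. The sample code below distributes 9 parties across two child Phasers: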
final Phaser parent = new Phaser();
final Phaser phaser1 = new Phaser(parent);
final Phaser phaser2 = new Phaser(parent);
phaser1.bulkRegister(5);
phaser2.bulkRegister(4);
// Read the phase before any thread can arrive. Reading it after the threads
// start could return the already-advanced phase and block forever.
final int phase = parent.getPhase();
for (int i = 0; i < 5; i++) {
    Thread t = new Thread(new Runnable() {
        @Override
        public void run() {
            phaser1.arriveAndAwaitAdvance();
        }
    });
    t.start();
}
for (int i = 0; i < 4; i++) {
    final int index = i;
    Thread t = new Thread(new Runnable() {
        @Override
        public void run() {
            System.out.println("arrive2:" + index);
            phaser2.arriveAndAwaitAdvance();
        }
    });
    t.start();
}
parent.awaitAdvance(phase);
Now let's look at the code of a few key methods, starting with party registration:
private int doRegister(int registrations) {
    // adjustment to state
    long adj = ((long)registrations << PARTIES_SHIFT) | registrations;
    final Phaser parent = this.parent;
    int phase;
    for (;;) {
        long s = state;
        int counts = (int)s;
        int parties = counts >>> PARTIES_SHIFT;
        int unarrived = counts & UNARRIVED_MASK;
        if (registrations > MAX_PARTIES - parties)
            throw new IllegalStateException(badRegister(s));
        else if ((phase = (int)(s >>> PHASE_SHIFT)) < 0)
            break;
        else if (counts != EMPTY) { // not 1st registration
            if (parent == null || reconcileState() == s) {
                if (unarrived == 0) // wait out advance
                    root.internalAwaitAdvance(phase, null);
                else if (UNSAFE.compareAndSwapLong(this, stateOffset,
                                                   s, s + adj))
                    break;
            }
        }
        else if (parent == null) { // 1st root registration
            long next = ((long)phase << PHASE_SHIFT) | adj;
            if (UNSAFE.compareAndSwapLong(this, stateOffset, s, next))
                break;
        }
        else {
            synchronized (this) { // 1st sub registration
                if (state == s) { // recheck under lock
                    parent.doRegister(1);
                    do { // force current phase
                        phase = (int)(root.state >>> PHASE_SHIFT);
                        // assert phase < 0 || (int)state == EMPTY;
                    } while (!UNSAFE.compareAndSwapLong
                             (this, stateOffset, state,
                              ((long)phase << PHASE_SHIFT) | adj));
                    break;
                }
            }
        }
    }
    return phase;
}
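The code is intricate and leans heavily on bit manipulation, but what it mainly does is update the parties and unarrived portions of state. Note also that on a child Phaser's first registration it calls parent.doRegister(1), which registers one party with the parent.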
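The effect of that parent.doRegister(1) call can be observed through the public API. The following is a small sketch (the class TieredRegistration and its method are hypothetical names) that reads the parent's party count before and after the child's first registration:

```java
import java.util.concurrent.Phaser;

class TieredRegistration {
    // Returns {parent parties before, parent parties after} the child's first registration.
    static int[] parentPartiesAroundChildRegistration() {
        Phaser parent = new Phaser();
        Phaser child = new Phaser(parent); // child has zero parties so far
        int before = parent.getRegisteredParties();
        child.bulkRegister(5);             // first registration on the child
        int after = parent.getRegisteredParties();
        return new int[] { before, after };
    }

    public static void main(String[] args) {
        int[] r = parentPartiesAroundChildRegistration();
        System.out.println("parent parties: " + r[0] + " -> " + r[1]);
    }
}
```

However many parties the child registers, the parent sees the child as a single party.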
Next, the arrive code:
private int doArrive(boolean deregister) {
    int adj = deregister ? ONE_ARRIVAL|ONE_PARTY : ONE_ARRIVAL;
    final Phaser root = this.root;
    for (;;) {
        long s = (root == this) ? state : reconcileState();
        int phase = (int)(s >>> PHASE_SHIFT);
        int counts = (int)s;
        int unarrived = (counts & UNARRIVED_MASK) - 1;
        if (phase < 0)
            return phase;
        else if (counts == EMPTY || unarrived < 0) {
            if (root == this || reconcileState() == s)
                throw new IllegalStateException(badArrive(s));
        }
        else if (UNSAFE.compareAndSwapLong(this, stateOffset, s, s-=adj)) {
            if (unarrived == 0) {
                long n = s & PARTIES_MASK; // base of next state
                int nextUnarrived = (int)n >>> PARTIES_SHIFT;
                if (root != this)
                    return parent.doArrive(nextUnarrived == 0);
                if (onAdvance(phase, nextUnarrived))
                    n |= TERMINATION_BIT;
                else if (nextUnarrived == 0)
                    n |= EMPTY;
                else
                    n |= nextUnarrived;
                n |= (long)((phase + 1) & MAX_PHASE) << PHASE_SHIFT;
                UNSAFE.compareAndSwapLong(this, stateOffset, s, n);
                releaseWaiters(phase);
            }
            return phase;
        }
    }
}
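As the code shows, when all parties of the current Phaser have arrived and it is not the root, it calls parent.doArrive(nextUnarrived == 0) once: it arrives at the parent, and it also deregisters from the parent if the child has no registered parties left. If all parties have arrived and the current Phaser is the root, the Phaser can advance: the phase is incremented by 1 and the threads blocked in awaitXXX calls are woken. During the advance, the onAdvance callback gives the caller a chance to terminate the Phaser.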
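To illustrate the onAdvance hook, here is a minimal sketch (BoundedPhaser and phasesUntilTermination are made-up names) that overrides onAdvance so that the Phaser terminates after a fixed number of phases. It uses a single party, so every arrive() call completes a phase.

```java
import java.util.concurrent.Phaser;

class BoundedPhaser {
    // Counts how many arrivals it takes until the phaser reports termination.
    static int phasesUntilTermination(final int maxPhases) {
        Phaser phaser = new Phaser(1) {
            @Override
            protected boolean onAdvance(int phase, int registeredParties) {
                // Returning true sets the termination bit instead of moving on.
                return phase >= maxPhases - 1 || registeredParties == 0;
            }
        };
        int arrivals = 0;
        while (!phaser.isTerminated()) {
            phaser.arrive(); // the only party arrives, so the phase completes
            arrivals++;
        }
        return arrivals;
    }

    public static void main(String[] args) {
        System.out.println("arrivals before termination: " + phasesUntilTermination(3));
    }
}
```

After termination, getPhase() returns a negative value, which matches the `phase < 0` early returns in the quoted source.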
Now the awaitAdvance method:
public int awaitAdvance(int phase) {
    final Phaser root = this.root;
    long s = (root == this) ? state : reconcileState();
    int p = (int)(s >>> PHASE_SHIFT);
    if (phase < 0)
        return phase;
    if (p == phase)
        return root.internalAwaitAdvance(phase, null);
    return p;
}
private int internalAwaitAdvance(int phase, QNode node) {
    releaseWaiters(phase-1); // ensure old queue clean
    boolean queued = false; // true when node is enqueued
    int lastUnarrived = 0; // to increase spins upon change
    int spins = SPINS_PER_ARRIVAL;
    long s;
    int p;
    while ((p = (int)((s = state) >>> PHASE_SHIFT)) == phase) {
        if (node == null) { // spinning in noninterruptible mode
            int unarrived = (int)s & UNARRIVED_MASK;
            if (unarrived != lastUnarrived &&
                (lastUnarrived = unarrived) < NCPU)
                spins += SPINS_PER_ARRIVAL;
            boolean interrupted = Thread.interrupted();
            if (interrupted || --spins < 0) { // need node to record intr
                node = new QNode(this, phase, false, false, 0L);
                node.wasInterrupted = interrupted;
            }
        }
        else if (node.isReleasable()) // done or aborted
            break;
        else if (!queued) { // push onto queue
            AtomicReference<QNode> head = (phase & 1) == 0 ? evenQ : oddQ;
            QNode q = node.next = head.get();
            if ((q == null || q.phase == phase) &&
                (int)(state >>> PHASE_SHIFT) == phase) // avoid stale enq
                queued = head.compareAndSet(q, node);
        }
        else {
            try {
                ForkJoinPool.managedBlock(node);
            } catch (InterruptedException ie) {
                node.wasInterrupted = true;
            }
        }
    }
    if (node != null) {
        if (node.thread != null)
            node.thread = null; // avoid need for unpark()
        if (node.wasInterrupted && !node.interruptible)
            Thread.currentThread().interrupt();
        if (p == phase && (p = (int)(state >>> PHASE_SHIFT)) == phase)
            return abortWait(phase); // possibly clean up on abort
    }
    releaseWaiters(phase);
    return p;
}
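Phaser's await operation does not suspend the thread immediately. It first spins on the root Phaser's state, checking whether the phase has changed, and suspends the thread only if the phase is still unchanged after a number of spins (the count depends on the number of CPU cores). The reason is that suspending a thread causes a context switch: if the Phaser advances within a very short time, spinning avoids that switch and improves CPU throughput. Spinning itself burns CPU, though, so it cannot go on forever (that could saturate the CPU). As the arrive method above shows, once all parties have arrived the phase is incremented by 1, so the while condition in internalAwaitAdvance no longer holds; the loop exits and the waiting threads are woken. The method that wakes the threads is: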
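Before turning to that method, the spin-then-park idea can be sketched in isolation. This is not the JDK implementation: the class SpinThenPark, its fixed spin budget of 256, and the use of a ConcurrentLinkedQueue for waiters are all assumptions made for illustration, and interruption handling is left out entirely.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.LockSupport;

class SpinThenPark {
    static final int SPINS = 256; // illustrative spin budget, not tuned
    private final AtomicInteger phase = new AtomicInteger();
    private final Queue<Thread> waiters = new ConcurrentLinkedQueue<>();

    // Waits until the phase differs from `expected` and returns the new phase.
    int awaitAdvance(int expected) {
        int spins = SPINS;
        boolean queued = false;
        int p;
        while ((p = phase.get()) == expected) {
            if (spins > 0) {
                spins--;                        // cheap path: just re-read the phase
            } else if (!queued) {
                waiters.add(Thread.currentThread());
                queued = true;                  // loop again to re-check before parking
            } else {
                LockSupport.park(this);         // may return spuriously; the loop re-checks
            }
        }
        return p;
    }

    // Publishes the new phase first, then wakes every queued waiter.
    void advance() {
        phase.incrementAndGet();
        Thread t;
        while ((t = waiters.poll()) != null) {
            LockSupport.unpark(t);
        }
    }

    // One thread advances after roughly 20 ms while the caller waits on phase 0.
    static int demo() {
        final SpinThenPark sync = new SpinThenPark();
        new Thread(() -> {
            LockSupport.parkNanos(20_000_000L);
            sync.advance();
        }).start();
        return sync.awaitAdvance(0);
    }

    public static void main(String[] args) {
        System.out.println("advanced to phase " + demo());
    }
}
```

The waiter re-reads the phase after enqueuing itself and before parking, so an advance that happens in between is not missed. With that in mind, here is the wake-up method, releaseWaiters, from the JDK source: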
private void releaseWaiters(int phase) {
    QNode q; // first element of queue
    Thread t; // its thread
    AtomicReference<QNode> head = (phase & 1) == 0 ? evenQ : oddQ;
    while ((q = head.get()) != null &&
           q.phase != (int)(root.state >>> PHASE_SHIFT)) {
        if (head.compareAndSet(q, q.next) &&
            (t = q.thread) != null) {
            q.thread = null;
            LockSupport.unpark(t);
        }
    }
}
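This method is relatively simple. One thing to note: there are two wait-queue heads, evenQ and oddQ. The reason is that, under concurrency, while the threads queued for the old phase are still being woken after all parties have arrived, other threads may already be calling await on the advanced Phaser. With a single queue, the head node would be heavily contended at that moment, so the waiters of adjacent phases are placed in two separate queues to split the contention. (The head is contended because the wait queue is last-in-first-out. I have not figured out why it is not first-in-first-out instead; with FIFO, the contention in this situation would seem to disappear and the extra queue would not be needed.)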
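The last-in-first-out behaviour comes from the queue being a CAS-based stack: both pushing a waiter and popping one are a compare-and-set on the same head reference, which is why the two activities contend. A generic sketch of such a stack follows (the class TreiberStack is written for illustration and is not the Phaser's QNode code). One possible reason for preferring it over FIFO, offered here as a guess rather than something the quoted source states, is that a stack needs only one shared reference, while a lock-free FIFO queue has to coordinate a head and a tail.

```java
import java.util.concurrent.atomic.AtomicReference;

class TreiberStack<T> {
    private static final class Node<T> {
        final T item;
        Node<T> next;
        Node(T item) { this.item = item; }
    }

    private final AtomicReference<Node<T>> head = new AtomicReference<>();

    // Push: link the new node to the current head, then swing head to it with CAS.
    void push(T item) {
        Node<T> node = new Node<>(item);
        do {
            node.next = head.get();
        } while (!head.compareAndSet(node.next, node));
    }

    // Pop: swing head to its successor with CAS; returns null when the stack is empty.
    T pop() {
        Node<T> h;
        do {
            h = head.get();
            if (h == null) {
                return null;
            }
        } while (!head.compareAndSet(h, h.next));
        return h.item;
    }

    public static void main(String[] args) {
        TreiberStack<String> stack = new TreiberStack<>();
        stack.push("first");
        stack.push("second");
        System.out.println(stack.pop()); // the most recently pushed item comes out first
    }
}
```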