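Let's take a look at the implementation of LinkedTransferQueue, which is genuinely painful to read.
In my opinion, the complexity of this class shows up in three ways:
1. The design ideas are complex.
2. The implementation logic is complex.
3. The code comments are hard to follow. Even after several passes through the Youdao translator, I was still lost.
Still, it has to be read eventually. One does what one can, so here is my best attempt at recording an analysis of the LinkedTransferQueue class.
First, the class overview: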
/**
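* LinkedTransferQueue is an unbounded TransferQueue based on linked nodes. Elements are ordered FIFO with respect to any given producer: the head is the element that has been in the queue the longest, and the tail is the element that has been in the queue the shortest time.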
* An unbounded {@link TransferQueue} based on linked nodes.
* This queue orders elements FIFO (first-in-first-out) with respect
* to any given producer. The <em>head</em> of the queue is that
* element that has been on the queue the longest time for some
* producer. The <em>tail</em> of the queue is that element that has
* been on the queue the shortest time for some producer.
*
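* Unlike most other collections, the size() method of LinkedTransferQueue is NOT a constant-time operation.
* Because of the asynchronous nature of these queues, counting the elements requires a traversal, so the result may be inaccurate if the collection is modified during the traversal.
* In addition, the bulk operations addAll()/removeAll()/retainAll()/containsAll()/equals()/toArray() are not guaranteed to be performed atomically.
* For example, an iterator running concurrently with addAll() might see only some of the added elements.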
* <p>Beware that, unlike in most collections, the {@code size} method
* is <em>NOT</em> a constant-time operation. Because of the
* asynchronous nature of these queues, determining the current number
* of elements requires a traversal of the elements, and so may report
* inaccurate results if this collection is modified during traversal.
* Additionally, the bulk operations {@code addAll},
* {@code removeAll}, {@code retainAll}, {@code containsAll},
* {@code equals}, and {@code toArray} are <em>not</em> guaranteed
* to be performed atomically. For example, an iterator operating
* concurrently with an {@code addAll} operation might view only some
* of the added elements.
*
* <p>This class and its iterator implement all of the
* <em>optional</em> methods of the {@link Collection} and {@link
* Iterator} interfaces.
*
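* Memory consistency effects: as with other concurrent collections, actions in a thread before placing an object into a LinkedTransferQueue happen-before actions after the access or removal of that element in another thread.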
* <p>Memory consistency effects: As with other concurrent
* collections, actions in a thread prior to placing an object into a
* {@code LinkedTransferQueue}
* <a href="package-summary.html#MemoryVisibility"><i>happen-before</i></a>
* actions subsequent to the access or removal of that element from
* the {@code LinkedTransferQueue} in another thread.
*
* <p>This class is a member of the
* <a href="{@docRoot}/../technotes/guides/collections/index.html">
* Java Collections Framework</a>.
*
* @since 1.7
* @author Doug Lea
* @param <E> the type of elements held in this collection
*/
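From this we can see:
1. LinkedTransferQueue is an unbounded, linked-node TransferQueue that is FIFO with respect to any given producer. The head is the element that has been in the queue the longest, and the tail is the element that has been in the queue the shortest time.
2. Unlike most other collections, size() is not a constant-time method, because it has to traverse the internal nodes. If the number of elements changes during the traversal, the returned value may be inaccurate.
3. The bulk methods addAll()/removeAll()/retainAll()/containsAll()/equals()/toArray() are not guaranteed to execute atomically; a concurrent iterator may see only some of the elements involved.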
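The points above can be checked with a small single-threaded sketch. The class name SizeDemo and its helper methods are made up for illustration; only the LinkedTransferQueue calls come from the JDK.

```java
import java.util.concurrent.LinkedTransferQueue;

// Hypothetical demo class: shows FIFO order and that size() counts by traversal.
final class SizeDemo {
    // Adds n elements and returns what size() reports afterwards.
    static int sizeAfterAdding(int n) {
        LinkedTransferQueue<Integer> q = new LinkedTransferQueue<>();
        for (int i = 0; i < n; i++) {
            q.add(i); // unbounded queue: never blocks, never returns false
        }
        return q.size(); // walks the linked nodes, so it is O(n), not O(1)
    }

    // Returns the head after adding 0..n-1; the head is the oldest element.
    static int headAfterAdding(int n) {
        LinkedTransferQueue<Integer> q = new LinkedTransferQueue<>();
        for (int i = 0; i < n; i++) {
            q.add(i);
        }
        return q.peek();
    }

    public static void main(String[] args) {
        System.out.println(sizeAfterAdding(5));
        System.out.println(headAfterAdding(5));
    }
}
```

With no concurrent modification, the reported size is exact. The inaccuracy described in the Javadoc only shows up when other threads change the queue while size() is traversing it.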
The main fields of the class are:
/** Whether the JVM is running on a multiprocessor (more than one available processor) */
/** True if on multiprocessor */
private static final boolean MP =
Runtime.getRuntime().availableProcessors() > 1;
/**
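* The number of times to spin on a multiprocessor (with randomly interspersed calls to Thread.yield) before blocking, when a node appears to be the first waiter in the queue.
* Must be a power of two. The value was derived empirically: it works well across a variety of processors, CPU counts, and operating systems.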
* The number of times to spin (with randomly interspersed calls
* to Thread.yield) on multiprocessor before blocking when a node
* is apparently the first waiter in the queue. See above for
* explanation. Must be a power of two. The value is empirically
* derived -- it works pretty well across a variety of processors,
* numbers of CPUs, and OSes.
*/
private static final int FRONT_SPINS = 1 << 7;
/**
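* The number of times to spin before blocking when a node is preceded by another node that is apparently spinning. It also serves as the increment added to FRONT_SPINS on phase changes, and as the base average frequency for yielding during spins. Must be a power of two.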
* The number of times to spin before blocking when a node is
* preceded by another node that is apparently spinning. Also
* serves as an increment to FRONT_SPINS on phase changes, and as
* base average frequency for yielding during spins. Must be a
* power of two.
*/
private static final int CHAINED_SPINS = FRONT_SPINS >>> 1;
/**
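* The maximum number of estimated removal failures (sweepVotes) to tolerate before sweeping through the queue to unlink cancelled nodes. The value must be at least two, to avoid useless sweeps when removing trailing nodes.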
* The maximum number of estimated removal failures (sweepVotes)
* to tolerate before sweeping through the queue unlinking
* cancelled nodes that were not unlinked upon initial
* removal. See above for explanation. The value must be at least
* two to avoid useless sweeps when removing trailing nodes.
*/
static final int SWEEP_THRESHOLD = 32;
/** Head of the queue; null until the first element is enqueued */
/** head of the queue; null until first enqueue */
transient volatile Node head;
/** Tail of the queue; null until the first element is appended */
/** tail of the queue; null until first append */
private transient volatile Node tail;
/** Number of apparent failures to unsplice removed nodes */
/** The number of apparent failures to unsplice removed nodes */
private transient volatile int sweepVotes;
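To make the two spin constants concrete, here is a rough sketch of how a spin budget could be chosen from them. It follows the rules described in the comments (front-of-queue spins, half-size "chained" spins, extra spins on a phase change). The class SpinSketch, the method chooseSpins, and its boolean parameters are hypothetical and simplified; they are not the JDK's own code.

```java
// Hypothetical sketch: picking a spin count from FRONT_SPINS and CHAINED_SPINS.
final class SpinSketch {
    static final int FRONT_SPINS = 1 << 7;              // 128
    static final int CHAINED_SPINS = FRONT_SPINS >>> 1; // 64

    // multiprocessor : spinning is only useful with more than one CPU
    // hasPred        : whether the waiting node has a known predecessor
    // predModeDiffers: predecessor has the opposite data/request mode (phase change)
    // predMatched    : predecessor is already matched, so this node looks like the front
    // predParked     : predecessor has a parked (blocked) waiter thread
    static int chooseSpins(boolean multiprocessor, boolean hasPred,
                           boolean predModeDiffers, boolean predMatched,
                           boolean predParked) {
        if (!multiprocessor || !hasPred) {
            return 0; // block right away
        }
        if (predModeDiffers) {
            return FRONT_SPINS + CHAINED_SPINS; // phase change: extra chained spins
        }
        if (predMatched) {
            return FRONT_SPINS; // apparently the first waiter
        }
        if (!predParked) {
            return CHAINED_SPINS; // predecessor is apparently still spinning
        }
        return 0;
    }

    // Both constants must be powers of two.
    static boolean isPowerOfTwo(int n) {
        return n > 0 && (n & (n - 1)) == 0;
    }

    public static void main(String[] args) {
        System.out.println(chooseSpins(true, true, true, false, false));
        System.out.println(chooseSpins(true, true, false, true, false));
        System.out.println(chooseSpins(true, true, false, false, false));
    }
}
```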
The Node data structure is shown below. It is implemented with the Unsafe class, which is briefly introduced later:
static final class Node {
final boolean isData; // false if this is a request node
volatile Object item; // initially non-null if isData; CASed to match
volatile Node next;
volatile Thread waiter; // null until waiting
// CAS methods for fields
final boolean casNext(Node cmp, Node val) {
return UNSAFE.compareAndSwapObject(this, nextOffset, cmp, val);
}
final boolean casItem(Object cmp, Object val) {
// assert cmp == null || cmp.getClass() != Node.class;
return UNSAFE.compareAndSwapObject(this, itemOffset, cmp, val);
}
/**
* Constructs a new node. Uses relaxed write because item can
* only be seen after publication via casNext.
*/
Node(Object item, boolean isData) {
UNSAFE.putObject(this, itemOffset, item); // relaxed write
this.isData = isData;
}
/**
* Links node to itself to avoid garbage retention. Called
* only after CASing head field, so uses relaxed write.
*/
final void forgetNext() {
UNSAFE.putObject(this, nextOffset, this);
}
/**
* Sets item to self and waiter to null, to avoid garbage
* retention after matching or cancelling. Uses relaxed writes
* because order is already constrained in the only calling
* contexts: item is forgotten only after volatile/atomic
* mechanics that extract items. Similarly, clearing waiter
* follows either CAS or return from park (if ever parked;
* else we don't care).
*/
final void forgetContents() {
UNSAFE.putObject(this, itemOffset, this);
UNSAFE.putObject(this, waiterOffset, null);
}
/**
* Returns true if this node has been matched, including the
* case of artificial matches due to cancellation.
*/
final boolean isMatched() {
Object x = item;
return (x == this) || ((x == null) == isData);
}
/**
* Returns true if this is an unmatched request node.
*/
final boolean isUnmatchedRequest() {
return !isData && item == null;
}
/**
* Returns true if a node with the given mode cannot be
* appended to this node because this node is unmatched and
* has opposite data mode.
*/
final boolean cannotPrecede(boolean haveData) {
boolean d = isData;
Object x;
return d != haveData && (x = item) != this && (x != null) == d;
}
/**
* Tries to artificially match a data node -- used by remove.
*/
final boolean tryMatchData() {
// assert isData;
Object x = item;
if (x != null && x != this && casItem(x, null)) {
LockSupport.unpark(waiter);
return true;
}
return false;
}
private static final long serialVersionUID = -3375979862319811754L;
// Unsafe mechanics
private static final sun.misc.Unsafe UNSAFE;
private static final long itemOffset;
private static final long nextOffset;
private static final long waiterOffset;
static {
try {
UNSAFE = sun.misc.Unsafe.getUnsafe();
Class<?> k = Node.class;
itemOffset = UNSAFE.objectFieldOffset
(k.getDeclaredField("item"));
nextOffset = UNSAFE.objectFieldOffset
(k.getDeclaredField("next"));
waiterOffset = UNSAFE.objectFieldOffset
(k.getDeclaredField("waiter"));
} catch (Exception e) {
throw new Error(e);
}
}
}
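The match-state rules encoded in isMatched() are easier to see in a stripped-down model. The sketch below replaces Unsafe with AtomicReference and keeps only the item and isData fields; the class MiniNode and its methods tryMatch and demo are hypothetical names for illustration, not part of the JDK class.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical simplified node: AtomicReference stands in for the Unsafe CAS on "item".
final class MiniNode {
    final boolean isData;               // false if this is a request node
    final AtomicReference<Object> item; // data: non-null until matched; request: null until matched

    MiniNode(Object item, boolean isData) {
        this.isData = isData;
        this.item = new AtomicReference<>(item);
    }

    // Same rule as Node.isMatched(): self-linked item, or item state opposite to the mode.
    boolean isMatched() {
        Object x = item.get();
        return x == this || (x == null) == isData;
    }

    // A consumer matches a data node by CASing value -> null;
    // a producer matches a request node by CASing null -> value.
    boolean tryMatch(Object expected, Object replacement) {
        return item.compareAndSet(expected, replacement);
    }

    // Walks a data node and a request node through their states.
    static String demo() {
        Object v = "a";
        MiniNode data = new MiniNode(v, true);
        StringBuilder sb = new StringBuilder();
        sb.append(data.isMatched());                    // unmatched data node
        sb.append(',').append(data.tryMatch(v, null));  // first taker wins
        sb.append(',').append(data.isMatched());        // now matched
        sb.append(',').append(data.tryMatch(v, null));  // second taker loses the CAS
        MiniNode request = new MiniNode(null, false);
        sb.append(',').append(request.isMatched());           // unmatched request node
        sb.append(',').append(request.tryMatch(null, "b"));   // producer fills the request
        sb.append(',').append(request.isMatched());           // now matched
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because a match is a single CAS on item, exactly one of several competing threads can succeed, which is what lets xfer() match a node without holding a lock.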
Next, the method implementations. We start with one of the most important private methods in the class, xfer():
/**
* The importance of this method is clear from its comment: it implements all queuing methods.
* Implements all queuing methods. See above for explanation.
*
* @param e the item or null for take
* @param haveData true if this is a put, else a take
* @param how NOW, ASYNC, SYNC, or TIMED
* @param nanos timeout in nanosecs, used only if mode is TIMED
* @return an item if matched, else e
* @throws NullPointerException if haveData mode but e is null
*/
private E xfer(E e, boolean haveData, int how, long nanos) {
if (haveData && (e == null))
throw new NullPointerException();
Node s = null; // the node to append, if needed
retry:
for (;;) { // restart on append race
for (Node h = head, p = h; p != null;) { // find & match first node
boolean isData = p.isData;
Object item = p.item;
if (item != p && (item != null) == isData) { // unmatched
if (isData == haveData) // can't match
break;
if (p.casItem(item, e)) { // match
for (Node q = p; q != h;) {
Node n = q.next; // update by 2 unless singleton
if (head == h && casHead(h, n == null ? q : n)) {
h.forgetNext();
break;
} // advance and retry
if ((h = head) == null ||
(q = h.next) == null || !q.isMatched())
break; // unless slack < 2
}
LockSupport.unpark(p.waiter);
return LinkedTransferQueue.<E>cast(item);
}
}
Node n = p.next;
p = (p != n) ? n : (h = head); // Use head if p offlist
}
if (how != NOW) { // No matches available
if (s == null)
s = new Node(e, haveData);
Node pred = tryAppend(s, haveData);
if (pred == null)
continue retry; // lost race vs opposite mode
if (how != ASYNC)
return awaitMatch(s, pred, e, (how == TIMED), nanos);
}
return e; // not waiting
}
}
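Essentially every queue operation calls this private method internally; one method serves all needs.
Its parameter list includes how, which selects the mode of the call. how takes one of four values: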
/*
* Possible values for "how" argument in xfer method.
*/
private static final int NOW = 0; // for untimed poll, tryTransfer
private static final int ASYNC = 1; // for offer, put, add
private static final int SYNC = 2; // for transfer, take
private static final int TIMED = 3; // for timed poll, tryTransfer
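The four modes are private, but their effect can be observed through the public methods that pass them to xfer(). The following single-threaded sketch exercises NOW, ASYNC, and TIMED (SYNC needs a second thread). The class ModesDemo and its run method are hypothetical names; the 10 ms timeout is an arbitrary choice.

```java
import java.util.concurrent.LinkedTransferQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class: observes the "how" modes through the public API.
final class ModesDemo {
    static String run() {
        LinkedTransferQueue<String> q = new LinkedTransferQueue<>();
        StringBuilder sb = new StringBuilder();
        sb.append(q.poll());                        // NOW: empty queue, returns null at once
        sb.append(',').append(q.tryTransfer("x"));  // NOW: no waiting consumer, false, "x" is not enqueued
        sb.append(',').append(q.size());            // still 0
        sb.append(',').append(q.offer("y"));        // ASYNC: appended without waiting, true
        sb.append(',').append(q.size());            // 1
        boolean transferred;
        try {
            // TIMED: "z" is appended, then cancelled when no consumer arrives in time.
            transferred = q.tryTransfer("z", 10, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            transferred = false;
        }
        sb.append(',').append(transferred);         // false after the timeout
        sb.append(',').append(q.size());            // 1: the cancelled "z" is not counted
        sb.append(',').append(q.poll());            // "y" is still the head
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```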
The main methods:
/**
* Inserts the element at the tail of the queue. Because the queue is unbounded, this method never blocks.
* Inserts the specified element at the tail of this queue.
* As the queue is unbounded, this method will never block.
*
* @throws NullPointerException if the specified element is null
*/
public void put(E e) {
xfer(e, true, ASYNC, 0);
}
/**
* Inserts the element at the tail of the queue. Because the queue is unbounded, this method never blocks and never returns false.
* Inserts the specified element at the tail of this queue.
* As the queue is unbounded, this method will never block or
* return {@code false}.
*
* @return {@code true} (as specified by
* {@link java.util.concurrent.BlockingQueue#offer(Object,long,TimeUnit)
* BlockingQueue.offer})
* @throws NullPointerException if the specified element is null
*/
public boolean offer(E e, long timeout, TimeUnit unit) {
xfer(e, true, ASYNC, 0);
return true;
}
/**
* Inserts the element at the tail of the queue. Because the queue is unbounded, this method never returns false.
* Inserts the specified element at the tail of this queue.
* As the queue is unbounded, this method will never return {@code false}.
*
* @return {@code true} (as specified by {@link Queue#offer})
* @throws NullPointerException if the specified element is null
*/
public boolean offer(E e) {
xfer(e, true, ASYNC, 0);
return true;
}
/**
* Inserts the element at the tail of the queue. Because the queue is unbounded, this method never returns false and never throws IllegalStateException.
* Inserts the specified element at the tail of this queue.
* As the queue is unbounded, this method will never throw
* {@link IllegalStateException} or return {@code false}.
*
* @return {@code true} (as specified by {@link Collection#add})
* @throws NullPointerException if the specified element is null
*/
public boolean add(E e) {
xfer(e, true, ASYNC, 0);
return true;
}
/**
* Transfers the element to a waiting consumer immediately, if one exists.
* Transfers the element to a waiting consumer immediately, if possible.
*
* <p>More precisely, transfers the specified element immediately
* if there exists a consumer already waiting to receive it (in
* {@link #take} or timed {@link #poll(long,TimeUnit) poll}),
* otherwise returning {@code false} without enqueuing the element.
*
* @throws NullPointerException if the specified element is null
*/
public boolean tryTransfer(E e) {
return xfer(e, true, NOW, 0) == null;
}
/**
* Transfers the element to a consumer, waiting if necessary.
* Transfers the element to a consumer, waiting if necessary to do so.
*
* <p>More precisely, transfers the specified element immediately
* if there exists a consumer already waiting to receive it (in
* {@link #take} or timed {@link #poll(long,TimeUnit) poll}),
* else inserts the specified element at the tail of this queue
* and waits until the element is received by a consumer.
*
* @throws NullPointerException if the specified element is null
*/
public void transfer(E e) throws InterruptedException {
if (xfer(e, true, SYNC, 0) != null) {
Thread.interrupted(); // failure possible only due to interrupt
throw new InterruptedException();
}
}
/**
* Transfers the element to a consumer, waiting if necessary until the timeout elapses.
* Transfers the element to a consumer if it is possible to do so
* before the timeout elapses.
*
* <p>More precisely, transfers the specified element immediately
* if there exists a consumer already waiting to receive it (in
* {@link #take} or timed {@link #poll(long,TimeUnit) poll}),
* else inserts the specified element at the tail of this queue
* and waits until the element is received by a consumer,
* returning {@code false} if the specified wait time elapses
* before the element can be transferred.
*
* @throws NullPointerException if the specified element is null
*/
public boolean tryTransfer(E e, long timeout, TimeUnit unit)
throws InterruptedException {
if (xfer(e, true, TIMED, unit.toNanos(timeout)) == null)
return true;
if (!Thread.interrupted())
return false;
throw new InterruptedException();
}
public E take() throws InterruptedException {
E e = xfer(null, false, SYNC, 0);
if (e != null)
return e;
Thread.interrupted();
throw new InterruptedException();
}
public E poll(long timeout, TimeUnit unit) throws InterruptedException {
E e = xfer(null, false, TIMED, unit.toNanos(timeout));
if (e != null || !Thread.interrupted())
return e;
throw new InterruptedException();
}
public E poll() {
return xfer(null, false, NOW, 0);
}
As you can see, essentially every method calls xfer() internally to do the work.
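The SYNC mode used by transfer() and take() only makes sense with two threads. Here is a minimal sketch of a hand-off, assuming one producer and one consumer; the class HandoffDemo and its handoff method are hypothetical names.

```java
import java.util.concurrent.LinkedTransferQueue;

// Hypothetical demo class: one producer hands a value directly to one consumer.
final class HandoffDemo {
    static String handoff(String value) {
        LinkedTransferQueue<String> q = new LinkedTransferQueue<>();
        String[] received = new String[1];
        Thread consumer = new Thread(() -> {
            try {
                received[0] = q.take(); // SYNC: waits until a producer supplies an element
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();
        try {
            q.transfer(value); // SYNC: returns only after a consumer has received the element
            consumer.join();   // join() makes the consumer's write to received[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
        return received[0];
    }

    public static void main(String[] args) {
        System.out.println(handoff("hello"));
    }
}
```

Whichever thread reaches the queue first appends a node and waits; the other thread finds that node in xfer() and matches it, so the order in which the two threads start does not matter.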
In addition:
The class also contains a rather long explanatory comment, shown below:
/*
* *** Overview of Dual Queues with Slack ***
*
* Dual Queues, introduced by Scherer and Scott
* (http://www.cs.rice.edu/~wns1/papers/2004-DISC-DDS.pdf) are
* (linked) queues in which nodes may represent either data or
* requests. When a thread tries to enqueue a data node, but
* encounters a request node, it instead "matches" and removes it;
* and vice versa for enqueuing requests. Blocking Dual Queues
* arrange that threads enqueuing unmatched requests block until
* other threads provide the match. Dual Synchronous Queues (see
* Scherer, Lea, & Scott
* http://www.cs.rochester.edu/u/scott/papers/2009_Scherer_CACM_SSQ.pdf)
* additionally arrange that threads enqueuing unmatched data also
* block. Dual Transfer Queues support all of these modes, as
* dictated by callers.
*
* A FIFO dual queue may be implemented using a variation of the
* Michael & Scott (M&S) lock-free queue algorithm
* (http://www.cs.rochester.edu/u/scott/papers/1996_PODC_queues.pdf).
* It maintains two pointer fields, "head", pointing to a
* (matched) node that in turn points to the first actual
* (unmatched) queue node (or null if empty); and "tail" that
* points to the last node on the queue (or again null if
* empty). For example, here is a possible queue with four data
* elements:
*
* head tail
* | |
* v v
* M -> U -> U -> U -> U
*
* The M&S queue algorithm is known to be prone to scalability and
* overhead limitations when maintaining (via CAS) these head and
* tail pointers. This has led to the development of
* contention-reducing variants such as elimination arrays (see
* Moir et al http://portal.acm.org/citation.cfm?id=1074013) and
* optimistic back pointers (see Ladan-Mozes & Shavit
* http://people.csail.mit.edu/edya/publications/OptimisticFIFOQueue-journal.pdf).
* However, the nature of dual queues enables a simpler tactic for
* improving M&S-style implementations when dual-ness is needed.
*
* In a dual queue, each node must atomically maintain its match
* status. While there are other possible variants, we implement
* this here as: for a data-mode node, matching entails CASing an
* "item" field from a non-null data value to null upon match, and
* vice-versa for request nodes, CASing from null to a data
* value. (Note that the linearization properties of this style of
* queue are easy to verify -- elements are made available by
* linking, and unavailable by matching.) Compared to plain M&S
* queues, this property of dual queues requires one additional
* successful atomic operation per enq/deq pair. But it also
* enables lower cost variants of queue maintenance mechanics. (A
* variation of this idea applies even for non-dual queues that
* support deletion of interior elements, such as
* j.u.c.ConcurrentLinkedQueue.)
*
* Once a node is matched, its match status can never again
* change. We may thus arrange that the linked list of them
* contain a prefix of zero or more matched nodes, followed by a
* suffix of zero or more unmatched nodes. (Note that we allow
* both the prefix and suffix to be zero length, which in turn
* means that we do not use a dummy header.) If we were not
* concerned with either time or space efficiency, we could
* correctly perform enqueue and dequeue operations by traversing
* from a pointer to the initial node; CASing the item of the
* first unmatched node on match and CASing the next field of the
* trailing node on appends. (Plus some special-casing when
* initially empty). While this would be a terrible idea in
* itself, it does have the benefit of not requiring ANY atomic
* updates on head/tail fields.
*
* We introduce here an approach that lies between the extremes of
* never versus always updating queue (head and tail) pointers.
* This offers a tradeoff between sometimes requiring extra
* traversal steps to locate the first and/or last unmatched
* nodes, versus the reduced overhead and contention of fewer
* updates to queue pointers. For example, a possible snapshot of
* a queue is:
*
* head tail
* | |
* v v
* M -> M -> U -> U -> U -> U
*
* The best value for this "slack" (the targeted maximum distance
* between the value of "head" and the first unmatched node, and
* similarly for "tail") is an empirical matter. We have found
* that using very small constants in the range of 1-3 work best
* over a range of platforms. Larger values introduce increasing
* costs of cache misses and risks of long traversal chains, while
* smaller values increase CAS contention and overhead.
*
* Dual queues with slack differ from plain M&S dual queues by
* virtue of only sometimes updating head or tail pointers when
* matching, appending, or even traversing nodes; in order to
* maintain a targeted slack. The idea of "sometimes" may be
* operationalized in several ways. The simplest is to use a
* per-operation counter incremented on each traversal step, and
* to try (via CAS) to update the associated queue pointer
* whenever the count exceeds a threshold. Another, that requires
* more overhead, is to use random number generators to update
* with a given probability per traversal step.
*
* In any strategy along these lines, because CASes updating
* fields may fail, the actual slack may exceed targeted
* slack. However, they may be retried at any time to maintain
* targets. Even when using very small slack values, this
* approach works well for dual queues because it allows all
* operations up to the point of matching or appending an item
* (hence potentially allowing progress by another thread) to be
* read-only, thus not introducing any further contention. As
* described below, we implement this by performing slack
* maintenance retries only after these points.
*
* As an accompaniment to such techniques, traversal overhead can
* be further reduced without increasing contention of head
* pointer updates: Threads may sometimes shortcut the "next" link
* path from the current "head" node to be closer to the currently
* known first unmatched node, and similarly for tail. Again, this
* may be triggered with using thresholds or randomization.
*
* These ideas must be further extended to avoid unbounded amounts
* of costly-to-reclaim garbage caused by the sequential "next"
* links of nodes starting at old forgotten head nodes: As first
* described in detail by Boehm
* (http://portal.acm.org/citation.cfm?doid=503272.503282) if a GC
* delays noticing that any arbitrarily old node has become
* garbage, all newer dead nodes will also be unreclaimed.
* (Similar issues arise in non-GC environments.) To cope with
* this in our implementation, upon CASing to advance the head
* pointer, we set the "next" link of the previous head to point
* only to itself; thus limiting the length of connected dead lists.
* (We also take similar care to wipe out possibly garbage
* retaining values held in other Node fields.) However, doing so
* adds some further complexity to traversal: If any "next"
* pointer links to itself, it indicates that the current thread
* has lagged behind a head-update, and so the traversal must
* continue from the "head". Traversals trying to find the
* current tail starting from "tail" may also encounter
* self-links, in which case they also continue at "head".
*
* It is tempting in slack-based scheme to not even use CAS for
* updates (similarly to Ladan-Mozes & Shavit). However, this
* cannot be done for head updates under the above link-forgetting
* mechanics because an update may leave head at a detached node.
* And while direct writes are possible for tail updates, they
* increase the risk of long retraversals, and hence long garbage
* chains, which can be much more costly than is worthwhile
* considering that the cost difference of performing a CAS vs
* write is smaller when they are not triggered on each operation
* (especially considering that writes and CASes equally require
* additional GC bookkeeping ("write barriers") that are sometimes
* more costly than the writes themselves because of contention).
*
* *** Overview of implementation ***
*
* We use a threshold-based approach to updates, with a slack
* threshold of two -- that is, we update head/tail when the
* current pointer appears to be two or more steps away from the
* first/last node. The slack value is hard-wired: a path greater
* than one is naturally implemented by checking equality of
* traversal pointers except when the list has only one element,
* in which case we keep slack threshold at one. Avoiding tracking
* explicit counts across method calls slightly simplifies an
* already-messy implementation. Using randomization would
* probably work better if there were a low-quality dirt-cheap
* per-thread one available, but even ThreadLocalRandom is too
* heavy for these purposes.
*
* With such a small slack threshold value, it is not worthwhile
* to augment this with path short-circuiting (i.e., unsplicing
* interior nodes) except in the case of cancellation/removal (see
* below).
*
* We allow both the head and tail fields to be null before any
* nodes are enqueued; initializing upon first append. This
* simplifies some other logic, as well as providing more
* efficient explicit control paths instead of letting JVMs insert
* implicit NullPointerExceptions when they are null. While not
* currently fully implemented, we also leave open the possibility
* of re-nulling these fields when empty (which is complicated to
* arrange, for little benefit.)
*
* All enqueue/dequeue operations are handled by the single method
* "xfer" with parameters indicating whether to act as some form
* of offer, put, poll, take, or transfer (each possibly with
* timeout). The relative complexity of using one monolithic
* method outweighs the code bulk and maintenance problems of
* using separate methods for each case.
*
* Operation consists of up to three phases. The first is
* implemented within method xfer, the second in tryAppend, and
* the third in method awaitMatch.
*
* 1. Try to match an existing node
*
* Starting at head, skip already-matched nodes until finding
* an unmatched node of opposite mode, if one exists, in which
* case matching it and returning, also if necessary updating
* head to one past the matched node (or the node itself if the
* list has no other unmatched nodes). If the CAS misses, then
* a loop retries advancing head by two steps until either
* success or the slack is at most two. By requiring that each
* attempt advances head by two (if applicable), we ensure that
* the slack does not grow without bound. Traversals also check
* if the initial head is now off-list, in which case they
* start at the new head.
*
* If no candidates are found and the call was untimed
* poll/offer, (argument "how" is NOW) return.
*
* 2. Try to append a new node (method tryAppend)
*
* Starting at current tail pointer, find the actual last node
* and try to append a new node (or if head was null, establish
* the first node). Nodes can be appended only if their
* predecessors are either already matched or are of the same
* mode. If we detect otherwise, then a new node with opposite
* mode must have been appended during traversal, so we must
* restart at phase 1. The traversal and update steps are
* otherwise similar to phase 1: Retrying upon CAS misses and
* checking for staleness. In particular, if a self-link is
* encountered, then we can safely jump to a node on the list
* by continuing the traversal at current head.
*
* On successful append, if the call was ASYNC, return.
*
* 3. Await match or cancellation (method awaitMatch)
*
* Wait for another thread to match node; instead cancelling if
* the current thread was interrupted or the wait timed out. On
* multiprocessors, we use front-of-queue spinning: If a node
* appears to be the first unmatched node in the queue, it
* spins a bit before blocking. In either case, before blocking
* it tries to unsplice any nodes between the current "head"
* and the first unmatched node.
*
* Front-of-queue spinning vastly improves performance of
* heavily contended queues. And so long as it is relatively
* brief and "quiet", spinning does not much impact performance
* of less-contended queues. During spins threads check their
* interrupt status and generate a thread-local random number
* to decide to occasionally perform a Thread.yield. While
* yield has underdefined specs, we assume that it might help,
* and will not hurt, in limiting impact of spinning on busy
* systems. We also use smaller (1/2) spins for nodes that are
* not known to be front but whose predecessors have not
* blocked -- these "chained" spins avoid artifacts of
* front-of-queue rules which otherwise lead to alternating
* nodes spinning vs blocking. Further, front threads that
* represent phase changes (from data to request node or vice
* versa) compared to their predecessors receive additional
* chained spins, reflecting longer paths typically required to
* unblock threads during phase changes.
*
*
* ** Unlinking removed interior nodes **
*
* In addition to minimizing garbage retention via self-linking
* described above, we also unlink removed interior nodes. These
* may arise due to timed out or interrupted waits, or calls to
* remove(x) or Iterator.remove. Normally, given a node that was
* at one time known to be the predecessor of some node s that is
* to be removed, we can unsplice s by CASing the next field of
* its predecessor if it still points to s (otherwise s must
* already have been removed or is now offlist). But there are two
* situations in which we cannot guarantee to make node s
* unreachable in this way: (1) If s is the trailing node of list
* (i.e., with null next), then it is pinned as the target node
* for appends, so can only be removed later after other nodes are
* appended. (2) We cannot necessarily unlink s given a
* predecessor node that is matched (including the case of being
* cancelled): the predecessor may already be unspliced, in which
* case some previous reachable node may still point to s.
* (For further explanation see Herlihy & Shavit "The Art of
* Multiprocessor Programming" chapter 9). Although, in both
* cases, we can rule out the need for further action if either s
* or its predecessor are (or can be made to be) at, or fall off
* from, the head of list.
*
* Without taking these into account, it would be possible for an
* unbounded number of supposedly removed nodes to remain
* reachable. Situations leading to such buildup are uncommon but
* can occur in practice; for example when a series of short timed
* calls to poll repeatedly time out but never otherwise fall off
* the list because of an untimed call to take at the front of the
* queue.
*
* When these cases arise, rather than always retraversing the
* entire list to find an actual predecessor to unlink (which
* won't help for case (1) anyway), we record a conservative
* estimate of possible unsplice failures (in "sweepVotes").
* We trigger a full sweep when the estimate exceeds a threshold
* ("SWEEP_THRESHOLD") indicating the maximum number of estimated
* removal failures to tolerate before sweeping through, unlinking
* cancelled nodes that were not unlinked upon initial removal.
* We perform sweeps by the thread hitting threshold (rather than
* background threads or by spreading work to other threads)
* because in the main contexts in which removal occurs, the
* caller is already timed-out, cancelled, or performing a
* potentially O(n) operation (e.g. remove(x)), none of which are
* time-critical enough to warrant the overhead that alternatives
* would impose on other threads.
*
* Because the sweepVotes estimate is conservative, and because
* nodes become unlinked "naturally" as they fall off the head of
* the queue, and because we allow votes to accumulate even while
* sweeps are in progress, there are typically significantly fewer
* such nodes than estimated. Choice of a threshold value
* balances the likelihood of wasted effort and contention, versus
* providing a worst-case bound on retention of interior nodes in
* quiescent queues. The value defined below was chosen
* empirically to balance these under various timeout scenarios.
*
* Note that we cannot self-link unlinked interior nodes during
* sweeps. However, the associated garbage chains terminate when
* some successor ultimately falls off the head of the list and is
* self-linked.
*/
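I read this comment and tried several times to translate and understand it, and came away with only one feeling:
complete bewilderment. I was lost from start to finish, almost to the point of tears.
I made a rough translation, given below, that can be read side by side with the original. How much insight you can draw from it is up to you: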
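When a thread tries to enqueue a data node but encounters a request node, it "matches" and removes that node instead; the same holds in reverse for enqueuing requests. Blocking dual queues make threads that enqueue unmatched requests block until another thread provides the match. Dual synchronous queues (see Scherer, Lea, & Scott http://www.cs.rochester.edu/u/scott/papers/2009_Scherer_CACM_SSQ.pdf) additionally make threads that enqueue unmatched data block as well. Dual transfer queues support all of these modes, as chosen by the caller.
A FIFO dual queue can be implemented with a variation of the Michael & Scott (M&S) lock-free queue algorithm (http://www.cs.rochester.edu/u/scott/papers/1996_PODC_queues.pdf). It maintains two pointer fields: "head", which points to a (matched) node that in turn points to the first actual (unmatched) queue node (or null if empty); and "tail", which points to the last node in the queue (again null if empty). For example, here is a possible queue with four data elements: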
head tail
| |
v v
M -> U -> U -> U -> U
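Maintaining the head and tail pointers via CAS is known to limit the scalability of the M&S algorithm and to add overhead. This led to contention-reducing variants such as elimination arrays (see Moir et al http://portal.acm.org/citation.cfm?id=1074013) and optimistic back pointers (see Ladan-Mozes & Shavit http://people.csail.mit.edu/edya/publications/OptimisticFIFOQueue-journal.pdf). However, when dual-ness is needed, the nature of dual queues allows a simpler tactic for improving M&S-style implementations.
In a dual queue, each node must atomically maintain its match status. There are other possible variants, but here it is implemented as follows: for a data-mode node, matching means CASing the "item" field from a non-null data value to null; for a request node it is the reverse, CASing from null to a data value. (Note that the linearization properties of this style of queue are easy to verify: elements become available by linking and unavailable by matching.) Compared with plain M&S queues, this property of dual queues requires one additional successful atomic operation per enq/deq pair. But it also enables cheaper queue maintenance mechanics. (A variation of this idea applies even to non-dual queues that support deletion of interior elements, such as j.u.c.ConcurrentLinkedQueue.)
Once a node is matched, its match status never changes again. We can therefore arrange for the linked list to consist of a prefix of zero or more matched nodes followed by a suffix of zero or more unmatched nodes. (Note that both the prefix and the suffix may have zero length, which in turn means that no dummy header is used.) If we did not care about time or space efficiency, we could perform enqueue and dequeue operations correctly by traversing from a pointer to the initial node: CASing the item of the first unmatched node on a match, and CASing the next field of the trailing node on an append. (Plus some special-casing when the queue is initially empty.) This would be a terrible idea in itself, but it has the benefit of not requiring ANY atomic updates to the head/tail fields.
We introduce here an approach that lies between the two extremes of never and always updating the queue (head and tail) pointers. It trades occasional extra traversal steps to locate the first and/or last unmatched node against the reduced overhead and contention of fewer updates to the queue pointers. For example, a possible snapshot of a queue is: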
head tail
| |
v v
M -> M -> U -> U -> U -> U
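The best value for this "slack" (the targeted maximum distance between "head" and the first unmatched node, and similarly for "tail") is an empirical matter. We found that very small constants in the range 1-3 work best across a range of platforms. Larger values increase the cost of cache misses and the risk of long traversal chains, while smaller values increase CAS contention and overhead.
As an accompaniment to this technique, traversal overhead can be reduced further without increasing contention on head pointer updates: threads may sometimes shortcut the "next" link path from the current "head" node so that it lands closer to the currently known first unmatched node, and similarly for tail. Again, this can be triggered using thresholds or randomization.
These ideas must be extended further to avoid unbounded amounts of costly-to-reclaim garbage caused by the sequential "next" links of nodes starting at old, forgotten head nodes. As first described in detail by Boehm (http://portal.acm.org/citation.cfm?doid=503272.503282), if a GC delays noticing that some arbitrarily old node has become garbage, all newer dead nodes will also go unreclaimed. (Similar issues arise in non-GC environments.) To cope with this, after CASing to advance the head pointer, the implementation sets the "next" link of the previous head to point only to itself, which limits the length of connected dead lists. (Similar care is taken to wipe out values in other Node fields that might retain garbage.) However, this adds some complexity to traversal: if any "next" pointer links to itself, the current thread has lagged behind a head update, so the traversal must continue from "head". Traversals that start from "tail" to find the current tail may also encounter self-links, in which case they also continue at "head".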
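The self-link rule from the paragraph above can be shown in a single-threaded sketch. The class SelfLinkSketch, its nested class N, and the methods advanceHead and step are hypothetical; the real class uses CAS and relaxed writes where this sketch uses plain assignments.

```java
// Hypothetical single-threaded sketch of "forget next" and the traversal restart rule.
final class SelfLinkSketch {
    static final class N {
        final String name;
        N next;
        N(String name) { this.name = name; }
    }

    N head;

    // Advances head and self-links the old head, so the dead node no longer
    // keeps the rest of the list reachable.
    void advanceHead(N newHead) {
        N old = head;
        head = newHead;
        old.next = old; // self-link marks "this node is off the list"
    }

    // One traversal step: follow next, unless p is self-linked,
    // in which case the traversal lagged behind and restarts at head.
    N step(N p) {
        N n = p.next;
        return (p != n) ? n : head;
    }

    // Builds a -> b -> c, lets a traversal pointer go stale, then steps it.
    static String demo() {
        N a = new N("a"), b = new N("b"), c = new N("c");
        a.next = b;
        b.next = c;
        SelfLinkSketch q = new SelfLinkSketch();
        q.head = a;
        N stale = a;            // a traversal is parked on the old head
        q.advanceHead(b);       // meanwhile head moves on and a is self-linked
        return q.step(stale).name; // restart lands on the current head
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The step method uses the same test as the line `p = (p != n) ? n : (h = head)` in xfer().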
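In a slack-based scheme it is tempting not to use CAS for updates at all (similarly to Ladan-Mozes & Shavit). However, this cannot be done for head updates under the link-forgetting mechanics above, because an update could leave head at a detached node. Direct writes are possible for tail updates, but they increase the risk of long retraversals and hence of long garbage chains. That can cost far more than it saves, since the cost difference between a CAS and a plain write is small when they are not triggered on every operation (especially as writes and CASes equally require additional GC bookkeeping ("write barriers"), which is sometimes more costly than the writes themselves because of contention).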
*** Overview of implementation ***
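We use a threshold-based approach to updates, with a slack threshold of two: head/tail is updated when the current pointer appears to be two or more steps away from the first/last node. The slack value is hard-wired: a path greater than one is naturally implemented by checking equality of traversal pointers, except when the list has only one element, in which case the slack threshold is kept at one. Not tracking explicit counts across method calls slightly simplifies an already messy implementation. Randomization would probably work better if a low-quality, dirt-cheap per-thread generator were available, but even ThreadLocalRandom is too heavy for this purpose.
All enqueue/dequeue operations are handled by the single method "xfer", whose parameters indicate whether to act as some form of offer, put, poll, take, or transfer (each possibly with a timeout). The relative complexity of one monolithic method outweighs the code bulk and maintenance problems of separate methods for each case.
An operation consists of up to three phases. The first is implemented in method xfer, the second in tryAppend, and the third in awaitMatch.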
1. Try to match an existing node
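Starting at head, skip already-matched nodes until an unmatched node of opposite mode is found, if one exists; match it and return, updating head if necessary to one past the matched node (or to the node itself if the list has no other unmatched nodes). If the CAS misses, a loop retries advancing head by two steps until it either succeeds or the slack is at most two. Requiring each attempt to advance head by two (if applicable) ensures that the slack does not grow without bound. Traversals also check whether the initial head is now off-list, in which case they start at the new head.
If no candidate is found and the call was an untimed poll/offer (argument "how" is NOW), return.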
2. Try to append a new node (method tryAppend)
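Starting at the current tail pointer, find the actual last node and try to append a new node (or, if head was null, establish the first node). Nodes can be appended only if their predecessors are either already matched or of the same mode. If we detect otherwise, a new node of opposite mode must have been appended during the traversal, so we must restart at phase 1. The traversal and update steps are otherwise similar to phase 1: retrying on CAS misses and checking for staleness. In particular, if a self-link is encountered, we can safely jump to a node on the list by continuing the traversal at the current head.
On a successful append, if the call was ASYNC, return.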
3. Await match or cancellation (method awaitMatch)
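Wait for another thread to match the node; cancel instead if the current thread was interrupted or the wait timed out. On multiprocessors we use front-of-queue spinning: if a node appears to be the first unmatched node in the queue, it spins briefly before blocking. In either case, before blocking it tries to unsplice any nodes between the current "head" and the first unmatched node.
Front-of-queue spinning vastly improves the performance of heavily contended queues. As long as it is relatively brief and "quiet", spinning has little impact on the performance of less-contended queues. While spinning, threads check their interrupt status and generate a thread-local random number to decide whether to occasionally call Thread.yield. Although yield is underspecified, we assume that it might help, and will not hurt, in limiting the impact of spinning on busy systems. We also use smaller (1/2) spins for nodes that are not known to be at the front but whose predecessors have not blocked. These "chained" spins avoid artifacts of the front-of-queue rules that would otherwise lead to alternating nodes spinning versus blocking. Further, front threads that represent a phase change (from data to request node or vice versa) relative to their predecessors receive additional chained spins, reflecting the longer paths typically needed to unblock threads during phase changes.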
** Unlinking removed interior nodes **
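Besides minimizing garbage retention through the self-linking described above, we also unlink removed interior nodes. These may arise from timed-out or interrupted waits, or from calls to remove(x) or Iterator.remove. Normally, given a node that was at one time known to be the predecessor of some node s that is to be removed, we can unsplice s by CASing the next field of that predecessor if it still points to s (otherwise s must already have been removed or is now off-list). But there are two situations in which we cannot guarantee that s becomes unreachable this way: (1) If s is the trailing node of the list (i.e., its next is null), it is pinned as the target node for appends, so it can only be removed later, after other nodes are appended. (2) We cannot necessarily unlink s given a predecessor that is matched (including the case of being cancelled): the predecessor may already be unspliced, in which case some earlier reachable node may still point to s. (For further explanation see Herlihy & Shavit, "The Art of Multiprocessor Programming", chapter 9.) In both cases, though, no further action is needed if either s or its predecessor is (or can be made to be) at, or has fallen off from, the head of the list.
When these cases arise, rather than always retraversing the entire list to find an actual predecessor to unlink (which would not help in case (1) anyway), we record a conservative estimate of possible unsplice failures (in "sweepVotes"). When the estimate exceeds a threshold ("SWEEP_THRESHOLD"), we trigger a full sweep that unlinks cancelled nodes that were not unlinked on initial removal. The sweep is performed by the thread that hits the threshold (rather than by background threads or by spreading the work across other threads), because in the main contexts where removal occurs the caller has already timed out, been cancelled, or is performing a potentially O(n) operation (e.g. remove(x)); none of these is time-critical enough to justify the overhead that the alternatives would impose on other threads.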
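The vote-and-sweep idea described above can be sketched with an AtomicInteger standing in for the sweepVotes field. The class SweepVoteSketch and its methods are hypothetical, and sweep() here only counts calls instead of walking a list; treat it as one plausible reading of the comment rather than the JDK's exact code.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: accumulate "votes" for unsplice failures, sweep at a threshold.
final class SweepVoteSketch {
    static final int SWEEP_THRESHOLD = 32;

    final AtomicInteger sweepVotes = new AtomicInteger();
    int sweeps; // how many full sweeps ran (plain counter, for the single-threaded demo only)

    // Called when a removed node could not be reliably unlinked.
    void recordUnspliceFailure() {
        for (;;) {
            int v = sweepVotes.get();
            if (v < SWEEP_THRESHOLD) {
                if (sweepVotes.compareAndSet(v, v + 1)) {
                    break; // vote recorded, no sweep yet
                }
            } else if (sweepVotes.compareAndSet(v, 0)) {
                sweep(); // the thread that resets the counter does the sweep itself
                break;
            }
            // CAS lost to another thread: re-read and retry
        }
    }

    // Placeholder for a full traversal that unlinks matched interior nodes.
    void sweep() {
        sweeps++;
    }

    // Reports how many sweeps a given number of failures triggers.
    static int sweepsAfter(int failures) {
        SweepVoteSketch s = new SweepVoteSketch();
        for (int i = 0; i < failures; i++) {
            s.recordUnspliceFailure();
        }
        return s.sweeps;
    }

    public static void main(String[] args) {
        System.out.println(sweepsAfter(32));
        System.out.println(sweepsAfter(33));
    }
}
```

In this sketch the first 32 failures only accumulate votes, and the next failure resets the counter and runs the sweep in the calling thread.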
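Because the sweepVotes estimate is conservative, because nodes become unlinked "naturally" as they fall off the head of the queue, and because votes are allowed to accumulate even while sweeps are in progress, there are typically far fewer such nodes than estimated. The choice of threshold balances the likelihood of wasted effort and contention against providing a worst-case bound on the retention of interior nodes in quiescent queues. The value defined below was chosen empirically to balance these under various timeout scenarios.
Note that unlinked interior nodes cannot be self-linked during sweeps. However, the associated garbage chains terminate when some successor eventually falls off the head of the list and is self-linked.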
For an introduction to the Unsafe class used above, see: