ConcurrentHashMap Overview


Overview:

The primary design goal of this hash table is to maintain
concurrent readability (typically method get(), but also
iterators and related methods) while minimizing update
contention. Secondary goals are to keep space consumption about
the same or better than java.util.HashMap, and to support high
initial insertion rates on an empty table by many threads.

This map usually acts as a binned (bucketed) hash table. Each
key-value mapping is held in a Node. Most nodes are instances
of the basic Node class with hash, key, value, and next
fields. However, various subclasses exist: TreeNodes are
arranged in balanced trees, not lists. TreeBins hold the roots
of sets of TreeNodes. ForwardingNodes are placed at the heads
of bins during resizing. ReservationNodes are used as
placeholders while establishing values in computeIfAbsent and
related methods. The types TreeBin, ForwardingNode, and
ReservationNode do not hold normal user keys, values, or
hashes, and are readily distinguishable during search etc
because they have negative hash fields and null key and value
fields. (These special nodes are either uncommon or transient,
so the impact of carrying around some unused fields is
insignificant.)

The table is lazily initialized to a power-of-two size upon the
first insertion. Each bin in the table normally contains a
list of Nodes (most often, the list has only zero or one Node).
Table accesses require volatile/atomic reads, writes, and
CASes. Because there is no other way to arrange this without
adding further indirections, we use intrinsics
(sun.misc.Unsafe) operations.

We use the top (sign) bit of Node hash fields for control
purposes – it is available anyway because of addressing
constraints. Nodes with negative hash fields are specially
handled or ignored in map methods.
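In the JDK 8 source, the control uses of the sign bit look roughly like the following: special node types carry fixed negative hash constants, while spread() mixes a user hash's high bits downward and clears the sign bit so that normal nodes are always non-negative.

```java
// Mirrors constants and the spread() function from the JDK 8
// ConcurrentHashMap source (shown here in isolation for illustration).
public class HashControlBits {
    static final int MOVED     = -1; // hash of forwarding nodes
    static final int TREEBIN   = -2; // hash of TreeBin roots
    static final int RESERVED  = -3; // hash of ReservationNodes
    static final int HASH_BITS = 0x7fffffff; // usable bits of a normal node hash

    // Spreads higher bits of the hash to lower ones and forces the
    // sign bit to zero, so user hashes never collide with the
    // negative control values above.
    static int spread(int h) {
        return (h ^ (h >>> 16)) & HASH_BITS;
    }
}
```

The XOR of the high half into the low half matters because bins are indexed by the low bits only; without it, hash codes differing only in high bits would all land in the same bin.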

Insertion (via put or its variants) of the first node in an
empty bin is performed by just CASing it to the bin. This is
by far the most common case for put operations under most
key/hash distributions. Other update operations (insert,
delete, and replace) require locks. We do not want to waste
the space required to associate a distinct lock object with
each bin, so instead use the first node of a bin list itself as
a lock. Locking support for these locks relies on builtin
“synchronized” monitors.

Using the first node of a list as a lock does not by itself
suffice though: When a node is locked, any update must first
validate that it is still the first node after locking it, and
retry if not. Because new nodes are always appended to lists,
once a node is first in a bin, it remains first until deleted
or the bin becomes invalidated (upon resizing).
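The CAS-into-empty-bin fast path and the lock-then-revalidate slow path can be sketched as below. This is a simplified illustration, not the JDK code: it uses AtomicReferenceArray in place of Unsafe intrinsics, never resizes or treeifies, and the class and method names are invented for the sketch.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

// Simplified sketch of the two put paths described above.
class MiniBinnedMap<K, V> {
    static final class Node<K, V> {
        final int hash; final K key; volatile V val; volatile Node<K, V> next;
        Node(int hash, K key, V val, Node<K, V> next) {
            this.hash = hash; this.key = key; this.val = val; this.next = next;
        }
    }

    final AtomicReferenceArray<Node<K, V>> table = new AtomicReferenceArray<>(16);

    V put(K key, V value) {
        int h = key.hashCode() & 0x7fffffff;
        for (;;) {
            int i = h & (table.length() - 1);
            Node<K, V> f = table.get(i);
            if (f == null) {
                // Empty bin: publish the node with a single CAS, no lock.
                if (table.compareAndSet(i, null, new Node<>(h, key, value, null)))
                    return null;
                // Lost the race to another inserter: retry from the top.
            } else {
                synchronized (f) {
                    // Validate that f is still the first node; if the head
                    // changed between the read and the lock, retry.
                    if (table.get(i) != f) continue;
                    for (Node<K, V> e = f;; e = e.next) {
                        if (e.hash == h && e.key.equals(key)) {
                            V old = e.val; e.val = value; return old;
                        }
                        if (e.next == null) { // append, preserving the head
                            e.next = new Node<>(h, key, value, null);
                            return null;
                        }
                    }
                }
            }
        }
    }

    V get(K key) { // lock-free read along volatile next pointers
        int h = key.hashCode() & 0x7fffffff;
        for (Node<K, V> e = table.get(h & (table.length() - 1)); e != null; e = e.next)
            if (e.hash == h && e.key.equals(key)) return e.val;
        return null;
    }
}
```

Because new nodes are only ever appended, the re-validation after `synchronized (f)` only needs to check the head, exactly as the paragraph above explains.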

The main disadvantage of per-bin locks is that other update
operations on other nodes in a bin list protected by the same
lock can stall, for example when user equals() or mapping
functions take a long time. However, statistically, under
random hash codes, this is not a common problem. Ideally, the
frequency of nodes in bins follows a Poisson distribution
(http://en.wikipedia.org/wiki/Poisson_distribution) with a
parameter of about 0.5 on average, given the resizing threshold
of 0.75, although with a large variance because of resizing
granularity. Ignoring variance, the expected occurrences of
list size k are (exp(-0.5) * pow(0.5, k) / factorial(k)). The
first values are:

0: 0.60653066
1: 0.30326533
2: 0.07581633
3: 0.01263606
4: 0.00157952
5: 0.00015795
6: 0.00001316
7: 0.00000094
8: 0.00000006
more: less than 1 in ten million
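The table of probabilities above can be reproduced directly from the stated formula:

```java
// Computes P(k) = exp(-0.5) * pow(0.5, k) / k!, the expected
// frequency of bins holding k nodes under the Poisson model above.
public class PoissonBinSizes {
    static double prob(int k) {
        double factorial = 1.0;
        for (int i = 2; i <= k; i++) factorial *= i;
        return Math.exp(-0.5) * Math.pow(0.5, k) / factorial;
    }

    public static void main(String[] args) {
        for (int k = 0; k <= 8; k++)
            System.out.printf("%d: %.8f%n", k, prob(k));
    }
}
```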

Lock contention probability for two threads accessing distinct
elements is roughly 1 / (8 * #elements) under random hashes.

Actual hash code distributions encountered in practice
sometimes deviate significantly from uniform randomness. This
includes the case when N > (1<<30), so some keys MUST collide.
Similarly for dumb or hostile usages in which multiple keys are
designed to have identical hash codes or ones that differ only
in masked-out high bits. So we use a secondary strategy that
applies when the number of nodes in a bin exceeds a
threshold. These TreeBins use a balanced tree to hold nodes (a
specialized form of red-black trees), bounding search time to
O(log N). Each search step in a TreeBin is at least twice as
slow as in a regular list, but given that N cannot exceed
(1<<64) (before running out of addresses) this bounds search
steps, lock hold times, etc, to reasonable constants (roughly
100 nodes inspected per operation worst case) so long as keys
are Comparable (which is very common – String, Long, etc).
TreeBin nodes (TreeNodes) also maintain the same “next”
traversal pointers as regular nodes, so can be traversed in
iterators in the same way.

The table is resized when occupancy exceeds a percentage
threshold (nominally, 0.75, but see below). Any thread
noticing an overfull bin may assist in resizing after the
initiating thread allocates and sets up the replacement array.
However, rather than stalling, these other threads may proceed
with insertions etc. The use of TreeBins shields us from the
worst case effects of overfilling while resizes are in
progress. Resizing proceeds by transferring bins, one by one,
from the table to the next table. However, threads claim small
blocks of indices to transfer (via field transferIndex) before
doing so, reducing contention. A generation stamp in field
sizeCtl ensures that resizings do not overlap. Because we are
using power-of-two expansion, the elements from each bin must
either stay at same index, or move with a power of two
offset. We eliminate unnecessary node creation by catching
cases where old nodes can be reused because their next fields
won’t change. On average, only about one-sixth of them need
cloning when a table doubles. The nodes they replace will be
garbage collectable as soon as they are no longer referenced by
any reader thread that may be in the midst of concurrently
traversing the table. Upon transfer, the old table bin contains
only a special forwarding node (with hash field “MOVED”) that
contains the next table as its key. On encountering a
forwarding node, access and update operations restart, using
the new table.
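The power-of-two split rule mentioned above can be made concrete: when the table doubles, whether a node stays at its old index or moves by exactly oldCap is decided by the single hash bit at position oldCap. A small sketch (the class and method names are invented for illustration):

```java
// With a power-of-two table doubling from oldCap to 2*oldCap, a node
// either keeps its old index (if the oldCap bit of its hash is 0) or
// moves up by exactly oldCap (if that bit is 1). This is equivalent
// to re-indexing with the new, wider mask.
public class BinSplit {
    static int newIndex(int hash, int oldCap) {
        int oldIndex = hash & (oldCap - 1);
        return (hash & oldCap) == 0 ? oldIndex : oldIndex + oldCap;
    }
}
```

Because only one extra hash bit decides placement, a bin's list splits into at most two runs, which is what makes reusing trailing nodes whose next fields don't change possible.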

Each bin transfer requires its bin lock, which can stall
waiting for locks while resizing. However, because other
threads can join in and help resize rather than contend for
locks, average aggregate waits become shorter as resizing
progresses. The transfer operation must also ensure that all
accessible bins in both the old and new table are usable by any
traversal. This is arranged in part by proceeding from the
last bin (table.length - 1) up towards the first. Upon seeing
a forwarding node, traversals (see class Traverser) arrange to
move to the new table without revisiting nodes. To ensure that
no intervening nodes are skipped even when moved out of order,
a stack (see class TableStack) is created on first encounter of
a forwarding node during a traversal, to maintain its place if
later processing the current table. The need for these
save/restore mechanics is relatively rare, but when one
forwarding node is encountered, typically many more will be.
So Traversers use a simple caching scheme to avoid creating so
many new TableStack nodes. (Thanks to Peter Levart for
suggesting use of a stack here.)

The traversal scheme also applies to partial traversals of
ranges of bins (via an alternate Traverser constructor)
to support partitioned aggregate operations. Also, read-only
operations give up if ever forwarded to a null table, which
provides support for shutdown-style clearing, which is also not
currently implemented.

Lazy table initialization minimizes footprint until first use,
and also avoids resizings when the first operation is from a
putAll, constructor with map argument, or deserialization.
These cases attempt to override the initial capacity settings,
but harmlessly fail to take effect in cases of races.

The element count is maintained using a specialization of
LongAdder. We need to incorporate a specialization rather than
just use a LongAdder in order to access implicit
contention-sensing that leads to creation of multiple
CounterCells. The counter mechanics avoid contention on
updates but can encounter cache thrashing if read too
frequently during concurrent access. To avoid reading so often,
resizing under contention is attempted only upon adding to a
bin already holding two or more nodes. Under uniform hash
distributions, the probability of this occurring at threshold
is around 13%, meaning that only about 1 in 8 puts check
threshold (and after resizing, many fewer do so).
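The striped-counter behavior described above can be demonstrated with a plain LongAdder (the map inlines a specialization, CounterCell, rather than using LongAdder directly; the helper method here is invented for the demonstration):

```java
import java.util.concurrent.atomic.LongAdder;

// Contended increments land on separate cells; sum() folds them on
// read. The total is exact once all updating threads have finished.
public class StripedCount {
    static long countWithThreads(int nThreads, int perThread) {
        LongAdder count = new LongAdder();
        Thread[] ts = new Thread[nThreads];
        for (int t = 0; t < nThreads; t++) {
            ts[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) count.increment();
            });
            ts[t].start();
        }
        for (Thread t : ts) {
            try { t.join(); } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return count.sum();
    }

    public static void main(String[] args) {
        System.out.println(countWithThreads(4, 10_000));
    }
}
```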

TreeBins use a special form of comparison for search and
related operations (which is the main reason we cannot use
existing collections such as TreeMaps). TreeBins contain
Comparable elements, but may contain others, as well as
elements that are Comparable but not necessarily Comparable for
the same T, so we cannot invoke compareTo among them. To handle
this, the tree is ordered primarily by hash value, then by
Comparable.compareTo order if applicable. On lookup at a node,
if elements are not comparable or compare as 0 then both left
and right children may need to be searched in the case of tied
hash values. (This corresponds to the full list search that
would be necessary if all elements were non-Comparable and had
tied hashes.) On insertion, to keep a total ordering (or as
close as is required here) across rebalancings, we compare
classes and identityHashCodes as tie-breakers. The red-black
balancing code is updated from pre-jdk-collections
(http://gee.cs.oswego.edu/dl/classes/collections/RBCell.java)
based in turn on Cormen, Leiserson, and Rivest “Introduction to
Algorithms” (CLR).
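The class-and-identityHashCode tie-breaker can be sketched as follows, modeled on the TreeNode tieBreakOrder logic in the JDK 8 source (shown here standalone for illustration):

```java
// Tie-breaker used during insertion when hashes are equal and the
// keys cannot be ordered via compareTo: order by class name first,
// then by System.identityHashCode. Note it never returns 0, so a
// total insertion order is maintained; it is not a consistent
// ordering for lookups, which is why tied lookups may search both
// children instead.
public class TieBreak {
    static int tieBreakOrder(Object a, Object b) {
        int d;
        if (a == null || b == null ||
            (d = a.getClass().getName().compareTo(b.getClass().getName())) == 0)
            d = (System.identityHashCode(a) <= System.identityHashCode(b)) ? -1 : 1;
        return d;
    }
}
```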

TreeBins also require an additional locking mechanism. While
list traversal is always possible by readers even during
updates, tree traversal is not, mainly because of tree-rotations
that may change the root node and/or its linkages. TreeBins
include a simple read-write lock mechanism parasitic on the
main bin-synchronization strategy: Structural adjustments
associated with an insertion or removal are already bin-locked
(and so cannot conflict with other writers) but must wait for
ongoing readers to finish. Since there can be only one such
waiter, we use a simple scheme using a single “waiter” field to
block writers. However, readers need never block. If the root
lock is held, they proceed along the slow traversal path (via
next-pointers) until the lock becomes available or the list is
exhausted, whichever comes first. These cases are not fast, but
maximize aggregate expected throughput.

Maintaining API and serialization compatibility with previous
versions of this class introduces several oddities. Mainly: We
leave untouched but unused constructor arguments referring to
concurrencyLevel. We accept a loadFactor constructor argument,
but apply it only to initial table capacity (which is the only
time that we can guarantee to honor it.) We also declare an
unused “Segment” class that is instantiated in minimal form
only when serializing.

Also, solely for compatibility with previous versions of this
class, it extends AbstractMap, even though all of its methods
are overridden, so it is just useless baggage.

This file is organized to make things a little easier to follow
while reading than they might otherwise: First the main static
declarations and utilities, then fields, then main public
methods (with a few factorings of multiple public methods into
internal ones), then sizing methods, trees, traversers, and
bulk operations.
