Changes in ConcurrentHashMap between JDK 1.7 and 1.8
I've read quite a few analyses of ConcurrentHashMap (CHM) online, but none of them really gave me that moment of clarity. The changes from 1.7 to 1.8 are so sweeping that there must be some deeper algorithm or design behind them, so I decided to dig in myself. These are just my own notes; go easy on me.
The JDK 1.8 CHM source runs to more than 6000 lines, probably the most complex class in the JDK. Headache-inducing, so let's just walk through the main flow and the core APIs.
The upgrades from 1.7 to 1.8 compare as follows:
- In 1.7, CHM is built from multiple Segments (16 by default), and each Segment extends ReentrantLock to do the locking, so the lock granularity is coarse. Each Segment holds its own array-plus-linked-list structure, so in theory at most 16 threads can write concurrently.
- In 1.8, the lock covers a single array element (a bin), so the granularity is much finer, conflicts are rarer, and throughput is higher.
- In 1.8, a bin whose linked list grows too long is upgraded to a red-black tree, which prevents hash-collision attacks from degrading lookup performance.
- The 1.7 structure is shown in the (omitted) figure.
CHM in 1.8 also still carries 1.7's Segment class, but only for serialization compatibility.
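To make the granularity difference concrete, here is a minimal sketch of 1.7-style segment locking (a hypothetical toy, not the JDK implementation): any write locks a whole Segment, so at most segments.length writers proceed at once, whereas 1.8 synchronizes only on a single bin's head node.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

public class SegmentSketch<K, V> {
    // A Segment is a lock guarding its own little hash table, as in 1.7
    static class Segment<K, V> extends ReentrantLock {
        final Map<K, V> map = new HashMap<>();
    }

    @SuppressWarnings("unchecked")
    private final Segment<K, V>[] segments = new Segment[16]; // default concurrency level

    public SegmentSketch() {
        for (int i = 0; i < segments.length; i++)
            segments[i] = new Segment<>();
    }

    public V put(K key, V value) {
        Segment<K, V> s = segments[(segments.length - 1) & key.hashCode()];
        s.lock();       // coarse lock: blocks every writer to this segment
        try {
            return s.map.put(key, value);
        } finally {
            s.unlock();
        }
    }

    public static void main(String[] args) {
        SegmentSketch<String, Integer> m = new SegmentSketch<>();
        m.put("a", 1);
        System.out.println(m.put("a", 2)); // prints the previous value: 1
    }
}
```

The real 1.7 Segment also extends ReentrantLock; the point of the sketch is only that the lock scope is a whole sub-table, not one bucket.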
Core classes and key fields of CHM
Node
An ordinary data node, holding hash, key, value, and a next pointer.
TreeNode
A red-black-tree data node; besides hash, key, value, and next it also carries the tree links (parent, left, right, prev).
TreeBin
Wraps the root of a red-black tree. It carries no real data; hash = -2.
ForwardingNode
Placed at the head of a bin while the map is resizing. It carries no real data; hash = -1.
ReservationNode
A placeholder used by computeIfAbsent and compute; hash = -3.
sizeCtl
/**
* Table initialization and resizing control. When negative, the
* table is being initialized or resized: -1 for initialization,
* else -(1 + the number of active resizing threads). Otherwise,
* when table is null, holds the initial table size to use upon
* creation, or 0 for default. After initialization, holds the
* next element count value upon which to resize the table.
*/
private transient volatile int sizeCtl;
Its main purpose is to encode the current state of the map:
- 0: default
- -1: the table is being initialized
- -(1 + number of active resizing threads): the map is resizing, and multiple threads may resize it together
- > 0: the element count at which the next resize will trigger (or, before the table exists, the initial capacity to use)
CounterCell
/**
* A padded cell for distributing counts. Adapted from LongAdder
* and Striped64. See their internal docs for explanation.
*/
@sun.misc.Contended static final class CounterCell {
volatile long value;
CounterCell(long x) { value = x; }
}
This is one of the gems of the class.
It is used to compute the total size. Under heavy concurrency you could funnel every update through CAS on a single counter; that works, but the retries make it slow. Instead the total size is computed from baseCount plus a CounterCell[] array, a divide-and-conquer approach.
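The same striping idea is available directly as java.util.concurrent.atomic.LongAdder, which the CounterCell code was adapted from. A minimal sketch of using it as a contended counter:

```java
import java.util.concurrent.atomic.LongAdder;

public class StripedCountDemo {
    public static void main(String[] args) throws InterruptedException {
        // LongAdder keeps a base count plus padded cells, just like
        // baseCount + CounterCell[] inside ConcurrentHashMap
        LongAdder counter = new LongAdder();
        Thread[] threads = new Thread[4];
        for (int t = 0; t < threads.length; t++) {
            threads[t] = new Thread(() -> {
                for (int i = 0; i < 10_000; i++)
                    counter.increment(); // contended increments spread across cells
            });
            threads[t].start();
        }
        for (Thread th : threads)
            th.join();
        System.out.println(counter.sum()); // base + all cells = 40000
    }
}
```

sum(), like CHM's sumCount(), just adds the base and every cell, so it is a snapshot rather than an atomic read.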
Analysis of CHM's core APIs
put
Source:
/**
* Maps the specified key to the specified value in this table.
* Neither the key nor the value can be null.
*
* <p>The value can be retrieved by calling the {@code get} method
* with a key that is equal to the original key.
*
* @param key key with which the specified value is to be associated
* @param value value to be associated with the specified key
* @return the previous value associated with {@code key}, or
* {@code null} if there was no mapping for {@code key}
* @throws NullPointerException if the specified key or value is null
*/
public V put(K key, V value) {
return putVal(key, value, false);
}
/** Implementation for put and putIfAbsent */
final V putVal(K key, V value, boolean onlyIfAbsent) {
if (key == null || value == null) throw new NullPointerException();
int hash = spread(key.hashCode());
int binCount = 0;
for (Node<K,V>[] tab = table;;) {
Node<K,V> f; int n, i, fh;
if (tab == null || (n = tab.length) == 0)
tab = initTable();
else if ((f = tabAt(tab, i = (n - 1) & hash)) == null) {
if (casTabAt(tab, i, null,
new Node<K,V>(hash, key, value, null)))
break; // no lock when adding to empty bin
}
else if ((fh = f.hash) == MOVED)
tab = helpTransfer(tab, f);
else {
V oldVal = null;
synchronized (f) {
if (tabAt(tab, i) == f) {
if (fh >= 0) {
binCount = 1;
for (Node<K,V> e = f;; ++binCount) {
K ek;
if (e.hash == hash &&
((ek = e.key) == key ||
(ek != null && key.equals(ek)))) {
oldVal = e.val;
if (!onlyIfAbsent)
e.val = value;
break;
}
Node<K,V> pred = e;
if ((e = e.next) == null) {
pred.next = new Node<K,V>(hash, key,
value, null);
break;
}
}
}
else if (f instanceof TreeBin) {
Node<K,V> p;
binCount = 2;
if ((p = ((TreeBin<K,V>)f).putTreeVal(hash, key,
value)) != null) {
oldVal = p.val;
if (!onlyIfAbsent)
p.val = value;
}
}
}
}
if (binCount != 0) {
if (binCount >= TREEIFY_THRESHOLD)
treeifyBin(tab, i);
if (oldVal != null)
return oldVal;
break;
}
}
}
addCount(1L, binCount);
return null;
}
The main method is barely 100 lines, yet every branch has something clever in it.
Initialization
The first if: initializing the table.
/**
* Initializes table, using the size recorded in sizeCtl.
*/
private final Node<K,V>[] initTable() {
Node<K,V>[] tab; int sc;
while ((tab = table) == null || tab.length == 0) {
if ((sc = sizeCtl) < 0)
Thread.yield(); // lost initialization race; just spin
else if (U.compareAndSwapInt(this, SIZECTL, sc, -1)) {
try {
if ((tab = table) == null || tab.length == 0) {
int n = (sc > 0) ? sc : DEFAULT_CAPACITY;
@SuppressWarnings("unchecked")
Node<K,V>[] nt = (Node<K,V>[])new Node<?,?>[n];
table = tab = nt;
sc = n - (n >>> 2);
}
} finally {
sizeCtl = sc;
}
break;
}
}
return tab;
}
Initialization hinges on checking sizeCtl (a volatile field) plus CAS: whichever thread wins the CAS on sizeCtl performs the initialization, while the losers spin. The CAS itself goes through the Unsafe class.
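The "winner initializes, losers yield and re-check" pattern can be sketched with an AtomicInteger standing in for the Unsafe-based CAS on sizeCtl (a hypothetical simplification, not the JDK code):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class LazyInitDemo {
    // 0 = not initialized, -1 = some thread is initializing (like sizeCtl)
    private static final AtomicInteger ctl = new AtomicInteger(0);
    private static volatile int[] table;

    static int[] initTable() {
        while (table == null) {
            if (ctl.get() < 0)
                Thread.yield();              // lost the race; let the winner finish
            else if (ctl.compareAndSet(0, -1)) {
                if (table == null)
                    table = new int[16];     // only the CAS winner allocates
            }
        }
        return table;
    }

    public static void main(String[] args) {
        System.out.println(initTable().length); // 16
    }
}
```

The real initTable additionally restores sizeCtl in a finally block to the next resize threshold, n - (n >>> 2), i.e. 0.75n.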
Setting an array slot
The second if: the target bin is null, so the new node is installed with a CAS.
else if ((f = tabAt(tab, i = (n - 1) & hash)) == null) {
if (casTabAt(tab, i, null,
new Node<K,V>(hash, key, value, null)))
break; // no lock when adding to empty bin
}
static final <K,V> Node<K,V> tabAt(Node<K,V>[] tab, int i) {
return (Node<K,V>)U.getObjectVolatile(tab, ((long)i << ASHIFT) + ABASE);
}
There is a subtle point here: table is already volatile, so why go through Unsafe for volatile reads and CAS?
Because volatile only makes the array reference visible across threads, not the elements inside the array!
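Outside the JDK, per-element volatile access is exactly what java.util.concurrent.atomic.AtomicReferenceArray provides. A small sketch of the tabAt/casTabAt idea (the key string is made up for illustration):

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

public class VolatileSlotDemo {
    public static void main(String[] args) {
        // Each slot supports volatile reads and CAS, unlike a plain volatile array field
        AtomicReferenceArray<String> tab = new AtomicReferenceArray<>(16);
        int i = (16 - 1) & "key".hashCode();             // same index math as (n - 1) & hash
        boolean won = tab.compareAndSet(i, null, "v1");  // the casTabAt equivalent
        System.out.println(won + " " + tab.get(i));      // get(i) is a volatile read, like tabAt
    }
}
```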
Resizing
The third if checks whether the map is being resized:
else if ((fh = f.hash) == MOVED)
tab = helpTransfer(tab, f);
Resizing is another of CHM's gems:
- First, 1.8's CHM supports multi-threaded resizing.
- It uses CAS to achieve a lock-free parallel resize. Just how sharp that is takes some unpacking of Doug Lea's design.
Without further ado, the source:
/**
* Helps transfer if a resize is in progress.
*/
final Node<K,V>[] helpTransfer(Node<K,V>[] tab, Node<K,V> f) {
Node<K,V>[] nextTab; int sc;
if (tab != null && (f instanceof ForwardingNode) &&
(nextTab = ((ForwardingNode<K,V>)f).nextTable) != null) {
int rs = resizeStamp(tab.length);
while (nextTab == nextTable && table == tab &&
(sc = sizeCtl) < 0) {
if ((sc >>> RESIZE_STAMP_SHIFT) != rs || sc == rs + 1 ||
sc == rs + MAX_RESIZERS || transferIndex <= 0)
break;
if (U.compareAndSwapInt(this, SIZECTL, sc, sc + 1)) {
transfer(tab, nextTab);
break;
}
}
return nextTab;
}
return table;
}
The logic of helpTransfer:
- Decide whether help is needed by checking whether nextTable still exists and the resize is still in progress.
- If the resize is finished, return immediately.
- Otherwise compute the stamp for this table generation and join in: each helping thread bumps sizeCtl by 1 via CAS, then calls transfer.
- How the lock-free multi-threaded migration actually works is in the transfer function, covered below.

Background: what is the resize stamp? (the resizeStamp function)

/**
 * The number of bits used for generation stamp in sizeCtl.
 * Must be at least 6 for 32bit arrays.
 */
private static int RESIZE_STAMP_BITS = 16;

/**
 * The maximum number of threads that can help resize.
 * Must fit in 32 - RESIZE_STAMP_BITS bits.
 */
private static final int MAX_RESIZERS = (1 << (32 - RESIZE_STAMP_BITS)) - 1;

/**
 * The bit shift for recording size stamp in sizeCtl.
 */
private static final int RESIZE_STAMP_SHIFT = 32 - RESIZE_STAMP_BITS;

/**
 * Returns the stamp bits for resizing a table of size n.
 * Must be negative when shifted left by RESIZE_STAMP_SHIFT.
 */
static final int resizeStamp(int n) {
    return Integer.numberOfLeadingZeros(n) | (1 << (RESIZE_STAMP_BITS - 1));
}
The resize stamp is derived from the length of the old table.
For example n = 16, binary 0000 0000 0000 0000 0000 0000 0001 0000.
Integer.numberOfLeadingZeros(16) = 27 (the number of zero bits before the highest one bit).
27 in binary: 0000 0000 0000 0000 0000 0000 0001 1011
resizeStamp(16) then ORs in bit 15, giving 0000 0000 0000 0000 1000 0000 0001 1011
With that, the stamp for this resize generation is ready.
Background: how sizeCtl is updated during a resize
The thread that starts the resize sets it to the stamp shifted into the high bits, plus (1 + the number of resizing threads), i.e. plus 2:

else if (U.compareAndSwapInt(this, SIZECTL, sc, (rs << RESIZE_STAMP_SHIFT) + 2))

Each additional thread that joins to help adds 1:

if (U.compareAndSwapInt(this, SIZECTL, sc, sc + 1))

With a table length of 16, the first update sets sizeCtl to:
- rs = 0000 0000 0000 0000 1000 0000 0001 1011
- (rs << RESIZE_STAMP_SHIFT) + 2 = 1000 0000 0001 1011 0000 0000 0000 0010
- which is -2145714174 in decimal
- so sizeCtl becomes -2145714174
When another thread joins to help, sizeCtl is incremented:
- sizeCtl = -2145714174 + 1 = -2145714173 = 1000 0000 0001 1011 0000 0000 0000 0011
So don't try to read the decimal value; read the binary:
high 16 bits = the resize stamp, low 16 bits = 1 + the number of threads resizing in parallel.
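The arithmetic above can be checked with a few lines that replicate the resizeStamp formula (constants copied from the JDK source; the class name is mine):

```java
public class ResizeStampDemo {
    static final int RESIZE_STAMP_BITS = 16;
    static final int RESIZE_STAMP_SHIFT = 32 - RESIZE_STAMP_BITS;

    // Same formula as ConcurrentHashMap.resizeStamp(n)
    static int resizeStamp(int n) {
        return Integer.numberOfLeadingZeros(n) | (1 << (RESIZE_STAMP_BITS - 1));
    }

    public static void main(String[] args) {
        int rs = resizeStamp(16);
        System.out.println(Integer.toBinaryString(rs));      // 1000000000011011
        int sizeCtl = (rs << RESIZE_STAMP_SHIFT) + 2;        // first resizer joins
        System.out.println(sizeCtl);                         // -2145714174
        System.out.println(Integer.toBinaryString(sizeCtl)); // 10000000000110110000000000000010
    }
}
```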
The transfer function
In short: the old table is divided into ranges through a shared index, and each thread migrates its own range. Within a range, buckets are processed in reverse order; each bucket that has been migrated is replaced in the old table with a ForwardingNode, marking that slot as done. Source (using table.length = 16 as the running example):
/** Number of CPUS, to place bounds on some sizings */
static final int NCPU = Runtime.getRuntime().availableProcessors();

private static final int MIN_TRANSFER_STRIDE = 16;

/**
 * Moves and/or copies the nodes in each bin to new table. See
 * above for explanation.
 */
private final void transfer(Node<K,V>[] tab, Node<K,V>[] nextTab) {
    int n = tab.length, stride;
    // Compute the length of each thread's range from the CPU count;
    // never smaller than 16
    if ((stride = (NCPU > 1) ? (n >>> 3) / NCPU : n) < MIN_TRANSFER_STRIDE)
        stride = MIN_TRANSFER_STRIDE; // subdivide range
    // First caller: allocate the new table at twice the length
    if (nextTab == null) {            // initiating
        try {
            @SuppressWarnings("unchecked")
            Node<K,V>[] nt = (Node<K,V>[])new Node<?,?>[n << 1];
            nextTab = nt;
        } catch (Throwable ex) {      // try to cope with OOME
            sizeCtl = Integer.MAX_VALUE;
            return;
        }
        nextTable = nextTab;
        transferIndex = n;
    }
    int nextn = nextTab.length;
    // A ForwardingNode holds a reference to the new table; placed in the
    // old table to announce that a bucket has already been migrated
    ForwardingNode<K,V> fwd = new ForwardingNode<K,V>(nextTab);
    // advance: done with the current bucket, move on to the next one
    boolean advance = true;
    // finishing: the whole resize is complete
    boolean finishing = false; // to ensure sweep before committing nextTab
    for (int i = 0, bound = 0;;) {
        // Claim a range of buckets by CASing transferIndex; i is the bucket
        // being processed, bound is the lower end of this thread's range.
        // Initially transferIndex = 16.
        Node<K,V> f; int fh;
        while (advance) {
            int nextIndex, nextBound;
            // --i walks the claimed range downwards
            if (--i >= bound || finishing)
                advance = false;
            else if ((nextIndex = transferIndex) <= 0) {
                i = -1;
                advance = false;
            }
            // e.g. first claim: nextIndex = 16, nextBound = 0
            else if (U.compareAndSwapInt
                     (this, TRANSFERINDEX, nextIndex,
                      nextBound = (nextIndex > stride ?
                                   nextIndex - stride : 0))) {
                bound = nextBound;
                i = nextIndex - 1;
                advance = false;
            }
        }
        // After that claim this thread handles [0, 15] and transferIndex = 0.
        // This thread's range is exhausted
        if (i < 0 || i >= n || i + n >= nextn) {
            int sc;
            // Whole resize complete: publish the new table
            if (finishing) {
                nextTable = null;
                table = nextTab;
                sizeCtl = (n << 1) - (n >>> 1);
                return;
            }
            // This thread is done: decrement the resizer count in sizeCtl
            if (U.compareAndSwapInt(this, SIZECTL, sc = sizeCtl, sc - 1)) {
                // If other resizers are still running, just return
                if ((sc - 2) != resizeStamp(n) << RESIZE_STAMP_SHIFT)
                    return;
                // Last thread out: recheck the whole table before committing
                finishing = advance = true;
                i = n; // recheck before commit
            }
        }
        // If slot i (e.g. 15) of the old table is empty, drop in a fwd node
        else if ((f = tabAt(tab, i)) == null)
            advance = casTabAt(tab, i, null, fwd);
        // Bucket already migrated
        else if ((fh = f.hash) == MOVED)
            advance = true; // already processed
        // Non-empty bucket: lock the head of its list first
        else {
            synchronized (f) {
                if (tabAt(tab, i) == f) {
                    Node<K,V> ln, hn;
                    if (fh >= 0) {
                        // ln is the low list, hn the high list; this code splits
                        // the bin on hash bit n: 0 stays low, 1 goes high
                        int runBit = fh & n;
                        Node<K,V> lastRun = f; // list head
                        for (Node<K,V> p = f.next; p != null; p = p.next) {
                            // n = 16 = 1 0000 binary, so this tests bit 5 of p.hash
                            int b = p.hash & n;
                            if (b != runBit) {
                                runBit = b;
                                lastRun = p;
                            }
                        }
                        // I initially wondered why this loop exists instead of just
                        // building two new lists directly; the point is that the
                        // trailing run of nodes sharing the same bit (from lastRun
                        // onward) can be reused as-is, with no copying at all
                        if (runBit == 0) {
                            ln = lastRun;
                            hn = null;
                        }
                        else {
                            hn = lastRun;
                            ln = null;
                        }
                        // Copy the remaining nodes into the new low/high lists
                        for (Node<K,V> p = f; p != lastRun; p = p.next) {
                            int ph = p.hash; K pk = p.key; V pv = p.val;
                            if ((ph & n) == 0)
                                ln = new Node<K,V>(ph, pk, pv, ln);
                            else
                                hn = new Node<K,V>(ph, pk, pv, hn);
                        }
                        // Install the lists, then mark slot i of the old table
                        // with fwd: migration of this bucket is complete
                        setTabAt(nextTab, i, ln);
                        setTabAt(nextTab, i + n, hn);
                        setTabAt(tab, i, fwd);
                        advance = true;
                    }
                    else if (f instanceof TreeBin) {
                        TreeBin<K,V> t = (TreeBin<K,V>)f;
                        TreeNode<K,V> lo = null, loTail = null;
                        TreeNode<K,V> hi = null, hiTail = null;
                        int lc = 0, hc = 0;
                        for (Node<K,V> e = t.first; e != null; e = e.next) {
                            int h = e.hash;
                            TreeNode<K,V> p = new TreeNode<K,V>
                                (h, e.key, e.val, null, null);
                            if ((h & n) == 0) {
                                if ((p.prev = loTail) == null)
                                    lo = p;
                                else
                                    loTail.next = p;
                                loTail = p;
                                ++lc;
                            }
                            else {
                                if ((p.prev = hiTail) == null)
                                    hi = p;
                                else
                                    hiTail.next = p;
                                hiTail = p;
                                ++hc;
                            }
                        }
                        ln = (lc <= UNTREEIFY_THRESHOLD) ? untreeify(lo) :
                            (hc != 0) ? new TreeBin<K,V>(lo) : t;
                        hn = (hc <= UNTREEIFY_THRESHOLD) ? untreeify(hi) :
                            (lc != 0) ? new TreeBin<K,V>(hi) : t;
                        setTabAt(nextTab, i, ln);
                        setTabAt(nextTab, i + n, hn);
                        setTabAt(tab, i, fwd);
                        advance = true;
                    }
                }
            }
        }
    }
}
Walkthrough (the original figures are omitted):
- Compute the range size; with a small table it is clamped to 16.
- On first entry, allocate the new table at twice the old size.
- Claim this thread's range; growing from 32 to 64, for example, each thread takes a 16-bucket slice.
- Walk the claimed range downwards, handling each bucket according to i.
- If bucket 15 of the old table is null, CAS a fwd node straight into it.
- Keep going until a non-null bucket in the old table is reached.
- Traverse its list to compute runBit and lastRun.
- Build the new low and high lists.
- Install the low and high lists into nextTable at i and i + n.
- Update the old table, setting bucket i to fwd.
- Once every bucket is fwd, the finishing phase runs: each thread decrements sizeCtl, and when the comparison against the resize stamp shows this was the last thread, the table is rechecked and sizeCtl is reset, per the following code:
if (i < 0 || i >= n || i + n >= nextn) {
int sc;
if (finishing) {
nextTable = null;
table = nextTab;
sizeCtl = (n << 1) - (n >>> 1);
return;
}
if (U.compareAndSwapInt(this, SIZECTL, sc = sizeCtl, sc - 1)) {
if ((sc - 2) != resizeStamp(n) << RESIZE_STAMP_SHIFT)
return;
finishing = advance = true;
i = n; // recheck before commit
}
}
Summary:
By combining CAS-based range claiming with per-bucket locking, the resize makes full use of concurrency and proceeds very efficiently.
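The low/high split at the heart of the migration can be sketched on its own: every hash in the bin at index i either has bit n clear (it stays at i) or set (it moves to i + n). A simplified sketch with plain lists standing in for CHM's Node chain:

```java
import java.util.ArrayList;
import java.util.List;

public class BinSplitDemo {
    public static void main(String[] args) {
        int n = 16;                          // old table length
        // Hashes that all land in bucket 3 of the old table (h & 15 == 3)
        int[] hashes = {3, 19, 35, 51};
        List<Integer> lo = new ArrayList<>(), hi = new ArrayList<>();
        for (int h : hashes) {
            if ((h & n) == 0) lo.add(h);     // bit n clear: stays at index i
            else hi.add(h);                  // bit n set:   moves to index i + n
        }
        System.out.println(lo); // [3, 35]
        System.out.println(hi); // [19, 51]
    }
}
```

transfer() does the same split in one pass over the list, reusing the trailing run of nodes that already share the same bit (lastRun) instead of copying them one by one.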
Setting a list node
The final branch walks the list or tree and sets the value:
else {
V oldVal = null;
synchronized (f) {
if (tabAt(tab, i) == f) {
if (fh >= 0) {
binCount = 1;
for (Node<K,V> e = f;; ++binCount) {
K ek;
if (e.hash == hash &&
((ek = e.key) == key ||
(ek != null && key.equals(ek)))) {
oldVal = e.val;
if (!onlyIfAbsent)
e.val = value;
break;
}
Node<K,V> pred = e;
if ((e = e.next) == null) {
pred.next = new Node<K,V>(hash, key,
value, null);
break;
}
}
}
else if (f instanceof TreeBin) {
Node<K,V> p;
binCount = 2;
if ((p = ((TreeBin<K,V>)f).putTreeVal(hash, key,
value)) != null) {
oldVal = p.val;
if (!onlyIfAbsent)
p.val = value;
}
}
}
}
if (binCount != 0) {
if (binCount >= TREEIFY_THRESHOLD)
treeifyBin(tab, i);
if (oldVal != null)
return oldVal;
break;
}
}
It first locks the head node of the bin with synchronized, shutting other threads out of that bin; within the bin we are effectively single-threaded. (This is exactly where 1.7 and 1.8 differ in lock granularity; 1.8 is more efficient because the lock scope is smaller.)
It then walks the linked list or red-black tree and inserts the entry, much as HashMap does.
Finally it checks the bin's node count; if it has reached the treeify threshold (TREEIFY_THRESHOLD = 8), the bin is converted to a red-black tree.
addCount
Last comes the crucial part: having put an element in, the size must be increased.
/**
* Adds to count, and if table is too small and not already
* resizing, initiates transfer. If already resizing, helps
* perform transfer if work is available. Rechecks occupancy
* after a transfer to see if another resize is already needed
* because resizings are lagging additions.
*
* @param x the count to add
* @param check if <0, don't check resize, if <= 1 only check if uncontended
*/
private final void addCount(long x, int check) {
CounterCell[] as; long b, s;
if ((as = counterCells) != null ||
!U.compareAndSwapLong(this, BASECOUNT, b = baseCount, s = b + x)) {
CounterCell a; long v; int m;
boolean uncontended = true;
if (as == null || (m = as.length - 1) < 0 ||
(a = as[ThreadLocalRandom.getProbe() & m]) == null ||
!(uncontended =
U.compareAndSwapLong(a, CELLVALUE, v = a.value, v + x))) {
fullAddCount(x, uncontended);
return;
}
if (check <= 1)
return;
s = sumCount();
}
if (check >= 0) {
Node<K,V>[] tab, nt; int n, sc;
while (s >= (long)(sc = sizeCtl) && (tab = table) != null &&
(n = tab.length) < MAXIMUM_CAPACITY) {
int rs = resizeStamp(n);
if (sc < 0) {
if ((sc >>> RESIZE_STAMP_SHIFT) != rs || sc == rs + 1 ||
sc == rs + MAX_RESIZERS || (nt = nextTable) == null ||
transferIndex <= 0)
break;
if (U.compareAndSwapInt(this, SIZECTL, sc, sc + 1))
transfer(tab, nt);
}
else if (U.compareAndSwapInt(this, SIZECTL, sc,
(rs << RESIZE_STAMP_SHIFT) + 2))
transfer(tab, null);
s = sumCount();
}
}
}
The flow (flowchart omitted):
Step 1: update baseCount with a CAS. If the CAS fails, there is contention, so fall into the second if.
Step 2: get a per-thread probe value, compute which CounterCell slot this count belongs to, and CAS it in. If that fails too, contention is real, so enter the fullAddCount method.
Two details worth noting:
- ThreadLocalRandom extends Random and is designed for concurrent use. The probe here picks a pseudo-random cell index to reduce collisions, and it collides far less than a shared Random would.
- The slot index is computed with & rather than %, exactly like the normal bucket-index computation.
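The &-vs-% point is easy to verify: for a power-of-two length n, (n - 1) & h equals h % n for any non-negative h, at the cost of a single AND. A quick check:

```java
public class IndexMaskDemo {
    public static void main(String[] args) {
        int n = 16; // power of two, like table lengths and CounterCell[] lengths
        for (int h : new int[]{0, 5, 16, 31, 123456}) {
            int masked = (n - 1) & h;        // single AND instead of a division
            System.out.println(h + " -> " + masked + " " + (masked == h % n));
        }
    }
}
```

For negative values the two differ (Java's % can return a negative), which is one more reason to mask; spread() also forces hashes non-negative by ANDing with HASH_BITS.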
Step 3: I'll spare you a full walk through fullAddCount; in outline:
- the CounterCell array is initialized at size 2 and doubles (<< 1) when it grows
- the method itself is one big spin loop
- cell creation and resizing are guarded by a spinlock:

/**
 * Spinlock (locked via CAS) used when resizing and/or creating CounterCells.
 */
private transient volatile int cellsBusy;
So why go to all this trouble and make addCount so complex?
- Not using synchronized directly is for efficiency.
- Not using a single CAS counter avoids burning cycles on CAS retries under high concurrency.
- The benefit of baseCount + CounterCell[] is:
- under low contention, baseCount alone handles it
- under high contention, ThreadLocalRandom and CounterCell[] spread the updates out nicely, eliminating most pointless locking and CAS traffic, rather like load balancing in Nginx.
get
get is comparatively simple: locate the bin, then match on hash and key.equals. Source:
/**
* Returns the value to which the specified key is mapped,
* or {@code null} if this map contains no mapping for the key.
*
* <p>More formally, if this map contains a mapping from a key
* {@code k} to a value {@code v} such that {@code key.equals(k)},
* then this method returns {@code v}; otherwise it returns
* {@code null}. (There can be at most one such mapping.)
*
* @throws NullPointerException if the specified key is null
*/
public V get(Object key) {
Node<K,V>[] tab; Node<K,V> e, p; int n, eh; K ek;
int h = spread(key.hashCode());
//locate the bucket in the usual way
if ((tab = table) != null && (n = tab.length) > 0 &&
(e = tabAt(tab, (n - 1) & h)) != null) {
//hashes match: compare the keys directly
if ((eh = e.hash) == h) {
if ((ek = e.key) == key || (ek != null && key.equals(ek)))
return e.val;
}
//negative hash: a special node (e.g. the map may be resizing); delegate to the node's own find
else if (eh < 0)
return (p = e.find(h, key)) != null ? p.val : null;
//otherwise walk the bucket's list looking for the element
while ((e = e.next) != null) {
if (e.hash == h &&
((ek = e.key) == key || (ek != null && key.equals(ek))))
return e.val;
}
}
return null;
}
size
Having seen addCount inside put, this one is trivial. Source:
/**
* {@inheritDoc}
*/
public int size() {
long n = sumCount();
return ((n < 0L) ? 0 :
(n > (long)Integer.MAX_VALUE) ? Integer.MAX_VALUE :
(int)n);
}
final long sumCount() {
CounterCell[] as = counterCells; CounterCell a;
long sum = baseCount;
if (as != null) {
for (int i = 0; i < as.length; ++i) {
if ((a = as[i]) != null)
sum += a.value;
}
}
return sum;
}
It simply walks the CounterCell values and adds them to baseCount.
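Note that sumCount() reads baseCount and the cells without any locking, so while writers are active size() is only an estimate; once they finish it is exact. A small demonstration (thread and key counts are arbitrary):

```java
import java.util.concurrent.ConcurrentHashMap;

public class SizeDemo {
    public static void main(String[] args) throws InterruptedException {
        ConcurrentHashMap<Integer, Integer> map = new ConcurrentHashMap<>();
        Thread[] ts = new Thread[4];
        for (int t = 0; t < ts.length; t++) {
            final int base = t * 10_000;      // disjoint key ranges per thread
            ts[t] = new Thread(() -> {
                for (int i = 0; i < 10_000; i++)
                    map.put(base + i, i);
            });
            ts[t].start();
        }
        for (Thread th : ts)
            th.join();
        System.out.println(map.size());       // exact once all writers have joined
    }
}
```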
Summary
- When locks weigh heavily, see whether volatile plus CAS can cleverly sidestep them.
- When concurrency is high, see whether divide and conquer applies.
- Architecture and detail-level optimization matter equally.