JDK 1.6 ConcurrentHashMap
The java.util.concurrent package was written by the renowned Doug Lea; let's learn from the master.
HashMap is extremely fast in a single thread, but misbehaves in many ways under concurrent access.
Hashtable is thread-safe, but every operation locks the entire table, so its throughput is mediocre.
To balance efficiency against thread safety, Doug Lea introduced ConcurrentHashMap.
/*
* The basic strategy is to subdivide the table among Segments,
* each of which itself is a concurrently readable hash table.
*/
1. Internal data structure:
final int segmentMask;  // segments.length - 1; masks the hash to pick a segment
final int segmentShift; // how far to right-shift the hash before masking
final Segment<K,V>[] segments;
(static final class Segment<K,V> extends ReentrantLock implements Serializable {...})
2. The constructor. Note the concurrencyLevel parameter: "the estimated number of concurrently updating threads". It expresses the expected level of write concurrency and directly determines the length of the segments array, which is always rounded up to a power of two.
public ConcurrentHashMap(int initialCapacity,
                         float loadFactor, int concurrencyLevel) {
    if (!(loadFactor > 0) || initialCapacity < 0 || concurrencyLevel <= 0)
        throw new IllegalArgumentException();

    if (concurrencyLevel > MAX_SEGMENTS)
        concurrencyLevel = MAX_SEGMENTS;

    // Find power-of-two sizes best matching arguments
    int sshift = 0;
    int ssize = 1;
    while (ssize < concurrencyLevel) {
        ++sshift;
        ssize <<= 1;
    }
    segmentShift = 32 - sshift;
    segmentMask = ssize - 1;
    this.segments = Segment.newArray(ssize);

    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    int c = initialCapacity / ssize;
    if (c * ssize < initialCapacity)
        ++c;
    int cap = 1;
    while (cap < c)
        cap <<= 1;

    for (int i = 0; i < this.segments.length; ++i)
        this.segments[i] = new Segment<K,V>(cap, loadFactor);
}
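The sizing arithmetic above is easy to trace by hand. Here is a standalone sketch (not JDK code) that reproduces just that math, so the derived fields can be inspected for a given set of constructor arguments:

```java
// Standalone sketch reproducing the constructor's sizing math.
public class SegmentSizing {
    // Returns {ssize, segmentShift, segmentMask, cap} for the given arguments.
    static int[] sizing(int initialCapacity, int concurrencyLevel) {
        // Round concurrencyLevel up to a power of two -> number of segments.
        int sshift = 0, ssize = 1;
        while (ssize < concurrencyLevel) { ++sshift; ssize <<= 1; }
        int segmentShift = 32 - sshift;   // shift exposing the top sshift bits
        int segmentMask = ssize - 1;      // mask for those bits
        // Per-segment capacity: ceil(initialCapacity / ssize),
        // rounded up to a power of two.
        int c = initialCapacity / ssize;
        if (c * ssize < initialCapacity) ++c;
        int cap = 1;
        while (cap < c) cap <<= 1;
        return new int[] { ssize, segmentShift, segmentMask, cap };
    }

    public static void main(String[] args) {
        // Default arguments (16, 0.75f, 16): 16 segments, shift 28, mask 15, cap 1.
        System.out.println(java.util.Arrays.toString(sizing(16, 16)));
        // (100, 0.75f, 10): still 16 segments; 100/16 rounds up to 7, cap becomes 8.
        System.out.println(java.util.Arrays.toString(sizing(100, 10)));
    }
}
```

With the default concurrencyLevel of 16, segmentShift is 28 and segmentMask is 15, so exactly the top 4 bits of the hash choose the segment.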
3. The put() and putIfAbsent() methods:
public V put(K key, V value) {
    if (value == null)
        throw new NullPointerException();
    int hash = hash(key.hashCode());
    return segmentFor(hash).put(key, hash, value, false);
}

public V putIfAbsent(K key, V value) {
    if (value == null)
        throw new NullPointerException();
    int hash = hash(key.hashCode());
    return segmentFor(hash).put(key, hash, value, true);
}
Both first apply a second hash to key.hashCode():
private static int hash(int h) {
    // Spread bits to regularize both segment and index locations,
    // using variant of single-word Wang/Jenkins hash.
    h += (h << 15) ^ 0xffffcd7d;
    h ^= (h >>> 10);
    h += (h << 3);
    h ^= (h >>> 6);
    h += (h << 2) + (h << 14);
    return h ^ (h >>> 16);
}
The hash then locates the owning segment via segmentFor(hash):
final Segment<K,V> segmentFor(int hash) {
    return segments[(hash >>> segmentShift) & segmentMask];
}
As you can see, hash >>> segmentShift discards the low bits, so the top sshift bits are ANDed with segmentMask to choose the segment. (The low bits are used later to pick the bucket inside the segment, so the two decisions consume different parts of the hash.)
put() then delegates to Segment.put():
V put(K key, int hash, V value, boolean onlyIfAbsent) {
    lock();
    try {
        int c = count;
        if (c++ > threshold) // ensure capacity
            rehash();
        HashEntry<K,V>[] tab = table;
        int index = hash & (tab.length - 1);
        HashEntry<K,V> first = tab[index];
        HashEntry<K,V> e = first;
        while (e != null && (e.hash != hash || !key.equals(e.key)))
            e = e.next;

        V oldValue;
        if (e != null) {
            oldValue = e.value;
            if (!onlyIfAbsent)
                e.value = value;
        }
        else {
            oldValue = null;
            ++modCount;
            tab[index] = new HashEntry<K,V>(key, hash, first, value);
            count = c; // write-volatile
        }
        return oldValue;
    } finally {
        unlock();
    }
}
3.1 One detail deserves close attention here:
static final class HashEntry<K,V> {
    final K key;
    final int hash;
    volatile V value;
    final HashEntry<K,V> next;

    HashEntry(K key, int hash, HashEntry<K,V> next, V value) {
        this.key = key;
        this.hash = hash;
        this.next = next;
        this.value = value;
    }
}
Notice that when an entry is created, its next pointer, key, and hash are all final, i.e., immutable. Once a thread has obtained a HashEntry, it never has to worry about the next-chain being restructured underneath it by another thread; only the volatile value field can change.
4. The put() and putIfAbsent() methods above synchronize internally: Segment<K,V> extends ReentrantLock, and the methods bracket their work with lock() and unlock(). The lock granularity is therefore a single Segment; all other segments remain unlocked. Compared with Hashtable this is a clear throughput win: Hashtable keeps all its data in one array, so taking the lock blocks every operation on the table (put, get, and so on), whereas ConcurrentHashMap splits its storage across several tables, so locking one still leaves the others available. The lock granularity has been reduced.
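The lock-striping idea can be reduced to a minimal sketch. The class below is hypothetical and far simpler than the real implementation (it even locks on reads, which the real class avoids), but it shows the essential structure: each stripe pairs a ReentrantLock with its own table, so writers to different stripes never contend:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Minimal lock-striping sketch (hypothetical, not the JDK implementation).
public class StripedMap<K, V> {
    static final int STRIPES = 16; // must be a power of two

    // Like Segment, each stripe IS a lock and owns its own table.
    static final class Stripe<K, V> extends ReentrantLock {
        final Map<K, V> table = new HashMap<K, V>();
    }

    private final Stripe<K, V>[] stripes;

    @SuppressWarnings("unchecked")
    public StripedMap() {
        stripes = new Stripe[STRIPES];
        for (int i = 0; i < STRIPES; i++)
            stripes[i] = new Stripe<K, V>();
    }

    private Stripe<K, V> stripeFor(Object key) {
        return stripes[key.hashCode() & (STRIPES - 1)];
    }

    public V put(K key, V value) {
        Stripe<K, V> s = stripeFor(key);
        s.lock();                 // locks one stripe, not the whole map
        try {
            return s.table.put(key, value);
        } finally {
            s.unlock();
        }
    }

    public V get(Object key) {
        Stripe<K, V> s = stripeFor(key);
        s.lock();                 // the real class avoids this lock on reads
        try {
            return s.table.get(key);
        } finally {
            s.unlock();
        }
    }
}
```

Two threads writing keys that hash to different stripes proceed fully in parallel; only writes to the same stripe serialize. The real class goes further and makes reads lock-free, as the next section shows.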
5. Now let's look at get():
public V get(Object key) {
    int hash = hash(key.hashCode());
    return segmentFor(hash).get(key, hash);
}

V get(Object key, int hash) {
    if (count != 0) { // read-volatile
        HashEntry<K,V> e = getFirst(hash);
        while (e != null) {
            if (e.hash == hash && key.equals(e.key)) {
                V v = e.value;
                // Why might v be null here? The HashEntry constructor performs
                // four writes: (1) key, (2) hash, (3) next, (4) value. Under
                // the old memory model a racing reader could observe the entry
                // after (1) but before (4), i.e. see the default null, hence
                // readValueUnderLock(e) to re-read under the lock.
                // Under the current JMM a reference to a properly constructed
                // object is published only after construction completes: other
                // threads either don't see the entry at all or see it fully
                // initialized. But value is not final, so the code stays
                // defensive.
                if (v != null)
                    return v;
                return readValueUnderLock(e); // recheck
            }
            e = e.next;
        }
    }
    return null;
}

V readValueUnderLock(HashEntry<K,V> e) {
    lock();
    try {
        return e.value;
    } finally {
        unlock();
    }
}
6. put(), get(), and remove() make the internal design clear: put and remove synchronize (on a single segment), while get() normally takes no lock at all, so most reads run nearly as fast as HashMap.get().
7. The remove() implementation:
The remove() logic itself is not complicated, but two points deserve attention. First, when the node to delete exists, decrementing count must be the very last write; otherwise a subsequent read might not see the structural change just made to the segment (the volatile write to count is what publishes it). Second, remove() begins by copying table into a local variable tab: table is volatile, and volatile reads are relatively expensive and cannot be optimized by the compiler, whereas repeated reads of an ordinary local variable cost little and optimize well.
Also, because next is final, deleting a node means re-creating every node that precedes it in the chain; the nodes after it are reused untouched.
V remove(Object key, int hash, Object value) {
    lock();
    try {
        int c = count - 1;
        HashEntry<K,V>[] tab = table;
        int index = hash & (tab.length - 1);
        HashEntry<K,V> first = tab[index];
        HashEntry<K,V> e = first;
        while (e != null && (e.hash != hash || !key.equals(e.key)))
            e = e.next;

        V oldValue = null;
        if (e != null) {
            V v = e.value;
            if (value == null || value.equals(v)) {
                oldValue = v;
                // All entries following removed node can stay
                // in list, but all preceding ones need to be
                // cloned.
                ++modCount;
                HashEntry<K,V> newFirst = e.next;
                // re-create every node that precedes e
                for (HashEntry<K,V> p = first; p != e; p = p.next)
                    newFirst = new HashEntry<K,V>(p.key, p.hash,
                                                  newFirst, p.value);
                tab[index] = newFirst;
                count = c; // write-volatile
            }
        }
        return oldValue;
    } finally {
        unlock();
    }
}
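A side effect of the cloning loop above is that the rebuilt predecessors end up in reverse order. A minimal sketch with a plain immutable node class (hypothetical, not the JDK's HashEntry) makes this visible:

```java
// Hypothetical immutable node mirroring HashEntry's final next pointer,
// used to demonstrate Segment.remove()'s clone-the-predecessors loop.
public class RemoveClone {
    static final class Node {
        final String key;
        final Node next;
        Node(String key, Node next) { this.key = key; this.next = next; }
    }

    // Remove the node with the given key, cloning its predecessors
    // exactly as Segment.remove() does.
    static Node remove(Node first, String key) {
        Node e = first;
        while (e != null && !key.equals(e.key))
            e = e.next;
        if (e == null)
            return first;               // not found: chain unchanged
        Node newFirst = e.next;         // everything after e is reused as-is
        for (Node p = first; p != e; p = p.next)
            newFirst = new Node(p.key, newFirst); // predecessors re-created
        return newFirst;
    }

    // Concatenate the keys along the chain, e.g. "ABCD".
    static String toString(Node n) {
        StringBuilder sb = new StringBuilder();
        for (; n != null; n = n.next)
            sb.append(n.key);
        return sb.toString();
    }

    public static void main(String[] args) {
        Node chain = new Node("A", new Node("B",
                     new Node("C", new Node("D", null))));
        // Removing C keeps D untouched but re-creates A and B in reverse order.
        System.out.println(toString(remove(chain, "C"))); // prints "BAD"
    }
}
```

Order within a bucket carries no meaning in a hash table, so the reversal is harmless; what matters is that readers traversing the old chain (whose next links are final) still see a consistent list.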
8. Finally, two cross-segment operations: size() and isEmpty().
public int size() {
    final Segment<K,V>[] segments = this.segments;
    long sum = 0;
    long check = 0;
    int[] mc = new int[segments.length];
    // Try a few times to get accurate count. On failure due to
    // continuous async changes in table, resort to locking.
    for (int k = 0; k < RETRIES_BEFORE_LOCK; ++k) {
        check = 0;
        sum = 0;
        int mcsum = 0;
        for (int i = 0; i < segments.length; ++i) {
            sum += segments[i].count;
            mcsum += mc[i] = segments[i].modCount;
        }
        if (mcsum != 0) {
            for (int i = 0; i < segments.length; ++i) {
                check += segments[i].count;
                if (mc[i] != segments[i].modCount) {
                    check = -1; // force retry
                    break;
                }
            }
        }
        if (check == sum)
            break;
    }
    // The two sums disagree, so some thread modified the map in between:
    // lock every segment and sum under the locks.
    if (check != sum) { // Resort to locking all segments
        sum = 0;
        for (int i = 0; i < segments.length; ++i)
            segments[i].lock();
        for (int i = 0; i < segments.length; ++i)
            sum += segments[i].count;
        for (int i = 0; i < segments.length; ++i)
            segments[i].unlock();
    }
    if (sum > Integer.MAX_VALUE)
        return Integer.MAX_VALUE;
    else
        return (int)sum;
}
size() first tries without locking: it sums the per-segment counts while snapshotting each segment's modCount, then sums again and compares (a changed modCount means some thread modified that segment in between). After RETRIES_BEFORE_LOCK (2) failed attempts, it gives up on the optimistic approach, locks every segment, and sums under the locks.
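That optimistic-then-pessimistic strategy can be sketched on its own. The class below is hypothetical (a bare count/modCount pair standing in for Segment), with the retry loop lightly restructured for clarity; it is not the JDK code:

```java
import java.util.concurrent.locks.ReentrantLock;

// Standalone sketch (hypothetical) of size()'s strategy: read per-segment
// counts without locking, use modCount snapshots to detect concurrent
// modification, and lock everything only if the optimistic passes keep failing.
public class OptimisticSum {
    static final int RETRIES_BEFORE_LOCK = 2;

    // Stand-in for Segment: a lock owning a count and a modification counter.
    static final class Seg extends ReentrantLock {
        volatile int count;
        int modCount;
    }

    static int size(Seg[] segs) {
        int[] mc = new int[segs.length];
        for (int k = 0; k < RETRIES_BEFORE_LOCK; ++k) {
            long sum = 0;
            int mcsum = 0;
            for (int i = 0; i < segs.length; ++i) {
                sum += segs[i].count;
                mcsum += mc[i] = segs[i].modCount;  // snapshot modCounts
            }
            if (mcsum != 0) {                       // map was ever modified:
                long check = 0;                     // re-read and compare
                for (int i = 0; i < segs.length; ++i) {
                    check += segs[i].count;
                    if (mc[i] != segs[i].modCount) { check = -1; break; }
                }
                if (check != sum)
                    continue;                       // raced with a writer: retry
            }
            return (int) Math.min(sum, Integer.MAX_VALUE);
        }
        // Optimistic passes kept failing: lock every segment and sum exactly.
        for (Seg s : segs) s.lock();
        try {
            long sum = 0;
            for (Seg s : segs) sum += s.count;
            return (int) Math.min(sum, Integer.MAX_VALUE);
        } finally {
            for (Seg s : segs) s.unlock();
        }
    }

    public static void main(String[] args) {
        Seg[] segs = new Seg[4];
        for (int i = 0; i < 4; i++) segs[i] = new Seg();
        segs[0].count = 3; segs[0].modCount = 1;
        segs[2].count = 5; segs[2].modCount = 2;
        System.out.println(size(segs)); // 8
    }
}
```

In the common quiescent case this costs only two unlocked passes over the segments; the expensive lock-everything path runs only under sustained concurrent modification.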
public boolean isEmpty() {
    final Segment<K,V>[] segments = this.segments;
    /*
     * We keep track of per-segment modCounts to avoid ABA
     * problems in which an element in one segment was added and
     * in another removed during traversal, in which case the
     * table was never actually empty at any point. Note the
     * similar use of modCounts in the size() and containsValue()
     * methods, which are the only other methods also susceptible
     * to ABA problems.
     */
    int[] mc = new int[segments.length];
    int mcsum = 0;
    for (int i = 0; i < segments.length; ++i) {
        if (segments[i].count != 0)
            return false;
        else
            mcsum += mc[i] = segments[i].modCount;
    }
    // If mcsum happens to be zero, then we know we got a snapshot
    // before any modifications at all were made. This is
    // probably common enough to bother tracking.
    if (mcsum != 0) {
        for (int i = 0; i < segments.length; ++i) {
            if (segments[i].count != 0 ||
                mc[i] != segments[i].modCount)
                return false;
        }
    }
    return true;
}
isEmpty() uses the same trick: if any segment is non-empty, or any modCount changed between the two passes, it returns false; otherwise true. No locking is needed.