ConcurrentHashMap原理分析
很多网上的面试笔试题集锦都有关于HashTable和HashMap的区别,比如HashTable是线程安全的,key值不允许为空;而HashMap不是线程安全的,key值允许为空;两者的父不同,一个是Directory,一个是Map; 由于HashMap不是线程同步的,如果需要使用一个线程同步的HashMap,则需要额外进行同步的逻辑代码编写;或者也可以使用CollectionUtils提供的synchronizedMap()方法,该方法会返回一个线程同步的Map,这种方法也会额外增加同步的代价。JDK1.5提供了ConcurrentHashMap提供了简单、安全且代价较小的HashMap同步。
一、ConcurrentHashMap同步原理概述
不管是HashTable还是synchronizedMap的同步,都是使用了锁原理。操作需要访问对象,首先对其加锁;操作结束后,释放锁。通过Hashtable分析文已经就知道,HashTable的synchronized加锁是针对整张Hash表的,即每次操作都锁住整张表;而ConcurrentHashMap允许多个修改操作并发进行,其关键在于使用了Lock Stripping,即锁分离、分段锁或段锁技术。分段锁使用了多个锁来控制对hash表的不同部分进行的修改。ConcurrentHashMap内部使用段(Segment)来表示这些不同的部分,每个段其实就是一个小的hash table,它们有自己的锁。只要多个修改操作发生在不同的段上,它们就可以并发进行。由于引起了并发概念,其效率相对全部加锁就有了明显改善。
二、ConcurrentHashMap结构
由图中可以看出,我们可以将整张ConcurrentHashMap划分成不同的段,每个段可以看做一个HashTable,每个HashTable使用不同的锁,段更进一步细分就是entry即实体。即:
/**
* The segments, each of which is a specialized hash table
*/
final Segment<K,V>[] segments;
ConcurrentHashMap的概念包含ConcurrentHashMap、Segment和HashEntry。HashEntry定义如下:
static final class HashEntry<K,V> {
final K key;
final int hash;
volatile V value;
final HashEntry<K,V> next;
HashEntry(K key, int hash, HashEntry<K,V> next, V value) {
this.key = key;
this.hash = hash;
this.next = next;
this.value = value;
}
@SuppressWarnings("unchecked")
static final <K,V> HashEntry<K,V>[] newArray(int i) {
return new HashEntry[i];
}
}
读操作不需要加锁
可以看出,除了value以外,其他值均是final的(包括next),这就意味着添加entry只能在头上,而不能在中间或尾端。为了确保读操作能够看到最新的值,将value设置成volatile,这避免了加锁,从而提高了读的效率。
定位段的方法
为了加快定位段以及段中hash槽的速度,每个段hash槽的的个数都是2^n,这使得通过位运算就可以定位段和段中hash槽的位置。当并发级别为默认值16时,也就是段的个数,hash值的高4位决定分配在哪个段中,后四位决定段中的坐标。
/**
* Mask value for indexing into segments. The upper bits of a
* key's hash code are used to choose the segment.
*/
final int segmentMask;
/**
* Shift value for indexing within segments.
*/
final int segmentShift;
segmentFor(int n)方法
/**
* Returns the segment that should be used for key with given hash
* @param hash the hash code for the key
* @return the segment
*/
final Segment<K,V> segmentFor(int hash) {
return segments[(hash >>> segmentShift) & segmentMask];
}
段的定义:
/*
* Segments maintain a table of entry lists that are ALWAYS
* kept in a consistent state, so can be read without locking.
* Next fields of nodes are immutable (final). All list
* additions are performed at the front of each bin. This
* makes it easy to check changes, and also fast to traverse.
* When nodes would otherwise be changed, new nodes are
* created to replace them. This works well for hash tables
* since the bin lists tend to be short. (The average length
* is less than two for the default load factor threshold.)
*
* Read operations can thus proceed without locking, but rely
* on selected uses of volatiles to ensure that completed
* write operations performed by other threads are
* noticed. For most purposes, the "count" field, tracking the
* number of elements, serves as that volatile variable
* ensuring visibility. This is convenient because this field
* needs to be read in many read operations anyway:
*
* - All (unsynchronized) read operations must first read the
* "count" field, and should not look at table entries if
* it is 0.
*
* - All (synchronized) write operations should write to
* the "count" field after structurally changing any bin.
* The operations must not take any action that could even
* momentarily cause a concurrent read operation to see
* inconsistent data. This is made easier by the nature of
* the read operations in Map. For example, no operation
* can reveal that the table has grown but the threshold
* has not yet been updated, so there are no atomicity
* requirements for this with respect to reads.
*
* As a guide, all critical volatile reads and writes to the
* count field are marked in code comments.
*/
private static final long serialVersionUID = 2249069246763182397L;
/**
* The number of elements in this segment's region.
*/
transient volatile int count;
/**
* Number of updates that alter the size of the table. This is
* used during bulk-read methods to make sure they see a
* consistent snapshot: If modCounts change during a traversal
* of segments computing size or checking containsValue, then
* we might have an inconsistent view of state so (usually)
* must retry.
*/
transient int modCount;
/**
* The table is rehashed when its size exceeds this threshold.
* (The value of this field is always <tt>(int)(capacity *
* loadFactor)</tt>.)
*/
transient int threshold;
/**
* The per-segment table.
*/
transient volatile HashEntry<K,V>[] table;
/**
* The load factor for the hash table. Even though this value
* is same for all segments, it is replicated to avoid needing
* links to outer object.
* @serial
*/
final float loadFactor;
count用来统计该段数据的个数,它是volatile,它用来协调修改和读取操作,以保证读取操作能够读取到几乎最新的修改。协调方式是这样的,每次修改操作做了结构上的改变,如增加/删除节点(修改节点的值不算结构上的改变),都要写count值,每次读取操作开始都要读取count的值。这利用了 Java 5中对volatile语义的增强,对同一个volatile变量的写和读存在happens-before关系。modCount统计段结构改变的次数,主要是为了检测对多个段进行遍历过程中某个段是否发生改变,在讲述跨段操作时会还会详述。threashold用来表示需要进行rehash的界限值。table数组存储段中节点,每个数组元素是个hash链,用HashEntry表示。table也是volatile,这使得能够读取到最新的 table值而不需要同步。
删除操作的代码
/**
* Remove; match on key only if value null, else match both.
*/
V remove(Object key, int hash, Object value) {
lock();//加锁
try {
int c = count - 1;
HashEntry<K,V>[] tab = table;//优化volatile
int index = hash & (tab.length - 1);//找到第一个节点位置
HashEntry<K,V> first = tab[index];//找到第一个节点
HashEntry<K,V> e = first;
while (e != null && (e.hash != hash || !key.equals(e.key)))
e = e.next;//找到要删除的节点
V oldValue = null;
if (e != null) {
V v = e.value;
if (value == null || value.equals(v)) {//找到要删除的值
oldValue = v;
// All entries following removed node can stay
// in list, but all preceding ones need to be
// cloned.将删除 素之前的元素全部clone,然后将第一个指向删除元素///的next,第2个指向第1个,第3个,指向第二个,将删除元素的的前驱设置为第一个元素
++modCount;
HashEntry<K,V> newFirst = e.next;
for (HashEntry<K,V> p = first; p != e; p = p.next)
newFirst = new HashEntry<K,V>(p.key, p.hash,
newFirst, p.value);
tab[index] = newFirst;
count = c; // write-volatile
}
}
return oldValue;
} finally {
unlock();
}
}
添加操作的代码
V put(K key, int hash, V value, boolean onlyIfAbsent) {
lock();
try {
int c = count;
if (c++ > threshold) // ensure capacity,如超限,rehash
rehash();
HashEntry<K,V>[] tab = table;
int index = hash & (tab.length - 1);
HashEntry<K,V> first = tab[index];
HashEntry<K,V> e = first;
while (e != null && (e.hash != hash || !key.equals(e.key)))
e = e.next;//遍历
V oldValue;
if (e != null) {//如找到相同key,value直接替换
oldValue = e.value;
if (!onlyIfAbsent)
e.value = value;
}
else {//如未找到,创建一个新元素,指向first
oldValue = null;
++modCount;
tab[index] = new HashEntry<K,V>(key, hash, first, value);
count = c; // write-volatile
}
return oldValue;
} finally {
unlock();
}
}
读操作
V get(Object key, int hash) {
if (count != 0) { // read-volatile
HashEntry<K,V> e = getFirst(hash);//获取头节点
while (e != null) {
if (e.hash == hash && key.equals(e.key)) {
V v = e.value;
if (v != null)
return v;
//如空表明有其他操作在改变元素值或table结构,需要加锁读
return readValueUnderLock(e); // recheck
}
e = e.next;
}
}
return null;
}
转载于:https://blog.51cto.com/jncumter/1827861