Map Source Code Analysis: HashMap
Map Source Code Analysis: HashMap Red-Black Tree
Map Source Code Analysis: HashMap Supplement: Collection Views, Iterators, compute, merge, replace
Map Source Code Analysis: LinkedHashMap
Map Source Code Analysis: TreeMap
Map Source Code Analysis: Hashtable
Map Source Code Analysis: ConcurrentHashMap (JDK 1.8) (Part 1)
Map Source Code Analysis: ConcurrentHashMap (JDK 1.8) (Part 2)
The source code of ConcurrentHashMap changed substantially between JDK 1.7 and JDK 1.8; this article analyzes the JDK 1.7 implementation.
I. Overview
In JDK 1.7, ConcurrentHashMap guarantees thread safety through segment locking, supporting fully concurrent reads and a tunable level of concurrency for writes.
ConcurrentHashMap is weakly consistent: a read reflects some recent state, not necessarily the state at the moment the read began. While a write is in progress, different threads may briefly observe different states, but every observed state is either a past or a future consistent state; erroneous states (such as infinite loops or data loss) cannot occur. ConcurrentHashMap cannot return the values of several keys as of a single absolute instant, but it does guarantee that writes are applied correctly and that reads never observe a corrupted state, which satisfies most concurrent scenarios.
ConcurrentHashMap maintains a segments array; each Segment maintains a HashEntry array (its table), and each HashEntry may head a linked list of nodes. The lock granularity is the segment, hence the name segment locking. Nodes within the same segment cannot be written concurrently, while nodes in different segments can.
ConcurrentHashMap performs its low-level reads and writes through Unsafe.
A simple sketch of ConcurrentHashMap's data structure: a segments array + per-segment table arrays + linked lists. The lengths of the segments array and of each table are powers of two. To locate a node, the high bits of the key's hash select the index into the segments array, the low bits select the index into the table array, and hash collisions are resolved with a linked list.
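The two-level indexing described above can be sketched as follows. This is a simplified illustration, not the JDK code; the shift and mask values assume the default of 16 segments:

```java
public class SegmentIndexDemo {
    // Defaults: 16 segments -> sshift = 4, segmentShift = 32 - 4 = 28, segmentMask = 15
    static final int SEGMENT_SHIFT = 28;
    static final int SEGMENT_MASK = 15;

    // The high bits of the hash pick the segment.
    static int segmentIndex(int hash) {
        return (hash >>> SEGMENT_SHIFT) & SEGMENT_MASK;
    }

    // The low bits of the hash pick the slot inside the segment's table.
    static int tableIndex(int hash, int tableLength) {
        return (tableLength - 1) & hash;
    }

    public static void main(String[] args) {
        int h = 0xA00000F3;
        System.out.println(segmentIndex(h));   // top 4 bits: 0xA = 10
        System.out.println(tableIndex(h, 16)); // low 4 bits: 0x3 = 3
    }
}
```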
II. Fields
1. Constants
/**
* The default initial capacity for this table,
* used when not otherwise specified in a constructor.
*/
//default table capacity
static final int DEFAULT_INITIAL_CAPACITY = 16;
/**
* The default load factor for this table, used when not
* otherwise specified in a constructor.
*/
//default load factor of the table
static final float DEFAULT_LOAD_FACTOR = 0.75f;
/**
* The default concurrency level for this table, used when not
* otherwise specified in a constructor.
*/
//default concurrency level, i.e. the length of the segments array
static final int DEFAULT_CONCURRENCY_LEVEL = 16;
/**
* The maximum capacity, used if a higher value is implicitly
* specified by either of the constructors with arguments. MUST
* be a power of two <= 1<<30 to ensure that entries are indexable
* using ints.
*/
//maximum capacity
static final int MAXIMUM_CAPACITY = 1 << 30;
/**
* The minimum capacity for per-segment tables. Must be a power
* of two, at least two to avoid immediate resizing on next use
* after lazy construction.
*/
//minimum length of each segment's table
static final int MIN_SEGMENT_TABLE_CAPACITY = 2;
/**
* The maximum number of segments to allow; used to bound
* constructor arguments. Must be power of two less than 1 << 24.
*/
//maximum number of segments
static final int MAX_SEGMENTS = 1 << 16; // slightly conservative
/**
* Number of unsynchronized retries in size and containsValue
* methods before resorting to locking. This is used to avoid
* unbounded retries if tables undergo continuous modification
* which would make it impossible to obtain an accurate result.
*/
//maximum number of unlocked retries in size and containsValue. Both methods first try to compute a result without locking and retry on failure; only after repeated failures do they lock.
static final int RETRIES_BEFORE_LOCK = 2;
2. Instance fields
/**
* Mask value for indexing into segments. The upper bits of a
* key's hash code are used to choose the segment.
*/
//segment mask: length of the segments array - 1
final int segmentMask;
/**
* Shift value for indexing within segments.
*/
//segment shift: the high (32 - segmentShift) bits of the key's hash, ANDed with segmentMask, select the index into the segments array
final int segmentShift;
/**
* The segments, each of which is a specialized hash table.
*/
final Segment<K,V>[] segments;
transient Set<K> keySet;
transient Set<Map.Entry<K,V>> entrySet;
transient Collection<V> values;
III. Inner classes
1. HashEntry
static final class HashEntry<K,V> {
    final int hash;
    final K key;
    volatile V value;
    volatile HashEntry<K,V> next;
    // constructor and setNext/setValue helpers omitted
}
This class represents a node of ConcurrentHashMap; it is essentially no different from the node classes of the other maps.
2. Segment
/**
* The maximum number of times to tryLock in a prescan before
* possibly blocking on acquire in preparation for a locked
* segment operation. On multiprocessors, using a bounded
* number of retries maintains cache acquired while locating
* nodes.
*/
//maximum number of tryLock retries during the prescan
static final int MAX_SCAN_RETRIES =
Runtime.getRuntime().availableProcessors() > 1 ? 64 : 1;
/**
* The per-segment table. Elements are accessed via
* entryAt/setEntryAt providing volatile semantics.
*/
//the table array
transient volatile HashEntry<K,V>[] table;
/**
* The number of elements. Accessed only either within locks
* or among other volatile reads that maintain visibility.
*/
//number of elements
transient int count;
/**
* The total number of mutative operations in this segment.
* Even though this may overflows 32 bits, it provides
* sufficient accuracy for stability checks in CHM isEmpty()
* and size() methods. Accessed only either within locks or
* among other volatile reads that maintain visibility.
*/
//modification count
transient int modCount;
/**
* The table is rehashed when its size exceeds this threshold.
* (The value of this field is always <tt>(int)(capacity *
* loadFactor)</tt>.)
*/
//resize threshold
transient int threshold;
/**
* The load factor for the hash table. Even though this value
* is same for all segments, it is replicated to avoid needing
* links to outer object.
* @serial
*/
//load factor
final float loadFactor;
Note that Segment itself extends ReentrantLock, so every segment is its own lock.
IV. Constructors
ConcurrentHashMap has five constructors. We analyze ConcurrentHashMap#ConcurrentHashMap(int, float, int); the other four simply delegate to it with default arguments.
public ConcurrentHashMap(int initialCapacity,
float loadFactor, int concurrencyLevel) {
if (!(loadFactor > 0) || initialCapacity < 0 || concurrencyLevel <= 0)
throw new IllegalArgumentException();
if (concurrencyLevel > MAX_SEGMENTS)
concurrencyLevel = MAX_SEGMENTS;
// Find power-of-two sizes best matching arguments
int sshift = 0;
int ssize = 1;
//determine the number of segments and the shift: the smallest power of two >= concurrencyLevel
while (ssize < concurrencyLevel) {
++sshift;
ssize <<= 1;
}
this.segmentShift = 32 - sshift;
this.segmentMask = ssize - 1;
if (initialCapacity > MAXIMUM_CAPACITY)
initialCapacity = MAXIMUM_CAPACITY;
int c = initialCapacity / ssize;
if (c * ssize < initialCapacity)
++c;
int cap = MIN_SEGMENT_TABLE_CAPACITY;
//determine the per-segment table length: the smallest power of two >= initialCapacity / ssize (rounded up)
while (cap < c)
cap <<= 1;
// create segments and segments[0]
Segment<K,V> s0 =
new Segment<K,V>(loadFactor, (int)(cap * loadFactor),
(HashEntry<K,V>[])new HashEntry[cap]);
Segment<K,V>[] ss = (Segment<K,V>[])new Segment[ssize];
UNSAFE.putOrderedObject(ss, SBASE, s0); // ordered write of segments[0]
this.segments = ss;
}
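The power-of-two sizing performed by the constructor can be reproduced in isolation. The following standalone sketch mirrors the constructor's loops; the class name and the sample arguments are hypothetical:

```java
import java.util.Arrays;

public class SizingDemo {
    // Mirrors the constructor's rounding: the number of segments is the smallest
    // power of two >= concurrencyLevel, and each segment's table is the smallest
    // power of two (at least MIN_SEGMENT_TABLE_CAPACITY) covering its share of
    // initialCapacity. Returns {ssize, segmentShift, cap}.
    static int[] size(int initialCapacity, int concurrencyLevel) {
        int sshift = 0, ssize = 1;
        while (ssize < concurrencyLevel) { ++sshift; ssize <<= 1; }
        int c = initialCapacity / ssize;
        if (c * ssize < initialCapacity) ++c;   // round up
        int cap = 2;                            // MIN_SEGMENT_TABLE_CAPACITY
        while (cap < c) cap <<= 1;
        return new int[] { ssize, 32 - sshift, cap };
    }

    public static void main(String[] args) {
        // concurrencyLevel 10 -> 16 segments, segmentShift 28;
        // 100 / 16 rounds up to 7 -> per-segment table of length 8
        System.out.println(Arrays.toString(size(100, 10)));
    }
}
```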
V. ConcurrentHashMap#get(Object)
public V get(Object key) {
Segment<K,V> s; // manually integrate access methods to reduce overhead
HashEntry<K,V>[] tab;
int h = hash(key);
long u = (((h >>> segmentShift) & segmentMask) << SSHIFT) + SBASE;
if ((s = (Segment<K,V>)UNSAFE.getObjectVolatile(segments, u)) != null &&
(tab = s.table) != null) {
for (HashEntry<K,V> e = (HashEntry<K,V>) UNSAFE.getObjectVolatile
(tab, ((long)(((tab.length - 1) & h)) << TSHIFT) + TBASE);
e != null; e = e.next) {
K k;
if ((k = e.key) == key || (e.hash == h && key.equals(k)))
return e.value;
}
}
return null;
}
The get method is straightforward: first locate the index into the segments array, then the index into the table array, and then traverse the linked list.
VI. ConcurrentHashMap#put(K, V)
1. ConcurrentHashMap#put(K, V)
public V put(K key, V value) {
Segment<K,V> s;
if (value == null)
throw new NullPointerException();
int hash = hash(key);
int j = (hash >>> segmentShift) & segmentMask;
if ((s = (Segment<K,V>)UNSAFE.getObject // nonvolatile; recheck
(segments, (j << SSHIFT) + SBASE)) == null) // in ensureSegment
s = ensureSegment(j);
return s.put(key, hash, value, false);
}
It first locates the segment, creating it if it does not yet exist, and then delegates to Segment's put method.
2. ConcurrentHashMap#ensureSegment(int)
private Segment<K,V> ensureSegment(int k) {
final Segment<K,V>[] ss = this.segments;
long u = (k << SSHIFT) + SBASE; // raw offset
Segment<K,V> seg;
if ((seg = (Segment<K,V>)UNSAFE.getObjectVolatile(ss, u)) == null) {
Segment<K,V> proto = ss[0]; // use segment 0 as prototype
int cap = proto.table.length;
float lf = proto.loadFactor;
int threshold = (int)(cap * lf);
HashEntry<K,V>[] tab = (HashEntry<K,V>[])new HashEntry[cap];
if ((seg = (Segment<K,V>)UNSAFE.getObjectVolatile(ss, u))
== null) { // recheck
Segment<K,V> s = new Segment<K,V>(lf, threshold, tab);
while ((seg = (Segment<K,V>)UNSAFE.getObjectVolatile(ss, u))
== null) {
if (UNSAFE.compareAndSwapObject(ss, u, null, seg = s))
break;
}
}
}
return seg;
}
This method creates and returns the segment at the given index. The new segment's table length and load factor are copied from segments[0]; this is why the constructor creates segments[0] along with the segments array, so there is never a "first segment" special case to handle.
The new segment is published into the array with CAS; the method returns once the CAS succeeds or another thread's segment is found already in place.
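The CAS publication pattern used by ensureSegment can be illustrated with AtomicReferenceArray in place of Unsafe. This is a simplified sketch (the class and method names are made up); a thread that loses the race simply adopts the winner's object:

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

public class LazySlotDemo {
    final AtomicReferenceArray<Object> slots = new AtomicReferenceArray<>(16);

    // Returns the object at index k, creating it at most once across threads.
    Object ensureSlot(int k) {
        Object cur = slots.get(k);              // volatile read
        if (cur == null) {
            Object created = new Object();      // build outside the CAS
            if (slots.compareAndSet(k, null, created))
                cur = created;                  // we won: our object is published
            else
                cur = slots.get(k);             // lost the race: use the winner's
        }
        return cur;
    }

    public static void main(String[] args) {
        LazySlotDemo d = new LazySlotDemo();
        Object a = d.ensureSlot(3);
        Object b = d.ensureSlot(3);
        System.out.println(a == b); // same instance on both calls
    }
}
```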
3. Segment#put(K, int, V, boolean)
final V put(K key, int hash, V value, boolean onlyIfAbsent) {
HashEntry<K,V> node = tryLock() ? null :
scanAndLockForPut(key, hash, value);
V oldValue;
try {
HashEntry<K,V>[] tab = table;
int index = (tab.length - 1) & hash;
HashEntry<K,V> first = entryAt(tab, index);
for (HashEntry<K,V> e = first;;) {
if (e != null) {
K k;
//a node with this key already exists; onlyIfAbsent decides whether to overwrite
if ((k = e.key) == key ||
(e.hash == hash && key.equals(k))) {
oldValue = e.value;
if (!onlyIfAbsent) {
e.value = value;
++modCount;
}
break;
}
e = e.next;
}
//no node with this key exists
else {
//the node was already created (by scanAndLockForPut); link it at the head
if (node != null)
node.setNext(first);
//no node yet; create one pointing at the current head
else
node = new HashEntry<K,V>(hash, key, value, first);
int c = count + 1;
if (c > threshold && tab.length < MAXIMUM_CAPACITY)
rehash(node);
else
setEntryAt(tab, index, node);
++modCount;
count = c;
oldValue = null;
break;
}
}
} finally {
unlock();
}
return oldValue;
}
As you can see, operations inside a segment run under the lock. Once the lock is held, the logic is essentially the same as HashMap's: locate the table index, insert the node, and check whether a resize is needed.
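The locking discipline of Segment#put (Segment extends ReentrantLock, so the segment locks itself) can be sketched with a toy class. MiniSegment below is hypothetical and uses a HashMap in place of the HashEntry table, but the lock/try/finally shape is the same:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// A toy segment: the segment IS the lock, and every write runs under it.
public class MiniSegment<K, V> extends ReentrantLock {
    private final Map<K, V> entries = new HashMap<>();

    V put(K key, V value, boolean onlyIfAbsent) {
        lock(); // Segment.put acquires this before touching the table
        try {
            V old = entries.get(key);
            if (old == null || !onlyIfAbsent)
                entries.put(key, value);
            return old;
        } finally {
            unlock(); // always released, even on exception
        }
    }

    public static void main(String[] args) {
        MiniSegment<String, Integer> s = new MiniSegment<>();
        System.out.println(s.put("a", 1, false)); // null: no previous mapping
        System.out.println(s.put("a", 2, true));  // 1: kept because onlyIfAbsent
    }
}
```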
4. Segment#scanAndLockForPut(K, int, V)
private HashEntry<K,V> scanAndLockForPut(K key, int hash, V value) {
HashEntry<K,V> first = entryForHash(this, hash);
HashEntry<K,V> e = first;
HashEntry<K,V> node = null;
int retries = -1; // negative while locating node
while (!tryLock()) {
HashEntry<K,V> f; // to recheck first below
if (retries < 0) {
if (e == null) {
if (node == null) // speculatively create node
node = new HashEntry<K,V>(hash, key, value, null);
retries = 0;
}
else if (key.equals(e.key))
retries = 0;
else
e = e.next;
}
else if (++retries > MAX_SCAN_RETRIES) {
lock();
break;
}
else if ((retries & 1) == 0 &&
(f = entryForHash(this, hash)) != first) {
e = first = f; // re-traverse if entry changed
retries = -1;
}
}
return node;
}
This method does useful work while waiting for the lock: it pre-creates the node and watches for changes to the list.
(1) Traverse the list looking for the node; if it is not found, create it. Then set the retry count to 0.
(2) Once the retry count exceeds MAX_SCAN_RETRIES, acquire the lock with a blocking lock() and exit the loop.
(3) On every even-numbered retry, check whether the head of the list has changed. If it has, go back to (1) and search again.
Note that although this method returns a newly created node when no node with the key was found, that node may be stale: the method does not monitor the list continuously, and it does not reset node to null when it detects that the list changed.
Therefore put cannot simply treat node == null as proof of whether a node with the key exists; it still has to traverse the list to decide.
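The bounded tryLock-then-block pattern at the heart of scanAndLockForPut can be shown in isolation. This generic sketch omits the prescan work; the class and method names are made up:

```java
import java.util.concurrent.locks.ReentrantLock;

public class SpinThenBlockDemo {
    static final int MAX_SCAN_RETRIES =
        Runtime.getRuntime().availableProcessors() > 1 ? 64 : 1;

    // Spin on tryLock a bounded number of times (a real implementation would do
    // useful prescan work here), then fall back to a blocking lock().
    // Returns the number of failed tryLock attempts, for illustration.
    static int acquire(ReentrantLock lock) {
        int retries = 0;
        while (!lock.tryLock()) {
            if (++retries > MAX_SCAN_RETRIES) {
                lock.lock(); // give up spinning; park until the lock is free
                break;
            }
        }
        return retries;
    }

    public static void main(String[] args) {
        ReentrantLock lock = new ReentrantLock();
        int retries = acquire(lock); // uncontended: the first tryLock succeeds
        System.out.println(retries); // 0
        lock.unlock();
    }
}
```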
5. Segment#rehash(HashEntry)
private void rehash(HashEntry<K,V> node) {
/*
* Reclassify nodes in each list to new table. Because we
* are using power-of-two expansion, the elements from
* each bin must either stay at same index, or move with a
* power of two offset. We eliminate unnecessary node
* creation by catching cases where old nodes can be
* reused because their next fields won't change.
* Statistically, at the default threshold, only about
* one-sixth of them need cloning when a table
* doubles. The nodes they replace will be garbage
* collectable as soon as they are no longer referenced by
* any reader thread that may be in the midst of
* concurrently traversing table. Entry accesses use plain
* array indexing because they are followed by volatile
* table write.
*/
HashEntry<K,V>[] oldTable = table;
int oldCapacity = oldTable.length;
int newCapacity = oldCapacity << 1;
threshold = (int)(newCapacity * loadFactor);
HashEntry<K,V>[] newTable =
(HashEntry<K,V>[]) new HashEntry[newCapacity];
int sizeMask = newCapacity - 1;
for (int i = 0; i < oldCapacity ; i++) {
HashEntry<K,V> e = oldTable[i];
if (e != null) {
HashEntry<K,V> next = e.next;
int idx = e.hash & sizeMask;
if (next == null) // Single node on list
newTable[idx] = e;
else { // Reuse consecutive sequence at same slot
HashEntry<K,V> lastRun = e;
int lastIdx = idx;
for (HashEntry<K,V> last = next;
last != null;
last = last.next) {
int k = last.hash & sizeMask;
if (k != lastIdx) {
lastIdx = k;
lastRun = last;
}
}
newTable[lastIdx] = lastRun;
// Clone remaining nodes
for (HashEntry<K,V> p = e; p != lastRun; p = p.next) {
V v = p.value;
int h = p.hash;
int k = h & sizeMask;
HashEntry<K,V> n = newTable[k];
newTable[k] = new HashEntry<K,V>(h, p.key, v, n);
}
}
}
}
int nodeIndex = node.hash & sizeMask; // add the new node
node.setNext(newTable[nodeIndex]);
newTable[nodeIndex] = node;
table = newTable;
}
This method resizes the table and then inserts the new node.
The new capacity is twice the old capacity.
When copying nodes to the new table, head insertion is used, so node order is reversed; however, the trailing run of nodes at the end of each list that all map to the same new slot is moved as a whole in a single assignment rather than node by node, and its order is preserved.
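The trailing-run reuse described above can be demonstrated on a plain singly linked list. The Node class below is a hypothetical stand-in for HashEntry:

```java
public class LastRunDemo {
    static final class Node {
        final int hash;
        final Node next;
        Node(int hash, Node next) { this.hash = hash; this.next = next; }
    }

    // Returns the first node of the longest trailing run whose nodes all land
    // in the same slot of a table of size newCapacity (a power of two). That
    // run can be moved in one assignment; only the nodes before it are cloned.
    static Node lastRun(Node head, int newCapacity) {
        int sizeMask = newCapacity - 1;
        Node lastRun = head;
        int lastIdx = head.hash & sizeMask;
        for (Node p = head.next; p != null; p = p.next) {
            int k = p.hash & sizeMask;
            if (k != lastIdx) { lastIdx = k; lastRun = p; }
        }
        return lastRun;
    }

    public static void main(String[] args) {
        // Hashes 1, 9, 2, 10, 18 in a table of 8 map to slots 1, 1, 2, 2, 2:
        // the trailing run starts at the node with hash 2.
        Node head = new Node(1, new Node(9, new Node(2, new Node(10, new Node(18, null)))));
        System.out.println(lastRun(head, 8).hash); // 2
    }
}
```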
VII. ConcurrentHashMap#remove
There are two remove methods; we take ConcurrentHashMap#remove(Object) as the example.
1. ConcurrentHashMap#remove(Object)
public V remove(Object key) {
int hash = hash(key);
Segment<K,V> s = segmentForHash(hash);
return s == null ? null : s.remove(key, hash, null);
}
2. Segment#remove(Object, int, Object)
final V remove(Object key, int hash, Object value) {
if (!tryLock())
scanAndLock(key, hash);
V oldValue = null;
try {
HashEntry<K,V>[] tab = table;
int index = (tab.length - 1) & hash;
HashEntry<K,V> e = entryAt(tab, index);
HashEntry<K,V> pred = null;
while (e != null) {
K k;
HashEntry<K,V> next = e.next;
if ((k = e.key) == key ||
(e.hash == hash && key.equals(k))) {
V v = e.value;
if (value == null || value == v || value.equals(v)) {
if (pred == null)
setEntryAt(tab, index, next);
else
pred.setNext(next);
++modCount;
--count;
oldValue = v;
}
break;
}
pred = e;
e = next;
}
} finally {
unlock();
}
return oldValue;
}
3. Segment#scanAndLock(Object, int)
private void scanAndLock(Object key, int hash) {
// similar to but simpler than scanAndLockForPut
HashEntry<K,V> first = entryForHash(this, hash);
HashEntry<K,V> e = first;
int retries = -1;
while (!tryLock()) {
HashEntry<K,V> f;
if (retries < 0) {
if (e == null || key.equals(e.key))
retries = 0;
else
e = e.next;
}
else if (++retries > MAX_SCAN_RETRIES) {
lock();
break;
}
else if ((retries & 1) == 0 &&
(f = entryForHash(this, hash)) != first) {
e = first = f;
retries = -1;
}
}
}
Similar to the scanAndLockForPut method used on insertion, except that no node needs to be created and returned.
VIII. ConcurrentHashMap#size()
public int size() {
// Try a few times to get accurate count. On failure due to
// continuous async changes in table, resort to locking.
final Segment<K,V>[] segments = this.segments;
int size;
boolean overflow; // true if size overflows 32 bits
long sum; // sum of modCounts
long last = 0L; // previous sum
int retries = -1; // first iteration isn't retry
try {
for (;;) {
if (retries++ == RETRIES_BEFORE_LOCK) {
for (int j = 0; j < segments.length; ++j)
ensureSegment(j).lock(); // force creation
}
sum = 0L;
size = 0;
overflow = false;
for (int j = 0; j < segments.length; ++j) {
Segment<K,V> seg = segmentAt(segments, j);
if (seg != null) {
sum += seg.modCount;
int c = seg.count;
if (c < 0 || (size += c) < 0)
overflow = true;
}
}
if (sum == last)
break;
last = sum;
}
} finally {
if (retries > RETRIES_BEFORE_LOCK) {
for (int j = 0; j < segments.length; ++j)
segmentAt(segments, j).unlock();
}
}
return overflow ? Integer.MAX_VALUE : size;
}
It first tries without locking, computing the sums of count and modCount across all segments; if two consecutive passes produce the same modCount sum, the count sum is returned as the result.
If RETRIES_BEFORE_LOCK consecutive comparisons still disagree, it locks every segment and then computes the count.
The containsValue method uses a similar scheme.
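The retry-until-stable idea behind size() can be sketched with plain arrays standing in for the segments' count and modCount fields (all names here are hypothetical):

```java
public class StableSumDemo {
    static final int RETRIES_BEFORE_LOCK = 2;

    // counts[i] and modCounts[i] play the role of segment i's fields.
    // Returns the summed count once two consecutive modCount sums agree,
    // or -1 to signal that the caller should fall back to locking.
    static long sizeWithoutLock(int[] counts, int[] modCounts) {
        long last = -1L;
        for (int attempt = 0; attempt <= RETRIES_BEFORE_LOCK; attempt++) {
            long sum = 0L, size = 0L;
            for (int i = 0; i < counts.length; i++) {
                sum += modCounts[i];
                size += counts[i];
            }
            if (sum == last)
                return size; // no modification between two passes: stable result
            last = sum;
        }
        return -1L; // still unstable: size() would now lock every segment
    }

    public static void main(String[] args) {
        // No concurrent modification here, so the second pass already matches.
        System.out.println(sizeWithoutLock(new int[]{3, 4}, new int[]{7, 9})); // 7
    }
}
```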