(segments, (j << SSHIFT) + SBASE)) == null) // in ensureSegment @3
s = ensureSegment(j); //@4
return s.put(key, hash, value, false); //@5
}
Code @1 shows that ConcurrentHashMap does not support null values: a put with a null value throws NullPointerException.
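This restriction is easy to observe with the public API; a minimal demo (the class name is mine):

```java
import java.util.concurrent.ConcurrentHashMap;

// Demonstrates that ConcurrentHashMap rejects null values at put time.
public class NullValueDemo {
    public static void main(String[] args) {
        ConcurrentHashMap<String, String> map = new ConcurrentHashMap<>();
        try {
            map.put("key", null); // the null check in code @1 fires here
            System.out.println("unexpected: null value accepted");
        } catch (NullPointerException expected) {
            System.out.println("null value rejected");
        }
    }
}
```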
Code @2 computes the position (array index) of the Segment this key maps to. The java.util.concurrent package reads array elements through UNSAFE, operating on memory directly, rather than the plain indexing you would write as Segment[] a = new Segment[16]; a[j]. For a detailed look at how UNSAFE addresses array elements, see my other post analyzing the AtomicIntegerArray source code.
For example, in an int[] every element is 32 bits, i.e. 4 bytes, so the third element (index 2) starts at byte offset 2 << 2 = 8 from the array base; in other words, the SHIFT value is the base-2 logarithm of the element size. The per-element size in bytes is obtained from UNSAFE.arrayIndexScale,
while UNSAFE.arrayBaseOffset returns the offset of the first element relative to the start of the array object. For a detailed explanation of this part, see my post at http://blog.csdn.net/prestigeding/article/details/52980801.
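The offset arithmetic can be sketched without touching Unsafe at all: the shift is just log2 of the element size. In this sketch the class and method names are mine, and the base offset of 16 is a typical 64-bit HotSpot value used purely for illustration, not a guarantee:

```java
// Sketch of how an array element's byte offset is computed:
// offset = arrayBaseOffset + (index << shift), where shift = log2(arrayIndexScale).
public class ArrayOffsetDemo {
    static long elementOffset(long baseOffset, int scale, int index) {
        int shift = 31 - Integer.numberOfLeadingZeros(scale); // log2 of element size
        return baseOffset + ((long) index << shift);
    }

    public static void main(String[] args) {
        long base = 16; // hypothetical arrayBaseOffset for int[] on 64-bit HotSpot
        int scale = 4;  // bytes per int
        // element at index 2 (the third element) starts 8 bytes past the base
        System.out.println(elementOffset(base, scale, 2)); // 24
    }
}
```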
Code @3 fetches the Segment at index j; it is equivalent to if ((s = segments[j]) == null).
Code @4: let's turn to the ensureSegment method:
/**
 * Returns the segment for the given index, creating it and
 * recording in segment table (via CAS) if not already present.
 *
 * @param k the index
 * @return the segment
 */
@SuppressWarnings("unchecked")
private Segment<K,V> ensureSegment(int k) {
final Segment<K,V>[] ss = this.segments;
long u = (k << SSHIFT) + SBASE; // raw offset
Segment<K,V> seg;
if ((seg = (Segment<K,V>)UNSAFE.getObjectVolatile(ss, u)) == null) {
Segment<K,V> proto = ss[0]; // use segment 0 as prototype
int cap = proto.table.length;
float lf = proto.loadFactor;
int threshold = (int)(cap * lf);
HashEntry<K,V>[] tab = (HashEntry<K,V>[])new HashEntry[cap];
if ((seg = (Segment<K,V>)UNSAFE.getObjectVolatile(ss, u))
== null) { // recheck
Segment<K,V> s = new Segment<K,V>(lf, threshold, tab);
while ((seg = (Segment<K,V>)UNSAFE.getObjectVolatile(ss, u))
== null) {
if (UNSAFE.compareAndSwapObject(ss, u, null, seg = s))
break;
}
}
}
return seg;
}
This method ensures that the Segment at slot k is non-null; if it is null, it creates one (using segment 0 as a prototype for capacity and load factor) and installs it with CAS.
Code @5: once code @4 has initialized the segment at slot k, the key-value pair is inserted into that segment. Next, let's look closely at Segment's put method:
final V put(K key, int hash, V value, boolean onlyIfAbsent) {
HashEntry<K,V> node = tryLock() ? null :
scanAndLockForPut(key, hash, value); // @1
V oldValue;
try {
HashEntry<K,V>[] tab = table;
int index = (tab.length - 1) & hash;
HashEntry<K,V> first = entryAt(tab, index); // @2
for (HashEntry<K,V> e = first;;) { // @3
if (e != null) { // @4
K k;
if ((k = e.key) == key ||
(e.hash == hash && key.equals(k))) { //@5
oldValue = e.value;
if (!onlyIfAbsent) {
e.value = value;
++modCount;
}
break;
}
e = e.next;
}
else { //@6
if (node != null)
node.setNext(first);
else
node = new HashEntry<K,V>(hash, key, value, first);
int c = count + 1;
if (c > threshold && tab.length < MAXIMUM_CAPACITY)
rehash(node);
else
setEntryAt(tab, index, node);
++modCount;
count = c;
oldValue = null;
break;
}
}
} finally {
unlock();
}
return oldValue;
}
The idea of this method is the same as HashMap's: insert a new node at the computed index of the Segment's HashEntry[] table. If the slot at that index is not empty, a hash collision has occurred, so first traverse the whole chain looking for an equal key; if one is found, replace its value; if not, point the new node's next at table[index] and install the node at that index. However, because ConcurrentHashMap supports concurrent access, any operation on a single Segment must hold that segment's lock.
Code @1: first try to acquire the lock; if it succeeds, continue with the insertion; the failure path is analyzed in detail below.
Code @2: compute the index in table[] for this key and read the first node there. Depending on whether that slot is empty, there are two cases: if it is empty there is no collision and execution takes the @6 branch, which installs the newly created node at table[index], first checking whether a rehash is needed (in ConcurrentHashMap, this is decided by comparing the new count against the threshold).
Code @4: loop over the chain at table[index] checking whether any node's key equals the key being inserted; if so, simply replace the value. Note that everything from @3 onward, i.e. essentially the entire put body, runs under the lock.
The above should be easy to follow, so let's now focus on two methods: scanAndLockForPut and rehash (one may wonder whether the latter is the same as HashMap's; it should be).
/**
 * Scans for a node containing given key while trying to
 * acquire lock, creating and returning one if not found. Upon
 * return, guarantees that lock is held. Unlike in most
 * methods, calls to method equals are not screened: Since
 * traversal speed doesn't matter, we might as well help warm
 * up the associated code and accesses as well.
 *
 * @return a new node if key not found, else null
 */
private HashEntry<K,V> scanAndLockForPut(K key, int hash, V value) {
HashEntry<K,V> first = entryForHash(this, hash);
HashEntry<K,V> e = first;
HashEntry<K,V> node = null;
int retries = -1; // negative while locating node
while (!tryLock()) {
HashEntry<K,V> f; // to recheck first below
if (retries < 0) {
if (e == null) {
if (node == null) // speculatively create node
node = new HashEntry<K,V>(hash, key, value, null);
retries = 0;
}
else if (key.equals(e.key))
retries = 0;
else
e = e.next;
}
else if (++retries > MAX_SCAN_RETRIES) {
lock();
break;
}
else if ((retries & 1) == 0 &&
(f = entryForHash(this, hash)) != first) {
e = first = f; // re-traverse if entry changed
retries = -1;
}
}
return node;
}
When the lock cannot be acquired immediately, the thread does not block right away. It optimistically assumes that whoever holds the lock is working on an unrelated key, and keeps doing useful work while spinning on tryLock (scanning the chain for the key, and speculatively creating the node if the key is absent). If the number of retries exceeds MAX_SCAN_RETRIES, the thread gives up spinning and blocks in lock() for performance reasons.
Also note: every other iteration, the method checks whether the first node at this hash's slot in the Segment's HashEntry[] table has changed; if it has, the retry counter is reset to -1 and the scan restarts while still trying to acquire the lock. If the thread does end up blocked in lock(), then once the lock is acquired it returns to final V put(K key, int hash, V value, boolean onlyIfAbsent) and performs the normal insertion (similar to HashMap).
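Stripped of the scanning work, the locking discipline reduces to a spin-then-block pattern. This is an illustrative sketch, not JDK code; the class and method names are mine, and the retry constant is merely representative (the JDK picks MAX_SCAN_RETRIES based on the CPU count):

```java
import java.util.concurrent.locks.ReentrantLock;

// Spin on tryLock a bounded number of times, then fall back to a blocking lock().
// On every exit path the lock is held, mirroring scanAndLockForPut's guarantee.
public class SpinThenBlock {
    static final int MAX_SCAN_RETRIES = 64; // illustrative bound

    static void acquire(ReentrantLock lock) {
        int retries = 0;
        while (!lock.tryLock()) {
            if (++retries > MAX_SCAN_RETRIES) {
                lock.lock(); // stop spinning and park until the lock is free
                break;
            }
        }
        // the lock is held here on every path
    }
}
```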
Next, let's focus on the rehash branch of put: when the segment's element count exceeds the threshold and the table is still below the maximum allowed capacity, a rehash is performed. Here is the rehash source:
/**
 * Doubles size of table and repacks entries, also adding the
 * given node to new table
 */
@SuppressWarnings("unchecked")
private void rehash(HashEntry<K,V> node) {
/*
 * Reclassify nodes in each list to new table. Because we
 * are using power-of-two expansion, the elements from
 * each bin must either stay at same index, or move with a
 * power of two offset. We eliminate unnecessary node
 * creation by catching cases where old nodes can be
 * reused because their next fields won't change.
 * Statistically, at the default threshold, only about
 * one-sixth of them need cloning when a table
 * doubles. The nodes they replace will be garbage
 * collectable as soon as they are no longer referenced by
 * any reader thread that may be in the midst of
 * concurrently traversing table. Entry accesses use plain
 * array indexing because they are followed by volatile
 * table write.
 */
HashEntry<K,V>[] oldTable = table;
int oldCapacity = oldTable.length;
int newCapacity = oldCapacity << 1;
threshold = (int)(newCapacity * loadFactor);
HashEntry<K,V>[] newTable =
(HashEntry<K,V>[]) new HashEntry[newCapacity];
int sizeMask = newCapacity - 1;
for (int i = 0; i < oldCapacity ; i++) {
HashEntry<K,V> e = oldTable[i];
if (e != null) {
HashEntry<K,V> next = e.next;
int idx = e.hash & sizeMask;
if (next == null) // Single node on list
newTable[idx] = e;
else { // Reuse consecutive sequence at same slot
HashEntry<K,V> lastRun = e;
int lastIdx = idx;
for (HashEntry<K,V> last = next;
last != null;
last = last.next) {
int k = last.hash & sizeMask;
if (k != lastIdx) {
lastIdx = k;
lastRun = last;
}
}
newTable[lastIdx] = lastRun;
// Clone remaining nodes
for (HashEntry<K,V> p = e; p != lastRun; p = p.next) {
V v = p.value;
int h = p.hash;
int k = h & sizeMask;
HashEntry<K,V> n = newTable[k];
newTable[k] = new HashEntry<K,V>(h, p.key, v, n);
}
}
}
}
int nodeIndex = node.hash & sizeMask; // add the new node
node.setNext(newTable[nodeIndex]);
newTable[nodeIndex] = node;
table = newTable;
}
Once HashMap's rehash is understood, this method should be easy to follow, so I won't repeat the explanation.
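The one part worth isolating is the "lastRun" reuse trick: because the capacity doubles, each node either keeps its index or moves by exactly oldCapacity, so the longest tail of a chain whose nodes all map to one new index can be relinked as a whole, and only the nodes in front of it need cloning. A standalone sketch of just that scan (the types and names here are mine, not the JDK's):

```java
// Finds the start of the longest tail suffix of a bucket chain whose nodes
// all land in the same slot of the doubled table; rehash moves that suffix
// with one pointer assignment and clones only the nodes before it.
public class LastRunDemo {
    static class Node {
        final int hash;
        Node next;
        Node(int hash, Node next) { this.hash = hash; this.next = next; }
    }

    static Node lastRun(Node head, int sizeMask) {
        Node lastRun = head;
        int lastIdx = head.hash & sizeMask;
        for (Node p = head.next; p != null; p = p.next) {
            int k = p.hash & sizeMask;
            if (k != lastIdx) { // index changed: the reusable suffix starts later
                lastIdx = k;
                lastRun = p;
            }
        }
        return lastRun;
    }

    public static void main(String[] args) {
        // chain 1 -> 9 -> 5 -> 13; with sizeMask 7 the new indices are 1, 1, 5, 5,
        // so the suffix starting at the node with hash 5 can be reused as-is
        Node chain = new Node(1, new Node(9, new Node(5, new Node(13, null))));
        System.out.println(lastRun(chain, 7).hash); // 5
    }
}
```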
2.2.2 public V putIfAbsent(K key, V value)
The semantics of this method: if the key already exists, return the value currently associated with it without modifying the map; if the key does not exist, insert the pair and return null. The whole check-and-insert step is atomic.
public V putIfAbsent(K key, V value) {
Segment<K,V> s;
if (value == null)
throw new NullPointerException();
int hash = hash(key);
int j = (hash >>> segmentShift) & segmentMask;
if ((s = (Segment<K,V>)UNSAFE.getObject
(segments, (j << SSHIFT) + SBASE)) == null)
s = ensureSegment(j);
return s.put(key, hash, value, true);
}
This method is nearly identical to put; the only difference is the final onlyIfAbsent argument. When the key already exists, put overwrites the old value (and returns it), whereas putIfAbsent leaves the existing value untouched and simply returns it.
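The behavioral difference is easy to see with the real API:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// put replaces an existing mapping; putIfAbsent leaves it alone.
// Both return the previous value, or null if the key was absent.
public class PutIfAbsentDemo {
    public static void main(String[] args) {
        ConcurrentMap<String, Integer> map = new ConcurrentHashMap<>();
        System.out.println(map.putIfAbsent("a", 1)); // null: absent, so 1 is inserted
        System.out.println(map.putIfAbsent("a", 2)); // 1: present, value NOT replaced
        System.out.println(map.put("a", 3));         // 1: old value returned, replaced by 3
        System.out.println(map.get("a"));            // 3
    }
}
```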
2.2.3 public void putAll(Map m)
public void putAll(Map<? extends K, ? extends V> m) {
for (Map.Entry<? extends K, ? extends V> e : m.entrySet())
put(e.getKey(), e.getValue());
}
This simply iterates over the entries of the Map passed in and calls put for each one. Note that while each individual put is thread-safe, putAll as a whole is not atomic.
Having looked at the put family, let's turn to get and see how the read path is implemented:
2.2.4 public V get(Object key) source code analysis
/**
 * Returns the value to which the specified key is mapped,
 * or {@code null} if this map contains no mapping for the key.
 *
 * <p>More formally, if this map contains a mapping from a key
 * {@code k} to a value {@code v} such that {@code key.equals(k)},
 * then this method returns {@code v}; otherwise it returns
 * {@code null}.  (There can be at most one such mapping.)
 *
 * @throws NullPointerException if the specified key is null
 */
public V get(Object key) {
Segment<K,V> s; // manually integrate access methods to reduce overhead
HashEntry<K,V>[] tab;
int h = hash(key);
long u = (((h >>> segmentShift) & segmentMask) << SSHIFT) + SBASE;
if ((s = (Segment<K,V>)UNSAFE.getObjectVolatile(segments, u)) != null &&
(tab = s.table) != null) {
for (HashEntry<K,V> e = (HashEntry<K,V>) UNSAFE.getObjectVolatile
(tab, ((long)(((tab.length - 1) & h)) << TSHIFT) + TBASE);
e != null; e = e.next) {
K k;
if ((k = e.key) == key || (e.hash == h && key.equals(k)))
return e.value;
}
}
return null;
}
As shown above, get takes no lock. It computes the Segment slot from the key's hash, but it does not read the Segment, nor the element in the Segment's HashEntry[] table, with plain array indexing; instead it uses UNSAFE.getObjectVolatile to read memory directly with volatile semantics, giving the strongest visibility guarantee available without locking. One might ask: why doesn't get take a read lock to block concurrent writers? There would be little point. ConcurrentHashMap is a data container offering basic put and get operations, and get does not change the map's internal structure. If the current thread reads a key's value and another thread then removes that key, that is a perfectly normal occurrence at the business level. So get needs no lock.
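A quick illustration of the lock-free read contract: get simply reflects whatever mapping is visible at the moment of the call, and never throws because of a concurrent modification:

```java
import java.util.concurrent.ConcurrentHashMap;

// get never blocks writers: it returns the current value, or null once the
// key is removed, without throwing even while the map is being modified.
public class GetDemo {
    public static void main(String[] args) {
        ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();
        map.put("k", 1);
        System.out.println(map.get("k")); // 1
        map.remove("k");                  // another thread could do this concurrently
        System.out.println(map.get("k")); // null
    }
}
```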
2.3 Browsing the source, methods such as replace and remove are internally similar to their HashMap counterparts, because a Segment is essentially a HashMap with a lock. So the next question is: put, replace, and remove scale well under concurrency because they lock only one segment at a time, but what about methods that compute a global property of the map, such as keys() and size()? How do they perform? Let's turn to size, keys, and the other traversal methods.
2.3.1 public int size() method
/**
 * Returns the number of key-value mappings in this map.  If the
 * map contains more than Integer.MAX_VALUE elements, returns
 * Integer.MAX_VALUE.
 *
 * @return the number of key-value mappings in this map
 */
public int size() {
// Try a few times to get accurate count. On failure due to
// continuous async changes in table, resort to locking.
final Segment<K,V>[] segments = this.segments;
int size;
boolean overflow; // true if size overflows 32 bits
long sum; // sum of modCounts
long last = 0L; // previous sum
int retries = -1; // first iteration isn't retry
try {
for (;;) {
if (retries++ == RETRIES_BEFORE_LOCK) {
for (int j = 0; j < segments.length; ++j)
ensureSegment(j).lock(); // force creation
}
sum = 0L;
size = 0;
overflow = false;
for (int j = 0; j < segments.length; ++j) {
Segment<K,V> seg = segmentAt(segments, j);
if (seg != null) {
sum += seg.modCount;
int c = seg.count;
if (c < 0 || (size += c) < 0)
overflow = true;
}
}
if (sum == last)
break;
last = sum;
}
} finally {
if (retries > RETRIES_BEFORE_LOCK) {
for (int j = 0; j < segments.length; ++j)
segmentAt(segments, j).unlock();
}
}
return overflow ? Integer.MAX_VALUE : size;
}
The core idea: as explained above, each Segment is effectively a HashMap, which maintains two fields. modCount records how many times the structure has changed: putting a key not already present, remove, clear, and so on each increment it; in other words, every operation that can affect size also bumps modCount. count records the number of key-value pairs in the segment. If the structure never changed, ConcurrentHashMap's size would simply be the sum of every Segment's count. In reality, other threads may modify segments while the sums are being taken, making the result inaccurate. So the method first tries the summation optimistically, making at most three unlocked passes (and at least two): if two consecutive passes produce the same total modCount, no other thread changed the structure during the computation, and the size can be returned directly. Otherwise it acquires the lock of every Segment in turn, sums the counts while holding all the locks, releases them, and returns the size.
Code @1: once the retry counter reaches RETRIES_BEFORE_LOCK (default 2, i.e. on the third pass), locking is required to compute the size.
Code @2: sum the counts across the Segments.
Code @3: the check that two consecutive passes computed the same modCount sum; if they did, the size value is correct; otherwise keep retrying, eventually falling back to acquiring the locks.
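The retry-then-lock strategy can be isolated into a small sketch. The Seg class and its field layout here are mine; only RETRIES_BEFORE_LOCK matches the JDK's name and default value of 2:

```java
import java.util.concurrent.locks.ReentrantLock;

// Optimistically sum per-segment counts; if the modCount total is identical
// across two consecutive passes, no structural change happened in between and
// the sum is trusted. After RETRIES_BEFORE_LOCK failed passes, lock everything.
public class SizeSketch {
    static class Seg extends ReentrantLock {
        int count;    // entries in this segment
        int modCount; // structural modifications so far
    }

    static final int RETRIES_BEFORE_LOCK = 2;

    static int size(Seg[] segments) {
        int size = 0;
        long last = 0L;
        int retries = -1;
        try {
            for (;;) {
                if (retries++ == RETRIES_BEFORE_LOCK)
                    for (Seg s : segments) s.lock(); // give up on optimism
                long sum = 0L;
                size = 0;
                for (Seg s : segments) {
                    sum += s.modCount;
                    size += s.count;
                }
                if (sum == last) // stable across two passes (or computed under locks)
                    break;
                last = sum;
            }
        } finally {
            if (retries > RETRIES_BEFORE_LOCK)
                for (Seg s : segments) s.unlock();
        }
        return size;
    }
}
```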
2.3.2 public boolean isEmpty() method source code analysis
/**
 * Returns true if this map contains no key-value mappings.
 *
 * @return true if this map contains no key-value mappings
 */
public boolean isEmpty() {
/*
 * Sum per-segment modCounts to avoid mis-reporting when
 * elements are concurrently added and removed in one segment
 * while checking another, in which case the table was never
 * actually empty at any point. (The sum ensures accuracy up
 * through at least 1<<31 per-segment modifications before
 * recheck.) Methods size() and containsValue() use similar
 * constructions for stability checks.
 */
long sum = 0L;
final Segment<K,V>[] segments = this.segments;
for (int j = 0; j < segments.length; ++j) {
Segment<K,V> seg = segmentAt(segments, j);
if (seg != null) {
if (seg.count != 0)
return false;
sum += seg.modCount;
}
}
if (sum != 0L) { // recheck unless no modifications
for (int j = 0; j < segments.length; ++j) {
Segment<K,V> seg = segmentAt(segments, j);
if (seg != null) {
if (seg.count != 0)
return false;
sum -= seg.modCount;
}
}
if (sum != 0L)
return false;
}
return true;
}
The core idea: traverse all segments; as soon as one with a non-zero count is found, return false. If every segment's count is 0, traverse a second time; if the modCount sums of the two passes match, return true, otherwise return false.
Now take a look at the following methods:
2.3.3 public boolean containsKey(Object key)
/**
 * Tests if the specified object is a key in this table.
 *
 * @param key possible key
 * @return true if and only if the specified object
 *         is a key in this table, as determined by the
 *         <tt>equals</tt> method; <tt>false</tt> otherwise.
 * @throws NullPointerException if the specified key is null
 */
@SuppressWarnings("unchecked")
public boolean containsKey(Object key) {
Segment<K,V> s; // same as get() except no need for volatile value read
HashEntry<K,V>[] tab;
int h = hash(key);
long u = (((h >>> segmentShift) & segmentMask) << SSHIFT) + SBASE;
if ((s = (Segment<K,V>)UNSAFE.getObjectVolatile(segments, u)) != null &&
(tab = s.table) != null) {
for (HashEntry<K,V> e = (HashEntry<K,V>) UNSAFE.getObjectVolatile
(tab, ((long)(((tab.length - 1) & h)) << TSHIFT) + TBASE);
e != null; e = e.next) {
K k;
if ((k = e.key) == key || (e.hash == h && key.equals(k)))
return true;
}
}
return false;
}
2.3.4 public boolean containsValue(Object value)
/**
-
Returns true if this map maps one or more keys to the
-
specified value. Note: This method requires a full internal
-
traversal of the hash table, and so is much slower than
-
method containsKey.
-
@param value value whose presence in this map is to be tested
-
@return true if this map maps one or more keys to the
-
specified value
-
@throws NullPointerException if the specified value is null
*/
public boolean containsValue(Object value) {
// Same idea as size()
if (value == null)
throw new NullPointerException();
final Segment<K,V>[] segments = this.segments;
boolean found = false;
long last = 0;
int retries = -1;
try {
outer: for (;;) {
if (retries++ == RETRIES_BEFORE_LOCK) {
for (int j = 0; j < segments.length; ++j)
ensureSegment(j).lock(); // force creation
}
long hashSum = 0L;
int sum = 0;
for (int j = 0; j < segments.length; ++j) {
HashEntry<K,V>[] tab;
Segment<K,V> seg = segmentAt(segments, j);
if (seg != null && (tab = seg.table) != null) {
for (int i = 0 ; i < tab.length; i++) {
HashEntry<K,V> e;
for (e = entryAt(tab, i); e != null; e = e.next) {
V v = e.value;
if (v != null && value.equals(v)) {
found = true;
break outer;
}
}
}
sum += seg.modCount;
}
}
if (retries > 0 && sum == last)
break;
last = sum;
}
} finally {
if (retries > RETRIES_BEFORE_LOCK) {
for (int j = 0; j < segments.length; ++j)
segmentAt(segments, j).unlock();
}
}
return found;
}
2.3.5 public Set<Map.Entry<K,V>> entrySet(): the entry traversal method.
public Set<Map.Entry<K,V>> entrySet() {
Set<Map.Entry<K,V>> es = entrySet;
return (es != null) ? es : (entrySet = new EntrySet());
}
final class EntrySet extends AbstractSet<Map.Entry<K,V>> {
public Iterator<Map.Entry<K,V>> iterator() {
return new EntryIterator();
}
public boolean contains(Object o) {
if (!(o instanceof Map.Entry))
return false;
Map.Entry<?,?> e = (Map.Entry<?,?>)o;
V v = ConcurrentHashMap.this.get(e.getKey());
return v != null && v.equals(e.getValue());
}
public boolean remove(Object o) {
if (!(o instanceof Map.Entry))
return false;
Map.Entry<?,?> e = (Map.Entry<?,?>)o;
return ConcurrentHashMap.this.remove(e.getKey(), e.getValue());
}
public int size() {
return ConcurrentHashMap.this.size();
}
public boolean isEmpty() {
return ConcurrentHashMap.this.isEmpty();
}
public void clear() {
ConcurrentHashMap.this.clear();
}
}
final class EntryIterator
extends HashIterator
implements Iterator<Entry<K,V>>
{
public Map.Entry<K,V> next() {
HashEntry<K,V> e = super.nextEntry();
return new WriteThroughEntry(e.key, e.value);
}
}
abstract class HashIterator {
int nextSegmentIndex;
int nextTableIndex;
HashEntry<K,V>[] currentTable;
HashEntry<K, V> nextEntry;
HashEntry<K, V> lastReturned;
HashIterator() {
nextSegmentIndex = segments.length - 1;
nextTableIndex = -1;
advance();
}
// ... remaining HashIterator methods (advance, nextEntry, hasNext, remove) omitted
}