JDK 1.6 ConcurrentHashMap
The java.util.concurrent package was written by the renowned Doug Lea; let's learn from the master.
HashMap is extremely fast in a single thread, but misbehaves in many ways under concurrent access.
Hashtable is thread-safe, but every operation locks the entire table, so its throughput is mediocre.
To balance efficiency against thread safety, Doug Lea introduced ConcurrentHashMap.
/*
* The basic strategy is to subdivide the table among Segments,
* each of which itself is a concurrently readable hash table.
*/
1. Internal data structure:
final int segmentMask;  // segments.length - 1; masks the hash to pick a segment
final int segmentShift; // how far to right-shift the hash before masking
final Segment<K,V>[] segments;
(static final class Segment<K,V> extends ReentrantLock implements Serializable {...})
2. The constructor. Note the concurrencyLevel parameter: "the estimated number of concurrently updating threads". It expresses the expected level of write concurrency and directly determines the length of the segments array, which is always rounded up to a power of two.
public ConcurrentHashMap(int initialCapacity,
                         float loadFactor, int concurrencyLevel) {
    if (!(loadFactor > 0) || initialCapacity < 0 || concurrencyLevel <= 0)
        throw new IllegalArgumentException();

    if (concurrencyLevel > MAX_SEGMENTS)
        concurrencyLevel = MAX_SEGMENTS;

    // Find power-of-two sizes best matching arguments
    int sshift = 0;
    int ssize = 1;
    while (ssize < concurrencyLevel) {
        ++sshift;
        ssize <<= 1;
    }
    segmentShift = 32 - sshift;
    segmentMask = ssize - 1;
    this.segments = Segment.newArray(ssize);

    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    int c = initialCapacity / ssize;
    if (c * ssize < initialCapacity)
        ++c;
    int cap = 1;
    while (cap < c)
        cap <<= 1;

    for (int i = 0; i < this.segments.length; ++i)
        this.segments[i] = new Segment<K,V>(cap, loadFactor);
}
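The sizing arithmetic above is easy to trace by hand. Here is a standalone sketch (not JDK code) that reproduces just that math, so the derived fields can be inspected for a given set of constructor arguments:

```java
// Standalone sketch reproducing the constructor's sizing math.
public class SegmentSizing {
    // Returns {ssize, segmentShift, segmentMask, cap} for the given arguments.
    static int[] sizing(int initialCapacity, int concurrencyLevel) {
        // Round concurrencyLevel up to a power of two -> number of segments.
        int sshift = 0, ssize = 1;
        while (ssize < concurrencyLevel) { ++sshift; ssize <<= 1; }
        int segmentShift = 32 - sshift;   // shift exposing the top sshift bits
        int segmentMask = ssize - 1;      // mask for those bits
        // Per-segment capacity: ceil(initialCapacity / ssize),
        // rounded up to a power of two.
        int c = initialCapacity / ssize;
        if (c * ssize < initialCapacity) ++c;
        int cap = 1;
        while (cap < c) cap <<= 1;
        return new int[] { ssize, segmentShift, segmentMask, cap };
    }

    public static void main(String[] args) {
        // Default arguments (16, 0.75f, 16): 16 segments, shift 28, mask 15, cap 1.
        System.out.println(java.util.Arrays.toString(sizing(16, 16)));
        // (100, 0.75f, 10): still 16 segments; 100/16 rounds up to 7, cap becomes 8.
        System.out.println(java.util.Arrays.toString(sizing(100, 10)));
    }
}
```

With the default concurrencyLevel of 16, segmentShift is 28 and segmentMask is 15, so exactly the top 4 bits of the hash choose the segment.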
3. The put() and putIfAbsent() methods:
public V put(K key, V value) {
    if (value == null)
        throw new NullPointerException();
    int hash = hash(key.hashCode());
    return segmentFor(hash).put(key, hash, value, false);
}

public V putIfAbsent(K key, V value) {
    if (value == null)
        throw new NullPointerException();
    int hash = hash(key.hashCode());
    return segmentFor(hash).put(key, hash, value, true);
}
Both first apply a second hash to key.hashCode():
private static int hash(int h) {
    // Spread bits to regularize both segment and index locations,
    // using variant of single-word Wang/Jenkins hash.
    h += (h << 15) ^ 0xffffcd7d;
    h ^= (h >>> 10);
    h += (h << 3);
    h ^= (h >>> 6);
    h += (h << 2) + (h << 14);
    return h ^ (h >>> 16);
}
The hash then locates the owning segment via segmentFor(hash):
final Segment<K,V> segmentFor(int hash) {
    return segments[(hash >>> segmentShift) & segmentMask];
}
As you can see, hash >>> segmentShift discards the low bits, so the top sshift bits are ANDed with segmentMask to choose the segment. (The low bits are used later to pick the bucket inside the segment, so the two decisions consume different parts of the hash.)
put() then delegates to Segment.put():
V put(K key, int hash, V value, boolean onlyIfAbsent) {
    lock();
    try {
        int c = count;
        if (c++ > threshold) // ensure capacity
            rehash();
        HashEntry<K,V>[] tab = table;
        int index = hash & (tab.length - 1);
        HashEntry<K,V> first = tab[index];
        HashEntry<K,V> e = first;
        while (e != null && (e.hash != hash || !key.equals(e.key)))
            e = e.next;

        V oldValue;
        if (e != null) {
            oldValue = e.value;
            if (!onlyIfAbsent)
                e.value = value;
        }
        else {
            oldValue = null;
            ++modCount;
            tab[index] = new HashEntry<K,V>(key, hash, first, value);
            count = c; // write-volatile
        }
        return oldValue;
    } finally {
        unlock();
    }
}
3.1 One detail deserves close attention here:
static final class HashEntry<K,V> {
    final K key;
    final int hash;
    volatile V value;
    final HashEntry<K,V> next;

    HashEntry(K key, int hash, HashEntry<K,V> next, V value) {
        this.key = key;
        this.hash = hash;
        this.next = next;
        this.value = value;
    }
}
Notice that when an entry is created, its next pointer, key, and hash are all final, i.e., immutable. Once a thread has obtained a HashEntry, it never has to worry about the next-chain being restructured underneath it by another thread; only the volatile value field can change.
4. The put() and putIfAbsent() methods above synchronize internally: Segment<K,V> extends ReentrantLock, and the methods bracket their work with lock() and unlock(). The lock granularity is therefore a single Segment; all other segments remain unlocked. Compared with Hashtable this is a clear throughput win: Hashtable keeps all its data in one array, so taking the lock blocks every operation on the table (put, get, and so on), whereas ConcurrentHashMap splits its storage across several tables, so locking one still leaves the others available. The lock granularity has been reduced.
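The lock-striping idea can be reduced to a minimal sketch. The class below is hypothetical and far simpler than the real implementation (it even locks on reads, which the real class avoids), but it shows the essential structure: each stripe pairs a ReentrantLock with its own table, so writers to different stripes never contend:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Minimal lock-striping sketch (hypothetical, not the JDK implementation).
public class StripedMap<K, V> {
    static final int STRIPES = 16; // must be a power of two

    // Like Segment, each stripe IS a lock and owns its own table.
    static final class Stripe<K, V> extends ReentrantLock {
        final Map<K, V> table = new HashMap<K, V>();
    }

    private final Stripe<K, V>[] stripes;

    @SuppressWarnings("unchecked")
    public StripedMap() {
        stripes = new Stripe[STRIPES];
        for (int i = 0; i < STRIPES; i++)
            stripes[i] = new Stripe<K, V>();
    }

    private Stripe<K, V> stripeFor(Object key) {
        return stripes[key.hashCode() & (STRIPES - 1)];
    }

    public V put(K key, V value) {
        Stripe<K, V> s = stripeFor(key);
        s.lock();                 // locks one stripe, not the whole map
        try {
            return s.table.put(key, value);
        } finally {
            s.unlock();
        }
    }

    public V get(Object key) {
        Stripe<K, V> s = stripeFor(key);
        s.lock();                 // the real class avoids this lock on reads
        try {
            return s.table.get(key);
        } finally {
            s.unlock();
        }
    }
}
```

Two threads writing keys that hash to different stripes proceed fully in parallel; only writes to the same stripe serialize. The real class goes further and makes reads lock-free, as the next section shows.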
5. Now let's look at get():
public V get(Object key) {
    int hash = hash(key.hashCode());
    return segmentFor(hash).get(key, hash);
}

V get(Object key, int hash) {
    if (count != 0) { // read-volatile
        HashEntry<K,V> e = getFirst(hash);
        while (e != null) {
            if (e.hash == hash && key.equals(e.key)) {
                V v = e.value;
                // Why might v be null here? The HashEntry constructor performs
                // four writes: (1) key, (2) hash, (3) next, (4) value. Under
                // the old memory model a racing reader could observe the entry
                // after (1) but before (4), i.e. see the default null, hence
                // readValueUnderLock(e) to re-read under the lock.
                // Under the current JMM a reference to a properly constructed
                // object is published only after construction completes: other
                // threads either don't see the entry at all or see it fully
                // initialized. But value is not final, so the code stays
                // defensive.
                if (v != null)
                    return v;
                return readValueUnderLock(e); // recheck
            }
            e = e.next;
        }
    }
    return null;
}

V readValueUnderLock(HashEntry<K,V> e) {
    lock();
    try {
        return e.value;
    } finally {
        unlock();
    }
}
6. put(), get(), and remove() make the internal design clear: put and remove synchronize (on a single segment), while get() normally takes no lock at all, so most reads run nearly as fast as HashMap.get().
7. The remove() implementation:
The remove() logic itself is not complicated, but two points deserve attention. First, when the node to delete exists, decrementing count must be the very last write; otherwise a subsequent read might not see the structural change just made to the segment (the volatile write to count is what publishes it). Second, remove() begins by copying table into a local variable tab: table is volatile, and volatile reads are relatively expensive and cannot be optimized by the compiler, whereas repeated reads of an ordinary local variable cost little and optimize well.
Also, because next is final, deleting a node means re-creating every node that precedes it in the chain; the nodes after it are reused untouched.
V remove(Object key, int hash, Object value) {
    lock();
    try {
        int c = count - 1;
        HashEntry<K,V>[] tab = table;
        int index = hash & (tab.length - 1);
        HashEntry<K,V> first = tab[index];
        HashEntry<K,V> e = first;
        while (e != null && (e.hash != hash || !key.equals(e.key)))
            e = e.next;

        V oldValue = null;
        if (e != null) {
            V v = e.value;
            if (value == null || value.equals(v)) {
                oldValue = v;
                // All entries following removed node can stay
                // in list, but all preceding ones need to be
                // cloned.
                ++modCount;
                HashEntry<K,V> newFirst = e.next;
                // re-create every node that precedes e
                for (HashEntry<K,V> p = first; p != e; p = p.next)
                    newFirst = new HashEntry<K,V>(p.key, p.hash,
                                                  newFirst, p.value);
                tab[index] = newFirst;
                count = c; // write-volatile
            }
        }
        return oldValue;
    } finally {
        unlock();
    }
}
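A side effect of the cloning loop above is that the rebuilt predecessors end up in reverse order. A minimal sketch with a plain immutable node class (hypothetical, not the JDK's HashEntry) makes this visible:

```java
// Hypothetical immutable node mirroring HashEntry's final next pointer,
// used to demonstrate Segment.remove()'s clone-the-predecessors loop.
public class RemoveClone {
    static final class Node {
        final String key;
        final Node next;
        Node(String key, Node next) { this.key = key; this.next = next; }
    }

    // Remove the node with the given key, cloning its predecessors
    // exactly as Segment.remove() does.
    static Node remove(Node first, String key) {
        Node e = first;
        while (e != null && !key.equals(e.key))
            e = e.next;
        if (e == null)
            return first;               // not found: chain unchanged
        Node newFirst = e.next;         // everything after e is reused as-is
        for (Node p = first; p != e; p = p.next)
            newFirst = new Node(p.key, newFirst); // predecessors re-created
        return newFirst;
    }

    // Concatenate the keys along the chain, e.g. "ABCD".
    static String toString(Node n) {
        StringBuilder sb = new StringBuilder();
        for (; n != null; n = n.next)
            sb.append(n.key);
        return sb.toString();
    }

    public static void main(String[] args) {
        Node chain = new Node("A", new Node("B",
                     new Node("C", new Node("D", null))));
        // Removing C keeps D untouched but re-creates A and B in reverse order.
        System.out.println(toString(remove(chain, "C"))); // prints "BAD"
    }
}
```

Order within a bucket carries no meaning in a hash table, so the reversal is harmless; what matters is that readers traversing the old chain (whose next links are final) still see a consistent list.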
8. Finally, two cross-segment operations: size() and isEmpty().
public int size() {
    final Segment<K,V>[] segments = this.segments;
    long sum = 0;
    long check = 0;
    int[] mc = new int[segments.length];
    // Try a few times to get accurate count. On failure due to
    // continuous async changes in table, resort to locking.
    for (int k = 0; k < RETRIES_BEFORE_LOCK; ++k) {
        check = 0;
        sum = 0;
        int mcsum = 0;
        for (int i = 0; i < segments.length; ++i) {
            sum += segments[i].count;
            mcsum += mc[i] = segments[i].modCount;
        }
        if (mcsum != 0) {
            for (int i = 0; i < segments.length; ++i) {
                check += segments[i].count;
                if (mc[i] != segments[i].modCount) {
                    check = -1; // force retry
                    break;
                }
            }
        }
        if (check == sum)
            break;
    }
    // The two sums disagree, so some thread modified the map in between:
    // lock every segment and sum under the locks.
    if (check != sum) { // Resort to locking all segments
        sum = 0;
        for (int i = 0; i < segments.length; ++i)
            segments[i].lock();
        for (int i = 0; i < segments.length; ++i)
            sum += segments[i].count;
        for (int i = 0; i < segments.length; ++i)
            segments[i].unlock();
    }
    if (sum > Integer.MAX_VALUE)
        return Integer.MAX_VALUE;
    else
        return (int)sum;
}
size() first tries without locking: it sums the per-segment counts while snapshotting each segment's modCount, then sums again and compares (a changed modCount means some thread modified that segment in between). After RETRIES_BEFORE_LOCK (2) failed attempts, it gives up on the optimistic approach, locks every segment, and sums under the locks.
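That optimistic-then-pessimistic strategy can be sketched on its own. The class below is hypothetical (a bare count/modCount pair standing in for Segment), with the retry loop lightly restructured for clarity; it is not the JDK code:

```java
import java.util.concurrent.locks.ReentrantLock;

// Standalone sketch (hypothetical) of size()'s strategy: read per-segment
// counts without locking, use modCount snapshots to detect concurrent
// modification, and lock everything only if the optimistic passes keep failing.
public class OptimisticSum {
    static final int RETRIES_BEFORE_LOCK = 2;

    // Stand-in for Segment: a lock owning a count and a modification counter.
    static final class Seg extends ReentrantLock {
        volatile int count;
        int modCount;
    }

    static int size(Seg[] segs) {
        int[] mc = new int[segs.length];
        for (int k = 0; k < RETRIES_BEFORE_LOCK; ++k) {
            long sum = 0;
            int mcsum = 0;
            for (int i = 0; i < segs.length; ++i) {
                sum += segs[i].count;
                mcsum += mc[i] = segs[i].modCount;  // snapshot modCounts
            }
            if (mcsum != 0) {                       // map was ever modified:
                long check = 0;                     // re-read and compare
                for (int i = 0; i < segs.length; ++i) {
                    check += segs[i].count;
                    if (mc[i] != segs[i].modCount) { check = -1; break; }
                }
                if (check != sum)
                    continue;                       // raced with a writer: retry
            }
            return (int) Math.min(sum, Integer.MAX_VALUE);
        }
        // Optimistic passes kept failing: lock every segment and sum exactly.
        for (Seg s : segs) s.lock();
        try {
            long sum = 0;
            for (Seg s : segs) sum += s.count;
            return (int) Math.min(sum, Integer.MAX_VALUE);
        } finally {
            for (Seg s : segs) s.unlock();
        }
    }

    public static void main(String[] args) {
        Seg[] segs = new Seg[4];
        for (int i = 0; i < 4; i++) segs[i] = new Seg();
        segs[0].count = 3; segs[0].modCount = 1;
        segs[2].count = 5; segs[2].modCount = 2;
        System.out.println(size(segs)); // 8
    }
}
```

In the common quiescent case this costs only two unlocked passes over the segments; the expensive lock-everything path runs only under sustained concurrent modification.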
public boolean isEmpty() {
    final Segment<K,V>[] segments = this.segments;
    /*
     * We keep track of per-segment modCounts to avoid ABA
     * problems in which an element in one segment was added and
     * in another removed during traversal, in which case the
     * table was never actually empty at any point. Note the
     * similar use of modCounts in the size() and containsValue()
     * methods, which are the only other methods also susceptible
     * to ABA problems.
     */
    int[] mc = new int[segments.length];
    int mcsum = 0;
    for (int i = 0; i < segments.length; ++i) {
        if (segments[i].count != 0)
            return false;
        else
            mcsum += mc[i] = segments[i].modCount;
    }
    // If mcsum happens to be zero, then we know we got a snapshot
    // before any modifications at all were made. This is
    // probably common enough to bother tracking.
    if (mcsum != 0) {
        for (int i = 0; i < segments.length; ++i) {
            if (segments[i].count != 0 ||
                mc[i] != segments[i].modCount)
                return false;
        }
    }
    return true;
}
isEmpty() uses the same trick: if any segment is non-empty, or any modCount changed between the two passes, it returns false; otherwise true. No locking is needed.