A First Read of the ConcurrentHashMap Source Code in JDK 1.8

董董董不懂

Category: Concurrent Programming. Tags: java, data structures, linked list

Article link: https://blog.csdn.net/qq_36741161/article/details/108183327

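Preface: While reading the book 《java并发编程的艺术》 (The Art of Java Concurrency Programming), I came across its description of ConcurrentHashMap, the concurrent container in the JUC package, which it presents as a lock-striping design built on Segment + array + linked list. Reading the JDK 1.8 source, however, I found that although the Segment class still exists, it is no longer used for locking (it appears to be kept for serialization compatibility). From JDK 1.8 on, the Segment-based lock striping was dropped in favor of CAS + synchronized on top of an array + linked list + red-black tree structure. My guess is that this has to do with the JVM's optimizations of the synchronized keyword, most of which had already arrived by JDK 1.6.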

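Similarities and differences between the JDK 1.7 and JDK 1.8 implementations

JDK 1.7: Segment + array + linked list

This version uses lock striping. A ConcurrentHashMap holds an array of Segment objects; each Segment extends ReentrantLock and therefore acts as a lock, and each Segment in turn holds an array of HashEntry objects. The value field of HashEntry is declared volatile, so get does not need to acquire a lock. Modifying data in a HashEntry array (a put, for example) requires holding the lock of the corresponding Segment.
JDK 1.8: CAS + synchronized guard concurrent updates, and red-black trees are added to the data structure to speed up lookups in heavily populated bins.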
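
To make the JDK 1.7 lock-striping idea concrete, here is a minimal sketch. It is not the JDK 1.7 source: the class name SegmentedMap is hypothetical, each segment simply guards a plain HashMap, and get takes the lock as well (the real code avoids that through volatile fields).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical, simplified illustration of lock striping; not the JDK 1.7 code. */
final class SegmentedMap<K, V> {
    /** Each segment is itself a lock and guards its own small map. */
    private static final class Segment<K, V> extends ReentrantLock {
        final Map<K, V> entries = new HashMap<>();
    }

    private final Segment<K, V>[] segments;

    @SuppressWarnings("unchecked")
    SegmentedMap(int segmentCount) {
        segments = (Segment<K, V>[]) new Segment[segmentCount];
        for (int i = 0; i < segmentCount; i++) {
            segments[i] = new Segment<>();
        }
    }

    /** Picks the segment responsible for a key from its hash. */
    private Segment<K, V> segmentFor(Object key) {
        int h = key.hashCode();
        h ^= (h >>> 16);
        return segments[(h & 0x7fffffff) % segments.length];
    }

    V put(K key, V value) {
        Segment<K, V> s = segmentFor(key);
        s.lock();                 // only this segment is blocked for other writers
        try {
            return s.entries.put(key, value);
        } finally {
            s.unlock();
        }
    }

    V get(Object key) {
        Segment<K, V> s = segmentFor(key);
        s.lock();                 // simplification: this sketch locks on reads too
        try {
            return s.entries.get(key);
        } finally {
            s.unlock();
        }
    }
}
```

Writers that hash to different segments never contend for the same lock, which is the whole point of the design.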

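JDK 1.8 introduces several important fields and node types:

table: null by default and initialized on the first put; the default size is 16. It stores Node and TreeBin objects.
nextTable: the new array created during a resize, twice the size of the original table.
Node: the data structure that holds the key, the value, the key's hash, and so on.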

 static class Node<K,V> implements Map.Entry<K,V> {
        final int hash;
        final K key;
        // declared volatile to guarantee visibility
        volatile V val;
        volatile Node<K,V> next;
        // remaining code omitted...
}

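TreeBin: when a put finds that the list in a bin has reached TREEIFY_THRESHOLD (8) and the table length is at least 64 (below 64 the table is resized first), the list is rebuilt as a red-black tree of TreeNode objects. TreeBin wraps those TreeNode objects: it stores the tree root, the first node, and the lock state, but no key or value of its own.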

static final class TreeBin<K,V> extends Node<K,V> {
        TreeNode<K,V> root;
        volatile TreeNode<K,V> first;
        volatile Thread waiter;
        volatile int lockState;
        // values for lockState
        static final int WRITER = 1; // set while holding write lock
        static final int WAITER = 2; // set when waiting for write lock
        static final int READER = 4; // increment value for setting read lock
}

ForwardingNode: a special Node with hash -1 (MOVED) that holds a reference to nextTable.

final class ForwardingNode<K,V> extends Node<K,V> {
    final Node<K,V>[] nextTable;
    ForwardingNode(Node<K,V>[] tab) {
        super(MOVED, null, null, null);
        this.nextTable = tab;
    }
}

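sizeCtl: 0 by default. It controls table initialization and resizing, and the same variable means different things in different situations:
-1: the table is being initialized.
Other negative values: a resize is in progress. The class Javadoc describes this as -(1 + the number of active resizing threads); in the JDK 1.8 code the high 16 bits hold a resize stamp and the low 16 bits hold 1 + the number of resizing threads.
Before the table is initialized: the initial table size to use (0 means the default).
After the table is initialized: the element count at which the next resize is triggered (normally length * 0.75), not the capacity itself.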
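
The resize encoding of sizeCtl is easier to follow with numbers. The sketch below reuses the constants and expressions that appear in the JDK 1.8 source (resizeStamp, RESIZE_STAMP_SHIFT, and the `(rs << RESIZE_STAMP_SHIFT) + 2` start value); the class name SizeCtlDemo and the helper names stampOf and activeResizers are made up for illustration.

```java
/** Illustrative decoding of sizeCtl during a resize; helper names are hypothetical. */
final class SizeCtlDemo {
    static final int RESIZE_STAMP_BITS = 16;
    static final int RESIZE_STAMP_SHIFT = 32 - RESIZE_STAMP_BITS;

    /** Stamp for a table of length n; the top stamp bit is always set. */
    static int resizeStamp(int n) {
        return Integer.numberOfLeadingZeros(n) | (1 << (RESIZE_STAMP_BITS - 1));
    }

    /** Value the first resizing thread installs in sizeCtl. */
    static int sizeCtlAtResizeStart(int n) {
        return (resizeStamp(n) << RESIZE_STAMP_SHIFT) + 2;
    }

    /** High 16 bits: the resize stamp. */
    static int stampOf(int sc) {
        return sc >>> RESIZE_STAMP_SHIFT;
    }

    /** Low 16 bits hold 1 + the number of threads currently resizing. */
    static int activeResizers(int sc) {
        return (sc & 0xffff) - 1;
    }
}
```

Because the top bit of the stamp is set, shifting it into the high half makes sizeCtl negative for the whole duration of the resize, and each helper that joins simply adds 1.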

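The rest of this article looks at how ConcurrentHashMap is implemented in JDK 1.8, how it stays thread-safe, and how it improves efficiency.

ConcurrentHashMap initialization

ConcurrentHashMap has five constructors. The no-argument constructor is the usual choice; when the expected size of the map is known, prefer a constructor that takes an initial size, to avoid unnecessary resizes.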
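With the ConcurrentHashMap(int) constructor, the table size is computed as follows:
1. Compute initialCapacity + (initialCapacity >>> 1) + 1.
2. Round the result up to the nearest power of two (tableSizeFor); that value is the table size.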

private static final int tableSizeFor(int c) {
        int n = c - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }
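
As a worked example of the two steps above, the following sketch copies tableSizeFor and wraps it in a hypothetical helper, initialTableSize, that mirrors what the ConcurrentHashMap(int) constructor computes. The class and helper names are assumptions for illustration only.

```java
/** Worked example of the table-size calculation; class and helper names are hypothetical. */
final class TableSizeDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    /** Same bit-smearing routine as shown in the article: next power of two >= c. */
    static int tableSizeFor(int c) {
        int n = c - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    /** Step 1 (1.5x + 1) followed by step 2 (round up to a power of two). */
    static int initialTableSize(int initialCapacity) {
        if (initialCapacity >= (MAXIMUM_CAPACITY >>> 1)) {
            return MAXIMUM_CAPACITY;
        }
        return tableSizeFor(initialCapacity + (initialCapacity >>> 1) + 1);
    }
}
```

For instance, an initialCapacity of 16 gives 16 + 8 + 1 = 25, which rounds up to a table of 32 slots, while 10 gives 10 + 5 + 1 = 16, which is already a power of two.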

Table initialization

The table is initialized lazily, on the first put into the map, so no array is allocated until the map actually holds data.

// put operation
final V putVal(K key, V value, boolean onlyIfAbsent) {
        if (key == null || value == null) throw new NullPointerException();
        int hash = spread(key.hashCode());
        int binCount = 0;
        for (Node<K,V>[] tab = table;;) {
            Node<K,V> f; int n, i, fh;
            // if the table is empty, initialize it
            if (tab == null || (n = tab.length) == 0)
                tab = initTable();
           // remaining code omitted...
}

initTable() is implemented as follows:

  /**
     * Initializes table, using the size recorded in sizeCtl.
     */
    private final Node<K,V>[] initTable() {
        Node<K,V>[] tab; int sc;
        // spin while the table is still empty
        while ((tab = table) == null || tab.length == 0) {
            if ((sc = sizeCtl) < 0)
                // sizeCtl < 0 means another thread is initializing; yield and retry
                Thread.yield(); // lost initialization race; just spin
            // atomically set sizeCtl to -1 to claim the initialization
            else if (U.compareAndSwapInt(this, SIZECTL, sc, -1)) {
                try {
                    // re-check that the table is still empty
                    if ((tab = table) == null || tab.length == 0) {
                        int n = (sc > 0) ? sc : DEFAULT_CAPACITY;
                        @SuppressWarnings("unchecked")
                        Node<K,V>[] nt = (Node<K,V>[])new Node<?,?>[n];
                        table = tab = nt;
                        // next resize threshold: n - n/4 = 0.75 * n
                        sc = n - (n >>> 2);
                    }
                } finally {
                    sizeCtl = sc;
                }
                break;
            }
        }
        return tab;
    }
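
The same claim-by-CAS pattern can be tried outside the JDK. The sketch below is a simplified stand-in, assuming an AtomicInteger in place of the Unsafe-based CAS on sizeCtl; the class name LazyTable and the threshold() accessor are hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Simplified stand-in for initTable(); uses AtomicInteger instead of Unsafe. */
final class LazyTable {
    static final int DEFAULT_CAPACITY = 16;

    private volatile Object[] table;
    // 0 = use the default size, -1 = initialization in progress,
    // positive after initialization = resize threshold
    private final AtomicInteger sizeCtl = new AtomicInteger();

    Object[] initTable() {
        Object[] tab;
        while ((tab = table) == null) {
            int sc = sizeCtl.get();
            if (sc < 0) {
                Thread.yield();                      // another thread won the race
            } else if (sizeCtl.compareAndSet(sc, -1)) {
                try {
                    if ((tab = table) == null) {     // re-check after claiming
                        int n = (sc > 0) ? sc : DEFAULT_CAPACITY;
                        table = tab = new Object[n];
                        sc = n - (n >>> 2);          // 0.75 * n
                    }
                } finally {
                    sizeCtl.set(sc);                 // publish the threshold, release the claim
                }
                break;
            }
        }
        return tab;
    }

    int threshold() {
        return sizeCtl.get();
    }
}
```

Only the thread whose compareAndSet succeeds allocates the array; every other caller either yields until the table appears or sees it already published.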

put() implementation

put() is implemented as follows (it delegates to putVal):

 /** Implementation for put and putIfAbsent */
    final V putVal(K key, V value, boolean onlyIfAbsent) {
        if (key == null || value == null) throw new NullPointerException();
        // compute the hash: spread(h) = (h ^ (h >>> 16)) & HASH_BITS
        int hash = spread(key.hashCode());
        int binCount = 0;
        // retry loop
        for (Node<K,V>[] tab = table;;) {
            Node<K,V> f; int n, i, fh;
            // the table is empty: initialize it
            if (tab == null || (n = tab.length) == 0)
                tab = initTable();
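            // i = (n - 1) & hash locates the bin index in the table; n is the table length.
            // Note: tabAt reads slot i via (Node<K,V>)U.getObjectVolatile(tab, ((long)i << ASHIFT) + ABASE)
            // rather than tab[i], to guarantee visibility under concurrency. The table field is volatile,
            // but that does not make reads of its elements volatile; Unsafe.getObjectVolatile reads the
            // slot with volatile semantics, so each read sees the latest value.
            // Also note: the new Node is published with casTabAt, which makes the update thread-safe.
            // If the CAS fails, another thread inserted a node first, and the loop retries on this slot.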
            else if ((f = tabAt(tab, i = (n - 1) & hash)) == null) {
                if (casTabAt(tab, i, null,
                             new Node<K,V>(hash, key, value, null)))
                    break;                   // no lock when adding to empty bin
            }
            // f.hash == -1 (MOVED) means f is a ForwardingNode: another thread is resizing, so help with the transfer.
            else if ((fh = f.hash) == MOVED)
                tab = helpTransfer(tab, f);
            else {
               // the bin already holds a Node or a TreeBin
               // remaining code omitted...
            }
        }
        // update the count and check whether a resize is needed
        addCount(1L, binCount);
        return null;
    }
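
The tabAt / casTabAt comments in the listing above can be illustrated without Unsafe. The sketch below swaps in java.util.concurrent.atomic.AtomicReferenceArray, which offers the same per-element volatile read and compare-and-set; the class BinArrayDemo and its String "nodes" are stand-ins, not how ConcurrentHashMap is written.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

/** Stand-in for tabAt/casTabAt using AtomicReferenceArray instead of Unsafe. */
final class BinArrayDemo {
    private final AtomicReferenceArray<String> bins;

    BinArrayDemo(int n) {
        bins = new AtomicReferenceArray<>(n);   // n is assumed to be a power of two
    }

    /** Volatile read of one slot, like tabAt. */
    String tabAt(int i) {
        return bins.get(i);
    }

    /** Atomic publish of one slot, like casTabAt. */
    boolean casTabAt(int i, String expected, String update) {
        return bins.compareAndSet(i, expected, update);
    }

    /** Lock-free insert into an empty bin; returns false if the bin is taken. */
    boolean insertIfEmpty(int hash, String node) {
        int i = (bins.length() - 1) & hash;
        return tabAt(i) == null && casTabAt(i, null, node);
    }
}
```

When insertIfEmpty returns false, a real put would not give up: it would fall through to the locked branch for a non-empty bin or retry the loop.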

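Roughly, put() works as follows:

Compute the key's hash, re-hashing it (spread) so that keys do not cluster in the same bins.
(1) If the table is empty, initialize it.
(2) Otherwise, if the slot for this hash holds null (f == null), this is the first element in that bin: insert a Node with Unsafe.compareAndSwapObject and leave the loop.
(3) Otherwise, if f.hash is -1, f is a ForwardingNode, meaning another thread is resizing; help with the resize.
(4) Otherwise the bin already holds a Node or a TreeBin, and the new node has to be added to the list or tree; this step is protected by the intrinsic lock (synchronized).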
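
The re-hashing step at the top of this list is worth a small numeric example. The sketch reuses the spread formula quoted in the putVal comments and the `(n - 1) & hash` index calculation; the class name SpreadDemo and the helper indexFor are hypothetical.

```java
/** Shows why the high bits are folded into the low bits before indexing. */
final class SpreadDemo {
    static final int HASH_BITS = 0x7fffffff;   // keeps the result non-negative

    /** Same formula as in putVal: XOR the high 16 bits into the low 16 bits. */
    static int spread(int h) {
        return (h ^ (h >>> 16)) & HASH_BITS;
    }

    /** Bin index for a table of length n (n must be a power of two). */
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }
}
```

With a 16-slot table, the raw hash codes 0x10000 and 0x20000 both land in bin 0 because they differ only in high bits; after spread they land in bins 1 and 2. Masking with HASH_BITS also keeps ordinary hashes non-negative, which leaves negative values free for special nodes such as MOVED.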

	{
	// remaining code omitted...
	// the bin already holds a Node or a TreeBin: add or update under the lock
               // lock the head of the bin
				synchronized (f) {
					// re-check that tabAt(tab, i) is still f
                    if (tabAt(tab, i) == f) {
                        // hash >= 0 means f is an ordinary Node (head of a linked list)
                        if (fh >= 0) {
                            binCount = 1;
                            // walk the list
                            for (Node<K,V> e = f;; ++binCount) {
                                K ek;
                                // the key already exists: update its value
                                if (e.hash == hash &&
                                    ((ek = e.key) == key ||
                                     (ek != null && key.equals(ek)))) {
                                    oldVal = e.val;
                                    if (!onlyIfAbsent)
                                        e.val = value;
                                    break;
                                }
                                // otherwise, if this is the tail node, create a new node and link it as the new tail
                                Node<K,V> pred = e;
                                if ((e = e.next) == null) {
                                    pred.next = new Node<K,V>(hash, key,
                                                              value, null);
                                    break;
                                }
                            }
                        }
                        // f is a TreeBin, which wraps a red-black tree
                        else if (f instanceof TreeBin) {
                            Node<K,V> p;
                            binCount = 2;
                            // update or insert a node in the red-black tree
                            if ((p = ((TreeBin<K,V>)f).putTreeVal(hash, key,
                                                           value)) != null) {
                                oldVal = p.val;
                                if (!onlyIfAbsent)
                                    p.val = value;
                            }
                        }
                    }
                }
                 if (binCount != 0) {
                    // once binCount reaches TREEIFY_THRESHOLD (8), convert the list to a red-black tree (or resize a small table) to speed up lookups
                    if (binCount >= TREEIFY_THRESHOLD)
                        treeifyBin(tab, i);
                    if (oldVal != null)
                        return oldVal;
                    break;
                }
        // remaining code omitted...
 }

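When the bin already holds a Node or a TreeBin, the logic is roughly: lock the object in that slot (f), then re-check tabAt(tab, i) == f in case another thread changed the slot in the meantime.
(1) If f.hash >= 0, f is the head of a linked list, so walk the list.
If a node with an equal key is found, update its value; otherwise find the node whose next is null (e), create a new list node, and point e.next at it.
(2) If f is a TreeBin, the bin is a red-black tree; search the tree and update or add a node.
(3) If binCount >= TREEIFY_THRESHOLD (8 by default) and the table length is at least 64, the list is converted to a red-black tree; if binCount >= TREEIFY_THRESHOLD but the table length is below 64, the table is resized instead. The list-to-tree conversion logic is shown below.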

  /**
     * Replaces all linked nodes in bin at given index unless table is
     * too small, in which case resizes instead.
     */
    private final void treeifyBin(Node<K,V>[] tab, int index) {
        Node<K,V> b; int n, sc;
        if (tab != null) {
            // if table.length < 64, try to double the table and return
            if ((n = tab.length) < MIN_TREEIFY_CAPACITY)
                tryPresize(n << 1);
            else if ((b = tabAt(tab, index)) != null && b.hash >= 0) {
               // lock the head of the bin
                synchronized (b) {
                    // re-check that slot index still holds b
                    if (tabAt(tab, index) == b) {
                        TreeNode<K,V> hd = null, tl = null;
                        // walk the list and build a doubly linked list of TreeNodes
                        for (Node<K,V> e = b; e != null; e = e.next) {
                            TreeNode<K,V> p =
                                new TreeNode<K,V>(e.hash, e.key, e.val,
                                                  null, null);
                           // a null prev means p is the head of the TreeNode list
                            if ((p.prev = tl) == null)
                                hd = p;
                            else
                                tl.next = p;
                            tl = p;
                        }
                        // wrap the TreeNode list in a TreeBin (its constructor builds the tree) and publish it
                        setTabAt(tab, index, new TreeBin<K,V>(hd));
                    }
                }
            }
        }
    }

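As the code shows, treeifyBin first checks whether the table length is below 64; if so, it resizes and returns. Otherwise it builds the tree. The conversion runs inside a synchronized block, and after entering the block it re-checks that the element at index has not been changed. It then copies the Node list at index into a TreeNode list headed by hd and passes that head to the TreeBin constructor, which builds the tree.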
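
The resize-or-treeify decision described above boils down to two comparisons. The sketch below condenses it into one hypothetical method, decide, using the constant names from the source (TREEIFY_THRESHOLD, MIN_TREEIFY_CAPACITY); the class and enum are illustrative only and ignore locking entirely.

```java
/** Condensed view of the choice made after an insert into a non-empty bin. */
final class TreeifyDecision {
    static final int TREEIFY_THRESHOLD = 8;
    static final int MIN_TREEIFY_CAPACITY = 64;

    enum Action { NONE, RESIZE, TREEIFY }

    /**
     * binCount is the count putVal accumulated while walking the bin;
     * tableLength is the current table length.
     */
    static Action decide(int binCount, int tableLength) {
        if (binCount < TREEIFY_THRESHOLD) {
            return Action.NONE;          // short list: leave it alone
        }
        if (tableLength < MIN_TREEIFY_CAPACITY) {
            return Action.RESIZE;        // small table: spreading keys out is cheaper
        }
        return Action.TREEIFY;           // large table, long list: build a tree
    }
}
```

The preference for resizing on small tables reflects the idea that a long list in a 16-slot table is more likely caused by too few bins than by genuinely colliding hashes.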

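Table resizing
After a put completes, addCount(1L, binCount) checks whether the table needs to grow.
When the table runs short of capacity, that is, when the element count reaches the threshold sizeCtl (table length * 0.75), the table is resized. The resize has two parts:
(1) Create nextTable, twice the size of table.
(2) Copy the data from table into nextTable.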

/**
     * Adds to count, and if table is too small and not already
     * resizing, initiates transfer. If already resizing, helps
     * perform transfer if work is available.  Rechecks occupancy
     * after a transfer to see if another resize is already needed
     * because resizings are lagging additions.
     *
     * @param x the count to add
     * @param check if <0, don't check resize, if <= 1 only check if uncontended
     */
    private final void addCount(long x, int check) {
        CounterCell[] as; long b, s;
        // try to update baseCount with CAS
        if ((as = counterCells) != null ||
            !U.compareAndSwapLong(this, BASECOUNT, b = baseCount, s = b + x)) {
            CounterCell a; long v; int m;
            boolean uncontended = true;
            if (as == null || (m = as.length - 1) < 0 ||
                (a = as[ThreadLocalRandom.getProbe() & m]) == null ||
                !(uncontended =
                  U.compareAndSwapLong(a, CELLVALUE, v = a.value, v + x))) {
                // under contention, a thread that fails the CAS calls fullAddCount(x, uncontended), which adds x to a CounterCell instead
                fullAddCount(x, uncontended);
                return;
            }
            if (check <= 1)
                return;
            // current number of elements in the map
            s = sumCount();
        }
        if (check >= 0) {
            Node<K,V>[] tab, nt; int n, sc;
            // resize while the element count has reached the threshold
            while (s >= (long)(sc = sizeCtl) && (tab = table) != null &&
                   (n = tab.length) < MAXIMUM_CAPACITY) {
                int rs = resizeStamp(n);
                // sc < 0: another thread is already initializing or resizing
                if (sc < 0) {
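                   // stop if the resize stamp does not match, the resize is finishing, the helper limit is reached, nextTable is gone, or no ranges are left to claim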
                    if ((sc >>> RESIZE_STAMP_SHIFT) != rs || sc == rs + 1 ||
                        sc == rs + MAX_RESIZERS || (nt = nextTable) == null ||
                        transferIndex <= 0)
                        break;
                    // a resize is in progress: register as a helper (sc + 1) and join it
                    if (U.compareAndSwapInt(this, SIZECTL, sc, sc + 1))
                        transfer(tab, nt);
                }
                else if (U.compareAndSwapInt(this, SIZECTL, sc,
                                             (rs << RESIZE_STAMP_SHIFT) + 2))
                    // no resize is running yet: this thread starts it
                    transfer(tab, null);
                s = sumCount();
            }
        }
    }
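
The counting half of addCount (baseCount plus CounterCell fallback) follows the same idea as java.util.concurrent.atomic.LongAdder. The sketch below is a much-reduced version under stated assumptions: a fixed array of 8 cells, a random cell choice, and no retry or cell-growth logic, all of which the real fullAddCount handles. The class StripedCounter and its demo method are hypothetical.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicLongArray;

/** Reduced sketch of base-plus-cells counting; not the JDK implementation. */
final class StripedCounter {
    private final AtomicLong base = new AtomicLong();
    private final AtomicLongArray cells = new AtomicLongArray(8);

    void add(long x) {
        long b = base.get();
        // Fast path: one CAS on the shared base.
        if (!base.compareAndSet(b, b + x)) {
            // Contended: spill the increment into a randomly chosen cell.
            int i = ThreadLocalRandom.current().nextInt(cells.length());
            cells.addAndGet(i, x);
        }
    }

    /** Like sumCount(): base plus every cell. */
    long sum() {
        long s = base.get();
        for (int i = 0; i < cells.length(); i++) {
            s += cells.get(i);
        }
        return s;
    }

    /** Runs several threads against one counter and returns the final sum. */
    static long demo(int threads, int perThread) {
        StripedCounter c = new StripedCounter();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int k = 0; k < perThread; k++) {
                    c.add(1);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return c.sum();
    }
}
```

Every increment lands exactly once, either in base or in a cell, so the sum is exact once all writers have finished; while writers are still running, sum() is only a moment-in-time estimate.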

transfer() implements the resize; its main logic is shown in the code below:

  /**
     * A transitional table, used only while resizing.
     */
    private transient volatile Node<K,V>[] nextTable;
 
 /**
     * Moves and/or copies the nodes in each bin to new table. See
     * above for explanation.
     */
    private final void transfer(Node<K,V>[] tab, Node<K,V>[] nextTab) {
        int n = tab.length, stride;
        if ((stride = (NCPU > 1) ? (n >>> 3) / NCPU : n) < MIN_TRANSFER_STRIDE)
            stride = MIN_TRANSFER_STRIDE; // subdivide range
        if (nextTab == null) {            // initiating
            try {
                @SuppressWarnings("unchecked")
                Node<K,V>[] nt = (Node<K,V>[])new Node<?,?>[n << 1]; // create nextTable with twice the capacity
                nextTab = nt;
            } catch (Throwable ex) {      // try to cope with OOME
                sizeCtl = Integer.MAX_VALUE;
                return;
            }
            nextTable = nextTab;
            transferIndex = n;
        }
        int nextn = nextTab.length;
        ForwardingNode<K,V> fwd = new ForwardingNode<K,V>(nextTab); // forwarding node, used as a marker for transferred bins
        boolean advance = true; // key to the concurrent transfer: true means move on to the next bin
        boolean finishing = false; // to ensure sweep before committing nextTab
        for (int i = 0, bound = 0;;) {
            Node<K,V> f; int fh;
            //this while loop advances i (via --i) through the bins of the old table, claiming a new range of bins when needed
            while (advance) {
                int nextIndex, nextBound;
                if (--i >= bound || finishing)
                    advance = false;
                else if ((nextIndex = transferIndex) <= 0) {
                    i = -1;
                    advance = false;
                }
                else if (U.compareAndSwapInt
                         (this, TRANSFERINDEX, nextIndex,
                          nextBound = (nextIndex > stride ?
                                       nextIndex - stride : 0))) {
                    bound = nextBound;
                    i = nextIndex - 1;
                    advance = false;
                }
            }
            if (i < 0 || i >= n || i + n >= nextn) {
                int sc;
                if (finishing) {
                	//all bins have been transferred: assign nextTab to table and clear the temporary nextTable
                    nextTable = null;
                    table = nextTab;
                    sizeCtl = (n << 1) - (n >>> 1); // new threshold: 1.5x the old capacity, i.e. 0.75x the new capacity
                    return;
                }
                //CAS sizeCtl down by one: this thread has finished its share and is leaving the resize
                if (U.compareAndSwapInt(this, SIZECTL, sc = sizeCtl, sc - 1)) {
                    if ((sc - 2) != resizeStamp(n) << RESIZE_STAMP_SHIFT)
                        return;
                    finishing = advance = true;
                    i = n; // recheck before commit
                }
            }
            //empty bin: install the ForwardingNode
            else if ((f = tabAt(tab, i)) == null)
                advance = casTabAt(tab, i, null, fwd);
            //already a ForwardingNode: this bin has been processed, skip it; this is central to the concurrent transfer
            else if ((fh = f.hash) == MOVED)
                advance = true; // already processed
            else {
            		//lock the head of the bin
                synchronized (f) {
                    if (tabAt(tab, i) == f) {
                        Node<K,V> ln, hn;
                        //fh >= 0 means this is an ordinary Node (linked list)
                        if (fh >= 0) {
                            int runBit = fh & n;
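                            //split the list into a low list (stays at i) and a high list (moves to i + n); the tail run starting at lastRun is reused, nodes before it are copied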
                            Node<K,V> lastRun = f;
                            for (Node<K,V> p = f.next; p != null; p = p.next) {
                                int b = p.hash & n;
                                if (b != runBit) {
                                    runBit = b;
                                    lastRun = p;
                                }
                            }
                            if (runBit == 0) {
                                ln = lastRun;
                                hn = null;
                            }
                            else {
                                hn = lastRun;
                                ln = null;
                            }
                            for (Node<K,V> p = f; p != lastRun; p = p.next) {
                                int ph = p.hash; K pk = p.key; V pv = p.val;
                                if ((ph & n) == 0)
                                    ln = new Node<K,V>(ph, pk, pv, ln);
                                else
                                    hn = new Node<K,V>(ph, pk, pv, hn);
                            }
                            //put the low list at index i of nextTable
                            setTabAt(nextTab, i, ln);
                            //put the high list at index i + n of nextTable
                            setTabAt(nextTab, i + n, hn);
                            //put the ForwardingNode at index i of the old table to mark the bin as processed
                            setTabAt(tab, i, fwd);
                            //set advance to true so the while loop above moves on (--i)
                            advance = true;
                        }
                        //handle a TreeBin; the process is similar
                        else if (f instanceof TreeBin) {
                            TreeBin<K,V> t = (TreeBin<K,V>)f;
                            TreeNode<K,V> lo = null, loTail = null;
                            TreeNode<K,V> hi = null, hiTail = null;
                            int lc = 0, hc = 0;
                            //split the nodes into a low list and a high list, counting each
                            for (Node<K,V> e = t.first; e != null; e = e.next) {
                                int h = e.hash;
                                TreeNode<K,V> p = new TreeNode<K,V>
                                    (h, e.key, e.val, null, null);
                                if ((h & n) == 0) {
                                    if ((p.prev = loTail) == null)
                                        lo = p;
                                    else
                                        loTail.next = p;
                                    loTail = p;
                                    ++lc;
                                }
                                else {
                                    if ((p.prev = hiTail) == null)
                                        hi = p;
                                    else
                                        hiTail.next = p;
                                    hiTail = p;
                                    ++hc;
                                }
                            }
                            //if a half no longer needs a tree after the split, convert it back to a linked list
                            ln = (lc <= UNTREEIFY_THRESHOLD) ? untreeify(lo) :
                                (hc != 0) ? new TreeBin<K,V>(lo) : t;
                            hn = (hc <= UNTREEIFY_THRESHOLD) ? untreeify(hi) :
                                (lc != 0) ? new TreeBin<K,V>(hi) : t;
                             //put the low bin at index i of nextTable
                            setTabAt(nextTab, i, ln);
                            //put the high bin at index i + n of nextTable
                            setTabAt(nextTab, i + n, hn);
                             //put the ForwardingNode at index i of the old table to mark the bin as processed
                            setTabAt(tab, i, fwd);
                            //set advance to true so the while loop above moves on (--i)
                            advance = true;
                        }
                    }
                }
            }
        }
    }
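
The core trick in transfer() is that doubling the table only ever sends a node from bin i to either bin i or bin i + n, decided by the single bit `hash & n`. The sketch below demonstrates this on plain integers; the class BinSplitDemo and its methods are hypothetical and leave out nodes, locking, and the lastRun optimization.

```java
import java.util.ArrayList;
import java.util.List;

/** Shows how one bin splits into a low and a high bin when the table doubles. */
final class BinSplitDemo {
    /** Bin index for a table of length n (n must be a power of two). */
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    /**
     * Splits the hashes of one bin of an n-slot table.
     * Element 0 of the result stays at index i, element 1 moves to i + n.
     */
    static List<List<Integer>> split(List<Integer> hashes, int n) {
        List<Integer> low = new ArrayList<>();
        List<Integer> high = new ArrayList<>();
        for (int h : hashes) {
            if ((h & n) == 0) {
                low.add(h);     // the new index bit is 0: same index as before
            } else {
                high.add(h);    // the new index bit is 1: index grows by n
            }
        }
        List<List<Integer>> result = new ArrayList<>();
        result.add(low);
        result.add(high);
        return result;
    }
}
```

Take the hashes 1, 17, 33, and 49, which all sit in bin 1 of a 16-slot table. After doubling to 32 slots, 1 and 33 remain in bin 1, while 17 and 49 move to bin 17 (1 + 16), so no hash has to be recomputed and no other bin is touched.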

For the implementation details of transfer(), this article explains them thoroughly: https://www.jianshu.com/p/f6730d5784ad

get() implementation

Compared with put, get is much simpler. Its code is as follows:

/**
     * Returns the value to which the specified key is mapped,
     * or {@code null} if this map contains no mapping for the key.
     *
     * <p>More formally, if this map contains a mapping from a key
     * {@code k} to a value {@code v} such that {@code key.equals(k)},
     * then this method returns {@code v}; otherwise it returns
     * {@code null}.  (There can be at most one such mapping.)
     *
     * @throws NullPointerException if the specified key is null
     */
    public V get(Object key) {
        Node<K,V>[] tab; Node<K,V> e, p; int n, eh; K ek;
        // compute the hash of the key
        int h = spread(key.hashCode());
        // fall through and return null if the table is empty or the bin for this hash is empty
        if ((tab = table) != null && (n = tab.length) > 0 &&
            (e = tabAt(tab, (n - 1) & h)) != null) {
            if ((eh = e.hash) == h) {
                // if the first node (e) of the bin has the same hash and an equal key, return its value
                if ((ek = e.key) == key || (ek != null && key.equals(ek)))
                    return e.val;
            }
            //eh < 0 marks a special node (for example a TreeBin, hash -2, or a ForwardingNode, hash -1): delegate to its find method
            else if (eh < 0)
                return (p = e.find(h, key)) != null ? p.val : null;
            //otherwise walk the list to find the matching node and return its value
            while ((e = e.next) != null) {
                if (e.hash == h &&
                    ((ek = e.key) == key || (ek != null && key.equals(ek))))
                    return e.val;
            }
        }
        return null;
    }

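The logic is roughly:
(1) Compute the hash of the key.
(2) If the table is empty or the object at the corresponding slot (tabAt(tab, (n - 1) & h)) is null, return null.
(3) Otherwise fetch the Node at that slot and search the linked list or tree for the matching node, returning its value.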
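
One practical consequence of the steps above: since putVal rejects null keys and values, a null result from get can only mean "no mapping". The short example below uses the real java.util.concurrent.ConcurrentHashMap API; the wrapper class GetDemo and its method names are just scaffolding for the demonstration.

```java
import java.util.concurrent.ConcurrentHashMap;

/** Small usage example for get(); the wrapper class is illustrative scaffolding. */
final class GetDemo {
    /** Looks up a key in a map that contains only "a" -> 1. */
    static Integer lookup(String key) {
        ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();
        map.put("a", 1);
        return map.get(key);   // lock-free read; null means the key is absent
    }

    /** Returns true if storing a null value is rejected. */
    static boolean rejectsNullValue() {
        ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();
        try {
            map.put("k", null);
            return false;
        } catch (NullPointerException expected) {
            return true;       // matches the null check at the top of putVal
        }
    }
}
```

This differs from HashMap, where a null return from get is ambiguous because null values are allowed.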

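Summary

Much of Java's lower-level code relies on the Map data structure, so using Map in a thread-safe way matters a great deal.
Hashtable: uses a single global lock to synchronize access across threads. Only one thread can hold the lock at a time, so only one thread can access the container at a time. This keeps concurrent access safe but serializes all access to the container.
HashMap wrapped in a synchronized wrapper (Collections.synchronizedMap): likewise guards every operation with a single lock.
ConcurrentHashMap before JDK 1.8 (for example 1.6): uses ReentrantLock-based segment locking, so threads writing to different segments do not block each other.
ConcurrentHashMap in JDK 1.8: a concurrent hash table implementation that allows fully concurrent reads and highly concurrent updates. In effect, every bin of the hash array has its own lock: the head node of the bin.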
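
As a closing illustration of concurrent updates, the example below has several threads increment the same key through ConcurrentHashMap.merge, which the real API performs atomically per key. The class ConcurrentUpdateDemo, the key name "hits", and the thread counts are arbitrary choices for the demonstration.

```java
import java.util.concurrent.ConcurrentHashMap;

/** Several threads update one key concurrently; the wrapper class is illustrative. */
final class ConcurrentUpdateDemo {
    static int run(int threads, int perThread) {
        ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int k = 0; k < perThread; k++) {
                    // merge is an atomic read-modify-write for this key
                    map.merge("hits", 1, Integer::sum);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return map.get("hits");
    }
}
```

Replacing merge with a separate get followed by put would lose updates under contention, because the two calls are individually thread-safe but not atomic as a pair.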