Multithreaded Programming (13): A Deep Dive into the ConcurrentHashMap Source Code

技术路上的苦行僧

Published 2024-05-30 16:43:15


Column: Java Concurrency Series. Tags: java, source code analysis, concurrent containers

Article link: https://blog.csdn.net/jokeMqc/article/details/139299760


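1. Overview

This article covers the basic concepts of ConcurrentHashMap and its underlying data structure.

The data structure is:

Array + linked list + red-black tree + locking (synchronized + CAS).

Summary:

The data structure is the same as HashMap's. The difference is that ConcurrentHashMap guards put, remove, update, resize, and data migration with CAS or synchronized, which makes these operations thread-safe.
A lock covers only one bin (the head node of that bin's list or tree), unlike Hashtable, which synchronizes every method on the whole map.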
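
The thread-safety claim above can be checked with a small experiment. The sketch below is illustrative only; the class name `ConcurrentPutDemo` and the helper `fillConcurrently` are made up for this example and are not part of the JDK. Several threads insert disjoint keys, and the final size is compared with the number of inserts.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class ConcurrentPutDemo {

    /** Starts `threads` writers, each inserting `perThread` distinct keys, and returns the final size. */
    static int fillConcurrently(Map<Integer, Integer> map, int threads, int perThread) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int base = t * perThread;
            workers[t] = new Thread(() -> {
                for (int k = 0; k < perThread; k++) {
                    map.put(base + k, k);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return map.size();
    }

    public static void main(String[] args) {
        // With ConcurrentHashMap no insert is lost: prints 40000.
        System.out.println(fillConcurrently(new ConcurrentHashMap<>(), 4, 10_000));
    }
}
```

Passing a plain HashMap to the same helper is not safe: inserts may be lost or the map may be corrupted, which is the problem the per-bin locking solves.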

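2. Source Code Analysis

2.1 ConcurrentHashMap inheritance diagram

ConcurrentHashMap<K,V> extends AbstractMap<K,V> and implements ConcurrentMap<K,V> and Serializable.

2.2 Member variables

A quick look at ConcurrentHashMap's member variables makes the later source analysis easier to follow.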

    // Maximum table capacity
    private static final int MAXIMUM_CAPACITY = 1 << 30;

    // Default capacity
    private static final int DEFAULT_CAPACITY = 16;

    // Largest recommended array size
    static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

    // Concurrency level; a leftover kept only for compatibility with earlier versions
    private static final int DEFAULT_CONCURRENCY_LEVEL = 16;

    /**
     * The load factor for this table. Overrides of this value in
     * constructors affect only the initial table capacity.  The
     * actual floating point value isn't normally used -- it is
     * simpler to use expressions such as {@code n - (n >>> 2)} for
     * the associated resizing threshold.
     */
    private static final float LOAD_FACTOR = 0.75f;

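    // Treeify threshold: a bin's list is converted to a red-black tree once it holds at least 8 nodes
    static final int TREEIFY_THRESHOLD = 8;

    // Untreeify threshold: a tree bin with this many nodes or fewer goes back to a list
    static final int UNTREEIFY_THRESHOLD = 6;

    // Minimum table capacity before bins may be treeified
    static final int MIN_TREEIFY_CAPACITY = 64;

    // Minimum number of bins migrated per step during a resize; its role is explained later
    private static final int MIN_TRANSFER_STRIDE = 16;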

2.3 Constructors

What each ConcurrentHashMap constructor does.

1. ConcurrentHashMap()


    /**
     * Creates a new, empty map with the default initial table size (16).
     */
    public ConcurrentHashMap() {
    }

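Summary: this constructor creates an empty map with the default initial capacity (16) and load factor (0.75). The table itself is not allocated until the first insertion.

2. ConcurrentHashMap(int initialCapacity)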

    public ConcurrentHashMap(int initialCapacity) {
        // A negative initial capacity is rejected
        if (initialCapacity < 0)
            throw new IllegalArgumentException();
        int cap = ((initialCapacity >= (MAXIMUM_CAPACITY >>> 1)) ?
                   MAXIMUM_CAPACITY :
                   tableSizeFor(initialCapacity + (initialCapacity >>> 1) + 1));
        // Record the computed capacity in sizeCtl; the table is allocated lazily on the first put
        this.sizeCtl = cap;
    }

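Summary: this constructor creates a map with the specified initial capacity. The table size it records is tableSizeFor(initialCapacity * 1.5 + 1), not initialCapacity rounded up to a power of two as in HashMap.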
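
To make the sizing expression concrete, here is a small sketch that replays it outside the JDK. `InitialCapacitySketch` and `tableCapacityFor` are hypothetical names, and `tableSizeFor` here is a simplified stand-in for the JDK's private method (it returns the smallest power of two that is at least `c`), so treat it as an approximation of the behavior rather than the real implementation.

```java
final class InitialCapacitySketch {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    /** Smallest power of two >= c; a simplified stand-in for the JDK's private tableSizeFor. */
    static int tableSizeFor(int c) {
        if (c <= 1) return 1;
        int n = Integer.highestOneBit(c - 1) << 1;
        return (n < 0 || n > MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n;
    }

    /** Replays the expression used by ConcurrentHashMap(int initialCapacity). */
    static int tableCapacityFor(int initialCapacity) {
        if (initialCapacity < 0) throw new IllegalArgumentException();
        return (initialCapacity >= (MAXIMUM_CAPACITY >>> 1))
                ? MAXIMUM_CAPACITY
                : tableSizeFor(initialCapacity + (initialCapacity >>> 1) + 1);
    }

    public static void main(String[] args) {
        System.out.println(tableCapacityFor(16)); // 16 + 8 + 1 = 25, rounded up to 32
        System.out.println(tableCapacityFor(10)); // 10 + 5 + 1 = 16, already a power of two
    }
}
```

So requesting a capacity of 16 yields a 32-slot table here, whereas HashMap would keep 16.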

3. ConcurrentHashMap(Map<? extends K, ? extends V> m)

    /**
     * Creates a new map with the same mappings as the given map.
     *
     * @param m the map
     */
    public ConcurrentHashMap(Map<? extends K, ? extends V> m) {
        this.sizeCtl = DEFAULT_CAPACITY;
        putAll(m);
    }

Summary: this constructor creates a new map with the same mappings as the given map.

4. ConcurrentHashMap(int initialCapacity, float loadFactor, int concurrencyLevel)

    public ConcurrentHashMap(int initialCapacity,
                             float loadFactor, int concurrencyLevel) {
        if (!(loadFactor > 0.0f) || initialCapacity < 0 || concurrencyLevel <= 0) // validate arguments
            throw new IllegalArgumentException();
        if (initialCapacity < concurrencyLevel)   // Use at least as many bins
            initialCapacity = concurrencyLevel;   // as estimated threads
        // size is initialCapacity / loadFactor, plus 1; this differs from HashMap
        long size = (long)(1.0 + (long)initialCapacity / loadFactor);
        int cap = (size >= (long)MAXIMUM_CAPACITY) ?
            MAXIMUM_CAPACITY : tableSizeFor((int)size);
        this.sizeCtl = cap;
    }

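Summary: this constructor creates a map with the specified initial capacity, load factor, and concurrency level. The load factor and concurrency level only influence the initial table size.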

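Here is a worked example:

1) Code example. Judging by the comments in the analysis below, it constructs the map with an initial capacity of 15 and a load factor of 0.5.

2) Before put: sizeCtl is 32.

3) After put: the table has length 32 and sizeCtl is 24.

4) How it works.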

    public ConcurrentHashMap(int initialCapacity,
                             float loadFactor, int concurrencyLevel) {
        if (!(loadFactor > 0.0f) || initialCapacity < 0 || concurrencyLevel <= 0) // validate arguments
            throw new IllegalArgumentException();
        if (initialCapacity < concurrencyLevel)   // Use at least as many bins
            initialCapacity = concurrencyLevel;   // as estimated threads
        // size is initialCapacity / loadFactor, plus 1, which differs from HashMap; so tableSizeFor receives 31 here (1 + 15 / 0.5), not 15
        long size = (long)(1.0 + (long)initialCapacity / loadFactor);
        int cap = (size >= (long)MAXIMUM_CAPACITY) ?
            MAXIMUM_CAPACITY : tableSizeFor((int)size);
        this.sizeCtl = cap;
    }

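When does it become 24? During the first put, when the table is initialized: sizeCtl is set to the capacity times 0.75, as the source below shows:

/**
 * table is null at first; the first put triggers table initialization
 */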
private final Node<K,V>[] initTable() {
        Node<K,V>[] tab; int sc;
        while ((tab = table) == null || tab.length == 0) {
            // sc = the current sizeCtl, i.e. the 32 seen above
            if ((sc = sizeCtl) < 0)
                Thread.yield(); // another thread is already initializing the table, so give up the time slice
            else if (U.compareAndSwapInt(this, SIZECTL, sc, -1)) {
                try {
                    if ((tab = table) == null || tab.length == 0) {
                        // n = sc = 32
                        int n = (sc > 0) ? sc : DEFAULT_CAPACITY;
                        // Allocate the Node array of length n, i.e. 32
                        @SuppressWarnings("unchecked")
                        Node<K,V>[] nt = (Node<K,V>[])new Node<?,?>[n];
                        // Assign it to table; initialization is done, giving the 32-slot array seen above
                        table = tab = nt;
                        // n >>> 2 is n / 4 = 32 / 4 = 8, and 32 - 8 = 24
                        // This is equivalent to n * 3/4, i.e. n * 0.75, so the load factor 0.5 passed to the constructor is not used here
                        sc = n - (n >>> 2);
                    }
                } finally {
                    // Finally store the computed sc in sizeCtl
                    sizeCtl = sc;
                }
                break;
            }
        }
        return tab;
    }
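
The arithmetic in the comments above (31, then 32, then 24) can be replayed in isolation. This is a sketch under assumptions: the arguments 15 and 0.5f are inferred from the article's comments, the class and method names are hypothetical, and `tableSizeFor` is a simplified stand-in for the JDK's private method.

```java
final class SizeCtlWalkthrough {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    /** Smallest power of two >= c; a simplified stand-in for the JDK's private tableSizeFor. */
    static int tableSizeFor(int c) {
        if (c <= 1) return 1;
        int n = Integer.highestOneBit(c - 1) << 1;
        return (n < 0 || n > MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n;
    }

    /** The value the three-argument constructor stores in sizeCtl. */
    static int constructorSizeCtl(int initialCapacity, float loadFactor, int concurrencyLevel) {
        if (!(loadFactor > 0.0f) || initialCapacity < 0 || concurrencyLevel <= 0)
            throw new IllegalArgumentException();
        if (initialCapacity < concurrencyLevel)
            initialCapacity = concurrencyLevel;
        long size = (long) (1.0 + (long) initialCapacity / loadFactor);
        return (size >= (long) MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : tableSizeFor((int) size);
    }

    /** The value initTable stores in sizeCtl after allocating a table of length n. */
    static int thresholdAfterInit(int n) {
        return n - (n >>> 2);
    }

    public static void main(String[] args) {
        int before = constructorSizeCtl(15, 0.5f, 1); // 1 + 15 / 0.5 = 31, rounded up to 32
        int after = thresholdAfterInit(before);       // 32 - 8 = 24
        System.out.println(before + " -> " + after);  // prints "32 -> 24"
    }
}
```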

So what is sizeCtl for? The Javadoc in the source says:

    /**
     * Table initialization and resizing control.  When negative, the
     * table is being initialized or resized: -1 for initialization,
     * else -(1 + the number of active resizing threads).  Otherwise,
     * when table is null, holds the initial table size to use upon
     * creation, or 0 for default. After initialization, holds the
     * next element count value upon which to resize the table.
     */
    private transient volatile int sizeCtl;

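In other words:

The default is 0. The field controls table initialization and resizing.
-1 means the table is being initialized.
-N means N-1 threads are resizing, according to the Javadoc. In the JDK 8 implementation the value during a resize also carries a resize stamp in its high 16 bits, so it is not literally a small negative number.

Otherwise:

If the table has not been initialized, it holds the initial table size to use.
Once the table has been initialized, it holds the resize threshold, i.e. the element count that triggers the next resize, which is 0.75 times the table length.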
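
The negative values are easier to picture with numbers. The sketch below reproduces, from memory of the JDK 8 source, how the resize stamp is built and how the first resizing thread encodes sizeCtl; the constants match JDK 8 but may differ in other versions, and the class and helper names (`ResizeStampSketch`, `sizeCtlForFirstResizer`, `activeResizers`) are hypothetical.

```java
final class ResizeStampSketch {
    static final int RESIZE_STAMP_BITS = 16;
    static final int RESIZE_STAMP_SHIFT = 32 - RESIZE_STAMP_BITS;

    /** Stamp identifying a resize of a table of length n; bit 15 is always set. */
    static int resizeStamp(int n) {
        return Integer.numberOfLeadingZeros(n) | (1 << (RESIZE_STAMP_BITS - 1));
    }

    /** sizeCtl as set by the thread that starts the resize: stamp in the high half, 2 in the low half. */
    static int sizeCtlForFirstResizer(int n) {
        return (resizeStamp(n) << RESIZE_STAMP_SHIFT) + 2;
    }

    /** Low 16 bits hold (1 + number of active resizing threads). */
    static int activeResizers(int sizeCtl) {
        return (sizeCtl & 0xFFFF) - 1;
    }

    public static void main(String[] args) {
        int sc = sizeCtlForFirstResizer(16);
        System.out.println(sc < 0);               // true: the stamp's top bit lands in the sign bit
        System.out.println(activeResizers(sc));   // 1
        System.out.println(activeResizers(sc + 1)); // 2, after a helper increments sizeCtl by CAS
    }
}
```

This is why helpTransfer compares `sc >>> RESIZE_STAMP_SHIFT` with the stamp before joining: a different stamp means the resize belongs to a table of another length.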

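2.4 put source analysis

First, some background on compareAndSwapInt (CAS).

U.compareAndSwapInt(this, SIZECTL, sc, -1)

Explanation:

This is a native method of sun.misc.Unsafe; it is not implemented in Java.
It reads the value at offset SIZECTL within the object this and compares it with the expected value sc. If they are equal, it writes the new value (-1 here) to that location and returns true; otherwise it returns false. The comparison and the write happen as one atomic step.
It is usually used inside a retry loop (spinning).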
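
The same compare-then-retry pattern is available through the public `AtomicInteger.compareAndSet` API, which avoids touching Unsafe directly. The sketch below is only an illustration of the pattern; `CasRetrySketch`, `incrementWithCas`, and `runWorkers` are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class CasRetrySketch {

    /** Increments the counter with an explicit CAS retry loop and returns the new value. */
    static int incrementWithCas(AtomicInteger counter) {
        while (true) {
            int current = counter.get();
            int next = current + 1;
            // Succeeds only if no other thread changed the value since we read it.
            if (counter.compareAndSet(current, next)) {
                return next;
            }
            // Otherwise another thread won; loop and try again with the fresh value.
        }
    }

    /** Runs `threads` workers that each increment `perThread` times; returns the final count. */
    static int runWorkers(int threads, int perThread) {
        AtomicInteger counter = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int k = 0; k < perThread; k++) {
                    incrementWithCas(counter);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(runWorkers(4, 100_000)); // prints 400000: no update is lost
    }
}
```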

2.4.1 put flowchart

2.4.2 put source code

    public V put(K key, V value) {
        return putVal(key, value, false);
    }

    /** Implementation for put and putIfAbsent */
    final V putVal(K key, V value, boolean onlyIfAbsent) {
        // Neither key nor value may be null
        if (key == null || value == null) throw new NullPointerException();
        // Compute the hash by spreading (perturbing) the hashCode
        int hash = spread(key.hashCode());
        int binCount = 0;
        for (Node<K,V>[] tab = table;;) { // retry loop: keeps going until the put succeeds
            Node<K,V> f; int n, i, fh;
            if (tab == null || (n = tab.length) == 0)
                // The table does not exist yet: initialize it
                tab = initTable(); 
            else if ((f = tabAt(tab, i = (n - 1) & hash)) == null) {
                // Empty slot: CAS the new node in, expecting null; only on success is the node set and the loop left
                if (casTabAt(tab, i, null,
                             new Node<K,V>(hash, key, value, null)))
                    break;                   // no lock when adding to empty bin
            }
            else if ((fh = f.hash) == MOVED)
                // MOVED marks a forwarding node, so a resize is migrating data; the current thread helps with the transfer
                tab = helpTransfer(tab, f);
            else {
                V oldVal = null;
                // Slot is occupied: lock the head node f of this bin; similar in spirit to segment locking, but per bin
                synchronized (f) {
                    if (tabAt(tab, i) == f) {
                        if (fh >= 0) {
                            binCount = 1;
                            // Walk the list towards the tail; the logic is much the same as HashMap's
                            for (Node<K,V> e = f;; ++binCount) {
                                K ek;
                                if (e.hash == hash &&
                                    ((ek = e.key) == key ||
                                     (ek != null && key.equals(ek)))) {
                                    oldVal = e.val;
                                    if (!onlyIfAbsent)
                                        e.val = value;
                                    break;
                                }
                                Node<K,V> pred = e;
                                if ((e = e.next) == null) {
                                    pred.next = new Node<K,V>(hash, key,
                                                              value, null);
                                    break;
                                }
                            }
                        }
                        else if (f instanceof TreeBin) {
                            Node<K,V> p;
                            binCount = 2;
                            if ((p = ((TreeBin<K,V>)f).putTreeVal(hash, key,
                                                           value)) != null) {
                                oldVal = p.val;
                                if (!onlyIfAbsent)
                                    p.val = value;
                            }
                        }
                    }
                }
                if (binCount != 0) {
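                    // Treeify the bin if it has grown long enough (treeifyBin resizes instead while the table is shorter than MIN_TREEIFY_CAPACITY)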
                    if (binCount >= TREEIFY_THRESHOLD)
                        treeifyBin(tab, i);
                    if (oldVal != null)
                        return oldVal;
                    break;
                }
            }
        }
        addCount(1L, binCount);
        return null;
    }
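
putVal starts by calling spread and then indexes the table with `(n - 1) & hash`. The sketch below reproduces those two steps as they appear in the JDK 8 source (recalled from memory, so treat the constant and formula as assumptions to verify against your JDK); `SpreadSketch` and `indexFor` are hypothetical names.

```java
final class SpreadSketch {
    /** Usable bits of a normal node hash; the sign bit is cleared. */
    static final int HASH_BITS = 0x7fffffff;

    /** XORs the high 16 bits into the low 16 bits and forces the result to be non-negative. */
    static int spread(int h) {
        return (h ^ (h >>> 16)) & HASH_BITS;
    }

    /** Slot index for a hash in a table whose length is a power of two. */
    static int indexFor(int hash, int tableLength) {
        return (tableLength - 1) & hash;
    }

    public static void main(String[] args) {
        int h1 = 0x10000, h2 = 0x20000;            // differ only in the high bits
        System.out.println(indexFor(h1, 16) + " " + indexFor(h2, 16));                 // prints "0 0": they collide
        System.out.println(indexFor(spread(h1), 16) + " " + indexFor(spread(h2), 16)); // prints "1 2"
        System.out.println(spread(-1) >= 0);       // true
    }
}
```

Keeping ordinary hashes non-negative matters because negative values are reserved for special nodes, which is what the `fh >= 0` and `MOVED` checks in putVal rely on.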

2.4.3 Table initialization

    /**
     * Tip: follow the flow as a single thread first, then consider how the same code behaves when several threads run it at once.
     */
	private final Node<K,V>[] initTable() {
        Node<K,V>[] tab; int sc;
        while ((tab = table) == null || tab.length == 0) {
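            // First thread: sizeCtl >= 0, so this condition is false and execution falls through
            if ((sc = sizeCtl) < 0)
                // Second thread: the first one has already set sizeCtl to -1 (table is being initialized), so yield and re-check on the next loop iteration
                Thread.yield(); // lost initialization race; just spin
            else if (U.compareAndSwapInt(this, SIZECTL, sc, -1)) { // CAS sizeCtl to -1; success means this thread has won the right to initialize the table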
                try {
                    // Re-check that table is still null
                    if ((tab = table) == null || tab.length == 0) {
                        int n = (sc > 0) ? sc : DEFAULT_CAPACITY;
                        // Allocate the table
                        @SuppressWarnings("unchecked")
                        Node<K,V>[] nt = (Node<K,V>[])new Node<?,?>[n];
                        table = tab = nt;
                        // Compute the resize threshold: n * 0.75
                        sc = n - (n >>> 2);
                    }
                } finally {
                    sizeCtl = sc;
                }
                break;
            }
        }
        return tab;
    }

Below is a simulation of the code above running under multiple threads:
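
A minimal runnable sketch of that race, assuming an `AtomicInteger` in place of the Unsafe-based CAS on sizeCtl and a plain `Object[]` in place of the node table. All names (`LazyInitSketch`, `allocations`, `raceAndCountAllocations`) are hypothetical; the point is only that many threads may call initTable at once while exactly one of them allocates.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

final class LazyInitSketch {
    private final AtomicInteger sizeCtl = new AtomicInteger(0);
    private volatile Object[] table;
    final AtomicInteger allocations = new AtomicInteger(); // counts how often the array was created

    Object[] initTable() {
        Object[] tab;
        while ((tab = table) == null) {
            int sc = sizeCtl.get();
            if (sc < 0) {
                Thread.yield();                        // another thread is initializing; spin
            } else if (sizeCtl.compareAndSet(sc, -1)) { // won the right to initialize
                try {
                    if ((tab = table) == null) {       // double-check after winning the CAS
                        int n = (sc > 0) ? sc : 16;
                        table = tab = new Object[n];
                        allocations.incrementAndGet();
                        sc = n - (n >>> 2);            // next resize threshold, 0.75 * n
                    }
                } finally {
                    sizeCtl.set(sc);
                }
                break;
            }
        }
        return tab;
    }

    /** Lets `threads` threads race on initTable and reports how many allocations happened. */
    static int raceAndCountAllocations(int threads) {
        LazyInitSketch sketch = new LazyInitSketch();
        CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                try {
                    start.await();                     // line the threads up so they race
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                sketch.initTable();
            });
            workers[t].start();
        }
        start.countDown();
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return sketch.allocations.get();
    }

    public static void main(String[] args) {
        System.out.println(raceAndCountAllocations(8)); // prints 1
    }
}
```

The second null check inside the try block is what protects a late thread that passes the CAS after the winner has already restored sizeCtl to a positive threshold.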

2.4.4 Resize source analysis

    /**
     * tab = the old table, f = the bin's head node; while a resize is in progress it is a ForwardingNode
     */
	final Node<K,V>[] helpTransfer(Node<K,V>[] tab, Node<K,V> f) {
        Node<K,V>[] nextTab; int sc;
        if (tab != null && (f instanceof ForwardingNode) &&
            (nextTab = ((ForwardingNode<K,V>)f).nextTable) != null) {
            int rs = resizeStamp(tab.length);
            // sizeCtl < 0 means a resize is in progress
            while (nextTab == nextTable && table == tab &&
                   (sc = sizeCtl) < 0) {
                if ((sc >>> RESIZE_STAMP_SHIFT) != rs || sc == rs + 1 ||
                    sc == rs + MAX_RESIZERS || transferIndex <= 0)
                    break;
                if (U.compareAndSwapInt(this, SIZECTL, sc, sc + 1)) {
                    // Core resize logic
                    transfer(tab, nextTab);
                    break;
                }
            }
            return nextTab;
        }
        return table;
    }


	private final void transfer(Node<K,V>[] tab, Node<K,V>[] nextTab) {
        int n = tab.length, stride;
        // Take length / 8 and divide by the number of CPUs; if the result is below 16, use 16. stride is the number of bins a single thread claims at a time
        if ((stride = (NCPU > 1) ? (n >>> 3) / NCPU : n) < MIN_TRANSFER_STRIDE)
            stride = MIN_TRANSFER_STRIDE; // subdivide range
        if (nextTab == null) {            // initiating the resize: allocate the new table
            try {
                // Double the capacity
                @SuppressWarnings("unchecked")
                Node<K,V>[] nt = (Node<K,V>[])new Node<?,?>[n << 1];
                nextTab = nt;
            } catch (Throwable ex) {      // try to cope with OOME
                sizeCtl = Integer.MAX_VALUE;
                return;
            }
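            // Publish the new table through the field
            nextTable = nextTab;
            /**
             * transferIndex is the upper bound (exclusive) of the bins not yet claimed for migration. Threads carve it up:
             * each thread that arrives claims `stride` bins, so the migration is carried out by several threads.
             * If the old table length n is 16, transferIndex starts at 16.
             */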
            transferIndex = n;
        }
			
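        // Length of the new table
        int nextn = nextTab.length;
        // Forwarding node used as a marker: a transferring thread that finds it in a slot skips that slot, since the bin is already migrated
        ForwardingNode<K,V> fwd = new ForwardingNode<K,V>(nextTab);
        // true when the loop should move on to the next bin (or claim a new range)
        boolean advance = true;
        // true once this thread, as the last one out, is re-checking the whole table before committing nextTab
        boolean finishing = false; // to ensure sweep before committing nextTab
        for (int i = 0, bound = 0;;) { // each iteration migrates one bin of the old table
            Node<K,V> f; int fh;
            // This loop hands work to the current thread. nextIndex is the upper end of the next range to claim; ranges are handed out from the back of the table to the front.
            // nextIndex <= 0 means the front of the table has been reached and nothing is left to claim.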
            while (advance) {
                int nextIndex, nextBound;
                if (--i >= bound || finishing)
                    advance = false;
                else if ((nextIndex = transferIndex) <= 0) {
                    i = -1;
                    advance = false;
                }
                else if (U.compareAndSwapInt
                         (this, TRANSFERINDEX, nextIndex,
                          nextBound = (nextIndex > stride ?
                                       nextIndex - stride : 0))) {
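                    /**
                     * A thread reaches this branch on its first pass: it fixes the range of bins this thread is responsible for and CAS-updates transferIndex.
                     * Every helping thread comes through here to claim a share; the CAS prevents two threads from being given the same range.
                     * After the CAS, transferIndex = nextBound = nextIndex - stride (or 0),
                     * which is the upper bound of the bins still unclaimed; other threads keep carving those up.
                     */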
                    bound = nextBound;
                    i = nextIndex - 1;
                    advance = false;
                }
            }
			
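            // i is outside the valid slot range: this thread has no more bins to migrate
            if (i < 0 || i >= n || i + n >= nextn) {
                int sc;
                // The whole resize is complete: publish the new table and the new threshold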
                if (finishing) {
                    nextTable = null;
                    table = nextTab;
                    sizeCtl = (n << 1) - (n >>> 1);
                    return;
                }
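                // Not complete yet: this thread signs off by decrementing sizeCtl; only the last thread out goes on to the final re-check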
                if (U.compareAndSwapInt(this, SIZECTL, sc = sizeCtl, sc - 1)) {
                    if ((sc - 2) != resizeStamp(n) << RESIZE_STAMP_SHIFT)
                        return;
                    finishing = advance = true;
                    i = n; // recheck before commit
                }
            }
			
            // Actual migration work
            else if ((f = tabAt(tab, i)) == null)
                // The old slot is empty: CAS in fwd as a placeholder
                advance = casTabAt(tab, i, null, fwd);
            else if ((fh = f.hash) == MOVED)
                advance = true; // already processed
            else {
                // The slot holds nodes: lock the head node and migrate the bin
                synchronized (f) {
                    // Re-check that no other thread has changed the head in the meantime
                    if (tabAt(tab, i) == f) {
                        Node<K,V> ln, hn;
                        if (fh >= 0) {
                            int runBit = fh & n;
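                            Node<K,V> lastRun = f;
                            /**
                             * As in HashMap, nodes are split into a low list and a high list, but the way they are found differs.
                             * The first loop below looks for the trailing run of nodes that all go to the same side (all low or all high).
                             * It tracks that run with lastRun: walking down the list, whenever a node's bit differs from the current run bit, lastRun moves to that node.
                             * After the loop, lastRun and every node after it belong to the same side; in the extreme case lastRun is simply the last node.
                             */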
                            for (Node<K,V> p = f.next; p != null; p = p.next) {
                                int b = p.hash & n;
                                if (b != runBit) {
                                    runBit = b;
                                    lastRun = p;
                                }
                            }
							
                            // Decide whether the run starting at lastRun is the low or the high list; that tail is reused as-is
                            if (runBit == 0) {
                                ln = lastRun;
                                hn = null;
                            }
                            else {
                                hn = lastRun;
                                ln = null;
                            }
							
                            /**
                             * What about the nodes before lastRun? Simply loop once more and test each one, copying it into the low or the high list.
                             */
                            for (Node<K,V> p = f; p != lastRun; p = p.next) {
                                int ph = p.hash; K pk = p.key; V pv = p.val;
                                if ((ph & n) == 0)
                                    ln = new Node<K,V>(ph, pk, pv, ln);
                                else
                                    hn = new Node<K,V>(ph, pk, pv, hn);
                            }
                            setTabAt(nextTab, i, ln);
                            setTabAt(nextTab, i + n, hn);
                            setTabAt(tab, i, fwd);
                            advance = true;
                        }
                        else if (f instanceof TreeBin) {
                            TreeBin<K,V> t = (TreeBin<K,V>)f;
                            TreeNode<K,V> lo = null, loTail = null;
                            TreeNode<K,V> hi = null, hiTail = null;
                            int lc = 0, hc = 0;
                            for (Node<K,V> e = t.first; e != null; e = e.next) {
                                int h = e.hash;
                                TreeNode<K,V> p = new TreeNode<K,V>
                                    (h, e.key, e.val, null, null);
                                if ((h & n) == 0) {
                                    if ((p.prev = loTail) == null)
                                        lo = p;
                                    else
                                        loTail.next = p;
                                    loTail = p;
                                    ++lc;
                                }
                                else {
                                    if ((p.prev = hiTail) == null)
                                        hi = p;
                                    else
                                        hiTail.next = p;
                                    hiTail = p;
                                    ++hc;
                                }
                            }
                            ln = (lc <= UNTREEIFY_THRESHOLD) ? untreeify(lo) :
                                (hc != 0) ? new TreeBin<K,V>(lo) : t;
                            hn = (hc <= UNTREEIFY_THRESHOLD) ? untreeify(hi) :
                                (lc != 0) ? new TreeBin<K,V>(hi) : t;
                            setTabAt(nextTab, i, ln);
                            setTabAt(nextTab, i + n, hn);
                            setTabAt(tab, i, fwd);
                            advance = true;
                        }
                    }
                }
            }
        }
    }
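
Two ideas from transfer can be tried in isolation: how threads carve the old table into ranges by CAS on transferIndex, and where a node lands in the doubled table. The sketch below is a simplified model, not the JDK code; it uses an `AtomicInteger` for transferIndex, runs in a single thread, and the names `TransferSketch`, `claimAllRanges`, and `newIndex` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

final class TransferSketch {

    /**
     * Claims ranges of `stride` bins from the back of a table of length n to the front,
     * the way transferIndex is carved up. Each entry is {lowestIndex, highestIndex}, inclusive.
     */
    static List<int[]> claimAllRanges(int n, int stride) {
        AtomicInteger transferIndex = new AtomicInteger(n);
        List<int[]> ranges = new ArrayList<>();
        int nextIndex;
        while ((nextIndex = transferIndex.get()) > 0) {
            int nextBound = (nextIndex > stride) ? nextIndex - stride : 0;
            // In the real code several threads compete on this CAS; the loser just retries.
            if (transferIndex.compareAndSet(nextIndex, nextBound)) {
                ranges.add(new int[] {nextBound, nextIndex - 1});
            }
        }
        return ranges;
    }

    /** Index in the doubled table for a node that sits in an old table of length n. */
    static int newIndex(int hash, int n) {
        int i = hash & (n - 1);                  // index in the old table
        return ((hash & n) == 0) ? i : i + n;    // low list stays at i, high list moves to i + n
    }

    public static void main(String[] args) {
        for (int[] r : claimAllRanges(64, 16)) {
            System.out.println(r[0] + ".." + r[1]); // 48..63, 32..47, 16..31, 0..15
        }
        System.out.println(newIndex(5, 16));  // 5: bit 16 is clear, so the node stays put
        System.out.println(newIndex(21, 16)); // 21: bit 16 is set, so it moves from 5 to 5 + 16
    }
}
```

Because only one extra hash bit decides the destination, a bin's nodes can go to just two slots, i and i + n, which is why transfer builds exactly two lists (ln and hn) per bin.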

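2.4.5 The get method

The source is as follows:

    /**
     * Node's val and next fields are declared volatile, so when thread A
     * updates a node's val or appends a node, the change is visible to thread B.
     */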
	public V get(Object key) {
        Node<K,V>[] tab; Node<K,V> e, p; int n, eh; K ek;
        int h = spread(key.hashCode());
        // 1. Check that the table is not empty; 2. check that the target bin is not empty
        if ((tab = table) != null && (n = tab.length) > 0 &&
            (e = tabAt(tab, (n - 1) & h)) != null) {
            // First node of the bin: if the key matches, return its value
            if ((eh = e.hash) == h) {
                if ((ek = e.key) == key || (ek != null && key.equals(ek)))
                    return e.val;
            }
            else if (eh < 0)
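                // A negative hash marks a special node (a ForwardingNode during a resize, or a TreeBin); its find method does the lookup, and ForwardingNode.find searches the new table nextTable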
                return (p = e.find(h, key)) != null ? p.val : null;
            while ((e = e.next) != null) {
                if (e.hash == h &&
                    ((ek = e.key) == key || (ek != null && key.equals(ek))))
                    return e.val;
            }
        }
        return null;
    }
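
The reason get needs no lock can be shown with a stripped-down node type. The sketch below is an assumption-laden model of the list case only: `VolatileNodeSketch`, its nested `Node`, and `find` are hypothetical names that mirror the shape of the JDK's node (final hash and key, volatile val and next) without being the JDK class.

```java
final class VolatileNodeSketch {

    /** Minimal list node: hash and key never change, val and next are volatile. */
    static final class Node<K, V> {
        final int hash;
        final K key;
        volatile V val;
        volatile Node<K, V> next;

        Node(int hash, K key, V val, Node<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.val = val;
            this.next = next;
        }
    }

    /** Lock-free lookup along one bin's list, using the same comparison as get. */
    static <K, V> V find(Node<K, V> head, int h, Object key) {
        for (Node<K, V> e = head; e != null; e = e.next) {
            K ek;
            if (e.hash == h && ((ek = e.key) == key || (ek != null && key.equals(ek)))) {
                return e.val; // volatile read: sees the latest value written by another thread
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Node<String, Integer> tail = new Node<>("b".hashCode(), "b", 2, null);
        Node<String, Integer> head = new Node<>("a".hashCode(), "a", 1, tail);
        System.out.println(find(head, "b".hashCode(), "b")); // 2
        tail.val = 20;                                       // a writer would do this under the bin lock
        System.out.println(find(head, "b".hashCode(), "b")); // 20
        System.out.println(find(head, "c".hashCode(), "c")); // null
    }
}
```

Writers still take the bin lock, so only readers are lock-free; a reader may observe the list just before or just after a concurrent update, but never a torn node.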