拜托！别再问我hashmap是否线程安全

黑马程序员官方

于 2022-09-02 16:49:55 发布

阅读量234

点赞数 2

分类专栏： Java从基础到就业全套内容更新ing 文章标签：哈希算法 java 散列表

本文链接：https://blog.csdn.net/itcast_cn/article/details/126666536

版权

Java从基础到就业全套内容更新ing 专栏收录该内容

150 篇文章 190 订阅

订阅专栏

拜托！别再问我hashmap是否线程安全

一、糟糕的面试

面试官：小王，你说说HashMap的是线程安全的吗？

小王：HashMap不安全，在多线程下，会出现线程安全问题。他兄弟HashTable

线程是安全的，但是出于性能考虑，我们往往会选择ConcurrentHashMap。

面试官：HashMap线程不安全的原因是什么？

小王：这个…暂时忘记了

面试官：为什么HashTable线程安全，为什么性能低？

小王：这个…

面试官：ConcurrentHashMap是怎么实现线程安全的？性能为什么较高？

小王：…

面试官：回答的很不错，回去等通知吧。

在这里插入图片描述

二、hashMap

2.1 暴露问题

大家都知道，HashMap在多线程下会存在线程安全问题，如下：

public class Demo2 {
    public static void main(String[] args) {
        //shift+ctrl+alt+u
        HashMap<String, String> map = new HashMap<>();

        Thread t1 =   new Thread(new Runnable() {
            @Override
            public void run() {
                for (int i = 0; i <= 10; i++) {
                    map.put(i+"",i+"");
                }
            }
        });
        Thread t2 =   new Thread(new Runnable() {
            @Override
            public void run() {
                for (int i = 11; i <= 20; i++) {
                    map.put(i+"",i+"");
                }
            }
        });
        t1.start();
        t2.start();
        //确保两个子线程执行完毕之后，主线程再来打印hashmap
        try {
            Thread.sleep(1000);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        //遍历hashMap
        for (int i = 1; i <= 20; i++) {
            System.out.println(map.get(i + ""));
        }
    }
}

控制台：

null
2
null
null
null
6
7
8
9
10
null
null
13
null
null
null
17
18
19
20

以上例子证明了,HashMap确实存在线程安全问题。

2.2 源码追踪

翻阅源码（1.8）如下：

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
	//此处线程不安全
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    //此处线程不安全。
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}

（1）代码一

if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);

是否Hash冲突,没冲突就直接赋值给数组当前索引。

线程A判断通过，进入方法，切换B线程，判断通过，进入方法，切换A线程，赋值成功，切换B线程赋值成功，B线程的值覆盖了A线程的值，发生了数据覆盖，用户感受到是数据丢失。

(2) 代码二

if (++size > threshold)
        resize();

当元素个数size大于扩容阈值，则扩容，这里会有两个问题。

成员的size变量没有保证原子性，因此多线程下size自增是存在原子性问题。即添加了两个元素，但是size只增加了1。
两个线程如果都通过上面阈值的判断，就会发生扩容两次的情况，这也是一种安全问题。

三、HashTable

3.1 线程安全演示

我们可以使用HashTable来解决上面的安全问题。

看下面的代码：

public class Demo2 {
    public static void main(String[] args) {
        Hashtable<String, String> map = new Hashtable<>();
        Thread t1 =   new Thread(new Runnable() {
            @Override
            public void run() {
                for (int i = 0; i <= 10; i++) {
                    map.put(i+"",i+"");//{i,i}
                }
            }
        });
        //20,20  21,21 ... 39,39
        Thread t2 =   new Thread(new Runnable() {
            @Override
            public void run() {
                for (int i = 11; i <= 20; i++) {
                    map.put(i+"",i+"");//{i,i}
                }
            }
        });
        t1.start();
        t2.start();
        //确保两个子线程执行完毕之后，主线程再来打印hashmap
        try {
            Thread.sleep(1000);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }

        for (int i = 0; i <= 20; i++) {
            System.out.println(map.get(i + ""));
        }
    }
}

控制台：

以上代码，说明了Hashtable的确是线程安全的。

3.2 翻看源码

Hashtable源码：

public synchronized V put(K key, V value) {
    // Make sure the value is not null
    if (value == null) {
        throw new NullPointerException();
    }

    // Makes sure the key is not already in the hashtable.
    Entry<?,?> tab[] = table;
    int hash = key.hashCode();
    int index = (hash & 0x7FFFFFFF) % tab.length;
    @SuppressWarnings("unchecked")
    Entry<K,V> entry = (Entry<K,V>)tab[index];
    for(; entry != null ; entry = entry.next) {
        if ((entry.hash == hash) && entry.key.equals(key)) {
            V old = entry.value;
            entry.value = value;
            return old;
        }
    }

    addEntry(hash, key, value, index);
    return null;
}

    public synchronized V get(Object key) {
        Entry<?,?> tab[] = table;
        int hash = key.hashCode();
        int index = (hash & 0x7FFFFFFF) % tab.length;
        for (Entry<?,?> e = tab[index] ; e != null ; e = e.next) {
            if ((e.hash == hash) && e.key.equals(key)) {
                return (V)e.value;
            }
        }
        return null;
    }

    public synchronized int size() {
        return count;
    }

通过阅读源码可以发现，Hashtable每个操作数据的方法，都是使用了重量级锁synchronized。线程A在操作数据的时候，线程B只能阻塞。保证了整个Hash表只能线程串行化执行，从而解决了多线程产生的安全问题。

因为Hashtable是对整个哈希表进行加锁，加锁粒度过大，发生线程阻塞的概率非常大，虽然synchronized有自己的锁优化机制，但是也很快就会升级成重量级锁。而当synchronized成为了重量级锁，就会请求底层系统锁，跳出jvm级别，频繁涉及用户态和内核态的切换，性能开销比较大。

所以在今天已经不推荐使用HashTable了。

四、ConcurrentHashMap

4.1 线程安全演示

以上两个章节我们发现，在Map集合中HashMap是最常用的集合对象。但是多线程操作HashMap会有线程安全问题，解决方式就是使用HashTable，但是HashTable会全表加锁性能牺牲很大。

JDK1.5以后所提供了ConcurrentHashMap，使用它既能解决线程安全问题，性能又比HashTable高很多，所以这是目前主流的折中方案。

代码如下：

public class Demo3 {
    public static void main(String[] args) {
        ConcurrentHashMap<String, String> map = new ConcurrentHashMap<>();
        Thread t1 = new Thread(new Runnable() {
            @Override
            public void run() {
                for (int i = 0; i <= 10; i++) {
                    map.put(i + "", i + "");//{i,i}
                }
            }
        });
        //20,20  21,21 ... 39,39
        Thread t2 = new Thread(new Runnable() {
            @Override
            public void run() {
                for (int i = 11; i <= 20; i++) {
                    map.put(i + "", i + "");//{i,i}
                }
            }
        });
        t1.start();
        t2.start();
        //确保两个子线程执行完毕之后，主线程再来打印hashmap
        try {
            Thread.sleep(1000);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }

        for (int i = 0; i <= 20; i++) {
            System.out.println(map.get(i + ""));
        }
    }
}

控制台：

线程安全得到保证。

总结 :

1 ，HashMap是线程不安全的。多线程环境下会有数据安全问题

2 ，Hashtable是线程安全的，但是会将整张表锁起来，效率低下

3，ConcurrentHashMap也是线程安全的，效率较高。在JDK7和JDK8中，底层原理不一样。

4.2 jdk7原理解析

1.ConcurrentHashMap集合底层是一个默认长度为16，加载因子为0.75的大数组 Segment数组。大数组通常是创建之后长度就固定的，而扩容是指小数组扩容。

2.默认情况下还会创建一个长度为2的小数组，把地址赋值给0索引处，其他索引此时的元素仍为null。

（Segment 继承 ReentrantLock 锁,用于存放数组 HashEntry[]。）

如下图

在这里插入图片描述

3.调用put方法时，此时会根据key的哈希值来计算出在大数组中存储的索引位置。

如果这个索引此时为null，则按照0素引的模板小数组来创建小数组。创建完毕后会二次哈希计算出key在小数组中存储的位置，然后把键值对对象存储小数组的该索引位。

如下图，先根据key的哈希算出来在大数组的4索引，创建小数组挂在4索引。接着继续使用key的hash算出存在小数组的0素引。

在这里插入图片描述

4.调用put方法时，此时会根据key的哈希值来计算出在大数组中存储的索引位置。

如果该位置不为null，就会根据记录的地址值找到小数组。二次哈希计算出小数组的索引位置。

如果需要扩容就把小数组扩容2倍。

如果不需要扩容，则会判断小数组该索引是否有元素

如果没有元素，就直接存

如果有元素，就调用equals方法比较key是否相同

比较发现没有重复，就会在小数组上挂链表。

如下图

在这里插入图片描述

线程一来访问索引4，此时就会对索引4的Segment进行加锁。其他线程访问索引4就会阻塞，访问其他索引就可以访问，这种技术叫分段锁，将数据拆成一段一段的进行加锁。

在当前例子中，我们没有指定大数组的长度，因此长度默认是 16。在理想情况下，最多可以支持16个线程同时操作不同的segment对象，达到了并发的目的。但是如果多个线程同时操作同一个segment，就会阻塞，串行化执行。

关键字：分段锁、二次哈希、Segment数组不能扩容、HashEntry数组可以扩容

总结：

ConcurrentHashMap1.7使用Segment+HashEntry数组实现的。本质上是一个 Segment 数组，Segment 继承 ReentrantLock ，同时具备了加锁和释放锁的功能。每个Segment都线程安全，全局也就安全了。把Hashtable的锁全表，变成了锁一段段的数据，粒度细提高性能。

补充：ConcurrentHashMap1.8则完全不同，放弃了Segment。数据结构使用synchronized+CAS+红黑树。锁的粒度也从段锁缩小为结点锁，粒度更细，同时数组支持扩容，并发能力更高。使用synchronized其实也是因为1.6jdk对synchronized的优化有关。

4.3 jdk8 原理解析

在1.8中，ConcurrentHashMap可以说发生翻天覆地的变化，底层数据结构不再采用segment数组，也不再采用分段锁。而是采用数组+链表+红黑树来实现，锁也从分段锁提升成了节点锁，粒度更细。使用CAS+synchronized来保证线程安全。

底层结构：数组+链表+红黑树

CAS + synchronized同步代码块保证线程安全

在这里插入图片描述

初始化数组源码如下：

//假设多线程来扩容，concurrentHashMap为了线程安全，只能让一个线程成功初始化数组，其他线程均失败。
private final Node<K,V>[] initTable() {
        Node<K,V>[] tab; int sc;
    	//所有线程进入循环，去抢着初始化数组
        while ((tab = table) == null || tab.length == 0) {
            if ((sc = sizeCtl) < 0)
                Thread.yield(); //让线程让出cpu，以至于把cpu更多的可能让给初始化操作的线程
            //CAS操作，保证一个线程进入下面的逻辑，其他线程最终只能执行 Thread.yield();
            else if (U.compareAndSwapInt(this, SIZECTL, sc, -1)) {
                try {
                    if ((tab = table) == null || tab.length == 0) {
                        int n = (sc > 0) ? sc : DEFAULT_CAPACITY;
                        @SuppressWarnings("unchecked")
                        //这个就是初始化数组，默认长度n为16
                        Node<K,V>[] nt = (Node<K,V>[])new Node<?,?>[n];
                        table = tab = nt;
                        sc = n - (n >>> 2);
                    }
                } finally {
                    sizeCtl = sc;
                }
                break;
            }
        }
        return tab;
    }

通过源码发现，这里使用了自旋+CAS+线程让出cpu。

其中自旋+Cas：初始化操作必须且只会由一个线程执行一次，不会初始化多个数组。

线程让出cpu：提高性能，让初始化操作更快执行

put操作源码如下：

   final V putVal(K key, V value, boolean onlyIfAbsent) {
        if (key == null || value == null) throw new NullPointerException();
        int hash = spread(key.hashCode());
        int binCount = 0;
        for (Node<K,V>[] tab = table;;) {
            Node<K,V> f; int n, i, fh;
            //如果数组是null,说明第一次put，那就初始化数组
            if (tab == null || (n = tab.length) == 0)
                tab = initTable();
            //根据key找到索引，如果此索引为null，则使用CAS将node直接赋给当前索引
            else if ((f = tabAt(tab, i = (n - 1) & hash)) == null) {
                if (casTabAt(tab, i, null,
                             new Node<K,V>(hash, key, value, null)))
                    break;                   // no lock when adding to empty bin
            }
            //如果当前索引位置的元素的hash==MOVED说明此时正在发生数组扩容的数据迁移操作，
            //当前线程帮助完成数据迁移
            else if ((fh = f.hash) == MOVED)
                tab = helpTransfer(tab, f);
            //当前索引既不是null，也没有在数据迁移。此时索引位置存储的要么链表要么红黑树
            else {
                V oldVal = null;
                //对头结点进行加结点锁，保证同索引下的结点线程串行化执行
                synchronized (f) {
                    if (tabAt(tab, i) == f) {
                        if (fh >= 0) {
                            binCount = 1;
                            for (Node<K,V> e = f;; ++binCount) {
                                K ek;
                                if (e.hash == hash &&
                                    ((ek = e.key) == key ||
                                     (ek != null && key.equals(ek)))) {
                                    oldVal = e.val;
                                    if (!onlyIfAbsent)
                                        e.val = value;
                                    break;
                                }
                                Node<K,V> pred = e;
                                if ((e = e.next) == null) {
                                    pred.next = new Node<K,V>(hash, key,
                                                              value, null);
                                    break;
                                }
                            }
                        }
                        else if (f instanceof TreeBin) {
                            Node<K,V> p;
                            binCount = 2;
                            if ((p = ((TreeBin<K,V>)f).putTreeVal(hash, key,
                                                           value)) != null) {
                                oldVal = p.val;
                                if (!onlyIfAbsent)
                                    p.val = value;
                            }
                        }
                    }
                }
                if (binCount != 0) {
                    if (binCount >= TREEIFY_THRESHOLD)
                        treeifyBin(tab, i);
                    if (oldVal != null)
                        return oldVal;
                    break;
                }
            }
        }
        addCount(1L, binCount);
        return null;
    }

通过源码发现:

put操作如果没有发生hash冲突，则CAS直接赋值到索引

如果发生了hash冲突，判断此时是否正在扩容数据迁移，是就加入帮忙数据迁移

如果此时是链表或者红黑树，就加节点锁，保证当前索引操作的线程串行化执行。

扩容源码如下：

    private final void transfer(Node<K,V>[] tab, Node<K,V>[] nextTab) {
        int n = tab.length, stride;
        //计算步长，算法的目的是，cpu核数越多，步长越小。
        //步长最少为16，最多为数组的长度
        if ((stride = (NCPU > 1) ? (n >>> 3) / NCPU : n) < MIN_TRANSFER_STRIDE)
            stride = MIN_TRANSFER_STRIDE; // subdivide range
        if (nextTab == null) {            // initiating
            try {
                @SuppressWarnings("unchecked")
                Node<K,V>[] nt = (Node<K,V>[])new Node<?,?>[n << 1];
                nextTab = nt;
            } catch (Throwable ex) {      // try to cope with OOME
                sizeCtl = Integer.MAX_VALUE;
                return;
            }
            nextTable = nextTab;
            transferIndex = n;
        }
        int nextn = nextTab.length;
        ForwardingNode<K,V> fwd = new ForwardingNode<K,V>(nextTab);
        boolean advance = true;
        boolean finishing = false; // to ensure sweep before committing nextTab
        for (int i = 0, bound = 0;;) {
            Node<K,V> f; int fh;
            while (advance) {
                int nextIndex, nextBound;
                if (--i >= bound || finishing)
                    advance = false;
                else if ((nextIndex = transferIndex) <= 0) {
                    i = -1;
                    advance = false;
                }
                else if (U.compareAndSwapInt
                         (this, TRANSFERINDEX, nextIndex,
                          nextBound = (nextIndex > stride ?
                                       nextIndex - stride : 0))) {
                    bound = nextBound;
                    i = nextIndex - 1;
                    advance = false;
                }
            }
            if (i < 0 || i >= n || i + n >= nextn) {
                int sc;
                if (finishing) {
                    nextTable = null;
                    table = nextTab;
                    sizeCtl = (n << 1) - (n >>> 1);
                    return;
                }
                if (U.compareAndSwapInt(this, SIZECTL, sc = sizeCtl, sc - 1)) {
                    if ((sc - 2) != resizeStamp(n) << RESIZE_STAMP_SHIFT)
                        return;
                    finishing = advance = true;
                    i = n; // recheck before commit
                }
            }
            else if ((f = tabAt(tab, i)) == null)
                advance = casTabAt(tab, i, null, fwd);
            else if ((fh = f.hash) == MOVED)
                advance = true; // already processed
            else {
                synchronized (f) {
                    if (tabAt(tab, i) == f) {
                        Node<K,V> ln, hn;
                        if (fh >= 0) {
                            int runBit = fh & n;
                            Node<K,V> lastRun = f;
                            for (Node<K,V> p = f.next; p != null; p = p.next) {
                                int b = p.hash & n;
                                if (b != runBit) {
                                    runBit = b;
                                    lastRun = p;
                                }
                            }
                            if (runBit == 0) {
                                ln = lastRun;
                                hn = null;
                            }
                            else {
                                hn = lastRun;
                                ln = null;
                            }
                            for (Node<K,V> p = f; p != lastRun; p = p.next) {
                                int ph = p.hash; K pk = p.key; V pv = p.val;
                                if ((ph & n) == 0)
                                    ln = new Node<K,V>(ph, pk, pv, ln);
                                else
                                    hn = new Node<K,V>(ph, pk, pv, hn);
                            }
                            setTabAt(nextTab, i, ln);
                            setTabAt(nextTab, i + n, hn);
                            setTabAt(tab, i, fwd);
                            advance = true;
                        }
                        else if (f instanceof TreeBin) {
                            TreeBin<K,V> t = (TreeBin<K,V>)f;
                            TreeNode<K,V> lo = null, loTail = null;
                            TreeNode<K,V> hi = null, hiTail = null;
                            int lc = 0, hc = 0;
                            for (Node<K,V> e = t.first; e != null; e = e.next) {
                                int h = e.hash;
                                TreeNode<K,V> p = new TreeNode<K,V>
                                    (h, e.key, e.val, null, null);
                                if ((h & n) == 0) {
                                    if ((p.prev = loTail) == null)
                                        lo = p;
                                    else
                                        loTail.next = p;
                                    loTail = p;
                                    ++lc;
                                }
                                else {
                                    if ((p.prev = hiTail) == null)
                                        hi = p;
                                    else
                                        hiTail.next = p;
                                    hiTail = p;
                                    ++hc;
                                }
                            }
                            ln = (lc <= UNTREEIFY_THRESHOLD) ? untreeify(lo) :
                                (hc != 0) ? new TreeBin<K,V>(lo) : t;
                            hn = (hc <= UNTREEIFY_THRESHOLD) ? untreeify(hi) :
                                (lc != 0) ? new TreeBin<K,V>(hi) : t;
                            setTabAt(nextTab, i, ln);
                            setTabAt(nextTab, i + n, hn);
                            setTabAt(tab, i, fwd);
                            advance = true;
                        }
                    }
                }
            }
        }
    }

通过源码发现：

ConcurrentHashMap采用的是多线程扩容，来提高扩容的效率。总体思想是，cpu核数越多线程越多，每个线程分得数据迁移任务量越小。

步长：单个线程负责迁移的桶数量。

下面做一个模拟扩容的流程：

假设现在数组长度有512，cpu核数2，步长32。

线程一：负责迁移索引（512-步长-1，512-1），即数组[479] - 数组[511]

线程二：负责迁移索引数组[446] 数组[478]

如果线程一和线程二都执行完毕，两个线程就会通过CAS去抢下一个任务

如果线程二抢到了，数组[413] 数组[445]

线程一失败了，自旋继续CAS抢，抢到了

线程一数组[381] 数组[412]

4.4 对比区别

（1）1.7用的是segment+hashentry数组实现的分段锁。只要线程没同时访问同一个分段数组，就可以并行访问默认长度16，segment数组不可以扩容（大数组），hashentry数组可以扩容。

（2）1.8用的是CAS+synchronized+voletile 实现的，底层是数组+链表+红黑树。对比1.7 锁的粒度更细，锁到了节点级别。

（3）1.8为什么synchronized替换segment？ReentrantLock：park unpark 用户态 - 内核态性能开销比较大。synchronized：1.6之后优化了，偏向锁轻量级锁。此时加锁粒度非常小，比1.7小。转成重量级锁概率极小。

五、总结

通过上面的学习得知，hashmap在多线程情况下初始化数组和扩容的时候均会出现线程安全问题。我们可以通过HashTable来解决，HashTable是对整个hash表加锁，相当于线程串行化操作hash表，在解决问题的同时也会导致性能极低。最终我们可以使用ConcurrentHashMap将锁的粒度控制到最小，将性能影响控制到最低，同时扩容的时候ConcurrentHashMap还支持多线程扩容。可以说ConcurrentHashMap是多线程操作hashmap场景的不二之选，比如SpringCache框架中就使用了ConcurrentHashMap来作为本地缓存。