从底层代码看HashMap、Hashtable、ConcurrentHashMap的区别

最新推荐文章于 2022-04-27 16:58:08 发布

清泉影月

最新推荐文章于 2022-04-27 16:58:08 发布

阅读量205

点赞数

分类专栏： Java 文章标签： hashmap hashtable concurrentHash 区别

本文链接：https://blog.csdn.net/qingquanyingyue/article/details/111672230

版权

Java 专栏收录该内容

32 篇文章 2 订阅

订阅专栏

前言：

写代码久了，就会思考某些常用类的底层到底是怎么实现的；如题，这三个类都常见，平时用最多的是HashMap，如果涉及线程问题在早期会用Hashtable，现在会使用ConcurrentHashMap；查看他们的底层源码，一方面是为了学习前辈们的思维，另一方面也是加强自己的思维能力；同时这块在面试的时候属于重点考察内容（本篇博客主要剖析 HashMap）。

今天就从它们常用方法的底层实现看一下他们的区别（JDK版本：1.8）。

1. HashMap

通常使用就涉及到三个方法，put、get、remove，分析逻辑也是从这三个方法入手；分析之前需要了解一下HashMap的一些基本属性


static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // 不声明容量大小的时候默认16

static final float DEFAULT_LOAD_FACTOR = 0.75f;    // 默认负载因子 0.75

static class Node<K,V> implements Map.Entry<K,V>;    // 内部类，K-V 存储结构

transient Node<K,V>[] table;            // HashMap 内部存储为数组

static final class TreeNode<K,V> extends LinkedHashMap.Entry<K,V>;        // 树结构存储形式

static final int TREEIFY_THRESHOLD = 8;                // 链表结构转树结构的存储数量界限值

HashMap 的整体存储结构是一个数组，每个位置存储的是一个链表结构或树结构（jdk1.8之前只有链表结构）。

get 方法源码

    public V get(Object key) {
        Node<K,V> e;
        return (e = getNode(hash(key), key)) == null ? null : e.value;
    }

    /**
     * Implements Map.get and related methods
     *
     * @param hash hash for key
     * @param key the key
     * @return the node, or null if none
     */
    final Node<K,V> getNode(int hash, Object key) {
        Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
        if ((tab = table) != null && (n = tab.length) > 0 &&
            (first = tab[(n - 1) & hash]) != null) {
            if (first.hash == hash && // always check first node
                ((k = first.key) == key || (key != null && key.equals(k))))
                return first;
            if ((e = first.next) != null) {
                if (first instanceof TreeNode)
                    return ((TreeNode<K,V>)first).getTreeNode(hash, key);
                do {
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        return e;
                } while ((e = e.next) != null);
            }
        }
        return null;
    }

get 方法相对容易理解，外部调用get后转调内部getNode方法，getNode先用key求取hash值，根据hash值获取对应的数组下标结构（链表或者树）的第一个结构体，最后再按照 key 是否相等求取对应的 value。

put 方法才是重中之重

    public V put(K key, V value) {
        return putVal(hash(key), key, value, false, true);
    }

    /**
     * Implements Map.put and related methods
     *
     * @param hash hash for key
     * @param key the key
     * @param value the value to put
     * @param onlyIfAbsent if true, don't change existing value
     * @param evict if false, the table is in creation mode.
     * @return previous value, or null if none
     */
    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;                                           // 1
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);                              // 2
        else {
            Node<K,V> e; K k;
            if (p.hash == hash &&                                                  // 3
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);    // 4
            else {
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {                                    // 5
                        p.next = newNode(hash, key, value, null);
                        if (binCount >= TREEIFY_THRESHOLD - 1)                     // 6
                            treeifyBin(tab, hash);
                        break;
                    }
                    if (e.hash == hash &&                                          // 7
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            if (e != null) {                                                       // 8
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        ++modCount;
        if (++size > threshold)                                                    // 9
            resize();
        afterNodeInsertion(evict);
        return null;
    }

上面代码标识的9行是比较重要的点，依次解释一下

1. 插入数据时，如果当前 HashMap 为 null，会进行初始化（resize方法有两个作用，在这里就是给HashMap的属性赋一些初值）

2. 当前数组下标未被暂用，直接把K-V存储进去

3. 这里记录的是数组下标的第一个结构体 key 与插入 key 相等的情况，会在8里面对value进行覆盖

4. 这里判断当前数组下标的结构是不是树结构，如果是就调用树结构的存储方式更新数据

5. 指针在链表结构上逐步后移，直到为 null

6.这里有个结构转换，如果当前链表存储量大于等于TREEIFY_THRESHOLD（8）会触发链表转树结构

7. 如果key相等就跳出循环，跟3有点相似，在8里面会对value进行覆盖

8. key 相等就进行value的覆盖

9.如果当前插入数据以后的size大于（负载因子*容量）就触发HashMap的重新排列（resize方法）

remove方法

这个方法其实可以看做put方法的逆操作。

    public V remove(Object key) {
        Node<K,V> e;
        return (e = removeNode(hash(key), key, null, false, true)) == null ?
            null : e.value;
    }

    /**
     * Implements Map.remove and related methods
     *
     * @param hash hash for key
     * @param key the key
     * @param value the value to match if matchValue, else ignored
     * @param matchValue if true only remove if value is equal
     * @param movable if false do not move other nodes while removing
     * @return the node, or null if none
     */
    final Node<K,V> removeNode(int hash, Object key, Object value,
                               boolean matchValue, boolean movable) {
        Node<K,V>[] tab; Node<K,V> p; int n, index;
        if ((tab = table) != null && (n = tab.length) > 0 &&
            (p = tab[index = (n - 1) & hash]) != null) {
            Node<K,V> node = null, e; K k; V v;
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                node = p;                                                        // 1
            else if ((e = p.next) != null) {
                if (p instanceof TreeNode)
                    node = ((TreeNode<K,V>)p).getTreeNode(hash, key);            // 2
                else {                                                           // 3
                    do {
                        if (e.hash == hash &&
                            ((k = e.key) == key ||
                             (key != null && key.equals(k)))) {
                            node = e;
                            break;
                        }
                        p = e;
                    } while ((e = e.next) != null);
                }
            }
            if (node != null && (!matchValue || (v = node.value) == value ||      // 4
                                 (value != null && value.equals(v)))) {
                if (node instanceof TreeNode)
                    ((TreeNode<K,V>)node).removeTreeNode(this, tab, movable);     // 5
                else if (node == p)
                    tab[index] = node.next;                                       // 6
                else
                    p.next = node.next;                                           // 7
                ++modCount;
                --size;
                afterNodeRemoval(node);
                return node;
            }
        }
        return null;
    }

上面代码标识的7行是比较重要的点，依次解释一下

1. 这个是最简单的一种情况，找到数组下标的结构体，且结构体第一个就是要删除的K-V组合

2. 当前数组下标是一个树结构（红黑树），查询获取要删除的 K-V组合

3. 当前数组下标是一个链表结构，逐个后移查询获取要删除的 K-V组合

4. 这一大块就是删除 K-V组合的功能

5.删除树结构中 K-V组合

6. 删除数组下标结构体第一个为K-V组合

7. 删除链表中的 K-V 组合

2. HashTable

HashTable ，大家都清楚这个是线程安全的，但是其实真没啥好讲的，直接看它get、put，remove 方法关键字就行了

public synchronized V put(K key, V value)

public synchronized V get(Object key)

public synchronized V remove(Object key)

直接使用 synchronized 进行的线程同步，肯定是线程安全的；但是当时写这个的前辈一定没仔细想，get 方法也加synchronized，说不过去吧，这个不涉及修改，完全没必要，还有就是 remove，put 方法直接粗暴的加 synchronized 对性能不太友好。

3. ConcurrentHashMap

ConcurrentHashMap 的设计比 HashMap 复杂得多；其中 get 方法跟HashMap 的实现几乎一致；大家比较在乎的是怎么实现线程同步的，以及为什么比 HashTable 性能好。

put 方法

    public V put(K key, V value) {
        return putVal(key, value, false);
    }

    /** Implementation for put and putIfAbsent */
    final V putVal(K key, V value, boolean onlyIfAbsent) {
        if (key == null || value == null) throw new NullPointerException();
        int hash = spread(key.hashCode());
        int binCount = 0;
        for (Node<K,V>[] tab = table;;) {
            Node<K,V> f; int n, i, fh;
            if (tab == null || (n = tab.length) == 0)
                tab = initTable();
            else if ((f = tabAt(tab, i = (n - 1) & hash)) == null) {
                if (casTabAt(tab, i, null,
                             new Node<K,V>(hash, key, value, null)))
                    break;
            }
            else if ((fh = f.hash) == MOVED)
                tab = helpTransfer(tab, f);
            else {
                V oldVal = null;
                synchronized (f) {                                        // 0
                    if (tabAt(tab, i) == f) {
                        if (fh >= 0) {
                            binCount = 1;
                            for (Node<K,V> e = f;; ++binCount) {
                                K ek;
                                if (e.hash == hash &&
                                    ((ek = e.key) == key ||
                                     (ek != null && key.equals(ek)))) {
                                    oldVal = e.val;
                                    if (!onlyIfAbsent)
                                        e.val = value;
                                    break;
                                }
                                Node<K,V> pred = e;
                                if ((e = e.next) == null) {
                                    pred.next = new Node<K,V>(hash, key,
                                                              value, null);
                                    break;
                                }
                            }
                        }
                        else if (f instanceof TreeBin) {
                            Node<K,V> p;
                            binCount = 2;
                            if ((p = ((TreeBin<K,V>)f).putTreeVal(hash, key,
                                                           value)) != null) {
                                oldVal = p.val;
                                if (!onlyIfAbsent)
                                    p.val = value;
                            }
                        }
                    }
                }
                if (binCount != 0) {
                    if (binCount >= TREEIFY_THRESHOLD)
                        treeifyBin(tab, i);
                    if (oldVal != null)
                        return oldVal;
                    break;
                }
            }
        }
        addCount(1L, binCount);
        return null;
    }

注解 0 那行是绝对的核心，首先 synchronized 不是直接加在 put 方法上，只有存在 hash 冲突的时候才会触发 synchronized 同步代码块，如果没有 hash 冲突，插入操作不受多线程的影响；remove方法的线程同步跟 put 是一致的。

这篇博客关于ConcurrentHashMap整理得不错，可以参考，ConcurrentHashMap源码分析(1.8)

另外说一下个人感想，前辈们写代码貌似每行都有用，即使不进去执行的if else 都会在条件里面进行赋值操作，代码整体很干净，几乎没有多余的东西。