HashMap Source Code Analysis

Alan CGH

Last modified: 2022-08-21 22:26:55

First published: 2022-07-17 16:44:28

Article link: https://blog.csdn.net/AllenChan_/article/details/125833501

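This article covers the HashMap in JDK 1.8. It is considerably more complex than the implementations in 1.7 and earlier, mainly because the underlying data structure now adds red-black trees to handle hash collisions.

For everyday use, the most important HashMap methods are put() and get(key), but several other operations are worth studying as well. This article covers:

The two key inner classes of HashMap, Node<Key, Val> and TreeNode<Key, Val>
How hash() computes the hash of an entry's key, and how the bucket position is determined
What put() does
Insertion into a red-black tree and into a linked list
How get(key) looks up an element
The remove(key) function
The resize() function, which grows the table

Since 1.8, the map's internal data structure is a table of buckets holding linked lists and red-black trees: Node is the storage unit of a linked list, and TreeNode is the storage unit of a tree. When an entry is added, its table index is computed first. If that slot already holds a Node, the entry is appended to the linked list. Once the list grows beyond 8 nodes it is converted to a red-black tree, provided the table has at least 64 slots; otherwise the table is resized instead.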

Node class structure:

TreeNode class structure:
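As a rough illustration of the two node types, here is a simplified sketch. The field names follow the JDK 8 source, but the wrapper class NodeSketch, the length() helper and the main method are assumptions made for this example, and TreeNode extends Node directly here (in the JDK it goes through LinkedHashMap.Entry).

```java
class NodeSketch {
    /** Linked-list bin node: one key/value pair plus a link to the next node. */
    static class Node<K, V> {
        final int hash;   // cached hash of the key, so resize() need not call hashCode() again
        final K key;
        V value;
        Node<K, V> next;  // next node in the same bucket, or null

        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    /** Tree bin node; simplified to extend Node directly so the sketch is self-contained. */
    static class TreeNode<K, V> extends Node<K, V> {
        TreeNode<K, V> parent; // red-black tree links
        TreeNode<K, V> left;
        TreeNode<K, V> right;
        TreeNode<K, V> prev;   // previous node in list order
        boolean red;           // node colour

        TreeNode(int hash, K key, V value, Node<K, V> next) {
            super(hash, key, value, next);
        }
    }

    /** Hypothetical helper: counts the nodes reachable through next links. */
    static int length(Node<?, ?> head) {
        int n = 0;
        for (Node<?, ?> e = head; e != null; e = e.next) n++;
        return n;
    }

    public static void main(String[] args) {
        Node<String, Integer> tail = new Node<>(2, "b", 2, null);
        Node<String, Integer> head = new Node<>(1, "a", 1, tail);
        System.out.println(length(head)); // 2
    }
}
```

Because a TreeNode is still a Node, the table can hold either kind in the same slot, which is why the source code distinguishes them with instanceof checks.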

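Unlike ConcurrentHashMap, when a HashMap bucket holds a tree, the table slot stores a plain TreeNode, and that object is the root of the red-black tree.

In ConcurrentHashMap, a tree bucket stores a TreeBin object in the table instead.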

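The hash function of a hash table matters a great deal, because it directly determines how efficient the map is: a good hash function reduces collisions and spreads keys evenly, which avoids having to walk a linked list or search a tree.

First, how hash() produces a hash value: it calls the key's hashCode() to get h, then XORs h with h >>> 16 (the high 16 bits of the key's hashCode). The result is the hash used for this key.

But why perform the XOR (which yields 1 wherever the two bits differ)?

Initially the table length is 16.

Once the hash is available, the key's index in the table is computed; n is table.length.

So the code is (table.length - 1) & hash

When table.length is small, this mask keeps only the low bits, so the high 16 bits of the hash never take part in the index calculation, and keys that differ only in their high bits collide. The designers therefore XOR the high 16 bits into the low 16 bits to mix the hash further and lower the collision probability.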
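The effect of the XOR can be seen with two hash codes that differ only in their high bits. The following is a minimal sketch, not the JDK code itself: the class name HashSpreadDemo and the helper names spread and indexFor are chosen for this example, while the two expressions are the ones described above.

```java
class HashSpreadDemo {
    /** Mixes the high 16 bits into the low 16 bits, as hash() does for a non-null key. */
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    /** Bucket index for a table whose length is a power of two. */
    static int indexFor(int hash, int tableLength) {
        return (tableLength - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 16;
        int h1 = 0x00010000; // the two hash codes differ only above bit 15
        int h2 = 0x00020000;

        // Without spreading, both land in bucket 0.
        System.out.println(indexFor(h1, n) + " " + indexFor(h2, n));                 // 0 0
        // With spreading, the high bits influence the index.
        System.out.println(indexFor(spread(h1), n) + " " + indexFor(spread(h2), n)); // 1 2
    }
}
```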

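When the table length is a power of two (2^n), this & operation is equivalent to a modulo operation. & is faster than %, which is why it is used.

(

Why is & against a power of two equivalent to modulo?

The performance comparison is skipped here; briefly, this is why & can replace %:

X % 2^n = X & (2^n - 1), for non-negative X

2^n means 2 to the power n, so taking a number modulo 2^n is the same as a bitwise AND of that number with (2^n - 1).

Take n = 3: 2^3 = 8, which is 1000 in binary, and 2^3 - 1 = 7, which is 0111.

X & (2^3 - 1) then simply keeps the lowest three bits of X.

In binary terms, X / 8 is X >> 3: shifting X right by 3 bits gives the quotient of X / 8, and the bits shifted out (the lowest three) are X % 8, the remainder.

For example, the initial capacity of HashMap is 2^4 = 16, which is 0001 0000 in binary; subtracting one gives 15, which is 0000 1111.

ANDing that with x = 105 gives:

0110 1001

& 0000 1111

--------------------------

0000 1001 = 9

Check: 105 % 16 = 9

Put plainly:

105 / 16 shifts 105 right by 4 bits (16 = 2^4). The remaining high 4 bits are the quotient, and the low 4 bits that were shifted out are the remainder. How can those low 4 bits be extracted?

Subtracting 1 from n (a power of two) clears its single 1 bit and sets every lower bit to 1. ANDing this mask with 105 extracts exactly the low 4 bits,

and the operation works directly on the binary representation, with no division involved.

)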
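The identity is easy to check in code. This is a small sketch with an assumed class name (MaskVsMod) and helper (mask). It also shows one caveat: Java's % keeps the sign of the dividend, so for negative values the mask agrees with Math.floorMod rather than with %.

```java
class MaskVsMod {
    /** Index by masking; n must be a power of two. */
    static int mask(int x, int n) {
        return x & (n - 1);
    }

    public static void main(String[] args) {
        System.out.println(105 % 16);               // 9
        System.out.println(mask(105, 16));          // 9

        // For a negative value the two operators differ.
        System.out.println(-7 % 16);                // -7
        System.out.println(mask(-7, 16));           // 9
        System.out.println(Math.floorMod(-7, 16));  // 9
    }
}
```

Since hash values can be negative, masking has the extra benefit of always producing a valid array index.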

For an analysis of the hash function used here, see the article "HashMap的hash()" by Black_Knight on 博客园 (cnblogs).

put() function

public V put(K key, V value) {
        return putVal(hash(key), key, value, false, true);
    }

    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        if ((tab = table) == null || (n = tab.length) == 0) // table is null or has length 0
            n = (tab = resize()).length; // allocate it on first use
        if ((p = tab[i = (n - 1) & hash]) == null) // target bucket is empty: place the new node directly
            tab[i] = newNode(hash, key, value, null);
        else { // bucket is occupied: search it for the key, append if absent
            Node<K,V> e; K k;
            // head node has the same hash and an equal key: remember it, its value is replaced below
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            else if (p instanceof TreeNode) // bucket is a red-black tree: insert into the tree
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else { // bucket is a linked list
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) { 
                        // reached the end of the list: append the new node (1.8 inserts at the tail, earlier versions at the head)
                        p.next = newNode(hash, key, value, null);
                        // the list now holds more than TREEIFY_THRESHOLD (8) nodes: call treeifyBin, which converts
                        // the list to a red-black tree when tab.length >= 64, and otherwise just calls resize()
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                    // hash and key both match: e is the node to update
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            // e != null means the key was already present in the bucket; return its old value
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                // onlyIfAbsent is false for put(); when true, an existing non-null value is kept
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        ++modCount;
        if (++size > threshold) // size exceeds the threshold: grow the table
            resize();
        afterNodeInsertion(evict);
        return null; // the key was not present before; return null after inserting
    }

[Figure: flowchart of the put() algorithm]
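The return values and the role of onlyIfAbsent can be observed through the public API. The sketch below uses only java.util.HashMap; the class name PutReturnDemo and the helper returnedValues are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PutReturnDemo {
    /** Runs a few insertions and collects what each call returned, in order. */
    static List<Integer> returnedValues() {
        Map<String, Integer> map = new HashMap<>();
        List<Integer> out = new ArrayList<>();
        out.add(map.put("a", 1));          // no previous mapping: returns null
        out.add(map.put("a", 2));          // existing key: value replaced, returns 1
        out.add(map.putIfAbsent("a", 3));  // onlyIfAbsent = true: 2 is kept and returned
        map.put("b", null);
        out.add(map.putIfAbsent("b", 9));  // old value is null: 9 is stored, returns null
        out.add(map.get("a"));             // 2
        out.add(map.get("b"));             // 9
        return out;
    }

    public static void main(String[] args) {
        System.out.println(returnedValues()); // [null, 1, 2, null, 2, 9]
    }
}
```

A null return from put() is therefore ambiguous: it means either that the key was absent or that it was previously mapped to null.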

get() function

public V get(Object key) {
        Node<K,V> e;
        return (e = getNode(hash(key), key)) == null ? null : e.value;
    }

    /**
     * Implements Map.get and related methods
     *
     * @param hash hash for key
     * @param key the key
     * @return the node, or null if none
     */
    final Node<K,V> getNode(int hash, Object key) {
        Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
        if ((tab = table) != null && (n = tab.length) > 0 &&
            (first = tab[(n - 1) & hash]) != null) {
            if (first.hash == hash && // always check first node
                ((k = first.key) == key || (key != null && key.equals(k))))
                return first;
            if ((e = first.next) != null) {
                if (first instanceof TreeNode) // red-black tree bucket
                    return ((TreeNode<K,V>)first).getTreeNode(hash, key);
                // linked-list bucket
                do {
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        return e;
                } while ((e = e.next) != null);
            }
        }
        return null;
    }

    // Search the red-black tree for the node
    /**
     * Calls find for root node.
     */
    final TreeNode<K,V> getTreeNode(int h, Object k) {
        return ((parent != null) ? root() : this).find(h, k, null);
    }

    /**
     * Returns root of tree containing this node.
     */
    final TreeNode<K,V> root() {
        for (TreeNode<K,V> r = this, p;;) {
            if ((p = r.parent) == null)
                return r;
            r = p;
        }
    }

    /**
     * Finds the node starting at root p with the given hash and key.
     * The kc argument caches comparableClassFor(key) upon first use
     * comparing keys.
     */
    final TreeNode<K,V> find(int h, Object k, Class<?> kc) {
        TreeNode<K,V> p = this;
        do {
            int ph, dir; K pk;
            TreeNode<K,V> pl = p.left, pr = p.right, q;
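            if ((ph = p.hash) > h) // current node's hash is larger
                p = pl; // search the left subtree
            else if (ph < h) // current node's hash is smaller
                p = pr; // search the right subtree
            else if ((pk = p.key) == k || (k != null && k.equals(pk)))
                return p; // hash and key both match: found, return the current node
            else if (pl == null) // same hash, different key, no left subtree: search the right subtree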
                p = pr;
            else if (pr == null)
                p = pl;
            else if ((kc != null ||
                      (kc = comparableClassFor(k)) != null) &&
                     (dir = compareComparables(kc, k, pk)) != 0)
                p = (dir < 0) ? pl : pr;
            else if ((q = pr.find(h, k, kc)) != null)
                return q;
            else
                p = pl;
        } while (p != null);
        return null;
    }
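To exercise the lookup paths above, one can force every key into the same bucket. The following sketch assumes a hypothetical Key class with a constant hashCode() that also implements Comparable, which is what lets find() pick a direction through compareComparables. Whether the bucket is actually a tree at a given moment is internal to HashMap and is not observable here; the example only shows that lookups stay correct under heavy collisions.

```java
import java.util.HashMap;
import java.util.Map;

class CollidingKeysDemo {
    /** Hypothetical key type: every instance has the same hash code. */
    static final class Key implements Comparable<Key> {
        final int id;

        Key(int id) { this.id = id; }

        @Override public int hashCode() { return 42; } // deliberate collision
        @Override public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).id == id;
        }
        @Override public int compareTo(Key other) {
            return Integer.compare(id, other.id);
        }
    }

    /** Builds a map in which all keys share one bucket. */
    static Map<Key, Integer> build(int count) {
        Map<Key, Integer> map = new HashMap<>();
        for (int i = 0; i < count; i++) {
            map.put(new Key(i), i);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<Key, Integer> map = build(100);
        System.out.println(map.size());             // 100
        System.out.println(map.get(new Key(57)));   // 57
        System.out.println(map.get(new Key(1000))); // null
    }
}
```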

remove() function

public V remove(Object key) {
        Node<K,V> e;
        return (e = removeNode(hash(key), key, null, false, true)) == null ?
            null : e.value;
    }

    /**
     * Implements Map.remove and related methods
     *
     * @param hash hash for key
     * @param key the key
     * @param value the value to match if matchValue, else ignored
     * @param matchValue if true only remove if value is equal
     * @param movable if false do not move other nodes while removing
     * @return the node, or null if none
     */
    final Node<K,V> removeNode(int hash, Object key, Object value,
                               boolean matchValue, boolean movable) {
        Node<K,V>[] tab; Node<K,V> p; int n, index;
        if ((tab = table) != null && (n = tab.length) > 0 &&
            (p = tab[index = (n - 1) & hash]) != null) {
            Node<K,V> node = null, e; K k; V v;
            // the head node of the bucket matches
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                node = p;
            else if ((e = p.next) != null) {
                if (p instanceof TreeNode) // search the red-black tree
                    node = ((TreeNode<K,V>)p).getTreeNode(hash, key);
                else { // search the linked list
                    do {
                        if (e.hash == hash &&
                            ((k = e.key) == key ||
                             (key != null && key.equals(k)))) {
                            node = e;
                            break;
                        }
                        p = e;
                    } while ((e = e.next) != null);
                }
            }
            // a matching node was found: unlink it
            if (node != null && (!matchValue || (v = node.value) == value ||
                                 (value != null && value.equals(v)))) {
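                if (node instanceof TreeNode) // remove the node from the red-black tree
                    ((TreeNode<K,V>)node).removeTreeNode(this, tab, movable);
                else if (node == p) // node is the head of the list
                    tab[index] = node.next;
                else // node is further down the list; p is its predecessor
                    p.next = node.next;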
                    p.next = node.next;
                ++modCount;
                --size;
                afterNodeRemoval(node);
                return node;
            }
        }
        return null;
    }
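The matchValue parameter corresponds to the two-argument remove(key, value) of the public API, which removes the entry only when the current value equals the given one. A short sketch using only java.util.HashMap (the class name RemoveDemo and the helper results are assumptions for this example):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RemoveDemo {
    /** Collects the result of each removal attempt, in order. */
    static List<Object> results() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        List<Object> out = new ArrayList<>();
        out.add(map.remove("a", 2)); // value does not match: nothing removed, returns false
        out.add(map.size());         // still 1
        out.add(map.remove("a"));    // removes the entry and returns its value, 1
        out.add(map.remove("a"));    // no mapping left: returns null
        return out;
    }

    public static void main(String[] args) {
        System.out.println(results()); // [false, 1, 1, null]
    }
}
```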

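resize() function

The implementation of resizing is fairly involved, but the idea is simple: allocate a new, larger table array, move the contents of the old table into it by recomputing each entry's index from its cached hash, and then let the old table be released. There are two key points: allocating the new array and releasing the old one, and working out where each entry lands in the new table.

The first question is how far the table grows. The code below shows that each resize doubles the table capacity, up to a maximum; once the HashMap has reached its maximum capacity, no further resizing takes place.

The second question is how entries are migrated from the old table to the new one. As noted above, the table length is always a power of two, and each resize doubles it, so an entry's position in the new table is either the same as before or moves to (oldCap + index). A short argument for why this is correct:

Suppose the old table has size 4, so it becomes 8 after resizing. For an element A whose hash is 3, the old position is (3 & 3) = 3 and the new position is (3 & 7) = 3, so A's index is unchanged. For an element B whose hash is 47, the old position is (47 & 3) = 3 and the new position is (47 & 7) = 7, that is (3 + 4): it has shifted by exactly oldCap.

How can the new position be determined quickly? The index is computed as (hash & (length - 1)), and doubling the length adds one more 1 bit to (length - 1), so one more bit of the hash takes part in the decision. If that newly examined bit is 0, the index stays the same; otherwise the entry moves to (oldIndex + oldCap). An example:

Take the same two elements A and B, with hashes 3 and 47. With a table length of 4, (4 - 1) = 3 = binary 11, so two bits of each hash take part in computing the index. In binary:

3: 000011
47: 101111

With table length 4 (mask 000011):

3: 000011 & 000011 = 000011 = 3
47: 101111 & 000011 = 000011 = 3

After resizing to length 8 (mask 000111):

3: 000011 & 000111 = 000011 = 3
47: 101111 & 000111 = 000111 = 7

For 3 the newly examined bit is 0, so the index is unchanged; for 47 the newly examined bit is 1, so the index becomes (index + oldCap).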
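The rule can be checked mechanically. In this sketch the class name ResizeIndexDemo and both helper names are assumptions for illustration; the point is that testing the single bit (hash & oldCap) gives the same result as recomputing the index against the doubled table.

```java
class ResizeIndexDemo {
    /** Bucket index for a table of the given power-of-two capacity. */
    static int indexFor(int hash, int cap) {
        return hash & (cap - 1);
    }

    /** New index after doubling, decided by the one extra bit that joins the mask. */
    static int indexAfterDoubling(int hash, int oldCap) {
        int oldIndex = indexFor(hash, oldCap);
        return (hash & oldCap) == 0 ? oldIndex : oldIndex + oldCap;
    }

    public static void main(String[] args) {
        int oldCap = 4;
        for (int hash : new int[] {3, 47}) {
            System.out.println(hash + ": " + indexFor(hash, oldCap)
                    + " -> " + indexAfterDoubling(hash, oldCap));
        }
        // prints "3: 3 -> 3" and "47: 3 -> 7"
    }
}
```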

final Node<K,V>[] resize() {
    // oldTab refers to the current bucket array
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    // oldCap > 0 means the bucket array has already been allocated
    if (oldCap > 0) {
        // already at the maximum capacity: raise the threshold to Integer.MAX_VALUE and stop growing
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        // the doubled capacity is still below the maximum and oldCap is at least the default of 16
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else {               // zero initial threshold signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    // allocate the new bucket array
    @SuppressWarnings({"rawtypes","unchecked"})
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    // make table refer to the new array
    table = newTab;
    // move the existing nodes into the new bucket array
    if (oldTab != null) {
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            // bucket j of the old array is not empty; e is its first node
            if ((e = oldTab[j]) != null) {
                // clear the old slot to help the garbage collector
                oldTab[j] = null;
                // e is the only node in the bucket
                if (e.next == null)
                    // place it at hash & (newCap - 1), its index in the new array
                    newTab[e.hash & (newCap - 1)] = e;
                // e is a tree node: split the tree bucket between the two target slots
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else {
                    // Move the linked list in this bucket by splitting it into a low list and a high list.
                    // The low list stays at index j, the high list goes to index j + oldCap.
                    // Compare the list-transfer logic in ConcurrentHashMap's transfer.
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next; // remember the successor before relinking e
                        if ((e.hash & oldCap) == 0) { // the newly examined bit is 0: e belongs to the low list
                            if (loTail == null) // low list is empty
                                loHead = e; // e becomes its head
                            else
                                loTail.next = e; // otherwise append e after the current tail
                            loTail = e; // e is the new tail of the low list
                        }
                        else { // the newly examined bit is 1: e belongs to the high list
                            if (hiTail == null) // high list is empty
                                hiHead = e; // e becomes its head
                            else
                                hiTail.next = e; // otherwise append e after the current tail
                            hiTail = e; // e is the new tail of the high list
                        }
                    } while ((e = next) != null); // until the old list is exhausted
                    if (loTail != null) { // low list is not empty
                        loTail.next = null; // terminate it
                        newTab[j] = loHead; // it stays at index j
                    }
                    if (hiTail != null) { // high list is not empty
                        hiTail.next = null; // terminate it
                        newTab[j + oldCap] = hiHead; // it moves to index j + oldCap
                    }
                }
            }
        }
    }
    return newTab;
}
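The low/high split of a linked-list bucket can be tried in isolation. The sketch below is not the JDK code: SplitSketch, its minimal Node type (hash and next only), and the render helper are assumptions made for this example, but the relinking follows the same steps as the loop in resize() and keeps the relative order of nodes in each half.

```java
class SplitSketch {
    /** Minimal list node: only the cached hash and the next link. */
    static final class Node {
        final int hash;
        Node next;

        Node(int hash, Node next) {
            this.hash = hash;
            this.next = next;
        }
    }

    /** Splits one bucket's list on the bit oldCap; returns {loHead, hiHead}. */
    static Node[] split(Node head, int oldCap) {
        Node loHead = null, loTail = null;
        Node hiHead = null, hiTail = null;
        Node e = head;
        while (e != null) {
            Node next = e.next;            // remember the successor before relinking
            if ((e.hash & oldCap) == 0) {  // extra bit is 0: stays at the old index
                if (loTail == null) loHead = e; else loTail.next = e;
                loTail = e;
            } else {                       // extra bit is 1: moves to old index + oldCap
                if (hiTail == null) hiHead = e; else hiTail.next = e;
                hiTail = e;
            }
            e = next;
        }
        if (loTail != null) loTail.next = null; // terminate both lists
        if (hiTail != null) hiTail.next = null;
        return new Node[] { loHead, hiHead };
    }

    /** Renders a list as "h1->h2->...". */
    static String render(Node head) {
        StringBuilder sb = new StringBuilder();
        for (Node e = head; e != null; e = e.next) {
            if (sb.length() > 0) sb.append("->");
            sb.append(e.hash);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hashes 3, 7, 11 and 15 all map to bucket 3 in a table of length 4.
        Node bucket = new Node(3, new Node(7, new Node(11, new Node(15, null))));
        Node[] parts = split(bucket, 4);
        System.out.println(render(parts[0])); // 3->11  (stays at index 3)
        System.out.println(render(parts[1])); // 7->15  (moves to index 7)
    }
}
```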

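The comments in resize() say that, in the enlarged table, an element either keeps its position or moves by a power-of-two offset (oldCap). How should this statement be understood?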